樓文高 樓際通 宋雷娟 王浪慶
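Abstract From the 15 tax indicators reported by 386 small and medium-sized enterprises (SMEs) in a district of Shanghai, the 10 evaluation indicators with the greatest influence on judging an enterprise's tax-payment status were selected. All 386 samples were divided into a modelling set and a testing set with similar characteristics (the testing set accounting for 45% of the samples), and a tax-checking evaluation model based on the projection pursuit clustering (PPC) technique was established. Compared with the multivariate linear regression (MLR), multivariate discriminant analysis (MDA), Logistic and support vector machine (SVM) models, the PPC model has the lowest misclassification rate, with a mean classification error rate below 6% on the modelling and testing samples. The improved PPC model uses fewer evaluation indicators and yields closely matched rates for the two types of error, which makes it well suited to tax-checking assessment research and practice for real enterprises. Judging the tax-payment status of a further 339 enterprises showed that the improved PPC model has good generalization ability and robustness.
Keywords tax checking; projection pursuit clustering technique; classification error rate; sample grouping
CLC number TV139.1; N945.12 Document code A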
Abstract Based on tax-reporting data covering 15 variables (indexes) for 386 wooden-furniture manufacturing small and medium-sized enterprises (WFMSMEs) located in a district of Shanghai, the ten variables that most strongly influence the tax-checking outcome (tax evasion or compliance) of the 386 WFMSMEs were selected by applying the sensitivity analysis method (SAM). The data were divided into a modelling set and a testing set (about 45% of the samples) with similar characteristics (similar mean values and variances) using the self-organizing map (SOM) approach. A practical, feasible and effective projection pursuit clustering (PPC) model for tax-checking assessment was thus established. Compared with the multivariate linear regression (MLR), multivariate discriminant analysis (MDA), Logistic and support vector machine (SVM) models, the established PPC model is the most accurate, with the lowest classification-error percentage (CEP). The mean CEP of the modelling set and the testing set is lower than 6%. The improved PPC model, which includes fewer variables, is thus suitable for tax-checking assessment and research. The tax-checking situation of another 339 WFMSMEs was also assessed, and the results show that the improved PPC model generalizes well and is robust.
Key words tax-checking assessment; projection pursuit clustering (PPC) model; classification-error percentage; sample splitting
1 Introduction
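Small and medium-sized enterprises (SMEs) play an increasingly important role in the country's innovation-driven model of economic development and in addressing employment. Because they are numerous and widely distributed, they expose grass-roots tax-checking and tax-assessment work to considerable risk. Building a practical and reliable tax-checking evaluation model, which both helps enterprises strengthen their ability to prevent and control tax-related risk and helps tax authorities collect taxes in full, has therefore drawn growing attention from the relevant government departments (tax bureaus and others) and from academia [1,2]. Lou Wengao (樓文高) et al. [3] reviewed in detail the strengths, weaknesses and applicability of, and the problems in the existing literature on, traditional statistical models such as the Tobit model, the analytic hierarchy process (AHP), principal component analysis (PCA), discriminant analysis (MDA), the Logistic model and multivariate linear regression (MLR), as well as combinations of traditional statistical models with emerging data-mining techniques such as the multilayer perceptron neural network (BPNN), the probabilistic neural network (PNN), the support vector machine (SVM) and the self-organizing neural network (SOM) [1-2, 4-8]. They also applied the generalized regression neural network (GRNN) together with multiple cross-validation to build a tax-checking GRNN model suited to small samples. Its classification error rate of about 10% is clearly lower than that of the traditional statistical models and the SVM model, a fairly good result. However, determining a reasonable smoothing factor during GRNN modelling is quite tedious, and the GRNN model is an implicit model [3, 9-10]: it cannot explicitly and directly reveal the nonlinear relationship between an enterprise's tax-payment status and the individual evaluation indicators. This is inconvenient for subsequent tax-checking work (judging and studying enterprises' tax-payment status) and for enterprises seeking to set reasonable tax strategies and reduce tax-related risk.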
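Projection pursuit clustering (PPC), on the other hand, is an emerging statistical modelling method suited to high-dimensional, nonlinear and non-normally distributed data [11-14]. Its mathematical meaning is clear, and because it is an explicit model, it makes it easy to rank and classify both the samples and the evaluation indicators by importance. This paper is the first to introduce the PPC technique into enterprise tax-checking research. The mean classification error rate over the 212 modelling samples and 174 testing samples (45% of the total) is below 6.00%, lower than that of traditional statistical models such as MLR, MDA and Logistic and of the SVM model. The resulting tax-checking model is simpler, more practical, more reliable and more effective, and should be preferred in tax-checking research and practice for SMEs.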
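To make the reported quantities concrete, the following minimal Python sketch shows one way the classification error rate and the two kinds of error could be computed, and checks the stated split proportion. It is an illustration only: the label coding (1 for tax evasion, 0 for compliance), the names `ErrorRates`, `error_rates`, `false_alarm` and `missed_evader`, and the toy labels are assumptions introduced here and are not taken from the paper's data or method.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    false_alarm: float    # compliant firms wrongly flagged as evaders, in percent
    missed_evader: float  # evading firms wrongly passed as compliant, in percent
    overall: float        # all misclassified firms, in percent


def error_rates(actual, predicted):
    """Return the two per-class error rates and the overall error rate.

    Assumed label coding (hypothetical): 1 = tax evasion, 0 = compliance.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    pairs = list(zip(actual, predicted))
    compliant = [p for a, p in pairs if a == 0]
    evading = [p for a, p in pairs if a == 1]
    wrong = sum(a != p for a, p in pairs)
    false_alarm = 100.0 * sum(p == 1 for p in compliant) / len(compliant) if compliant else 0.0
    missed = 100.0 * sum(p == 0 for p in evading) / len(evading) if evading else 0.0
    return ErrorRates(false_alarm, missed, 100.0 * wrong / len(actual))


# Split sizes reported in the text: 212 modelling + 174 testing = 386 samples.
n_model, n_test = 212, 174
test_share = 100.0 * n_test / (n_model + n_test)

# Hypothetical toy labels for illustration, NOT the paper's data.
toy_actual = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
toy_predicted = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
toy = error_rates(toy_actual, toy_predicted)
```

Reporting the two per-class rates alongside the overall rate matters in this setting because a model can reach a low overall error while still missing most evaders if the classes are imbalanced.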