孔賀慶 張楠 岳曉冬 童向榮 于天佑
摘 要:現(xiàn)有的屬性約簡方法大部分關(guān)注決策系統(tǒng)中的所有決策類,而在實際決策過程中決策者往往僅關(guān)注決策系統(tǒng)中的一種或幾種決策類。針對上述問題,提出基于多特定決策類的不完備決策系統(tǒng)正域約簡的理論框架。首先,給出不完備決策系統(tǒng)單特定決策類正域約簡的概念;第二,將單特定決策類正域約簡推廣到多特定決策類,構(gòu)造了相應的差別矩陣及區(qū)分函數(shù);第三,分析并證明了相關(guān)定理,提出基于差別矩陣的不完備決策系統(tǒng)多特定決策類正域約簡算法(PRMDM);最后,選取4組UCI數(shù)據(jù)集進行實驗。在數(shù)據(jù)集Teachingassistantevaluation、House、Connectionistbench和Cardiotocography上,基于差別矩陣的不完備決策系正域約簡算法(PRDM)的平均約簡長度分別為4.00、13.00、9.00和20.00,PRMDM算法(多特定決策類中決策類數(shù)目為2)的平均約簡長度分別為3.00、8.00、8.00和18.00。實驗結(jié)果驗證了PRMDM算法的有效性。
關(guān)鍵詞:粗糙集;不完備決策系統(tǒng);多特定決策類;正域約簡;差別矩陣
中圖分類號:TP181
文獻標志碼:A
英文標題
Abstract: The existing attribute reduction algorithms mostly focus on all decision classes in decision systems, but in actual decision process, decision makers may only focus on one or several decision classes in the decision systems. To solve this problem, a theoretical framework of positive region preservation reduction based on multispecific decision classes in incomplete decision systems was proposed. Firstly, the positive region preservation reduction for single specific decision class in incomplete decision systems was defined. Secondly, the positive region preservation reduction for single specific decision class was extended to multispecific decision classes, and the corresponding discernibility matrix and function were constructed. Thirdly, with related theorems analyzed and proved, an algorithm of Positive region preservation Reduction for Multispecific decision classes reduction based on Discernibility Matrix in incomplete decision systems (PRMDM) was proposed. Finally, four UCI datasets were selected for experiments. On Teachingassistantevaluation, House, Connectionistbench and Cardiotocography dataset, the average reduction length of Positive region preservation Reduction based on Discernibility Matrix in incomplete decision systems (PRDM) algorithm is 4.00, 13.00, 9.00 and 20.00 respectively while that of the PRMDM algorithm (with decision classes in the multispecific decision classes is 2) is 3.00, 8.00, 8.00 and 18.00 respectively. The validity of PRMDM algorithm is verified by experimental results.
英文關(guān)鍵詞Key words: rough set; incomplete decision system; multispecific decision classes; positive region preservation reduction; discernibility matrix
0 引言
1982年由波蘭科學家Pawlak提出的粗糙集理論[1-6]是一種重要的知識推理工具,作為一種求取集合近似的方法,該理論現(xiàn)已應用于機器學習、數(shù)據(jù)挖掘、模式識別、智能信息處理等領(lǐng)域。屬性約簡[7-13]是粗糙集理論的重要研究內(nèi)容之一,屬性約簡通過刪除冗余屬性得到保持原決策系統(tǒng)某種分類信息不變的最小屬性子集。
在決策系統(tǒng)中,若決策系統(tǒng)中條件屬性值存在缺失,則稱該決策系統(tǒng)為不完備決策系統(tǒng)。在現(xiàn)實生活中,存在一定數(shù)量的不完備信息。目前,相關(guān)學者對不完備決策系統(tǒng)下的屬性約簡進行了大量的研究,并將經(jīng)典Pawlak粗糙集模型進行推廣,取得了一系列成果: 1998年,Kryszkiewicz[14]在不完備決策系統(tǒng)下引入廣義決策保持約簡,介紹了相關(guān)決策規(guī)則的提取,并提出了基于差別矩陣[15]的廣義決策保持約簡方法; 2002年,Liang等[16]基于粗糙熵提出不完備決策系統(tǒng)的知識約簡的啟發(fā)式算法。2003年,周獻中等[17]100-104在不完備決策系統(tǒng)下提出分配約簡; 2005年,黃兵等[17]52-56提出不完備決策系統(tǒng)的上下近似約簡,并給出求解所有決策類約簡的差別矩陣方法; 2010年,Qian等[18]基于極大相容塊在不協(xié)調(diào)不完備決策系統(tǒng)下提出上下近似約簡的概念,并構(gòu)造了相應的差別矩陣;2014年,Shu等[19]在不完備決策系統(tǒng)下提出通過評估候選屬性重要度快速求取屬性約簡的方法;2015年,Qian等[20]提出動態(tài)不完備決策系統(tǒng)下基于緊湊差別矩陣的特征選擇方法。
在屬性約簡中,正域約簡針對所有決策屬性的決策類,約簡結(jié)果保證了整個決策系統(tǒng)約簡前后正域不變。在實際應用中,決策者往往僅關(guān)注于決策系統(tǒng)中的一種或幾種決策類。例如,在醫(yī)療診斷中,多種癥狀構(gòu)成條件屬性集,不同類型的疾病構(gòu)成不同的決策值,醫(yī)生通常建議根據(jù)不同類型的疾病尋找不同的發(fā)病原因。2005年,Chen等[21]提出決策系統(tǒng)中局部約簡的概念,與定義決策系統(tǒng)所有決策類的約簡不同,局部約簡只定義部分決策類的約簡; 2017年,Yao等[22]在完備決策系統(tǒng)下定義了特定決策類的正域約簡,提出特定決策類正域約簡的判定定理,并討論了特定決策類正域約簡與所有決策類正域約簡的關(guān)系; 2017年,Liu等[23]在完備系統(tǒng)下提出第l決策類約簡和β約簡的概念,并給出了基于差別矩陣的約簡算法。
基于上述研究,文獻[17]對不完備決策系統(tǒng)的所有決策類的約簡進行了討論,文獻[22-23]在完備決策系統(tǒng)下對單特定決策類的正域約簡進行了研究。由于在實際應用中存在大量的不完備數(shù)據(jù),且決策者往往傾向于關(guān)注部分決策類,現(xiàn)有的不完備決策系統(tǒng)的正域約簡方法針對上述情況討論較少。另外,基于差別矩陣的約簡方法可以求取所有約簡,用戶可以根據(jù)個人偏好選擇具有自身偏好的約簡,并且通過所有約簡可以求取最短約簡。為此,本文提出了基于多特定決策類的不完備決策系統(tǒng)正域約簡的理論框架,當選取的多特定決策類中決策類數(shù)目為1時,基于多特定決策類的不完備決策系統(tǒng)正域約簡退化為不完備決策系統(tǒng)單特定決策類的正域約簡;當選取的多特定決策類為決策系統(tǒng)中所有決策類時,基于多特定決策類的不完備決策系統(tǒng)正域約簡退化為不完備決策系統(tǒng)所有決策類的正域約簡。首先,本文介紹了不完備決策系統(tǒng)的相關(guān)概念;然后,定義了不完備決策系統(tǒng)的多特定決策類的正域約簡,構(gòu)造了相應的差別矩陣及區(qū)分函數(shù),提出了基于差別矩陣的不完備決策系統(tǒng)多特定決策類正域約簡算法(Positive region preservation Reduction for Multispecific decision classes reduction based on Discernibility Matrix in incomplete decision systems, PRMDM);最后,實驗驗證了PRMDM算法的有效性。
由于實驗采用標準UCI數(shù)據(jù)集,預處理后數(shù)據(jù)集中的決策系統(tǒng)是完備決策系統(tǒng),所以需要將完備決策系統(tǒng)轉(zhuǎn)換為不完備決策系統(tǒng)。本文采用文獻[19]中的方式對數(shù)據(jù)集進行處理,具體處理方式為:數(shù)據(jù)集中除決策屬性外,每一列隨機缺失10%的屬性值,缺失值在決策系統(tǒng)中用表示。處理后的數(shù)據(jù)集詳見https://github.com/KInfinite/datasets。
3.1 約簡結(jié)果對比
選取4組UCI數(shù)據(jù)集進行約簡結(jié)果對比,約簡結(jié)果如表3所示。其中,表3中“所有決策類約簡”對應PRDM算法的約簡結(jié)果,“單特定決策類約簡”對應多特定決策類中決策數(shù)目為1時PRMDM算法的約簡結(jié)果,“多特定決策類約簡”對應多特定決策類中決策類數(shù)目為2 時PRMDM算法的約簡結(jié)果。表4列出了約簡數(shù)目及平均約簡長度,其中,表4中“所有決策類約簡”和“特定決策類約簡”分別對應PRDM算法和PRMDM算法的約簡數(shù)目和平均約簡長度。
由表4可知,對于數(shù)據(jù)集Teachingassistantevaluation、House、Connectionistbench和Cardiotocography,當所選多特定決策類中決策類數(shù)為1或2時,PRMDM算法的平均約簡長度小于PRDM算法的平均約簡長度。
3.2 約簡效率對比
本節(jié)選取4組UCI數(shù)據(jù)集分別按照對象個數(shù)遞增和屬性個數(shù)遞增的方式進行約簡效率對比。約簡效率如圖1和圖2所示,其中,圖1和圖2中“所有決策類”的約簡耗時曲線對應PRDM算法的約簡耗時,圖1和圖2中“決策類1”“決策類2”和“決策類:1,2”的約簡耗時曲線對應PRMDM算法的約簡耗時。在數(shù)據(jù)集按照對象個數(shù)遞增的方式進行約簡效率對比的實驗中,對于每個數(shù)據(jù)集,將數(shù)據(jù)集分成10等份,第1份構(gòu)成1號數(shù)據(jù)集,第1份和第2份構(gòu)成2號數(shù)據(jù)集,以此類推,10號數(shù)據(jù)集即為完整的數(shù)據(jù)集。
圖1是各數(shù)據(jù)集隨對象數(shù)目增加約簡耗時的變化曲線,隨著對象數(shù)目的增加,約簡耗時逐漸增加。圖2是各數(shù)據(jù)集隨屬性數(shù)目增加約簡耗時的變化曲線,隨著屬性數(shù)目的增加,約簡耗時逐漸增加。由于在屬性數(shù)目較少時,區(qū)分函數(shù)相對簡單,求解區(qū)分函數(shù)所需的時間較少,所以在屬性增加的前期階段約簡耗時無明顯增加,在屬性增加的后期階段,由于區(qū)分函數(shù)相對復雜,求解區(qū)分函數(shù)所需的時間逐漸增加,所以約簡耗時不斷增加。
雖然PRDM算法及PRMDM算法的時間復雜度均為O(|C||U|2),但通過圖1和圖2可以發(fā)現(xiàn),當選取的多特定決策類中決策類的數(shù)目遠少于決策系統(tǒng)中所有決策類的數(shù)目時,PRMDM算法在約簡效率上高于PRDM算法。例如,數(shù)據(jù)集Connectionistbench和Cardiotocography中的所有決策類的數(shù)目分別為11和10,對于每個數(shù)據(jù)集,當選取的多特定決策類中決策類的數(shù)目分別為1和2時,PRMDM算法在約簡效率上高于PRDM算法,這是由于當選取的多特定決策類中決策類的數(shù)目遠少于決策系統(tǒng)中所有決策類的數(shù)目時,多特定決策類正域約簡的差別矩陣中非空差別屬性集的數(shù)目小于所有決策類正域約簡的差別矩陣中非空差別屬性集的數(shù)目,所以PRMDM算法在構(gòu)造差別矩陣的耗時上少于PRDM算法。另外,由于多特定決策類正域約簡的差別矩陣中非空差別屬性集的數(shù)目小于所有決策類正域約簡的差別矩陣中非空差別屬性集的數(shù)目,所以多特定決策類正域約簡的區(qū)分函數(shù)中合取項的數(shù)目小于所有決策類正域約簡的區(qū)分函數(shù)中合取項的數(shù)目,從而PRMDM算法在將區(qū)分函數(shù)轉(zhuǎn)化為極小析取范式的耗時上少于PRDM算法。需要注意的是,只有當選取的多特定決策類中決策類的數(shù)目遠少于決策系統(tǒng)中所有決策類的數(shù)目時,PRMDM算法在約簡效率上才能高于PRDM算法;當PRMDM算法中選取的多特定決策類為決策系統(tǒng)中的所有決策類時,PRMDM算法退化為PRDM算法,此時PRMDM算法相比PRDM算法在約簡效率上并無提升。
4 結(jié)語
本文提出了基于多特定決策類的不完備決策系統(tǒng)正域約簡的理論框架,定義了不完備決策系統(tǒng)單特定決策類正域約簡,構(gòu)造了相應的差別矩陣及區(qū)分函數(shù),提出了基于差別矩陣的不完備決策系統(tǒng)多特定決策類正域約簡算法。本文選取4組UCI數(shù)據(jù)集進行實驗,實驗結(jié)果驗證了本文所提算法的有效性。由于本文提出的算法是基于差別矩陣的約簡算法,為提高算法的約簡效率,下一步將研究差別矩陣的優(yōu)化問題。
參考文獻 (References)
[1] PAWLAK Z. Rough sets[J]. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.
[2] 王國胤,姚一豫,于洪.粗糙集理論與應用研究綜述[J].計算機學報,2009,32(7):1229-1246.(WANG G Y, YAO Y Y, YU H. A survey on rough set theory and applications [J]. Chinese Journal of Computers, 2009, 32(7): 1229-1246.)
[3] 于洪,王國胤,姚一豫.決策粗糙集理論研究現(xiàn)狀與展望[J].計算機學報,2015,38(8):1628-1639.(YU H, WANG G Y, YAO Y Y. Current research and future perspectives on decisiontheoretic rough sets [J]. Chinese Journal of Computers, 2015, 38(8): 1628-1639.)
[4] LIANG D C, LIU D, PEDRYCZ W, et al. Triangular fuzzy decisiontheoretic rough sets [J]. International Journal of Approximate Reasoning, 2013, 54(8): 1087-1106.
[5] ZHANG Q H, ZHANG P, WANG G Y. Research on approximation set of rough set based on fuzzy similarity [J]. Journal of Intelligent and Fuzzy Systems, 2017, 32(3): 2549-2562.
[6] QIN K Y, YANG J L, PEI Z. Generalized rough sets based on reflexive and transitive relations [J]. Information Sciences, 2008, 178(21): 4138-4141.
[7] MIAO D Q, ZHAO Y, YAO Y Y, et al. Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model [J]. Information Sciences, 2009, 179(24): 4140-4150.
[8] WANG F, LIANG J Y, QIAN Y H. Attribute reduction: a dimension incremental strategy [J]. KnowledgeBased Systems, 2013, 39: 95-108.
[9] WU W Z, Q Y H, LI T J, et al. On rule acquisition in incomplete multiscale decision tables [J]. Information Sciences, 2017, 378: 282-302.
[10] CHEN H M, LI T R, CAI Y, et al. Parallel attribute reduction in dominancebased neighborhood rough set [J]. Information Sciences, 2016, 373: 351-368.
[11] LIU D, LI T R, ZHANG J B. Incremental updating approximations in probabilistic rough sets under the variation of attributes[J]. KnowledgeBased Systems, 2015, 73: 81-96.
[12] MIN F, ZHANG Z H, DONG J. Ant colony optimization with partialcomplete searching for attribute reduction [J]. Journal of Computational Science, 2018, 25: 170-182.
[13] ZHU W, WANG F Y. Reduction and axiomization of covering generalized rough sets[J]. Information Sciences, 2003, 152(2): 217-230.
[14] KRYSZKIEWICZ M. Rough set approach to incomplete information systems [J]. Information Sciences, 1998, 112(1/2/3/4): 39-49.
[15] SKOWRON A, RAUSZER C. The discernibility matrices and functions in information systems [M]// SLOWINSKI R. Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory. Dordrecht: Kluwer Academic Publishers, 1992: 331-362.
[16] LIANG J Y, XU Z B. The algorithm on knowledge reduction in incomplete information systems [J]. International Journal of Uncertainty, Fuzziness and KnowledgeBased Systems, 2002, 10(1): 95-103.
[17] 周獻中,黃兵,李華雄,等.不完備信息系統(tǒng)知識獲取的粗糙集理論與方法[M].南京:南京大學出版社,2010:52-104.(ZHOU X Z, HUANG B, LI H X, et al. Rough Set Theory and Method of Knowledge Acquisition in Incomplete Information Systems [M]. Nanjing: Nanjing University Press, 2010: 52-104.)
[18] QIAN Y H, LIANG J Y, LI D Y, et al. Approximation reduction in inconsistent incomplete decision tables [J]. KnowledgeBased Systems, 2010, 23(5): 427-433.
[19] SHU W H, QIAN W B. A fast approach to attribute reduction from perspective of attribute measures in incomplete decision systems[J]. KnowledgeBased Systems, 2014, 72: 60-71.
[20] QIAN W B, SHU W H, XIE Y H, et al. Feature selection using compact discernibility matrixbased approach in dynamic incomplete decision system [J]. Journal of Information Science and Engineering, 2015, 31(2): 509-527.
[21] CHEN D G, TSANG E C C. On the local reduction of information system [C]// Proceedings of the 2005 International Conference on Advances in Machine Learning and Cybernetics. Heidelberg: SpringerVerlag, 2006: 588-594.
[22] YAO Y Y, ZHANG X Y. Classspecific attribute reducts in rough set theory [J]. Information Sciences, 2017, 418/419: 601-618.
[23] LIU G L, HUA Z, ZOU J Y. Local attribute reductions for decision tables[J]. Information Sciences, 2017, 422: 204-217.