唐晨 李勇華 饒夢妮 胡鋼俊
Abstract: Although ontology-based dynamic requirements traceability methods improve the precision of trace links compared with Information Retrieval (IR) methods, constructing a reasonable and effective ontology, especially a domain ontology, is a complicated and tedious process. To reduce the time and labor costs of building a domain ontology, a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM) that combines modifiers with a general ontology is proposed. Firstly, the collocation relationships between keywords and modifiers are analyzed. Then, the semantics of each keyword are determined by combining the modifier ontology with rules, so as to avoid the bias that keyword polysemy introduces into dynamic requirements traceability results. Finally, based on the results of this analysis, the keyword semantics are adjusted and expressed through similarity scores. Because modifiers are relatively few in requirements documents, design documents and similar artifacts, the time and labor costs of building a modifier ontology are relatively small. Experimental results show that, at comparable recall, the precision of MOKSJM is close to that of the domain ontology-based dynamic traceability method, and that, compared with the Vector Space Model (VSM) method, MOKSJM effectively improves the precision of requirements traceability results.
Key words: dynamic requirements traceability; ontology; modifier; requirements engineering; software engineering
CLC number: TP311.5
Document code: A
0 Introduction
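Semantics is currently a key problem in dynamic requirements traceability [1]. As ontology research has deepened and ontology technology has been widely applied, ontologies have attracted growing attention, and more and more researchers use them to address semantic problems in dynamic requirements. Chen et al. [2] proposed a method for evaluating semantic mining that combines nine semantic relations from WordNet with the synonym relation from the Unified Medical Language System (UMLS) to obtain a standard dataset, and uses this dataset to evaluate word embeddings. The method applies to most semantic relations, but because it measures similarity by cosine similarity, its results are not sufficiently accurate. Kolhe et al. [3], aiming to ease retrieval and management of large text databases, used Latent Semantic Indexing (LSI) to cluster documents and create labels, and then computed similarity by combining WordNet-based query expansion with cosine similarity. The method handles problems such as polysemy through WordNet's semantic algorithms, but its memory requirements grow as the number of matrix transformations increases. Besbes et al. [4], to help users understand or express medical terms, automatically extracted concepts from user queries and built a medical ontology, fuzzified the ontology by taking taxonomic relations and user profile information into account, and finally incorporated the ontology into query reformulation; however, the experimental results depend heavily on the application domain. Matei et al. [5] computed the semantic distance between words using WordNet and then computed the similarity between texts on the basis of dynamic time series. Compared with the traditional vector space model, the proposed time series model accounts for the effect of word order on semantics and improves the accuracy of the results. Kulathunga et al. [6] combined an ontology with clustering to identify ambiguous word meanings in financial texts. Although the method removes semantic ambiguity in the text and improves the performance of the clustering algorithm, its effectiveness was not validated on a financial dataset. Mai et al. [7] proposed a semantic kernel function based on statistics and ontology and embedded it in a Support Vector Machine (SVM) for Chinese text classification. The approach makes full use of the semantic relations in text to improve classification performance, but constructing the feature matrix associated with the semantic kernel is very time-consuming. Gong et al. [8] built an ontology knowledge base for the security domain from microblog short texts, expanded the initial query terms using ontology knowledge, filtered the candidate expansion terms using local query feedback, and obtained the final results through a second query and iteration. Because microblogs consist mainly of short texts in which keywords and information are sparse, the accuracy of this method decreases as the number of query results grows.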
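Related research shows that 78% of the words in requirements documents are noun-related [9], so nouns and verbal nouns have become the main objects of study in dynamic requirements traceability. Nouns, however, are polysemous, and taking them as the objects of study easily leads to traceability errors caused by semantic divergence. Information Retrieval (IR) methods cannot resolve the polysemy ("one word, many meanings") and synonymy ("one meaning, many words") of nouns [10]. Dynamic traceability methods based on a domain ontology can resolve these problems effectively, but they require a relevant domain ontology, and constructing one is a complicated and tedious process. Because modifiers are relatively few in requirements documents, building a modifier ontology costs less than building a domain ontology. This paper therefore proposes a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM). Building on the general ontology WordNet and combining it with a modifier ontology, the method determines the meaning of a noun in the source material, which reduces the semantic confusion caused by polysemy and synonymy and reduces the time and labor costs associated with constructing a domain ontology.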
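The limitation of purely lexical IR matching noted above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and whitespace tokenization, and the three example texts are invented for illustration; none of them comes from the paper's datasets.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example texts (assumptions, not taken from the paper).
requirement = "user deletes account"
design_synonym = "customer removes profile"  # same meaning, different words
design_polysemy = "bank account balance"     # shares "account" in another sense

sim_synonym = cosine_similarity(requirement, design_synonym)
sim_polysemy = cosine_similarity(requirement, design_polysemy)

print(sim_synonym)   # 0.0: no shared term, so the synonymous pair is missed
print(sim_polysemy)  # about 0.333: the shared word alone produces a match
```

The synonymous pair scores zero while the unrelated pair scores higher, which is the kind of error that sense-aware methods are meant to reduce.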
3 Conclusion
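Domain ontologies have become an important research tool in dynamic requirements traceability, but current methods for constructing them cannot be automated, and the quality and scale of the resulting ontologies are limited to some extent.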
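This paper proposes a Modifier Ontology-based Keyword Semantic Judgment Method (MOKSJM). Building on the general ontology WordNet, the method uses the modifier category and the semantic distance between modifiers to adjust the similarity of keywords, so that the similarity score reflects the selected sense and semantic divergence is eliminated. Experiments demonstrate the effectiveness of the method.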
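The idea of adjusting keyword similarity by modifier category and modifier distance can be sketched roughly as follows. This is only an illustration under stated assumptions: the modifier categories, the binary distance, the `penalty` weight and the function names are all hypothetical, and the rules and formulas actually used by MOKSJM are not reproduced here.

```python
from typing import Optional

# Hypothetical toy modifier ontology: modifier -> category (an assumption).
MODIFIER_CATEGORY = {
    "savings": "finance",
    "financial": "finance",
    "login": "computing",
    "registered": "computing",
}


def modifier_distance(mod_a: str, mod_b: str) -> Optional[float]:
    """Hypothetical distance: 0.0 for the same category, 1.0 otherwise.

    Returns None when either modifier is missing from the toy ontology.
    """
    cat_a = MODIFIER_CATEGORY.get(mod_a)
    cat_b = MODIFIER_CATEGORY.get(mod_b)
    if cat_a is None or cat_b is None:
        return None
    return 0.0 if cat_a == cat_b else 1.0


def adjusted_similarity(base: float, mod_a: str, mod_b: str,
                        penalty: float = 0.5) -> float:
    """Scale a base keyword similarity down as the modifiers diverge.

    The default penalty of 0.5 is an arbitrary illustrative value.
    """
    distance = modifier_distance(mod_a, mod_b)
    if distance is None:
        return base  # unknown modifier: leave the base score unchanged
    return base * (1.0 - penalty * distance)


# The keyword "account" appears on both sides, so the base similarity is 1.0.
same_sense = adjusted_similarity(1.0, "login", "registered")
other_sense = adjusted_similarity(1.0, "login", "savings")

print(same_sense)   # 1.0: both modifiers fall in the "computing" category
print(other_sense)  # 0.5: modifiers from different categories lower the score
```

In this sketch the modifier alone decides whether two occurrences of the same keyword are treated as the same sense, which is why only a small modifier vocabulary needs to be catalogued.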
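Future work will focus on combining sentence structure with modifiers and on using shallow semantic analysis to capture the central meaning of a sentence at the sentence level, so as to improve the accuracy of the recommended trace links.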
References
[1] CLELAND-HUANG J, SETTIMI R, DUAN C, et al. Utilizing supporting evidence to improve dynamic requirements traceability[C]// Proceedings of the 13th IEEE International Conference on Requirements Engineering. Piscataway, NJ: IEEE, 2005: 135-144.
[2] CHEN Z, HE Z, LIU X, et al. An exploration of semantic relations in neural word embeddings using extrinsic knowledge[C]// Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine. Washington, DC: IEEE Computer Society, 2017: 1246-1251.
[3] KOLHE S R, SAWARKAR S D. A concept driven document clustering using WordNet[C]// Proceedings of the 2017 International Conference on Nascent Technologies in Engineering. Piscataway, NJ: IEEE, 2017: 1-5.
[4] BESBES G, BAAZAOUI-ZGHAL H. Fuzzy ontology-based medical information retrieval[C]// Proceedings of the 2016 IEEE International Conference on Fuzzy Systems. Piscataway, NJ: IEEE, 2016: 178-185.
[5] MATEI L S, MATU S T. Document semantic distance based on the time series model[C]// Proceedings of the 2016 15th RoEduNet Conference: Networking in Education and Research. Piscataway, NJ: IEEE, 2016: 1-4.
[6] KULATHUNGA C, KARUNARATNE D D. An ontology-based and domain specific clustering methodology for financial documents[C]// Proceedings of the 17th International Conference on Advances in ICT for Emerging Regions. Piscataway, NJ: IEEE, 2018: 1-8.
[7] MAI F J, HUANG L, TAN J, et al. The research of semantic kernel in SVM for Chinese text classification[C]// Proceedings of the 2nd International Conference on Intelligent Information Processing. New York: ACM, 2017: Article No. 8.
[8] GONG H, DU J P, LAI J C, et al. Microblog query expansion algorithm based on ontology and local query feedback[J]. Journal of Nanjing University (Natural Sciences), 2017, 53(6): 1004-1011. (in Chinese)
[9] CUNNINGHAM H, MAYNARD D, BONTCHEVA K, et al. Developing language processing components with GATE version 7 (a user guide)[EB/OL]. [2018-03-20]. http://gate.ac.uk/sale/tao/tao.pdf.
[10] LI Y, LI J, LI M S. Research on dynamic requirement traceability method and traces precision[J]. Journal of Software, 2009, 20(2): 177-192. (in Chinese)
[11] Stanford University. The Stanford parser: a statistical parser[CP/OL]. [2018-07-21]. https://nlp.stanford.edu/software/lexparser.shtml.
[12] XU J, ZHANG Z X. A term similarity algorithm based on word soft matching and weight difference of modifying words[J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(11): 1145-1151. (in Chinese)
[13] LI Y, CLELAND-HUANG J. Ontology-based trace retrieval[C]// Proceedings of the 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering. Washington, DC: IEEE Computer Society, 2013: 30-36.
[14] SALTON G, WONG A, YANG C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
[15] MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to Information Retrieval[M]. Cambridge: Cambridge University Press, 2008: 142-145.