Sun Chong, Wang Hairong, Jing Boxiang, Ma He
Abstract: In knowledge reasoning, the action space of a node grows sharply as the length of the reasoning path increases, making reasoning progressively more difficult. To address this problem, this paper proposes a knowledge reasoning method based on hierarchical reinforcement learning (MutiAg-HRL) that reduces the size of the action space during reasoning. MutiAg-HRL invokes a high-level agent to perform coarse reasoning over the relations in the knowledge graph, determining the approximate location of the target entity by computing the similarity between candidate next-step relations and the given query relation; guided by the relation chosen by the high-level agent, a low-level agent performs fine-grained reasoning and selects the next action. The model also constructs an interactive reward mechanism that promptly rewards the relation and action choices of the two agents, preventing the reward-sparsity problem. To verify the effectiveness of the method, experiments were conducted on the FB15K-237 and NELL-995 datasets and compared with 11 mainstream methods, including TransE, MINERVA and HRL. On the link prediction task, MutiAg-HRL improves hits@k by an average of 1.85% and MRR by an average of 2%.
關(guān)鍵詞:知識(shí)推理; 分層強(qiáng)化學(xué)習(xí); 交互獎(jiǎng)勵(lì); 鏈接預(yù)測(cè)
CLC classification number: TP391   Document code: A
Article ID: 1001-3695(2024)03-023-0805-06
doi:10.19734/j.issn.1001-3695.2023.07.0309
Knowledge reasoning method based on hierarchical reinforcement learning
Sun Chong(a), Wang Hairong(a,b), Jing Boxiang(a), Ma He(a)
(a.College of Computer Science & Engineering, b.The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China)
Abstract: In knowledge reasoning, the action space of a node grows sharply as the length of the reasoning path increases, which makes reasoning progressively more difficult. This paper proposed a knowledge reasoning method based on hierarchical reinforcement learning (MutiAg-HRL) to reduce the size of the action space during reasoning. MutiAg-HRL invoked a high-level agent to perform coarse reasoning over the relations in the knowledge graph, determining the approximate location of the target entity by computing the similarity between candidate next-step relations and the given query relation. Guided by the relation selected by the high-level agent, a low-level agent conducted fine-grained reasoning and selected the next action. The model also constructed an interactive reward mechanism that promptly rewarded the relation and action choices of the two agents, preventing the reward-sparsity problem. To verify the effectiveness of the proposed method, this paper carried out experiments on the FB15K-237 and NELL-995 datasets and compared the results with those of 11 mainstream methods such as TransE, MINERVA and HRL. On the link prediction task, MutiAg-HRL improves hits@k by an average of 1.85% and MRR by an average of 2%.
Key words:knowledge reasoning; hierarchical reinforcement learning; interactive reward; link prediction
0 Introduction
In recent years, knowledge graphs (KGs) have become an effective means of managing massive amounts of knowledge [1] and are widely used in many fields, giving rise to several large-scale knowledge graphs such as Freebase and Wikidata. Although these knowledge graphs are already sizeable and cover domains such as music, films and books, they still suffer from severe information incompleteness: 71% of the people in Freebase have no place-of-birth information, and only 2% of the person entities in Wikidata have father information. Statistically, in most knowledge graphs 69%-99% of entities are missing at least one attribute [2]. Missing knowledge directly degrades the performance of downstream tasks built on knowledge graphs [3-5]. As an effective means of knowledge graph completion, knowledge reasoning has therefore become an increasingly important research topic [6].
Knowledge reasoning uses the knowledge already present in a knowledge graph to mine knowledge that has not yet been discovered, so as to complete an incomplete knowledge graph [7]. Many researchers have studied knowledge reasoning from different perspectives and achieved promising results. Embedding-based reasoning maps the entities and relations of a knowledge graph into a low-dimensional vector space to obtain their vector representations [8]; these vectors preserve the original semantic information of the entities and can be used to measure the similarity between entities, on which reasoning is based. Typical examples are TransE [9] and TransH [10]; such methods are simple and easy to extend, but model complex relations poorly. Trouillon et al. [11] therefore built the ComplEx model, which embeds the knowledge graph in a complex-valued space and better models asymmetric relations, and Dettmers et al. [12] built the ConvE model, which introduces a multi-layer convolutional network to improve feature expressiveness and better model triples. Because embedding-based reasoning reduces the reasoning process to a single vector computation and ignores the information along paths in the knowledge graph, its ability to reason over multi-hop paths is limited. Consequently, reasoning methods targeting the multi-hop problem have been proposed, such as SRGCN [13], MKGN [14] and ConvHiA [15]; these methods find the target entity through multi-step reasoning and at the same time generate the complete reasoning path from the head entity to the tail entity, which enhances the interpretability of knowledge reasoning [16]. Within multi-hop reasoning, methods based on deep reinforcement learning have become a popular research direction and have been applied to many downstream knowledge graph tasks [5]. Methods such as DAPath [17], SparKGR [18], MemoryPath [19] and HRRL [20] use neural networks to extract feature vectors and model the facts in the knowledge graph; during reasoning, the external environment rewards the agent so that it takes optimal actions and maximizes the expected return.
Recently, a knowledge reasoning method based on offline reinforcement learning [21] was proposed. It does not require the agent to interact frequently with the external environment and is therefore cheaper than traditional reinforcement-learning-based methods, but when the agent chooses a wrong action it cannot be corrected in time, which eventually causes the reasoning task to fail. Reinforcement-learning-based knowledge reasoning, by contrast, lets the agent interact continuously with the environment and penalizes wrong action choices, which guarantees the reliability of the reasoning path and thus effectively improves reasoning accuracy. However, as the path length grows, reasoning becomes harder. Existing single-agent reinforcement-learning methods for multi-hop reasoning work well on short paths, but tend to perform poorly on long reasoning paths and suffer from sparse rewards. To address this, this paper proposes a knowledge reasoning method based on hierarchical reinforcement learning (MutiAg-HRL). The knowledge graph is first clustered to help a high-level agent perform coarse reasoning and select relations highly relevant to the query relation; guided by these relations, a low-level agent performs fine-grained reasoning and selects the next action. The hierarchical policy reduces the action space of the model and effectively alleviates the long-path reasoning problem, and a dropout strategy is introduced when building the policy networks to prevent overfitting. In addition, an interactive reward construction promptly rewards the agents for every action they select, avoiding the reward-sparsity problem.
1 MutiAg-HRL model
MutiAg-HRL adopts hierarchical reinforcement learning and treats the knowledge reasoning process as two Markov decision processes (MDPs). The method consists of two modules, policy network construction and interactive reward construction: the policy networks guide the agents in selecting relations and actions, and the interactive reward module promptly rewards the agents' choices at every step. The model is shown in Fig. 1.
MutiAg-HRL first clusters the entity embeddings with the K-means++ algorithm. According to the types of relations connected to the current node e_t, the knowledge graph is divided into several node clusters, and these clusters are linked by the relations between nodes; these relations form the candidate relation set from which the high-level agent selects its next relation. Knowledge reasoning is then carried out on this basis.
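The paper gives no code for this preprocessing step; the following is a minimal sketch under stated assumptions: pre-trained entity embeddings are available as NumPy vectors, the number of clusters k is a free hyperparameter, and grouping candidate relations by pairs of clusters is only one illustrative way to expose them to the high-level agent.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def cluster_knowledge_graph(entity_emb, triples, k=50, seed=0):
    """Partition entities into k clusters (K-means++ init) and collect,
    for every pair of clusters, the relations that connect them.

    entity_emb: dict  entity_id -> np.ndarray embedding (pre-trained)
    triples:    list of (head, relation, tail) ids
    """
    entities = list(entity_emb.keys())
    X = np.stack([entity_emb[e] for e in entities])
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=seed).fit(X)
    cluster_of = dict(zip(entities, km.labels_))

    # relations bridging two clusters serve as the high-level candidate set
    candidate_relations = defaultdict(set)
    for h, r, t in triples:
        candidate_relations[(cluster_of[h], cluster_of[t])].add(r)
    return cluster_of, candidate_relations
```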
Through the hierarchical policy, the traditional reinforcement-learning-based reasoning process is split into relation selection and action selection. First, the relations connected to the current entity form the candidate relation set for its next step; the high-level policy network computes a probability distribution over this candidate set and guides the high-level agent to select a high-scoring relation for the next step. Then the low-level policy network computes a probability distribution over the entities connected through the relation chosen by the high-level agent and guides the low-level agent to select the next action entity; the reasoning episode ends once the target entity e_p is reached. When building the policy networks, a gated recurrent unit (GRU) encodes the history of the reasoning path; the history encoding and the current node state are fed into the high-level policy network for coarse reasoning, which yields a relation r_{t+1} highly relevant to the given query relation r_q. r_{t+1} and the current node state are then fed into the low-level policy network for fine-grained reasoning to obtain the next action. To prevent overfitting, a dropout strategy is applied to both the high-level and the low-level policy networks, temporarily and randomly hiding part of the neurons and reducing the effective number of parameters. A sketch of one step of this two-level selection loop is given below.
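The sketch below illustrates one step of the hierarchical selection loop under stated assumptions: high_policy and low_policy are callables returning probability distributions over their candidate sets, graph exposes the outgoing relations of an entity and the entities reachable through a given relation, and rng is a numpy.random.Generator. All of these names and interfaces are illustrative, not the paper's code.

```python
def reasoning_step(graph, high_policy, low_policy, history, e_t, r_q, rng):
    """One step of hierarchical reasoning: pick a relation, then an entity."""
    # high-level agent: distribution over relations leaving the current entity
    candidate_rels = graph.outgoing_relations(e_t)
    rel_probs = high_policy(history, e_t, r_q, candidate_rels)
    r_next = candidate_rels[rng.choice(len(candidate_rels), p=rel_probs)]

    # low-level agent: distribution restricted to entities reachable via r_next,
    # which is what shrinks the action space compared with a flat policy
    candidate_ents = graph.entities_via(e_t, r_next)
    ent_probs = low_policy(history, e_t, r_next, candidate_ents)
    e_next = candidate_ents[rng.choice(len(candidate_ents), p=ent_probs)]
    return r_next, e_next
```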
To maximize the return, an interactive reward module is built from a global reward and an interactive reward function. When the agents reach the target relation and entity they receive the global reward; otherwise the interactive reward function computes, at every step, a similarity score for the choices made by the high-level and the low-level agents and gives it to them as an immediate reward. This strengthens the coupling between the high-level and the low-level policy networks and improves reasoning accuracy.
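A minimal sketch of such a reward is given below, assuming embeddings are available for the query relation, the selected relation, the selected entity and the target entity. The use of cosine similarity, the exact quantities being compared at each level, and the terminal reward value of 1.0 are illustrative assumptions rather than the paper's definition.

```python
import numpy as np

def cosine(u, v, eps=1e-8):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def step_reward(reached_target, r_q_emb, r_sel_emb, e_sel_emb, e_target_emb):
    """Global reward at the target, interactive similarity reward elsewhere."""
    if reached_target:
        return 1.0, 1.0                      # global reward for both agents
    high_r = cosine(r_sel_emb, r_q_emb)      # selected relation vs. query relation
    low_r = cosine(e_sel_emb, e_target_emb)  # selected entity vs. target entity
    return high_r, low_r
```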
2 Policy networks
First, the pre-trained entity embeddings are clustered with K-means++, dividing the original knowledge graph into several node clusters according to relation similarity; the relations between entities are then used to strengthen the links between these clusters, yielding the processed knowledge graph G′, on which hierarchical reinforcement learning (HRL) based knowledge reasoning is performed. To preserve the model's ability to correct mistakes, for every triple ⟨h, r, t⟩ in the knowledge graph an inverse triple ⟨t, r^{-1}, h⟩ is added; through these inverse triples the agent can step back when a reasoning error occurs. The reasoning process is divided into high-level policy network reasoning and low-level policy network reasoning: the relation obtained by the high-level policy network guides the low-level policy network in choosing the concrete action, and the episode ends once the target entity is found.
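A small sketch of the inverse-triple augmentation, assuming triples are (head, relation, tail) tuples; the "_inv" suffix used to mark inverse relations is an illustrative convention.

```python
def add_inverse_triples(triples):
    """For every <h, r, t> add <t, r^{-1}, h> so the agent can step back."""
    augmented = list(triples)
    for h, r, t in triples:
        augmented.append((t, f"{r}_inv", h))  # hypothetical inverse-relation naming
    return augmented
```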
2.1 High-level policy network
The current node e_t and the given query relation r_q, together with the history of the reasoning path, are encoded by a GRU module, and the resulting history encoding h_{t-1} is fed into the high-level policy network. After the knowledge graph has been partitioned by the K-means++ algorithm, the relations connected to the node at the current time step form the candidate relation set, from which a preliminary high-level policy network is built; dropout then randomly hides part of its neurons, yielding the final high-level policy network π′^h_θ. Guided by this network, the high-level agent selects a high-probability relation as the relation for the next time step. The high-level policy network is shown in Fig. 2.
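The paper does not provide an implementation; the following PyTorch sketch shows one way such a high-level policy network could look, assuming the state is the concatenation of the GRU history encoding with the current entity and query-relation embeddings, and that candidate relations are scored by a dot product against a dropout-regularized MLP output. The dimensions, layer sizes and scoring scheme are assumptions.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """Scores candidate relations given the path history and current state."""
    def __init__(self, emb_dim=100, hidden_dim=200, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(input_size=2 * emb_dim, hidden_size=hidden_dim,
                          batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + 2 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),              # randomly hide part of the neurons
            nn.Linear(hidden_dim, emb_dim),
        )

    def forward(self, path_emb, e_t_emb, r_q_emb, cand_rel_emb):
        # path_emb: (1, T, 2*emb_dim) sequence of past (relation, entity) pairs
        _, h = self.gru(path_emb)             # h: (1, 1, hidden_dim) history code
        state = torch.cat([h.squeeze(0).squeeze(0), e_t_emb, r_q_emb], dim=-1)
        query = self.mlp(state)               # (emb_dim,)
        scores = cand_rel_emb @ query         # one score per candidate relation
        return torch.softmax(scores, dim=-1)  # distribution over candidates
```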
As Table 1 shows, the proposed method achieves the best reasoning results on the NELL-995 dataset. Comparing the two datasets, FB15K-237 is sparser than NELL-995, and a sparse environment tends to truncate many paths, which hampers the RL agent's exploration of multi-hop reasoning paths; hence knowledge reasoning on NELL-995 generally outperforms that on FB15K-237. On FB15K-237, hits@1 and hits@3 both improve while hits@10 and MRR drop. A possible reason is that MultiHop is a single-agent reasoning model: although some of its metrics fall behind the proposed method on long-path reasoning, its structure is simpler than ours, which lowers model complexity and makes its hits@10 and MRR better than those of the proposed method. On NELL-995 all metrics improve markedly; even though the proposed method consumes more resources because it requires multiple agents, NELL-995 is smaller than FB15K-237, so the resource cost of the more complex model does not noticeably hurt its performance. TransE, the structurally simplest embedding-based model, greatly reduces computational complexity while preserving connectivity and also achieves fairly good MRR compared with other models, but such methods offer little interpretability, whereas reinforcement-learning-based reasoning not only produces the result but also provides the entire reasoning path, greatly improving the interpretability of the reasoning process.
4.2 Ablation study
To better demonstrate the effectiveness of the clustering algorithm, the hierarchical policy network structure and the interactive reward mechanism introduced in the model, ablation experiments on these three modules were conducted on the FB15K-237 and NELL-995 datasets, using mean reciprocal rank (MRR) and the hit ratio hits@k as evaluation metrics; the results are listed in Table 2.
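For reference, a minimal sketch of how the two evaluation metrics are computed, assuming the rank of the correct entity for each test query has already been obtained.

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Compute MRR and hits@k from the ranks of the correct entities."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))
    hits = {k: float(np.mean(ranks <= k)) for k in ks}
    return mrr, hits

# example: ranks of the correct entity for five test queries
print(mrr_and_hits([1, 3, 2, 15, 1]))
```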
The ablation experiments confirm the effectiveness of the three modules. As Table 2 shows, all three modules affect the reasoning performance, and the effect is especially pronounced on NELL-995. This is mainly because NELL-995 is small and contains only 18 relation types, while the relation selection of the high-level policy network is strongly influenced by the number of relations in the dataset; compared with FB15K-237, adding the high-level policy for relation selection on NELL-995 therefore greatly improves reasoning accuracy. Before the reasoning task, the proposed method clusters the knowledge graph so that highly similar nodes are brought close together, which improves the accuracy of the agents' choices. During reasoning, the interactive reward mechanism promptly rewards every relation and action the agents select. Compared with the path-diversity reward, path-length reward and stand-alone global reward used in common reinforcement-learning-based reasoning models, the interactive reward function rewards the agents' choice at every time step, helps the model pick higher-reward behaviors, effectively alleviates the sparse-reward problem, and maximizes the return; removing the clustering module or the reward module therefore also degrades the reasoning performance.
4.3 Case study
The experimental and ablation results show the superiority of the proposed hierarchical reinforcement learning framework for knowledge graph reasoning. First, the clustering algorithm groups facts with similar relation types together, which facilitates relation and action selection during subsequent reasoning. Second, the hierarchical policy splits reasoning into two parts: guided by the high-level policy network, the high-level agent first selects a relation highly relevant to the query relation as the relation for the next step, and the low-level policy network computes the probability distribution only over the entities connected through that relation, from which the low-level agent selects the next action. This greatly reduces the size of the action space, saving computational resources and improving reasoning accuracy. To illustrate more intuitively how the model finds reasoning paths, this section presents a case study of path reasoning, shown in Fig. 4.
Examples (a) and (b) in Fig. 4 show that the model can complete path reasoning for different reasoning tasks. Because the knowledge graph contains inverse triples constructed from known facts, the reasoning task in example (a) can promptly reach entities highly relevant to the target entity through an inverse triple. Inverse triples also allow the agent to step back in time when it makes a wrong decision, correcting earlier mistakes and realizing path correction.
To evaluate the path-search efficiency of the hierarchical policy during knowledge reasoning, this section also ablates the hierarchical policy (-HRL) and compares its path-search success rate with that of the full model, as listed in Table 3.
The results in Table 3 show that introducing the hierarchical policy raises the path-search success rate by 2.4% on FB15K-237 and by 4.2% on NELL-995. A likely reason is that, when the concrete action is selected during reasoning, only the entities connected through the chosen relation form the candidate action space, which greatly reduces its size and raises the path-search success rate. Reasoning with the hierarchical policy therefore effectively improves the path-search success rate and, in turn, the reasoning accuracy of the model.
5 Conclusion
This paper proposed MutiAg-HRL, a multi-agent reinforcement learning method for knowledge reasoning. By adopting a hierarchical policy, the model decomposes the knowledge reasoning process into two Markov decision processes, effectively solving the problem of an oversized action space in long-path reasoning; before reinforcement learning begins, a clustering algorithm preprocesses the knowledge graph to help the agents make the next selection more accurately; and an interactive reward mechanism rewards the agents promptly, preventing sparse rewards and improving the reasoning ability of the model.
In future work, we will consider introducing a rule mining module into the hierarchical reinforcement learning framework to mine rules highly relevant to the given query before reasoning and use them to guide the model, further improving its reasoning performance. We will also further optimize the reward mechanism to help the model find target entities faster.
References:
[1] Liu Yuhua, Zhai Ruyu, Zhang Xiang, et al. Review of knowledge graph visual analysis [J]. Journal of Computer Aided Design & Computer Graphics, 2023, 35(1): 23-36. (in Chinese)
[2] Guan Saiping, Jin Xiaolong, Jia Yantao, et al. Knowledge reasoning over knowledge graph: a survey [J]. Journal of Software, 2018, 29(10): 2966-2994. (in Chinese)
[3]Wu Wenqing, Zhou Zhenfang, Qi Jiangtao, et al. A dynamic graph expansion network for multi-hop knowledge base question answering[J]. Neurocomputing, 2023,515: 37-47.
[4] Sarabi S, Han Qi, de Vries B, et al. Methodology for development of an expert system to derive knowledge from existing nature-based solutions experiences[J]. MethodsX, 2023, 10: 101978.
[5] Cui Hai, Peng Tao, Xiao Feng, et al. Incorporating anticipation embedding into reinforcement learning framework for multi-hop knowledge graph question answering[J]. Information Sciences, 2022, 619: 745-761.
[6]Ji Shaoxiong, Pan Shirui, Cambria E, et al. A survey on knowledge graphs: representation, acquisition, and applications[J]. IEEE Trans on Neural Networks and Learning Systems, 2022,33(2): 494-514.
[7] Liu Hao, Zhou Shuwang, Chen Changfang, et al. Dynamic knowledge graph reasoning based on deep reinforcement learning[J]. Knowledge-Based Systems, 2022, 241: 108235.
[8] Yu Tiezhong, Luo Jing, Wang Liqin, et al. Knowledge reasoning combining TuckER embedding and reinforcement learning [J]. Computer Systems & Applications, 2022, 31(9): 127-135. (in Chinese)
[9]Bordes A, Usunier N, Alberto G, et al. Translating embeddings for modeling multi-relational data[C]//Proc of Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2013: 2787-2795.
[10]Wang Zhen, Zhang Jianwen, Feng Jianlin, et al. Knowledge graph embedding by translating on hyperplanes[C]//Proc of the 28th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2014: 1112-1119.
[11]Trouillon T, Welbl J, Riedel S, et al. Complex embeddings for simple link prediction[C]//Proc of the 33rd International Conference on International Conference on Machine Learning.[S.l.]: JMLR.org, 2016: 2071-2080.
[12] Dettmers T, Minervini P, Stenetorp P, et al. Convolutional 2D knowledge graph embeddings[C]//Proc of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 1811-1818.
[13]Wang Zikang, Li Linjing, Zeng Dajun. SRGCN: graph-based multi-hop reasoning on knowledge graphs[J]. Neurocomputing, 2021, 454: 280-290.
[14]Zhang Ying, Meng Fandong, Zhang Jinchao, et al. MKGN: a multi-dimensional knowledge enhanced graph network for multi-hop question and answering[J]. IEICE Trans on Information and Systems, 2022, E105.D(4): 807-819.
[15] Li Dengao, Miao Shuyi, Zhao Baofeng, et al. ConvHiA: convolutional network with hierarchical attention for knowledge graph multi-hop reasoning[J]. International Journal of Machine Learning and Cybernetics, 2023, 14: 2301-2315.
[16]Du Zhengxiao, Zhou Chang, Yao Jiangchao, et al. CogKR: cognitive graph for multi-hop knowledge reasoning[J]. IEEE Trans on Knowledge and Data Engineering, 2021,35(2): 1283-1295.
[17] Tiwari P, Zhu Hongyin, Pandey H M. DAPath: distance-aware knowledge graph reasoning based on deep reinforcement learning[J]. Neural Networks, 2021, 135: 1-12.
[18]Xiao Yi, Lan Mingjing, Luo Junyong, et al. Iterative rule-guided reasoning over sparse knowledge graphs with deep reinforcement learning[J]. Information Processing and Management, 2022,59(5): 103040.
[19]Li Shuangyin, Wang Heng, Pan Rong, et al. MemoryPath: a deep reinforcement learning framework for incorporating memory component into knowledge graph reasoning[J]. Neurocomputing, 2021, 419: 273-286.
[20]Saebi M, Krieg S, Zhang Chuxu, et al. Heterogeneous relational reasoning in knowledge graphs with reinforcement learning[J]. Information Fusion, 2022, 88: 12-21.
[21] Heredia P, George J, Mou Shaoshuai. Distributed offline reinforcement learning[C]//Proc of the 61st Conference on Decision and Control. Piscataway, NJ: IEEE Press, 2022: 4621-4626.
[22] Toutanova K, Chen Danqi, Pantel P, et al. Representing text for joint embedding of text and knowledge bases[C]//Proc of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2015: 1499-1509.
[23]Li Zixuan, Jin Xiaolong, Guan Saiping, et al. Path reasoning over knowledge graph: a multi-agent and reinforcement learning based method[C]//Proc of IEEE International Conference on Data Mining Workshops. Piscataway, NJ: IEEE Press, 2018: 929-936.
[24]Lin X V, Socher R, Xiong Caiming. Multi-hop knowledge graph reasoning with reward shaping [EB/OL]. (2018-09-11). https://arxiv.org/abs/1808.10568.
[25] Zeb A, Saif S, Chen Junde, et al. Complex graph convolutional network for link prediction in knowledge graphs[J]. Expert Systems with Applications, 2022, 200: 116796.
[26]Feng Jianzhou, Wei Qikai, Cui Jinman, et al. Novel translation knowledge graph completion model based on 2D convolution[J]. Applied Intelligence, 2022, 52(3): 3266-3275.
[27]Zhang Denghui, Yuan Zixuan, Liu Hao, et al. Learning to walk with dual agents for knowledge graph reasoning[C]//Proc of the 36th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2022: 5932-5941.