Real-time semantic segmentation of farmland at night using infrared thermal imaging
Yi Shi, Li Junjie, Jia Yong
(College of Information Science and Technology (College of Cybersecurity, Oxford Brookes College), Chengdu University of Technology, Chengdu 610059, China)
Real-time semantic segmentation of the farmland environment is a key component of visual environment perception for intelligent agricultural machinery. Semantic segmentation of farmland at night allows intelligent agricultural machinery to perceive the farmland environment visually and thus operate around the clock; in the dark night environment, however, visible-light cameras image poorly, which degrades segmentation accuracy. To ensure both the accuracy and the real-time performance of infrared image semantic segmentation in nighttime farmland, this study proposes an Infrared Real-time Bilateral Semantic Segmentation Network (IR-BiSeNet) tailored to infrared images. Given the low resolution and blurred detail of infrared images, the network improves on the Bilateral Semantic Segmentation Network (BiSeNet): its spatial path further fuses low-level infrared image features, and in the attention refinement and feature fusion modules of the architecture, global max pooling replaces global average pooling to preserve infrared texture detail. To validate the proposed method, experiments were conducted on a farmland dataset collected at night with a thermal imager; the segmentation targets comprise fields, pedestrians, plants, obstacles, and background. The proposed method reached a mean intersection over union (MIoU) of 85.1% on the nighttime farmland infrared dataset at a processing speed of 40 frames/s, meeting the requirements of real-time semantic segmentation of nighttime farmland.
intelligent agricultural machinery; semantic segmentation; infrared thermal imaging; infrared real-time bilateral semantic segmentation network; nighttime farmland dataset
For the visual navigation and environment perception of intelligent agricultural machinery, semantic segmentation provides an understanding of the machine's surroundings from visual images. At night, however, or under rain, fog, smoke, and dust, relying solely on visible-light images for semantic segmentation degrades the machine's visual perception. Infrared thermal imaging forms images from the temperature differences of objects; it requires no light source, is little affected by weather, and has a long detection range [1], producing clearer and more stable images in dark, rainy, or foggy conditions. Real-time semantic segmentation of the farmland environment from infrared thermal images can therefore help intelligent agricultural machinery perceive its surroundings when visible-light imaging is poor, enabling all-weather operation in particular seasons and environments and improving the efficiency and intelligence of agricultural production.
Real-time semantic segmentation is currently applied mainly in autonomous driving, where it helps vehicles segment lanes, vehicles, people, and other targets or regions of interest from visible-light sensors so as to understand the driving environment. In recent years such methods have performed well on visible-light images of urban streets. Yang et al. [2] improved segmentation accuracy on urban street scenes with a dense atrous spatial pyramid pooling structure (DenseASPP); Chen et al. [3] further improved accuracy on urban street scenes with an encoder-decoder structure using atrous separable convolution; the Context Guided Network (CGNet) of Wu et al. [4] used a lightweight architecture and joint contextual features to improve the real-time performance of street-scene segmentation; and the Deep Feature Aggregation Network (DFANet) of Li et al. [5], built on a lightweight backbone with multi-scale feature aggregation, achieved a good balance between accuracy and speed on public urban street datasets. Yu et al. [6] proposed the Bilateral Semantic Segmentation Network (BiSeNet), whose two-branch structure of a spatial path and a context path balances segmentation accuracy and real-time performance while remaining structurally simple and easy to optimize.
Real-time semantic segmentation for autonomous driving must also consider imaging at night and in adverse weather such as rain and fog, and research on segmenting infrared thermal images for nighttime driving has made progress in recent years. Wang et al. [7] built an end-to-end infrared semantic segmentation framework based on a deep convolutional neural network refined by conditional random field post-processing; it achieves pixel-level classification of nighttime street-scene infrared images with high prediction accuracy, recovering the shape, category, and spatial distribution of objects and thereby a semantic understanding of infrared scenes. Wu et al. [8], addressing the difficulty of parsing nighttime road scenes, proposed a method combining visible-light and infrared thermal images: the dual-spectrum images are processed with adaptive histogram equalization and bilateral filtering, and a dense conditional random field built on the dual-spectrum information refines the segmentation results, parsing nighttime road scenes more accurately.
Compared with autonomous driving on urban roads, semantic segmentation for the visual perception of intelligent agricultural machinery has developed later, and farmland terrain is more complex; real-time semantic segmentation in farmland environments is still at an early stage, with little published work. Li et al. [9] developed a transport vehicle that drives autonomously on field roads in hilly regions together with its visual navigation system; for the complex field-road scenes of hilly areas, its navigation module uses a semantic segmentation model based on an improved dilated convolutional neural network to segment field terrain effectively. No study has yet addressed real-time semantic segmentation of the nighttime farmland environment from infrared thermal images, yet such segmentation would improve the visual perception and environmental understanding of intelligent agricultural machinery at night and support all-weather intelligent operation. Real-time semantic segmentation of nighttime fields with infrared thermal imaging involves four main problems: 1) the complexity of the farmland environment; 2) the low resolution, blurred detail, and absence of color in infrared thermal images; 3) the heavy computation of semantic segmentation models, which makes real-time operation difficult; and 4) the lack of a public nighttime farmland infrared dataset. To address these problems, this study used an outdoor thermal imager to collect and build an infrared dataset for nighttime farmland visual navigation, and combined nighttime thermal imaging of field environments with deep-learning-based real-time semantic segmentation to propose a real-time method for segmenting nighttime farmland. Considering the low resolution, blurred detail, and lack of color of infrared images [10] and the real-time requirements of agricultural visual navigation, an Infrared Real-time Bilateral Semantic Segmentation Network (IR-BiSeNet) is proposed on the basis of BiSeNet. The improvements follow the characteristics of infrared images: 1) following reference [11], which notes that global max pooling preserves image texture better than global average pooling, the attention refinement and feature fusion modules replace global average pooling with global max pooling to retain infrared texture detail; and 2) the spatial path further fuses low-level infrared image features.
Real-time semantic segmentation of nighttime farmland requires a large dataset of nighttime farmland infrared thermal images, and no public dataset currently exists. This study therefore collected nighttime farmland infrared data with an acquisition platform similar to that of intelligent agricultural machinery: a thermal imager on a gimbal mounted at the front of a vehicle. The resulting dataset covers a variety of complex nighttime farmland scenes. To build a dataset suited to semantic segmentation for the nighttime visual navigation of intelligent agricultural machinery, field collection was carried out at night in autumn 2019 in Zitong, Yanting, and Jiange, three major agricultural counties of Sichuan Province, China. The terrain there is mainly hilly farmland, typical of southwest China: undulating, with many field obstacles and diverse vegetation.
The collected farmland dataset comprises 9,000 infrared thermal images of more than 60 typical nighttime farmland scenes. To reduce GPU memory requirements, all images were resized to 512×512 pixels.
To ensure robust, generalizable training, the dataset was enlarged by augmentation. Following the method of reference [12], the collected images were flipped horizontally and vertically and rotated randomly by 38° to 56°, as shown in Fig. 1. Each of these three operations can double the amount of data; one of them was applied at random to each image sample, expanding the data to twice the original dataset. The data were split into training, validation, and test sets at a ratio of 7.5:1.5:1, with the validation and test sets using original images only. A minimal augmentation sketch is given after Fig. 1.
Fig.1 Augmentation methods for the infrared dataset
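The augmentation scheme is straightforward to reproduce. The sketch below is a minimal illustration, not the authors' code; the function name `augment`, the use of OpenCV, and the application of the same warp to image and mask are our own choices.

```python
import cv2
import numpy as np

def augment(image, mask, rng=np.random.default_rng()):
    """Apply one randomly chosen augmentation from the paper's three:
    horizontal flip, vertical flip, or a random 38-56 degree rotation.
    The same transform is applied to the image and its label mask."""
    choice = rng.integers(3)
    if choice == 0:                       # horizontal flip
        return cv2.flip(image, 1), cv2.flip(mask, 1)
    if choice == 1:                       # vertical flip
        return cv2.flip(image, 0), cv2.flip(mask, 0)
    angle = rng.uniform(38.0, 56.0)       # random rotation in the stated range
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img_r = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
    # nearest-neighbour interpolation keeps mask labels discrete
    mask_r = cv2.warpAffine(mask, M, (w, h), flags=cv2.INTER_NEAREST)
    return img_r, mask_r
```

Keeping each original sample and adding one augmented copy per sample reproduces the stated doubling of the dataset.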
According to the main targets to be segmented in the visual navigation of intelligent agricultural machinery, nighttime field objects in the dataset were divided into five classes: pedestrian, plant, obstacle, field, and background. After resizing, the images were annotated manually with the Labelme tool [13]. Semantic segmentation requires each class to be annotated with a mask of a distinct color; the RGB value of each class's mask color is listed in Table 1, and example annotated images are shown in Fig. 2 (a conversion sketch follows Fig. 2).
Table 1 Annotation colors of each class
Fig.2 Examples of class annotation
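For training, the color masks produced by Labelme are typically converted to per-pixel class indices. A minimal sketch follows; the RGB values below are placeholders, since the actual colors of Table 1 are not reproduced in this text.

```python
import numpy as np

# Placeholder palette: the RGB values must match Table 1 in the paper,
# which is not reproduced here, so these colours are assumptions.
PALETTE = {
    (0, 0, 0):     0,   # background
    (255, 0, 0):   1,   # pedestrian
    (0, 255, 0):   2,   # plant
    (0, 0, 255):   3,   # obstacle
    (255, 255, 0): 4,   # field
}

def mask_to_index(rgb_mask: np.ndarray) -> np.ndarray:
    """Convert a Labelme-style colour mask (H, W, 3) to class indices (H, W)."""
    index = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for colour, cls in PALETTE.items():
        index[np.all(rgb_mask == colour, axis=-1)] = cls
    return index
```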
This study proposes an Infrared Real-time Bilateral Semantic Segmentation Network (IR-BiSeNet) suited to infrared images. Given the low resolution and blurred detail of infrared images, the network improves on the BiSeNet structure so that it fits semantic segmentation of infrared images of the nighttime farmland environment.
1.4.1 Structure of the real-time bilateral semantic segmentation network
The structure of BiSeNet is shown in Fig. 3. To achieve fast real-time segmentation without sacrificing spatial information, BiSeNet is divided into two branches: a spatial path (SP) and a context path (CP).
Note: 7×7, 3×3, and 1×1 denote convolution kernel sizes. The same applies below.
The SP extracts high-resolution feature maps to obtain precise spatial information. The CP provides a large receptive field; to preserve real-time performance and reduce computation, it adopts a lightweight feature extraction network such as ResNet-50 (Residual Network-50) combined with global pooling, merging the intermediate ResNet-50 results (16× and 32× downsampling) with the global pooling output as the path's output.
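A sketch of the spatial path in the Keras API follows (TensorFlow 2 rather than the TF 1.9 used in the paper); the channel widths are assumptions, as the text specifies only the kernel sizes and the two-branch layout.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel, stride):
    """Conv -> BatchNorm -> ReLU, the basic unit of BiSeNet's spatial path."""
    x = layers.Conv2D(filters, kernel, strides=stride,
                      padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def spatial_path(x):
    """Three stride-2 blocks: a 512x512 input yields 1/8-resolution features."""
    x = conv_bn_relu(x, 64, 7, 2)
    x = conv_bn_relu(x, 64, 3, 2)
    x = conv_bn_relu(x, 64, 3, 2)
    return x

inp = layers.Input((512, 512, 1))   # single-channel infrared image
sp_out = spatial_path(inp)          # -> 64x64 feature map
```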
In the context path, BiSeNet uses an attention refinement module (ARM) to improve accuracy; its structure is shown in Fig. 3. The module first obtains an attention vector by global average pooling, applies a 1×1 convolution, batch normalization, and a nonlinearity to it, and multiplies the result with the original feature map. It refines the features of different ResNet-50 stages without adding many parameters or much computation.
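The ARM described above maps directly onto a few Keras layers. A minimal sketch under the same assumptions as the previous block; `keepdims=True` requires TF ≥ 2.6.

```python
def attention_refinement(x):
    """BiSeNet ARM: global average pooling -> 1x1 conv -> BN -> sigmoid,
    then reweight the input feature map with the attention vector."""
    c = x.shape[-1]
    a = layers.GlobalAveragePooling2D(keepdims=True)(x)  # (B, 1, 1, C)
    a = layers.Conv2D(c, 1, use_bias=False)(a)
    a = layers.BatchNormalization()(a)
    a = layers.Activation("sigmoid")(a)
    return layers.Multiply()([x, a])                     # channel reweighting
```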
BiSeNet uses a feature fusion module (FFM) to combine the outputs of the SP and CP; its structure is shown in Fig. 3. The module first concatenates the two path outputs and applies a 3×3 convolution, batch normalization, and ReLU. The result is globally average-pooled into a feature vector, which passes through a 1×1 convolution, a ReLU, and a sigmoid. The fused feature is computed as in Eq. (1):

$F_{\mathrm{out}} = F + F \otimes \alpha$  (1)

where $F$ is the output of the 3×3 convolution + batch normalization + ReLU, $\alpha$ is the sigmoid nonlinear output, and $F_{\mathrm{out}}$ is the output feature of the FFM.
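Eq. (1) can be read as channel attention over the fused feature. A hedged Keras sketch, reusing `conv_bn_relu` from the spatial-path block; fusing into `classes` output channels follows BiSeNet's practice and is an assumption here.

```python
def feature_fusion(sp, cp, classes):
    """BiSeNet FFM, Eq. (1): concatenate the two paths, conv-bn-relu,
    then add a channel-attention-weighted copy of the fused feature."""
    f = layers.Concatenate()([sp, cp])
    f = conv_bn_relu(f, classes, 3, 1)                   # fused feature F
    a = layers.GlobalAveragePooling2D(keepdims=True)(f)
    a = layers.Conv2D(classes, 1, activation="relu")(a)
    a = layers.Conv2D(classes, 1, activation="sigmoid")(a)  # attention alpha
    return layers.Add()([f, layers.Multiply()([f, a])])  # F + F * alpha
```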
1.4.2 Structure of the infrared real-time bilateral semantic segmentation network
Building on the BiSeNet structure, this study proposes the IR-BiSeNet structure for semantic segmentation of nighttime farmland infrared images.
On the spatial path, following the approach to infrared images in reference [14], low-level infrared image features are fused by adding pooling, deconvolution, and fully connected layers, allowing the network to better recover the spatial resolution of infrared images and provide a larger receptive field. The IR-BiSeNet structure is shown in Fig. 4. In the attention refinement module of the context path and in the feature fusion module that merges the two paths, given the blurred detail and low contrast of infrared images, every global average pooling layer in the module architecture is replaced with a global max pooling layer to preserve infrared texture detail.
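The exact layout of the added pooling, deconvolution, and fully connected layers is only summarized in the text and Fig. 4, which is not reproduced here. The sketch below is therefore one plausible reading, not the authors' architecture: an early low-level feature map is max-pooled down to the output resolution, refined with a transposed convolution standing in for the deconvolution layer, and concatenated with the main spatial-path output.

```python
def ir_spatial_path(x):
    """Hypothetical IR-BiSeNet spatial path: fuse a low-level feature
    branch into the 1/8-resolution output (assumed layout)."""
    low = conv_bn_relu(x, 64, 7, 2)        # low-level infrared features, 1/2
    x1 = conv_bn_relu(low, 64, 3, 2)
    x1 = conv_bn_relu(x1, 64, 3, 2)        # main branch, 1/8 resolution
    skip = layers.MaxPooling2D(4)(low)     # added pooling branch, also 1/8
    skip = layers.Conv2DTranspose(64, 3, strides=1, padding="same")(skip)
    return layers.Concatenate()([x1, skip])  # fuse low-level features
```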
Considering the characteristics of infrared images and the real-time requirements of the system, the context path uses the ResNet-50 feature extraction network combined with global pooling, merging the intermediate ResNet-50 results (16× and 32× downsampling) with the global pooling output.
The infrared attention refinement module (IR-ARM) used in the context path of IR-BiSeNet to improve accuracy is shown in Fig. 4.
The infrared feature fusion module (IR-FFM) that fuses the spatial path and context path outputs of IR-BiSeNet is shown in Fig. 4.
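Concretely, IR-ARM and IR-FFM differ from ARM and FFM only in the pooling operator. A sketch of IR-ARM under the earlier assumptions; the same one-line swap of `GlobalMaxPooling2D` for `GlobalAveragePooling2D` applies inside IR-FFM.

```python
def ir_attention_refinement(x):
    """IR-ARM: identical to ARM above except that global *max* pooling
    replaces global average pooling, keeping strong local texture
    responses in the low-contrast infrared feature maps."""
    c = x.shape[-1]
    a = layers.GlobalMaxPooling2D(keepdims=True)(x)  # the only change vs. ARM
    a = layers.Conv2D(c, 1, use_bias=False)(a)
    a = layers.BatchNormalization()(a)
    a = layers.Activation("sigmoid")(a)
    return layers.Multiply()([x, a])
```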
1.4.3 Loss function of the infrared real-time bilateral semantic segmentation network
The proposed IR-BiSeNet uses a principal loss function to supervise the output of the whole network and auxiliary loss functions to supervise the context path output [15-17].
The principal loss is the softmax loss of Eq. (2):

$L = \dfrac{1}{N}\sum_{i} L_i = \dfrac{1}{N}\sum_{i} -\log\left(\dfrac{e^{p_i}}{\sum_{j} e^{p_j}}\right)$  (2)

where $N$ is the total number of classes to predict, $p_i$ is the network's predicted output for sample $i$, and $p_j$ that for sample $j$.
The auxiliary loss supervising the context path output is given in Eq. (3):

$L(X; W) = l_p(X; W) + \alpha \sum_{i=2}^{K} l_i(X_i; W)$  (3)

where $l_p$ is the loss function of the network's final output, $l_i$ is the auxiliary loss function of stage $i$ whose input is $X_i$, $X_i$ is the feature of stage $i$ of the ResNet-50 network, $\alpha$ weights the auxiliary losses, $K$ is the number of supervised stages, and $L$ is the joint loss function [18-21].
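In code, the joint loss of Eq. (3) is simply the principal cross-entropy plus weighted auxiliary terms. A minimal sketch; the weight `alpha = 1.0` and the use of sparse categorical cross-entropy are assumptions, as the paper does not state them.

```python
import tensorflow as tf

cce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def joint_loss(y_true, main_logits, aux_logits_list, alpha=1.0):
    """Eq. (3) as described: principal softmax loss on the final output plus
    alpha-weighted auxiliary losses on the context-path stage outputs."""
    loss = cce(y_true, main_logits)
    for aux in aux_logits_list:        # one entry per supervised stage
        loss += alpha * cce(y_true, aux)
    return loss
```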
To verify the effectiveness of the proposed method for segmenting nighttime farmland infrared images, let $k+1$ be the number of segmentation classes in the image, $p_{ij}$ the number of pixels that belong to class $i$ but are predicted as class $j$, $p_{ji}$ the number that belong to class $j$ but are predicted as class $i$, and $p_{ii}$ the number of correctly predicted pixels. The following metrics for real-time semantic segmentation were adopted:

1) Mean intersection over union (MIoU), the intersection of the predicted and actual regions divided by their union [22-23], computed as in Eq. (4):

$\mathrm{MIoU} = \dfrac{1}{k+1}\sum_{i=0}^{k}\dfrac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}}$  (4)

2) Pixel accuracy (PA), the proportion of correctly labeled pixels among all pixels [24], computed as in Eq. (5):

$\mathrm{PA} = \dfrac{\sum_{i=0}^{k} p_{ii}}{\sum_{i=0}^{k}\sum_{j=0}^{k} p_{ij}}$  (5)

3) Mean pixel accuracy (MPA), the average pixel accuracy over all classes [24-26], computed as in Eq. (6):

$\mathrm{MPA} = \dfrac{1}{k+1}\sum_{i=0}^{k}\dfrac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$  (6)

4) Frame rate (frames per second, FPS), which measures the real-time performance of a semantic segmentation algorithm, computed as in Eq. (7):

$\mathrm{FPS} = \dfrac{N}{T}$  (7)

where $N$ is the number of video frames and $T$ is the elapsed time, s.
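Eqs. (4)-(6) can all be computed from a single confusion matrix. A NumPy sketch; the function names are our own.

```python
import numpy as np

def confusion(pred, label, k):
    """k x k confusion matrix; entry (i, j) counts pixels of class i
    predicted as class j (the p_ij of Eqs. (4)-(6))."""
    idx = label.reshape(-1).astype(int) * k + pred.reshape(-1).astype(int)
    return np.bincount(idx, minlength=k * k).reshape(k, k)

def metrics(cm):
    """PA, MPA, and MIoU from a confusion matrix (assumes every class
    appears at least once, so no division by zero)."""
    diag = np.diag(cm).astype(float)
    pa = diag.sum() / cm.sum()                              # Eq. (5)
    mpa = np.mean(diag / cm.sum(axis=1))                    # Eq. (6)
    iou = diag / (cm.sum(axis=1) + cm.sum(axis=0) - diag)   # per-class IoU
    return pa, mpa, np.mean(iou)                            # MIoU is Eq. (4)
```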
The proposed IR-BiSeNet was trained on the nighttime farmland infrared segmentation dataset on a platform of Win10 + TensorFlow 1.9.0 + CUDA 9.2 + VS2017 + OpenCV 4.0, with a Core i7-8750H 2.2 GHz processor, 16 GB RAM, and a GeForce GTX 1080 8 GB GPU. Part of the parameters were initialized from a ResNet-50 pretrained model and the remaining network layers were initialized randomly; all parameters were optimized with RMSprop at a learning rate of 0.0001 and a decay rate of 0.995. Training was set to 200 epochs with 5 samples per batch [27-30]; the best semantic segmentation model was obtained after 120 epochs.
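A sketch of the reported training configuration in the TF 2 Keras API (the paper used TF 1.9, so this is a translation, not the authors' script); `model`, `x_train`, `y_train`, `x_val`, and `y_val` are assumed to exist, and mapping the reported decay rate 0.995 to RMSprop's `rho` is an assumption.

```python
import tensorflow as tf

# Reported hyper-parameters: RMSprop, learning rate 1e-4, decay 0.995,
# batch size 5, 200 epochs; the best model was observed at epoch 120.
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-4, rho=0.995)
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=5, epochs=200,
          validation_data=(x_val, y_val))
```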
To verify the advantage of the proposed method for nighttime farmland infrared image segmentation, it was compared with five representative semantic segmentation frameworks for visible-light images (BiSeNet, DenseASPP, DeeplabV3+, DFANet, and CGNet); each was trained on the same dataset with the same training parameters, and the resulting models were tested against each other.
Table 2 reports the per-class intersection over union (IoU) of IR-BiSeNet and the other five methods on the test set.
The per-class IoU results show that IR-BiSeNet achieved the highest IoU on all five classes of targets in the nighttime farmland environment, reaching IoUs of 75.3%, 88.6%, .3%, 86.2%, and 85.6% for background, pedestrians, plants, obstacles, and fields, respectively. The proposed infrared real-time bilateral semantic segmentation network therefore has an accuracy advantage for every target class in nighttime farmland infrared images.
For the overall comparison, real infrared images of nighttime farmland from the test set were used to compare pixel accuracy (PA), mean pixel accuracy (MPA), mean intersection over union (MIoU), and frame rate (FPS). The results of the proposed method and the other five methods are shown in Table 3.
Table 2 Comparison of per-class intersection over union (IoU) across methods
Note: CGNet is the Context Guided Network; DFANet is the Deep Feature Aggregation Network for real-time semantic segmentation; DeeplabV3+ is the encoder-decoder semantic segmentation network with atrous separable convolution; DenseASPP is the Dense Atrous Spatial Pyramid Pooling segmentation network; BiSeNet is the Bilateral Semantic Segmentation Network; IR-BiSeNet is the Infrared Real-time Bilateral Semantic Segmentation Network.
The comparison in Table 3 shows that, in real-time performance, the proposed IR-BiSeNet runs 4 and 28 frames/s faster than DenseASPP and DeeplabV3+, respectively. Because the network depth is increased, it runs 2.5 frames/s slower than the original BiSeNet, and 28 and 13 frames/s slower than DFANet and CGNet, which are specially optimized for speed. References [4-6] regard 25 frames/s or above as real-time, so the proposed method meets the real-time requirement for semantic segmentation. In segmentation accuracy on nighttime farmland, however, IR-BiSeNet clearly outperforms the other five methods on all three metrics of PA, MPA, and MIoU; compared with BiSeNet, it improves pixel accuracy by 8.4 percentage points, mean pixel accuracy by 10.3 percentage points, and mean intersection over union by 9.8 percentage points.
Table 3 Performance comparison of different methods
IR-BiSeNet is optimized to preserve infrared texture features and thus retains more detail when segmenting nighttime farmland infrared images. Figure 5 compares its segmentation results on test-set infrared images of the nighttime farmland environment with those of the other five methods; for all classes of segmentation targets, IR-BiSeNet segments details better, and its results are closer to the ground-truth annotations.
Fig.5 Comparison of different semantic segmentation methods
For the nighttime visual navigation and visual perception of intelligent agricultural machinery, this study proposed a real-time semantic segmentation network for nighttime farmland based on infrared thermal imaging (IR-BiSeNet). Building on the BiSeNet structure, the network further fuses low-level infrared image features in its spatial path and replaces global average pooling with global max pooling in its attention refinement and feature fusion modules. A nighttime farmland infrared dataset was collected and built; on this dataset the proposed method reached an MIoU of 85.1% at a processing speed of 40 frames/s, satisfying real-time semantic segmentation of nighttime farmland. Future work will adapt more efficient semantic segmentation networks to the characteristics of infrared images and expand the nighttime farmland infrared dataset to achieve better real-time segmentation results.
[1] Cui Meiyu. On the application field and technical characteristics of infrared thermal imager[J]. China Security, 2014(12): 90-93. (in Chinese with English abstract)
[2] Yang Maoke, Yu Kun, Zhang Chi, et al. DenseASPP for semantic segmentation in street scenes[C]. IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3684-3692.
[3] Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
[4] Wu Tianyi, Tang Sheng. CGNet: A light-weight context guided network for semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
[5] Li Hanchao, Xiong Pengfei, Fan Haoqiang, et al. DFANet: Deep feature aggregation for real-time semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019.
[6] Yu Changqian, Wang Jingbo, Peng Chao, et al. BiSeNet: Bilateral segmentation network for real-time semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018.
[7] Wang Chen, Tang Xinyi. Infrared scene understanding algorithm based on deep convolutional neural network[J]. Infrared Technology, 2019, 39(8): 728-733. (in Chinese with English abstract)
[8] Wu Junyi, Gu Xiaojing, Gu Xingsheng. Night road scene semantic segmentation based on visible and infrared thermal images[J]. Journal of East China University of Science and Technology: Natural Science Edition, 2019, 45(2): 301-309. (in Chinese with English abstract)
[9] Li Yunwu, Xu Junjie, Liu Dexiong, et al. Field road scene recognition in hilly regions based on improved dilated convolutional networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(7): 150-159. (in Chinese with English abstract)
[10] Yi Shi, Li Xinrong, Wu Zhijuan, et al. Night hare detection method based on infrared thermal imaging and improved YOLOV3[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(19): 223-229. (in Chinese with English abstract)
[11] Chang Jie. Tumor Image Analysis and Processing Based on Deep Neural Network[D]. Hefei: University of Science and Technology of China, 2019. (in Chinese with English abstract)
[12] Li Xiangyuan, Cheng Cai, Zhang Ruifei, et al. Deep cascaded convolutional models for cattle pose estimation[J]. Computers and Electronics in Agriculture, 2019, 164: 104885.
[13] Russell B C, Torralba A, Murphy K P, et al. LabelMe: A database and web-based tool for image annotation[J]. International Journal of Computer Vision, 2008, 77(1-3): 157-173.
[14] He Zewei, Cao Yanpeng, Dong Yafei, et al. Single-image-based nonuniformity correction of uncooled long-wave infrared detectors: A deep-learning approach[J]. Applied Optics, 2018, 57(18): 155-164.
[15] Amorim W P, Tetila E C, Pistori H, et al. Semi-supervised learning with convolutional neural networks for UAV images automatic recognition[J]. Computers and Electronics in Agriculture, 2019, 164: 104932.
[16] Barth R, Ijsselmuiden J, Hemming J, et al. Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation[J]. Computers and Electronics in Agriculture, 2019, 161: 291-304.
[17] Axel-Christian G, Moulay A. Deep learning enhancement of infrared face images using generative adversarial networks[J]. Applied Optics, 2018, 57(18): 98-105.
[18]Satoru K, Adam M, Abhijit M, et al. Three-dimensional integral imaging and object detection using long-wave infrared imaging[J]. Applied Optics, 2017, 56(9): 120-126.
[19]Kuang Xiaodong, Sui Xiubao, Liu Yuan, et al. Single infrared image enhancement using a deep convolutional neural network[J]. Neurocomputing, 2019, 332: 119-128.
[20] Li Yunwu, Xu Junjie, Wang Mingfeng, et al. Development of autonomous driving transfer trolley on field roads and its visual navigation system for hilly areas[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(1): 52-61. (in Chinese with English abstract)
[21] Liu Zihao, Jia Xiaojun, Xu Xinsheng. Study of shrimp recognition methods using smart networks[J]. Computers and Electronics in Agriculture, 2019, 165: 104926.
[22] Tian Mengxiao, Guo Hao, Chen Hong, et al. Automated pig counting using deep learning[J]. Computers and Electronics in Agriculture, 2019. https://doi.org/10.1016/j.compag.2019.05.049
[23] Sun Zhe, Zhang Chunlong, Ge Luzhen, et al. Image detection method for broccoli seedlings in the field based on Faster R-CNN[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(7): 216-221. (in Chinese with English abstract)
[24]Zhang Shanwen, Zhang Subing, Zhang Chuanlei, et al. Cucumber leaf disease identification with global pooling dilated convolutional neural network[J]. Computers and Electronics in Agriculture, 2019, 162: 422-430.
[25] Kounalakis T, Triantafyllidis G A, Nalpantidis L. Deep learning-based visual recognition of rumex for robotic precision farming[J]. Computers and Electronics in Agriculture, 2019, 165: 104973.
[26]Kapoor A J, Fan H, Sardar M S. Intelligent detection using convolutional neural network (ID-CNN)[C]. IOP Conference Series: Earth and Environmental Science, Hubei: IOP Publishing, 2019.
[27]Li X, Orchard M T. New edge-directed interpolation[J]. IEEE Transactions on Image Processing, 2001, 10(10): 1521-1527.
[28]Yu Yang, Zhang Kailiang, Yang Li, et al. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN[J]. Computers and Electronics in Agriculture, 2019, 163: 104846.
[29]Ye Xujun, Sakai K, Manago M, et al. Prediction of citrus yield from airborne hyperspectral imagery[J]. Precision Agriculture, 2007, 8(3): 111-125.
[30] Lin Tsung-Yi, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[C]. Proceedings of the 13th European Conference on Computer Vision, New York, USA: Springer, 2014: 740-755.
Real-time semantic segmentation of farmland at night using infrared thermal imaging
Yi Shi, Li Junjie, Jia Yong
(College of Information Science and Technology (College of Cybersecurity, Oxford Brookes College), Chengdu University of Technology, Chengdu 610059, China)
In intelligent agricultural machinery, automatic navigation and visual perception technology have developed rapidly in recent years and play a vital role in intelligent modern agriculture. Real-time semantic segmentation of the farmland environment has therefore become an important part of visual environment perception in intelligent agricultural machinery. Visible-light sensing equipment is mainly used for image collection; however, particularly in the dark environment at night, the poor imaging of visible-light cameras reduces the accuracy of semantic segmentation. Infrared thermal imaging offers an alternative in this case, because it images the temperature differences of objects rather than relying on a light source. Infrared thermal imaging can therefore capture clear images in dark night, rain, mist, smoke, and other conditions unsuitable for visible-light sensing equipment. In this study, a method for real-time semantic segmentation of infrared images of the farmland environment at night was proposed using an infrared thermal imaging system. An infrared real-time bilateral semantic segmentation network (IR-BiSeNet) suitable for infrared images was designed to ensure both the accuracy and the real-time performance of infrared image semantic segmentation in the farmland environment at night. According to the characteristics of low resolution and fuzzy details of infrared images, the network was improved on the basis of the BiSeNet structure, and the low-level features of infrared images were further integrated in its spatial path. In the network, the global maximum pooling layer replaced the global average pooling layer in the attention refinement and feature fusion modules, in order to preserve the texture details of infrared images. Infrared farmland data were collected by infrared thermal imaging to create a nighttime dataset and thereby train a semantic segmentation model suitable for this farmland environment. The segmentation targets of the dataset included fields, pedestrians, plants, obstacles, and background, and data augmentation was used to produce the nighttime infrared farmland dataset. Five representative semantic segmentation methods were selected to verify the proposed method: BiSeNet, DenseASPP, DeeplabV3+, DFANet, and CGNet. Experimental results showed that the proposed method achieved a mean intersection over union of 85.1% and a processing speed of 40 frames/s. The proposed method can use infrared thermal imaging to perform real-time semantic segmentation of the farmland environment at night, which can greatly improve the visual perception of intelligent agricultural machinery at night.
intelligent agricultural machinery; semantic segmentation; infrared thermal imaging; infrared real-time bilateral semantic segmentation net; farmland dataset at night
Yi Shi, Li Junjie, Jia Yong. Real-time semantic segmentation of farmland at night using infrared thermal imaging[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(18): 174-180. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2020.18.021 http://www.tcsae.org
Received: 2020-04-27; Revised: 2020-07-29
Supported by the National Natural Science Foundation of China (61771096) and the National Undergraduate Innovation and Entrepreneurship Training Program (201910616129)
Yi Shi, senior experimentalist, research interests: artificial intelligence and infrared image processing. Email: 549745481@qq.com
doi: 10.11975/j.issn.1002-6819.2020.18.021
CLC number: TN919.5; Document code: A; Article ID: 1002-6819(2020)-18-0174-07