Vehicle Re-identification Algorithm Based on Key Position Features
ZHAO Bin, DONG Changyuan
(School of Sciences, Hubei Univ. of Tech., Wuhan 430068, China)
[Abstract] In image recognition tasks, accurately locating the key positions is the prerequisite for obtaining more discriminative features; the roof, the windows and the front face are the three most distinctive parts of a vehicle. A PCB-LS method is applied to vehicle re-identification. Based on the idea of extracting local features, a ResNet50 backbone is used to extract the feature map, which is then divided evenly into three parts, and a separate classifier is trained for each part. To counter the over-fitting of the model on the training set, a label-smoothing regularization is used to reduce the model's confidence in the training labels and improve its accuracy on the test set. Trained and tested on the VeRi776 dataset, the PCB-LS method reaches Rank@1, Rank@5 and Rank@10 accuracies of 93.62%, 96.72% and 97.74% respectively, with an mAP of 76.17%. The PCB-LS method not only obtains highly discriminative features but also generalizes well.
[Keywords] vehicle re-identification; local features; feature extraction; label smoothing; generalization ability
[CLC number] TP391.4  [Document code] A
Vehicle re-identification is the task of retrieving a specific vehicle from vehicle image data and returning the results that most closely match it. With the progress of deep learning in image recognition, a large number of deep-learning-based vehicle re-identification techniques have emerged: a vehicle can be identified not only by its license plate but also by discriminative features of the whole vehicle body. Vehicle re-identification can be applied to driving trajectory analysis, expressway ETC toll auditing, tracking of fugitive vehicles, and similar scenarios.
Vehicle re-identification methods can be roughly divided into sensor-based methods [1-2], methods based on hand-crafted features [3-4], and deep-learning-based methods. Since the introduction of convolutional neural networks, deep learning has become increasingly popular in image recognition and retrieval, and the best-performing recognition models today are all built on deep learning. Shen et al. [5] proposed a two-stage model that refines the re-identification results with spatio-temporal information about the vehicles. He et al. [6] proposed a simple and efficient model for extracting local key features, which strengthens the model's ability to distinguish subtle differences and brings a considerable gain in accuracy. References [6-7] show that specific regions such as the windows, the lamps and the vehicle outline carry more discriminative information. Because vehicle and person re-identification are highly similar tasks, some work also transfers person re-identification methods to vehicles. Luo et al. [8] took ResNet50 as the base model and improved person re-identification accuracy with tricks such as random erasing, a piecewise learning-rate schedule and setting the last stride to 1; He et al. [9] applied these tricks to vehicle re-identification and achieved 96.9% Rank@1 and 82.0% mAP in AICITY2020.
The PCB method proposed by Sun et al. [10] is one of the strongest person re-identification methods of recent years. PCB extracts a feature map with a ResNet backbone, divides it evenly into six parts, trains a classifier for each part and sums their cross-entropy losses to train the model; the trained model is then used to extract features from the test images, and similarities between these features are computed to find matching identities. Local vehicle features can be extracted in the same way, by partitioning the key regions and training a network for each region, then comparing the extracted features for re-identification. When the PCB model is trained with one-hot labels, the cross-entropy loss only considers the loss at the correct label position (where the one-hot label is 1) and ignores the loss at the incorrect positions (where the one-hot label is 0). As a result the model fits the training set very well, but because the losses at the other positions are never penalized, the probability of making wrong predictions at test time increases. Szegedy et al. [11] proposed label smoothing (LS), which modifies the target distribution to reduce the model's reliance on the training labels and thus alleviates over-fitting. Combining PCB with LS makes it possible both to extract the most discriminative features and to improve the generalization of the model, so that it also predicts well on the test set.
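The label-smoothing idea discussed above can be written down in a few lines. The sketch below is a minimal PyTorch illustration of the formulation of Szegedy et al. [11], not the exact implementation used in this paper; the smoothing factor epsilon = 0.1 and the way the residual mass is spread over the non-target classes are common choices and are assumptions here.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, num_classes, epsilon=0.1):
    """Cross-entropy against smoothed targets: the true class receives
    1 - epsilon and the remaining epsilon is spread uniformly over the
    other classes (one common variant of label smoothing)."""
    log_probs = F.log_softmax(logits, dim=1)
    with torch.no_grad():
        # Build the smoothed target distribution for every sample in the batch.
        smooth = torch.full_like(log_probs, epsilon / (num_classes - 1))
        smooth.scatter_(1, targets.unsqueeze(1), 1.0 - epsilon)
    return (-smooth * log_probs).sum(dim=1).mean()
```

In the PCB-LS setting this loss would be evaluated once per part, and the per-part losses summed to form the training objective.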
1 Vehicle re-identification model
Deep learning models make up for the limited expressive power of traditional hand-crafted features: a well-designed deep model can extract more discriminative features and raise the recognition rate. To extract such features, the key parts can be located first and a separate classifier trained for each of them, which is exactly the idea behind the PCB-LS model. ResNet50 is used as the backbone; the output tensor is divided into three parts, a classifier is trained for each part, and a label-smoothing loss serves as the objective for back-propagation. The learning rate has a strong influence on performance, and a larger batch size calls for a larger initial value. To avoid numerical instability in the early stage, a warm-up heuristic raises the learning rate linearly from 0 to the initial value; to speed up convergence, a cosine decay function then computes the learning rate for every epoch.
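A minimal sketch of such a warm-up plus cosine-decay schedule is shown below; the concrete numbers (base learning rate, 10 warm-up epochs, 60 total epochs) are illustrative assumptions rather than the settings used in the experiments.

```python
import math

def lr_at_epoch(epoch, base_lr=0.01, warmup_epochs=10, total_epochs=60):
    """Linear warm-up from 0 to base_lr, then cosine decay towards 0."""
    if epoch < warmup_epochs:
        # Warm-up phase: rise linearly so the first updates stay numerically stable.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine phase: half a cosine period over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

The value returned for each epoch would be written into the optimizer's parameter groups before that epoch starts.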
1.1 PCB-LS model
The PCB method was originally proposed for person re-identification; its core idea is to locate the key parts in advance so as to extract more discriminative features. PCB takes ResNet50 as the backbone and removes its last two layers; the resulting feature map is divided into six stripes, a classifier is trained for each stripe, the classification output of every stripe is compared with the label through a cross-entropy loss, and the sum of the six losses serves as the objective that is minimized to optimize the model.
ResNet is one of the most widely used architectures in deep learning; a residual network extracts features by stacking residual blocks. Inside a residual block, the convolution parameters are chosen so that the input and output feature maps have the same size and can be added together, which alleviates the vanishing-gradient and degradation problems of deep networks. ResNet50 has 50 layers and its backbone consists of five stages: the first stage preprocesses the input, and the remaining four stages are all built from similar Bottleneck blocks. This paper uses the ResNet50 backbone for the initial feature extraction. An image is resized to 512×384 and fed in as input X of size B×3×512×384, where B is the batch size; the feature map T produced by the backbone has size B×2048×32×24. Since the most discriminative regions of a vehicle are the roof, the windows and the front face, T is divided into three parts, and adaptive pooling turns the data into g of size B×2048×3×1; a 1×1 convolution is applied with dropout set to 0.5, and a classifier is then trained for each of the three parts. The overall training procedure is shown in Fig. 1.
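The three-part branch structure described above can be summarized in the following PyTorch sketch. It assumes torchvision's ResNet50 as the backbone; the 256-dimensional reduction after the 1×1 convolution, average pooling as the adaptive pooling, and the change of the last-stage stride to 1 (needed to reproduce the stated B×2048×32×24 feature map for a 512×384 input) are assumptions, since the text only specifies the 1×1 convolution, the dropout rate of 0.5 and the tensor shapes.

```python
import torch
import torch.nn as nn
from torchvision import models

class PCBLS(nn.Module):
    """Sketch of the three-part PCB-LS branch structure described above."""

    def __init__(self, num_classes, num_parts=3, reduced_dim=256):
        super().__init__()
        # ImageNet pre-trained ResNet50 (weights download on first use).
        backbone = models.resnet50(weights="IMAGENET1K_V1")
        # Keep everything up to the last residual stage; drop avgpool and fc.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Set the stride of the last stage to 1 so a 512x384 input yields a
        # 32x24 feature map (assumption, inferred from the reported shape).
        self.backbone[7][0].conv2.stride = (1, 1)
        self.backbone[7][0].downsample[0].stride = (1, 1)
        # Adaptive pooling to one vector per part: B x 2048 x 3 x 1.
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))
        self.dropout = nn.Dropout(p=0.5)          # dropout 0.5 as stated in the text
        # 1x1 convolutions reduce channels before the per-part classifiers.
        self.reducers = nn.ModuleList(
            [nn.Conv2d(2048, reduced_dim, kernel_size=1) for _ in range(num_parts)])
        self.classifiers = nn.ModuleList(
            [nn.Linear(reduced_dim, num_classes) for _ in range(num_parts)])

    def forward(self, x):
        feat = self.backbone(x)                       # B x 2048 x 32 x 24
        parts = self.dropout(self.part_pool(feat))    # B x 2048 x 3 x 1
        logits = []
        for i, (reduce, classify) in enumerate(zip(self.reducers, self.classifiers)):
            p = reduce(parts[:, :, i:i + 1, :])       # B x 256 x 1 x 1
            logits.append(classify(p.flatten(1)))     # B x num_classes
        return logits  # one logit tensor per part
```

During training, each of the three classifier outputs is compared with the identity label and the three label-smoothed losses are summed; at test time the pooled part features can be concatenated and compared by similarity to retrieve matching vehicles, as described in the introduction.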
[2] JENG S, CHU L. Vehicle re-identification with the inductive loop signature technology[J]. Journal of the Eastern Asia Society for Transportation Studies, 2013, 12(10): 1896-1915.
[3] ZHANG Z, TAN T, HUANG K. Three-dimensional deformable-model-based localization and recognition of road vehicles[J]. IEEE Transactions on Image Processing, 2012, 21(01): 1-13.
[4] WOESLER R. Fast extraction of traffic parameters and re-identification of vehicles from video data[C]∥The 2003 IEEE International Conference on Intelligent Transportation Systems. Piscataway: IEEE Press, 2003: 774-778.
[5] SHEN Y T, XIAO T, LI H S, et al. Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals[C]∥Proceedings of the IEEE International Conference on Computer Vision, 2017: 1900-1909.
[6] HE B, LI J, ZHAO Y F, et al. Part-regularized near-duplicate vehicle re-identification[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019: 3997-4005.
[7] TENG S Z, LIU X B, ZHANG S L, et al. Spatial and channel attention network for vehicle re-identification[C]∥Pacific Rim Conference on Multimedia, 2018: 350-361.
[8] LUO H, GU Y Z, LIAO X Y, et al. Bag of tricks and a strong baseline for deep person re-identification[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019: 432-443.
[9] HE S T, LUO H, CHEN W H, et al. Multi-domain learning and identity mining for vehicle re-identification[J]. IEEE Transactions on Vehicular Technology, 2022(09): 1-15.
[10] SUN Y F, ZHENG L, YANG Y, et al. Beyond part models: person retrieval with refined part pooling[C]∥Computer Vision-ECCV 2018, 2018, 11208: 510-518.
[11] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2818-2826.
[12] LIU H Y, TIAN Y H, WANG Y W, et al. Deep relative distance learning: tell the difference between similar vehicles[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2167-2175.
[13] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 2818-2826.
[14] SMITH S L, KINDERMANS P J, YING C, et al. Don't decay the learning rate, increase the batch size[A/OL]. [2018-02-24]. https://arxiv.org/abs/1711.00489.
[15] GOYAL P, DOLLAR P, GIRSHICK R B, et al. Accurate, large minibatch SGD: training ImageNet in 1 hour[A/OL]. [2018-01-30]. https://arxiv.org/abs/1706.02677.
[16] JIA X, SONG S, HE W, et al. Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes[A/OL]. [2018-07-30]. https://arxiv.org/abs/1807.11205.
[17] GOYAL P, DOLLAR P, GIRSHICK R B, et al. Accurate, large minibatch SGD: training ImageNet in 1 hour[J]. CoRR, 2017, abs/1706.02677.
[18] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[19] LOSHCHILOV I, HUTTER F. SGDR: stochastic gradient descent with warm restarts[A/OL]. [2017-03-03]. https://arxiv.org/abs/1608.03983v2.
[20] LIU X, ZHANG S, HUANG Q. RAM: a region-aware deep model for vehicle re-identification[C]∥IEEE International Conference on Multimedia and Expo (ICME), 2018: 1-6.
[21] ZHOU K, YANG Y, CAVALLARO A, et al. Omni-scale feature learning for person re-identification[C]∥Proceedings of the IEEE International Conference on Computer Vision, 2019: 3702-3712.
[22] JIN X, LAN C, ZENG W, et al. Uncertainty-aware multi-shot knowledge distillation for image-based object re-identification[A/OL]. [2020-01-21]. https://arxiv.org/abs/2001.05197.
[Responsible editor: 張眾]