YANG Shiqiang, LUO Xiaoyu, QIAO Dan, LIU Peilei, LI Dexin
CLC number: TP391.4
Document code: A
Abstract: Concerning the problems that there is little research on continuous action recognition and that a single algorithm performs poorly on continuous actions, a segmentation and recognition method for continuous actions was proposed on the basis of single-action modeling, combining the sliding window method with the dynamic programming method. Firstly, the single-action model was constructed based on the combination of Deep Belief Network and Hidden Markov Model (DBN-HMM). Secondly, the log-likelihood of the trained action models and the sliding window method were used to estimate the scores of the continuous action sequence and detect the initial segmentation points. Thirdly, the dynamic programming method was used to optimize the locations of the segmentation points and recognize the single actions. Finally, experiments on continuous action segmentation and recognition were conducted on the public action database MSR Action3D. The experimental results show that dynamic programming based on the sliding window can optimize the selection of segmentation points and thereby improve recognition accuracy, so the proposed method can be used for continuous action recognition.
Key words: Hidden Markov Model (HMM); action segmentation; action recognition; sliding window; dynamic programming
0 Introduction
Human action recognition has become a research hotspot in many fields in recent years [1], such as video surveillance [2] and human-computer interaction [3]. With the aging of the population, service robots will play an important role in daily life, and observing and responding to human actions will become one of their basic skills [4]. Action recognition is gradually being applied to many aspects of people's life and work and has far-reaching application value.
Action behaviors generally appear in the form of continuous actions containing multiple single actions. According to the order of segmentation and recognition, continuous action recognition can be divided into direct segmentation and indirect segmentation. Direct segmentation first determines the segment boundaries from simple changes in parameter values and then recognizes the resulting segments; for example, Bai et al. [5] performed initial segmentation of action sequences according to changes in joint velocity and joint angle, as sketched below. This approach is simple and fast, but its segmentation error is large for complex continuous actions. In indirect segmentation, segmentation and recognition are performed simultaneously: in practice the two are coupled, the segmentation result affects recognition, and segmentation generally requires the support of recognition. Algorithms widely used for continuous action recognition include Dynamic Time Warping (DTW) [6], Continuous Dynamic Programming (CDP) [7] and the Hidden Markov Model (HMM).
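For intuition, direct segmentation of the kind used in [5] can be as simple as thresholding joint speed, taking frames where overall joint motion nearly stops as candidate boundaries. The following Python sketch illustrates this under assumed inputs; the function name, array layout and threshold are illustrative choices, not values taken from [5].

```python
import numpy as np

def velocity_boundaries(joints, thresh=0.01):
    """Candidate action boundaries from joint-speed minima (direct segmentation).

    joints: (T, J, 3) array of J joint positions over T frames.
    Frames whose mean joint speed drops below `thresh` (an illustrative
    value) are returned as candidate segment boundaries.
    """
    # Per-frame speed: displacement between consecutive frames, averaged over joints.
    speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1)  # (T-1,)
    return np.where(speed < thresh)[0] + 1
```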
Gong et al. [8] used Dynamic Manifold Warping (DMW) to compute the similarity between two multivariate time series, achieving action segmentation and recognition. Zhu et al. [9] used an online segmentation method based on feature displacement to divide feature sequences into posture feature segments and motion feature segments, and computed, through online model matching, the likelihood that each feature segment can be labeled as an extracted key posture or atomic motion. Lei et al. [10] proposed a hierarchical framework combining a Convolutional Neural Network (CNN) with an HMM (CNN-HMM) that segments and recognizes continuous actions simultaneously; it extracts effective and robust action features, achieves good recognition results on action video sequences, and the HMM offers strong extensibility. Kulkarni et al. [11] designed a visual alignment technique, Dynamic Frame Warping (DFW), which trains a super template for each action video and can segment and recognize multiple actions; however, during testing, computing the distances between test sequence frames and the template is computationally expensive, and compared with probabilistic-statistical methods its model-training learning ability is lower. Evangelidis et al. [12] used a sliding window to construct frame-wise Fisher vectors, classified by a multi-class Support Vector Machine (SVM); because the sliding window fixes the action sequence length, recognition is poor for actions of the same class whose lengths differ greatly.
To address these problems, this paper models each single action with DBN-HMM, a composite model combining a Deep Belief Network (DBN) with an HMM that has strong modeling and learning ability for time-series data; a scoring mechanism and the sliding window method are then used to detect the initial segmentation points, and finally dynamic programming is used to optimize the segmentation points and recognize the actions. The sliding window reduces the computational complexity of dynamic programming, while dynamic programming compensates for the fixed window length, so that the optimal segmentation points are detected.
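As a concrete illustration of this pipeline, the following Python sketch shows one plausible form of the window scoring and the dynamic-programming refinement. The function names, the fixed window length `win`, and the use of summed per-frame model log-likelihoods as segment scores are assumptions made for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def window_scores(loglik, win, step):
    """Score every fixed-length window under each trained action model.

    loglik: (M, T) array; loglik[m, t] stands in for the per-frame
    log-likelihood of frame t under action model m (here, a DBN-HMM).
    Returns the window start frames and a (num_windows, M) score matrix
    from which initial segmentation points can be detected.
    """
    M, T = loglik.shape
    starts = list(range(0, T - win + 1, step))
    scores = np.array([[loglik[m, s:s + win].sum() for m in range(M)]
                       for s in starts])
    return starts, scores

def refine_cuts(loglik, candidates):
    """Choose the subset of candidate cut points that maximizes the summed
    best-model log-likelihood of the resulting segments, via dynamic
    programming over the candidate boundaries."""
    cuts = [0] + sorted(candidates) + [loglik.shape[1]]
    n = len(cuts)

    def seg(i, j):
        # Score of the best single model on the segment [cuts[i], cuts[j]).
        return loglik[:, cuts[i]:cuts[j]].sum(axis=1).max()

    best = [-np.inf] * n
    back = [0] * n
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            cand = best[i] + seg(i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    chosen, j = [], n - 1   # walk back-pointers to recover the interior cuts
    while back[j] > 0:
        j = back[j]
        chosen.append(cuts[j])
    return sorted(chosen)
```

Because only the candidate boundaries produced by the sliding window enter the dynamic program, its cost grows with the number of candidates rather than with the number of frames, which is the complexity reduction mentioned above.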
1 Single action modeling
In continuous action recognition, each single action in the continuous sequence is first modeled separately; here the composite model DBN-HMM, which combines a DBN with an HMM, is used for action modeling.
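For orientation, in such a hybrid the DBN typically supplies per-frame state posteriors, which are converted to scaled emission scores and combined with the HMM transition structure by Viterbi decoding [14]. The sketch below shows this decoding step only; the array shapes and the posterior-to-likelihood conversion follow standard hybrid NN-HMM practice and are assumptions, not details specific to this paper.

```python
import numpy as np

def viterbi_decode(log_emit, log_trans, log_prior):
    """Most likely HMM state path given DBN-derived emission scores.

    log_emit:  (T, S) per-frame log p(state | frame) minus log p(state),
               i.e. scaled likelihoods from the network's posteriors
    log_trans: (S, S) log transition matrix of the HMM
    log_prior: (S,)   log initial state distribution
    Returns the log-probability of the best path and the path itself.
    """
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans      # cand[i, j]: best score entering state j from i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):              # trace the back-pointers
        path.append(int(back[t, path[-1]]))
    return float(delta.max()), path[::-1]
```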
1.1 Feature extraction
A human action can be represented as the rotations of the body's limbs in three-dimensional space. Combined with a human skeleton model composed of joints, the human posture can be represented by the 3D spatial coordinates of 20 joints: head, left/right shoulder, shoulder center, left/right elbow, spine center, left/right wrist, left/right hand, left/right hip, hip center, left/right knee, left/right ankle, and left/right foot. In the limb angle model, a limb is represented by the relative spatial position of two adjacent joints among these 20. Assuming that all joints extend from the spine joint, in a limb formed by two adjacent joints the joint closer to the spine is defined as the parent joint and the other as the child joint. The world coordinate system is converted to a local spherical coordinate system to represent the relative position of each limb: the parent joint serves as the origin, the length of the line from the parent joint to the child joint is r, the angle of this line with the Z axis is φ, and the angle between its projection onto the XOY plane and the X axis is θ, so a limb angle model can be expressed as (r, θ, φ), as shown in Fig. 1. Since the distance r is affected by body size, r is discarded and the limb angle model is represented by (θ, φ).
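As an illustration of this feature, the (θ, φ) pair for one limb can be computed from the parent and child joint coordinates as follows. This is a minimal sketch assuming the spherical-coordinate conventions described above, and `limb_angles` is a hypothetical helper name.

```python
import numpy as np

def limb_angles(parent, child):
    """Limb angle features (theta, phi) for one limb.

    parent, child: 3D world coordinates of the parent and child joints.
    The limb length r is computed but discarded from the feature, since
    it reflects body size rather than posture.
    """
    v = np.asarray(child, dtype=float) - np.asarray(parent, dtype=float)
    r = np.linalg.norm(v)            # limb length r
    phi = np.arccos(v[2] / r)        # angle between the limb and the Z axis
    theta = np.arctan2(v[1], v[0])   # angle of the XOY projection with the X axis
    return theta, phi
```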
5 Conclusion
To address the problems that research on continuous action recognition is scarce and that a single algorithm performs poorly on continuous actions, this paper presented a segmentation and recognition method for continuous actions that combines the sliding window method with dynamic programming. The constructed DBN-HMM has strong modeling ability, and combining the sliding window with dynamic programming to detect segmentation points makes the two methods complementary, reducing computational complexity while compensating for the fixed-length limitation. Experimental results show that the proposed method achieves good results in the segmentation and recognition of complex continuous actions. However, there is still room to improve the recognition rate; future work will consider action segmentation and recognition on self-collected continuous action videos.
References
[1] HU Q, QIN L, HUANG Q M. A survey on visual human action recognition [J]. Chinese Journal of Computers, 2013, 36(12): 2512-2524. (in Chinese)
[2] AGGARWAL J K, RYOO M S. Human activity analysis: a review [J]. ACM Computing Surveys, 2011, 43(3): Article No. 16.
[3] KOPPULA H S, SAXENA A. Anticipating human activities using object affordances for reactive robotic response [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 1-14.
[4] ZHANG C, TIAN Y. RGB-D camera-based daily living activity recognition [J]. Journal of Computer Vision and Image Processing, 2012, 2(4): 1-7.
[5] BAI D T, ZHANG L, HUANG H. Recognition of continuous human actions from RGB-D videos [J]. China Science Paper, 2016(2): 168-172. (in Chinese)
[6] DARRELL T, PENTLAND A. Space-time gestures [C]// Proceedings of the 1993 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 1993: 335-340.
[7] OKA R. Spotting method for classification of real world data[J]. Computer Journal, 1998, 41(8): 559-565.
[8] GONG D, MEDIONI G, ZHAO X. Structured time series analysis for human action segmentation and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1414-1427.
[9] ZHU G, ZHANG L, SHEN P, et al. An online continuous human action recognition algorithm based on the Kinect sensor [J]. Sensors, 2016, 16(2): 161-179.
[10] LEI J, LI G, ZHANG J, et al. Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model[J]. IET Computer Vision, 2016, 10(6): 537-544.
[11] KULKARNI K, EVANGELIDIS G, CECH J, et al. Continuous action recognition based on sequence alignment[J]. International Journal of Computer Vision, 2015, 112(1): 90-114.
[12] EVANGELIDIS G D, SINGH G, HORAUD R. Continuous gesture recognition from articulated poses [C]// Proceedings of the 2014 European Conference on Computer Vision. Cham: Springer, 2014: 595-607.
[13] SONG Y, GU Y, WANG P, et al. A Kinect based gesture recognition algorithm using GMM and HMM [C]// Proceedings of the 2013 6th International Conference on Biomedical Engineering and Informatics. Piscataway, NJ: IEEE, 2013: 750-754.
[14] VITERBI A J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm [J]. IEEE Transactions on Information Theory, 1967, 13(2): 260-269.
[15] TAYLOR G W, HINTON G E, ROWEIS S. Modeling human motion using binary latent variables [C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007: 1345-1352.
[16] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[17] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points [C]// Proceedings of the 2010 IEEE Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2010: 9-14.