Monte Carlo EM Algorithm for the Multivariate Normal Model with Missing Data
WANG Ji-xia1, LIU Ci-hua2
(1. College of Mathematics and Information Science, Henan Normal University, Xinxiang 453007, Henan, China; 2. School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China)
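We study maximum likelihood estimation of the parameters of a multivariate normal model with missing data. An iterative solution for the model parameters is obtained with the Monte Carlo EM algorithm, and we prove that this iterative solution converges to the optimal solution with a second-order (quadratic) rate of convergence.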
multivariate normal model; missing data; EM algorithm; Monte Carlo EM algorithm; Newton-Raphson algorithm
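The EM algorithm [1-2] is a widely used data augmentation algorithm for estimating the posterior mode. An explicit expression for the integral in its E-step, however, is sometimes difficult or even impossible to obtain. To address this, the integral in the E-step can be evaluated by Monte Carlo simulation, which greatly broadens the applicability of the algorithm. Dempster, Laird and Rubin [3-4] nevertheless pointed out that the convergence rate of the EM algorithm is linear and is governed by the reciprocal of the missing information, so convergence is very slow when the proportion of missing data is high. In view of this, we study maximum likelihood estimation of the parameters of a multivariate normal model with missing data. We combine the Monte Carlo EM algorithm with the Newton-Raphson algorithm, give an iterative solution for the mean vector, and prove that the resulting algorithm has a second-order rate of convergence in a neighborhood of the posterior mode.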
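As a concrete illustration of replacing the E-step integral with simulation, here is a minimal Python sketch. It assumes a known covariance matrix `Sigma`, a flat prior on the mean (so the posterior mode equals the maximum likelihood estimate), and missing entries marked with `np.nan`. The function name `mcem_mean` and all of its parameters are hypothetical and do not come from the paper. The sketch shows plain Monte Carlo EM only, without any Newton-Raphson acceleration.

```python
import numpy as np

def mcem_mean(X, Sigma, n_iter=30, n_draws=50, seed=0):
    """Monte Carlo EM sketch for the mean of N(mu, Sigma), Sigma assumed known.

    X is an (n, p) array in which np.nan marks a missing entry.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)  # start from the available-case means
    for _ in range(n_iter):
        filled = X.copy()
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if o.any():
                # Conditional law of X_mis given X_obs under the current mu.
                S_oo = Sigma[np.ix_(o, o)]
                S_mo = Sigma[np.ix_(m, o)]
                K = S_mo @ np.linalg.inv(S_oo)
                cond_mean = mu[m] + K @ (X[i, o] - mu[o])
                cond_cov = Sigma[np.ix_(m, m)] - K @ S_mo.T
            else:
                # Nothing observed in this row: fall back to the marginal law.
                cond_mean, cond_cov = mu[m], Sigma[np.ix_(m, m)]
            cond_cov = (cond_cov + cond_cov.T) / 2  # remove round-off asymmetry
            # Monte Carlo E-step: average simulated values of the missing part.
            draws = rng.multivariate_normal(cond_mean, cond_cov, size=n_draws)
            filled[i, m] = draws.mean(axis=0)
        # M-step: with Sigma known, the update is the mean of the completed data.
        mu = filled.mean(axis=0)
    return mu
```

Because the complete-data log-likelihood is linear in the missing values once `Sigma` is fixed, averaging the draws is enough for the M-step in this simplified setting. A model with unknown covariance would also need simulated second moments.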
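N-R step. Let
(1)
In the algorithm above, the augmented posterior distribution of $\mu$ and the conditional predictive distribution of the missing data $X_{\mathrm{mis}}$ are both readily available and have relatively simple forms, so the expectation and variance required in the N-R step are easy to compute.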
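The body of equation (1) is not legible in this copy, so the following is not the authors' formula. It is a hedged sketch of one standard way a Newton-Raphson step can be built from a conditional expectation and a conditional variance (Louis' identity: the observed-data Hessian equals the conditional expectation of the complete-data Hessian plus the conditional variance of the complete-data score). It assumes a known covariance `Sigma`, a flat prior on the mean, and at least one observed entry in every row. The name `mc_newton_step` and its parameters are hypothetical.

```python
import numpy as np

def mc_newton_step(X, Sigma, mu, n_draws=200, rng=None):
    """One Monte Carlo Newton-Raphson update for the mean (Sigma assumed known).

    X is (n, p) with np.nan for missing entries; each row is assumed to have
    at least one observed entry.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    Lam = np.linalg.inv(Sigma)   # precision matrix
    score = np.zeros(p)
    hess = -n * Lam              # conditional expectation of the complete-data Hessian
    for x in X:
        m = np.isnan(x)
        o = ~m
        e_x = x.copy()           # Monte Carlo estimate of E[x | x_obs, mu]
        c_x = np.zeros((p, p))   # Monte Carlo estimate of Cov[x | x_obs, mu]
        if m.any():
            K = Sigma[np.ix_(m, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
            cond_mean = mu[m] + K @ (x[o] - mu[o])
            cond_cov = Sigma[np.ix_(m, m)] - K @ Sigma[np.ix_(o, m)]
            draws = rng.multivariate_normal(cond_mean, cond_cov, size=n_draws)
            e_x[m] = draws.mean(axis=0)
            k = int(m.sum())
            c_x[np.ix_(m, m)] = np.cov(draws, rowvar=False).reshape(k, k)
        score += Lam @ (e_x - mu)   # conditional expectation of the score
        hess += Lam @ c_x @ Lam     # conditional variance of the score
    return mu - np.linalg.solve(hess, score)
```

In this simplified setting the observed-data log-likelihood is quadratic in the mean, so an exact Newton step would reach the maximizer in one update. With simulated moments the step carries Monte Carlo error, which shrinks as `n_draws` grows.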
(2)
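where $G_{ij}(x)$ is the $(i,j)$ entry of the Hessian matrix $G(x)$. Then the algorithm above is well defined for every $i$, and when $n$ is sufficiently large the resulting sequence $\{\mu^{(i)}\}$ converges to the optimal solution $\mu^{*}$ with a second-order rate of convergence.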
(3)
(4)
(5)
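Setting $h = -h_i$ gives
(6)
By the definition of $O(\cdot)$, there exists a constant $C$ such that
$\|h_{i+1}\| \le C\,\|h_i\|^{2}$,
(7)
$\|h_{i+1}\| \le \gamma\,\|h_i\|$,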
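To make the quadratic bound $\|h_{i+1}\| \le C\|h_i\|^{2}$ concrete, the following toy sketch runs Newton-Raphson on a one-dimensional log-likelihood. It assumes that $h_i$ denotes the error $\mu^{(i)} - \mu^{*}$, which is not stated in the surviving text. The function $\ell(\theta) = \log\theta - \theta$ (maximized at $\theta^{*} = 1$) and the name `newton_errors` are illustrative choices, not the paper's model.

```python
def newton_errors(theta0, n_steps=5):
    """Newton-Raphson on l(theta) = log(theta) - theta, whose maximizer is 1.

    Returns the absolute errors |theta_i - 1| for i = 0, ..., n_steps.
    """
    theta = theta0
    errors = [abs(theta - 1.0)]
    for _ in range(n_steps):
        grad = 1.0 / theta - 1.0     # l'(theta)
        hess = -1.0 / theta ** 2     # l''(theta)
        theta = theta - grad / hess  # Newton-Raphson update
        errors.append(abs(theta - 1.0))
    return errors

errors = newton_errors(1.5)
# Ratio |h_{i+1}| / |h_i|^2 at each step; a bounded ratio means quadratic decay.
ratios = [e1 / e0 ** 2 for e0, e1 in zip(errors, errors[1:])]
```

For this particular function the update simplifies to $\theta_{i+1} = 2\theta_i - \theta_i^{2}$, so $h_{i+1} = -h_i^{2}$ and the ratios stay close to 1, which corresponds to $C = 1$. Under a linear bound the error would shrink only by a constant factor per step.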
[1] Little R J A, Rubin D B. Statistical Analysis with Missing Data[M]. New York: Wiley, 1987.
[2] Shi N Z, Zhong S R, Guo J H. The restricted EM algorithm under inequality restrictions on the parameters[J]. Journal of Multivariate Analysis, 2005, 92(4):53-76.
[3] Booth J G, Hobert J P. Maximizing generalized linear mixed model likelihoods with automated Monte Carlo EM algorithm[J]. Journal of the Royal Statistical Society: Ser B, 1999, 61(2):265-285.
[4] Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm (with discussion)[J]. Journal of the Royal Statistical Society: Ser B, 1977, 39(1):1-38.
[5] Luo J. Accelerated Monte Carlo EM algorithm[J]. Chinese Journal of Applied Probability and Statistics, 2008, 24(3):312-318. (in Chinese)
[6] Geweke J. Bayesian inference in econometric models using Monte Carlo integration[J]. Econometrica, 1989, 57(6):1317-1339.
[7] Mao S S, Wang J L, Pu X L. Advanced Mathematical Statistics[M]. Beijing: Higher Education Press, 1998. (in Chinese)
Monte Carlo EM Algorithm for Multivariate Normal Distribution under Missing Data
WANG Ji-xia1,LIU Ci-hua2
(1. College of Mathematics and Information Science, Henan Normal University, Xinxiang 453007, China; 2. Department of Math, Huazhong University of Science and Technology, Wuhan 430074, China)
Maximum likelihood estimation of the parameters of multivariate normal distribution models under missing data is studied. An iterative solution for the parameters is obtained through the Monte Carlo EM algorithm, and it is proved that this solution converges to the optimal solution with a second-order (quadratic) rate of convergence.
multivariate normal distribution; missing data; EM algorithm; Monte Carlo EM algorithm; Newton-Raphson algorithm
CLC number: O 212.1
Document code: A
Article ID: 1671-6841(2011)03-0059-03
Received: 2010-04-24
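Supported by the National Natural Science Foundation of China (No. 10671057) and the Soft Science Research Program of the Education Department of Henan Province (No. 2010B110013).
WANG Ji-xia (born 1978), female, lecturer, master's degree; her research interests include isotonic regression and constrained statistical inference. E-mail: jixiawang@163.com.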