2015年6月17日星期三

EM algorithm to impute missing value and underlying parameter

 Genetic linkage data are from Rao, C. R. (1973), Linear Statistical Inference and Its Applications, 2nd Ed., New York: John Wiley & Sons. p368


x<-c(125,18,20,34)
n<-sum(x)
theta.0<-(x[1]-x[2]-x[3]+x[4])/n
theta.0
current_theta<-c()
new_theta<-c()
z11<-c()
z12<-c()
i<-1
current_theta[i]<-theta.0

new_theta[i]<-(x[1]*current_theta[i]+2*x[4]+x[4]*current_theta[i])/(n*current_theta[i]+2*(x[2]+x[3]+x[4]))
new_theta[i]
difference<-abs(new_theta[i]-current_theta[i])
difference

while(difference>0.00000001){
current_theta[i]<-new_theta[i]
new_theta[i+1]<-(x[1]*current_theta[i]+2*x[4]+x[4]*current_theta[i])/(n*current_theta[i]+2*(x[2]+x[3]+x[4]))
difference<-abs(new_theta[i+1]-current_theta[i])
z11[i]<-(0.5/(0.5+0.25*current_theta[i]))*x[1]
z12[i]<-((0.25*current_theta[i])/(0.5+0.25*current_theta[i]))*x[1]
i<-i+1
}
cbind(theta=current_theta,Z11=z11,Z12=z12)

Results: Z11 and Z12 are imputed missing values and thetas are mle estimate of parameter, 7 iterations will get 10^-8 accuracy.


        theta             Z11               Z12
[1,] 0.6251317     95.23332     29.76668
[2,] 0.6265968     95.18019     29.81981
[3,] 0.6267917     95.17314     29.82686
[4,] 0.6268175     95.17220     29.82780
[5,] 0.6268210     95.17207     29.82793
[6,] 0.6268214     95.17206     29.82794
[7,] 0.6268215     95.17206     29.82794