Sunday, August 13, 2017

Transform or link?

https://ecommons.cornell.edu/bitstream/handle/1813/31620/BU-1049-MA.pdf?sequence=1

Sunday, August 6, 2017

ranking and empirical distributions

In the absence of repeated values (ties), the empirical cdf can be obtained computationally by sorting the observed data in ascending order, i.e., X_s = \{x_{(1)}, x_{(2)}, \ldots, x_{(N)}\}. Then F(x) = n_x / N, where n_x is the ascending rank of x. Likewise, the p-value can be obtained by sorting the data in descending order and using a similar formula, P(X \geqslant x) = \tilde{n}_x / N, where \tilde{n}_x is the descending rank of x.

https://brainder.org/2012/11/28/competition-ranking-and-empirical-distributions/
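A minimal R sketch of the same computation (made-up numbers, assuming no ties): the ascending rank divided by N gives the empirical cdf, and the descending rank divided by N gives the upper-tail p-value.

# Empirical cdf and upper-tail p-value via ranks (assumes no ties)
x <- c(51.9, 56.9, 45.2, 52.3, 59.5, 41.4)   # illustrative sample
N <- length(x)
Fx <- rank(x)/N      # ascending rank / N  ->  P(X <= x)
px <- rank(-x)/N     # descending rank / N ->  P(X >= x)
cbind(x, Fx, px)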

Sunday, July 23, 2017

Spline regression

In regression modeling, when we include a continuous predictor variable in our model, either as the main exposure of interest or as a confounder, we are assuming that the relationship between the predictor and the outcome is linear. In other words, a one-unit increase in the predictor is associated with a fixed difference in the outcome, so no distinction is made between a one-unit increase near the minimum of the predictor and one near its maximum. This linearity assumption may not always hold: violating it can lead to incorrect conclusions about the exposure-outcome relationship or, when the offending variable is a confounder, to residual confounding. Spline regression is one method for testing non-linearity in predictor variables and for modeling non-linear relationships.
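A rough R sketch of the idea (the simulated data and variable names are my own; it uses a natural cubic spline from the splines package and compares it to the purely linear fit):

library(splines)

# Simulated data with a non-linear exposure-outcome relationship
set.seed(42)
age <- runif(200, 20, 80)
y   <- 0.02 * (age - 50)^2 + rnorm(200, sd = 3)

fit_lin    <- lm(y ~ age)              # assumes a linear effect of age
fit_spline <- lm(y ~ ns(age, df = 4))  # natural cubic spline, 4 df

# The linear model is nested in the spline model, so an F-test
# of the two fits serves as a test of non-linearity
anova(fit_lin, fit_spline)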

Thursday, July 13, 2017

THE DARTH VADER RULE

https://www.sav.sk/journals/uploads/1030150905-M-O-W.pdf

Wednesday, July 12, 2017

Adaptive procedures for nonparametric tests


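# Two independent samples (n1 = n2 = 15) used to illustrate the adaptive procedure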
x<-c(51.9,56.9,45.2,52.3,59.5,41.4,46.4,45.1,53.9,42.9,41.5,55.2,32.9,54.0,45.0)
y<-c(59.2,49.1,54.4,47.0,55.9,34.9,62.2,41.6,59.3,32.7,72.1,43.8,56.8,76.7,60.3)

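# drive4: for samples x and y, compute four rank-score statistics (s1-s4) on the
# combined ranks, their null variances (var1-var4) from the scores evaluated at
# i/(n+1) with the constant n1*n2/(n*(n-1)) (the "#p550" formula), and the
# standardized ratios ztests = tests/sqrt(vars)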
drive4=function(x,y){
   n1=length(x)
   n2=length(y)
   n=n1+n2
   cb=(1:n)/(n+1)

   const=(n1*n2)/(n*(n-1))#p550

   p1=phi1(cb)
   var1=const*sum(p1^2)

   p2=phi2(cb)
   var2=const*sum(p2^2)

   p3=phi3(cb)
   var3=const*sum(p3^2)

   p4=phi4(cb)
   var4=const*sum(p4^2)

   vars=c(var1,var2,var3,var4)
   allxy=c(x,y)

   rall=rank(allxy)/(n+1)
   ind=c(rep(0,n1),rep(1,n2))

   s1=sum(ind*phi1(rall))
   s2=sum(ind*phi2(rall))
   s3=sum(ind*phi3(rall))
   s4=sum(ind*phi4(rall))

   tests=c(s1,s2,s3,s4)
   ztests=tests/sqrt(vars)

list(vars=vars,tests=tests,ztests=ztests)
}

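# phi1: linear (Wilcoxon-type) scores, phi1(u) = 2u - 1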
phi1=function(u){
phi1=2*u-1
phi1
}

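# phi2: sign (median-type) scores, phi2(u) = sign(2u - 1)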
phi2=function(u){
phi2=sign(2*u-1)
phi2
}

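# phi3: scores that are zero on the middle half of (0,1) and linear in the
#       tails; typically chosen for heavy-tailed distributions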
phi3=function(u){
n=length(u)
phi3=rep(0,n)
for(i in 1:n){
   if(u[i]<=0.25){phi3[i]=4*u[i]-1}
   if(u[i]>0.75){phi3[i]=4*u[i]-3}
}
phi3
}

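# phi4: scores that are linear below u = 0.5 and constant (0.5) above it;
#       typically chosen for skewed distributions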
phi4=function(u){
n=length(u)
phi4=rep(0.5,n)
for(i in 1:n){
if(u[i]<=0.5){phi4[i]=4*u[i]-3/2}
}
phi4
}
drive4(x,y)

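# Hogg-type selector statistics for the combined sample: Q1 gauges skewness and
# Q2 gauges tail weight, using means of the extreme 5%, the lower/upper halves,
# and the middle 50% of the sorted observations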
v<-c(x,y)
sv<-sort(v)
q<-quantile(sv, c(0.05, 0.25,0.5,0.75,0.95))

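# Assign each sorted observation to its group(s) by these quantiles; positions
# that fall outside a group stay NA and are dropped below with na.rm = TRUE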
U005<-c()
M05<-c()
L005<-c()
U05<-c()
L05<-c()

for(i in seq_along(sv)){
if (sv[i]<q[1]){L005[i]=sv[i]}
if (sv[i]>q[2]&sv[i]<q[4]){M05[i]=sv[i]}
if (sv[i]>q[5]){U005[i]=sv[i]}
if (sv[i]>q[3]){U05[i]=sv[i]}
if (sv[i]<q[3]){L05[i]=sv[i]}
}

L005
M05
U005
U05
L05


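# Group means, ignoring the NA placeholders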
Um005<-mean(U005,na.rm=TRUE)
Um005

Mm05<-mean(M05,na.rm=TRUE)
Mm05

Um05<-mean(U05,na.rm=TRUE)
Um05

Lm05<-mean(L05,na.rm=TRUE)
Lm05

Lm005<-mean(L005,na.rm=TRUE)
Lm005

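# Q1: skewness selector; Q2: tail-weight selector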
Q1<-(Um005-Mm05)/(Mm05-Lm005)
Q1

Q2<-(Um005-Lm005)/(Um05-Lm05)
Q2