Saturday, September 30, 2017

Statistical Errors in the Medical Literature

http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Wednesday, September 6, 2017

When X'X is invertible

X'X is always positive semidefinite, because for any vector a, a'X'Xa = (Xa)'(Xa) = ||Xa||^2 >= 0. Moreover, Xa = 0 for some nonzero a if and only if the columns of X are linearly dependent, so if X has full column rank then ||Xa||^2 > 0 for every nonzero a, and X'X is positive definite.
Every positive definite matrix A is invertible, because if Ax = 0 for some x =/= 0 then x'Ax = x'0 = 0, contradicting positive definiteness.
Therefore, if X has full column rank then X'X is invertible.
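A quick numerical check of this fact (Python/NumPy here for illustration):

```python
import numpy as np

# A tall matrix with full column rank: its Gram matrix X'X should be
# positive definite, hence invertible.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
G = X.T @ X

eigvals = np.linalg.eigvalsh(G)
print(np.all(eigvals > 0))          # positive definite

# Duplicating a column makes the columns linearly dependent,
# so the Gram matrix of the widened matrix is singular.
Xdep = np.hstack([X, X[:, :1]])
Gdep = Xdep.T @ Xdep
print(np.linalg.matrix_rank(Gdep))  # rank 3, not 4 -> singular
```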

Monday, September 4, 2017

A SAS macro to run a restricted cubic spline Cox model

https://www.hsph.harvard.edu/donna-spiegelman/software/lgtphcurv9/

http://www.sciencedirect.com/science/article/pii/S0169260797000436

http://epidemiologymatters.org/epidemiology-we-like/methods/spline-regression/


Sunday, September 3, 2017

Bayesian regression compared to other regressions

https://stats.stackexchange.com/questions/252577/bayes-regression-how-is-it-done-in-comparison-to-standard-regression

Thursday, August 31, 2017

Review some Monte Carlo theorems

1. Inverse transform sampling
Suppose the random variable U has a uniform (0,1) distribution,
and let F be a continuous distribution function. Then the random variable X = F^(-1)(U)
has distribution function F.

Note: for a uniform random variable on the interval [a,b], F(x) = (x-a)/(b-a); on [0,1] this reduces to F(u) = u.
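The theorem in action (Python for illustration), using the Exponential(rate) distribution, whose inverse cdf is -log(1-u)/rate:

```python
import numpy as np

# Inverse-transform sampling: if U ~ Uniform(0,1) and F is continuous,
# then X = F^{-1}(U) has distribution function F.
rng = np.random.default_rng(1)
rate = 2.0
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / rate     # inverse cdf of Exponential(rate)

# The sample mean should be close to the theoretical mean 1/rate = 0.5.
print(x.mean())
```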

2. Monte Carlo integration

Generate X1, X2, ..., Xn from uniform(a,b), then compute Yi = (b - a) g(Xi). The sample mean of the Yi is a consistent estimate of the integral of g over [a,b].

Note: a definite integral is a number, so it can be estimated by a sample mean.
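A minimal sketch of this estimator (Python for illustration; the integrand x^2 on [0,2] is my own example):

```python
import numpy as np

# Monte Carlo integration of g over [a, b]: draw Xi ~ Uniform(a, b),
# set Yi = (b - a) * g(Xi); the mean of the Yi estimates the integral.
rng = np.random.default_rng(2)
a, b = 0.0, 2.0
g = lambda x: x ** 2

xs = rng.uniform(a, b, size=200_000)
estimate = np.mean((b - a) * g(xs))
print(estimate)  # true value: 8/3 ≈ 2.667
```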

3. Accept-Reject Generation Algorithm
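A minimal sketch of the standard accept-reject algorithm (Python for illustration; the Beta(2,2) target and uniform proposal are my own example, not from the original notes):

```python
import numpy as np

# Accept-reject: to sample from a target density f, draw Y from a proposal
# density g with f(y) <= M * g(y) for all y, then accept Y with probability
# f(Y) / (M * g(Y)).  Target here: Beta(2,2), f(x) = 6x(1-x) on [0,1].
# Proposal: Uniform(0,1), so g(y) = 1; M = 1.5 works since max f = 1.5.
rng = np.random.default_rng(3)
M = 1.5
f = lambda x: 6.0 * x * (1.0 - x)

samples = []
while len(samples) < 50_000:
    y = rng.uniform()
    if rng.uniform() <= f(y) / M:   # g(y) = 1 on [0, 1]
        samples.append(y)

samples = np.asarray(samples)
print(samples.mean())  # Beta(2,2) has mean 0.5
```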



Saturday, August 26, 2017

Different likelihoods

Maximum Likelihood
Find the β and θ that maximize L(β, θ|data).
Partial Likelihood
If we can write the likelihood function as:
L(β, θ|data) = L1(β|data) L2(θ|data)
Then we simply maximize L1(β|data).
Profile Likelihood
If we can express θ as a function of β, say θ = g(β) (typically the conditional maximizer g(β) = argmax_θ L(β, θ|data)), then we replace θ with that function and maximize:
L(β, g(β)|data)
Marginal Likelihood
We integrate out θ from the likelihood equation by exploiting the fact that we can identify the probability distribution of θ conditional on β.
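A minimal profile-likelihood sketch (Python for illustration; the Normal example and all names are my own, not from the original notes). For a Normal(μ, σ²) sample, the inner parameter σ² has the closed-form maximizer σ̂²(μ) = mean((x − μ)²) for each fixed μ; plugging it back in leaves a likelihood in μ alone:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=2.0, size=500)
n = len(x)

def profile_loglik(mu):
    # sigma^2 profiled out: replaced by its conditional maximizer
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

# The profile log-likelihood of mu is maximized at the sample mean.
grid = np.linspace(2.0, 4.0, 2001)
mu_hat = grid[np.argmax([profile_loglik(m) for m in grid])]
print(mu_hat, x.mean())  # should agree up to the grid spacing
```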

Sunday, August 13, 2017

Transform or link?

https://ecommons.cornell.edu/bitstream/handle/1813/31620/BU-1049-MA.pdf?sequence=1

Sunday, August 6, 2017

ranking and empirical distributions

In the absence of repeated values (ties), the cdf can be obtained computationally by sorting the observed data in ascending order, i.e., X_{s} = \{x_{(1)}, x_{(2)}, \ldots , x_{(N)}\}. Then F(x) = n_x/N, where n_x represents the ascending rank of x. Likewise, the p-value can be obtained by sorting the data in descending order and using a similar formula, P(X \geqslant x) = \tilde{n}_x/N, where \tilde{n}_x represents the descending rank of x.

https://brainder.org/2012/11/28/competition-ranking-and-empirical-distributions/
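The rank-based formulas above can be sketched in a few lines (Python/NumPy here for illustration; variable names are my own):

```python
import numpy as np

# With no ties: F(x) = (ascending rank of x) / N, and the right-tail
# probability P(X >= x) = (descending rank of x) / N.
x = np.array([3.1, 1.4, 2.7, 5.9, 4.2])
N = len(x)

asc_rank = x.argsort().argsort() + 1        # 1 = smallest value
desc_rank = (-x).argsort().argsort() + 1    # 1 = largest value

F = asc_rank / N    # empirical cdf at each observation
p = desc_rank / N   # empirical P(X >= x) at each observation

print(F)  # [0.6 0.2 0.4 1.  0.8]
print(p)  # [0.6 1.  0.8 0.2 0.4]
```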

Sunday, July 23, 2017

Spline regression

In regression modeling when we include a continuous predictor variable in our model, either as the main exposure of interest or as a confounder, we are making the assumption that the relationship between the predictor variable and the outcome is linear. In other words, a one unit increase in the predictor variable is associated with a fixed difference in the outcome. Thus, we make no distinction between a one unit increase in the predictor variable near the minimum value and a one unit increase in the predictor variable near the maximum value. This assumption of linearity may not always be true, and may lead to an incorrect conclusion about the relationship between the exposure and outcome, or in the case of a confounder that violates the linearity assumption, may lead to residual confounding. Spline regression is one method for testing non-linearity in the predictor variables and for modeling non-linear functions.
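The relaxation of linearity described above can be sketched with a minimal linear-spline (truncated power) basis. This is a simplified Python illustration of the idea, not the restricted cubic splines mentioned elsewhere in these notes; the data and knot are my own example:

```python
import numpy as np

# Linear spline via the truncated power basis: columns are 1, x, and
# (x - k)_+ for each knot k.  OLS on this basis fits a piecewise-linear
# curve whose slope is allowed to change at the knots.
x = np.linspace(0, 10, 200)
knots = [5.0]

basis = np.column_stack(
    [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
)

# A kinked target, clearly not linear in x; a single knot at the kink
# lets the spline represent it exactly: |x-5| = 5 - x + 2*(x-5)_+.
y = np.abs(x - 5.0)
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(np.max(np.abs(basis @ coef - y)))  # essentially zero
```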

Thursday, July 13, 2017

THE DARTH VADER RULE

https://www.sav.sk/journals/uploads/1030150905-M-O-W.pdf
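The rule in the linked paper states that for a nonnegative random variable, E[X] equals the integral of the survival function S(x) = 1 − F(x) over [0, ∞). A quick numerical check (Python for illustration; the exponential example is my own):

```python
import numpy as np

# Darth Vader rule check with Exponential(rate): S(x) = exp(-rate*x),
# whose integral over [0, inf) is 1/rate, the known mean.
rate = 2.0
x = np.linspace(0.0, 50.0, 200_001)
survival = np.exp(-rate * x)

# Trapezoidal rule for the integral of the survival function
mean_via_survival = np.sum((survival[1:] + survival[:-1]) / 2 * np.diff(x))
print(mean_via_survival)  # ≈ 1/rate = 0.5
```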

Wednesday, July 12, 2017

Adaptive procedures for nonparametric tests


x<-c(51.9,56.9,45.2,52.3,59.5,41.4,46.4,45.1,53.9,42.9,41.5,55.2,32.9,54.0,45.0)
y<-c(59.2,49.1,54.4,47.0,55.9,34.9,62.2,41.6,59.3,32.7,72.1,43.8,56.8,76.7,60.3)

drive4=function(x,y){
   # Compute four two-sample rank test statistics, one per score
   # function phi1-phi4, along with their variances and z-statistics.
   n1=length(x)
   n2=length(y)
   n=n1+n2
   cb=(1:n)/(n+1)

   const=(n1*n2)/(n*(n-1))#p550

   p1=phi1(cb)
   var1=const*sum(p1^2)

   p2=phi2(cb)
   var2=const*sum(p2^2)

   p3=phi3(cb)
   var3=const*sum(p3^2)

   p4=phi4(cb)   # was phi3(cb), which just duplicated var3
   var4=const*sum(p4^2)

   vars=c(var1,var2,var3,var4)
   allxy=c(x,y)

   rall=rank(allxy)/(n+1)
   ind=c(rep(0,n1),rep(1,n2))  # indicator for the y sample

   s1=sum(ind*phi1(rall))
   s2=sum(ind*phi2(rall))
   s3=sum(ind*phi3(rall))
   s4=sum(ind*phi4(rall))

   tests=c(s1,s2,s3,s4)
   ztests=tests/sqrt(vars)

list(vars=vars,tests=tests,ztests=ztests)
}

# Wilcoxon scores
phi1=function(u){
phi1=2*u-1
phi1
}

# sign (median) scores
phi2=function(u){
phi2=sign(2*u-1)
phi2
}

# scores that are zero in the middle and linear in the tails
phi3=function(u){
n=length(u)
phi3=rep(0,n)
for(i in 1:n){
   if(u[i]<=0.25){phi3[i]=4*u[i]-1}
   if(u[i]>0.75){phi3[i]=4*u[i]-3}
}
phi3
}

# scores that are linear on the lower half and constant above
phi4=function(u){
n=length(u)
phi4=rep(0.5,n)
for(i in 1:n){
if(u[i]<=0.5){phi4[i]=4*u[i]-3/2}
}
phi4
}
drive4(x,y)

v<-c(x,y)
sv<-sort(v)
# 5th, 25th, 50th, 75th, and 95th percentiles of the combined sample
q<-quantile(sv, c(0.05, 0.25,0.5,0.75,0.95))

U005<-c()  # upper 5%
M05<-c()   # middle 50%
L005<-c()  # lower 5%
U05<-c()   # upper half
L05<-c()   # lower half

for(i in seq_along(sv)){
if (sv[i]<q[1]){L005[i]=sv[i]}
if (sv[i]>q[2]&sv[i]<q[4]){M05[i]=sv[i]}
if (sv[i]>q[5]){U005[i]=sv[i]}  # was q[4]; the upper 5% lies above the 95th percentile
if (sv[i]>q[3]){U05[i]=sv[i]}
if (sv[i]<q[3]){L05[i]=sv[i]}
}

L005
M05
U005
U05
L05


Um005<-mean(U005,na.rm=TRUE)
Um005

Mm05<-mean(M05,na.rm=TRUE)
Mm05

Um05<-mean(U05,na.rm=TRUE)
Um05

Lm05<-mean(L05,na.rm=TRUE)
Lm05

Lm005<-mean(L005,na.rm=TRUE)
Lm005

# Selector statistics for the adaptive scheme:
# Q1 measures skewness, Q2 measures tail weight
Q1<-(Um005-Mm05)/(Mm05-Lm005)
Q1

Q2<-(Um005-Lm005)/(Um05-Lm05)
Q2