Saturday, September 30, 2017

Statistical Errors in the Medical Literature

http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Wednesday, September 6, 2017

When X'X is invertible

X'X is always positive semidefinite, because for any vector a, a'X'Xa = (Xa)'(Xa) = ||Xa||^2 >= 0. Moreover, Xa = 0 for some nonzero a if and only if the columns of X are linearly dependent, so if X has full column rank then ||Xa||^2 > 0 for every nonzero a, and X'X is positive definite.
Every positive definite matrix A is invertible, because if Ax = 0 for some x =/= 0 then x'Ax = x'0 = 0, contradicting positive definiteness.
Therefore, if X has full column rank then X'X is invertible.
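A quick numerical check of this fact (Python/NumPy here for illustration):

```python
import numpy as np

# A tall matrix with full column rank: its Gram matrix X'X should be
# positive definite, hence invertible.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
G = X.T @ X

eigvals = np.linalg.eigvalsh(G)
print(np.all(eigvals > 0))          # positive definite

# Duplicating a column makes the columns linearly dependent,
# so the Gram matrix of the widened matrix is singular.
Xdep = np.hstack([X, X[:, :1]])
Gdep = Xdep.T @ Xdep
print(np.linalg.matrix_rank(Gdep))  # rank 3, not 4 -> singular
```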

Monday, September 4, 2017

A SAS macro to run a restricted cubic spline Cox model

https://www.hsph.harvard.edu/donna-spiegelman/software/lgtphcurv9/

http://www.sciencedirect.com/science/article/pii/S0169260797000436

http://epidemiologymatters.org/epidemiology-we-like/methods/spline-regression/


Sunday, September 3, 2017

Bayesian regression compared to other regressions

https://stats.stackexchange.com/questions/252577/bayes-regression-how-is-it-done-in-comparison-to-standard-regression

Thursday, August 31, 2017

Review some Monte Carlo theorems

1. Inverse transform sampling
Suppose the random variable U has a uniform (0,1) distribution,
and let F be a continuous distribution function. Then the random variable X = F^(-1)(U)
has distribution function F.

Note: for a uniform random variable on the interval [a,b], F(x) = (x-a)/(b-a); on [0,1] this reduces to F(u) = u.
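The theorem in action (Python for illustration), using the Exponential(rate) distribution, whose inverse cdf is -log(1-u)/rate:

```python
import numpy as np

# Inverse-transform sampling: if U ~ Uniform(0,1) and F is continuous,
# then X = F^{-1}(U) has distribution function F.
rng = np.random.default_rng(1)
rate = 2.0
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / rate     # inverse cdf of Exponential(rate)

# The sample mean should be close to the theoretical mean 1/rate = 0.5.
print(x.mean())
```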

2. Monte Carlo integration

Generate X1, X2, ..., Xn from uniform(a,b), then compute Yi = (b - a) g(Xi). The sample mean of the Yi is a consistent estimate of the integral of g over [a,b].

Note: a definite integral is a number, so it can be estimated by a sample mean.
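A minimal sketch of this estimator (Python for illustration; the integrand x^2 on [0,2] is my own example):

```python
import numpy as np

# Monte Carlo integration of g over [a, b]: draw Xi ~ Uniform(a, b),
# set Yi = (b - a) * g(Xi); the mean of the Yi estimates the integral.
rng = np.random.default_rng(2)
a, b = 0.0, 2.0
g = lambda x: x ** 2

xs = rng.uniform(a, b, size=200_000)
estimate = np.mean((b - a) * g(xs))
print(estimate)  # true value: 8/3 ≈ 2.667
```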

3. Accept-Reject Generation Algorithm
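A minimal sketch of the standard accept-reject algorithm (Python for illustration; the Beta(2,2) target and uniform proposal are my own example, not from the original notes):

```python
import numpy as np

# Accept-reject: to sample from a target density f, draw Y from a proposal
# density g with f(y) <= M * g(y) for all y, then accept Y with probability
# f(Y) / (M * g(Y)).  Target here: Beta(2,2), f(x) = 6x(1-x) on [0,1].
# Proposal: Uniform(0,1), so g(y) = 1; M = 1.5 works since max f = 1.5.
rng = np.random.default_rng(3)
M = 1.5
f = lambda x: 6.0 * x * (1.0 - x)

samples = []
while len(samples) < 50_000:
    y = rng.uniform()
    if rng.uniform() <= f(y) / M:   # g(y) = 1 on [0, 1]
        samples.append(y)

samples = np.asarray(samples)
print(samples.mean())  # Beta(2,2) has mean 0.5
```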



Saturday, August 26, 2017

Different likelihoods

Maximum Likelihood
Find the β and θ that maximize L(β, θ|data).
Partial Likelihood
If we can write the likelihood function as:
L(β, θ|data) = L1(β|data) L2(θ|data)
Then we simply maximize L1(β|data).
Profile Likelihood
If we can express θ as a function of β, say θ = g(β) (typically the conditional maximizer g(β) = argmax_θ L(β, θ|data)), then we replace θ with that function and maximize:
L(β, g(β)|data)
Marginal Likelihood
We integrate out θ from the likelihood equation by exploiting the fact that we can identify the probability distribution of θ conditional on β.
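A minimal profile-likelihood sketch (Python for illustration; the Normal example and all names are my own, not from the original notes). For a Normal(μ, σ²) sample, the inner parameter σ² has the closed-form maximizer σ̂²(μ) = mean((x − μ)²) for each fixed μ; plugging it back in leaves a likelihood in μ alone:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=2.0, size=500)
n = len(x)

def profile_loglik(mu):
    # sigma^2 profiled out: replaced by its conditional maximizer
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

# The profile log-likelihood of mu is maximized at the sample mean.
grid = np.linspace(2.0, 4.0, 2001)
mu_hat = grid[np.argmax([profile_loglik(m) for m in grid])]
print(mu_hat, x.mean())  # should agree up to the grid spacing
```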

Sunday, August 13, 2017

Transform or link?

https://ecommons.cornell.edu/bitstream/handle/1813/31620/BU-1049-MA.pdf?sequence=1

Sunday, August 6, 2017

ranking and empirical distributions

In the absence of repeated values (ties), the cdf can be obtained computationally by sorting the observed data in ascending order, i.e., X_{s} = \{x_{(1)}, x_{(2)}, \ldots , x_{(N)}\}. Then F(x) = n_x/N, where n_x represents the ascending rank of x. Likewise, the p-value can be obtained by sorting the data in descending order and using a similar formula, P(X \geqslant x) = \tilde{n}_x/N, where \tilde{n}_x represents the descending rank of x.

https://brainder.org/2012/11/28/competition-ranking-and-empirical-distributions/
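The rank-based formulas above can be sketched in a few lines (Python/NumPy here for illustration; variable names are my own):

```python
import numpy as np

# With no ties: F(x) = (ascending rank of x) / N, and the right-tail
# probability P(X >= x) = (descending rank of x) / N.
x = np.array([3.1, 1.4, 2.7, 5.9, 4.2])
N = len(x)

asc_rank = x.argsort().argsort() + 1        # 1 = smallest value
desc_rank = (-x).argsort().argsort() + 1    # 1 = largest value

F = asc_rank / N    # empirical cdf at each observation
p = desc_rank / N   # empirical P(X >= x) at each observation

print(F)  # [0.6 0.2 0.4 1.  0.8]
print(p)  # [0.6 1.  0.8 0.2 0.4]
```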

Sunday, July 23, 2017

Spline regression

In regression modeling when we include a continuous predictor variable in our model, either as the main exposure of interest or as a confounder, we are making the assumption that the relationship between the predictor variable and the outcome is linear. In other words, a one unit increase in the predictor variable is associated with a fixed difference in the outcome. Thus, we make no distinction between a one unit increase in the predictor variable near the minimum value and a one unit increase in the predictor variable near the maximum value. This assumption of linearity may not always be true, and may lead to an incorrect conclusion about the relationship between the exposure and outcome, or in the case of a confounder that violates the linearity assumption, may lead to residual confounding. Spline regression is one method for testing non-linearity in the predictor variables and for modeling non-linear functions.
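The relaxation of linearity described above can be sketched with a minimal linear-spline (truncated power) basis. This is a simplified Python illustration of the idea, not the restricted cubic splines mentioned elsewhere in these notes; the data and knot are my own example:

```python
import numpy as np

# Linear spline via the truncated power basis: columns are 1, x, and
# (x - k)_+ for each knot k.  OLS on this basis fits a piecewise-linear
# curve whose slope is allowed to change at the knots.
x = np.linspace(0, 10, 200)
knots = [5.0]

basis = np.column_stack(
    [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
)

# A kinked target, clearly not linear in x; a single knot at the kink
# lets the spline represent it exactly: |x-5| = 5 - x + 2*(x-5)_+.
y = np.abs(x - 5.0)
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
print(np.max(np.abs(basis @ coef - y)))  # essentially zero
```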

Thursday, July 13, 2017

THE DARTH VADER RULE

https://www.sav.sk/journals/uploads/1030150905-M-O-W.pdf
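The rule in the linked paper states that for a nonnegative random variable, E[X] equals the integral of the survival function S(x) = 1 − F(x) over [0, ∞). A quick numerical check (Python for illustration; the exponential example is my own):

```python
import numpy as np

# Darth Vader rule check with Exponential(rate): S(x) = exp(-rate*x),
# whose integral over [0, inf) is 1/rate, the known mean.
rate = 2.0
x = np.linspace(0.0, 50.0, 200_001)
survival = np.exp(-rate * x)

# Trapezoidal rule for the integral of the survival function
mean_via_survival = np.sum((survival[1:] + survival[:-1]) / 2 * np.diff(x))
print(mean_via_survival)  # ≈ 1/rate = 0.5
```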

Wednesday, July 12, 2017

Adaptive procedures for nonparametric tests


x<-c(51.9,56.9,45.2,52.3,59.5,41.4,46.4,45.1,53.9,42.9,41.5,55.2,32.9,54.0,45.0)
y<-c(59.2,49.1,54.4,47.0,55.9,34.9,62.2,41.6,59.3,32.7,72.1,43.8,56.8,76.7,60.3)

drive4=function(x,y){
   # Compute four two-sample rank test statistics, one per score
   # function phi1-phi4, along with their variances and z-statistics.
   n1=length(x)
   n2=length(y)
   n=n1+n2
   cb=(1:n)/(n+1)

   const=(n1*n2)/(n*(n-1))#p550

   p1=phi1(cb)
   var1=const*sum(p1^2)

   p2=phi2(cb)
   var2=const*sum(p2^2)

   p3=phi3(cb)
   var3=const*sum(p3^2)

   p4=phi4(cb)   # was phi3(cb), which just duplicated var3
   var4=const*sum(p4^2)

   vars=c(var1,var2,var3,var4)
   allxy=c(x,y)

   rall=rank(allxy)/(n+1)
   ind=c(rep(0,n1),rep(1,n2))  # indicator for the y sample

   s1=sum(ind*phi1(rall))
   s2=sum(ind*phi2(rall))
   s3=sum(ind*phi3(rall))
   s4=sum(ind*phi4(rall))

   tests=c(s1,s2,s3,s4)
   ztests=tests/sqrt(vars)

list(vars=vars,tests=tests,ztests=ztests)
}

# Wilcoxon scores
phi1=function(u){
phi1=2*u-1
phi1
}

# sign (median) scores
phi2=function(u){
phi2=sign(2*u-1)
phi2
}

# scores that are zero in the middle and linear in the tails
phi3=function(u){
n=length(u)
phi3=rep(0,n)
for(i in 1:n){
   if(u[i]<=0.25){phi3[i]=4*u[i]-1}
   if(u[i]>0.75){phi3[i]=4*u[i]-3}
}
phi3
}

# scores that are linear on the lower half and constant above
phi4=function(u){
n=length(u)
phi4=rep(0.5,n)
for(i in 1:n){
if(u[i]<=0.5){phi4[i]=4*u[i]-3/2}
}
phi4
}
drive4(x,y)

v<-c(x,y)
sv<-sort(v)
# 5th, 25th, 50th, 75th, and 95th percentiles of the combined sample
q<-quantile(sv, c(0.05, 0.25,0.5,0.75,0.95))

U005<-c()  # upper 5%
M05<-c()   # middle 50%
L005<-c()  # lower 5%
U05<-c()   # upper half
L05<-c()   # lower half

for(i in seq_along(sv)){
if (sv[i]<q[1]){L005[i]=sv[i]}
if (sv[i]>q[2]&sv[i]<q[4]){M05[i]=sv[i]}
if (sv[i]>q[5]){U005[i]=sv[i]}  # was q[4]; the upper 5% lies above the 95th percentile
if (sv[i]>q[3]){U05[i]=sv[i]}
if (sv[i]<q[3]){L05[i]=sv[i]}
}

L005
M05
U005
U05
L05


Um005<-mean(U005,na.rm=TRUE)
Um005

Mm05<-mean(M05,na.rm=TRUE)
Mm05

Um05<-mean(U05,na.rm=TRUE)
Um05

Lm05<-mean(L05,na.rm=TRUE)
Lm05

Lm005<-mean(L005,na.rm=TRUE)
Lm005

# Selector statistics for the adaptive scheme:
# Q1 measures skewness, Q2 measures tail weight
Q1<-(Um005-Mm05)/(Mm05-Lm005)
Q1

Q2<-(Um005-Lm005)/(Um05-Lm05)
Q2