Every time I think I know what's going on, suddenly there's another layer of complications.
Tuesday, December 12, 2017
Saturday, November 11, 2017
Sunday, October 22, 2017
Saturday, October 7, 2017
Saturday, September 30, 2017
Statistical Errors in the Medical Literature
http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html
Wednesday, September 20, 2017
Wednesday, September 6, 2017
When X'X is invertible
X'X is always positive semidefinite, because for any vector a, a'X'Xa = (Xa)'(Xa) = ||Xa||^2 >= 0. Moreover, Xa = 0 (and hence ||Xa||^2 = 0) for some nonzero a if and only if the columns of X are linearly dependent. So if X has full column rank, then ||Xa||^2 > 0 for every nonzero a, i.e. X'X is positive definite.
Every positive definite matrix is invertible: if A were singular, there would be some x =/= 0 with Ax = 0, and then x'Ax = dot(x, 0) = 0, which contradicts positive definiteness.
Therefore, if X has full column rank then X'X is invertible.
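A quick numerical check of the argument, as a sketch only. The two design matrices below are made up for illustration: one has full column rank, the other has a third column that is the sum of the first two.

```r
set.seed(1)
# Hypothetical design matrix: 6 rows, 3 linearly independent columns
X_full <- cbind(1, rnorm(6), rnorm(6))
# Rank-deficient variant: third column is the sum of the first two
X_def <- cbind(X_full[, 1:2], X_full[, 1] + X_full[, 2])

rank_full <- qr(X_full)$rank   # number of independent columns
rank_def  <- qr(X_def)$rank

# Eigenvalues of X'X (symmetric, so they are real)
eig_full <- eigen(crossprod(X_full), symmetric = TRUE)$values
eig_def  <- eigen(crossprod(X_def),  symmetric = TRUE)$values

# Full column rank: every eigenvalue is strictly positive, so X'X inverts
XtX_inv <- solve(crossprod(X_full))

# Dependent columns: the smallest eigenvalue is zero up to rounding error
ratio_def <- abs(min(eig_def)) / max(eig_def)
ratio_def
```

With the rank-deficient matrix, X'X is only positive semidefinite, which is why regression software drops or flags collinear columns.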
Monday, September 4, 2017
A SAS macro to run restricted cubic spline Cox model
https://www.hsph.harvard.edu/donna-spiegelman/software/lgtphcurv9/
http://www.sciencedirect.com/science/article/pii/S0169260797000436
http://epidemiologymatters.org/epidemiology-we-like/methods/spline-regression/
Sunday, September 3, 2017
Bayesian regression compared to other regressions
https://stats.stackexchange.com/questions/252577/bayes-regression-how-is-it-done-in-comparison-to-standard-regression
Thursday, August 31, 2017
Review some Monte Carlo theorems
1. Inverse transform
Suppose the random variable U has a uniform(0,1) distribution. Let F be a continuous distribution function. Then the random variable X = F^(-1)(U) has distribution function F.
Note: a uniform random variable on the interval [a,b] has F(x) = (x-a)/(b-a) for a <= x <= b. For U on (0,1) this gives F(u) = u.
2. Monte Carlo Integration
Generate X1, X2, ..., Xn from uniform(a,b), then compute Yi = (b - a)g(Xi). The sample mean of the Yi is a consistent estimate of the integral of g(x) from a to b.
Note: a definite integral is a number.
3. Accept-Reject Generation Algorithm
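The three items above can be sketched in R. The targets are illustrative choices, not taken from these notes: an exponential distribution for the inverse transform, g(x) = x^2 on [0, 3] for the integration, and a Beta(2, 2) density for accept-reject. The notes only name the accept-reject algorithm, so the step shown here is the usual textbook form with a uniform proposal and is an assumption about what was meant.

```r
set.seed(42)

# 1. Inverse transform: for Exponential(rate), F(x) = 1 - exp(-rate * x),
#    so F^(-1)(u) = -log(1 - u) / rate
rate <- 2
u <- runif(10000)
x_exp <- -log(1 - u) / rate
mean(x_exp)                  # should be close to 1 / rate

# 2. Monte Carlo integration of g over [a, b]
g <- function(x) x^2
a <- 0
b <- 3
xs <- runif(100000, a, b)
mc_est <- mean((b - a) * g(xs))
mc_est                       # should be close to the exact integral, 9

# 3. Accept-reject: target density f(x) = 6 x (1 - x) on (0, 1),
#    proposal uniform(0, 1), envelope constant M = max of f = 1.5
f <- function(x) 6 * x * (1 - x)
M <- 1.5
y  <- runif(10000)           # proposals
u2 <- runif(10000)           # uniforms for the accept step
x_beta <- y[u2 <= f(y) / M]  # keep the proposals that pass the test
acc_rate <- length(x_beta) / length(y)
mean(x_beta)                 # should be close to 0.5
acc_rate                     # should be close to 1 / M
```

The Monte Carlo estimate is a single number, in line with the note that a definite integral is a number.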
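Wednesday, August 30, 2017
Difference between indefinite and definite integrals
An indefinite integral is a function (an antiderivative), while a definite integral is a number. This distinction is useful when calculating a Bayesian estimator such as a posterior mean, which is a ratio of definite integrals over the parameter space.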
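As a small illustration, here is a posterior mean computed as a ratio of two definite integrals. The data (7 successes in 20 trials) and the uniform prior are hypothetical, chosen because the answer has a known closed form to compare against.

```r
# Hypothetical data: 7 successes in 20 Bernoulli trials, uniform(0, 1) prior
successes <- 7
trials <- 20

# Unnormalized posterior: likelihood times the (constant) prior density
unnorm_post <- function(theta) {
  theta^successes * (1 - theta)^(trials - successes)
}

# Each call to integrate() evaluates a definite integral, i.e. one number
numerator   <- integrate(function(t) t * unnorm_post(t), 0, 1)$value
denominator <- integrate(unnorm_post, 0, 1)$value

post_mean   <- numerator / denominator
closed_form <- (successes + 1) / (trials + 2)  # mean of the Beta(8, 14) posterior
c(numerical = post_mean, closed_form = closed_form)
```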
Tuesday, August 29, 2017
Saturday, August 26, 2017
Different likelihoods
Maximum Likelihood
Find the β and θ that maximize L(β, θ|data).
Partial Likelihood
If we can write the likelihood function as:
L(β, θ|data) = L1(β|data) L2(θ|data)
Then we simply maximize L1(β|data).
Profile Likelihood
If we can express θ as a function of β, we replace θ with that function. Typically g(β) is the value of θ that maximizes the likelihood for fixed β.
Say θ = g(β). Then we maximize:
L(β, g(β)|data)
Marginal Likelihood
We integrate θ out of the likelihood function, which is possible when we can identify the probability distribution of θ conditional on β.
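A minimal sketch of the profile likelihood idea, assuming a normal model where β is the mean and θ is the variance. The data are simulated, and the function names are made up for this example.

```r
set.seed(7)
x <- rnorm(50, mean = 3, sd = 2)   # simulated data, for illustration only

# Full log-likelihood in (beta, theta) = (mean, variance)
loglik <- function(mu, sigma2) {
  sum(dnorm(x, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# For a fixed mean, the variance that maximizes the likelihood: theta = g(beta)
g <- function(mu) mean((x - mu)^2)

# Profile log-likelihood: theta is replaced by g(beta)
profile_loglik <- function(mu) loglik(mu, g(mu))

# Maximize over the parameter of interest only
fit <- optimize(profile_loglik, interval = range(x), maximum = TRUE, tol = 1e-8)
mu_hat <- fit$maximum
c(profile = mu_hat, sample_mean = mean(x))   # the two should agree
```

In this model the profile maximizer coincides with the ordinary maximum likelihood estimate of the mean, which gives an easy check on the code.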
Tuesday, August 22, 2017
Sunday, August 13, 2017
Transform or link?
https://ecommons.cornell.edu/bitstream/handle/1813/31620/BU-1049-MA.pdf?sequence=1
Saturday, August 12, 2017
Sunday, August 6, 2017
ranking and empirical distributions
In the absence of repeated values (ties), the cdf can be obtained computationally by sorting the observed data in ascending order, i.e., x_(1) < x_(2) < ... < x_(N). Then F(x) = r(x)/N, where r(x) represents the ascending rank of x. Likewise, the p-value can be obtained by sorting the data in descending order and using a similar formula, P(X >= x) = d(x)/N, where d(x) represents the descending rank of x.
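The rank-based formulas can be checked against direct counting in R. The sample below is simulated and has no ties; the variable names are only for this example.

```r
set.seed(3)
x <- rnorm(10)                 # simulated sample without ties
N <- length(x)

cdf_by_rank <- rank(x) / N     # ascending rank divided by N
cdf_by_ecdf <- ecdf(x)(x)      # base R empirical cdf at the same points

pval_by_rank  <- rank(-x) / N  # descending rank divided by N
pval_by_count <- sapply(x, function(v) mean(x >= v))  # share of values >= v

cbind(x, cdf_by_rank, pval_by_rank)
```

With ties the two approaches can differ, depending on how the ranking method treats equal values.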
https://brainder.org/2012/11/28/competition-ranking-and-empirical-distributions/
Sunday, July 23, 2017
Spline regression
In regression modeling, when we include a continuous predictor variable in our model as a single untransformed term, either as the main exposure of interest or as a confounder, we are assuming that the relationship between the predictor variable and the outcome is linear. In other words, a one unit increase in the predictor variable is associated with a fixed difference in the outcome. Thus, we make no distinction between a one unit increase near the minimum value of the predictor and a one unit increase near its maximum value. This assumption of linearity may not always hold. When it fails, it may lead to an incorrect conclusion about the relationship between the exposure and the outcome or, in the case of a confounder, to residual confounding. Spline regression is one method for testing non-linearity in the predictor variables and for modeling non-linear functions.
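A minimal sketch of such a test in R, using simulated data with a curved relationship. The variable names, the choice of a natural cubic spline, and df = 4 are assumptions made for this example, not a recommendation.

```r
library(splines)               # ships with base R

set.seed(11)
exposure <- runif(200, 0, 10)  # simulated predictor
outcome  <- (exposure - 5)^2 / 5 + rnorm(200, sd = 0.5)  # curved truth

# Linear term only versus a natural cubic spline with 4 degrees of freedom
fit_linear <- lm(outcome ~ exposure)
fit_spline <- lm(outcome ~ ns(exposure, df = 4))

# The linear model is nested in the spline model, so an F test compares them
cmp <- anova(fit_linear, fit_spline)
p_nonlinear <- cmp[["Pr(>F)"]][2]
p_nonlinear                    # a small value is evidence against linearity
```

A small p-value here suggests the straight line misses real curvature. A large one would not prove linearity, only fail to detect a departure from it.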
Thursday, July 13, 2017
Wednesday, July 12, 2017
Adaptive procedures for nonparametric tests
x<-c(51.9,56.9,45.2,52.3,59.5,41.4,46.4,45.1,53.9,42.9,41.5,55.2,32.9,54.0,45.0)
y<-c(59.2,49.1,54.4,47.0,55.9,34.9,62.2,41.6,59.3,32.7,72.1,43.8,56.8,76.7,60.3)
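drive4=function(x,y){
# Two-sample rank-score tests for the four score functions phi1..phi4.
# Returns the null variances, the test statistics and the standardized z values.
n1=length(x)
n2=length(y)
n=n1+n2
cb=(1:n)/(n+1)
const=(n1*n2)/(n*(n-1))#p550
# null variance of each statistic: const * sum of squared scores
p1=phi1(cb)
var1=const*sum(p1^2)
p2=phi2(cb)
var2=const*sum(p2^2)
p3=phi3(cb)
var3=const*sum(p3^2)
p4=phi4(cb)
var4=const*sum(p4^2)
vars=c(var1,var2,var3,var4)
# ranks of the combined sample, scaled to (0,1)
allxy=c(x,y)
rall=rank(allxy)/(n+1)
# indicator: 0 for the x sample, 1 for the y sample
ind=c(rep(0,n1),rep(1,n2))
# each statistic sums the scores of the y observations
s1=sum(ind*phi1(rall))
s2=sum(ind*phi2(rall))
s3=sum(ind*phi3(rall))
s4=sum(ind*phi4(rall))
tests=c(s1,s2,s3,s4)
ztests=tests/sqrt(vars)
list(vars=vars,tests=tests,ztests=ztests)
}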
# linear score
phi1=function(u){
2*u-1
}
# sign score: -1 below 0.5, +1 above
phi2=function(u){
sign(2*u-1)
}
# zero in the middle half, linear in each outer quarter
phi3=function(u){
ifelse(u<=0.25,4*u-1,ifelse(u>0.75,4*u-3,0))
}
# linear up to 0.5, constant 0.5 above
phi4=function(u){
ifelse(u<=0.5,4*u-3/2,0.5)
}
drive4(x,y)
# Selector statistics Q1 (skewness) and Q2 (tail weight) from the combined sample
v<-c(x,y)
sv<-sort(v)
q<-quantile(sv, c(0.05,0.25,0.5,0.75,0.95))
L005<-sv[sv<q[1]]          # lower 5%
M05<-sv[sv>q[2]&sv<q[4]]   # middle 50%
U005<-sv[sv>q[5]]          # upper 5%
U05<-sv[sv>q[3]]           # upper 50%
L05<-sv[sv<q[3]]           # lower 50%
L005
M05
U005
U05
L05
# mean of each group
Um005<-mean(U005)
Um005
Mm05<-mean(M05)
Mm05
Um05<-mean(U05)
Um05
Lm05<-mean(L05)
Lm05
Lm005<-mean(L005)
Lm005
Q1<-(Um005-Mm05)/(Mm05-Lm005)
Q1
Q2<-(Um005-Lm005)/(Um05-Lm05)
Q2
Friday, June 30, 2017