2018年12月29日星期六

some public genetic data

ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/

https://www.broadinstitute.org/medical-and-population-genetics/hapmap-3

ftp://ftp.illumina.com/

user name:guest
password:illumina

 http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/

2018年12月25日星期二

Manhattan Plot for GWAS



if(!require("ggplot2")) {
  print("Please install the ggplot plackage")
  ## install.packages('ggplot2', dep=T)
}
 
require(ggplot2)

d <- read.table("simulated_assoc.assoc",header=T)
d <- na.omit(d)

maxy <- 15
sig <- -log10(5e-8)

d$logp <- -log10(d$P)
d$x <- NA
ticks <- NULL
lastbase <- 0

order <- order(d$CHR,d$BP)
d <- d[order,]
d[d$logp > maxy,'logp'] <- maxy

numchrs <- length(unique(d$CHR))

for( i in unique(d$CHR) ) {
  if( i==1 ) {
    d[d$CHR==i,]$x <- d[d$CHR==i,]$BP
  } else {
    lastbase <- lastbase+tail(subset(d,CHR==i-1)$BP,1)
    d[d$CHR==i,]$x <- d[d$CHR==i,]$BP+lastbase
  }
  ticks <- c(ticks,d[d$CHR==i,]$x[floor(length(d[d$CHR==i,]$x)/2)+1])
}
ticklim <- c(min(d$x),max(d$x))

if(numchrs > 1) {
  cols <- rep(c("gray","black"),max(d$CHR))
} else {
  cols <- "gray"
}

if (maxy=="max") maxy=ceiling(max(d$logp)) else maxy=maxy
if (maxy<8) maxy=8

pdf("manhattan_plot.pdf")

plot <- qplot(x,logp,data=d,
              ylab=expression(-log[10](italic(p))) , colour=factor(CHR))
plot <- plot + scale_x_continuous(name="Chromosome",
                                  breaks=ticks,
                                  labels=(unique(d$CHR)))
plot <- plot + scale_y_continuous(limits=c(0,maxy),
                                  breaks=1:maxy)
plot <- plot + scale_colour_manual(values=cols)
plot <- plot + theme(legend.position="none")
plot <- plot + labs(title="Manhattan Plot")
plot <- plot + theme(panel.background=element_blank(),
                    panel.grid.minor=element_blank())


plot <- plot + geom_abline(intercept=sig,slope=0, col="black")


2018年12月23日星期日

some commands for Ubuntu


  1. cd /home/user/Download
  2. sudo cp filename /usr/bin
If you want to copy a folder from your download then just add -r before cp command
sudo cp -r foldername /usr/bin
you can use mv command to move that file or folder
sudo mv filename /usr/bin

 3. awk: line 2: missing } near end of file

use  awk '{print  $1,$2,$4,$5}'

2018年10月11日星期四

steps for simulation sample size

  1. Write the statistical model mathematically.
  2. Generate the sample according to the model with sample size N.
  3. Fit the model, perform the test, and record the rejection or acceptance of hull hypothesis.
  4. Repeat step 2 and 3 n (generally I used 5000) times.
  5. Calculate the power by (# of rejections)/n.
6 If power is too higher, decrease sample size N, repeat 2 - 5. If power is too lower, increase sample size N, repeat 2 - 5.

2018年9月25日星期二

number of basis fuction for spline regression

The number of basis function is n+d+1, where n is the number of (internal) knots, where is the degree of the basis function.

2018年6月13日星期三

R datasets

https://vincentarelbundock.github.io/Rdatasets/datasets.html