目錄
- 前言
- 方法簡(jiǎn)介
- 從vcf文件里提取有用信息
- tidy vcfR
- vcf可視化1
- vcf可視化2
- 測(cè)序深度覆蓋度
- 窗口縮放
- 如何單獨(dú)分離染色體
- 利用vcf信息判斷物種染色體倍數(shù)
- CNV分析
vcfR可以對(duì)單獨(dú)的每一段染色體進(jìn)行分析。比起全基因總體的變化,很多時(shí)候染色體單位的變化往往會(huì)更加引起我們的興趣。
數(shù)據(jù)準(zhǔn)備和導(dǎo)入
和之前文章里介紹的一樣,先導(dǎo)入基礎(chǔ)vcf數(shù)據(jù)
library(vcfR)
data(vcfR_example)
data("vcfR_test")
然后為了方便理解,自己虛擬三條染色體的參照序列和注釋信息。
library(ape)
dna.l <- vector('list', length=3)
dna.l[[1]] <- as.character(dna[1,1:length(dna)])
set.seed(123)
dna.l[[2]] <- sample( c('a','c','g','t'), size = getPOS(vcfR_test)[length(getPOS(vcfR_test))], replace = TRUE )
dna.l[[3]] <- sample( c('a','c','g','t'), size = getPOS(vcfR_test)[length(getPOS(vcfR_test))], replace = TRUE )
dna.l <- as.DNAbin(dna.l)
names(dna.l)[1] <- "Supercontig_1.50"
names(dna.l)[2] <- "Supercontig_1.5"
names(dna.l)[3] <- "Supercontig_1.10"
dna.l
## 3 DNA sequences in binary format stored in a list.
##
## Mean sequence length: 856378.3
## Shortest sequence: 100001
## Longest sequence: 1234567
##
## Labels:
## Supercontig_1.50
## Supercontig_1.5
## Supercontig_1.10
##
## Base composition:
## a c g t
## 0.25 0.25 0.25 0.25
分別虛擬三條染色體的注釋文件,然后合并成一個(gè)整體。
gff2 <- gff[1:10,]
gff2[,1] <- "Supercontig_1.5"
gff3 <- gff[11:20,]
gff3[,1] <- "Supercontig_1.10"
gff <- rbind(gff, gff2, gff3)
寫出三條染色體的vcf
write.vcf(vcf, file = "Supercontig_1.50.vcf.gz")
vcfR_test@fix[,'CHROM'] <- "Supercontig_1.5"
write.vcf(vcfR_test, file = "Supercontig_1.5.vcf.gz")
vcfR_test@fix[,'CHROM'] <- "Supercontig_1.10"
write.vcf(vcfR_test, file = "Supercontig_1.10.vcf.gz")
這樣,三條染色體樣本數(shù)據(jù)就準(zhǔn)備好了。
縮放染色體
這樣就可以自定義窗口尺寸,對(duì)染色體進(jìn)行分析了。
myFiles <- list.files(".", pattern = "Supercontig.*vcf.gz$")
myChroms <- unlist( lapply( strsplit(myFiles, "\\.vcf"), function(x){ x[1] } ) )
myWin.size <- 1e4
i <- 1
myChrom <- myChroms[i]
dna1 <- dna.l[myChrom]
gff1 <- gff[ gff[,1] == myChrom,]
vcf <- read.vcfR(myFiles[i], verbose = FALSE)
chrom <- create.chromR(name=myChrom, vcf=vcf, seq=dna1, ann=gff1, verbose = FALSE)
chrom <- proc.chromR(chrom, win.size = myWin.size, verbose=FALSE)
write.var.info(chrom, file = "pinf_ref_vars.csv", APPEND = FALSE)
write.win.info(chrom, file = "pinf_ref_wins.csv", APPEND = FALSE)
for(i in 2:length(myFiles)){
myChrom <- myChroms[i]
cat(myChrom)
cat("\n")
dna1 <- dna.l[myChrom]
gff1 <- gff[ gff[,1] == myChrom,]
vcf <- read.vcfR(myFiles[i], verbose = FALSE)
chrom <- create.chromR(name=myChrom, vcf=vcf, seq=dna1, ann=gff1, verbose = FALSE)
chrom <- proc.chromR(chrom, win.size = myWin.size, verbose=FALSE)
write.var.info(chrom, file = "pinf_ref_vars.csv", APPEND = TRUE)
write.win.info(chrom, file = "pinf_ref_wins.csv", APPEND = TRUE)
}