bcftools in detail
bcftools systematic exercises
bcftools query file.vcf.gz -f'%FS\n' > file_FS.txt
bcftools query file.vcf.gz -f '%FS\t%SOR\t%MQRankSum\t%ReadPosRankSum\t%QD\t%MQ\t%DP\n' > file_FS.SOR.MQRS.RPRS.QD.MQ.DP.txt
-f defines the output format. The string %FS\t%SOR\t… means that for each variant the FS value is printed first, then a tab (\t), then the SOR value, then another tab, and so on. The final \n tells the program to start a new line after all seven measurements.
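To make the layout produced by such a format string concrete, here is a minimal sketch that builds a toy, uncompressed VCF and prints two INFO tags per record with awk. It only imitates the output layout and is not how bcftools works internally. The directory `demo_query`, the file names, and all values are invented for illustration, and the `.` placeholder for an absent tag is chosen to mirror what bcftools query is expected to print.

```shell
# Toy VCF with invented values; the second record has no FS tag.
mkdir -p demo_query
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tFS=1.5;QD=20.1\n'
  printf 'chr1\t200\t.\tC\tT\t60\tPASS\tQD=18.0\n'
} > demo_query/toy.vcf

# For every non-header line, print FS and QD separated by a tab,
# using "." when a tag is absent from the INFO column (column 8).
awk -F'\t' '
  !/^#/ {
    fs = "."; qd = "."
    n = split($8, kv, ";")
    for (i = 1; i <= n; i++) {
      split(kv[i], pair, "=")
      if (pair[1] == "FS") fs = pair[2]
      if (pair[1] == "QD") qd = pair[2]
    }
    printf "%s\t%s\n", fs, qd
  }
' demo_query/toy.vcf > demo_query/toy_FS.QD.txt

cat demo_query/toy_FS.QD.txt
```

Each output row corresponds to one variant and each column to one tag, which is the shape that the tab-separated files above are then loaded from for plotting or filtering.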
Comparing two VCF files
>bedtools intersect -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam> [OPTIONS]
bedtools intersect -u -a first.vcf.gz -b second.vcf.gz | wc -l
>vcf-compare first.vcf.gz second.vcf.gz
# First, compress each VCF with bgzip, then index the compressed file with tabix (repeat both steps for second.vcf)
>bgzip first.vcf
>tabix -p vcf first.vcf.gz
>bcftools isec first.vcf.gz second.vcf.gz -p folder
├── 0000.vcf # records private to first.vcf.gz
├── 0001.vcf # records private to second.vcf.gz
├── 0002.vcf # records from first.vcf.gz shared by both
└── 0003.vcf # records from second.vcf.gz shared by both
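The Venn-diagram numbers can be read off these files by counting their non-header lines. The sketch below assumes the output is uncompressed VCF in the directory passed to -p; `count_records` and `summarize_isec` are hypothetical helper names, not part of bcftools.

```shell
# count_records (hypothetical helper): number of non-header lines in a VCF.
count_records() {
  grep -vc '^#' "$1"
}

# summarize_isec (hypothetical helper): print "file<TAB>count" for each
# of the files 0000.vcf to 0003.vcf found in the given directory.
summarize_isec() {
  for f in "$1"/000[0-3].vcf; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    printf '%s\t%s\n' "$(basename "$f")" "$(count_records "$f")"
  done
}
```

Running `summarize_isec folder` after the isec command above would list one count per file: the first two rows are the records private to each input and the last two are the shared records as represented in each input.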
grep -F -f file1 file2 # simplest way to obtain overlapping rows (plain-text files only)
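One caveat with the grep approach: -F treats every line of file1 as a fixed substring, so a short row can also match a longer, different row. Adding -x restricts the match to whole lines. The sketch below shows the difference on two toy files; the directory `demo_grep` and the file contents are invented for illustration.

```shell
mkdir -p demo_grep
# file1 holds one row; file2 holds that row plus a row that merely contains it.
printf 'chr1\t10\tA\tG\n' > demo_grep/file1
printf 'chr1\t10\tA\tG\nchr1\t10\tA\tGT\n' > demo_grep/file2

# Substring matching: both rows of file2 contain the row from file1.
substring_hits=$(grep -c -F -f demo_grep/file1 demo_grep/file2)

# Whole-line matching (-x): only the identical row is counted.
whole_line_hits=$(grep -c -F -x -f demo_grep/file1 demo_grep/file2)

echo "substring matches: $substring_hits, whole-line matches: $whole_line_hits"
```

For real VCF data the rows rarely match character for character (INFO and sample columns differ between callers), so comparing on a key such as CHROM, POS, REF and ALT is usually more meaningful than comparing whole rows.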
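Summary:
BEDTools can be used to compare VCF files, but only by comparing genomic coordinates. This gives a quick answer to how many variant sites overlap between the two files, and it can be used to compute the Jaccard index, which indicates the overall degree of site overlap between the two files.
vcf-compare provides statistics beyond those of BEDTools, including the number of duplicate sites and Venn-diagram numbers that show how many variants are unique to each VCF file.
bcftools isec also provides Venn-diagram numbers and additionally creates VCF files based on these intersections.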
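As a rough illustration of the Jaccard index mentioned in the summary, the sketch below computes it at the level of sites: the number of shared sites divided by the number of distinct sites across both files. This is an assumption-laden simplification, since bedtools has its own jaccard subcommand that works on the base-pair overlap of intervals and need not give the same number. The site lists are invented; in practice they could come from something like bcftools query -f '%CHROM:%POS\n'.

```shell
mkdir -p demo_jaccard
# Toy site lists (CHROM:POS), invented for illustration, one site per line.
printf 'chr1:100\nchr1:200\nchr1:300\n' | sort -u > demo_jaccard/first.sites
printf 'chr1:200\nchr1:300\nchr1:400\nchr1:500\n' | sort -u > demo_jaccard/second.sites

# Shared sites: lines present in both sorted lists.
shared=$(comm -12 demo_jaccard/first.sites demo_jaccard/second.sites | wc -l)
# All distinct sites across the two lists.
total=$(sort -u demo_jaccard/first.sites demo_jaccard/second.sites | wc -l)

# Site-level Jaccard index = shared sites / distinct sites.
awk -v s="$shared" -v t="$total" 'BEGIN { printf "%.3f\n", s / t }' \
  > demo_jaccard/jaccard.txt
cat demo_jaccard/jaccard.txt
```

With the toy lists above, two of the five distinct sites are shared, so the script reports 0.400. A value near 1 means the two call sets cover almost the same sites, and a value near 0 means they barely overlap.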
bedtools
Comparing two VCF files?
For now, I personally still lean toward bcftools!