[Updated irregularly] The complete bedtools guide: an introduction to 42 subcommands

Reference links:

Usage guides:

Official documentation: https://bedtools.readthedocs.io/en/latest/content/overview.html
Using bedtools: http://blog.genesino.com/2018/04/bedtools/
Complete guide to bedtools usage: http://www.itdecent.cn/p/6c3b87301491
2019: learning bedtools from the official site with Liu Xiaoze: http://www.itdecent.cn/p/2efcb6f8f55d
Detailed bedtools tutorial:

Installation:
wget https://github.com/arq5x/bedtools2/archive/v2.25.0.tar.gz
tar zxvf v2.25.0.tar.gz
cd bedtools2-2.25.0/
make
cd bin/
export PATH=$PWD:$PATH
Specific features:

Using bedtools to split bins under various conditions
dplyr-pandas-bedtools: handling grouping variables





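bedtools was developed to compare large sets of genomic features quickly and flexibly. Genomic features are usually represented in Browser Extensible Data (BED) or General Feature Format (GFF) files and compared visually in the UCSC Genome Browser.

For example, bedtools can intersect, merge (union), count, and complement genomic intervals, and convert between widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. Each individual tool is designed for a simple task; complex analyses are carried out by combining several bedtools operations. The tools also let you control how the results are reported. The first bedtools release supported only 6-column BED files; support has since been added for BAM alignment files, GFF features, and VCF files. The tools are fast: even large datasets can often be processed within seconds.

BEDTools mainly uses the first three columns of the BED format:

  • chrom: the chromosome name

  • start: the start position of the genome feature, counted from 0

  • end: the end position of the genome feature, at least 1

Genome files for commonly used species are usually found in the genomes/ directory of the BEDTools installation.

BEDPE is a format defined by bedtools itself to concisely describe disjoint genome features, such as structural variants and paired-end sequencing alignments.

Notes:

  • start1 and start2 are 0-based (the first base of a chromosome is numbered 0) and the end coordinate is not included, so start=9, end=20 spans the 10th through the 20th base

  • chrom1 or chrom2 uses . for unknown; start1, end1, start2 and end2 use -1 for unknown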
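To make the 0-based convention concrete, here is a minimal Python sketch that converts a BED interval to 1-based, inclusive positions. The helper names `bed_to_one_based` and `bed_length` are illustrative only; they are not part of bedtools.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based, half-open BED interval to 1-based, inclusive positions."""
    return start + 1, end


def bed_length(start, end):
    """Number of bases covered by a BED interval."""
    return end - start


first, last = bed_to_one_based(9, 20)
print(first, last, bed_length(9, 20))  # 10 20 11
```

Because the end is exclusive, the length of an interval is simply end minus start, with no "+1" correction.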



bedtools subcommand list, 43 entries (in alphabetical order)

Utility Description
annotate Annotate coverage of features from multiple files.
bamtobed Convert BAM alignments to BED (& other) formats.
bamtofastq Convert BAM records to FASTQ records.
bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals.
bedpetobam Convert BEDPE intervals to BAM records.
bedtobam Convert intervals to BAM records.
closest Find the closest, potentially non-overlapping interval.
cluster Cluster (but don’t merge) overlapping/nearby intervals.
complement Extract intervals not represented by an interval file.
coverage Compute the coverage over defined intervals.
expand Replicate lines based on lists of values in columns.
fisher Calculate Fisher statistic b/w two feature files.
flank Create new intervals from the flanks of existing intervals.
genomecov Compute the coverage over an entire genome.
getfasta Use intervals to extract sequences from a FASTA file.
groupby Group by common cols. & summarize oth. cols. (~ SQL “groupBy”)
igv Create an IGV snapshot batch script.
intersect Find overlapping intervals in various ways.
jaccard Calculate the Jaccard statistic b/w two sets of intervals.
links Create a HTML page of links to UCSC locations.
makewindows Make interval “windows” across a genome.
map Apply a function to a column for each overlapping interval.
maskfasta Use intervals to mask sequences from a FASTA file.
merge Combine overlapping/nearby intervals into a single interval.
multicov Counts coverage from multiple BAMs at specific intervals.
multiinter Identifies common intervals among multiple interval files.
nuc Profile the nucleotide content of intervals in a FASTA file.
overlap Computes the amount of overlap from two intervals.
pairtobed Find pairs that overlap intervals in various ways.
pairtopair Find pairs that overlap other pairs in various ways.
random Generate random intervals in a genome.
reldist Calculate the distribution of relative distances b/w two files.
sample Sample random records from file using reservoir sampling.
shift Adjust the position of intervals.
shuffle Randomly redistribute intervals in a genome.
slop Adjust the size of intervals.
sort Order the intervals in a file.
spacing Report the gap lengths between intervals in a file.
split Split a file into multiple files with equal records or base pairs.
subtract Remove intervals based on overlaps b/w two files.
tag Tag BAM alignments based on overlaps with interval files.
unionbedg Combines coverage intervals from multiple BEDGRAPH files.
window Find overlapping intervals within a window around an interval.




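bedtools subcommand list, 42 entries (grouped by function)

  1. Region annotation, e.g. peak annotation, peak distribution analysis, intersecting peaks with regulatory elements.

  2. Region merging, e.g. computing the union of peaks across samples, or merging overlapping regions.

  3. Region complement, e.g. obtaining non-genic regions.

  4. Assessing sequencing breadth and depth from alignment results.

  5. Computing peak similarity across samples, to assess how similar samples are in ChIP-like region results.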

bedtools: flexible tools for genome arithmetic and DNA sequence analysis.

usage:    bedtools <subcommand> [options]

The bedtools sub-commands include:

[ Genome arithmetic ]

 intersect     Find overlapping intervals in various ways.

 Computes intersections between regions; useful for annotating peaks, finding the
 genomic regions that reads map to, and checking how peaks overlap between samples.

 window        Find overlapping intervals within a window around an interval.
 closest       Find the closest, potentially non-overlapping interval.

 Finds the nearest, possibly non-overlapping region.

 coverage      Compute the coverage over defined intervals.

 Computes coverage over regions.

 map           Apply a function to a column for each overlapping interval.
 genomecov     Compute the coverage over an entire genome.
 merge         Combine overlapping/nearby intervals into a single interval.

 Merges overlapping or adjacent regions.

 cluster       Cluster (but don't merge) overlapping/nearby intervals.
 complement    Extract intervals _not_ represented by an interval file.

 Obtains the complementary regions.

 subtract      Remove intervals based on overlaps b/w two files.

 Computes the difference between two sets of regions.

 slop          Adjust the size of intervals.

 Adjusts region size, e.g. to get the region 3 kb upstream and downstream of transcription start sites.

 flank         Create new intervals from the flanks of existing intervals.

 sort          Order the intervals in a file.

 Sorting; some commands require a sorted BED file.

 random        Generate random intervals in a genome.

 Generates random regions, for use as a background set.

 shuffle       Randomly redistribute intervals in a genome.

 Generates random regions based on a given BED file, for use as a background set.

 sample        Sample random records from file using reservoir sampling.
 spacing       Report the gap lengths between intervals in a file.
 annotate      Annotate coverage of features from multiple files.

[ Multi-way file comparisons ]

 multiinter    Identifies common intervals among multiple interval files.
 unionbedg     Combines coverage intervals from multiple BEDGRAPH files.

[ Paired-end manipulation ]

 pairtobed     Find pairs that overlap intervals in various ways.
 pairtopair    Find pairs that overlap other pairs in various ways.

[ Format conversion ]

 bamtobed      Convert BAM alignments to BED (& other) formats.
 bedtobam      Convert intervals to BAM records.
 bamtofastq    Convert BAM records to FASTQ records.
 bedpetobam    Convert BEDPE intervals to BAM records.
 bed12tobed6   Breaks BED12 intervals into discrete BED6 intervals.

[ Fasta manipulation ]

 getfasta      Use intervals to extract sequences from a FASTA file.

 Extracts the FASTA sequence at given positions.

 maskfasta     Use intervals to mask sequences from a FASTA file.
 nuc           Profile the nucleotide content of intervals in a FASTA file.

[ BAM focused tools ]

 multicov      Counts coverage from multiple BAMs at specific intervals.
 tag           Tag BAM alignments based on overlaps with interval files.

[ Statistical relationships ]

 jaccard       Calculate the Jaccard statistic b/w two sets of intervals.

 Computes the similarity between datasets.

 reldist       Calculate the distribution of relative distances b/w two files.
 fisher        Calculate Fisher statistic b/w two feature files.

[ Miscellaneous tools ]

 overlap       Computes the amount of overlap from two intervals.
 igv           Create an IGV snapshot batch script.

 Generates a script for capturing IGV screenshots in batch.

 links         Create a HTML page of links to UCSC locations.

 makewindows   Make interval "windows" across a genome.

 Splits given regions into small intervals (bins) of a specified size and step.

 groupby       Group by common cols. & summarize oth. cols. (~ SQL "groupBy")

 Grouped summaries; not limited to BED files.

 expand        Replicate lines based on lists of values in columns.
 split         Split a file into multiple files with equal records or base pairs
  • stdin and - are used differently from xargs





Part1:Genome arithmetic

1.intersect

intersect computes the overlap between genomic coordinates in two or more BED/BAM/VCF/GFF files; different options yield different kinds of results.

It can be used to annotate peaks, to find which genomic regions reads map to, and to check how peaks overlap between samples.

  • Usage:
    bedtools intersect [OPTIONS] -a <FILE> \
                             -b <FILE1, FILE2, ..., FILEN>
  • Compute the regions shared by two BED files
$ cat A.bed
chr1  10  20
chr1  30  40

$ cat B.bed
chr1  15   20

$ bedtools intersect -a A.bed -b B.bed
chr1  15   20
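What the default intersect reports can be sketched in a few lines of Python. This is only an illustration of the interval arithmetic under the 0-based, half-open BED convention; the function name `intersect` and the nested-loop approach are assumptions made for clarity, not how bedtools is implemented.

```python
def intersect(a, b):
    """Report the overlapping portion of each A interval with each B interval.

    Intervals are (chrom, start, end) tuples in 0-based, half-open coordinates.
    Simple O(len(a) * len(b)) sketch.
    """
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # at least one shared base
                out.append((chrom_a, start, end))
    return out


A = [("chr1", 10, 20), ("chr1", 30, 40)]
B = [("chr1", 15, 20)]
print(intersect(A, B))  # [('chr1', 15, 20)]
```

Two intervals that merely touch (the end of one equals the start of the other) share no base and produce no output.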

2.window

Similar to bedtools intersect, window searches for overlapping features in A and B. However, window first adds a specified number of base pairs (1000 by default) upstream and downstream of each feature in A.

  • Usage:
bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
  • Equivalent to extending each A region by a given distance on both sides and then comparing it with B.
    $ cat A.bed
    chr1  100  200

    $ cat B.bed
    chr1  500  1000
    chr1  1300 2000

    $ bedtools window -a A.bed -b B.bed
    chr1  100  200  chr1  500  1000
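The "extend A, then compare with B" idea can be sketched in Python as follows. The function name `window` and the parameter `w` are illustrative; the sketch assumes a symmetric window and clips the extended start at 0.

```python
def window(a, b, w=1000):
    """Pair each A interval with every B interval lying within w bp of it."""
    hits = []
    for chrom_a, start_a, end_a in a:
        lo, hi = max(0, start_a - w), end_a + w  # A extended on both sides
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b and start_b < hi and end_b > lo:
                hits.append(((chrom_a, start_a, end_a), (chrom_b, start_b, end_b)))
    return hits


A = [("chr1", 100, 200)]
B = [("chr1", 500, 1000), ("chr1", 1300, 2000)]
print(window(A, B))  # [(('chr1', 100, 200), ('chr1', 500, 1000))]
```

With the default 1000 bp window the A interval is searched as 0 to 1200, so the B feature starting at 1300 is not reported.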

3.closest

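Like intersect, closest searches for overlapping features in A and B. If no feature in B overlaps the current feature in A, closest reports the nearest one (that is, the feature with the smallest genomic distance from the start or end of A). For example, one might want to find the gene closest to a significant GWAS polymorphism. Note that closest reports an overlapping feature as the closest one; it is not restricted to the nearest non-overlapping feature. The official documentation includes a "cheat sheet" figure summarizing what the various options of closest do.

Find the closest, possibly non-overlapping region: put simply, for each interval in A.bed, find the nearest interval in B.bed, for example the enhancer nearest to a gene.

  • Usage:
    bedtools closest [OPTIONS] -a <FILE> \
     -b <FILE1, FILE2, ..., FILEN>
  • For example:
    $ cat a.bed
    chr1  10  20  a1  1 -

    $ cat b.bed
    chr1  7   8   b1  1 -
    chr1  15  25  b2  2 +

    $ bedtools closest -a a.bed -b b.bed
    chr1  10  20  a1  1 - chr1  15  25  b2  2 +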

4.coverage

Compute region coverage: for each region in one BED file (A), summarize its overlap with the features in another BED file (B).

Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BED FILE A  ***************     ***************     ******    **************

BED File B  ^^^^ ^^^^              ^^             ^^^^^^^^^    ^^^ ^^ ^^^^
              ^^^^^^^^                                      ^^^^^ ^^^^^ ^^

Result      [  N=3, 10/15 ]     [  N=1, 2/15 ]     [N=1,6/6]   [N=6, 12/14 ]
  • Usage:
   bedtools coverage [OPTIONS] -a <FILE> \
     -b <FILE1, FILE2, ..., FILEN>
  • Example: for every element of A, summarize its overlap with file B. The appended columns are the number of overlapping B features, the number of covered bases in A, the length of A, and the covered fraction.
    $ cat A.bed
    chr1  0   100
    chr1  100 200
    chr2  0   100

    $ cat B.bed
    chr1  10  20
    chr1  20  30
    chr1  30  40
    chr1  100 200

    $ bedtools coverage -a A.bed -b B.bed
    chr1  0   100  3  30  100 0.3000000
    chr1  100 200  1  100 100 1.0000000
    chr2  0   100  0  0   100 0.0000000
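The four appended columns can be reproduced with a short Python sketch. The function name `coverage` is illustrative, and the sketch assumes A intervals of non-zero length; it is meant to show how the numbers arise, not to mirror the bedtools implementation.

```python
def coverage(a, b):
    """For each A interval: (#overlapping B features, covered bases, length, fraction)."""
    rows = []
    for chrom, start, end in a:
        # Clip every overlapping B feature to the A interval.
        clipped = sorted(
            (max(start, s), min(end, e))
            for c, s, e in b
            if c == chrom and s < end and e > start
        )
        covered, cursor = 0, start
        for s, e in clipped:
            s = max(s, cursor)  # skip bases that were already counted
            if e > s:
                covered += e - s
                cursor = e
        length = end - start
        rows.append((chrom, start, end, len(clipped), covered, length, covered / length))
    return rows


A = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
B = [("chr1", 10, 20), ("chr1", 20, 30), ("chr1", 30, 40), ("chr1", 100, 200)]
for row in coverage(A, B):
    print(row)
```

Bases covered by several B features are counted once, which is why the covered-base column can never exceed the interval length.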

5.map

Apply a function to a column for each overlapping interval.

  • Usage:
bedtools map [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
  • Example: the B features that overlap each A interval form a group, and a function such as the maximum or the mean is applied to that group. With no options, as below, the sum of column 5 is reported.
    $ cat a.bed
    chr1        10      20      a1      1       +
    chr1        50      60      a2      2       -
    chr1        80      90      a3      3       -

    $ cat b.bed
    chr1        12      14      b1      2       +
    chr1        13      15      b2      5       -
    chr1        16      18      b3      5       +
    chr1        82      85      b4      2       -
    chr1        85      87      b5      3       +

    $ bedtools map -a a.bed -b b.bed
    chr1        10      20      a1      1       +       12
    chr1        50      60      a2      2       -       .
    chr1        80      90      a3      3       -       5
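The grouping logic behind this output can be sketched in Python. The function name `map_sum` is hypothetical and the sketch hard-codes one operation (summing the score in the fifth column); it only illustrates how each A interval collects values from the B features overlapping it.

```python
def map_sum(a, b):
    """Append to each A record the sum of the B scores (5th column) that overlap it."""
    out = []
    for chrom, start, end, name, score, strand in a:
        hits = [rec[4] for rec in b
                if rec[0] == chrom and rec[1] < end and rec[2] > start]
        # "." marks an A interval with no overlapping B feature.
        out.append((chrom, start, end, name, score, strand, sum(hits) if hits else "."))
    return out


a = [("chr1", 10, 20, "a1", 1, "+"),
     ("chr1", 50, 60, "a2", 2, "-"),
     ("chr1", 80, 90, "a3", 3, "-")]
b = [("chr1", 12, 14, "b1", 2, "+"),
     ("chr1", 13, 15, "b2", 5, "-"),
     ("chr1", 16, 18, "b3", 5, "+"),
     ("chr1", 82, 85, "b4", 2, "-"),
     ("chr1", 85, 87, "b5", 3, "+")]
print([row[-1] for row in map_sum(a, b)])  # [12, '.', 5]
```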

6.genomecov

Compute coverage per chromosome and for the whole genome.

  • Usage:
    bedtools genomecov [OPTIONS] [-i|-ibam] -g (iff. -i)
  • example:
    $ cat A.bed
    chr1  10  20
    chr1  20  30
    chr2  0   500

    $ cat my.genome
    chr1  1000
    chr2  500

    $ bedtools genomecov -i A.bed -g my.genome
    chr1   0  980  1000  0.98
    chr1   1  20   1000  0.02
    chr2   1  500  500   1
    genome 0  980  1500  0.653333
    genome 1  520  1500  0.346667
    # columns: name, depth, number of bases at that depth, total bases, fraction of bases at that depth
    # coverage is reported both per chromosome and for the whole genome

7.merge

http://www.itdecent.cn/p/2efcb6f8f55d

In many datasets the genomic feature coordinates are frequently contiguous or overlapping (ChIP-seq results, for example).

Such adjacent small intervals can be joined into one larger continuous interval.

  • Usage:
bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
  • Example:
The input file to be merged (bed/gff/vcf) must be sorted; running merge on unsorted input raises an error.

Sort by chromosome first, then by start position. This lets the merge algorithm run in a single pass, using very little memory.

We use exons.bed to show what merge does.

Note that the intervals on rows 3 and 4 of exons.bed overlap, so they can be merged.

bedtools merge -i exons.bed | head -10

After merging, the original row 3 (13220 - 14409) and row 4 (14361 - 14829) have been combined into 13220 - 14829.
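Why sorted input makes merging cheap can be seen in a minimal single-pass Python sketch. The function name `merge` is illustrative. Rows 3 and 4 use the coordinates quoted above; the first two rows are made-up values added only so the example has something that is left unmerged.

```python
def merge(intervals):
    """Merge overlapping or book-ended intervals; input is sorted first."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current block
        else:
            merged.append([chrom, start, end])  # start a new block
    return [tuple(block) for block in merged]


exons = [
    ("chr1", 11873, 12227),  # illustrative row
    ("chr1", 12612, 12721),  # illustrative row
    ("chr1", 13220, 14409),  # row 3 from the text
    ("chr1", 14361, 14829),  # row 4 from the text
]
print(merge(exons))
# [('chr1', 11873, 12227), ('chr1', 12612, 12721), ('chr1', 13220, 14829)]
```

Because the input is ordered, each interval only has to be compared with the block currently being built, so nothing else needs to be held in memory.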


8.cluster

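Similar to merge, but instead of combining overlapping or nearby intervals into one, cluster keeps every interval and appends a cluster ID to it.

  • Usage:
bedtools cluster [OPTIONS] -i <BED/GFF/VCF>
  • example: by default, intervals that overlap (by at least 1 bp) or are directly adjacent are placed in the same cluster
 $ cat A.bed
 chr1  100  200
 chr1  180  250
 chr1  250  500
 chr1  501  1000

 $ bedtools cluster -i A.bed
 chr1  100     200     1
 chr1  180     250     1
 chr1  250     500     1
 chr1  501     1000    2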
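The difference from merge is easiest to see in code: the same sweep is used, but each record is kept and labelled. This Python sketch uses an illustrative function name `cluster` and assumes the default behaviour shown above (overlapping or book-ended intervals share a cluster).

```python
def cluster(intervals):
    """Append a cluster ID to every interval without merging anything."""
    out = []
    cluster_id = 0
    cur_chrom, cur_end = None, -1
    for chrom, start, end in sorted(intervals):
        if chrom != cur_chrom or start > cur_end:
            cluster_id += 1  # gap or new chromosome: open a new cluster
            cur_chrom, cur_end = chrom, end
        else:
            cur_end = max(cur_end, end)  # still inside the current cluster
        out.append((chrom, start, end, cluster_id))
    return out


A = [("chr1", 100, 200), ("chr1", 180, 250), ("chr1", 250, 500), ("chr1", 501, 1000)]
print([row[3] for row in cluster(A)])  # [1, 1, 1, 2]
```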

9.complement

http://www.itdecent.cn/p/2efcb6f8f55d

bedtools complement performs an inverse selection.
Given a file of feature coordinates, we may not care about the intervals it marks, but rather want to see which intervals are not in the file.

For example, given ChIP-seq peaks, complement tells you which regions were not bound by the antibody.

  • Usage:
bedtools complement -i <BED/GFF/VCF> -g <GENOME>
  • example:

    With a BED file of exons, an inverse selection yields the coordinates of introns and intergenic regions. Since this is an inverse selection, besides exons.bed we also need the overall range (the length of every chromosome in the genome), given with -g.

  # First define the overall range to operate on
  head -10 genome.txt

  chr1    249250621
  chr10   135534747
  chr11   135006516
  chr11_gl000202_random   40103
  chr12   133851895
  chr13   115169878
  chr14   107349540
  chr15   102531392
  chr16   90354753
  chr17   81195210

  # Inverse selection: pick the regions that are not exons
  bedtools complement -i exons.bed -g genome.txt | head -10

  chr1    0   11873
  chr1    12227   12612
  chr1    12721   13220
  chr1    14829   14969
  chr1    15038   15795
  chr1    15947   16606
  chr1    16765   16857
  chr1    17055   17232
  chr1    17368   17605
  chr1    17742   17914
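The role of the genome file becomes clear in a small Python sketch: the gaps are computed between 0 and each chromosome length. The function name `complement`, the toy chromosome length of 20000, and the three exon intervals (inferred from the first gaps in the output above) are assumptions made for illustration.

```python
def complement(intervals, genome):
    """Return the regions of each chromosome not covered by any interval.

    genome maps chromosome name -> chromosome length.
    """
    by_chrom = {}
    for chrom, start, end in sorted(intervals):
        by_chrom.setdefault(chrom, []).append((start, end))
    out = []
    for chrom, length in genome.items():
        cursor = 0  # first base not yet accounted for
        for start, end in by_chrom.get(chrom, []):
            if start > cursor:
                out.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            out.append((chrom, cursor, length))
    return out


exons = [("chr1", 11873, 12227), ("chr1", 12612, 12721), ("chr1", 13220, 14829)]
toy_genome = {"chr1": 20000}  # toy length, not the real chr1 size
print(complement(exons, toy_genome))
# [('chr1', 0, 11873), ('chr1', 12227, 12612), ('chr1', 12721, 13220), ('chr1', 14829, 20000)]
```

Without the chromosome length there would be no way to know where the last gap ends, which is why -g is required.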

10.subtract

http://www.itdecent.cn/p/cb079a393661

Remove from A the parts that are covered by B.

  • Usage:
bedtools subtract [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
  • example:
    $ cat A.bed
    chr1  10   20
    chr1  100  200

    $ cat B.bed
    chr1  0    30
    chr1  180  300

    $ bedtools subtract -a A.bed -b B.bed
    chr1  100  180
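A minimal Python sketch of the default behaviour, in which only the overlapping part of each A interval is removed. The function name `subtract` is illustrative and the nested loops favour clarity over speed.

```python
def subtract(a, b):
    """Remove from each A interval the portions covered by B intervals."""
    out = []
    for chrom, start, end in a:
        pieces = [(start, end)]  # what is left of this A interval so far
        for chrom_b, start_b, end_b in b:
            if chrom_b != chrom:
                continue
            remaining = []
            for piece_start, piece_end in pieces:
                if end_b <= piece_start or start_b >= piece_end:
                    remaining.append((piece_start, piece_end))  # no overlap
                    continue
                if piece_start < start_b:
                    remaining.append((piece_start, start_b))  # left remainder
                if end_b < piece_end:
                    remaining.append((end_b, piece_end))  # right remainder
            pieces = remaining
        out.extend((chrom, s, e) for s, e in pieces)
    return out


A = [("chr1", 10, 20), ("chr1", 100, 200)]
B = [("chr1", 0, 30), ("chr1", 180, 300)]
print(subtract(A, B))  # [('chr1', 100, 180)]
```

The first A interval lies entirely inside a B interval, so nothing of it survives; the second loses only its last 20 bases.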

11.slop

bedtools slop increases the size of each feature by a given number of bases, similar to awk '{OFS="\t"; print $1,$2-<slop>,$3+<slop>}'.
Unlike the awk one-liner, bedtools slop respects chromosome boundaries: the start is never set below 0 and the end never beyond the chromosome length.

  • Usage:
bedtools slop [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
  • example:
    $ cat A.bed
    chr1 5 100
    chr1 800 980

    $ cat my.genome
    chr1 1000

    $ bedtools slop -i A.bed -g my.genome -b 5
    chr1 0 105
    chr1 795 985

    $ bedtools slop -i A.bed -g my.genome -l 2 -r 3
    chr1 3 103
    chr1 798 983
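The clamping that distinguishes slop from a plain awk subtraction can be sketched in Python. The function name `slop` and its `left`/`right` parameters are illustrative stand-ins for the -l and -r options.

```python
def slop(intervals, genome, left, right):
    """Extend intervals by left/right bases, clamped to [0, chromosome length]."""
    return [
        (chrom, max(0, start - left), min(genome[chrom], end + right))
        for chrom, start, end in intervals
    ]


A = [("chr1", 5, 100), ("chr1", 800, 980)]
my_genome = {"chr1": 1000}
print(slop(A, my_genome, 5, 5))  # [('chr1', 0, 105), ('chr1', 795, 985)]
print(slop(A, my_genome, 2, 3))  # [('chr1', 3, 103), ('chr1', 798, 983)]
```

An interval ending at 999 extended by 5 bases would be cut off at 1000 rather than running past the end of the chromosome.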

12.flank

Create new intervals from the regions flanking each interval on the left and right. Unlike slop, the original interval itself is not part of the output.

  • Usage:
bedtools flank [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
  • example:
    $ cat A.bed
    chr1 100 200
    chr1 500 600

    $ cat my.genome
    chr1 1000

    $ bedtools flank -i A.bed -g my.genome -b 5
    chr1  95      100
    chr1  200     205
    chr1  495     500
    chr1  600     605

    $ bedtools flank -i A.bed -g my.genome -l 2 -r 3
    chr1  98      100
    chr1  200     203
    chr1  498     500
    chr1  600     603
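For contrast with slop, here is a Python sketch that emits only the flanking regions. The function name `flank` is illustrative, and the handling of intervals that touch a chromosome edge (clamping, as done here) is an assumption rather than documented bedtools behaviour.

```python
def flank(intervals, genome, left, right):
    """Emit the regions immediately left and right of each interval."""
    out = []
    for chrom, start, end in intervals:
        if left > 0:
            out.append((chrom, max(0, start - left), start))
        if right > 0:
            out.append((chrom, end, min(genome[chrom], end + right)))
    return out


A = [("chr1", 100, 200), ("chr1", 500, 600)]
my_genome = {"chr1": 1000}
print(flank(A, my_genome, 5, 5))
# [('chr1', 95, 100), ('chr1', 200, 205), ('chr1', 495, 500), ('chr1', 600, 605)]
```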

13. sort

Sorting; some commands require a sorted BED file.

  • Usage:
bedtools sort [OPTIONS] -i <BED/GFF/VCF>
  • example:
   $ cat A.bed
   chr1 800 1000
   chr1 80  180
   chr1 1   10
   chr1 750 10000

   $ sortBed -i A.bed
   chr1 1   10
   chr1 80  180
   chr1 750 10000
   chr1 800 1000

14.random

bedtools random generates a random set of intervals in BED6 format. You can specify the number (-n) and the size (-l) of the intervals to generate.

  • Usage:
bedtools random [OPTIONS] -g <GENOME>
  • example:
$ bedtools random -g hg19.genome
 chr2  87536758        87536858        1       100     -
 chrX  46051735        46051835        2       100     +
 chr18 5237041 5237141 3       100     -
 chr12 45809998        45810098        4       100     +
 chrX  42034890        42034990        5       100     -
 chr10 77510935        77511035        6       100     -
 chr3  39844278        39844378        7       100     -
 chr6  101012700       101012800       8       100     +
 chr12 38123482        38123582        9       100     +
 chr7  88508598        88508698        10      100     -

 $ bedtools random -g hg19.genome
 chr3  141987850       141987950       1       100     +
 chr5  137643331       137643431       2       100     +
 chr2  155523858       155523958       3       100     -
 chr5  147874094       147874194       4       100     +
 chr1  71838335        71838435        5       100     -
 chr8  71154323        71154423        6       100     -
 chr2  133240474       133240574       7       100     +
 chr9  131495427       131495527       8       100     +
 chrX  125952943       125953043       9       100     +
 chr3  59685545        59685645        10      100     +

15.shuffle

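bedtools shuffle randomly repositions the regions of a BED file within the genome defined by a genome file.

It generates random regions based on a given BED file, for use as a background set.

  • Usage:
bedtools shuffle [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
  • example: by default, bedtools shuffle relocates each feature in the input BED file to a random position on a random chromosome. The size and strand of each feature are preserved.
  $ cat A.bed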
  chr1  0  100  a1  1  +
  chr1  0  1000 a2  2  -

  $ cat my.genome
  chr1  10000
  chr2  8000
  chr3  5000
  chr4  2000

  $ bedtools shuffle -i A.bed -g my.genome
  chr4  1498  1598  a1  1  +
  chr3  2156  3156  a2  2  -

16.sample

Randomly sample records from a file.

Summary: Take sample of input file(s) using reservoir sampling algorithm.

Usage: bedtools sample [OPTIONS] -i <bed/gff/vcf/bam>

WARNING:  The current sample algorithm will hold all requested sample records in memory prior to output.  The user must ensure that there is adequate memory for this.

17.spacing

Report the lengths of the gaps between consecutive intervals in a file.

Summary: Report (last col.) the gap lengths between intervals in a file.

Usage: bedtools spacing [OPTIONS] -i <bed/gff/vcf/bam>

-bed  If using BAM input, write output as BED.

-header  Print the header from the A file prior to results.

-nobuf  Disable buffered output. Using this option will cause each line  of output to be printed as it is generated, rather than saved  in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with  other software tools and scripts that need to process one  line of bedtools output at a time.

-iobuf  Specify amount of memory to use for input buffer.  Takes an integer argument. Optional suffixes K/M/G supported.  Note: currently has no effect with compressed files.

Notes: (1) Input must be sorted by chrom,start (sort -k1,1 -k2,2n for BED).  (2) The 1st element for each chrom will have NULL distance. (".").  (3) Distance for overlapping intervals is -1 and 0 for adjacent intervals.

Example:


    $ cat test.bed
    chr1    0   10
    chr1    10  20
    chr1    19  30
    chr1    35  45
    chr1    100 200

    $ bedtools spacing -i test.bed
    chr1    0   10  .
    chr1    10  20  0
    chr1    19  30  -1
    chr1    35  45  5
    chr1    100 200 55
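The three rules in the notes (NULL for the first interval of a chromosome, -1 for overlaps, 0 for adjacent intervals) can be sketched in Python. The function name `spacing` is illustrative, and comparing each interval only with the one directly before it is an assumption of this sketch.

```python
def spacing(intervals):
    """Append the gap to the previous interval; input must be sorted by chrom, start."""
    out = []
    prev_chrom, prev_end = None, None
    for chrom, start, end in intervals:
        if chrom != prev_chrom:
            gap = "."  # first interval on this chromosome
        elif start < prev_end:
            gap = -1  # overlaps the previous interval
        else:
            gap = start - prev_end  # 0 when the two intervals are adjacent
        out.append((chrom, start, end, gap))
        prev_chrom, prev_end = chrom, end
    return out


test_bed = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 19, 30),
            ("chr1", 35, 45), ("chr1", 100, 200)]
print([row[3] for row in spacing(test_bed)])  # ['.', 0, -1, 5, 55]
```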

18.annotate

bedtools annotate annotates one BED/VCF/GFF file with the coverage and number of overlaps observed from multiple other BED/VCF/GFF files. In this way, a single command tells you to what degree one feature coincides with multiple other feature types.

  • Usage
bedtools annotate [OPTIONS] -i <BED/GFF/VCF> -files FILE1 FILE2 FILE3 ... FILEn
  • Count how many features in each of several other files overlap every record of the input BED file.
 $ cat variants.bed
 chr1 100  200   nasty 1  -
 chr2 500  1000  ugly  2  +
 chr3 1000 5000  big   3  -

 $ cat genes.bed
 chr1 150  200   geneA 1  +
 chr1 175  250   geneB 2  +
 chr3 0    10000 geneC 3  -

 $ cat conserve.bed
 chr1 0    10000 cons1 1  +
 chr2 700  10000 cons2 2  -
 chr3 4000 10000 cons3 3  +

 $ cat known_var.bed
 chr1 0    120   known1   -
 chr1 150  160   known2   -
 chr2 0    10000 known3   +

 $ bedtools annotate -counts -i variants.bed -files genes.bed conserve.bed known_var.bed
 chr1  100     200     nasty   1       -       2       1       2
 chr2  500     1000    ugly    2       +       0       1       1
 chr3  1000    5000    big     3       -       1       1       0





Part2:Multi-way file comparisons

19. multiinter

Identify the intervals that are common to multiple BED files.

Summary: Identifies common intervals among multiple  BED/GFF/VCF files.

Usage: bedtools multiinter [OPTIONS] -i FILE1 FILE2 .. FILEn  Requires that each interval file is sorted by chrom/start.

20.unionbedg

unionbedg combines multiple BEDGRAPH files into a single file, so that coverage (and other values such as genotypes) can be compared directly across samples.

  • Usage:
bedtools unionbedg [OPTIONS] -i FILE1 FILE2 FILE3 ... FILEn
  • example:
  cat 1.bg
  chr1 1000 1500 10
  chr1 2000 2100 20

  cat 2.bg
  chr1 900 1600 60
  chr1 1700 2050 50

  cat 3.bg
  chr1 1980 2070 80
  chr1 2090 2100 20

  cat sizes.txt
  chr1 5000

  bedtools unionbedg -i 1.bg 2.bg 3.bg
  chr1 900  1000 0  60 0
  chr1 1000 1500 10 60 0
  chr1 1500 1600 0  60 0
  chr1 1700 1980 0  50 0
  chr1 1980 2000 0  50 80
  chr1 2000 2050 20 50 80
  chr1 2050 2070 20 0  80
  chr1 2070 2090 20 0  0
  chr1 2090 2100 20 0  20




Part3 : Paired-end manipulation

21. pairtobed

Find pairs that overlap intervals in various ways.

Summary: Report overlaps between a BEDPE file and a BED/GFF/VCF file.

Usage: bedtools pairtobed [OPTIONS] -a <bedpe> -b <bed/gff/vcf>

22.pairtopair

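Find pairs that overlap other pairs in various ways.

pairToPair compares two BEDPE files in search of overlaps where each end of a BEDPE feature in A overlaps the ends of a feature in B. For example, pairToPair can screen two files for exactly the same discordant paired-end alignment. Among other things, such a shared discordant pair may indicate that the same structural variation is present in each file/sample.

  • Usage:
pairToPair [OPTIONS] -a <BEDPE> -b <BEDPE>
  • example: by default, a BEDPE feature from A is reported if both of its ends overlap a feature in the BEDPE B file. If strand information is present in both BEDPE files, the overlap on each end must additionally be on the same strand. This way, an F/R alignment that otherwise overlaps (in terms of genomic position) will not be matched with an R/R alignment.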
  Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    BEDPE A         >>>>>.................................>>>>>

    BEDPE B            <<<<<.............................>>>>>

    Result

    BEDPE A         >>>>>.................................>>>>>

    BEDPE B            >>>>>.............................>>>>>

    Result          >>>>>.................................>>>>>




Part 4 : Format conversion

23.bamtobed

bedtools bamtobed is a handy conversion utility that converts sequence alignments in BAM format into BED, BED12, and/or BEDPE records. Position information is kept while saving space.

  • Usage
bedtools bamtobed [OPTIONS] -i <BAM>
  • For example, convert BAM to BED6 format
    $ bedtools bamtobed -i reads.bam | head -3
    chr7   118970079   118970129   TUPAC_0001:3:1:0:1452#0/1   37   -
    chr7   118965072   118965122   TUPAC_0001:3:1:0:1452#0/2   37   +
    chr11  46769934    46769984    TUPAC_0001:3:1:0:1472#0/1   37   -

24.bedtobam

Convert a BED file into a BAM file.

  • Usage:
bedToBam [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> > <BAM>
  • example:
head -5 rmsk.hg18.chr21.bed
chr21 9719768  9721892  ALR/Alpha  1004  +
chr21 9721905  9725582  ALR/Alpha  1010  +
chr21 9725582  9725977  L1PA3 3288 +
chr21 9726021  9729309  ALR/Alpha  1051  +
chr21 9729320  9729809  L1PA3 3897 -

bedToBam -i rmsk.hg18.chr21.bed -g human.hg18.genome > rmsk.hg18.chr21.bam

samtools view rmsk.hg18.chr21.bam | head -5
ALR/Alpha  0   chr21 9719769  255  2124M *  0  0  *  *
ALR/Alpha  0   chr21 9721906  255  3677M *  0  0  *  *
L1PA3      0   chr21 9725583  255  395M  *  0  0  *  *
ALR/Alpha  0   chr21 9726022  255  3288M *  0  0  *  *
L1PA3      16  chr21 9729321  255  489M  *  0  0  *  *

25.bamtofastq

bedtools bamtofastq is a conversion utility for extracting FASTQ records from sequence alignments in BAM format.

  • Usage
bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
  • For example:
$ bedtools bamtofastq -i NA18152.bam -fq NA18152.fq

    $ head -8 NA18152.fq
    @NA18152-SRR007381.35051
    GGAGACATATCATATAAGTAATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCGGCAAGGTCGAA
    +
    <<<;;<;<;;<;;;;;;;;;;;;<<<:;;;;;;;;;;;;;;;;::::::;;;;<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<;;;;;:;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<;;;;;;;;;;<<<<<<<<;;;;;;;;;:;;;;;;;;;;;;;;;;;;;:;;;;8;;8888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;8966689666666299866669:899
    @NA18152-SRR007381.637219
    AATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCCGGCAAGGTCGAA
    +
    <<<<<<<<<<;;<;<;;;;<<;<888888899<;;;;;;<;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;<;<<<<<;;;;;;;;;;;;;<<<<;;;;;;;:::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<;;;;;;;;;;;;;;;;;;;;;;;<;;;;;;;;;;;;;;;;;;;;;;<888<;<<;;;;<<<<<<;;;;;<<<<<<<<;;;;;;;;;:;;;;888888899:::;;8;;;;;;;;;;;;;;;;;;;99;;99666896666966666600;96666669966

26.bedpetobam

Summary: Converts feature records to BAM format.

Usage: bedpetobam [OPTIONS] -i <bed/gff/vcf> -g <genome>

Options: -mapq  Set the mappinq quality for the BAM records.  (INT) Default: 255

-ubam  Write uncompressed BAM output. Default writes compressed BAM.

Notes: (1) BED files must be at least BED4 to create BAM (needs name field).

27.bed12tobed6

bed12ToBed6 is a convenience tool that converts BED features in BED12 format (that is, "blocked" BED features such as genes) into discrete BED6 features. For example, for a gene with six exons, bed12ToBed6 creates six separate BED6 features (one per exon).

  • Usage:
   bed12ToBed6 [OPTIONS] -i <BED12>
  • Convert a gene with several exons into one line per exon
    head data/knownGene.hg18.chr21.bed | tail -n 3
    chr21 10079666  10120808   uc002yiv.1  0  -  10081686  10120608  0     4   528,91,101,215, 0,1930,39750,40927,
    chr21 10080031  10081687   uc002yiw.1  0  -  10080031  10080031  0     2   200,91,    0,1565,
    chr21 10081660  10120796   uc002yix.2  0  -  10081660  10081660  0     3   27,101,223, 0,37756,38913,

    head data/knownGene.hg18.chr21.bed | tail -n 3 | bed12ToBed6 -i stdin
    chr21 10079666  10080194  uc002yiv.1 0  -
    chr21 10081596  10081687  uc002yiv.1 0  -
    chr21 10119416  10119517  uc002yiv.1 0  -
    chr21 10120593  10120808  uc002yiv.1 0  -
    chr21 10080031  10080231  uc002yiw.1 0  -
    chr21 10081596  10081687  uc002yiw.1 0  -
    chr21 10081660  10081687  uc002yix.2 0  -
    chr21 10119416  10119517  uc002yix.2 0  -
    chr21 10120573  10120796  uc002yix.2 0  -
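How the block columns turn into separate BED6 lines can be sketched in Python. The function name `bed12_to_bed6` is illustrative; the sketch relies on the standard BED12 layout, in which column 11 holds the block sizes and column 12 the block starts relative to the feature start.

```python
def bed12_to_bed6(line):
    """Split one BED12 record into one BED6 tuple per block (exon)."""
    f = line.split()
    chrom, start, name, score, strand = f[0], int(f[1]), f[3], f[4], f[5]
    sizes = [int(x) for x in f[10].split(",") if x]   # blockSizes
    starts = [int(x) for x in f[11].split(",") if x]  # blockStarts, relative to start
    return [
        (chrom, start + offset, start + offset + size, name, score, strand)
        for size, offset in zip(sizes, starts)
    ]


record = "chr21 10080031 10081687 uc002yiw.1 0 - 10080031 10080031 0 2 200,91, 0,1565,"
for row in bed12_to_bed6(record):
    print(*row)
# chr21 10080031 10080231 uc002yiw.1 0 -
# chr21 10081596 10081687 uc002yiw.1 0 -
```

Each output interval is the feature start plus the block offset, extended by the block size, which matches the two uc002yiw.1 lines in the output above.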




Part5 : Fasta manipulation

28.getfasta

http://www.itdecent.cn/p/6c3b87301491

Extract FASTA sequences from a genome according to coordinate intervals.

  • Usage
$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF>
  • example:

Reference: # BED/GFF/VCF + reference --> fasta

bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa  -bed ../macs14_results/highQuality_summits.bed  -fo highQuality.fa
bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa  -bed ../macs14_results/highQuality_peaks.bed  -fo highQuality.fa

The script records the coordinate intervals in BED format, gives the location of the reference genome with -fi, and names the output FASTA file with -fo.

tips: three useful options
-s  Force strandedness. If the feature occupies the antisense,
        strand, the sequence will be reverse complemented.
        - By default, strand information is ignored.
-name   Use the name field for the FASTA header
-tab    Write output in TAB delimited format.
        - Default is FASTA format.

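-s : add -s when strand matters; for an interval on the minus strand the sequence is extracted and reverse complemented.
-name : use the interval name in the fourth column of the BED file as the header line of each FASTA record.
-tab : write the name and the sequence separated by a tab instead of in FASTA format.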

29.maskfasta

The opposite of getfasta: mask the given intervals in the sequence.

  • Usage
$ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
  • example:
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10

$ bedtools maskfasta -fi test.fa -bed test.bed -fo test.fa.out

$ cat test.fa.out
>chr1
AAAAANNNNNCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG</pre>

30.nuc

Profile the nucleotide content of the BED intervals in a FASTA file.

Summary: Profiles the nucleotide content of intervals in a fasta file.

Usage: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>





Part6 : BAM focused tools

31.multicov

http://www.itdecent.cn/p/6c3b87301491

For each BED interval supplied, multicov reports a separate count of overlapping alignments from each BAM file. Tools with similar functionality: featureCounts and deepTools multiBamSummary.

  • Usage:
bedtools multicov [OPTIONS] -bams BAM1 BAM2 BAM3 ... BAMn -bed  <BED/GFF/VCF>
  • example:

Count the reads aligned to each gene in RNA-seq alignment files.

# Example:
bedtools multicov -bams aln1.bam aln2.bam aln3.bam -bed ivls-of-interest.bed
# The ivls-of-interest.bed file is required and you may have to build it yourself (a GTF file also works). For example:
chr1 0   10000   ivl1
chr1 10000   20000   ivl2
chr1 20000   30000   ivl3
chr1 30000   40000   ivl4

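In the output, the first three columns are the coordinates and the fourth is the gene name, exactly as in our BED file; the last three columns, appended by multicov, are the counts for the three samples.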

chr1 0       10000   ivl1    100 2234    0
chr1 10000   20000   ivl2    123 3245    1000
chr1 20000   30000   ivl3    213 2332    2034
chr1 30000   40000   ivl4    335 7654    0

32. tag

Tag BAM alignments based on overlaps with interval files.

Summary: Annotates a BAM file based on overlaps with multiple BED/GFF/VCF files  on the intervals in -i.

Usage: bedtools tag [OPTIONS] -i <BAM> -files FILE1 .. FILEn -labels LAB1 .. LABn




Part 7 :Statistical relationships

33.jaccard

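Measure the similarity between two datasets.

The jaccard tool computes the Jaccard similarity coefficient between two sets of intervals.

The result is a value between 0.0 and 1.0; the smaller the value, the less similar the two sets.

# Different datasets from the same kind of sample [coefficient 0.50637]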
bedtools jaccard \
    -a fHeart-DS16621.hotspot.twopass.fdr0.05.merge.bed \
    -b fHeart-DS15839.hotspot.twopass.fdr0.05.merge.bed

intersection    union-intersection  jaccard n_intersections
81269248    160493950   0.50637 130852

# Now datasets from different kinds of sample [coefficient 0.170995]
bedtools jaccard \
    -a fHeart-DS16621.hotspot.twopass.fdr0.05.merge.bed \
    -b fSkin_fibro_bicep_R-DS19745.hg19.hotspot.twopass.fdr0.05.merge.bed

intersection    union-intersection  jaccard n_intersections
28076951    164197278   0.170995    73261
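The jaccard column is the first column divided by the second, i.e. shared base pairs over base pairs covered by either set. A minimal Python sketch of that calculation follows; the function names `merge_sorted` and `jaccard` are illustrative, and merging each input first (so that no base is counted twice) is an assumption of this sketch.

```python
def merge_sorted(intervals):
    """Collapse overlapping or adjacent intervals so each base is counted once."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def jaccard(a, b):
    """Return (intersection bp, union bp, Jaccard coefficient)."""
    a, b = merge_sorted(a), merge_sorted(b)
    inter = sum(
        max(0, min(end_a, end_b) - max(start_a, start_b))
        for chrom_a, start_a, end_a in a
        for chrom_b, start_b, end_b in b
        if chrom_a == chrom_b
    )
    union = sum(e - s for _, s, e in a) + sum(e - s for _, s, e in b) - inter
    return inter, union, (inter / union if union else 0.0)


A = [("chr1", 10, 20), ("chr1", 30, 40)]
B = [("chr1", 15, 25)]
print(jaccard(A, B))  # 5 shared bases out of 25 covered bases
```

Applied to the first result above, 81269248 / 160493950 gives the reported 0.50637.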

In addition, similarity can be analyzed across many more samples

For this, scroll to the bottom of the official tutorial: http://quinlanlab.org/tutorials/bedtools/bedtools.html

![img](https://upload-images.jianshu.io/upload_images/9376801-806a9711daa5ebfe.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/878/format/webp) 

34.reldist

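Compute the relative distances between two files.

Traditional approaches to summarizing the similarity between two sets of genomic intervals are based on the number or proportion of intersecting intervals. However, such measures are largely blind to spatial correlations between the two sets when, despite consistent spacing or proximity, intersections are rare (for example, enhancers and transcription start sites rarely overlap, yet they are much closer to one another than two sets of random intervals would be). Favorov et al. [1] proposed a relative distance metric that describes the distribution of relative distances between each interval in one set and the two closest intervals in the other set. If there is no spatial correlation between the two sets, the relative distances are expected to be uniformly distributed between 0 and 0.5. If, however, the intervals tend to be much closer than expected by chance, the distribution of observed relative distances shifts towards low relative distance values.

[1] Exploring Massive, Genome Scale Datasets with the GenometriCorr Package.
Favorov A, Mularoni L, Cope LM, Medvedeva Y, Mironov AA, et al. (2012)
PLoS Comput Biol 8(5): e1002529. doi:10.1371/journal.pcbi.1002529

Usage:

bedtools reldist [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>

example:

$ bedtools reldist \
    -a data/refseq.chr1.exons.bed.gz \
    -b data/aluY.chr1.bed.gz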
0.00  164 43408 0.004
0.01  551 43408 0.013
0.02  598 43408 0.014
0.03  637 43408 0.015
0.04  793 43408 0.018
0.05  688 43408 0.016
0.06  874 43408 0.020
0.07  765 43408 0.018
0.08  685 43408 0.016
0.09  929 43408 0.021
0.10  876 43408 0.020
0.11  959 43408 0.022
0.12  860 43408 0.020
0.13  851 43408 0.020
0.14  903 43408 0.021
0.15  893 43408 0.021
0.16  883 43408 0.020
0.17  828 43408 0.019
0.18  917 43408 0.021
0.19  875 43408 0.020
0.20  897 43408 0.021
0.21  986 43408 0.023
0.22  903 43408 0.021
0.23  944 43408 0.022
0.24  904 43408 0.021
0.25  867 43408 0.020
0.26  943 43408 0.022
0.27  933 43408 0.021
0.28  1132  43408 0.026
0.29  881 43408 0.020
0.30  851 43408 0.020
0.31  963 43408 0.022
0.32  950 43408 0.022
0.33  965 43408 0.022
0.34  907 43408 0.021
0.35  884 43408 0.020
0.36  965 43408 0.022
0.37  944 43408 0.022
0.38  911 43408 0.021
0.39  939 43408 0.022
0.40  921 43408 0.021
0.41  950 43408 0.022
0.42  935 43408 0.022
0.43  919 43408 0.021
0.44  915 43408 0.021
0.45  934 43408 0.022
0.46  843 43408 0.019
0.47  850 43408 0.020
0.48  1006  43408 0.023
0.49  937 43408 0.022
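The relative distance itself is easy to compute by hand, which helps when reading the table above. The sketch below is a simplification under stated assumptions: every interval is reduced to a single coordinate (for instance its midpoint), all points are on one chromosome, and the function name `relative_distances` is hypothetical. It is not the bedtools implementation, only the metric as described in the text: the distance to the nearer flanking reference point divided by the gap between the two flanking points.

```python
from bisect import bisect_right


def relative_distances(query_points, ref_points):
    """Relative distance of each query point to its two flanking reference points.

    Queries that are not flanked on both sides by a reference point are skipped.
    Each returned value lies between 0 and 0.5.
    """
    ref = sorted(ref_points)
    distances = []
    for q in query_points:
        i = bisect_right(ref, q)      # ref[i - 1] <= q < ref[i]
        if i == 0 or i == len(ref):
            continue                  # no reference point on one side
        left, right = ref[i - 1], ref[i]
        nearest = min(q - left, right - q)
        distances.append(nearest / (right - left))
    return distances


# Reference points at 0 and 100; queries at 10, 50, 90 and 25.
print(relative_distances([10, 50, 90, 25], [0, 100]))
# -> [0.1, 0.5, 0.1, 0.25]
```

Binning these values to two decimal places and counting each bin gives a table of the same shape as the `reldist` output.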

35.fisher

Performs Fisher's exact test on the number of overlapping and unique intervals between two files.

Given a pair of input files -a and -b in the usual BedTools parlance:

$ cat a.bed
chr1  10  20
chr1  30  40
chr1  51      52

$ cat b.bed
chr1  15   25
chr1  51      52

And a genome of 500 bases:

$ echo -e "chr1\t500" > t.genome

We may wish to know **if the amount of overlap between the 2 sets of intervals is more than we would expect given their coverage and the size of the genome**. We can do this with `fisher` as:

$ bedtools fisher -a a.bed -b b.bed -g t.genome
# Number of query intervals: 3
# Number of db intervals: 2
# Number of overlaps: 2
# Number of possible intervals (estimated): 37
# phyper(2 - 1, 3, 37 - 3, 2, lower.tail=F)
# Contingency Table Of Counts
#_________________________________________
#           |  in -b       | not in -b    |
#     in -a | 2            | 1            |
# not in -a | 0            | 34           |
#_________________________________________
# p-values for fisher's exact test
left    right   two-tail    ratio
1   0.0045045   0.0045045   inf
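The one-sided p-values in that output can be reproduced from the contingency table with the hypergeometric distribution, which is what the `phyper(...)` comment line refers to. The following is a small sketch using only the standard library; `fisher_tails` is a hypothetical helper name, and it returns just the left and right tails, not the two-tail value or the odds ratio.

```python
from math import comb


def fisher_tails(a, b, c, d):
    """Left and right one-sided p-values for the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b              # intervals in -a
    col1 = a + c              # intervals in -b
    total = a + b + c + d     # estimated number of possible intervals

    def pmf(x):
        # Probability of seeing exactly x overlaps under the hypergeometric model.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    left = sum(pmf(x) for x in range(lowest, a + 1))
    right = sum(pmf(x) for x in range(a, highest + 1))
    return left, right


# Table from the example: 2 overlaps, 1 only in -a, 0 only in -b, 34 in neither.
left, right = fisher_tails(2, 1, 0, 34)
print(round(left, 7), round(right, 7))
```

With these counts the right tail is 35/7770, which rounds to the 0.0045045 reported above, and the left tail is 1.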

Part8 : Miscellaneous tools

36.overlap

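Similar in purpose to intersect, but it works on a single input in which the two features have already been combined on each line; the `-cols` option specifies which columns are compared.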

Usage:

overlap [OPTIONS] -i <input> -cols s1,e1,s2,e2

| Option | Description |
| --- | --- |
| **-i** | Input file. Use “stdin” for pipes. |
| **-cols** | Specify the columns (1-based) for the starts and ends of the features for which you’d like to compute the overlap/distance. The columns must be listed in the following order: *start1,end1,start2,end2* . |

example:

windowBed -a A.bed -b B.bed -w 10
chr1  10  20  A  chr1  15  25  B
chr1  10  20  C  chr1  25  35  D

## Compare columns 2,3 with columns 6,7
windowBed -a A.bed -b B.bed -w 10 | overlap -i stdin -cols 2,3,6,7
chr1  10  20  A  chr1  15  25  B  5
chr1  10  20  C  chr1  25  35  D  -5
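The appended column in that example follows from a single expression: the smaller end minus the larger start. A positive value is the number of overlapping bases and a negative value is the gap between the two features. The sketch below reproduces the two rows; the function name `overlap_or_distance` is illustrative.

```python
def overlap_or_distance(start1, end1, start2, end2):
    """Positive: overlapping bases. Negative: size of the gap between the features."""
    return min(end1, end2) - max(start1, start2)


print(overlap_or_distance(10, 20, 15, 25))   # -> 5
print(overlap_or_distance(10, 20, 25, 35))   # -> -5
```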

37.igv

Generates a script for capturing IGV screenshots in batch.

Summary: Creates a batch script to create IGV images at each interval defined in a BED/GFF/VCF file.

Usage: bedtools igv [OPTIONS] -i <bed/gff/vcf>

38. links

Creates an HTML page of links to the UCSC Genome Browser.

Usage:

linksBed [OPTIONS] -i <BED/GFF/VCF> > <HTML file>

| Option | Description |
| --- | --- |
| **-base** | The “basename” for the UCSC browser. *Default: [http://genome.ucsc.edu](http://genome.ucsc.edu)* |
| **-org** | The organism (e.g. mouse, human). *Default: human* |
| **-db** | The genome build. *Default: hg18* |

example: **linksBed** creates links to the public UCSC Genome Browser.

head -3 genes.bed
chr21 9928613 10012791 uc002yip.1 0 -
chr21 9928613 10012791 uc002yiq.1 0 -

linksBed -i genes.bed -base http://mirror.uni.edu -org mouse -db mm9 > genes.html

39.makewindows

http://www.itdecent.cn/p/7d47d8074bba

Splits the given regions into small intervals (bins) of a specified size and step.

  • Reference chromosome sizes file, chrom.size
chr1    100
chr2    150
  • Split each chromosome into 20 bp bins
$ bedtools makewindows -g chrom.size -w 20
chr1    0   20
chr1    20  40
chr1    40  60
chr1    60  80
chr1    80  100
chr2    0   20
chr2    20  40
chr2    40  60
chr2    60  80
chr2    80  100
chr2    100 120
chr2    120 140
chr2    140 150
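The windowing logic can be sketched in a few lines, which also shows why the last bin on chr2 is shorter than 20 bp: the end coordinate is clipped to the chromosome size. This is an illustrative re-implementation, not bedtools code; `make_windows` and its `step` parameter are names chosen here, with `step` defaulting to the window size so that windows do not overlap.

```python
def make_windows(chrom_sizes, window, step=None):
    """Tile each chromosome with windows of `window` bp, starting every `step` bp."""
    step = step or window
    windows = []
    for chrom, size in chrom_sizes:
        for start in range(0, size, step):
            # Clip the final window so it never runs past the chromosome end.
            windows.append((chrom, start, min(start + window, size)))
    return windows


for chrom, start, end in make_windows([("chr1", 100), ("chr2", 150)], 20):
    print(chrom, start, end, sep="\t")
```

Run on the chrom.size example above, this prints the same 13 windows.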

40.groupby

http://www.itdecent.cn/p/548d370b75a4

Grouped calculations; not limited to BED files.

Groups the rows by a chosen column and applies one of several functions to another column.

Usage

bedtools groupby [OPTIONS] -i <input> -g <group columns> -c <op. column> -o <operation>
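As a rough picture of what the usage line describes, the sketch below groups rows on one or more columns and summarizes another column. It is an analogy written with `itertools.groupby`, not bedtools code: the function name `group_by`, the `OPS` table and the toy rows are all made up for illustration, and column numbers are 1-based to match the `-g` and `-c` convention. Like `itertools.groupby`, it only merges consecutive rows with the same key, so the input is assumed to be sorted on the grouping columns first.

```python
from itertools import groupby

# A few summary operations, keyed by name (a subset chosen for this sketch).
OPS = {
    "sum": sum,
    "count": len,
    "min": min,
    "max": max,
    "mean": lambda values: sum(values) / len(values),
}


def group_by(rows, group_cols, op_col, op):
    """Group consecutive rows on `group_cols` and apply `op` to `op_col` (1-based columns)."""
    def key(row):
        return tuple(row[c - 1] for c in group_cols)

    result = []
    for group_key, group_rows in groupby(rows, key=key):
        values = [float(row[op_col - 1]) for row in group_rows]
        result.append(group_key + (OPS[op](values),))
    return result


rows = [
    ("chr1", "geneA", 10),
    ("chr1", "geneA", 30),
    ("chr1", "geneB", 5),
]
print(group_by(rows, [1, 2], 3, "sum"))
print(group_by(rows, [1, 2], 3, "mean"))
```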

41. expand

Splits a column of comma-separated values into multiple lines, one value per line.

Summary: Replicate lines in a file based on columns of comma-separated values.

Usage: bedtools expand -c [COLS]

Options:

-i Input file. Assumes "stdin" if omitted.

-c Specify the column (1-based) that should be summarized.

*   Examples:

    $ cat test.txt
    chr1    10  20  1,2,3   10,20,30
    chr1    40  50  4,5,6   40,50,60

    $ bedtools expand -i test.txt -c 5
    chr1    10  20  1,2,3   10
    chr1    10  20  1,2,3   20
    chr1    10  20  1,2,3   30
    chr1    40  50  4,5,6   40
    chr1    40  50  4,5,6   50
    chr1    40  50  4,5,6   60

    $ bedtools expand -i test.txt -c 4,5
    chr1    10  20  1   10
    chr1    10  20  2   20
    chr1    10  20  3   30
    chr1    40  50  4   40
    chr1    40  50  5   50
    chr1    40  50  6   60
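The difference between expanding one column and expanding two is easiest to see in code: when several columns are given, their values are paired up position by position, not combined in every possible way. The sketch below mimics that behaviour on rows held as lists of strings. `expand` is a hypothetical function name, and it assumes the selected columns hold lists of equal length (Python's `zip` would silently truncate to the shortest otherwise).

```python
def expand(rows, cols):
    """Replicate each row once per comma-separated value in the 1-based columns `cols`."""
    indexes = [c - 1 for c in cols]
    expanded = []
    for row in rows:
        value_lists = [row[i].split(",") for i in indexes]
        # Pair the i-th value of every selected column together.
        for values in zip(*value_lists):
            new_row = list(row)
            for i, value in zip(indexes, values):
                new_row[i] = value
            expanded.append(new_row)
    return expanded


rows = [
    ["chr1", "10", "20", "1,2,3", "10,20,30"],
    ["chr1", "40", "50", "4,5,6", "40,50,60"],
]
for line in expand(rows, [4, 5]):
    print("\t".join(line))
```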

42. split

Splits a file into several files that contain roughly the same number of records or base pairs.

Summary: Split a Bed file.

Usage: bedtools split [OPTIONS] -i <bed> -n number-of-files

Options:
-i|--input (file)  BED input file (req'd).
-n|--number (int)  Number of files to create (req'd).
-p|--prefix (string)  Output BED file prefix.
-a|--algorithm (string)  Algorithm used to split data.
    * size (default): uses a heuristic algorithm to group the items
      so all files contain the ~ same number of bases
    * simple : route records such that each split file has
      approximately equal records (like Unix split).
-h|--help  Print help (this screen).
-v|--version  Print version.

Note: This program stores the input BED records in memory.
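To illustrate the difference between balancing by record count and balancing by bases, here is a sketch of two possible strategies. Both are assumptions made for illustration: round-robin assignment stands in for the "simple" mode, and a greedy "longest interval into the currently smallest file" rule stands in for the size heuristic. The help text does not say which heuristic bedtools actually uses, and the function names are hypothetical.

```python
import heapq


def split_simple(records, n):
    """Round-robin: each output gets approximately the same number of records."""
    files = [[] for _ in range(n)]
    for i, record in enumerate(records):
        files[i % n].append(record)
    return files


def split_by_size(records, n):
    """Greedy balancing: place the longest intervals first, always into the smallest file."""
    files = [[] for _ in range(n)]
    heap = [(0, i) for i in range(n)]          # (bases assigned so far, file index)
    by_length = sorted(records, key=lambda r: r[2] - r[1], reverse=True)
    for record in by_length:
        total, i = heapq.heappop(heap)
        files[i].append(record)
        heapq.heappush(heap, (total + record[2] - record[1], i))
    return files


records = [("chr1", 0, 100), ("chr1", 200, 250), ("chr1", 300, 350), ("chr1", 400, 410)]
for name, files in (("simple", split_simple(records, 2)), ("size", split_by_size(records, 2))):
    print(name, [sum(end - start for _, start, end in f) for f in files])
```

On these four records the round-robin split gives files of 150 and 60 bases, while the greedy split gives 110 and 100, which is the kind of balance the size mode is described as aiming for.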


Summary:

bedtools is very powerful; I will add more detail on individual tools as I come to use them.

There are surely many shortcomings here; corrections in the comments are welcome.
