References:
Usage guides:
Official documentation: https://bedtools.readthedocs.io/en/latest/content/overview.html
Using Bedtools: http://blog.genesino.com/2018/04/bedtools/
bedtools usage guide: http://www.itdecent.cn/p/6c3b87301491
2019: Learning bedtools from the official docs with Liu Xiaoze: http://www.itdecent.cn/p/2efcb6f8f55d
Detailed bedtools tutorial:
Installation:
wget https://github.com/arq5x/bedtools2/archive/v2.25.0.tar.gz
tar zxvf v2.25.0.tar.gz
cd bedtools2-2.25.0/
make
cd bin/
export PATH=$PWD:$PATH
Specific use cases:
Using bedtools to create bins under various conditions
dplyr-pandas-bedtools: handling grouped variables
bedtools was developed for fast, flexible comparison of large numbers of genomic features. Genomic features are usually stored in Browser Extensible Data (BED) or General Feature Format (GFF) files and compared visually in the UCSC Genome Browser.
For example, bedtools can compute the intersection, merge (union), count, and complement of genomic intervals. It can also convert intervals across widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. Each tool is designed for one simple task, and complex analyses are built by combining several bedtools operations. The tools also let you control how results are presented. The original bedtools release supported only 6-column BED files. Support has since been added for BAM alignments, GFF features, generic BED files, and VCF files. The tools are quite fast: even large datasets can be processed within seconds.
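BEDTools mainly uses the first three columns of the BED format:
chrom: the chromosome
start: the start position of the genomic feature, 0-based
end: the end position of the genomic feature (exclusive); must be at least 1
Genome files for common species are in the genomes/ directory of the BEDTools installation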
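BEDPE is a format defined by bedtools for concisely describing discontiguous genomic features, such as structural variants and paired-end alignments.
Notes:
start1 and start2 are 0-based (the first base is numbered 0), so start=9, end=20 covers bases 10 through 20 in 1-based terms.
Use . for an unknown chrom1 or chrom2, and -1 for an unknown start1, end1, start2, or end2.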
The bedtools subcommands (alphabetical order)
| Utility | Description |
|---|---|
| annotate | Annotate coverage of features from multiple files. |
| bamtobed | Convert BAM alignments to BED (& other) formats. |
| bamtofastq | Convert BAM records to FASTQ records. |
| bed12tobed6 | Breaks BED12 intervals into discrete BED6 intervals. |
| bedpetobam | Convert BEDPE intervals to BAM records. |
| bedtobam | Convert intervals to BAM records. |
| closest | Find the closest, potentially non-overlapping interval. |
| cluster | Cluster (but don’t merge) overlapping/nearby intervals. |
| complement | Extract intervals not represented by an interval file. |
| coverage | Compute the coverage over defined intervals. |
| expand | Replicate lines based on lists of values in columns. |
| fisher | Calculate Fisher statistic b/w two feature files. |
| flank | Create new intervals from the flanks of existing intervals. |
| genomecov | Compute the coverage over an entire genome. |
| getfasta | Use intervals to extract sequences from a FASTA file. |
| groupby | Group by common cols. & summarize oth. cols. (~ SQL “groupBy”) |
| igv | Create an IGV snapshot batch script. |
| intersect | Find overlapping intervals in various ways. |
| jaccard | Calculate the Jaccard statistic b/w two sets of intervals. |
| links | Create a HTML page of links to UCSC locations. |
| makewindows | Make interval “windows” across a genome. |
| map | Apply a function to a column for each overlapping interval. |
| maskfasta | Use intervals to mask sequences from a FASTA file. |
| merge | Combine overlapping/nearby intervals into a single interval. |
| multicov | Counts coverage from multiple BAMs at specific intervals. |
| multiinter | Identifies common intervals among multiple interval files. |
| nuc | Profile the nucleotide content of intervals in a FASTA file. |
| overlap | Computes the amount of overlap from two intervals. |
| pairtobed | Find pairs that overlap intervals in various ways. |
| pairtopair | Find pairs that overlap other pairs in various ways. |
| random | Generate random intervals in a genome. |
| reldist | Calculate the distribution of relative distances b/w two files. |
| sample | Sample random records from file using reservoir sampling. |
| shift | Adjust the position of intervals. |
| shuffle | Randomly redistribute intervals in a genome. |
| slop | Adjust the size of intervals. |
| sort | Order the intervals in a file. |
| spacing | Report the gap lengths between intervals in a file. |
| split | Split a file into multiple files with equal records or base pairs. |
| subtract | Remove intervals based on overlaps b/w two files. |
| tag | Tag BAM alignments based on overlaps with interval files. |
| unionbedg | Combines coverage intervals from multiple BEDGRAPH files. |
| window | Find overlapping intervals within a window around an interval. |
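The bedtools subcommands (grouped by function)
Region annotation, e.g., peak annotation, peak distribution analysis, and intersecting peaks with regulatory elements.
Region merging, e.g., building the union of peaks across samples, or merging overlapping regions.
Region complement, e.g., obtaining non-genic regions.
Assessing sequencing breadth and depth from alignment results.
Computing peak similarity between samples, to evaluate how similar the samples are in ChIP-type region results.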
bedtools: flexible tools for genome arithmetic and DNA sequence analysis.

usage: bedtools <subcommand> [options]

The bedtools sub-commands include:

[ Genome arithmetic ]

intersect Find overlapping intervals in various ways.

Find the intersection between regions; useful for annotating peaks, finding the genomic regions that reads align to, and measuring peak overlap between samples.

window Find overlapping intervals within a window around an interval.
closest Find the closest, potentially non-overlapping interval.

Find the nearest, possibly non-overlapping region

coverage Compute the coverage over defined intervals.

Compute region coverage

map Apply a function to a column for each overlapping interval.
genomecov Compute the coverage over an entire genome.
merge Combine overlapping/nearby intervals into a single interval.

Merge overlapping or adjacent regions

cluster Cluster (but don't merge) overlapping/nearby intervals.
complement Extract intervals _not_ represented by an interval file.

Get the complementary regions

subtract Remove intervals based on overlaps b/w two files.

Compute the difference between sets of regions

slop Adjust the size of intervals.

Resize regions, e.g., get the region 3 kb upstream and downstream of transcription start sites

flank Create new intervals from the flanks of existing intervals.

sort Order the intervals in a file.

Sort intervals; some commands require sorted BED files

random Generate random intervals in a genome.

Generate random regions to use as a background set

shuffle Randomly redistribute intervals in a genome.

Generate random regions based on a given BED file, to use as a background set

sample Sample random records from file using reservoir sampling.
spacing Report the gap lengths between intervals in a file.
annotate Annotate coverage of features from multiple files.

[ Multi-way file comparisons ]

multiinter Identifies common intervals among multiple interval files.
unionbedg Combines coverage intervals from multiple BEDGRAPH files.

[ Paired-end manipulation ]

pairtobed Find pairs that overlap intervals in various ways.
pairtopair Find pairs that overlap other pairs in various ways.

[ Format conversion ]

bamtobed Convert BAM alignments to BED (& other) formats.
bedtobam Convert intervals to BAM records.
bamtofastq Convert BAM records to FASTQ records.
bedpetobam Convert BEDPE intervals to BAM records.
bed12tobed6 Breaks BED12 intervals into discrete BED6 intervals.

[ Fasta manipulation ]

getfasta Use intervals to extract sequences from a FASTA file.

Extract the FASTA sequences at the given positions

maskfasta Use intervals to mask sequences from a FASTA file.
nuc Profile the nucleotide content of intervals in a FASTA file.

[ BAM focused tools ]

multicov Counts coverage from multiple BAMs at specific intervals.
tag Tag BAM alignments based on overlaps with interval files.

[ Statistical relationships ]

jaccard Calculate the Jaccard statistic b/w two sets of intervals.

Compute the similarity between datasets

reldist Calculate the distribution of relative distances b/w two files.
fisher Calculate Fisher statistic b/w two feature files.

[ Miscellaneous tools ]

overlap Computes the amount of overlap from two intervals.
igv Create an IGV snapshot batch script.

Generate a script that captures IGV snapshots in batch

links Create a HTML page of links to UCSC locations.

makewindows Make interval "windows" across a genome.

Divide given regions into small intervals (bins) of a specified size and step

groupby Group by common cols. & summarize oth. cols. (~ SQL "groupBy")

Group-wise summarization; not limited to BED files.

expand Replicate lines based on lists of values in columns.
split Split a file into multiple files with equal records or base pairs
- Note: passing stdin or - as a file name (to read piped input) is not the same as passing arguments with xargs.
Part 1: Genome arithmetic
1. intersect
Computes the overlap (intersection) of genomic coordinates between two or more BED/BAM/VCF/GFF files. Different options produce different results.
Typical uses: annotating peaks, finding the genomic regions that reads align to, and measuring peak overlap between samples.

- Usage:
bedtools intersect [OPTIONS] -a <FILE> \
-b <FILE1, FILE2, ..., FILEN>
- Computing the intersection of two BED files
$ cat A.bed
chr1 10 20
chr1 30 40
$ cat B.bed
chr1 15 20
$ bedtools intersect -a A.bed -b B.bed
chr1 15 20
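The sketch below makes the default behaviour concrete. It is pure Python, shows what `intersect` reports rather than how bedtools computes it, and assumes each interval is a `(chrom, start, end)` tuple in 0-based, half-open BED coordinates; the function name `intersect` is hypothetical. Two intervals must share at least 1 bp, so book-ended intervals such as 10-20 and 20-30 do not intersect.

```python
def intersect(a, b):
    """Return the overlapping part of every overlapping (A, B) pair, mimicking the
    default `bedtools intersect -a A -b B` output. Intervals are (chrom, start, end)
    tuples in 0-based, half-open BED coordinates."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            # Half-open intervals overlap when each one starts before the other ends.
            if chrom_a == chrom_b and start_a < end_b and start_b < end_a:
                out.append((chrom_a, max(start_a, start_b), min(end_a, end_b)))
    return out


A = [("chr1", 10, 20), ("chr1", 30, 40)]
B = [("chr1", 15, 20)]
print(intersect(A, B))  # [('chr1', 15, 20)]
```

The nested loop compares every pair, which is fine for illustration but not for genome-scale files.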
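2. window
Like bedtools intersect, window searches for overlapping features in A and B. The difference is that window first pads each feature in A with a given number of base pairs (1000 by default) upstream and downstream.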

- Usage:
bedtools window [OPTIONS] [-a|-abam] -b <BED/GFF/VCF>
- This amounts to extending each A interval by a fixed distance and then comparing it with B.
$ cat A.bed
chr1 100 200

$ cat B.bed
chr1 500 1000
chr1 1300 2000

$ bedtools window -a A.bed -b B.bed
chr1 100 200 chr1 500 1000
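The padding logic in the example can be sketched in pure Python as follows. It assumes `(chrom, start, end)` tuples and uses a hypothetical `window` function with `w=1000` mirroring the default window size. With a 1000 bp pad, A (100-200) searches 0-1200: B 500-1000 falls inside, but B 1300-2000 does not. No genome file is used, so the padded end is not clipped to the chromosome length.

```python
def window(a, b, w=1000):
    """Pair each A interval with every B interval that overlaps A padded by `w` bp on
    both sides. The original A and B records are reported unchanged."""
    out = []
    for chrom_a, start_a, end_a in a:
        lo, hi = max(0, start_a - w), end_a + w  # padded search window, half-open
        for chrom_b, start_b, end_b in b:
            if chrom_a == chrom_b and lo < end_b and start_b < hi:
                out.append((chrom_a, start_a, end_a, chrom_b, start_b, end_b))
    return out


A = [("chr1", 100, 200)]
B = [("chr1", 500, 1000), ("chr1", 1300, 2000)]
print(window(A, B))  # [('chr1', 100, 200, 'chr1', 500, 1000)]
```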
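3. closest
Like intersect, closest searches for overlapping features in A and B. If no feature in B overlaps the current feature in A, closest reports the nearest feature in B, i.e., the one at the smallest genomic distance from the start or end of A. For example, one might want to find the gene closest to a significant GWAS polymorphism. Note that closest reports an overlapping feature as the closest one; it is not restricted to the nearest non-overlapping feature. The "cheat sheet" figure in the official documentation summarizes what closest's various options do.
Find the nearest, possibly non-overlapping region. Put simply, for each interval in A.bed, find the closest interval in B.bed, e.g., the enhancer closest to a given gene.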

- Usage:
bedtools closest [OPTIONS] -a <FILE> \
-b <FILE1, FILE2, ..., FILEN>
- For example:
$ cat a.bed
chr1 10 20 a1 1 -

$ cat b.bed
chr1 7 8 b1 1 -
chr1 15 25 b2 2 +

$ bedtools closest -a a.bed -b b.bed
chr1 10 20 a1 1 - chr1 15 25 b2 2 +
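Here is a minimal pure-Python sketch of the "closest, overlaps count as distance 0" rule in the example, assuming records are tuples whose first three fields are chrom, start, end. The names `gap` and `closest` are hypothetical. Keeping every tied B record is an assumption meant to resemble bedtools' default tie handling. A records with no B on the same chromosome are simply skipped here, whereas bedtools reports a placeholder record for them.

```python
def gap(a, b):
    """Bases strictly between two intervals on the same chromosome; 0 if they overlap
    or are book-ended. Only fields 0-2 (chrom, start, end) are used."""
    if a[1] < b[2] and b[1] < a[2]:
        return 0
    return b[1] - a[2] if b[1] >= a[2] else a[1] - b[2]


def closest(a, b):
    """For each A record, report A + B for the B record(s) on the same chromosome with
    the smallest gap; an overlapping B counts as closest, and ties are all kept."""
    out = []
    for rec_a in a:
        candidates = [rec_b for rec_b in b if rec_b[0] == rec_a[0]]
        if not candidates:
            continue
        best = min(gap(rec_a, rec_b) for rec_b in candidates)
        out.extend(rec_a + rec_b for rec_b in candidates if gap(rec_a, rec_b) == best)
    return out


a = [("chr1", 10, 20, "a1", 1, "-")]
b = [("chr1", 7, 8, "b1", 1, "-"), ("chr1", 15, 25, "b2", 2, "+")]
print(closest(a, b))  # [('chr1', 10, 20, 'a1', 1, '-', 'chr1', 15, 25, 'b2', 2, '+')]
```

b1 sits 2 bp away while b2 overlaps, so b2 wins, as in the bedtools output above.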
4. coverage
Compute region coverage: for each region in one BED file, compute its overlap with the features of another BED file.
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BED FILE A *************** *************** ****** **************
BED File B ^^^^ ^^^^ ^^ ^^^^^^^^^ ^^^ ^^ ^^^^
^^^^^^^^ ^^^^^ ^^^^^ ^^
Result [ N=3, 10/15 ] [ N=1, 2/15 ] [N=1,6/6] [N=6, 12/14 ]
- Usage:
bedtools coverage [OPTIONS] -a <FILE> \
-b <FILE1, FILE2, ..., FILEN>
- Example: for each interval in A, report how it overlaps with B. The appended columns are the number of overlapping B features, the number of bases covered, the length of the A interval, and the fraction covered.
$ cat A.bed
chr1 0 100
chr1 100 200
chr2 0 100

$ cat B.bed
chr1 10 20
chr1 20 30
chr1 30 40
chr1 100 200

$ bedtools coverage -a A.bed -b B.bed
chr1 0 100 3 30 100 0.3000000
chr1 100 200 1 100 100 1.0000000
chr2 0 100 0 0 100 0.0000000
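The four appended columns can be reproduced with the pure-Python sketch below. It assumes `(chrom, start, end)` tuples and a hypothetical `coverage` function. Overlapping B hits are clipped to the A interval and then merged, so bases covered twice are counted once.

```python
def coverage(a, b):
    """For each A interval return (chrom, start, end, n_overlapping_B, bases_covered,
    length_of_A, fraction_covered). Intervals are (chrom, start, end), 0-based half-open."""
    out = []
    for chrom, start, end in a:
        # Clip every overlapping B interval to the bounds of the A interval.
        hits = sorted((max(start, s), min(end, e)) for c, s, e in b
                      if c == chrom and s < end and start < e)
        # Sweep the sorted hits so bases covered by several B features count once.
        covered, cur_start, cur_end = 0, None, None
        for s, e in hits:
            if cur_end is None or s > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = s, e
            else:
                cur_end = max(cur_end, e)
        if cur_end is not None:
            covered += cur_end - cur_start
        length = end - start
        out.append((chrom, start, end, len(hits), covered, length, covered / length))
    return out


A = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
B = [("chr1", 10, 20), ("chr1", 20, 30), ("chr1", 30, 40), ("chr1", 100, 200)]
for row in coverage(A, B):
    print(*row)
```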
5. map
For each interval in A, apply a function to a column of the B intervals that overlap it.

- Usage:
bedtools map [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
- Example: the B features that overlap each A interval form a group, and a function (e.g., max or mean) is applied to one of their columns. By default, bedtools map sums column 5 of B (-c 5 -o sum). A intervals with no overlapping B feature get ".".
$ cat a.bed
chr1 10 20 a1 1 +
chr1 50 60 a2 2 -
chr1 80 90 a3 3 -

$ cat b.bed
chr1 12 14 b1 2 +
chr1 13 15 b2 5 -
chr1 16 18 b3 5 +
chr1 82 85 b4 2 -
chr1 85 87 b5 3 +

$ bedtools map -a a.bed -b b.bed
chr1 10 20 a1 1 + 12
chr1 50 60 a2 2 - .
chr1 80 90 a3 3 - 5
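A pure-Python sketch of the grouping-then-summarizing idea follows. It assumes tuple records and a hypothetical `bed_map` function, named so as not to shadow Python's built-in `map`. The defaults `col=5, op=sum` stand in for `-c 5 -o sum`, and any Python callable (e.g. `max`) can play the role of `-o`.

```python
def bed_map(a, b, col=5, op=sum):
    """For each A record, collect column `col` (1-based) of every overlapping B record
    and apply `op` to that list; report '.' if nothing overlaps."""
    out = []
    for rec_a in a:
        values = [rec_b[col - 1] for rec_b in b
                  if rec_b[0] == rec_a[0] and rec_b[1] < rec_a[2] and rec_a[1] < rec_b[2]]
        out.append(rec_a + (op(values) if values else ".",))
    return out


a = [("chr1", 10, 20, "a1", 1, "+"), ("chr1", 50, 60, "a2", 2, "-"), ("chr1", 80, 90, "a3", 3, "-")]
b = [("chr1", 12, 14, "b1", 2, "+"), ("chr1", 13, 15, "b2", 5, "-"), ("chr1", 16, 18, "b3", 5, "+"),
     ("chr1", 82, 85, "b4", 2, "-"), ("chr1", 85, 87, "b5", 3, "+")]
for row in bed_map(a, b):
    print(*row)  # last column: 12, ., 5, as in the example above
```

Calling `bed_map(a, b, op=max)` would instead report 5, ., 3.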
6. genomecov
Compute coverage per chromosome and across the whole genome

- Usage:
bedtools genomecov [OPTIONS] [-i|-ibam] -g (iff. -i)
- example:
$ cat A.bed
chr1 10 20
chr1 20 30
chr2 0 500

$ cat my.genome
chr1 1000
chr2 500

$ bedtools genomecov -i A.bed -g my.genome
chr1 0 980 1000 0.98
chr1 1 20 1000 0.02
chr2 1 500 500 1
genome 0 980 1500 0.653333
genome 1 520 1500 0.346667
# columns: chromosome (or "genome"), depth, bases at that depth, total bases, fraction of bases at that depth
# histograms are reported both per chromosome and for the whole genome
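The histogram rows above can be rebuilt with the pure-Python sketch below. It assumes `(chrom, start, end)` tuples and a genome dict of chromosome lengths, and the function name `genomecov` is hypothetical. It keeps a per-base depth list, which only makes sense for toy genomes like this one. Depths with zero bases never appear, which is why chr2 has no depth-0 row.

```python
from collections import Counter


def genomecov(intervals, genome):
    """Depth histogram rows (name, depth, bases_at_depth, size, fraction) for every
    chromosome in `genome` (name -> length), followed by genome-wide 'genome' rows."""
    rows, genome_hist = [], Counter()
    for chrom, size in genome.items():
        depth = [0] * size
        for c, start, end in intervals:
            if c == chrom:
                for pos in range(start, min(end, size)):
                    depth[pos] += 1
        hist = Counter(depth)  # depth -> number of bases with that depth
        for d in sorted(hist):
            rows.append((chrom, d, hist[d], size, hist[d] / size))
        genome_hist.update(hist)
    genome_size = sum(genome.values())
    for d in sorted(genome_hist):
        rows.append(("genome", d, genome_hist[d], genome_size, genome_hist[d] / genome_size))
    return rows


A = [("chr1", 10, 20), ("chr1", 20, 30), ("chr2", 0, 500)]
for row in genomecov(A, {"chr1": 1000, "chr2": 500}):
    print(*row)
```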
7. merge
http://www.itdecent.cn/p/2efcb6f8f55d
In many datasets (for example, ChIP-seq results), genomic feature coordinates are often contiguous, like the blue parts in the figure below.
These small contiguous intervals can then be joined into one larger contiguous interval.

- Usage:
bedtools merge [OPTIONS] -i <BED/GFF/VCF/BAM>
- Example:
The input file (BED/GFF/VCF) must be sorted; unsorted input makes bedtools report an error.
Sort by chromosome first, then by start position. This lets merge process the file in a single pass with almost no extra memory.
Using exons.bed as an example to show what merge does.
Note that rows 3 and 4 overlap, so they can be merged.

bedtools merge -i exons.bed | head -10
After merging, the original row 3 (13220-14409) and row 4 (14361-14829) are combined into 13220-14829.
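The single-pass idea described above can be sketched in pure Python. The sketch assumes `(chrom, start, end)` tuples already sorted by chrom, then start, and the function name `merge` is hypothetical. It joins book-ended intervals as well as overlapping ones, matching bedtools merge's default (`-d 0`).

```python
def merge(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals in one pass.
    Input must be sorted by chrom, then start."""
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the last merged interval: extend it if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Rows 3 and 4 of exons.bed as quoted above.
print(merge([("chr1", 13220, 14409), ("chr1", 14361, 14829)]))  # [('chr1', 13220, 14829)]
```

Because only the last merged interval is ever extended, the input must be sorted, which is exactly the requirement stated above.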
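8. cluster
Similar to merge, but instead of merging overlapping or nearby intervals, cluster gives them the same cluster ID (last column).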

- Usage:
bedtools cluster [OPTIONS] -i <BED/GFF/VCF>
- Example: by default, intervals that overlap (by at least 1 bp) or are book-ended get the same cluster ID
$ cat A.bed
chr1 100 200
chr1 180 250
chr1 250 500
chr1 501 1000

$ bedtools cluster -i A.bed
chr1 100 200 1
chr1 180 250 1
chr1 250 500 1
chr1 501 1000 2
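The cluster IDs in the example can be reproduced with the pure-Python sketch below. It assumes sorted `(chrom, start, end)` tuples and a hypothetical `cluster` function. The parameter `d` plays the role of the `-d` maximum distance, with `d=0` chosen to match the example: 250-500 is book-ended with 180-250 and joins cluster 1, while 501-1000 starts 1 bp after 500 and opens cluster 2.

```python
def cluster(intervals, d=0):
    """Append a cluster ID to each (chrom, start, end) interval without merging anything.
    A new cluster starts when an interval begins more than `d` bp after the furthest end
    seen in the current cluster. Input must be sorted by chrom, then start."""
    out, cluster_id, cur_chrom, cur_end = [], 0, None, None
    for chrom, start, end in intervals:
        if chrom != cur_chrom or start > cur_end + d:
            cluster_id += 1
            cur_chrom, cur_end = chrom, end
        else:
            cur_end = max(cur_end, end)
        out.append((chrom, start, end, cluster_id))
    return out


A = [("chr1", 100, 200), ("chr1", 180, 250), ("chr1", 250, 500), ("chr1", 501, 1000)]
for row in cluster(A):
    print(*row)  # cluster IDs 1, 1, 1, 2, as in the example above
```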
9. complement
http://www.itdecent.cn/p/2efcb6f8f55d
bedtools complement returns the inverse selection of an interval file.
Given a file of feature coordinates, we may care not about the intervals it marks but about which regions are not in the file.
For example, given a set of ChIP-seq peaks, complement tells you which regions were not bound by the antibody.

- Usage:
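bedtools complement -i <BED/GFF/VCF> -g <GENOME>
- Example:
Given a BED file of exons, taking the complement yields the coordinates of introns and intergenic regions. Because this is an inverse selection, in addition to exons.bed we need the overall extent, i.e., the length of each chromosome (specified with -g).
# First, define the overall extent (chromosome lengths)
head -10 genome.txt
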
chr1 249250621
chr10 135534747
chr11 135006516
chr11_gl000202_random 40103
chr12 133851895
chr13 115169878
chr14 107349540
chr15 102531392
chr16 90354753
chr17 81195210

# Complement: report the regions not covered by any exon
bedtools complement -i exons.bed -g genome.txt | head -10

chr1 0 11873
chr1 12227 12612
chr1 12721 13220
chr1 14829 14969
chr1 15038 15795
chr1 15947 16606
chr1 16765 16857
chr1 17055 17232
chr1 17368 17605
chr1 17742 17914
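A pure-Python sketch of the complement step, assuming `(chrom, start, end)` tuples sorted by start on each chromosome and a genome dict of chromosome lengths like genome.txt. The function name `complement` and the toy intervals are hypothetical. Overlapping input intervals are tolerated because the sketch tracks the furthest covered position.

```python
def complement(intervals, genome):
    """Return the parts of each chromosome in `genome` (name -> length) not covered by
    any (chrom, start, end) interval. Intervals on each chromosome must be sorted by start."""
    out = []
    for chrom, size in genome.items():
        pos = 0  # everything before `pos` is covered or has already been reported
        for c, start, end in intervals:
            if c != chrom:
                continue
            if start > pos:
                out.append((chrom, pos, start))  # uncovered gap before this interval
            pos = max(pos, end)
        if pos < size:
            out.append((chrom, pos, size))  # uncovered tail of the chromosome
    return out


genome = {"chr1": 100, "chr2": 50}  # hypothetical toy genome
exons = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 40, 100)]
print(complement(exons, genome))  # [('chr1', 0, 10), ('chr1', 30, 40), ('chr2', 0, 50)]
```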
10. subtract
http://www.itdecent.cn/p/cb079a393661
Remove from A the portions that overlap B

- Usage:
bedtools subtract [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
- example:
$ cat A.bed
chr1 10 20
chr1 100 200

$ cat B.bed
chr1 0 30
chr1 180 300

$ bedtools subtract -a A.bed -b B.bed
chr1 100 180
- For example, extracting the peak intervals that lie outside TSS ±2 kb, as preparation for computing an ROC curve.
Code: https://github.com/Helab-bioinformatics/itChIP/blob/master/07_ROC.R
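How the example output arises can be seen in the pure-Python sketch below. It assumes `(chrom, start, end)` tuples and a hypothetical `subtract` function. Each B interval cuts the remaining pieces of an A interval; a fully covered A interval (10-20 above) disappears, and a B inside A splits it in two.

```python
def subtract(a, b):
    """Remove from each A interval the parts overlapped by any B interval."""
    out = []
    for chrom, start, end in a:
        pieces = [(start, end)]
        for c, bs, be in b:
            if c != chrom:
                continue
            next_pieces = []
            for s, e in pieces:
                if be <= s or bs >= e:  # no overlap: keep the piece as is
                    next_pieces.append((s, e))
                    continue
                if s < bs:
                    next_pieces.append((s, bs))  # part left of the B interval
                if be < e:
                    next_pieces.append((be, e))  # part right of the B interval
            pieces = next_pieces
        out.extend((chrom, s, e) for s, e in pieces)
    return out


A = [("chr1", 10, 20), ("chr1", 100, 200)]
B = [("chr1", 0, 30), ("chr1", 180, 300)]
print(subtract(A, B))  # [('chr1', 100, 180)]
```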
11. slop
Similar to awk '{OFS="\t"; print $1,$2-<slop>,$3+<slop>}'.
Unlike that awk one-liner, bedtools slop clips intervals to the chromosome: the start cannot be less than 0 and the end cannot exceed the chromosome length.

- Usage:
bedtools slop [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
- example :
$ cat A.bed
chr1 5 100
chr1 800 980

$ cat my.genome
chr1 1000

$ bedtools slop -i A.bed -g my.genome -b 5
chr1 0 105
chr1 795 985

$ bedtools slop -i A.bed -g my.genome -l 2 -r 3
chr1 3 103
chr1 798 983
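Here is the awk-like arithmetic with the chromosome clipping that awk lacks, as a pure-Python sketch. It assumes `(chrom, start, end)` tuples and a genome dict of chromosome lengths, and the function name `slop` is hypothetical. `-b N` corresponds to `left=right=N`; strand-aware padding (`-s`) is not modelled.

```python
def slop(intervals, genome, left, right):
    """Widen each (chrom, start, end) interval by `left` bp before the start and `right`
    bp after the end, clipped to [0, chromosome length]."""
    return [(chrom, max(0, start - left), min(genome[chrom], end + right))
            for chrom, start, end in intervals]


A = [("chr1", 5, 100), ("chr1", 800, 980)]
genome = {"chr1": 1000}
print(slop(A, genome, 5, 5))  # [('chr1', 0, 105), ('chr1', 795, 985)]
print(slop(A, genome, 2, 3))  # [('chr1', 3, 103), ('chr1', 798, 983)]
```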
12. flank
Create new intervals from the flanking regions on either side of each interval; the original interval itself is not included.

- Usage:
bedtools flank [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> [-b or (-l and -r)]
- example:
$ cat A.bed
chr1 100 200
chr1 500 600

$ cat my.genome
chr1 1000

$ bedtools flank -i A.bed -g my.genome -b 5
chr1 95 100
chr1 200 205
chr1 495 500
chr1 600 605

$ bedtools flank -i A.bed -g my.genome -l 2 -r 3
chr1 98 100
chr1 200 203
chr1 498 500
chr1 600 603
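The difference from slop is that flank returns only the added pieces. The pure-Python sketch below shows this, assuming `(chrom, start, end)` tuples and a genome dict. The function name `flank` is hypothetical, and dropping flanks that become empty after clipping is this sketch's own choice. Strand (`-s`) is ignored.

```python
def flank(intervals, genome, left, right):
    """Return the left flank [start - left, start) and right flank [end, end + right) of
    each interval, clipped to the chromosome; empty flanks are dropped."""
    out = []
    for chrom, start, end in intervals:
        left_start = max(0, start - left)
        right_end = min(genome[chrom], end + right)
        if left_start < start:
            out.append((chrom, left_start, start))
        if end < right_end:
            out.append((chrom, end, right_end))
    return out


A = [("chr1", 100, 200), ("chr1", 500, 600)]
genome = {"chr1": 1000}
for row in flank(A, genome, 2, 3):
    print(*row)  # 98-100, 200-203, 498-500, 600-603, as in the -l 2 -r 3 example
```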
13. sort
Sort intervals; some commands require sorted BED files
- Usage:
bedtools sort [OPTIONS] -i <BED/GFF/VCF>
- example:
cat A.bed
chr1 800 1000
chr1 80 180
chr1 1 10
chr1 750 10000

sortBed -i A.bed
chr1 1 10
chr1 80 180
chr1 750 10000
chr1 800 1000
14. random
bedtools random generates a set of random intervals in BED6 format. You can specify the number (-n) and size (-l) of the intervals to generate.

- Usage:
bedtools random [OPTIONS] -g <GENOME>
- example:
$ bedtools random -g hg19.genome
chr2 87536758 87536858 1 100 -
chrX 46051735 46051835 2 100 +
chr18 5237041 5237141 3 100 -
chr12 45809998 45810098 4 100 +
chrX 42034890 42034990 5 100 -
chr10 77510935 77511035 6 100 -
chr3 39844278 39844378 7 100 -
chr6 101012700 101012800 8 100 +
chr12 38123482 38123582 9 100 +
chr7 88508598 88508698 10 100 -
$ bedtools random -g hg19.genome
chr3 141987850 141987950 1 100 +
chr5 137643331 137643431 2 100 +
chr2 155523858 155523958 3 100 -
chr5 147874094 147874194 4 100 +
chr1 71838335 71838435 5 100 -
chr8 71154323 71154423 6 100 -
chr2 133240474 133240574 7 100 +
chr9 131495427 131495527 8 100 +
chrX 125952943 125953043 9 100 +
chr3 59685545 59685645 10 100 +
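15. shuffle
bedtools shuffle randomly relocates the intervals of a BED file within the genome defined by a genome file.
It generates random regions based on a given BED file, for use as a background set.

- Usage:
bedtools shuffle [OPTIONS] -i <BED/GFF/VCF> -g <GENOME>
- example: by default, bedtools shuffle moves each feature of the input BED file to a random position on a random chromosome. The size and strand of each feature are preserved.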
$ cat A.bed
chr1 0 100 a1 1 +
chr1 0 1000 a2 2 -
$ cat my.genome
chr1 10000
chr2 8000
chr3 5000
chr4 2000
$ bedtools shuffle -i A.bed -g my.genome
chr4 1498 1598 a1 1 +
chr3 2156 3156 a2 2 -
16. sample
Randomly sample records from a file.
Summary: Take sample of input file(s) using reservoir sampling algorithm.
Usage: bedtools sample [OPTIONS] -i <bed/gff/vcf/bam>
WARNING: The current sample algorithm will hold all requested sample records in memory prior to output. The user must ensure that there is adequate memory for this.
17. spacing
Report the gap lengths between intervals in a file
Summary: Report (last col.) the gap lengths between intervals in a file.
Usage: bedtools spacing [OPTIONS] -i <bed/gff/vcf/bam>
-bed If using BAM input, write output as BED.
-header Print the header from the A file prior to results.
-nobuf Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time.
-iobuf Specify amount of memory to use for input buffer. Takes an integer argument. Optional suffixes K/M/G supported. Note: currently has no effect with compressed files.
Notes: (1) Input must be sorted by chrom,start (sort -k1,1 -k2,2n for BED). (2) The 1st element for each chrom will have NULL distance. ("."). (3) Distance for overlapping intervals is -1 and 0 for adjacent intervals.
Example:
$ cat test.bed
chr1 0 10
chr1 10 20
chr1 19 30
chr1 35 45
chr1 100 200

$ bedtools spacing -i test.bed
chr1 0 10 .
chr1 10 20 0
chr1 19 30 -1
chr1 35 45 5
chr1 100 200 55
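The three rules in the Notes above ("." for the first interval, -1 for overlaps, 0 for adjacent intervals) are captured by this pure-Python sketch. It assumes sorted `(chrom, start, end)` tuples and a hypothetical `spacing` function, and measures each gap against the previous interval on the same chromosome.

```python
def spacing(intervals):
    """Append the gap to the previous interval on the same chromosome: '.' for the first
    interval of a chromosome, -1 if it overlaps the previous interval, otherwise the
    number of bases in between (0 for book-ended intervals). Input sorted by chrom, start."""
    out, prev_chrom, prev_end = [], None, None
    for chrom, start, end in intervals:
        if chrom != prev_chrom:
            dist = "."
        elif start < prev_end:
            dist = -1
        else:
            dist = start - prev_end
        out.append((chrom, start, end, dist))
        prev_chrom, prev_end = chrom, end
    return out


test_bed = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 19, 30), ("chr1", 35, 45), ("chr1", 100, 200)]
for row in spacing(test_bed):
    print(*row)  # distances ., 0, -1, 5, 55, as in the example above
```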
18. annotate
bedtools annotate annotates a BED/VCF/GFF file with the coverage and number of overlaps observed in several other BED/VCF/GFF files. In this way, a single command tells you how much a feature overlaps with several other feature types.
- Usage:
bedtools annotate [OPTIONS] -i <BED/GFF/VCF> -files FILE1 FILE2 FILE3 ... FILEn
- Count how many features from each of the other files overlap each interval of the input BED file (-counts):
$ cat variants.bed
chr1 100 200 nasty 1 -
chr2 500 1000 ugly 2 +
chr3 1000 5000 big 3 -
$ cat genes.bed
chr1 150 200 geneA 1 +
chr1 175 250 geneB 2 +
chr3 0 10000 geneC 3 -
$ cat conserve.bed
chr1 0 10000 cons1 1 +
chr2 700 10000 cons2 2 -
chr3 4000 10000 cons3 3 +
$ cat known_var.bed
chr1 0 120 known1 -
chr1 150 160 known2 -
chr2 0 10000 known3 +
$ bedtools annotate -counts -i variants.bed -files genes.bed conserve.bed known_var.bed
chr1 100 200 nasty 1 - 2 1 2
chr2 500 1000 ugly 2 + 0 1 1
chr3 1000 5000 big 3 - 1 1 0
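The three count columns in the `-counts` output can be checked with the pure-Python sketch below, fed the example files. It assumes tuple records and a hypothetical `annotate_counts` function. Only the first three fields of each annotation file are used, and the default coverage-fraction mode (without `-counts`) is not modelled.

```python
def annotate_counts(intervals, *files):
    """Append, for each interval, the number of features in each file that overlap it,
    one column per file."""
    out = []
    for rec in intervals:
        chrom, start, end = rec[0], rec[1], rec[2]
        counts = tuple(sum(1 for f in feats if f[0] == chrom and f[1] < end and start < f[2])
                       for feats in files)
        out.append(tuple(rec) + counts)
    return out


variants = [("chr1", 100, 200, "nasty", 1, "-"), ("chr2", 500, 1000, "ugly", 2, "+"),
            ("chr3", 1000, 5000, "big", 3, "-")]
genes = [("chr1", 150, 200), ("chr1", 175, 250), ("chr3", 0, 10000)]
conserve = [("chr1", 0, 10000), ("chr2", 700, 10000), ("chr3", 4000, 10000)]
known_var = [("chr1", 0, 120), ("chr1", 150, 160), ("chr2", 0, 10000)]
for row in annotate_counts(variants, genes, conserve, known_var):
    print(*row)  # ... 2 1 2 / ... 0 1 1 / ... 1 1 0
```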
Part 2: Multi-way file comparisons
19. multiinter
Identify the intervals shared among multiple BED files.
Summary: Identifies common intervals among multiple BED/GFF/VCF files.
Usage: bedtools multiinter [OPTIONS] -i FILE1 FILE2 .. FILEn
Requires that each interval file is sorted by chrom/start.
20. unionbedg
unionbedg combines multiple BEDGRAPH files into a single file, so that coverage (e.g., of different genotypes) can be compared directly across samples.
- Usage:
bedtools unionbedg [OPTIONS] -i FILE1 FILE2 FILE3 ... FILEn
- example:
cat 1.bg
chr1 1000 1500 10
chr1 2000 2100 20
cat 2.bg
chr1 900 1600 60
chr1 1700 2050 50
cat 3.bg
chr1 1980 2070 80
chr1 2090 2100 20
cat sizes.txt
chr1 5000
bedtools unionbedg -i 1.bg 2.bg 3.bg
chr1 900 1000 0 60 0
chr1 1000 1500 10 60 0
chr1 1500 1600 0 60 0
chr1 1700 1980 0 50 0
chr1 1980 2000 0 50 80
chr1 2000 2050 20 50 80
chr1 2050 2070 20 0 80
chr1 2070 2090 20 0 0
chr1 2090 2100 20 0 20
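The way the output rows arise can be reproduced with the pure-Python sketch below. Every start and end from all tracks becomes a breakpoint, and each sub-interval gets one value per track, 0 where a track has no data. It assumes each track is a list of non-overlapping `(chrom, start, end, value)` rows, and the function name `unionbedg` is hypothetical. Sub-intervals where all tracks are 0 are skipped, which is why 1600-1700 does not appear in the example output.

```python
def unionbedg(*tracks):
    """Combine BEDGRAPH tracks into (chrom, start, end, v1, v2, ...) rows, with 0 where a
    track has no value; all-zero sub-intervals are omitted."""
    out = []
    for chrom in sorted({row[0] for track in tracks for row in track}):
        # Every start and end on this chromosome is a potential breakpoint.
        points = sorted({pos for track in tracks for c, ts, te, _ in track
                         if c == chrom for pos in (ts, te)})
        for start, end in zip(points, points[1:]):
            values = []
            for track in tracks:
                # A track contributes the value of the row spanning [start, end), if any.
                value = 0
                for c, ts, te, v in track:
                    if c == chrom and ts <= start and end <= te:
                        value = v
                        break
                values.append(value)
            if any(values):
                out.append((chrom, start, end, *values))
    return out


bg1 = [("chr1", 1000, 1500, 10), ("chr1", 2000, 2100, 20)]
bg2 = [("chr1", 900, 1600, 60), ("chr1", 1700, 2050, 50)]
bg3 = [("chr1", 1980, 2070, 80), ("chr1", 2090, 2100, 20)]
for row in unionbedg(bg1, bg2, bg3):
    print(*row)
```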
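Part 3: Paired-end manipulation
21. pairtobed
Find pairs that overlap intervals in various ways.
Summary: Report overlaps between a BEDPE file and a BED/GFF/VCF file.
Usage: bedtools pairtobed [OPTIONS] -a <bedpe> -b <bed/gff/vcf>
22. pairtopair
Find pairs that overlap other pairs in various ways.
pairToPair compares two BEDPE files to find overlaps in which each end of a BEDPE feature in A overlaps an end of a feature in B. For example, pairToPair can find discordant paired-end alignments that are identical in two files. This may indicate (among other things) that the discordant pairs reflect the same structural variant in each file/sample.
- Usage:
pairToPair [OPTIONS] -a <BEDPE> -b <BEDPE>
- example: by default, a BEDPE feature in A is reported if both of its ends overlap a feature in the BEDPE B file. If strand information is present for both BEDPE files, both end overlaps must also be on the same strand. This way, an F/R alignment that otherwise overlaps (in terms of genomic position) will not be matched with an R/R alignment.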
Chromosome ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BEDPE A >>>>>.................................>>>>>
BEDPE B <<<<<.............................>>>>>
Result
BEDPE A >>>>>.................................>>>>>
BEDPE B >>>>>.............................>>>>>
Result >>>>>.................................>>>>>
Part 4: Format conversion
23. bamtobed
bedtools bamtobed is a handy conversion utility that turns sequence alignments in BAM format into BED, BED12, and/or BEDPE records. It keeps the positional information while taking less space.
- Usage:
bedtools bamtobed [OPTIONS] -i <BAM>
- For example, converting BAM to BED6:
$ bedtools bamtobed -i reads.bam | head -3
chr7 118970079 118970129 TUPAC_0001:3:1:0:1452#0/1 37 -
chr7 118965072 118965122 TUPAC_0001:3:1:0:1452#0/2 37 +
chr11 46769934 46769984 TUPAC_0001:3:1:0:1472#0/1 37 -
24. bedtobam
Convert a BED file to BAM
- Usage:
bedToBam [OPTIONS] -i <BED/GFF/VCF> -g <GENOME> > <BAM>
- example:
head -5 rmsk.hg18.chr21.bed
chr21 9719768 9721892 ALR/Alpha 1004 +
chr21 9721905 9725582 ALR/Alpha 1010 +
chr21 9725582 9725977 L1PA3 3288 +
chr21 9726021 9729309 ALR/Alpha 1051 +
chr21 9729320 9729809 L1PA3 3897 -
bedToBam -i rmsk.hg18.chr21.bed -g human.hg18.genome > rmsk.hg18.chr21.bam
samtools view rmsk.hg18.chr21.bam | head -5
ALR/Alpha 0 chr21 9719769 255 2124M * 0 0 * *
ALR/Alpha 0 chr21 9721906 255 3677M * 0 0 * *
L1PA3 0 chr21 9725583 255 395M * 0 0 * *
ALR/Alpha 0 chr21 9726022 255 3288M * 0 0 * *
L1PA3 16 chr21 9729321 255 489M * 0 0 * *
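Comparing the BED records with the `samtools view` lines shows the field mapping. The pure-Python sketch below builds those 11 SAM text columns from one BED6 record. It is not a BAM writer, and the function name `bed6_to_sam_fields` is hypothetical. The mapping is read off the output above: POS is the BED start plus 1 (SAM is 1-based), the CIGAR is one match of the interval length, FLAG 16 marks the minus strand, and MAPQ is 255.

```python
def bed6_to_sam_fields(rec):
    """Map one BED6 record to the 11 SAM columns seen in the `samtools view` output above."""
    chrom, start, end, name, _score, strand = rec
    flag = 16 if strand == "-" else 0  # 16 = read on the reverse strand
    mapq = 255  # mapping quality shown in the output above
    return [name, flag, chrom, start + 1, mapq, f"{end - start}M", "*", 0, 0, "*", "*"]


for rec in [("chr21", 9719768, 9721892, "ALR/Alpha", 1004, "+"),
            ("chr21", 9729320, 9729809, "L1PA3", 3897, "-")]:
    print("\t".join(map(str, bed6_to_sam_fields(rec))))
```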
25. bamtofastq
bedtools bamtofastq is a conversion utility for extracting FASTQ records from sequence alignments in BAM format.
- Usage:
bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
- For example:
$ bedtools bamtofastq -i NA18152.bam -fq NA18152.fq
$ head -8 NA18152.fq
@NA18152-SRR007381.35051
GGAGACATATCATATAAGTAATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCGGCAAGGTCGAA
+
<<<;;<;<;;<;;;;;;;;;;;;<<<:;;;;;;;;;;;;;;;;::::::;;;;<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<;;;;;:;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<;;;;;;;;;;<<<<<<<<;;;;;;;;;:;;;;;;;;;;;;;;;;;;;:;;;;8;;8888;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;8966689666666299866669:899
@NA18152-SRR007381.637219
AATGCTAGGGTGAGTGGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGGGGGTATGCTGTTCGAATTCATAAGAACAGGGAGGTTAGAAGTAGGGTCTTGGTGACAAAATATGTTGTATAGAGTTCAGGGGAGAGTGCGTCATATGTTGTTCCTAGGAAGATTGTAGTGGTGAGGGTGTTTATTATAATAATGTTTGTGTATTCGGCTATGAAGAATAGGGCGAAGGGGCCTGCGGCGTATTCGATGTTGAAGCCTGAGACTAGTTCGGACTCCCCTTCCGGCAAGGTCGAA
+
<<<<<<<<<<;;<;<;;;;<<;<888888899<;;;;;;<;;;;;;;;;;;;;;;;;;;;;;;;<<<<<;;;;;;;;;<;<<<<<;;;;;;;;;;;;;<<<<;;;;;;;:::;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<<<<;;;;;;;;;;;;;;;;;;;;;;;<;;;;;;;;;;;;;;;;;;;;;;<888<;<<;;;;<<<<<<;;;;;<<<<<<<<;;;;;;;;;:;;;;888888899:::;;8;;;;;;;;;;;;;;;;;;;99;;99666896666966666600;96666669966
26. bedpetobam
Summary: Converts feature records to BAM format.
Usage: bedpetobam [OPTIONS] -i <bed/gff/vcf> -g <genome>
Options: -mapq Set the mapping quality for the BAM records. (INT) Default: 255
-ubam Write uncompressed BAM output. Default writes compressed BAM.
Notes: (1) BED files must be at least BED4 to create BAM (needs name field).
27. bed12tobed6
bed12ToBed6 is a convenient tool for converting "blocked" BED12 features (e.g., genes) into discrete BED features. For example, for a gene with six exons, bed12ToBed6 produces six separate BED6 features, one per exon.
- Usage:
bed12ToBed6 [OPTIONS] -i <BED12>
- Split a gene's exons so that each exon gets its own line
head data/knownGene.hg18.chr21.bed | tail -n 3
chr21 10079666 10120808 uc002yiv.1 0 - 10081686 10120608 0 4 528,91,101,215, 0,1930,39750,40927,
chr21 10080031 10081687 uc002yiw.1 0 - 10080031 10080031 0 2 200,91, 0,1565,
chr21 10081660 10120796 uc002yix.2 0 - 10081660 10081660 0 3 27,101,223, 0,37756,38913,
head data/knownGene.hg18.chr21.bed | tail -n 3 | bed12ToBed6 -i stdin
chr21 10079666 10080194 uc002yiv.1 0 -
chr21 10081596 10081687 uc002yiv.1 0 -
chr21 10119416 10119517 uc002yiv.1 0 -
chr21 10120593 10120808 uc002yiv.1 0 -
chr21 10080031 10080231 uc002yiw.1 0 -
chr21 10081596 10081687 uc002yiw.1 0 -
chr21 10081660 10081687 uc002yix.2 0 -
chr21 10119416 10119517 uc002yix.2 0 -
chr21 10120573 10120796 uc002yix.2 0 -
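The block arithmetic behind the output above can be shown in pure Python. It assumes one whitespace-separated BED12 line per call, and the function name `bed12_to_bed6` is hypothetical. Each exon is chromStart + blockStart to chromStart + blockStart + blockSize. For uc002yix.2 this gives 10081660-10081687, 10119416-10119517 and 10120573-10120796, matching the bed12ToBed6 output.

```python
def bed12_to_bed6(line):
    """Split one BED12 line into BED6 tuples, one per block (exon). blockStarts
    (column 12) are offsets from chromStart; the lists may carry a trailing comma."""
    f = line.split()
    chrom, chrom_start = f[0], int(f[1])
    name, score, strand = f[3], f[4], f[5]
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    return [(chrom, chrom_start + off, chrom_start + off + size, name, score, strand)
            for size, off in zip(sizes, offsets)]


line = "chr21 10081660 10120796 uc002yix.2 0 - 10081660 10081660 0 3 27,101,223, 0,37756,38913,"
for row in bed12_to_bed6(line):
    print(*row)
```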
Part 5: Fasta manipulation
28. getfasta
http://www.itdecent.cn/p/6c3b87301491
Extract FASTA sequences from a genome for the given coordinate intervals

- Usage:
$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF>
- example:
Workflow: BED/GFF/VCF + reference genome --> FASTA
bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa -bed ../macs14_results/highQuality_summits.bed -fo highQuality_summits.fa
bedtools getfasta -fi ~/biosoft/bowtie/hg19_index/hg19.fa -bed ../macs14_results/highQuality_peaks.bed -fo highQuality_peaks.fa
The commands use BED format for the coordinate intervals; the reference genome is given with -fi and the output FASTA file with -fo.
Tips: three useful options

-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
-name Use the name field for the FASTA header
-tab Write output in TAB delimited format.
- Default is FASTA format.
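-s : when strand matters, add -s; for an interval on the minus strand, the reverse-complemented sequence is extracted.
-name : use the interval name in column 4 of the BED file as the FASTA header line.
-tab : write each name and its sequence on one line, separated by a tab.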
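The coordinate convention and the `-s` behaviour can be illustrated in pure Python. A BED interval [start, end) is simply a slice of the chromosome string, and `-s` reverse-complements minus-strand intervals. The function name `getfasta` and the toy sequence are hypothetical, and FASTA file parsing and header formatting are left out.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def getfasta(genome, chrom, start, end, strand="+", stranded=False):
    """Return the sequence of the 0-based, half-open interval [start, end) on `chrom`.
    With stranded=True (the -s behaviour), a '-' strand interval is reverse-complemented."""
    seq = genome[chrom][start:end]
    if stranded and strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


genome = {"chr1": "AAAAAAAACCCCCCCCCCCCCGCTACTGGGG"}  # hypothetical toy sequence
print(getfasta(genome, "chr1", 5, 10))                             # AAACC
print(getfasta(genome, "chr1", 5, 10, strand="-", stranded=True))  # GGTTT
```

Without `stranded=True` the strand is ignored, which mirrors the "By default, strand information is ignored" note above.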
29. maskfasta
The opposite of getfasta: masks the given intervals (their bases are replaced with N, as in the example below).

- Usage:
$ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
- example:
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
$ cat test.bed
chr1 5 10
$ bedtools maskfasta -fi test.fa -bed test.bed -fo test.fa.out
$ cat test.fa.out
>chr1
AAAAANNNNNCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
30. nuc
Profile the nucleotide content of the BED intervals in a FASTA file
Summary: Profiles the nucleotide content of intervals in a fasta file.
Usage: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Part 6: BAM focused tools
31. multicov
http://www.itdecent.cn/p/6c3b87301491
For each BED interval provided, multicov reports a separate count of overlapping alignments from each BAM file. It serves a similar purpose to featureCounts or deepTools multiBamSummary.
- Usage:
bedtools multicov [OPTIONS] -bams BAM1 BAM2 BAM3 ... BAMn -bed <BED/GFF/VCF>
- example:
Count the reads aligned to each gene in RNA-seq alignment files.
# Example:
bedtools multicov -bams aln1.bam aln2.bam aln3.bam -bed ivls-of-interest.bed
# ivls-of-interest.bed is required and you may have to create it yourself (a GTF file also works); for example:
chr1 0 10000 ivl1
chr1 10000 20000 ivl2
chr1 20000 30000 ivl3
chr1 30000 40000 ivl4
The first three output columns are the coordinates and the fourth is the interval (gene) name, as in the BED file; the last three columns, appended by multicov, are the counts for the three samples.
chr1 0 10000 ivl1 100 2234 0
chr1 10000 20000 ivl2 123 3245 1000
chr1 20000 30000 ivl3 213 2332 2034
chr1 30000 40000 ivl4 335 7654 0
32. tag
Tag BAM alignments based on overlaps with interval files.
Summary: Annotates a BAM file based on overlaps with multiple BED/GFF/VCF files on the intervals in -i.
Usage: bedtools tag [OPTIONS] -i <BAM> -files FILE1 .. FILEn -labels LAB1 .. LABn
Part 7: Statistical relationships
33. jaccard
Measure the similarity between two datasets
The bedtools tool jaccard computes the Jaccard similarity coefficient
The result is a value from 0.0 to 1.0; the smaller the value, the less similar the two datasets.
# Compare two datasets from the same sample (coefficient 0.50637)
bedtools jaccard \
-a fHeart-DS16621.hotspot.twopass.fdr0.05.merge.bed \
-b fHeart-DS15839.hotspot.twopass.fdr0.05.merge.bed
intersection union-intersection jaccard n_intersections
81269248 160493950 0.50637 130852
# Now compare datasets from different samples (coefficient 0.170995)
bedtools jaccard \
-a fHeart-DS16621.hotspot.twopass.fdr0.05.merge.bed \
-b fSkin_fibro_bicep_R-DS19745.hg19.hotspot.twopass.fdr0.05.merge.bed
intersection union-intersection jaccard n_intersections
28076951 164197278 0.170995 73261
##### Similarity among more samples can also be analyzed
For this, see the official tutorial (http://quinlanlab.org/tutorials/bedtools/bedtools.html) and scroll to the bottom
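The coefficient can be sketched in pure Python as intersection length divided by union length. This assumes the column labelled union-intersection holds |A| + |B| minus the intersection, which is consistent with 81269248 / 160493950 ≈ 0.50637 in the output above. The function names `merge_intervals` and `jaccard` and the toy intervals are hypothetical, and `n_intersections` is not computed.

```python
def merge_intervals(intervals):
    """Sort and merge overlapping or book-ended (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def jaccard(a, b):
    """Return (intersection_bp, union_bp, jaccard), with union_bp = |A| + |B| - intersection
    and lengths measured after merging each set."""
    a, b = merge_intervals(a), merge_intervals(b)
    inter = sum(max(0, min(ea, eb) - max(sa, sb))
                for ca, sa, ea in a for cb, sb, eb in b if ca == cb)
    union = sum(e - s for _, s, e in a) + sum(e - s for _, s, e in b) - inter
    return inter, union, inter / union


print(jaccard([("chr1", 10, 20), ("chr1", 30, 40)], [("chr1", 15, 25)]))  # (5, 25, 0.2)
```

Merging each set first keeps overlapping intervals within a set from being counted twice.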

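34. reldist
Compute the distribution of relative distances between two files

The traditional way to summarize the similarity between two sets of genomic intervals is based on the number or proportion of intersecting intervals. However, such measures are largely blind to spatial correlation between the two sets when intersections are rare despite consistent spacing or proximity. For example, enhancers and transcription start sites rarely overlap, yet they are much closer to each other than two sets of random intervals would be. Favorov et al. [1] proposed a relative distance metric that describes the distribution of relative distances between each interval in one set and the two closest intervals in the other set (see the figure above). If there is no spatial correlation between the two sets, the relative distances are expected to be uniformly distributed between 0 and 0.5. If, however, the intervals tend to be much closer than expected by chance, the distribution of observed relative distances shifts toward lower values (e.g., the figure below).

[1] Exploring Massive, Genome Scale Datasets with the GenometriCorr Package.
Favorov A, Mularoni L, Cope LM, Medvedeva Y, Mironov AA, et al. (2012)
PLoS Comput Biol 8(5): e1002529. doi:10.1371/journal.pcbi.1002529
Usage:
bedtools reldist [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
Example:
$ bedtools reldist \
-a data/refseq.chr1.exons.bed.gz \
-b data/aluY.chr1.bed.gz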
0.00 164 43408 0.004
0.01 551 43408 0.013
0.02 598 43408 0.014
0.03 637 43408 0.015
0.04 793 43408 0.018
0.05 688 43408 0.016
0.06 874 43408 0.020
0.07 765 43408 0.018
0.08 685 43408 0.016
0.09 929 43408 0.021
0.10 876 43408 0.020
0.11 959 43408 0.022
0.12 860 43408 0.020
0.13 851 43408 0.020
0.14 903 43408 0.021
0.15 893 43408 0.021
0.16 883 43408 0.020
0.17 828 43408 0.019
0.18 917 43408 0.021
0.19 875 43408 0.020
0.20 897 43408 0.021
0.21 986 43408 0.023
0.22 903 43408 0.021
0.23 944 43408 0.022
0.24 904 43408 0.021
0.25 867 43408 0.020
0.26 943 43408 0.022
0.27 933 43408 0.021
0.28 1132 43408 0.026
0.29 881 43408 0.020
0.30 851 43408 0.020
0.31 963 43408 0.022
0.32 950 43408 0.022
0.33 965 43408 0.022
0.34 907 43408 0.021
0.35 884 43408 0.020
0.36 965 43408 0.022
0.37 944 43408 0.022
0.38 911 43408 0.021
0.39 939 43408 0.022
0.40 921 43408 0.021
0.41 950 43408 0.022
0.42 935 43408 0.022
0.43 919 43408 0.021
0.44 915 43408 0.021
0.45 934 43408 0.022
0.46 843 43408 0.019
0.47 850 43408 0.020
0.48 1006 43408 0.023
0.49 937 43408 0.022
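The relative-distance formula described above can be sketched in pure Python. This assumes each interval has already been reduced to a single position (for instance its midpoint); the text does not say exactly which position bedtools uses, so treat that as an assumption. For a position p with nearest B positions `left < p <= right`, the value is min(p - left, right - p) / (right - left), which lies in [0, 0.5]. The function name `reldist` is hypothetical, and the binning into 0.01-wide rows seen in the output is not reproduced.

```python
from bisect import bisect_left


def reldist(a_points, b_points):
    """Relative distance of each A position to its two flanking B positions.
    A positions without a B neighbour on both sides are skipped."""
    b_sorted = sorted(b_points)
    out = []
    for p in a_points:
        i = bisect_left(b_sorted, p)  # b_sorted[i - 1] < p <= b_sorted[i]
        if i == 0 or i == len(b_sorted):
            continue
        left, right = b_sorted[i - 1], b_sorted[i]
        out.append(min(p - left, right - p) / (right - left))
    return out


print(reldist([15, 25, 5], [10, 30]))  # [0.25, 0.25]
```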
35. fisher
Perform Fisher's exact test on the overlapping and unique intervals between two files.
Given a pair of input files -a and -b in the usual BedTools parlance:
$ cat a.bed
chr1 10 20
chr1 30 40
chr1 51 52
$ cat b.bed
chr1 15 25
chr1 51 52
And a genome of 500 bases:
$ echo -e "chr1\t500" > t.genome
We may wish to know **if the amount of overlap between the 2 sets of intervals is more than we would expect given their coverage and the size of the genome**. We can do this with `fisher` as:
$ bedtools fisher -a a.bed -b b.bed -g t.genome
# Number of query intervals: 3
# Number of db intervals: 2
# Number of overlaps: 2
# Number of possible intervals (estimated): 37
# phyper(2 - 1, 3, 37 - 3, 2, lower.tail=F)
# Contingency Table Of Counts
#_________________________________________
# | in -b | not in -b |
# in -a | 2 | 1 |
# not in -a | 0 | 34 |
#_________________________________________
# p-values for fisher's exact test
left right two-tail ratio
1 0.0045045 0.0045045 inf
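The `right` p-value can be recomputed from the counts printed above with the hypergeometric upper tail, which is what the `phyper(2 - 1, 3, 37 - 3, 2, lower.tail=F)` line in the output expresses. The pure-Python sketch below uses `math.comb` (Python 3.8+) and a hypothetical `right_tail_p` function. It does not reproduce how bedtools estimates the 37 possible intervals; that number is taken from the output as given.

```python
from math import comb


def right_tail_p(overlap, n_a, n_b, n_total):
    """Hypergeometric upper tail P(X >= overlap): drawing n_b items from n_total, of which
    n_a are 'in -a'. Equivalent to R's phyper(overlap - 1, n_a, n_total - n_a, n_b,
    lower.tail=F)."""
    total = comb(n_total, n_b)
    return sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(overlap, min(n_a, n_b) + 1)) / total


# 2 overlaps, 3 query intervals, 2 db intervals, 37 possible intervals (from the output above).
print(round(right_tail_p(2, 3, 2, 37), 7))  # 0.0045045
```

Here P(X >= 2) = C(3,2) / C(37,2) = 3 / 666, which matches the right and two-tail values reported by bedtools.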
Part 8: Miscellaneous tools
36. overlap
Similar to intersect, but it works on a single input in which two intervals have already been combined on each line; options specify which columns to compare.
Usage:
overlap [OPTIONS] -i <input> -cols s1,e1,s2,e2
| Option | Description |
| --- | --- |
| **-i** | Input file. Use “stdin” for pipes. |
| **-cols** | Specify the columns (1-based) for the starts and ends of the features for which you’d like to compute the overlap/distance. The columns must be listed in the following order: *start1,end1,start2,end2* . |
example:
windowBed -a A.bed -b B.bed -w 10
chr1 10 20 A chr1 15 25 B
chr1 10 20 C chr1 25 35 D
## Compare columns 2,3 with columns 6,7
windowBed -a A.bed -b B.bed -w 10 | overlap -i stdin -cols 2,3,6,7
chr1 10 20 A chr1 15 25 B 5
chr1 10 20 C chr1 25 35 D -5
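The appended number in the output above is min(end1, end2) - max(start1, start2). A positive value is the number of overlapping bases, and a negative value is minus the gap between the intervals. The pure-Python sketch below applies that formula to the two windowBed lines, using a hypothetical `overlap` function and the 1-based columns 2,3,6,7 selected with `-cols`.

```python
def overlap(s1, e1, s2, e2):
    """Overlap in bases if positive; minus the distance between the intervals otherwise."""
    return min(e1, e2) - max(s1, s2)


lines = ["chr1\t10\t20\tA\tchr1\t15\t25\tB", "chr1\t10\t20\tC\tchr1\t25\t35\tD"]
for line in lines:
    f = line.split("\t")
    # -cols 2,3,6,7 are 1-based, i.e. list indices 1, 2, 5, 6.
    print(line, overlap(int(f[1]), int(f[2]), int(f[5]), int(f[6])), sep="\t")
```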
37. igv
Generate a script that captures IGV snapshots in batch
Summary: Creates a batch script to create IGV images at each interval defined in a BED/GFF/VCF file.
Usage: bedtools igv [OPTIONS] -i <bed/gff/vcf>
38. links
Create an HTML page of links to UCSC locations
Usage:
linksBed [OPTIONS] -i <BED/GFF/VCF> > <HTML file>
| Option | Description |
| --- | --- |
| **-base** | The “basename” for the UCSC browser. *Default: [http://genome.ucsc.edu](http://genome.ucsc.edu)* |
| **-org** | The organism (e.g. mouse, human). *Default: human* |
| **-db** | The genome build. *Default: hg18* |
example: by default, **linksBed** creates links to the public UCSC Genome Browser; -base, -org and -db point the links to another browser, organism, and genome build:
head -3 genes.bed
chr21 9928613 10012791 uc002yip.1 0 -
chr21 9928613 10012791 uc002yiq.1 0 -
linksBed -i genes.bed -base http://mirror.uni.edu -org mouse -db mm9 > genes.html
39. makewindows
http://www.itdecent.cn/p/7d47d8074bba
Divide given regions into small intervals (bins) of a specified size and step
- Chromosome size file used as the reference:
chrom.size
chr1 100
chr2 150
- Split each chromosome into 20 bp bins
$ bedtools makewindows -g chrom.size -w 20
chr1 0 20
chr1 20 40
chr1 40 60
chr1 60 80
chr1 80 100
chr2 0 20
chr2 20 40
chr2 40 60
chr2 60 80
chr2 80 100
chr2 100 120
chr2 120 140
chr2 140 150
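The tiling in the example can be sketched in pure Python, assuming a genome dict of chromosome lengths. The function name `makewindows` is hypothetical. `w` mirrors the window size and `s` a step (sliding windows when s < w); how this sketch handles the tail with a step is its own choice. The last window is truncated at the chromosome end, which gives the final chr2 140-150 bin above.

```python
def makewindows(genome, w, s=None):
    """Tile each chromosome of `genome` (name -> length) with windows of size w, starting a
    new window every s bp (s defaults to w, giving non-overlapping bins). The last window
    is truncated at the chromosome end."""
    step = s or w
    return [(chrom, start, min(start + w, size))
            for chrom, size in genome.items()
            for start in range(0, size, step)]


for row in makewindows({"chr1": 100, "chr2": 150}, 20):
    print(*row)  # the same 13 windows as in the example above
```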
40. groupby
http://www.itdecent.cn/p/548d370b75a4
Group-wise summarization; not limited to BED files.
Groups rows by one or more columns and applies a function to other columns
Usage:
bedtools groupby [OPTIONS] -i <input> -g <group columns> -c <op. column> -o <operation>

41. expand
Split a column of comma-separated values so that each value gets its own line.
Summary: Replicate lines in a file based on columns of comma-separated values.
Usage: bedtools expand -c [COLS]
Options: -i Input file. Assumes "stdin" if omitted.
-c Specify the column (1-based) that should be summarized.
* Examples:
$ cat test.txt
chr1 10 20 1,2,3 10,20,30
chr1 40 50 4,5,6 40,50,60
$ bedtools expand -i test.txt -c 5
chr1 10 20 1,2,3 10
chr1 10 20 1,2,3 20
chr1 10 20 1,2,3 30
chr1 40 50 4,5,6 40
chr1 40 50 4,5,6 50
chr1 40 50 4,5,6 60
$ bedtools expand -i test.txt -c 4,5
chr1 10 20 1 10
chr1 10 20 2 20
chr1 10 20 3 30
chr1 40 50 4 40
chr1 40 50 5 50
chr1 40 50 6 60
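The row replication in both examples can be sketched in pure Python, assuming rows are tuples of strings and columns are 1-based as with `-c`. The function name `expand` is hypothetical. When several columns are listed, their i-th values are kept together on the same output row, as in the `-c 4,5` example.

```python
def expand(rows, cols):
    """Split the comma-separated values in the given 1-based columns so each value gets
    its own row; other columns are repeated. Listed columns must hold equally many values."""
    out = []
    for row in rows:
        parts = {c: row[c - 1].split(",") for c in cols}
        for i in range(len(parts[cols[0]])):
            new_row = list(row)
            for c in cols:
                new_row[c - 1] = parts[c][i]
            out.append(tuple(new_row))
    return out


rows = [("chr1", "10", "20", "1,2,3", "10,20,30"), ("chr1", "40", "50", "4,5,6", "40,50,60")]
for row in expand(rows, [4, 5]):
    print(*row)  # chr1 10 20 1 10 ... chr1 40 50 6 60
```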
42. split
Split a file into multiple files with equal numbers of records or base pairs
Summary: Split a Bed file.
Usage: bedtools split [OPTIONS] -i <bed> -n number-of-files
Options:
-i|--input (file) BED input file (req'd).
-n|--number (int) Number of files to create (req'd).
-p|--prefix (string) Output BED file prefix.
-a|--algorithm (string) Algorithm used to split data.
* size (default): uses a heuristic algorithm to group records
so all files contain the ~ same number of bases
* simple : route records such that each split file has
approximately equal records (like Unix split).
-h|--help Print help (this screen).
-v|--version Print version.
Note: This programs stores the input BED records in memory.
Summary:
bedtools is very powerful; I will fill in more details as I use it later.
There are surely shortcomings here; corrections in the comments are welcome.
