bedtools can quickly extract, in batch, the sequences of specified regions from a genome.
1. Example:
bedtools getfasta -fi example_genome.fasta -bed example.bed -fo example.fa -name
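| File | Description |
|---|---|
| example_genome.fasta | Genome sequence. |
| example.bed | Regions to extract. The first four columns of the BED file are chromosome, start, end, and name, separated by tabs (\t). To extract several regions, put one region per line. |
| example.fa | Output file for the extracted sequences. |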
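The file layout described in the table can be tried on a tiny made-up data set. The sketch below is only an illustration: the file names `toy_genome.fasta`, `toy.bed`, and `toy.fa`, the sequence, and the region names are all assumptions, and the extraction step runs only if `bedtools` is found on the PATH.

```shell
# Build a toy genome and a four-column BED file (made-up data, for illustration only).
printf '>Chr1\nACGTACGTACGTACGTACGT\n' > toy_genome.fasta
printf 'Chr1\t0\t4\tregion_a\nChr1\t8\t12\tregion_b\n' > toy.bed

# Run the extraction only if bedtools is installed.
if command -v bedtools >/dev/null 2>&1; then
    bedtools getfasta -fi toy_genome.fasta -bed toy.bed -fo toy.fa -name
    cat toy.fa
fi
```

Both regions cover the four bases ACGT in this toy sequence. The exact header text produced by `-name` may differ between bedtools releases, so it is worth checking against the help text of the installed version.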
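To extract 100 bp on each side of a given site, for example Chr1 12345, the entry in the BED file should be Chr1 12345-101 12345+100, that is, Chr1 12244 12445. BED start coordinates are 0-based and end coordinates are exclusive, so this interval covers 1-based positions 12245 to 12445 (201 bp centred on the site).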
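The coordinate arithmetic above can be scripted when there are many sites. The following is a minimal sketch, assuming a tab-separated input with chromosome, 1-based position, and name; `flank_bed` is a hypothetical helper name and not part of bedtools. It clamps the start at 0 but does not check the end against the chromosome length.

```shell
# Convert 1-based sites (chrom, position, name; tab-separated) into BED
# intervals covering the site plus a given number of flanking bases per side.
# flank_bed is a hypothetical helper, not a bedtools command.
flank_bed() {
    flank="$1"
    awk -v f="$flank" 'BEGIN { FS = OFS = "\t" }
    {
        start = $2 - 1 - f          # BED start is 0-based
        if (start < 0) start = 0    # clamp at the chromosome start
        end = $2 + f                # BED end is exclusive, so no -1 here
        print $1, start, end, $3
    }'
}

# Example: 100 bp on each side of Chr1 12345
printf 'Chr1\t12345\tsite1\n' | flank_bed 100
```

The output can be redirected to a file and passed to `bedtools getfasta` through `-bed`.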
2. Installation
With network access, installing bedtools through conda is the quickest and most convenient option. The install command is:
conda install -c bioconda bedtools
On Ubuntu, it can also be installed with apt:
sudo apt-get install bedtools
3. Manual
Tool: bedtools getfasta (aka fastaFromBed)
Version: v2.27.1
Summary: Extract DNA sequences from a fasta file based on feature coordinates.
Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
Options:
-fi Input FASTA file
-fo Output file (opt., default is STDOUT
-bed BED/GFF/VCF file of ranges to extract from -fi
-name Use the name field for the FASTA header
-name+ Use the name field and coordinates for the FASTA header
-split given BED12 fmt., extract and concatenate the sequences
from the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
- Default is FASTA format.
-s Force strandedness. If the feature occupies the antisense,
strand, the sequence will be reverse complemented.
- By default, strand information is ignored.
-fullHeader Use full fasta header.
- By default, only the word before the first space or tab
is used.
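As an illustration of the `-s` and `-tab` options listed above, here is a small sketch on made-up data. The file names and the feature name are assumptions; `-s` relies on the strand given in the sixth BED column, so the toy BED file has six columns. The bedtools call runs only if the program is installed.

```shell
# Toy inputs (made-up data): one feature on the minus strand, in BED6 layout.
printf '>Chr1\nAAAACCCCGGGGTTTT\n' > strand_genome.fasta
printf 'Chr1\t0\t8\tfeat_minus\t0\t-\n' > strand.bed

if command -v bedtools >/dev/null 2>&1; then
    # -s reverse-complements minus-strand features;
    # -tab writes one tab-delimited line per record instead of FASTA.
    bedtools getfasta -fi strand_genome.fasta -bed strand.bed -s -tab
fi
```

The region covers AAAACCCC, so with `-s` the reported sequence should be its reverse complement, GGGGTTTT. The header column may vary with the bedtools version.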