hbctraining-Introduction to ChIP-Seq Lesson 3

Alignment and Filtering


Part 1. Alignment

1. Alignment to Genome

Once we have assessed the quality of the sequence data, we are ready to align the reads to the reference genome. Bowtie2 is a fast and accurate alignment tool that indexes the genome with an FM index, based on the Burrows-Wheeler transform, to keep memory requirements low during alignment. Bowtie2 supports gapped, local and paired-end alignment modes and works best for reads that are at least 50 bp long (for shorter reads, such as smRNA-Seq, use Bowtie1). By default, Bowtie2 performs a global end-to-end read alignment, which is best for quality-trimmed reads. It also has a local alignment mode, which soft-clips poor-quality bases or adapters from untrimmed reads.

2. Bowtie2 Usage

* Creating a Bowtie2 index

A genome index, analogous to the index in the back of a book, is required to perform the Bowtie2 alignment. We can generate the genome index with the following command:

    bowtie2-build <path_to_reference_genome.fa> <prefix_to_name_indexes>

* Commonly used Bowtie2 parameters

    -p: number of processors/cores

    -q: reads are in FASTQ format

    --local: local alignment mode, which performs soft-clipping

    -x: /path/to/genome_index (the index prefix given to bowtie2-build, not just the directory)

    -S: /path/to/output/SAM_file

    -U: file(s) of single-end (unpaired) reads

    -1/-2: files of paired-end reads (mate 1 and mate 2)
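As a rough illustration of how these parameters combine for single-end data, the sketch below assembles one alignment command. The paths, the core count and the choice of --local are assumptions made for the example, not values from the lesson. The script only prints the command so that it can be reviewed before running.

```shell
# Hypothetical paths; replace them with your own.
index_prefix="reference/genome_index"   # prefix that was passed to bowtie2-build
reads="sample.fastq"                    # single-end reads in FASTQ format
out_sam="sample.sam"                    # alignment output in SAM format

# Single-end, local-mode alignment on 2 cores (both are assumptions).
align_cmd="bowtie2 -p 2 -q --local -x $index_prefix -U $reads -S $out_sam"

# Print the command for review; run it yourself once the paths exist.
echo "$align_cmd"
```

For paired-end data, -U would be replaced by -1 and -2 with the two mate files.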

3. Alignment file format: SAM/BAM

to be continued


Part 2. Filtering

An important issue with ChIP-Seq data concerns the inclusion of multi-mapped reads (reads that map to multiple loci on the reference genome). Allowing multi-mapped reads increases the number of usable reads and the sensitivity of peak detection; however, the number of false positives may also increase[1]. Therefore, we need to filter our alignment files so that they contain only uniquely mapping reads, which increases confidence in site discovery and improves reproducibility. Since Bowtie2 has no parameter to keep only uniquely mapping reads, we perform the following steps to generate alignment files containing only the uniquely mapping reads:

1. Convert the alignment file from SAM to BAM with samtools view

Parameters used in this step:

-h: include header in output

-S: input is in SAM format

-b: output BAM format

-o: /path/to/output/file

2. Sort the BAM file by read coordinate (sambamba sort or samtools sort)

The advantage of using sambamba is that an index file is generated along with the newly sorted file. With samtools this would be a two-step process (samtools sort followed by samtools index).
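Steps 1 and 2 can be sketched as the commands below. File names and the thread count are hypothetical, and the script only prints the commands rather than running them, so it works as a checklist even where samtools or sambamba is not installed.

```shell
# Hypothetical file names; adjust them to your own data.
in_sam="sample.sam"
unsorted_bam="sample.unsorted.bam"
sorted_bam="sample.sorted.bam"

# Step 1: convert SAM to BAM, keeping the header (-h).
view_cmd="samtools view -h -S -b -o $unsorted_bam $in_sam"

# Step 2, samtools route: sort by coordinate, then index as a separate step.
sort_cmd="samtools sort -o $sorted_bam $unsorted_bam"
index_cmd="samtools index $sorted_bam"

# Step 2, sambamba route: sorting on 2 threads (an assumption).
sambamba_cmd="sambamba sort -t 2 -o $sorted_bam $unsorted_bam"

# Print the commands for review.
printf '%s\n' "$view_cmd" "$sort_cmd" "$index_cmd" "$sambamba_cmd"
```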

3. Filter to keep only uniquely mapping reads (this will also remove any unmapped reads and duplicates)

We can filter out multimappers using the XS tag, which the Bowtie2 manual describes as follows:

XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

Alternatively, we can filter by MAPQ.

* for sambamba

-t: number of threads / cores

-h: print SAM header before reads

-f: format of output file (default is SAM)

-F: set custom filter - we will be using the filter to remove duplicates, multimappers and unmapped reads.

sambamba view -h -t 2 -f bam -F "[XS] == null and not unmapped and not duplicate" sorted.bam > sort.filter.bam

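* for samtools

Here -q 30 keeps only reads with MAPQ of at least 30, and -f 2 keeps only properly paired reads, so this command applies to paired-end data.

samtools view -Shub -f 2 -q 30 "$sam" | samtools sort - -T "$path/$sample" -o "$filter_bam"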
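To make the two filtering criteria concrete, here is a minimal sketch that applies them to three toy SAM records with awk instead of sambamba or samtools. The read names, flags, MAPQ values and tags are made up for illustration, and the MAPQ cutoff of 30 simply mirrors the samtools command above.

```shell
# Toy SAM content (tab-separated): one header line and three made-up reads.
toy_sam=$(
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'read1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\tAS:i:-3\n'
  printf 'read2\t0\tchr1\t200\t1\t5M\t*\t0\t0\tACGTA\tIIIII\tAS:i:-3\tXS:i:-3\n'
  printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII\tYT:Z:UU\n'
)

# Criterion 1: drop unmapped reads (flag bit 0x4) and reads carrying an XS tag.
unique_by_xs=$(printf '%s\n' "$toy_sam" | awk -F'\t' '
  /^@/ { print; next }            # keep header lines
  int($2 / 4) % 2 == 1 { next }   # flag bit 0x4 set: read is unmapped
  { for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) next; print }
')

# Criterion 2: keep header lines and reads with MAPQ (column 5) of at least 30.
unique_by_mapq=$(printf '%s\n' "$toy_sam" | awk -F'\t' '/^@/ || $5 >= 30')

printf '%s\n' "$unique_by_xs"
printf '%s\n' "$unique_by_mapq"
```

In this toy input both criteria keep only read1: read2 carries an XS tag and a low MAPQ, and read3 is unmapped.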

TO BE CONTINUED

