200904 Circ之旅4.1-Circexplorer2

Circexplorer2

參考文檔:

CIRCexplorer2手冊

tophat的安裝與使用

這個(gè)主要記錄一下使用Circexplorer2的流程

目前已經(jīng)通過之前的步驟獲得了trimmed的序列,并且qc過了,然后bowtie2,bwa,hisat,等等的索引都已經(jīng)做好。

下面參考文檔CIRCexplorer2手冊,里面提供了單端和雙端的align。這兒我使用的是GSE99531

這個(gè)是一個(gè)雙端測序的rna文庫,參數(shù)如下:

Instrument: Illumina HiSeq 4000
Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: PAIRED

但是因?yàn)檫@個(gè)軟件本身默認(rèn)是用tophat處理單端測序所以就記錄一下使用。

Align

單端

From index files (bowtie1_index is the prefix for bowtie1 index files, and bowtie2_index is the prefix for bowtie2 index files):

CIRCexplorer2 align -G hg19_kg.gtf -i bowtie1_index -j bowtie2_index -f RNA_seq.fastq > CIRCexplorer2_align.log

-i -j 應(yīng)該是拿來指定索引文件的,用一個(gè)就行了,gtf應(yīng)該用的是knowgene那個(gè)gtf。

雙端測序

tophat 2.1.0

tophat利用bowtie2建立的索引進(jìn)行比對。這個(gè)來自circexplorer2的手冊。

tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search index fq1 fq2

-o | --output < string > default: ./tophat_out 輸出的文件夾路徑
-p 15 線程數(shù),老生常談
--fusion-search 開啟融合轉(zhuǎn)錄子的比對
--keep-fasta-order 對比對結(jié)果按基因組fasta文件進(jìn)行排序。該參數(shù)會使輸出的SAM/BAM文件和tophat的1.41或以前版本不兼容
--bowtie1 default: bowtie2使用Bowtie1來代替Bowtie2進(jìn)行比對。特別是使用colorspace reads時(shí),因?yàn)橹挥蠦owtie1支持,而Bowtie2不支持。
--no-coverage-search取消以覆蓋度為基礎(chǔ)來搜尋junctions,和下一個(gè)參數(shù)對立,該參數(shù)為默認(rèn)參數(shù)。

我的命令如下:

tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ~/Circ/fqc/SRR5635537_1_val_1.fq.gz ~/Circ/fqc/SRR5635537_2_val_2.fq.gz

tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ~/Circ/fqc/SRR5635538_1_val_1.fq.gz ~/Circ/fqc/SRR5635538_2_val_2.fq.gz

腳本寫法:

ls *.gz |cut -d "_" -f 1 | sort -u |while read id;do
ls lh ${id}_1_val_1.fq.gz  ${id}_2_val_2.fq.gz
tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ${id}_1_val_1.fq.gz  ${id}_2_val_2.fq.gz
done

任務(wù)丟后臺留一些參考吧

linux后臺執(zhí)行命令:&和nohup

linux后臺運(yùn)行&符號、nohup命令、輸出重定向等使用方法

nohup   conmmand   1>nohup37.out 2>error37   &

輸出:

[2020-07-07 18:28:18] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2020-07-07 18:28:18] Checking for Bowtie
          Bowtie version:    2.3.5.1
[2020-07-07 18:28:18] Checking for Bowtie index files (genome)..
[2020-07-07 18:28:18] Checking for reference FASTA file
[2020-07-07 18:28:18] Generating SAM header for /home/dick/Circ/index/bowtie2/hg19/hg19
[2020-07-07 18:28:20] Preparing reads
     left reads: min. length=25, max. length=51, 65118503 kept reads (29794 discarded)
    right reads: min. length=25, max. length=51, 65111529 kept reads (36768 discarded)
[2020-07-07 18:48:44] Mapping left_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 19:23:03] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 19:27:28] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 19:34:15] Mapping right_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 20:07:28] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 20:12:43] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 20:18:47] Searching for junctions via segment mapping
[2020-07-07 21:20:20] Retrieving sequences for splices
[2020-07-07 21:27:10] Indexing splices
Building a LARGE index
[2020-07-08 00:11:23] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-08 00:39:47] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-08 01:01:28] Joining segment hits
[2020-07-08 01:25:55] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-08 01:54:26] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-08 02:18:06] Joining segment hits
[2020-07-08 02:43:22] Reporting output tracks
-----------------------------------------------
[2020-07-08 03:45:32] A summary of the alignment counts can be found in tophat_fusion/37/align_summary.txt
[2020-07-08 03:45:32] Run complete: 09:17:14 elapsed

###########################################################################################
-rw-rw-r-- 1 dick dick 7.0G 7月   8 03:44 accepted_hits.bam
-rw-rw-r-- 1 dick dick  569 7月   8 03:12 align_summary.txt
-rw-rw-r-- 1 dick dick 5.7M 7月   8 03:12 deletions.bed
-rw-rw-r-- 1 dick dick  42M 7月   8 03:12 fusions.out
-rw-rw-r-- 1 dick dick 3.0M 7月   8 03:12 insertions.bed
-rw-rw-r-- 1 dick dick  12M 7月   8 03:12 junctions.bed
drwxrwxr-x 2 dick dick 4.0K 7月   8 03:44 logs
-rw-rw-r-- 1 dick dick  184 7月   7 18:48 prep_reads.info
-rw-rw-r-- 1 dick dick 243M 7月   8 03:45 unmapped.bam

報(bào)錯(cuò)踩坑記錄:

[2020-07-07 09:45:52] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2020-07-07 09:45:52] Checking for Bowtie
          Bowtie version:    2.3.5.1
[2020-07-07 09:45:52] Checking for Bowtie index files (genome)..
[2020-07-07 09:45:52] Checking for reference FASTA file
[2020-07-07 09:45:52] Generating SAM header for /home/dick/Circ/index/bowtie2/hg19/hg19
[2020-07-07 09:45:54] Preparing reads
     left reads: min. length=25, max. length=51, 65118503 kept reads (29794 discarded)
    right reads: min. length=25, max. length=51, 65111529 kept reads (36768 discarded)
[2020-07-07 10:06:11] Mapping left_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 10:52:28] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 10:57:41] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 11:05:37] Mapping right_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 11:46:10] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 11:50:11] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 11:55:02] Searching for junctions via segment mapping
[2020-07-07 12:49:01] Retrieving sequences for splices
[2020-07-07 12:55:46] Indexing splices
Building a LARGE index
[2020-07-07 15:24:33] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-07 15:46:39] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-07 16:10:08] Joining segment hits
[2020-07-07 16:36:08] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-07 17:00:17] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-07 17:22:40] Joining segment hits
[2020-07-07 17:48:31] Reporting output tracks
    [FAILED]
Error running /home/dick/anaconda3/envs/rna/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir tophat_fusion/37/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 --fusion-search --fusion-anchor-length 20 --fusion-min-dist 10000000 --fusion-read-mismatches 2 --fusion-multireads 2 --fusion-multipairs 2 -z gzip -p30 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --sam-header tophat_fusion/37/tmp/hg19_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/home/dick/anaconda3/envs/rna/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/dick/Circ/index/bowtie2/hg19/hg19.fa tophat_fusion/37/junctions.bed tophat_fusion/37/insertions.bed tophat_fusion/37/deletions.bed tophat_fusion/37/fusions.out tophat_fusion/37/tmp/accepted_hits tophat_fusion/37/tmp/left_kept_reads.mapped.bam,tophat_fusion/37/tmp/left_kept_reads.candidates tophat_fusion/37/tmp/left_kept_reads.bam tophat_fusion/37/tmp/right_kept_reads.mapped.bam,tophat_fusion/37/tmp/right_kept_reads.candidates tophat_fusion/37/tmp/right_kept_reads.bam
./SeqAn-1.4.2/seqan/basic/basic_exception.h:236 FAILED!  (Uncaught exception of type St12out_of_range: basic_string::substr)

這個(gè)好像是版本原因?qū)е拢裻ophat換成2.1.0就可以了。

STAR

STAR --chimSegmentMin 10 \
     --runThreadN 30 \
     --genomeDir ~/Circ/index/STAR/hg19 \
     --readFilesCommand zcat \
     --readFilesIn ~/Circ/fqc/SRR5635537_1_val_1.fq ~/Circ/fqc/SRR5635537_2_val_2.fq \
     --outFileNamePrefix ~/Circ/STAR_map/37/SRR5635537_

這兒readFileCommand我一開始寫的gzip后來發(fā)現(xiàn)不行,就還是用它給的參數(shù)吧

回頭記錄一下輸出,這個(gè)速度真的快

STAR --chimSegmentMin 10 \
     --runThreadN 30 \
     --genomeDir ~/Circ/index/STAR/hg38 \
     --readFilesCommand zcat \
     --readFilesIn ~/Circ/fqc/SRR5635537_1_val_1.fq ~/Circ/fqc/SRR5635537_2_val_2.fq \
     --outFileNamePrefix ~/Circ/STAR_map/37/SRR5635537_

parse

CIRCexplorer2 parse -t STAR SRR5635537_Chimeric.out.junction > CIRCexplorer2_parse.log

annotation

CIRCexplorer2 annotate -r hg19_ref_all.txt -g hg19.fa -b back_spliced_junction.bed -o circularRNA_known.txt > CIRCexplorer2_annotate.log

這個(gè) hg19_ref_all.txt

fetch_ucsc.py hg19 ref hg19_ref.txt
fetch_ucsc.py hg19 kg hg19_kg.txt
fetch_ucsc.py hg19 ens hg19_ens.txt
cat hg19_ref.txt hg19_kg.txt hg19_ens.txt > hg19_ref_all.txt

轉(zhuǎn)gtf

cut -f2-11 hg19_ref.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ref.gtf
# or
cut -f2-11 hg19_kg.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_kg.gtf
# or
cut -f2-11 hg19_ens.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ens.gtf
cut -f2-11 hg19_ref_all.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ref_all.gtf

annotation

Annotating for Circular RNAs
CIRCexplorer2 annotate -r ~/Circ/index/ref/hg19_ref_all.txt -g ~/Circ/index/raw/hg19/hg19.fa -b back_spliced_junction.bed -o circularRNA_known.txt > CIRCexplorer2_annotate.log
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

友情鏈接更多精彩內(nèi)容