參考文獻
[PMID: 27694136]
DRETools - F1000Research
數(shù)據(jù)下載
使用本實驗室一個RNA-seq(GSE56087)和一個circRNA-seq(未發(fā)布)的數(shù)據(jù)作為discovery dataset 和 validation dataset。
RNA-seq數(shù)據(jù)下載
shell命令批量下載
#!/bin/sh
#調(diào)用prefetch下載SRA數(shù)據(jù)庫的測序文件,并用pfastq-dump轉(zhuǎn)化為fastq文件。
#需要在當(dāng)前目錄的download_list.txt文件中把SRA編號,每個一行事先準備好,具體格式參考example.txt
#建議使用NCBI的SRA RUN SELECTOR直接導(dǎo)出。
read -p "請輸入下載列表文件名:" downloadlist
read -p "請選擇單端(SE),雙端(PE)測序:" class
read -p "請輸入要使用的線程數(shù):" thread
if [ $class = "SE" ]; then
cat $downloadlist | while read line
do
echo $line
prefetch $line -o ./${line}.sra
pfastq-dump --threads $thread --outdir ./ ./${line}.sra
done
elif [ $class = "PE" ]; then
cat $downloadlist | while read line
do
echo $line
prefetch $line -o ./${line}.sra
pfastq-dump --split-3 --threads $thread --outdir ./ ./${line}.sra
done
else echo "錯誤的輸入!"
fi
相應(yīng)的accession number
SRR1200850
SRR1200851
SRR1200852
SRR1200853
SRR1200854
SRR1200855
SRR1200856
SRR1200857
SRR1200860
SRR1200861
SRR1200858
SRR1200859
SRR1200862
SRR1200863
SRR1200864
SRR1200865
SRR1200866
SRR1200867
SRR1200868
SRR1200869
SRR1200870
SRR1200871
SRR1200872
SRR1200873
SRR1200874
SRR1200875
SRR1200876
SRR1200877
SRR1200878
SRR1200879
SRR1200880
SRR1200881
SRR1200882
SRR1200883
SRR1200884
SRR1200885
數(shù)據(jù)分析
主要使用RNAEditor軟件分析,環(huán)境配置按照官網(wǎng)的配置,注意相應(yīng)軟件的版本,命令需要在RNAEditor目錄下運行,并且要求不能在SSH界面下運行,會報錯(無法畫圖),需要在圖形界面下打開終端運行。
shell批量分析
for ((i=1200851;i<1200886;i++))
do
fastp -w 16 -i ~/RNASEQ/EC_lina/SRR${i}_1.fastq -I ~/RNASEQ/EC_lina/SRR${i}_2.fastq -o ~/RNASEQ/EC_lina/SRR${i}_1_fastpedited.fastq -O ~/RNASEQ/EC_lina/SRR${i}_2_fastpedited.fastq
python RNAEditor.py -i ~/RNASEQ/EC_lina/SRR${i}_1_fastpedited.fastq ~/RNASEQ/EC_lina/SRR${i}_2_fastpedited.fastq -c configuration.txt
done
配置文件configuration.txt如下
#This file is used to configure the behaviour of RNAeditor
#Standard input files
refGenome = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
gtfFile = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.83.gtf
dbSNP = /home/zhou/rnaEditor_annotations/human/GRCH38/dbSNP.vcf
hapmap = /home/zhou/rnaEditor_annotations/human/GRCH38/HAPMAP.vcf
omni = /home/zhou/rnaEditor_annotations/human/GRCH38/1000GenomeProject.vcf
esp = /home/zhou/rnaEditor_annotations/human/GRCH38/ESP_filtered
aluRegions = /home/zhou/rnaEditor_annotations/human/GRCH38/repeats.bed
output = default
sourceDir = /usr/local/bin/
maxDiff = 0.04
seedDiff = 2
standCall = 0
standEmit = 0
edgeDistance = 3
paired = True
keepTemp = True
overwrite = False
threads = 23
開23個核,每個樣本分析時間大概在十個小時左右。
結(jié)果解析
結(jié)果實例

image.png
VCF文件
VCF文件包含所有的編輯站點以供進一步分析.

image.png
GCF文件
GCF文件保存了每個編輯站點的附加信息,如基因名稱、片段、總讀數(shù)、編輯讀數(shù)和編輯比。

image.png
summary文件
summary文件顯示每個基因的RNA編輯所處基因位置(如3‘UTR,5‘UTR)的數(shù)量。

image.png
bed文件
Editing island的bed文件,可供后續(xù)可視化等分析。

image.png
打包除了bam,sam,sai這兩種文件以外的所有結(jié)果
tar -cvf RNAEditor_1.tar ~/RNASEQ/EC_lina/rnaEditor/ --exclude=*bam --exclude=*sam --exclude=*sai
pigz RNAEditor_1.tar
circRNA-seq數(shù)據(jù)集的驗證工作
shell批量分析
#!/bin/sh
#把fastq文件路徑存入數(shù)組
c=0
for file in `find ~/RNASEQ/CircularRNA-seq/rawdata/ -name *fastq.gz -print`
do
filelist[$c]=$file
((c++))
done
for ((i=0;i<${#filelist[@]};i=i+2))
do
tmp1=$(echo ${filelist[$i]}|sed 's/\.fastq\.gz/\_fastpedited\.fastq/g')
tmp2=$(echo ${filelist[$i+1]}|sed 's/\.fastq\.gz/\_fastpedited\.fastq/g')
#fastp預(yù)處理
fastp -w 16 -i ${filelist[$i]} -I ${filelist[$i+1]} -o $tmp1 -O $tmp2
#pigz -d $tmp1
#pigz -d $tmp2
python RNAEditor.py -i $tmp1 $tmp2 -c configuration.txt
done
configuration.txt不變
打包除了bam,sam,sai這兩種文件以外的所有結(jié)果
tar -cvf RNAEditor_2.tar ~/RNASEQ/CircularRNA-seq/rawdata/Sample_ZYF-*/rnaEditor --exclude=*bam --exclude=*sam --exclude=*sai
pigz RNAEditor_2.tar
后續(xù)分析
DREtools差異分析
#Merge editing sites from multiple samples
dretools edsite-merge --min-editing 3 --min-coverage 5 --min-samples 3 --vcf ./*/*.editingSites.vcf > DRE/consensus_sites.vcf
#Calculate sample-wise EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ;
do
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');
dretools sample-epk --name $name --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.sample_epk.tsv;
done
#Calculate site-wise EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ;
do
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');
dretools edsite-epk --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.edsite_epk.tsv;
done
#Find differentially edited editing sites
dretools edsite-diff --max-depth-cov 5.0 --min-depth 2 \
--names normal,tumor \
--sites ./DRE/consensus_sites.vcf \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv \
--site-epk ./DRE/SRR1200850.edsite_epk.tsv,./DRE/SRR1200851.edsite_epk.tsv,./DRE/SRR1200854.edsite_epk.tsv,./DRE/SRR1200855.edsite_epk.tsv,./DRE/SRR1200858.edsite_epk.tsv,./DRE/SRR1200859.edsite_epk.tsv,./DRE/SRR1200862.edsite_epk.tsv,./DRE/SRR1200863.edsite_epk.tsv,./DRE/SRR1200866.edsite_epk.tsv,./DRE/SRR1200867.edsite_epk.tsv,./DRE/SRR1200872.edsite_epk.tsv,./DRE/SRR1200873.edsite_epk.tsv,./DRE/SRR1200874.edsite_epk.tsv,./DRE/SRR1200875.edsite_epk.tsv,./DRE/SRR1200878.edsite_epk.tsv,./DRE/SRR1200879.edsite_epk.tsv,./DRE/SRR1200882.edsite_epk.tsv,./DRE/SRR1200883.edsite_epk.tsv \
./DRE/SRR1200852.edsite_epk.tsv,./DRE/SRR1200853.edsite_epk.tsv,./DRE/SRR1200856.edsite_epk.tsv,./DRE/SRR1200857.edsite_epk.tsv,./DRE/SRR1200860.edsite_epk.tsv,./DRE/SRR1200861.edsite_epk.tsv,./DRE/SRR1200864.edsite_epk.tsv,./DRE/SRR1200865.edsite_epk.tsv,./DRE/SRR1200868.edsite_epk.tsv,./DRE/SRR1200869.edsite_epk.tsv,./DRE/SRR1200870.edsite_epk.tsv,./DRE/SRR1200871.edsite_epk.tsv,./DRE/SRR1200876.edsite_epk.tsv,./DRE/SRR1200877.edsite_epk.tsv,./DRE/SRR1200880.edsite_epk.tsv,./DRE/SRR1200881.edsite_epk.tsv,./DRE/SRR1200884.edsite_epk.tsv,./DRE/SRR1200885.edsite_epk.tsv \
> ./DRE/diff_sites.tsv
#Detect editing islands
dretools find-islands \
--min-editing 3 \
--min-coverage 5 \
--min-length 20 \
--min-points 5 \
--epsilon 50 \
--vcf ./*/*.editingSites.vcf > ./DRE/islands.bed
#Calculate Island-EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ;
do
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');
dretools region-epk --regions ./DRE/islands.bed --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.island_epk.tsv;
done
#Find differentially edited islands
dretools region-diff --regions ./DRE/islands.bed --min-area 1 --min-depth 1 \
--names normal,tumor \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv \
--region-epk ./DRE/SRR1200850.island_epk.tsv,./DRE/SRR1200851.island_epk.tsv,./DRE/SRR1200854.island_epk.tsv,./DRE/SRR1200855.island_epk.tsv,./DRE/SRR1200858.island_epk.tsv,./DRE/SRR1200859.island_epk.tsv,./DRE/SRR1200862.island_epk.tsv,./DRE/SRR1200863.island_epk.tsv,./DRE/SRR1200866.island_epk.tsv,./DRE/SRR1200867.island_epk.tsv,./DRE/SRR1200872.island_epk.tsv,./DRE/SRR1200873.island_epk.tsv,./DRE/SRR1200874.island_epk.tsv,./DRE/SRR1200875.island_epk.tsv,./DRE/SRR1200878.island_epk.tsv,./DRE/SRR1200879.island_epk.tsv,./DRE/SRR1200882.island_epk.tsv,./DRE/SRR1200883.island_epk.tsv \
./DRE/SRR1200852.island_epk.tsv,./DRE/SRR1200853.island_epk.tsv,./DRE/SRR1200856.island_epk.tsv,./DRE/SRR1200857.island_epk.tsv,./DRE/SRR1200860.island_epk.tsv,./DRE/SRR1200861.island_epk.tsv,./DRE/SRR1200864.island_epk.tsv,./DRE/SRR1200865.island_epk.tsv,./DRE/SRR1200868.island_epk.tsv,./DRE/SRR1200869.island_epk.tsv,./DRE/SRR1200870.island_epk.tsv,./DRE/SRR1200871.island_epk.tsv,./DRE/SRR1200876.island_epk.tsv,./DRE/SRR1200877.island_epk.tsv,./DRE/SRR1200880.island_epk.tsv,./DRE/SRR1200881.island_epk.tsv,./DRE/SRR1200884.island_epk.tsv,./DRE/SRR1200885.island_epk.tsv \
> ./DRE/diff_islands.tsv
DREtools結(jié)果展示

image.png
MET-DB交集