子宮內(nèi)膜癌的RNA editing數(shù)據(jù)分析

參考文獻

[PMID: 27694136]
DRETools - F1000Research

數(shù)據(jù)下載

使用本實驗室一個RNA-seq(GSE56087)和一個circRNA-seq(未發(fā)布)的數(shù)據(jù)作為discovery dataset 和 validation dataset。

RNA-seq數(shù)據(jù)下載

shell命令批量下載

#!/bin/sh
#調(diào)用prefetch下載SRA數(shù)據(jù)庫的測序文件,并用pfastq-dump轉(zhuǎn)化為fastq文件。
#需要在當(dāng)前目錄的download_list.txt文件中把SRA編號,每個一行事先準備好,具體格式參考example.txt
#建議使用NCBI的SRA RUN SELECTOR直接導(dǎo)出。
read  -p "請輸入下載列表文件名:" downloadlist
read  -p "請選擇單端(SE),雙端(PE)測序:" class
read  -p "請輸入要使用的線程數(shù):" thread
if [ $class = "SE" ]; then
    cat $downloadlist | while read line
  do
    echo $line 
    prefetch $line -o ./${line}.sra
    pfastq-dump --threads $thread --outdir ./ ./${line}.sra
  done
elif [ $class = "PE" ]; then
    cat $downloadlist | while read line
  do
    echo $line
    prefetch $line -o ./${line}.sra
    pfastq-dump --split-3 --threads $thread --outdir ./ ./${line}.sra 
  done
else echo "錯誤的輸入!"
fi

相應(yīng)的accession number

SRR1200850
SRR1200851
SRR1200852
SRR1200853
SRR1200854
SRR1200855
SRR1200856
SRR1200857
SRR1200860
SRR1200861
SRR1200858
SRR1200859
SRR1200862
SRR1200863
SRR1200864
SRR1200865
SRR1200866
SRR1200867
SRR1200868
SRR1200869
SRR1200870
SRR1200871
SRR1200872
SRR1200873
SRR1200874
SRR1200875
SRR1200876
SRR1200877
SRR1200878
SRR1200879
SRR1200880
SRR1200881
SRR1200882
SRR1200883
SRR1200884
SRR1200885

數(shù)據(jù)分析

主要使用RNAEditor軟件分析,環(huán)境配置按照官網(wǎng)的配置,注意相應(yīng)軟件的版本,命令需要在RNAEditor目錄下運行,并且要求不能在SSH界面下運行,會報錯(無法畫圖),需要在圖形界面下打開終端運行。
shell批量分析

for ((i=1200851;i<1200886;i++))
do
fastp -w 16 -i ~/RNASEQ/EC_lina/SRR${i}_1.fastq -I ~/RNASEQ/EC_lina/SRR${i}_2.fastq -o ~/RNASEQ/EC_lina/SRR${i}_1_fastpedited.fastq -O ~/RNASEQ/EC_lina/SRR${i}_2_fastpedited.fastq 
python RNAEditor.py -i ~/RNASEQ/EC_lina/SRR${i}_1_fastpedited.fastq ~/RNASEQ/EC_lina/SRR${i}_2_fastpedited.fastq -c configuration.txt
done

配置文件configuration.txt如下

#This file is used to configure the behaviour of RNAeditor

#Standard input files
refGenome = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
gtfFile = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.83.gtf
dbSNP = /home/zhou/rnaEditor_annotations/human/GRCH38/dbSNP.vcf
hapmap = /home/zhou/rnaEditor_annotations/human/GRCH38/HAPMAP.vcf
omni = /home/zhou/rnaEditor_annotations/human/GRCH38/1000GenomeProject.vcf
esp = /home/zhou/rnaEditor_annotations/human/GRCH38/ESP_filtered
aluRegions = /home/zhou/rnaEditor_annotations/human/GRCH38/repeats.bed
output = default
sourceDir = /usr/local/bin/
maxDiff = 0.04
seedDiff = 2
standCall = 0
standEmit = 0
edgeDistance = 3
paired = True 
keepTemp = True
overwrite = False
threads = 23

開23個核,每個樣本分析時間大概在十個小時左右。

結(jié)果解析

結(jié)果實例

image.png

VCF文件

VCF文件包含所有的編輯站點以供進一步分析.


image.png

GCF文件

GCF文件保存了每個編輯站點的附加信息,如基因名稱、片段、總讀數(shù)、編輯讀數(shù)和編輯比。


image.png

summary文件

summary文件顯示每個基因的RNA編輯所處基因位置(如3‘UTR,5‘UTR)的數(shù)量。


image.png

bed文件

Editing island的bed文件,可供后續(xù)可視化等分析。


image.png

打包除了bam,sam,sai這兩種文件以外的所有結(jié)果

tar -cvf RNAEditor_1.tar ~/RNASEQ/EC_lina/rnaEditor/ --exclude=*bam --exclude=*sam --exclude=*sai
pigz RNAEditor_1.tar

circRNA-seq數(shù)據(jù)集的驗證工作

shell批量分析

#!/bin/sh
#把fastq文件路徑存入數(shù)組
c=0
for file in `find ~/RNASEQ/CircularRNA-seq/rawdata/ -name *fastq.gz -print`
do
  filelist[$c]=$file
  ((c++))
done

for ((i=0;i<${#filelist[@]};i=i+2))
    do
        tmp1=$(echo ${filelist[$i]}|sed 's/\.fastq\.gz/\_fastpedited\.fastq/g')
        tmp2=$(echo ${filelist[$i+1]}|sed 's/\.fastq\.gz/\_fastpedited\.fastq/g')
    #fastp預(yù)處理
        fastp -w 16 -i ${filelist[$i]} -I ${filelist[$i+1]} -o $tmp1 -O $tmp2 
        #pigz -d $tmp1
        #pigz -d $tmp2
        python RNAEditor.py -i $tmp1 $tmp2 -c configuration.txt
    done

configuration.txt不變

打包除了bam,sam,sai這兩種文件以外的所有結(jié)果

tar -cvf RNAEditor_2.tar ~/RNASEQ/CircularRNA-seq/rawdata/Sample_ZYF-*/rnaEditor --exclude=*bam --exclude=*sam --exclude=*sai
pigz RNAEditor_2.tar

后續(xù)分析

DREtools差異分析

#Merge editing sites from multiple samples
dretools edsite-merge     --min-editing  3      --min-coverage 5      --min-samples  3      --vcf ./*/*.editingSites.vcf > DRE/consensus_sites.vcf
#Calculate sample-wise EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ; 
do  
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');  
dretools sample-epk --name $name --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.sample_epk.tsv; 
done
#Calculate site-wise EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ; 
do 
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');  
dretools edsite-epk --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.edsite_epk.tsv; 
done
#Find differentially edited editing sites
dretools edsite-diff                    --max-depth-cov 5.0                 --min-depth 2       \
--names normal,tumor \
--sites ./DRE/consensus_sites.vcf \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv  \
--site-epk ./DRE/SRR1200850.edsite_epk.tsv,./DRE/SRR1200851.edsite_epk.tsv,./DRE/SRR1200854.edsite_epk.tsv,./DRE/SRR1200855.edsite_epk.tsv,./DRE/SRR1200858.edsite_epk.tsv,./DRE/SRR1200859.edsite_epk.tsv,./DRE/SRR1200862.edsite_epk.tsv,./DRE/SRR1200863.edsite_epk.tsv,./DRE/SRR1200866.edsite_epk.tsv,./DRE/SRR1200867.edsite_epk.tsv,./DRE/SRR1200872.edsite_epk.tsv,./DRE/SRR1200873.edsite_epk.tsv,./DRE/SRR1200874.edsite_epk.tsv,./DRE/SRR1200875.edsite_epk.tsv,./DRE/SRR1200878.edsite_epk.tsv,./DRE/SRR1200879.edsite_epk.tsv,./DRE/SRR1200882.edsite_epk.tsv,./DRE/SRR1200883.edsite_epk.tsv \
./DRE/SRR1200852.edsite_epk.tsv,./DRE/SRR1200853.edsite_epk.tsv,./DRE/SRR1200856.edsite_epk.tsv,./DRE/SRR1200857.edsite_epk.tsv,./DRE/SRR1200860.edsite_epk.tsv,./DRE/SRR1200861.edsite_epk.tsv,./DRE/SRR1200864.edsite_epk.tsv,./DRE/SRR1200865.edsite_epk.tsv,./DRE/SRR1200868.edsite_epk.tsv,./DRE/SRR1200869.edsite_epk.tsv,./DRE/SRR1200870.edsite_epk.tsv,./DRE/SRR1200871.edsite_epk.tsv,./DRE/SRR1200876.edsite_epk.tsv,./DRE/SRR1200877.edsite_epk.tsv,./DRE/SRR1200880.edsite_epk.tsv,./DRE/SRR1200881.edsite_epk.tsv,./DRE/SRR1200884.edsite_epk.tsv,./DRE/SRR1200885.edsite_epk.tsv  \
> ./DRE/diff_sites.tsv
#Detect editing islands
dretools find-islands \
    --min-editing 3   \
    --min-coverage 5  \
    --min-length 20   \
    --min-points 5    \
    --epsilon 50      \
    --vcf ./*/*.editingSites.vcf  > ./DRE/islands.bed
#Calculate Island-EPK
for i in `find ./SRR*/ -name *fastpedited.bam -print` ; 
do  
name=$(echo ${i}|sed 's/_1.*bam//'|sed 's/\.\///');  
dretools region-epk --regions ./DRE/islands.bed --vcf ./DRE/consensus_sites.vcf --alignment $i > ./DRE/${name}.island_epk.tsv; 
done
#Find differentially edited islands
dretools region-diff         --regions  ./DRE/islands.bed           --min-area  1                 --min-depth 1       \
--names normal,tumor \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv  \
--region-epk  ./DRE/SRR1200850.island_epk.tsv,./DRE/SRR1200851.island_epk.tsv,./DRE/SRR1200854.island_epk.tsv,./DRE/SRR1200855.island_epk.tsv,./DRE/SRR1200858.island_epk.tsv,./DRE/SRR1200859.island_epk.tsv,./DRE/SRR1200862.island_epk.tsv,./DRE/SRR1200863.island_epk.tsv,./DRE/SRR1200866.island_epk.tsv,./DRE/SRR1200867.island_epk.tsv,./DRE/SRR1200872.island_epk.tsv,./DRE/SRR1200873.island_epk.tsv,./DRE/SRR1200874.island_epk.tsv,./DRE/SRR1200875.island_epk.tsv,./DRE/SRR1200878.island_epk.tsv,./DRE/SRR1200879.island_epk.tsv,./DRE/SRR1200882.island_epk.tsv,./DRE/SRR1200883.island_epk.tsv \
./DRE/SRR1200852.island_epk.tsv,./DRE/SRR1200853.island_epk.tsv,./DRE/SRR1200856.island_epk.tsv,./DRE/SRR1200857.island_epk.tsv,./DRE/SRR1200860.island_epk.tsv,./DRE/SRR1200861.island_epk.tsv,./DRE/SRR1200864.island_epk.tsv,./DRE/SRR1200865.island_epk.tsv,./DRE/SRR1200868.island_epk.tsv,./DRE/SRR1200869.island_epk.tsv,./DRE/SRR1200870.island_epk.tsv,./DRE/SRR1200871.island_epk.tsv,./DRE/SRR1200876.island_epk.tsv,./DRE/SRR1200877.island_epk.tsv,./DRE/SRR1200880.island_epk.tsv,./DRE/SRR1200881.island_epk.tsv,./DRE/SRR1200884.island_epk.tsv,./DRE/SRR1200885.island_epk.tsv  \
> ./DRE/diff_islands.tsv

DREtools結(jié)果展示

image.png

MET-DB交集

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容