參考文獻(xiàn):
Pertea M, Kim D,Pertea G M, et al. Transcript-level expression analysis of RNA-seq experimentswith HISAT, StringTie and Ballgown.[J]. Nature Protocols, 2016, 11(9):1650.
1.clean reads質(zhì)量檢驗(yàn):
用fastqc檢驗(yàn)clean reads質(zhì)量,代碼如下:
fastqc -o 【待測(cè)序列所在目錄】--extract -f fastq *.fq.gz(序列名稱)
質(zhì)檢結(jié)果:
所有序列的Kmer Content均為FAIL,Per Base Sequence Content、Per Tile sequence quality、sequence duplication levels至少有一個(gè)WARN。
Per Base Sequence Content圖中四條線交織,表明存在overrepresented sequence:
猜測(cè):轉(zhuǎn)錄組中,表達(dá)量高的reads被識(shí)別為overrepresented sequence,屬于正?,F(xiàn)象。
參考:http://www.cnblogs.com/longjianggu/p/5078782.html
2.制作索引(indexes):
在GENOSCOPE下載油菜基因組序列文件(.fa.gz)和基因組注釋文件(.gff3.gz):http://www.genoscope.cns.fr/brassicanapus/data/
hisat2-build /genome/GCF_000686985.1_Brassica_napus_assembly_v1.0_genomic.fna(基因組序列文件目錄) brassica_tran(索引名稱及輸出目錄)
3.序列比對(duì)(alignment):
原始代碼:
hisat2?-p(線程數(shù)) 16?--dta?-x?indexes/brassica_tran(索引文件目錄) -1?/data/CleanData/$reads1(reads1目錄)?-2?/data/CleanData/$reads2(reads2目錄)?-S?/hisat2_results/$【sample?name】.sam(輸出文件名稱及目錄)
每個(gè)樣本都需通過(guò)上述代碼進(jìn)行比對(duì),封裝成BASH腳本運(yùn)行:
vi alignment(新建空白文檔alignment,以下代碼在vi中輸入)
#! /bin/bash
# run some hisat2 alignments
while read line
do
??????? reads1=${line}_1.fq.gz
??????? reads2=${line}_2.fq.gz
??????? hisat2 -p 16 --dta -x /indexes/brassica_tran -1/data/CleanData/$reads1 -2 /data/CleanData/$reads2 -S/hisat2_results/${line}.sam
done </files/samples.txt(將樣本名稱按行寫在位于/files目錄下的samples.txt文件中)
<Esc>:wq保存退出
chmod +x alignment(賦予可執(zhí)行權(quán)限)
nohup ./alignment & 后臺(tái)運(yùn)行腳本,輸出結(jié)果將儲(chǔ)存在生成的nohup.log文件中
4.samtools排序及格式轉(zhuǎn)換
原始代碼:
samtools sort -@ 16 -o /data/bamfiles/【sample name】.bam(bam文件名稱及輸出目錄) /data/hisat2_results/【sample name】.sam(sam文件名稱及輸出目錄)
封裝成BASH腳本:
#! /bin/bash
# samtools sort
while read line
do
?????? samtools sort -@ 16 -o/data/bamfiles/${line}.bam /data/hisat2_results/${line}.sam
done < /files/samples.txt
5.序列初組裝(assembly)
原始代碼:
stringtie -p 16 -G /genes/brassica.gff(基因組注釋文件) -o /data/transcripts/【sample name】.gtf(輸出,比對(duì)結(jié)果,gtf文件) -l 【sample name】(命名規(guī)則)? /data/bamfiles/【sample name】.bam(輸入,bam文件所在目錄)
封裝成BASH腳本:
#! /bin/bash
# stringtie 1st step
while read line
do
?????? stringtie -p 16 -G/genes/brassica.gff -o /data/transcripts/${line}.gtf -l ${line}/data/bamfiles/${line}.bam
done < /files/samples.txt
6.Merge
stringtie --merge -p 16 -G /genes/brassica.gff -o stringtie_merged.gtf/data/transcripts/mergelist.txt
gffcompare檢測(cè)組裝結(jié)果:
gffcompare -r /genes/brassica.gff -G -o merged stringtie_merged.gtf
7.計(jì)算表達(dá)量并輸出成ballgown格式
原始代碼:
stringtie -e -B -p 16 -G /data/transcripts/stringtie_merged.gtf(用merge生成的gtf文件代替基因組注釋) -o /data/ballgown/$【sample name】/$【sample name】.gtf(輸出為ballgown所需的輸入格式) /data/bamfiles/【sample name】.bam(輸入,bam文件)
封裝成BASH腳本:
#! /bin/bash
# stringtie 3rd step
while read line
do
??????? stringtie -e -B -p 16 -G/data/transcripts/stringtie_merged.gtf -o /data/ballgown/${line}/${line}.gtf/data/bamfiles/${line}.bam
done < /files/samples.txt
8.ballgown分析差異表達(dá)基因(在R中進(jìn)行)
>install.packages(‘devtools’)
>source('http://www.bioconductor.org/biocLite.R')
>biocLite(c(’ballgown’, 'genefilter’,'dplyr’,'devtools’))
>library(ballgown)
>library(genefilter)
>library(dplyr)
>library(devtools)
>file <- ballgown(dataDir='/data/ballgown',samplePattern=【sample名前綴】)輸入基因表達(dá)數(shù)據(jù)
>samplesNames(file)查看ballgown文件中各樣本的排列順序
>pData(file) <- read.csv(‘/data/geuvadis_phenodata.csv’)(導(dǎo)入自制的表型數(shù)據(jù))
>filefilt <-subset(file,'rowVars(texpr(file))>1',genomesubset=TRUE)(過(guò)濾掉表達(dá)差異較小的基因)
>diff_genes <- stattest(filefilt,feature='gene',covariate=【自變量】,adjustvars=【無(wú)關(guān)變量】,meas='FPKM')(統(tǒng)計(jì)差異表達(dá)的基因)
將差異表達(dá)基因按pval從小到大排序,并寫入csv文件:
>diff_genes <- arrange(diff_genes,pval)
>write.csv(diff_genes,'/data/diff_genes',row.names=FALSE)
————————————————
版權(quán)聲明:本文為CSDN博主「ygyxl」的原創(chuàng)文章,遵循CC 4.0 BY-SA版權(quán)協(xié)議,轉(zhuǎn)載請(qǐng)附上原文出處鏈接及本聲明。
原文鏈接:https://blog.csdn.net/ygyxl/article/details/74375031
步均做完,不知道第8步.ballgown分析差異表達(dá)基因(在R中進(jìn)行)怎么辦了,一直報(bào)錯(cuò)。比如>install.packages(‘devtools’)
報(bào)錯(cuò)如下
