Protocol for RNA-seq data analysis and identification of target genes

1. Rice is used as the example species.
2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members).

Before you start:

  1. Create a root directory to store all future data.
  2. Create a subdirectory and download the reference genome and its annotations.
  3. Build a genome index with your preferred alignment software (HISAT2 is used here).
  4. Create other subdirectories for different kinds of data, such as raw data, matrices and scripts.
code:
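$ mkdir Drought_stress                                   # root directory for all project data
$ mkdir Drought_stress/Rice && cd Drought_stress/Rice    # one subdirectory per species
$ mkdir data matrix homology olddata reference src_rice  # subdirectories for data, results and scripts
$ mkdir reference/IRGSP && cd reference/IRGSP            # reference genome: IRGSP-1.0, Ensembl Plants release 47
# Download the genome sequence (FASTA) and the annotations (GTF and GFF3), then decompress them
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gtf.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gff3.gz
$ gunzip *.gz
# Build the HISAT2 index with 8 threads; index files are written with the prefix hsindex/IRGSP
$ module load Anaconda3 hisat2
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP
$ module unload Anaconda3 hisat2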
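Before moving on, it can help to confirm that the index build actually finished. The bash function below is a minimal sketch; check_hisat2_index is a hypothetical helper, not part of HISAT2. It assumes the standard small-index layout, in which hisat2-build writes eight files named <prefix>.1.ht2 to <prefix>.8.ht2 (very large genomes get .ht2l files instead, which this check does not cover).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of HISAT2: succeeds only if all eight files of a
# small HISAT2 index (<prefix>.1.ht2 ... <prefix>.8.ht2) exist and are non-empty.
check_hisat2_index() {
  local prefix=$1 i
  for i in 1 2 3 4 5 6 7 8; do
    if [[ ! -s "${prefix}.${i}.ht2" ]]; then
      echo "missing or empty: ${prefix}.${i}.ht2" >&2
      return 1
    fi
  done
  echo "HISAT2 index looks complete: ${prefix}"
}
```

For example, from reference/IRGSP after sourcing the function in bash: check_hisat2_index hsindex/IRGSP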



Workflow:

Steps 1-3 run on the server, steps 4-7 on a personal computer, and steps 8-9 on the server again.
  1. Find BioProjects that match the conditions of interest (drought stress, roots, etc.).
  2. Create samplelist.txt in the data subdirectory and enter the SRA accession numbers to download.
  3. Run the workflow script: nohup sh RNAseq_workflow.sh &
code:
$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt  # then enter the SRA accession numbers to download
$ cd ../src_rice
$ nohup sh RNAseq_workflow.sh &  # This script can be found in the attachment
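RNAseq_workflow.sh itself is only in the attachment, so the bash sketch below is not that script. It is a hypothetical dry run of what such a workflow commonly does for each sample: download with sra-tools, align with HISAT2, sort with samtools, then count reads per gene with featureCounts. It assumes paired-end runs, one accession per line in samplelist.txt, and the directory layout created above; the function names, the fastq/ and bam/ folders and ../matrix/counts.txt are illustrative choices. runinfo_to_samplelist turns an NCBI runinfo CSV (for example the output of Entrez Direct's efetch -format runinfo) into such a list.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run sketch; NOT the attached RNAseq_workflow.sh.
# Assumptions: paired-end runs, samplelist.txt holds one run accession per line,
# sra-tools, hisat2, samtools and featureCounts (Subread) are on PATH, and the
# commands are run from src_rice with the directory layout created above.
INDEX=../reference/IRGSP/hsindex/IRGSP                  # prefix passed to hisat2-build
GTF=../reference/IRGSP/Oryza_sativa.IRGSP-1.0.47.gtf    # gene models used for counting
THREADS=8

# Read an NCBI runinfo CSV on stdin and keep the run accessions from its first
# column (SRR/ERR/DRR followed by digits), one per line.
runinfo_to_samplelist() {
  cut -d, -f1 | grep -E '^[SED]RR[0-9]+$' || true
}

# Print, without running, the per-sample commands for every accession in a list.
print_sample_commands() {
  local list=$1 acc
  while IFS= read -r acc || [[ -n $acc ]]; do
    acc=${acc%$'\r'}                                    # tolerate Windows line endings
    [[ -z $acc || $acc == \#* ]] && continue            # skip blank lines and comments
    echo "prefetch $acc"
    echo "fasterq-dump --split-files -e $THREADS -O fastq $acc"
    echo "hisat2 -p $THREADS -x $INDEX -1 fastq/${acc}_1.fastq -2 fastq/${acc}_2.fastq | samtools sort -@ $THREADS -o bam/${acc}.bam -"
  done < "$list"
  # One count matrix for all BAMs; Subread 2.0.2+ may also need --countReadPairs
  echo "featureCounts -T $THREADS -p -a $GTF -o ../matrix/counts.txt bam/*.bam"
}
```

Source it in bash (not sh) from src_rice. A sample list could then be built with esearch -db sra -query PRJNAxxxxxx | efetch -format runinfo | runinfo_to_samplelist > ../data/samplelist.txt, where PRJNAxxxxxx stands for a BioProject accession found in step 1. print_sample_commands ../data/samplelist.txt prints the commands for review; after mkdir -p fastq bam they can be piped to bash to execute them. Printing first keeps every download and alignment visible before anything long-running starts.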

  4. Copy the count files to your local machine for downstream analysis (the server's R version is too new to support the R package “biomaRt”).
    (Files can be transferred between the server and the local machine with the scp command or FileZilla.)
  5. Create an R project in RStudio locally and use DESeq2 for differential expression analysis and biomaRt for annotation.
  6. Run the following R scripts in order: downstream.R > Deseq2analysis.R > merge_desingn.R (the whole project is in the attachment Rice4.zip).
  7. Copy the differentially expressed gene table and the gene count table back to the server and put them in '~/Drought_stress/Rice/homology'.
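Steps 4-7 can also be scripted from the local machine. The bash sketch below is one hypothetical way to do it: SERVER, the local counts/ folder, the assumption that the count files are .txt files in matrix/ on the server, and the result names diff_genes.csv and gene_counts.csv are all placeholders to adapt. It also runs the R scripts with Rscript from the project directory instead of through RStudio, which assumes the scripts do not rely on RStudio-specific settings. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 4-7, run locally from the R project directory.
# SERVER, the file names and the remote matrix/ location are placeholders.
SERVER=${SERVER:-user@server.example.org}
REMOTE=Drought_stress/Rice      # relative remote paths resolve against the remote home directory

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

local_round_trip() {
  # Step 4: copy the count files down (quoted so the glob expands on the server)
  run mkdir -p counts || return 1
  run scp "$SERVER:$REMOTE/matrix/*.txt" ./counts/ || return 1
  # Steps 5-6: run the R scripts in order, stopping at the first failure
  run Rscript downstream.R || return 1
  run Rscript Deseq2analysis.R || return 1
  run Rscript merge_desingn.R || return 1
  # Step 7: copy the result tables back into the homology directory
  run scp diff_genes.csv gene_counts.csv "$SERVER:$REMOTE/homology/"
}
```

For example, SERVER=me@my.server DRY_RUN=1 local_round_trip previews the commands; dropping DRY_RUN=1 runs them.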

  8. Go to the src_rice subdirectory.
  9. Run the annotation and merge scripts.
code:
$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh -c 'sh anno.sh && sh merge.sh' > anno_merge.log 2>&1 &  # merge.sh starts only after anno.sh succeeds
# anno.sh and merge.sh can be found in the attachment
# Rice.anno.txt can also be found in the attachment
# head.txt holds the column names of the final output table; they were edited and combined from the column names of the raw files used
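How head.txt is attached is decided inside merge.sh, which is only in the attachment. As a hypothetical illustration, the bash function below prepends the first line of a header file to a headerless, tab-separated table and refuses to do so when the column counts differ; add_header, the tab separation and the file names in the example are assumptions, not taken from merge.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from merge.sh): prepend the first line of a header file
# to a headerless tab-separated table, after checking that the column counts match.
add_header() {
  local header=$1 body=$2 out=$3 hline hcols bcols
  hline=$(head -n 1 "$header")
  hcols=$(printf '%s\n' "$hline" | awk -F'\t' '{print NF}')
  bcols=$(head -n 1 "$body" | awk -F'\t' '{print NF}')
  if [[ $hcols != "$bcols" ]]; then
    echo "column mismatch: header has $hcols columns, table has $bcols" >&2
    return 1
  fi
  { printf '%s\n' "$hline"; cat "$body"; } > "$out"   # header line, then the data rows
}
```

For example: add_header head.txt merged_body.txt final_table.txt. Checking the column count first catches a head.txt that was edited for a different set of samples.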

Note:

If you have any suggestions or comments, please contact the author at xuyp8121@mail.ustc.edu.cn.
We look forward to hearing from anyone who shares our interest in systems biology and comparative biology!
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.