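exit 1 means "exit": the script stops and returns exit status 1 (a non-zero status signals failure).

The "ignoring input" notice (as printed by nohup when a job is sent to the background) only says that input is ignored and output is redirected to the log file; it is not an [error].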
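To make the difference between a real failure and a harmless notice concrete, here is a minimal sketch. The input path is hypothetical and deliberately missing, and the step runs in a subshell so that `exit 1` ends only that subshell and not your terminal session.

```shell
logfile=$(mktemp)
status=0

# Run the step in a subshell so that `exit 1` ends only the subshell.
(
  input=/no/such/file.sra            # hypothetical path that does not exist
  if [ ! -r "$input" ]; then
    echo "[error] cannot read $input" >&2
    exit 1                           # non-zero status: the step failed
  fi
  echo "converted $input"
) > "$logfile" 2>&1 || status=$?     # stdout and stderr both go to the log

echo "exit status: $status"
cat "$logfile"
```

A log line is a real problem when it comes with a non-zero exit status; a notice that merely describes where output is going is not.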


Overall transcriptome analysis workflow

Data quality control

Background

How data volume is measured

Converting SRA to FASTQ

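# Define the output directory (no space after "=")
fqdir=~/project/Human_16-Asthma-Trans/data/rawdata/sra/fastq

# Convert a single file
fastq-dump --gzip --split-3 -X 25000 -O ${fqdir} SRR1039510.sra
#fasterq-dump --split-files SRR11180057.sra
## -O sets the output directory; -X 25000 keeps only the first 25000 spots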

FASTQ is the standard output format. Why convert and split the SRA file? Because downstream software does not support the SRA format.

# Batch conversion: write one fastq-dump command per .sra file into a script
ls /trainee2/Nov10/project/Human_16-Asthma-Trans/data/rawdata/sra/*sra | while read id
do
  echo "fastq-dump --gzip --split-e -X 25000 -O ${fqdir} ${id}"
done > sra2fq.sh

# Run the script in the background
nohup sh sra2fq.sh > sra2fq.log &
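The batch step above first writes the commands into a script and only then runs it, so the commands can be reviewed beforehand. A minimal sketch of that pattern on placeholder files is shown below; the sample names and the temporary directory are made up for illustration, and nothing is actually converted.

```shell
workdir=$(mktemp -d)
# Hypothetical, empty files standing in for real .sra downloads.
touch "$workdir/SRR0000001.sra" "$workdir/SRR0000002.sra"
outdir=$workdir/fastq

for sra in "$workdir"/*.sra
do
  # echo writes the command text instead of running it
  echo "fastq-dump --gzip --split-3 -X 25000 -O ${outdir} ${sra}"
done > "$workdir/sra2fq.sh"

# Review before running: there should be one line per input file.
wc -l < "$workdir/sra2fq.sh"
cat "$workdir/sra2fq.sh"
```

Looping over the glob directly, instead of piping `ls` into `while read`, is a small design choice that also copes with unusual file names.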

Quality control

But what does this quality score mean?
The quality score for each sequence is a string of characters, one for each base of the nucleic sequence, used to characterize the probability of mis-identification of each base.

The score is encoded using the ASCII character table:
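As a rough illustration of that encoding, the sketch below decodes a made-up quality string. It assumes the Phred+33 convention (the one selected by `--phred33`): the quality is Q = ASCII code - 33, and the error probability is P = 10^(-Q/10).

```shell
qual='II5+!'   # made-up quality string, one character per base

decoded=$(printf '%s\n' "$qual" | awk '
  # Build a character -> ASCII code lookup table for printable characters.
  BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
  {
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      q = ord[ch] - 33                       # Phred+33 offset
      printf "%s %d %.5f\n", ch, q, 10^(-q/10)
    }
  }')

# Columns: character, Phred score Q, error probability P
echo "$decoded"
```

Here `I` decodes to Q40 (1 error in 10,000 calls), `5` to Q20 (1 in 100) and `!` to Q0.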


Display the first line

QC software

# Activate the conda environment
conda activate rna

# Symlink the data into your own directory
ln -s /teach/data/airway/fastq_raw25000/*gz .

# Run FastQC on a single FASTQ file; results are written to the qc/ directory
qcdir=~/project/Human_16-Asthma-Trans/data/rawdata/qc
fqdir=~/project/Human_16-Asthma-Trans/data/rawdata/fastq
mkdir -p $qcdir   # FastQC expects the output directory to exist
fastqc -t 3 -o $qcdir $fqdir/SRR1039510_1.fastq.gz

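Drag the generated files out to your local machine with Xftp.

Double-click the HTML file to open the report.

How to read the QC report

A green check mark means the module met the standard; a cross means it did not.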

Basic Statistics

How data volume is measured

Data volume usually refers to the number of bases sequenced, not to the size of the file.

Here, 3.9G is the amount of sequencing data (bases) and 2.2G is the file size.
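The distinction can be checked by hand. The following sketch counts reads and bases in a tiny made-up FASTQ file and prints the file size next to them; for a real compressed file you would feed `zcat file.fastq.gz` into the same awk command.

```shell
fq=$(mktemp)
# Toy FASTQ with two made-up reads (4 lines per read).
printf '%s\n' '@read1' 'ACGTACGT' '+' 'IIIIIIII' \
              '@read2' 'ACGTN'    '+' 'IIII#'    > "$fq"

# Line 2 of every 4-line record is the sequence.
stats=$(awk 'NR % 4 == 2 { reads++; bases += length($0) }
             END { print reads, bases }' "$fq")

echo "reads and bases: $stats"
echo "file size in bytes: $(wc -c < "$fq")"
```

The two reads contain 13 bases in total, while the file is larger because it also stores names, separators and quality strings.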

Per base sequence quality

Each base position in the read has its own box plot of quality scores.

Per tile sequence quality

Per Sequence Quality Scores

Per Sequence GC Content


# QC for multiple files
fastqc -t 2 -o $qcdir $fqdir/SRR*.fastq.gz

# Combine the FastQC results with MultiQC
# (run this in the directory that holds the FastQC *.zip files, here $qcdir)
multiqc *.zip

Data filtering

Filtering criteria

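1. An adapter is a short nucleic acid strand of known sequence, used to ligate target fragments whose sequence is unknown.
2. A barcode, also called an index, is a very short oligonucleotide used to label the different samples when several samples are pooled and sequenced together.
3. The insert is the target fragment to be sequenced. It sits between two adapters, hence the name "insert".
A typical sequencing fragment looks like adapter--barcode--insert--adapter. The first few bases at the start of sequencing cannot be read, and the first adapter is removed when the data are output; because of the sequencer's read-length limit, the second adapter is usually not reached. The read obtained is therefore usually barcode--partial insert. Finally the barcode is removed so that only the sequenced part of the insert remains; this step is called demultiplexing. Sometimes, however, sequencing reads through the whole insert, giving barcode--insert--partial adapter. Such a read contains adapter sequence, and that is the part removed by "adapter trimming".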
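The read-through case can be sketched on a single made-up read. This is only an illustration using exact string matching; real trimmers such as trim_galore also tolerate mismatches and partial adapters at the read end. The adapter shown is the start of the standard Illumina adapter sequence, and the read itself is invented.

```shell
adapter=AGATCGGAAGAGC                     # start of the standard Illumina adapter
read_seq=ACGTACGTTGCAAGATCGGAAGAGCACAC    # made-up read: insert, then adapter read-through

# Remove the adapter and everything after it
# (%% deletes the longest matching suffix).
trimmed=${read_seq%%"$adapter"*}

echo "before: $read_seq"
echo "after:  $trimmed"
```

Only the 12 insert bases `ACGTACGTTGCA` remain after trimming.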


Filtering with trim_galore

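# Define directories
rawdata=/teach/project/Human-16-Asthma-Trans/data/rawdata/fastq
cleandata=/teach/project/Human-16-Asthma-Trans/data/cleandata/trim_galore

# Single sample
trim_galore --phred33 -q 30 --length 30 --stringency 3 --fastqc --paired --max_n 3 -o $cleandata $rawdata/SRR1039510_1.fastq.gz $rawdata/SRR1039510_2.fastq.gz
## --length 30 discards reads shorter than 30 bases after trimming; --max_n 3 discards reads containing more than 3 unknown bases (N)
## -q 30 trims low-quality ends below Phred 30; --stringency 3 requires at least 3 bases of overlap with the adapter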
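To see what a length cutoff and an N cutoff do to individual reads, here is a small single-end sketch in awk. It is not how trim_galore is implemented, and in `--paired` mode a pair is dropped if either mate fails; the reads and the thresholds `min_len` and `max_n` are made up for the example.

```shell
fq=$(mktemp)
# Three made-up reads: one good, one too short, one with too many N bases.
printf '%s\n' '@keep'  'ACGTACGTAC' '+' 'IIIIIIIIII' \
              '@short' 'ACGT'       '+' 'IIII' \
              '@manyN' 'ACNNNNGTAC' '+' 'IIIIIIIIII' > "$fq"

min_len=8
max_n=3

# paste - - - - puts each 4-line record on one tab-separated line.
kept=$(paste - - - - < "$fq" |
  awk -F '\t' -v min_len="$min_len" -v max_n="$max_n" '
  {
    seq = $2
    n = gsub(/N/, "N", seq)        # gsub returns the number of N bases
    if (length($2) >= min_len && n <= max_n) print $1
  }')

echo "$kept"
```

Only `@keep` passes: `@short` has 4 bases and `@manyN` contains 4 N bases.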


# Multiple samples: write one trim_galore command per sample ID into a script
cat /teach/project/Human-16-Asthma-Trans/data/rawdata/sra/sampleId.txt | while read id
do
    echo "trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o ${cleandata} ${rawdata}/${id}_1.fastq.gz ${rawdata}/${id}_2.fastq.gz"
done > trim_galore.sh

nohup sh trim_galore.sh > trim_galore.log &

# Combine the FastQC results with MultiQC
# (run this in the directory that holds the FastQC *.zip files, here $cleandata)
multiqc *.zip

Filtering with fastp

# Define the output directory ($rawdata is the same as in the trim_galore step)
cleandata=/teach/project/Human-16-Asthma-Trans/data/cleandata/fastp

# Single sample
# -l 36: discard reads shorter than 36 bases; -q 20: a base counts as qualified at Phred >= 20
fastp -i $rawdata/SRR1039510_1.fastq.gz -I $rawdata/SRR1039510_2.fastq.gz \
-o $cleandata/SRR1039510_1.fastp.fq.gz -O $cleandata/SRR1039510_2.fastp.fq.gz \
-l 36 -q 20 --compression=6 -R $cleandata/SRR1039510 \
-h $cleandata/SRR1039510.fastp.html -j $cleandata/SRR1039510.fastp.json

# Multiple samples: write one fastp command per sample ID into a script
cat /teach/project/Human-16-Asthma-Trans/data/rawdata/sra/sampleId.txt | while read id
do
    echo "fastp -i ${rawdata}/${id}_1.fastq.gz -I ${rawdata}/${id}_2.fastq.gz -o ${cleandata}/${id}_1.fastp.fq.gz -O ${cleandata}/${id}_2.fastp.fq.gz -l 36 -q 20 --compression=6 -R ${cleandata}/${id} -h ${cleandata}/${id}.fastp.html -j ${cleandata}/${id}.fastp.json 1>$cleandata/${id}.fastp.log 2>&1"
done > fastp.sh

# Run the fastp script in the background
nohup sh fastp.sh > fastp.log &

Comparing data before and after filtering

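# Go to the filtered-data directory
cd /teach/project/Human-16-Asthma-Trans/data/cleandata/trim_galore

# Raw data: join each 4-line FASTQ record into one line
zcat $rawdata/SRR1039510_1.fastq.gz | paste - - - - > raw.txt

# Filtered data
zcat SRR1039510_1_val_1.fq.gz | paste - - - - > trim.txt
# IDs of reads whose sequence (field 4) is shorter than 63 bases, i.e. reads that were trimmed
awk '(length($4)<63){print$1}' trim.txt > ID
# Keep the first 100 of those IDs
head -n 100 ID > ID100
# Extract ID and sequence of those reads from the filtered and the raw data
grep -w -f ID100 trim.txt | awk '{print$1,$4}' > trim.sm
grep -w -f ID100 raw.txt | awk '{print$1,$4}' > raw.sm
# Show each raw sequence directly above its trimmed version
paste raw.sm trim.sm | awk '{print$2,$4}' | tr ' ' '\n' | less -S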
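The commands above rely on `paste - - - -` to put each FASTQ record on one line so that raw and trimmed reads can be lined up. A toy version of the same comparison is sketched below with one made-up read. Its header has no description, so the sequence is field 2 rather than field 4 as in the real data.

```shell
raw=$(mktemp)
trim=$(mktemp)
# The same made-up read before and after trimming.
printf '%s\n' '@r1' 'ACGTACGTAGATCGG' '+' 'IIIIIIIIIIIIIII' > "$raw"
printf '%s\n' '@r1' 'ACGTACGT'        '+' 'IIIIIIII'        > "$trim"

# One record per line, then keep only the ID and the sequence.
paste - - - - < "$raw"  | awk '{ print $1, $2 }' > "$raw.sm"
paste - - - - < "$trim" | awk '{ print $1, $2 }' > "$trim.sm"

# Side by side: read ID, raw length, trimmed length.
cmp_line=$(paste "$raw.sm" "$trim.sm" | awk '{ print $1, length($2), length($4) }')
echo "$cmp_line"
```

The read shrinks from 15 to 8 bases, which is the kind of difference the `less -S` view above lets you inspect read by read.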

[Transcriptome 03] Error analysis & data quality control and filtering
