文章復(fù)現(xiàn)-全外顯子數(shù)據(jù)分析學(xué)習(xí)3去接頭前質(zhì)控

教程在:腫瘤外顯子數(shù)據(jù)處理系列教程(二)質(zhì)控與去接頭 (qq.com)

安裝軟件

這里要用到fastqc和python,先檢查自己有沒有

fastqc -h
是有安裝的

如果沒有就要按照教程進(jìn)行安裝,如下:

## fastqc使用多線程要用--threads

wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.8.zip
unizp fastqc_v0.11.8.zip
ln -s ~/tools/FastQC/fastqc ~/tools/bin/fastqc

mkdir -p 2.qc/{pre, post}
nohup fastqc --outdir ../2.qc/pre --threads 16 *.fastq.gz > ../2.qc/pre/fastqc.log 2>&1 &


## 需要先安裝python
## sudo apt install libffi-dev
cd
cd tools/
mkdir python
wget https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz
tar zxvf Python-3.7.3.tgz
cd Python-3.7.3
./configure --prefix="/home/llwu/tools/python/"
make
make install
cd ../python/bin
ls * | while read id
do
ln -s ~/tools/python/bin/${id} ~/tools/bin/${id}
done

這里的鏈接可能也是過時(shí)的,需要去找新的進(jìn)行下載
我檢查了一下我的python的版本是

(/home/jiarongf/my-envs/wes) 10:11:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/0.sra
$
python --version
Python 2.7.15

不知道后面是否會(huì)有影響,結(jié)果發(fā)現(xiàn)自己的安裝實(shí)在home下的環(huán)境,所以趕緊去更改了一下,具體可以看我發(fā)布的一篇更改虛擬環(huán)境位置的文章,
更改之后就python就是3.8的了

(/home/jiarongf/my-envs/wes) 10:36:29 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/0.sra
$
python --version
Python 3.8.8

更改了之后記得重新進(jìn)行wes的環(huán)境

source activate wes
(wes) 11:18:00 jiarongf@172.16.10.223:/data1/jiarongf

還會(huì)用到MultiQC,進(jìn)行安裝,也是直接安裝在run文件夾下
下面是安裝的腳本,可以分開運(yùn)行

curl -LOk https://github.com/ewels/MultiQC/archive/master.zip
unzip master.zip
cd MultiQC-master
python setup.py install

下載了之后才發(fā)現(xiàn)這個(gè)可以用pip或者conda安裝

Installation

You can install MultiQC from PyPI
using pip as follows:

bash

pip install multiqc

Alternatively, you can install using Conda
from the bioconda channel:

conda install -c bioconda multiqc

If you would like the development version instead, the command is:

pip install --upgrade --force-reinstall git+https://github.com/ewels/MultiQC.git

但是此處我已經(jīng)下載了,所以直接安裝就好了

python setup.py install
安裝完成

檢測(cè)安裝是否成功

multiqc  -h
image.png

去接頭前質(zhì)控

首先需要把之前生成的fastq的文件放到一個(gè)目錄下,定義為raw_data


image.png
(wes) 15:48:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_qc_report.sh
(wes) 15:49:31 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_report.sh

  /// MultiQC ?? | v1.13.dev0

|           multiqc | Report title:  QC REPORT OF SRP070662
|           multiqc | Search path : /data1/jiarongf/learning/cancer-WES/0.sra/raw_data
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 60/60
|           multiqc | No analysis results found. Cleaning up..
|           multiqc | MultiQC complete
(wes) 15:49:51 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_qc_report.sh
multiqc ../0.sra/raw_data/ -n pre_qc_report -p -i " QC REPORT OF SRP070662" -o multiqc
(wes) 15:52:06 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_report.sh > ../log/pre_qc_report.log 2>&1

但是感覺沒有輸出結(jié)果
這里突然感覺應(yīng)該先fastqc,然后再去multi,找到一個(gè)好的教程就是從下載sra數(shù)據(jù),解壓為fastq,在做fastqc在multi,然后合成一個(gè)報(bào)告,還有報(bào)告結(jié)果的解讀都很不錯(cuò)
MultiQC使用 | 碼農(nóng)家園 (codenong.com)

fastqc

淺試一個(gè)

(wes) 15:58:02 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
fastqc /data1/jiarongf/learning/cancer-WES/0.sra/raw_data/case3_techrep_2_WES_2.fastq.gz
Started analysis of case3_techrep_2_WES_2.fastq.gz
Approx 5% complete for case3_techrep_2_WES_2.fastq.gz
Approx 10% complete for case3_techrep_2_WES_2.fastq.gz
Approx 15% complete for case3_techrep_2_WES_2.fastq.gz
Approx 20% complete for case3_techrep_2_WES_2.fastq.gz
Approx 25% complete for case3_techrep_2_WES_2.fastq.gz
Approx 30% complete for case3_techrep_2_WES_2.fastq.gz
Approx 35% complete for case3_techrep_2_WES_2.fastq.gz
Approx 40% complete for case3_techrep_2_WES_2.fastq.gz
Approx 45% complete for case3_techrep_2_WES_2.fastq.gz
Approx 50% complete for case3_techrep_2_WES_2.fastq.gz
Approx 55% complete for case3_techrep_2_WES_2.fastq.gz
Approx 60% complete for case3_techrep_2_WES_2.fastq.gz
Approx 65% complete for case3_techrep_2_WES_2.fastq.gz
Approx 70% complete for case3_techrep_2_WES_2.fastq.gz
Approx 75% complete for case3_techrep_2_WES_2.fastq.gz
Approx 80% complete for case3_techrep_2_WES_2.fastq.gz
Approx 85% complete for case3_techrep_2_WES_2.fastq.gz
Approx 90% complete for case3_techrep_2_WES_2.fastq.gz
Approx 95% complete for case3_techrep_2_WES_2.fastq.gz
Analysis complete for case3_techrep_2_WES_2.fastq.gz
(wes) 16:05:53 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$

時(shí)間太久了,建議后臺(tái)跑,再有就是這個(gè)成功的,
跑完以后再原來的目錄生成了

image.png

這只是一個(gè)的,所以需要循環(huán)一下,run目錄下寫個(gè)腳本
前面的腳本pre_qc_report.sh也需要改個(gè)名字

(wes) 16:08:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
mv pre_qc_report.sh pre_qc_multiqc.sh

上述做錯(cuò)了,重新搞

去接頭前質(zhì)控

(wes) 16:33:05 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
cd /data1/jiarongf/learning/cancer-WES
(wes) 16:31:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
mkdir -p 2.qc/{pre,post}
(wes) 16:33:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
l
0.sra/  2.qc/  data/  log/  run/
(wes) 16:33:02 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
$
ls 2.qc/
post  pre

質(zhì)控

nohup fastqc --outdir ../2.qc/pre --threads 16 *.fastq.gz > ../2.qc/pre/fastqc.log 2>&1 &
(wes) 16:35:55 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_fastqc.sh

(wes) 16:41:22 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_fastqc.sh
nohup fastqc --outdir ../2.qc/pre --threads 16 ../0.sra/raw_data/*.fastq.gz > ../log/pre.fastqc.log 2>&1 &
(wes) 16:41:09 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_fastqc.sh
(wes) 16:41:30 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$


直接掛到后臺(tái)去,


image.png

一共有60個(gè),


image.png

16個(gè)16個(gè)的跑,感覺也要跑很久,可以先做去接頭


image.png

跑完啦

/data1/jiarongf/learning/cancer-WES/2.qc/pre

image.png

可以multi了

10:24:45 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
vim pre_qc_multiqc.sh
10:26:37 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
cat pre_qc_multiqc.sh
multiqc /data1/jiarongf/learning/cancer-WES/2.qc/pre  -n pre_qc_report -p -i " QC REPORT OF SRP070662" -o /data1/jiarongf/learning/cancer-WES/2.qc/pre/
10:26:42 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$
sh pre_qc_multiqc.sh > ../log/pre_qc_multiqc.sh.log 2>&1
10:28:29 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
$

image.png

image.png

看結(jié)果

1.general statistical

image.png

image.png

2.fastqc

2.1 Sequence Counts

image.png

image.png

黑色的是重復(fù)序列,藍(lán)色的是有用的序列(reads)
可以看到重復(fù)序列的比例還是小的

2.2 Sequence Quality Histograms

每個(gè)read各位置堿基的平均測(cè)序質(zhì)量

每一條線是一個(gè)樣本,應(yīng)該打分都在綠色區(qū)域就會(huì)好一點(diǎn)
綠色區(qū)間——質(zhì)量很好,橙色區(qū)間——質(zhì)量合理,紅色區(qū)間——質(zhì)量不好


每個(gè)read各位置堿基的平均測(cè)序質(zhì)量

這60個(gè)都是綠色的線,代表測(cè)序質(zhì)量挺好的

2.3 Per Sequence Quality Scores

具有平均質(zhì)量分?jǐn)?shù)的reads的數(shù)量

綠色區(qū)間——質(zhì)量很好、橙色區(qū)間——質(zhì)量合理、紅色區(qū)間——質(zhì)量不好


質(zhì)量不好的這一條

2.4 Per base Sequence Content

每個(gè)read各位置堿基ATCG的比例

image.png

結(jié)果顯示3序列都warn,如果是紅色的說明每個(gè)位置每種堿基出現(xiàn)的概率差別很大,可能有過表達(dá)序列的污染。


中文版哈哈哈哈

2.5 Per Sequence GC Content

image.png

image.png

image.png

以下是一個(gè)錯(cuò)誤的例子


image.png

這里結(jié)果顯示四條序列都被報(bào)錯(cuò),從形狀上來看曲線和正態(tài)曲線相差甚遠(yuǎn),可能是由于文庫(kù)的污染或是部分reads構(gòu)成的子集有偏差造成的

2.6 Per Base N Content 每條reads各位置N堿基含量比例

image.png

說明測(cè)序儀器能辨別這60個(gè)樣本中每條reads的每個(gè)位置的堿基

2.7 Sequence Length Distribution 序列長(zhǎng)度分布

image.png

對(duì)于這幾個(gè)樣本,每次測(cè)序儀測(cè)出來的長(zhǎng)度主要都在76bp

2.8 Sequence Duplication Levels 每個(gè)序列的相對(duì)重復(fù)水平

image.png

2.9 Overrepresented sequences 文庫(kù)中過表達(dá)序列的比例

image.png

這些序列中過表達(dá)的序列的比例都遠(yuǎn)遠(yuǎn)超過1%,如果有的序列中過表達(dá)的序列超過50%,這種情況的出現(xiàn),不是這種轉(zhuǎn)錄本巨量表達(dá),就是樣品被污染。

2.10 Adapter Content 接頭含量

image.png

這是沒有接頭的意思????不太清楚了,一會(huì)和去接頭以后的再比較以下,看看有什么差別

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容