日本免费1区2区,色婷婷激情四射

擴(kuò)增子測(cè)序是分析環(huán)境微生物的常見(jiàn)手段，通常使用的是16s rRNA片段。16srRNA分析主要有質(zhì)控、去冗余、聚類OTU、去嵌合體、生成OTU表和物種注釋等步驟。

出發(fā)點(diǎn)

最開始聽(tīng)人講擴(kuò)增子分析，我是云里霧里完全聽(tīng)不懂的蒙蔽狀態(tài)。后來(lái)有幸認(rèn)識(shí)了一位不辭辛苦或者說(shuō)對(duì)“傻子”友好的技術(shù)達(dá)人，在他的幫助下了解了擴(kuò)增子分析內(nèi)的16s rRNA的具體流程等。加上最近剛剛學(xué)習(xí)了流程管理工具snakemake，于是萌發(fā)了用snakemake串聯(lián)16s分析的想法，說(shuō)做就做，先看看前期數(shù)據(jù)處理的可視化圖。

數(shù)據(jù)

18份來(lái)自宏基因組公眾號(hào)的雙端16s rRNA原始下機(jī)數(shù)據(jù)。

sample	fq1	fq2
KO1	data/KO1_1.fq.gz	data/KO1_2.fq.gz
KO2	data/KO2_1.fq.gz	data/KO2_2.fq.gz
KO3	data/KO3_1.fq.gz	data/KO3_2.fq.gz
KO4	data/KO4_1.fq.gz	data/KO4_2.fq.gz
KO5	data/KO5_1.fq.gz	data/KO5_2.fq.gz
KO6	data/KO6_1.fq.gz	data/KO6_2.fq.gz
OE1	data/OE1_1.fq.gz	data/OE1_2.fq.gz
OE2	data/OE2_1.fq.gz	data/OE2_2.fq.gz
OE3	data/OE3_1.fq.gz	data/OE3_2.fq.gz
OE4	data/OE4_1.fq.gz	data/OE4_2.fq.gz
OE5	data/OE5_1.fq.gz	data/OE5_2.fq.gz
OE6	data/OE6_1.fq.gz	data/OE6_2.fq.gz
WT1	data/WT1_1.fq.gz	data/WT1_2.fq.gz
WT2	data/WT2_1.fq.gz	data/WT2_2.fq.gz
WT3	data/WT3_1.fq.gz	data/WT3_2.fq.gz
WT4	data/WT4_1.fq.gz	data/WT4_2.fq.gz
WT5	data/WT5_1.fq.gz	data/WT5_2.fq.gz
WT6	data/WT6_1.fq.gz	data/WT6_2.fq.gz

步驟

質(zhì)控
去冗余
聚類OTU
去嵌合體
生成OTU表
物種注釋

import os
import sys
import shutil
import pandas as pd

configfile: "config.yaml"
samples = pd.read_csv(config["samples"], sep="\t", index_col=["sample"])

rule all:
    input:
        expand("{taxonomy}/taxonomy.{{ref}}.{{type}}.{res}", 
                        taxonomy=config["results"]["taxonomy"],
                        ref=["sliva", "RDP"], 
                        type=["cluster","unoise3"],
                        res=["biom","mothur","txt"])

include: "rules/00.fastqc.snk"
include: "rules/01.trim.snk"
include: "rules/02.deredundancy.snk"
include: "rules/03.clusterOTU.snk"
include: "rules/04.rechimeras.snk"
include: "rules/05.OTUtable.snk" 
include: "rules/06.assign_taxonomy.snk"
#include: "rules/07.picrust.snk"  # 這一步還沒(méi)有實(shí)現(xiàn)

質(zhì)控

質(zhì)控包括剔除質(zhì)量低的reads和切除帶有barcodes的接頭等

檢查reads質(zhì)量情況

設(shè)置通配符函數(shù)，匹配樣本名字，wildcards是snakemake的通配符函數(shù)。
fastqc和multiqc軟件可以生成可視化的reads質(zhì)量分布情況等的網(wǎng)頁(yè)版文件。
config是snakemake內(nèi)調(diào)用配置文件的參數(shù)。
fastqc和multiqc的結(jié)果可以確定質(zhì)控過(guò)程reads應(yīng)該保留的長(zhǎng)度。

去冗余

移除出現(xiàn)頻率低的reads，這些reads可能是測(cè)序錯(cuò)誤引起的，minuniquesize參數(shù)越大，相對(duì)而言計(jì)算量越小，正常建議設(shè)置成2，只去除單次出現(xiàn)的序列。
Discard_singletons排序后再去除單次出現(xiàn)的序列。

rule dereplicate:
    input:
        os.path.join(config["results"]["trim"], "summary_trimmed.fa")
    output:
        temp(os.path.join(config["results"]["deredundancy"], "deredundancy.fa"))
    params:
        name = config["params"]["deredundancy"]["name"],
        mins = config["params"]["deredundancy"]["mins"]
    log:
        os.path.join(config["logs"], "02.deredundancy.log")
    shell:
        '''
        vsearch --derep_fulllength {input} --output {output} --relabel {params.name} \
            --minuniquesize {params.mins} --sizeout 2>{log}
        '''

rule Discard_singletons:
    input:
        os.path.join(config["results"]["deredundancy"], "deredundancy.fa")
    output:
        os.path.join(config["results"]["deredundancy"], "sorted.fa")
    params:
        size = config["params"]["deredundancy"]["size"]
    log:
        os.path.join(config["logs"], "02.sorted.log")
    shell:
        '''
        vsearch --sortbysize {input} --output {output} --minsize {params.size} 2>{log}
        '''

聚類OTU

根據(jù)序列相似性聚類（一般閾值通常為97%）能有效避免測(cè)序錯(cuò)誤，將聚類后的序列集合稱為可操作分類單元（operational taxonomic units OTUs），每一個(gè)OTU代表一類物種。常用的方法有uparse、unoise3等，但這些方法沒(méi)有考慮因單堿基多態(tài)性而存在的序列多樣性，于是deblur和DADA2算法被開放出來(lái)針對(duì)聚類和單堿基多態(tài)性。這里我們暫時(shí)用uparse和unoise3算法。

rule cluster_vsearch:
    input:
        os.path.join(config["results"]["deredundancy"], "sorted.fa")
    output:
        fa     = os.path.join(config["results"]["clusterOTU"], "OTU.cluster.fa"),
        biom   = os.path.join(config["results"]["clusterOTU"], "OTU.cluster.biom"),
        mothur = os.path.join(config["results"]["clusterOTU"], "OTU.cluster.mothur"),
        tab    = os.path.join(config["results"]["clusterOTU"], "OTU.cluster.txt"),
        uc     = os.path.join(config["results"]["clusterOTU"], "OTU.cluster.uc")
    params:
        identity = config["params"]["clusterOTU"]["identity"],
        threads  = config["params"]["clusterOTU"]["threads"],
        name     = config["params"]["clusterOTU"]["name"]
    log:
        os.path.join(config["logs"], "03.clusterOTU.vsearch.log")
    shell:
        '''
        vsearch --threads {params.threads} --cluster_fast {input} --id {params.identity} \
            --centroids {output.fa} --biomout {output.biom} \
            --mothur_shared_out {output.mothur} --otutabout {output.tab} \
            --relabel {params.name} --uc {output.uc}  2>{log}
        '''

rule cluster_uparse:
    input:
        os.path.join(config["results"]["deredundancy"], "sorted.fa")
    output:
        os.path.join(config["results"]["clusterOTU"], "OTU.uparse.fa")
    params:
        name = config["params"]["clusterOTU"]["name"]
    log:
        os.path.join(config["logs"], "03.clusterOTU.uparse.log")
    shell:
        '''
        usearch11 -cluster_OTUs {input} -otus {output} -relabel {params.name} 2>{log}
        '''

rule cluster_unoise3:
    input:
        os.path.join(config["results"]["clusterOTU"], "OTU.uparse.fa")
    output:
        os.path.join(config["results"]["clusterOTU"], "OTU.unoise3.fa")
    log:
        os.path.join(config["logs"], "03.clusterOTU.unoise3.log")
    shell:
        '''
        usearch11 -unoise3 {input} -zotus {output} 2>{log}
        '''

去嵌合體

嵌合體的產(chǎn)生是因?yàn)镻CR過(guò)程可能某兩端DNA序列連接在一起而擴(kuò)增。
去嵌合體的原理是將序列比對(duì)到對(duì)應(yīng)的參考數(shù)據(jù)庫(kù)，16s rRNA的參考數(shù)據(jù)庫(kù)主要有RDP、GreenGenes和sliva，其中常用的是RDP和sliva。
去嵌合體后保留的序列即為代表序列。

rule rechimeras_sliva:
    input:
        expand("{cluster}/OTU.{{type}}.fa", 
                cluster=config["results"]["clusterOTU"], 
                type=["cluster","unoise3"])
    output:
        borderline  = os.path.join(config["results"]["rechimeras"], "chimeric.{type}.borderline"),
        nonchimeras = os.path.join(config["results"]["rechimeras"], "OTU.rechimera_silva.{type}.fa"),
        chimeras    = os.path.join(config["results"]["rechimeras"], "chimeric_silva.{type}.sequence")
    params:
        db = config["params"]["rechimeras"]["silva"] 
    log:
        os.path.join(config["logs"], "04.OTU.rechimera_silva.{type}.log")
    shell:
        '''
        vsearch --uchime_ref {input} --db {params.db} --borderline {output.borderline} \
            --chimeras {output.chimeras} --nonchimeras {output.nonchimeras} 2>{log}
        '''

rule rechimeras_RDP:
    input:
        expand("{cluster}/OTU.{{type}}.fa", 
                cluster=config["results"]["clusterOTU"], 
                type=["cluster","unoise3"])
    output:
        os.path.join(config["results"]["rechimeras"], "OTU.rechimera_RDP.{type}.fa")
    params:
        db = config["params"]["rechimeras"]["rdp"]
    log:
        os.path.join(config["logs"], "04.OTU.rechimera_RDP.{type}.log")
    shell:
        '''
        usearch11 -uchime2_ref {input} -db {params.db} -chimeras {output} -strand plus -mode balanced 2>{log}
        '''

生成OTU表

聚類生成的OTU比對(duì)到去嵌合體后的OTU得到OTU表。
設(shè)置比對(duì)閾值 identity: 97%。

rule make_otutab_vsearch:
    input:
        otu_ref = expand("{rechimeras}/OTU.rechimera_{{ref}}.{{type}}.fa", 
                        rechimeras=config["results"]["rechimeras"],
                        ref=["sliva", "RDP"], 
                        type=["cluster","unoise3"]),
        trim_fa = os.path.join(config["results"]["trim"], "summary_trimmed.fa")
    output:
        aln    = os.path.join(config["results"]["OTUtable"], "OTU_table.{ref}.{type}.shr.aln"),
        biom   = os.path.join(config["results"]["OTUtable"], "OTU_table.{ref}.{type}.biom"),
        mothur = os.path.join(config["results"]["OTUtable"], "OTU_table.{ref}.{type}.mothur"),
        uc     = os.path.join(config["results"]["OTUtable"], "OTU_table.{ref}.{type}.uc"),
        txt    = os.path.join(config["results"]["OTUtable"], "OTU_table.{ref}.{type}.txt")
    params:
        identity = config["params"]["OTUtable"]["identity"],
        threads  = config["params"]["OTUtable"]["threads"]
    log:
        os.path.join(config["logs"], "05.OTU_table.{ref}.{type}.log")
    shell:
        '''
        vsearch --usearch_global {input.trim_fa} --db {input.otu_ref} --id {params.identity}\
            --strand plus --threads {params.threads} --alnout {output.aln} --biomout {output.biom} \
            --mothur_shared_out {output.mothur} --uc {output.uc} --otutabout {output.txt}   2>{log}
        '''

rule OTU_tight_clusters:
    input:
        otu_ref = expand("{rechimeras}/OTU.rechimera_{{ref}}.{{type}}.fa", 
                        rechimeras=config["results"]["OTUtable"],
                        ref=["sliva", "RDP"], 
                        type=["cluster","unoise3"])
    output:
        fa = os.path.join(config["results"]["OTUtable"], "OTU.rechimera.{ref}.{type}_new.fa"),
        uc = os.path.join(config["results"]["OTUtable"], "OTU.rechimera.{ref}.{type}_hits.uc")
    params:
        identity = config["params"]["OTUtable"]["identity"],
        accepts  = config["params"]["OTUtable"]["accepts"],
        rejects  = config["params"]["OTUtable"]["rejects"]
    log:
        os.path.join(config["logs"], "05.OTU_table.{ref}.{type}.log")
    shell:
        '''
        usearch11 -cluster_fast {input.otu_ref} -id {params.identity} \
            -maxaccepts {params.accepts} -maxrejects {params.rejects} \
            -top_hit_only -uc {output.uc} -centroids {output.fa} 2>{log}
        '''

物種注釋

代表序列比對(duì)到16s rRNA參考數(shù)據(jù)庫(kù)。
控制比對(duì)identity參數(shù)。

rule assign_taxonomy:
    input:
        otu_ref = expand("{rechimeras}/OTU.rechimera_{{ref}}.{{type}}.fa", 
                        rechimeras=config["results"]["rechimeras"],
                        ref=["sliva", "RDP"], 
                        type=["cluster","unoise3"])
    output:
        biom   = os.path.join(config["results"]["taxonomy"], "taxonomy.{ref}.{type}.biom"),
        mothur = os.path.join(config["results"]["taxonomy"], "taxonomy.{ref}.{type}.mothur"),
        txt    = os.path.join(config["results"]["taxonomy"], "taxonomy.{ref}.{type}.txt")
    params:
        identity = config["params"]["taxonomy"]["identity"],
        threads  = config["params"]["taxonomy"]["threads"],
        database = config["params"]["taxonomy"]["silva"]
    log:
        os.path.join(config["logs"], "06.taxonomy.{ref}.{type}.log")
    shell:
        '''
        vsearch --usearch_global {input.otu_ref} --db {params.database} \
                --biomout {output.biom} --mothur_shared_out {output.mothur} --otutabout {output.txt} \
                --id {params.identity} --threads {params.threads} 2>{log}
        '''

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

擴(kuò)增子分析：16s rRNA分析snakemake流程

擴(kuò)增子分析：16s rRNA分析snakemake流程

出發(fā)點(diǎn)

數(shù)據(jù)

步驟

質(zhì)控

檢查reads質(zhì)量情況

去冗余

聚類OTU

去嵌合體

生成OTU表

物種注釋

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

擴(kuò)增子分析：16s rRNA分析snakemake流程

出發(fā)點(diǎn)

數(shù)據(jù)

步驟

質(zhì)控

檢查reads質(zhì)量情況

去冗余

聚類OTU

去嵌合體

生成OTU表

物種注釋

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av