bam diff-bowtie2&bwa結(jié)果比較-2020-01-09

參考:https://genome.sph.umich.edu/wiki/BamUtil:_diff

安裝BamUtil:

conda install -c bioconda bamutil

conda install -c bioconda/label/cf201901 bamut

準(zhǔn)備文件{bam1},{bam2}
bam diff --in1 ${bam1} --in2 ${bam2} --mapQual --onlyDiffs > ${bamdiff}.txt

結(jié)果比較:
導(dǎo)入包:

import os, subprocess
import pandas as pd

定義函數(shù)整理文件為pandas的dataframe:

def GetBamdiffArr(Bamdifftxt):
    BamdiffArr = []
    SeqIDs = []
    Bam1Dic, Bam2Dic = {}, {}
    with open(Bamdifftxt, 'r') as fr:
        lines = fr.readlines()
        for line in lines:
            if line.startswith(">"):
                items = line.strip().split()
                flag, pos, cigar, mapqual = "", "", "", ""
                for index, item in enumerate(items[1:]):
                    if index == 0:
                        flag = item
                    elif not str.isalnum(item):
                        pos = item
                    elif str.isdigit(item):
                        mapqual = item
                    else:
                        cigar = item

                Bam1Dic[key] = [flag, pos, cigar, mapqual]
            elif line.startswith("<"):
                items = line.strip().split()
                flag, pos, cigar, mapqual = "", "", "", ""
                for index, item in enumerate(items[1:]):
                    if index == 0:
                        flag = item
                    elif not str.isalnum(item):
                        pos = item
                    elif str.isdigit(item):
                        mapqual = item
                    else:
                        cigar = item

                Bam2Dic[key] = [flag, pos, cigar, mapqual]
            else:
                key = line.strip()
                SeqIDs.append(key)
    for key in SeqIDs:
        try:
            flag1, pos1, cigar1, mapqual1 = Bam1Dic[key]
            tag1 = 1
        except KeyError:
            flag1, pos1, cigar1, mapqual1 = "", "", "", ""
            tag1 = 0
        try:
            flag2, pos2, cigar2, mapqual2 = Bam2Dic[key]
            tag2 = -1
        except KeyError:
            flag2, pos2, cigar2, mapqual2 = "", "", "", ""
            tag2 = 0
        tag = tag1+tag2
        BamdiffArr.append([key,flag1, pos1, cigar1, mapqual1, tag1, flag2, pos2, cigar2, mapqual2, tag2, tag])

    return(BamdiffArr)

定義函數(shù)寫文件:

def WritFile(GetBamdiffArr, samplename):
    BamdiffDF = pd.DataFrame(BamdiffArr,columns = ["key", "flag1", "pos1", "cigar1", "mapqual1", "tag1", "flag2", "pos2", "cigar2", "mapqual2", "tag2", "tag"])
    BamdiffDF[['mapqual1','mapqual2']] = BamdiffDF[['mapqual1','mapqual2']].apply(pd.to_numeric)

    BamdiffDF["tag"][(BamdiffDF.loc[:,"pos2"]=="")&(BamdiffDF.loc[:,"cigar2"]=="")&(BamdiffDF.loc[:,"tag"]==1)] = "Bam1only"

    BamdiffDF["tag"][(BamdiffDF.loc[:,"pos1"]=="")&(BamdiffDF.loc[:,"cigar1"]=="")&(BamdiffDF.loc[:,"tag"]==-1)] = "Bam2only"

    BamdiffDF["tag"][((BamdiffDF.loc[:,"mapqual1"]<20)&(BamdiffDF.loc[:,"mapqual2"]<20))&(BamdiffDF.loc[:,"tag"]==0)] = "poorquality"

    BamdiffDF["tag"][(BamdiffDF.loc[:,"pos1"]=="")&(BamdiffDF.loc[:,"cigar1"]=="")&(BamdiffDF.loc[:,"pos2"]=="")&(BamdiffDF.loc[:,"cigar2"]=="")&(BamdiffDF.loc[:,"tag"]!="poorquality")] = "match"
    BamdiffDF["tag"][(BamdiffDF.loc[:,"pos1"]=="")&(BamdiffDF.loc[:,"cigar1"]=="")&(BamdiffDF.loc[:,"tag"]!="poorquality")] = "match"

    BamdiffDF["tag"][BamdiffDF.loc[:,"tag"]==0] = "mismatch"

    pd.DataFrame(BamdiffDF).to_csv(f"./{samplename}.csv", index=False)

    TagUniqCounts = pd.Series(BamdiffDF.loc[:,"tag"]).value_counts()
    pd.DataFrame(TagUniqCounts).to_excel(f"./{samplename}.counts.xlsx", sheet_name='sheet1')

執(zhí)行函數(shù):

samplename = Bamdifftxt.strip(".txt")
BamdiffArr = GetBamdiffArr(Bamdifftxt)
WritFile(GetBamdiffArr, samplename)

最后根據(jù)xlsx文件,可以將結(jié)果整理成venn圖。

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容