GATK中如何計(jì)算Inbreeding coefficient(近交系數(shù))

cover

[注]本文同時(shí)發(fā)布于我的個(gè)人博客

關(guān)于近交系數(shù)是什么的定義,除了英文資料,中文上也給出了清晰的定義,這里引用一下:

近交系數(shù)(inbreeding coefficient)是指根據(jù)近親交配的世代數(shù),將基因的純化程度用百分?jǐn)?shù)來表示即為近交系數(shù),也指個(gè)體由于近交而造成異質(zhì)基因減少時(shí),同質(zhì)基因或純合子所占的百分比也叫近交系數(shù),普遍以F或f來表示。

GATK近交系數(shù)的計(jì)算程序在github上可以找到:AS_InbreedingCoeff.java

代碼不短,但計(jì)算很簡單,我主要說展示一下這個(gè)計(jì)算的核心部分并在代碼中做些注釋,如下:

    protected double calculateIC(final VariantContext vc, final Allele altAllele) {
        final int AN = vc.getCalledChrCount();
        final double altAF;

        final double hetCount = heterozygosityUtils.getHetCount(vc, altAllele);

        final double F;
        //shortcut to get a value closer to the non-alleleSpecific value for bialleleics
        if (vc.isBiallelic()) {
            double refAC = heterozygosityUtils.getAlleleCount(vc, vc.getReference());
            double altAC = heterozygosityUtils.getAlleleCount(vc, altAllele);
            double refAF = refAC/(altAC+refAC);
            altAF = 1 - refAF;
            F = 1.0 - (hetCount / (2.0 * refAF * altAF * (double) heterozygosityUtils.getSampleCount())); // inbreeding coefficient
        } else {
            //compare number of hets for this allele (and any other second allele) with the expectation based on AFs
            //derive the altAF from the likelihoods to account for any accumulation of fractional counts from non-primary likelihoods,
            //e.g. for a GQ10 variant, the probability of the call will be ~0.9 and the second best call will be ~0.1 so adding up those 0.1s for het counts can dramatically change the AF compared with integer counts
            altAF = heterozygosityUtils.getAlleleCount(vc, altAllele)/ (double) AN;

            // 計(jì)算inbreeding coefficient
            F = 1.0 - (hetCount / (2.0 * (1 - altAF) * altAF * (double) heterozygosityUtils.getSampleCount())); // heterozygosityUtils.getSampleCount() 獲取總樣本數(shù) 
        }

        return F;
    }

總的來說,是利用哈迪溫伯格定律來計(jì)算的。 1.0 - (hetCount / (2.0 * (1 - altAF) * altAF(double)N ,N是人數(shù)。這個(gè)值給出的是期望的雜合變異的個(gè)數(shù)。所以參數(shù)F說的就是"實(shí)際的hetCount”除以"期望的hetCount"再與1.0取差。當(dāng)F值越接近0,就意味著實(shí)際的hetCount與理論的hetCount越接近。

====

關(guān)注我的公眾號:堿基礦工(helixminer),更及時(shí)了解更多信息

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容