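Using the Michaelis-Menten equation to tackle dropouts in single-cell transcriptomes
Michaelis-Menten equation: v = Vmax × [S] / (Km + [S])
The equation is derived under the assumption that the reaction is at steady state. Km is the Michaelis constant, Vmax is the reaction rate when the enzyme is saturated with substrate, and [S] is the substrate concentration.
Physically, Km is the substrate concentration at which the reaction rate v reaches 1/2 Vmax (that is, Km = [S] at that point). Its unit is usually mol/L. It is determined only by the properties of the enzyme and is independent of the enzyme concentration, so Km values can be used to tell different enzymes apart.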
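To make the meaning of Km concrete, here is a minimal Python sketch of the equation. The function name `michaelis_menten` and the numbers for Vmax and Km are illustrative assumptions, not values from the paper.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    if s < 0 or km <= 0:
        raise ValueError("need [S] >= 0 and Km > 0")
    return vmax * s / (km + s)


# Illustrative values only (not taken from the paper).
vmax, km = 10.0, 2.0

# When [S] equals Km, the rate is exactly half of Vmax.
half_rate = michaelis_menten(km, vmax, km)

# At very high substrate concentration the rate approaches Vmax.
near_saturation = michaelis_menten(1000.0 * km, vmax, km)
```

Setting [S] = Km gives v = Vmax × Km / (2 × Km) = Vmax / 2, which is the definition of Km stated above.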
The paper introduced today proposes an algorithm, implemented in the R package M3Drop. The paper is: Modelling dropouts for feature selection in scRNASeq experiments
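Selecting important genes
The existing methods for finding important genes in single-cell transcriptome data (feature selection) are not good enough. scLVM, for example, mainly relies on prior gene sets such as cell-cycle or apoptosis genes to distinguish cells. In contrast, the highly variable genes picked by highly variable genes (HVG) based methods may well reflect technical noise. In addition, lowly expressed genes tend to vary more than highly expressed ones, and "high variability in expression" has no clear biological interpretation.
Differentially expressed genes are an easier concept to understand, but finding them requires the cell population to be split into groups beforehand, and in many cases the cells are too similar to be separated well. Dimensionality reduction methods such as PCA or t-SNE can also be used to select important genes, but they too are affected by systematic errors, batch effects and the like.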
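Dropouts are a major characteristic of scRNASeq data: many genes are not detected at all in some cells yet are highly expressed in others. The authors propose one solution for full-length transcript data and one for UMI-based expression data: the Michaelis-Menten equation and the depth adjusted negative binomial (DANB), respectively.
Dropouts can reach 50% in single-cell transcriptome data. They are generally attributed to some transcripts not being successfully reverse transcribed during library construction, which is an enzymatic reaction.
That is why the authors model them with the Michaelis-Menten equation.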
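The idea can be sketched in Python. The sketch assumes the dropout rate of a gene follows P = 1 - S / (K + S), where S is the gene's mean expression across cells and K plays the role of Km. That form is my reading of the paper and is not spelled out in these notes, so treat it as an assumption. The median-based estimate of K and the toy numbers are purely illustrative and are not how the M3Drop package fits its curve.

```python
from statistics import median


def expected_dropout(mean_expr, k):
    """Assumed model: P = 1 - S / (K + S), which simplifies to K / (K + S)."""
    return k / (k + mean_expr)


def estimate_k(means, dropouts):
    """Crude illustrative estimate: solve P = K / (K + S) per gene, take the median."""
    ks = [p * s / (1.0 - p) for s, p in zip(means, dropouts)
          if s > 0 and 0.0 < p < 1.0]
    if not ks:
        raise ValueError("no usable genes to estimate K")
    return median(ks)


# Toy data (hypothetical): mean expression and observed dropout rate per gene.
means = [1.0, 4.0, 9.0, 4.0]
dropouts = [0.8, 0.5, 0.3, 0.9]

k = estimate_k(means, dropouts)

# Genes with many more dropouts than the curve predicts are candidate features.
residuals = [p - expected_dropout(s, k) for s, p in zip(means, dropouts)]
```

In this toy example the last gene has the same mean expression as the second one but far more dropouts, so it sits well above the fitted curve and has the largest residual.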
The authors compared 9 feature selection methods,
using each of them to rank the genes. The ranking criteria are as follows:
- by the magnitude of their loadings in principal component analysis (PCA)
- by the strength of their most negative gene-gene correlation (Cor)
- by their relative Gini index (Gini)
- M3Drop dropouts-mean expression curve (M3Drop)
- the squared coefficient of variation (CV2)-mean expression relationship (HVG)
- the dispersion-mean expression relationship fit by DANB (NBDisp)
- the dropouts-mean expression relationship fit by DANB (NBDrop).
None of these algorithms requires the samples to be classified in advance; they are all unsupervised.
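Two of the simpler criteria in the list can be sketched in a few lines of Python. This is a simplification: the sketch ranks genes by the raw CV2 and the raw Gini index, whereas the list above refers to a relative Gini index and to a fitted CV2-mean expression relationship. The gene names and expression values are made up.

```python
from statistics import mean, pvariance


def cv2(values):
    """Squared coefficient of variation: variance / mean^2."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pvariance(values) / (m * m)


def gini(values):
    """Gini index of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical expression of three genes across four cells.
expr = {
    "geneA": [5.0, 5.0, 5.0, 5.0],   # constant
    "geneB": [0.0, 0.0, 0.0, 20.0],  # expressed in one cell only
    "geneC": [2.0, 4.0, 6.0, 8.0],   # moderately variable
}

# No cell labels are used anywhere: the ranking is unsupervised.
ranked_by_cv2 = sorted(expr, key=lambda g: cv2(expr[g]), reverse=True)
ranked_by_gini = sorted(expr, key=lambda g: gini(expr[g]), reverse=True)
```

Both criteria put the gene expressed in a single cell first and the constant gene last, using only the expression values themselves.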
- differentially variable (DV)genes
- highly variable (HV) genes
- differentially expressed (DE) genes
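Batch effects are quite severe in single-cell transcriptome data, so one main goal of feature selection is to reduce the impact of technical noise and focus on biologically meaningful differences.
Public datasets
The authors compared 5 public datasets, all from mouse embryonic cells, containing sequencing data for 17 to 255 cells and covering stages from zygote to blastocyst.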
Tung et al. (2017) [12] considered iPSCs from three different individuals and performed three replicates of UMI-tagged scRNASeq and three replicates of bulk RNASeq for each. (GSE77288 ).
For Kolodziejczyk et al. (2015), we considered ESCs grown under two conditions: alternative 2i and serum for which there were three replicates of scRNASeq and two replicates of bulk RNASeq. (E-MTAB-2600)
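For the bulk transcriptome data, three methods were used to find differentially expressed genes: DESeq2, edgeR and limma-voom.
Only genes called differentially expressed at 5% FDR by all three methods were taken as the positive gold standard set, and genes not called differentially expressed at 20% FDR by all three methods were taken as the negative gold standard.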
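The consensus rule can be sketched as follows. The sketch assumes each method yields one adjusted p-value (FDR) per gene. The function name `gold_standard`, the dictionary layout and the toy values are hypothetical and do not come from the paper's code.

```python
def gold_standard(fdr_by_method, pos_fdr=0.05, neg_fdr=0.20):
    """Split genes into consensus positives and negatives.

    fdr_by_method maps a method name to a {gene: FDR} dictionary.
    """
    tables = list(fdr_by_method.values())
    # Only genes tested by every method can be judged.
    genes = set.intersection(*(set(t) for t in tables))
    # Positive: significant at pos_fdr in all methods.
    positives = {g for g in genes if all(t[g] < pos_fdr for t in tables)}
    # Negative: not significant even at the looser neg_fdr in all methods.
    negatives = {g for g in genes if all(t[g] > neg_fdr for t in tables)}
    return positives, negatives


# Toy FDR values (hypothetical).
fdr = {
    "DESeq2":     {"g1": 0.01, "g2": 0.03, "g3": 0.50, "g4": 0.10},
    "edgeR":      {"g1": 0.02, "g2": 0.30, "g3": 0.60, "g4": 0.12},
    "limma-voom": {"g1": 0.04, "g2": 0.01, "g3": 0.90, "g4": 0.08},
}

positives, negatives = gold_standard(fdr)
```

Genes on which the methods disagree (g2), or which fall between the two thresholds (g4), end up in neither set, which keeps the gold standard conservative.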
1,915 positives, and 8,398 negatives for the iPSCs
709 positives and 11,278 negatives for the ESCs
With these gene sets in hand, ROC curves can be computed.
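As a rough illustration, the area under the ROC curve for one feature selection method can be computed directly from its gene scores and the two gold standard sets. The sketch uses the pairwise (Mann-Whitney) definition of AUC. The function name `roc_auc` and the scores are made up.

```python
def roc_auc(scores, positives, negatives):
    """AUC = probability that a random positive gene outscores a random negative gene.

    Ties count as half a win. Genes without a score are ignored.
    """
    pos = [scores[g] for g in positives if g in scores]
    neg = [scores[g] for g in negatives if g in scores]
    if not pos or not neg:
        raise ValueError("need at least one scored positive and one scored negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical feature-selection scores (higher = more important).
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
auc = roc_auc(scores, positives={"g1", "g3"}, negatives={"g2", "g4"})
```

An AUC of 0.5 means the ranking is no better than random at separating the positive from the negative gold standard genes, and 1.0 means it separates them perfectly.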
Single-cell transcriptome papers generally fall into the following two categories:
The first category is: deep sequencing of full-transcripts for a relatively small number of cells
Representative papers are:
- Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
- Fast, scalable and accurate differential expression analysis for single cells. (2016). doi:10.1101/049734
- Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343, 193–196 (2014).
- Dynamics of Global Gene Expression Changes during Mouse Preimplantation Development. Dev. Cell 6, 117–131 (2004).
- Roles of CDX2 and EOMES in human induced trophoblast progenitor cells. Biochem. Biophys. Res. Commun. 431, 197–202 (2013).
The second category is: high-cell number, low-depth sequencing of 3’ or 5’ ends of transcripts tagged with unique molecular identifiers
Representative papers are:
- Quantification noise in single cell experiments. Nucleic Acids Res. 39, e124 (2011).
- Quantification of mRNA in single cells and modelling of RT-qPCR induced noise. BMC Mol. Biol. 9, 63 (2008).
- ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis. Genome Biol. 16, 241 (2015).
- DNA methylation dynamics during epigenetic reprogramming in the germline and preimplantation embryos. Genes Dev. 28, 812–828 (2014).
- Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013).
(Reposted from jimmy's 2018 literature reading notes)