單細(xì)胞轉(zhuǎn)錄組測(cè)序 (scRNA-seq)中擬時(shí)序分析是很常見的,是用來推斷不同的細(xì)胞類型間的發(fā)育關(guān)系,有一類通過RNA速率來推斷軌跡的方法,在此介紹一下。
RNA velocity
- RNA速率 (RNA velocity),是基因表達(dá)狀態(tài)的時(shí)間倒數(shù),可以通過剪切和未剪切的mRNA數(shù)目來推斷;
- RNA速率是一個(gè)高維向量,可以用來推斷一定時(shí)間后單個(gè)細(xì)胞的未來轉(zhuǎn)態(tài);
- RNA速率可以幫助分析發(fā)育譜系和細(xì)胞動(dòng)力學(xué)。
RNA velocity求解
1 對(duì)轉(zhuǎn)錄動(dòng)力學(xué)進(jìn)行數(shù)學(xué)建模
未剪切速率和剪切數(shù)隨時(shí)間的變化分別定義如下:
其中,α, β和γ分別指轉(zhuǎn)錄、剪切和降解的反應(yīng)率。
2 兩個(gè)模型
- 假設(shè)剪切數(shù)隨時(shí)間的變化是個(gè)常數(shù),則:
- 假設(shè)未剪切的分子數(shù)是個(gè)常數(shù),則:
3 求解數(shù)學(xué)模型
-
為了簡(jiǎn)化模型,我們假設(shè):
- 細(xì)胞最終會(huì)達(dá)到一個(gè)穩(wěn)定的狀態(tài),此時(shí)ds(t)/dt = 0
- RNA速率定義為偏離穩(wěn)態(tài)的程度
- 所有基因的剪切反應(yīng)率一致,即β=1
- 因此對(duì)于穩(wěn)態(tài)細(xì)胞,可以通過γ=u(t)/s(t)來計(jì)算降解反應(yīng)率γ,并由此計(jì)算RNA速率
計(jì)算γ參數(shù)
對(duì)于每個(gè)基因,通過最小二乘擬合以下線性方程來得到:
u和s矯正過后的未剪切和剪切數(shù)目。-
計(jì)算RNA速率
- 在model I中,RNA速率是個(gè)常數(shù),且
- 在modle II中,RNA速率為
此處,u為未剪切數(shù)目,s0為初始剪切數(shù)目。
- 在model I中,RNA速率是個(gè)常數(shù),且
4 改善模型表現(xiàn)
為了改善模型表現(xiàn),作者引入了如下操作:
- 對(duì)剪切及未剪切的數(shù)目進(jìn)行矯正
- 引入基因特異性的補(bǔ)償
- 根據(jù)k最相似細(xì)胞對(duì)轉(zhuǎn)錄本數(shù)目進(jìn)行pooling
- 根據(jù)相關(guān)性基因?qū)D(zhuǎn)錄本數(shù)目進(jìn)行pooling
velocyto實(shí)踐
文章[1]作者開發(fā)了可用于實(shí)際使用的軟件velocyto.py和velocyto.R,本人使用過velocyto.R (v 0.6),在此匯總一下。
步驟
- velocyto run根據(jù)bam比對(duì)文件生成unspliced與spliced矩陣如下:
gene S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT
Pcmt1 0 2 0
Zc2hc1b 0 1 0
Aig1 0 0 1
- filter.genes.by.cluster.expression()函數(shù)根據(jù)基因表達(dá)均值過濾基因
filter.genes.by.cluster.expression <- function(emat,clusters,min.max.cluster.average=0.1) {
if(!any(colnames(emat) %in% names(clusters))) stop("provided clusters do not cover any of the emat cells!")
vc <- intersect(colnames(emat),names(clusters))
cl.emax <- apply(do.call(cbind,tapply(vc,as.factor(clusters[vc]),function(ii) Matrix::rowMeans(emat[,ii]))),1,max)
vi <- cl.emax>min.max.cluster.average;
emat[vi,]
}
- gene.relative.velocity.estimates()函數(shù)為主函數(shù),主要進(jìn)行以下步驟
- 矯正unspliced和spliced矩陣,通過knn近鄰細(xì)胞和基因的表達(dá)進(jìn)行矯正,并去除文庫(kù)大小影響等
- 計(jì)算每個(gè)基因的降解速率γ,通過lm()函數(shù)
- get the transcriptional change matrix,即velocities
- 預(yù)測(cè)經(jīng)過一個(gè)單位時(shí)間后的spliced count matrix
結(jié)果概覽
- 原始的unspliced count matrix, 行為基因,列為細(xì)胞,數(shù)值為reads數(shù)
gene S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT
Pcmt1 0 2 0
Zc2hc1b 0 1 0
Aig1 0 0 1
- 原始的spliced count matrix
gene S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT
Pcmt1 2 0 0
Zc2hc1b 0 0 0
Aig1 0 0 0
- cellKNN matrix,k近鄰則為1,否則為0
S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT S1_AAACTTGAGCAA
S1_AAAACTAGACCG 1 . . .
S1_AAACAATTCAGT . 1 . .
S1_AAACCGCAGGGT . . 1 .
S1_AAACTTGAGCAA . . . 1
S1_AAATACGGAGAT . . . .
S1_AAATAACATGAT . . . .
S1_AAATACGGAGAT S1_AAATAACATGAT
S1_AAAACTAGACCG . .
S1_AAACAATTCAGT . .
S1_AAACCGCAGGGT . .
S1_AAACTTGAGCAA . .
S1_AAATACGGAGAT 1 .
S1_AAATAACATGAT . 1
- normalized spliced count matrix
S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT S1_AAACTTGAGCAA
Pcmt1 1.44731535 1.20195086 0.7124199 0.36209745
Zc2hc1b 0.94905925 1.29440862 0.0000000 0.90524362
Aig1 0.00000000 0.00000000 0.0000000 0.00000000
1700020N01Rik 0.23726481 0.06934332 2.5647115 0.02263109
Rsph3 0.04745296 0.09245776 0.1424840 0.04526218
Arhgap35 0.02372648 0.04622888 0.0000000 0.02263109
S1_AAATACGGAGAT S1_AAATAACATGAT
Pcmt1 0.39392234 0.82674335
Zc2hc1b 0.84411930 0.61107117
Aig1 0.05627462 0.03594536
1700020N01Rik 0.28137310 1.07836089
Rsph3 0.16882386 0.25161754
Arhgap35 0.02813731 0.03594536
- 降解反應(yīng)率γ
Pcmt1 Zc2hc1b Aig1 1700020N01Rik Rsph3
9.513799 2.984177 15.547821 1.663814 18.872405
Arhgap35
9.660245
- transcriptional change matrix: log2 observed unspliced count/ expected unspliced count ratio
S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT S1_AAACTTGAGCAA
Pcmt1 -0.380501503 -0.35166118 -0.6124347 0.560501410
Zc2hc1b -0.638238515 -0.42208600 0.0000000 -0.562667628
Aig1 0.004363889 0.00000000 0.0000000 0.000000000
1700020N01Rik 0.273165122 -0.05620864 -1.7473054 0.217611106
Rsph3 0.076477001 -0.03211146 -0.1424840 0.001545896
Arhgap35 -0.023724968 -0.04622593 0.0235771 -0.022629647
S1_AAATACGGAGAT S1_AAATAACATGAT
Pcmt1 0.36471764 -0.23093740
Zc2hc1b -0.42503141 0.06064056
Aig1 -0.03492573 0.07786126
1700020N01Rik -0.22807674 -0.70481700
Rsph3 -0.08377039 -0.09708474
Arhgap35 -0.02813552 -0.03594307
- 預(yù)測(cè)的經(jīng)過一個(gè)單位時(shí)間后的spliced count matrix
S1_AAAACTAGACCG S1_AAACAATTCAGT S1_AAACCGCAGGGT S1_AAACTTGAGCAA
Pcmt1 8.959464895 0.000000 0.3870690 0.561127391
Zc2hc1b 0.000000000 5.611424 0.0000000 0.000000000
Aig1 0.004361053 0.000000 0.0000000 0.000000000
1700020N01Rik 0.272987547 0.000000 5.1815394 0.217854139
Rsph3 0.076427287 0.000000 0.8444961 0.001547622
Arhgap35 0.000000000 0.000000 0.0229488 0.000000000
S1_AAATACGGAGAT S1_AAATAACATGAT
Pcmt1 0.3587747 0.00000000
Zc2hc1b 0.0000000 0.05853433
Aig1 0.0000000 0.07515691
1700020N01Rik 0.0000000 0.00000000
Rsph3 0.0000000 6.90097579
Arhgap35 0.0000000 0.00000000
-
速率可視化
image.png
后續(xù)模型[1-3]
velocyto.R運(yùn)用的是steady-state model,面對(duì)復(fù)雜的數(shù)據(jù),還有其他進(jìn)行優(yōu)化的模型如下:
- estimation based on gene structure, considering
- the number of exons
- total intronic length
- the number of predicted internal priming sites
- stochastic model
- to better capture the steady states
- treat transcription, splicing and degradation as probabilistic events
- dynamical model
- not simplify the model and
- solve the model in a likelihood-based expectation-maximization framework
- adapts RNA velocity to widely varying specifications such as non-stationary populations
參考文獻(xiàn)
[1] RNA velocity of single cells
[2] http://velocyto.org/
[3] Generalizing RNA velocity to transient cell states through dynamical modeling.
