RNAvelocity系列教程4:velocyto.R的使用教程

RNA注釋

這里以inDrop 實(shí)驗(yàn)數(shù)據(jù)舉例,spliced/unspliced的RNA可以通過:

1.使用dropEst 輸出管道生成 類似10X的 bam 文件:

~/dropEst/build/dropest -m -F -L eiEIBA -o run1 -g cellranger/refdata-cellranger-mm10-1.2.0/genes/genes.gtf -c ~/dropEst/configs/indrop_v3.xml *.bam

2.使用velocyto.py來注釋spliced/unspliced的reads,寫出標(biāo)準(zhǔn)loom文件:

velocyto run -u Gene -o out -e SCG71 -m mm10_rmsk_srt.gtf -v SCG_71_tophat.filtered.sorted.bam UCSC/mm10/Annotation/Genes/genes.gtf

(注意,也可以使用-V參數(shù)直接通過 DropEst 注釋spliced/unspliced的reads:)

~/dropEst/dropest -V -C 6000 -m -g ucsc_mm10_exons.gtf.gz -c ~/dropEst/configs/indrop_v3.xml *.aligned.bam)

下面的例子從 velocyto.py 生成的loom文件開始,使用 pagoda2 獲取細(xì)胞clusters/embedding,然后估計(jì)/可視化RNA速率。

加載R包:

library(velocyto.R)

Loading required package: Matrix

數(shù)據(jù)下載

下載預(yù)先計(jì)算的loom矩陣

wget http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom

讀取矩陣

ldat <-read.loom.matrices(url("http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom"))
#Error: is.character(name) is not TRUE

使用pagoda2標(biāo)準(zhǔn)化和聚類細(xì)胞

使用spliced 表達(dá)矩陣作為pagoda2的輸入,并查看分布。

emat <- ldat$spliced
hist(log10(colSums(emat)),col='wheat',xlab='cell size')
1625582623444
# this dataset has already been pre-filtered, but this is where one woudl do some filtering

emat <- emat[,colSums(emat)>=1e3]

pagoda2處理
pagoda2用于生成細(xì)胞嵌入、細(xì)胞聚類以及更精確的細(xì)胞距離矩陣。您也可以使用其他工具生成這些工具,如 Seurat等。

創(chuàng)建pagoda2對象,調(diào)整方差:

library(pagoda2)
r <- Pagoda2$new(emat,modelType='plain',trim=10,log.scale=T)
#2600 cells, 7301 genes; normalizing ... using plain model winsorizing ... log scale ... done.
r$adjustVariance(plot=T,do.par=T,gam.k=10)
#calculating variance fit ... using gam 137 overdispersed genes ... 137 persisting ... done.
1625582643118

運(yùn)行基本分析步驟以生成細(xì)胞嵌入和聚類,并可視化:

r$calculatePcaReduction(nPcs=100,n.odgenes=3e3,maxit=300)

running PCA using 3000 OD genes ..
Loading required package: irlba
.. done

r$makeKnnGraph(k=30,type='PCA',center=T,distance='cosine');

Loading required package: igraph

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:decompose, spectrum

The following object is masked from ‘package:base’:union

creating space of type angular done
adding data ... done
building index ... done
querying ... done

r$getKnnClusters(method=multilevel.community,type='PCA',name='multilevel')r$getEmbedding(type='PCA',embeddingType='tSNE',perplexity=50,verbose=T)

calculating distance ... pearson ...running tSNE using 16 cores:
Read the 2600 x 2600 data matrix successfully!
OpenMP is working...
Using no_dims = 2, perplexity = 50.000000, and theta = 0.500000
Computing input similarities...
Building tree...
Done in 9.36 seconds (sparsity = 0.084143)!
Learning embedding...
Iteration 50: error is 73.294295 (50 iterations in 3.70 seconds)
Iteration 100: error is 65.648297 (50 iterations in 3.19 seconds)
Iteration 150: error is 65.151406 (50 iterations in 3.17 seconds)
Iteration 200: error is 65.090414 (50 iterations in 3.24 seconds)
Iteration 250: error is 65.075571 (50 iterations in 3.27 seconds)
Iteration 300: error is 1.814009 (50 iterations in 3.11 seconds)
Iteration 350: error is 1.699950 (50 iterations in 3.08 seconds)
Iteration 400: error is 1.660607 (50 iterations in 3.01 seconds)
Iteration 450: error is 1.644676 (50 iterations in 2.98 seconds)
Iteration 500: error is 1.638744 (50 iterations in 2.96 seconds)
Iteration 550: error is 1.633982 (50 iterations in 3.00 seconds)
Iteration 600: error is 1.629732 (50 iterations in 2.99 seconds)
Iteration 650: error is 1.628101 (50 iterations in 3.12 seconds)
Iteration 700: error is 1.625991 (50 iterations in 3.27 seconds)
Iteration 750: error is 1.624012 (50 iterations in 3.15 seconds)
Iteration 800: error is 1.622847 (50 iterations in 3.27 seconds)
Iteration 850: error is 1.621847 (50 iterations in 3.31 seconds)
Iteration 900: error is 1.620831 (50 iterations in 3.34 seconds)
Iteration 950: error is 1.619936 (50 iterations in 3.21 seconds)
Iteration 1000: error is 1.618511 (50 iterations in 3.16 seconds)
Fitting performed in 63.52 seconds.

繪制嵌入、集群標(biāo)記(左)和"Xist"表達(dá)圖(將男性和女性分開展示)

par(mfrow=c(1,2))r$plotEmbedding(type='PCA',embeddingType='tSNE',show.legend=F,mark.clusters=T,min.group.size=10,shuffle.colors=F,mark.cluster.cex=1,alpha=0.3,main='cell clusters')r$plotEmbedding(type='PCA',embeddingType='tSNE',colors=r$depth,main='depth')  

treating colors as a gradient with zlim: 1000.9 2939

1625582662042

速率估計(jì)

準(zhǔn)備矩陣和聚類數(shù)據(jù):

emat <- ldat$spliced; nmat <- ldat$unspliced; # disregarding spanning reads, as there are too few of thememat <- emat[,rownames(r$counts)]; nmat <- nmat[,rownames(r$counts)]; # restrict to cells that passed p2 filter# take cluster labelscluster.label <- r$clusters$PCA$multilevel # take the cluster factor that was calculated by p2cell.colors <- pagoda2:::fac2col(cluster.label)# take embedding form p2emb <- r$embeddings$PCA$tSNE

除了聚類和 t-SNE 嵌入,從 p2 處理,我們將采取細(xì)胞距離,優(yōu)于默認(rèn)的R 通常會使用全轉(zhuǎn)錄體相關(guān)距離。

cell.dist <- as.dist(1-armaCor(t(r$reductions$PCA)))

基于最低平均表達(dá)量(至少在其中一個(gè)cluster中)篩選基因,輸出生成的有效基因總數(shù):

emat <- filter.genes.by.cluster.expression(emat,cluster.label,min.max.cluster.average = 0.2)nmat <- filter.genes.by.cluster.expression(nmat,cluster.label,min.max.cluster.average = 0.05)length(intersect(rownames(emat),rownames(nmat)))

[1] 2541
估計(jì)RNA速率:

fit.quantile <- 0.02rvel.cd <- gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells=25,cell.dist=cell.dist,fit.quantile=fit.quantile)

calculating cell knn ... done
calculating convolved matrices ... done
fitting gamma coefficients ... done. succesfful fit for 743 genes
filtered out 149 out of 743 genes due to low nmat-emat correlation
filtered out 27 out of 594 genes due to low nmat-emat slope
calculating RNA velocity shift ... done
calculating extrapolated cell state ... done
可視化 t-SNE 嵌入上的速率:

show.velocity.on.embedding.cor(emb,rvel.cd,n=200,scale='sqrt',cell.colors=ac(cell.colors,alpha=0.5),cex=0.8,arrow.scale=3,show.grid.flow=TRUE,min.grid.cell.mass=0.5,grid.n=40,arrow.lwd=1,do.par=F,cell.border.alpha = 0.1)

delta projections ... sqrt knn ... transition probs ... done
calculating arrows ... done
grid estimates ... grid.sd= 0.6580196 min.arrow.size= 0.01316039 max.grid.arrow.length= 0.05887327 done

1625583871396

特定基因可視化:

gene <- "Camp"gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells = 25,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=cell.colors,cell.dist=cell.dist,show.gene=gene,old.fit=rvel.cd,do.par=T)

calculating convolved matrices ... done
[1] 1

1625583896763

調(diào)整視圖

調(diào)整kCells,這會給我們一個(gè)更理想化的圖像視圖,差異肉眼可見:

gene <- "Camp"gene.relative.velocity.estimates(emat,nmat,deltaT=1,kCells = 100,kGenes=1,fit.quantile=fit.quantile,cell.emb=emb,cell.colors=cell.colors,cell.dist=cell.dist,show.gene=gene,do.par=T)

calculating cell knn ... done
calculating convolved matrices ... done
[1] 1

1625583909931
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容