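---I don't write code, I just move it around.
Today let's dig into a new tool, CellRank. One of its big selling points is that it works both with and without splicing data. How well it actually performs I'm still evaluating on my own data.
First, the analysis with splicing data: load the packages and read in the data.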
import scvelo as scv
import scanpy as sc
import cellrank as cr
import numpy as np
adata = sc.read_h5ad('/home/Documents/integrated_20L_with_splicing.h5ad')
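If you need to convert from Seurat, I recommend sceasy. It supports quite a few formats and most conversions work. Seurat to AnnData definitely works (the reverse direction still has some issues at the moment). Personally I find it much better than SeuratDisk, which I found to be junk.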
https://github.com/cellgeni/sceasy
Do the Seurat conversion in R:
## Seurat to AnnData
sceasy::convertFormat(scRNA,
from = "seurat", to = "anndata",
outFile = "scRNA.h5ad",
drop_single_values = FALSE
)
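Once the conversion is done, load the file in Python and add the splicing data (the spliced, unspliced and ambiguous count matrices from STARsolo's Velocyto output).
You could also parse the matrices with numpy, but pandas loads them faster.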
import pandas as pd
from scipy import sparse

adata = scv.read("scRNA.h5ad")
## add splicing info to the anndata object
## assumes adata holds exactly the cells and genes of the STARsolo filtered output, in the same order
path = '/home/Downloads/star_output/'
# line 3 of the .mtx file is "n_genes n_cells n_entries"; keep the first two numbers
shape = tuple(np.loadtxt(path + 'Velocyto/filtered/spliced.mtx', skiprows=2, max_rows=1, delimiter=' ')[0:2].astype(int))
for layer in ['spliced', 'unspliced', 'ambiguous']:
    # skiprows=2 skips the two '%' header lines; pandas then uses the size line as its header
    m = pd.read_csv(path + 'Velocyto/filtered/' + layer + '.mtx', skiprows=2, delimiter=' ', dtype=float).values
    # .mtx indices are 1-based and the matrix is genes x cells: build it, then transpose to cells x genes
    adata.layers[layer] = sparse.csr_matrix((m[:, 2], (m[:, 0].astype(int) - 1, m[:, 1].astype(int) - 1)), shape=shape).T.tocsr()
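The parsing above only works if `adata` contains exactly the cells and genes of STARsolo's `Velocyto/filtered` output, in the same order; after filtering cells in Seurat, or when integration adds barcode suffixes such as `_1`, that is usually not the case. Below is a minimal sketch of a more defensive approach, assuming the STARsolo barcode order is available (STARsolo normally writes a `barcodes.tsv` next to the `.mtx` files). It parses a MatrixMarket file with `np.loadtxt`, builds a cells x genes matrix and reorders its rows to follow the AnnData cell names. The tiny matrix, the barcodes and the helper names `mtx_to_cells_by_genes` / `align_to_cells` are made up for illustration, and genes are assumed to already be in the same order.

```python
import io

import numpy as np
from scipy import sparse

# Made-up MatrixMarket file: 3 genes x 4 cells, entries are 1-based "gene cell count"
MTX = """%%MatrixMarket matrix coordinate integer general
%
3 4 4
1 1 5
2 3 2
3 2 7
1 4 1
"""
star_barcodes = ["AAAC", "AAAG", "AAAT", "AACA"]  # hypothetical barcodes.tsv order
adata_cells = ["AAAT", "AAAC", "AACA"]            # hypothetical adata.obs_names after filtering


def mtx_to_cells_by_genes(handle):
    """Parse a genes x cells MatrixMarket file into a cells x genes CSR matrix."""
    rows = np.loadtxt(handle, comments="%", ndmin=2)  # '%' lines are header/comments
    n_genes, n_cells = rows[0, :2].astype(int)        # first non-comment line holds the sizes
    gene = rows[1:, 0].astype(int) - 1                # convert 1-based to 0-based indices
    cell = rows[1:, 1].astype(int) - 1
    return sparse.csr_matrix((rows[1:, 2], (cell, gene)), shape=(n_cells, n_genes))


def align_to_cells(matrix, barcodes, wanted_cells):
    """Reorder/subset rows to follow `wanted_cells`; raises KeyError if a cell is missing."""
    position = {bc: i for i, bc in enumerate(barcodes)}
    return matrix[[position[c] for c in wanted_cells], :]


layer = align_to_cells(mtx_to_cells_by_genes(io.StringIO(MTX)), star_barcodes, adata_cells)
print(layer.toarray())
```

On real data you would pass the file path instead of `io.StringIO(...)`, strip any suffixes Seurat added to the cell names before the lookup, and assign the result to `adata.layers[...]`.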
Next, continue processing. If the data were not normalized in Seurat, run the standard preprocessing below; if they already are, skip this step.
# optional preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
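If you are unsure whether your object is already normalized, it helps to know roughly what this step does to the counts. The sketch below is simplified and is not scVelo's actual implementation (which also filters genes by `min_shared_counts`, keeps the `n_top_genes` most variable genes and also normalizes the spliced/unspliced layers): it scales every cell to the same total count, here the median library size, and then applies `log1p`. The helper name `normalize_log1p` is hypothetical.

```python
import numpy as np


def normalize_log1p(counts):
    """Scale each cell (row) to the median total count, then log1p (simplified sketch)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # library size per cell; assumed > 0
    target = np.median(totals)                  # common target size for all cells
    return np.log1p(counts / totals * target)


X = np.array([[1, 3, 0], [2, 2, 4]])
X_norm = normalize_log1p(X)
print(np.expm1(X_norm).sum(axis=1))  # every cell now has the same total
```

A rough heuristic for your own object: if `np.expm1` of a few rows of `adata.X` sums to the same value for every cell (Seurat's `LogNormalize` scales to 10,000), the data have most likely been log-normalized already.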
Start scVelo
scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.filter_genes_dispersion(adata, n_top_genes=4000)
# KNN-imputation using scVelo's moments function
scv.pp.moments(adata, n_pcs=20, n_neighbors=50)
scv.tl.recover_dynamics(adata,n_jobs=18)
scv.tl.velocity(adata,mode='dynamical') ## or mode='stochastic'
scv.tl.velocity_graph(adata,n_jobs=12)
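`scv.pp.moments` is the KNN-imputation step: each cell's spliced and unspliced counts are replaced by their average over its nearest neighbours in PCA space, which smooths out the sparsity before velocities are fitted. A minimal sketch of that idea with uniform weights and brute-force distances; scVelo itself uses the connectivity graph from its neighbour computation, so its weights differ. `knn_moments` is a hypothetical helper.

```python
import numpy as np


def knn_moments(pcs, spliced, unspliced, n_neighbors):
    """First-order moments: average counts over each cell's k nearest neighbours in PCA space."""
    dist = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=-1)  # squared distances, cells x cells
    nn = np.argsort(dist, axis=1)[:, :n_neighbors]                  # includes the cell itself (distance 0)
    Ms = spliced[nn].mean(axis=1)                                   # cells x genes
    Mu = unspliced[nn].mean(axis=1)
    return Ms, Mu


pcs = np.array([[0.0], [1.0], [10.0], [11.0]])  # two well-separated pairs of cells
spliced = np.array([[2.0, 0.0], [4.0, 2.0], [10.0, 6.0], [12.0, 8.0]])
unspliced = spliced / 2
Ms, Mu = knn_moments(pcs, spliced, unspliced, n_neighbors=2)
print(Ms)
```

The full distance matrix costs O(n²) memory, which is fine for a toy example but not for a real dataset; that is why the precomputed neighbour graph is used instead.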
Check the results
# velocity streamplot
scv.pl.velocity_embedding_stream(adata, basis="umap", color="seurat_clusters", dpi=200)
## phase portraits
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index
scv.pl.scatter(adata, basis=top_genes[:15], color="seurat_clusters",ncols=3, frameon=False)
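The phase portraits plot each gene's unspliced against spliced abundance. The simplest way to turn such a plot into a velocity is the steady-state model behind `mode='deterministic'` (which `mode='stochastic'` extends with second-order moments): fit a slope γ through the origin on the cells at the extremes of the portrait, assumed to be near equilibrium, and take v = u - γs. The sketch below implements only that model, not what `mode='dynamical'` above computes; the 5% quantile cut-off is an assumption, and `steady_state_velocity` is a hypothetical helper.

```python
import numpy as np


def steady_state_velocity(Ms, Mu, perc=5):
    """Per-gene steady-state velocity v = u - gamma * s.

    gamma is a least-squares slope through the origin, fitted only on cells whose
    (s + u) lies in the lowest or highest `perc` percent, i.e. the extremes of the
    phase portrait that are assumed to be close to steady state.
    """
    velocity = np.zeros_like(Ms, dtype=float)
    gammas = np.zeros(Ms.shape[1])
    for j in range(Ms.shape[1]):
        s, u = Ms[:, j], Mu[:, j]
        total = s + u
        low, high = np.percentile(total, [perc, 100 - perc])
        extreme = (total <= low) | (total >= high)
        gammas[j] = (s[extreme] @ u[extreme]) / (s[extreme] @ s[extreme])  # assumes s is not all zero
        velocity[:, j] = u - gammas[j] * s
    return velocity, gammas


s = np.linspace(1.0, 10.0, 20)
Ms = np.column_stack([s, s])
Mu = np.column_stack([0.5 * s, 0.5 * s])
Mu[10, 1] += 1.0  # gene 2: one mid-range cell with extra unspliced RNA
velocity, gammas = steady_state_velocity(Ms, Mu)
print(gammas)
print(velocity[10])
```

A positive velocity, like the extra unspliced RNA of gene 2 in cell 10, means the gene is being induced in that cell; negative values indicate repression.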
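Now for CellRank proper. The cr.tl.* functions and plotting helpers such as cr.pl.terminal_states and cr.pl.cluster_fates used below belong to the CellRank 1.x API and are no longer available in CellRank 2, which replaced them with kernels and estimators, so run this with a 1.x release.
## CellRank: this step takes a very, very long time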
cr.tl.terminal_states(adata, cluster_key="seurat_clusters", weight_connectivities=0.2)
## terminal states
cr.pl.terminal_states(adata)
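CellRank builds a Markov chain over cells. As far as I understand the 1.x API, `weight_connectivities=0.2` means the transition matrix is a convex combination of the velocity-based transitions (weight 0.8) and the transcriptomic-similarity, i.e. connectivity, transitions (weight 0.2), so that noisy velocities are regularized by the KNN graph. A small numpy sketch of that combination, with made-up 3-cell matrices and hypothetical helper names:

```python
import numpy as np


def row_normalize(weights):
    """Turn non-negative weights into a row-stochastic transition matrix."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum(axis=1, keepdims=True)


def combine_kernels(T_velocity, T_connectivity, weight_connectivities=0.2):
    """(1 - w) * velocity transitions + w * connectivity transitions; rows still sum to 1."""
    w = weight_connectivities
    return (1 - w) * T_velocity + w * T_connectivity


# Made-up example: velocities push cells 0 -> 1 -> 2, and cell 2 mostly stays put
T_velocity = row_normalize([[0, 1, 0], [0, 0.2, 0.8], [0, 0.1, 0.9]])
# Connectivity kernel: symmetric KNN similarities without self-loops
T_connectivity = row_normalize([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
T = combine_kernels(T_velocity, T_connectivity)
print(T)
```

In this toy example cell 2 keeps most of its probability on itself, which is the kind of behaviour that marks a candidate terminal state.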
## Identify initial states; this also takes a long time
cr.tl.initial_states(adata, cluster_key="seurat_clusters")
cr.pl.initial_states(adata, discrete=True)
## Compute fate maps
cr.tl.lineages(adata)
cr.pl.lineages(adata, same_plot=False)
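The fate maps from `cr.tl.lineages` are absorption probabilities: for each cell, the probability that the Markov chain started there ends up in each terminal state. For a chain whose terminal states are absorbing, they follow from the transient-to-transient block Q and the transient-to-terminal block R of the transition matrix as B = (I - Q)^-1 R. A minimal numpy sketch on a made-up 4-state chain; CellRank treats whole groups of cells as terminal states and uses sparse iterative solvers, so this only illustrates the formula. `absorption_probabilities` is a hypothetical helper.

```python
import numpy as np


def absorption_probabilities(T, terminal):
    """Absorption probabilities B = (I - Q)^-1 R for the transient states of chain T."""
    T = np.asarray(T, dtype=float)
    terminal = np.asarray(terminal)
    transient = np.setdiff1d(np.arange(T.shape[0]), terminal)
    Q = T[np.ix_(transient, transient)]  # transient -> transient
    R = T[np.ix_(transient, terminal)]   # transient -> terminal
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)  # solve the system instead of inverting
    return transient, B


# Made-up chain: cells 0 and 1 are transient, cells 2 and 3 are absorbing terminal states
T = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
transient, B = absorption_probabilities(T, terminal=[2, 3])
print(transient, B)
```

Each row of `B` sums to 1: every transient cell is eventually absorbed by one of the terminal states.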
PAGA
scv.tl.recover_latent_time(adata, root_key="initial_states_probs", end_key="terminal_states_probs")
#use the inferred pseudotime to compute the directed PAGA.
scv.tl.paga(
adata,
groups="seurat_clusters",
root_key="initial_states_probs",
end_key="terminal_states_probs",
use_time_prior="velocity_pseudotime",
)
Plotting
cr.pl.cluster_fates(
adata,
mode="paga_pie",
cluster_key="seurat_clusters",
basis="umap",
legend_kwargs={"loc": "top right"},
legend_loc="top left",
node_size_scale=2,
edge_width_scale=1,
max_edge_width=2,
title="directed PAGA",
)
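In `mode="paga_pie"` each PAGA node, i.e. each cluster, is drawn as a pie chart. As far as I can tell, the slices are the fate probabilities averaged over the cells of that cluster. The aggregation itself is a simple group-by mean; a sketch with made-up numbers and a hypothetical helper `cluster_mean_fates`:

```python
import numpy as np


def cluster_mean_fates(fate_probs, clusters):
    """Average per-cell fate probabilities (cells x terminal states) within each cluster."""
    fate_probs = np.asarray(fate_probs, dtype=float)
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)  # sorted cluster labels
    means = np.vstack([fate_probs[clusters == c].mean(axis=0) for c in labels])
    return labels, means


# Made-up fate probabilities for 3 cells and 2 terminal states
fate_probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
clusters = ["0", "0", "1"]
labels, means = cluster_mean_fates(fate_probs, clusters)
print(labels, means)
```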