CellRank

--- I don't produce code; I'm just a code porter.

Today let's take a look at this new tool. One of its big advantages is that it works both with and without splicing data. How well it performs is something I am still evaluating on my own data.

We start with the analysis that uses splicing data. First load the packages and read in the data.

import scvelo as scv
import scanpy as sc
import cellrank as cr
import numpy as np
adata = sc.read_h5ad('/home/Documents/integrated_20L_with_splicing.h5ad')

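If you need to convert from Seurat, I recommend sceasy. It supports a fair number of formats and most conversions work. Seurat to AnnData definitely works (the reverse conversion has some problems for now). Personally I find it better than SeuratDisk, which I consider junk.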
https://github.com/cellgeni/sceasy

Do the Seurat conversion in R

## Seurat to AnnData
sceasy::convertFormat(scRNA,
    from = "seurat", to = "anndata",
    outFile = "scRNA.h5ad",
    drop_single_values = FALSE
)

Once the conversion is done, load the file in Python and add the splicing data to it.

You could also load the matrices with numpy, but pandas reads them faster.

import pandas as pd
from scipy import sparse

adata = scv.read("scRNA.h5ad")

## add splicing info to the anndata object
## NOTE: this assumes the rows and columns of the .mtx files are in the same
## order as adata.var_names and adata.obs_names; reorder first if they are not.
path = '/home/Downloads/star_output/'

# The third line of the .mtx file holds the dimensions: n_genes n_cells n_entries
shape = tuple(np.loadtxt(path + 'Velocyto/filtered/spliced.mtx', skiprows=2, max_rows=1, delimiter=' ')[0:2].astype(int))

def read_layer(name):
    # skiprows=2 drops the two header lines; pandas then takes the dimension
    # line as the column header, so only (gene, cell, count) triplets remain
    m = pd.read_csv(path + 'Velocyto/filtered/' + name + '.mtx',
                    skiprows=2, delimiter=' ', dtype=float).values
    # .mtx indices are 1-based; transpose genes x cells to cells x genes
    return sparse.csr_matrix((m[:, 2], (m[:, 0].astype(int) - 1, m[:, 1].astype(int) - 1)), shape=shape).T.tocsr()

for name in ['spliced', 'unspliced', 'ambiguous']:
    adata.layers[name] = read_layer(name)
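
To see what the index arithmetic in the loading code does, here is a minimal sketch on a made-up 3-gene by 2-cell matrix. The triplet values and the names `triplets`, `genes_by_cells` and `layer` are invented for illustration and are not real STARsolo output. The point is only that MatrixMarket indices are 1-based, and that the genes x cells matrix has to be transposed to match AnnData's cells x genes layout.

```python
import numpy as np
from scipy import sparse

# Hypothetical MatrixMarket triplets: (gene index, cell index, count), 1-based.
triplets = np.array([
    [1, 1, 5.0],
    [3, 1, 2.0],
    [2, 2, 7.0],
])
n_genes, n_cells = 3, 2

# Shift to 0-based indices before building the sparse matrix.
rows = triplets[:, 0].astype(int) - 1
cols = triplets[:, 1].astype(int) - 1
genes_by_cells = sparse.csr_matrix(
    (triplets[:, 2], (rows, cols)), shape=(n_genes, n_cells)
)

# AnnData layers are cells x genes, so transpose and convert back to CSR.
layer = genes_by_cells.T.tocsr()
dense = layer.toarray()

print(layer.shape)
print(dense)
```

A quick check worth running on real data is `adata.layers['spliced'].shape == adata.shape`; if it fails, the layer and the AnnData object do not describe the same cells and genes.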

Then continue processing. If the Seurat object was never normalized, you can follow the standard scanpy procedure below; if it is already normalized, skip this step.

# optional preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
# embedding and clustering, called through scanpy directly
sc.tl.umap(adata)
sc.tl.leiden(adata)

Start scVelo

scv.pp.filter_genes(adata, min_shared_counts=20)
scv.pp.filter_genes_dispersion(adata, n_top_genes=4000)

# KNN-imputation using scVelo's moments function
scv.pp.moments(adata, n_pcs=20, n_neighbors=50)
scv.tl.recover_dynamics(adata,n_jobs=18) 
scv.tl.velocity(adata,mode='dynamical') ## or mode='stochastic'
scv.tl.velocity_graph(adata,n_jobs=12)
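
Roughly speaking, the moments step smooths each cell's counts by averaging over its nearest neighbors. The toy below is a sketch of that idea only, using a hand-written neighbor graph; it is not scVelo's actual implementation, and `spliced`, `connectivities` and `Ms` here are made-up stand-ins.

```python
import numpy as np

# Made-up spliced counts: 4 cells x 2 genes.
spliced = np.array([
    [0.0, 4.0],
    [2.0, 0.0],
    [4.0, 2.0],
    [10.0, 10.0],
])

# Hand-written neighbor graph (1 = neighbors, each cell includes itself).
# Cells 0-2 are mutual neighbors; cell 3 is isolated.
connectivities = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=float)

# Row-normalize so each row is a set of averaging weights.
weights = connectivities / connectivities.sum(axis=1, keepdims=True)

# First-order moment: the neighbor-averaged expression of each cell.
Ms = weights @ spliced
print(Ms)
```

Cells 0 to 2 end up with the same smoothed values because they share one neighborhood, while the isolated cell keeps its own counts. A larger `n_neighbors` therefore means stronger smoothing.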

Take a look at the results

## velocity map as a streamplot
scv.pl.velocity_embedding_stream(adata, basis="umap", color="seurat_clusters", dpi=200)
## phase portraits of the genes with the highest fit likelihood
## (fit_likelihood is only available after recover_dynamics)
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index
scv.pl.scatter(adata, basis=top_genes[:15], color="seurat_clusters", ncols=3, frameon=False)

Now for CellRank itself

## NOTE: the cr.tl.* and cr.pl.* calls below are the CellRank 1.x high-level API;
## CellRank 2 replaced them with kernels and estimators.

## Identify terminal states; this takes a very long time
cr.tl.terminal_states(adata, cluster_key="seurat_clusters", weight_connectivities=0.2)
cr.pl.terminal_states(adata)

## Identify initial states; this also takes a long time
cr.tl.initial_states(adata, cluster_key="seurat_clusters")
cr.pl.initial_states(adata, discrete=True)

## Compute fate maps
cr.tl.lineages(adata)
cr.pl.lineages(adata, same_plot=False)
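
Conceptually, CellRank treats cell-state transitions as a Markov chain, and a fate map gives, for each cell, the probability of ending up in each terminal state. Below is a minimal sketch on a 4-state toy chain. The matrices are invented for illustration, and blending a velocity-based matrix with a similarity-based one at weight 0.2 only mirrors the role `weight_connectivities=0.2` plays in the call above; it is not CellRank's internal code.

```python
import numpy as np

# Toy row-stochastic matrices over 4 states.
# States 2 and 3 are terminal (absorbing) in both.
T_velocity = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
T_connectivity = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Blend the two matrices; a convex combination stays row-stochastic.
w = 0.2
T = (1 - w) * T_velocity + w * T_connectivity

transient = [0, 1]
absorbing = [2, 3]
Q = T[np.ix_(transient, transient)]  # transient -> transient
R = T[np.ix_(transient, absorbing)]  # transient -> absorbing

# Absorption probabilities solve (I - Q) B = R.
fate = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(fate)
```

Each row of `fate` sums to 1, which is what makes it readable as a per-cell distribution over terminal states.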

PAGA

scv.tl.recover_latent_time(adata, root_key="initial_states_probs", end_key="terminal_states_probs")

# Use the inferred pseudotime to compute the directed PAGA.
# groups must name the cluster column used above ("seurat_clusters" here).
scv.tl.paga(
    adata,
    groups="seurat_clusters",
    root_key="initial_states_probs",
    end_key="terminal_states_probs",
    use_time_prior="velocity_pseudotime",
)

Plotting

cr.pl.cluster_fates(
    adata,
    mode="paga_pie",
    cluster_key="seurat_clusters",  # same cluster column as in the steps above
    basis="umap",
    legend_kwargs={"loc": "top right"},
    legend_loc="top left",
    node_size_scale=2,
    edge_width_scale=1,
    max_edge_width=2,
    title="directed PAGA",
)
