pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data

R環(huán)境中操作流程見:http://www.itdecent.cn/p/b0fd795ad05c

分析步驟:

0. Preliminary work
1. Inference of co-expression modules
2. Prune modules for targets with cis regulatory footprints (aka RcisTarget)
3. Cellular regulon enrichment matrix (aka AUCell)

0. 前期參數(shù)準(zhǔn)備

import os
import glob
import pickle
import pandas as pd
import numpy as np

from dask.diagnostics import ProgressBar

from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2

from pyscenic.rnkdb import FeatherRankingDatabase as RankingDatabase
from pyscenic.utils import modules_from_adjacencies, load_motifs
from pyscenic.prune import prune2df, df2regulons
from pyscenic.aucell import aucell

# 這個包我在導(dǎo)入的時候出現(xiàn)了如下報錯:(Segmentation fault)
# 因為該函數(shù)只涉及到出圖,因此沒有導(dǎo)入的必要(后期結(jié)果直接導(dǎo)入到R中進(jìn)行出圖)
# import seaborn as sns

DATA_FOLDER="~/tmp"
RESOURCES_FOLDER="~/resources"
DATABASE_FOLDER = "~/databases/"
SCHEDULER="123.122.8.24:8786"
DATABASES_GLOB = os.path.join(DATABASE_FOLDER, "mm9-*.feather")
MOTIF_ANNOTATIONS_FNAME = os.path.join(RESOURCES_FOLDER, "motifs-v9-nr.mgi-m0.001-o0.0.tbl")
MM_TFS_FNAME = os.path.join(RESOURCES_FOLDER, 'mm_tfs.txt')
SC_EXP_FNAME = os.path.join(RESOURCES_FOLDER, "GSE60361_C1-3005-Expression.txt")
REGULONS_FNAME = os.path.join(DATA_FOLDER, "regulons.p")
MOTIFS_FNAME = os.path.join(DATA_FOLDER, "motifs.csv")

# 讀入表達(dá)矩陣,表達(dá)矩陣的格式:橫坐標(biāo)是基因,縱坐標(biāo)是細(xì)胞
ex_matrix = pd.read_csv(SC_EXP_FNAME, sep='\t', header=0, index_col=0).T
ex_matrix.shape

# 導(dǎo)入轉(zhuǎn)錄因子
tf_names = load_tf_names(MM_TFS_FNAME)

# 導(dǎo)入數(shù)據(jù)庫
db_fnames = glob.glob(DATABASES_GLOB)
def name(fname):
    return os.path.basename(fname).split(".")[0]   
dbs = [RankingDatabase(fname=fname, name=name(fname)) for fname in db_fnames]
dbs
1. Inference of co-expression modules
# 兩條命令解決
adjacencies = grnboost2(ex_matrix, tf_names=tf_names, verbose=True) #耗時
modules = list(modules_from_adjacencies(adjacencies, ex_matrix))
2. Prune modules for targets with cis regulatory footprints (aka RcisTarget)
# Calculate a list of enriched motifs and the corresponding target genes for all modules.
with ProgressBar():
    df = prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME)

# Create regulons from this table of enriched motifs.
regulons = df2regulons(df)

# Save the enriched motifs and the discovered regulons to disk.
df.to_csv(MOTIFS_FNAME)
with open(REGULONS_FNAME, "wb") as f:
    pickle.dump(regulons, f)
3. Cellular regulon enrichment matrix (aka AUCell)
auc_mtx = aucell(ex_matrix, regulons, num_workers=4)
# 這一步出圖
# sns.clustermap(auc_mtx, figsize=(8,8))

將結(jié)果導(dǎo)入到R

首先將pySCENIC中得到的結(jié)果保存成.loom格式

# 還是在剛才的環(huán)境中(Python)
from pyscenic.export import export2loom
export2loom(ex_mtx = ex_matrix, auc_mtx = auc_mtx, regulons = regulons, out_fname = "your path way/xxx.loom")
# 這里會有提示(見圖1),按照提示改一下

export2loom(ex_mtx = ex_matrix, auc_mtx = auc_mtx, regulons = [r.rename(r.name.replace('(+)',' ('+str(len(r))+'g)')) for r in regulons], out_fname = "your path way/xxx.loom")
# 這一句話運行完畢后,會在指定目錄下生成xxx.loom文件,這就是導(dǎo)入R所需要的文件

圖1

然后進(jìn)入R環(huán)境

# 保存xxx.loom文件的路徑
pyScenicDir <- "pySCENIC_example/output"

library(SCENIC)
library(SCopeLoomR)

pyScenicLoomFile <- file.path(pyScenicDir, "SCENIC.loom")
loom <- open_loom(pyScenicLoomFile, mode="r")

# Read information from loom file:
regulons_incidMat <- get_regulons(loom)
regulons <- regulonsToGeneLists(regulons_incidMat)
regulonsAUC <- get_regulonsAuc(loom)
regulonsAucThresholds <- get_regulonThresholds(loom)
embeddings <- get_embeddings(loom)

exprMat <- get_dgem(loom)
cellInfo <- get_cellAnnotation(loom)
clusterings <- get_clusterings_withName(loom)

close_loom(loom)

以上過程完成之后,pySCENIC中的結(jié)果就成功導(dǎo)入到R環(huán)境中了,接下來就可以在R中出圖(可參考:SCENIC—Single Cell rEgulatory Network Inference and Clustering

所用數(shù)據(jù)集

# Transcription factors:
wget https://raw.githubusercontent.com/aertslab/scenic-nf/master/example/allTFs_hg38.txt 
# Motif to TF annotation database:
wget https://raw.githubusercontent.com/aertslab/scenic-nf/master/example/motifs.tbl
# Ranking databases:
wget https://raw.githubusercontent.com/aertslab/scenic-nf/master/example/genome-ranking.feather
# Finally, get a small sample expression matrix (loom format):
wget https://raw.githubusercontent.com/aertslab/scenic-nf/master/example/expr_mat.loom
# Alternatively, in tsv format:
wget https://raw.githubusercontent.com/aertslab/scenic-nf/master/example/expr_mat.tsv

參考鏈接:
Importing pySCENIC results
pyscenic
數(shù)據(jù)集

補充

當(dāng)我用真實數(shù)據(jù)跑這一套流程的時候,在“將pySCENIC中得到的結(jié)果保存成.loom格式”中出現(xiàn)了如下報錯:

網(wǎng)上搜到比較靠譜的解釋是HDF5文件類型大小有限制(HDF5 has a header limit of 64kb for all metadata of the columns. This include name, types, etc. When you go about roughly 2000 columns, you will run out of space to store all the metadata.)

因為不是很懂怎么解決這個問題,再加上我只需要regulonAUC這個對象(對應(yīng)著pyscenic當(dāng)中的auc_mtx),所以我后面就沒有用到上述提到的方法導(dǎo)入R

# 接上面的 Cellular regulon enrichment matrix (aka AUCell)
AUC_FNAME=os.path.join(DATA_FOLDER, "auc.tsv")
regulons = [r.rename(r.name.replace('(+)',' ('+str(len(r))+'g)')) for r in regulons]
auc_mtx = aucell(ex_matrix, regulons, num_workers=4)
auc_mtx.to_csv(AUC_FNAME)

# 進(jìn)入R環(huán)境
# cellInfo就是SeuratObject@metadata
library(SCENIC) 
library(SCopeLoomR)
library(AUCell)
pyScenicDir=DATA_FOLDER
regulonAUC <- importAUCfromText(file.path(pyScenicDir,  "auc..tsv"))
regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
regulonActivity_byCellType <- sapply(split(rownames(cellInfo), cellInfo$CellType),
                                     function(cells) rowMeans(getAUC(regulonAUC)[,cells]))
regulonActivity_byCellType_Scaled <- t(scale(t(regulonActivity_byCellType), center = T, scale=T))

pheatmap::pheatmap(regulonActivity_byCellType_Scaled, #fontsize_row=3, 
                   color=colorRampPalette(c("blue","white","red"))(100), breaks=seq(-3, 3, length.out = 100),
                   treeheight_row=10, treeheight_col=10, border_color=NA)

還有一種方法,因為這個包里面限速的步驟是GENIE3(R里面),也可以單獨把這一步在Python中運行(GRNBoost,其他步驟可在R中運行),生成的結(jié)果保存到R中的/int目錄下。具體操作:
https://github.com/aertslab/SCENIC/issues/40
https://arboreto.readthedocs.io/en/latest/examples.html

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

友情鏈接更多精彩內(nèi)容