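Hello everyone. Today's topic is how RNA velocity (Velocyto) can be applied to spatial data. Running RNA velocity on 10X single-cell data should be familiar by now; many of you have probably done it and know the method well. So today we look at how RNA velocity can be used on spatial data. Development is not only a change in cell state; just as importantly, it is a change in spatial position, and ideally we would see both the developmental progression of cells and their positional changes directly on the spatial transcriptomics data. The paper is "SIRV: Spatial inference of RNA velocity at the single-cell resolution", and it is worth a read.
We will briefly go through the paper and its principles, focusing on the key points.
Abstract (a quick skim of this paragraph is enough)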
Studying cellular differentiation using single-cell RNA sequencing (scRNA-seq) rapidly expands our understanding of cellular development processes. Recently, RNA velocity has created new possibilities in studying these cellular differentiation processes, as differentiation dynamics can be obtained from measured spliced and unspliced mRNA expression. However, to map these differentiation processes to developments within a tissue, the spatial context of the tissue should be taken into account, which is not possible with current approaches as they start from dissociated cells. We present SIRV (Spatially Inferred RNA Velocity), a method to infer spatial differentiation trajectories within the spatial context of a tissue at the single-cell resolution. SIRV integrates spatial transcriptomics data with reference scRNA-seq data, to enrich the spatially measured genes with spliced and unspliced expressions from the scRNA-seq data. (So this is also a joint analysis of single-cell and spatial data.) Next, SIRV calculates RNA velocity vectors for every spatially measured cell and maps these vectors to the spatial coordinates within the tissue. We tested SIRV on the Developing Mouse Brain Atlas data and obtained biologically relevant spatial differentiation trajectories. Additionally, SIRV annotates spatial cells with cellular identities and the region of origin which are transferred from the annotated reference scRNA-seq data. Altogether, with SIRV, we introduce a new tool to enrich spatial transcriptomics data that can assist in understanding how tissues develop.
Introduction
(1) Current protocols (for spatial transcriptomics) can be divided into two main categories:
- sequencing-based technologies that detect and quantify the mRNA in situ, such as 10X Genomics Visium, Slide-seq and ST
- imaging-based technologies using fluorescence in situ hybridization (FISH), such as smFISH, MERFISH and seqFISH
In principle, it is possible to apply RNA velocity analysis to spatial transcriptomics measured using sequencing-based protocols, as the spliced and unspliced expression ratios can be directly obtained from the sequencing data. (This was the original idea, but it does not hold up under scrutiny and the results are not convincing.)
Step 1: SIRV integrates spatial transcriptomics and scRNA-seq data in order to predict the spliced and unspliced expression of the spatially measured genes. (Using matched single-cell data to annotate spatial data and run RNA velocity on it is an idea that has been around all along; let's see how it is implemented.)
Step 2: Next, it calculates RNA velocity vectors for each cell that are then projected onto the two-dimensional spatial coordinates, which are then used to derive flow fields by averaging dynamics of spatially neighboring cells. (This should be familiar to everyone.)
Step 3: SIRV transfers various label annotations of the scRNA-seq to the spatial transcriptomics data, allowing us to richly annotate the spatial data (that is, the single-cell data is used to annotate the spatial data).
Step 4: SIRV produced biologically relevant spatial differentiation trajectories. (This gives us trajectory information in both time and space, which is great.)
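The "averaging dynamics of spatially neighboring cells" in Step 2 can be pictured with a small sketch. This is only an illustration under assumptions, not the SIRV or scVelo implementation: the function name `smooth_velocities`, the use of a plain k-nearest-neighbor mean, and the toy coordinates are all hypothetical.

```python
import numpy as np

def smooth_velocities(xy, v_xy, k):
    """Average each cell's 2D velocity with those of its k nearest spatial neighbors.

    xy   : (n_cells, 2) spatial coordinates
    v_xy : (n_cells, 2) velocity vectors already projected onto the tissue plane
    k    : number of neighbors to average over (the cell itself counts as one)
    """
    # Squared Euclidean distance between every pair of cells.
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest cells for each cell (self has distance 0).
    idx = np.argsort(d2, axis=1)[:, :k]
    # Mean velocity over each neighborhood.
    return v_xy[idx].mean(axis=1)

# Toy example: two pairs of cells far apart in the tissue.
xy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
v_xy = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
flow = smooth_velocities(xy, v_xy, k=2)
print(flow)
```

Each pair ends up sharing one averaged arrow, which is the intuition behind a flow field: local noise in single-cell vectors is smoothed into a regional direction.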
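Methods:
Input requirements: the spatial transcriptomics data represented by a gene expression matrix, and the scRNA-seq data having three expression matrices corresponding to the spliced (mature mRNA), unspliced (immature mRNA) and full mRNA expression (the single-cell data also needs metadata, including cell annotations, tissue of origin, and so on). The un/spliced expressions are then used to calculate the RNA velocity of each gene for each cell. Finally, the spliced/unspliced information is matched to the spatial coordinates, so that the trajectories are consistent in both space and time and the picture is as complete as possible.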
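SIRV consists of four parts:
- (1) integration of the spatial transcriptomics and scRNA-seq datasets (a joint analysis of single-cell and spatial data).
- (2) prediction of un/spliced expression (this comes mainly from the single-cell data).
- (3) label/metadata transfer (optional). This is well worth doing; results without annotation are hard to interpret.
- (4) estimation of RNA velocities within the spatial context (the result we most want to see).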
First, step 1, the integration.
The spatial transcriptomics and scRNA-seq datasets are integrated by finding the common signal between the two datasets. (The integration methods everyone knows best, such as Seurat or SPOTlight, could also be used here.) However, the authors' integration differs from the usual approaches and is worth a closer look. Building on SpaGE, the integration step is performed using PRECISE to define a common latent space. In brief, using the set of shared genes across the two datasets, we calculate a separate Principal Component Analysis (PCA) for each dataset, and then align these separate principal components, resulting in principal vectors (PVs). These PVs have a one-to-one correspondence between the two datasets, and the highly correlated PV-pairs represent the common signal. Finally, both the spatial transcriptomics and scRNA-seq datasets are projected onto the PVs of the reference dataset (scRNA-seq in this case), producing an integrated and aligned version of both datasets. The authors also note that the spliced and unspliced expressions are only used in the prediction (following) step. (In other words, the spliced/unspliced results from the single-cell data are projected onto the spatial transcriptomics data.)
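The PCA-then-align idea described above can be sketched with plain NumPy. This is a minimal illustration, not the PRECISE or SpaGE code: the helper names, the toy random data, and the 0.3 similarity cutoff are assumptions made for the example. The alignment uses the standard result that the SVD of the product of two orthonormal bases yields their principal vectors and the cosines of the principal angles.

```python
import numpy as np

def pca_components(X, n_components):
    """Return the top principal axes (n_components x n_genes) of X (cells x genes)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal axes, sorted by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]

def principal_vectors(ref, query, n_components):
    """Align two PCA bases; return reference PVs, query PVs and their cosine similarities."""
    P_ref = pca_components(ref, n_components)
    P_query = pca_components(query, n_components)
    # SVD of the cross-product of the two bases gives the principal angles.
    U, cos_sim, Vt = np.linalg.svd(P_ref @ P_query.T)
    pv_ref = U.T @ P_ref       # principal vectors expressed in the reference basis
    pv_query = Vt @ P_query    # matching principal vectors in the query basis
    return pv_ref, pv_query, cos_sim

rng = np.random.default_rng(0)
rna = rng.normal(size=(200, 30))      # toy scRNA-seq: 200 cells x 30 shared genes
spatial = rng.normal(size=(150, 30))  # toy spatial data: 150 cells x 30 shared genes

pv_ref, pv_query, cos_sim = principal_vectors(rna, spatial, n_components=10)
keep = cos_sim > 0.3                  # illustrative cutoff for "highly correlated" PV pairs

# Project both datasets onto the retained PVs of the reference (scRNA-seq) dataset.
rna_aligned = (rna - rna.mean(axis=0)) @ pv_ref[keep].T
spatial_aligned = (spatial - spatial.mean(axis=0)) @ pv_ref[keep].T
```

The i-th reference PV pairs only with the i-th query PV (their cross-product is diagonal), which is the one-to-one correspondence mentioned in the text.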
Step 2: Un/spliced expression prediction.
After obtaining the aligned datasets, SIRV enriches the spatially measured genes with spliced and unspliced expression predicted from the scRNA-seq dataset (as expected). Such prediction is performed using a kNN regression (this should be familiar to everyone; if not, it is worth reviewing). For each spatial cell i (i.e. a spot), we calculate the k-nearest-neighbors from the (aligned) scRNA-seq dataset and assign a weight to each neighbor inversely proportional to its distance.

[Image: equation defining the neighbor weights w_ij]
with w_ij representing the weight between each spatial cell i and its j-th nearest neighbor, dist(i,j) being the cosine distance between spatial cell i and scRNA-seq cell j ∈ NN(i), and k equaling the number of nearest-neighbors used.
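A distance-based weighting of this kind can be sketched as follows. The exact formula is in the image and is not reproduced here, so the normalization used below (one minus the distance share, divided by k - 1, in the style of SpaGE) is an assumption; the function names and toy data are hypothetical as well.

```python
import numpy as np

def cosine_distances(A, B):
    """Pairwise cosine distance between rows of A and rows of B."""
    A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_n @ B_n.T

def knn_weights(spatial_aligned, rna_aligned, k):
    """Return neighbor indices and weights, each of shape (n_spatial, k). Requires k >= 2."""
    dist = cosine_distances(spatial_aligned, rna_aligned)
    idx = np.argsort(dist, axis=1)[:, :k]         # k nearest scRNA-seq cells, closest first
    d = np.take_along_axis(dist, idx, axis=1)     # their cosine distances
    w = 1.0 - d / d.sum(axis=1, keepdims=True)    # closer neighbor -> larger weight
    return idx, w / (k - 1)                       # each row of weights now sums to 1

rng = np.random.default_rng(1)
spatial_aligned = rng.normal(size=(5, 8))   # toy aligned spatial cells
rna_aligned = rng.normal(size=(40, 8))      # toy aligned scRNA-seq cells
idx, w = knn_weights(spatial_aligned, rna_aligned, k=10)
```

Dividing by k - 1 works because the k terms 1 - d_j / sum(d) add up to exactly k - 1, so the weights form a proper weighted average.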
For every spatially measured gene g, the spliced (S_ig) and unspliced (U_ig) expression are predicted by: (a little math background clearly helps here)

[Image: equations for the predicted S_ig and U_ig]
with S^R_jg and U^R_jg representing the spliced and unspliced expression of gene g from the scRNA-seq dataset, respectively. (This takes a bit of effort to digest.)
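Read as a kNN regression, the prediction is naturally a weighted sum over each spatial cell's neighbors, i.e. S_ig = sum over j in NN(i) of w_ij * S^R_jg, and likewise for U_ig. The sketch below assumes that reading; `predict_layer` and the small hand-made matrices are hypothetical.

```python
import numpy as np

def predict_layer(idx, w, ref_layer):
    """Weighted average of a reference layer over each spatial cell's neighbors.

    idx, w    : (n_spatial, k) neighbor indices and weights
    ref_layer : (n_ref_cells, n_genes) spliced or unspliced scRNA-seq matrix
    """
    # ref_layer[idx] has shape (n_spatial, k, n_genes); weight it and sum over k.
    return np.einsum('ik,ikg->ig', w, ref_layer[idx])

# Two spatial cells, each with two scRNA-seq neighbors, and two genes.
idx = np.array([[0, 2], [1, 2]])
w = np.array([[0.75, 0.25], [0.5, 0.5]])
spliced_ref = np.array([[4.0, 0.0], [2.0, 2.0], [0.0, 8.0]])
unspliced_ref = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])

S_pred = predict_layer(idx, w, spliced_ref)    # [[3.0, 2.0], [1.0, 5.0]]
U_pred = predict_layer(idx, w, unspliced_ref)  # [[1.25, 0.5], [1.0, 1.5]]
```

The same neighbors and weights are reused for both layers, so the predicted spliced and unspliced values of a spatial cell stay consistent with each other.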
Step 3: Label (metadata) transfer. Here we could also annotate directly with Seurat or SPOTlight.
SIRV can annotate the spatial transcriptomics dataset with any relevant labels from the scRNA-seq dataset using the same kNN regression scheme as introduced earlier. Taking the cell identity annotation as an example (have a look at the annotation procedure to get a feel for it):

[Image: formula for the label transfer]
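One simple way to reuse the kNN scheme for labels is a weighted vote: each label gets the total weight of the neighbors that carry it, and the highest-scoring label wins. Whether this matches the formula in the image exactly is an assumption; the function name and toy labels are hypothetical.

```python
import numpy as np

def transfer_labels(idx, w, ref_labels):
    """Assign each spatial cell the label with the largest summed neighbor weight."""
    ref_labels = np.asarray(ref_labels)
    classes = np.unique(ref_labels)
    neighbor_labels = ref_labels[idx]   # (n_spatial, k) labels of the neighbors
    # Score of each class = total weight of the neighbors carrying that label.
    scores = np.stack(
        [(w * (neighbor_labels == c)).sum(axis=1) for c in classes], axis=1
    )
    return classes[scores.argmax(axis=1)], scores

idx = np.array([[0, 1, 2], [1, 2, 3]])
w = np.array([[0.6, 0.25, 0.15], [0.4, 0.35, 0.25]])
ref_labels = ['Neuron', 'Glia', 'Glia', 'Neuron']

labels, scores = transfer_labels(idx, w, ref_labels)
print(labels)   # ['Neuron' 'Glia']
```

The score matrix is worth keeping as well: a cell whose top two scores are close is an ambiguous assignment.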
Step 4: RNA velocity analysis
First, we calculated the high-dimensional RNA velocity vectors for the spatially measured genes (set of genes originally measured in the spatial dataset). Next, we projected and visualized these vectors on the spatial coordinates of the cells in order to define directions of cellular differentiation in the spatial context. (In short, it is a joint analysis.)
Of course, both the single-cell and the spatial data have already gone through the standard basic analysis, which everyone knows well.
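To make "high-dimensional RNA velocity vectors" concrete, here is a minimal sketch of the steady-state model, where the velocity of a gene is v = u - gamma * s. It is a simplification and an assumption on my part: gamma is fitted by least squares through the origin over all cells, whereas velocity tools typically fit on extreme quantiles or use a stochastic model. The function name and toy matrices are hypothetical.

```python
import numpy as np

def steady_state_velocity(S, U):
    """Per-gene velocity under a steady-state model: v = u - gamma * s.

    S, U : (n_cells, n_genes) spliced and unspliced expression
    """
    # Least-squares slope through the origin, one gamma per gene.
    gamma = (S * U).sum(axis=0) / (S * S).sum(axis=0)
    return U - gamma * S, gamma

S = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
U = np.array([[0.5, 2.0], [1.0, 3.0], [1.5, 7.0]])
velocity, gamma = steady_state_velocity(S, U)
```

In the toy data the first gene sits exactly on its steady-state line, so its velocity is zero in every cell; the second gene has cells above the line (positive velocity, being induced) and below it (negative velocity, being repressed).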
Let's look at the results.
(1) SIRV overview

[Figure]
(2) SIRV produces interesting spatial differentiation trajectories in the developing mouse brain

[Figure]

[Figure]
(3) SIRV correctly transferred label annotation verified by spatial organization

[Figure]
(4) RNA velocities interpretation based on transferred cell labels

[Figure]
As for the code, have a read through it and put it to use; I won't walk through it line by line.
"""
Created on Sun May 30 19:40:39 2021
@author: trmabdelaal
"""
import scvelo as scv
import scanpy as sc
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.metrics.cluster import contingency_matrix
import sys
sys.path.insert(1,'SIRV/')
from main import SIRV
import warnings
warnings.filterwarnings('ignore')
# load preprocessed scRNA-seq and spatial datasets
RNA = scv.read('SIRV_data/RNA_adata.h5ad')
HybISS = scv.read('SIRV_data/HybISS_adata.h5ad')
# Apply SIRV to integrate both datasets and predict the un/spliced expressions
# for the spatially measured genes, additionally transfer 'Tissue', 'Region',
# 'Class' and 'Subclass' label annotations from scRNA-seq to spatial data
HybISS_imputed = SIRV(HybISS,RNA,50,['Tissue','Region','Class','Subclass'])
# Normalize the imputed un/spliced expressions, this will also re-normalize the
# full spatial mRNA 'X', this needs to be undone
scv.pp.normalize_per_cell(HybISS_imputed, enforce=True)
# Undo the double normalization of the full mRNA 'X'
HybISS_imputed.X = HybISS.to_df()[HybISS_imputed.var_names]
# Zero mean and unit variance scaling, PCA, building the neighbourhood graph,
# running UMAP and clustering the HybISS spatial data using Leiden clustering
sc.pp.scale(HybISS_imputed)
sc.tl.pca(HybISS_imputed)
sc.pl.pca_variance_ratio(HybISS_imputed, n_pcs=50, log=True)
sc.pp.neighbors(HybISS_imputed, n_neighbors=30, n_pcs=30)
sc.tl.umap(HybISS_imputed)
sc.tl.leiden(HybISS_imputed)
# Fig. 2A
sc.pl.umap(HybISS_imputed, color='leiden')
# Fig. 2B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='leiden')
# Calculating RNA velocities and projecting them on the UMAP embedding and spatial
# coordinates of the tissue
scv.pp.moments(HybISS_imputed, n_pcs=30, n_neighbors=30)
scv.tl.velocity(HybISS_imputed)
scv.tl.velocity_graph(HybISS_imputed)
# Fig. 2C
scv.pl.velocity_embedding_stream(HybISS_imputed, basis='umap', color='leiden')
# Fig. 2D
scv.pl.velocity_embedding_stream(HybISS_imputed, basis='xy_loc', color='leiden',size=60,legend_fontsize=4,legend_loc='right')
# Cell-level RNA velocities
# Fig. 3
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='leiden')
# Visualizing transferred label annotations on UMAP embedding and spatial coordinates
# Fig. 4A
sc.pl.umap(HybISS_imputed, color='Region')
# Fig. 4B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Region')
# Fig. 4C
sc.pl.umap(HybISS_imputed, color='Subclass')
# Fig. 4D
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Subclass')
# Supplementary Fig. S3A
sc.pl.umap(HybISS_imputed, color='Class')
# Supplementary Fig. S3B
sc.pl.scatter(HybISS_imputed, basis='xy_loc',color='Class')
# Interpretation of RNA velocities using transferred label annotations
# Fig. 5
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='Subclass')
# Supplementary Fig. S3C
scv.pl.velocity_embedding(HybISS_imputed,basis='xy_loc', color='Class')
# Comparing cell clusters with transferred 'Subclass' and 'Class' annotations
def Norm(x):
    # Scale a row of counts so that it sums to 1 (row-wise proportions)
    return x / np.sum(x)
# Subclass annotation
cont_mat = contingency_matrix(HybISS_imputed.obs.leiden.astype(np.int_),HybISS_imputed.obs.Subclass)
df_cont_mat = pd.DataFrame(cont_mat,index = np.unique(HybISS_imputed.obs.leiden.astype(np.int_)),
                           columns=np.unique(HybISS_imputed.obs.Subclass))
df_cont_mat = df_cont_mat.apply(Norm,axis=1)
# Supplementary Fig. S5A
plt.figure()
sns.heatmap(df_cont_mat,annot=True,fmt='.2f')
plt.yticks(np.arange(df_cont_mat.shape[0])+0.5,df_cont_mat.index)
plt.xticks(np.arange(df_cont_mat.shape[1])+0.5,df_cont_mat.columns)
# Class annotation
cont_mat = contingency_matrix(HybISS_imputed.obs.leiden.astype(np.int_),HybISS_imputed.obs.Class)
df_cont_mat = pd.DataFrame(cont_mat,index = np.unique(HybISS_imputed.obs.leiden.astype(np.int_)),
                           columns=np.unique(HybISS_imputed.obs.Class))
df_cont_mat = df_cont_mat.apply(Norm,axis=1)
# Supplementary Fig. S5B
plt.figure()
sns.heatmap(df_cont_mat,annot=True,fmt='.2f')
plt.yticks(np.arange(df_cont_mat.shape[0])+0.5,df_cont_mat.index)
plt.xticks(np.arange(df_cont_mat.shape[1])+0.5,df_cont_mat.columns)
The approach is well thought out and worth trying.
Life is good, and it is waiting for you to go further.