? ? ? 上節(jié)我們完成了Space Ranger分析,在輸出結(jié)果binned_outputs文件夾內(nèi)默認(rèn)生成了2um,8um,16um bin的結(jié)果,這節(jié)我們使用10x文章中推薦的8x8um bin結(jié)果進(jìn)行后續(xù)分析。為了演示,我們?nèi)1_CRC樣本數(shù)據(jù)進(jìn)行后續(xù)分析(多樣本合并一般都需要進(jìn)行去批次處理,如果有需要,后續(xù)我們會(huì)繼續(xù)分享多樣本合并分析步驟)。
主要步驟可以分為,
? ? 1.讀取8um bin結(jié)果;
? ? 2.數(shù)據(jù)質(zhì)控;
? ? 3.細(xì)胞過濾;
? ? 4.降維聚類;
? ? 5.特征基因篩選
0. 環(huán)境加載
```python
import os
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import omicverse as ov
import matplotlib.pyplot as plt
from matplotlib.pyplot import rc_context
import warnings
warnings.filterwarnings('ignore')
plt.style.use('default')
plt.rcParams['figure.facecolor'] = 'white'
sc.settings.set_figure_params(dpi=100, dpi_save=200, figsize=(5, 5), facecolor='white')
```
1. 8um bin數(shù)據(jù)讀取
? ? ? 數(shù)據(jù)讀取使用scanpy的read_visium函數(shù)即可,要注意一點(diǎn),Visium HD使用的Space Ranger v3.0為了節(jié)省輸出文件空間,將之前的spatial文件夾下的tissue_positions_list.csv文件格式變成了tissue_positions.parquet,為了使用read_visium函數(shù)正確讀取數(shù)據(jù),首先要將tissue_positions.parquet文件轉(zhuǎn)成tissue_positions_list.csv文件。
```python
data_dir = './CRC/P1_CRC/'
tissue_position_df = pd.read_parquet(f'{data_dir}/binned_outputs/square_008um/spatial/tissue_positions.parquet')
tissue_position_df.to_csv(f'{data_dir}/binned_outputs/square_008um/spatial/tissue_positions_list.csv', index=False, header=None)
adata = sc.read_visium(f'{data_dir}/binned_outputs/square_008um', library_id='P1')
adata.obs['sample'] = 'P1'
adata.var_names_make_unique()
```
2. 數(shù)據(jù)質(zhì)控
?? 空間轉(zhuǎn)錄組學(xué)實(shí)驗(yàn)中不可避免會(huì)引入一些技術(shù)噪音,比如由于實(shí)驗(yàn)處理或數(shù)據(jù)測序中的技術(shù)誤差,導(dǎo)致部分細(xì)胞的測序數(shù)據(jù)異常(例如某些細(xì)胞可能由于實(shí)驗(yàn)處理不當(dāng)或?qū)嶋H轉(zhuǎn)錄活性低,表現(xiàn)為轉(zhuǎn)錄本數(shù)量非常少)。細(xì)胞QC的目的是識(shí)別和過濾掉這些異常細(xì)胞,以避免它們對下游分析(如差異表達(dá)分析、聚類或功能分析)產(chǎn)生負(fù)面影響。通過去除低質(zhì)量的細(xì)胞,我們可以確保保留下來的細(xì)胞是轉(zhuǎn)錄活性正常、具有代表性的細(xì)胞。這樣不僅可以提高分析結(jié)果的生物學(xué)意義,還能減少假陽性結(jié)果。
```python
# Data QC
adata.var['mt'] = adata.var_names.str.upper().str.startswith('MT-')
adata.var['ribo'] = adata.var_names.str.upper().str.startswith(("RPS","RPL"))
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt', 'ribo'], percent_top=None, log1p=False, inplace=True)
adata.obs['log10GenesPerUMI'] = np.log10(adata.obs['n_genes_by_counts']) / np.log10(adata.obs['total_counts'])
sc.pl.violin(
? ? adata,
? ? ['n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'log10GenesPerUMI'],
? ? groupby='sample',
? ? jitter=False,
? ? rotation=90,
? ? multi_panel=True,
? ? show=False
)
```

3. 細(xì)胞過濾
低質(zhì)量細(xì)胞會(huì)影響后續(xù)分析結(jié)果,這里我們和單細(xì)胞數(shù)據(jù)分析一樣,首先將低質(zhì)量細(xì)胞進(jìn)行去除(當(dāng)然也有些人不進(jìn)行過濾,直接后續(xù)分析),將8um bin內(nèi)鑒定到UMI counts數(shù)小于50個(gè)的細(xì)胞、線粒體基因占比大于30%的細(xì)胞、以及l(fā)og10GenesPerUMI(主要反映的是細(xì)胞中鑒定到基因文庫的復(fù)雜性)值小于0.8的細(xì)胞認(rèn)為是低質(zhì)量細(xì)胞,進(jìn)行去除。
```python
## Cell filter
sc.pp.filter_cells(adata, min_counts=50)
adata = adata[(adata.obs['pct_counts_mt'] < 30) & (adata.obs['log10GenesPerUMI'] > 0.8) & (adata.obs['log10GenesPerUMI'] < 0.99)]
sc.pl.violin(
? ? adata,
? ? ['n_genes_by_counts', 'total_counts', 'pct_counts_mt', 'log10GenesPerUMI'],
? ? groupby='sample',
? ? jitter=False,
? ? rotation=90,
? ? multi_panel=True,
? ? show=False
)
with rc_context({'figure.figsize': (8, 8)}):
? ? sc.pl.spatial(
? ? ? ? adata,
? ? ? ? color=['n_genes_by_counts', 'total_counts', 'log10GenesPerUMI'],
? ? ? ? library_id='P1',
? ? ? ? alpha_img=0.6
? ? )
```

細(xì)胞過濾后,空間上展示展示基因數(shù)、counts數(shù)等

4. 降維聚類
經(jīng)過過濾之后的表達(dá)矩陣,使用Scanpy 進(jìn)行降維聚類分析:
1)首先進(jìn)行數(shù)據(jù)的表達(dá)量均一化,對測序深度進(jìn)行校正,消除技術(shù)噪音;
2)選取高可變基因(細(xì)胞間高特征基因),用于提高細(xì)胞分群準(zhǔn)確度(默認(rèn)值為2000);
3)進(jìn)一步對表達(dá)量值均一化處理后,利用PCA分析對數(shù)據(jù)進(jìn)行降維,PCA降維是一種線性降維方法,運(yùn)用方差分解,將高維的數(shù)據(jù)映射到低維的空間中;然后基于SNN聚類算法對細(xì)胞進(jìn)行聚類和分群,構(gòu)建細(xì)胞間的聚類關(guān)系;
4)最后將降維后的數(shù)據(jù)傳遞到UMAP進(jìn)行可視化展示,細(xì)胞之間的基因表達(dá)模式越相似,在UMAP圖中的距離也越接近。
```python
## data normalization and log10 transform
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
adata.raw = adata
## select highly variable gene and data scale
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, n_top_genes=2000)
sc.pl.highly_variable_genes(adata, show=False)
adata = adata[:, adata.var['highly_variable']]
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
sc.pp.scale(adata, max_value=10)
## PCA
sc.tl.pca(adata, svd_solver='arpack')
sc.pl.pca(adata, color='sample', show=False)
sc.pl.pca_variance_ratio(adata, log=True, show=False)
## UMAP
sc.pp.neighbors(adata, n_neighbors=15, n_pcs=15)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.6, key_added='res_0.6')
sc.tl.leiden(adata, resolution=0.8, key_added='res_0.8')
```
降維聚類結(jié)果UMAP及空間展示
```python
from matplotlib import patheffects
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6,6))
ov.pl.embedding(adata,
? ? ? ? ? ? ? ? basis='X_umap',
? ? ? ? ? ? ? ? color=['res_0.6'],
? ? ? ? ? ? ? ? size=4,
? ? ? ? ? ? ? ? show=False,
? ? ? ? ? ? ? ? legend_loc=None,
? ? ? ? ? ? ? ? add_outline=False,
? ? ? ? ? ? ? ? frameon='small',
? ? ? ? ? ? ? ? legend_fontoutline=2,
? ? ? ? ? ? ? ? ax=ax
? ? ? ? ? ? ? ? )
ov.utils.gen_mpl_labels(
? ? adata,
? ? 'res_0.6',
? ? exclude=("None",), ?
? ? basis='X_umap',
? ? ax=ax,
? ? adjust_kwargs=dict(arrowprops=dict(arrowstyle='-', color='black')),
? ? text_kwargs=dict(fontsize= 12 ,weight='bold', path_effects=[patheffects.withStroke(linewidth=2, foreground='w')] ),
)
```
單細(xì)胞降維聚類結(jié)果展示

降維聚類結(jié)果空間展示

5. 特征基因篩選
```python
sc.tl.rank_genes_groups(
? ? adata,
? ? groupby='res_0.6',
? ? use_raw=True,
? ? method="wilcoxon",
? ? pts=True
)
sc.pl.rank_genes_groups_dotplot(
? ? adata,
? ? groupby='res_0.6',
? ? standard_scale="var",
? ? n_genes=5,
? ? cmap='bwr',
? ? show=False,
)
result = adata.uns['rank_genes_groups']
groups = result['names'].dtype.names
all_diff = []
for c in groups:
? ? tmp = sc.get.rank_genes_groups_df(adata, group=c)
? ? tmp = tmp[(tmp['logfoldchanges']>0) & (tmp['pvals_adj']<0.05)]
? ? tmp['cluster'] = c
? ? all_diff.append(tmp)
all_diff = pd.concat(all_diff, axis=0)
```
各細(xì)胞Cluster Marker基因Dotplot, 結(jié)合上一步各Cluster特征基因,進(jìn)行細(xì)胞類型定義
```python
markers = ["EPCAM", "CEACAM6","SAT1","CSF3R","CXCR2","LYZ","SPP1", "SELENOP","C1QC",
? ? ? ? ? ?"FCGBP","MUC2","CLCA1","COL1A1","MMP2","CD3E","CD4","CD8A","TRAC",
? ? ? ? ? ?"JCHAIN", "TNFRSF17", "CD1C","ACTA2", "CD19","TAGLN", "PECAM1","VWF"]
sc.pl.dotplot(adata, markers, "res_0.6", show=False)
```
