亚洲无码视频久久久,二区久久久

處理單細(xì)胞不可避免的一個(gè)問題就是樣本整合問題。
那如何將不同器官，不同測序平臺，不同物種之間的單細(xì)胞數(shù)據(jù)進(jìn)行整合分析呢？
Scanpy使用python語言構(gòu)建了一套完整的單細(xì)胞分析流程，其中就包括使用ingest和BBKNN整合方法。

Fig1. https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html

單細(xì)胞轉(zhuǎn)錄數(shù)據(jù)分析之Scanpy：http://www.itdecent.cn/p/e22a947e6c60
單細(xì)胞轉(zhuǎn)錄組之Scanpy - 軌跡推斷/擬時(shí)序分析：http://www.itdecent.cn/p/0b2ca0e0b544
單細(xì)胞轉(zhuǎn)錄組之Scanpy - 樣本整合分析：http://www.itdecent.cn/p/beef8a8be360
單細(xì)胞空間轉(zhuǎn)錄分析之Scanpy：http://www.itdecent.cn/p/8dc231c06932
單細(xì)胞空間轉(zhuǎn)錄分析之Scanpy-多樣本整合：http://www.itdecent.cn/p/caa98aeac191
單細(xì)胞空間轉(zhuǎn)錄分析之Seurat：http://www.itdecent.cn/p/c9a601ced91f
單細(xì)胞空間轉(zhuǎn)錄分析之Seurat-多樣本整合（淺談空間批次）：http://www.itdecent.cn/p/609b04096b79

如果僅僅是通過相同基因把不同的矩陣拼接起來，我們通常得到的結(jié)果是: Fig1 Raw。
細(xì)胞通常是以不同樣本，不同平臺，或者不同時(shí)間點(diǎn)等分開，也就是說兩組數(shù)據(jù)中相同的細(xì)胞類別是并沒有能合在一起。
當(dāng)然，如果按基因直接整合后，細(xì)胞能均勻混合，我們是可以直接進(jìn)行矩陣合并后分析的，但是當(dāng)整合后混合結(jié)果不理想時(shí)，我們可能需要在整合時(shí)去除批次，主要是樣本批次，平臺批次，不同時(shí)間批次等，保留生物學(xué)意義。

因此，一系列單細(xì)胞整合分析工具（去批次）應(yīng)運(yùn)而生，Tran等人在Genome Biology上發(fā)表了一篇比較14種單細(xì)胞批次矯正的方法的文章：A benchmark of batch-effect correction methods for single-cell RNA sequencing data。 https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1850-9

Fig2. batch-effect correction methods

這兒我們不講這些整合方法的好壞，主要還是學(xué)習(xí)Scanpy提供的單細(xì)胞整合方法：包括ingest和BBKNN（Batch balanced kNN）。其中ingest主要用于含有reference的數(shù)據(jù)，而BBKNN主要用于整合（去批次），當(dāng)然整合的方法也可以用來做reference。

ingest注釋：非常簡單，過程清晰，工作流程是透明且快速的。主要是在參考數(shù)據(jù)上擬合模型并使用它來投影新數(shù)據(jù)。該函數(shù)使用knn分類器來映射標(biāo)簽，使用UMAP來映射嵌入。

導(dǎo)入相關(guān)包

import scanpy as sc
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as pt
import bbknn

sc.settings.verbosity = 1             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_versions()
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white')
import os 
os.getcwd()  ##查看當(dāng)前路徑
os.chdir('./Integrate/ingest') ##修改路徑
os.getcwd() 
results_file = 'ingest.h5ad'

讀取數(shù)據(jù) （Scanpy自帶的兩個(gè)數(shù)據(jù)集，一個(gè)是pbmc3k的，另一個(gè)是pbmc68k的部分細(xì)胞，都已經(jīng)將細(xì)胞類別注釋好了）

adata_ref = sc.datasets.pbmc3k_processed()  # this is an earlier version of the dataset from the pbmc3k tutorial
adata = sc.datasets.pbmc68k_reduced()

我們可以查看一下數(shù)據(jù)集的內(nèi)容：

adata_ref
AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

adata_ref.obs.head()
                  n_genes  percent_mito  n_counts          louvain
index                                                             
AAACATACAACCAC-1      781      0.030178    2419.0      CD4 T cells
AAACATTGAGCTAC-1     1352      0.037936    4903.0          B cells
AAACATTGATCAGC-1     1131      0.008897    3147.0      CD4 T cells
AAACCGTGCTTCCG-1      960      0.017431    2639.0  CD14+ Monocytes
AAACCGTGTATGCG-1      522      0.012245     980.0         NK cells

adata_ref.obs.louvain.value_counts()
CD4 T cells          1144
CD14+ Monocytes       480
B cells               342
CD8 T cells           316
NK cells              154
FCGR3A+ Monocytes     150
Dendritic cells        37
Megakaryocytes         15
Name: louvain, dtype: int64

data.obs.head()
                      bulk_labels  n_genes  percent_mito  n_counts   S_score  G2M_score phase louvain
index                                                                                                
AAAGCCTGGCTAAC-1   CD14+ Monocyte     1003      0.023856    2557.0 -0.119160  -0.816889    G1       1
AAATTCGATGCACA-1        Dendritic     1080      0.027458    2695.0  0.067026  -0.889498     S       1
AACACGTGGTCTTT-1         CD56+ NK     1228      0.016819    3389.0 -0.147977  -0.941749    G1       3
AAGTGCACGTGCTA-1  CD4+/CD25 T Reg     1007      0.011797    2204.0  0.065216   1.469291   G2M       9
ACACGAACGGAGTG-1        Dendritic     1178      0.017277    3878.0 -0.122974  -0.868185    G1       2

可以看到參考數(shù)據(jù)集adata_ref已經(jīng)注釋好了細(xì)胞類別信息，這兒作為reference數(shù)據(jù)集。
可以看到refence細(xì)胞類別在umap上的分布。

var_names = adata_ref.var_names.intersection(adata.var_names) #提取共有基因
adata_ref = adata_ref[:, var_names]
adata = adata[:, var_names]
sc.pp.pca(adata_ref)  ##對參考數(shù)據(jù)集進(jìn)行降維
sc.pp.neighbors(adata_ref)
sc.tl.umap(adata_ref)
sc.pl.umap(adata_ref, color='louvain',legend_loc='on data',legend_fontsize='xx-small')
pt.savefig('pbmc3k_ref_umap.pdf')

adata_ref umap-type

使用ingest預(yù)測pbmc68k數(shù)據(jù)集（這兒pbmc68k假設(shè)是不知道細(xì)胞類別的）

sc.tl.ingest(adata, adata_ref, obs='louvain')
adata.uns['louvain_colors'] = adata_ref.uns['louvain_colors']  # fix colors
sc.pl.umap(adata, color=['louvain', 'bulk_labels'], wspace=0.5,legend_fontsize='xx-small')
pt.savefig('pbmc_data_umap.pdf')

adata umap-type

左邊顯示了預(yù)測數(shù)據(jù)pbmc68k預(yù)測到的細(xì)胞類別，右邊是pbmc68k原來的細(xì)胞標(biāo)簽。通過將“ bulk_labels”注釋與“ louvain”進(jìn)行比較，我們看到數(shù)據(jù)已被合理地映射，只有樹突狀細(xì)胞的注釋似乎是模棱兩可的。

對數(shù)據(jù)集合并，查看是否存在批次效應(yīng)。

adata_concat = adata_ref.concatenate(adata, batch_categories=['ref', 'new']) #合并數(shù)據(jù)集
adata_concat.obs.louvain = adata_concat.obs.louvain.astype('category')
adata_concat.obs.louvain.cat.reorder_categories(adata_ref.obs.louvain.cat.categories, inplace=True)  # fix category ordering
adata_concat.uns['louvain_colors'] = adata_ref.uns['louvain_colors']  # fix category colors
sc.pl.umap(adata_concat, color=['batch', 'louvain'],legend_fontsize='xx-small')
pt.savefig('pbmc_batch_umap.pdf')

pbmc_batch_umap

盡管在單核細(xì)胞和樹突狀細(xì)胞簇中似乎有一些批處理效應(yīng)，但新數(shù)據(jù)在其他方面相對均勻地映射。
使用BBKNN整合

import bbknn                     
sc.tl.pca(adata_concat)
sc.external.pp.bbknn(adata_concat, batch_key='batch')  
sc.pl.umap(adata_concat, color=['batch', 'louvain'],legend_fontsize='xx-small')
pt.savefig('pbmc_batch_umap_bbknn.pdf')

pbmc_batch_umap_bbknn

我們可以看到使用bbknn整合后，也不能維護(hù)巨核細(xì)胞簇。但是，似乎可以更均勻地混合細(xì)胞。

使用BBKNN整合，Pancreas胰腺數(shù)據(jù)
BBKNN，一個(gè)快速和直觀的批處理效果去除工具，可以直接在scanpy工作流中使用。
它包含來自4個(gè)不同研究（Segerstolpe，Baron，Wang，Muraro）的人類胰腺數(shù)據(jù)，這些數(shù)據(jù)已在有關(guān)單細(xì)胞數(shù)據(jù)集集成的開創(chuàng)性論文。數(shù)據(jù)可以直接在這網(wǎng)址下載：https://www.dropbox.com/s/qj1jlm9w10wmt0u/pancreas.h5ad?dl=1

導(dǎo)入包，設(shè)置輸出路徑

import scanpy as sc
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as pt
import bbknn                     
sc.settings.set_figure_params(dpi=80, frameon=False, figsize=(3, 3), facecolor='white')
import os 
os.getcwd()  ##查看當(dāng)前路徑
os.chdir(./Integrate/BBKNN') ##修改路徑
os.getcwd() 
results_file = 'bbknn.h5ad'

導(dǎo)入數(shù)據(jù)

adata_all = sc.read("./Integrate/BBKNN/pancreas.h5ad")
adata_all.shape
counts = adata_all.obs.celltype.value_counts()
counts

counts
alpha                     4214
beta                      3354
ductal                    1804
acinar                    1368
not applicable            1154
delta                      917
gamma                      571
endothelial                289
activated_stellate         284
dropped                    178
quiescent_stellate         173
mesenchymal                 80
macrophage                  55
PSC                         54
unclassified endocrine      41
co-expression               39
mast                        32
epsilon                     28
mesenchyme                  27
schwann                     13
t_cell                       7
MHC class II                 5
unclear                      4
unclassified                 2
Name: celltype, dtype: int64

minority_classes = counts.index[-5:].tolist()        # 獲得細(xì)胞數(shù)目少的類別
adata_all = adata_all[                             
    ~adata_all.obs.celltype.isin(minority_classes)]
adata_all.obs.celltype.cat.reorder_categories(counts.index[:-5].tolist(), inplace=True) # 去除細(xì)胞少的類別

降維

sc.pp.pca(adata_all)
sc.pp.neighbors(adata_all)
sc.tl.umap(adata_all)
sc.pl.umap(adata_all, color=['batch', 'celltype'], palette=sc.pl.palettes.vega_20_scanpy,legend_fontsize='xx-small')
pt.savefig('pancreas_batch_umap.pdf')

pancreas_batch_umap

這兒我們可以看到明顯的批次效應(yīng)存在，樣本之間的批次很嚴(yán)重。

使用BBKNN來整合數(shù)據(jù)

sc.external.pp.bbknn(adata_all, batch_key='batch')
sc.tl.umap(adata_all)
sc.pl.umap(adata_all, color=['batch', 'celltype'],legend_fontsize='xx-small')
pt.savefig('pancreas_batch_umap_bbknn.pdf')

pancreas_batch_umap_bbknn

我們通過BBKNN整合之后，可以明顯發(fā)現(xiàn)批次效應(yīng)去除非常明顯，右邊相同細(xì)胞類別能較好的合并在一起。

如果知道其中一套數(shù)據(jù)集中的細(xì)胞類別，這個(gè)數(shù)據(jù)集可以作為reference，就可以用上面提到的ingest。
我們這兒試著把batch 0作為reference，給其他數(shù)據(jù)集進(jìn)行類別注釋，并評估注釋類別的準(zhǔn)確性。

adata_ref = adata_all[adata_all.obs.batch == '0']
sc.pp.pca(adata_ref)
sc.pp.neighbors(adata_ref)
sc.tl.umap(adata_ref)
sc.pl.umap(adata_ref, color='celltype',legend_loc='on data',legend_fontsize='xx-small')
pt.savefig('pancreas_batch0_umap.pdf')

reference umap-celltype

可以看到單一reference數(shù)據(jù)集細(xì)胞類別清晰。

使用ingest預(yù)測其它3個(gè)數(shù)據(jù)集的細(xì)胞類別

adatas = [adata_all[adata_all.obs.batch == i].copy() for i in ['1', '2', '3']]
sc.settings.verbosity = 2  # a bit more logging
for iadata, adata in enumerate(adatas):
    print(f'... integrating batch {iadata+1}')
    adata.obs['celltype_orig'] = adata.obs.celltype  # save the original cell type
    sc.tl.ingest(adata, adata_ref, obs='celltype')
    
adata_concat = adata_ref.concatenate(adatas)
adata_concat.obs.celltype = adata_concat.obs.celltype.astype('category')
adata_concat.obs.celltype.cat.reorder_categories(adata_ref.obs.celltype.cat.categories, inplace=True)  # fix category ordering
adata_concat.uns['celltype_colors'] = adata_ref.uns['celltype_colors']  # fix category coloring
sc.pl.umap(adata_concat, color=['batch', 'celltype','celltype_orig'],legend_fontsize='xx-small')
pt.savefig('pancreas_batch_umap_ingest.pdf')

pancreas_batch_umap_ingest

與BBKNN的結(jié)果相比，這是以一種更加明顯的方式保持分群。
預(yù)測數(shù)據(jù)細(xì)胞類別一致性評估

adata_query = adata_concat[adata_concat.obs.batch.isin(['1', '2', '3'])] #只提取batch 1，2，3
sc.pl.umap(adata_query, color=['batch', 'celltype', 'celltype_orig'], wspace=0.4,legend_fontsize='xx-small')
pt.savefig('pancreas_batch_umap_ingest_query.pdf')

image.png

這個(gè)結(jié)果依然不能很好的反映一致性，讓我們首先關(guān)注與參考保守的細(xì)胞類型，以簡化混淆矩陣的reads。

obs_query = adata_query.obs
conserved_categories = obs_query.celltype.cat.categories.intersection(obs_query.celltype_orig.cat.categories)  # intersected categories
obs_query_conserved = obs_query.loc[obs_query.celltype.isin(conserved_categories) & obs_query.celltype_orig.isin(conserved_categories)]  # intersect categories
obs_query_conserved.celltype.cat.remove_unused_categories(inplace=True)  # remove unused categoriyes
obs_query_conserved.celltype_orig.cat.remove_unused_categories(inplace=True)  # remove unused categoriyes
obs_query_conserved.celltype_orig.cat.reorder_categories(obs_query_conserved.celltype.cat.categories, inplace=True)  # fix category ordering
pd.crosstab(obs_query_conserved.celltype, obs_query_conserved.celltype_orig)
pd.crosstab(adata_query.obs.celltype, adata_query.obs.celltype_orig)

pd.crosstab(adata_query.obs.celltype, adata_query.obs.celltype_orig)
celltype_orig       PSC  acinar  alpha  beta  ...  mesenchymal  mesenchyme  not applicable  unclassified endocrine
celltype                                      ...                                                                 
alpha                 0       0   1819     3  ...            0           0             310                      10
beta                  0       1     49   804  ...            0           1             511                      24
ductal                0     240      7     5  ...            1           0             102                       1
acinar                0     168      2     3  ...            0           0              89                       0
delta                 0       0      5     4  ...            0           0             102                       6
gamma                 0       0      1     5  ...            0           0              14                       0
endothelial           1       0      2     0  ...            0           6               7                       0
activated_stellate   49       1      1     3  ...           79          20              17                       0
quiescent_stellate    4       0      1     1  ...            0           0               1                       0
macrophage            0       0      1     1  ...            0           0               1                       0
mast                  0       0      0     0  ...            0           0               0                       0

[11 rows x 16 columns]

sc.tl.embedding_density(adata_concat, groupby='batch')
sc.pl.embedding_density(adata_concat, groupby='batch')
pt.savefig('pancreas_batch_umap_density.pdf')

可視化分布的批次

pancreas_batch_umap_density

總體而言，保守細(xì)胞類型也按預(yù)期進(jìn)行映射。主要的例外是原始注釋中的某些腺泡細(xì)胞顯示為腺泡細(xì)胞。然而，已經(jīng)觀察到參考數(shù)據(jù)具有腺泡和導(dǎo)管細(xì)胞的簇，這解釋了差異，并指示了初始注釋中的潛在不一致。

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

單細(xì)胞轉(zhuǎn)錄組之Scanpy - 樣本整合分析

單細(xì)胞轉(zhuǎn)錄組之Scanpy - 樣本整合分析

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

單細(xì)胞轉(zhuǎn)錄組之Scanpy - 樣本整合分析

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av