不同數(shù)據(jù)集聯(lián)合分析

選擇了通過四種技術產生的人胰島細胞數(shù)據(jù)集：CelSeq（GSE81076) CelSeq2（GSE85241），F(xiàn)luidigm C1（GSE86469）和SMART-Seq2（E-MTAB-5061）。我們在此處(數(shù)據(jù))提供組合的原始數(shù)據(jù)矩陣和相關的元數(shù)據(jù)文件以便開始。

數(shù)據(jù)集預處理

加載表達式矩陣和元數(shù)據(jù)。元數(shù)據(jù)文件包含四個數(shù)據(jù)集中每個單元格的技術（tech列）和單元格類型注釋（cell type列）。

library(Seurat)

pancreas.data <- readRDS(file = "../data/pancreas_expression_matrix.rds")

metadata <- readRDS(file = "../data/pancreas_metadata.rds")

為了構建參考，我們將識別各個數(shù)據(jù)集之間的“錨點”。首先，我們將組合對象拆分為一個列表，每個數(shù)據(jù)集都作為一個元素。

pancreas <- CreateSeuratObject(pancreas.data, meta.data = metadata)

pancreas.list <- SplitObject(pancreas, split.by = "tech")

在找到錨點之前，我們執(zhí)行標準預處理（對數(shù)標準化），并為每個錨點單獨識別變量要素。請注意，Seurat v3基于方差穩(wěn)定轉換實現(xiàn)了一種改進的變量特征選擇方法（"vst"）

for (i in 1:length(pancreas.list)) {

? ? pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)

? ? pancreas.list[[i]] <- FindVariableFeatures(pancreas.list[[i]], selection.method = "vst",

? ? ? ? nfeatures = 2000, verbose = FALSE)

}

整合3個胰島細胞數(shù)據(jù)集

接下來，我們使用FindIntegrationAnchors函數(shù)識別錨點，該函數(shù)將Seurat對象列表作為輸入。在這里，我們將三個對象集成到一個引用中（我們將在后面的小插圖中使用第四個）

我們在這里使用所有默認參數(shù)來識別錨點，包括數(shù)據(jù)集的“維度”（30;隨意嘗試在寬范圍內改變此參數(shù)，例如在10到50之間）。

reference.list <- pancreas.list[c("celseq","celseq2","smartseq2")]pancreas.anchors <- FindIntegrationAnchors(object.list = reference.list, dims =1:30)

然后我們將這些錨傳遞給IntegrateData函數(shù)，該函數(shù)返回一個Seurat對象。

返回的對象將包含一個new?Assay，它包含所有單元格的集成（或“批量修正”）表達式矩陣，使它們能夠被聯(lián)合分析。

pancreas.integrated <- IntegrateData(anchorset = pancreas.anchors, dims =1:30)

運行后IntegrateData，該Seurat對象將包含一個Assay帶有集成表達式矩陣的new?。請注意，原始（未校正的值）仍存儲在“RNA”分析中的對象中，因此您可以來回切換。

然后我們可以使用這個新的集成矩陣進行下游分析和可視化。在這里，我們擴展集成數(shù)據(jù)，運行PCA，并使用UMAP可視化結果。集成數(shù)據(jù)集按單元格類型而不是技術集群。

library(ggplot2)

library(cowplot)

# switch to integrated assay. The variable features of this assay are automatically set during Integrate Data#

DefaultAssay(pancreas.integrated) <-"integrated"

# Run the standard workflow for visualization and clustering

pancreas.integrated <- ScaleData(pancreas.integrated, verbose =FALSE)

pancreas.integrated <- RunPCA(pancreas.integrated, npcs =30, verbose =FALSE)

pancreas.integrated <- RunUMAP(pancreas.integrated, reduction ="pca", dims =1:30)

p1 <- DimPlot(pancreas.integrated, reduction ="umap", group.by ="tech")

p2 <- DimPlot(pancreas.integrated, reduction ="umap", group.by ="celltype", label =TRUE,repel =TRUE) + NoLegend()

plot_grid(p1, p2)

使用集成參考的細胞類型分類

Seurat v3還支持將參考數(shù)據(jù)（或元數(shù)據(jù)）投影到查詢對象上。雖然許多方法都是守恒的（兩個程序都以識別錨點開始），但數(shù)據(jù)傳輸和集成之間存在兩個重要區(qū)別：

在數(shù)據(jù)傳輸中，Seurat不會更正或修改查詢表達式數(shù)據(jù)。

在數(shù)據(jù)傳輸中，Seurat有一個選項（默認設置）將參考的PCA結構投影到查詢上，而不是學習與CCA的聯(lián)合結構。我們通常建議在scRNA-seq數(shù)據(jù)集之間投影數(shù)據(jù)時使用此選項。

在找到錨之后，我們使用該TransferData函數(shù)基于參考數(shù)據(jù)（參考單元類型標簽的向量）對查詢單元進行分類。TransferData返回具有預測ID和預測分數(shù)的矩陣，我們可以將其添加到查詢元數(shù)據(jù)中。

pancreas.query <- pancreas.list[["fluidigmc1"]]

pancreas.anchors <- FindTransferAnchors(reference = pancreas.integrated, query = pancreas.query,? ? dims =1:30)

predictions <- TransferData(anchorset = pancreas.anchors, refdata = pancreas.integrated$celltype,? ? dims =1:30)

?pancreas.query <- AddMetaData(pancreas.query, metadata = predictions)

因為我們從完整的綜合分析中獲得了原始標簽注釋，所以我們可以評估我們預測的細胞類型注釋與完整參考的匹配程度。在這個例子中，我們發(fā)現(xiàn)細胞類型分類有很高的一致性，超過97％的細胞被正確標記。

pancreas.query$prediction.match <- pancreas.query$predicted.id == pancreas.query$celltype

table(pancreas.query$prediction.match)

## FALSE? TRUE

##? ? 16? 622

為了進一步驗證這一點，我們可以檢查特定胰島細胞群的一些經典細胞類型標記。請注意，即使這些細胞類型中的一些僅由一個或兩個細胞（例如ε細胞）表示，我們仍然能夠正確地對它們進行分類。

table(pancreas.query$predicted.id)

##? ? ? ? ? ? acinar? ? ? activated_stellate? ? ? ? ?alpha

##? ? ? ? ? ? ? ? 21? ? ? ? ? ? ? ? ? 17? ? ? ? ? ? ? ? ? ? ? ? ?248

##? ? ? ? ? ? ? beta? ? ? ? ? ? ? ? delta? ? ? ? ? ? ? ? ? ?ductal

##? ? ? ? ? ? ? ? 258? ? ? ? ? ? ? ? 22? ? ? ? ? ? ? ? ? ? ? ? 33

##? ? ? ? endothelial? ? ? ? ? ? epsilon? ? ? ? ? ? ? gamma

##? ? ? ? ? ? ? ? 13? ? ? ? ? ? ? ? ? ?1? ? ? ? ? ? ? ? ? ? ? ? ?17

##? ? ? ? macrophage? ? ? ? ? mast? ? ? ? ? ? ? ? schwann

##? ? ? ? ? ? ? ? ? 1? ? ? ? ? ? ? ? ? ? 2? ? ? ? ? ? ? ? ? ? ? ? 5

VlnPlot(pancreas.query, c("REG1A","PPY","SST","GHRL","VWF","SOX10"), group.by ="predicted.id")

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

單細胞分析seurat包學習筆記2

單細胞分析seurat包學習筆記2

不同數(shù)據(jù)集聯(lián)合分析

數(shù)據(jù)集預處理

整合3個胰島細胞數(shù)據(jù)集

使用集成參考的細胞類型分類

相關閱讀更多精彩內容

友情鏈接更多精彩內容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

單細胞分析seurat包學習筆記2

不同數(shù)據(jù)集聯(lián)合分析

數(shù)據(jù)集預處理

整合3個胰島細胞數(shù)據(jù)集

使用集成參考的細胞類型分類

相關閱讀更多精彩內容

友情鏈接更多精彩內容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av