系列回顧:
ArchR官網(wǎng)教程學(xué)習(xí)筆記1:Getting Started with ArchR
ArchR官網(wǎng)教程學(xué)習(xí)筆記2:基于ArchR推測(cè)Doublet
ArchR官網(wǎng)教程學(xué)習(xí)筆記3:創(chuàng)建ArchRProject
ArchR官網(wǎng)教程學(xué)習(xí)筆記4:ArchR的降維
ArchR官網(wǎng)教程學(xué)習(xí)筆記5:ArchR的聚類
ArchR官網(wǎng)教程學(xué)習(xí)筆記6:單細(xì)胞嵌入(Single-cell Embeddings)
ArchR官網(wǎng)教程學(xué)習(xí)筆記7:ArchR的基因評(píng)分和Marker基因
ArchR官網(wǎng)教程學(xué)習(xí)筆記8:定義與scRNA-seq一致的聚類
ArchR官網(wǎng)教程學(xué)習(xí)筆記9:ArchR的偽批量重復(fù)
ArchR官網(wǎng)教程學(xué)習(xí)筆記10:ArchR的call peak
在前面討論基因評(píng)分的章節(jié)中,我們已經(jīng)介紹了標(biāo)記特征的識(shí)別??梢允褂孟嗤暮瘮?shù)(getMarkerFeatures())來標(biāo)識(shí)ArchRProject中存儲(chǔ)的任何矩陣中的標(biāo)記特性。標(biāo)記特征是特定細(xì)胞群所特有的特征。這些對(duì)于理解特定cluster或細(xì)胞類型非常有用。在本章中,我們將討論如何使用這個(gè)功能來識(shí)別Marker峰。
(一)鑒定Marker峰
通常,我們感興趣的是想知道哪些峰是單個(gè)cluster或一小群cluster所特有的。在ArchR中,我們可以使用addMarkerFeatures()函數(shù)和useMatrix = "PeakMatrix"以無監(jiān)督的方式完成此任務(wù)。
首先,讓我們回顧一下我們?cè)趐rojHeme5中研究的細(xì)胞類型和它們的相對(duì)比例。
#Our scRNA labels
> table(projHeme5$Clusters2)
B CD4.M CD4.N CLP Erythroid GMP Mono
432 639 1279 384 732 1201 2579
NK pDC PreB Progenitor
874 300 358 1473
現(xiàn)在,我們準(zhǔn)備使用useMatrix = "PeakMatrix"調(diào)用addMarkerFeatures()函數(shù)來識(shí)別Marker峰。此外,我們通過設(shè)置bias參數(shù)來考慮TSS富集和每個(gè)細(xì)胞唯一片段的數(shù)量,讓ArchR考慮細(xì)胞群之間數(shù)據(jù)質(zhì)量的差異。
> markersPeaks <- getMarkerFeatures(
ArchRProj = projHeme5,
useMatrix = "PeakMatrix",
groupBy = "Clusters2",
bias = c("TSSEnrichment", "log10(nFrags)"),
testMethod = "wilcoxon"
)
這個(gè)對(duì)象會(huì)通過getMarkerFeatures()返回一個(gè)SummarizedExperiment,其中包含一些不同的assays:
> markersPeaks
class: SummarizedExperiment
dim: 140865 11
metadata(2): MatchInfo Params
assays(7): Log2FC Mean ... AUC MeanBGD
rownames(140865): 1 2 ... 140864 140865
rowData names(4): seqnames idx start end
colnames(11): B CD4.M ... PreB Progenitor
colData names(0):
我們可以使用getMarkers()函數(shù)檢索這個(gè)SummarizedExperiment中我們感興趣的特定片段。此函數(shù)的默認(rèn)行為是返回一個(gè)DataFrame對(duì)象列表,每個(gè)對(duì)象是一個(gè)細(xì)胞群。
> markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1")
> markerList
List of length 11
names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
如果你對(duì)某一個(gè)細(xì)胞群的Marker峰感興趣,可以單獨(dú)查看:
> markerList$Erythroid
DataFrame with 2656 rows and 7 columns
seqnames idx start end Log2FC FDR MeanDiff
<Rle> <array> <array> <array> <numeric> <numeric> <numeric>
87111 chr22 1219 30129813 30130313 4.28186227428957 1.83893337171705e-15 1.07953356543409
6670 chr1 6670 110407019 110407519 8.67005424991154 2.96546706376336e-13 0.820868679383739
58340 chr17 6183 73631581 73632081 8.61339322841304 3.53910479048555e-13 0.789176607419672
15006 chr10 1557 30440804 30441304 5.96401972746272 1.43825817219516e-12 0.758436365804101
2590 chr1 2590 27869127 27869627 6.24899843746968 3.09707434920801e-12 0.892089356271252
... ... ... ... ... ... ... ...
104472 chr5 770 32714018 32714518 1.82287547506236 0.0094547934235557 0.126001924074063
125339 chr7 6487 150724392 150724892 2.78589698640991 0.0094547934235557 0.111227644844517
51924 chr16 5054 89045463 89045963 1.75678742468666 0.00948266858646735 0.20433103914127
120570 chr7 1718 30185784 30186284 1.58859209325531 0.00955616601235496 0.326414734260411
83256 chr20 3287 50157615 50158115 1.35032538225079 0.00960141941952872 0.462686019918627
我們可以使用getMarkers()來代替DataFrame對(duì)象列表,通過設(shè)置returnGR = TRUE來返回GRangesList對(duì)象:
> markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.01 & Log2FC >= 1", returnGR = TRUE)
> markerList
List of length 11
names(11): B CD4.M CD4.N CLP Erythroid GMP Mono NK pDC PreB Progenitor
這個(gè)GRangesList對(duì)象也可以查看某一個(gè)細(xì)胞群的marker峰:
> markerList$Erythroid
GRanges object with 2656 ranges and 3 metadata columns:
seqnames ranges strand | Log2FC FDR MeanDiff
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
[1] chr22 30129813-30130313 * | 4.28186227428957 1.83893337171705e-15 1.07953356543409
[2] chr1 110407019-110407519 * | 8.67005424991154 2.96546706376336e-13 0.820868679383739
[3] chr17 73631581-73632081 * | 8.61339322841304 3.53910479048555e-13 0.789176607419672
[4] chr10 30440804-30441304 * | 5.96401972746272 1.43825817219516e-12 0.758436365804101
[5] chr1 27869127-27869627 * | 6.24899843746968 3.09707434920801e-12 0.892089356271252
... ... ... ... . ... ... ...
[2652] chr5 32714018-32714518 * | 1.82287547506236 0.0094547934235557 0.126001924074063
[2653] chr7 150724392-150724892 * | 2.78589698640991 0.0094547934235557 0.111227644844517
[2654] chr16 89045463-89045963 * | 1.75678742468666 0.00948266858646735 0.20433103914127
[2655] chr7 30185784-30186284 * | 1.58859209325531 0.00955616601235496 0.326414734260411
[2656] chr20 50157615-50158115 * | 1.35032538225079 0.00960141941952872 0.462686019918627
-------
seqinfo: 23 sequences from an unspecified genome; no seqlengths
(二)繪制Marker峰
(1)熱圖
我們可以使用markerHeatmap()函數(shù)將這些Marker峰(或getMarkerFeatures()輸出的任何特征)可視化為一個(gè)熱圖:
> heatmapPeaks <- markerHeatmap(
seMarker = markersPeaks,
cutOff = "FDR <= 0.1 & Log2FC >= 0.5",
transpose = TRUE
)
> reorder_name = c("Mono","Erythroid","GMP","Progenitor","NK","CD4.M","CD4.N","B","PreB","CLP","pDC")
> draw(heatmapPeaks, row_order= reorder_name,heatmap_legend_side = "bot", annotation_legend_side = "bot")

(2)MA Plot和火山圖
除了繪制熱圖,我們還可以為任何單個(gè)細(xì)胞群繪制一個(gè)MA plot或火山圖。我們使用markerPlot()函數(shù)。對(duì)于MA圖,我們指定plotAs =“MA”。這里,我們通過name參數(shù)指定“Erythroid”細(xì)胞群。
> pma <- markerPlot(seMarker = markersPeaks, name = "Erythroid", cutOff = "FDR <= 0.1 & Log2FC >= 1", plotAs = "MA")
> pma

> pv <- markerPlot(seMarker = markersPeaks, name = "Erythroid", cutOff = "FDR <= 0.1 & Log2FC >= 1", plotAs = "Volcano")
> pv

(3)在Browser Tracks展示Marker峰
此外,通過將相關(guān)的峰區(qū)域傳遞給plotBrowserTrack()函數(shù)中的特性參數(shù),我們可以看到這些峰區(qū)域在browser tracks上的覆蓋。這將增加一個(gè)額外的BED形式的marker峰在ArchR軌跡圖的底部。這里我們通過geneSymbol參數(shù)繪制GATA1基因:
> p <- plotBrowserTrack(
ArchRProj = projHeme5,
groupBy = "Clusters2",
geneSymbol = c("GATA1"),
features = getMarkers(markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1", returnGR = TRUE)["Erythroid"],
upstream = 50000,
downstream = 50000
)
> grid::grid.newpage()
> grid::grid.draw(p$GATA1)
> plotPDF(p, name = "Plot-Tracks-With-Features", width = 5, height = 5, ArchRProj = projHeme5, addDOC = FALSE)

(三)比較兩個(gè)細(xì)胞群
Marker特征識(shí)別是一種非常特殊的差異測(cè)試。不過,ArchR還使用相同的getMarkerFeatures()函數(shù)支持標(biāo)準(zhǔn)差異測(cè)試。這里的小竅門是將useGroups設(shè)置為兩個(gè)細(xì)胞組中的一個(gè),而將bgdGroups設(shè)置為另一個(gè)細(xì)胞組。這將在提供的兩個(gè)組之間執(zhí)行差異測(cè)試。在所有這些差異測(cè)試中,傳遞給useGroups的組中較高的峰將具有正的fold change值,而傳遞給bgdGroups的組中較高的峰值將具有負(fù)的fold change值。
在這里,我們?cè)凇癊rythroid”細(xì)胞組和“Progenitor”細(xì)胞組之間進(jìn)行了配對(duì)試驗(yàn):
> markerTest <- getMarkerFeatures(
ArchRProj = projHeme5,
useMatrix = "PeakMatrix",
groupBy = "Clusters2",
testMethod = "wilcoxon",
bias = c("TSSEnrichment", "log10(nFrags)"),
useGroups = "Erythroid",
bgdGroups = "Progenitor"
)
畫MA plot:
> pma <- markerPlot(seMarker = markerTest, name = "Erythroid", cutOff = "FDR <= 0.1 & abs(Log2FC) >= 1", plotAs = "MA")
> pma

畫火山圖:
> pv <- markerPlot(seMarker = markerTest, name = "Erythroid", cutOff = "FDR <= 0.1 & abs(Log2FC) >= 1", plotAs = "Volcano")
> pv

下一章我們將繼續(xù)這個(gè)差異分析,來尋找差異可接近peaks的motif富集。