使用SnapATAC分析單細(xì)胞ATAC-seq數(shù)據(jù)(二):10X Adult Mouse Brain

image.png

10X Adult Mouse Brain

在本示例中,我們將分析來(lái)自10X genomics平臺(tái)測(cè)序產(chǎn)生的5000個(gè)成年小鼠大腦細(xì)胞的數(shù)據(jù)集。該示例數(shù)據(jù)可以從以下網(wǎng)址進(jìn)行下載:http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X/

分析內(nèi)容

  • Step 0. Data download
  • Step 1. Barcode selection
  • Step 2. Add cell-by-bin matrix
  • Step 3. Matrix binarization
  • Step 4. Bin filtering
  • Step 5. Dimensionality reduction
  • Step 6. Determine significant components
  • Step 7. Graph-based clustering
  • Step 8. Visualization
  • Step 9. Gene based annotation
  • Step 10. Heretical clustering
  • Step 11. Identify peak
  • Step 12. Create a cell-by-peak matrix
  • Step 13. Add cell-by-peak matrix
  • Step 14. Identify differentially accessible regions
  • Step 15. Motif analysis
  • Step 16. GREAT analysis

Step 0. Data download

在本示例中,我們將直接下載所需的snap文件,該文件中已經(jīng)包含了cell-by-bin/cell-by-peak matrix的計(jì)數(shù)矩陣。

wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X/atac_v1_adult_brain_fresh_5k.snap
wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X/atac_v1_adult_brain_fresh_5k_singlecell.csv

Step 1. Barcode selection

首先,我們對(duì)數(shù)據(jù)進(jìn)行初步的過(guò)濾,基于以下標(biāo)準(zhǔn)選擇出高質(zhì)量barcodes的細(xì)胞: 1) number of unique fragments; 2) fragments in promoter ratio;

# 加載SnapATAC包
library(SnapATAC);
# 使用createSnap函數(shù)構(gòu)建snap對(duì)象
x.sp = createSnap(
    file="atac_v1_adult_brain_fresh_5k.snap",
    sample="atac_v1_adult_brain_fresh_5k",
    num.cores=1
  );
# 讀取barcode信息
barcodes = read.csv(
    "atac_v1_adult_brain_fresh_5k_singlecell.csv",
    head=TRUE
  );
barcodes = barcodes[2:nrow(barcodes),];

# 計(jì)算比對(duì)到promoter區(qū)域的比率
promoter_ratio = (barcodes$promoter_region_fragments+1) / (barcodes$passed_filters + 1);
UMI = log(barcodes$passed_filters+1, 10);
data = data.frame(UMI=UMI, promoter_ratio=promoter_ratio);
barcodes$promoter_ratio = promoter_ratio;

library(viridisLite);
library(ggplot2);
p1 = ggplot(
    data, 
    aes(x= UMI, y= promoter_ratio)) + 
    geom_point(size=0.1, col="grey") +
    theme_classic() +
    ggtitle("10X Fresh Adult Brain") +
    ylim(0, 1) + xlim(0, 6) +
    labs(x = "log10(UMI)", y="promoter ratio") 
p1 

# 根據(jù)條件篩選barcode
barcodes.sel = barcodes[which(UMI >= 3 & UMI <= 5 & promoter_ratio >= 0.15 & promoter_ratio <= 0.6),];
rownames(barcodes.sel) = barcodes.sel$barcode;
x.sp = x.sp[which(x.sp@barcode %in% barcodes.sel$barcode),];
x.sp@metaData = barcodes.sel[x.sp@barcode,];
x.sp
## number of barcodes: 4100
## number of bins: 0
## number of genes: 0
## number of peaks: 0
## number of motifs: 0
image.png

Step 2. Add cell-by-bin matrix

接下來(lái),我們將5kb分辨率的cell-by-bin計(jì)數(shù)矩陣添加到snap對(duì)象中。addBmatToSnap函數(shù)將自動(dòng)讀取cell-by-bin計(jì)數(shù)矩陣,并將其添加到snap對(duì)象的bmat slot 中。

# show what bin sizes exist in atac_v1_adult_brain_fresh_5k.snap file
# 使用showBinSizes函數(shù)查看snap文件中的bin size信息
showBinSizes("atac_v1_adult_brain_fresh_5k.snap");
[1] 1000 5000 10000

# 使用addBmatToSnap函數(shù)添加cell-by-bin計(jì)數(shù)矩陣
x.sp = addBmatToSnap(x.sp, bin.size=5000, num.cores=1);

Step 3. Matrix binarization

接下來(lái),我們將cell-by-bin的計(jì)數(shù)矩陣轉(zhuǎn)換為二進(jìn)制矩陣。計(jì)數(shù)矩陣中的某些items具有異常高的覆蓋率,這可能是由比對(duì)錯(cuò)誤造成的。因此,我們會(huì)將計(jì)數(shù)矩陣中覆蓋率最高的0.1%的items進(jìn)行刪除,然后將其余的非零的items轉(zhuǎn)換為1。

# 使用makeBinary函數(shù)將計(jì)數(shù)矩陣轉(zhuǎn)換為二進(jìn)制矩陣
x.sp = makeBinary(x.sp, mat="bmat");

Step 4. Bin filtering

首先,我們將過(guò)濾掉與ENCODE blacklist區(qū)域重疊的bins,避免潛在的人為因素產(chǎn)生的誤差。

system("wget http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz");

library(GenomicRanges);
black_list = read.table("mm10.blacklist.bed.gz");
black_list.gr = GRanges(
    black_list[,1], 
    IRanges(black_list[,2], black_list[,3])
  );

idy = queryHits(findOverlaps(x.sp@feature, black_list.gr));
if(length(idy) > 0){x.sp = x.sp[,-idy, mat="bmat"]};
x.sp
## number of barcodes: 4100
## number of bins: 546103
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

接下來(lái),我們將移除那些不需要的染色體上的信息。

chr.exclude = seqlevels(x.sp@feature)[grep("random|chrM", seqlevels(x.sp@feature))];

idy = grep(paste(chr.exclude, collapse="|"), x.sp@feature);
if(length(idy) > 0){x.sp = x.sp[,-idy, mat="bmat"]};
x.sp
## number of barcodes: 4100
## number of bins: 545183
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

最后,bin的覆蓋率大致服從對(duì)數(shù)正態(tài)分布(log normal distribution),我們會(huì)將與invariant features(如管家基因的啟動(dòng)子)區(qū)域重疊的前5%的bins進(jìn)行刪除。

bin.cov = log10(Matrix::colSums(x.sp@bmat)+1);
hist(
    bin.cov[bin.cov > 0], 
    xlab="log10(bin cov)", 
    main="log10(Bin Cov)", 
    col="lightblue", 
    xlim=c(0, 5)
  );
bin.cutoff = quantile(bin.cov[bin.cov > 0], 0.95);
idy = which(bin.cov <= bin.cutoff & bin.cov > 0);
x.sp = x.sp[, idy, mat="bmat"];
x.sp
## number of barcodes: 4100
## number of bins: 474624
## number of genes: 0
## number of peaks: 0
## number of motifs: 0
image.png

Step 5. Dimensionality reduction

使用diffusion maps的方法進(jìn)行數(shù)據(jù)降維

x.sp = runDiffusionMaps(
    obj=x.sp,
    input.mat="bmat", 
    num.eigs=50
  );

Step 6. Determine significant components

接下來(lái),我們基于數(shù)據(jù)降維的結(jié)果確定用于下游分析的維數(shù)。我們繪制不同維數(shù)之間的配對(duì)散點(diǎn)圖,并選擇散點(diǎn)圖開(kāi)始看起來(lái)零散的維數(shù)。在下面的示例中,我們選擇了前20個(gè)維度。

plotDimReductPW(
    obj=x.sp, 
    eigs.dims=1:50,
    point.size=0.3,
    point.color="grey",
    point.shape=19,
    point.alpha=0.6,
    down.sample=5000,
    pdf.file.name=NULL, 
    pdf.height=7, 
    pdf.width=7
  );
image.png

Step 7. Graph-based clustering

選擇好有效的降維維度后,我們基于它們來(lái)構(gòu)造一個(gè)K近鄰(KNN)的聚類(lèi)圖(K =15)。在該圖中,每個(gè)點(diǎn)代表一個(gè)細(xì)胞,并根據(jù)歐氏距離確定每個(gè)細(xì)胞的k近鄰個(gè)點(diǎn)。

x.sp = runKNN(
    obj=x.sp,
    eigs.dims=1:20,
    k=15
  );

x.sp=runCluster(
    obj=x.sp,
    tmp.folder=tempdir(),
    louvain.lib="R-igraph",
    seed.use=10
  );

x.sp@metaData$cluster = x.sp@cluster;

Step 8. Visualization

SnapATAC可以使用tSNE(FI-tsne)或UMAP的方法對(duì)降維聚類(lèi)后的結(jié)果進(jìn)行可視化展示。在此例中,我們計(jì)算t-SNE embedding,使用tSNE方法進(jìn)行可視化展示。同時(shí),我們還將測(cè)序深度或其他偏差投射到t-SNE embedding上。

x.sp = runViz(
    obj=x.sp, 
    tmp.folder=tempdir(),
    dims=2,
    eigs.dims=1:20, 
    method="Rtsne",
    seed.use=10
  );

par(mfrow = c(2, 2));
plotViz(
    obj=x.sp,
    method="tsne", 
    main="10X Brain Cluster",
    point.color=x.sp@cluster, 
    point.size=1, 
    point.shape=19, 
    point.alpha=0.8, 
    text.add=TRUE,
    text.size=1.5,
    text.color="black",
    text.halo.add=TRUE,
    text.halo.color="white",
    text.halo.width=0.2,
    down.sample=10000,
    legend.add=FALSE
  );

plotFeatureSingle(
    obj=x.sp,
    feature.value=log(x.sp@metaData[,"passed_filters"]+1,10),
    method="tsne", 
    main="10X Brain Read Depth",
    point.size=0.2, 
    point.shape=19, 
    down.sample=10000,
    quantiles=c(0.01, 0.99)
  ); 

plotFeatureSingle(
    obj=x.sp,
    feature.value=x.sp@metaData$peak_region_fragments / x.sp@metaData$passed_filters,
    method="tsne", 
    main="10X Brain FRiP",
    point.size=0.2, 
    point.shape=19, 
    down.sample=10000,
    quantiles=c(0.01, 0.99) # remove outliers
  );

plotFeatureSingle(
    obj=x.sp,
    feature.value=x.sp@metaData$duplicate / x.sp@metaData$total,
    method="tsne", 
    main="10X Brain Duplicate",
    point.size=0.2, 
    point.shape=19, 
    down.sample=10000,
    quantiles=c(0.01, 0.99) # remove outliers
  );
image.png

Step 9. Gene based annotation

為了幫助注釋聚類(lèi)后分群的細(xì)胞簇,SnapATAC接下來(lái)將創(chuàng)建cell-by-gene計(jì)數(shù)矩陣,并可視化marker基因的富集情況,根據(jù)marker基因的表達(dá)情況進(jìn)行cluster的注釋。

system("wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X/gencode.vM16.gene.bed");
genes = read.table("gencode.vM16.gene.bed");
genes.gr = GRanges(genes[,1], 
    IRanges(genes[,2], genes[,3]), name=genes[,4]
  );

marker.genes = c(
    "Snap25", "Gad2", "Apoe",
    "C1qb", "Pvalb", "Vip", 
    "Sst", "Lamp5", "Slc17a7"
  );

genes.sel.gr <- genes.gr[which(genes.gr$name %in% marker.genes)];

# re-add the cell-by-bin matrix to the snap object;
x.sp = addBmatToSnap(x.sp);
x.sp = createGmatFromMat(
    obj=x.sp, 
    input.mat="bmat",
    genes=genes.sel.gr,
    do.par=TRUE,
    num.cores=10
  );

# normalize the cell-by-gene matrix
x.sp = scaleCountMatrix(
    obj=x.sp, 
    cov=x.sp@metaData$passed_filters + 1,
    mat="gmat",
    method = "RPM"
  );

# smooth the cell-by-gene matrix
x.sp = runMagic(
    obj=x.sp,
    input.mat="gmat",
    step.size=3
  );

par(mfrow = c(3, 3));
for(i in 1:9){
    plotFeatureSingle(
        obj=x.sp,
        feature.value=x.sp@gmat[, marker.genes[i]],
        method="tsne", 
        main=marker.genes[i],
        point.size=0.1, 
        point.shape=19, 
        down.sample=10000,
        quantiles=c(0, 1)
  )};
images

Step 10. Heretical clustering

接下來(lái),我們將屬于同一cluster的細(xì)胞匯集到一起,用以創(chuàng)建每個(gè)cluster的聚合信號(hào)(aggregate signal)。

# calculate the ensemble signals for each cluster
ensemble.ls = lapply(split(seq(length(x.sp@cluster)), x.sp@cluster), function(x){
    SnapATAC::colMeans(x.sp[x,], mat="bmat");
    })

# cluster using 1-cor as distance  
hc = hclust(as.dist(1 - cor(t(do.call(rbind, ensemble.ls)))), method="ward.D2");
plotViz(
    obj=x.sp,
    method="tsne", 
    main="10X Brain Cluster",
    point.color=x.sp@cluster, 
    point.size=1, 
    point.shape=19, 
    point.alpha=0.8, 
    text.add=TRUE,
    text.size=1.5,
    text.color="black",
    text.halo.add=TRUE,
    text.halo.color="white",
    text.halo.width=0.2,
    down.sample=10000,
    legend.add=FALSE
    );
plot(hc, hang=-1, xlab="");

在本示例中,cluster 20到25是興奮性神經(jīng)元細(xì)胞,cluster 19到5為抑制性神經(jīng)元細(xì)胞,而其余的為非神經(jīng)元細(xì)胞。


image.png

Step 11. Identify peaks

接下來(lái),我們將每個(gè)cluster群的細(xì)胞信息聚合起來(lái),創(chuàng)建一個(gè)用于peak calling和可視化的集成track。在該步驟中,將生成一個(gè).narrowPeak的文件,其中包含識(shí)別出的所有peak的信息,和一個(gè).bedGraph文件,可以用于可視化展示。為了獲得最robust的結(jié)果,我們不建議對(duì)細(xì)胞數(shù)目小于100的cluster執(zhí)行此步驟。

# 查看snaptools的安裝路徑
system("which snaptools")
/home/r3fang/anaconda2/bin/snaptools
# 查看macs2的安裝路徑
system("which macs2")
/home/r3fang/anaconda2/bin/macs2

# 調(diào)用macs2進(jìn)行peak callling
runMACS(
    obj=x.sp[which(x.sp@cluster==1),], 
    output.prefix="atac_v1_adult_brain_fresh_5k.1",
    path.to.snaptools="/home/r3fang/anaconda2/bin/snaptools",
    path.to.macs="/home/r3fang/anaconda2/bin/macs2",
    gsize="mm", 
    buffer.size=500, 
    num.cores=5,
    macs.options="--nomodel --shift 37 --ext 73 --qval 1e-2 -B --SPMR --call-summits",
    tmp.folder=tempdir()
    );

接下來(lái),我們將提供一個(gè)簡(jiǎn)短的腳本,用于為所有的cluster進(jìn)行批量操作此步驟。

# call peaks for all cluster with more than 100 cells
clusters.sel = names(table(x.sp@cluster))[which(table(x.sp@cluster) > 200)];

peaks.ls = mclapply(seq(clusters.sel), function(i){
    print(clusters.sel[i]);
    runMACS(
        obj=x.sp[which(x.sp@cluster==clusters.sel[i]),], 
        output.prefix=paste0("atac_v1_adult_brain_fresh_5k.", gsub(" ", "_", clusters.sel)[i]),
        path.to.snaptools="/home/r3fang/anaconda2/bin/snaptools",
        path.to.macs="/home/r3fang/anaconda2/bin/macs2",
        gsize="hs", # mm, hs, etc
        buffer.size=500, 
        num.cores=1,
        macs.options="--nomodel --shift 100 --ext 200 --qval 5e-2 -B --SPMR",
        tmp.folder=tempdir()
   );
 }, mc.cores=5);

# assuming all .narrowPeak files in the current folder are generated from the clusters
peaks.names = system("ls | grep narrowPeak", intern=TRUE);

peak.gr.ls = lapply(peaks.names, function(x){
    peak.df = read.table(x)
    GRanges(peak.df[,1], IRanges(peak.df[,2], peak.df[,3]))
  })

# 合并所有的peak信息
peak.gr = reduce(Reduce(c, peak.gr.ls));
peak.gr
## GRanges object with 242847 ranges and 0 metadata columns:
##           seqnames               ranges strand
##              <Rle>            <IRanges>  <Rle>
##       [1]     chr1   [3094889, 3095629]      *
##       [2]     chr1   [3113499, 3114060]      *
##       [3]     chr1   [3118103, 3118401]      *
##       [4]     chr1   [3119689, 3120845]      *
##       [5]     chr1   [3121534, 3121786]      *
##       ...      ...                  ...    ...
##  [242843]     chrY [90797373, 90798136]      *
##  [242844]     chrY [90804709, 90805456]      *
##  [242845]     chrY [90808580, 90808819]      *
##  [242846]     chrY [90808850, 90809131]      *
##  [242847]     chrY [90810817, 90811057]      *
##  -------

Step 12. Create a cell-by-peak matrix

接下來(lái),我們基于合并后的peak信息作為參考,使用原始的snap文件創(chuàng)建一個(gè) cell-by-peak的計(jì)數(shù)矩陣。

peaks.df = as.data.frame(peak.gr)[,1:3];
write.table(peaks.df,file = "peaks.combined.bed",append=FALSE,
        quote= FALSE,sep="\t", eol = "\n", na = "NA", dec = ".", 
        row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"),
        fileEncoding = "")
saveRDS(x.sp, file="atac_v1_adult_brain_fresh_5k.snap.rds");

我們使用snaptools創(chuàng)建cell-by-peak的計(jì)數(shù)矩陣,并將其添加到snap文件中,這一步可能需要一段時(shí)間。

snaptools snap-add-pmat \
    --snap-file atac_v1_adult_brain_fresh_5k.snap \
    --peak-file peaks.combined.bed

Step 13. Add cell-by-peak matrix

接下來(lái),我們將計(jì)算好的cell-by-peak計(jì)數(shù)矩陣添加到現(xiàn)有的snap對(duì)象中。

x.sp = readRDS("atac_v1_adult_brain_fresh_5k.snap.rds");
x.sp = addPmatToSnap(x.sp);
x.sp = makeBinary(x.sp, mat="pmat");
x.sp
## number of barcodes: 4100
## number of bins: 546206
## number of genes: 16
## number of peaks: 242847
## number of motifs: 0

Step 14. Identify differentially accessible regions

SnapATAC通過(guò)差異分析來(lái)識(shí)別出不同cluster群中的差異可及性區(qū)域( differentially accessible regions,DARs)。默認(rèn)情況下,它只尋找每個(gè)cluster群中的positive peaks(可以通過(guò)cluster.pos參數(shù)指定), 與一組陰性對(duì)照細(xì)胞相比。如果默認(rèn)的cluster.neg=NULL, findDAR函數(shù)將尋找最接近positive細(xì)胞的一組作為背景細(xì)胞。

DARs = findDAR(
    obj=x.sp,
    input.mat="pmat",
    cluster.pos=26,
    cluster.neg.method="knn",
    test.method="exactTest",
    bcv=0.1, #0.4 for human, 0.1 for mouse
    seed.use=10
  );
DARs$FDR = p.adjust(DARs$PValue, method="BH");
idy = which(DARs$FDR < 5e-2 & DARs$logFC > 0);
par(mfrow = c(1, 2));
plot(DARs$logCPM, DARs$logFC, 
    pch=19, cex=0.1, col="grey", 
    ylab="logFC", xlab="logCPM",
    main="Cluster 26"
  );

points(DARs$logCPM[idy], 
    DARs$logFC[idy], 
    pch=19, 
    cex=0.5, 
    col="red"
  );

abline(h = 0, lwd=1, lty=2);
covs = Matrix::rowSums(x.sp@pmat);
vals = Matrix::rowSums(x.sp@pmat[,idy]) / covs;
vals.zscore = (vals - mean(vals)) / sd(vals);
plotFeatureSingle(
    obj=x.sp,
    feature.value=vals.zscore,
    method="tsne", 
    main="Cluster 26",
    point.size=0.1, 
    point.shape=19, 
    down.sample=5000,
    quantiles=c(0.01, 0.99)
  );
image.png

接下來(lái),我們識(shí)別出每個(gè)cluster群中的DARs。對(duì)于缺乏揭示DARs的靜態(tài)能力(static power)的簇,特別是比較小的簇,我們根據(jù)peak的富集程度對(duì)其進(jìn)行排序,并使用top 2000個(gè)peak用于motif discovery的代表性峰。

idy.ls = lapply(levels(x.sp@cluster), function(cluster_i){
    DARs = findDAR(
        obj=x.sp,
        input.mat="pmat",
        cluster.pos=cluster_i,
        cluster.neg=NULL,
        cluster.neg.method="knn",
        bcv=0.1,
        test.method="exactTest",
        seed.use=10
        );
    DARs$FDR = p.adjust(DARs$PValue, method="BH");
    idy = which(DARs$FDR < 5e-2 & DARs$logFC > 0);
    if((x=length(idy)) < 2000L){
            PValues = DARs$PValue;
            PValues[DARs$logFC < 0] = 1;
            idy = order(PValues, decreasing=FALSE)[1:2000];
            rm(PValues); # free memory
    }
    idy
  })
names(idy.ls) = levels(x.sp@cluster);
par(mfrow = c(3, 3));
for(cluster_i in levels(x.sp@cluster)){
    print(cluster_i)
    idy = idy.ls[[cluster_i]];
    vals = Matrix::rowSums(x.sp@pmat[,idy]) / covs;
    vals.zscore = (vals - mean(vals)) / sd(vals);
    plotFeatureSingle(
        obj=x.sp,
        feature.value=vals.zscore,
        method="tsne", 
        main=cluster_i,
        point.size=0.1, 
        point.shape=19, 
        down.sample=5000,
        quantiles=c(0.01, 0.99)
        );
  }
image.png

Step 15. Motif analysis identifies master regulators

SnapATAC可以調(diào)用Homer來(lái)鑒定識(shí)別出的差異可及性區(qū)域(DARs)中富集的master regulators。運(yùn)行完runHomer函數(shù)后,會(huì)在./homer/C5文件夾中生成一個(gè)homer motif的報(bào)告knownResults.html。我們需要預(yù)先安裝好Homer程序。

system("which findMotifsGenome.pl");
/projects/ps-renlab/r3fang/public_html/softwares/homer/bin/findMotifsGenome.pl

motifs = runHomer(
    x.sp[,idy.ls[["5"]],"pmat"], 
    mat = "pmat",
    path.to.homer = "/projects/ps-renlab/r3fang/public_html/softwares/homer/bin/findMotifsGenome.pl",
    result.dir = "./homer/C5",
    num.cores=5,
    genome = 'mm10',
    motif.length = 10,
    scan.size = 300,
    optimize.count = 2,
    background = 'automatic',
    local.background = FALSE,
    only.known = TRUE,
    only.denovo = FALSE,
    fdr.num = 5,
    cache = 100,
    overwrite = TRUE,
    keep.minimal = FALSE
  );
image.png

SnapATAC還可以調(diào)用chromVAR(Schep等)來(lái)進(jìn)行motif的可變性分析。

library(chromVAR);
library(motifmatchr);
library(SummarizedExperiment);
library(BSgenome.Mmusculus.UCSC.mm10);

x.sp = makeBinary(x.sp, "pmat");
x.sp@mmat = runChromVAR(
    obj=x.sp,
    input.mat="pmat",
    genome=BSgenome.Mmusculus.UCSC.mm10,
    min.count=10,
    species="Homo sapiens"
  );

motif_i = "MA0497.1_MEF2C";
dat = data.frame(x=x.sp@metaData[,"cluster"], y=x.sp@mmat[,motif_i]);

p1 <- ggplot(dat, aes(x=x, y=y, fill=x)) + 
    theme_classic() +
    geom_violin() + 
    xlab("cluster") +
    ylab("motif enrichment") + 
    ggtitle(motif_i) +
    theme(
          plot.margin = margin(5,1,5,1, "cm"),
          axis.text.x = element_text(angle = 90, hjust = 1),
          axis.ticks.x=element_blank(),
          legend.position = "none"
   );

motif_i = "MA0660.1_MEF2B";
dat = data.frame(x=x.sp@metaData[,"cluster"], y=x.sp@mmat[,motif_i]);

p2 <- ggplot(dat, aes(x=x, y=y, fill=x)) + 
    theme_classic() +
    geom_violin() + 
    xlab("cluster") +
    ylab("motif enrichment") + 
    ggtitle(motif_i) +
    theme(
          plot.margin = margin(5,1,5,1, "cm"),
          axis.text.x = element_text(angle = 90, hjust = 1),
          axis.ticks.x=element_blank(),
          legend.position = "none"
   );

p1
p2
image.png

image.png

Step 16. GREAT analysis

SnapATAC還可以使用GREAT來(lái)識(shí)別每個(gè)細(xì)胞cluster中活躍的生物學(xué)通路。在本示例中,我們將首先識(shí)別小膠質(zhì)細(xì)胞(cluster 13)中的差異可及性區(qū)域DARs,并展示使用GREAT analysis識(shí)別出的top 6個(gè)GO term的富集情況。

## install R package rGREAT
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rGREAT")
## or install the latest version
library(devtools)
install_github("jokergoo/rGREAT")

library(rGREAT);
DARs = findDAR(
    obj=x.sp,
    input.mat="pmat",
    cluster.pos=13,
    cluster.neg.method="knn",
    test.method="exactTest",
    bcv=0.1, #0.4 for human, 0.1 for mouse
    seed.use=10
  );

DARs$FDR = p.adjust(DARs$PValue, method="BH");
idy = which(DARs$FDR < 5e-2 & DARs$logFC > 0);

job = submitGreatJob(
    gr                    = x.sp@peak[idy],
    bg                    = NULL,
    species               = "mm10",
    includeCuratedRegDoms = TRUE,
    rule                  = "basalPlusExt",
    adv_upstream          = 5.0,
    adv_downstream        = 1.0,
    adv_span              = 1000.0,
    adv_twoDistance       = 1000.0,
    adv_oneDistance       = 1000.0,
    request_interval = 300,
    max_tries = 10,
    version = "default",
    base_url = "http://great.stanford.edu/public/cgi-bin"
  );

job
## Submit time: 2019-09-04 14:14:02
## Version: default
## Species: mm10
## Inputs: 25120 regions
## Background: wholeGenome
## Model: Basal plus extension
##   Proximal: 5 kb upstream, 1 kb downstream,
##   plus Distal: up to 1000 kb
## Include curated regulatory domains
## 
## Enrichment tables for following ontologies have been downloaded:
##   None

運(yùn)行完所提交的job后,我們可以提取GREAT分析的結(jié)果。第一個(gè)主要的分析結(jié)果是一個(gè)富集統(tǒng)計(jì)的信息表。默認(rèn)情況下,它將檢索來(lái)自三種GO(包括MF,BP,CC)terms的結(jié)果和pathway的注釋結(jié)果。所有的表中都包含所有terms 的統(tǒng)計(jì)信息,無(wú)論它們是否具有顯著性。并且,用戶(hù)可以通過(guò)自定義的cutoff 進(jìn)行篩選。

tb = getEnrichmentTables(job);
names(tb);
## [1] "GO Molecular Function" "GO Biological Process" "GO Cellular Component"

GBP = tb[["GO Biological Process"]];
head(GBP[order(GBP[,"Binom_Adjp_BH"]),1:5]);
##           ID                                      name Binom_Genome_Fraction
## 1 GO:0002376                     immune system process            0.12515840
## 2 GO:0002682       regulation of immune system process            0.09012561
## 3 GO:0009987                          cellular process            0.80870120
## 4 GO:0048518 positive regulation of biological process            0.43002240
## 5 GO:0050789          regulation of biological process            0.68873070
## 6 GO:0050794            regulation of cellular process            0.66837300
##   Binom_Expected Binom_Observed_Region_Hits
## 1       3095.918                       5592
## 2       2229.347                       4148
## 3      20004.030                      22241
## 4      10637.030                      13697
## 5      17036.440                      19871
## 6      16532.870                      19356

參考來(lái)源:https://gitee.com/booew/SnapATAC/blob/master/examples/10X_brain_5k/README.md

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容