使用SnapATAC分析單細(xì)胞ATAC-seq數(shù)據(jù)(四):Integrative Analysis of 10X and snATAC

簡(jiǎn)介

在本教程中,我們將對(duì)來(lái)自10X和snATAC-seq技術(shù)產(chǎn)生的成年小鼠大腦的單細(xì)胞ATAC-seq測(cè)序數(shù)據(jù)進(jìn)行整合分析。該示例的所有數(shù)據(jù)可以從以下鏈接進(jìn)行下載:http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X_snATAC/

image.png

分析流程

  • Step 0. Download data
  • Step 1. Create snap object
  • Step 2. Select barcode
  • Step 3. Add cell-by-bin matrix
  • Step 4. Combine snap objects
  • Step 5. Filter bins
  • Step 6. Dimensionality reduction
  • Step 7. Determine significant components
  • Step 8. Remove batch effect
  • Step 9. Graph-based cluster
  • Step 10. Visualization

Step 0. Download data

# 下載所需的數(shù)據(jù)集
$ wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X_snATAC/CEMBA180305_2B.snap
$ wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X_snATAC/CEMBA180305_2B.barcode.txt
$ wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X_snATAC/atac_v1_adult_brain_fresh_5k.snap  
$ wget http://renlab.sdsc.edu/r3fang/share/github/Mouse_Brain_10X_snATAC/atac_v1_adult_brain_fresh_5k.barcode.txt

Step 1. Create snap object

首先,我們將所用的兩個(gè)數(shù)據(jù)集讀取到snap對(duì)象列表中。

# 加載SnapATAC包
> library(SnapATAC);

> file.list = c("CEMBA180305_2B.snap", "atac_v1_adult_brain_fresh_5k.snap");
> sample.list = c("snATAC", "10X");

# 讀取snap文件
> x.sp.ls = lapply(seq(file.list), function(i){
    x.sp = createSnap(file=file.list[i], sample=sample.list[i]);
    x.sp
  })
> names(x.sp.ls) = sample.list;

# 查看snap文件信息
> x.sp.ls
## $snATAC
## number of barcodes: 15136
## number of bins: 0
## number of genes: 0
## number of peaks: 0
## number of motifs: 0
## 
## $`10X`
## number of barcodes: 20000
## number of bins: 0
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

Step 2. Select barcode

接下來(lái),我們將讀取這兩個(gè)數(shù)據(jù)集的barcode信息,并選擇高質(zhì)量的barcodes。

> barcode.file.list = c("CEMBA180305_2B.barcode.txt", "atac_v1_adult_brain_fresh_5k.barcode.txt");

# 讀取barcode信息
> barcode.list = lapply(barcode.file.list, function(file){
    read.table(file)[,1];
  })

> x.sp.list = lapply(seq(x.sp.ls), function(i){
    x.sp = x.sp.ls[[i]];
    x.sp  = x.sp[x.sp@barcode %in% barcode.list[[i]],];
  })
> names(x.sp.list) = sample.list;
> x.sp.list
## $snATAC
## number of barcodes: 9646
## number of bins: 0
## number of genes: 0
## number of peaks: 0
## number of motifs: 0
## 
## $`10X`
## number of barcodes: 4100
## number of bins: 0
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

Step 3. Add cell-by-bin matrix

# 使用addBmatToSnap函數(shù)計(jì)算cell-by-bin計(jì)數(shù)矩陣并添加到snap對(duì)象中
> x.sp.list = lapply(seq(x.sp.list), function(i){
    x.sp = addBmatToSnap(x.sp.list[[i]], bin.size=5000);
    x.sp
  })

> x.sp.list
## $snATAC
## number of barcodes: 9646
## number of bins: 545118
## number of genes: 0
## number of peaks: 0
## number of motifs: 0
## 
## $`10X`
## number of barcodes: 4100
## number of bins: 546206
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

可以看到,這兩個(gè)snap對(duì)象中含有不同數(shù)目的bins,這是因?yàn)檫@兩個(gè)數(shù)據(jù)集使用的參考基因組有細(xì)微的差異。

Step 4. Combine snap objects

接下來(lái),我們將這個(gè)數(shù)據(jù)集進(jìn)行合并。
To combine these two snap objects, common bins are selected.

# 選擇兩個(gè)數(shù)據(jù)集共有的bins
> bin.shared = Reduce(intersect, lapply(x.sp.list, function(x.sp) x.sp@feature$name));

> x.sp.list <- lapply(x.sp.list, function(x.sp){
    idy = match(bin.shared, x.sp@feature$name);
    x.sp[,idy, mat="bmat"];
  })
# 合并兩個(gè)數(shù)據(jù)集
> x.sp = Reduce(snapRbind, x.sp.list);

> rm(x.sp.list); # free memory
> gc();
> table(x.sp@sample);
##   10X snATAC
##  4100   9646

Step 5. Binarize matrix

# 使用makeBinary函數(shù)將計(jì)數(shù)矩陣轉(zhuǎn)換為二進(jìn)制矩陣
> x.sp = makeBinary(x.sp, mat="bmat");

Step 6. Filter bins

首先,我們將與ENCODE中blacklist區(qū)域重疊的bins進(jìn)行過(guò)濾,以防止?jié)撛诘腶rtifacts。

> system("wget http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz");

> library(GenomicRanges);
> black_list = read.table("mm10.blacklist.bed.gz");
> black_list.gr = GRanges(
    black_list[,1], 
    IRanges(black_list[,2], black_list[,3])
  );

> idy = queryHits(findOverlaps(x.sp@feature, black_list.gr));
> if(length(idy) > 0){x.sp = x.sp[,-idy, mat="bmat"]};
> x.sp
## number of barcodes: 13746
## number of bins: 545015
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

接下來(lái),我們將過(guò)濾掉那些不需要的染色體信息。

> chr.exclude = seqlevels(x.sp@feature)[grep("random|chrM", seqlevels(x.sp@feature))];

> idy = grep(paste(chr.exclude, collapse="|"), x.sp@feature);
> if(length(idy) > 0){x.sp = x.sp[,-idy, mat="bmat"]};
> x.sp
## number of barcodes: 13746
## number of bins: 545011
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

第三,bins的覆蓋率大致是服從對(duì)數(shù)正態(tài)分布的。我們將與不變特征(如管家基因的啟動(dòng)子)重疊的前5%的bins進(jìn)行刪除 。

> bin.cov = log10(Matrix::colSums(x.sp@bmat)+1);
> bin.cutoff = quantile(bin.cov[bin.cov > 0], 0.95);
> idy = which(bin.cov <= bin.cutoff & bin.cov > 0);
> x.sp = x.sp[, idy, mat="bmat"];
> x.sp
## number of barcodes: 13746
## number of bins: 479127
## number of genes: 0
## number of peaks: 0
## number of motifs: 0

Step 7. Reduce dimensionality

我們使用diffusion maps的方法來(lái)計(jì)算landmark diffusion maps進(jìn)行數(shù)據(jù)降維。首先,我們隨機(jī)選擇出10,000個(gè)細(xì)胞作為landmarks,然后將剩余的query細(xì)胞映射到diffusion maps embedding中。

> row.covs = log10(Matrix::rowSums(x.sp@bmat)+1);
> row.covs.dens = density(
    x = row.covs, 
    bw = 'nrd', adjust = 1
  );

> sampling_prob = 1 / (approx(x = row.covs.dens$x, y = row.covs.dens$y, xout = row.covs)$y + .Machine$double.eps); 
> set.seed(1);
> idx.landmark.ds = sort(sample(x = seq(nrow(x.sp)), size = 10000, prob = sampling_prob));

> x.landmark.sp = x.sp[idx.landmark.ds,];
> x.query.sp = x.sp[-idx.landmark.ds,];

> x.landmark.sp = runDiffusionMaps(
    obj= x.landmark.sp,
    input.mat="bmat", 
    num.eigs=50
  );
> x.query.sp = runDiffusionMapsExtension(
    obj1=x.landmark.sp, 
    obj2=x.query.sp,
    input.mat="bmat"
  );

> x.landmark.sp@metaData$landmark = 1;
> x.query.sp@metaData$landmark = 0;

> x.sp = snapRbind(x.landmark.sp, x.query.sp);
## combine landmarks and query cells;
> x.sp = x.sp[order(x.sp@sample),]; # IMPORTANT
> rm(x.landmark.sp, x.query.sp); # free memory

Step 8. Determine significant components

> plotDimReductPW(
    obj=x.sp, 
    eigs.dims=1:50,
    point.size=0.3,
    point.color="grey",
    point.shape=19,
    point.alpha=0.6,
    down.sample=5000,
    pdf.file.name=NULL, 
    pdf.height=7, 
    pdf.width=7
  );
image.png

Step 9. Remove batch effect

> library(harmony);
# 使用runHarmony函數(shù)進(jìn)行批次校正
> x.after.sp = runHarmony(
    obj=x.sp, 
    eigs.dims=1:22, 
    meta_data=x.sp@sample # sample index
  );

Step 10. Graph-based cluster

> x.after.sp = runKNN(
    obj= x.after.sp,
    eigs.dim=1:22,
    k=15
  );

> x.after.sp = runCluster(
     obj=x.after.sp,
     tmp.folder=tempdir(),
     louvain.lib="R-igraph",
     path.to.snaptools=NULL,
     seed.use=10
  );

> x.after.sp@metaData$cluster = x.after.sp@cluster;

Step 11. Visualization

> x.sp = runViz(
    obj=x.sp, 
    tmp.folder=tempdir(),
    dims=2,
    eigs.dims=1:22, 
    method="Rtsne",
    seed.use=10
  );
> x.after.sp = runViz(
    obj=x.after.sp, 
    tmp.folder=tempdir(),
    dims=2,
    eigs.dims=1:22, 
    method="Rtsne",
    seed.use=10
  );
> par(mfrow = c(2, 3));
> plotViz(
    obj=x.sp,
    method="tsne", 
    main="Before Harmony",
    point.color=x.sp@sample, 
    point.size=0.1, 
    text.add= FALSE,
    down.sample=10000,
    legend.add=TRUE
  );
> plotViz(
    obj=x.after.sp,
    method="tsne", 
    main="After Harmony",
    point.color=x.sp@sample, 
    point.size=0.1, 
    text.add=FALSE,
    down.sample=10000,
    legend.add=TRUE
  );
> plotViz(
    obj=x.after.sp,
    method="tsne", 
    main="Cluster",
    point.color=x.after.sp@cluster, 
    point.size=0.1, 
    text.add=TRUE,
    text.size=1,
    text.color="black",
    text.halo.add=TRUE,
    text.halo.color="white",
    text.halo.width=0.2,
    down.sample=10000,
    legend.add=FALSE
  );
image.png

參考來(lái)源:https://gitee.com/booew/SnapATAC/blob/master/examples/10X_snATAC/README.md

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

友情鏈接更多精彩內(nèi)容