
Tutorial source: http://bioconductor.org/books/release/OSCA/data-infrastructure.html#background

The structure of SingleCellExperiment in one figure:

image-20210413210641826.png

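Each piece of data in a SingleCellExperiment object lives in its own slot (a concept that comes from S4 objects). If we think of SingleCellExperiment as a cargo ship, the slots are individual boxes that carry different kinds of cargo: some hold numeric matrices, while others hold data frames.

In this tutorial we discuss which slots are available, what format each one expects, and how to interact with them.

Experienced readers may have noticed that SingleCellExperiment looks very much like SummarizedExperiment. That is no accident: SingleCellExperiment extends SummarizedExperiment.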
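The relationship between the two classes can be checked directly in R. The following is a minimal sketch on a made-up toy object (the name toy and its values are assumptions for illustration, not part of the tutorial data):

```r
library(SingleCellExperiment)

# Toy object: 3 genes x 2 cells, arbitrary values.
toy <- SingleCellExperiment(
    assays = list(counts = matrix(
        1:6, nrow = 3,
        dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:2))
    ))
)

# Every SingleCellExperiment is also a SummarizedExperiment.
is(toy, "SummarizedExperiment")

# The S4 slots backing the object, inherited ones included.
slotNames(toy)
```

In everyday use the slots are reached through accessor functions such as assay() or colData(), which the rest of this tutorial walks through.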

1. Storing primary experimental data

1.1 assay slot

To create a basic SingleCellExperiment object, we only need to fill the assays slot (the blue box in the figure above). This slot holds the primary data, such as a counts matrix. Let's generate a count matrix with 3 cells and 10 genes for testing.

counts_matrix <- data.frame(cell_1 = rpois(10, 10), 
                    cell_2 = rpois(10, 10), 
                    cell_3 = rpois(10, 30))
rownames(counts_matrix) <- paste0("gene_", 1:10)
counts_matrix <- as.matrix(counts_matrix) # must be a matrix object!

# The resulting matrix; rpois() draws random values from a Poisson distribution
counts_matrix
        cell_1 cell_2 cell_3
gene_1       9     11     33
gene_2      10      6     32
gene_3       9      8     28
gene_4      12      7     34
gene_5       8     13     29
gene_6      14     12     24
gene_7       7     12     26
gene_8       4     12     27
gene_9       9      9     26
gene_10      8      7     29

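Now we can create the SingleCellExperiment object, naming the assay counts:

library(SingleCellExperiment) # provides the constructor and the accessors used below
sce <- SingleCellExperiment(assays = list(counts = counts_matrix))

Typing sce at the console prints a summary of the object.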

sce

class: SingleCellExperiment 
dim: 10 3 
metadata(0):
assays(1): counts
rownames(10): gene_1 gene_2 ... gene_9 gene_10
rowData names(0):
colnames(3): cell_1 cell_2 cell_3
colData names(0):
reducedDimNames(0):
altExpNames(0):

There are two ways to get the counts:

assay(sce, "counts"): the most general method. The second argument is the name of the assay, here the name we just assigned: counts.
counts(sce): shorthand for the above, but it only works for the assay with the special name "counts".

counts(sce)
        cell_1 cell_2 cell_3
gene_1       9     11     33
gene_2      10      6     32
gene_3       9      8     28
gene_4      12      7     34
gene_5       8     13     29
gene_6      14     12     24
gene_7       7     12     26
gene_8       4     12     27
gene_9       9      9     26
gene_10      8      7     29
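A quick way to convince yourself that the two accessors return the same thing is to compare them. This is a small sketch on a made-up matrix (toy_counts and toy are hypothetical names used only here):

```r
library(SingleCellExperiment)

# Made-up 3 genes x 2 cells count matrix.
toy_counts <- matrix(
    c(5L, 0L, 3L, 2L, 7L, 1L), nrow = 3,
    dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:2))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# counts() is shorthand for assay(, "counts"), so both return the same matrix.
identical(assay(toy, "counts"), counts(toy))

# assayNames() lists the names that can be passed to assay().
assayNames(toy)
```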

1.2 Adding more assays

What makes the assays slot so powerful is that it can hold several representations of the primary data. This is useful when we want to keep both the raw count matrix and a normalized version. Here we use the scater package to compute normalized, log-transformed data.

sce <- scater::logNormCounts(sce)

When analysing single-cell data, you may have noticed that we keep assigning the result back to the same object, such as sce. Why is the existing data not overwritten?

# the assays of sce are now counts and logcounts
sce
class: SingleCellExperiment 
dim: 10 3 
metadata(0):
assays(2): counts logcounts
rownames(10): gene_1 gene_2 ... gene_9 gene_10
rowData names(0):
colnames(3): cell_1 cell_2 cell_3
colData names(1): sizeFactor
reducedDimNames(0):
altExpNames(0):

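sce now contains one more assay, and the original counts have not been overwritten. This is a convention followed by functions that work on SingleCellExperiment objects: the returned object contains everything the input did, and new results are appended instead of replacing what was already there.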
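To see this behaviour in isolation, the sketch below keeps the input and the output under two different names and compares them. The object names and the seed are assumptions made for illustration; it needs the scater package, as in the tutorial.

```r
library(SingleCellExperiment)

set.seed(1) # arbitrary seed, only so the toy data is reproducible
toy_counts <- matrix(
    rpois(30, 10), nrow = 10,
    dimnames = list(paste0("gene_", 1:10), paste0("cell_", 1:3))
)

before <- SingleCellExperiment(assays = list(counts = toy_counts))
after <- scater::logNormCounts(before)

assayNames(before) # the input object is left untouched
assayNames(after)  # the returned object carries counts plus logcounts

# The raw counts travel along unchanged.
identical(counts(before), counts(after))
```

Assigning the result back to the same name, as the tutorial does, simply discards the old copy and keeps the richer one.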

As with counts, the normalized values can be retrieved in the same two ways

logcounts(sce)
assay(sce,'logcounts')

          cell_1   cell_2   cell_3
gene_1  4.126532 3.701210 3.987843
gene_2  4.456647 3.497187 4.135947
gene_3  3.855265 4.644657 4.312379
gene_4  3.855265 4.542172 3.568449
gene_5  3.697747 3.497187 4.270252
gene_6  4.641056 3.701210 4.135947
gene_7  4.126532 4.830075 4.270252
gene_8  3.319303 4.431846 3.987843
gene_9  4.725113 3.701210 4.270252
gene_10 3.855265 3.879924 4.182118

List all assays in the object

assays(sce)

List of length 2
names(2): counts logcounts

The above shows that some functions add an assay to sce automatically. More often, though, we compute something ourselves, and the result is not a SingleCellExperiment object, so it cannot be added to the assays automatically. How do we add a newly computed result?

Use the following approach

counts_100 <- counts(sce) + 100
assay(sce, "counts_100") <- counts_100 # assign a new entry to assays slot
assays(sce) # new assay has now been added.

List of length 3
names(3): counts logcounts counts_100
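A slightly more realistic custom assay than adding 100 is counts per million (CPM). The sketch below computes it by hand on made-up data and stores it under the name "cpm"; the assay name and the toy values are assumptions for illustration.

```r
library(SingleCellExperiment)

# Made-up 3 genes x 2 cells count matrix.
toy_counts <- matrix(
    c(10, 0, 30, 20, 5, 15), nrow = 3,
    dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:2))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# Divide each column by its library size, then scale to one million.
lib_sizes <- colSums(counts(toy))
cpm <- sweep(counts(toy), 2, lib_sizes, "/") * 1e6

# The new matrix must have the same dimensions as the existing assays.
assay(toy, "cpm") <- cpm
assayNames(toy)
colSums(assay(toy, "cpm")) # every cell now sums to one million
```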

2. Handling metadata

2.1 On the columns

To annotate a SingleCellExperiment object, we add metadata that describes the columns of the primary data, such as the samples or cell types in the experiment. This data is stored in the colData slot, usually as a data.frame or DataFrame, with one row per cell and one column per metadata field (for example treatment or batch).

Now let's add some cell information to the colData slot of sce.

cell_metadata <- data.frame(batch = c(1, 1, 2))
rownames(cell_metadata) <- paste0("cell_", 1:3)

cell_metadata
       batch
cell_1     1
cell_2     1
cell_3     2

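There are two ways to add the cell information to sce: pass it to the constructor through the colData argument, as below, or assign it to an existing object with colData(sce) <- DataFrame(cell_metadata). Note that calling the constructor again rebuilds sce from counts_matrix, so the assays added earlier are dropped.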

sce <- SingleCellExperiment(assays = list(counts = counts_matrix),
    colData = cell_metadata)

sce
class: SingleCellExperiment 
dim: 10 3 
metadata(0):
assays(1): counts
rownames(10): gene_1 gene_2 ... gene_9 gene_10
rowData names(0):
colnames(3): cell_1 cell_2 cell_3
colData names(1): batch
reducedDimNames(0):
altExpNames(0):
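The second route, assigning colData to an object that already exists, can be sketched as follows. The toy object and the cell_info table are made up for illustration.

```r
library(SingleCellExperiment)

# Made-up 2 genes x 3 cells count matrix.
toy_counts <- matrix(
    c(4, 1, 7, 2, 9, 3), nrow = 2,
    dimnames = list(paste0("gene_", 1:2), paste0("cell_", 1:3))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# One row per cell, with row names matching the column names of the object.
cell_info <- data.frame(batch = c(1, 1, 2), row.names = colnames(toy))

# Replace the (empty) colData without touching the assays.
colData(toy) <- DataFrame(cell_info)
toy$batch
assayNames(toy)
```

Unlike calling the constructor again, this keeps whatever assays the object already holds.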

Extract the colData

colData(sce)

DataFrame with 3 rows and 1 column
           batch
       <numeric>
cell_1         1
cell_2         1
cell_3         2

# a simpler way to access a single field
sce$batch

addPerCellQC() from scater computes several per-cell metrics and appends them to the colData slot

sce <- scater::addPerCellQC(sce)
colData(sce)

DataFrame with 3 rows and 8 columns
           batch       sum  detected percent_top_50 percent_top_100 percent_top_200
       <numeric> <integer> <integer>      <numeric>       <numeric>       <numeric>
cell_1         1       110        10            100             100             100
cell_2         1        96        10            100             100             100
cell_3         2       288        10            100             100             100
       percent_top_500     total
             <numeric> <integer>
cell_1             100       110
cell_2             100        96
cell_3             100       288

Manually add more colData fields

sce$more_stuff <- runif(ncol(sce))
colnames(colData(sce))

[1] "batch"           "sum"             "detected"        "percent_top_50"  "percent_top_100"
[6] "percent_top_200" "percent_top_500" "total"           "more_stuff"

Subset using colData

sce[, sce$batch == 1]

class: SingleCellExperiment 
dim: 10 2 
metadata(0):
assays(1): counts
rownames(10): gene_1 gene_2 ... gene_9 gene_10
rowData names(0):
colnames(2): cell_1 cell_2
colData names(9): batch sum ... total more_stuff
reducedDimNames(0):
altExpNames(0):

2.2 On the rows

Feature-level annotation is stored in the rowData slot. rowData is a DataFrame in which each row corresponds to a gene, holding information such as transcript length or gene symbol. There is also a rowRanges slot, which holds genomic coordinates as a GRanges or GRangesList object: the chromosome, start and end of each gene.

Both slots can be accessed with rowRanges() and rowData().

Here the rowRanges slot of sce has not been filled, so the call returns empty ranges.

rowRanges(sce) # empty
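For comparison, here is a sketch of what a filled rowRanges slot could look like. The coordinates are invented for illustration and do not describe real genes; toy and gene_ranges are hypothetical names.

```r
library(SingleCellExperiment)
library(GenomicRanges)

# Made-up 3 genes x 2 cells count matrix.
toy_counts <- matrix(
    c(3, 8, 1, 6, 2, 9), nrow = 3,
    dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:2))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# Invented coordinates, one range per gene, in row order.
gene_ranges <- GRanges(
    seqnames = c("chr1", "chr1", "chr2"),
    ranges = IRanges(start = c(100, 500, 200), end = c(300, 900, 600)),
    strand = c("+", "-", "+")
)
names(gene_ranges) <- rownames(toy)

rowRanges(toy) <- gene_ranges
rowRanges(toy)
width(rowRanges(toy)) # length of each range in base pairs
```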

Add information to rowData

sce <- scater::addPerFeatureQC(sce)
rowData(sce)

DataFrame with 10 rows and 2 columns
             mean  detected
        <numeric> <numeric>
gene_1    14.6667       100
gene_2    16.3333       100
gene_3    18.6667       100
gene_4    13.6667       100
gene_5    15.3333       100
gene_6    17.3333       100
gene_7    19.6667       100
gene_8    14.6667       100
gene_9    18.6667       100
gene_10   15.6667       100

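As with colData, feature metadata can also be supplied up front when the SingleCellExperiment object is created. How exactly this is done depends on the organism and on the annotation that was used for alignment and quantification.

For example, with Ensembl IDs, we might use AnnotationHub resources to obtain an Ensembl annotation object and extract gene body information to store in the rowRanges of our SingleCellExperiment object.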

library(AnnotationHub)
edb <- AnnotationHub()[["AH73881"]] # Human, Ensembl v97.
genes(edb)[,2]
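When the annotation is already at hand as a table, the constructor route mentioned above might look like the following sketch. The gene symbols and lengths are invented for illustration.

```r
library(SingleCellExperiment)

# Made-up 3 genes x 2 cells count matrix.
toy_counts <- matrix(
    c(3, 8, 1, 6, 2, 9), nrow = 3,
    dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:2))
)

# Hypothetical feature annotation, one row per gene, in row order.
toy <- SingleCellExperiment(
    assays = list(counts = toy_counts),
    rowData = DataFrame(
        symbol = c("AAA", "BBB", "CCC"),
        gene_length = c(1200, 800, 3000)
    )
)

rowData(toy)

# rowData columns can drive row subsetting.
long_genes <- toy[rowData(toy)$gene_length > 1000, ]
rownames(long_genes)
```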

How do we subset at the gene/feature level? By indexing rows, just as with a matrix.

sce[c("gene_1", "gene_4"), ]
sce[c(1, 4), ] # same as above in this case

## class: SingleCellExperiment 
## dim: 2 3 
## metadata(0):
## assays(1): counts
## rownames(2): gene_1 gene_4
## rowData names(2): mean detected
## colnames(3): cell_1 cell_2 cell_3
## colData names(5): batch sum detected total more_stuff
## reducedDimNames(0):
## altExpNames(0):

2.3 Other metadata

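Some information does not fit in colData or rowData. It can be stored in the metadata slot instead.

This slot is a list, and it can hold anything you like.

For example, if we have some highly variable genes that we want to keep inside sce, we can add them to metadata.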

my_genes <- c("gene_1", "gene_5")
metadata(sce) <- list(favorite_genes = my_genes)
metadata(sce)

## $favorite_genes
## [1] "gene_1" "gene_5"

We can also append more entries with $

your_genes <- c("gene_4", "gene_8")
metadata(sce)$your_genes <- your_genes
metadata(sce)

## $favorite_genes
## [1] "gene_1" "gene_5"
## 
## $your_genes
## [1] "gene_4" "gene_8"
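Entries stored in metadata can be pulled back out and used like any other R value. The sketch below stores a gene list and then uses it to subset rows; the toy object is made up for illustration.

```r
library(SingleCellExperiment)

# Made-up 5 genes x 2 cells count matrix.
toy_counts <- matrix(
    1:10, nrow = 5,
    dimnames = list(paste0("gene_", 1:5), paste0("cell_", 1:2))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

metadata(toy) <- list(favorite_genes = c("gene_1", "gene_5"))

# Retrieve the stored vector and use it as a row index.
picked <- toy[metadata(toy)$favorite_genes, ]
dim(picked)
```

Nothing checks the contents of metadata against the rest of the object, so keeping such entries in sync with the rows and columns is up to the user.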

3. Single-cell-specific fields

To recap, so far we have covered the assays, colData, rowData/rowRanges and metadata slots of SingleCellExperiment.

These slots are in fact inherited from its parent class, SummarizedExperiment.

SingleCellExperiment also has some slots of its own.

3.1 Dimensionality reduction results

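The reducedDims slot stores dimensionality reduction results from methods such as PCA or t-SNE. Each result is a matrix whose rows correspond to the columns of the primary data (the cells) and whose columns are the dimensions. Because the slot is a list, several PCA/t-SNE/etc. results can be stored for the same dataset.

Below we compute a PCA with runPCA() from the scater package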

sce <- scater::logNormCounts(sce)
sce <- scater::runPCA(sce)
reducedDim(sce, "PCA")

##               PC1        PC2
## cell_1 -0.6690868 -0.2484418
## cell_2  0.7974507 -0.1451026
## cell_3 -0.1283639  0.3935444
## attr(,"varExplained")
## [1] 0.5500410 0.1188276
## attr(,"percentVar")
## [1] 82.23453 17.76547
## attr(,"rotation")
##                 PC1         PC2
## gene_3   0.59064322  0.12776255
## gene_9  -0.52370773 -0.15070567
## gene_4   0.37398222 -0.44373578
## gene_5  -0.34759518 -0.23398435
## gene_7  -0.17223272  0.53514989
## gene_10  0.19898391  0.44548482
## gene_8  -0.15868758  0.34292404
## gene_2   0.08684966 -0.21813365
## gene_1  -0.11744576 -0.01881143
## gene_6  -0.02022031 -0.24277404

Similarly, use runTSNE() to compute t-SNE.

sce <- scater::runTSNE(sce, perplexity = 0.1)
reducedDim(sce, "TSNE")

##             [,1]        [,2]
## cell_1  5694.636   -88.68314
## cell_2 -2769.304  4975.60635
## cell_3 -2925.333 -4886.92321

We can list all dimensionality reduction results in sce with reducedDims(sce). Note the difference from reducedDim(), which returns a single result by name.

reducedDims(sce)

## List of length 2
## names(2): PCA TSNE

Likewise, we can manually add entries to the reducedDims slot.

Here we use umap() from the uwot package to generate UMAP coordinates and store them in reducedDims.

u <- uwot::umap(t(logcounts(sce)), n_neighbors = 2)
reducedDim(sce, "UMAP_uwot") <- u
reducedDims(sce) # Now stored in the object.

## List of length 3
## names(3): PCA TSNE UMAP_uwot

reducedDim(sce, "UMAP_uwot") 

##               [,1]        [,2]
## cell_1  0.69215895  0.07642523
## cell_2 -0.59922171  0.26388137
## cell_3 -0.09293724 -0.34030660
## attr(,"scaled:center")
## [1]  -0.6138766 -11.2867896
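The same manual route works for any matrix with one row per cell. As a dependency-light sketch, the example below runs base R's prcomp() on a simple log2(counts + 1) transform and stores the first two components. The transform, the seed and the entry name "PCA_prcomp" are assumptions for illustration, not what runPCA() does internally.

```r
library(SingleCellExperiment)

set.seed(42) # arbitrary seed, only so the toy data is reproducible
toy_counts <- matrix(
    rpois(40, 10), nrow = 10,
    dimnames = list(paste0("gene_", 1:10), paste0("cell_", 1:4))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# prcomp() expects observations in rows, so cells go in rows.
log_expr <- log2(counts(toy) + 1)
pca <- prcomp(t(log_expr))

# Store the first two principal components, one row per cell.
reducedDim(toy, "PCA_prcomp") <- pca$x[, 1:2]
reducedDimNames(toy)
dim(reducedDim(toy, "PCA_prcomp"))
```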

3.2 Alternative Experiments

This is where data such as spike-in measurements can be stored.

If we have data for an alternative set of features, we can store it inside the SingleCellExperiment.

spike_counts <- cbind(cell_1 = rpois(5, 10), 
    cell_2 = rpois(5, 10), 
    cell_3 = rpois(5, 30))
rownames(spike_counts) <- paste0("spike_", 1:5)
spike_se <- SummarizedExperiment(list(counts=spike_counts))
spike_se

## class: SummarizedExperiment 
## dim: 5 3 
## metadata(0):
## assays(1): counts
## rownames(5): spike_1 spike_2 spike_3 spike_4 spike_5
## rowData names(0):
## colnames(3): cell_1 cell_2 cell_3
## colData names(0):

Then store it in the sce object with altExp()

altExp(sce, "spike") <- spike_se
altExps(sce)

## List of length 1
## names(1): spike

Retrieve. altExps() lists every alternative Experiment, while altExp(sce, "spike") returns a single one by name.

altExps(sce)

## List of length 1
## names(1): spike

Subset

sub <- sce[,1:2] # retain only two samples.
altExp(sub, "spike")

## class: SummarizedExperiment 
## dim: 5 2 
## metadata(0):
## assays(1): counts
## rownames(5): spike_1 spike_2 spike_3 spike_4 spike_5
## rowData names(0):
## colnames(2): cell_1 cell_2
## colData names(0):

Any SummarizedExperiment object can be stored as an alternative Experiment, including another SingleCellExperiment.
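In practice, spike-ins often arrive in the same count matrix as the genes. The sketch below uses splitAltExps() from SingleCellExperiment to move such rows into an alternative Experiment. The "spike_" naming pattern and the toy data are assumptions for illustration.

```r
library(SingleCellExperiment)

set.seed(7) # arbitrary seed, only so the toy data is reproducible
cells <- paste0("cell_", 1:3)
combined <- rbind(
    matrix(rpois(12, 10), nrow = 4,
        dimnames = list(paste0("gene_", 1:4), cells)),
    matrix(rpois(6, 10), nrow = 2,
        dimnames = list(paste0("spike_", 1:2), cells))
)
toy <- SingleCellExperiment(assays = list(counts = combined))

# Label each row, then split; rows labelled "gene" stay in the main object.
feature_type <- ifelse(grepl("^spike_", rownames(toy)), "spike", "gene")
toy <- splitAltExps(toy, feature_type, ref = "gene")

dim(toy)
altExpNames(toy)
dim(altExp(toy, "spike"))
```

Keeping the spike-ins in an alternative Experiment means that gene-level steps applied to the main object no longer see them, while column subsetting still applies to both.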

3.3 Size factors

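sizeFactors() returns a numeric vector with one scaling factor per cell, which is used in the subsequent normalization.

These are usually set automatically by normalization functions.

For example, using the scran package.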

sce <- scran::computeSumFactors(sce)
sizeFactors(sce)

## [1] 0.5856574 0.6095618 1.8047809

Set them manually

sizeFactors(sce) <- scater::librarySizeFactors(sce)
sizeFactors(sce)

##    cell_1    cell_2    cell_3 
## 0.5856574 0.6095618 1.8047809
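Size factors can also be computed by hand and assigned through the same setter. The sketch below uses the simplest definition, each cell's total count divided by the mean total across cells, which to my understanding is what library size factors amount to. The toy values are made up.

```r
library(SingleCellExperiment)

# Made-up 3 genes x 3 cells count matrix; column totals are 40, 40 and 160.
toy_counts <- matrix(
    c(10, 0, 30, 20, 5, 15, 40, 60, 60), nrow = 3,
    dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:3))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# Library size of each cell, scaled so that the factors average to 1.
lib_sizes <- colSums(counts(toy))
sizeFactors(toy) <- lib_sizes / mean(lib_sizes)

sizeFactors(toy)       # 0.5, 0.5 and 2 for this toy matrix
mean(sizeFactors(toy)) # centred on 1
```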

3.4 Column labels

colLabels() returns a factor or vector of per-cell labels, typically the group assignments produced by unsupervised clustering.

colLabels(sce) <- LETTERS[1:3]
colLabels(sce)

## [1] "A" "B" "C"
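Once labels are set, they can be tabulated or used to pick out a group of cells. A minimal sketch with made-up labels (the object and the label values are assumptions for illustration):

```r
library(SingleCellExperiment)

# Made-up 2 genes x 3 cells count matrix.
toy_counts <- matrix(
    c(4, 1, 7, 2, 9, 3), nrow = 2,
    dimnames = list(paste0("gene_", 1:2), paste0("cell_", 1:3))
)
toy <- SingleCellExperiment(assays = list(counts = toy_counts))

# Pretend a clustering step assigned these labels.
colLabels(toy) <- factor(c("A", "A", "B"))

table(colLabels(toy)) # number of cells per label
toy$label             # the labels live in the "label" column of colData

# Keep only the cells labelled "A".
cluster_a <- toy[, colLabels(toy) == "A"]
ncol(cluster_a)
```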

4. Summary

The SingleCellExperiment object provides a common foundation for single-cell packages: an object produced by one package can serve as the input to many others.

assays
colData
rowData
metadata
reducedDims
altExps
sizeFactors
colLabels

From here on, we will use SingleCellExperiment as the basic data structure.

Now, take another look at the figure from the beginning!

image-20210413210641826.png

Suddenly it feels like I can do whatever I want with it!

I clearly still haven't read enough!


OSCA Tutorial 1 | The SingleCellExperiment object
