人妻丝袜美腿中文字幕,人妻一二区视频,岛国福利在线

前言

與傳統(tǒng)的單細(xì)胞分析工具不同，muscat更加關(guān)注于揭示樣本間的變異性，從而得出更具普遍性的結(jié)論。在上一期的推文中muscat：專注于多樣本多分組的單細(xì)胞差異分析，我們介紹了muscat的功能框架。那么從本期推文開始，我們將通過代碼實(shí)操的方式來介紹muscat的使用技巧。那么，今天讓我們一起來學(xué)習(xí)一下如何使用muscat模擬復(fù)雜的 scRNA-seq 數(shù)據(jù)以及評估不同的差異表達(dá)分析方法的性能。

代碼流程

0. 加載R包

library(cowplot)
library(dplyr)
library(reshape2)
#BiocManager::install("muscat")
library(muscat)
library(purrr)
library(scater)
library(SingleCellExperiment)

1. prepSim：準(zhǔn)備模擬數(shù)據(jù)

數(shù)據(jù)來源：GSE96583；包含8名狼瘡患者在IFN-β治療前后獲得的scRNA-seq PBCM數(shù)據(jù)

#estimate simulation parameters
data(example_sce)
table(example_sce$group_id)
# ctrl stim 
# 750  806 
ref <- prepSim(example_sce, verbose = FALSE)
# only samples from `ctrl` group are retained
table(ref$sample_id)
# ctrl101 ctrl107 
# 200     200 
# 用法prepSim(x, min_count = 1, min_cells = 10, min_genes = 100, 
#           min_size = 100, group_keep = NULL, verbose = TRUE)
# 參數(shù)x是一個(gè)SingleCellExperiment。
# min_count，min_cells用于過濾基因；只保留在>=min_cells中具有>min_count計(jì)數(shù)的基因。
# min_genes用于過濾細(xì)胞；只保留在>=min_genes中具有>0計(jì)數(shù)的細(xì)胞。
# min_size用于過濾亞群-樣本組合；只保留具有>=min_size個(gè)細(xì)胞的實(shí)例。指定min_size=NULL跳過此步驟。
# group_keep字符字符串；默認(rèn)值NULL保留來自levels(x$group_id)[1]的樣本；

prepSim通過以下步驟為muscat的simData函數(shù)準(zhǔn)備輸入SCE進(jìn)行模擬：

基本過濾基因和細(xì)胞
(可選)過濾亞群-樣本實(shí)例
分別估計(jì)細(xì)胞(庫大小)和基因參數(shù)(離散度和樣本特定均值)。

# cell parameters: library sizes
sub <- assay(example_sce[rownames(ref), colnames(ref)])
all.equal(exp(ref$offset), as.numeric(colSums(sub)))
## [1] "names for target but not for current"
## [2] "Mean relative difference: 0.4099568"
#模擬數(shù)據(jù)與原始數(shù)據(jù)平均相對差異為0.4099568

平均相對差異的值范圍是0到1（或0%到100%）。值越接近0，表示兩組數(shù)據(jù)越相似；值越接近1，表示兩組數(shù)據(jù)越不相似。

# gene parameters: dispersions & sample-specific means
head(rowData(ref))
# DataFrame with 6 rows and 4 columns
# ENSEMBL      SYMBOL                           beta      disp
# <character> <character>                    <DataFrame> <numeric>
# ISG15    ENSG00000187608       ISG15 -7.84574:-0.24711310:-1.039480 4.6360532
# AURKAIP1 ENSG00000175756    AURKAIP1 -7.84859: 0.00768812:-1.171896 0.1784345
# MRPL20   ENSG00000242485      MRPL20 -8.31434:-0.58684477:-0.304827 0.6435324
# SSU72    ENSG00000160075       SSU72 -8.05160:-0.17119703:-0.793222 0.0363892
# RER1     ENSG00000157916        RER1 -7.75327: 0.10731331:-1.261821 0.5046166
# RPL22    ENSG00000116251       RPL22 -8.03553:-0.03357193: 0.143506 0.2023632

2. simData: Simulating complex designs

# simulated 10% EE genes
sim <- simData(ref, p_dd = diag(6)[1, ],
               nk = 3, ns = 3, nc = 2e3,
               ng = 1e3, force = TRUE)
讓我們看一下muscat包設(shè)定的差異模式
Non-differential ：EE、EP
Differential ：DE、DP、DM、DB
according to a probability vector p_dd =(pEE,pEP,pDE,pDP,pDM,pDB)
nk = 3：定義了3個(gè)子群。
ns = 3：定義了3個(gè)樣本。
nc = 2e3：定義了2000個(gè)細(xì)胞。
ng = 1e3：定義了1000個(gè)基因。
table(sim$sample_id, sim$cluster_id)
#            cluster1 cluster2 cluster3
# sample1.A      123      110       91
# sample2.A      116      102      111
# sample3.A      112      114      107
# sample1.B      120      101      129
# sample2.B      109      110      128
# sample3.B       99       97      121
metadata(sim)$ref_sids
#             A         B        
# sample1 "ctrl101" "ctrl107"
# sample2 "ctrl101" "ctrl101"
# sample3 "ctrl101" "ctrl101"

# simulated paired design
sim <- simData(ref, paired = TRUE, 
               nk = 3, ns = 3, nc = 2e3,
               ng = 1e3, force = TRUE)
#使用了paired = TRUE參數(shù)，表示模擬的是配對設(shè)計(jì)。
# same set of reference samples for both groups
ref_sids <- metadata(sim)$ref_sids
#             A         B        
# sample1 "ctrl107" "ctrl107"
# sample2 "ctrl101" "ctrl101"
# sample3 "ctrl107" "ctrl107"
all(ref_sids[, 1] == ref_sids[, 2])
## [1] TRUE
#表示兩列完全相同，即兩組使用了相同的參考樣本。

在這里，我們介紹3個(gè)參數(shù)

p_dd: Simulating differential distributions: 參數(shù)p_dd指定要為每個(gè)DD類別模擬的單元格的比例。它的值應(yīng)該在[0,1]中，和為1。

# simulate genes from all DD categories
sim <- simData(ref, p_dd = c(0.5, rep(0.1, 5)),
               nc = 2e3, ng = 1e3, force = TRUE)
gi <- metadata(sim)$gene_info
table(gi$category)
# ee  ep  de  dp  dm  db 
# 976 202 228 202 188 204

image.png

rel_lfc: Simulating cluster-specific state changes

rel_lfc=c(1,1,1)所有子種群的變化幅度相等
rel_lfc=c(1,1,3) 單個(gè)集群的變化更大
rel_lfc=c(0,1,2) 單個(gè)集群的特定FC因素沒有變化

image.png

p_type: Simulating type features

差異狀態(tài)(DS)分析用于測試不同實(shí)驗(yàn)條件下亞群特異性表達(dá)變化:

使用穩(wěn)定的分子特征(即類型特征)將細(xì)胞分組為有意義的亞群;
對狀態(tài)特征進(jìn)行統(tǒng)計(jì)測試，這些狀態(tài)特征是更短暫的表達(dá)，可能會在例如治療或疾病期間的表達(dá)發(fā)生變化。

image.png

引入每個(gè)子種群的類型特征的比例通過參數(shù)p_type指定，p_type值的增加導(dǎo)致按簇ID上色時(shí)細(xì)胞分離越來越多，但狀態(tài)變化的缺乏導(dǎo)致按group ID上色時(shí)細(xì)胞混合均勻。

3. Simulation a hierarchical cluster structure

simData包含三個(gè)參數(shù)，控制亞群的相似和差異:

p_type determines the percentage of type genes exclusive to each cluster
phylo_tree represents a phylogenetic tree specifying of clusters relate to one another
phylo_pars controls how branch distances are to be interpreted

3.1 p_type: Introducing type features

# simulate 5% of type genes; one group only
sim <- simData(ref, p_type = 0.1, 
    nc = 2e3, ng = 1e3, force = TRUE,
    probs = list(NULL, NULL, c(1, 0)))
# do log-library size normalization
sim <- logNormCounts(sim)

image.png

3.2 phylo_tree: Introducing a cluster phylogeny.

上面描述的場景可以說是不太現(xiàn)實(shí)的，在生物學(xué)環(huán)境中，亞種群之間并不存在特定的基因子集差異，但可能共享一些決定其生物學(xué)作用的基因。也就是說，集合類型特征對每個(gè)給定的子種群來說都不是排他性的，并且一些子種群之間比其他子種群更相似。為了引入更實(shí)際的亞種群結(jié)構(gòu)，可以為simData提供一個(gè)系統(tǒng)發(fā)育樹phylo_tree，它指定集群之間的關(guān)系和距離。

# specify cluster phylogeny 
tree <- "(('cluster1':0.4,'cluster2':0.4):0.4,('cluster3':
    0.5,('cluster4':0.2,'cluster5':0.2,'cluster6':0.2):0.4):0.4);"
# visualize cluster tree
library(phylogram)
plot(read.dendrogram(text = tree))

image.png

# simulate 5% of type genes; one group only
sim <- simData(ref, 
    phylo_tree = tree, phylo_pars = c(0.1, 1),
    nc = 800, ng = 1e3, dd = FALSE, force = TRUE)
# do log-library size normalization
sim <- logNormCounts(sim)
# extract gene metadata & number of clusters
rd <- rowData(sim)
nk <- nlevels(sim$cluster_id)
# filter for type & shared genes with high expression mean
is_type <- rd$class != "state"
is_high <- rowMeans(assay(sim, "logcounts")) > 1
gs <- rownames(sim)[is_type & is_high]
# sample 100 cells per cluster for plotting
cs <- lapply(split(seq_len(ncol(sim)), sim$cluster_id), sample, 50)
plotHeatmap(sim[, unlist(cs)], features = gs, 
    center = TRUE, show_rownames = FALSE,
    colour_columns_by = "cluster_id")

image.png

4. 質(zhì)量控制

但模擬數(shù)據(jù)的質(zhì)量因參考數(shù)據(jù)集而異，并可能受到過于極端的模擬參數(shù)的影響。因此，作者建議生成countsimQC報(bào)告(Soneson and Robinson 2018)查看數(shù)據(jù)質(zhì)量。

#BiocManager::install("countsimQC")
library(countsimQC)
library(DESeq2)
# simulate data
sim <- simData(ref, 
               ng = nrow(ref), 
               nc = ncol(ref),
               dd = FALSE)
# construct 'DESeqDataSet's for both, 
# simulated and reference dataset
dds_sim <- DESeqDataSetFromMatrix(
  countData = counts(sim),
  colData = colData(sim),
  design = ~ sample_id)
dds_ref <- DESeqDataSetFromMatrix(
  countData = counts(ref),
  colData = colData(ref),
  design = ~ sample_id)
dds_list <- list(sim = dds_sim, data = dds_ref)
# generate 'countsimQC' report
countsimQCReport(
  ddsList = dds_list,
  outputFile = "QC.html",
  outputDir = "./",
  outputFormat = "html_document",
  maxNForCorr = 200, 
  maxNForDisp = 500)
完整的html報(bào)告將會輸出在你指定的文件下。

5. 各種方法性能評估

iCOBRA包中提供了各種用于計(jì)算和可視化評估排名和二元分類(分配)方法的性能指標(biāo)的功能。作者首先定義了一個(gè)包裝器，它將傳遞給pbDS的方法作為輸入，并將結(jié)果重新格式化為整齊格式的數(shù)據(jù)幀，然后將其與模擬基因metadata右連接。

# 'm' is a character string specifying a valid `pbDS` method
.run_method <- function(m) {
    res <- pbDS(pb, method = m, verbose = FALSE)
    tbl <- resDS(sim, res)
    left_join(gi, tbl, by = c("gene", "cluster_id"))
}

計(jì)算了一組方法的結(jié)果data.frames之后，作者接下來定義了一個(gè)包裝器，該包裝器使用COBRAData構(gòu)造函數(shù)為iCOBRA的評估準(zhǔn)備數(shù)據(jù)，并使用calculate_performance計(jì)算感興趣的任何性能度量(通過方面指定):

# 'x' is a list of result 'data.frame's
.calc_perf <- function(x, facet = NULL) {
    cd <- COBRAData(truth = gi,
        pval = data.frame(bind_cols(map(x, "p_val"))),
        padj = data.frame(bind_cols(map(x, "p_adj.loc"))))
    perf <- calculate_performance(cd, 
        binary_truth = "is_de", maxsplit = 1e6,
        splv = ifelse(is.null(facet), "none", facet),
        aspects = c("fdrtpr", "fdrtprcurve", "curve"))
}

運(yùn)行一組DS分析方法，計(jì)算它們的性能，并根據(jù).calc_perf計(jì)算的方面繪制各種性能指標(biāo):

# simulation with all DD types
sim <- simData(ref, 
    p_dd = c(rep(0.3, 2), rep(0.1, 4)),
ng = 1e3, nc = 2e3, ns = 3, force = TRUE)
#使用simData函數(shù)模擬數(shù)據(jù)，其中p_dd參數(shù)定義了不同類型的差異表達(dá)基因的比例。
# aggregate to pseudobulks
pb <- aggregateData(sim)
#將單細(xì)胞數(shù)據(jù)聚合為偽批量數(shù)據(jù)。
# extract gene metadata
gi <- metadata(sim)$gene_info
# add truth column (must be numeric!)
gi$is_de <- !gi$category %in% c("ee", "ep")
gi$is_de <- as.numeric(gi$is_de) 
#為基因metadata添加了一個(gè)真實(shí)值列is_de，表示基因是否是差異表達(dá)的。

# specify methods for comparison & run them
# (must set names for methods to show in visualizations!)
names(mids) <- mids <- c("edgeR", "DESeq2", "limma-trend", "limma-voom")
res <- lapply(mids, .run_method)

# calculate performance measures 
# and prep. for plotting with 'iCOBRA'
library(iCOBRA)
perf <- .calc_perf(res, "cluster_id")
pd <- prepare_data_for_plot(perf)

使用前面定義的包裝器.calc_perf計(jì)算性能度量，并使用prepare_data_for_plot函數(shù)準(zhǔn)備數(shù)據(jù)以供繪圖。

# plot FDR-TPR curves by cluster
plot_fdrtprcurve(pd) +
    theme(aspect.ratio = 1) +
    scale_x_continuous(trans = "sqrt") +
    facet_wrap(~ splitval, nrow = 1)

image.png

總的來說，作者在這部分使用iCOBRA包來評估和可視化不同的差異表達(dá)分析方法的性能。它首先模擬數(shù)據(jù)，然后運(yùn)行不同的方法，計(jì)算性能度量，并繪制FDR-TPR曲線。

小結(jié)

在本期推文中，我們首先介紹了如何使用muscat模擬復(fù)雜的scRNA-seq數(shù)據(jù)。模擬數(shù)據(jù)是評估分析方法性能的關(guān)鍵，因?yàn)樗梢詾槲覀兲峁┮粋€(gè)已知的“真實(shí)”答案，從而幫助我們更好地理解各種方法的優(yōu)缺點(diǎn)。接下來，我們探討了如何使用muscat評估不同的差異表達(dá)分析方法。差異表達(dá)分析是scRNA-seq數(shù)據(jù)分析的核心部分，選擇合適的方法對于得出準(zhǔn)確的結(jié)論是至關(guān)重要的。在下一期的推文中，我們將詳細(xì)介紹如何使用muscat進(jìn)行差異狀態(tài)分析。

好啦，本期的分享到這里就結(jié)束了，我們下期再會~

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

muscat代碼實(shí)操（一）：模擬復(fù)雜的 scRNA-seq 數(shù)據(jù)以及評估不同的差異表達(dá)分析方法的性能

muscat代碼實(shí)操（一）：模擬復(fù)雜的 scRNA-seq 數(shù)據(jù)以及評估不同的差異表達(dá)分析方法的性能

前言

代碼流程

0. 加載R包

1. prepSim：準(zhǔn)備模擬數(shù)據(jù)

2. simData: Simulating complex designs

3. Simulation a hierarchical cluster structure

3.1 p_type: Introducing type features

3.2 phylo_tree: Introducing a cluster phylogeny.

4. 質(zhì)量控制

5. 各種方法性能評估

小結(jié)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

muscat代碼實(shí)操（一）：模擬復(fù)雜的 scRNA-seq 數(shù)據(jù)以及評估不同的差異表達(dá)分析方法的性能

前言

代碼流程

0. 加載R包

1. prepSim：準(zhǔn)備模擬數(shù)據(jù)

2. simData: Simulating complex designs

3. Simulation a hierarchical cluster structure

3.1 p_type: Introducing type features

3.2 phylo_tree: Introducing a cluster phylogeny.

4. 質(zhì)量控制

5. 各種方法性能評估

小結(jié)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av