无码久久久,Ay0AaaQO,色韩999

本文是參考學(xué)習(xí)CNS圖表復(fù)現(xiàn)14—檢查文獻(xiàn)的inferCNV流程的學(xué)習(xí)筆記?？赡芨鶕?jù)學(xué)習(xí)情況有所改動。
前面我們的教程講到了，自己取全部的上皮細(xì)胞，以及部分Fibroblasts和Endothelial_cells細(xì)胞來一起運行inferCNV流程，但是得到的結(jié)果很詭異，明明是作為二倍體正常細(xì)胞參考集的Fibroblasts和Endothelial_cells細(xì)胞居然也是在某些染色體上面有明顯的CNV情況。為了解決這個問題，讓我們一起看看文獻(xiàn)自己的inferCNV流程是如何使用的，以及對應(yīng)的數(shù)據(jù)集。

首先運行作者自己的代碼和數(shù)據(jù)
那，我們就看看作者自己的代碼和數(shù)據(jù)吧，運行他們的inferCNV流程，看看我們的差異究竟是在哪了？

我注意到文章的腳本里面有這樣的一句話：

Save all inferCNV files and run inferCNV in previous version of R

看了看作者準(zhǔn)備的3個文件，如下：

183K Aug 30 11:33 NI03_CNV_cell_metadata_shuffle_largefile.txt
544M Aug 30 12:02 NI03_CNV_data_out_all_cells_raw_counts_largefile.txt
671K Aug 30 11:31 NI03_CNV_hg19_genes_ordered_correct_noXY.txt
上面的3個文件作者的制作方式，跟我的大同小異，就不過多介紹啦，然后運行作者的inferCNV代碼，如下；

library(infercnv)
infercnv_obj = CreateInfercnvObject(raw_counts_matrix = "NI03_CNV_data_out_all_cells_raw_counts_largefile.txt", 
                                    annotations_file = "NI03_CNV_cell_metadata_shuffle_largefile.txt", 
                                    gene_order_file = "NI03_CNV_hg19_genes_ordered_correct_noXY.txt", 
                                    ref_group_names = c("endothelial_normal", "fibroblast_normal"), delim = "\t")
# Make sure that chrmosomes are ordered correctly 
slot(infercnv_obj, "gene_order")[,"chr"] <- factor(slot(infercnv_obj, "gene_order")[,"chr"], 
                                                   levels = c("chr1", "chr2","chr3","chr4", "chr5", "chr6","chr7", "chr8", "chr9","chr10", "chr11", "chr12","chr13", "chr14", "chr15","chr16", "chr17", "chr18","chr19", "chr20", "chr21","chr22"))
# Run infer CNV
infercnv_all = infercnv::run(infercnv_obj,
                             cutoff=1,  # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir= "myresults",  # dir is auto-created for storing outputs
                             cluster_by_groups=F,   # cluster
                             hclust_method="ward.D2", plot_steps=F)

我仔細(xì)看了看作者運行inferCNV的代碼，差異真的很小，其中cluster_by_groups這個參數(shù)僅僅是可視化的選項，不會影響重要的結(jié)論。而hclust_method通常呢，影響細(xì)胞之間的距離，按照道理并不影響CNV，那么應(yīng)該是我前面的那些其它參數(shù)導(dǎo)致的。

讓我們看看這個函數(shù)的默認(rèn)參數(shù)：

run(infercnv_obj, cutoff = 1, min_cells_per_gene = 3, out_dir = NULL,
  window_length = 101, smooth_method = c("pyramidinal", "runmeans",
  "coordinates"), num_ref_groups = NULL,
  ref_subtract_use_mean_bounds = TRUE, cluster_by_groups = FALSE,
  cluster_references = TRUE, k_obs_groups = 1,
  hclust_method = "ward.D2", max_centered_threshold = 3,
  scale_data = FALSE, HMM = FALSE, HMM_transition_prob = 1e-06,
  HMM_report_by = c("subcluster", "consensus", "cell"),
  HMM_type = c("i6", "i3"), HMM_i3_pval = 0.05, HMM_i3_use_KS = TRUE,
  BayesMaxPNormal = 0.5, sim_method = "meanvar",
  sim_foreground = FALSE, reassignCNVs = TRUE,
  analysis_mode = c("samples", "subclusters", "cells"),
  tumor_subcluster_partition_method = c("random_trees", "qnorm",
  "pheight", "qgamma", "shc"), tumor_subcluster_pval = 0.1,
  denoise = FALSE, noise_filter = NA, sd_amplifier = 1.5,
  noise_logistic = FALSE, outlier_method_bound = "average_bound",
  outlier_lower_bound = NA, outlier_upper_bound = NA,
  final_scale_limits = NULL, final_center_val = NULL, debug = FALSE,
  num_threads = 4, plot_steps = FALSE, resume_mode = TRUE,
  png_res = 300, plot_probabilities = TRUE, save_rds = TRUE,
  save_final_rds = TRUE, diagnostics = FALSE,
  remove_genes_at_chr_ends = FALSE, prune_outliers = FALSE,
  mask_nonDE_genes = FALSE, mask_nonDE_pval = 0.05,
  test.use = "wilcoxon", require_DE_all_normals = "any",
  hspike_aggregate_normals = FALSE, no_plot = FALSE,
  no_prelim_plot = FALSE, output_format = "png", useRaster = TRUE,
  up_to_step = 100)

多到讓人頭皮發(fā)麻！

其中文獻(xiàn)運行infercnv::run的時候，下面兩個參數(shù)，都是默認(rèn)值：

HMM參數(shù) when set to True, runs HMM to predict CNV level (default: FALSE)
denoise  If True, turns on denoising according to options below (default: FALSE)

而我運行的時候，把這兩個參數(shù)都設(shè)置為了T，運行該文獻(xiàn)他自己的數(shù)據(jù)集和文獻(xiàn)代碼后，運行的日志文件如下所示：

INFO [2020-10-19 11:17:44] ::process_data:Start
INFO [2020-10-19 11:17:44] Creating output path myresults
INFO [2020-10-19 11:17:44] Checking for saved results.
INFO [2020-10-19 11:17:44] 

 STEP 1: incoming data

INFO [2020-10-19 11:18:19] 

 STEP 02: Removing lowly expressed genes

INFO [2020-10-19 11:18:19] ::above_min_mean_expr_cutoff:Start
INFO [2020-10-19 11:18:19] Removing 5929 genes from matrix as below mean expr threshold: 1
INFO [2020-10-19 11:18:20] validating infercnv_obj
INFO [2020-10-19 11:18:20] There are 14467 genes and 7181 cells remaining in the expr matrix.
INFO [2020-10-19 11:18:24] no genes removed due to min cells/gene filter

INFO [2020-10-19 12:19:51] plot_cnv_references:Start
INFO [2020-10-19 12:19:51] Reference data size: Cells= 1000 Genes= 14467
INFO [2020-10-19 12:20:07] plot_cnv_references:Number reference groups= 2
INFO [2020-10-19 12:20:07] plot_cnv_references:Plotting heatmap.
INFO [2020-10-19 12:20:10] Colors for breaks:  #00008B,#24249B,#4848AB,#6D6DBC,#9191CC,#B6B6DD,#DADAEE,#FFFFFF,#EEDADA,#DDB6B6,#CC9191,#BC6D6D,#AB4848,#9B2424,#8B0000
INFO [2020-10-19 12:20:10] Quantiles of plotted data range: 0.544327010335531,0.938892738431902,1,1.06175861606737,1.53099452887365
INFO [2020-10-19 12:20:10] plot_cnv_references:Writing reference data to myresults/infercnv.references.txt

耗時約1個小時（主要的時間花在了第15步），關(guān)鍵問題是，他得到的CNV非常漂亮。也就是說如果不考慮數(shù)據(jù)集的差異，這個時候得到的結(jié)論是HMM參數(shù)和 denoise參數(shù)都應(yīng)該是默認(rèn)值才行啊。

圖片

然后運行我的代碼在作者的數(shù)據(jù)

跟上一講我們的代碼大同小異，如下：

rm(list = ls())

dat=read.table('NI03_CNV_data_out_all_cells_raw_counts_largefile.txt',
               header = T,sep = '\t')
dim(dat)
library(AnnoProbe)
geneInfor=annoGene(rownames(dat),"SYMBOL",'human')
colnames(geneInfor)
geneInfor=geneInfor[with(geneInfor, order(chr, start)),c(1,4:6)]
geneInfor=geneInfor[!duplicated(geneInfor[,1]),]
length(unique(geneInfor[,1]))
head(geneInfor)
## 這里可以去除性染色體
# 也可以把染色體排序方式改變
dat=dat[rownames(dat) %in% geneInfor[,1],]
dat=dat[match( geneInfor[,1], rownames(dat) ),] 
dim(dat)

groupFiles='groupFiles.txt'
groupinfo=read.table('NI03_CNV_cell_metadata_shuffle_largefile.txt',header = F,sep = '\t')
table(groupinfo$V2)
dim(groupinfo)
head(groupinfo)
table(groupinfo$V1 %in% colnames(dat))
write.table(groupinfo,file = groupFiles,sep = '\t',quote = F,col.names = F,row.names = F)
dat=dat[, colnames(dat) %in% groupinfo$V1]

expFile='expFile.txt'
write.table(dat,file = expFile,sep = '\t',quote = F)

head(geneInfor)
geneFile='geneFile.txt'
write.table(geneInfor,file = geneFile,sep = '\t',quote = F,col.names = F,row.names = F)


infercnv_obj = CreateInfercnvObject(raw_counts_matrix=expFile,
                                    annotations_file=groupFiles,
                                    delim="\t",
                                    gene_order_file= geneFile,
                                    ref_group_names = c("endothelial_normal", "fibroblast_normal") )  
infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
                             out_dir='jimmy_results', 
                             cluster_by_groups=TRUE, 
                             denoise=TRUE,
                             HMM=TRUE)

這個時候，我的時間主要是花費在了第STEP 18: Run Bayesian Network Model on HMM predicted CNV's

INFO [2020-10-20 09:25:35] Creating the following Directory:  jimmy_results/BayesNetOutput.HMMi6.hmm_mode-samples
INFO [2020-10-20 09:25:35] Initializing new MCM InferCNV Object.
INFO [2020-10-20 09:25:35] validating infercnv_obj
INFO [2020-10-20 09:25:36] Total CNV's:  1230
INFO [2020-10-20 09:25:36] Loading BUGS Model.
INFO [2020-10-20 09:25:38] Running Sampling Using Parallel with  4 Cores

中間調(diào)用了我MAC電腦的4個核心去計算，值得一提的是，因為等待時間過長，經(jīng)常出現(xiàn)錯誤?。?！，如下所示：

INFO [2020-10-19 15:46:13] Initializing new MCM InferCNV Object.
INFO [2020-10-19 15:46:13] validating infercnv_obj
INFO [2020-10-19 15:46:14] Total CNV's:  1239
INFO [2020-10-19 15:46:14] Loading BUGS Model.
INFO [2020-10-19 15:46:16] Running Sampling Using Parallel with  4 Cores
INFO [2020-10-19 18:21:03] Obtaining probabilities post-sampling
Error in do.call(rbind, mcmc[[j]]) : second argument must be a list
In addition: Warning message:
In parallel::mclapply(seq_along(obj@cell_gene), FUN = par_func,  :
  scheduled cores 1, 2, 3, 4 did not deliver results, all values of the jobs will be affected

運行了6次，都失敗，讓我很惱火，差不多的數(shù)據(jù)和代碼，為什么我自己運行十多分鐘即可，文章的這個需要十幾個小時。

最后
后來我仔細(xì)比較了，發(fā)現(xiàn)自己的數(shù)據(jù)里面，是因為 366 genes and 7044 cells ，得到是CNV數(shù)量太少了（第18步寫的是：Total CNV's: 31 ）計算量比較小，所以十幾分鐘就結(jié)束了。

但是文章的這個數(shù)據(jù)集呢， Total CNV's: 1229 太多了，耗費計算時間和資源有點過分了。這個數(shù)據(jù)量：14869 genes and 7181 cells 其實不能選擇 denoise=TRUE以及HMM=TRUE，都應(yīng)該是用默認(rèn)的FALSE即可。

所以我真正需要比較的是，為什么我自己運行inferCNV的時候的輸入數(shù)據(jù)跟作者的差異這么大?。?！

咱們明明都是取全部的上皮細(xì)胞，以及部分Fibroblasts和Endothelial_cells細(xì)胞來一起運行inferCNV流程?。。。?/p>

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

CNS圖表復(fù)現(xiàn)14—檢查文獻(xiàn)的inferCNV流程

CNS圖表復(fù)現(xiàn)14—檢查文獻(xiàn)的inferCNV流程

然后運行我的代碼在作者的數(shù)據(jù)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

CNS圖表復(fù)現(xiàn)14—檢查文獻(xiàn)的inferCNV流程

然后運行我的代碼在作者的數(shù)據(jù)

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av