feature selection

比較不同feature selection方法

單細(xì)胞轉(zhuǎn)錄組測序的確可以一次性對所有細(xì)胞都檢測到上千個(gè)基因的表達(dá)，但是大多數(shù)情況下，只有其中的少部分基因是有生物學(xué)意義。而且大多數(shù)基因之所以在不同的細(xì)胞里面表達(dá)有差異，其實(shí)是技術(shù)限制，背景噪音。這些技術(shù)限制，包括批次效應(yīng)，都會阻礙我們發(fā)現(xiàn)那些真正的有生物學(xué)意義的基因。

feature selection分析去除那些技術(shù)噪音相關(guān)基因，可以顯著的提高信噪比，降低后續(xù)分析的復(fù)雜度。

library(limma)
library(M3Drop)

## Warning: package 'M3Drop' was built under R version 3.5.2

library(scran)
usoskin1 <- readRDS("D:/paopaoR/testt/usoskin1.rds")
dim(usoskin1)

## [1] 25334   622

Normalize & filter expression matrix

uso_list <- M3Drop::M3DropCleanData(
    usoskin1,
    labels = colnames(usoskin1),
    min_detected_genes = 2000,
    is.counts = TRUE
)
expr_matrix <- uso_list$data 
dim(expr_matrix)

## [1] 15708   532

Brennecke

Brennecke_HVG <- M3Drop::BrenneckeGetVariableGenes(
    expr_matrix,
    fdr = 0.01,
    minBiolDisp = 0.5
)

image.png

High Dropout Genes

fit 米氏方程（按教程應(yīng)該是用m3dropfeatureselection那個(gè)函數(shù)，我當(dāng)時(shí)安裝的版本可能有問題，一直沒有這個(gè)函數(shù)，我就想既然原理是假設(shè)dropout是因?yàn)槟孓D(zhuǎn)錄失敗，這個(gè)酶促反應(yīng)，那就挑偏離米氏方程model的吧，但是后來我刪除了原來的包從github重新加載后，函數(shù)可以用了，心累。。。代碼也放著吧）

m3drop <- M3DropDropoutModels(expr_matrix)

image.png

m3drop <- sort(m3drop$MMFit$predictions,decreasing = FALSE)
m3drop_gene <- names(m3drop)[1:1500]

m3dGenes <- as.character(
    M3DropFeatureSelection(deng)$Gene
)

fit model

擬合線性趨勢

fit <- trendVar(expr_matrix, loess.args=list(span=0.05))
dec <- decomposeVar(expr_matrix, fit)
top.dec <- dec[order(dec$bio, decreasing=TRUE),]
top_genes <- rownames(top.dec)[1:1500]

correlation genes

這個(gè)很費(fèi)時(shí)間，還很占內(nèi)存。

pca first 2 components

用前兩個(gè)pc

pca <- prcomp(log(expr_matrix+1)/log(2))
score <- rowSums(abs(pca$x[,c(1,2)]))
names(score) <- rownames(expr_matrix)
score <- score[order(-score)]
PCA_genes <- names(score[1:1500])

看看幾種方法選擇的基因overlap情況

venn.diag <- vennCounts(
 cbind(rownames(expr_matrix) %in% Brennecke_HVG,
 rownames(expr_matrix) %in% PCA_genes,rownames(expr_matrix) %in% m3drop_gene,rownames(expr_matrix) %in% top_genes))
vennDiagram(
 venn.diag,
 names = c("bre", "pca","m3d","fit"),
 circle.col = c("blue", "green","yellow","red"))

image.png

DEseq 看交集

DESeq_table <- readRDS("D:/paopaoR/testt/DESeq_table.rds")
DE_genes = unique(DESeq_table$Gene)

sum(Brennecke_HVG %in% DE_genes)/length(Brennecke_HVG)

## [1] 0.2275862

sum(PCA_genes %in% DE_genes)/length(PCA_genes)

## [1] 0.382

sum(m3drop_gene %in% DE_genes)/length(m3drop_gene)

## [1] 0.374

sum(top_genes %in% DE_genes)/length(top_genes)

## [1] 0.3646667

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

單細(xì)胞學(xué)習(xí)2

單細(xì)胞學(xué)習(xí)2

feature selection

比較不同feature selection方法

Normalize & filter expression matrix

Brennecke

High Dropout Genes

fit model

correlation genes

pca first 2 components

看看幾種方法選擇的基因overlap情況

DEseq 看交集

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

單細(xì)胞學(xué)習(xí)2

feature selection

比較不同feature selection方法

Normalize & filter expression matrix

Brennecke

High Dropout Genes

fit model

correlation genes

pca first 2 components

看看幾種方法選擇的基因overlap情況

DEseq 看交集

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av