日本二区二区,国产碰超久久福利,超碰av一区在线

######################################

題目：單細胞數(shù)據(jù)的導入與質(zhì)控 - Seurat
語言：R
Author: YQ
######################################

step0.設置工作環(huán)境

方法一：新建repository，在該repo內(nèi)寫R.scripts
方法二：setwd()


setwd('/Users/apple/Desktop/Rsave/')
getwd()

創(chuàng)建以下目錄

single_cell_rnaseq/
├── data
├── results
└── figures

step1.導入數(shù)據(jù)，創(chuàng)建計數(shù)矩陣

目前我碰到的有這幾種情況；

- 情況一：三個文件

三個文件指的是“barcodes.tsv","features.tsv","matrix.mtx";
這個情況就比較好處理了，barcodes.tsv就是cell id，features.tsv就是gene id，matrix.mtx就是計數(shù)counts矩陣

##示例 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE136916

# 設置默認路徑
matrix_dir = '/Users/apple/Desktop/Rwork/Glasslung/' 

# 按路徑讀取三個文件
barcode.path<-paste0(matrix_dir,"barcodes.tsv")
genes.path<-paste0(matrix_dir,"features.tsv")
matrix.path<-paste0(matrix_dir,"matrix.mtx")

# readMM讀取數(shù)據(jù)的values,columns,index
zebrafish.data <- readMM(file = matrix.path) ##mac上不能讀壓縮文件
gene.names = read.delim(genes.path,header = FALSE, stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path,header = FALSE,stringsAsFactors = FALSE)
colnames(zebrafish.data) = barcode.names$V1
rownames(zebrafish.data) = gene.names$V2 ##把示例中的V1改成V2

# check矩陣
zebrafish.data[1:6, 1:6] ##check矩陣
dim(zebrafish.data) ##check矩陣

- 情況二：直接給了計數(shù)矩陣的csv

單個csv矩陣讀入

# 讀取csv格式的矩陣
data1 <- read.csv("/Users/quyue/Desktop/gene_bc_matrices.csv", header = T, row.names = 1)

# csv矩陣轉(zhuǎn)換成數(shù)據(jù)框
datan = data.frame(data1)

# 數(shù)據(jù)框轉(zhuǎn)換成稀疏矩陣matrix
dataan <- as(as.matrix(datan), "dgCMatrix")

多個csv矩陣讀入


# 設置默認路徑
matrix_dir = '/Users/quyue/Desktop/bioinfo/bonemarrowniche/0526data/'  # 為了避免重復輸入默認路徑，賦值給matrix_dir

# 讀取多個csv
data1 <- read.csv(paste0(matrix_dir, "GSM3714747_chow1_filtered_gene_bc_matrices.csv"), header = T,row.names= 1)
data2 <- read.csv(paste0(matrix_dir, "GSM3714748_chow2_filtered_gene_bc_matrices.csv"), header = T,row.names= 1)
data3 <- read.csv(paste0(matrix_dir, "GSM3714749_chow3_filtered_gene_bc_matrices.csv"), header = T,row.names= 1)

# csv合并成數(shù)據(jù)框
datan = data.frame(data1,data2,data3)

# 數(shù)據(jù)框轉(zhuǎn)換成稀疏矩陣matrix
dataan <- as(as.matrix(datan), "dgCMatrix")

- 情況三：直接給了計數(shù)矩陣的txt

單個txt矩陣讀入

# 讀取txt格式的矩陣
data1 <- read.table("/Users/quyue/Desktop/GSM2915579_5FU-cntrl-col23.txt", header = T, row.names = 1)

# txt矩陣轉(zhuǎn)換成數(shù)據(jù)框
datan = data.frame(data1)

# 數(shù)據(jù)框裝轉(zhuǎn)換成稀疏矩陣matrix
dataan <- as(as.matrix(datan),"dgCMatrix")

多個txt矩陣讀入


# 設置默認路徑
matrix_dir = "/Users/quyue/Desktop/"

# 讀取多個txt格式的矩陣
data1 = read.table(paste0(matrix_dir,"GSM2915579_5FU-cntrl-col23-1.txt"), header = T, row.names = 1)

data2 = read.table(paste0(matrix_dir,"GSM2915579_5FU-cntrl-col23-2.txt"), header = T, row.names = 1)

data3 = read.table(paste0(matrix_dir,"GSM2915579_5FU-cntrl-col23-3.txt"), header = T, row.names = 1)

dataa = cbind(data1, data2, data3) #行數(shù)不同，會生成NaN值

# txt矩陣轉(zhuǎn)換成數(shù)據(jù)框
datan = data.frame(dataa)

# 數(shù)據(jù)框轉(zhuǎn)換成稀疏矩陣matrix
dataan <- as(as.matrix(datan),"dgCMatrix") # 此步驟是否必要有待驗證

- 情況四：xslx文件

其實就是用R讀取excel文件，把xslx轉(zhuǎn)化成csv再讀取

# 加載openxslx這個R包
library('openxslx')

# 讀入xslx格式矩陣
a <- read.xslx("test.xslx",sheet = 1) # 文件名+sheet序號

# 把xslx格式矩陣轉(zhuǎn)換成csv格式矩陣
# 獲取excel中工作簿的名稱
sheetnames<-getSheetNames('test.xlsx')

#把每個工作薄的數(shù)據(jù)按照'工作薄名稱.csv'的名稱存儲
for(iin(1:length(sheetnames))){

write.table(read.xlsx('/Users/quyue/Desktop/test.xlsx',sheet=i),paste(sheetnames[i],'.csv',sep=''),row.names=F,sep=',')

} # 有待驗證，不是我自己寫的

# 讀取csv格式矩陣
data1 <- read.csv("/Users/quyue/Desktop/test.csv", header = T, row.names = 1)

# 把csv格式矩陣轉(zhuǎn)換成數(shù)據(jù)框
datan = data.frame(data1)

# 把數(shù)據(jù)框轉(zhuǎn)換成稀疏矩陣matrix
dataan <- as(as.matrix(datan), "dgCMatrix")

- 情況五：直接給了已經(jīng)構(gòu)建好seurat對象的R.data

# 直接load R.data
load(file = "/Users/quyue/Desktop/bioinfo/bonemarrowniche/RNAMagnetDataBundle/NicheData10x.rda")
    > Loading required package: Seurat

# load完以后，看到底是什么數(shù)據(jù)類型，并查看數(shù)據(jù)
NicheData10x
    > An object of class Seurat # 已經(jīng)是seurat對象了
    16701 features across 7497 samples within 1 assay 
    Active assay: RNA (16701 features, 2872 variable features)
     2 dimensional reductions calculated: pca, tsne

View(NicheData10x) # 顯示seurat的具體參數(shù)

step2.構(gòu)建矩陣為seurat對象

找到一個解釋Seurat中所有函數(shù)的網(wǎng)站，比R自帶的要清楚很多 rdrr

工作流程

generation of the count matrix 生成計數(shù)矩陣
格式化讀取、分離樣本、映射(mapping)、定量(quantification);
quality control of the raw counts 原始矩陣質(zhì)量控制
過濾掉質(zhì)量差的細胞
clustering of filtering counts 過濾后計數(shù)的聚類
將轉(zhuǎn)錄活動相似的細胞歸為一類（細胞類型=不同的聚類）
Marker identification 標記識別
識別每個細胞群的基因標記（marker）
Optional downstream steps 其他可選的下游步驟

2.1 稀疏矩陣到seurat對象

2.1.1 使用ReadMM()或者Read10x()讀取單個樣本數(shù)據(jù)

函數(shù)用法說明：

1.readMM():

input:"xx.mtx"

備注：readMM is the function of Matrix packages, it changes the standard matrix into sparse matrix.Of note,features.tsv and barcodes.tsv should be library first, and then combine sparse matrix、features.tsv and barcodes.tsv form a counts matrix with cell id and gene id.（詳見step1情況一）

2.read10X():

input:"raw_feature_bc_matrix"(files from CellRanger output)

備注：read10X id the function of Seurat,its input from CellRanger output(10X genomics data的專用軟件Cell Ranger的output會有一個out目錄，"raw_feature_bc_matrix"在該目錄中)

library(Seurat)

# 按1中方法完成數(shù)據(jù)讀入到稀疏矩陣輸出的過程
# sparse matrix稀疏矩陣
ctrl_counts <- Read10x(data.dir = "data/ctrl_raw_feature_bc_matrix")


# 將稀疏矩陣轉(zhuǎn)換成一個Seurat
ctrl <- CreateSeuratObject(counts = ctrl_counts,
                          min.features = 100)

# 查看Seurat對象的元數(shù)據(jù)
head(ctrl@meta.data)

## 備注：元數(shù)據(jù) meta.data代表什么？
orig.ident # 通常包含所知的樣品名字，默認為“SeuratProject”
nCount_RNA # 每個細胞的UMI數(shù)目
nFeature_RNA # 每個細胞所檢測到的基因數(shù)目

2.1.2 使用for循環(huán)讀入多個樣本數(shù)據(jù)

R中for 循環(huán)的語法結(jié)構(gòu)

for (variable in input){
    command1
    command2
    command3
}

R中for循環(huán)實現(xiàn)讀入多個樣本數(shù)據(jù)

## 目的：for loop iterate 兩個樣本的文件
## 思路：執(zhí)行三個命令 (1) 使用Read10x()讀取計數(shù)數(shù)據(jù)；（2）使用CreateSeuratObject()從讀取的數(shù)據(jù)中創(chuàng)建Seurat對象；（3）根據(jù)樣本名將Seurat對象賦值給一個新的變量（好處是不會覆蓋上一次迭代中創(chuàng)建的Seurat對象）

mtx_dir = "/Users/quyue/Desktop/"

for (files in c("ctrl_raw_feature_bc_matrix","stim_raw_feature_bc_matrix")){
    seurat_data <- Read10X(data.dir = paste0(mtx_dir, file)) # (1)
    seurat_obj <- CreateSeuratObject(counts = seurat_data,
                                    min.features = 100,
                                    project = file) #(2)
    assign(file, seurat_obj) #(3)
}

2.2 查看seurat對象元數(shù)據(jù)&merge

再完成2.1已經(jīng)創(chuàng)建好了Seurat對象以后，快速查看元數(shù)據(jù)以了解其大概；

## 查看新Seurat對象的元數(shù)據(jù)
head(ctrl_raw_feature_bc_matrix@meta.data)
head(stim_raw_feature_bc_matrix@meta.data)

接著，需要把多個seurat對象合并成一個，這樣比較方便我們同時為每個樣品進行質(zhì)控，且方便我們更容易地比較所有樣本的數(shù)據(jù)質(zhì)量，用merge()函數(shù)執(zhí)行該操作；

## 創(chuàng)建一個合并的Seurat對象
## 用add.cell.id參數(shù)將樣品特異的前綴添加到每個細胞ID，用以區(qū)分不同樣本中的同個細胞ID
merged_seurat <- merge(x = ctrl_raw_feature_bc_matrix,
                       y = stim_raw_feature_bc_matrix,
                       add.cell.id = c("ctrl","stim"))

## 查看合并對象是否有對應的樣本前綴
head(merged_seurat@meta.data)
tail(merged_seurat@meta.data)

step3. 進行質(zhì)控流程（QC and pre-processing）

3.1 熟悉質(zhì)控指標

通常根據(jù)以下四個指標篩選：

# 每個細胞的read counts數(shù)目（UMI指的是molecular ID）
nCount_RNA  # (> 1000)

# 每個細胞所檢測到的基因數(shù)目
nFeature_RNA # (> 1000)

# 線粒體基因比例 mitoRatio
mitoRatio  # (< 5%)

# 單位UMI(read count=molecular id)檢測到的gene數(shù)目的比例(這個數(shù)據(jù)讓我們知道數(shù)據(jù)的復雜性)
GenesPerUMI

3.2 標準的質(zhì)控分析流程 - Standard pre-processing workflow

3.2.1 計算質(zhì)控指標

需要計算的主要是mitoRatio和GenesPerUMI（因為nCount_RNA和nFeature_RNA在meta.data中）

1.計算mitoRatio

主要是利用的Seurat的PercentageFeatureSet()功能，這個函數(shù)將使用一個模式(pattern)搜索基因標識符，對于每一列（細胞），它將選取特征基因的計數(shù)之和，除以所有基因的計數(shù)之和

## count mitoRatio with PercentageFeatureSet() 并添加 mito Ratio這一項到 meta.data當中

# 方法一
## The [[ operator can add columns to object metadata. This is a great place to stash QC stats
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

# 方法二
pbmc$mitoRatio <- PercentageFeatureSet(object = pbmc, pattern = "^MT-")
pbmc$mitoRatio <- pbmc@meta.data$mitoRatio / 100

2.計算log10GenesPerUMI（option）

GenesPerUMI = nFeature_RNA / nCount_RNA
這個指標代表的意思是 “單位read counts的gene數(shù)目的比例” 反映的是數(shù)據(jù)的復雜度；常用于評估ctrl和stim基因的整體復雜性

## count log10GenesPerUMI and add it to seurat object metadata
pbmc$log10GenesPerUMI <- log10(pbmc$nFeature_RNA)/log10(pbmc$nCount_RNA)

3.2.2 可視化質(zhì)控指標

## 一般可視化以下三個參數(shù)即可："nFeature_RNA", "nCount_RNA", "percent.mt"
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size=0.05)

## 可視化log10GenesPerUMI（option）
# 通過可視化每一個UMI檢測到的基因數(shù)來可視化基因表達的整體復雜性
metadata %>%
    ggplot(aes(x=log10GenesPerUMI, color = sample, fill=sample)) +
    geom_density(alpha = 0.2) +
    theme_classic() +
    geom_vline(xintercept = 0.8)

3.2.3 篩選 Filtering（option - 因為一般給的是經(jīng)過篩選過后的gene mtx）

3.2.3.1 細胞水平的篩選（Cell-level filtering）

參考閾值

nUMI > 500
nGene > 250
log10GenesPerUMI > 0.8
mitoRatio < 0.2

細胞水平的過濾, 主要是用subset()函數(shù)過濾

# 使用選擇的閾值篩掉低質(zhì)量讀寫 - 這些將會隨實驗而改變
filtered_pbmc <- subset(x = pbmc, subset= (nUMI >= 500) & 
                                          (nGene >= 250) & 
                                          (log10GenesPerUMI > 0.80) & 
                                          (mitoRatio < 0.20))

3.2.3.2 基因水平的篩選（Gene-level filtering）

在我們的數(shù)據(jù)中，會有許多基因的計數(shù)為零。這些基因會顯著降低一個細胞的平均表達量，因此我們將從數(shù)據(jù)中刪除它們。

主要刪除兩類基因，
（1）刪除所有細胞中零表達的基因；

（2）如果一個基因只在少數(shù)幾個細胞中表達，那么它的意義并不是特別大，因為它仍然會把沒有表達它的其他所有細胞的平均表達量降下來。對于我們的數(shù)據(jù)，我們選擇只保留在10個或更多細胞中表達的基因。

# 提取計數(shù)
counts <- GetAssayData(object = filtered_pbmc, slot = "counts")

# 根據(jù)在每個細胞的計數(shù)是否大于0為每個基因輸出一個邏輯向量
nonzero <- counts > 0

# 將所有TRUE值相加，如果每個基因的TRUE值超過10個，則返回TRUE。
keep_genes <- Matrix::rowSums(nonzero) >= 10

# 僅保留那些在10個以上細胞中表達的基因
filtered_counts <- counts[keep_genes, ]

# 重新賦值給經(jīng)過過濾的Seurat對象
filtered_pbmc <- CreateSeuratObject(filtered_counts, meta.data = filtered_pbmc@meta.data)

3.2.4 重新評估質(zhì)量指標 (Re-assess QC metrics)

在進行過濾后，建議回過頭來再看看指標，確保你的數(shù)據(jù)符合你的預期，對下游分析有好處。

3.2.5 保存篩選過的細胞 (Saving filtered cells)

我們將保存篩選后的細胞對象用于聚類 (clustering) 和標記識別 (marker identification)，將過濾以后的Seurat數(shù)據(jù)創(chuàng)建為.RData對象

# 創(chuàng)建.RData對象以方便在任何時候?qū)?save(filtered_pbmc, file="data/pbmc_seurat_filtered.RData")

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

單細胞數(shù)據(jù)的導入與質(zhì)控 - Seurat

單細胞數(shù)據(jù)的導入與質(zhì)控 - Seurat

step0.設置工作環(huán)境

step1.導入數(shù)據(jù)，創(chuàng)建計數(shù)矩陣

- 情況一：三個文件

- 情況二：直接給了計數(shù)矩陣的csv

- 情況三：直接給了計數(shù)矩陣的txt

- 情況四：xslx文件

- 情況五：直接給了已經(jīng)構(gòu)建好seurat對象的R.data

step2.構(gòu)建矩陣為seurat對象

工作流程

2.1 稀疏矩陣到seurat對象

2.1.1 使用ReadMM()或者Read10x()讀取單個樣本數(shù)據(jù)

2.1.2 使用for循環(huán)讀入多個樣本數(shù)據(jù)

2.2 查看seurat對象元數(shù)據(jù)&merge

step3. 進行質(zhì)控流程（QC and pre-processing）

3.1 熟悉質(zhì)控指標

3.2 標準的質(zhì)控分析流程 - Standard pre-processing workflow

3.2.1 計算質(zhì)控指標

3.2.2 可視化質(zhì)控指標

3.2.3 篩選 Filtering（option - 因為一般給的是經(jīng)過篩選過后的gene mtx）

3.2.3.1 細胞水平的篩選（Cell-level filtering）

3.2.3.2 基因水平的篩選（Gene-level filtering）

3.2.4 重新評估質(zhì)量指標 (Re-assess QC metrics)

3.2.5 保存篩選過的細胞 (Saving filtered cells)

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

單細胞數(shù)據(jù)的導入與質(zhì)控 - Seurat

step0.設置工作環(huán)境

step1.導入數(shù)據(jù)，創(chuàng)建計數(shù)矩陣

- 情況一：三個文件

- 情況二：直接給了計數(shù)矩陣的csv

- 情況三：直接給了計數(shù)矩陣的txt

- 情況四：xslx文件

- 情況五：直接給了已經(jīng)構(gòu)建好seurat對象的R.data

step2.構(gòu)建矩陣為seurat對象

工作流程

2.1 稀疏矩陣到seurat對象

2.1.1 使用ReadMM()或者Read10x()讀取單個樣本數(shù)據(jù)

2.1.2 使用for循環(huán)讀入多個樣本數(shù)據(jù)

2.2 查看seurat對象元數(shù)據(jù)&merge

step3. 進行質(zhì)控流程（QC and pre-processing）

3.1 熟悉質(zhì)控指標

3.2 標準的質(zhì)控分析流程 - Standard pre-processing workflow

3.2.1 計算質(zhì)控指標

3.2.2 可視化質(zhì)控指標

3.2.3 篩選 Filtering（option - 因為一般給的是經(jīng)過篩選過后的gene mtx）

3.2.3.1 細胞水平的篩選（Cell-level filtering）

3.2.3.2 基因水平的篩選（Gene-level filtering）

3.2.4 重新評估質(zhì)量指標 (Re-assess QC metrics)

3.2.5 保存篩選過的細胞 (Saving filtered cells)

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av