Transcriptome analysis -- FPKM and TPM

Read the input file (the raw count file produced by featureCounts)
rm(list=ls()) 
options(stringsAsFactors = F)  
library(tidyverse) 
# loads ggplot2, stringr, dplyr, tidyr, readr, purrr, tibble, forcats
library(data.table) # fread can read files using multiple threads
a1 <- fread('all.featurecounts.txt', header = T, data.table = F) # load the counts; the first row is used as column names

Building the counts matrix

counts <- a1[,7:ncol(a1)] # take the per-sample count columns (column 7 onward) as counts
rownames(counts) <- a1$Geneid # use the gene IDs as row names
### Extract Geneid and Length from the raw featureCounts output file
### (Length is used here as the transcript length)
geneid_efflen <- subset(a1,select = c("Geneid","Length"))
colnames(geneid_efflen) <- c("geneid","efflen")
geneid_efflen_fc <- geneid_efflen # kept for later comparison
### Look up the efflen that corresponds to each geneid in counts
dim(geneid_efflen)
efflen <- geneid_efflen[match(rownames(counts),
                              geneid_efflen$geneid),"efflen"]
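
Because `match()` returns `NA` for any gene ID it cannot find, it can be worth checking that the length vector lines up with the count matrix before normalizing. A minimal sketch on toy data; the gene IDs, lengths, and `toy_*` names below are made up for illustration and are not part of the original analysis:

```r
# Toy data: hypothetical gene IDs and lengths, for illustration only
toy_counts <- data.frame(s1 = c(10, 0, 5), s2 = c(20, 3, 0),
                         row.names = c("geneB", "geneA", "geneC"))
toy_len <- data.frame(geneid = c("geneA", "geneB", "geneC"),
                      efflen = c(1000, 2000, 500))

# Reorder the lengths to follow the row order of the count matrix
toy_efflen <- toy_len[match(rownames(toy_counts), toy_len$geneid), "efflen"]

# Every gene should have a length, one per row of the count matrix
stopifnot(!anyNA(toy_efflen), length(toy_efflen) == nrow(toy_counts))
toy_efflen
```

The same `stopifnot()` check could be applied to `efflen` and `counts` from the real data.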

FPKM/RPKM (Fragments/Reads Per Kilobase Million): fragments/reads per kilobase of transcript per million mapped reads

# RPKM and FPKM apply to single-end and paired-end sequencing respectively; the formula is the same
counts2FPKM <- function(count, efflength=efflen){
  PMSC_counts <- sum(count)/1e6   # "per million" scaling factor of the counts (depth normalization)
  FPM <- count/PMSC_counts        # reads/fragments per million (Reads/Fragments Per Million)
  FPM/(efflength/1000)            # divide by the length in kilobases (length normalization)
}
FPKM <- as.data.frame(apply(counts,2,counts2FPKM))
colnames(FPKM) <- c("Simmental_1","Simmental_2","Simmental_3","Wagyu_1","Wagyu_2","Wagyu_3") # rename the columns; the names must match the number and order of sample columns in counts
FPKM <- FPKM[rowSums(FPKM)>=1,] # keep only genes (rows) whose FPKM summed across samples is at least 1
colSums(FPKM)
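
Written as a formula, `counts2FPKM` computes the following for gene $i$ in one sample. The symbols are introduced here for illustration: $c_i$ is the count for gene $i$, $L_i$ its length in bases, and the sum runs over all genes in the count matrix (so the "per million" factor is based on the reads assigned to genes, as in the code):

```latex
\mathrm{FPKM}_i
  = \frac{c_i}{\left(\sum_j c_j / 10^{6}\right)\left(L_i / 10^{3}\right)}
  = \frac{10^{9}\, c_i}{L_i \sum_j c_j}
```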

TPM (Transcripts Per Million) is currently recommended for correlation analysis, PCA, and similar analyses: counts are first normalized by transcript length in kilobases, then scaled per million

counts2TPM <- function(count, efflength=efflen){
  RPK <- count/(efflength/1000)   # reads per kilobase (length normalization)
  PMSC_rpk <- sum(RPK)/1e6        # "per million" scaling factor of the RPK values (depth normalization)
  RPK/PMSC_rpk
}
TPM <- as.data.frame(apply(counts,2,counts2TPM))
colnames(TPM) <- c("Zebu_1","Zebu_2","Zebu_3","Zebu_4","Zebu_5","Holstein_1","Holstein_2","Holstein_3","Holstein_4","Holstein_5") # rename the columns; the names must match the number and order of sample columns in counts
TPM <- TPM[rowSums(TPM)>0,] # remove genes (rows) that are 0 in every sample
colSums(TPM)
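
The practical difference between the two measures is the order of the normalization steps: TPM divides by length first, so every sample's TPM values sum to one million, while FPKM column sums vary from sample to sample. A minimal sketch on toy data illustrating this, and showing that rescaling FPKM reproduces TPM; the counts, lengths, and `toy_*` names are hypothetical:

```r
# Toy input: hypothetical counts for 3 genes in 2 samples, gene lengths in bases
toy_counts <- matrix(c(10, 20, 30, 5, 0, 15), ncol = 2,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_len <- c(1000, 2000, 500)

# FPKM: depth normalization first, then length normalization
toy_fpkm <- apply(toy_counts, 2, function(x) x / (sum(x) / 1e6) / (toy_len / 1000))

# TPM: length normalization first, then scale the RPK values per million
toy_tpm <- apply(toy_counts, 2, function(x) {
  rpk <- x / (toy_len / 1000)
  rpk / (sum(rpk) / 1e6)
})

# TPM columns sum to one million; FPKM columns generally do not
colSums(toy_tpm)
colSums(toy_fpkm)

# Rescaling each FPKM column to sum to 1e6 reproduces TPM
tpm_from_fpkm <- sweep(toy_fpkm, 2, colSums(toy_fpkm), "/") * 1e6
stopifnot(isTRUE(all.equal(tpm_from_fpkm, toy_tpm)))
```

Because every TPM column has the same total, TPM values are easier to compare across samples than FPKM values.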