腳本 | R | 轉(zhuǎn)錄組 counts 2 tpm

看了一些文章,方法很多。比如先定義函數(shù),再用sapply......。我是個(gè)只會(huì)用tidyverse的R低端用戶,而且定義函數(shù)過程中用到了log等一些數(shù)學(xué)計(jì)算,我不是很懂,只能用最樸實(shí)的計(jì)算了。

  • featureCounts進(jìn)行轉(zhuǎn)錄組數(shù)據(jù)定量后,得到文件*counts.txt,整理成一個(gè)counts矩陣,并將exon length信息添加進(jìn)去。
awk '{if($0~/#/ || $0~/Geneid/)next;print FILENAME"\t"$1"\t"$6"\t"$7}' *.counts.txt | \
awk '{if(a[$2]=="")a[$2]=$2"\t"$3"\t"$4;else a[$2]=a[$2]"\t"$4}END{for(i in a)print a[i]}' > All.counts.Length.tsv

這里添加FILENAME其實(shí)沒太大必要,只是想確認(rèn)順序沒問題。

  • 利用ls *counts.txt | sed 's/.counts.txt//g' | xargs | sed 's/ /\\t/g'獲得表頭,然后添加表頭。
$ sed -i '1i Geneids\tLength\t.............' All.counts.Length.tsv
$ head -5 All.counts.Length.tsv | column -t
Geneids     Length  S01     S02     S03      S04     S05     S06     S07     S08     S09     S10     S11     S12     S13     S14     S15     S16     S17     S18     S19     S20     S21     S22     S23     S24     S25     S26     S27     S28     S29     S30     S31     S32     S33
gene-1      1932    53      225     41       63      27      11      69      42      118     94      65      23      117     45      111     113     36      28      30      26      42      19      49      12      37      52      51      67      35      46      43      38      189
gene-2      2074    761     1750    1246     1270    415     555     1218    588     736     882     1048    731     1238    802     746     1143    963     827     907     620     822     831     527     652     550     682     1474    1125    693     771     1041    1074    1286
gene-3      999     0       0       0        0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
gene-4      1689    48      143     64       62      52      49      131     74      99      46      91      64      98      80      65      151     126     17      32      63      92      77      16      6       34      37      91      76      65      35      124     99      168

這里比較笨辦法,因?yàn)槊總€(gè)人文件命名的區(qū)別。

  • R進(jìn)行counts轉(zhuǎn)tpm。
 # 導(dǎo)入包
library(tidyverse)

# 導(dǎo)入數(shù)據(jù)
countsMatrix <- read_tsv("All.counts.Length.tsv")

# 寬變長,按樣本分組,根據(jù)counts值和length計(jì)算tpm,長變寬
tpmMatrix <- countsMatrix %>%
  pivot_longer(c(-Geneids, -Length),
               names_to = "Group",
               values_to = "SampleCounts") %>%
  group_by(Group) %>%
  mutate(SampleTPM = (((SampleCounts/Length)*1e6)/sum(SampleCounts/Length))) %>%
  pivot_wider(id_cols = "Geneids",
              names_from = "Group",
              values_from = "SampleTPM")

# 導(dǎo)出tpm矩陣
write_tsv(tpmMatrix, "All.tpm.tsv")
$ head -5 All.tpm.tsv | column -t
Geneids     S01                   S02                   S03                   S04                   S05                   S06                   S07                   S08                   S09                   S10                   S11                    S12                   S13                   S14                   S15                   S16                   S17                   S18                   S19                   S20                   S21                   S22                   S23                   S24                   S25                   S26                   S27                   S28                   S29                   S30                   S31                   S32                   S33
gene-1      1.5238507710210807    6.310542071951338     1.2404535842920152    2.0663001763869144    0.9743108982268808    0.33974128224220745   2.019076017077262     1.4929440706884478    3.5567586201299908    3.2808012279442056    2.0141367829381984     0.7273476663573547    3.745576833513508     1.4293891079195502    3.86430047708595      3.3438601600173263    1.1944017312413042    0.8918109644551627    1.0411976601193402    0.9440828265392236    1.2502476891288166    0.627534624837922     1.6827401326544236    0.41501120242179135   1.198653312088212     1.7465312446943184    1.5573020278860934    2.2766184024976206    1.231820162316741     1.4683053863083524    1.3969678488346147    1.2888290768160424    5.910887556087811
gene-2      20.382131364015265    45.72151022236682     35.11664958446983     38.8020753148581      13.950194506595745    15.967870048962455    33.20085269354047     19.47017898938338     20.66562620912296     28.676029625384178    30.250688030110204     21.53426041707185     36.91916441916996     23.730707861496388    24.192742740279144    31.50752314249931     29.7627173926142      24.53684140053276     29.323620195611745    20.971370316403522    22.79381177499461     25.567218695129807    16.858928073760243    21.005089653143823    16.59789164983364     21.33810070431632     41.92745742194797     35.60953739855625     22.720132961041315    22.925103673788364    31.504098374003306    33.93238455898632     37.46538779255658
gene-3      0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                      0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0
gene-4      1.5786479115856311    4.587727969134735     2.214899923327977     2.326065957393124     2.146419535545147     1.7311280320418376    4.3848258322317335    3.0088701104934734    3.413383246304719     1.836484932100434     3.225480858786998      2.315110156505115     3.588693736495643     2.906734826919883     2.588443793237929     5.111214219034013     4.7818499151294835    0.6193571706962143    1.2703967661183757    2.6167050442922104    3.1326514034299304    2.9090573970704754    0.6285190244271279    0.23735987065686825   1.2599353329885021    1.421517505758171     3.178495038562206     2.9539728822630242    2.61679736080075      1.277921206556115     4.6080492600494445    3.8408239049024777    6.010043948877915
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容