看了一些文章,方法很多。比如先定義函數(shù),再用sapply......。我是個(gè)只會(huì)用
tidyverse的R低端用戶,而且定義函數(shù)過程中用到了log等一些數(shù)學(xué)計(jì)算,我不是很懂,只能用最樸實(shí)的計(jì)算了。
- 用featureCounts進(jìn)行轉(zhuǎn)錄組數(shù)據(jù)定量后,得到文件
*counts.txt,整理成一個(gè)counts矩陣,并將exon length信息添加進(jìn)去。
awk '{if($0~/#/ || $0~/Geneid/)next;print FILENAME"\t"$1"\t"$6"\t"$7}' *.counts.txt | \
awk '{if(a[$2]=="")a[$2]=$2"\t"$3"\t"$4;else a[$2]=a[$2]"\t"$4}END{for(i in a)print a[i]}' > All.counts.Length.tsv
這里添加FILENAME其實(shí)沒太大必要,只是想確認(rèn)順序沒問題。
- 利用
ls *counts.txt | sed 's/.counts.txt//g' | xargs | sed 's/ /\\t/g'獲得表頭,然后添加表頭。
$ sed -i '1i Geneids\tLength\t.............' All.counts.Length.tsv
$ head -5 All.counts.Length.tsv | column -t
Geneids Length S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 S21 S22 S23 S24 S25 S26 S27 S28 S29 S30 S31 S32 S33
gene-1 1932 53 225 41 63 27 11 69 42 118 94 65 23 117 45 111 113 36 28 30 26 42 19 49 12 37 52 51 67 35 46 43 38 189
gene-2 2074 761 1750 1246 1270 415 555 1218 588 736 882 1048 731 1238 802 746 1143 963 827 907 620 822 831 527 652 550 682 1474 1125 693 771 1041 1074 1286
gene-3 999 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
gene-4 1689 48 143 64 62 52 49 131 74 99 46 91 64 98 80 65 151 126 17 32 63 92 77 16 6 34 37 91 76 65 35 124 99 168
這里比較笨辦法,因?yàn)槊總€(gè)人文件命名的區(qū)別。
- 用
R進(jìn)行counts轉(zhuǎn)tpm。
# 導(dǎo)入包
library(tidyverse)
# 導(dǎo)入數(shù)據(jù)
countsMatrix <- read_tsv("All.counts.Length.tsv")
# 寬變長,按樣本分組,根據(jù)counts值和length計(jì)算tpm,長變寬
tpmMatrix <- countsMatrix %>%
pivot_longer(c(-Geneids, -Length),
names_to = "Group",
values_to = "SampleCounts") %>%
group_by(Group) %>%
mutate(SampleTPM = (((SampleCounts/Length)*1e6)/sum(SampleCounts/Length))) %>%
pivot_wider(id_cols = "Geneids",
names_from = "Group",
values_from = "SampleTPM")
# 導(dǎo)出tpm矩陣
write_tsv(tpmMatrix, "All.tpm.tsv")
$ head -5 All.tpm.tsv | column -t
Geneids S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 S21 S22 S23 S24 S25 S26 S27 S28 S29 S30 S31 S32 S33
gene-1 1.5238507710210807 6.310542071951338 1.2404535842920152 2.0663001763869144 0.9743108982268808 0.33974128224220745 2.019076017077262 1.4929440706884478 3.5567586201299908 3.2808012279442056 2.0141367829381984 0.7273476663573547 3.745576833513508 1.4293891079195502 3.86430047708595 3.3438601600173263 1.1944017312413042 0.8918109644551627 1.0411976601193402 0.9440828265392236 1.2502476891288166 0.627534624837922 1.6827401326544236 0.41501120242179135 1.198653312088212 1.7465312446943184 1.5573020278860934 2.2766184024976206 1.231820162316741 1.4683053863083524 1.3969678488346147 1.2888290768160424 5.910887556087811
gene-2 20.382131364015265 45.72151022236682 35.11664958446983 38.8020753148581 13.950194506595745 15.967870048962455 33.20085269354047 19.47017898938338 20.66562620912296 28.676029625384178 30.250688030110204 21.53426041707185 36.91916441916996 23.730707861496388 24.192742740279144 31.50752314249931 29.7627173926142 24.53684140053276 29.323620195611745 20.971370316403522 22.79381177499461 25.567218695129807 16.858928073760243 21.005089653143823 16.59789164983364 21.33810070431632 41.92745742194797 35.60953739855625 22.720132961041315 22.925103673788364 31.504098374003306 33.93238455898632 37.46538779255658
gene-3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
gene-4 1.5786479115856311 4.587727969134735 2.214899923327977 2.326065957393124 2.146419535545147 1.7311280320418376 4.3848258322317335 3.0088701104934734 3.413383246304719 1.836484932100434 3.225480858786998 2.315110156505115 3.588693736495643 2.906734826919883 2.588443793237929 5.111214219034013 4.7818499151294835 0.6193571706962143 1.2703967661183757 2.6167050442922104 3.1326514034299304 2.9090573970704754 0.6285190244271279 0.23735987065686825 1.2599353329885021 1.421517505758171 3.178495038562206 2.9539728822630242 2.61679736080075 1.277921206556115 4.6080492600494445 3.8408239049024777 6.010043948877915