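Once ASEReadCounter has produced per-site allele counts, each site still needs a gene ID before the binomial and Fisher's exact tests can be run. GeneiASE is recommended for this step.
Previous post: ASE (allele-specific expression): ASEReadCounter - Jianshu (jianshu.com)
GeneiASE paper: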
https://www.nature.com/articles/srep21134.pdf
A slide deck that introduces ASE:
https://scilifelab.github.io/courses/rnaseq/1610/slides/ASE_Olof_Emanuelsson.pdf
1. Download and installation
1.1 Download
https://github.com/edsgard/geneiase/tags
1.2 Installation
$ tar xvf geneiase-1.0.1.tar.gz
$ cd /your/path/geneiase-1.0.1/bin
geneiase is written in R, so first start R and install the packages it depends on:
$ R
> install.packages(c('getopt', 'binom', 'VGAM'))
> q()
When the installation finishes, quit R; geneiase is then ready to use.
$ geneiase
Usage: geneiase [-[-ase.type|t] <character>] [-[-in.file|i] <character>] [-[-out.file|o] <character>] [-[-betabin.p|p] <double>] [-[-betabin.rho|r] <double>] [-[-n.bootstrap.samples|b] <integer>] [-[-min.feat.vars|m] <integer>] [-[-nmax.vars|x] <integer>] [-[-lib.file|l] <character>] [-[-help|h]]
If the usage message appears, the installation succeeded.
2. Parameters
geneiase requires only two arguments, -t and -i:
-t
"static" or "icd"
Selects the type of ASE to test: static ASE ("static") or individual condition-dependent ASE ("icd").
-i
Name of the input file.
The test folder in the unpacked package contains example data for both types.
Static data has four columns: gene ID (feature ID), SNP ID, alternative allele count, and reference allele count. Example:
$ less static.test.input.tab
gene snp.id alt.dp ref.dp
10.9 1 4 6
10.9 2 6 4
10.9 3 5 5
10.9 4 0 10
10.9 5 9 1
10.9 6 5 5
10.9 7 3 7
10.9 8 8 2
10.9 9 7 3
101.2 10 6 4
101.2 11 5 5
103.3 12 4 6
103.3 13 9 1
103.3 14 1 9
105.5 15 5 5
105.5 16 0 10
105.5 17 7 3
icd data has six columns: gene ID, SNP ID, untreated alternative allele count, untreated reference allele count, treated alternative allele count, and treated reference allele count. Example:
$ less icd.test.input.tab
gene snp.id U.alt.dp U.ref.dp T.alt.dp T.ref.dp
1.11 1 8 2 7 3
1.11 2 3 7 4 6
1.11 3 8 2 6 4
1.11 4 5 5 7 3
1.11 5 6 4 1 9
1.11 6 9 1 5 5
1.11 7 4 6 5 5
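Before running geneiase on your own table, it can help to confirm that the file really has the static layout described above. The following is a minimal sketch, not part of GeneiASE: check_static_input is a hypothetical helper, and it assumes the file is tab-separated with one header line.

```shell
# Hypothetical helper (not shipped with GeneiASE): checks that every data line
# of a "static" input table has 4 tab-separated fields and integer counts.
# Assumption: tab-separated file with a single header line.
check_static_input() {
    awk -F'\t' '
        NR == 1 { next }                      # skip the header
        NF != 4 { printf "line %d: expected 4 columns, got %d\n", NR, NF; bad = 1; next }
        $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ { printf "line %d: non-integer count\n", NR; bad = 1 }
        END { exit bad + 0 }                  # non-zero exit status if any line failed
    ' "$1"
}

# Toy data in the same layout as static.test.input.tab
demo=$(mktemp)
printf 'gene\tsnp.id\talt.dp\tref.dp\n10.9\t1\t4\t6\n10.9\t2\t6\t4\n' > "$demo"
if check_static_input "$demo"; then
    echo "static input looks OK"
fi
```

The same idea extends to icd data by expecting six fields and checking columns 3 to 6.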
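3. ASE testing
After ASEReadCounter has produced the per-site allele counts, extract the chromosome and position of each site from its output and arrange them in a table with the following format: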
$ less LPF1_MP.pos
Mpar_chr1 2001 2001
Mpar_chr1 2015 2015
Mpar_chr1 2034 2034
Mpar_chr1 2037 2037
Mpar_chr1 2206 2206
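The post does not show how this table was produced. One possible way, sketched below, is to pull the first two columns out of the ASEReadCounter table with awk. This assumes the table is tab-separated with a header line whose first two columns are contig and position; the toy rows and the output name LPF1_MP.pos (the name used in the sorting step) are only for illustration.

```shell
# Toy ASEReadCounter-style table (first eight columns only), for illustration
work=$(mktemp -d)
{
    printf 'contig\tposition\tvariantID\trefAllele\taltAllele\trefCount\taltCount\ttotalCount\n'
    printf 'Mpar_chr1\t2001\t.\tA\tC\t47\t39\t86\n'
    printf 'Mpar_chr1\t2015\t.\tC\tG\t52\t33\t85\n'
} > "$work/LPF1_MP_ASE.table"

# Skip the header and print contig, position, position (the three-column layout above)
awk -F'\t' -v OFS='\t' 'NR > 1 { print $1, $2, $2 }' \
    "$work/LPF1_MP_ASE.table" > "$work/LPF1_MP.pos"

cat "$work/LPF1_MP.pos"
```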
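3.1 Finding the gene for each site
For bedtools usage, this post gives a detailed introduction:
The Most Complete Bedtools Guide: This Post Is All You Need - Jianshu (jianshu.com)
First sort the position file and the genome annotation file. Note that the chromosome names in the pos file and the gff file must match: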
$ bedtools sort -chrThenSizeA -i LPF1_MP.pos > LPF1_MP_sort.pos
$ bedtools sort -chrThenSizeA -i Mparg_v2.0.gff3 > Mparg_v2.0_sort.gff3
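Mismatched chromosome names make bedtools intersect return nothing for the affected sites, so a quick comparison of the names in the two files can save time. This is a rough sketch: names_missing_from_gff is a hypothetical helper, both files are assumed to be tab-separated, and the toy files stand in for the real pos and gff3 files.

```shell
# Hypothetical helper: prints each chromosome/contig name that occurs in the
# position file ($1) but never in column 1 of the GFF3 file ($2).
names_missing_from_gff() {
    awk -F'\t' '
        NR == FNR { if ($0 !~ /^#/) seen[$1] = 1; next }   # first file read: GFF3
        !($1 in seen) && !reported[$1]++ { print $1 }       # second file read: positions
    ' "$2" "$1"
}

# Toy files for illustration
work=$(mktemp -d)
printf '##gff-version 3\nMpar_chr1\tAUGUSTUS\tgene\t1900\t2300\t1\t-\t.\tID=g1\n' > "$work/toy.gff3"
printf 'Mpar_chr1\t2001\t2001\nMpar_c2518_pilon\t116563\t116563\n' > "$work/toy.pos"

names_missing_from_gff "$work/toy.pos" "$work/toy.gff3"
```

Any name printed here has no counterpart in the annotation, so sites on it cannot receive a gene ID.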
For each SNP site in the pos file, report the annotation features it overlaps:
$ bedtools intersect -a LPF1_MP_sort.pos -b Mparg_v2.0_sort.gff3 -wb > LPF1_MP_gene.pos

3.2 Adding gene information in R
Add the gene information obtained in the previous step to the per-site count table produced by ASEReadCounter (described in the ASEReadCounter post linked at the top).
# Read LPF1_MP_ASE.table and LPF1_MP_gene.pos
> ASE<-read.table("LPF1_MP_ASE.table",header = T)
> gene<-read.table("LPF1_MP_gene.pos")


Create snp_id: join the contig and position columns of LPF1_MP_ASE.table, and the V1 and V2 columns of LPF1_MP_gene.pos, so that both tables share a snp_id key.
> ASE <- tidyr::unite(ASE, "snp_id", contig, position,remove = FALSE)
> head(ASE)
snp_id contig position variantID refAllele altAllele refCount altCount totalCount
1 Mpar_chr1_4724 Mpar_chr1 4724 . A C 47 39 86
2 Mpar_chr1_4881 Mpar_chr1 4881 . C G 52 33 85
3 Mpar_chr1_4900 Mpar_chr1 4900 . T C 46 31 77
4 Mpar_chr1_4962 Mpar_chr1 4962 . T C 49 34 83
5 Mpar_chr1_4995 Mpar_chr1 4995 . G T 45 44 89
lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
1 0 0 88 0 2
2 0 0 86 1 0
3 0 0 77 0 0
4 0 0 83 0 0
5 0 0 89 0 0
> gene <- tidyr::unite(gene, "snp_id", V1, V2,remove = FALSE)
> head(gene)
snp_id V1 V2 V3 V4 V5 V6 V7 V8
1 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS intron 29618269 29618971
2 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS gene 29617909 29621312
3 Mpar_chr1_29618717 Mpar_chr1 29618717 29618717 Mpar_chr1 AUGUSTUS transcript 29617909 29621312
4 Mpar_chr1_29511536 Mpar_chr1 29511536 29511536 Mpar_chr1 AUGUSTUS CDS 29511235 29511554
5 Mpar_chr1_29511536 Mpar_chr1 29511536 29511536 Mpar_chr1 AUGUSTUS exon 29511235 29511554
V9 V10 V11 V12
1 1 - . Parent=MP1G214900.1
2 1 - . ID=MP1G214900
3 1 - . ID=MP1G214900.1
4 1 - 0 Parent=MP1G214200.1
5 . - . Parent=MP1G214200.1
Keep only the CDS features from the annotation, since ASE estimates from sites in coding regions are more reliable:
> gene<-subset(gene,V6=='CDS')
Match on snp_id to add the gene ID to the ASE table:
> merga<-merge(ASE,gene, by = "snp_id", all.x = TRUE)
> write.csv(merga,"LPF1_MP_merga.csv",row.names = F)
3.3 Preparing the input file
Taking static data as the example, four columns are needed. Extract them from LPF1_MP_merga.csv:
> raw<-read.csv("LPF1_MP_merga.csv")
> GeneiASE_input<-raw[,c(26,1,8,7)]
> head(GeneiASE_input)
V12 snp_id altCount refCount
1 <NA> Mpar_c2518_pilon_116563 1 395
2 <NA> Mpar_c2518_pilon_132171 3 3
3 <NA> Mpar_c2518_pilon_133271 1 1
4 <NA> Mpar_c2518_pilon_153461 2 5
5 <NA> Mpar_c2518_pilon_155680 2 4
Remove rows with a missing gene ID:
> GeneiASE_input <- na.omit(GeneiASE_input)
> names(GeneiASE_input)[1] <-"gene_id"
> head(GeneiASE_input)
gene_id snp_id altCount refCount
72 Parent=MP1G130700.1 Mpar_chr1_10006428 14 19
73 Parent=MP1G130700.1 Mpar_chr1_10006455 14 17
87 Parent=MP1G130700.1 Mpar_chr1_10006863 27 24
88 Parent=MP1G130700.1 Mpar_chr1_10006921 23 18
89 Parent=MP1G130700.1 Mpar_chr1_10006970 23 18
Write the table to a file:
> write.table(GeneiASE_input,"LPF1_MP_GeneiASE_input.tab",quote = FALSE,row.names = FALSE,col.names = T,sep ='\t')
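The gene_id values above still carry the GFF3 attribute key (for example Parent=MP1G130700.1). If plain IDs are preferred, the prefix can be removed from the first column of the written file. This is an optional sketch, not a step from the original workflow: strip_attr_key is a hypothetical helper and the file is assumed to be tab-separated with a header line.

```shell
# Hypothetical helper: removes a leading "ID=" or "Parent=" from column 1
# of a tab-separated table, leaving the header line untouched.
strip_attr_key() {
    awk -F'\t' -v OFS='\t' '
        NR > 1 { sub(/^(ID|Parent)=/, "", $1) }
        { print }
    ' "$1"
}

# Toy rows copied from the layout shown above
demo=$(mktemp)
printf 'gene_id\tsnp_id\taltCount\trefCount\n' > "$demo"
printf 'Parent=MP1G130700.1\tMpar_chr1_10006428\t14\t19\n' >> "$demo"

strip_attr_key "$demo"
```

Note that section 3.5 merges the results back onto the V12 column by gene_id, so if the prefix is stripped here, that column would need the same treatment before the merge.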
3.4 Running the ASE test
$ cd your/path/geneiase/bin
$ geneiase -t static -i LPF1_MP_GeneiASE_input.tab -b 100
- -b n.bootstrap.samples
The number of bootstrap samples (B) to be used to generate the null distribution. Default: 1e5

The result file contains the following columns:
- feat: gene ID
- n.vars: number of variants in the gene
- mean.s: Mean of s across the variants within the gene
- median.s: Median of s across the variants within the gene
- sd.s: Standard deviation of s across the variants within the gene
- cv.s: Coefficient of variation of s across the variants within the gene
- liptak.s: Stouffer-Liptak combination of s
- p.nom: Nominal p-value
- fdr: Benjamini-Hochberg corrected p-value
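To list only the genes that pass a significance threshold, the result table can be filtered on the fdr column. The sketch below looks the column up by name in the header rather than by position. significant_genes is a hypothetical helper, the file is assumed to be tab-separated, and the two data rows are invented values used only to make the example runnable.

```shell
# Hypothetical helper: prints the header plus every row whose fdr is below
# the cutoff. $1 = GeneiASE result table, $2 = FDR cutoff.
significant_genes() {
    awk -F'\t' -v cutoff="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
        $(col["fdr"]) != "NA" && $(col["fdr"]) + 0 < cutoff + 0
    ' "$1"
}

# Invented toy rows with the nine columns listed above
res=$(mktemp)
{
    printf 'feat\tn.vars\tmean.s\tmedian.s\tsd.s\tcv.s\tliptak.s\tp.nom\tfdr\n'
    printf 'geneA\t3\t1.2\t1.1\t0.4\t0.33\t2.5\t0.001\t0.01\n'
    printf 'geneB\t2\t0.3\t0.3\t0.1\t0.33\t0.4\t0.6\t0.8\n'
} > "$res"

significant_genes "$res" 0.05
```

The cutoff of 0.05 is a common convention, not a value prescribed by the post.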
3.5 Organizing the ASE test results
> p_value<-read.csv("LPF1_MP_GeneiASE_input.tab.static.gene.pval.tab",sep ='\t')
> names(p_value)[1] <-"gene_id"
> names(raw)[26] <-"gene_id"
> result <- merge(p_value,raw, by = "gene_id", all.x = TRUE)
> result <- result[,c(10:12,14:17,1:9)]
> write.csv(result,"LPF1_MP_result.csv",row.names = F)
Please credit the source when quoting or reposting, and feel free to point out any errors.