miRNA靶基因預(yù)測

有人問miRNA靶基因如何預(yù)測的問題,鑒于自己也做過相關(guān)的分析,稍有點經(jīng)驗,總結(jié)記錄一下。

一、miranda預(yù)測

  1. 軟件安裝
$ wget http://cbio.mskcc.org/microrna_data/miRanda-aug2010.tar.gz
$ tar -xvf miRanda-aug2010.tar.gz
$ cd miRanda-3.3a/
$ ./configure --prefix=/Lustre01/user/software/miRanda-3.3a/
$ make && make install
$ cd ***/miRanda-3.3a/bin #你自己的軟件路徑
$ ./miranda --help #產(chǎn)看幫助信息和參數(shù)等等,具體如下

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
miranda v3.3a    microRNA Target Scanning Algorithm
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
(c) 2003 Memorial Sloan-Kettering Cancer Center, New York

Authors: Anton Enright, Bino John, Chris Sander and Debora Marks
(mirnatargets (at) cbio.mskcc.org - reaches all authors)

Software written by: Anton Enright
Distributed for anyone to use under the GNU Public License (GPL),
See the files 'COPYING' and 'LICENSE' for details

If you use this software please cite:
Enright AJ, John B, Gaul U, Tuschl T, Sander C and Marks DS;
(2003) Genome Biology; 5(1):R1.

   miranda comes with ABSOLUTELY NO WARRANTY;
   This is free software, and you are welcome to redistribute it
   under certain conditions; type `miranda --license' for details.

miRanda is an miRNA target scanner which aims to predict mRNA
targets for microRNAs using dynamic-programming alignment and
thermodynamics.

Usage:  miranda file1 file2 [options..]

Where:
    'file1' is a FASTA file with a microRNA query
    'file2' is a FASTA file containing the sequence(s)
    to be scanned.

OPTIONS

 --help -h  Display this message
 --version -v   Display version information
 --license  Display license information

Core algorithm parameters:
 -sc S      Set score threshold to S        [DEFAULT: 140.0]
 -en -E     Set energy threshold to -E kcal/mol [DEFAULT: 1.0]
 -scale Z   Set scaling parameter to Z      [DEFAULT: 4.0]
 -strict    Demand strict 5' seed pairing       [DEFAULT: off]

Alignment parameters:
 -go -X     Set gap-open penalty to -X      [DEFAULT: -4.0]
 -ge -Y     Set gap-extend penalty to -Y        [DEFAULT: -9.0]

General Options:
 -out file  Output results to file          [DEFAULT: off]
 -quiet     Output fewer event notifications    [DEFAULT: off]
 -trim T    Trim reference sequences to T nt    [DEFAULT: off]
 -noenergy  Do not perform thermodynamics       [DEFAULT: off]

 -restrict file Restricts scans to those between
                specific miRNAs and UTRs
                provided in a pairwise
                tab-separated file          [DEFAULT: off]

Other Options:
 -keyval    Key value pairs ??          [DEFAULT:]


This software will be further developed under the open source model,
coordinated by Anton Enright and Chris Sander (miranda (at) cbio.mskcc.org).

Please send bug reports to: miranda (at) cbio.mskcc.org.
  1. 測試example
$ cd ***/miRanda-3.3a/examples #你自己的軟件路徑
$ ls #查看路徑下有兩個fastq序列文件
bantam_stRNA.fasta  hid_UTR.fasta
# 我們挨個看一下:[如下,前者是miRNA成熟序列文件(可以是下載的miRNA序列,自己鑒定的miRNA序列等等),后者是待匹配/查找的基因文件(circRNA、lncRNA、PCG等序列文件)。]
$ less -SN bantam_stRNA.fasta
      1 >gi|29565487|emb|AJ550546.1| Drosophila melanogaster microRNA miR-bantam
      2 GTGAGATCATTTTGAAAGCTG
$ less -SN hid_UTR.fasta
      1 >gi|945100|gb|U31226.1|DMU31226 Drosophila melanogaster head involution defective protein (hid) mRNA, complete cds (3'UTR only)
      2 TGACAAAAAATAAAAAACGAAATCCATCGTGAACAGTTTTGTGTTTTTAAATCAGTTCTAAACACGAAAA
      3 GGGTTGATGAAAAACGCAGAAGAATCCGAAAAACTAACTAACCGAGCAAAAACTTGACTTGAGTGTTGTT
      4 TGACAAATCAGGAAAGATAAAAAACAAATCATAAGAAAAAACTGCACGAAAAATGAAAAAGTTTCTAATA
      5 TTCAAAATCTTGCACAAGAAATACAAAATCAATTAAAGTGAACTCTAACCAAAAGTTGTACACAAAATAA
      6 AAAGCAAAACAAAGCAGCGAAGAACAATCACAAGAAGAGCAAAGTGCCAACAAAGTGCAGGAAGGAAGGA
      7 AGCGGATAAGGACAAAAAGGAAGCCAGCACACACACACACACCCACACAATGGCCGTGCCCTTTTATTTG
      8 CCCGAGGGCGGCGCCGATGACGTAGCGTCGAGTTCATCGGGAGCCTCGGGCAACTCCTCCCCCCACAACC
      9 ACCCACTTCCCTCGAGCGCATCCTCGTCCGTCTCCTCCTCGGGCGTGTCCTCGGCCTCCGCCTCCTCGGC
     10 CTCATCTTCGTCATCCGCATCGTCGGACGGCGCCAGCAGCGCCGCCTCGCAATCGCCGAACACCACCACC
     11 TCGTCGGCCACGCAGACGCCGATGCAGTCTCCACTGCCCACCGACCAAGTGCTATACGCCCTCTACGAGT
     12 GGGTCAGGATGTACCAGAGCCAGCAGAGTGCCCCGCAAATCTTCCAGTATCCGCCGCCAAGCCCCTCTTG
     13 CAATTTCACTGGCGGCGATGTGTTCTTTCCGCACGGCCATCCGAATCCGAACTCGAATCCCCATCCGCGC
     14 ACCCCCCGAACCAGCGTGAGCTTCTCCTCCGGCGAGGAGTACAACTTCTTCCGGCAGCAGCAGCCGCAAC
     15 CACATCCGTCATATCCGGCGCCATCAACACCGCAGCCAATGCCACCGCAGTCAGCGCCGCCGATGCACTG
     16 CAGCCACAGCTACCCGCAGCAGTCGGCGCACATGATGCCACACCATTCCGCTCCCTTCGGAATGGGCGGT
     17 ACCTACTACGCCGGCTACACGCCACCACCCACTCCGAACACGGCCAGTGCGGGCACCTCCAGCTCATCGG
     18 CGGCCTTCGGCTGGCACGGCCACCCCCACAGCCCCTTCACGTCGACCTCCACGCCGTTATCGGCGCCAGT
     19 GGCGCCCAAGATGCGCCTGCAGCGCAGCCAGTCGGATGCGGCCAGACGCAAGCGATTGACCTCGACGGGC
     20 GAGGATGAGCGCGAGTACCAGAGCGATCATGAGGCCACTTGGGACGAGTTTGGCGATCGCTACGACAACT
運行測試預(yù)測靶基因代碼(提前添加bin到.bashrc):
$ miranda bantam_stRNA.fasta  hid_UTR.fasta > test.out.txt # 這里只設(shè)置輸入與輸出,其他參數(shù)默認(rèn),或者如下添加其他參數(shù)
$ miranda bantam_stRNA.fasta  hid_UTR.fasta -sc 140 -en -1 > test.out.txt
$ cat test.out.txt | grep ">" | less -SN
      1 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  167.00  -24.54  2 20    3340 3360       18      83.33%  94.44%
      2 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  156.00  -20.03  2 17    2505 2525       15      86.67%  93.33%
      3 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  155.00  -14.57  2 16    2852 2872       14      78.57%  85.71%
      4 >gi|29565487|emb|AJ550546.1|    gi|945100|gb|U31226.1|DMU31226  152.00  -14.18  2 18    3820 3841       17      76.47%  76.47%
      5 >>gi|29565487|emb|AJ550546.1|   gi|945100|gb|U31226.1|DMU31226  630.00  -73.32  167.00  -24.54  1       21      3902     3340 2505 2852 3820
# 結(jié)果解讀:
1)">"開頭的幾行,是miRNA(gi|29565487|emb|AJ550546.1|)靶標(biāo)到基因(gi|945100|gb|U31226.1|DMU31226)上的不同位置
2)">"開頭的行對應(yīng)的信息依次是:miRNA id,基因id,打分,自由能,miRNA起始位置,miRNA終止位置,基因起始位置,基因終止位置,靶標(biāo)結(jié)合miRNA長度,miRNA結(jié)合長度與miRNA總長占比,基因上結(jié)合長度(大于等于前者,可能是一個跨度)對miRNA總長占比
3)">>"的行是綜合信息,依次代表:miRNA id,基因id,總打分,總自由能,最大打分,最大自由能,鏈信息,miRNA長度,基因長度,靶標(biāo)到基因上的位置(1個或多個,這里是4個)
4)綜上,我們可以抓取“>>”的行,來獲取靶向的基因,如果需要其他信息也可以自己按需提取,so easy
  1. 再舉一例(個性化分析)

miRNA鑒定等分析這里就不說了(贅述),(1)我們測miRNA數(shù)據(jù),鑒定并得到差異miRNA(known or novel均可),是否希望看一下他們靶向與那些基因呢?(2)文獻、試驗等等,發(fā)現(xiàn)某些miRNA(可獲取序列或名字的)有著重要的生物學(xué)功能,機理尚不明確,是否也想看看其靶向關(guān)系呢?可以運行miranda。按照(2)我們造幾個miRNA進行舉例:

3.1 下載miRBase成熟序列,提取human的miRNA進行舉例
$ wget ftp://mirbase.org/pub/mirbase/CURRENT/mature.fa.gz
$ gunzip mature.fa.gz
$ less -SN mature.fa # 格式如下
      1 >cel-let-7-5p MIMAT0000001 Caenorhabditis elegans let-7-5p
      2 UGAGGUAGUAGGUUGUAUAGUU
      3 >cel-let-7-3p MIMAT0015091 Caenorhabditis elegans let-7-3p
      4 CUAUGCAAUUUUCUACCUUACC
      5 >cel-lin-4-5p MIMAT0000002 Caenorhabditis elegans lin-4-5p
      6 UCCCUGAGACCUCAAGUGUGA
      7 >cel-lin-4-3p MIMAT0015092 Caenorhabditis elegans lin-4-3p
      8 ACACCUGGGCUCUCCGGGUACC
我通過python編程取出10個human的miRNA,存為hsa_mature10.fa
$ cat hsa_mature10.fa
>hsa-let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7a-3p
CUAUACAAUCUACUGUCUUUC
>hsa-let-7a-2-3p
CUGUACAGCCUCCUAGCUUUCC
>hsa-let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7b-3p
CUAUACAACCUACUGCCUUCCC
>hsa-let-7c-5p
UGAGGUAGUAGGUUGUAUGGUU
>hsa-let-7c-3p
CUGUACAACCUUCUAGCUUUCC
>hsa-let-7d-5p
AGAGGUAGUAGGUUGCAUAGUU
>hsa-let-7d-3p
CUAUACGACCUGCUGCCUUUCU
>hsa-let-7e-5p
UGAGGUAGGAGGUUGUAUAGUU
接著準(zhǔn)備靶向的基因集合,我是通過gtf位置提取了human基因組所有的轉(zhuǎn)錄本,取了其中17個來測試(17.fa),當(dāng)然你也可以去ensembl下載CDS序列,其他方法得到的基因序列均可。
$ grep ">" 17.fa
>ENST00000000233.9  +
>ENST00000000412.7  -
>ENST00000000442.10 +
>ENST00000001008.5  +
>ENST00000001146.6  -
>ENST00000002125.8  +
>ENST00000002165.10 -
>ENST00000002501.10 -
>ENST00000002596.5  -
>ENST00000002829.7  +
>ENST00000003084.10 +
>ENST00000003100.12 -
>ENST00000003302.8  -
>ENST00000003583.12 -
>ENST00000003912.7  +
>ENST00000004103.7  +
>ENST00000004531.14 +
然后預(yù)測二者的靶標(biāo)關(guān)系(通過管道輸出):
$ miranda hsa_mature10.fa 17.fa -sc 140 -en -1 | grep ">>" > hsa_mature10_targets.txt
我們大概看看幾行結(jié)果:
$ less -SN hsa_mature10_targets.txt
      1 >>hsa-let-7a-5p ENST00000000412.7       146.00  -11.97  146.00  -11.97  2       22      2756     960
      2 >>hsa-let-7a-5p ENST00000001008.5       143.00  -17.75  143.00  -17.75  4       22      3732     2611
      3 >>hsa-let-7a-5p ENST00000002125.8       156.00  -21.39  156.00  -21.39  6       22      2176     1248
      4 >>hsa-let-7a-5p ENST00000002829.7       146.00  -16.38  146.00  -16.38  10      22      3802     3480
      5 >>hsa-let-7a-5p ENST00000003084.10      299.00  -39.65  152.00  -20.76  11      22      6132     3077 5087
      6 >>hsa-let-7a-5p ENST00000003100.12      143.00  -18.40  143.00  -18.40  12      22      3210     2586
      7 >>hsa-let-7a-5p ENST00000003302.8       438.00  -47.15  152.00  -16.09  13      22      4669     1132 2011 4340
      8 >>hsa-let-7a-5p ENST00000004531.14      163.00  -20.40  163.00  -20.40  17      22      7560     1101
      9 >>hsa-let-7a-3p ENST00000002125.8       161.00  -12.52  161.00  -12.52  23      21      2176     1476
     10 >>hsa-let-7a-3p ENST00000002165.10      158.00  -14.96  158.00  -14.96  24      21      2356     1993
     11 >>hsa-let-7a-3p ENST00000003084.10      148.00  -6.47   148.00  -6.47   28      21      6132     5758
     12 >>hsa-let-7a-3p ENST00000003302.8       149.00  -16.95  149.00  -16.95  30      21      4669     3959
     13 >>hsa-let-7a-3p ENST00000003912.7       146.00  -7.39   146.00  -7.39   32      21      5481     3640
     14 >>hsa-let-7a-3p ENST00000004531.14      292.00  -16.76  146.00  -13.42  34      21      7560     3954 6657

tips:我們看到結(jié)果的格式如上面例子所描述的一致,此外,也看到同一miRNA靶向與不同的轉(zhuǎn)錄本(基因)上面,關(guān)于怎么去解釋結(jié)果和行文、試驗等等,就要看你的閱歷和水平了。可以提取所有結(jié)果進行網(wǎng)絡(luò)圖的構(gòu)建、可以提取最好(根據(jù)打分和自由能)的基因進行驗證(相關(guān)性計算、定量、濕試驗等等)、單純查找資料討論。【轉(zhuǎn)錄本與基因的對應(yīng)關(guān)系需要自行轉(zhuǎn)換】。下面也附上miRBase下載成熟序列的截圖,也很簡單?!颈窘坛淘偬崛⌒蛄袝r需要一定的編程基礎(chǔ),如果你只有兩者的序列,也是可以進行靶向預(yù)測的】


http://www.mirbase.org

download

二、

三、

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容