Orthomcl 的安裝教程請(qǐng)參考?OrthoMCL 安裝配置與使用-Bluesky's blog

這里僅介紹具體的使用流程。

1. 創(chuàng)建數(shù)據(jù)庫用戶

CREATE DATABASE orthomcl; #建數(shù)據(jù)庫 GRANT SELECT,INSERT,UPDATE,DELETE,CREATE VIEW,CREATE, INDEX, DROP on orthomcl.* TO orthomcl@localhost; #授權(quán) set password for orthomcl@localhost = password('123456');

這里的用戶名和密碼要在配置文件orthomcl.config.template中使用

2. orthomclInstallSchema 安裝orthomcl數(shù)據(jù)庫的表

在解壓好的目錄中（其實(shí)這個(gè)目錄可在隨意位置）建立my_orthomcl_dir目錄，并將orthomcl.config.template復(fù)制到該目錄，下面是如何設(shè)置該文件：

dbVendor=mysql? #使用的數(shù)據(jù)庫為mysql

dbConnectString=dbi:mysql:orthomcl? #使用mysql數(shù)據(jù)庫中名為orthomcl的數(shù)據(jù)庫

dbLogin=orthomcl? ? #數(shù)據(jù)庫的用戶名

dbPassword=123456? #相應(yīng)的密碼

similarSequencesTable=SimilarSequences #

orthologTable=Ortholog

inParalogTable=InParalog

coOrthologTable=CoOrtholog

interTaxonMatchView=InterTaxonMatch

percentMatchCutoff=50

evalueExponentCutoff=-5

oracleIndexTblSpc=NONE

注意：每次開始之前需要?jiǎng)h除上次產(chǎn)生的數(shù)據(jù)表，并創(chuàng)建一個(gè)空表。

mysql -u orthomcl -p?

123456

show databases;

drop database orthomcl;? 如果是一次運(yùn)行，這個(gè)表是沒有的。

create database orthomcl;

安裝orthomcl需要的表

orthomclInstallSchema orthomcl.config.template install_schema.log

3.?orthomclAdjustFasta 創(chuàng)建orthomcl的輸入文件

注意：

1. 每個(gè)亞組的基因蛋白需要放置于一個(gè)文件。

2. 建立目錄my_orthomcl_dir/complianFasta/,并切換到該目錄；對(duì)每個(gè)蛋白序列的fa文件運(yùn)行一次orthomclAdjustFast。這里主要是為了生成標(biāo)準(zhǔn)的序列ID。

orthomclAdjustFasta taxon_code fasta_file id_field

例如 orthomclAdjustFasta hum human.fa 4

taxon_code：3到4個(gè)字符組成，物種名稱的縮寫；?

fasta_file：輸入的蛋白序列文件；

id_field：整數(shù)；確定物種名稱中，哪個(gè)區(qū)域段的字符作為輸出的protein的? ? ?id（unique_protein_id）；區(qū)域段由“空格”或“|”分開。例如： ID (AP_000668.1)在第4區(qū)域段，>gi|89106888|ref|AP_000668.1|

4. orthomclFilterFasta 過濾文件

(需要進(jìn)入到compliantFasta的上層目錄：運(yùn)行：cd ../ #進(jìn)入my_orthomcl_dir) ，這一步將會(huì)對(duì)你剛才改寫的蛋白文件進(jìn)行過濾，去除長(zhǎng)度小于XX（自己設(shè)定min_length），stop coden（自己設(shè)定max_percent_stops）所占百分比的序列

運(yùn)行要求：1）compliantFasta目錄里，每個(gè)物種的所有的蛋白序列包含在一個(gè)fasta文件中；

? ? ? ? ? ? ? ? ? 2）每個(gè)fasta文件的名稱格式必須為xxxx.fasta，其中xxxx是unique的物種名稱，3-4個(gè)letter

? ? ? ? ? ? ? ? ? 3）每個(gè)序列的名稱格式統(tǒng)一成>xxxx|yyyyyyyy；其中xxxx為3-4個(gè)letter的unique物種名，yyyyyyy為unique的ID。

Usage:

orthomclFilterFasta input_dir min_length max_percent_stops [good_proteins_file poor_proteins_file] #

例如 orthomclFilterFasta compliantFasta/ 10 20

5. blast 對(duì)上一步得到的goodProteins.fasta序列進(jìn)行BLAST

這一步的主要目的是生成all_vs_all的blastp結(jié)果，這里介紹兩張方法

1. blast

formatdb -i goodProteins.fasta

blastall -p blastp -i goodProteins.fasta -d goodProteins.fasta -m 8 -F F -b 1000 -v 1000 -a 2 -o all_vs_all_blastp.tab

也可以用blast+，最后的結(jié)果是一樣的。

goodProteins.fasta 為上步生成的結(jié)果文件。

2. diamond

diamond makedb --in goodProteins.fasta -d goodProteins

diamond blastp -d goodProteins -q goodProteins.fasta -o all_vs_all_blastp.tab

diamond的速度快很多，如果物種和基因數(shù)量多，推薦用diamond。

6. orthomclBlastParser 將上一步得到的blast比對(duì)結(jié)果進(jìn)行解析，以用于導(dǎo)入到orthomcl數(shù)據(jù)庫中

Usage:

orthomclBlastParser my_blast_results compliantFasta/ > similarSequences.txt

#例如 orthomclBlastParser all_vs_all_blastp.tab compliantFasta/ > similarSequences.txt

7. orthomclLoadBlast 將similarSequences.txt載入到數(shù)據(jù)庫中

Usage:

orthomclLoadBlast orthomcl.config.template similarSequences.txt

#l例如 orthomclLoadBlast orthomcl.config.template similarSequences.txt

如果出現(xiàn)報(bào)錯(cuò)? DBD::mysql::st execute failed: Table 'orthomcl.SimilarSequences' doesn't exist

執(zhí)行 orthomclInstallSchema orthomcl.config.template 后重試

8. orthomclPairs 將在database中SimilarSequences表中的數(shù)據(jù)，進(jìn)行pairs的運(yùn)算

產(chǎn)生三個(gè)表格存在于 mysql

– PotentialOrthologs table

– PotentialInParalogs table

– PotentialCoOrthologs table

Usage:

orthomclPairs orthomcl.config orthomcl_pairs.log cleanup=no

#例如 orthomclPairs orthomcl.config.template orthomcl_pairs.log cleanup=no

注意：執(zhí)行這一步時(shí)需要?jiǎng)h除之前產(chǎn)生的pairs目錄，否則會(huì)報(bào)錯(cuò)。

9. orthomclDumpPairsFiles 將數(shù)據(jù)從數(shù)據(jù)庫中導(dǎo)出

生成mclInput文件和另外一個(gè)文件夾pairs，在這個(gè)pairs中，包含著這些蛋白之間的關(guān)系

Usage:

orthomclDumpPairsFiles orthomcl.config.template

#比如 orthomclDumpPairsFiles orthomcl.config.template

10. mcl 這一步開始對(duì)上一步給出的輸出文件，進(jìn)行mcl操作，開始聚類, 輸出文件為mclOutput文件

Usage：

mcl my_orthomcl_dir/mclInput --abc -I 1.5 -o my_orthomcl_dir/mclOutput

#比如 mcl mclInput --abc -I 1.5 -o mclOutput

-I（大寫的i）參數(shù)可調(diào)，控制最后聚類的細(xì)度

輸出結(jié)果文件為mclOutPut，寫腳本過濾結(jié)果即可得到單拷貝基因。

注意：mclOutPut中的蛋白ID并不都是按照物種縮寫排列，個(gè)別蛋白的順序是亂的。

這里提供一個(gè)過濾的perl腳本用于參考：

#!/usr/bin/perl

use warnings;

use strict;

while(<>){

? ? chomp;

? ? my @mem=split;

? ? next unless @mem == N; ##N 是你亞組的數(shù)量，例如 6

? ? my %hash;

? ??foreach(@mem){ /^(\w+)\|/; $hash{$1}++; }

? ??my @keys = keys %hash;

? ??if(@keys == N){ my @ar = sort @mem; print "@ar\n"; } ##N 同上

｝

Done!

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

利用orthomcl計(jì)算直系同源基因

利用orthomcl計(jì)算直系同源基因

1. 創(chuàng)建數(shù)據(jù)庫用戶

2. orthomclInstallSchema 安裝orthomcl數(shù)據(jù)庫的表

3.?orthomclAdjustFasta 創(chuàng)建orthomcl的輸入文件

4. orthomclFilterFasta 過濾文件

5. blast 對(duì)上一步得到的goodProteins.fasta序列進(jìn)行BLAST

1. blast

2. diamond

6. orthomclBlastParser 將上一步得到的blast比對(duì)結(jié)果進(jìn)行解析，以用于導(dǎo)入到orthomcl數(shù)據(jù)庫中

7. orthomclLoadBlast 將similarSequences.txt載入到數(shù)據(jù)庫中

8. orthomclPairs 將在database中SimilarSequences表中的數(shù)據(jù)，進(jìn)行pairs的運(yùn)算

9. orthomclDumpPairsFiles 將數(shù)據(jù)從數(shù)據(jù)庫中導(dǎo)出

10. mcl 這一步開始對(duì)上一步給出的輸出文件，進(jìn)行mcl操作，開始聚類, 輸出文件為mclOutput文件

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

利用orthomcl計(jì)算直系同源基因

1. 創(chuàng)建數(shù)據(jù)庫用戶

2. orthomclInstallSchema 安裝orthomcl數(shù)據(jù)庫的表

3.?orthomclAdjustFasta 創(chuàng)建orthomcl的輸入文件

4. orthomclFilterFasta 過濾文件

5. blast 對(duì)上一步得到的goodProteins.fasta序列進(jìn)行BLAST

1. blast

2. diamond

6. orthomclBlastParser 將上一步得到的blast比對(duì)結(jié)果進(jìn)行解析，以用于導(dǎo)入到orthomcl數(shù)據(jù)庫中

7. orthomclLoadBlast 將similarSequences.txt載入到數(shù)據(jù)庫中

8. orthomclPairs 將在database中SimilarSequences表中的數(shù)據(jù)，進(jìn)行pairs的運(yùn)算

9. orthomclDumpPairsFiles 將數(shù)據(jù)從數(shù)據(jù)庫中導(dǎo)出

10. mcl 這一步開始對(duì)上一步給出的輸出文件，進(jìn)行mcl操作，開始聚類, 輸出文件為mclOutput文件

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

6. orthomclBlastParser 將上一步得到的blast比對(duì)結(jié)果進(jìn)行解析，以用于導(dǎo)入到orthomcl數(shù)據(jù)庫中

8. orthomclPairs 將在database中SimilarSequences表中的數(shù)據(jù)，進(jìn)行pairs的運(yùn)算

10. mcl 這一步開始對(duì)上一步給出的輸出文件，進(jìn)行mcl操作，開始聚類, 輸出文件為mclOutput文件