背景
一行命令構(gòu)建系統(tǒng)進化樹。其實這個想法去年的時就開始構(gòu)思了。當時在給師兄師姐幫忙構(gòu)建幾個基因家族的系統(tǒng)進化樹,還是用著以前本科時候就學會的幾個Windows下的建樹軟件MEGA/fasttree/RaxML。起先時一切都算順利,給一個家族的基因序列,倒騰下就能出來結(jié)果,可以幫助到別人也是挺開心。但之后需要我反復的去給他們建樹的時候就突然覺得我這是在做重復性的機器運動,而這構(gòu)建系統(tǒng)進化樹已經(jīng)是一個流程化的操作了。既然如此,何不將其封裝成一個pipeline,直接一鍵化操作呢?
這一年多的時間學Perl學shell學各種生信知識,終于在前些天對perl語言進階之后覺得自己也可以實現(xiàn)一年前的這一個想法了。參考了下陳連福寫perl程序的一些技巧,花了些空余時間終于完成此1.0版本的 phylogeny系統(tǒng)樹的一行命令構(gòu)建pipeline。
程序說明
- 使用環(huán)境:此perl程序需在Linux操作環(huán)境下。需提前安裝三個依賴軟件:MAFFT/Gblocks/IQtree,具體安裝可利用conda進行一鍵化安裝。轉(zhuǎn)錄組學習一(軟件安裝)
- 配置完畢之后即可將以下Perl腳本在服務(wù)器上,利用vim編輯器里插入模式i再F4進入粘貼模式,復制粘貼保存即可。
-
程序使用
perl getTree.pl即可顯示幫助信息,perl getTree.pl 輸入文件.fasta 輸出文件名稱,例如perl getTree.pl AP3.fasta ap3 - 輸入原始多基因的fasta文件,會調(diào)用程序自動進行多序列比對,信息位點的篩選,核酸模型的選擇,ML樹的構(gòu)建結(jié)果會生成以treefile結(jié)尾的樹文件,可進一步利用figtree/Itol/ggtree進行可視化的操作。
tmp結(jié)尾中間文件夾,包括各個程序中間文件及運行記錄。 -
后續(xù)可以學習下如何在此pipeline中添加各個參數(shù)。比如輸入序列類型,線程數(shù)等。
程序基本信息
運行過程
生成結(jié)果文件
#!/usr/bin/env perl
use strict;
use File::Copy;
my $usage = <<USAGE;
Usage:
perl $0 MultiSequences.fasta OutPrefix
For example:
perl $0 AP3.fasta ap3_tree
This Pipeline depends on the following software that can be run directly in terminal:
1. mafft (v7.310)
2. Gblocks (0.91b)
3. iqtree (1.55)
by Wang Tianpeng wangtp@ibcas.ac.cn
Version 1.0
USAGE
if (@ARGV==0){die $usage}
my($fasta,$out_prefix)=@ARGV;
my $pwd0=`pwd`;
chomp $pwd0;
# step0 Detecting the dependency softwares
## mafft
print STDERR "\nDetecting the dependency softwares\n\n";
my $software=`mafft --help 2>&1`;
if($software=~/MAFFT/){
print STDERR "MAFFT:\tOK\n";
}else{
die "Mafft\tfailed\n";
}
## Gblocks
my $sofware1=`Gblocks -a -b >111`;
open IN,"111" or die "$!";
<IN>;my $software_info=<IN>;
if($software_info=~/^\*/){
print STDERR "Gblocks:\tOK\n";
}else{
die "Gblocks\tfailed\n";
}
close IN;system("rm 111");
## iqtree
$software=`iqtree 2>&1`;
if ($software=~/IQ-TREE/){
print STDERR "IQtree:\tOK\n";
}else{
die "iqtree\tfailed\n";
}
# step1 create temporary directory;
#print "\n======================================\n";
mkdir "$out_prefix.tmp" unless -e "$out_prefix.tmp";
chdir "$out_prefix.tmp";
my $pwd1 = `pwd`;chomp($pwd1);# print STDERR "PWD:$pwd";
#open OUT,">","genome.fasta" or die "cant create file genome.fasta,$!";
# step2 MultiSequences Alignment with mafft;
print "\n=====================================================\n";
print "STEP1 MultiSequences Alignment with mafft\n\n";
mkdir "1.Mafft" unless -e "1.Mafft";
unless (-e "1.Mafft.ok"){
chdir "1.Mafft";
my $pwd=`pwd`;print STDERR "PWD:$pwd";
my $command="mafft --auto $pwd0\/$fasta >$out_prefix.aln.fa 2>mafft.log";
print STDERR (localtime).": $command\n";
system($command)==0 or die "can not execute :$command\n";
my $pwd_mafft=`pwd`;chomp($pwd_mafft);
chdir "..";
open OUT,">1.Mafft.ok" or die "$!";close OUT;
}else{
print STDERR "CMD(skipped): mafft \n";
}
my $pwd_mafft="$pwd1\/1.Mafft";#print "$pwd_mafft\n";
# step3 informative alignment points filter with Gblocks
#my $pwd =`pwd`;print "$pwd\n";
print "\n===================================================\n";
print "STEP2 selection of informative blocks with Gblocks\n\n";
mkdir "2.Gblocks" unless -e "2.Gblocks";
unless (-e "2.Gblocks.ok"){
chdir "2.Gblocks";
my $pwd=`pwd`;print STDERR "PWD: $pwd\n";
my $command="Gblocks $pwd_mafft\/$out_prefix.aln.fa -t=d -b4=5 -b5=a >Gblocks.log";
print STDERR (localtime).": $command\n";
system($command) or die "can not execute : $command\n";
my $gb_files="$pwd_mafft"."\/$out_prefix.aln.fa-gb\*"; #my $command_move="mv $test .";print "$command_move\n";
system("mv $gb_files \.")==0 or die "cant move file\n";
chdir "..";
open OUT,">2.Gblocks.ok" or die "$!";close OUT;
}else{
print STDERR "CMD(skipped):mafft \n";
}
my $pwd_gblock="$pwd1\/2.Gblocks";
# step4 Construct phylogeny tree with IQtree
print "\n====================================================\n";
print "STEP3 Construction of phylogeny tree with iqtree\n\n";
mkdir "3.IQtree" unless -e "3.IQtree";
unless (-e "3.IQtree.ok"){
chdir "3.IQtree";
my $pwd_iqtree=`pwd`;print STDERR "PWD: $pwd_iqtree\n";
my $command="iqtree -s $pwd_gblock\/$out_prefix.aln.fa-gb -bb 10000 -pre $out_prefix -nt 6 >iqtree.log";
print STDERR (localtime).": $command\n";
system($command)==0 or die "can not execute: $command\n";
system("cp $out_prefix.treefile ../..")==0 or die "can not copy file\n";
}
print STDERR "ALL commands complete! :-)\n";


