Mitochondrial Genome Assembly and Analysis

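Two rRNAs (26S): how many do they count as? tRNAs are also duplicated: how should the duplicated copies be counted? 33 ORFs: 32 CDSs and 1 pseudogene.
The sequences in the gb file need to be re-translated.
The journal has no format requirements for submission; there are too few references; fix the reference format; redraw the figures.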

The complete mitochondrial genome of Malus domestica (GenBank accession: NC_018554) is used as the reference.
Yanfu 8 is a bud sport selected from Yanfu 3. Compared with other Fuji varieties, its clear advantage is fast coloring. Light is used more efficiently during anthocyanin synthesis, so the fruit can reach full color without reflective film, and reaches it quickly once reflective film is laid. Because of this high light-use efficiency, fruit inside the canopy can also color fully. Leaf removal and fruit turning improve coloring further. With stronger flower and fruit management and a good water and fertilizer supply, the fruit traits show great advantages.

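GeSeq annotation may fail to recognize genes that are split by introns.

For a gene split by introns, if one of its exons was not annotated, compare the two sequence segments of the reference CDS region with the corresponding two segments of the sample sequence. samtools faidx builds the index and extracts the CDS region sequences.
For the amino acid sequence of a protein-coding gene: if the gene's nucleotide sequence is identical to that of the reference genome, the amino acid sequence encoded by that gene in the reference can simply be copied.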
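The exon comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in these notes: `extract_region` mimics the 1-based, inclusive region convention of `samtools faidx`, and the two toy sequences are invented.

```python
# Sketch: pull 1-based, inclusive regions out of two sequences and compare them.
# The toy sequences below are invented for illustration only.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_region(seq, start, end, reverse_complement=False):
    """Return seq[start..end] using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"bad region {start}-{end} for length {len(seq)}")
    region = seq[start - 1:end]
    if reverse_complement:
        # complement each base, then reverse the string
        region = region.translate(COMPLEMENT)[::-1]
    return region

def same_exon(ref_seq, ref_region, sample_seq, sample_region, reverse_complement=False):
    """True if the two regions hold identical sequence."""
    ref_exon = extract_region(ref_seq, ref_region[0], ref_region[1], reverse_complement)
    sample_exon = extract_region(sample_seq, sample_region[0], sample_region[1], reverse_complement)
    return ref_exon == sample_exon

reference = "AAATGGCCTTAA"    # hypothetical reference
sample = "GGAAATGGCCTTAA"     # same sequence, shifted by 2 bases

print(extract_region(reference, 4, 9))                  # TGGCCT
print(same_exon(reference, (4, 9), sample, (6, 11)))    # True
```

If the two exons are identical, the reference translation can be reused as noted above; if not, the differing region is where to look.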
1 Reference mitochondrial genome
NCBI Reference Sequence: NC_018554.1

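  1. Open the annotated file in Geneious


    Adjusting the nad5 CDS

Only the first segment of the gene, on the complementary strand, was annotated.

  1. There is no separate download for nad5, so compare the two gb files directly
    Go through the CDSs in the order of the reference annotation file
    Reference nad5 information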
    gene:join(complement(71571..73864),121892..121913,41783..43256)
    /gene="nad5"
    CDS:join(complement(73635..73864),complement(71571..72786),
    121892..121913,41783..42177,43107..43256)

Sample nad5 information
gene:join(complement(71573..73866),121894..121915,41785..43258)
/gene="nad5"
CDS: join(complement(73637..73866),complement(71573..72788),
121894..121915,41785..42179,43109..43258)
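A quick way to see how the sample coordinates relate to the reference is to subtract them segment by segment. The sketch below uses the two nad5 CDS locations exactly as listed above; the helper names are made up for this example.

```python
import re

def parse_segments(location):
    """Return [(start, end), ...] for every 'start..end' pair in a location string."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def segment_offsets(ref_location, sample_location):
    """Per-segment (start shift, end shift) of the sample relative to the reference."""
    ref = parse_segments(ref_location)
    sample = parse_segments(sample_location)
    if len(ref) != len(sample):
        raise ValueError("different number of segments")
    return [(s0 - r0, s1 - r1) for (r0, r1), (s0, s1) in zip(ref, sample)]

ref_cds = ("join(complement(73635..73864),complement(71571..72786),"
           "121892..121913,41783..42177,43107..43256)")
sample_cds = ("join(complement(73637..73866),complement(71573..72788),"
              "121894..121915,41785..42179,43109..43258)")

offsets = segment_offsets(ref_cds, sample_cds)
print(offsets)   # [(2, 2), (2, 2), (2, 2), (2, 2), (2, 2)]
```

Every nad5 segment in the sample is shifted by +2 relative to the reference. A constant shift like this is a useful sanity check when copying coordinates by hand.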

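Some items in the annotation results are labeled as -fragment CDS. Not counting the fragment items, 33 genes were annotated in the sample, the same number as in the reference annotation.

Taking the cox1 CDS as an example,
in the reference annotation file: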
gene:complement(10301..11788)
/gene="cox1"
CDS:complement(10301..11788)
/gene="cox1"

Look at the base sequence of cox1-fragment in the sample:

samtools faidx YanFu8Mito.fasta YanFu8Mito:132694-132875
>YanFu8Mito:132694-132875
ATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGATCAAATT
CTTGGTGGTAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATC
TTTTTTATGGTTATGCCGGCGATGATAGGCGGATCTGGTAATTGGTCTGTTCCGATTCTG
AT

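Check the base sequence of cox1 in the reference annotation file. This may be unrelated; not sure.
4 The annotation results contain -fragment items and I don't know how to fix them. Should they be deleted?
The fragment items can be deleted: first correct the CDSs that need changes, then delete the fragment items.
4.1 Correct the nad1 CDS

Reference nad1 information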
gene            join(104195..104597,complement(260363..262083),
                     complement(236949..237007),complement(233268..233526))
                     /gene="nad1"
CDS             join(104195..104597,complement(262001..262083),
                     complement(260363..260554),complement(236949..237007),
                     complement(233268..233526))
                     /gene="nad1"
Sample nad1 information
gene            join(104197..104599,complement(260366..262086),  
                    complement(236952..237010),complement(233271..233529))
Note: 262086 had been mistyped as 260086, so the end coordinate was smaller than the start and the gene appeared to cover the entire sequence. Argh.
CDS             join(104197..104599,complement(262004..262086),
                    complement(260366..260557),complement(236952..237010),
                    complement(233271..233529))
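A typo like the 262086/260086 one is easy to catch automatically, because it leaves a segment whose start is larger than its end. A minimal sketch follows; the `mistyped` string is reconstructed from the description above rather than copied from the original file.

```python
import re

def inverted_segments(location):
    """Return the (start, end) pairs where start > end."""
    pairs = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    return [(start, end) for start, end in pairs if start > end]

# Reconstructed from the note above: 262086 typed as 260086
mistyped = ("join(104197..104599,complement(260366..260086),"
            "complement(236952..237010),complement(233271..233529))")
corrected = ("join(104197..104599,complement(260366..262086),"
             "complement(236952..237010),complement(233271..233529))")

print(inverted_segments(mistyped))    # [(260366, 260086)]
print(inverted_segments(corrected))   # []
```

Running a check of this kind over every feature location before loading the gb file would flag the problem without having to inspect the annotation by eye.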

4.2 Correct the nad2 CDS

Reference nad2 information (mixed strands, 5 segments)
gene             join(165538..165690,166888..167279,
                     complement(245312..245472),complement(242893..243465),
                     complement(241341..241528))

CDS             join(165538..165690,166888..167279,
                     complement(245312..245472),complement(242893..243465),
                     complement(241341..241528))
                     /gene="nad2"

Sample nad2 information
gene             join(165540..165592,166890..167281,
                     complement(245315..245475),complement(242896..243468),
                     complement(241344..241531))
CDS             join(165540..165592,166890..167281,
                    complement(245315..245475),complement(242896..243468),
                    complement(241344..241531))

4.3
Reference CDS name: Sec-independent protein translocase protein
Corresponding CDS name in the sample: C3258 p30

4.4 Search the gb file for CDS items marked as fragment

Use a regular expression in Notepad++:
.*[^\/]gene(.*\r\n){3,7}.*(CDS)(.*\r\n){1,4}.*fragment"

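There are 9 matches. Delete the CDS fragment items, then count the CDSs and amino acids after the changes: the reference has 33 CDSs, and the sample has 32 after the changes.

4.5 Find out why the reference and sample CDS counts differ

Put the sample CDS names and the reference CDS names in a single column,
sort yanfu-gdcdsname.txt | uniq -c
The nad2 CDS appears only once. Checking the gb file shows that the parentheses () had been typed as full-width (Chinese input) parentheses.
Comparing reference and sample CDS lengths shows that the sample nad2 and cox1 CDS lengths differ from the reference.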
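Both checks in this step are easy to script. The sketch below is an assumption-laden illustration, not what was run here: `unpaired_names` reproduces the `sort | uniq -c` idea for names that occur only once, and `find_fullwidth` reports full-width punctuation that a GenBank parser would not accept. The example inputs are invented.

```python
from collections import Counter

# Full-width characters that are easy to type by accident, with ASCII equivalents
FULLWIDTH = {"（": "(", "）": ")", "，": ",", "：": ":"}

def unpaired_names(names):
    """Names that occur exactly once in the combined reference + sample list."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n == 1)

def find_fullwidth(text):
    """Return (line_number, column, character) for each full-width character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in FULLWIDTH:
                hits.append((line_no, col, ch))
    return hits

def normalize(text):
    """Replace the full-width characters above with their ASCII equivalents."""
    return text.translate(str.maketrans(FULLWIDTH))

# Invented example inputs
combined = ["nad1", "nad1", "nad2", "cox1", "cox1"]
bad_line = "complement（245315..245475）"

print(unpaired_names(combined))   # ['nad2']
print(find_fullwidth(bad_line))
print(normalize(bad_line))        # complement(245315..245475)
```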

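4.6 Comparing the CDSs in the reference gb file with those in the sample gb file shows that the modified sample nad2 CDS length is not a multiple of 3, so correct it again

Reference nad2 information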

gene join(165538..165690,166888..167279,
                     complement(245312..245472),complement(242893..243465),
                     complement(241341..241528))
                     /gene="nad2"
CDS join(165538..165690,166888..167279,
                     complement(245312..245472),complement(242893..243465),
                     complement(241341..241528))

Sample nad2 information after the correction

gene join(165538..165690,166888..167279,
    complement(245313..245473),complement(242894..243466),
                     complement(241342..241529))
                     /gene="nad2"
CDS join(165538..165690,166888..167279,
    complement(245313..245473),complement(242894..243466),
                     complement(241342..241529))
                     /gene="nad2" 
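The multiple-of-3 check can be done directly on the location strings. The sketch below sums the segment lengths of the first sample nad2 CDS (from 4.2) and of the corrected one listed above; the function name is made up for the example.

```python
import re

def cds_length(location):
    """Total length in bases of all 'start..end' segments (coordinates are inclusive)."""
    return sum(int(b) - int(a) + 1 for a, b in re.findall(r"(\d+)\.\.(\d+)", location))

first_attempt = ("join(165540..165592,166890..167281,"
                 "complement(245315..245475),complement(242896..243468),"
                 "complement(241344..241531))")
corrected = ("join(165538..165690,166888..167279,"
             "complement(245313..245473),complement(242894..243466),"
             "complement(241342..241529))")

for label, location in (("first attempt", first_attempt), ("corrected", corrected)):
    length = cds_length(location)
    print(label, length, "multiple of 3" if length % 3 == 0 else "NOT a multiple of 3")
# first attempt 1367 NOT a multiple of 3
# corrected 1467 multiple of 3
```

The first attempt comes to 1367 bases, while the corrected coordinates give 1467 bases, i.e. 489 codons including the stop.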


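4.7 Extract the CDS translations from the sample gb file.

The YanFu8Mito - C3258 p30 CDS translation begins with I, i.e. its start codon translates to isoleucine rather than methionine.

4.8 Check whether the translated sample CDSs contain stop codons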

$ grep -n "*" /home/Pomgroup/gdp/mito/regenome/yanfu8cdstrans.csv
17:YanFu8Mito - nad1 CDS translation,nad1 CDS,VNRKSKTYIAVPAEILGIILPLLLGVAFLVLAERKVMAFVQRRKGPDVVGSFGLLQPLADGLKLILKEPISPSSANFSLFRMAPVTTFMLSLVARAVVPFDYGMVLSDSNIGLLYLFAISSLGVYGIIIAGWSS......IRNMPF*EHYDLQLKWSLMKSLLVLFLILY*YV*VPVIRVRLSWRKSRYGPVFPCSLYWLCSSFLV*QKLIELRLISQKRKLNQLQAIM*NRLQWGLLFFFWESMPI*S*......LVHAHRSLQEVGRLS*IFPFSRRSPALSGLVSR*FCFRSYIYGSVQHFHDIVMIN*WDLAGKCSCLYH*LG*SPFLVFQSPFNGSL,1,332,>330,5
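Only line 17 is reported: that CDS has stop codons inside the protein, does not end with a stop codon, and begins with V.
The reference nad1 CDS translation also begins with V.
Correct nad1 against the reference.

4.9 After the correction, the nad1 length is a multiple of 3. Apart from cox1, which is 195 bases shorter than the reference, every CDS has the same length as in the reference. Check whether each sample CDS translation ends with a stop codon
Searching for ... gives 33 matches, so every CDS ends with a stop codon; * gives no match, so there are no internal stop codons apart from the terminal one.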
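The same two questions (does the translation end with a stop, and are there stops inside it?) can be answered per sequence with a small helper. This is a sketch with invented toy proteins, not the search that was actually used here.

```python
def check_translation(protein):
    """Summarize where the '*' (stop) characters fall in a translated CDS."""
    # positions are 1-based; the last residue is excluded from the internal check
    internal = [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]
    return {
        "ends_with_stop": protein.endswith("*"),
        "internal_stops": internal,
        "first_residue": protein[:1],
    }

# Invented toy translations
print(check_translation("MKV*"))
# {'ends_with_stop': True, 'internal_stops': [], 'first_residue': 'M'}
print(check_translation("VNR*KS"))
# {'ends_with_stop': False, 'internal_stops': [4], 'first_residue': 'V'}
```

A CDS passes when `ends_with_stop` is true and `internal_stops` is empty.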
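The sample cox1 CDS is shorter than the reference CDS because of a sequence variant in the sample (a base insertion, so strictly an indel rather than an SNP): it creates a stop codon near the 5' end, so translation cannot start from the original 5' end. A start codon has to be searched for again from the new 5' end, and translation starts from there.
4.10 Check the start codons of the CDSs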

$ cat yanfu8cdstrans.csv | awk 'BEGIN{FS=","}{print$2}' |awk -F "" '{print$1}' |sort | uniq -c
      1 I
     31 M
      1 V
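For reference, the same tally can be made in Python with `csv` and `collections.Counter`. The sketch assumes the translation sits in a known column of the CSV; the column index and the three toy rows are assumptions for illustration and do not reflect the real layout of yanfu8cdstrans.csv.

```python
import csv
import io
from collections import Counter

def first_residue_counts(handle, column):
    """Count the first character of the given column across all CSV rows."""
    counts = Counter()
    for row in csv.reader(handle):
        if len(row) > column and row[column]:
            counts[row[column][0]] += 1
    return counts

# Invented rows: name, translation
toy_csv = io.StringIO("nad1 CDS,VNRKS*\ncox1 CDS,MGTCF*\natp1 CDS,MEFSP*\n")
counts = first_residue_counts(toy_csv, column=1)
for residue, n in sorted(counts.items()):
    print(n, residue)
# 2 M
# 1 V
```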
   

5.0 Build the phylogenetic tree
5.1 Align the sequences with the HomBlocks pipeline

# perl HomBlocks.pl --align --path=/root/yanfu8mito/HomBlocks-master/mitogenome/ -out_seq=yanfu8.output.fasta --mauve-out=yanfu8.mauve.out
Totla 10 files detected!
Error:
Exception FileNotOpened thrown from Unknown() in .. \gnFileSource.cpp 67 Called by Unknown()"
Can't open yanfu8.mauve.out:No such file or directory

Argh. After struggling with this for a while: the sequence file names must not contain spaces or commas...
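Renaming can be planned with a small script before touching any files. The sketch below only computes the new names (a dry run) and leaves the actual renaming to the user; the file names in the example are invented, and replacing with an underscore is one possible choice.

```python
import re

def safe_name(filename):
    """Replace each run of spaces and/or commas with a single underscore."""
    return re.sub(r"[ ,]+", "_", filename.strip())

def plan_renames(filenames):
    """Return (old, new) pairs for the names that would change."""
    return [(name, safe_name(name)) for name in filenames if safe_name(name) != name]

# Invented example file names
names = [
    "NC_045228_Eriobotrya japonica.fasta",
    "Malus x domestica, complete genome.fasta",
    "NC_018554.fasta",
]
for old, new in plan_renames(names):
    print(f"{old!r} -> {new!r}")
```

Only the first two names are reported, since the third already contains neither spaces nor commas.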
After renaming the files and rerunning, the output is:
The final concatenated sequences was writen in yanfu8_aligned.fasta

The location of each extracted modules on the final concatenated seq:

module_11 = 1-960;
module_12 = 961-990;
module_13 = 991-1810;
module_14 = 1811-2230;
module_15 = 2231-3267;
module_16 = 3268-3413;
module_17 = 3414-3533;
module_18 = 3534-3833;
module_19 = 3834-6885;
module_2 = 6886-7319;
module_20 = 7320-7845;
module_21 = 7846-8324;
module_22 = 8325-8444;
module_23 = 8445-9211;
module_25 = 9212-9988;
module_26 = 9989-11513;
module_27 = 11514-13242;
module_28 = 13243-15207;
module_29 = 15208-15857;
module_3 = 15858-17761;
module_30 = 17762-18599;
module_31 = 18600-21810;
module_32 = 21811-22466;
module_33 = 22467-23502;
module_34 = 23503-24693;
module_35 = 24694-25413;
module_36 = 25414-27114;
module_38 = 27115-28864;
module_4 = 28865-29630;
module_5 = 29631-32665;
module_6 = 32666-34203;
module_7 = 34204-34274;
module_8 = 34275-37194;
module_9 = 37195-37527;

The concatenated length is 37527 bp

HomBlocks DATA PREPRATION COMPLETED! ENJOY IT!!
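The module table printed by HomBlocks can be checked mechanically: the blocks should tile the concatenated alignment without gaps or overlaps, and the last end coordinate should equal the reported length. A minimal sketch follows, using only the first three lines of the table above as input; the parser is written for this output format and is not part of HomBlocks.

```python
import re

MODULE_LINE = re.compile(r"^(module_\d+)\s*=\s*(\d+)-(\d+);")

def parse_modules(report):
    """Return [(name, start, end), ...] from 'module_N = a-b;' lines."""
    modules = []
    for line in report.splitlines():
        m = MODULE_LINE.match(line.strip())
        if m:
            modules.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return modules

def check_contiguous(modules):
    """Return the total length if the blocks tile 1..N exactly; raise ValueError otherwise."""
    expected_start = 1
    for name, start, end in modules:
        if start != expected_start or end < start:
            raise ValueError(f"{name}: expected start {expected_start}, got {start}-{end}")
        expected_start = end + 1
    return expected_start - 1

report = """module_11 = 1-960;
module_12 = 961-990;
module_13 = 991-1810;
"""
modules = parse_modules(report)
print(check_contiguous(modules))   # 1810
```

Fed the full table, the returned value can be compared with the "concatenated length" line.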

5.2 Build the phylogenetic tree with IQ-TREE
5.3 Download the sequences again and realign.
Downloading them one at a time is still slow

# perl HomBlocks.pl --align --path=/root/app/HomBlocks-master/mitogenome/ -out_seq=/root/yanfu8mito/mitogenome/yanfu8_aligned.fasta --mauve-out=/root/yanfu8mito/mitogenome/yanfu8_mauve.out

5.4 Build the phylogenetic tree with IQ-TREE

The -out_seq= parameter produces the yanfu8_aligned.fasta file used here

# /root/app/iq_tree/iqtree-1.6.12-Linux/bin/iqtree -s yanfu8_aligned.fasta  -o NC_045136_Manihot_esculenta -bb 1000

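5.5 Edit the treefile with FigTree

Spaces in the sequence names were replaced with _ earlier, so the names need to be put back into the correct format.
Replacing every _ with a space does not work, because the accession numbers contain _. So an _ that follows NC must not be replaced.
Copy the file first in case the edit goes wrong.
[^(NC)]\_[^\d]    The intent: _ not preceded by NC and not followed by a digit. Testing shows this fails, because it matches three characters (and [^(NC)] is a character class that excludes the single characters (, N, C and ), not the string NC). Zero-width assertions are probably needed: (?<!(NC))\_(?!\d) matches only the _, which must not be preceded by NC or followed by a digit.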
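The lookaround version can be tried out in Python, whose `re` module supports the same fixed-width lookbehind and lookahead. This is a sketch of the idea with the outgroup label from the IQ-TREE command; it is not the edit that was applied to the treefile.

```python
import re

# An underscore that is NOT preceded by "NC" and NOT followed by a digit
UNDERSCORE_TO_SPACE = re.compile(r"(?<!NC)_(?!\d)")

def restore_name(label):
    """Turn underscores back into spaces, leaving the accession number intact."""
    return UNDERSCORE_TO_SPACE.sub(" ", label)

print(restore_name("NC_045136_Manihot_esculenta"))   # NC_045136 Manihot esculenta
```

Because the assertions are zero-width, only the underscore itself is consumed, so neighbouring characters are preserved. Note that the rule is tied to the NC_ prefix; labels with other accession styles, or species names with a digit right after an underscore, would need a different pattern.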

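6 Sequence statistics.
I didn't know of any software for computing GC content, so I counted A, T, C and G by searching in Notepad++, but the counts differed from what Geneious shows. It turned out that the search was not case-sensitive and that the sequence name contains the letters A, G, C and T. After matching case the numbers still disagreed. Maybe the sequence in the gb file and the assembled sequence are not exactly the same length? I computed the length of the sequence in the gb file, and the two are the same.
Look up software that can compute GC content.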
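The two pitfalls mentioned above (letter case, and letters in the header line being counted) go away if the header is skipped and the sequence is upper-cased before counting. A minimal Python sketch on an invented toy record follows; it divides by the full sequence length, which is one possible convention and may differ from what a given tool does when degenerate bases are present.

```python
def gc_content(fasta_text):
    """Return (sequence length, GC percent) over all records in a FASTA string."""
    seq = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    ).upper()
    if not seq:
        return 0, 0.0
    gc = seq.count("G") + seq.count("C")
    return len(seq), 100.0 * gc / len(seq)

# Invented toy record; the header deliberately contains A, C, G and T
toy = ">toy ACGT header\nacgtGGCC\nATAT\n"
length, gc = gc_content(toy)
print(length, f"{gc:.2f}")   # 12 50.00
```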

6.1 Delete A, T, C and G from the sequence files to see which other characters remain

>YanFu8Mito
SSMKRMYMYSKSKMMKYYWWMSKYRYYRYRWR*YKSYRK

>NC_018554.1 Malus x domestica mitochondrion, complete genome
YMKMMYKRKKYYWRYRRRRKMRRRYYYYRYKYYRRYRSRRRSYKRYRRKKYKMMWWWRRMKKMSMSRKKKMRRYMYYYKKWYRKYSYRYYRKRYS
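Instead of deleting A, T, C and G in an editor, the leftover symbols can be tallied directly. The sketch below counts every non-ACGT character in the sequence lines of a FASTA string; the toy record is invented.

```python
from collections import Counter

def non_acgt_counts(fasta_text):
    """Count characters other than A, C, G, T in the sequence lines (case-insensitive)."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue   # skip header lines
        counts.update(ch for ch in line.strip().upper() if ch not in "ACGT")
    return counts

# Invented toy record with a few IUPAC ambiguity codes
toy = ">toy\nACGTRYacgtN\nSSKM\n"
leftover = non_acgt_counts(toy)
print(sorted(leftover.items()))
# [('K', 1), ('M', 1), ('N', 1), ('R', 1), ('S', 2), ('Y', 1)]
```

Letters such as R, Y, S, W, K and M are IUPAC ambiguity codes, which is what the leftover strings above mostly consist of.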

$ /home/Pomgroup/gdp/app/Seqkit/seqkit fx2tab -i -g -n -l gdmitogenome.fasta
NC_018554.1                     396947  45.39
$ /home/Pomgroup/gdp/app/Seqkit/seqkit fx2tab -i -g -n -l   yanfu8singleline.fasta
Seq                     396948  45.40


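6.2 Extract the assembled sequence from the annotated gb file; its length matches the reference

6.3 The sequence contains degenerate bases, so use the recently published Eriobotrya japonica mitochondrial genome as the reference
-Assembly 1 finished: Contigs are automatically merged in Merged_contigs file
It probably did not assemble into a circle.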

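  1. The tree file won't open, so redo the alignment. Better to follow the instructions and grant the permissions
    chmod 755 *
    r: 4, read
    w: 2, write
    x: 1, execute (run)
    So 755 means rwx (4+2+1) for the owner and r-x (4+1) for group and others.
HomBlocks documentation in Chinese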

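$ perl /home/Pomgroup/gdp/app/HomBlocks/HomBlocks/HomBlocks.pl --align --path=/home/Pomgroup/gdp/mito/tree/allgenome/ -out_seq=/home/Pomgroup/gdp/mito/tree/out/yanfu8.fasta --mauve-out=/home/Pomgroup/gdp/mito/tree/out/yanfu8_mauve.out
The result file is empty.
Tried several times without success. In the file produced by the --mauve-out= parameter, one sequence file name contains a space. I'm not sure whether that is the cause, but files without spaces are listed with their absolute path, while the space splits this one into two entries, which probably means the sequence file cannot be found. Rename the sequence file and try again.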
#Sequence9File  /home/Pomgroup/gdp/mito/tree/allgenome/NC_045228_Eriobotrya
#Sequence9Format    FastA
#Sequence10File _japonica.fasta
#Sequence10Format   FastA
Try again:

Original alignment: 1688 positions
Gblocks alignment:  361 positions (21 %) in 9 selected block(s)


Sequence not in NBRF/PIR or FASTA format

Execution terminated
Only 44 blocks have conserved sequences.


The final concatenated sequences was writen in /home/Pomgroup/gdp/mito/tree/out/yanfu8_align.fasta

The location of each extracted modules on the final concatenated seq:

module_10 = 1-381;
module_11 = 382-1173;
module_12 = 1174-1850;
module_13 = 1851-2880;
module_14 = 2881-2929;
module_15 = 2930-3769;
module_16 = 3770-4232;
module_17 = 4233-4639;
module_18 = 4640-4899;
module_19 = 4900-8553;
module_2 = 8554-9001;
module_20 = 9002-9516;
module_21 = 9517-9871;
module_22 = 9872-10031;
module_23 = 10032-15235;
module_24 = 15236-16294;
module_25 = 16295-16949;
module_27 = 16950-17745;
module_28 = 17746-19272;
module_29 = 19273-21994;
module_3 = 21995-23974;
module_30 = 23975-25972;
module_32 = 25973-27268;
module_33 = 27269-27852;
module_34 = 27853-28765;
module_35 = 28766-29025;
module_36 = 29026-29201;
module_37 = 29202-31699;
module_38 = 31700-31715;
module_39 = 31716-31848;
module_4 = 31849-32628;
module_40 = 32629-32667;
module_41 = 32668-33836;
module_42 = 33837-35036;
module_43 = 35037-35843;
module_44 = 35844-36060;
module_45 = 36061-36300;
module_46 = 36301-37679;
module_49 = 37680-39263;
module_5 = 39264-42304;
module_6 = 42305-43814;
module_7 = 43815-43931;
module_8 = 43932-46833;
module_9 = 46834-47193;

The concatenated length is 47193 bp

I could cry.

8. Compute the A, G, C and T content

Total count, all bases: 396909
Adenine (A) count:  108381 (27.306%)
Thymine (T) count:  108304 (27.287%)
Guanine (G) count:  89454 (22.538%)
Cytosine (C) count: 90770 (22.869%)
%G~C content: 45.4
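The percentages follow directly from the four counts, which makes them easy to double-check. The sketch below recomputes them from the numbers reported above. The total of 396909 is 39 fewer than the 396948 bp that seqkit reported for the sample, which appears to match the 39 non-ACGT symbols left over in 6.1, although that link is an inference rather than something verified here.

```python
# Counts as reported above
counts = {"A": 108381, "T": 108304, "G": 89454, "C": 90770}

total = sum(counts.values())
percent = {base: 100.0 * n / total for base, n in counts.items()}
gc_percent = 100.0 * (counts["G"] + counts["C"]) / total

print(total)   # 396909
for base in "ATGC":
    print(base, f"{percent[base]:.3f}%")
print("GC", f"{gc_percent:.1f}")   # GC 45.4
```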

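  1. Plot the tree
getwd()
library(treeio)   # provides read.newick()
library(ggtree)
library(ggplot2)
# Read the IQ-TREE treefile; node labels are parsed as support values
a <- read.newick("yanfu8_align.fasta.treefile", node.label = "support")
# Draw a cladogram (branch lengths ignored) with tip labels and support values
ggtree(a, branch.length = "none") +
  geom_tiplab(size = 5, offset = 0.2) +
  geom_text2(aes(label = support), size = 4,
             hjust = 1.2, vjust = -1) +
  xlim(0, 10)   # widen the x axis so the tip labels are not cut off

a@phylo[3]   # inspect the third element of the underlying phylo object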
