精品人妻在线视频,亚洲91蜜桃精品宗合

本文主要工作

使用meme鑒定了SBT家族的蛋白質(zhì)模體組成

對(duì)meme鑒定結(jié)果進(jìn)行處理并用ggplot2進(jìn)行可視化

4.蛋白質(zhì)與基因結(jié)構(gòu)可視化分析

4.1 蛋白質(zhì)模體預(yù)測(cè)

Protein Motif這個(gè)概念比較混亂，需要在這里特別說(shuō)明。在生物化學(xué)中，一個(gè)比較清晰的英文定義是這樣給出的：

”P(pán)rotein motifs are small regions of protein three-dimensional structure or amino acid sequence shared among different proteins. They are recognizable regions of protein structure that may (or may not) be defined by a unique chemical or biological function.” (https://bio.libretexts.org/Bookshelves/Cell_and_Molecular_Biology/Book%3A_Basic_Cell_and_Molecular_Biology_(Bergtrom)/03%3A_Details_of_Protein_Structure/3.06%3A_Protein_Domains_Motifs_and_Folds_in_Protein_Structure))

翻譯成中文即為 “蛋白質(zhì)模體是不同蛋白質(zhì)中共有的、小區(qū)域的蛋白質(zhì)三維結(jié)構(gòu)或是氨基酸序列。它們是蛋白質(zhì)結(jié)構(gòu)中可被識(shí)別的區(qū)域，可能(或可能不)具備生物學(xué)功能?！币虼?，我們可以大致理解為結(jié)構(gòu)特別、可能具備一定功能的小的氨基酸序列，這個(gè)適用范圍是所有的蛋白質(zhì)的。而在基因家族分析中，它所適應(yīng)的范圍是我們所鑒定的基因家族內(nèi)部的所有蛋白質(zhì)，需要得到的模體短而保守，并可能具備一定的功能，從而借此呈現(xiàn)不同蛋白質(zhì)間的進(jìn)化關(guān)系。

在明細(xì)概念后，我們就可以進(jìn)行分析了。這里使用meme這款軟件在使用conda安裝后進(jìn)行本地化分析，當(dāng)然它也有網(wǎng)頁(yè)版的，但是靈活度上講我個(gè)人還是推薦本地分析。在這里的腳本中，我們使用之前鑒定得到的SBT家族蛋白質(zhì)序列。但是同樣為了美觀，我們要把蛋白質(zhì)序列中的序列號(hào)“.1”刪去。隨后使用meme鑒定motif。輸出的結(jié)果存放地址可以通過(guò)參數(shù)指定。此外，我在這里還通過(guò)seqtk comp生成了一個(gè)統(tǒng)計(jì)各個(gè)蛋白質(zhì)長(zhǎng)度的文件，這是為之后我們可視化蛋白質(zhì)模體分布服務(wù)的。

在meme輸出的文件夾中存放了不同類型的文件。其中以.png結(jié)尾的即為各個(gè)模體的seqlogo圖。模體的網(wǎng)頁(yè)版報(bào)告可以通過(guò).html結(jié)尾文件查看。而我們需要的信息存放在以.txtx結(jié)尾的文件中。由于該文件十分復(fù)雜，而我們想要提取想要的信息，需要依靠特殊的腳本解決，這里提供一個(gè)思路（python 處理）：

#!/usr/bin/env python

import re
import argparse

parser = argparse.ArgumentParser(description='meme file stat')
parser.add_argument('-i', '--input_file', type=str, required=True, metavar='<file name>',
                    help='Input txt file the meme outputs, i.e., meme.txt')
parser.add_argument('-o', '--output_file', type=str, required=True, metavar='<file name>', help='output stat file')
args = parser.parse_args()

content = []
motif_order = 1
pattern1 = "Motif ([A-Z]+) MEME-[0-9]+ sites sorted by position p-value"
pattern2 = "Aco[0-9]+\s+[0-9]+\s+[0-9]+\.[0-9]+e-[0-9]+\s+[A-Z]+\s+[A-Z]+\s+[A-Z]+"
Motif_info_dict = {}
Motif_sample_dict = {}
seq_motif_info_all = []

with open(args.input_file) as f:
    while True:
        content_list = f.readline()

        if not content_list:
            break
        else:
            content_list = content_list.replace("\t", "").replace("\n", "")
            if re.search("\+s", content_list):
                continue

        if re.search(pattern1, content_list):
            match = re.findall(pattern1, content_list)
            Motif_name = "Motif" + str(motif_order)
            motif_order = int(motif_order) + 1
            Motif_sample_dict["seq"] = match[0]
            Motif_sample_dict["length"] = len(match[0])
            Motif_info_dict[Motif_name] = Motif_sample_dict

        if re.search(pattern2, content_list):
            protein_motif_info_list = re.split("\s+", content_list)
            seq_name = protein_motif_info_list[0]
            seq_start = int(protein_motif_info_list[1])
            seq_end = int(Motif_sample_dict["length"]) + int(seq_start) - 1
            seq_motif = Motif_name
            seq_motif_seq = str(protein_motif_info_list[4])
            seq_motif_info = [seq_name, seq_motif, seq_start, seq_end, seq_motif_seq]
            seq_motif_info_all.append(seq_motif_info)

for i in seq_motif_info_all:
    output = i[0] + "\t" + i[1] + "\t" + str(i[2]) + "\t" + str(i[3]) + "\t" + i[4] + "\n"
    with open(args.output_file, "a") as f:
        f.write(output)

得到處理好的文件后，為了呈現(xiàn)文章中的效果，我們可以用ggplot2進(jìn)行可視化，結(jié)合我之前的文件在這里提供一個(gè)思路：

library(data.table)
library(tidyverse)

# 載入文本
pep_meme <- fread("./DATA/test.txt", header = F)
new_pep <- pep_meme %>%
  arrange(V1) %>%
  group_by(V1) 

pep_length <- fread("./DATA/length.stat.pep.txt", header = F)
pep_length <- mutate(pep_length, V3=0)

# 畫(huà)圖
ggplot() +
  geom_segment(data = new_pep, 
               aes(x = as.numeric(V3), 
                   y = V1,
                   xend = as.numeric(V4),
                   yend = V1,
                   color = V2),
               linewidth = 2.5, 
               position = position_nudge(y = 0.2)) +
  scale_color_brewer(palette = "Set3") +
  geom_segment(data = pep_length, 
               aes(x = as.numeric(V3),
                   y = V1,
                   xend = as.numeric(V2),
                   yend = V1),
               color = "grey",
               linewidth = 1) +
  scale_x_continuous(expand = c(0,0)) +
  labs(y = "Family",
       x = "Length",
       color = "Motif") +
  theme_classic()