男人天堂熟妇女日本优,好好的日无码视频大全,久久婷婷短片

背景

當(dāng)我們使用 CentriMo 進行motif 富集后，如采用下面的命令

$ centrimo -oc test test.500l.fa  $meme_motif

會在test目錄產(chǎn)生三個文件（centrimo.html，centrimo.tsv，site_counts.txt），用瀏覽器打開centrimo.html后，點擊motif id前面的復(fù)選框后，會在左邊位置的文本框顯現(xiàn)出motif匹配的序列。

image.png

因此，問題是我想要獲取每個motif匹配的序列。一開始，我想應(yīng)該centrimo.tsv，會有這吧，但是沒找到。然后又去centrimo的參數(shù)說明里看看有沒相關(guān)參數(shù)設(shè)置，找到了個，不把匹配序列放進html的參數(shù)。。。

image.png

既然如此那就只好自己弄了。

方法

既然能在html上顯示，那網(wǎng)頁源碼里肯定包含了這些信息。所以查看下網(wǎng)頁源碼：

image.png

分析下，通過var data定義個JavaScript對象（相當(dāng)于Python中的字典），sequences儲存著所有的序列，motifs下存儲著motif信息，每個motif以js對象表示，其中每個motif對象里，seqs又存儲著序列索引（sequences數(shù)組的索引，沒在圖片中給出，可自行查看自己的的網(wǎng)頁源碼）。

因此，我們只要提取出 data 對象中的sequences和motifs數(shù)據(jù), 根據(jù)motifs下的每個motif儲存的 seqs索引在sequences中取出對應(yīng)的序列就可以獲得了。

所以上代碼(Python)：

import re
import json
from bs4 import BeautifulSoup  ## 安裝 pip install beautifulsoup4


file = "centrimo.html"

## 讀取html文件
with open(file, "r") as f:
    html = f.read()


## 通過bs4 進行解析
soup = BeautifulSoup(html)

# 在script標(biāo)簽下，找到包含 var data的文本， \s*是用于匹配空格的
script = soup.find('script', text=re.compile('var\s*data\s*=\s*'))

## 通過正則匹配到 var data = {...}; 中data對象里的數(shù)據(jù) 
json_text = re.search('var\s*data\s*=\s*({.*?})\s*;$', script.string, flags=re.DOTALL | re.MULTILINE).group(1)

data = json.loads(json_text)  #加載成字典 
print(data.keys())

運行后，輸出是這樣，代表data數(shù)據(jù)提取成功

dict_keys(['version', 'revision', 'release', 'program', 'cmd', 'options', 'seqlen', 'tested', 'alphabet', 'background', 'sequence_db', 'motif_dbs', 'sequences', 'motifs'])

簡單的代碼就提取了。

sequences = data["sequences"]
motifs = data["motifs"]

extracted_seq = []  # 這里作為一個例子，把它們保存在列表里。

for motif in motifs:
    seqs_idx = motif["seqs"]
    seqs_id = [sequences[i] for i in seqs_idx]
    extracted_seq.append({"id": motif["id"], "seq": seqs_id})

print(extracted_seq[1]["seq"][:10])

# ['chr2:22587565-22588065', 'chrX:143482808-143483308', 'chr1:7397843-7398343', 'chr5:88764784-88765284', 'chr9:20651695-20652195', 'chr8:25016942-25017442', 'chr6:21949731-21950231', 'chr10:30842681-30843181', 'chr3:138442803-138443303', 'chr2:3283792-3284292']

之后做什么處理，就看各自的需求了。

在此，結(jié)束了。

參考

https://stackoverflow.com/questions/13323976/how-to-extract-a-json-object-that-was-defined-in-a-html-page-javascript-block-us
https://meme-suite.org/meme/doc/centrimo.html?man_type=web

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

meme CentriMo富集 motif匹配序列的獲取

meme CentriMo富集 motif匹配序列的獲取

背景

方法

參考

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

meme CentriMo富集 motif匹配序列的獲取

背景

方法

參考

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av