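Today I want to use Linux to recount how many bird orders, families, genera, and species each universal primer pair can amplify.
First, use Excel to filter out all the bird names from the rtfd, putting them in a column named NAME.
The resulting lists are 12sspecies_list.xlsx | 1cytb_list.xlsx | 2cytbspecies_list.xlsx | co1species_list.xlsx
This step uses Excel's Text to Columns (split) and cell merging.
The merge formula is =A1 & " " & B1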
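The same merge can also be done outside Excel. Here is a minimal sketch, assuming the genus and the specific epithet ended up in two separate columns after the split. The function name join_binomial is made up for illustration, and unlike the Excel formula it also trims stray spaces.

```python
def join_binomial(genus, epithet):
    """Join a genus and a specific epithet with one space, like =A1 & " " & B1."""
    # Strip whitespace left over from the split so the name contains exactly one space.
    return f"{genus.strip()} {epithet.strip()}"


print(join_binomial("Passer", "montanus"))
```

Trimming first matters because the name is later used as the NCBI search term.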
import pandas as pd
from Bio import Entrez
from time import sleep
# Set your email address (NCBI requires one)
Entrez.email = "maha2082@163.com"
# Read the birds' Latin names from Excel
input_file = '1cytb_list.xlsx'  # replace with your Excel file name
df = pd.read_excel(input_file, sheet_name=0)
# Create new columns to store the results
df['Order'] = ''
df['Family'] = ''
df['Genus'] = ''
df['Species'] = ''
# Define a function that queries the NCBI Taxonomy database
def fetch_taxonomy(name):
    try:
        handle = Entrez.esearch(db="taxonomy", term=name, retmode="xml")
        record = Entrez.read(handle)
        handle.close()
        if record['IdList']:
            # Use the first matching taxon ID
            taxon_id = record['IdList'][0]
            tax_handle = Entrez.efetch(db="taxonomy", id=taxon_id, retmode="xml")
            tax_record = Entrez.read(tax_handle)
            tax_handle.close()
            lineage = tax_record[0]['LineageEx']
            order = family = genus = ''
            for item in lineage:
                rank = item.get('Rank')
                # Separate variable so the queried name is not overwritten
                sci_name = item.get('ScientificName')
                if rank == 'order':
                    order = sci_name
                elif rank == 'family':
                    family = sci_name
                elif rank == 'genus':
                    genus = sci_name
            species = tax_record[0]['ScientificName']
            return order, family, genus, species
    except Exception as e:
        print(f"Error fetching data for {name}: {e}")
    # No match found or the lookup failed: return empty strings
    return '', '', '', ''
# Query each species in turn
for i, row in df.iterrows():
    species_name = row['NAME']
    print(f"Fetching data for {species_name}...")
    order, family, genus, species = fetch_taxonomy(species_name)
    df.at[i, 'Order'] = order
    df.at[i, 'Family'] = family
    df.at[i, 'Genus'] = genus
    df.at[i, 'Species'] = species
    sleep(0.5)  # pause between requests to avoid being blocked by the NCBI server
# Save the results to a new Excel file
output_file = '1cytbbird_species_classification.xlsx'  # change the file name as needed
df.to_excel(output_file, index=False)
print(f"Results saved to {output_file}")
# Save the script in the working directory as classify.py, update the file names, then run it directly
python3 classify.py
The generated Excel files are:
12sbird_species_classification.xlsx
1cytbbird_species_classification.xlsx
...
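The tally itself (how many orders, families, genera, and species each primer pair covers) can be sketched in a few lines. This assumes each row is a dict with the Order, Family, Genus, and Species keys written by the script, for example from `df.to_dict("records")`. The function name count_taxa and the sample rows are made up for illustration and are not real results.

```python
RANKS = ("Order", "Family", "Genus", "Species")


def count_taxa(rows):
    """Return the number of distinct non-empty names for each rank."""
    counts = {}
    for rank in RANKS:
        names = set()
        for row in rows:
            value = row.get(rank)
            # Skip blanks and non-string cells (empty Excel cells read back as NaN).
            if isinstance(value, str) and value.strip():
                names.add(value.strip())
        # The set collapses repeated names, so duplicates are counted once.
        counts[rank] = len(names)
    return counts


# Hypothetical sample rows for illustration only.
sample = [
    {"Order": "Passeriformes", "Family": "Passeridae", "Genus": "Passer", "Species": "Passer montanus"},
    {"Order": "Passeriformes", "Family": "Passeridae", "Genus": "Passer", "Species": "Passer domesticus"},
    {"Order": "Anseriformes", "Family": "Anatidae", "Genus": "Anas", "Species": "Anas platyrhynchos"},
    {"Order": "", "Family": "", "Genus": "", "Species": ""},
]
print(count_taxa(sample))
```

Rows whose lookup failed are left out of the count, so the totals are only final once the blanks have been filled in by hand.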
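Sort each Excel file by order, family, genus, and species, fill in any missing entries, and delete duplicate species.
To find the duplicates: select the Species column, then Format, Conditional Formatting, New Rule, "Format only unique or duplicate values", choose a color as the format, click OK, then click OK again.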
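The duplicate check can also be sketched in code instead of conditional formatting. This assumes the Species column has been pulled into a plain list of strings. The helpers find_duplicates and drop_duplicates are hypothetical names, and the sample list is for illustration only.

```python
from collections import Counter


def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(name.strip() for name in names if name and name.strip())
    return [name for name, n in counts.items() if n > 1]


def drop_duplicates(names):
    """Keep the first occurrence of each non-empty name and drop later repeats."""
    seen = set()
    unique = []
    for name in names:
        key = name.strip() if name else ""
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical sample list for illustration only.
species = ["Passer montanus", "Anas platyrhynchos", "Passer montanus", ""]
print(find_duplicates(species))
print(drop_duplicates(species))
```

Matching is exact after trimming whitespace, so spelling variants or synonyms of the same species still need a manual check.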