這次準(zhǔn)備介紹的是一個使用簡便的Python分析包:ipyrad。(在此特別感謝虞米兒~)
ipyrad 官方文檔: https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-abba-baba.html
對我來說比較有用的是下面這幾個模塊:
vcf_to_hdf5
treemix
PCA
RAxML
STRUCTURE
abba-baba
先來介紹一下PCA模塊~
下載
作者推薦使用conda下載,最好下載比較新的版本(python v 3.7)
conda install ipyrad=0.9.63 -c bioconda
到這主體部分就下好了。
PCA分析
下載額外用到的模塊
conda install scikit-learn -c bioconda
conda install toyplot -c eaton-lab
將VCF文件轉(zhuǎn)換為haf5格式
vcf2hdf5.py:
### name : output_file_name data : input_vcf ld_block_size : window_size
import ipyrad.analysis as ipa
import pandas as pd
converter = ipa.vcf_to_hdf5(
name="test_LD10k",
data="all_snp_1.vcf.gz",
ld_block_size=10000,
)
converter.run()
運(yùn)行結(jié)果在 analysis-vcf2hdf5 文件夾下
運(yùn)行PCA分析
ipyrad_pca.py :
import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf
data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"
imap = {
"AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
"ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
}
minmap = {i: 0.5 for i in imap}
pca = ipa.pca(data = data,
imap = imap,
minmap = minmap,
mincov = 0.75,
impute_method = "sample",
)
pca.run(nreplicates=25,seed=12345)
df = pd.DataFrame(pca.pcaxes[0], index=pca.names)
df.to_csv("pca_analysis.csv")
canvas, axes = pca.draw(0,2)
toyplot.pdf.render(canvas, "PCA-out.pdf")
這里產(chǎn)生兩個結(jié)果
pca_analysis.csv PCA分析結(jié)果文件:

1.png
PCA-out.pdf 結(jié)果圖:

1.png
這里運(yùn)用的是pca.run(nreplicates=25,seed=12345) 方法,可以根據(jù)說明書自行修改。
tSNE分析
tSNE是降維分析的一種,具體原理還不是很懂,這個方法也提供了tSNE方法。
ipyrad_tsne.py :
import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf
data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"
imap = {
"AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
"ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
}
minmap = {i: 0.5 for i in imap}
pca = ipa.pca(data = data,
imap = imap,
minmap = minmap,
mincov = 0.75,
impute_method = "sample",
)
pca.run_tsne(subsample=True, perplexity=4.0, n_iter=100000, seed=123)
canvas, axes = pca.draw()
toyplot.pdf.render(canvas, "tsne-out.pdf")
tsne-out.pdf :

2.png
PCA這里就寫這么多,只是編寫了我的做法,更詳細(xì)的內(nèi)容還是去看官方文檔吧~
ipyrad后續(xù)內(nèi)容,且聽下回。