ipyrad——主成分分析(一)

這次準(zhǔn)備介紹的是一個使用簡便的Python分析包:ipyrad。(在此特別感謝虞米兒~)
ipyrad 官方文檔: https://ipyrad.readthedocs.io/en/latest/API-analysis/cookbook-abba-baba.html
對我來說比較有用的是下面這幾個模塊:

vcf_to_hdf5
treemix
PCA
RAxML
STRUCTURE
abba-baba

先來介紹一下PCA模塊~

下載

作者推薦使用conda下載,最好下載比較新的版本(python v 3.7)

conda install ipyrad=0.9.63 -c bioconda

到這主體部分就下好了。

PCA分析

下載額外用到的模塊

conda install scikit-learn -c bioconda
conda install toyplot -c eaton-lab

將VCF文件轉(zhuǎn)換為haf5格式

vcf2hdf5.py:

### name : output_file_name  data : input_vcf  ld_block_size : window_size
import ipyrad.analysis as ipa
import pandas as pd

converter = ipa.vcf_to_hdf5(
        name="test_LD10k",
        data="all_snp_1.vcf.gz",
        ld_block_size=10000,
)

converter.run()

運(yùn)行結(jié)果在 analysis-vcf2hdf5 文件夾下

運(yùn)行PCA分析

ipyrad_pca.py :

import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf

data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"

imap = {
        "AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
        "ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
        }

minmap = {i: 0.5 for i in imap}

pca = ipa.pca(data = data,
        imap = imap,
        minmap = minmap,
        mincov = 0.75,
        impute_method = "sample",
        )

pca.run(nreplicates=25,seed=12345)

df = pd.DataFrame(pca.pcaxes[0], index=pca.names)
df.to_csv("pca_analysis.csv")

canvas, axes = pca.draw(0,2)
toyplot.pdf.render(canvas, "PCA-out.pdf")

這里產(chǎn)生兩個結(jié)果
pca_analysis.csv PCA分析結(jié)果文件:

1.png

PCA-out.pdf 結(jié)果圖:
1.png

這里運(yùn)用的是pca.run(nreplicates=25,seed=12345) 方法,可以根據(jù)說明書自行修改。

tSNE分析

tSNE是降維分析的一種,具體原理還不是很懂,這個方法也提供了tSNE方法。
ipyrad_tsne.py :

import ipyrad.analysis as ipa
import pandas as pd
import toyplot
import toyplot.pdf

data = "analysis-vcf2hdf5/test_LD10k.snps.hdf5"

imap = {
        "AS": ["AS1","AS2","AS3","AS4","AS5","AS6","AS7","AS8","AS9","AS10","AS11","AS12","AS13","AS14","AS15","AS16","AS17","AS18","AS19"],
        "ES": ["ES1","ES2","ES3","ES4","ES5","ES6","ES7","ES8","ES9","ES10","ES11","ES12","ES13","ES14","ES15","ES16","ES17","ES18","ES19","ES20"],
        }

minmap = {i: 0.5 for i in imap}

pca = ipa.pca(data = data,
        imap = imap,
        minmap = minmap,
        mincov = 0.75,
        impute_method = "sample",
        )

pca.run_tsne(subsample=True, perplexity=4.0, n_iter=100000, seed=123)

canvas, axes = pca.draw()
toyplot.pdf.render(canvas, "tsne-out.pdf")

tsne-out.pdf :

2.png

PCA這里就寫這么多,只是編寫了我的做法,更詳細(xì)的內(nèi)容還是去看官方文檔吧~

ipyrad后續(xù)內(nèi)容,且聽下回。

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容