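Background: A QIIME 2 artifact stores the inputs and outputs of QIIME 2 together with the associated metadata, and records how each result was produced. However, the artifacts QIIME 2 produces (such as .qza files, even though they are just compressed archives) cannot be read directly into R; they first have to be converted, through a series of steps, into files that R accepts. The qiime2R package simplifies the path from QIIME 2 artifacts to R input while preserving as much of the information in the artifact as possible. This is done mainly through the read_qza function.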
How it works: The artifact is unpacked into a temporary directory, and the raw data and associated metadata are read into a named list (see below). Data are typically returned as a data.frame, a phylo object (trees), or a DNAStringSet (nucleic acid sequences).
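Because a .qza file is an ordinary zip archive, the unpacking step can be illustrated in base R. The sketch below is not qiime2R's implementation: `describe_qza_members()` is a hypothetical helper, and the UUID and file names are invented placeholders that follow the usual artifact layout (a top-level UUID directory containing `data/` and `provenance/`). With a real file, the member names would come from `utils::unzip("table.qza", list = TRUE)$Name`.

```r
# Hypothetical helper (not part of qiime2R): group archive member names
# by the section of the artifact they belong to.
describe_qza_members <- function(member_names) {
  parts <- strsplit(member_names, "/", fixed = TRUE)
  # The first path component is the artifact's UUID directory.
  uuid <- unique(vapply(parts, function(p) p[1], character(1)))
  # Files sitting directly under the UUID directory are labelled "top-level".
  section <- vapply(
    parts,
    function(p) if (length(p) >= 3) p[2] else "top-level",
    character(1)
  )
  list(
    uuid = uuid,
    data_files = member_names[section == "data"],
    provenance_files = member_names[section == "provenance"]
  )
}

# Placeholder member names; the UUID is invented for illustration only.
uuid <- "00000000-0000-0000-0000-000000000000"
members <- file.path(uuid, c(
  "VERSION",
  "metadata.yaml",
  "data/feature-table.biom",
  "provenance/metadata.yaml",
  "provenance/action/action.yaml"
))

info <- describe_qza_members(members)
print(info$uuid)
print(info$data_files)
```

The `data_files` entry corresponds to what ends up in the `data` element of the returned list, and `provenance_files` to what `provenance` is built from.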
2. Installing the qiime2R package
Install from GitHub:
if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
devtools::install_github("jbisanz/qiime2R")
3. Reading artifacts (.qza)
Call the read_qza function on a .qza file, for example:
SVs<-read_qza("table.qza")
names(SVs)
[1] "uuid" "type" "format" "contents" "version"
[6] "data" "provenance"
SVs$data[1:5,1:5] #show first 5 samples and first 5 taxa
# L1S105 L1S140 L1S208 L1S257 L1S281
#4b5eeb300368260019c1fbc7a3c718fc 2183 0 0 0 0
#fe30ff0f71a38a39cf1717ec2be3a2fc 5 0 0 0 0
#d29fe3c70564fc0f69f2c03e0d1e5561 0 0 0 0 0
#868528ca947bc57b69ffdf83e6b73bae 0 2249 2117 1191 1737
#154709e160e8cada6bfb21115acc80f5 802 1174 694 406 242
data: the raw data, e.g. an OTU table as a matrix, or a tree in phylo format
uuid: the unique identifier of the artifact
type: the semantic type of the object (e.g. FeatureData[Sequence])
format: the format of the QIIME 2 artifact
provenance: information tracking how the object was created
contents: a table of all the files contained within the artifact and their file sizes
version: the reported version for the artifact; a warning may be thrown if a new version is seen
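As a small illustration of working with the `data` element, the sketch below rebuilds the 5 x 5 corner printed above as a plain matrix, so it runs without the artifact or qiime2R. The variable names (`counts`, `depth`, `prevalence`) are chosen here for illustration; with a real artifact, `counts` would simply be `SVs$data`.

```r
# Feature and sample IDs copied from the printed corner of the table.
feature_ids <- c(
  "4b5eeb300368260019c1fbc7a3c718fc",
  "fe30ff0f71a38a39cf1717ec2be3a2fc",
  "d29fe3c70564fc0f69f2c03e0d1e5561",
  "868528ca947bc57b69ffdf83e6b73bae",
  "154709e160e8cada6bfb21115acc80f5"
)
sample_ids <- c("L1S105", "L1S140", "L1S208", "L1S257", "L1S281")

# Features are rows and samples are columns, as in SVs$data.
counts <- matrix(
  c(2183,    0,    0,    0,    0,
       5,    0,    0,    0,    0,
       0,    0,    0,    0,    0,
       0, 2249, 2117, 1191, 1737,
     802, 1174,  694,  406,  242),
  nrow = 5, byrow = TRUE,
  dimnames = list(feature_ids, sample_ids)
)

# Total counts per sample within this corner of the table.
depth <- colSums(counts)
# Number of samples in which each feature is observed at least once.
prevalence <- rowSums(counts > 0)

print(depth)
print(unname(prevalence))
```

These totals cover only the five features shown, not the full sequencing depth of each sample.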
4. Reading metadata
Use the read_q2metadata() function:
metadata<-read_q2metadata("sample-metadata.tsv")
head(metadata) # show top lines of metadata
# SampleID barcode-sequence body-site year month day subject reported-antibiotic-usage days-since-experiment-start
#2 L1S8 AGCTGACTAGTC gut 2008 10 28 subject-1 Yes 0
#3 L1S57 ACACACTATGGC gut 2009 1 20 subject-1 No 84
#4 L1S76 ACTACGTGTGGT gut 2009 2 17 subject-1 No 112
#5 L1S105 AGTGCGATGCGT gut 2009 3 17 subject-1 No 140
#6 L2S155 ACGATGCGACCA left palm 2009 1 20 subject-1 No 84
#7 L2S175 AGCTATCCACGA left palm 2009 2 17 subject-1 No 112
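QIIME 2 metadata files carry a `#q2:types` line directly under the header that declares each column as `categorical` or `numeric`. The base R sketch below shows one way such a line can drive column typing. It is not the code inside read_q2metadata(): the `sample-id` header name, the reduced set of columns, and the choice to store categorical columns as factors are assumptions made for illustration. The row values are taken from the output above.

```r
# A tiny in-memory metadata file: header, q2:types line, then data rows.
lines <- c(
  "sample-id\tbody-site\tyear\tdays-since-experiment-start",
  "#q2:types\tcategorical\tnumeric\tnumeric",
  "L1S8\tgut\t2008\t0",
  "L1S57\tgut\t2009\t84",
  "L2S155\tleft palm\t2009\t84"
)

header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
# Drop the leading "#q2:types" cell; the rest lines up with header[-1].
types <- strsplit(lines[2], "\t", fixed = TRUE)[[1]][-1]

# Read every column as text first, then convert according to the types line.
metadata <- read.table(
  text = lines[-(1:2)], sep = "\t", header = FALSE,
  col.names = header, check.names = FALSE,
  colClasses = "character", quote = "", comment.char = ""
)
for (i in seq_along(types)) {
  column <- header[i + 1]
  metadata[[column]] <- if (types[i] == "numeric") {
    as.numeric(metadata[[column]])
  } else {
    factor(metadata[[column]])
  }
}

str(metadata)
```

Reading everything as character first avoids R guessing a type that disagrees with the declared one, for example a numeric-looking subject code that is meant to be categorical.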
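5. Reading taxonomy
When read_qza reads a taxonomy artifact, it returns the feature ID, the unsplit taxonomic annotation, and a confidence score. Downstream analyses need the annotation split into its individual ranks (kingdom, phylum, class, order, family, genus, species), and parse_taxonomy() does exactly that.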
taxonomy<-read_qza("taxonomy.qza")
head(taxonomy$data)
# Feature.ID Taxon Confidence
#1 4b5eeb300368260019c1fbc7a3c718fc k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__ 0.9972511
#2 fe30ff0f71a38a39cf1717ec2be3a2fc k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria 0.9799427
#3 d29fe3c70564fc0f69f2c03e0d1e5561 k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus 1.0000000
#4 868528ca947bc57b69ffdf83e6b73bae k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__ 0.9955859
#5 154709e160e8cada6bfb21115acc80f5 k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides 1.0000000
#6 1d2e5f3444ca750c85302ceee2473331 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae 0.9455365
taxonomy<-parse_taxonomy(taxonomy$data)
head(taxonomy)
# Kingdom Phylum Class Order Family Genus Species
#4b5eeb300368260019c1fbc7a3c718fc Bacteria Bacteroidetes Bacteroidia Bacteroidales Bacteroidaceae Bacteroides <NA>
#fe30ff0f71a38a39cf1717ec2be3a2fc Bacteria Proteobacteria Betaproteobacteria Neisseriales Neisseriaceae Neisseria <NA>
#d29fe3c70564fc0f69f2c03e0d1e5561 Bacteria Firmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus <NA>
#868528ca947bc57b69ffdf83e6b73bae Bacteria Bacteroidetes Bacteroidia Bacteroidales Bacteroidaceae Bacteroides <NA>
#154709e160e8cada6bfb21115acc80f5 Bacteria Bacteroidetes Bacteroidia Bacteroidales Bacteroidaceae Bacteroides <NA>
#1d2e5f3444ca750c85302ceee2473331 Bacteria Proteobacteria Gammaproteobacteria Pasteurellales Pasteurellaceae Haemophilus parainfluenzae
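To make the splitting step concrete, here is a base R sketch of the same idea applied to three of the Taxon strings shown above. It is not the code inside parse_taxonomy(): `split_taxon()` is a hypothetical helper, and it assumes Greengenes-style `k__`/`p__` prefixes separated by semicolons.

```r
taxon <- c(
  "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
  "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria",
  "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"
)
ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical helper: split one annotation string into seven ranks.
split_taxon <- function(x) {
  parts <- strsplit(x, ";\\s*")[[1]]
  parts <- sub("^[a-z]__", "", parts)   # strip the rank prefix, e.g. "g__"
  parts[parts == ""] <- NA_character_   # an empty rank such as "s__" becomes NA
  length(parts) <- length(ranks)        # pad missing trailing ranks with NA
  parts
}

parsed <- t(vapply(taxon, split_taxon, character(length(ranks)), USE.NAMES = FALSE))
colnames(parsed) <- ranks
parsed <- as.data.frame(parsed, stringsAsFactors = FALSE)

print(parsed[, c("Genus", "Species")])
```

Both an empty `s__` and an annotation that stops at genus end up as `NA` in the Species column, which matches the `<NA>` entries in the parsed output above.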
6. Creating a phyloseq object
The qza_to_phyloseq() function chains several read_qza() calls together to build a phyloseq object for downstream analysis:
physeq<-qza_to_phyloseq(
features="inst/artifacts/2020.2_moving-pictures/table.qza",
tree="inst/artifacts/2020.2_moving-pictures/rooted-tree.qza",
taxonomy="inst/artifacts/2020.2_moving-pictures/taxonomy.qza",
metadata = "inst/artifacts/2020.2_moving-pictures/sample-metadata.tsv"
)
physeq
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 759 taxa and 34 samples ]
## sample_data() Sample Data: [ 34 samples by 10 sample variables ]
## tax_table() Taxonomy Table: [ 759 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 759 tips and 757 internal nodes ]
7. Other functions
- read_qza() - Function for reading artifacts (.qza).
- qza_to_phyloseq() - Imports multiple artifacts to produce a phyloseq object.
- read_q2metadata() - Reads a QIIME 2 metadata file (containing the q2-types definition line; the second line of the metadata file must declare which columns are categorical text and which are numeric).
- write_q2manifest() - Writes a read manifest file to import data into qiime2
- theme_q2r() - A ggplot2 theme for clean figures.
- print_provenance() - A function to display provenance information, i.e. the steps that produced the data.
- is_q2metadata() - A function to check if a file is a qiime2 metadata file.
- parse_taxonomy() - A function to parse taxonomy strings and return a table where each column is a taxonomic rank.
- parse_ordination() - A function to parse the internal ordination format.
- read_q2biom() - A function for reading QIIME2 biom files in format v2.1
- make_clr() - Transform feature table using centered log2 ratio.
- make_proportion() - Transform feature table to proportion (sum to 1).
- make_percent() - Transform feature table to percent (sum to 100).
- interactive_table() - Create an interactive table in the RStudio viewer or rmarkdown html.
- summarize_taxa() - Create a list of tables with abundances summed to each taxonomic level.
- taxa_barplot() - Create a stacked barplot using ggplot2.
- taxa_heatmap() - Create a heatmap of taxonomic abundances using ggplot2.
- corner() - Show top corner of a large table-like object.
- min_nonzero() - Find the smallest non-zero, non-NA value in a numeric vector.
- mean_sd() - Return mean and standard deviation for plotting.
- subsample_table() - Subsample a table with or without replacement.
- filter_features() - Remove low abundance features by number of counts and number of samples they appear in.
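The transformation helpers in the list above are easy to reason about once written out in base R. The sketch below mirrors what make_proportion(), make_percent(), and make_clr() are described as doing, but it is not the package's code: the 0.5 pseudocount added before taking log2 is an assumption made here to avoid log(0), and the small count matrix is a subset of the table corner shown in section 3.

```r
# Three features (rows) by three samples (columns), taken from the table above.
counts <- matrix(
  c(2183,    0,    0,
       0, 2249, 2117,
     802, 1174,  694),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("4b5eeb30", "868528ca", "154709e1"),   # feature IDs shortened for display
    c("L1S105", "L1S140", "L1S208")
  )
)

# Proportion: divide each sample (column) by its total so columns sum to 1.
proportion <- sweep(counts, 2, colSums(counts), "/")

# Percent: the same values scaled so that columns sum to 100.
percent <- 100 * proportion

# Centered log2 ratio: log2 of counts plus a pseudocount (assumed 0.5 here),
# minus the per-sample mean of those log values.
pseudocount <- 0.5
log_counts <- log2(counts + pseudocount)
clr <- sweep(log_counts, 2, colMeans(log_counts), "-")

print(round(proportion, 3))
print(round(colMeans(clr), 10))
```

After centering, each sample's CLR values average to zero, which is a quick sanity check that the transform was applied per sample (column) rather than per feature (row).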