Taxonomic Composition Analysis
A. Taxa Summary Plots
http://qiime.org/scripts/summarize_taxa_through_plots.html
Description
Taxa summary plots are useful for viewing the contribution of each taxon to a sample or a group of samples. Generating a taxa abundance plot is also done with a workflow script (similar to the alpha and beta diversity workflows). The command is called summarize_taxa_through_plots.py, which performs all the necessary steps for summarizing, processing and plotting the data. The script is actually composed of 3 different steps/scripts (a parameters file can be used to customize each of the steps).
Note: The first step (collapse samples and/or sort OTU table) is actually optional, depending on whether you want to show the data for each sample or merge groups of samples together.
collapse_samples.py and/or sort_otu_table.py (http://qiime.org/scripts/collapse_samples.html / http://qiime.org/scripts/sort_otu_table.py)
summarize_taxa.py (http://qiime.org/scripts/summarize_taxa.html)
plot_taxa_summary.py (http://biocore.github.io/emperor/build/html/scripts/plot_taxa_summary.py.html)
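To make the summarizing step more concrete, the sketch below shows roughly what collapsing an OTU table to a taxonomic level involves: the counts of all OTUs that share the same lineage prefix are summed per sample. This is a plain-Python illustration, not QIIME's implementation; the function name summarize_by_level, the toy table and the semicolon-separated taxonomy strings are assumptions made for the example.

```python
from collections import defaultdict


def summarize_by_level(otu_counts, taxonomy, level):
    """Sum OTU counts per sample for OTUs sharing the first `level` ranks.

    otu_counts: {otu_id: {sample_id: count}}
    taxonomy:   {otu_id: "k__Bacteria; p__Firmicutes; ..."}
    level:      number of ranks to keep (1 = kingdom, 2 = phylum, ...)
    """
    summary = defaultdict(lambda: defaultdict(int))
    for otu_id, per_sample in otu_counts.items():
        ranks = [r.strip() for r in taxonomy[otu_id].split(";")]
        lineage = ";".join(ranks[:level])
        for sample_id, count in per_sample.items():
            summary[lineage][sample_id] += count
    return {lineage: dict(samples) for lineage, samples in summary.items()}


# Hypothetical toy data: three OTUs observed in two samples.
otu_counts = {
    "OTU1": {"S1": 10, "S2": 0},
    "OTU2": {"S1": 5, "S2": 20},
    "OTU3": {"S1": 1, "S2": 4},
}
taxonomy = {
    "OTU1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "OTU2": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "OTU3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia",
}

# Collapse to phylum level (two ranks kept).
phylum_table = summarize_by_level(otu_counts, taxonomy, level=2)
for lineage, samples in sorted(phylum_table.items()):
    print(lineage, samples)
```

Here the two Firmicutes OTUs end up in a single phylum-level row, which is the kind of table the plotting step then draws.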
Parameters
--otu_table_fp | -i
Input OTU table in .biom format
--output_dir | -o
The name and location of the output folder. The folder will contain an .html file, which is an interactive document for viewing your data.
--mapping_fp | -m
The mapping file that corresponds to the input OTU table
--mapping_category | -c (optional)
The name of the mapping file column to use for showing samples as group averages.
--parameter_fp | -p (optional)
A parameters file with settings for the individual taxa abundance plot commands.
--sort | -s (optional)
Whether to sort the data along the x-axis.
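As an illustration of the --parameter_fp option: a QIIME parameters file is a plain text file in which each line has the form script_name:option_name value. The short Python sketch below writes such a file. The file name taxa_params.txt and the chosen values are assumptions for the example, and the option names (level for summarize_taxa.py, chart_type for plot_taxa_summary.py) should be checked against the script documentation for your QIIME version.

```python
# Hypothetical settings: summarize at levels 2-6 (phylum to genus)
# and draw bar charts only.
params = {
    "summarize_taxa:level": "2,3,4,5,6",
    "plot_taxa_summary:chart_type": "bar",
}

params_path = "taxa_params.txt"  # assumed file name
with open(params_path, "w") as handle:
    for key, value in params.items():
        # One "script_name:option_name value" entry per line.
        handle.write(f"{key} {value}\n")

# Read the file back to show what was written.
with open(params_path) as handle:
    params_text = handle.read()
print(params_text)
```

The resulting file would then be passed to the workflow script with -p taxa_params.txt.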
Command
# Plot group average
summarize_taxa_through_plots.py \
-i otu_table.biom \
-o taxa_summary_plots \
-m mapping_file.txt \
-c SampleType
# Plot each sample
summarize_taxa_through_plots.py \
-i otu_table.biom \
-o taxa_summary_plots \
-m mapping_file.txt
Output
The resulting folder contains many different files. The output of interest is the .html file in the taxa_summary_plots folder.
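When -c is given, samples are shown as group averages rather than individually. A minimal sketch of that idea, assuming a table of per-sample relative abundances and a sample-to-group mapping (both hypothetical), averages each taxon over the samples in a group. QIIME's own collapsing step may differ in detail.

```python
from collections import defaultdict


def group_means(rel_abund, sample_groups):
    """Average each taxon's relative abundance over the samples in a group.

    rel_abund:     {sample_id: {taxon: fraction}}
    sample_groups: {sample_id: group_label}
    """
    members = defaultdict(list)
    for sample_id, group in sample_groups.items():
        members[group].append(sample_id)

    means = {}
    for group, sample_ids in members.items():
        # Collect every taxon seen in any sample of this group.
        taxa = set()
        for sample_id in sample_ids:
            taxa.update(rel_abund[sample_id])
        # A taxon missing from a sample counts as 0 for that sample.
        means[group] = {
            taxon: sum(rel_abund[s].get(taxon, 0.0) for s in sample_ids)
            / len(sample_ids)
            for taxon in taxa
        }
    return means


# Hypothetical relative abundances and SampleType-like grouping.
rel_abund = {
    "S1": {"Firmicutes": 0.5, "Bacteroidetes": 0.5},
    "S2": {"Firmicutes": 0.75, "Bacteroidetes": 0.25},
    "S3": {"Firmicutes": 0.25, "Proteobacteria": 0.75},
}
sample_groups = {"S1": "gut", "S2": "gut", "S3": "skin"}

averaged = group_means(rel_abund, sample_groups)
print(averaged)
```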
B. Abundance Significance
http://qiime.org/scripts/group_significance.html
Description
After generating the taxa abundance plots, you may want to know which OTUs differ significantly between 2 or more groups. Similar to the statistical scripts seen in the alpha and beta diversity commands, there is a dedicated script that calculates significance for abundances.
This is one method for finding taxa of interest between 2 or more groups. Another popular method is LEfSe which uses similar methods to detect biomarker species. See Section 8.3 for more information about using LEfSe.
Also see http://qiime.org/scripts/differential_abundance.html for more in-depth information on using DESeq2 for differential abundance analysis.
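To give a feel for what such a test computes for a single OTU, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic. It is an illustration only, not the code QIIME runs: the function names and toy abundances are assumptions, tied values receive average ranks, and no tie correction or p-value calculation is included.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [value for group in groups for value in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical relative abundances of one OTU in two sample groups.
gut = [0.10, 0.12, 0.15]
skin = [0.30, 0.28, 0.35]
h_stat = kruskal_wallis_h([gut, skin])
print(round(h_stat, 3))
```

A larger H means the groups' ranks are more clearly separated. In practice the statistic is converted to a p-value, and because one test is run per OTU, the p-values need correction for multiple testing.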
Parameters
--otu_table_fp | -i
Input OTU table in .biom format
--output_fp | -o
The name and location of the new text file that will be generated to store the table of your results.
--mapping_fp | -m
The mapping file that corresponds to the input OTU table
--category | -c
The column variable name to use for comparing 2 or more groups
--test | -s
The type of significance test to use. See http://qiime.org/scripts/group_significance.html for more information about test assumptions and choosing which test to use for your data.
Command
group_significance.py \
-i otu_table.biom \
-o kruskal_wallis_test.txt \
-m mapping_file.txt \
-c SampleType \
-s kruskal_wallis
C. Normalizing Sample Abundances
http://qiime.org/scripts/normalize_table.html
Description
Another way of normalizing the abundance data, besides relative abundances ($$x/SampleSum(x)$$), is to use methods such as CSS or DESeq2. After normalization, the resulting files can be used with more advanced statistical methods such as metagenomeSeq’s fitZIG and DESeq2 differential abundance testing.
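For reference, the relative abundance calculation mentioned above can be written out as a short Python sketch. The function name and the toy counts are assumptions for illustration; CSS and DESeq2 normalization are more involved and are not shown here.

```python
def relative_abundance(sample_counts):
    """Divide each count by the sample total: x / SampleSum(x)."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {otu: count / total for otu, count in sample_counts.items()}


# Hypothetical OTU counts for a single sample.
sample = {"OTU1": 30, "OTU2": 10, "OTU3": 60}
fractions = relative_abundance(sample)
print(fractions)
```

After this transformation the values within each sample sum to 1, so samples sequenced to different depths become comparable as proportions.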
Parameters
--input_path | -i
The location and file name of the input OTU table.
--out_path | -o
The location and file name of the new normalized OTU table in .biom format.
--algorithm | -a
Which algorithm to use for normalization (CSS or DESeq2)
Command
normalize_table.py \
-i otu_table.biom \
-o otu_table_deseq2_normalized.biom \
-a DESeq2
D. Correlating Taxa Abundances and a Variable
http://qiime.org/scripts/observation_metadata_correlation.html
Description
One interesting question is which of your metadata variables correlate with taxonomic abundances. To answer it, you can use observation_metadata_correlation.py to correlate variables in your mapping file with OTU abundances and see which correlate negatively or positively.
There are additional tools outside of QIIME for correlating taxonomic abundances with metadata variables and for multivariate analysis. See Section 8.3 for more information about MaAsLin.
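As a rough illustration of what a rank correlation between one OTU and one metadata variable measures, the sketch below computes Spearman's rho in plain Python by ranking both series and taking the Pearson correlation of the ranks. The function names and the toy data are assumptions; this is not QIIME's implementation and it computes no p-value.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: days since experiment start and one OTU's counts.
days = [0, 7, 14, 21, 28]
otu_counts = [5, 9, 20, 18, 40]
rho = spearman(days, otu_counts)
print(round(rho, 3))
```

A value near 1 indicates that the OTU tends to increase with the variable, a value near -1 that it tends to decrease, and a value near 0 that there is no monotonic relationship.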
Parameters
--otu_table_fp | -i
Input OTU table in .biom format
--output_fp | -o
The location and filename of the output text file that contains the table of results
--mapping_fp | -m
The mapping file that corresponds to the input OTU table
--category | -c
The name of the column variable within the input mapping file which you would like to correlate with OTU abundances
--test | -s
The type of correlation test to use. This can be spearman, pearson, kendall or cscore. See http://qiime.org/scripts/observation_metadata_correlation.html for more information about the test assumptions.
Command
observation_metadata_correlation.py \
-i otu_table.biom \
-o correlation_results.txt \
-m mapping_file.txt \
-c DaysSinceExperimentStart \
-s spearman
E. Supervised learning
http://qiime.org/scripts/supervised_learning.html
Description
A more advanced way of using taxa abundance and sample features is to use supervised learning methods, which use machine learning techniques to build classifier models based on abundances.
It is recommended to apply a sample depth cutoff (~1000) and to use a rarefied table before running the command. Additionally, you can use many different rarefied tables to reduce the error that may be introduced by taking only a single rarefaction. It is important to choose a rarefaction depth that retains many of the samples but is still relatively high. Using the median sampling depth is one common recommendation.
Single rarefaction at one depth http://qiime.org/scripts/single_rarefaction.html
# Single Rarefaction
single_rarefaction.py \
-i otu_table.biom \
-o otu_table_even1000.biom \
-d 1000
Multiple rarefactions at the same depth http://qiime.org/scripts/multiple_rarefactions_even_depth.html
# Multiple Rarefactions
multiple_rarefactions_even_depth.py \
-i otu_table.biom \
-o rarefied_otu_tables/ \
-d 100 \
-n 10
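Conceptually, rarefying means randomly subsampling every sample down to the same number of reads, and repeating this several times gives a set of rarefied tables. The sketch below illustrates the idea in plain Python. It is not QIIME's implementation: the function name rarefy, the seed argument and the toy table are assumptions made for the example.

```python
import random
from collections import Counter


def rarefy(otu_table, depth, seed=0):
    """Subsample each sample to `depth` reads without replacement.

    otu_table: {sample_id: {otu_id: count}}
    Samples with fewer than `depth` reads are dropped.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample_id, counts in otu_table.items():
        # Expand counts into one entry per read, then draw `depth` of them.
        reads = [otu for otu, n in counts.items() for _ in range(n)]
        if len(reads) < depth:
            continue
        rarefied[sample_id] = dict(Counter(rng.sample(reads, depth)))
    return rarefied


# Hypothetical table: S1 has 1500 reads, S2 only 500.
otu_table = {
    "S1": {"OTU1": 600, "OTU2": 900},
    "S2": {"OTU1": 300, "OTU2": 200},
}

# Ten rarefactions at the same depth, each with a different random seed.
rarefied_tables = [rarefy(otu_table, depth=1000, seed=i) for i in range(10)]
print(rarefied_tables[0])
```

In this toy table S2 falls below the chosen depth and is dropped from every rarefied table, which is why the depth should be chosen so that most samples are retained.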
Parameters
--input_data | -i
Input rarefied OTU table in .biom format or a folder of rarefied OTU tables
--output_dir | -o
The name of the folder to place output files of supervised learning
--mapping_file | -m
The mapping file that corresponds to the input rarefied OTU table(s)
--category | -c
The name of the metadata column in the mapping file whose values are used as the class labels for training the classifier.
--errortype | -e
The type of error estimation to use when running the classifier. See http://qiime.org/scripts/supervised_learning.html for more information about which error estimation method to choose for your data.
--ntree | -n
The number of trees to generate when building a classifier. More trees generally give better results, but at the cost of longer processing time.
Command
supervised_learning.py \
-i otu_table.biom \
-o supervised_output \
-m mapping_file.txt \
-c SampleType \
--errortype oob \
--ntree 1000
F. Abundance Heatmap
http://qiime.org/scripts/make_otu_heatmap.html
Description
One way of showing the abundance of taxa in each sample or group is to use a heatmap. There are many ways to customize the colors, order, phylogeny level and transformation method. See the QIIME link above for more detailed parameters.
Parameters
--otu_table_fp | -i
Input OTU table in .biom
format
--output_fp | -o
The file name and location of the output PDF generated
--map_fname | -m
The mapping file that corresponds to the input OTU table
--category | -c (optional)
The name of the metadata variable to group samples into when plotting the x-axis of the figure.
--otu_tree | -t
The phylogenetic tree (.tre) that corresponds to the input OTU table
Command
make_otu_heatmap.py \
-i otu_table.biom \
-o heatmap_sorted.pdf \
-m mapping_file.txt \
-t rep_set.tre