Managing the OTU Table
Once you have generated a complete OTU table, the next step is to analyze the data. Before moving on to the alpha and beta diversity analyses, you first need to know how to filter your data. Filtering the OTU table is an important part of these analyses, and you should always know exactly what the input is for every command.
Additionally, knowing how to remove low-abundance taxa or samples with low sequencing depth lets you remove the noise that is typically present in a 16S rRNA dataset.
Some questions to consider
- What samples to keep in the analysis?
  - Samples with low OTU-count depth.
  - Negative control samples.
  - OTU-count depth cutoff (a minimum of 1000 OTU counts per sample is suggested).
- What OTUs or taxa to keep?
  - What is the threshold for OTU abundance across all samples? (0.01%? 0.001%?)
  - Should any particular taxa be removed from the analysis? (Cyanobacteria?)
- How to manage multiple experiments within a single OTU table?
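To make the questions above concrete, here is a minimal sketch of the three filtering decisions applied to a toy table in plain Python. It is not the QIIME or biom implementation. The sample names, counts, taxonomy strings (written in a Greengenes-like `p__` style), and helper function names are all invented for illustration. The depth cutoff of 1000 and the 0.01% abundance threshold come from the list above.

```python
# Toy OTU table: one row per OTU, one column per sample (all values invented).
samples = ["S1", "S2", "S3", "Neg1"]
otu_table = {
    "OTU_1": [6000, 9000, 5, 2],
    "OTU_2": [5000, 3000, 3, 1],
    "OTU_3": [1, 0, 0, 0],
    "OTU_4": [1000, 500, 2, 0],
}
# Hypothetical taxonomy assignments for each OTU.
taxonomy = {
    "OTU_1": "k__Bacteria; p__Firmicutes",
    "OTU_2": "k__Bacteria; p__Bacteroidetes",
    "OTU_3": "k__Bacteria; p__Proteobacteria",
    "OTU_4": "k__Bacteria; p__Cyanobacteria",
}

MIN_SAMPLE_DEPTH = 1000    # suggested minimum counts per sample
MIN_OTU_FRACTION = 0.0001  # 0.01% of all counts in the table


def sample_depths(table, sample_ids):
    """Return the total count per sample (column sums)."""
    return {s: sum(row[i] for row in table.values())
            for i, s in enumerate(sample_ids)}


def keep_samples(table, sample_ids, keep):
    """Keep only the columns whose sample ID is in `keep`."""
    idx = [i for i, s in enumerate(sample_ids) if s in keep]
    new_ids = [sample_ids[i] for i in idx]
    new_table = {otu: [row[i] for i in idx] for otu, row in table.items()}
    return new_table, new_ids


def filter_low_abundance_otus(table, min_fraction):
    """Drop OTUs whose share of the table's total count is below min_fraction."""
    grand_total = sum(sum(row) for row in table.values())
    if grand_total == 0:
        return {}
    return {otu: row for otu, row in table.items()
            if sum(row) / grand_total >= min_fraction}


def drop_taxa(table, taxonomy, unwanted):
    """Drop OTUs whose taxonomy string contains the unwanted label."""
    return {otu: row for otu, row in table.items()
            if unwanted not in taxonomy[otu]}


# 1. Remove shallow samples (here this also removes the negative control).
depths = sample_depths(otu_table, samples)
deep_enough = {s for s, d in depths.items() if d >= MIN_SAMPLE_DEPTH}
table, ids = keep_samples(otu_table, samples, deep_enough)

# 2. Remove rare OTUs, then 3. remove an unwanted taxon.
table = filter_low_abundance_otus(table, MIN_OTU_FRACTION)
table = drop_taxa(table, taxonomy, "p__Cyanobacteria")

print("samples kept:", ids)
print("OTUs kept:", sorted(table))
```

The order of the steps matters: the abundance fraction is computed from whatever samples and OTUs remain at that point, so filtering samples first changes which OTUs fall below the threshold.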
Command to generate sample information.
biom summarize-table \
    -i otu_table.biom \
    -o otu_table_stats.txt
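The main use of this summary is to see how many counts each sample has, so that a depth cutoff can be chosen. The sketch below computes that kind of per-sample summary for a toy table in plain Python. It only illustrates the idea and does not reproduce the tool's output format. The sample names and counts are invented.

```python
import statistics

# Toy counts: one inner list per OTU, one column per sample (values invented).
samples = ["S1", "S2", "S3", "Neg1"]
counts_by_otu = [
    [6000, 9000, 5, 2],
    [5000, 3000, 3, 1],
    [1, 0, 0, 0],
    [1000, 500, 2, 0],
]

# Total counts per sample (column sums).
depth = {s: sum(row[i] for row in counts_by_otu)
         for i, s in enumerate(samples)}

values = sorted(depth.values())
summary = {
    "min": values[0],
    "max": values[-1],
    "median": statistics.median(values),
    "mean": statistics.mean(values),
}

print(summary)
# List samples from shallowest to deepest to help pick a depth cutoff.
for sample_id, total in sorted(depth.items(), key=lambda kv: kv[1]):
    print(f"{sample_id}: {total}")
```

In this toy case the two shallow samples stand far apart from the rest, which is the kind of gap that makes a cutoff easy to choose.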
Note about filtering
Most scripts and workflows perform their analysis on all the samples in the input OTU table. If your biom file includes many time points, you cannot produce a figure for only a subset of the input data: a PCoA, for example, will include every sample, which may not be the analysis you are interested in. To solve this problem, subset the table based on mapping file categories.
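Subsetting by a mapping file category can be sketched in plain Python as follows. This is a toy illustration, not the QIIME implementation. The `TimePoint` category, its values, the sample names, the counts, and the `subset_by_category` helper are all hypothetical.

```python
# Hypothetical mapping file contents: sample ID -> metadata categories.
mapping = {
    "S1": {"TimePoint": "Day0"},
    "S2": {"TimePoint": "Day7"},
    "S3": {"TimePoint": "Day0"},
    "S4": {"TimePoint": "Day7"},
}

# Toy OTU table: one row per OTU, one column per sample (values invented).
samples = ["S1", "S2", "S3", "S4"]
otu_table = {
    "OTU_1": [10, 20, 30, 40],
    "OTU_2": [0, 5, 0, 7],
    "OTU_3": [0, 3, 0, 0],
}


def subset_by_category(table, sample_ids, mapping, category, value):
    """Keep samples whose mapping category equals value, then drop empty OTUs."""
    keep = [i for i, s in enumerate(sample_ids)
            if mapping.get(s, {}).get(category) == value]
    kept_ids = [sample_ids[i] for i in keep]
    subset = {otu: [row[i] for i in keep] for otu, row in table.items()}
    # OTUs seen only in the removed samples now have all-zero rows.
    subset = {otu: row for otu, row in subset.items() if sum(row) > 0}
    return subset, kept_ids


day0_table, day0_ids = subset_by_category(
    otu_table, samples, mapping, "TimePoint", "Day0"
)
print(day0_ids)
print(day0_table)
```

Any downstream analysis run on `day0_table` then sees only the samples from that time point, which is what the subset table is for.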