Filtering Samples from OTU-Table

Introduction

The most common type of filtering removes groups of samples from the table. It is also the most important filter, because it lets you remove a particular group or time point from the table, or remove samples below a particular sequencing depth.

There are a few different ways to filter out samples, and which one is most convenient depends on how many groups you want to keep or remove. Either way, the command takes an argument in the same format: the name of a column in your mapping file and a factor or level within that column, separated by a colon and surrounded by quotes.

column variable:name of the group ('Treatment:Control')
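
As a small illustration of the argument format, the sketch below assembles the string from its parts. The helper name build_states is hypothetical and not part of QIIME; it only joins a column name and one or more levels into the 'column:level' form described above.

```shell
# Hypothetical helper (not part of QIIME): build a 'Column:level1,level2' string.
build_states() {
    column=$1
    shift
    # Join the remaining arguments (the levels) with commas.
    levels=$(IFS=,; printf '%s' "$*")
    printf '%s:%s\n' "$column" "$levels"
}

# Example call; prints the string in the format used throughout this page.
build_states Treatment Control
```

The result could then be passed as the value of -s, for example -s "$(build_states SampleType gut tongue)".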


1a. Positive Filtering of Samples

http://qiime.org/scripts/filter_samples_from_otu_table.html

Description

The first way is positive filtering. You tell the script which groups you WANT to keep. Suppose a column contains three groups, 'Treatment:Group1,Group2,Group3'. To remove 'Group3', you list the two groups you want to keep: 'Treatment:Group1,Group2'. The command below follows the same pattern with a 'SampleType' column: it keeps the 'gut' and 'tongue' samples and drops every other sample type.

Parameters

--input_fp | -i
Input OTU table in .biom format

--output_fp | -o
The name of the new output filtered biom file.

--mapping_fp | -m
The mapping file that corresponds to the input OTU table.

--output_mapping_fp
The location of the new mapping file which will match the newly created BIOM file.

--valid_states | -s
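The column name and the groups to select, in the 'column:group' format described above. For positive filtering these are the groups you want to keep. It MUST be surrounded by single or double quotes.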

Command

filter_samples_from_otu_table.py \
-i otu_table.biom \
-o otu_table_filtered.biom \
-m mapping_file.txt \
--output_mapping_fp mapping_file_filtered.txt \
-s 'SampleType:gut,tongue'
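
Before running a positive filter, it can help to preview which samples it would keep. The sketch below is not part of QIIME: preview_kept is a hypothetical helper, and it assumes a tab-delimited mapping file whose first line is the header and whose first column holds the sample IDs. It only approximates the matching that -s performs.

```shell
# Hypothetical helper (not part of QIIME): print the sample IDs whose value in
# COLUMN is one of the comma-separated LEVELS.
# Usage: preview_kept MAPPING_FILE COLUMN LEVELS
preview_kept() {
    awk -F '\t' -v col="$2" -v levels="$3" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col | "cat 1>&2"; exit 1 }
            # Remember the levels to keep.
            n = split(levels, parts, ",")
            for (j = 1; j <= n; j++) keep[parts[j]] = 1
            next
        }
        /^#/ { next }              # skip any further comment lines
        ($c in keep) { print $1 }  # first column is assumed to be the sample ID
    ' "$1"
}
```

Calling preview_kept mapping_file.txt SampleType gut,tongue would list the sample IDs expected to survive the command above.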

1b. Negative Filtering of Samples

Description

The second way is negative filtering. You tell the script which groups you DO NOT WANT to keep. Negative filtering requires a few special characters: put *,! before the name of the group, where * selects every group and ! excludes the group that follows it. Using the example above, 'Treatment:*,!Group3' removes 'Group3'. The command below removes the 'gut' samples in the same way.

Command

filter_samples_from_otu_table.py \
-i otu_table.biom \
-o otu_table_no_gut.biom \
-m mapping_file.txt \
--output_mapping_fp mapping_file_no_gut.txt \
-s 'Treatment:*,!gut'
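
To check what a negative filter would leave behind, the sketch below lists the sample IDs whose value in a column differs from the excluded group. It is not part of QIIME: preview_without is a hypothetical helper, and it assumes a tab-delimited mapping file with the header on the first line and the sample IDs in the first column.

```shell
# Hypothetical helper (not part of QIIME): print the sample IDs whose value in
# COLUMN is anything other than EXCLUDED, mimicking 'COLUMN:*,!EXCLUDED'.
# Usage: preview_without MAPPING_FILE COLUMN EXCLUDED
preview_without() {
    awk -F '\t' -v col="$2" -v drop="$3" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col | "cat 1>&2"; exit 1 }
            next
        }
        /^#/ { next }                           # skip any further comment lines
        ($c "") != (drop "") { print $1 }       # compare as strings, not numbers
    ' "$1"
}
```

Calling preview_without mapping_file.txt Treatment gut would list the samples expected to remain after the command above.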

There are many more features within filter_samples_from_otu_table.py, such as the ability to remove high-coverage samples or to choose samples that match a particular list of sample IDs. See the QIIME website link above for more examples.


1c. Advanced usage of Sample Filtering

Description

If you want to get a bit more advanced, you can specify multiple variables at the same time. If you want to filter out multiple groups as well as a particular study, you can use a semicolon (;) between statements.

Command

filter_samples_from_otu_table.py \
-i otu_table.biom \
-o otu_table_gut_d28.biom \
-m mapping_file.txt \
--output_mapping_fp mapping_file_gut_d28.txt \
-s 'Treatment:gut;Day:28'

2. Splitting The Table Based on Group Information

http://qiime.org/scripts/split_otu_table.html

Description

Another important script for managing the OTU table is the split function. This command is very similar to filter_samples_from_otu_table.py, but instead of outputting a single filtered OTU table, it generates a new OTU table for each factor/group of the chosen variable. If you have separate studies or many time points in one OTU table and do not want to filter out every group individually, you can split the table, which creates a new biom file for each unique value of the category.

For example, if you had 5 different time points and wanted 5 separate biom files and mapping files, one for each time point, you can use this command. The output is a new folder that contains a new biom file for each factor in the chosen column variable. Below is an example file tree listing of the output files.

per_timepoint_tables/
-> otu_table_timepoint1.biom
-> otu_table_timepoint2.biom
-> otu_table_timepoint3.biom
-> otu_table_timepoint4.biom
-> otu_table_timepoint5.biom

Parameters

--biom_table_fp | -i
Input OTU table in .biom format

--output_dir | -o
The name and location of the folder to store all the output biom files

--mapping_fp | -m
The mapping file that corresponds to the input OTU table

--fields | -f
The name of the mapping file column to split the table on.

Command

split_otu_table.py \
-i otu_table.biom \
-o split_by_month \
-m mapping_file.txt \
-f Month
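
Since one table is produced per level of the chosen column, it can be useful to see the levels and their sample counts beforehand. The sketch below is not part of QIIME: count_levels is a hypothetical helper, and it assumes a tab-delimited mapping file with the header on the first line.

```shell
# Hypothetical helper (not part of QIIME): print each level of COLUMN together
# with the number of samples that carry it, one "level<TAB>count" per line.
# Usage: count_levels MAPPING_FILE COLUMN
count_levels() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col | "cat 1>&2"; exit 1 }
            next
        }
        /^#/ { next }          # skip any further comment lines
        { count[$c]++ }        # tally samples per level
        END { for (level in count) print level "\t" count[level] }
    ' "$1" | sort
}
```

Calling count_levels mapping_file.txt Month gives one line per level, which should correspond to the number of tables written to the output folder.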

3. Removing Samples with Low Sampling Depth

http://qiime.org/scripts/filter_samples_from_otu_table.html

Description

Instead of filtering based on mapping file data, you may want to perform quality filtering on the samples regardless of group.

One way to filter samples based on quality is to remove any sample whose observation count (the total number of sequences assigned to OTUs in that sample) falls below a certain threshold. You typically want to retain as many samples as possible, but most analyses cannot be performed reliably on samples that contain only 5 or 10 sequences. Such samples are usually removed before any further analysis because their low counts can severely skew the results.

To determine the correct threshold, it is highly recommended to run biom summarize-table on the OTU table to generate a report of the per sample observation count.

# Get a table of sampling depth of all samples in an OTU table
biom summarize-table \
-i otu_table.biom \
-o otu_table_stats.txt
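
Once the report exists, the samples that a given threshold would remove can be listed from it. The sketch below is not part of QIIME or biom: low_depth_samples is a hypothetical helper, and it assumes the report contains a "Counts/sample detail:" section followed by one "SampleID: count" line per sample. Check your own report before relying on it.

```shell
# Hypothetical helper (not part of QIIME or biom): list the samples whose
# count in the summary report is below MIN_COUNT.
# Usage: low_depth_samples SUMMARY_FILE MIN_COUNT
low_depth_samples() {
    awk -v min="$2" '
        # Everything after this header is assumed to be per-sample detail.
        /^Counts\/sample detail:/ { detail = 1; next }
        detail && NF {
            # Each detail line is assumed to look like " SampleID: 1234.0".
            n = split($0, parts, ": ")
            id = parts[1]
            sub(/^[ \t]+/, "", id)              # trim leading whitespace
            if (parts[n] + 0 < min + 0) print id
        }
    ' "$1"
}
```

Calling low_depth_samples otu_table_stats.txt 1000 would list the samples expected to be dropped by a --min_count of 1000.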

Parameters

--input_fp | -i
Input OTU table in .biom format

--output_fp | -o
The name of the output filtered biom file.

--mapping_fp | -m
The mapping file that corresponds to the input OTU table.

--output_mapping_fp
The location of the new mapping file which will match the newly created biom file.

--min_count | -n
The minimum cutoff for the number of sequences per sample. Any samples with a total sequencing depth below this number will be removed from the table.

Command

filter_samples_from_otu_table.py \
-i otu_table.biom \
-o otu_table_m1000.biom \
-m mapping_file.txt \
--output_mapping_fp mapping_file_m1000.txt \
--min_count 1000
