Before Getting Started
1. Understanding 16S rRNA data
Below are links to papers that outline QIIME and its detailed workflow for OTU picking, as well as papers that discuss the parameters and options needed for a proper analysis.
QIIME related papers
- Conducting a Microbiome Study
- QIIME allows analysis of high-throughput community sequencing data
- Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample
- Quality-filtering vastly improves diversity estimates from Illumina amplicon sequencing
- Advancing our understanding of the human microbiome using QIIME
- Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity
- Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible
- Robust methods for differential abundance analysis in marker gene surveys
Alpha Diversity
- Kumar-UAB Diversity Presentation
- An Introduction to Applied Bioinformatics
- Wikipedia Diversity Index Entry
- Measurements of Biodiversity
- Measuring the diversity of a single community
- Werner Lab QIIME Overview
- Coloss Honey Bee Research Association
- Wikipedia Rarefaction Entry
Other resources
- See here for a list of citations for QIIME's many dependencies/tools
- See here for more details on QIIME's OTU picking strategies
- See here for the link to QIIME's index of tutorials
2. Using Terminal
QIIME is a set of Python scripts that are called from Terminal, so you need to know how to use Terminal and the basics of the command line before using QIIME. Below are a few links that introduce the command-line interface and its most important commands.
- http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part1
- http://swcarpentry.github.io/shell-novice/01-filedir.html
- https://edamame-course.github.io/docs/the_shell.html
Before proceeding to the next article, be sure that you are familiar with the following:
- How to move around folders on the command line (cd and cd ..)
- How to find the help menu of a command with -h (command_name.py -h)
- How to list files (ls)
- How to read and run a basic script (scriptname.py -i input_file.txt -o output_folder)
- How to make directories (mkdir)
- How to create a new text file (touch new_file.txt)
- Do not use spaces in folder names, as they cause many errors when typing a command. Use underscores instead (output_folder)
- How to use \ to split a long command across lines and make it more readable
wget must also be installed:
# Check if wget is installed
wget --version
# Install wget (if needed)
sudo easy_install wget
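The basics listed above can be practiced together in one short session. The following is a minimal sketch, not part of the QIIME workflow itself; it works in a temporary directory created with mktemp so nothing in your own folders is changed, and the names output_folder and new_file.txt are just the placeholders used in the list.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Make a directory (underscores, not spaces) and move into it.
mkdir output_folder
cd output_folder

# Create an empty text file and list the folder contents.
touch new_file.txt
ls

# Move back up one level.
cd ..

# A backslash at the end of a line continues the command on the next line,
# so the two lines below run as the single command: ls -l output_folder
ls -l \
  output_folder
```

The backslash must be the very last character on the line; a trailing space after it breaks the continuation.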
3. Keeping Track of Commands
It is very important to keep track of what you have run. The command line saves only a limited history of your commands, so it is best practice to write your commands in a text editor such as TextWrangler (http://www.barebones.com/products/textwrangler/download.html), add a comment to each command describing what it does, and then copy the command into the command line. Commenting every command also makes it easier for other researchers to carry out additional analyses on your data. A typical command is shown below:
# Keep only the samples from time point 'Day 28' in a new BIOM file.
filter_samples_from_otu_table.py \
-i otu_table.biom \
-o otu_table_noday0.biom \
-m mapping_file.txt \
-s 'Day:28'
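One possible way to combine the commented text file with a record of what was actually run is to save the commands as a script and log its execution. The sketch below assumes hypothetical file names (analysis_commands.sh and analysis_log.txt) and uses placeholder commands rather than real QIIME calls, so it can be tried without QIIME installed.

```shell
# Work in a throwaway directory; the file names below are hypothetical.
workdir="$(mktemp -d)"
cd "$workdir"

# Write the commented commands into a script file instead of typing them ad hoc.
cat > analysis_commands.sh <<'EOF'
# Make a folder for this step's results.
mkdir -p output_folder
# Record which files the step produced.
ls output_folder
EOF

# bash -v echoes each line of the script (comments included) as it is read.
# Redirecting both streams saves the commands and their output in one log file.
bash -v analysis_commands.sh > analysis_log.txt 2>&1
cat analysis_log.txt
```

Keeping the script and its log next to the results gives collaborators both the intent (the comments) and the exact commands that produced each file.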
4. Computational Performance and Time
To reduce the time needed to process raw sequences, we recommend using a high-performance computing cluster. How you call the installed QIIME package depends on how your cluster is set up. Contact your systems administrators with any questions about QIIME installation and package loading.
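Because the way QIIME is loaded differs between clusters, a quick first check is whether its scripts can be found on your PATH at all. The sketch below assumes a QIIME 1.x installation, which includes the script print_qiime_config.py; the helper check_on_path is a hypothetical name introduced here for illustration.

```shell
# Report whether a given program can be found on the current PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; ask your systems administrator how QIIME is loaded"
        return 1
    fi
}

# Assumption: print_qiime_config.py is the QIIME 1.x configuration script.
# '|| true' keeps the session going even when QIIME is not available yet.
check_on_path print_qiime_config.py || true
```

If the script is not found, the installation may still exist but need to be loaded first, which is the site-specific step your administrators can explain.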
5. Tutorial Example Data
The rest of this tutorial uses a small data set so that an example of each command can be shown. The tutorial files are already de-multiplexed and can be used directly for OTU picking. Open the mapping file in Excel to get a better idea of the metadata variables.
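As an alternative to Excel, the metadata variables can also be inspected from the command line, since QIIME mapping files are tab-separated text with a header line. The sketch below builds a tiny stand-in file so it runs on its own; the file name example_map.txt, the sample IDs, and all values are hypothetical and are not the tutorial's real metadata.

```shell
# Build a tiny stand-in mapping file (hypothetical sample IDs and values).
workdir="$(mktemp -d)"
map="$workdir/example_map.txt"
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDay\tDescription\n' > "$map"
printf 'S1\tAAAA\tCCCC\t0\tfirst_sample\n' >> "$map"
printf 'S2\tGGGG\tCCCC\t28\tsecond_sample\n' >> "$map"
printf 'S3\tTTTT\tCCCC\t28\tthird_sample\n' >> "$map"

# List the metadata column names, one per line.
head -n 1 "$map" | tr '\t' '\n'

# Count how many samples there are for each value of the 4th column (Day here).
tail -n +2 "$map" | cut -f 4 | sort | uniq -c
```

To try this on the real data, point the same two commands at mapping_file.txt once it has been downloaded, and adjust the column number to the variable of interest.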
The tutorial data currently includes:
- qiime_processing_workflow_local.sh - A file of the commands for processing the de-multiplexed file
- 16s_pickotu_param.txt - Parameters text file for OTU picking.
- split_libraries/ - A folder containing the de-multiplexed FASTA file. It can be used for the Processing Sequences step.
- otu_table.biom - Processed OTU table for running most analyses.
- mapping_file.txt - A metadata file corresponding to the OTU table.
- rep_set.tre - A phylogenetic tree corresponding to the OTU table.
- rep_set.fna - A file of representative sequences corresponding to the OTU table.
Download Tutorial Data (Right Click + Save As)
# Once the folder has been downloaded, you can cd into the folder
# and start the Processing Sequences step.
cd Qiime_Introduction_Tutorial/
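Before starting, it can help to confirm that the download is complete. The following is a rough sketch that checks for the files named in the list above; check_tutorial_files is a hypothetical helper, and it assumes the files sit directly inside the tutorial folder under exactly those names.

```shell
# Check that every file listed in the tutorial data is present in a folder.
# Usage: check_tutorial_files <folder>
check_tutorial_files() {
    dir="$1"
    missing=0
    for name in \
        qiime_processing_workflow_local.sh \
        16s_pickotu_param.txt \
        split_libraries \
        otu_table.biom \
        mapping_file.txt \
        rep_set.tre \
        rep_set.fna
    do
        if [ -e "$dir/$name" ]; then
            echo "found:   $name"
        else
            echo "MISSING: $name"
            missing=1
        fi
    done
    # Non-zero exit status when at least one file is missing.
    return "$missing"
}

# Run the check on the current folder (after cd-ing into the tutorial folder).
check_tutorial_files . || echo "Some tutorial files are missing."
```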
Moving Pictures of the Human Microbiome. Caporaso JG, Lauber CL, Costello EK, Berg-Lyons D, Gonzalez A, Stombaugh J, Knights D, Gajer P, Ravel J, Fierer N, Gordon JI, Knight R. Genome Biol. 2011 May 30;12(5):R50.