Qiita Database Submission


Qiita is an online repository of next-generation sequencing data. Qiita accepts data of different types, such as 16S rRNA, whole-genome metagenomics, RNA-sequencing, and more. This tutorial walks through a complete submission of 16S rRNA data and demonstrates how to set up the required metadata files.

Note: This tutorial assumes all of your 16S rRNA data is in three FASTQ files: forward, reverse, and barcode (index).
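Before uploading, it can help to confirm that the three files describe the same set of reads. The following is a minimal sketch, not part of Qiita: the function names are hypothetical, and it assumes each FASTQ record occupies exactly four lines and that files ending in .gz are gzip-compressed.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    # Assumption: names ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_read_counts(forward, reverse, barcode):
    """Return the shared record count, or raise if the three files disagree."""
    counts = {str(p): count_fastq_records(p) for p in (forward, reverse, barcode)}
    if len(set(counts.values())) != 1:
        raise ValueError(f"record counts differ: {counts}")
    return next(iter(counts.values()))
```

A mismatch usually means one of the files was truncated or comes from a different run.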

Before You Begin

Before starting, you must create an account on https://qiita.ucsd.edu/ with an email address and password. You must also have your raw FASTQ reads and the associated mapping file readily accessible. For account setup, see https://qiita.ucsd.edu/static/doc/html/tutorials/account-creation.html

Step 1. Create a new study

Qiita uses 'Studies' to organize the many different types of next-generation sequencing involved in most modern experiments. Each type of sequencing analysis is typically set up in a similar way: there are the raw sequences, the sample metadata, and the preparation metadata, such as the barcode and primer sequences that correspond to each sample. Qiita uses this structure to organize sample data and relate it to the sequencing data.

Follow the tutorial here for setting up a new study: https://qiita.ucsd.edu/static/doc/html/tutorials/getting-started.html

Step 2. Preparing Sample Information Table

The sample information table can be thought of as the sample metadata table. Similar to the mapping file created during QIIME analysis, it contains unique sample IDs and columns about treatment, time point, sample type, etc. The only difference is that it doesn't contain the barcode and LinkerPrimerSequence for each sample. These are reserved for the second file (the preparation file), which is needed next. Below is a sample template for use with mouse studies. The table contains the necessary fields for submission to EBI.
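If you already have a QIIME mapping file, the split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_mapping is hypothetical, the input is assumed to be tab-delimited with the QIIME header names #SampleID, BarcodeSequence, and LinkerPrimerSequence, and the mapping of LinkerPrimerSequence to the primer column is a guess. All other columns are copied unchanged, so they may still need renaming to match the required headers.

```python
import csv
import io

# Assumption: QIIME-style header names on the left, prep table names on the right.
PREP_COLUMNS = {"BarcodeSequence": "barcode", "LinkerPrimerSequence": "primer"}


def split_mapping(mapping_text):
    """Split QIIME-style mapping text into (sample_rows, prep_rows)."""
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter="\t")
    sample_rows, prep_rows = [], []
    for row in reader:
        sample_name = row.pop("#SampleID")
        prep = {"sample_name": sample_name}
        # Move the sequencing-specific columns into the prep row.
        for old, new in PREP_COLUMNS.items():
            prep[new] = row.pop(old)
        # Whatever is left is sample metadata.
        sample_rows.append({"sample_name": sample_name, **row})
        prep_rows.append(prep)
    return sample_rows, prep_rows
```

Both returned lists share the same sample_name values, which is what ties the two tables together.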

Download sample template here:

2A. Required Columns for Sample data

Required Header Name | Example | Description
sample_name | MA1.43 | A unique identifier for each sample sequenced
title | A mouse experiment analyzing the | A title for the study
taxon_id | 410661 | The NCBI taxon identifier for the metagenome (see NCBI or http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi)
scientific_name | mouse gut metagenome | The name of the corresponding taxon ID
description | mouse.fecal.timeA | Can be any identifier for describing the sample
host_subject_id | Mouse123 | A unique identifier for each subject
sample_type | Fecal | The type of sample
collection_timestamp | 09/18/2012 | Date of the sample's collection
physical_specimen_remaining | TRUE | TRUE/FALSE if there is still specimen remaining for extraction
dna_extracted | TRUE | TRUE/FALSE if there is still DNA remaining for sequencing
physical_specimen_location | NYUMC | Center/University name for the location of the samples
geo_loc_name | USA:NY:New York | Location of the University/Center
elevation | 33 | Height of land above sea level in meters at the sampling site
latitude | 32.842 | Latitude of the center/university
longitude | -117.258 | Longitude of the center/university
env_biome | host-associated | (air/built environment/host-associated/human-associated/human-skin/human-oral/human-gut/human-vaginal/microbial mat/microbial biofilm/misc environment/plant-associated/sediment/soil/wastewater/sludge/water)
env_feature | urban biome | Environmental Ontology (ENVO) identifier. Only change if samples are not from a host and are environmental.
env_matter | feces | Similar to sample_type
(Any name) | Treatment/Time/Exposure | Any necessary additional metadata
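The required headers listed above can be checked mechanically before upload. The sketch below is not Qiita's own validator: the function name is hypothetical, the file is assumed to be tab-delimited with the headers on the first line, and only the presence of the columns and the uniqueness of sample_name are checked, not the values themselves.

```python
import csv

# Column names copied from the required-columns table in this tutorial.
REQUIRED_SAMPLE_COLUMNS = [
    "sample_name", "title", "taxon_id", "scientific_name", "description",
    "host_subject_id", "sample_type", "collection_timestamp",
    "physical_specimen_remaining", "dna_extracted",
    "physical_specimen_location", "geo_loc_name", "elevation",
    "latitude", "longitude", "env_biome", "env_feature", "env_matter",
]


def check_sample_table(path):
    """Return a list of problems found in a tab-delimited sample table."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_SAMPLE_COLUMNS if c not in header]
        if missing:
            problems.append(f"missing columns: {', '.join(missing)}")
        seen = set()
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            name = row.get("sample_name") or ""
            if not name:
                problems.append(f"line {line_no}: empty sample_name")
            elif name in seen:
                problems.append(f"line {line_no}: duplicate sample_name {name}")
            seen.add(name)
    return problems
```

An empty list means the header is complete and every sample_name is unique.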

Step 3. Preparing Prep Information Table

Download a prep template here:

Required Header Name | Example | Description
sample_name | MA1.43 | A unique identifier for each sample sequenced
barcode | XXX | Nucleotide sequence per sample. If using the 12bp Golay barcodes, use the first 2 bases as the linker and the last 10bp as the b
linker | XXX | 2 nucleotide sequences used as a link
primer | XXX | Linker primer sequence
center_name | XXX | XXX
center_project_name | XXX | XXX
experiment_design_description | XXX | XXX
instrument_model | Illumina MiSeq | The type of instrument used for sequencing. (Illumina Genome Analyzer/ Illumina Genome Analyzer II/ Illumina Genome Analyzer IIx/ Illumina HiSeq 2500/ Illumina HiSeq 2000/ Illumina HiSeq 1500/ Illumina HiSeq 1000/ Illumina MiSeq/ Illumina HiScanSQ/ HiSeq X Ten/ NextSeq 500/ unspecified)
library_construction_protocol | XXX | XXX
platform | XXX | XXX
run_prefix | XXX | Only include this column if there are multiple sequencing runs for one study.
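Because the prep table is linked to the sample table through sample_name, a quick cross-check between the two files can catch mismatches early. The following is a rough sketch under several assumptions: both files are tab-delimited, the function names are hypothetical, barcodes contain only A, C, G, and T, and all samples belong to a single sequencing run so that no barcode should appear twice.

```python
import csv


def read_rows(path):
    """Read a tab-delimited table into a list of dicts keyed by header name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def check_prep_table(sample_path, prep_path):
    """Return a list of problems linking the prep table to the sample table."""
    problems = []
    sample_names = {row["sample_name"] for row in read_rows(sample_path)}
    seen_barcodes = {}  # barcode -> first sample_name that used it
    for row in read_rows(prep_path):
        name, barcode = row["sample_name"], row["barcode"].upper()
        if name not in sample_names:
            problems.append(f"{name}: not in sample table")
        if not barcode or set(barcode) - set("ACGT"):
            problems.append(f"{name}: barcode {barcode!r} is not A/C/G/T only")
        # Assumption: one run, so a repeated barcode is treated as an error.
        if barcode in seen_barcodes:
            problems.append(f"{name}: barcode shared with {seen_barcodes[barcode]}")
        seen_barcodes.setdefault(barcode, name)
    return problems
```

If the study has several runs and uses run_prefix, the duplicate-barcode check would need to be applied per run rather than across the whole table.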

Step 4. Upload FASTQ and sample/prep tables

Step 5. Select sample metadata

Step 6. Associate FASTQ with prep table

Step 7. Process the 16S data

Step 8. Verify correct sampling depth and metadata

