Qiita Database Submission
Description
Qiita is an online repository of next-generation sequencing data. Qiita accepts data of different types, such as 16S rRNA, whole-genome metagenomics, RNA-sequencing, and more. This tutorial walks through a complete submission of 16S rRNA data and demonstrates how to set up the required metadata files.
Note: This tutorial assumes all of your 16S rRNA data is in 3 FASTQ files: Forward/Reverse/Barcode (index).
Before You Begin
Before starting any analysis, you must create an account on https://qiita.ucsd.edu/ with an email address and password. You must also have your raw FASTQ reads and the associated mapping file easily accessible. For account creation, see https://qiita.ucsd.edu/static/doc/html/tutorials/account-creation.html
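Before uploading, it can help to confirm that the three FASTQ files are consistent with one another. The following is a minimal sketch and not part of Qiita. It assumes standard four-line FASTQ records, optionally gzip-compressed. The function names `count_fastq_records` and `same_record_count` are hypothetical helpers introduced here for illustration.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming four lines per record."""
    # Assumption: files ending in .gz are gzip-compressed; others are plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


def same_record_count(paths):
    """Return True if every FASTQ file in paths holds the same number of records."""
    counts = {count_fastq_records(p) for p in paths}
    return len(counts) == 1
```

Calling `same_record_count` with the paths of your forward, reverse, and barcode files should return `True` when every read has a matching mate and index read.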
Step 1. Create a new study
Qiita uses 'Studies' to organize the many different types of next-generation sequencing involved in most modern experiments. Each type of sequencing analysis is typically set up in a similar way: there are the raw sequences, the sample metadata, and the preparation metadata, such as the barcode and primer sequences that correspond to each sample. Qiita uses this structure to organize sample data and relate it to the sequencing data.
Follow the tutorial here for setting up a new study: https://qiita.ucsd.edu/static/doc/html/tutorials/getting-started.html
Step 2. Preparing Sample Information Table
The sample information table can be thought of as the sample metadata table. Similar to the mapping file created during QIIME analysis, it contains unique sample IDs and columns describing treatment, time point, sample type, etc. The only difference is that it doesn't contain the barcode and LinkerPrimerSequence for each sample; these are reserved for the second file (the preparation file), which is needed next. Below is a sample template for use with mouse studies. The table contains the fields necessary for submission to EBI.
Download sample template here:
2A. Required Columns for Sample data
Required Header Name | Example | Description |
---|---|---|
sample_name | MA1.43 | A unique identifier for each sample sequenced |
title | A mouse experiment analyzing the | A title for the study |
taxon_id | 410661 | The NCBI taxon identifier for the metagenome (See NCBI or http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) |
scientific_name | mouse gut metagenome | The name of the corresponding taxon ID |
description | mouse.fecal.timeA | Can be any identifier for describing the sample |
host_subject_id | Mouse123 | A unique identifier for each subject |
sample_type | Fecal | The type of sample |
collection_timestamp | 09/18/2012 | Date of the sample's collection |
physical_specimen_remaining | TRUE | TRUE/FALSE if there is still specimen remaining for extraction. |
dna_extracted | TRUE | TRUE/FALSE whether DNA has been extracted from the sample. |
physical_specimen_location | NYUMC | Center/University name for the location of the samples |
geo_loc_name | USA:NY:New York | Location of the University/Center |
elevation | 33 | Height of land above sea level in meters at the sampling site |
latitude | 32.842 | Latitude of the center/university |
longitude | -117.258 | Longitude of the center/university |
env_biome | host-associated | (air/built environment/host-associated/human-associated/human-skin/human-oral/human-gut/human-vaginal/microbial mat/microbial biofilm/misc environment/plant-associated/sediment/soil/wastewater/sludge/water) |
env_feature | urban biome | Environmental Ontology (ENVO) identifier. Only change if samples are not from a host and are environmental. |
env_matter | feces | Similar to sample_type. |
(Any name) | Treatment/Time/Exposure | Any necessary additional metadata |
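The required columns above can be checked before upload with a short script. This is a minimal sketch, not Qiita's own validation: it assumes the sample information table is saved as a tab-separated text file, and the function name `check_sample_table` is a hypothetical helper. The column list is copied from the table above.

```python
import csv
import io

# Required column names, copied from the table above.
REQUIRED_SAMPLE_COLUMNS = [
    "sample_name", "title", "taxon_id", "scientific_name", "description",
    "host_subject_id", "sample_type", "collection_timestamp",
    "physical_specimen_remaining", "dna_extracted",
    "physical_specimen_location", "geo_loc_name", "elevation",
    "latitude", "longitude", "env_biome", "env_feature", "env_matter",
]


def check_sample_table(handle):
    """Return (missing_columns, duplicate_sample_names) for a tab-separated table."""
    # Assumption: the file is tab-separated with a single header row.
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_SAMPLE_COLUMNS if c not in header]
    seen, duplicates = set(), []
    for row in reader:
        name = row.get("sample_name")
        if name is None:
            continue  # no sample_name column; already reported in `missing`
        if name in seen:
            duplicates.append(name)
        seen.add(name)
    return missing, duplicates


if __name__ == "__main__":
    # Tiny in-memory example with only two columns and a repeated sample name.
    example = "sample_name\ttitle\nMA1.43\tstudy\nMA1.43\tstudy\n"
    missing, duplicates = check_sample_table(io.StringIO(example))
    print("missing columns:", missing)
    print("duplicate sample names:", duplicates)
```

To check a real file, open it with `open(path, newline="")` and pass the handle to `check_sample_table`. Both returned lists should be empty before you upload.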
Step 3. Preparing Prep Information Table
Download a prep template here:
Required Header Name | Example | Description |
---|---|---|
sample_name | MA1.43 | A unique identifier for each sample sequenced |
barcode | XXX | Nucleotide sequence per sample. If using the 12bp Golay barcodes, use the first 2 bases as the linker and the last 10bp as the b |
linker | XXX | 2 nucleotide sequences used as a link |
primer | XXX | Linker primer sequence |
center_name | XXX | XXX |
center_project_name | XXX | XXX |
experiment_design_description | XXX | XXX |
instrument_model | Illumina MiSeq | The type of instrument used for sequencing. (Illumina Genome Analyzer/ Illumina Genome Analyzer II/ Illumina Genome Analyzer Ix/ Illumina HiSeq 2500/ Illumina HiSeq 2000/ Illumina HiSeq 1500/ Illumina HiSeq 1000/ Illumina MiSeq/ Illumina HiScanSQ/ HiSeq X Ten/ NextSeq 500/ unspecified) |
library_construction_protocol | XXX | XXX |
platform | XXX | XXX |
run_prefix | XXX | Only include this column if there are multiple sequencing runs for one study. |
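The prep information table can be sanity-checked in the same spirit. The sketch below is not Qiita's validator. It assumes each row has already been read into a dictionary (for example with `csv.DictReader`) and that every `sample_name` in the prep table should also appear in the sample information table. The function name `check_prep_rows` is hypothetical, and the allowed instrument models are copied verbatim from the table above.

```python
# Allowed values for instrument_model, copied verbatim from the table above.
ALLOWED_INSTRUMENT_MODELS = {
    "Illumina Genome Analyzer",
    "Illumina Genome Analyzer II",
    "Illumina Genome Analyzer Ix",
    "Illumina HiSeq 2500",
    "Illumina HiSeq 2000",
    "Illumina HiSeq 1500",
    "Illumina HiSeq 1000",
    "Illumina MiSeq",
    "Illumina HiScanSQ",
    "HiSeq X Ten",
    "NextSeq 500",
    "unspecified",
}


def check_prep_rows(prep_rows, sample_names):
    """Return a list of human-readable problems found in the prep rows."""
    problems = []
    for row in prep_rows:
        name = row.get("sample_name", "")
        # Assumption: each prep row must refer to a sample in the sample table.
        if name not in sample_names:
            problems.append(f"{name}: not present in the sample information table")
        model = row.get("instrument_model", "")
        if model not in ALLOWED_INSTRUMENT_MODELS:
            problems.append(f"{name}: unknown instrument_model {model!r}")
    return problems


if __name__ == "__main__":
    rows = [
        {"sample_name": "MA1.43", "instrument_model": "Illumina MiSeq"},
        {"sample_name": "MA1.99", "instrument_model": "MiSeq"},
    ]
    for problem in check_prep_rows(rows, {"MA1.43"}):
        print(problem)
```

An empty list means that each prep row refers to a known sample and uses one of the listed instrument models.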