Qiita Database Submission

Description

Qiita is an online repository of next-generation sequencing data. Qiita accepts data of different types, such as 16S rRNA, whole-genome metagenomics, RNA-sequencing and more. This tutorial walks through a complete submission of 16S rRNA data and demonstrates how to set up the required metadata files.

Note: This tutorial assumes all of your 16S rRNA data is in three FASTQ files: forward reads, reverse reads, and barcode (index) reads.
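
Before uploading, it can help to confirm that the three files describe the same set of reads. The following is a minimal sketch, not part of Qiita itself: it assumes standard four-line FASTQ records, optionally gzip-compressed, and the function names count_fastq_records and check_read_counts are hypothetical helpers introduced here for illustration.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed; everything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_read_counts(forward, reverse, barcodes):
    """Return the shared record count, or raise if the three files disagree."""
    counts = {str(p): count_fastq_records(p) for p in (forward, reverse, barcodes)}
    if len(set(counts.values())) != 1:
        raise ValueError(f"FASTQ files have different record counts: {counts}")
    return next(iter(counts.values()))
```

Calling check_read_counts with the paths to your forward, reverse, and barcode files returns the number of reads when all three agree. A mismatch usually means one of the files is truncated or belongs to a different run.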

Before You Begin

Before starting a submission, you must create an account on https://qiita.ucsd.edu/ with an email address and password (see https://qiita.ucsd.edu/static/doc/html/tutorials/account-creation.html). You must also have your raw FASTQ reads and the associated mapping file readily accessible.

Step 1. Create a new study

Qiita uses 'Studies' to organize the many different types of next-generation sequencing data involved in most modern experiments. Each type of sequencing analysis is typically set up in a similar way: there are the raw sequences, the sample metadata, and the preparation metadata, such as the barcode and primer sequences that correspond to each sample. Qiita follows this structure to organize sample data and relate it to the sequencing data.

Follow the tutorial here to set up a new study: https://qiita.ucsd.edu/static/doc/html/tutorials/getting-started.html

Step 2. Preparing Sample Information Table

The sample information table can be thought of as the sample metadata table. Similar to the mapping file created during QIIME analysis, it contains unique SampleIDs and columns describing treatment, time point, sample type, and so on. The only difference is that it does not contain the barcode and LinkerPrimerSequence for each sample; these belong in the second file (the preparation file), which is prepared in the next step. Below is a sample template for use with mouse studies. The table contains the fields required for submission to EBI.

Download sample template here:

2A. Required Columns for Sample data

Required Header Name	Example	Description
sample_name	MA1.43	A unique identifier for each sample sequenced
title	A mouse experiment analyzing the	A title for the study
taxon_id	410661	The NCBI taxon identifier for the metagenome (See NCBI or http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi)
scientific_name	mouse gut metagenome	The name of the corresponding taxon ID
description	mouse.fecal.timeA	Can be any identifier for describing the sample
host_subject_id	Mouse123	A unique identifier for each subject
sample_type	Fecal	The type of sample
collection_timestamp	09/18/2012	Date of the sample's collection
physical_specimen_remaining	TRUE	TRUE/FALSE if there is still specimen remaining for extraction.
dna_extracted	TRUE	TRUE/FALSE if there is still DNA remaining for sequencing.
physical_specimen_location	NYUMC	Center/University name for the location of the samples
geo_loc_name	USA:NY:New York	Location of the University/Center
elevation	33	Height of land above sea level in meters at the sampling site
latitude	32.842	Latitude of the center/university
longitude	-117.258	Longitude of the center/university
env_biome	host-associated	(air/built environment/host-associated/human-associated/human-skin/human-oral/human-gut/human-vaginal/microbial mat/microbial biofilm/misc environment/plant-associated/sediment/soil/wastewater/sludge/water)
env_feature	urban biome	Environmental Ontology (ENVO) identifier. Only change if samples are not from a host and are environmental.
env_matter	feces	Similar to sample_type.
(Any name)	Treatment/Time/Exposure	Any necessary additional metadata
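
The required columns above can be checked before upload. The following is a minimal sketch, not Qiita's own validation: it assumes the sample information table is saved as a tab-separated text file with the header names listed above, and check_sample_table is a hypothetical helper introduced here for illustration.

```python
import csv

# Header names taken from the required-columns table above.
REQUIRED_SAMPLE_COLUMNS = [
    "sample_name", "title", "taxon_id", "scientific_name", "description",
    "host_subject_id", "sample_type", "collection_timestamp",
    "physical_specimen_remaining", "dna_extracted",
    "physical_specimen_location", "geo_loc_name", "elevation",
    "latitude", "longitude", "env_biome", "env_feature", "env_matter",
]


def check_sample_table(handle):
    """Return a list of problems found in a tab-separated sample table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [
        f"missing column: {column}"
        for column in REQUIRED_SAMPLE_COLUMNS
        if column not in header
    ]
    if "sample_name" not in header:
        return problems  # cannot check uniqueness without the column
    seen = set()
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        name = (row["sample_name"] or "").strip()
        if not name:
            problems.append(f"line {line_no}: empty sample_name")
        elif name in seen:
            problems.append(f"line {line_no}: duplicate sample_name {name}")
        seen.add(name)
    return problems
```

Open the table with open(path, newline="") and pass the file handle to check_sample_table; an empty list means every required header is present and every sample_name is filled in and unique. The check does not validate the values themselves, such as the date format or the env_biome vocabulary.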

Step 3. Preparing Prep Information Table

Download a prep template here:

Required Header Name	Example	Description
sample_name	MA1.43	A unique identifier for each sample sequenced
barcode	XXX	Nucleotide sequence per sample. If using the 12bp Golay barcodes, use the first 2 bases as the linker and the last 10bp as the b
linker	XXX	2-nucleotide sequence used as a link
primer	XXX	Linker primer sequence
center_name	XXX	XXX
center_project_name	XXX	XXX
experiment_design_description	XXX	XXX
instrument_model	Illumina MiSeq	The type of instrument used for sequencing. (Illumina Genome Analyzer/ Illumina Genome Analyzer II/ Illumina Genome Analyzer Ix/ Illumina HiSeq 2500/ Illumina HiSeq 2000/ Illumina HiSeq 1500/ Illumina HiSeq 1000/ Illumina MiSeq/ Illumina HiScanSQ/ HiSeq X Ten/ NextSeq 500/ unspecified)
library_construction_protocol	XXX	XXX
platform	XXX	XXX
run_prefix	XXX	Only include this column if there are multiple sequencing runs for one study.


Step 4. Upload FASTQ and sample/prep tables

Step 5. Select sample metadata

Step 6. Associate FASTQ with prep table

Step 7. Process the 16S data

Step 8. Verify correct sampling depth and metadata
