Metadata
Metadata.csv contains the metadata corresponding to the selected samples including 1) sample identifiers, 2) publication identifiers, 3) environmental descriptors, 4) host descriptors, and 5) technical descriptors. These metadata categories are designed to link the processed sequences to their accession numbers in public sequence repositories (1), link the processed sequences to the articles where they were originally made public (2), facilitate the selection of data by microbiome type (3 and 4), and allow for the inclusion of technical sources of variation a posteriori (5). See Ontology for a description of the hierarchical categorization of microbiomes.
- Sample identifiers: sample identifiers link the sequences to their NCBI accession numbers, allowing MiCoDa users to also download and reprocess the data on their own.
- Sample: NCBI accession number
- StudyID: Internal study ID
- Dataset: whether the data was originally part of the Earth Microbiome Project dataset or was retrieved specifically for MiCoDa.
- Publication identifiers: publication identifiers link the data to the literature where it was originally published, in order to allow MiCoDa users to collect additional metadata from the original publications directly, and to credit data creators.
- DOI: DOI of the publication from which the data was retrieved
- Year: year of publication
- Journal: publishing journal
- Title: article title
- Abstract: article abstract
- Environmental descriptors: Environmental descriptors allow MiCoDa users to select microbiome samples according to where they were collected. Geographical coordinates may also be used to study the spatial distribution of microbiomes.
- *Realm: highest environmental hierarchy with categories "Mineral", "Organism", "Water", and "Other"
- *Environment: second level of environmental descriptors (see Ontology)
- *Environment_sub: third level of environmental descriptors (see Ontology)
- Environment_sub_sub: fourth level of environmental descriptors (see Ontology)
- Country: Country the sample was obtained from. Samples from open sea and ocean are labeled as "Sea" or "Ocean", respectively.
- City.Region: City, region, or body of water the sample was obtained from.
- Latitude: sample location in decimal degrees
- Longitude: sample location in decimal degrees
- Host descriptors (where applicable): Like the environmental descriptors, host descriptors allow users to select microbiome samples according to the organism from which they were collected. Note that rhizosphere samples are categorized as host-associated.
- *Host.Kingdom: Host taxonomy
- *Host.Phylum: Host taxonomy
- *Host.Class: Host taxonomy
- *Host.Order: Host taxonomy
- *Host.Family: Host taxonomy
- *Host.genus: Host taxonomy
- Host.species: Host taxonomy
- Host.Common.name: Host's common English name (where available)
- Host.Latin.name: Host's Latin name
- *Animal.Gut: Whether the sample was obtained from the host's gastrointestinal tract ("yes" where true)
- Technical descriptors: Technical descriptors are included to allow MiCoDa users to adjust for technical variability across studies a posteriori (e.g., as random effects in linear models), or directly, to evaluate the role of technical choices on diversity estimates derived from microbiome sequencing.
- Sequencing.Platform: Sequencer used (where available). Categories include 454 pyrosequencing, Illumina GAIIx, Illumina HiSeq, Illumina MiSeq, Ion Torrent, and PacBio
- DNA.extraction.method: method (i.e., kit or protocol) used to extract DNA from the sample. As extraction kits' names have changed over time, extraction methods were categorized into 57 methods.
- DNA.extraction.material: a verbal description of the substance from which the sample was extracted.
- DNA.extraction.unit.volume.mass.swab: where possible, a curated description of the sample. 62 categories are included.
- DNA.extraction.amount: where available, the unit and type of measurement describing the samples (e.g., "0.3 g" or "50 ml, filtered")
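
As an illustration of how the columns above might be used, the following minimal Python sketch selects gut samples and tallies them by sequencing platform. Only the column names and the category labels ("Organism", "Water", "yes", and the platform names) come from the descriptions above. The rows, accession strings, and study IDs are hypothetical placeholders, the helper functions are not part of MiCoDa, and the sketch assumes that gut samples are filed under the "Organism" realm and that fields that do not apply are left empty.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for Metadata.csv. Accessions, study IDs and
# coordinates are made up; only the column names and category labels follow
# the column descriptions in this document.
SAMPLE_CSV = """Sample,StudyID,Realm,Animal.Gut,Sequencing.Platform,Latitude,Longitude
ACC0001,S1,Organism,yes,Illumina MiSeq,52.1,5.2
ACC0002,S1,Organism,,Illumina MiSeq,52.1,5.2
ACC0003,S2,Water,,Illumina HiSeq,,
ACC0004,S3,Organism,yes,454 pyrosequencing,-33.9,18.4
"""


def load_metadata(handle):
    """Read metadata rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def select_gut_samples(rows):
    """Keep samples from the 'Organism' realm that are flagged as gut samples.

    Assumption: Animal.Gut holds "yes" where true and is empty otherwise.
    """
    return [
        r for r in rows
        if r["Realm"] == "Organism" and r["Animal.Gut"] == "yes"
    ]


def platform_counts(rows):
    """Count samples per sequencing platform (a technical descriptor)."""
    return Counter(r["Sequencing.Platform"] for r in rows)


rows = load_metadata(io.StringIO(SAMPLE_CSV))
gut = select_gut_samples(rows)
print([r["Sample"] for r in gut])
print(platform_counts(gut))
```

To run the same functions on the real file, one could replace the in-memory string with something like `open("Metadata.csv", newline="")`, assuming the file is comma-separated with a header row matching the column names listed above. Tallying a technical descriptor per selection, as done here, is one simple way to check whether a subset is dominated by a single platform before modeling it as a source of variation.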