Metadata.csv contains the metadata corresponding to the selected samples including 1) sample identifiers, 2) publication identifiers, 3) environmental descriptors, 4) host descriptors, and 5) technical descriptors. These metadata categories are designed to link the processed sequences to their accession numbers in public sequence repositories (1), link the processed sequences to the articles where they were originally made public (2), facilitate the selection of data by microbiome type (3 and 4), and allow for the inclusion of technical sources of variation a posteriori (5). See Ontology for a description of the hierarchical categorization of microbiomes.
- Sample identifiers: sample identifiers link the sequences to their NCBI accession numbers, allowing MiCoDa users to also download and reprocess the data on their own.
- Sample : NCBI accession number
- StudyID : Internal study ID
- Collection : large-scale initiative through which the data was originally collected. Data collected during automated collection and human curation as part of a direct MiCoDa initiative is labelled as MiCoDa VXX. Data collected through Datathon initiatives is also labelled accordingly. Data that was part of other large-scale data collection efforts is also included (e.g., the Earth Microbiome Project). See the Collections page for more information on all the Collections included in MiCoDa.
- Publication identifiers: publication identifiers link the data to the literature where it was originally published, in order to allow MiCoDa users to collect additional metadata from the original publications directly, and to credit data creators.
- DOI : DOI of the publication from which the data was retrieved
- Year : year of publication
- Journal : publishing journal
- Title : article title
- Abstract : article abstract
- Environmental descriptors: Environmental descriptors allow MiCoDa users to select microbiome samples according to where they were collected. Geographical coordinates may also be used to study the spatial distribution of microbiomes.
- Realm : highest environmental hierarchy with categories "Mineral", "Organism", "Water", and "Other"
- *Environment-Broad : second level of environmental descriptors (see Ontology )
- *Environment-Local : third level of environmental descriptors (see Ontology )
- *Environment-media : fourth level of environmental descriptors (see Ontology )
- *Country : Country the sample was obtained from. Samples from open sea and ocean are labeled as "Sea" or "Ocean", respectively.
- *City.Region : City, region, or body of water the sample was obtained from.
- *Latitude : sample location in decimal degrees
- *Longitude : sample location in decimal degrees
- Host descriptors (where applicable): Like the environmental descriptors, host descriptors allow users to select microbiome samples according to the organism from which they were collected. Note that rhizosphere samples are categorized as host-associated.
- Host.Kingdom : Host taxonomy (kingdom)
- Host.Phylum : Host taxonomy (phylum)
- Host.Class : Host taxonomy (class)
- Host.Order : Host taxonomy (order)
- Host.Family : Host taxonomy (family)
- Host.genus : Host taxonomy (genus)
- Host.species : Host taxonomy (species)
- Host.Common.name : Host's common English name (where available)
- Host.Latin.name : Host's Latin name
- Animal.Gut : Whether the sample was obtained from the host's gastrointestinal tract ("yes" where true)
- Technical descriptors: Technical descriptors are included to allow MiCoDa users to adjust for technical variability across studies a posteriori (e.g., as random effects in linear models), or directly, to evaluate the role of technical choices on diversity estimates derived from microbiome sequencing.
- Sequencing.Platform : Sequencer used (where available). Categories include 454 pyrosequencing, Illumina GAIIx, Illumina HiSeq, Illumina MiSeq, Ion Torrent, and PacBio.
- DNA.extraction.method : method (i.e., kit or protocol) used to extract DNA from the sample. As extraction kits' names have changed over time, extraction methods were categorized into 57 methods.
- DNA.extraction.material : a verbal description of the substance from which the sample was extracted.
- DNA.extraction.unit.volume.mass.swab : where possible, a curated description of the sample. 62 categories are included.
- DNA.extraction.amount : where available, the unit and type of measurement describing the samples (e.g., "0.3 g" or "50 ml, filtered")
- Experimental : Whether the sample belonged to an experimental treatment (“yes” where true). Experimental is FALSE for observational and control samples, and for all samples in MiCoDa V1.
*These metadata categories match MIMARKS: survey, soil; Version 5.0 categories.
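
The column descriptions above can be put to work with a short script. The following is a minimal Python sketch, not an official MiCoDa tool: it assumes Metadata.csv is comma-separated with a header row using the column names listed above (without the asterisk footnote markers), and that missing values are stored as empty fields. The rows in `SAMPLE_CSV`, the accession-like identifiers, and the helper names `load_metadata` and `select_samples` are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical rows for illustration only; real values come from Metadata.csv.
SAMPLE_CSV = """Sample,StudyID,Realm,Latitude,Longitude,Host.Class,Animal.Gut,Sequencing.Platform
ACC0001,S1,Organism,52.1,5.2,Mammalia,yes,Illumina MiSeq
ACC0002,S1,Organism,,,Mammalia,,Illumina MiSeq
ACC0003,S2,Water,-33.9,151.2,,,Illumina HiSeq
ACC0004,S3,Mineral,40.7,-74.0,,,454 pyrosequencing
"""


def load_metadata(handle):
    """Read metadata rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def select_samples(rows, realm=None, animal_gut=False, require_coordinates=False):
    """Return the rows that pass every requested filter."""
    selected = []
    for row in rows:
        # Environmental descriptor: keep only the requested Realm.
        if realm is not None and row["Realm"] != realm:
            continue
        # Host descriptor: Animal.Gut is "yes" where true (assumed empty otherwise).
        if animal_gut and row["Animal.Gut"] != "yes":
            continue
        # Spatial analyses need both coordinates to be present.
        if require_coordinates and not (row["Latitude"] and row["Longitude"]):
            continue
        selected.append(row)
    return selected


rows = load_metadata(io.StringIO(SAMPLE_CSV))

# Select by microbiome type (environmental and host descriptors).
gut_samples = select_samples(rows, realm="Organism", animal_gut=True)

# Select samples usable for spatial analyses.
located = select_samples(rows, require_coordinates=True)

# Summarize a technical descriptor before modelling it as a source of variation.
platform_counts = Counter(row["Sequencing.Platform"] for row in rows)

print([row["Sample"] for row in gut_samples])
print([row["Sample"] for row in located])
print(dict(platform_counts))
```

To run the same selection on the real file, replace the in-memory string with `open("Metadata.csv", newline="")` and pass the file handle to `load_metadata`. The Sample column of the selected rows can then be used to look up the corresponding sequences.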