Welcome to the MiCoDa web interface

Welcome to the Microbial Community Database (MiCoDa)!


What is MiCoDa?

MiCoDa is a searchable database that hosts over 35,000 samples of processed 16S rRNA gene amplicon sequences from aquatic, host-associated, and mineral environments across the globe. To improve cross-study comparability, all samples in MiCoDa were sequenced in the same region of the 16S rRNA gene (between base pairs 515 and 806). MiCoDa also hosts the Earth Microbiome Project samples, processed in the same manner. MiCoDa is currently the largest public microbiome database available. Its goal is to encourage the reuse of extant sequence data in the life sciences.

MiCoDa grew out of the observation that reusing biodiversity data is hard, and especially so for microbiome sequence data. In addition to extensive data and metadata collection, reusing microbiome data requires extensive knowledge of bioinformatics and sufficient computational capacity for sequence processing. At the same time, microbiome data is regularly archived. We created MiCoDa to take advantage of this available data and to facilitate microbiome data reuse and synthesis by specialists and non-specialists alike. To this end, we have manually curated the data and metadata included, preprocessed the sequence data to maximize comparability, and created a searchable data portal.

Combined with the Earth Microbiome Project dataset, MiCoDa currently houses 29,924 samples from 352 studies, covering 51 countries.

Who are we?

MiCoDa is a team effort led by Dr. Stephanie Jurburg (microbial ecology) in conjunction with Prof. Anna Heintz-Buschart (bioinformatics), Desiree Langer (data curation), Dr. Anahita Kazem (data management), Neha Gupta (computer science), and Prof. Birgitta Koenig-Ries (data management). It is hosted and supported by the Biodiversity Informatics Unit of the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, the Helmholtz Centre for Environmental Research, Leipzig, and Friedrich Schiller Universität Jena.

Dr. Stephanie Jurburg
Lead, microbial ecology
Prof. Dr. Anna Heintz-Buschart
Bioinformatics
Dr. Anahita Kazem
Biodiversity data management
Neha Gupta
Computer Science
Prof. Dr. Birgitta Koenig-Ries
Biodiversity data management
Desiree Langer
Data curation

Who is MiCoDa for?

We hope MiCoDa will be used by microbiome specialists and non-specialists alike. Its ease of use should encourage scientists to include bacteria in large-scale ecological syntheses, to place their novel microbiome findings within the context of extant data, and to examine large-scale patterns in microbiome data. MiCoDa is also for data producers: the metadata files associated with MiCoDa samples include all the bibliometric information needed to cite data producers, ensuring that they, too, receive credit.

Equitability in data reuse?

A global understanding of ecological phenomena requires a global perspective, and microbial ecology is particularly well suited to pioneer the inversion of knowledge flows in synthetic ecology. Once bioinformatics processing is completed, reusing sequence data requires no major research funding or infrastructure, and robust statistics can be performed on personal computers, making it a form of research accessible to scientists globally. Despite inequalities in funding and access to sequence data, microbial ecologists are being trained worldwide, and their expertise and perspectives can and should be harnessed in the analysis of large sequence datasets.

Equitable participation from researchers globally through synthesis has numerous advantages. First, because published and archived data is available to all, the link between access to funding, the scope and breadth of research, and academic credit can be eliminated, as data providers can be cited. Second, expanding the geographic range of data reusers will likely improve data stewardship in the long term, as proximity between data generators and data users, recognition for data providers, and openness in data sharing increase. This, in turn, can reduce biodiversity blind spots. Third, limiting the scope of synthetic research by performing literature searches in a single language biases that research and remains another major barrier to equitably expanding the scope of ecology. Spreading the culture of data reuse across different countries may increase the deposition of data from grey literature (e.g., reports) and the geographic range of meta-analyses included in microbial syntheses, reducing geographic and cultural biases while leveraging extant resources.

MiCoDa is designed to allow scientists to select samples based on relevant metadata (e.g., sample type or location) and download processed sequences as .csv tables. Because the MiCoDa database does not require bioinformatics know-how or dedicated computing infrastructure, it is designed to increase worldwide scholarly access to microbiome datasets, reducing biases in who is able to perform large, comprehensive meta-analyses, and facilitating equitability in synthesis research. Furthermore, because datasets are selected by mining published literature, original authors can be cited in later syntheses, increasing the visibility of extant research and giving credit to data providers. In its first version, MiCoDa nearly doubled the amount of microbiome data previously available with the EMP and greatly increased the geographic range and variety of available data.
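As a rough illustration of working with downloaded .csv tables on a personal computer, the sketch below filters samples by metadata and totals their sequence counts using only the Python standard library. The table layouts, the column names (sample_id, sample_type, country, sequence_id, count), and the inline example rows are all hypothetical placeholders, not the actual MiCoDa export format; adapt them to the files you download.

```python
import csv
import io

# Hypothetical stand-in for a downloaded metadata table.
# Column names are assumptions, not the real MiCoDa schema.
METADATA_CSV = """sample_id,sample_type,country
S1,aquatic,Germany
S2,host-associated,Brazil
S3,aquatic,Brazil
"""

# Hypothetical stand-in for the matching table of processed sequence counts.
COUNTS_CSV = """sample_id,sequence_id,count
S1,SEQ_1,120
S1,SEQ_2,30
S2,SEQ_1,5
S3,SEQ_2,75
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def select_samples(metadata_rows, **criteria):
    """Return the IDs of samples whose metadata match every criterion."""
    return {
        row["sample_id"]
        for row in metadata_rows
        if all(row.get(key) == value for key, value in criteria.items())
    }


def counts_for(count_rows, sample_ids):
    """Keep only the count rows that belong to the selected samples."""
    return [row for row in count_rows if row["sample_id"] in sample_ids]


metadata = read_rows(METADATA_CSV)
counts = read_rows(COUNTS_CSV)

# Select samples by one metadata field, then total their sequence counts.
aquatic_ids = select_samples(metadata, sample_type="aquatic")
aquatic_counts = counts_for(counts, aquatic_ids)
total_reads = sum(int(row["count"]) for row in aquatic_counts)

print(sorted(aquatic_ids), total_reads)
```

With real downloads, the inline strings would be replaced by files opened with open(path, newline="") and passed to csv.DictReader directly.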

MiCoDa V1 hosts 2.3 times the amount of data in the EMP and 2.3 times the geographic coverage, and was created from available, published data.

Citing MiCoDa

TBD