GTDB - About

About

The Genome Taxonomy Database (GTDB) is an initiative to establish a standardised microbial taxonomy based on genome phylogeny. It is primarily funded by the Australian Research Council via a Laureate Fellowship (FL150100038), a Future Fellowship (FT170100213), and a Discovery Project (DP220100900), with the welcome assistance of strategic funding from The University of Queensland.

The genomes used to construct the phylogeny are obtained from RefSeq and GenBank, and GTDB releases are indexed to RefSeq releases, starting with release 76. Importantly and increasingly, this dataset includes draft genomes of uncultured microorganisms obtained from metagenomes and single cells, ensuring improved genomic representation of the microbial world. All genomes are independently quality controlled using CheckM before inclusion in GTDB (see the statistics page).

The GTDB taxonomy is based on genome trees inferred from aligned, concatenated sets of single-copy marker proteins: FastTree is used with a set of 120 marker proteins for Bacteria, and IQ-TREE with a set of 53 marker proteins for Archaea (starting with R07-RS207; 122 marker proteins prior to that release). The marker sets are available from the download page. Additional marker sets, including concatenated ribosomal proteins and ribosomal RNA genes, are also used to cross-validate tree topologies.
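The concatenation step described above can be illustrated with a minimal sketch. It is not GTDB's own pipeline: the function name `concatenate_markers`, the genome IDs, and the toy sequences are all hypothetical, and padding a missing marker with gap characters is an assumption made for the example.

```python
from typing import Dict, List


def concatenate_markers(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-marker alignments into one concatenated alignment per genome.

    Each element of `alignments` maps genome ID -> aligned sequence for a
    single marker. A genome lacking a marker is padded with gap characters
    (an assumption of this sketch).
    """
    genomes = sorted({g for aln in alignments for g in aln})
    parts: Dict[str, List[str]] = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one marker alignment must share a length")
        width = lengths.pop()
        for g in genomes:
            parts[g].append(aln.get(g, "-" * width))
    return {g: "".join(chunks) for g, chunks in parts.items()}


# Toy data: two hypothetical markers across three hypothetical genomes.
marker_a = {"G1": "MKV-L", "G2": "MKVAL", "G3": "MRV-L"}
marker_b = {"G1": "TTS", "G3": "TAS"}  # G2 lacks this marker
msa = concatenate_markers([marker_a, marker_b])
```

Every genome ends up with a sequence of the same total length, which is what a tree inference program expects as input.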

NCBI taxonomy was initially used to decorate the genome tree via tax2tree and was subsequently used as a reference source of new taxonomic opinions, including new names. The 16S rRNA-based Greengenes and SILVA taxonomies were initially used to supplement the taxonomy, particularly in regions of the tree with no cultured representatives; however, genome assembly identifiers are now used to create placeholder names for uncultured taxa.

LPSN is used as the primary nomenclatural reference for establishing naming priorities and nomenclature types. All taxonomic ranks except species are normalized using PhyloRank, and the taxonomy is manually curated to remove polyphyletic groups. Polyphyly and rank evenness can be visualised in PhyloRank plots. Species were originally delineated based on phylogeny and rank normalization, but this was replaced with an ANI-based method (starting with R04-RS89) to enable scalable and automated assignment of genomes to species clusters.
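The idea behind ANI-based assignment to species clusters can be sketched as follows. This is a simplified illustration rather than the GTDB procedure: the 95% threshold is an assumption not stated in the text above, the function `assign_species` and the representative names are hypothetical, and the ANI values are invented toy numbers.

```python
from typing import Dict, Optional

# Assumed species boundary for this sketch; the text above does not state one.
ANI_THRESHOLD = 95.0


def assign_species(
    ani_to_reps: Dict[str, float], threshold: float = ANI_THRESHOLD
) -> Optional[str]:
    """Return the species representative with the highest ANI to the query.

    Returns None when no representative meets the threshold, meaning the
    query would seed a new species cluster.
    """
    if not ani_to_reps:
        return None
    best_rep = max(ani_to_reps, key=ani_to_reps.get)
    return best_rep if ani_to_reps[best_rep] >= threshold else None


# Toy ANI values (percent) between a query genome and hypothetical representatives.
query_ani = {"rep_A": 97.2, "rep_B": 95.4, "rep_C": 81.0}
novel_ani = {"rep_A": 88.9, "rep_C": 79.5}

assigned = assign_species(query_ani)    # closest representative above threshold
unassigned = assign_species(novel_ani)  # nothing close enough
```

Because each query only needs to be compared against cluster representatives, this kind of rule scales to large genome collections and requires no manual inspection of the tree.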

The GTDB taxonomy can be queried and downloaded through a number of tools on this website. Classification of new genomes based on the GTDB framework can be done via GTDB-Tk.
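For downloaded taxonomy files, a small parser is often the first step. The sketch below assumes lineages are written as semicolon-separated, rank-prefixed strings (`d__`, `p__`, ..., `s__`); that format, the function name `parse_lineage`, and the example lineage are assumptions made for illustration and are not described in the text above.

```python
from typing import Dict

# Assumed mapping from rank prefix to rank name.
RANK_PREFIXES = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage: str) -> Dict[str, str]:
    """Split a rank-prefixed lineage string into a rank -> name mapping."""
    ranks: Dict[str, str] = {}
    for token in lineage.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if not sep or prefix not in RANK_PREFIXES:
            raise ValueError(f"unrecognised lineage token: {token!r}")
        ranks[RANK_PREFIXES[prefix]] = name
    return ranks


# Example lineage, used here purely for illustration.
example = (
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
    "s__Escherichia coli"
)
taxonomy = parse_lineage(example)
```

Keeping ranks in a dictionary makes it straightforward to group genomes by, for example, `taxonomy["genus"]`.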

The Team

The GTDB is a global enterprise with team members located in Australia, Denmark, Canada and Austria, and comprises a curation group and a development group.

Curation team

Development team

Scientific Advisory Board

Robert Lanfear
Ph.D., Professor, Research School of Biology, The Australian National University, Australia

Per Halkjær Nielsen
Ph.D., Professor, Head of Centre for Microbial Communities, Aalborg University, Denmark

Jeff Christiansen
Ph.D., Deputy Director, Associate Director (Engagements and Operations), Australian BioCommons, Australia

Lorna Richardson
M.Sc., Microbiome Resources Coordinator, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), United Kingdom

Conrad Schoch
Ph.D., NCBI Taxonomy Team lead, National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), United States of America

Jörg Overmann
Ph.D., Professor, Scientific Director, Director General, The Bavarian State Collections of Natural History, Germany

Mauricio Chalita
Ph.D., EzBioCloud database team lead, Seoul National University, Korea

Tobias Guldberg Frøslev
Ph.D., Programme Officer for Science Support in the Global Biodiversity Information Facility (GBIF), Denmark

Yasukazu Nakamura
Ph.D., Professor, DNA Data Bank of Japan (DDBJ) Division Head (International Affairs), Japan

Cite GTDB

If you find the GTDB useful in your work please cite:

Parks, D.H., et al. (2025). "GTDB release 10: a complete and systematic taxonomy for 715 230 bacterial and 17 245 archaeal genomes." Nucleic Acids Research, gkaf1040, https://doi.org/10.1093/nar/gkaf1040.
Chuvochina, M., et al. (2023). "Proposal of names for 329 higher rank taxa defined in the Genome Taxonomy Database under two prokaryotic codes", FEMS Microbiology Letters, Volume 370, 2023, fnad071, https://doi.org/10.1093/femsle/fnad071
Parks, D.H., et al. (2021). "GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy." Nucleic Acids Research, https://doi.org/10.1093/nar/gkab776.
Rinke, C., et al. (2021). "A standardized archaeal taxonomy for the Genome Taxonomy Database." Nature Microbiology, 6: 946–959, https://doi.org/10.1038/s41564-021-00918-8.
Parks, D.H., et al. (2020). "A complete domain-to-species taxonomy for Bacteria and Archaea." Nature Biotechnology, https://doi.org/10.1038/s41587-020-0501-8.
Parks, D.H., et al. (2018). "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life." Nature Biotechnology, 36: 996-1004.

Reclassification of Shigella species as synonyms of E. coli is discussed in:

Parks, D.H., et al. (2021). "Reclassification of Shigella species as later heterotypic synonyms of Escherichia coli in the Genome Taxonomy Database" bioRxiv, https://doi.org/10.1101/2021.09.22.461432.

If you use the GTDB-Tk please cite:

Chaumeil, P.-A., et al. (2022). "GTDB-Tk v2: memory friendly classification with the Genome Taxonomy Database." Bioinformatics, btac672, https://doi.org/10.1093/bioinformatics/btac672.
Chaumeil, P.-A., et al. (2019). "GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database." Bioinformatics, btz848, https://doi.org/10.1093/bioinformatics/btz848.

A full list of GTDB manuscripts can be found on Google Scholar.