Release 95 statistics
GTDB release date: July 17, 2020
Taxon overview
GTDB r95 spans 194,600 genomes organized into 31,910 species clusters.
Rank     Bacteria  Archaea   Total
Phylum        111       18     129
Class         327       42     369
Order         917      103   1,020
Family      2,282      276   2,558
Genus       8,778      650   9,428
Species    30,238    1,672  31,910
Species overview
GTDB r95 comprises 191,527 bacterial and 3,073 archaeal genomes organized into 30,238 bacterial and 1,672 archaeal species clusters. This is an increase of 48,862 (33.4%) genomes and 7,219 (29.2%) species clusters compared to GTDB r89.
Of the 145,896 genomes in r89, 145,566 (99.77%) were assigned to the same species cluster in r95, 172 (0.12%) were assigned to a different species cluster in r95, and 158 (0.11%) are no longer present in the GTDB.
Of the 24,706 species representatives in r89, 23,955 (96.96%) are unchanged in r95, 729 (2.95%) have been replaced with a preferred representative in r95 (e.g. type strain of species), 7 (0.03%) are no longer present in the GTDB, and 15 (0.06%) were merged with another r95 species cluster.
Category                    R04-RS89  R05-RS95  Growth (%)
Bacterial genomes            143,512   191,527        33.5
Archaeal genomes               2,392     3,073        28.4
Bacterial species clusters    23,458    30,238        28.9
Archaeal species clusters      1,248     1,672        34.0
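The growth column can be recomputed from the two release columns. The following is a minimal sketch of that arithmetic, assuming growth is defined as (r95 - r89) / r89; the function and variable names are illustrative and not part of the GTDB tooling. Recomputed values agree with the table to within about 0.1 percentage points.

```python
def growth_percent(previous: int, current: int) -> float:
    """Return the percentage growth from `previous` to `current`."""
    return 100.0 * (current - previous) / previous


# (r89 count, r95 count) pairs copied from the table above.
COUNTS = {
    "Bacterial genomes": (143_512, 191_527),
    "Archaeal genomes": (2_392, 3_073),
    "Bacterial species clusters": (23_458, 30_238),
    "Archaeal species clusters": (1_248, 1_672),
}

# Growth per category, left unrounded so rounding choices stay visible.
GROWTH = {name: growth_percent(r89, r95) for name, (r89, r95) in COUNTS.items()}

if __name__ == "__main__":
    for name, pct in GROWTH.items():
        print(f"{name}: {pct:.2f}%")
```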
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e. MAGs/SAGs), or of both isolate and environmental genomes.
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored based on the MIMAG genome standards. In general, representative genomes were restricted to those with a quality score satisfying completeness - 5 × contamination > 50. A few exceptions exist in order to retain well-known species with abnormal CheckM quality estimates.
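As an illustration of the quality criterion, here is a minimal sketch that assumes completeness and contamination are given as percentages, as CheckM reports them. The function names and the example values are hypothetical, and the sketch does not model the exceptions made for well-known species.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score as stated in the text: completeness - 5 * contamination."""
    return completeness - 5.0 * contamination


def passes_quality_filter(
    completeness: float, contamination: float, threshold: float = 50.0
) -> bool:
    """True if the score is strictly greater than the threshold (50 by default)."""
    return quality_score(completeness, contamination) > threshold


if __name__ == "__main__":
    # Hypothetical genomes: (completeness %, contamination %).
    for comp, cont in [(95.0, 2.0), (60.0, 3.0)]:
        print(comp, cont, quality_score(comp, cont), passes_quality_filter(comp, cont))
```

With these hypothetical inputs, a genome at 95% completeness and 2% contamination scores 85 and passes, while one at 60% completeness and 3% contamination scores 45 and does not.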
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum                  Class                      Order                   Family                  Genus
Proteobacteria 9,474    Gammaproteobacteria 5,779  Burkholderiales 1,620   Burkholderiaceae 1,096  Streptomyces 571
Actinobacteriota 4,261  Alphaproteobacteria 3,659  Pseudomonadales 1,503   Flavobacteriaceae 871   Pseudomonas_E 476
Bacteroidota 3,781      Bacteroidia 3,524          Enterobacterales 1,448  Lachnospiraceae 805     Mycobacterium 287
Firmicutes 2,737        Actinomycetia 3,491        Bacteroidales 1,367     Rhodobacteraceae 754    Prevotella 280
Firmicutes_A 2,636      Bacilli 2,702              Flavobacteriales 1,304  Mycobacteriaceae 671    Streptococcus 270
Patescibacteria 1,131   Clostridia 2,361           Actinomycetales 1,246   Streptomycetaceae 635   Flavobacterium 239
Cyanobacteria 727       Cyanobacteriia 648         Rhizobiales 1,181       Enterobacteriaceae 613  Collinsella 223
Desulfobacterota 560    Paceibacteria 432          Mycobacteriales 1,128   Rhizobiaceae 607        Prochlorococcus_A 178
Halobacteriota 545      Coriobacteriia 409         Lachnospirales 866      Pseudomonadaceae 597    Vibrio 157
Chloroflexota 520       Verrucomicrobiae 367       Lactobacillales 838     Sphingomonadaceae 511   Microbacterium 155
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum                   Class                       Order                    Family                     Genus                  Species
Proteobacteria 92,617    Gammaproteobacteria 83,163  Enterobacterales 52,385  Enterobacteriaceae 45,359  Escherichia 19,596     Escherichia flexneri 12,267
Firmicutes 44,883        Bacilli 44,844              Lactobacillales 24,694   Streptococcaceae 14,735    Streptococcus 14,475   Salmonella enterica 10,510
Actinobacteriota 19,045  Actinomycetia 17,443        Pseudomonadales 15,826   Staphylococcaceae 12,904   Staphylococcus 12,748  Staphylococcus aureus 10,497
Firmicutes_A 8,008       Alphaproteobacteria 9,386   Staphylococcales 12,955  Mycobacteriaceae 10,757    Salmonella 10,729      Streptococcus pneumoniae 8,340
Bacteroidota 7,836       Clostridia 7,534            Mycobacteriales 11,486   Pseudomonadaceae 9,264     Mycobacterium 9,228    Klebsiella pneumoniae 7,580
Campylobacterota 4,994   Bacteroidia 7,338           Burkholderiales 9,893    Burkholderiaceae 6,197     Klebsiella 8,343       Mycobacterium tuberculosis 6,514
Patescibacteria 2,580    Campylobacteria 4,985       Campylobacterales 4,972  Moraxellaceae 5,293        Acinetobacter 4,996    Pseudomonas aeruginosa 4,561
Cyanobacteria 1,384      Cyanobacteriia 1,235        Rhizobiales 4,939        Rhizobiaceae 3,941         Pseudomonas 4,641      Acinetobacter baumannii 3,975
Spirochaetota 1,328      Paceibacteria 993           Bacillales 4,898         Vibrionaceae 3,912         Pseudomonas_E 4,067    Escherichia coli 3,724
Halobacteriota 992       Coriobacteriia 809          Bacteroidales 3,842      Enterococcaceae 3,596      Vibrio 3,576           Clostridioides difficile 2,007
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED=1), the last common ancestor occurring at a fixed time in the past (RED=0), and internal nodes being linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used for normalizing taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ±0.1 (indicated by grey bars).

Bacteria

Archaea
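The RED interval described in this section (median RED at a rank, plus or minus 0.1) can be sketched as follows. This is only an illustration of the interval arithmetic: the taxon names and RED values are made up, the function names are hypothetical, and the sketch does not compute RED itself from a tree.

```python
from statistics import median
from typing import Dict, List, Tuple


def red_interval(red_values: List[float], half_width: float = 0.1) -> Tuple[float, float]:
    """Return (low, high) = median RED -/+ half_width for one taxonomic rank."""
    mid = median(red_values)
    return (mid - half_width, mid + half_width)


def taxa_outside_interval(taxa_red: Dict[str, float], half_width: float = 0.1) -> List[str]:
    """Names of taxa whose RED value falls outside the interval for their rank."""
    low, high = red_interval(list(taxa_red.values()), half_width)
    return sorted(name for name, red in taxa_red.items() if red < low or red > high)


if __name__ == "__main__":
    # Hypothetical RED values for five taxa at a single rank.
    example = {
        "taxon_a": 0.30,
        "taxon_b": 0.35,
        "taxon_c": 0.50,
        "taxon_d": 0.33,
        "taxon_e": 0.36,
    }
    print(red_interval(list(example.values())))
    print(taxa_outside_interval(example))
```

In this made-up example the median is 0.35, so the interval runs from about 0.25 to 0.45 and only taxon_c falls outside it.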

Comparison of GTDB and NCBI taxa
GTDB and NCBI taxonomic assignments are compared across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the NCBI taxonomy, or actively changed if the name differed between the two taxonomies.
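The three categories can be expressed as a small decision rule. The sketch below assumes that a missing NCBI name is represented as None or an empty string and that names are compared as exact strings; how rank prefixes or GTDB alphabetic suffixes are handled in the actual comparison is not stated here, so that part is an assumption. The function name and example taxon names are hypothetical.

```python
from typing import Optional


def classify_change(gtdb_name: str, ncbi_name: Optional[str]) -> str:
    """Classify one taxon at one rank by comparing its GTDB and NCBI names."""
    if not ncbi_name:
        # GTDB supplies a name where NCBI has none (assumed encoding: None or "").
        return "passively changed"
    if gtdb_name == ncbi_name:
        return "unchanged"
    return "actively changed"


if __name__ == "__main__":
    # Hypothetical (GTDB name, NCBI name) pairs.
    pairs = [("Exampleales", "Exampleales"), ("Exampleales", None), ("Exampleales", "Otherales")]
    for gtdb, ncbi in pairs:
        print(gtdb, ncbi, classify_change(gtdb, ncbi))
```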
Genomic statistics
Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.