Release 207 statistics
GTDB release date: April 8, 2022
Taxon overview
GTDB R207 spans 317,542 genomes organized into 65,703 species clusters.
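| Rank | Bacteria | Archaea | Total |
|---|---|---|---|
| Phylum | 148 | 18 | 166 |
| Class | 425 | 52 | 477 |
| Order | 1,439 | 132 | 1,571 |
| Family | 3,614 | 456 | 4,070 |
| Genus | 15,342 | 1,344 | 16,686 |
| Species | 62,291 | 3,412 | 65,703 |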
Species overview
GTDB R207 comprises 311,480 bacterial and 6,062 archaeal genomes organized into 62,291 bacterial and 3,412 archaeal species clusters.
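| Statistic | R04-RS89 | R05-RS95 | R06-RS202 | R07-RS207 | Growth from R06-RS202 (%) |
|---|---|---|---|---|---|
| Bacterial genomes | 143,512 | 191,527 | 254,090 | 311,480 | 22.59 |
| Archaeal genomes | 2,392 | 3,073 | 4,316 | 6,062 | 40.45 |
| Bacterial species clusters | 23,458 | 30,238 | 45,555 | 62,291 | 36.73 |
| Archaeal species clusters | 1,248 | 1,672 | 2,339 | 3,412 | 45.87 |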
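The growth column can be recomputed from the R06-RS202 and R07-RS207 counts. The snippet below is a minimal sketch of that arithmetic; the function name `growth_pct` and the dictionary layout are illustrative choices, not part of any GTDB tooling, and recomputed values may differ from the published column in the last digit depending on how the figures were rounded.

```python
def growth_pct(previous: int, current: int) -> float:
    """Percentage growth of `current` relative to `previous`."""
    return (current - previous) / previous * 100.0


# Counts copied from the release comparison: (R06-RS202, R07-RS207).
counts = {
    "Bacterial genomes": (254_090, 311_480),
    "Archaeal genomes": (4_316, 6_062),
    "Bacterial species clusters": (45_555, 62_291),
    "Archaeal species clusters": (2_339, 3_412),
}

growth = {name: growth_pct(prev, cur) for name, (prev, cur) in counts.items()}

for name, pct in growth.items():
    print(f"{name}: {pct:.2f}%")
```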
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e. MAGs/SAGs), or of both isolate and environmental genomes.
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives that are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy completeness - 5 × contamination > 50. A few exceptions were made in order to retain well-known species with abnormal CheckM quality estimates.
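The quality criterion is simple enough to express directly. The following is a minimal sketch, assuming completeness and contamination are given as CheckM-style percentages; the function names and the `threshold` parameter are hypothetical and are not taken from the GTDB code base. It does not model the exceptions made for well-known species.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score as stated in the text: completeness - 5 * contamination."""
    return completeness - 5.0 * contamination


def passes_representative_filter(
    completeness: float, contamination: float, threshold: float = 50.0
) -> bool:
    """True if the genome satisfies the strict inequality quality > threshold."""
    return quality_score(completeness, contamination) > threshold


# Illustrative values only; these are not real GTDB genomes.
examples = [(95.0, 2.0), (70.0, 5.0), (60.0, 2.0)]
results = [passes_representative_filter(c, x) for c, x in examples]
print(results)
```

Because the inequality is strict, a genome scoring exactly 50 (such as 60% completeness with 2% contamination) does not pass under this reading.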
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
| Phylum | Class | Order | Family | Genus |
|---|---|---|---|---|
| Proteobacteria 17,350 | Gammaproteobacteria 9,582 | Bacteroidales 3,805 | Lachnospiraceae 2,184 | Streptomyces 855 |
| Bacteroidota 8,588 | Clostridia 8,185 | Burkholderiales 3,181 | Burkholderiaceae 2,063 | Pelagibacter 855 |
| Firmicutes_A 8,243 | Bacteroidia 8,033 | Oscillospirales 3,168 | Flavobacteriaceae 1,452 | Pseudomonas_E 626 |
| Actinobacteriota 7,328 | Alphaproteobacteria 7,684 | Lachnospirales 2,360 | Rhodobacteraceae 1,187 | Prevotella 554 |
| Firmicutes 4,216 | Actinomycetia 5,591 | Flavobacteriales 2,233 | Bacteroidaceae 1,135 | Collinsella 403 |
| Patescibacteria 2,485 | Bacilli 4,215 | Pseudomonadales 2,152 | Pelagibacteraceae 1,134 | Flavobacterium 395 |
| Chloroflexota 1,387 | Cyanobacteriia 1,113 | Actinomycetales 1,980 | Streptomycetaceae 924 | Mycobacterium 370 |
| Cyanobacteria 1,372 | Paceibacteria 961 | Enterobacterales 1,767 | Acutalibacteraceae 919 | Streptococcus 365 |
| Verrucomicrobiota 1,325 | Verrucomicrobiae 912 | Rhizobiales 1,709 | Mycobacteriaceae 891 | Cryptobacteroides 361 |
| Planctomycetota 1,071 | Coriobacteriia 779 | Mycobacteriales 1,620 | Sphingomonadaceae 883 | Prochlorococcus_A 272 |
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
| Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|
| Proteobacteria 141,114 | Gammaproteobacteria 121,804 | Enterobacterales 74,108 | Enterobacteriaceae 63,971 | Escherichia 27,205 | Escherichia coli 26,859 |
| Firmicutes 61,795 | Bacilli 61,794 | Lactobacillales 32,177 | Staphylococcaceae 17,191 | Staphylococcus 16,835 | Staphylococcus aureus 13,059 |
| Actinobacteriota 28,532 | Actinomycetia 24,379 | Pseudomonadales 22,407 | Streptococcaceae 17,088 | Streptococcus 16,657 | Salmonella enterica 12,285 |
| Firmicutes_A 21,744 | Clostridia 21,601 | Staphylococcales 17,289 | Mycobacteriaceae 12,366 | Klebsiella 13,720 | Klebsiella pneumoniae 11,294 |
| Bacteroidota 20,893 | Bacteroidia 19,504 | Burkholderiales 15,809 | Pseudomonadaceae 12,187 | Salmonella 12,544 | Streptococcus pneumoniae 8,452 |
| Campylobacterota 6,845 | Alphaproteobacteria 19,166 | Mycobacteriales 13,541 | Burkholderiaceae 10,295 | Mycobacterium 9,983 | Mycobacterium tuberculosis 6,836 |
| Patescibacteria 4,645 | Campylobacteria 6,825 | Bacteroidales 11,844 | Moraxellaceae 7,998 | Acinetobacter 7,409 | Pseudomonas aeruginosa 5,623 |
| Verrucomicrobiota 2,848 | Cyanobacteriia 2,387 | Campylobacterales 6,808 | Lachnospiraceae 6,306 | Pseudomonas 5,738 | Acinetobacter baumannii 5,417 |
| Cyanobacteria 2,818 | Verrucomicrobiae 2,221 | Oscillospirales 6,803 | Vibrionaceae 5,492 | Pseudomonas_E 5,692 | Clostridioides difficile 2,225 |
| Spirochaetota 2,331 | Coriobacteriia 1,961 | Lachnospirales 6,595 | Enterococcaceae 5,353 | Vibrio 5,056 | Enterococcus_B faecium 2,177 |
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time: extant taxa exist in the present (RED = 1), the last common ancestor occurs at a fixed time in the past (RED = 0), and internal nodes are linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used to normalize taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ± 0.1 (indicated by grey bars).
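To make the interpolation and the ±0.1 interval concrete, here is a small sketch on a toy tree. The interpolation rule used, RED(n) = p + (d / u) × (1 − p), where p is the parent's RED, d is the branch length to the parent, and u is the mean distance from the parent through n to its descendant leaves, is an assumption based on the published description of RED and is not stated on this page. The nested-dictionary tree layout, the function names, and all values are hypothetical.

```python
from statistics import median


def mean_leaf_distance(node):
    """Return (mean distance from node to its descendant leaves, leaf count)."""
    children = node.get("children", [])
    if not children:
        return 0.0, 1
    total, leaves = 0.0, 0
    for child in children:
        child_mean, child_leaves = mean_leaf_distance(child)
        # Every leaf below `child` is child["length"] further away from `node`.
        total += (child["length"] + child_mean) * child_leaves
        leaves += child_leaves
    return total / leaves, leaves


def assign_red(node, parent_red=None, out=None):
    """Assign RED to every node: root = 0, leaves = 1, internal nodes interpolated."""
    out = {} if out is None else out
    children = node.get("children", [])
    if parent_red is None:
        red = 0.0  # root: last common ancestor
    elif not children:
        red = 1.0  # extant taxon
    else:
        d = node["length"]
        u = d + mean_leaf_distance(node)[0]
        # Assumed interpolation rule; fall back to the parent's value if u is 0.
        red = parent_red + (d / u) * (1.0 - parent_red) if u > 0 else parent_red
    out[node["name"]] = red
    for child in children:
        assign_red(child, red, out)
    return out


def red_interval(red_values, half_width=0.1):
    """Median RED at a rank, plus and minus the half-width given in the text."""
    centre = median(red_values)
    return centre - half_width, centre + half_width


def outside_interval(red_by_taxon, half_width=0.1):
    """Names of taxa whose RED falls outside the rank's interval."""
    low, high = red_interval(list(red_by_taxon.values()), half_width)
    return [name for name, red in red_by_taxon.items() if not low <= red <= high]


# Toy tree with made-up branch lengths.
tree = {"name": "root", "children": [
    {"name": "A", "length": 1.0, "children": [
        {"name": "a1", "length": 1.0},
        {"name": "a2", "length": 1.0},
    ]},
    {"name": "b", "length": 2.0},
]}
red = assign_red(tree)

# Made-up RED values for three taxa at one rank.
rank_red = {"t1": 0.30, "t2": 0.35, "t3": 0.52}
flagged = outside_interval(rank_red)
```

In the toy tree, node A sits halfway between the root and its leaves, so it receives an intermediate RED value, while taxon `t3` in the second example lies more than 0.1 above the rank median and would be flagged for review.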
Figure panels: Bacteria, Archaea.

Comparison of GTDB and NCBI taxa
GTDB and NCBI taxonomic assignments are compared across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent from the NCBI taxonomy, or actively changed if the name differed between the two taxonomies.
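The three categories can be sketched as a small classification function. This is an illustration of the rule as worded above, assuming a missing NCBI name is represented by `None` or an empty string and that "identical" means exact string equality; the function name and the example name pairs are hypothetical, and how the GTDB site treats suffixed names such as `Firmicutes_A` is not stated here.

```python
from collections import Counter
from typing import Optional


def classify_change(gtdb_name: str, ncbi_name: Optional[str]) -> str:
    """Classify one taxon at one rank by comparing its GTDB and NCBI names."""
    if not ncbi_name:
        # GTDB supplies a name where NCBI has none at this rank.
        return "passively changed"
    if gtdb_name == ncbi_name:
        return "unchanged"
    return "actively changed"


# Illustrative (GTDB, NCBI) name pairs; not real comparison data.
pairs = [
    ("Escherichia coli", "Escherichia coli"),
    ("Firmicutes_A", "Firmicutes"),
    ("Pelagibacter", None),
]

summary = Counter(classify_change(gtdb, ncbi) for gtdb, ncbi in pairs)
print(dict(summary))
```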

Genomic statistics
Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.
Figure panels: Genomes, Species.