Release 220 statistics
GTDB release date: 24th April, 2024
Taxon overview
GTDB R220 spans 596,859 genomes organized into 113,104 species clusters.
Rank | Bacteria | Archaea | Total
Phylum | 175 | 19 | 194
Class | 538 | 64 | 602
Order | 1,840 | 166 | 2,006
Family | 4,870 | 564 | 5,434
Genus | 23,112 | 1,847 | 24,959
Species | 107,235 | 5,869 | 113,104
Species overview
GTDB R220 comprises 584,382 bacterial and 12,477 archaeal genomes, organized into 107,235 bacterial and 5,869 archaeal species clusters.
Category | R04-RS89 | R05-RS95 | R06-RS202 | R07-RS207 | R08-RS214 | R09-RS220 | Growth from R08-RS214 (%)
Bacterial genomes | 143,512 | 191,527 | 254,090 | 311,480 | 394,932 | 584,382 | 47.97
Archaeal genomes | 2,392 | 3,073 | 4,316 | 6,062 | 7,777 | 12,477 | 60.43
Bacterial species clusters | 23,458 | 30,238 | 45,555 | 62,291 | 80,789 | 107,235 | 32.73
Archaeal species clusters | 1,248 | 1,672 | 2,339 | 3,412 | 4,416 | 5,869 | 32.90
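The growth column is the relative increase from R08-RS214 to R09-RS220. A minimal Python sketch that reproduces it from the two most recent columns of the table; `growth_percent` is a hypothetical helper name, not part of any GTDB tool.

```python
# Counts copied from the R08-RS214 and R09-RS220 columns of the table above.
counts = {
    "Bacterial genomes": (394_932, 584_382),
    "Archaeal genomes": (7_777, 12_477),
    "Bacterial species clusters": (80_789, 107_235),
    "Archaeal species clusters": (4_416, 5_869),
}


def growth_percent(previous: int, current: int) -> float:
    """Relative growth over the previous release, as a percentage rounded to 2 decimals."""
    return round(100 * (current - previous) / previous, 2)


for category, (r214, r220) in counts.items():
    print(f"{category}: {growth_percent(r214, r220):.2f}%")
```

Growth is measured against the previous release, so the percentages for genomes and species clusters are not directly comparable in absolute terms.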
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot shows, for each taxonomic rank, the proportion of taxa composed exclusively of isolate genomes, exclusively of environmental genomes (i.e., MAGs/SAGs), or of both isolate and environmental genomes.
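The three-way split behind this plot can be sketched as follows. This is a minimal illustration under assumptions: the input is a hypothetical list of (taxon, genome category) pairs at a single rank, with categories "isolate", "MAG", or "SAG", and the taxon names are toy placeholders rather than real GTDB taxa.

```python
from collections import Counter, defaultdict

# Hypothetical (taxon at one rank, genome category) pairs; toy names only.
genomes = [
    ("taxon_A", "isolate"),
    ("taxon_A", "MAG"),
    ("taxon_B", "MAG"),
    ("taxon_B", "SAG"),
    ("taxon_C", "isolate"),
]


def classify_taxa(pairs):
    """Label each taxon as isolate only, environmental only (MAG/SAG), or both."""
    sources = defaultdict(set)
    for taxon, category in pairs:
        # MAGs and SAGs are both treated as environmental genomes.
        sources[taxon].add("isolate" if category == "isolate" else "environmental")
    labels = {}
    for taxon, kinds in sources.items():
        if kinds == {"isolate"}:
            labels[taxon] = "isolate only"
        elif kinds == {"environmental"}:
            labels[taxon] = "environmental only"
        else:
            labels[taxon] = "isolate and environmental"
    return labels


def proportions(labels):
    """Fraction of taxa falling into each label at this rank."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels.values()).items()}


labels = classify_taxa(genomes)
print(labels)
print(proportions(labels))
```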
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy completeness - 5 × contamination > 50, unless a large portion of the contamination could be attributed to strain heterogeneity. A few exceptions, with contamination exceeding 10%, were made in order to retain well-known species with abnormal CheckM quality estimates.
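The general quality criterion can be expressed as a one-line score. This is a minimal sketch of the stated threshold only, assuming CheckM completeness and contamination are given as percentages; it deliberately does not model the strain-heterogeneity allowance or the hand-picked exceptions, and the function names are hypothetical.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Completeness minus five times contamination (both CheckM percentages)."""
    return completeness - 5 * contamination


def passes_representative_threshold(completeness: float, contamination: float,
                                    threshold: float = 50.0) -> bool:
    """True if the general criterion completeness - 5*contamination > 50 holds.

    Exceptions (strain heterogeneity, well-known species) are NOT modelled here.
    """
    return quality_score(completeness, contamination) > threshold


print(passes_representative_threshold(95.0, 2.0))  # score 85
print(passes_representative_threshold(70.0, 5.0))  # score 45
print(passes_representative_threshold(60.0, 2.0))  # score exactly 50, not > 50
```

The factor of 5 means each percentage point of contamination costs as much as five points of completeness, so a fairly complete but contaminated genome can still fail.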
Taxa with the largest number of species
The taxa encompassing the largest numbers of GTDB species clusters are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus
Pseudomonadota 27,965 | Gammaproteobacteria 15,356 | Bacteroidales 6,195 | Lachnospiraceae 3,746 | Streptomyces 1,070
Bacillota_A 14,790 | Clostridia 14,724 | Burkholderiales 5,345 | Burkholderiaceae 3,332 | Collinsella 947
Bacteroidota 14,787 | Bacteroidia 13,828 | Oscillospirales 5,282 | Flavobacteriaceae 2,433 | Pelagibacter 912
Actinomycetota 11,737 | Alphaproteobacteria 12,479 | Lachnospirales 4,084 | Rhodobacteraceae 2,000 | Prevotella 799
Patescibacteria 4,581 | Actinomycetes 7,939 | Flavobacteriales 3,746 | Bacteroidaceae 1,694 | Pseudomonas_E 795
Bacillota 3,868 | Bacilli 3,868 | Pseudomonadales 3,153 | Oscillospiraceae 1,540 | Flavobacterium 643
Bacillota_I 2,903 | Bacilli_A 2,903 | Rhizobiales 2,980 | Sphingomonadaceae 1,470 | Streptococcus 567
Chloroflexota 2,749 | Verrucomicrobiae 1,804 | Actinomycetales 2,943 | Acutalibacteraceae 1,440 | Cryptobacteroides 533
Verrucomicrobiota 2,454 | Paceibacteria 1,779 | Enterobacterales 2,454 | UBA660 1,353 | Mycobacterium 475
Planctomycetota 2,339 | Coriobacteriia 1,599 | Mycobacteriales 2,194 | Ruminococcaceae 1,247 | Microbacterium 355
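A ranking like the one above can be derived by counting distinct species under each higher taxon. The sketch below assumes GTDB-style semicolon-separated lineage strings with rank prefixes (`d__`, `p__`, ..., `s__`), one per species representative; the lineages and the function names are hypothetical toy examples.

```python
from collections import Counter, defaultdict


def parse_taxonomy(taxonomy: str) -> dict[str, str]:
    """Split 'd__X;p__Y;...;s__Z' into {'d': 'X', 'p': 'Y', ..., 's': 'Z'}."""
    ranks = {}
    for part in taxonomy.split(";"):
        prefix, _, name = part.partition("__")
        ranks[prefix] = name
    return ranks


def top_taxa_by_species(taxonomies: list[str], rank: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n taxa at `rank` ('p', 'c', 'o', 'f' or 'g') with the most distinct species."""
    species_per_taxon = defaultdict(set)
    for taxonomy in taxonomies:
        ranks = parse_taxonomy(taxonomy)
        species_per_taxon[ranks[rank]].add(ranks["s"])
    return Counter({taxon: len(sp) for taxon, sp in species_per_taxon.items()}).most_common(n)


# Hypothetical species-representative lineages (toy names, not real GTDB taxa).
representatives = [
    "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA sp1",
    "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA sp2",
    "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyB;g__GenusB;s__GenusB sp1",
    "d__Bacteria;p__PhylumB;c__ClassB;o__OrderB;f__FamilyC;g__GenusC;s__GenusC sp1",
]
print(top_taxa_by_species(representatives, "p"))  # PhylumA has 3 species, PhylumB has 1
print(top_taxa_by_species(representatives, "g"))  # GenusA has 2, GenusB and GenusC 1 each
```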
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest numbers of genomes in the GTDB are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus | Species
Pseudomonadota 214,930 | Gammaproteobacteria 183,508 | Enterobacterales 110,273 | Enterobacteriaceae 95,186 | Escherichia 39,673 | Escherichia coli 38,926
Bacillota 82,709 | Bacilli 82,709 | Bacteroidales 58,934 | Lachnospiraceae 35,105 | Klebsiella 22,500 | Klebsiella pneumoniae 18,499
Bacillota_A 80,317 | Clostridia 80,141 | Lactobacillales 47,490 | Muribaculaceae 24,600 | Staphylococcus 21,291 | Staphylococcus aureus 16,021
Bacteroidota 76,591 | Bacteroidia 74,294 | Lachnospirales 36,149 | Staphylococcaceae 21,933 | Streptococcus 20,045 | Salmonella enterica 15,089
Actinomycetota 44,996 | Actinomycetes 34,611 | Pseudomonadales 33,473 | Streptococcaceae 21,213 | Salmonella 15,373 | Streptococcus pneumoniae 9,133
Campylobacterota 11,105 | Alphaproteobacteria 31,226 | Burkholderiales 24,448 | Pseudomonadaceae 17,078 | Acinetobacter 11,458 | Acinetobacter baumannii 8,536
Bacillota_I 10,514 | Campylobacteria 11,105 | Staphylococcales 22,087 | Burkholderiaceae 15,875 | Mycobacterium 11,275 | Pseudomonas aeruginosa 8,390
Patescibacteria 8,106 | Bacilli_A 10,514 | Oscillospirales 21,952 | Bacteroidaceae 15,545 | Pseudomonas 8,578 | Mycobacterium tuberculosis 7,337
Verrucomicrobiota 6,636 | Verrucomicrobiae 5,429 | Mycobacteriales 16,889 | Mycobacteriaceae 15,323 | Pseudomonas_E 7,438 | Enterococcus_B faecium 3,202
Cyanobacteriota 5,634 | Coriobacteriia 5,065 | Campylobacterales 11,064 | Moraxellaceae 12,327 | Vibrio 6,956 | Enterococcus faecalis 3,044
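Unlike the species-cluster ranking, this table counts every genome, so heavily sequenced species dominate. A minimal sketch of that tally, assuming a hypothetical mapping from genome IDs to GTDB-style lineage strings with three-character rank prefixes (`p__`, `c__`, ...); IDs and lineages are toy values.

```python
from collections import Counter

RANK_PREFIXES = ("p__", "c__", "o__", "f__", "g__", "s__")


def genome_counts_by_rank(genome_taxonomy: dict[str, str]) -> dict[str, Counter]:
    """Count genomes assigned to each taxon, separately for every rank prefix."""
    counts = {prefix: Counter() for prefix in RANK_PREFIXES}
    for taxonomy in genome_taxonomy.values():
        for part in taxonomy.split(";"):
            prefix, name = part[:3], part[3:]
            if prefix in counts:  # skips the domain ('d__') field
                counts[prefix][name] += 1
    return counts


# Hypothetical genome IDs and toy lineages; one species may hold many genomes.
genome_taxonomy = {
    "genome_1": "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA sp1",
    "genome_2": "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA sp1",
    "genome_3": "d__Bacteria;p__PhylumA;c__ClassA;o__OrderA;f__FamilyA;g__GenusA;s__GenusA sp2",
    "genome_4": "d__Bacteria;p__PhylumB;c__ClassB;o__OrderB;f__FamilyB;g__GenusB;s__GenusB sp1",
}
counts = genome_counts_by_rank(genome_taxonomy)
for prefix, counter in counts.items():
    print(prefix, counter.most_common(10))
```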
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time: extant taxa exist in the present (RED = 1), the last common ancestor occurs at a fixed time in the past (RED = 0), and internal nodes are linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used to normalize taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ±0.1 (indicated by grey bars).
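The interpolation can be sketched on a small rooted tree. This assumes the common formulation RED(n) = p + (d / u)(1 - p), where p is the parent's RED, d the branch length from n to its parent, and u = d + the mean distance from n to its extant descendants; it uses a single rooting with positive branch lengths, so it illustrates the idea rather than reproducing GTDB's exact procedure. `Node`, `assign_red`, and `rank_interval` are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import mean, median


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent (ignored for the root)
    children: list["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> list[float]:
    """Path lengths from `node` down to every extant (leaf) descendant."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def assign_red(root: Node) -> dict[str, float]:
    """Assign RED to every node; assumes positive branch lengths below the root."""
    red = {root.name: 0.0}  # the root is fixed at RED = 0

    def visit(node: Node, parent_red: float) -> None:
        d = node.length
        u = d + mean(leaf_distances(node))
        red[node.name] = parent_red + (d / u) * (1.0 - parent_red)  # leaves get 1.0
        for child in node.children:
            visit(child, red[node.name])

    for child in root.children:
        visit(child, 0.0)
    return red


def rank_interval(red_values: list[float], width: float = 0.1) -> tuple[float, float, float]:
    """Median RED of the taxa at one rank, with the +/- width band around it."""
    m = median(red_values)
    return m - width, m, m + width


# Toy tree: root -> A (1.0) -> {a1 (1.0), a2 (3.0)}; root -> b (2.0)
tree = Node("root", children=[
    Node("A", 1.0, [Node("a1", 1.0), Node("a2", 3.0)]),
    Node("b", 2.0),
])
red = assign_red(tree)
print(red)  # A sits at 1/3; all leaves at 1.0
print(rank_interval([0.30, 0.35, 0.40]))
```

Because u includes the average depth of the clade below n, fast-evolving lineages (long branches) are not automatically pushed toward higher RED values.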

[Figure: RED plots for Bacteria and Archaea]

Comparison of GTDB and NCBI taxa

Comparison of GTDB and NCBI taxonomic assignments across GTDB species representative genomes and all GTDB genomes which have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as being unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the NCBI taxonomy, or actively changed if the name was different between the two taxonomies.

Phylum names have been updated to follow the valid publication of 42 phylum names in IJSEM. This has resulted in a large number of active phylum name changes relative to NCBI classifications at the time of this release. NCBI is also adopting these new phylum names.
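The three comparison categories amount to a simple rule applied rank by rank. A minimal sketch, assuming bare names without `p__`-style prefixes and an empty string wherever NCBI assigns no name at a rank; the function names are hypothetical and the example names are illustrative only.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def compare_names(gtdb_name: str, ncbi_name: str) -> str:
    """Classify one rank as unchanged, passively changed, or actively changed.

    Assumes bare names and '' when NCBI has no name at this rank.
    """
    if gtdb_name == ncbi_name:
        return "unchanged"
    if not ncbi_name:
        return "passively changed"  # GTDB supplies a name that NCBI lacks
    return "actively changed"  # both taxonomies name the taxon, but differently


def compare_lineages(gtdb: dict[str, str], ncbi: dict[str, str]) -> dict[str, str]:
    """Apply compare_names at every rank present in the GTDB lineage."""
    return {rank: compare_names(gtdb[rank], ncbi.get(rank, "")) for rank in RANKS if rank in gtdb}


print(compare_names("Escherichia", "Escherichia"))  # unchanged
print(compare_names("GenusA", ""))  # passively changed (toy placeholder name)
print(compare_names("Bacillota", "Firmicutes"))  # actively changed (renamed phylum)
```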


Genomic statistics
Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.

[Figure: genomic statistics plots for Genomes and Species]

Nomenclatural types per rank

This plot shows the breakdown of placeholder versus latinized names for each taxonomic rank.

Rank | Bacteria: Latin | Bacteria: Placeholder | Archaea: Latin | Archaea: Placeholder | Total: Latin | Total: Placeholder
Phylum | 74 (42.29%) | 101 (57.71%) | 15 (78.95%) | 4 (21.05%) | 89 (45.88%) | 105 (54.12%)
Class | 160 (29.74%) | 378 (70.26%) | 39 (60.94%) | 25 (39.06%) | 199 (33.06%) | 403 (66.94%)
Order | 378 (20.54%) | 1,462 (79.46%) | 72 (43.37%) | 94 (56.63%) | 450 (22.43%) | 1,556 (77.57%)
Family | 805 (16.53%) | 4,065 (83.47%) | 108 (19.15%) | 456 (80.85%) | 913 (16.80%) | 4,521 (83.20%)
Genus | 3,957 (17.12%) | 19,155 (82.88%) | 260 (14.08%) | 1,587 (85.92%) | 4,217 (16.90%) | 20,742 (83.10%)
Species | 16,168 (15.08%) | 91,067 (84.92%) | 678 (11.55%) | 5,191 (88.45%) | 16,846 (14.89%) | 96,258 (85.11%)
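Each cell pairs a count with its share of all taxa at that rank in the given domain. A minimal sketch of that arithmetic (the `breakdown` helper is hypothetical); note that the total columns pool the bacterial and archaeal counts before computing percentages, rather than averaging the two domains' percentages.

```python
def breakdown(latin: int, placeholder: int) -> tuple[str, str]:
    """Format 'count (pct%)' cells for Latin and placeholder names at one rank."""
    total = latin + placeholder

    def cell(count: int) -> str:
        return f"{count:,} ({100 * count / total:.2f}%)"

    return cell(latin), cell(placeholder)


print(breakdown(74, 101))  # Bacteria, phylum
print(breakdown(74 + 15, 101 + 4))  # both domains pooled, phylum
print(breakdown(16_846, 96_258))  # both domains pooled, species
```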