Release 214 statistics
GTDB release date: 28th April, 2023
Taxon overview
GTDB R214 spans 402,709 genomes organized into 85,205 species clusters.
Rank | Bacteria | Archaea | Total
Phylum | 161 | 20 | 181
Class | 488 | 60 | 548
Order | 1,624 | 148 | 1,772
Family | 4,264 | 508 | 4,772
Genus | 19,153 | 1,586 | 20,739
Species | 80,789 | 4,416 | 85,205
Species overview
GTDB R214 comprises 394,932 bacterial and 7,777 archaeal genomes, organized into 80,789 bacterial and 4,416 archaeal species clusters.
Statistic | R04-RS89 | R05-RS95 | R06-RS202 | R07-RS207 | R08-RS214 | Growth from R07-RS207 (%)
Bacterial genomes | 143,512 | 191,527 | 254,090 | 311,480 | 394,932 | 26.79
Archaeal genomes | 2,392 | 3,073 | 4,316 | 6,062 | 7,777 | 28.29
Bacterial species clusters | 23,458 | 30,238 | 45,555 | 62,291 | 80,789 | 29.70
Archaeal species clusters | 1,248 | 1,672 | 2,339 | 3,412 | 4,416 | 29.43
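The growth column is the relative change from the R07-RS207 count to the R08-RS214 count, that is 100 × (R214 − R207) / R207. As a minimal Python sketch (the helper name `percent_growth` is our own), the column can be reproduced from the table values:

```python
def percent_growth(previous: int, current: int) -> float:
    """Relative change from `previous` to `current`, in percent, rounded to 2 decimals."""
    return round(100.0 * (current - previous) / previous, 2)


# (R07-RS207, R08-RS214) counts copied from the table above
COUNTS = {
    "Bacterial genomes": (311_480, 394_932),
    "Archaeal genomes": (6_062, 7_777),
    "Bacterial species clusters": (62_291, 80_789),
    "Archaeal species clusters": (3_412, 4_416),
}

for label, (r207, r214) in COUNTS.items():
    print(f"{label}: {percent_growth(r207, r214):.2f}%")
```

This prints 26.79%, 28.29%, 29.70% and 29.43%, matching the last column.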
Genome categories
GTDB taxa consist of isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot shows the proportion of taxa at each taxonomic rank that consist exclusively of isolate genomes, exclusively of environmental genomes (i.e., MAGs and SAGs), or of both isolate and environmental genomes.
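One way to compute these proportions is sketched below in Python, under stated assumptions: each genome carries a single category label (`"isolate"`, `"MAG"` or `"SAG"`, labels chosen here for illustration), and a taxon is binned by the set of categories found among its member genomes. The function names and example genera are hypothetical and do not come from GTDB's tooling.

```python
from collections import defaultdict

# Category labels are assumptions made for this example.
ISOLATE = "isolate"
ENVIRONMENTAL = {"MAG", "SAG"}


def classify_taxon(categories: set[str]) -> str:
    """Bin a taxon by the categories of its member genomes."""
    has_isolate = ISOLATE in categories
    has_environmental = bool(categories & ENVIRONMENTAL)
    if has_isolate and has_environmental:
        return "both"
    if has_isolate:
        return "isolate only"
    if has_environmental:
        return "environmental only"
    raise ValueError(f"unrecognized genome categories: {categories}")


def category_proportions(genomes: list[tuple[str, str]]) -> dict[str, float]:
    """genomes: (taxon name, genome category) pairs for the taxa at one rank."""
    categories_by_taxon = defaultdict(set)
    for taxon, category in genomes:
        categories_by_taxon[taxon].add(category)
    tally = defaultdict(int)
    for categories in categories_by_taxon.values():
        tally[classify_taxon(categories)] += 1
    n_taxa = len(categories_by_taxon)
    return {label: count / n_taxa for label, count in tally.items()}


# Hypothetical genera and genome categories
example = [
    ("genus_A", "isolate"), ("genus_A", "MAG"),
    ("genus_B", "MAG"), ("genus_B", "SAG"),
    ("genus_C", "isolate"),
    ("genus_D", "MAG"),
]
print(category_proportions(example))
```

With the toy input, one genus in four is mixed, two are environmental only and one is isolate only, giving proportions of 0.25, 0.5 and 0.25.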
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Where possible, genomes assembled from the type strain of the species were selected, although the majority of species clusters currently have only placeholder names. The proportion of representatives that are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is shown below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy completeness − 5 × contamination > 50. A few exceptions were made in order to retain well-known species with abnormal CheckM quality estimates.
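The selection criterion and the MIMAG coloring can be sketched as follows. This is an illustration, not GTDB's code: completeness and contamination are assumed to be CheckM percentages, the function names are our own, and the MIMAG tiers are simplified to their completeness and contamination thresholds only (the full high-quality tier also requires rRNA and tRNA genes).

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Completeness minus five times contamination (both CheckM percentages)."""
    return completeness - 5.0 * contamination


def passes_representative_filter(completeness: float, contamination: float) -> bool:
    # General criterion only; the few exceptions for well-known species are not modelled.
    return quality_score(completeness, contamination) > 50.0


def mimag_tier(completeness: float, contamination: float) -> str:
    # Simplified MIMAG tiers: rRNA and tRNA requirements for "high" are NOT checked here.
    if completeness > 90.0 and contamination < 5.0:
        return "high"
    if completeness >= 50.0 and contamination < 10.0:
        return "medium"
    if contamination < 10.0:
        return "low"
    return "outside MIMAG tiers"


for comp, cont in [(95.0, 2.0), (60.0, 3.0)]:
    print(comp, cont, quality_score(comp, cont),
          passes_representative_filter(comp, cont), mimag_tier(comp, cont))
```

A genome at 60% completeness and 3% contamination falls in the MIMAG medium-quality tier but scores 60 − 15 = 45, so it does not meet the representative criterion; the factor of 5 weights contamination heavily.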
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus
Pseudomonadota 21,693 | Gammaproteobacteria 11,935 | Bacteroidales 4,720 | Lachnospiraceae 2,902 | Streptomyces 948
Bacillota_A 11,264 | Clostridia 11,207 | Oscillospirales 4,184 | Flavobacteriaceae 1,785 | Pelagibacter 855
Bacteroidota 11,000 | Bacteroidia 10,262 | Burkholderiales 4,020 | Rhodobacteraceae 1,479 | Pseudomonas_E 693
Actinomycetota 8,813 | Alphaproteobacteria 9,662 | Lachnospirales 3,140 | Bacteroidaceae 1,384 | Prevotella 659
Bacillota 5,292 | Actinomycetia 6,393 | Flavobacteriales 2,789 | Oscillospiraceae 1,183 | Collinsella 502
Patescibacteria 3,374 | Bacilli 5,292 | Pseudomonadales 2,563 | Burkholderiaceae_B 1,145 | Flavobacterium 485
Chloroflexota 1,910 | Paceibacteria 1,369 | Rhizobiales 2,368 | Acutalibacteraceae 1,144 | Cryptobacteroides 463
Cyanobacteriota 1,830 | Cyanobacteriia 1,276 | Actinomycetales 2,302 | Pelagibacteraceae 1,138 | Streptococcus 420
Verrucomicrobiota 1,722 | Verrucomicrobiae 1,211 | Mycobacteriales 1,827 | Sphingomonadaceae 1,113 | Mycobacterium 417
Planctomycetota 1,664 | Coriobacteriia 963 | Christensenellales 1,567 | Streptomycetaceae 1,026 | Microbacterium 282
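Counts like these can be produced by tallying lineages rank by rank. The sketch below assumes GTDB-style lineage strings (`d__Bacteria;p__...;s__...`, seven semicolon-separated ranks); the function `top_taxa` and the example lineages are illustrative only and are not part of any GTDB tool.

```python
from collections import Counter

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def top_taxa(lineages: list[str], rank: str, n: int = 10) -> list[tuple[str, int]]:
    """Count lineages per taxon at `rank` and return the n largest taxa.

    Each lineage is assumed to look like
    'd__Bacteria;p__Pseudomonadota;...;s__Escherichia coli'.
    """
    idx = RANKS.index(rank)
    counts: Counter = Counter()
    for lineage in lineages:
        # Take the field for this rank and drop its 'x__' prefix.
        name = lineage.split(";")[idx].split("__", 1)[1]
        counts[name] += 1
    # Ties keep the order in which taxa were first encountered.
    return counts.most_common(n)


# Illustrative species-representative lineages
reps = [
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli",
    "d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Rhizobiales;"
    "f__Rhizobiaceae;g__Rhizobium;s__Rhizobium leguminosarum",
    "d__Bacteria;p__Bacillota;c__Bacilli;o__Staphylococcales;"
    "f__Staphylococcaceae;g__Staphylococcus;s__Staphylococcus aureus",
]
print(top_taxa(reps, "phylum"))
```

Passing one lineage per species representative counts species clusters, as in this table; passing one lineage per genome gives genome counts of the kind tabulated in the next section.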
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus | Species
Pseudomonadota 174,676 | Gammaproteobacteria 150,865 | Enterobacterales 80,364 | Enterobacteriaceae 80,364 | Escherichia 34,358 | Escherichia coli 33,849
Bacillota 74,157 | Bacilli 74,157 | Lactobacillales 38,051 | Staphylococcaceae 20,155 | Staphylococcus 19,627 | Klebsiella pneumoniae 14,975
Actinomycetota 33,994 | Clostridia 33,352 | Pseudomonadales 27,546 | Streptococcaceae 19,088 | Streptococcus 18,492 | Staphylococcus aureus 14,959
Bacillota_A 33,495 | Actinomycetia 28,469 | Staphylococcales 20,270 | Pseudomonadaceae 14,789 | Klebsiella 18,145 | Salmonella enterica 13,832
Bacteroidota 29,028 | Bacteroidia 27,250 | Burkholderiales 18,787 | Mycobacteriaceae 13,735 | Salmonella 14,109 | Streptococcus pneumoniae 8,895
Campylobacterota 8,496 | Alphaproteobacteria 23,655 | Bacteroidales 17,142 | Moraxellaceae 9,879 | Mycobacterium 10,657 | Mycobacterium tuberculosis 7,132
Patescibacteria 5,830 | Campylobacteria 8,470 | Mycobacteriales 14,998 | Lachnospiraceae 9,794 | Acinetobacter 9,221 | Pseudomonas aeruginosa 7,037
Cyanobacteriota 3,846 | Cyanobacteriia 2,997 | Enterobacterales_A 12,508 | Bacteroidaceae 8,212 | Pseudomonas 7,187 | Acinetobacter baumannii 6,912
Verrucomicrobiota 3,678 | Verrucomicrobiae 2,834 | Oscillospirales 10,969 | Burkholderiaceae 7,042 | Pseudomonas_E 6,686 | Clostridioides difficile 2,701
Chloroflexota 3,064 | Coriobacteriia 2,570 | Lachnospirales 10,229 | Lactobacillaceae 6,794 | Vibrio 6,120 | Enterococcus_B faecium 2,657
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED = 1), the last common ancestor occurring at a fixed time in the past (RED = 0), and internal nodes linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used to normalize taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ± 0.1 (indicated by grey bars).

(Figure: RED of taxa at each rank, with separate panels for Bacteria and Archaea.)
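The interpolation can be sketched in Python using RED as it is usually defined (Parks et al., 2018), as we understand it: the root has RED 0, leaves have RED 1, and an internal node with parent RED p, branch length d to its parent, and mean path length u from the parent through the node to its descendant leaves gets RED p + (d / u)(1 − p). The `Node` class and function names are hypothetical; this is not GTDB's implementation, and tree rooting and other pipeline details are out of scope.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import median


@dataclass
class Node:
    """A rooted tree node; `length` is the branch length to the parent."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)
    red: float = 0.0


def tip_distances(node: Node) -> list[float]:
    """Path lengths from `node` down to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [child.length + d for child in node.children for d in tip_distances(child)]


def assign_red(node: Node, parent_red: float = 0.0, is_root: bool = True) -> None:
    """Set RED on `node` and its descendants: root 0, leaves 1, internal nodes interpolated."""
    if is_root:
        node.red = 0.0
    elif not node.children:
        node.red = 1.0
    else:
        tips = tip_distances(node)
        # u: mean path length from the parent, through this node, to its descendant leaves
        u = node.length + sum(tips) / len(tips)
        fraction = node.length / u if u > 0 else 0.0
        node.red = parent_red + fraction * (1.0 - parent_red)
    for child in node.children:
        assign_red(child, node.red, is_root=False)


def rank_interval(red_values: list[float], half_width: float = 0.1) -> tuple[float, float]:
    """Median RED of the taxa at one rank, plus and minus `half_width`."""
    m = median(red_values)
    return (m - half_width, m + half_width)


# Toy tree: root -> (A -> (C -> (c1, c2), a2), B)
c = Node("C", 1.0, [Node("c1", 1.0), Node("c2", 1.0)])
a = Node("A", 1.0, [c, Node("a2", 2.0)])
root = Node("root", 0.0, [a, Node("B", 2.0)])
assign_red(root)
print(a.red, c.red)  # approximately 1/3 and 2/3
```

Recomputing tip distances at every node keeps the sketch short but is quadratic in tree size; a production version would cache subtree statistics in a single post-order pass.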

Comparison of GTDB and NCBI taxa

Comparison of GTDB and NCBI taxonomic assignments across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. At each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided a name where the NCBI taxonomy had none, or actively changed if the names differed between the two taxonomies.
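The three-way classification can be expressed as a small function. This sketch compares bare taxon names at a single rank, assumes GTDB always supplies a name, and treats a missing or empty NCBI name as absent; the names `Change`, `compare_rank` and `tally` are our own, and real NCBI records may need extra normalization (for example, stripping rank prefixes) before comparison.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional, Tuple


class Change(Enum):
    UNCHANGED = "unchanged"
    PASSIVE = "passively changed"
    ACTIVE = "actively changed"


def compare_rank(gtdb_name: str, ncbi_name: Optional[str]) -> Change:
    """Classify one rank of one genome's GTDB assignment against its NCBI assignment."""
    if ncbi_name == gtdb_name:
        return Change.UNCHANGED
    if not ncbi_name:
        # NCBI has no name at this rank, and GTDB supplies one
        return Change.PASSIVE
    # Both taxonomies name the rank, but the names differ
    return Change.ACTIVE


def tally(pairs: Iterable[Tuple[str, Optional[str]]]) -> Counter:
    """Count change types over (GTDB name, NCBI name) pairs at a single rank."""
    return Counter(compare_rank(gtdb, ncbi) for gtdb, ncbi in pairs)


# Illustrative pairs; "genus_X" is a hypothetical placeholder name
print(tally([
    ("Escherichia", "Escherichia"),
    ("Pseudomonadota", "Proteobacteria"),
    ("genus_X", None),
]))
```

Under this rule, an NCBI record that still uses Proteobacteria where GTDB uses Pseudomonadota counts as an active change, which is why the phylum renaming described below produces many active changes.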

Phylum names have been updated to follow the valid publication of 42 phylum names in IJSEM. This has resulted in a large number of active phylum name changes relative to the NCBI classifications at the time of this release. NCBI is also adopting these new phylum names.


Genomic statistics
Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.

(Figure: genomic statistics, with separate panels for Genomes and Species.)

Nomenclatural types per rank

This plot shows the breakdown of placeholder versus latinized names for each taxonomic rank.

Rank | Bacteria: Latin | Bacteria: Placeholder | Archaea: Latin | Archaea: Placeholder | Total: Latin | Total: Placeholder
Phylum | 71 (44.10%) | 90 (55.90%) | 15 (75.00%) | 5 (25.00%) | 86 (47.51%) | 95 (52.49%)
Class | 153 (31.35%) | 335 (68.65%) | 39 (65.00%) | 21 (35.00%) | 192 (35.04%) | 356 (64.96%)
Order | 356 (21.92%) | 1,268 (78.08%) | 59 (39.86%) | 89 (60.14%) | 415 (23.42%) | 1,357 (76.58%)
Family | 758 (17.78%) | 3,506 (82.22%) | 91 (17.91%) | 417 (82.09%) | 849 (17.79%) | 3,923 (82.21%)
Genus | 3,723 (19.44%) | 15,430 (80.56%) | 226 (14.25%) | 1,360 (85.75%) | 3,949 (19.04%) | 16,790 (80.96%)
Species | 14,489 (17.93%) | 66,300 (82.07%) | 591 (13.38%) | 3,825 (86.62%) | 15,080 (17.70%) | 70,125 (82.30%)
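Each percentage is a share of all names at that rank within the relevant group, and the combined columns are computed from pooled counts rather than by averaging the two domain percentages. A minimal Python check against the Phylum row (the helper `latin_percentages` is our own):

```python
def latin_percentages(latin: int, placeholder: int) -> tuple[float, float]:
    """Shares of Latin and placeholder names among all names at one rank, in percent."""
    total = latin + placeholder
    return (round(100 * latin / total, 2), round(100 * placeholder / total, 2))


# Phylum row: (Latin, placeholder) counts per domain, copied from the table above
bacteria = (71, 90)
archaea = (15, 5)
# Combined columns pool the counts; they are not averages of the domain percentages
combined = (bacteria[0] + archaea[0], bacteria[1] + archaea[1])

for label, (latin, placeholder) in [("Bacteria", bacteria), ("Archaea", archaea), ("Total", combined)]:
    latin_pct, placeholder_pct = latin_percentages(latin, placeholder)
    print(f"{label}: {latin} ({latin_pct:.2f}%) Latin, {placeholder} ({placeholder_pct:.2f}%) placeholder")
```

The printed lines reproduce 44.10%/55.90%, 75.00%/25.00% and 47.51%/52.49%; averaging the two domain Latin shares instead would give 59.55%, which is not what the Total column reports.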