Release 226 statistics
GTDB release date: 16th April, 2025
Taxon overview
GTDB R226 spans 732,475 genomes organized into 143,614 species clusters.
Rank | Bacteria | Archaea | Total
Phylum | 169 | 20 | 189
Class | 571 | 63 | 634
Order | 1,976 | 171 | 2,147
Family | 5,311 | 603 | 5,914
Genus | 27,326 | 2,079 | 29,405
Species | 136,646 | 6,968 | 143,614
Species overview
GTDB R226 comprises 715,230 bacterial and 17,245 archaeal genomes organized into 136,646 bacterial and 6,968 archaeal species clusters, respectively.
Release | Bacterial Genomes | Archaeal Genomes | Bacterial Species Clusters | Archaeal Species Clusters
R04-RS89 | 143,512 | 2,392 | 23,458 | 1,248
R05-RS95 | 191,527 | 3,073 | 30,238 | 1,672
R06-RS202 | 254,090 | 4,316 | 45,555 | 2,339
R07-RS207 | 311,480 | 6,062 | 62,291 | 3,412
R08-RS214 | 394,932 | 7,777 | 80,789 | 4,416
R09-RS220 | 584,382 | 12,477 | 107,235 | 5,869
R10-RS226 | 715,230 | 17,245 | 136,646 | 6,968
Growth from R09-RS220 | 22.39% | 38.21% | 27.43% | 18.73%
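The growth row can be reproduced from the two most recent releases. The following is a minimal sketch in Python; the dictionary keys and the function name `growth_percent` are illustrative choices, not part of any GTDB tooling, and the counts are copied from the table above.

```python
# Counts for the two most recent releases, copied from the table above.
# The key names are illustrative labels, not GTDB field names.
r220 = {
    "bacterial_genomes": 584_382,
    "archaeal_genomes": 12_477,
    "bacterial_species": 107_235,
    "archaeal_species": 5_869,
}
r226 = {
    "bacterial_genomes": 715_230,
    "archaeal_genomes": 17_245,
    "bacterial_species": 136_646,
    "archaeal_species": 6_968,
}


def growth_percent(previous, current):
    """Relative increase from `previous` to `current`, as a percentage."""
    return (current - previous) / previous * 100.0


# Growth of each count from R09-RS220 to R10-RS226, rounded to two decimals.
growth = {key: round(growth_percent(r220[key], r226[key]), 2) for key in r220}

for key, value in growth.items():
    print(f"{key}: {value:.2f}%")
```

Each value is the increase relative to the R09-RS220 count, so archaeal genomes show the largest relative growth even though far more bacterial genomes were added in absolute terms.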
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e., MAGs/SAGs), or of both isolate and environmental genomes.
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored based on the MIMAG genome standards. In general, representative genomes were restricted to those satisfying completeness - 5 × contamination > 50, unless a large portion of the contamination could be attributed to strain heterogeneity. A few exceptions, in which contamination exceeds 10%, were made in order to retain well-known species with abnormal CheckM quality estimates.
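The quality criterion above is a simple arithmetic rule and can be sketched as follows. This is an illustration only: the function names are hypothetical, inputs are assumed to be CheckM-style percentages, and the sketch models neither the strain-heterogeneity allowance nor the manually retained exceptions.

```python
def quality_score(completeness, contamination):
    """Score from the text: completeness - 5 * contamination (both in percent)."""
    return completeness - 5.0 * contamination


def passes_quality_filter(completeness, contamination, threshold=50.0):
    """True if the score is strictly greater than the threshold (50 in the text).

    Assumption: no adjustment is made for contamination attributable to
    strain heterogeneity, which the text treats as a special case.
    """
    return quality_score(completeness, contamination) > threshold


# Hypothetical genomes, used only to illustrate the rule.
print(passes_quality_filter(95.0, 2.0))  # score 85 passes
print(passes_quality_filter(70.0, 5.0))  # score 45 fails
```

Because contamination is weighted five times as heavily as completeness, a genome that is 70% complete with 5% contamination fails, while one that is 60% complete with 1% contamination passes.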
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus
Pseudomonadota 35,839 | Gammaproteobacteria 19,731 | Burkholderiales 7,245 | Burkholderiaceae 4,466 | Streptomyces 1,739
Bacillota 24,866 | Bacteroidia 16,912 | Bacteroidales 6,951 | Lachnospiraceae 3,903 | Collinsella 1,058
Actinomycetota 16,922 | Alphaproteobacteria 15,970 | Oscillospirales 5,467 | Flavobacteriaceae 2,952 | Pelagibacter 891
Bacteroidota 16,912 | Clostridia 15,463 | Flavobacteriales 4,528 | Rhodobacteraceae 2,553 | Prevotella 888
Patescibacteriota 6,435 | Actinomycetes 10,924 | Lachnospirales 4,263 | Sphingomonadaceae 2,180 | Pseudomonas_E 878
Chloroflexota 4,182 | Bacilli 7,498 | Rhizobiales 3,949 | Streptomycetaceae 1,950 | Flavobacterium 812
Acidobacteriota 3,758 | Verrucomicrobiia 2,595 | Pseudomonadales 3,843 | Bacteroidaceae 1,845 | Streptococcus 693
Verrucomicrobiota 3,351 | Minisyncoccia 2,547 | Actinomycetales 3,638 | Chitinophagaceae 1,598 | Mycobacterium 597
Planctomycetota 3,168 | Acidimicrobiia 2,102 | Mycobacteriales 2,983 | Oscillospiraceae 1,587 | Cryptobacteroides 555
Cyanobacteriota 2,718 | Cyanobacteriia 1,943 | Rhodobacterales 2,553 | Microbacteriaceae 1,524 | Microbacterium 451
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus | Species
Pseudomonadota 267,130 | Gammaproteobacteria 226,629 | Enterobacterales 133,925 | Enterobacteriaceae 115,303 | Escherichia 45,533 | Escherichia coli 44,640
Bacillota 205,871 | Bacilli 109,361 | Bacteroidales 67,821 | Lachnospiraceae 37,618 | Klebsiella 28,408 | Klebsiella pneumoniae 23,011
Bacteroidota 89,203 | Clostridia 89,775 | Lactobacillales 54,361 | Staphylococcaceae 25,195 | Staphylococcus 24,405 | Staphylococcus aureus 17,542
Actinomycetota 59,430 | Bacteroidia 89,203 | Pseudomonadales 42,287 | Muribaculaceae 24,600 | Streptococcus 21,852 | Salmonella enterica 17,159
Campylobacterota 12,704 | Actinomycetes 45,222 | Lachnospirales 38,784 | Streptococcaceae 23,415 | Salmonella 17,457 | Pseudomonas aeruginosa 10,492
Patescibacteriota 10,847 | Alphaproteobacteria 40,295 | Burkholderiales 30,358 | Pseudomonadaceae 21,139 | Acinetobacter 14,084 | Acinetobacter baumannii 10,048
Verrucomicrobiota 8,325 | Campylobacteria 12,658 | Staphylococcales 25,423 | Bacteroidaceae 20,813 | Mycobacterium 11,924 | Streptococcus pneumoniae 9,446
Cyanobacteriota 7,057 | Verrucomicrobiia 6,806 | Oscillospirales 25,303 | Burkholderiaceae 19,661 | Pseudomonas 10,723 | Mycobacterium tuberculosis 7,631
Chloroflexota 6,904 | Coriobacteriia 6,002 | Mycobacteriales 19,389 | Mycobacteriaceae 16,757 | Pseudomonas_E 8,551 | Enterococcus faecalis 4,055
Acidobacteriota 5,462 | Acidimicrobiia 5,487 | Actinomycetales 13,693 | Moraxellaceae 15,244 | Vibrio 8,461 | Enterococcus_B faecium 3,907
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED=1), the last common ancestor occurring at a fixed time in the past (RED=0), and internal nodes linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used for normalizing taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ±0.1 (indicated by grey bars).
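The interval rule described above (median RED at a rank, plus or minus 0.1) can be sketched as follows. This is a minimal illustration, not GTDB's implementation: the function names and the taxon labels are hypothetical, and the RED values are made up for the example.

```python
from statistics import median


def red_interval(red_values, half_width=0.1):
    """Return (low, high) bounds: the median RED value +/- half_width."""
    mid = median(red_values)
    return (mid - half_width, mid + half_width)


def taxa_outside_interval(red_by_taxon, half_width=0.1):
    """Names of taxa whose RED value falls outside the interval for their rank."""
    low, high = red_interval(list(red_by_taxon.values()), half_width)
    return sorted(
        name for name, red in red_by_taxon.items() if red < low or red > high
    )


# Hypothetical RED values for five taxa at a single rank.
example = {
    "taxon_A": 0.62,
    "taxon_B": 0.66,
    "taxon_C": 0.70,
    "taxon_D": 0.85,
    "taxon_E": 0.50,
}

print(red_interval(list(example.values())))
print(taxa_outside_interval(example))
```

In this made-up example the median is 0.66, so `taxon_D` and `taxon_E` fall outside the interval and would be candidates for review at a different rank.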

Bacteria


Archaea

Comparison of GTDB and NCBI taxa

GTDB and NCBI taxonomic assignments are compared across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent from the NCBI taxonomy, or actively changed if the name differed between the two taxonomies.

Phylum names have been updated to follow the valid publication of 42 names in IJSEM. This has resulted in a large number of active phylum name changes relative to NCBI classifications at the time of this release. NCBI is also adopting these new phylum names.
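The three categories defined above can be expressed as a small decision rule. The sketch below assumes that a name missing from the NCBI taxonomy is represented as `None` or an empty string and that names are compared as plain strings; the function name and the example inputs are illustrative, and any name normalization applied in the actual comparison is not modelled.

```python
def classify_change(gtdb_name, ncbi_name):
    """Classify one taxon at one rank as unchanged, passively or actively changed.

    Assumption: a name that is absent from the NCBI taxonomy is passed as
    None or an empty string.
    """
    if not ncbi_name:
        # GTDB supplies name information that NCBI lacks.
        return "passively changed"
    if gtdb_name == ncbi_name:
        return "unchanged"
    # Both taxonomies provide a name, and the names differ.
    return "actively changed"


# Illustrative inputs only.
print(classify_change("Pseudomonadota", "Pseudomonadota"))  # unchanged
print(classify_change("Examplebacter", None))               # passively changed
print(classify_change("Bacillota", "Firmicutes"))           # actively changed
```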


Genomic statistics
Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.

Genomes


Species

Nomenclatural types per rank

This plot shows the breakdown of placeholder versus latinized names for each taxonomic rank.

Rank | Bacteria: Latin | Bacteria: Placeholder | Archaea: Latin | Archaea: Placeholder | Total: Latin | Total: Placeholder
Phylum | 76 (44.97%) | 93 (55.03%) | 16 (80.00%) | 4 (20.00%) | 92 (48.68%) | 97 (51.32%)
Class | 170 (29.77%) | 401 (70.23%) | 41 (65.08%) | 22 (34.92%) | 211 (33.28%) | 423 (66.72%)
Order | 401 (20.29%) | 1,575 (79.71%) | 73 (42.69%) | 98 (57.31%) | 474 (22.08%) | 1,673 (77.92%)
Family | 856 (16.12%) | 4,455 (83.88%) | 111 (18.41%) | 492 (81.59%) | 967 (16.35%) | 4,947 (83.65%)
Genus | 4,230 (15.48%) | 23,096 (84.52%) | 294 (14.14%) | 1,785 (85.86%) | 4,524 (15.39%) | 24,881 (84.61%)
Species | 17,977 (13.16%) | 118,669 (86.84%) | 811 (11.64%) | 6,157 (88.36%) | 18,788 (13.08%) | 124,826 (86.92%)
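The percentages in the table follow directly from the Latin and placeholder counts at each rank. A minimal sketch, with a hypothetical function name and counts copied from the table above:

```python
def name_breakdown(latin, placeholder):
    """Return (latin_percent, placeholder_percent), rounded to two decimals."""
    total = latin + placeholder
    return (
        round(100.0 * latin / total, 2),
        round(100.0 * placeholder / total, 2),
    )


# Bacterial phyla: 76 Latin names and 93 placeholder names.
print(name_breakdown(76, 93))

# All species clusters: 18,788 Latin names and 124,826 placeholder names.
print(name_breakdown(18_788, 124_826))
```

The denominator is the number of taxa at that rank within the given domain, so the bacterial phylum percentages are taken over 169 phyla and the species totals over 143,614 species clusters.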