Release 202 statistics
GTDB release date: April 27, 2021
Taxon overview
GTDB R202 spans 258,406 genomes organized into 47,894 species clusters.
Rank     Bacteria  Archaea   Total
Phylum        127       19     146
Class         360       47     407
Order       1,163      116   1,279
Family      2,886      336   3,222
Genus      12,037      851  12,888
Species    45,555    2,339  47,894
Species overview
GTDB R202 comprises 254,090 bacterial and 4,316 archaeal genomes organized into 45,555 bacterial and 2,339 archaeal species clusters. Relative to GTDB R95, this is an increase of 63,806 genomes (32.79%) and 15,984 species clusters (50.09%).
Category                    R04-RS89  R05-RS95  R06-RS202  Growth from R05-RS95 (%)
Bacterial genomes            143,512   191,527    254,090  32.67
Archaeal genomes               2,392     3,073      4,316  40.45
Bacterial species clusters    23,458    30,238     45,555  50.65
Archaeal species clusters      1,248     1,672      2,339  39.89
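The growth figures can be reproduced from the raw counts. The following is a minimal sketch, assuming growth is the percentage increase from R05-RS95 to R06-RS202 rounded to two decimal places; the function and variable names are illustrative and not part of any GTDB tooling.

```python
# Counts copied from the table above as (R05-RS95, R06-RS202).
counts = {
    "Bacterial genomes": (191_527, 254_090),
    "Archaeal genomes": (3_073, 4_316),
    "Bacterial species clusters": (30_238, 45_555),
    "Archaeal species clusters": (1_672, 2_339),
}


def growth_percent(previous: int, current: int) -> float:
    """Percentage increase from previous to current, rounded to 2 decimals."""
    return round(100 * (current - previous) / previous, 2)


# Per-category growth, matching the last column of the table.
growth = {name: growth_percent(prev, cur) for name, (prev, cur) in counts.items()}

# Combined growth across both domains, as quoted in the overview sentence.
total_genome_growth = growth_percent(191_527 + 3_073, 254_090 + 4_316)
total_species_growth = growth_percent(30_238 + 1_672, 45_555 + 2_339)

for name, value in growth.items():
    print(f"{name}: {value}%")
```

Summing the two domains before computing the percentage gives the combined figures in the overview sentence; averaging the per-domain percentages would not.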
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e. MAGs/SAGs), or of both isolate and environmental genomes.
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy completeness - 5 × contamination > 50. A few exceptions were made in order to retain well-known species with abnormal CheckM quality estimates.
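The quality criterion can be expressed as a short check. This is a sketch only, assuming completeness and contamination are percentages as reported by CheckM; the function names and the example values are hypothetical, and the exceptions mentioned above are not modelled.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score: completeness minus five times contamination (both in %)."""
    return completeness - 5 * contamination


def passes_quality_filter(completeness: float, contamination: float,
                          threshold: float = 50.0) -> bool:
    """True if the genome satisfies completeness - 5*contamination > threshold."""
    return quality_score(completeness, contamination) > threshold


# Hypothetical genomes, for illustration only.
print(passes_quality_filter(95.0, 2.0))  # score 85.0, passes
print(passes_quality_filter(70.0, 5.0))  # score 45.0, fails
```

The factor of five means each percentage point of contamination costs as much as five points of missing completeness.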
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum                      Class                        Order                     Family                      Genus
Proteobacteria     14,419   Gammaproteobacteria  7,998   Burkholderiales   2,491   Burkholderiaceae    1,668   Pelagibacter       841
Actinobacteriota    6,192   Alphaproteobacteria  6,379   Bacteroidales     2,083   Flavobacteriaceae   1,247   Streptomyces       762
Bacteroidota        5,818   Bacteroidia          5,402   Pseudomonadales   1,886   Lachnospiraceae     1,155   Pseudomonas_E      574
Firmicutes_A        4,062   Actinomycetia        4,864   Flavobacteriales  1,847   Pelagibacteraceae   1,114   Prevotella         383
Firmicutes          3,390   Clostridia           3,566   Enterobacterales  1,701   Rhodobacteraceae    1,015   Mycobacterium      339
Patescibacteria     1,601   Bacilli              3,352   Actinomycetales   1,678   Streptomycetaceae     846   Flavobacterium     319
Cyanobacteria       1,067   Cyanobacteriia         920   Mycobacteriales   1,426   Mycobacteriaceae      811   Collinsella        316
Chloroflexota         982   Verrucomicrobiae       699   Rhizobiales       1,395   Sphingomonadaceae     762   Streptococcus      301
Verrucomicrobiota     917   Paceibacteria          684   Lachnospirales    1,285   Pseudomonadaceae      731   Prochlorococcus_A  265
Planctomycetota       748   Coriobacteriia         594   Oscillospirales   1,249   Enterobacteriaceae    724   Microbacterium     206
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum                       Class                          Order                       Family                       Genus                    Species
Proteobacteria     120,757   Gammaproteobacteria  104,665   Enterobacterales   64,157   Enterobacteriaceae  55,347   Escherichia     23,687   Escherichia flexneri        14,743
Firmicutes          53,879   Bacilli               53,835   Lactobacillales    28,917   Streptococcaceae    16,112   Streptococcus   15,765   Staphylococcus aureus       12,022
Actinobacteriota    24,602   Actinomycetia         21,459   Pseudomonadales    19,449   Staphylococcaceae   15,139   Staphylococcus  14,962   Salmonella enterica         11,523
Bacteroidota        13,917   Alphaproteobacteria   16,016   Staphylococcales   15,215   Mycobacteriaceae    11,561   Salmonella      11,772   Klebsiella pneumoniae        9,621
Firmicutes_A        12,649   Bacteroidia           12,772   Burkholderiales    13,431   Pseudomonadaceae    10,859   Klebsiella      10,886   Streptococcus pneumoniae     8,405
Campylobacterota     5,968   Clostridia            11,762   Mycobacteriales    12,534   Burkholderiaceae     8,796   Mycobacterium    9,568   Mycobacterium tuberculosis   6,684
Patescibacteria      3,496   Campylobacteria        5,957   Bacteroidales       7,082   Moraxellaceae        6,881   Acinetobacter    6,462   Pseudomonas aeruginosa       5,211
Cyanobacteria        2,210   Cyanobacteriia         1,949   Campylobacterales   5,940   Vibrionaceae         4,865   Pseudomonas      5,311   Acinetobacter baumannii      4,766
Verrucomicrobiota    2,095   Verrucomicrobiae       1,742   Bacillales          4,944   Enterococcaceae      4,454   Pseudomonas_E    4,871   Escherichia coli             4,370
Spirochaetota        1,689   Paceibacteria          1,561   Rhizobiales         4,913   Lactobacillaceae     4,070   Vibrio           4,447   Streptococcus pyogenes       2,136
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED=1), the last common ancestor occurring at a fixed time in the past (RED=0), and internal nodes being linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval for normalizing taxa at each taxonomic rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ±0.1 (indicated by grey bars).

Bacteria

Archaea
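The RED interval described above (median RED at a rank, ±0.1) can be sketched as follows. The taxon names and RED values are hypothetical, chosen only to illustrate the calculation, and the helper names are not taken from any GTDB software.

```python
from statistics import median


def red_interval(red_values, half_width=0.1):
    """Return (lower, median, upper) bounds for the RED values of one rank."""
    mid = median(red_values)
    return mid - half_width, mid, mid + half_width


def taxa_outside_interval(red_by_taxon, half_width=0.1):
    """List taxa whose RED value falls outside median +/- half_width."""
    lower, _, upper = red_interval(list(red_by_taxon.values()), half_width)
    return sorted(taxon for taxon, red in red_by_taxon.items()
                  if red < lower or red > upper)


# Hypothetical RED values for three taxa at the same rank.
example_reds = {"taxon_a": 0.30, "taxon_b": 0.35, "taxon_c": 0.52}
print(red_interval(list(example_reds.values())))
print(taxa_outside_interval(example_reds))
```

With these values the median is 0.35, so only taxon_c lies outside the interval.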

Comparison of GTDB and NCBI taxa
GTDB and NCBI taxonomic assignments are compared across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent from the NCBI taxonomy, or actively changed if the name differed between the two taxonomies.
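The three categories can be written as a small decision function. This is a sketch under stated assumptions: names are compared as plain strings at a single rank, an absent NCBI name is represented as None or an empty string, and a GTDB name is always present. The function name and the placeholder genus are hypothetical.

```python
from typing import Optional


def classify_change(gtdb_name: str, ncbi_name: Optional[str]) -> str:
    """Classify one taxon at one rank as unchanged, passively or actively changed."""
    if not ncbi_name:
        # GTDB supplies a name where NCBI has none at this rank.
        return "passively changed"
    if gtdb_name == ncbi_name:
        return "unchanged"
    return "actively changed"


print(classify_change("Escherichia", "Escherichia"))    # unchanged
print(classify_change("Pseudomonas_E", "Pseudomonas"))  # actively changed
print(classify_change("PlaceholderGenus", None))        # passively changed
```

Under this comparison a GTDB suffix such as the "_E" in Pseudomonas_E makes the names differ, so the taxon counts as actively changed.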
Genomic statistics
Key genomic statistics are given for the GTDB species representative genomes and for all genomes in the GTDB.