Release 202 statistics
GTDB release date: April 27, 2021
Taxon overview
GTDB R202 spans 258,406 genomes organized into 47,894 species clusters.
Rank | Bacteria | Archaea | Total |
---|---|---|---|
Phylum | 127 | 19 | 146 |
Class | 360 | 47 | 407 |
Order | 1,163 | 116 | 1,279 |
Family | 2,886 | 336 | 3,222 |
Genus | 12,037 | 851 | 12,888 |
Species | 45,555 | 2,339 | 47,894 |
Species overview
GTDB R202 comprises 254,090 bacterial and 4,316 archaeal genomes organized into 45,555 bacterial and 2,339 archaeal species clusters. This is an increase of 63,806 genomes (32.79%) and 15,984 species clusters (50.09%) compared to GTDB R95.
Category | R04-RS89 | R05-RS95 | R06-RS202 | Growth from R05-RS95 (%) |
---|---|---|---|---|
Bacterial genomes | 143,512 | 191,527 | 254,090 | 32.67 |
Archaeal genomes | 2,392 | 3,073 | 4,316 | 40.45 |
Bacterial species clusters | 23,458 | 30,238 | 45,555 | 50.65 |
Archaeal species clusters | 1,248 | 1,672 | 2,339 | 39.89 |
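The growth column can be reproduced from the R05-RS95 and R06-RS202 counts. The following is a minimal sketch; the helper name `growth_percent` and the dictionary layout are illustrative choices, and only the counts themselves come from the table above.

```python
# Counts copied from the table above as (R05-RS95, R06-RS202) pairs.
counts = {
    "Bacterial genomes": (191_527, 254_090),
    "Archaeal genomes": (3_073, 4_316),
    "Bacterial species clusters": (30_238, 45_555),
    "Archaeal species clusters": (1_672, 2_339),
}


def growth_percent(previous: int, current: int) -> float:
    """Percentage increase from `previous` to `current`."""
    return 100.0 * (current - previous) / previous


# Per-category growth, rounded to two decimals as in the table.
for label, (r95, r202) in counts.items():
    print(f"{label}: {growth_percent(r95, r202):.2f}%")

# Overall growth in genomes (bacterial + archaeal combined).
total_r95 = counts["Bacterial genomes"][0] + counts["Archaeal genomes"][0]
total_r202 = counts["Bacterial genomes"][1] + counts["Archaeal genomes"][1]
print(f"All genomes: +{total_r202 - total_r95:,} "
      f"({growth_percent(total_r95, total_r202):.2f}%)")
```

The combined figure corresponds to the overall increase in genomes quoted in the paragraph above the table.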
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e., MAGs/SAGs), or of both isolate and environmental genomes.
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored based on the MIMAG genome standards. In general, representative genomes were restricted to those satisfying completeness - 5 × contamination > 50. A few exceptions exist in order to retain well-known species with abnormal CheckM quality estimates.
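The quality criterion above is simple arithmetic on the two CheckM estimates. The sketch below illustrates it; the function names and the example values are hypothetical, and it does not model the exceptions made for well-known species.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score: completeness - 5 * contamination (both in percent)."""
    return completeness - 5.0 * contamination


def passes_quality_filter(completeness: float, contamination: float,
                          threshold: float = 50.0) -> bool:
    """True if the score is strictly greater than the threshold."""
    return quality_score(completeness, contamination) > threshold


# Hypothetical genomes, given as (completeness %, contamination %).
examples = {
    "genome_a": (95.0, 2.0),   # score 85.0
    "genome_b": (70.0, 5.0),   # score 45.0
}
for name, (comp, cont) in examples.items():
    print(name, quality_score(comp, cont), passes_quality_filter(comp, cont))
```

Because contamination is weighted five times as heavily as completeness, a moderately complete genome with little contamination can pass while a more complete but contaminated one fails.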
Taxa with the largest number of species
Taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus |
---|---|---|---|---|
Proteobacteria 14,419 | Gammaproteobacteria 7,998 | Burkholderiales 2,491 | Burkholderiaceae 1,668 | Pelagibacter 841 |
Actinobacteriota 6,192 | Alphaproteobacteria 6,379 | Bacteroidales 2,083 | Flavobacteriaceae 1,247 | Streptomyces 762 |
Bacteroidota 5,818 | Bacteroidia 5,402 | Pseudomonadales 1,886 | Lachnospiraceae 1,155 | Pseudomonas_E 574 |
Firmicutes_A 4,062 | Actinomycetia 4,864 | Flavobacteriales 1,847 | Pelagibacteraceae 1,114 | Prevotella 383 |
Firmicutes 3,390 | Clostridia 3,566 | Enterobacterales 1,701 | Rhodobacteraceae 1,015 | Mycobacterium 339 |
Patescibacteria 1,601 | Bacilli 3,352 | Actinomycetales 1,678 | Streptomycetaceae 846 | Flavobacterium 319 |
Cyanobacteria 1,067 | Cyanobacteriia 920 | Mycobacteriales 1,426 | Mycobacteriaceae 811 | Collinsella 316 |
Chloroflexota 982 | Verrucomicrobiae 699 | Rhizobiales 1,395 | Sphingomonadaceae 762 | Streptococcus 301 |
Verrucomicrobiota 917 | Paceibacteria 684 | Lachnospirales 1,285 | Pseudomonadaceae 731 | Prochlorococcus_A 265 |
Planctomycetota 748 | Coriobacteriia 594 | Oscillospirales 1,249 | Enterobacteriaceae 724 | Microbacterium 206 |
Taxa with the largest number of sequenced genomes
Taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus | Species |
---|---|---|---|---|---|
Proteobacteria 120,757 | Gammaproteobacteria 104,665 | Enterobacterales 64,157 | Enterobacteriaceae 55,347 | Escherichia 23,687 | Escherichia flexneri 14,743 |
Firmicutes 53,879 | Bacilli 53,835 | Lactobacillales 28,917 | Streptococcaceae 16,112 | Streptococcus 15,765 | Staphylococcus aureus 12,022 |
Actinobacteriota 24,602 | Actinomycetia 21,459 | Pseudomonadales 19,449 | Staphylococcaceae 15,139 | Staphylococcus 14,962 | Salmonella enterica 11,523 |
Bacteroidota 13,917 | Alphaproteobacteria 16,016 | Staphylococcales 15,215 | Mycobacteriaceae 11,561 | Salmonella 11,772 | Klebsiella pneumoniae 9,621 |
Firmicutes_A 12,649 | Bacteroidia 12,772 | Burkholderiales 13,431 | Pseudomonadaceae 10,859 | Klebsiella 10,886 | Streptococcus pneumoniae 8,405 |
Campylobacterota 5,968 | Clostridia 11,762 | Mycobacteriales 12,534 | Burkholderiaceae 8,796 | Mycobacterium 9,568 | Mycobacterium tuberculosis 6,684 |
Patescibacteria 3,496 | Campylobacteria 5,957 | Bacteroidales 7,082 | Moraxellaceae 6,881 | Acinetobacter 6,462 | Pseudomonas aeruginosa 5,211 |
Cyanobacteria 2,210 | Cyanobacteriia 1,949 | Campylobacterales 5,940 | Vibrionaceae 4,865 | Pseudomonas 5,311 | Acinetobacter baumannii 4,766 |
Verrucomicrobiota 2,095 | Verrucomicrobiae 1,742 | Bacillales 4,944 | Enterococcaceae 4,454 | Pseudomonas_E 4,871 | Escherichia coli 4,370 |
Spirochaetota 1,689 | Paceibacteria 1,561 | Rhizobiales 4,913 | Lactobacillaceae 4,070 | Vibrio 4,447 | Streptococcus pyogenes 2,136 |
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED=1), the last common ancestor occurring at a fixed time in the past (RED=0), and internal nodes being linearly interpolated between these values according to lineage-specific rates of evolution. RED intervals for normalizing taxa at each taxonomic rank were operationally defined as the median RED value (indicated by a blue bar) at each rank ±0.1 (indicated by grey bars).
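The interval definition (median RED at a rank ±0.1) can be sketched as follows. This is an illustration only: the taxon names and RED values are made up, the function names are hypothetical, and the code does not compute RED from a tree.

```python
from statistics import median


def red_interval(red_values, half_width=0.1):
    """Return (low, high) = median RED at a rank -/+ half_width."""
    m = median(list(red_values))
    return m - half_width, m + half_width


def taxa_outside_interval(red_by_taxon, half_width=0.1):
    """Names of taxa whose RED falls outside the rank's interval."""
    low, high = red_interval(red_by_taxon.values(), half_width)
    return sorted(t for t, r in red_by_taxon.items() if not low <= r <= high)


# Hypothetical RED values for five family-level taxa.
family_red = {
    "f__A": 0.70,
    "f__B": 0.74,
    "f__C": 0.78,
    "f__D": 0.92,
    "f__E": 0.61,
}
low, high = red_interval(family_red.values())
print(f"interval: {low:.2f} to {high:.2f}")
print("outside interval:", taxa_outside_interval(family_red))
```

In this toy data the median is 0.74, so taxa with RED below 0.64 or above 0.84 fall outside the grey bars.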
Bacteria
Archaea
Comparison of GTDB and NCBI taxa
The following compares GTDB and NCBI taxonomic assignments across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the NCBI taxonomy, or actively changed if the name was different between the two taxonomies.
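The three categories can be expressed as a small decision function. This is a minimal sketch under stated assumptions: a missing NCBI name is represented here as `None` or an empty string, names are compared as exact strings, and the function name `classify_change` is hypothetical. How GTDB actually normalizes names (for example, rank prefixes or alphabetic suffixes) before comparison is not specified in this section.

```python
from typing import Optional


def classify_change(gtdb_name: str, ncbi_name: Optional[str]) -> str:
    """Classify one taxon at one rank by comparing GTDB and NCBI names."""
    if not ncbi_name:
        # GTDB supplies a name where NCBI has none at this rank.
        return "passively changed"
    if gtdb_name == ncbi_name:
        return "unchanged"
    return "actively changed"


# Illustrative inputs only; these pairings are not taken from GTDB data.
print(classify_change("Escherichia", "Escherichia"))   # same name in both
print(classify_change("Genus_X", None))                # NCBI name absent
print(classify_change("Genus_X", "Genus_Y"))           # names differ
```

Checking for a missing NCBI name first keeps the passive case from being counted as an active change.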
Genomic statistics
Key genomic statistics are given for the GTDB species representative genomes and for all genomes in the GTDB.