Release 207 statistics
GTDB release date: April 8, 2022
Taxon overview
GTDB R207 spans 317,542 genomes organized into 65,703 species clusters.
Rank | Bacteria | Archaea | Total |
---|---|---|---|
Phylum | 148 | 18 | 166 |
Class | 425 | 52 | 477 |
Order | 1,439 | 132 | 1,571 |
Family | 3,614 | 456 | 4,070 |
Genus | 15,342 | 1,344 | 16,686 |
Species | 62,291 | 3,412 | 65,703 |
Species overview
GTDB R207 comprises 311,480 bacterial and 6,062 archaeal genomes, organized into 62,291 bacterial and 3,412 archaeal species clusters.
Category | R04-RS89 | R05-RS95 | R06-RS202 | R07-RS207 | Growth from R06-RS202 (%) |
---|---|---|---|---|---|
Bacterial genomes | 143,512 | 191,527 | 254,090 | 311,480 | 22.59 |
Archaeal genomes | 2,392 | 3,073 | 4,316 | 6,062 | 40.45 |
Bacterial species clusters | 23,458 | 30,238 | 45,555 | 62,291 | 36.74 |
Archaeal species clusters | 1,248 | 1,672 | 2,339 | 3,412 | 45.87 |
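The growth column is the relative change from R06-RS202 to R07-RS207, i.e. (R07-RS207 − R06-RS202) / R06-RS202 × 100. A minimal Python sketch that recomputes it from the counts in the table; rounding to two decimal places is an assumption about how the published column was produced.

```python
# R06-RS202 and R07-RS207 counts copied from the table above.
COUNTS = {
    "Bacterial genomes": (254_090, 311_480),
    "Archaeal genomes": (4_316, 6_062),
    "Bacterial species clusters": (45_555, 62_291),
    "Archaeal species clusters": (2_339, 3_412),
}


def growth_percent(previous: int, current: int) -> float:
    """Relative growth between two releases, in percent, rounded to two decimals."""
    return round((current - previous) / previous * 100, 2)


for category, (previous, current) in COUNTS.items():
    print(f"{category}: {growth_percent(previous, current):.2f}%")
```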
Genome categories
GTDB taxa comprise isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The following plot shows, for each taxonomic rank, the proportion of taxa composed exclusively of isolate genomes, exclusively of environmental genomes (i.e., MAGs and SAGs), or of both isolate and environmental genomes.
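The three-way split described above can be sketched as follows, assuming each genome is available as a (taxon, category) pair with the category spelled `isolate`, `MAG`, or `SAG`. The taxon names in the example are hypothetical placeholders, and unknown categories are rejected rather than guessed.

```python
from collections import Counter, defaultdict

# Assumed category labels; MAGs and SAGs both count as environmental genomes.
SOURCE = {"isolate": "isolate", "MAG": "environmental", "SAG": "environmental"}


def classify_taxa(genomes):
    """Map each taxon to 'isolate only', 'environmental only' or 'both'.

    `genomes` is an iterable of (taxon_name, category) pairs, one per genome.
    """
    sources = defaultdict(set)
    for taxon, category in genomes:
        if category not in SOURCE:
            raise ValueError(f"unknown genome category: {category!r}")
        sources[taxon].add(SOURCE[category])
    labels = {}
    for taxon, kinds in sources.items():
        if kinds == {"isolate"}:
            labels[taxon] = "isolate only"
        elif kinds == {"environmental"}:
            labels[taxon] = "environmental only"
        else:
            labels[taxon] = "both"
    return labels


def proportions(labels):
    """Fraction of taxa falling into each composition class."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical genera, for illustration only.
genomes = [
    ("g__A", "isolate"), ("g__A", "MAG"),
    ("g__B", "MAG"), ("g__B", "SAG"),
    ("g__C", "isolate"),
]
print(proportions(classify_taxa(genomes)))
```

Running this once per rank (with genus, family, and so on as the taxon key) gives the per-rank proportions the plot summarizes.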
GTDB species representatives
Each GTDB species cluster is represented by a single genome. Where possible, the representative was chosen from genomes assembled from the type strain of the species, although most species clusters are currently assigned only placeholder names. The proportion of representatives that are isolates, MAGs, or SAGs is given for each category.
Quality of GTDB representative genomes
The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy completeness − 5 × contamination > 50. A few exceptions were made to retain well-known species with abnormal CheckM quality estimates.
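The quality criterion above is a simple linear score with a strict threshold. The sketch below applies it to (genome ID, completeness, contamination) tuples, with both values as CheckM percentages. The `exceptions` allow-list is a hypothetical stand-in for the well-known species retained despite abnormal estimates; the genome IDs are placeholders.

```python
def passes_quality(completeness: float, contamination: float) -> bool:
    """Quality criterion from the text: completeness - 5 * contamination > 50."""
    return completeness - 5 * contamination > 50


def eligible_representatives(genomes, exceptions=frozenset()):
    """Keep genome IDs that pass the criterion or appear in the allow-list.

    `genomes` holds (genome_id, completeness, contamination) tuples;
    `exceptions` is a hypothetical allow-list of genome IDs.
    """
    return [
        genome_id
        for genome_id, completeness, contamination in genomes
        if passes_quality(completeness, contamination) or genome_id in exceptions
    ]


candidates = [("G1", 95.0, 2.0), ("G2", 60.0, 3.0), ("G3", 55.0, 1.0)]
print(eligible_representatives(candidates))                     # ['G1']
print(eligible_representatives(candidates, exceptions={"G2"}))  # ['G1', 'G2']
```

Note that G3 scores exactly 50 and is rejected, because the threshold is a strict inequality.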
Taxa with the largest number of species
The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus |
---|---|---|---|---|
Proteobacteria 17,350 | Gammaproteobacteria 9,582 | Bacteroidales 3,805 | Lachnospiraceae 2,184 | Streptomyces 855 |
Bacteroidota 8,588 | Clostridia 8,185 | Burkholderiales 3,181 | Burkholderiaceae 2,063 | Pelagibacter 855 |
Firmicutes_A 8,243 | Bacteroidia 8,033 | Oscillospirales 3,168 | Flavobacteriaceae 1,452 | Pseudomonas_E 626 |
Actinobacteriota 7,328 | Alphaproteobacteria 7,684 | Lachnospirales 2,360 | Rhodobacteraceae 1,187 | Prevotella 554 |
Firmicutes 4,216 | Actinomycetia 5,591 | Flavobacteriales 2,233 | Bacteroidaceae 1,135 | Collinsella 403 |
Patescibacteria 2,485 | Bacilli 4,215 | Pseudomonadales 2,152 | Pelagibacteraceae 1,134 | Flavobacterium 395 |
Chloroflexota 1,387 | Cyanobacteriia 1,113 | Actinomycetales 1,980 | Streptomycetaceae 924 | Mycobacterium 370 |
Cyanobacteria 1,372 | Paceibacteria 961 | Enterobacterales 1,767 | Acutalibacteraceae 919 | Streptococcus 365 |
Verrucomicrobiota 1,325 | Verrucomicrobiae 912 | Rhizobiales 1,709 | Mycobacteriaceae 891 | Cryptobacteroides 361 |
Planctomycetota 1,071 | Coriobacteriia 779 | Mycobacteriales 1,620 | Sphingomonadaceae 883 | Prochlorococcus_A 272 |
Taxa with the largest number of sequenced genomes
The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.
Phylum | Class | Order | Family | Genus | Species |
---|---|---|---|---|---|
Proteobacteria 141,114 | Gammaproteobacteria 121,804 | Enterobacterales 74,108 | Enterobacteriaceae 63,971 | Escherichia 27,205 | Escherichia coli 26,859 |
Firmicutes 61,795 | Bacilli 61,794 | Lactobacillales 32,177 | Staphylococcaceae 17,191 | Staphylococcus 16,835 | Staphylococcus aureus 13,059 |
Actinobacteriota 28,532 | Actinomycetia 24,379 | Pseudomonadales 22,407 | Streptococcaceae 17,088 | Streptococcus 16,657 | Salmonella enterica 12,285 |
Firmicutes_A 21,744 | Clostridia 21,601 | Staphylococcales 17,289 | Mycobacteriaceae 12,366 | Klebsiella 13,720 | Klebsiella pneumoniae 11,294 |
Bacteroidota 20,893 | Bacteroidia 19,504 | Burkholderiales 15,809 | Pseudomonadaceae 12,187 | Salmonella 12,544 | Streptococcus pneumoniae 8,452 |
Campylobacterota 6,845 | Alphaproteobacteria 19,166 | Mycobacteriales 13,541 | Burkholderiaceae 10,295 | Mycobacterium 9,983 | Mycobacterium tuberculosis 6,836 |
Patescibacteria 4,645 | Campylobacteria 6,825 | Bacteroidales 11,844 | Moraxellaceae 7,998 | Acinetobacter 7,409 | Pseudomonas aeruginosa 5,623 |
Verrucomicrobiota 2,848 | Cyanobacteriia 2,387 | Campylobacterales 6,808 | Lachnospiraceae 6,306 | Pseudomonas 5,738 | Acinetobacter baumannii 5,417 |
Cyanobacteria 2,818 | Verrucomicrobiae 2,221 | Oscillospirales 6,803 | Vibrionaceae 5,492 | Pseudomonas_E 5,692 | Clostridioides difficile 2,225 |
Spirochaetota 2,331 | Coriobacteriia 1,961 | Lachnospirales 6,595 | Enterococcaceae 5,353 | Vibrio 5,056 | Enterococcus_B faecium 2,177 |
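Both ranking tables above can be derived by counting over a taxonomy listing. A minimal sketch, assuming one semicolon-delimited, rank-prefixed lineage string per genome (`d__...;p__...;c__...;o__...;f__...;g__...;s__...`) and that each species name identifies one species cluster. The genome IDs and the three-genome input are hypothetical.

```python
from collections import Counter, defaultdict

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def largest_taxa(taxonomy, n=10):
    """Rank taxa by genome count and by species-cluster count.

    `taxonomy` maps genome ID -> 'd__...;p__...;c__...;o__...;f__...;g__...;s__...'.
    Returns two dicts keyed by rank, each holding (taxon, count) pairs.
    """
    genome_counts = {rank: Counter() for rank in RANKS}
    species_sets = {rank: defaultdict(set) for rank in RANKS}
    for lineage in taxonomy.values():
        names = lineage.split(";")
        species = names[-1]
        for rank, name in zip(RANKS, names):
            genome_counts[rank][name] += 1          # one more genome in this taxon
            species_sets[rank][name].add(species)   # distinct species clusters
    by_genomes = {rank: counts.most_common(n) for rank, counts in genome_counts.items()}
    by_species = {
        rank: Counter({taxon: len(s) for taxon, s in sets.items()}).most_common(n)
        for rank, sets in species_sets.items()
    }
    return by_genomes, by_species


ENTERO = "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae"
taxonomy = {
    "genome_1": ENTERO + ";g__Escherichia;s__Escherichia coli",
    "genome_2": ENTERO + ";g__Escherichia;s__Escherichia coli",
    "genome_3": ENTERO + ";g__Salmonella;s__Salmonella enterica",
}
by_genomes, by_species = largest_taxa(taxonomy)
print(by_genomes["family"])  # [('f__Enterobacteriaceae', 3)]
print(by_species["family"])  # [('f__Enterobacteriaceae', 2)]
```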
Relative evolutionary divergence
The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED = 1), the last common ancestor occurring at a fixed time in the past (RED = 0), and internal nodes linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used to normalize taxa at each rank was operationally defined as the median RED value for that rank (blue bar) ±0.1 (grey bars).
Bacteria
Archaea
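The RED interpolation and the median ±0.1 interval can be sketched in Python. This page does not state the interpolation formula, so the sketch assumes the common formulation RED(n) = RED(p) + a / (a + b) × (1 − RED(p)), where p is the parent of node n, a is the branch length from p to n, and b is the mean branch length from n to its descendant extant taxa. The `Node` class, unique node names, and the toy tree are illustrative assumptions; the recursive helper is fine for small trees but would need an iterative version for very deep ones.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import median


@dataclass
class Node:
    """Hypothetical rooted-tree node; `length` is the branch length to its parent."""
    name: str
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def _leaf_stats(node: Node, stats: dict[int, tuple[float, int]]) -> tuple[float, int]:
    """Record (sum of distances from node to each descendant leaf, leaf count)."""
    if not node.children:
        stats[id(node)] = (0.0, 1)
        return stats[id(node)]
    total, count = 0.0, 0
    for child in node.children:
        child_total, child_count = _leaf_stats(child, stats)
        # every leaf below `child` is `child.length` further from `node`
        total += child_total + child.length * child_count
        count += child_count
    stats[id(node)] = (total, count)
    return stats[id(node)]


def relative_evolutionary_divergence(root: Node) -> dict[str, float]:
    """RED per node name: root = 0, leaves = 1, internal nodes interpolated."""
    stats: dict[int, tuple[float, int]] = {}
    _leaf_stats(root, stats)
    red = {root.name: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent.name]
        for child in parent.children:
            if not child.children:
                red[child.name] = 1.0  # extant taxa sit at the present
                continue
            total, count = stats[id(child)]
            a = child.length          # parent -> child
            b = total / count         # mean child -> descendant leaves
            red[child.name] = p + a / (a + b) * (1.0 - p) if a + b > 0 else p
            stack.append(child)
    return red


def outside_rank_interval(taxon_red: dict[str, float], half_width: float = 0.1) -> set[str]:
    """Taxa of one rank whose RED lies outside the rank median +/- half_width."""
    mid = median(taxon_red.values())
    return {taxon for taxon, r in taxon_red.items() if abs(r - mid) > half_width}


tree = Node("root", children=[
    Node("A", 1.0, [Node("x", 1.0), Node("y", 1.0)]),
    Node("C", 2.0),
])
print(relative_evolutionary_divergence(tree)["A"])  # 0.5
print(outside_rank_interval({"g1": 0.9, "g2": 0.92, "g3": 0.7}))  # {'g3'}
```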
Comparison of GTDB and NCBI taxa
Comparison of GTDB and NCBI taxonomic assignments across the GTDB species representative genomes and across all GTDB genomes with an assigned NCBI taxonomy. At each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided a name absent from the NCBI taxonomy, or actively changed if the names differed between the two taxonomies.
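The three-way classification above reduces to a per-rank comparison. A minimal sketch, assuming both lineages are semicolon-delimited with rank prefixes such as `p__`, and that a rank missing from NCBI appears as a bare prefix (e.g. `o__`). The example lineages are illustrative, not taken from actual GTDB metadata.

```python
def _strip_rank_prefix(name: str) -> str:
    """Drop a rank prefix such as 'p__' (assumed format); 'o__' becomes ''."""
    return name[3:] if name[1:3] == "__" else name


def compare_rank(gtdb_name: str, ncbi_name: str) -> str:
    """Classify one rank as unchanged, passively changed or actively changed."""
    gtdb = _strip_rank_prefix(gtdb_name.strip())
    ncbi = _strip_rank_prefix(ncbi_name.strip())
    if gtdb == ncbi:
        return "unchanged"
    if not ncbi:
        return "passively changed"  # GTDB supplies a name NCBI lacks
    return "actively changed"       # both have names, and they differ


def compare_lineages(gtdb_lineage: str, ncbi_lineage: str) -> list[str]:
    """Compare two semicolon-delimited lineages rank by rank."""
    return [
        compare_rank(g, n)
        for g, n in zip(gtdb_lineage.split(";"), ncbi_lineage.split(";"))
    ]


# Illustrative lineages only.
gtdb = "d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales"
ncbi = "d__Bacteria;p__Firmicutes;c__Clostridia;o__"
print(compare_lineages(gtdb, ncbi))
```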
Genomic statistics
Key genomic statistics are given for the GTDB species representative genomes and for all genomes in the GTDB.