Release 95 Statistics

Taxon Overview

GTDB r95 spans 194,600 genomes organized into 31,910 species clusters.

Rank     Bacteria  Archaea  Total
Phylum   111       18       129
Class    327       42       369
Order    917       103      1,020
Family   2,282     276      2,558
Genus    8,778     650      9,428
Species  30,238    1,672    31,910

Species Overview

GTDB r95 comprises 191,527 bacterial and 3,073 archaeal genomes organized into 30,238 bacterial and 1,672 archaeal species clusters. This is an increase of 48,862 (33.4%) genomes and 7,219 (29.2%) species clusters compared to GTDB r89. Of the 145,896 genomes in r89, 145,566 (99.77%) were assigned to the same species cluster in r95, 172 (0.12%) were assigned to a different species cluster in r95, and 158 (0.11%) are no longer present in the GTDB. Of the 24,706 species representatives in r89, 23,955 (96.96%) are unchanged in r95, 729 (2.95%) have been replaced with a preferred representative in r95 (e.g. type strain of species), 7 (0.03%) are no longer present in the GTDB, and 15 (0.06%) were merged with another r95 species cluster.
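The percentages quoted for the fate of r89 genomes and r89 species representatives can be recomputed from the counts in the paragraph above. The following is a minimal sketch; the dictionary labels and the helper name `shares` are illustrative and not part of any GTDB tooling.

```python
# Counts quoted in the text for the 145,896 genomes present in r89.
r89_genome_fates = {
    "same species cluster": 145_566,
    "different species cluster": 172,
    "no longer in GTDB": 158,
}

# Counts quoted in the text for the 24,706 species representatives in r89.
r89_representative_fates = {
    "unchanged": 23_955,
    "replaced": 729,
    "no longer in GTDB": 7,
    "merged": 15,
}


def shares(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


genome_shares = shares(r89_genome_fates)
representative_shares = shares(r89_representative_fates)

for label, pct in genome_shares.items():
    print(f"{label}: {pct}%")
```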

Metric                      R04-RS89  R05-RS95  Growth (%)
Bacterial genomes           143,512   191,527   33.5
Archaeal genomes            2,392     3,073     28.4
Bacterial species clusters  23,458    30,238    28.9
Archaeal species clusters   1,248     1,672     34.0

Genome Categories

GTDB taxa are composed of isolate genomes, metagenome-assembled genomes (MAGs), and single-amplified genomes (SAGs). The plot below indicates the proportion of taxa at each taxonomic rank composed exclusively of isolate genomes, exclusively of environmental genomes (i.e. MAGs/SAGs), or of both isolate and environmental genomes.
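The three-way split described above can be expressed as a small classification rule. This is only a sketch: the genome type labels ("isolate", "MAG", "SAG") and the function name `taxon_category` are assumptions made for illustration, not GTDB metadata field values.

```python
def taxon_category(genome_types):
    """Classify a taxon from the types of the genomes it contains.

    genome_types: iterable of strings, assumed to be "isolate", "MAG" or "SAG".
    MAGs and SAGs are both treated as environmental genomes.
    """
    kinds = {"isolate" if t == "isolate" else "environmental" for t in genome_types}
    if not kinds:
        raise ValueError("a taxon must contain at least one genome")
    if kinds == {"isolate"}:
        return "isolate only"
    if kinds == {"environmental"}:
        return "environmental only"
    return "both"


print(taxon_category(["isolate", "isolate"]))
print(taxon_category(["MAG", "SAG"]))
print(taxon_category(["isolate", "MAG"]))
```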

[Figure: genome_category_per_rank]

GTDB Species Representatives

Each GTDB species cluster is represented by a single genome. Genomes assembled from the type strain of the species were selected where possible, though the majority of species clusters are currently assigned only placeholder names. The proportion of representatives which are isolates, MAGs, or SAGs is given for each category.

[Figure: sp_rep_type]

Quality of GTDB Representative Genomes

The quality of the genomes selected as GTDB species representatives is given below. Genome completeness and contamination were estimated using CheckM and are colored according to the MIMAG genome standards. In general, representative genomes were required to satisfy the quality criterion completeness - 5 × contamination > 50. A few exceptions exist in order to retain well-known species with abnormal CheckM quality estimates.
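The quality criterion above is simple arithmetic on the two CheckM estimates. The sketch below assumes both values are percentages; the function names and the example numbers are hypothetical, and the manually retained exceptions mentioned above are not modeled.

```python
def quality_score(completeness, contamination):
    """Quality score as defined in the text: completeness - 5 * contamination."""
    return completeness - 5 * contamination


def passes_quality(completeness, contamination, threshold=50.0):
    """True if the genome satisfies completeness - 5 * contamination > threshold."""
    return quality_score(completeness, contamination) > threshold


# Hypothetical genomes given as (completeness %, contamination %).
print(passes_quality(95.0, 2.0))   # score 85.0, above the threshold
print(passes_quality(70.0, 5.0))   # score 45.0, below the threshold
```

Note that contamination is weighted five times more heavily than completeness, so a highly complete genome can still fail on moderate contamination.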

[Figure: genome_quality_species]

Taxa with the largest number of species

The taxa encompassing the largest number of GTDB species clusters are given for each taxonomic rank.

Taxa with the largest number of sequenced genomes

The taxa encompassing the largest number of genomes in the GTDB are given for each taxonomic rank.

Relative Evolutionary Divergence

The following graphs show the relative evolutionary divergence (RED) of taxa at each taxonomic rank from phylum to genus. RED values provide an operational approximation of relative time, with extant taxa existing in the present (RED=1), the last common ancestor occurring at a fixed time in the past (RED=0), and internal nodes linearly interpolated between these values according to lineage-specific rates of evolution. The RED interval used to normalize taxa at each rank was operationally defined as the median RED value at that rank (indicated by a blue bar) ±0.1 (indicated by grey bars).
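The median ±0.1 interval described above can be sketched as follows. The RED values and taxon names in the example are made up for illustration, and the helper names are hypothetical; this is not the code used to produce the graphs.

```python
from statistics import median


def red_interval(red_values, half_width=0.1):
    """Return (low, high) bounds: the median RED at a rank plus or minus half_width."""
    m = median(red_values)
    return (m - half_width, m + half_width)


def outside_interval(red_by_taxon, half_width=0.1):
    """Names of taxa whose RED value falls outside the interval for their rank."""
    low, high = red_interval(list(red_by_taxon.values()), half_width)
    return sorted(name for name, red in red_by_taxon.items() if red < low or red > high)


# Hypothetical RED values for five families.
family_red = {"f__A": 0.70, "f__B": 0.74, "f__C": 0.78, "f__D": 0.90, "f__E": 0.55}

low, high = red_interval(list(family_red.values()))
flagged = outside_interval(family_red)
print(f"interval: {low:.2f} to {high:.2f}")
print("outside interval:", flagged)
```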

Note that RED values are analysis-specific and should not be used as absolute values for comparison between studies.

Bacteria

Archaea

Comparison of GTDB and NCBI taxa

The plots below compare GTDB and NCBI taxonomic assignments across GTDB species representative genomes and across all GTDB genomes that have an assigned NCBI taxonomy. For each taxonomic rank, a taxon was classified as unchanged if its name was identical in both taxonomies, passively changed if the GTDB taxonomy provided name information absent in the NCBI taxonomy, or actively changed if the name was different between the two taxonomies.
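The three classes defined above amount to a small decision rule per rank. The sketch below assumes that a name missing from the NCBI taxonomy appears as `None`, an empty string, or a bare rank prefix such as `g__`; the function names and the example taxa (including the placeholder `g__UBA1234`) are illustrative only.

```python
def has_name(taxon):
    """True if the taxon string carries a name beyond a bare rank prefix."""
    if not taxon:
        return False
    # Assumption: a missing name is written as a bare prefix such as "g__".
    return not taxon.endswith("__")


def classify_change(gtdb_taxon, ncbi_taxon):
    """Classify one rank of one genome as unchanged, passive or active change."""
    if not has_name(ncbi_taxon):
        # GTDB supplies name information that NCBI lacks at this rank.
        return "passive change"
    if gtdb_taxon == ncbi_taxon:
        return "unchanged"
    return "active change"


# Illustrative (GTDB, NCBI) pairs at genus rank.
examples = [
    ("g__Escherichia", "g__Escherichia"),
    ("g__UBA1234", "g__"),
    ("g__Bacillus_A", "g__Bacillus"),
]
labels = [classify_change(gtdb, ncbi) for gtdb, ncbi in examples]
print(labels)
```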

[Figures: ncbi_compare_species, ncbi_compare_genomes]

Genomic Statistics

Key genomic statistics for the GTDB species representative genomes and all genomes in the GTDB.

For clarity, extreme outliers are not shown, so the number of species and genomes differs slightly between plots.

[Figures: genomic_stats_species, genomic_stats_genomes]