A library for working with phylogenetic and population genetic data.
v0.27.0
genesis::taxonomy Namespace Reference

Classes

class  BaseTaxonData
 Base class for storing data on Taxa of a Taxonomy. More...
 
struct  BreadthFirstSearch
 Tag used for find_taxon(). More...
 
struct  DepthFirstSearch
 Tag used for find_taxon(). More...
 
class  EntropyTaxonData
 
class  IteratorPreorder
 
struct  NcbiName
 
struct  NcbiNode
 
class  PrinterNested
 Simple printer class for Taxonomy. More...
 
struct  PruneByEntropySettings
 Store settings for the Taxonomy pruning algorithm prune_by_entropy(). More...
 
class  Taxon
 Store a Taxon, i.e., an element in a Taxonomy, with its name, rank, ID and sub-taxa. More...
 
class  Taxonomy
 Store a Taxonomy, i.e., a nested hierarchy of Taxa. More...
 
class  TaxonomyReader
 Read Taxonomy file formats. More...
 
class  TaxonomyWriter
 Write a Taxonomy as a list of Taxopaths. More...
 
class  Taxopath
 Helper class to store a taxonomic path. More...
 
class  TaxopathGenerator
 Helper class to generate a taxonomic path string from a Taxopath object or a Taxon. More...
 
class  TaxopathParser
 Helper class to parse a string containing a taxonomic path string into a Taxopath object. More...
 

Functions

Taxon & add_from_taxopath (Taxonomy &taxonomy, Taxopath const &taxopath, bool expect_parents)
 Add a Taxon to a Taxonomy, using the taxonomic elements of a Taxopath. More...
 
static void add_subtaxonomy_ (Taxonomy const &taxonomy, bool keep_singleton_inner_nodes, bool keep_inner_node_names, int max_level, int parent_level, tree::NewickBroker &broker)
 Recursive local helper function to add taxa to the tree broker. More...
 
NcbiNameLookup convert_ncbi_name_table (utils::CsvReader::Table const &name_table, size_t tax_id_pos, size_t name_pos, size_t name_class_pos, std::string const &name_class_filter)
 
NcbiNodeLookup convert_ncbi_node_table (utils::CsvReader::Table const &node_table, size_t tax_id_pos, size_t parent_tax_id_pos, size_t rank_pos)
 
Taxonomy convert_ncbi_tables (NcbiNodeLookup const &nodes, NcbiNameLookup const &names)
 
size_t count_taxa_with_prune_status (Taxonomy const &taxonomy, EntropyTaxonData::PruneStatus status)
 Return the number of Taxa that have a certain prune status. More...
 
void expand_small_subtaxonomies (Taxonomy &taxonomy, size_t min_subtaxonomy_size)
 Expand the leaves of a pruned Taxonomy if their sub-taxonomies are smaller than the given threshold. More...
 
template<class UnaryPredicate >
Taxon * find_taxon (Taxonomy &tax, UnaryPredicate p)
 Alias for find_taxon(..., DepthFirstSearch{}) More...
 
template<class SearchStrategy , class UnaryPredicate >
Taxon * find_taxon (Taxonomy &tax, UnaryPredicate p, SearchStrategy strat)
 Find a Taxon based on a given predicate by recursively searching the Taxonomy according to a search strategy. More...
 
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p)
 Alias for find_taxon(..., DepthFirstSearch{}) More...
 
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p, BreadthFirstSearch)
 Find a Taxon based on a given predicate by recursively searching the Taxonomy in a breadth first manner. More...
 
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p, DepthFirstSearch)
 Find a Taxon based on a given predicate by recursively searching the Taxonomy in a depth first manner. More...
 
Taxon * find_taxon_by_id (Taxonomy &tax, std::string const &id)
 Alias for find_taxon_by_id(..., DepthFirstSearch{}). More...
 
template<class SearchStrategy >
Taxon * find_taxon_by_id (Taxonomy &tax, std::string const &id, SearchStrategy strat)
 Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy. More...
 
Taxon const * find_taxon_by_id (Taxonomy const &tax, std::string const &id)
 Alias for find_taxon_by_id(..., DepthFirstSearch{}). More...
 
template<class SearchStrategy >
Taxon const * find_taxon_by_id (Taxonomy const &tax, std::string const &id, SearchStrategy strat)
 Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy. More...
 
Taxon * find_taxon_by_name (Taxonomy &tax, std::string const &name)
 Alias for find_taxon_by_name(..., DepthFirstSearch{}). More...
 
template<class SearchStrategy >
Taxon * find_taxon_by_name (Taxonomy &tax, std::string const &name, SearchStrategy strat)
 Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy. More...
 
Taxon const * find_taxon_by_name (Taxonomy const &tax, std::string const &name)
 Alias for find_taxon_by_name(..., DepthFirstSearch{}). More...
 
template<class SearchStrategy >
Taxon const * find_taxon_by_name (Taxonomy const &tax, std::string const &name, SearchStrategy strat)
 Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy. More...
 
Taxon * find_taxon_by_taxopath (Taxonomy &tax, Taxopath const &taxopath)
 Find a Taxon in a Taxonomy, given its Taxopath. More...
 
Taxon const * find_taxon_by_taxopath (Taxonomy const &tax, Taxopath const &taxopath)
 Find a Taxon in a Taxonomy, given its Taxopath. More...
 
bool has_unique_ids (Taxonomy const &tax)
 Return true iff all IDs of the Taxa in the Taxonomy are unique. More...
 
void levelorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in levelorder. More...
 
void levelorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in levelorder. More...
 
std::ostream & operator<< (std::ostream &out, Taxonomy const &tax)
 Print the contents of a Taxonomy, i.e., all nested taxa, up to a limit of 10. More...
 
void postorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in postorder. More...
 
void postorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in postorder. More...
 
template<typename TaxonomyType >
utils::Range< IteratorPreorder< Taxonomy, Taxon > > preorder (TaxonomyType &taxonomy)
 
template<typename TaxonomyType >
utils::Range< IteratorPreorder< Taxonomy const, Taxon const > > preorder (TaxonomyType const &taxonomy)
 
void preorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in preorder. More...
 
void preorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
 Apply a function to all taxa of the Taxonomy, traversing it in preorder. More...
 
std::string print_pruned_taxonomy (Taxonomy const &taxonomy)
 Print a Taxonomy, highlighting those Taxa that are the pruning border, i.e., where we cut off the sub-taxa, and print their entropies next to them. More...
 
void prune_by_entropy (Taxonomy &taxonomy, size_t target_taxonomy_size, PruneByEntropySettings settings={})
 Prune a Taxonomy so that the result (approximately) contains a desired number of "leaf" Taxa, using the entropy of the Taxa as indicator where to prune. More...
 
std::string rank_from_abbreviation (char r)
 Get the taxonomic rank name given its abbreviation. More...
 
std::string rank_to_abbreviation (std::string const &rank)
 Get the abbreviation of a taxonomic rank name. More...
 
Taxonomy read_ncbi_taxonomy (std::string const &node_file, std::string const &name_file)
 
void remove_pruned_taxonomy_children (Taxonomy &taxonomy)
 Remove the children of all Taxa that are pruned, i.e., that have prune status == kOutside. More...
 
void remove_taxa_at_level (Taxonomy &tax, size_t level)
 Remove all Taxa at a given level of depth in the Taxonomy hierarchy, and all their children. More...
 
template<class TaxonDataType >
void reset_taxonomy_data (Taxonomy &taxonomy, bool allow_overwrite=true)
 (Re-)set all Taxon data of a Taxonomy to a specified data type. More...
 
std::pair< std::string, std::string > resolve_rank_abbreviation (std::string const &entry)
 Resolve a combined rank and name entry of the form "k_Bacteria" into the full rank and the name, i.e. "Kingdom" and "Bacteria". More...
 
void sort_by_name (Taxonomy &tax, bool recursive=true, bool case_sensitive=false)
 Sort the Taxa of a Taxonomy by their name. More...
 
void swap (Taxon &lhs, Taxon &rhs)
 
void swap (Taxonomy &lhs, Taxonomy &rhs)
 Swapperator for Taxonomy. More...
 
size_t taxa_count_at_level (Taxonomy const &tax, size_t level)
 Count the number of Taxa at a certain level of depth in the Taxonomy. More...
 
std::vector< size_t > taxa_count_levels (Taxonomy const &tax)
 Count the number of Taxa at each level of depth in the Taxonomy. More...
 
size_t taxa_count_lowest_levels (Taxonomy const &tax)
 Return the number of lowest level Taxa (i.e., taxa without sub-taxa) in the Taxonomy. More...
 
std::unordered_map< std::string, size_t > taxa_count_ranks (Taxonomy const &tax, bool case_sensitive=false)
 Count the number of Taxa in a Taxonomy per rank. More...
 
size_t taxa_count_with_rank (Taxonomy const &tax, std::string const &rank, bool case_sensitive=false)
 Count the number of Taxa in a Taxonomy that have a certain rank assigned to them. More...
 
size_t taxon_level (Taxon const &taxon)
 Return the level of depth of a given Taxon. More...
 
template<class TaxonDataType >
bool taxonomy_data_is (Taxonomy const &taxonomy)
 Check whether the data of a Taxonomy are exactly of the specified data type. More...
 
template<class TaxonDataType >
bool taxonomy_data_is_derived_from (Taxonomy const &taxonomy)
 Check whether the data of a Taxonomy are derived from the specified data type. More...
 
tree::Tree taxonomy_to_tree (std::unordered_map< std::string, Taxopath > const &taxon_map, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1)
 Turn a list of Taxa into a (possibly multifurcating) Tree. More...
 
tree::Tree taxonomy_to_tree (Taxonomy const &taxonomy, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1)
 Turn a Taxonomy into a (possibly multifurcating) Tree. More...
 
tree::Tree taxonomy_to_tree (Taxonomy const &taxonomy, std::unordered_map< std::string, Taxopath > const &extra_taxa, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1, bool add_extra_taxa_parents=true)
 Turn a Taxonomy into a (possibly multifurcating) Tree, and allow to add extra tips to it. More...
 
size_t total_taxa_count (Taxonomy const &tax)
 Return the total number of taxa contained in the Taxonomy, i.e., the number of (non-unique) names of all children (recursively). More...
 
bool validate (Taxonomy const &taxonomy, bool stop_at_first_error=false)
 Validate the internal data structures of a Taxonomy and its child Taxa. More...
 
bool validate_pruned_taxonomy (Taxonomy const &taxonomy)
 Validate that the pruning status of a Taxonomy is valid. More...
 

Typedefs

using BFS = BreadthFirstSearch
 Alias for BreadthFirstSearch. More...
 
using DFS = DepthFirstSearch
 Alias for DepthFirstSearch. More...
 
using NcbiNameLookup = std::unordered_map< std::string, NcbiName >
 
using NcbiNodeLookup = std::unordered_map< std::string, NcbiNode >
 

Variables

static const std::unordered_map< char, std::string > rank_abbreviations
 Local helper data that stores the abbreviations and names of common taxonomic ranks. More...
 

Function Documentation

◆ add_from_taxopath()

Taxon & add_from_taxopath ( Taxonomy &  taxonomy,
Taxopath const &  taxopath,
bool  expect_parents 
)

Add a Taxon to a Taxonomy, using the taxonomic elements of a Taxopath.

For example, given a Taxopath like

[ "Animalia", "Vertebrata", "Mammalia", "Carnivora" ]

this function adds the following hierarchy to the Taxonomy:

Animalia
    Vertebrata
        Mammalia
            Carnivora

For any existing Taxa, nothing happens. If any (parent) Taxon in the hierarchy does not exist, it is created by default.

Parameters
taxonomy  Taxonomy to add the Taxon to.
taxopath  A Taxopath object from which the Taxon and its parents are taken.
expect_parents  Optional, defaults to false. If set to true, the function expects all super-taxa of the added Taxon to exist, that is, all taxa except for the last one in the hierarchy. If this expectation is not met, that is, if not all super-taxa exist, an std::runtime_error exception is thrown. If left at the default (false), all necessary super-taxa are created if they do not exist yet.
Returns
The function returns a reference to the newly created Taxon. This is the deepest Taxon of the Taxopath; in other words, its last element.
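
The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the genesis classes: Node and add_path() are hypothetical stand-ins, a Taxopath is reduced to a plain vector of names, and the empty-path check is an assumption of this sketch only.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa; not the genesis Taxon class.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Walk the path from the top, reusing existing children and creating missing ones.
// With expect_parents set, a missing super-taxon (any element but the last) is an error.
Node& add_path(Node& root, std::vector<std::string> const& path, bool expect_parents = false)
{
    if (path.empty()) {
        throw std::invalid_argument("empty path");
    }
    Node* cur = &root;
    for (std::size_t i = 0; i < path.size(); ++i) {
        Node* next = nullptr;
        for (auto& child : cur->children) {
            if (child.name == path[i]) {
                next = &child;
                break;
            }
        }
        if (!next) {
            bool const is_last = (i + 1 == path.size());
            if (expect_parents && !is_last) {
                throw std::runtime_error("missing super-taxon: " + path[i]);
            }
            cur->children.push_back(Node{path[i], {}});
            next = &cur->children.back();
        }
        cur = next;
    }
    // The deepest element of the path, whether it existed before or not.
    return *cur;
}
```

Calling add_path() twice with the same path leaves the hierarchy unchanged, which mirrors the statement that nothing happens for existing Taxa.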

Definition at line 76 of file taxopath.cpp.

◆ add_subtaxonomy_()

static void genesis::taxonomy::add_subtaxonomy_ ( Taxonomy const &  taxonomy,
bool  keep_singleton_inner_nodes,
bool  keep_inner_node_names,
int  max_level,
int  parent_level,
tree::NewickBroker &  broker 
)
static

Recursive local helper function to add taxa to the tree broker.

Definition at line 60 of file taxonomy/functions/tree.cpp.

◆ convert_ncbi_name_table()

NcbiNameLookup convert_ncbi_name_table ( utils::CsvReader::Table const &  name_table,
size_t  tax_id_pos,
size_t  name_pos,
size_t  name_class_pos,
std::string const &  name_class_filter 
)

Definition at line 87 of file ncbi.cpp.

◆ convert_ncbi_node_table()

NcbiNodeLookup convert_ncbi_node_table ( utils::CsvReader::Table const &  node_table,
size_t  tax_id_pos,
size_t  parent_tax_id_pos,
size_t  rank_pos 
)

Definition at line 48 of file ncbi.cpp.

◆ convert_ncbi_tables()

Taxonomy convert_ncbi_tables ( NcbiNodeLookup const &  nodes,
NcbiNameLookup const &  names 
)

Definition at line 132 of file ncbi.cpp.

◆ count_taxa_with_prune_status()

size_t count_taxa_with_prune_status ( Taxonomy const &  taxonomy,
EntropyTaxonData::PruneStatus  status 
)

Return the number of Taxa that have a certain prune status.

Definition at line 449 of file taxonomy/functions/entropy.cpp.

◆ expand_small_subtaxonomies()

void expand_small_subtaxonomies ( Taxonomy &  taxonomy,
size_t  min_subtaxonomy_size 
)

Expand the leaves of a pruned Taxonomy if their sub-taxonomies are smaller than the given threshold.

This function takes a Taxonomy with EntropyTaxonData on its Taxa and looks for taxa with status kBorder that have fewer leaves than the threshold. Each such sub-taxonomy is expanded, that is, it is turned into taxa with status kInside for inner taxa and kBorder for leaf taxa.

Definition at line 412 of file taxonomy/functions/entropy.cpp.

◆ find_taxon() [1/5]

Taxon* genesis::taxonomy::find_taxon ( Taxonomy &  tax,
UnaryPredicate  p 
)

Alias for find_taxon(..., DepthFirstSearch{})

Definition at line 88 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon() [2/5]

Taxon* genesis::taxonomy::find_taxon ( Taxonomy &  tax,
UnaryPredicate  p,
SearchStrategy  strat 
)

Find a Taxon based on a given predicate by recursively searching the Taxonomy according to a search strategy.

Definition at line 141 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon() [3/5]

Taxon const* genesis::taxonomy::find_taxon ( Taxonomy const &  tax,
UnaryPredicate  p 
)

Alias for find_taxon(..., DepthFirstSearch{})

Definition at line 79 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon() [4/5]

Taxon const* genesis::taxonomy::find_taxon ( Taxonomy const &  tax,
UnaryPredicate  p,
BreadthFirstSearch   
)

Find a Taxon based on a given predicate by recursively searching the Taxonomy in a breadth first manner.

Definition at line 115 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon() [5/5]

Taxon const* genesis::taxonomy::find_taxon ( Taxonomy const &  tax,
UnaryPredicate  p,
DepthFirstSearch   
)

Find a Taxon based on a given predicate by recursively searching the Taxonomy in a depth first manner.
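
The overloads above select the traversal through an empty tag argument. The following self-contained sketch shows that tag-dispatch pattern on a hypothetical Node type. The function find_node() and the traversal details are assumptions for illustration, not the genesis implementation; only the tag names are taken from this page.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Empty tag types, named after the tags listed in this namespace.
struct DepthFirstSearch {};
struct BreadthFirstSearch {};

// Depth first: test a child, then descend into it before looking at its siblings.
template<class UnaryPredicate>
Node const* find_node(Node const& root, UnaryPredicate p, DepthFirstSearch)
{
    for (auto const& child : root.children) {
        if (p(child)) {
            return &child;
        }
        if (Node const* hit = find_node(child, p, DepthFirstSearch{})) {
            return hit;
        }
    }
    return nullptr;
}

// Breadth first: finish each level before moving on to the next one.
template<class UnaryPredicate>
Node const* find_node(Node const& root, UnaryPredicate p, BreadthFirstSearch)
{
    std::queue<Node const*> todo;
    for (auto const& child : root.children) {
        todo.push(&child);
    }
    while (!todo.empty()) {
        Node const* cur = todo.front();
        todo.pop();
        if (p(*cur)) {
            return cur;
        }
        for (auto const& child : cur->children) {
            todo.push(&child);
        }
    }
    return nullptr;
}

// Without a tag, forward to the depth first overload.
template<class UnaryPredicate>
Node const* find_node(Node const& root, UnaryPredicate p)
{
    return find_node(root, p, DepthFirstSearch{});
}
```

When several taxa satisfy the predicate, the two strategies can return different matches: depth first reports the first hit along the leftmost branch, breadth first the shallowest one.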

Definition at line 97 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon_by_id() [1/4]

Taxon * find_taxon_by_id ( Taxonomy &  tax,
std::string const &  id 
)

Alias for find_taxon_by_id(..., DepthFirstSearch{}).

Definition at line 69 of file functions/taxonomy.cpp.

◆ find_taxon_by_id() [2/4]

Taxon* genesis::taxonomy::find_taxon_by_id ( Taxonomy &  tax,
std::string const &  id,
SearchStrategy  strat 
)

Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.

Definition at line 205 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon_by_id() [3/4]

Taxon const * find_taxon_by_id ( Taxonomy const &  tax,
std::string const &  id 
)

Alias for find_taxon_by_id(..., DepthFirstSearch{}).

Definition at line 64 of file functions/taxonomy.cpp.

◆ find_taxon_by_id() [4/4]

Taxon const* genesis::taxonomy::find_taxon_by_id ( Taxonomy const &  tax,
std::string const &  id,
SearchStrategy  strat 
)

Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.

Definition at line 194 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon_by_name() [1/4]

Taxon * find_taxon_by_name ( Taxonomy &  tax,
std::string const &  name 
)

Alias for find_taxon_by_name(..., DepthFirstSearch{}).

Definition at line 59 of file functions/taxonomy.cpp.

◆ find_taxon_by_name() [2/4]

Taxon* genesis::taxonomy::find_taxon_by_name ( Taxonomy &  tax,
std::string const &  name,
SearchStrategy  strat 
)

Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.

Definition at line 183 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon_by_name() [3/4]

Taxon const * find_taxon_by_name ( Taxonomy const &  tax,
std::string const &  name 
)

Alias for find_taxon_by_name(..., DepthFirstSearch{}).

Definition at line 54 of file functions/taxonomy.cpp.

◆ find_taxon_by_name() [4/4]

Taxon const* genesis::taxonomy::find_taxon_by_name ( Taxonomy const &  tax,
std::string const &  name,
SearchStrategy  strat 
)

Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.

Definition at line 172 of file taxonomy/functions/taxonomy.hpp.

◆ find_taxon_by_taxopath() [1/2]

Taxon * find_taxon_by_taxopath ( Taxonomy &  tax,
Taxopath const &  taxopath 
)

Find a Taxon in a Taxonomy, given its Taxopath.

Definition at line 147 of file taxopath.cpp.

◆ find_taxon_by_taxopath() [2/2]

Taxon const * find_taxon_by_taxopath ( Taxonomy const &  tax,
Taxopath const &  taxopath 
)

Find a Taxon in a Taxonomy, given its Taxopath.

Definition at line 120 of file taxopath.cpp.

◆ has_unique_ids()

bool has_unique_ids ( Taxonomy const &  tax)

Return true iff all IDs of the Taxa in the Taxonomy are unique.
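
As a rough illustration of such a check, the sketch below collects IDs in a hash set while walking a hypothetical Node hierarchy and stops at the first repeated ID. The types and function names are assumptions, and how the library treats empty IDs is not stated here; the sketch treats them like any other value.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a taxon that carries an ID.
struct Node {
    std::string id;
    std::vector<Node> children;
};

// Insert every ID below n into seen; return false as soon as an ID repeats.
bool collect_ids(Node const& n, std::unordered_set<std::string>& seen)
{
    for (auto const& child : n.children) {
        if (!seen.insert(child.id).second) {
            return false;
        }
        if (!collect_ids(child, seen)) {
            return false;
        }
    }
    return true;
}

bool has_unique_ids_sketch(Node const& root)
{
    std::unordered_set<std::string> seen;
    return collect_ids(root, seen);
}
```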

Definition at line 196 of file functions/taxonomy.cpp.

◆ levelorder_for_each() [1/2]

void genesis::taxonomy::levelorder_for_each ( Taxonomy &  tax,
std::function< void(Taxon &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in levelorder.

The given Taxonomy is traversed in levelorder (i.e., breadth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the non-const version of the function.

Definition at line 58 of file taxonomy/iterator/levelorder.hpp.

◆ levelorder_for_each() [2/2]

void genesis::taxonomy::levelorder_for_each ( Taxonomy const &  tax,
std::function< void(Taxon const &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in levelorder.

The given Taxonomy is traversed in levelorder (i.e., breadth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the const version of the function.
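
A levelorder traversal with the include_inner_taxa switch can be sketched with a queue. The code below is a self-contained illustration on a hypothetical Node type; levelorder_sketch() is not the genesis function.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Visit all nodes below root level by level. With include_inner == false,
// the callback only fires for nodes that have no children.
void levelorder_sketch(
    Node const& root,
    std::function<void(Node const&)> const& fn,
    bool include_inner = true
) {
    std::queue<Node const*> todo;
    for (auto const& child : root.children) {
        todo.push(&child);
    }
    while (!todo.empty()) {
        Node const* cur = todo.front();
        todo.pop();
        if (include_inner || cur->children.empty()) {
            fn(*cur);
        }
        for (auto const& child : cur->children) {
            todo.push(&child);
        }
    }
}
```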

Definition at line 92 of file taxonomy/iterator/levelorder.hpp.

◆ operator<<()

std::ostream & operator<< ( std::ostream &  out,
Taxonomy const &  tax 
)

Print the contents of a Taxonomy, i.e., all nested taxa, up to a limit of 10.

This simple output function prints the first 10 nested Taxa of a Taxonomy. If you need all Taxa and more control over what you want to print, see the PrinterNested class.

Definition at line 260 of file functions/taxonomy.cpp.

◆ postorder_for_each() [1/2]

void genesis::taxonomy::postorder_for_each ( Taxonomy &  tax,
std::function< void(Taxon &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in postorder.

The given Taxonomy is traversed in postorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the non-const version of the function.

Definition at line 57 of file taxonomy/iterator/postorder.hpp.

◆ postorder_for_each() [2/2]

void genesis::taxonomy::postorder_for_each ( Taxonomy const &  tax,
std::function< void(Taxon const &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in postorder.

The given Taxonomy is traversed in postorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the const version of the function.
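
In a postorder traversal, every taxon is visited after all of its sub-taxa. A minimal recursive sketch on a hypothetical Node type (postorder_sketch() is an illustrative name, not the genesis function):

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Descend into each child first, then call fn on the child itself.
void postorder_sketch(
    Node const& root,
    std::function<void(Node const&)> const& fn,
    bool include_inner = true
) {
    for (auto const& child : root.children) {
        postorder_sketch(child, fn, include_inner);
        if (include_inner || child.children.empty()) {
            fn(child);
        }
    }
}
```

This order is convenient when a value of a taxon depends on values already computed for its sub-taxa, such as accumulated counts.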

Definition at line 81 of file taxonomy/iterator/postorder.hpp.

◆ preorder() [1/2]

utils::Range< IteratorPreorder< Taxonomy, Taxon > > genesis::taxonomy::preorder ( TaxonomyType &  taxonomy)
inline

Definition at line 216 of file taxonomy/iterator/preorder.hpp.

◆ preorder() [2/2]

utils::Range< IteratorPreorder< Taxonomy const, Taxon const > > genesis::taxonomy::preorder ( TaxonomyType const &  taxonomy)
inline

Definition at line 206 of file taxonomy/iterator/preorder.hpp.

◆ preorder_for_each() [1/2]

void genesis::taxonomy::preorder_for_each ( Taxonomy &  tax,
std::function< void(Taxon &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in preorder.

The given Taxonomy is traversed in preorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the non-const version of the function.

Definition at line 60 of file taxonomy/iterator/preorder.hpp.

◆ preorder_for_each() [2/2]

void genesis::taxonomy::preorder_for_each ( Taxonomy const &  tax,
std::function< void(Taxon const &)>  fn,
bool  include_inner_taxa = true 
)
inline

Apply a function to all taxa of the Taxonomy, traversing it in preorder.

The given Taxonomy is traversed in preorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the taxa of lowest rank, that is, for each Taxon that does not have sub-taxa.

This is the const version of the function.
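
In a preorder traversal, every taxon is visited before its sub-taxa, which yields the same order as a nested printout. A minimal recursive sketch on a hypothetical Node type (preorder_sketch() is an illustrative name, not the genesis function):

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Call fn on each child first, then descend into it.
void preorder_sketch(
    Node const& root,
    std::function<void(Node const&)> const& fn,
    bool include_inner = true
) {
    for (auto const& child : root.children) {
        if (include_inner || child.children.empty()) {
            fn(child);
        }
        preorder_sketch(child, fn, include_inner);
    }
}
```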

Definition at line 84 of file taxonomy/iterator/preorder.hpp.

◆ print_pruned_taxonomy()

std::string print_pruned_taxonomy ( Taxonomy const &  taxonomy)

Print a Taxonomy, highlighting those Taxa that are the pruning border, i.e., where we cut off the sub-taxa, and print their entropies next to them.

Definition at line 472 of file taxonomy/functions/entropy.cpp.

◆ prune_by_entropy()

void prune_by_entropy ( Taxonomy &  taxonomy,
size_t  target_taxonomy_size,
PruneByEntropySettings  settings = {} 
)

Prune a Taxonomy so that the result (approximately) contains a desired number of "leaf" Taxa, using the entropy of the Taxa as indicator where to prune.

The function takes a Taxonomy with data type EntropyTaxonData and a target size which indicates the desired number of "leaf" Taxa after pruning the Taxonomy. In the pruned Taxonomy, some Taxa are considered as belonging to the Taxonomy (have status EntropyTaxonData::PruneStatus::kInside or EntropyTaxonData::PruneStatus::kBorder), while others (deeper in the Taxonomy) are excluded (have status EntropyTaxonData::PruneStatus::kOutside). The number of border taxa (or "leaves") of the included Taxa then is aimed to be as close as possible to the target size.

That means, this function sets the status of the Taxa, but does not remove any Taxa. All Taxa with status EntropyTaxonData::PruneStatus::kOutside are then considered to be pruned from the taxonomy.

Example: The Taxonomy

Tax_1
    Tax_2
        Tax_3
        Tax_4
    Tax_5
        Tax_6
Tax_7
    Tax_8
    Tax_9

contains 5 "leaf" taxa, i.e., Tax_3, Tax_4, Tax_6, Tax_8 and Tax_9. If we want to prune it with a target size of 3, we might end up with either

Tax_1
    Tax_2
    Tax_5
Tax_7

or

Tax_1
Tax_7
    Tax_8
    Tax_9

as both contain 3 "leaves": Tax_2, Tax_5 and Tax_7 in the former case and Tax_1, Tax_8 and Tax_9 in the latter. Which of those two is used depends on the entropies of the Taxa.

In the former case, Tax_1 is considered inside, Tax_2, Tax_5 and Tax_7 are border, and all other taxa are outside of the pruned Taxonomy. In the latter case, Tax_7 is inside, Tax_1, Tax_8 and Tax_9 are border, and again all others are outside.

It is not always possible to prune a Taxonomy in a way that we exactly hit the target size. The function then ends at a number of border Taxa that is closest to it (either below or above the target size).

In order to decide which Taxa to set to inside (i.e., not include as leaves, but further resolve into their children), we use the entropies of the Taxa: We choose to split up at a current border Taxon with the highest entropy value, as long as this brings us closer to the target size.

This means that the above case where we had two possible ways of splitting should be rare, as the entropies will rarely be identical with real world data sets. If this happens nonetheless, it is random which of the Taxa with equal entropy will be used.
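
The greedy idea described above can be sketched as follows. This is a simplified, self-contained illustration and not the genesis implementation: Node, Status and prune_sketch() are hypothetical, entropies are given as plain numbers, none of the PruneByEntropySettings options are modelled, and the sketch simply stops at the first expansion that would not bring the border count closer to the target.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Status { kInside, kBorder, kOutside };

// Hypothetical stand-in for a taxon with an entropy value and a prune status.
struct Node {
    std::string name;
    double entropy;
    std::vector<Node> children;
    Status status;

    Node(std::string n, double e, std::vector<Node> c = {})
        : name(std::move(n)), entropy(e), children(std::move(c)), status(Status::kOutside)
    {}
};

void set_outside(Node& n)
{
    n.status = Status::kOutside;
    for (auto& child : n.children) {
        set_outside(child);
    }
}

// Set the status of all nodes below root; returns the resulting number of border nodes.
std::size_t prune_sketch(Node& root, std::size_t target)
{
    // Start with the top level as border, everything deeper as outside.
    std::vector<Node*> border;
    for (auto& child : root.children) {
        set_outside(child);
        child.status = Status::kBorder;
        border.push_back(&child);
    }
    auto dist = [target](std::size_t s) { return s > target ? s - target : target - s; };

    while (true) {
        // Pick the border node with the highest entropy that can still be split up.
        std::size_t best = border.size();
        for (std::size_t i = 0; i < border.size(); ++i) {
            if (border[i]->children.empty()) {
                continue;
            }
            if (best == border.size() || border[i]->entropy > border[best]->entropy) {
                best = i;
            }
        }
        if (best == border.size()) {
            break;
        }

        // Expanding replaces one border node by its children. Stop if that does not help.
        std::size_t const cur = border.size();
        std::size_t const next = cur - 1 + border[best]->children.size();
        if (dist(next) >= dist(cur)) {
            break;
        }
        Node* node = border[best];
        node->status = Status::kInside;
        border.erase(border.begin() + static_cast<std::ptrdiff_t>(best));
        for (auto& child : node->children) {
            child.status = Status::kBorder;
            border.push_back(&child);
        }
    }
    return border.size();
}
```

As in the description, the sketch only sets statuses and does not remove any nodes. With the example Taxonomy above, a target size of 3, and Tax_1 having a higher entropy than Tax_7, it ends with Tax_1 inside and Tax_2, Tax_5 and Tax_7 as border.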

In order to control further settings, see PruneByEntropySettings.

Definition at line 59 of file taxonomy/functions/entropy.cpp.

◆ rank_from_abbreviation()

std::string rank_from_abbreviation ( char  r)

Get the taxonomic rank name given its abbreviation.

The common taxonomic ranks are used:

D Domain
K Kingdom
P Phylum
C Class
O Order
F Family
G Genus
S Species

If any of those abbreviations (case-independent) is given, the full rank name is returned. For all other input, an empty string is returned.
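
The lookup described above amounts to a small case-insensitive table. The sketch below is self-contained; rank_name_sketch() and the layout of its table are assumptions for illustration, with the table contents mirroring the list above.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// Map a one-letter abbreviation to its rank name; unknown letters yield an empty string.
std::string rank_name_sketch(char r)
{
    static const std::unordered_map<char, std::string> ranks = {
        {'d', "Domain"}, {'k', "Kingdom"}, {'p', "Phylum"}, {'c', "Class"},
        {'o', "Order"},  {'f', "Family"},  {'g', "Genus"},  {'s', "Species"}
    };
    // Fold to lower case so that 'K' and 'k' give the same result.
    auto const key = static_cast<char>(std::tolower(static_cast<unsigned char>(r)));
    auto const it = ranks.find(key);
    return it == ranks.end() ? std::string{} : it->second;
}
```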

Definition at line 75 of file ranks.cpp.

◆ rank_to_abbreviation()

std::string rank_to_abbreviation ( std::string const &  rank)

Get the abbreviation of a taxonomic rank name.

This function returns the abbreviation for a given common taxonomic rank name, case-independently. See rank_from_abbreviation() for a list of valid rank names. If the given rank name is invalid, an empty string is returned.

Definition at line 92 of file ranks.cpp.

◆ read_ncbi_taxonomy()

Taxonomy read_ncbi_taxonomy ( std::string const &  node_file,
std::string const &  name_file 
)

Definition at line 212 of file ncbi.cpp.

◆ remove_pruned_taxonomy_children()

void remove_pruned_taxonomy_children ( Taxonomy &  taxonomy)

Remove the children of all Taxa that are pruned, i.e., that have prune status == kOutside.

The function does not validate the status before. Use validate_pruned_taxonomy() if you are unsure whether the status is correct for all Taxa.

Definition at line 463 of file taxonomy/functions/entropy.cpp.

◆ remove_taxa_at_level()

void remove_taxa_at_level ( Taxonomy &  tax,
size_t  level 
)

Remove all Taxa at a given level of depth in the Taxonomy hierarchy, and all their children.

That is, providing level = 0 has the same effect as calling clear_children() on the given Taxonomy; level = 1 has this effect for the children of the given Taxonomy; and so on.

See taxon_level() for more information on the level.
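
The level semantics can be illustrated with a short recursive sketch on a hypothetical Node type, where clearing the children vector plays the role of clear_children(). The names are assumptions, not the genesis implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Level 0 clears the children of root itself; level n > 0 recurses n steps down first.
void remove_at_level_sketch(Node& root, std::size_t level)
{
    if (level == 0) {
        root.children.clear();
        return;
    }
    for (auto& child : root.children) {
        remove_at_level_sketch(child, level - 1);
    }
}
```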

Definition at line 244 of file functions/taxonomy.cpp.

◆ reset_taxonomy_data()

void genesis::taxonomy::reset_taxonomy_data ( Taxonomy &  taxonomy,
bool  allow_overwrite = true 
)

(Re-)set all Taxon data of a Taxonomy to a specified data type.

The data is created empty, using BaseTaxonData::create(). If the optional parameter allow_overwrite is set to false (instead of the default true), the function throws an exception if a Taxon already has data assigned to it.

Definition at line 99 of file taxonomy/functions/operators.hpp.

◆ resolve_rank_abbreviation()

std::pair< std::string, std::string > resolve_rank_abbreviation ( std::string const &  entry)

Resolve a combined rank and name entry of the form "k_Bacteria" into the full rank and the name, i.e. "Kingdom" and "Bacteria".

The function returns a pair of { "rank", "name" }.

The expected format of the input string is "x_abc", where "x" is a rank name abbreviation and "abc" is a taxon name. If the string is in this format, it is split and the rank name abbreviation is resolved. If this abbreviation is valid, the rank (first) and the name (second) are returned. See rank_from_abbreviation() for the list of valid rank name abbreviations. The number of underscores is irrelevant, that is, C___Mammalia also works and will return { "Class", "Mammalia" }.

If any of the conditions is not met (either, the string does not start with "x_", or the rank name abbreviation is invalid), the rank is left empty, and the whole given string is used as name. Thus, this function also works on normal taxon names.
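
The parsing rules above can be sketched in a few lines. The code is self-contained and illustrative only: resolve_sketch() and rank_name_sketch() are hypothetical names, and the handling of an entry that has nothing after the underscores (an empty name is returned) is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Case-insensitive lookup of the common rank abbreviations; empty string if unknown.
std::string rank_name_sketch(char r)
{
    switch (std::tolower(static_cast<unsigned char>(r))) {
        case 'd': return "Domain";
        case 'k': return "Kingdom";
        case 'p': return "Phylum";
        case 'c': return "Class";
        case 'o': return "Order";
        case 'f': return "Family";
        case 'g': return "Genus";
        case 's': return "Species";
        default:  return "";
    }
}

// Split "x_abc" into { rank, name }; otherwise return { "", entry }.
std::pair<std::string, std::string> resolve_sketch(std::string const& entry)
{
    if (entry.size() >= 2 && entry[1] == '_') {
        std::string const rank = rank_name_sketch(entry[0]);
        if (!rank.empty()) {
            // Skip any number of underscores after the abbreviation.
            std::size_t const pos = entry.find_first_not_of('_', 1);
            std::string const name =
                (pos == std::string::npos) ? std::string{} : entry.substr(pos);
            return {rank, name};
        }
    }
    return {std::string{}, entry};
}
```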

Definition at line 121 of file ranks.cpp.

◆ sort_by_name()

void sort_by_name ( Taxonomy &  tax,
bool  recursive = true,
bool  case_sensitive = false 
)

Sort the Taxa of a Taxonomy by their name.

After calling this function, the Taxa are stored in the order given by their names. This is useful, e.g., for output.

Parameters
tax  Taxonomy to be sorted.
recursive  Optional, default is true. If set to true, the sub-taxa are also sorted. If set to false, only the immediate children of the given Taxonomy are sorted.
case_sensitive  Optional, default is false. Determines whether the name string comparison is done in a case sensitive manner or not.
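
The effect of the two optional parameters can be sketched with std::sort on a hypothetical Node type. The names sort_by_name_sketch() and to_lower_copy() are assumptions for illustration; the case-insensitive comparison simply compares lower-cased copies of the names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

std::string to_lower_copy(std::string s)
{
    for (auto& ch : s) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return s;
}

// Sort the immediate children by name and, if requested, all deeper levels as well.
void sort_by_name_sketch(Node& root, bool recursive = true, bool case_sensitive = false)
{
    std::sort(
        root.children.begin(), root.children.end(),
        [case_sensitive](Node const& a, Node const& b) {
            return case_sensitive
                ? a.name < b.name
                : to_lower_copy(a.name) < to_lower_copy(b.name);
        }
    );
    if (recursive) {
        for (auto& child : root.children) {
            sort_by_name_sketch(child, recursive, case_sensitive);
        }
    }
}
```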

Definition at line 217 of file functions/taxonomy.cpp.

◆ swap() [1/2]

void swap ( Taxon &  lhs,
Taxon &  rhs 
)

Definition at line 111 of file taxon.cpp.

◆ swap() [2/2]

void swap ( Taxonomy &  lhs,
Taxonomy &  rhs 
)

Swapperator for Taxonomy.

Definition at line 74 of file taxonomy.cpp.

◆ taxa_count_at_level()

size_t taxa_count_at_level ( Taxonomy const &  tax,
size_t  level 
)

Count the number of Taxa at a certain level of depth in the Taxonomy.

The function returns how many Taxa there are in the Taxonomy that are at a certain level - that is excluding the number of their respective sub-taxa. The first/top level has depth 0.

See here for a version of this function that returns those values for all levels of depth.

Definition at line 111 of file functions/taxonomy.cpp.

◆ taxa_count_levels()

std::vector< size_t > taxa_count_levels ( Taxonomy const &  tax)

Count the number of Taxa at each level of depth in the Taxonomy.

The function returns how many Taxa there are in the Taxonomy that are at each level - that is excluding the number of their respective sub-taxa. The first/top level has depth 0; its count is the first element in the returned vector, and so on.

This function returns the values of taxa_count_at_level( Taxonomy const& tax, size_t level ) for all levels of depth.
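
A per-level count can be sketched as one recursive pass that adds the number of children at each depth. The code is self-contained and uses a hypothetical Node type; the function names are illustrative and not the genesis implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a taxon with sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// The children of n live at the given level; grow the vector on demand.
void count_levels_impl(Node const& n, std::size_t level, std::vector<std::size_t>& counts)
{
    if (n.children.empty()) {
        return;
    }
    if (counts.size() <= level) {
        counts.resize(level + 1, 0);
    }
    counts[level] += n.children.size();
    for (auto const& child : n.children) {
        count_levels_impl(child, level + 1, counts);
    }
}

// Element i of the result is the number of nodes at level i; the top level is level 0.
std::vector<std::size_t> count_levels_sketch(Node const& root)
{
    std::vector<std::size_t> counts;
    count_levels_impl(root, 0, counts);
    return counts;
}
```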

Definition at line 125 of file functions/taxonomy.cpp.

◆ taxa_count_lowest_levels()

size_t taxa_count_lowest_levels ( Taxonomy const &  tax)

Return the number of lowest level Taxa (i.e., taxa without sub-taxa) in the Taxonomy.

The function counts the number of taxa without any sub-taxa, that is, the "leaves" of the Taxonomy.

Example: The Taxonomy

Tax_1
    Tax_2
        Tax_3
    Tax_4
        Tax_5
Tax_6
    Tax_7

contains 3 such taxa, i.e., Tax_3, Tax_5 and Tax_7.

Definition at line 98 of file functions/taxonomy.cpp.
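
As a rough illustration of the leaf counting described above, the following sketch recurses over a hypothetical ToyTaxon type (not a genesis class) and counts only taxa without sub-taxa. For the example hierarchy above it yields 3.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct ToyTaxon {
    std::string name;
    std::vector<ToyTaxon> children;
};

// Count the taxa that have no sub-taxa, i.e., the "leaves" of the hierarchy.
inline std::size_t toy_count_leaves( std::vector<ToyTaxon> const& taxa )
{
    std::size_t count = 0;
    for( auto const& taxon : taxa ) {
        count += taxon.children.empty() ? 1 : toy_count_leaves( taxon.children );
    }
    return count;
}
```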

◆ taxa_count_ranks()

std::unordered_map< std::string, size_t > taxa_count_ranks ( Taxonomy const &  tax,
bool  case_sensitive = false 
)

Count the number of Taxa in a Taxonomy per rank.

The function gives a list of all ranks found in the Taxonomy, with a count of how many Taxa there are that have this rank.

It is similar to taxa_count_with_rank(), but gives the result for all ranks.

If the optional parameter case_sensitive is set to true, all ranks are treated as case sensitive, that is, ranks that differ only in case produce different entries. If left at the default false, they are converted to lower case first, so that they are all treated case-insensitively.

Definition at line 173 of file functions/taxonomy.cpp.
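
The difference between the two case_sensitive settings can be seen in a small sketch. It assumes a hypothetical ToyTaxon type with a rank string and a hypothetical function toy_count_ranks(); neither is part of genesis.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a Taxon: name, rank, and nested sub-taxa.
struct ToyTaxon {
    std::string name;
    std::string rank;
    std::vector<ToyTaxon> children;
};

// Recursive helper: count the rank of each taxon, then descend into its sub-taxa.
inline void toy_count_ranks_(
    std::vector<ToyTaxon> const& taxa, bool case_sensitive,
    std::unordered_map<std::string, std::size_t>& result
) {
    for( auto const& taxon : taxa ) {
        std::string key = taxon.rank;
        if( ! case_sensitive ) {
            // Fold to lower case so that "Genus" and "genus" share one entry.
            std::transform( key.begin(), key.end(), key.begin(), []( unsigned char c ){
                return static_cast<char>( std::tolower( c ));
            });
        }
        ++result[ key ];
        toy_count_ranks_( taxon.children, case_sensitive, result );
    }
}

// Map from rank to the number of taxa that have this rank.
inline std::unordered_map<std::string, std::size_t> toy_count_ranks(
    std::vector<ToyTaxon> const& top, bool case_sensitive = false
) {
    std::unordered_map<std::string, std::size_t> result;
    toy_count_ranks_( top, case_sensitive, result );
    return result;
}
```

With ranks "Kingdom" and "kingdom" present in the input, the default setting in this sketch produces a single entry "kingdom" with count 2, while case_sensitive = true produces two separate entries.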

◆ taxa_count_with_rank()

size_t taxa_count_with_rank ( Taxonomy const &  tax,
std::string const &  rank,
bool  case_sensitive = false 
)

Count the number of Taxa in a Taxonomy that have a certain rank assigned to them.

The function recursively iterates all sub-taxa of the Taxonomy and counts how many of the Taxa have the given rank assigned (case sensitive or not).

See taxa_count_ranks() for a version of this function that returns this number for all ranks in the Taxonomy.

Definition at line 148 of file functions/taxonomy.cpp.

◆ taxon_level()

size_t taxon_level ( Taxon const &  taxon)

Return the level of depth of a given Taxon.

This level is the number of parents the Taxon has, excluding the Taxonomy which contains them. That means, the immediate children of a Taxonomy all have level 0, their children level 1, and so on.

Definition at line 78 of file functions/taxonomy.cpp.
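
Counting parents can be sketched with a plain parent pointer. The ToyTaxon type below is a hypothetical stand-in, and representing top-level taxa by a null parent pointer is an assumption of this sketch, not a statement about the genesis data structures.

```cpp
#include <cstddef>

// Hypothetical stand-in for a Taxon that only stores a pointer to its parent.
// Assumption: immediate children of the taxonomy have parent == nullptr.
struct ToyTaxon {
    ToyTaxon const* parent = nullptr;
};

// The level of a taxon is the number of parent taxa above it.
inline std::size_t toy_taxon_level( ToyTaxon const& taxon )
{
    std::size_t level = 0;
    for( ToyTaxon const* p = taxon.parent; p != nullptr; p = p->parent ) {
        ++level;
    }
    return level;
}
```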

◆ taxonomy_data_is()

bool genesis::taxonomy::taxonomy_data_is ( Taxonomy const &  taxonomy)

Check whether the data of a Taxonomy are exactly of the specified data type.

This function returns true iff all Taxa have data of the given type, using typeid() for this matching.

Definition at line 54 of file taxonomy/functions/operators.hpp.

◆ taxonomy_data_is_derived_from()

bool genesis::taxonomy::taxonomy_data_is_derived_from ( Taxonomy const &  taxonomy)

Check whether the data of a Taxonomy are derived from the specified data type.

This function returns true iff all Taxa have data whose types are derived from the specified type. It uses dynamic_cast() for this.

Definition at line 78 of file taxonomy/functions/operators.hpp.
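
The difference between the exact check via typeid() and the derived check via dynamic_cast() can be shown on a single data object. The class names and function templates below are hypothetical and only mirror the idea; the genesis functions apply such a check to all Taxa of a Taxonomy.

```cpp
#include <typeinfo>

// Hypothetical data class hierarchy, used only for this illustration.
struct ToyBaseData {
    virtual ~ToyBaseData() = default;
};
struct ToyEntropyData : ToyBaseData {};
struct ToyExtendedData : ToyEntropyData {};

// True only if the dynamic type of data is exactly T.
template<class T>
bool toy_data_is( ToyBaseData const& data )
{
    return typeid( data ) == typeid( T );
}

// True if the dynamic type of data is T or a class derived from T.
template<class T>
bool toy_data_is_derived_from( ToyBaseData const& data )
{
    return dynamic_cast<T const*>( &data ) != nullptr;
}
```

For an object of type ToyExtendedData, the exact check against ToyEntropyData fails in this sketch, while the derived check against ToyEntropyData succeeds.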

◆ taxonomy_to_tree() [1/3]

tree::Tree taxonomy_to_tree ( std::unordered_map< std::string, Taxopath > const &  taxon_map,
bool  keep_singleton_inner_nodes = false,
bool  keep_inner_node_names = false,
int  max_level = -1 
)

Turn a list of Taxa into a (possibly multifurcating) Tree.

The function is a simplified version of taxonomy_to_tree() that does not take a given Taxonomy, but instead just a list of (tip) Taxa and their Taxopaths. All keys in taxon_map are added as new tips to a Taxonomy that is created from the Taxopaths (mapped values) of taxon_map.

This is useful, for example, if one has a set of sequences with taxonomy assignment, and wants to build a taxonomic constraint for inferring a tree from these sequences: Given the sequence names as keys, and their taxonomic paths as mapped values, the function creates a (possibly multifurcating) tree that can be used as such a constraint.

Definition at line 180 of file taxonomy/functions/tree.cpp.

◆ taxonomy_to_tree() [2/3]

tree::Tree taxonomy_to_tree ( Taxonomy const &  taxonomy,
bool  keep_singleton_inner_nodes = false,
bool  keep_inner_node_names = false,
int  max_level = -1 
)

Turn a Taxonomy into a (possibly multifurcating) Tree.

A Taxonomy is a hierarchy that can be interpreted as a rooted tree. Using this function, such a tree is created and returned. It can be used to construct a taxonomic constraint tree for tree inference.

It might happen that a taxonomic path goes down several levels with just one taxon at each level. This would create inner nodes in the tree that just connect two other nodes, that is, nodes that do not furcate at all. Many downstream programs might have problems with such trees. By default, such nodes are collapsed. keep_singleton_inner_nodes can be used to include these inner nodes in the tree, instead of immediately adding their children.

Furthermore, a Taxonomy contains names at every level, while a Tree usually does not contain inner node names. Thus, inner nodes are not named by default. Use keep_inner_node_names to still set the inner taxonomic labels in the tree.

Lastly, max_level can be used to turn only the first few levels (starting at 0) of the Taxonomy into the tree, and to stop after that. By default, the whole Taxonomy (all levels) is turned into a Tree.

Definition at line 105 of file taxonomy/functions/tree.cpp.
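
The effect of keep_singleton_inner_nodes and keep_inner_node_names can be sketched by writing a nested hierarchy as a Newick string. This is only an illustration under stated assumptions: ToyTaxon and the toy_ functions are hypothetical, the sketch produces a string instead of a tree::Tree, it omits max_level, and the library may handle details differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct ToyTaxon {
    std::string name;
    std::vector<ToyTaxon> children;
};

// Recursive helper that writes one taxon and its sub-taxa in Newick notation.
inline std::string toy_newick_(
    ToyTaxon const& taxon, bool keep_singleton_inner_nodes, bool keep_inner_node_names
) {
    // Taxa without sub-taxa become tips, labelled with their name.
    if( taxon.children.empty() ) {
        return taxon.name;
    }
    // A taxon with exactly one sub-taxon does not furcate.
    // Unless requested otherwise, skip it and add its child directly.
    if( taxon.children.size() == 1 && ! keep_singleton_inner_nodes ) {
        return toy_newick_(
            taxon.children[0], keep_singleton_inner_nodes, keep_inner_node_names
        );
    }
    std::string out = "(";
    for( std::size_t i = 0; i < taxon.children.size(); ++i ) {
        if( i > 0 ) {
            out += ",";
        }
        out += toy_newick_(
            taxon.children[i], keep_singleton_inner_nodes, keep_inner_node_names
        );
    }
    out += ")";
    // Inner nodes only get a label if requested.
    if( keep_inner_node_names ) {
        out += taxon.name;
    }
    return out;
}

// Write the hierarchy below root (an unnamed stand-in for the taxonomy itself).
inline std::string toy_taxonomy_to_newick(
    ToyTaxon const& root,
    bool keep_singleton_inner_nodes = false,
    bool keep_inner_node_names = false
) {
    return toy_newick_( root, keep_singleton_inner_nodes, keep_inner_node_names ) + ";";
}
```

For a hierarchy with top-level taxa A and E, where A contains only B, and B contains C and D, this sketch gives ((C,D),E); by default, (((C,D)),E); when singleton inner nodes are kept, and ((C,D)B,E); when only inner node names are kept.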

◆ taxonomy_to_tree() [3/3]

tree::Tree taxonomy_to_tree ( Taxonomy const &  taxonomy,
std::unordered_map< std::string, Taxopath > const &  extra_taxa,
bool  keep_singleton_inner_nodes = false,
bool  keep_inner_node_names = false,
int  max_level = -1,
bool  add_extra_taxa_parents = true 
)

Turn a Taxonomy into a (possibly multifurcating) Tree, and allow to add extra tips to it.

This is similar to the general version of this function, see taxonomy_to_tree(). It however allows a special feature: A mapping from extra taxon names to Taxa in the given Taxonomy.

This is useful if the Taxonomy is used for a set of sequences that have taxonomic assignments: One might wish to build a tree where tips correspond to sequences, and the tree topology reflects the taxonomy of these sequences. For such a use case, this function can use the Taxonomy of the sequences, as well as a mapping of sequence names to Taxopaths. The output tree will then contain "extra taxa" that are made up of the sequence names, added as children to the Taxonomy (and hence, added as tips to the tree).

The parameter add_extra_taxa_parents defaults to true, meaning that the parent taxa of the extra_taxa are added to the Taxonomy if not already present in the taxonomy. If set to false, the taxonomy is expected to already contain all paths that are found in the extra_taxa, and will throw if this is not the case.

See also taxonomy_to_tree() for a simplified version of this function that also explains some more details of the workings.

Definition at line 145 of file taxonomy/functions/tree.cpp.

◆ total_taxa_count()

size_t total_taxa_count ( Taxonomy const &  tax)

Return the total number of taxa contained in the Taxonomy, i.e., the number of (non-unique) names of all children (recursively).

Example: The Taxonomy

Tax_1
    Tax_2
        Tax_3
    Tax_4
        Tax_3
Tax_5

contains a total of 6 taxa. The name Tax_3 appears twice and is counted twice.

Definition at line 89 of file functions/taxonomy.cpp.
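
To make the "non-unique" aspect concrete, the following sketch counts all taxa and, for contrast, the number of distinct names. ToyTaxon, toy_total_count() and toy_unique_name_count() are hypothetical names for this illustration; the distinct-name count is not something this page documents as a library function.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct ToyTaxon {
    std::string name;
    std::vector<ToyTaxon> children;
};

// Count every taxon in the hierarchy, regardless of repeated names.
inline std::size_t toy_total_count( std::vector<ToyTaxon> const& taxa )
{
    std::size_t count = taxa.size();
    for( auto const& taxon : taxa ) {
        count += toy_total_count( taxon.children );
    }
    return count;
}

// Helper for the contrast case: collect all names into a set.
inline void toy_collect_names_( std::vector<ToyTaxon> const& taxa, std::set<std::string>& names )
{
    for( auto const& taxon : taxa ) {
        names.insert( taxon.name );
        toy_collect_names_( taxon.children, names );
    }
}

// Count distinct names only; repeated names are counted once.
inline std::size_t toy_unique_name_count( std::vector<ToyTaxon> const& taxa )
{
    std::set<std::string> names;
    toy_collect_names_( taxa, names );
    return names.size();
}
```

For the example above, the total count in this sketch is 6, whereas the number of distinct names is 5, because Tax_3 occurs twice.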

◆ validate()

bool validate ( Taxonomy const &  taxonomy,
bool  stop_at_first_error = false 
)

Validate the internal data structures of a Taxonomy and its child Taxa.

The function validates the correctness of internal pointers, particularly, the parent pointers of Taxon. If the structure is broken, a message is written to LOG_INFO and the function returns false.

Parameters
taxonomy  The Taxonomy object to validate.
stop_at_first_error  Optional, defaults to false. By default, all errors are reported. If set to true, only the first one is logged and the function immediately returns false (or runs through and returns true if no errors are found).

Definition at line 269 of file functions/taxonomy.cpp.

◆ validate_pruned_taxonomy()

bool validate_pruned_taxonomy ( Taxonomy const &  taxonomy)

Validate that the pruning status of a Taxonomy is valid.

This function expects the Taxa of the Taxonomy to have data type EntropyTaxonData. It then checks whether the pruning states are all correctly set.

That means:

  • Taxa with status kInside can only have children of the same status or of kBorder.
  • Taxa with status kBorder can only have children of status kOutside.
  • Taxa with status kOutside can only have children of the same status.

If any of these conditions is not met, information about the faulty Taxon is written to LOG_INFO, and the function returns false.

Definition at line 491 of file taxonomy/functions/entropy.cpp.
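
The three rules listed above can be written as a small parent/child check. The sketch below uses a hypothetical ToyPruneStatus enum and ToyTaxon type instead of EntropyTaxonData, does not log anything, and only checks the hierarchy below a given taxon.

```cpp
#include <vector>

// Hypothetical status enum mirroring the three states named above.
enum class ToyPruneStatus { kInside, kBorder, kOutside };

// Hypothetical stand-in for a Taxon with a pruning status.
struct ToyTaxon {
    ToyPruneStatus status;
    std::vector<ToyTaxon> children;
};

// Check whether a child status is allowed below a given parent status.
inline bool toy_child_status_allowed( ToyPruneStatus parent, ToyPruneStatus child )
{
    switch( parent ) {
        case ToyPruneStatus::kInside:
            return child == ToyPruneStatus::kInside || child == ToyPruneStatus::kBorder;
        case ToyPruneStatus::kBorder:
            return child == ToyPruneStatus::kOutside;
        case ToyPruneStatus::kOutside:
            return child == ToyPruneStatus::kOutside;
    }
    return false;
}

// Recursively check all parent/child pairs below the given taxon.
inline bool toy_validate_pruned( ToyTaxon const& taxon )
{
    for( auto const& child : taxon.children ) {
        if( ! toy_child_status_allowed( taxon.status, child.status )) {
            return false;
        }
        if( ! toy_validate_pruned( child )) {
            return false;
        }
    }
    return true;
}
```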

Typedef Documentation

◆ BFS

Alias for BreadthFirstSearch.

Definition at line 69 of file taxonomy/functions/taxonomy.hpp.

◆ DFS

Alias for DepthFirstSearch.

Definition at line 64 of file taxonomy/functions/taxonomy.hpp.

◆ NcbiNameLookup

using NcbiNameLookup = std::unordered_map<std::string, NcbiName>

Definition at line 65 of file ncbi.hpp.

◆ NcbiNodeLookup

using NcbiNodeLookup = std::unordered_map<std::string, NcbiNode>

Definition at line 64 of file ncbi.hpp.

Variable Documentation

◆ rank_abbreviations

const std::unordered_map<char, std::string> rank_abbreviations
static
Initial value:
= {
{ 'd', "Domain" },
{ 'k', "Kingdom" },
{ 'p', "Phylum" },
{ 'c', "Class" },
{ 'o', "Order" },
{ 'f', "Family" },
{ 'g', "Genus" },
{ 's', "Species" }
}

Local helper data that stores the abbreviations and names of common taxonomic ranks.

Definition at line 47 of file ranks.cpp.
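
A table like this is typically used to resolve a one-letter abbreviation to a rank name. The following sketch copies the values shown above into a hypothetical table and adds a hypothetical lookup function; folding the input to lower case and returning an empty string for unknown abbreviations are assumptions of this sketch.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// Copy of the abbreviation table shown above, under a hypothetical name.
static const std::unordered_map<char, std::string> toy_rank_abbreviations = {
    { 'd', "Domain" },
    { 'k', "Kingdom" },
    { 'p', "Phylum" },
    { 'c', "Class" },
    { 'o', "Order" },
    { 'f', "Family" },
    { 'g', "Genus" },
    { 's', "Species" }
};

// Look up the rank name for an abbreviation, ignoring case.
// Returns an empty string if the abbreviation is not in the table.
inline std::string toy_rank_from_abbreviation( char abbreviation )
{
    char const key = static_cast<char>(
        std::tolower( static_cast<unsigned char>( abbreviation ))
    );
    auto const it = toy_rank_abbreviations.find( key );
    return it == toy_rank_abbreviations.end() ? std::string{} : it->second;
}
```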