Classes
class BaseTaxonData
    Base class for storing data on Taxa of a Taxonomy.
struct BreadthFirstSearch
    Tag used for find_taxon().
struct DepthFirstSearch
    Tag used for find_taxon().
class EntropyTaxonData
class IteratorPreorder
struct NcbiName
struct NcbiNode
class PrinterNested
    Simple printer class for Taxonomy.
struct PruneByEntropySettings
    Store settings for the Taxonomy pruning algorithm prune_by_entropy().
class Taxon
    Store a Taxon, i.e., an element in a Taxonomy, with its name, rank, ID and sub-taxa.
class Taxonomy
    Store a Taxonomy, i.e., a nested hierarchy of Taxa.
class TaxonomyReader
    Read Taxonomy file formats.
class TaxonomyWriter
    Write a Taxonomy as a list of Taxopaths.
class Taxopath
    Helper class to store a taxonomic path.
class TaxopathGenerator
    Helper class to generate a taxonomic path string from a Taxopath object or a Taxon.
class TaxopathParser
    Helper class to parse a string containing a taxonomic path into a Taxopath object.
Functions
Taxon & add_from_taxopath (Taxonomy &taxonomy, Taxopath const &taxopath, bool expect_parents)
    Add a Taxon to a Taxonomy, using the taxonomic elements of a Taxopath.
static void add_subtaxonomy_ (Taxonomy const &taxonomy, bool keep_singleton_inner_nodes, bool keep_inner_node_names, int max_level, int parent_level, tree::NewickBroker &broker)
    Recursive local helper function to add taxa to the tree broker.
NcbiNameLookup convert_ncbi_name_table (utils::CsvReader::Table const &name_table, size_t tax_id_pos, size_t name_pos, size_t name_class_pos, std::string const &name_class_filter)
NcbiNodeLookup convert_ncbi_node_table (utils::CsvReader::Table const &node_table, size_t tax_id_pos, size_t parent_tax_id_pos, size_t rank_pos)
Taxonomy convert_ncbi_tables (NcbiNodeLookup const &nodes, NcbiNameLookup const &names)
size_t count_taxa_with_prune_status (Taxonomy const &taxonomy, EntropyTaxonData::PruneStatus status)
    Return the number of Taxa that have a certain prune status.
void expand_small_subtaxonomies (Taxonomy &taxonomy, size_t min_subtaxonomy_size)
    Expand the leaves of a pruned Taxonomy if their sub-taxonomies are smaller than the given threshold.
template<class UnaryPredicate >
Taxon * find_taxon (Taxonomy &tax, UnaryPredicate p)
    Alias for find_taxon(..., DepthFirstSearch{})
template<class SearchStrategy , class UnaryPredicate >
Taxon * find_taxon (Taxonomy &tax, UnaryPredicate p, SearchStrategy strat)
    Find a Taxon based on a given predicate by recursively searching the Taxonomy according to a search strategy.
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p)
    Alias for find_taxon(..., DepthFirstSearch{})
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p, BreadthFirstSearch)
    Find a Taxon based on a given predicate by recursively searching the Taxonomy in a breadth first manner.
template<class UnaryPredicate >
Taxon const * find_taxon (Taxonomy const &tax, UnaryPredicate p, DepthFirstSearch)
    Find a Taxon based on a given predicate by recursively searching the Taxonomy in a depth first manner.
Taxon * find_taxon_by_id (Taxonomy &tax, std::string const &id)
    Alias for find_taxon_by_id(..., DepthFirstSearch{}).
template<class SearchStrategy >
Taxon * find_taxon_by_id (Taxonomy &tax, std::string const &id, SearchStrategy strat)
    Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.
Taxon const * find_taxon_by_id (Taxonomy const &tax, std::string const &id)
    Alias for find_taxon_by_id(..., DepthFirstSearch{}).
template<class SearchStrategy >
Taxon const * find_taxon_by_id (Taxonomy const &tax, std::string const &id, SearchStrategy strat)
    Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.
Taxon * find_taxon_by_name (Taxonomy &tax, std::string const &name)
    Alias for find_taxon_by_name(..., DepthFirstSearch{}).
template<class SearchStrategy >
Taxon * find_taxon_by_name (Taxonomy &tax, std::string const &name, SearchStrategy strat)
    Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.
Taxon const * find_taxon_by_name (Taxonomy const &tax, std::string const &name)
    Alias for find_taxon_by_name(..., DepthFirstSearch{}).
template<class SearchStrategy >
Taxon const * find_taxon_by_name (Taxonomy const &tax, std::string const &name, SearchStrategy strat)
    Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.
Taxon * find_taxon_by_taxopath (Taxonomy &tax, Taxopath const &taxopath)
    Find a Taxon in a Taxonomy, given its Taxopath.
Taxon const * find_taxon_by_taxopath (Taxonomy const &tax, Taxopath const &taxopath)
    Find a Taxon in a Taxonomy, given its Taxopath.
bool has_unique_ids (Taxonomy const &tax)
    Return true iff all IDs of the Taxa in the Taxonomy are unique.
void levelorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in levelorder.
void levelorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in levelorder.
std::ostream & operator<< (std::ostream &out, Taxonomy const &tax)
    Print the contents of a Taxonomy, i.e., all nested taxa, up to a limit of 10.
void postorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in postorder.
void postorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in postorder.
template<typename TaxonomyType >
utils::Range< IteratorPreorder< Taxonomy, Taxon > > preorder (TaxonomyType &taxonomy)
template<typename TaxonomyType >
utils::Range< IteratorPreorder< Taxonomy const, Taxon const > > preorder (TaxonomyType const &taxonomy)
void preorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in preorder.
void preorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
    Apply a function to all taxa of the Taxonomy, traversing it in preorder.
std::string print_pruned_taxonomy (Taxonomy const &taxonomy)
    Print a Taxonomy, highlighting those Taxa that are the pruning border, i.e., where we cut off the sub-taxa, and print their entropies next to them.
void prune_by_entropy (Taxonomy &taxonomy, size_t target_taxonomy_size, PruneByEntropySettings settings={})
    Prune a Taxonomy so that the result (approximately) contains a desired number of "leaf" Taxa, using the entropy of the Taxa as an indicator of where to prune.
std::string rank_from_abbreviation (char r)
    Get the taxonomic rank name given its abbreviation.
std::string rank_to_abbreviation (std::string const &rank)
    Get the abbreviation of a taxonomic rank name.
Taxonomy read_ncbi_taxonomy (std::string const &node_file, std::string const &name_file)
void remove_pruned_taxonomy_children (Taxonomy &taxonomy)
    Remove the children of all Taxa that are pruned, i.e., that have prune status == kOutside.
void remove_taxa_at_level (Taxonomy &tax, size_t level)
    Remove all Taxa at a given level of depth in the Taxonomy hierarchy, and all their children.
template<class TaxonDataType >
void reset_taxonomy_data (Taxonomy &taxonomy, bool allow_overwrite=true)
    (Re-)set all Taxon data of a Taxonomy to a specified data type.
std::pair< std::string, std::string > resolve_rank_abbreviation (std::string const &entry)
    Resolve a combined rank and name entry of the form "k_Bacteria" into the full rank and the name, i.e., "Kingdom" and "Bacteria".
void sort_by_name (Taxonomy &tax, bool recursive=true, bool case_sensitive=false)
    Sort the Taxa of a Taxonomy by their name.
void swap (Taxon &lhs, Taxon &rhs)
void swap (Taxonomy &lhs, Taxonomy &rhs)
    Swapperator for Taxonomy.
size_t taxa_count_at_level (Taxonomy const &tax, size_t level)
    Count the number of Taxa at a certain level of depth in the Taxonomy.
std::vector< size_t > taxa_count_levels (Taxonomy const &tax)
    Count the number of Taxa at each level of depth in the Taxonomy.
size_t taxa_count_lowest_levels (Taxonomy const &tax)
    Return the number of lowest level Taxa (i.e., taxa without sub-taxa) in the Taxonomy.
std::unordered_map< std::string, size_t > taxa_count_ranks (Taxonomy const &tax, bool case_sensitive=false)
    Count the number of Taxa in a Taxonomy per rank.
size_t taxa_count_with_rank (Taxonomy const &tax, std::string const &rank, bool case_sensitive=false)
    Count the number of Taxa in a Taxonomy that have a certain rank assigned to them.
size_t taxon_level (Taxon const &taxon)
    Return the level of depth of a given Taxon.
template<class TaxonDataType >
bool taxonomy_data_is (Taxonomy const &taxonomy)
    Check whether the data of a Taxonomy are exactly of the specified data type.
template<class TaxonDataType >
bool taxonomy_data_is_derived_from (Taxonomy const &taxonomy)
    Check whether the data of a Taxonomy are derived from the specified data type.
tree::Tree taxonomy_to_tree (std::unordered_map< std::string, Taxopath > const &taxon_map, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1)
    Turn a list of Taxa into a (possibly multifurcating) Tree.
tree::Tree taxonomy_to_tree (Taxonomy const &taxonomy, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1)
    Turn a Taxonomy into a (possibly multifurcating) Tree.
tree::Tree taxonomy_to_tree (Taxonomy const &taxonomy, std::unordered_map< std::string, Taxopath > const &extra_taxa, bool keep_singleton_inner_nodes=false, bool keep_inner_node_names=false, int max_level=-1, bool add_extra_taxa_parents=true)
    Turn a Taxonomy into a (possibly multifurcating) Tree, and allow adding extra tips to it.
size_t total_taxa_count (Taxonomy const &tax)
    Return the total number of taxa contained in the Taxonomy, i.e., the number of (non-unique) names of all children (recursively).
bool validate (Taxonomy const &taxonomy, bool stop_at_first_error=false)
    Validate the internal data structures of a Taxonomy and its child Taxa.
bool validate_pruned_taxonomy (Taxonomy const &taxonomy)
    Validate that the pruning status of a Taxonomy is valid.
Typedefs
using BFS = BreadthFirstSearch
    Alias for BreadthFirstSearch.
using DFS = DepthFirstSearch
    Alias for DepthFirstSearch.
using NcbiNameLookup = std::unordered_map< std::string, NcbiName >
using NcbiNodeLookup = std::unordered_map< std::string, NcbiNode >
Variables
static const std::unordered_map< char, std::string > rank_abbreviations
    Local helper data that stores the abbreviations and names of common taxonomic ranks.
Add a Taxon to a Taxonomy, using the taxonomic elements of a Taxopath.
For example, given a Taxopath like
[ "Animalia", "Vertebrata", "Mammalia", "Carnivora" ]
this function adds the following hierarchy to the Taxonomy:
Animalia
    Vertebrata
        Mammalia
            Carnivora
For any existing Taxa, nothing happens. If any (parent) Taxon in the hierarchy does not exist, it is created by default.
taxonomy | Taxonomy to add the Taxon to.
taxopath | A Taxopath object from which the Taxon and its parents are taken.
expect_parents | Optional, defaults to false. If set to true, the function expects all super-taxa of the added Taxon to exist, that is, all taxa except for the last one in the hierarchy. If this expectation is not met, that is, if not all super-taxa exist, an std::runtime_error exception is thrown. If left at the default (false), all necessary super-taxa are created if they do not exist yet.
Definition at line 76 of file taxopath.cpp.
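A minimal sketch of the behaviour described above, assuming a hypothetical Node type (a name plus sub-taxa) in place of the genesis Taxon/Taxonomy classes, and a hypothetical free function add_path() rather than the library's add_from_taxopath(). It reuses existing sub-taxa by name, and either creates missing super-taxa or throws std::runtime_error when expect_parents is set. Rejecting an empty path is a choice of this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon/Taxonomy: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Return the direct sub-taxon with the given name, or nullptr if there is none.
Node* find_child(Node& parent, std::string const& name) {
    for (auto& child : parent.children) {
        if (child.name == name) {
            return &child;
        }
    }
    return nullptr;
}

// Walk the path from the top level down, reusing existing taxa and creating
// missing ones. With expect_parents, only the last element may be missing.
Node& add_path(Node& taxonomy, std::vector<std::string> const& path, bool expect_parents = false) {
    if (path.empty()) {
        throw std::invalid_argument("cannot add an empty taxopath");  // sketch-only choice
    }
    Node* current = &taxonomy;
    for (std::size_t i = 0; i < path.size(); ++i) {
        Node* next = find_child(*current, path[i]);
        if (next == nullptr) {
            bool const is_last = (i + 1 == path.size());
            if (expect_parents && !is_last) {
                throw std::runtime_error("super-taxon does not exist: " + path[i]);
            }
            current->children.push_back(Node{path[i], {}});
            next = &current->children.back();
        }
        current = next;
    }
    return *current;
}
```

Adding the same path a second time leaves the hierarchy unchanged, matching the rule that nothing happens for existing Taxa.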
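static void add_subtaxonomy_ (Taxonomy const &taxonomy, bool keep_singleton_inner_nodes, bool keep_inner_node_names, int max_level, int parent_level, tree::NewickBroker &broker)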
Recursive local helper function to add taxa to the tree broker.
Definition at line 60 of file taxonomy/functions/tree.cpp.
NcbiNameLookup convert_ncbi_name_table (utils::CsvReader::Table const &name_table, size_t tax_id_pos, size_t name_pos, size_t name_class_pos, std::string const &name_class_filter)
NcbiNodeLookup convert_ncbi_node_table (utils::CsvReader::Table const &node_table, size_t tax_id_pos, size_t parent_tax_id_pos, size_t rank_pos)
Taxonomy convert_ncbi_tables (NcbiNodeLookup const &nodes, NcbiNameLookup const &names)
size_t count_taxa_with_prune_status (Taxonomy const &taxonomy, EntropyTaxonData::PruneStatus status)
Return the number of Taxa that have a certain prune status.
Definition at line 449 of file taxonomy/functions/entropy.cpp.
void expand_small_subtaxonomies (Taxonomy &taxonomy, size_t min_subtaxonomy_size)
Expand the leaves of a pruned Taxonomy if their sub-taxonomies are smaller than the given threshold.
This function takes a Taxonomy with EntropyTaxonData on its Taxa and looks for taxa with status kBorder that have fewer leaves than the threshold. Each such sub-taxonomy is expanded; that is, its inner taxa get status kInside and its leaf taxa get status kBorder.
Definition at line 412 of file taxonomy/functions/entropy.cpp.
Alias for find_taxon(..., DepthFirstSearch{})
Definition at line 88 of file taxonomy/functions/taxonomy.hpp.
Find a Taxon based on a given predicate by recursively searching the Taxonomy according to a search strategy.
Definition at line 141 of file taxonomy/functions/taxonomy.hpp.
Alias for find_taxon(..., DepthFirstSearch{})
Definition at line 79 of file taxonomy/functions/taxonomy.hpp.
Taxon const* genesis::taxonomy::find_taxon (Taxonomy const &tax, UnaryPredicate p, BreadthFirstSearch)
Find a Taxon based on a given predicate by recursively searching the Taxonomy in a breadth first manner.
Definition at line 115 of file taxonomy/functions/taxonomy.hpp.
Taxon const* genesis::taxonomy::find_taxon (Taxonomy const &tax, UnaryPredicate p, DepthFirstSearch)
Find a Taxon based on a given predicate by recursively searching the Taxonomy in a depth first manner.
Definition at line 97 of file taxonomy/functions/taxonomy.hpp.
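The difference between the two search strategies matters when several Taxa match the predicate. The following sketch uses a hypothetical Node type and hypothetical find_node() overloads that dispatch on tag types named after the documented DepthFirstSearch and BreadthFirstSearch. It shows that depth-first search returns the first match along the first branch, while breadth-first search returns the shallowest match. It is an illustration, not the genesis implementation.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Tag types selecting the search order (names mirror the tags documented above).
struct DepthFirstSearch {};
struct BreadthFirstSearch {};

// Depth-first: test a taxon, then search its whole sub-taxonomy before its siblings.
template <class UnaryPredicate>
Node const* find_node(Node const& taxonomy, UnaryPredicate p, DepthFirstSearch) {
    for (auto const& child : taxonomy.children) {
        if (p(child)) {
            return &child;
        }
        if (auto const* found = find_node(child, p, DepthFirstSearch{})) {
            return found;
        }
    }
    return nullptr;
}

// Breadth-first: test all taxa of one level before any taxon of the next level.
template <class UnaryPredicate>
Node const* find_node(Node const& taxonomy, UnaryPredicate p, BreadthFirstSearch) {
    std::queue<Node const*> pending;
    for (auto const& child : taxonomy.children) {
        pending.push(&child);
    }
    while (!pending.empty()) {
        Node const* current = pending.front();
        pending.pop();
        if (p(*current)) {
            return current;
        }
        for (auto const& child : current->children) {
            pending.push(&child);
        }
    }
    return nullptr;
}
```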
Alias for find_taxon_by_id(..., DepthFirstSearch{}).
Definition at line 69 of file functions/taxonomy.cpp.
Taxon* genesis::taxonomy::find_taxon_by_id (Taxonomy &tax, std::string const &id, SearchStrategy strat)
Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.
Definition at line 205 of file taxonomy/functions/taxonomy.hpp.
Alias for find_taxon_by_id(..., DepthFirstSearch{}).
Definition at line 64 of file functions/taxonomy.cpp.
Taxon const* genesis::taxonomy::find_taxon_by_id (Taxonomy const &tax, std::string const &id, SearchStrategy strat)
Find a Taxon with a given ID by recursively searching the Taxonomy according to a search strategy.
Definition at line 194 of file taxonomy/functions/taxonomy.hpp.
Alias for find_taxon_by_name(..., DepthFirstSearch{}).
Definition at line 59 of file functions/taxonomy.cpp.
Taxon* genesis::taxonomy::find_taxon_by_name (Taxonomy &tax, std::string const &name, SearchStrategy strat)
Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.
Definition at line 183 of file taxonomy/functions/taxonomy.hpp.
Alias for find_taxon_by_name(..., DepthFirstSearch{}).
Definition at line 54 of file functions/taxonomy.cpp.
Taxon const* genesis::taxonomy::find_taxon_by_name (Taxonomy const &tax, std::string const &name, SearchStrategy strat)
Find a Taxon with a given name by recursively searching the Taxonomy according to a search strategy.
Definition at line 172 of file taxonomy/functions/taxonomy.hpp.
Find a Taxon in a Taxonomy, given its Taxopath.
Definition at line 147 of file taxopath.cpp.
Find a Taxon in a Taxonomy, given its Taxopath.
Definition at line 120 of file taxopath.cpp.
bool has_unique_ids (Taxonomy const &tax)
Return true iff all IDs of the Taxa in the Taxonomy are unique.
Definition at line 196 of file functions/taxonomy.cpp.
inline void levelorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in levelorder.
The given Taxonomy is traversed in levelorder (i.e., breadth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the non-const version of the function.
Definition at line 58 of file taxonomy/iterator/levelorder.hpp.
inline void levelorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in levelorder.
The given Taxonomy is traversed in levelorder (i.e., breadth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the const version of the function.
Definition at line 92 of file taxonomy/iterator/levelorder.hpp.
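Levelorder traversal is a queue-based breadth-first walk. The sketch below assumes a hypothetical Node type and a hypothetical levelorder_visit() helper in place of the genesis classes; it mirrors only the documented include_inner_taxa behaviour, not the library code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Visit all taxa below the taxonomy level by level. If include_inner_taxa is
// false, only taxa without sub-taxa are passed to fn (inner taxa are still
// traversed, just not visited).
void levelorder_visit(Node const& taxonomy, std::function<void(Node const&)> const& fn,
                      bool include_inner_taxa = true) {
    std::queue<Node const*> pending;
    for (auto const& child : taxonomy.children) {
        pending.push(&child);
    }
    while (!pending.empty()) {
        Node const* current = pending.front();
        pending.pop();
        if (include_inner_taxa || current->children.empty()) {
            fn(*current);
        }
        for (auto const& child : current->children) {
            pending.push(&child);
        }
    }
}
```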
std::ostream & operator<< (std::ostream &out, Taxonomy const &tax)
Print the contents of a Taxonomy, i.e., all nested taxa, up to a limit of 10.
This simple output function prints the first 10 nested Taxa of a Taxonomy. If you need all Taxa and more control over what you want to print, see the PrinterNested class.
Definition at line 260 of file functions/taxonomy.cpp.
inline void postorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in postorder.
The given Taxonomy is traversed in postorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the non-const version of the function.
Definition at line 57 of file taxonomy/iterator/postorder.hpp.
inline void postorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in postorder.
The given Taxonomy is traversed in postorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the const version of the function.
Definition at line 81 of file taxonomy/iterator/postorder.hpp.
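In postorder, a Taxon is visited only after all of its sub-taxa, which makes it a natural fit for bottom-up computations. A recursive sketch, again using a hypothetical Node type and a hypothetical postorder_visit() helper instead of the library API:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Visit each taxon after its whole sub-taxonomy has been visited.
void postorder_visit(Node const& taxonomy, std::function<void(Node const&)> const& fn,
                     bool include_inner_taxa = true) {
    for (auto const& child : taxonomy.children) {
        // First finish the sub-taxonomy of this child ...
        postorder_visit(child, fn, include_inner_taxa);
        // ... then visit the child itself (leaves only, if requested).
        if (include_inner_taxa || child.children.empty()) {
            fn(child);
        }
    }
}
```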
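template<typename TaxonomyType >
inline utils::Range< IteratorPreorder< Taxonomy, Taxon > > preorder (TaxonomyType &taxonomy)
Definition at line 216 of file taxonomy/iterator/preorder.hpp.
template<typename TaxonomyType >
inline utils::Range< IteratorPreorder< Taxonomy const, Taxon const > > preorder (TaxonomyType const &taxonomy)
Definition at line 206 of file taxonomy/iterator/preorder.hpp.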
inline void preorder_for_each (Taxonomy &tax, std::function< void(Taxon &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in preorder.
The given Taxonomy is traversed in preorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the non-const version of the function.
Definition at line 60 of file taxonomy/iterator/preorder.hpp.
inline void preorder_for_each (Taxonomy const &tax, std::function< void(Taxon const &)> fn, bool include_inner_taxa=true)
Apply a function to all taxa of the Taxonomy, traversing it in preorder.
The given Taxonomy is traversed in preorder (i.e., a variant of depth-first search). If include_inner_taxa is set to true (default), the provided functional is called for all Taxa. Otherwise, the functional is only called for the lowest-level taxa, that is, for each Taxon that does not have sub-taxa.
This is the const version of the function.
Definition at line 84 of file taxonomy/iterator/preorder.hpp.
std::string print_pruned_taxonomy (Taxonomy const &taxonomy)
Print a Taxonomy, highlighting those Taxa that are the pruning border, i.e., where we cut off the sub-taxa, and print their entropies next to them.
Definition at line 472 of file taxonomy/functions/entropy.cpp.
void prune_by_entropy (Taxonomy &taxonomy, size_t target_taxonomy_size, PruneByEntropySettings settings = {})
Prune a Taxonomy so that the result (approximately) contains a desired number of "leaf" Taxa, using the entropy of the Taxa as indicator where to prune.
The function takes a Taxonomy with data type EntropyTaxonData and a target size which indicates the desired number of "leaf" Taxa after pruning the Taxonomy. In the pruned Taxonomy, some Taxa are considered as belonging to the Taxonomy (have status EntropyTaxonData::PruneStatus::kInside or EntropyTaxonData::PruneStatus::kBorder), while others (deeper in the Taxonomy) are excluded (have status EntropyTaxonData::PruneStatus::kOutside). The aim is that the number of border taxa (or "leaves") of the included Taxa is as close as possible to the target size.
Note that this function only sets the status of the Taxa and does not remove any Taxa. All Taxa with status EntropyTaxonData::PruneStatus::kOutside are then considered to be pruned from the taxonomy.
Example: The Taxonomy
Tax_1
    Tax_2
        Tax_3
        Tax_4
    Tax_5
        Tax_6
Tax_7
    Tax_8
    Tax_9
contains 5 "leaf" taxa, i.e., Tax_3, Tax_4, Tax_6, Tax_8 and Tax_9. If we want to prune it with a target size of 3, we might end up with either
Tax_1
    Tax_2
    Tax_5
Tax_7
or
Tax_1
Tax_7
    Tax_8
    Tax_9
as both contain 3 "leaves": Tax_2, Tax_5 and Tax_7 in the former case, and Tax_1, Tax_8 and Tax_9 in the latter. Which of those two is used depends on the entropies of the Taxa.
In the former case, Tax_1 is considered inside, Tax_2, Tax_5 and Tax_7 are border, and all other taxa are outside of the pruned Taxonomy. In the latter case, Tax_7 is inside, Tax_1, Tax_8 and Tax_9 are border, and again all others are outside.
It is not always possible to prune a Taxonomy in a way that exactly hits the target size. In that case, the function ends at the number of border Taxa that is closest to the target size (either below or above it).
In order to decide which Taxa to set to inside (i.e., not include as leaves, but further resolve into their children), we use the entropies of the Taxa: We choose to split up at a current border Taxon with the highest entropy value, as long as this brings us closer to the target size.
This means that the above case where we had two possible ways of splitting should be rare, as the entropies will rarely be identical with real world data sets. If this happens nonetheless, it is random which of the Taxa with equal entropy will be used.
In order to control further settings, see PruneByEntropySettings.
Definition at line 59 of file taxonomy/functions/entropy.cpp.
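The greedy selection rule described above can be sketched as follows. This assumes a hypothetical Node type carrying an entropy value and a status, and a hypothetical prune_sketch() function. It does not model PruneByEntropySettings, and it stops as soon as splitting the highest-entropy border taxon would no longer move the count closer to the target, which may differ from the library's exact stopping rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-ins for EntropyTaxonData::PruneStatus and an annotated Taxon.
enum class PruneStatus { kInside, kBorder, kOutside };

struct Node {
    std::string name;
    double entropy;
    std::vector<Node> children;
    PruneStatus status = PruneStatus::kOutside;
};

// Mark every taxon below `node` as outside.
void reset_status(Node& node) {
    for (auto& child : node.children) {
        child.status = PruneStatus::kOutside;
        reset_status(child);
    }
}

// Collect the current border taxa that have sub-taxa, i.e., that could still be split.
void collect_splittable(Node& node, std::vector<Node*>& out) {
    for (auto& child : node.children) {
        if (child.status == PruneStatus::kBorder && !child.children.empty()) {
            out.push_back(&child);
        } else if (child.status == PruneStatus::kInside) {
            collect_splittable(child, out);
        }
    }
}

// Start with the top-level taxa as border, then repeatedly split the border taxon
// with the highest entropy while doing so moves the border count closer to target.
// Returns the final number of border taxa.
std::size_t prune_sketch(Node& taxonomy, std::size_t target) {
    reset_status(taxonomy);
    for (auto& child : taxonomy.children) {
        child.status = PruneStatus::kBorder;
    }
    long border = static_cast<long>(taxonomy.children.size());
    long const goal = static_cast<long>(target);
    while (true) {
        std::vector<Node*> candidates;
        collect_splittable(taxonomy, candidates);
        if (candidates.empty()) {
            break;
        }
        Node* best = candidates.front();
        for (Node* c : candidates) {
            if (c->entropy > best->entropy) {
                best = c;
            }
        }
        long const after = border - 1 + static_cast<long>(best->children.size());
        if (std::labs(after - goal) >= std::labs(border - goal)) {
            break;  // splitting would not bring us closer to the target
        }
        best->status = PruneStatus::kInside;
        for (auto& child : best->children) {
            child.status = PruneStatus::kBorder;
        }
        border = after;
    }
    return static_cast<std::size_t>(border);
}
```

With the example hierarchy and Tax_1 having the highest entropy, this sketch ends in the former configuration; giving Tax_7 a higher entropy than Tax_1 yields the latter.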
std::string rank_from_abbreviation (char r)
Get the taxonomic rank name given its abbreviation.
The following common taxonomic ranks are used:
D  Domain
K  Kingdom
P  Phylum
C  Class
O  Order
F  Family
G  Genus
S  Species
If any of these abbreviations is given (case-independent), the full rank name is returned. For all other input, an empty string is returned.
std::string rank_to_abbreviation (std::string const &rank)
Get the abbreviation of a taxonomic rank name.
This function returns the abbreviation for a given common taxonomic rank name, case-independently. See rank_from_abbreviation() for a list of valid rank names. If the given rank name is invalid, an empty string is returned.
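A table-based sketch of both lookups, assuming hypothetical helper names rank_from_abbrev() and rank_to_abbrev() and a local map mirroring the ranks listed above. This sketch returns lowercase abbreviations; the case of the abbreviation returned by rank_to_abbreviation() is not specified here.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical lookup table holding the common ranks listed above (lowercase keys).
static const std::unordered_map<char, std::string> kRankNames = {
    {'d', "Domain"}, {'k', "Kingdom"}, {'p', "Phylum"}, {'c', "Class"},
    {'o', "Order"},  {'f', "Family"},  {'g', "Genus"},  {'s', "Species"}
};

std::string to_lower(std::string s) {
    for (auto& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Abbreviation to full rank name, case-independent; empty string if unknown.
std::string rank_from_abbrev(char r) {
    auto const it = kRankNames.find(static_cast<char>(std::tolower(static_cast<unsigned char>(r))));
    return it == kRankNames.end() ? std::string{} : it->second;
}

// Full rank name to abbreviation, case-independent; empty string if unknown.
std::string rank_to_abbrev(std::string const& rank) {
    std::string const wanted = to_lower(rank);
    for (auto const& entry : kRankNames) {
        if (to_lower(entry.second) == wanted) {
            return std::string(1, entry.first);
        }
    }
    return std::string{};
}
```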
Taxonomy read_ncbi_taxonomy (std::string const &node_file, std::string const &name_file)
void remove_pruned_taxonomy_children (Taxonomy &taxonomy)
Remove the children of all Taxa that are pruned, i.e., that have prune status == kOutside.
The function does not validate the status beforehand. Use validate_pruned_taxonomy() if you are unsure whether the status is correct for all Taxa.
Definition at line 463 of file taxonomy/functions/entropy.cpp.
void remove_taxa_at_level (Taxonomy &tax, size_t level)
Remove all Taxa at a given level of depth in the Taxonomy hierarchy, and all their children.
That is, providing level = 0 has the same effect as calling clear_children() on the given Taxonomy; level = 1 has this effect for the children of the given Taxonomy; and so on.
See taxon_level() for more information on the level.
Definition at line 244 of file functions/taxonomy.cpp.
void genesis::taxonomy::reset_taxonomy_data (Taxonomy &taxonomy, bool allow_overwrite = true)
(Re-)set all Taxon data of a Taxonomy to a specified data type.
The data is created empty, using BaseTaxonData::create(). If the optional parameter allow_overwrite is set to false (instead of the default true), the function throws an exception if a Taxon already has data assigned to it.
Definition at line 99 of file taxonomy/functions/operators.hpp.
std::pair< std::string, std::string > resolve_rank_abbreviation (std::string const &entry)
Resolve a combined rank and name entry of the form "k_Bacteria" into the full rank and the name, i.e., "Kingdom" and "Bacteria".
The function returns a pair of { "rank", "name" }.
The expected format of the input string is "x_abc", where "x" is a rank name abbreviation and "abc" is a taxon name. If the string is in this format, it is split and the rank name abbreviation is resolved. If this abbreviation is valid, the rank (first) and the name (second) are returned. See rank_from_abbreviation() for the list of valid rank name abbreviations. The number of underscores is irrelevant, that is, C___Mammalia also works and returns { "Class", "Mammalia" }.
If any of these conditions is not met (either the string does not start with "x_", or the rank name abbreviation is invalid), the rank is left empty, and the whole given string is used as the name. Thus, this function also works on normal taxon names.
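A sketch of the parsing rules above, using hypothetical helpers rank_name() and resolve_entry() rather than the library functions. How the library treats an entry that consists only of an abbreviation and underscores (e.g., "k_") is not described above; this sketch simply returns an empty name in that case.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical helper: full rank for a one-letter abbreviation, or "" if unknown.
std::string rank_name(char r) {
    switch (std::tolower(static_cast<unsigned char>(r))) {
        case 'd': return "Domain";
        case 'k': return "Kingdom";
        case 'p': return "Phylum";
        case 'c': return "Class";
        case 'o': return "Order";
        case 'f': return "Family";
        case 'g': return "Genus";
        case 's': return "Species";
        default:  return "";
    }
}

// Split "x_abc" (any number of underscores) into { rank, name }; otherwise
// return an empty rank and the whole entry as the name.
std::pair<std::string, std::string> resolve_entry(std::string const& entry) {
    if (entry.size() < 2 || entry[1] != '_') {
        return {"", entry};
    }
    std::string const rank = rank_name(entry[0]);
    if (rank.empty()) {
        return {"", entry};
    }
    // Skip all underscores following the abbreviation.
    auto const start = entry.find_first_not_of('_', 1);
    return {rank, start == std::string::npos ? std::string{} : entry.substr(start)};
}
```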
void sort_by_name (Taxonomy &tax, bool recursive = true, bool case_sensitive = false)
Sort the Taxa of a Taxonomy by their name.
After calling this function, the Taxa are stored in the order given by their names. This is useful, e.g., for output.
tax | Taxonomy to be sorted.
recursive | Optional, default is true. If set to true, the sub-taxa are also sorted. If set to false, only the immediate children of the given Taxonomy are sorted.
case_sensitive | Optional, default is false. Determines whether the name string comparison is done in a case sensitive manner or not.
Definition at line 217 of file functions/taxonomy.cpp.
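A sketch of recursive, optionally case-insensitive sorting, assuming a hypothetical Node type and a hypothetical sort_names() function. Comparing lowercased characters via std::tolower is an assumption about how a case-insensitive comparison might be done, not a statement about the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Compare two names, optionally ignoring case.
bool name_less(std::string const& a, std::string const& b, bool case_sensitive) {
    if (case_sensitive) {
        return a < b;
    }
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return std::tolower(static_cast<unsigned char>(x)) <
                   std::tolower(static_cast<unsigned char>(y));
        });
}

// Sort the direct sub-taxa by name, and optionally all deeper levels as well.
void sort_names(Node& node, bool recursive = true, bool case_sensitive = false) {
    std::sort(node.children.begin(), node.children.end(),
              [case_sensitive](Node const& a, Node const& b) {
                  return name_less(a.name, b.name, case_sensitive);
              });
    if (recursive) {
        for (auto& child : node.children) {
            sort_names(child, true, case_sensitive);
        }
    }
}
```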
Swapperator for Taxonomy.
Definition at line 74 of file taxonomy.cpp.
size_t taxa_count_at_level (Taxonomy const &tax, size_t level)
Count the number of Taxa at a certain level of depth in the Taxonomy.
The function returns how many Taxa there are in the Taxonomy at a certain level, excluding their respective sub-taxa. The first/top level has depth 0.
See taxa_count_levels() for a version of this function that returns these values for all levels of depth.
Definition at line 111 of file functions/taxonomy.cpp.
std::vector< size_t > taxa_count_levels (Taxonomy const &tax)
Count the number of Taxa at each level of depth in the Taxonomy.
The function returns how many Taxa there are in the Taxonomy at each level, excluding their respective sub-taxa. The first/top level has depth 0; its count is the first element in the returned vector, and so on.
This function returns the values of taxa_count_at_level( Taxonomy const& tax, size_t level ) for all levels of depth.
Definition at line 125 of file functions/taxonomy.cpp.
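A recursive sketch of both counting functions, assuming a hypothetical Node type and hypothetical helper names taxa_per_level() and taxa_at_level(). The outer Node plays the role of the Taxonomy, so its direct children are at level 0. Returning 0 for a level deeper than the hierarchy is a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Add the number of taxa at each depth below `node` to `counts` (index = level).
void count_levels(Node const& node, std::size_t level, std::vector<std::size_t>& counts) {
    for (auto const& child : node.children) {
        if (counts.size() <= level) {
            counts.resize(level + 1, 0);
        }
        ++counts[level];
        count_levels(child, level + 1, counts);
    }
}

std::vector<std::size_t> taxa_per_level(Node const& taxonomy) {
    std::vector<std::size_t> counts;
    count_levels(taxonomy, 0, counts);
    return counts;
}

std::size_t taxa_at_level(Node const& taxonomy, std::size_t level) {
    auto const counts = taxa_per_level(taxonomy);
    return level < counts.size() ? counts[level] : 0;
}
```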
size_t taxa_count_lowest_levels (Taxonomy const &tax)
Return the number of lowest level Taxa (i.e., taxa without sub-taxa) in the Taxonomy.
The function counts the number of taxa without any sub-taxa, that is, the "leaves" of the Taxonomy.
Example: The Taxonomy
Tax_1 Tax_2 Tax_3 Tax_4 Tax_5 Tax_6 Tax_7
contains 3 such taxa, i.e., Tax_3, Tax_5 and Tax_7.
Definition at line 98 of file functions/taxonomy.cpp.
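A short recursive sketch, with a hypothetical Node type and a hypothetical count_lowest_level_taxa() helper: a Taxon without sub-taxa counts as one, otherwise the counts of its sub-taxa are summed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Taxon: a name plus nested sub-taxa.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Count the taxa without sub-taxa (the "leaves") below the given node.
std::size_t count_lowest_level_taxa(Node const& node) {
    std::size_t count = 0;
    for (auto const& child : node.children) {
        count += child.children.empty() ? 1 : count_lowest_level_taxa(child);
    }
    return count;
}
```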
std::unordered_map< std::string, size_t > taxa_count_ranks (Taxonomy const &tax, bool case_sensitive = false)
Count the number of Taxa in a Taxonomy per rank.
The function gives a list of all ranks found in the Taxonomy, with a count of how many Taxa have this rank.
It is similar to taxa_count_with_rank(), but gives the result for all ranks.
If the optional parameter case_sensitive is set to true, all ranks are treated case sensitively, that is, ranks with different case produce different entries. If left at the default false, they are converted to lower case first, so that they are all treated case insensitively.
Definition at line 173 of file functions/taxonomy.cpp.
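A sketch of per-rank counting, assuming a hypothetical Node type with a rank field and a hypothetical taxa_per_rank() function. Lowercasing via std::tolower stands in for whatever case folding the library uses, and this sketch also counts taxa with an empty rank under the key "".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a Taxon with a rank.
struct Node {
    std::string name;
    std::string rank;
    std::vector<Node> children;
};

std::string lowercase(std::string s) {
    for (auto& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Recursively add one count per taxon to the entry of its (possibly lowercased) rank.
void count_ranks(Node const& node, bool case_sensitive,
                 std::unordered_map<std::string, std::size_t>& counts) {
    for (auto const& child : node.children) {
        ++counts[case_sensitive ? child.rank : lowercase(child.rank)];
        count_ranks(child, case_sensitive, counts);
    }
}

std::unordered_map<std::string, std::size_t> taxa_per_rank(Node const& taxonomy,
                                                           bool case_sensitive = false) {
    std::unordered_map<std::string, std::size_t> counts;
    count_ranks(taxonomy, case_sensitive, counts);
    return counts;
}
```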
size_t taxa_count_with_rank (Taxonomy const &tax, std::string const &rank, bool case_sensitive = false)
Count the number of Taxa in a Taxonomy that have a certain rank assigned to them.
The function recursively iterates all sub-taxa of the Taxonomy and counts how many of the Taxa have the given rank assigned (case sensitive or not).
See taxa_count_ranks() for a version of this function that returns this number for all ranks in the Taxonomy.
Definition at line 148 of file functions/taxonomy.cpp.
size_t taxon_level (Taxon const &taxon)
Return the level of depth of a given Taxon.
bool genesis::taxonomy::taxonomy_data_is (Taxonomy const &taxonomy)
Check whether the data of a Taxonomy are exactly of the specified data type.
This function returns true iff all Taxa have data of the given type, using typeid() for this matching.
Definition at line 54 of file taxonomy/functions/operators.hpp.
bool genesis::taxonomy::taxonomy_data_is_derived_from (Taxonomy const &taxonomy)
Check whether the data of a Taxonomy are derived from the specified data type.
This function returns true iff all Taxa have data whose types are derived from the specified type. It uses dynamic_cast() for this.
Definition at line 78 of file taxonomy/functions/operators.hpp.
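The difference between the exact-type check (typeid) and the derived-type check (dynamic_cast) can be illustrated with plain C++ RTTI. The sketch below uses hypothetical BaseData and EntropyData classes standing in for BaseTaxonData and EntropyTaxonData, and a flat vector of data pointers instead of a Taxonomy. Treating a missing (null) data pointer as a mismatch is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>
#include <vector>

// Hypothetical polymorphic data hierarchy.
struct BaseData {
    virtual ~BaseData() = default;
};
struct EntropyData : BaseData {
    double entropy = 0.0;
};

using DataList = std::vector<std::unique_ptr<BaseData>>;

// Exact match: the dynamic type of every element must be T itself.
template <class T>
bool data_is(DataList const& data) {
    for (auto const& d : data) {
        if (!d || typeid(*d) != typeid(T)) {
            return false;
        }
    }
    return true;
}

// Derived match: every element must be a T or a subclass of T.
template <class T>
bool data_is_derived_from(DataList const& data) {
    for (auto const& d : data) {
        if (dynamic_cast<T const*>(d.get()) == nullptr) {
            return false;
        }
    }
    return true;
}
```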
tree::Tree taxonomy_to_tree (std::unordered_map< std::string, Taxopath > const &taxon_map, bool keep_singleton_inner_nodes = false, bool keep_inner_node_names = false, int max_level = -1)
Turn a list of Taxa into a (possibly multifurcating) Tree.
The function is a simplified version of taxonomy_to_tree() that does not take a given Taxonomy, but instead just a list of (tip) Taxa and their Taxopaths. All keys in taxon_map are added as new tips to a Taxonomy that is created from the Taxopaths (mapped values) of taxon_map.
This is useful, for example, if one has a set of sequences with taxonomic assignments and wants to build a taxonomic constraint for inferring a tree from these sequences: Given the sequence names as keys and their taxonomic paths as mapped values, the function creates a (possibly multifurcating) tree that can be used as such a constraint.
Definition at line 180 of file taxonomy/functions/tree.cpp.
tree::Tree taxonomy_to_tree (
    Taxonomy const & taxonomy,
    bool keep_singleton_inner_nodes = false,
    bool keep_inner_node_names = false,
    int max_level = -1
)
Turn a Taxonomy into a (possibly multifurcating) Tree.
A Taxonomy is a hierarchy that can be interpreted as a rooted tree. Using this function, such a tree is created and returned. It can be used to construct a taxonomic constraint tree for tree inference.
A taxonomic path may go down several levels with just one taxon at each level. This would create inner nodes in the tree that merely connect two other nodes, that is, nodes that do not furcate at all. Many downstream programs might have problems with such trees, so by default such nodes are collapsed. Use keep_singleton_inner_nodes to include these inner nodes in the tree, instead of immediately adding their children.
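Collapsing singleton inner nodes can be sketched as a post-order pass that replaces every node that has exactly one child by that child. `TreeNode`, `collapse_singletons()` and `newick()` are hypothetical helpers for illustration only, and are not the library's code. Note that in this simplified version, a chain that ends in a single tip collapses all the way down to that tip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree node: tips have no children.
struct TreeNode {
    std::string name;
    std::vector<TreeNode> children;
};

// Post-order: collapse the children first, then replace this node by its
// only child if it has exactly one.
TreeNode collapse_singletons(TreeNode node) {
    for (auto& child : node.children) {
        child = collapse_singletons(std::move(child));
    }
    if (node.children.size() == 1) {
        TreeNode only = std::move(node.children.front());
        return only;
    }
    return node;
}

// Write the tree in Newick format without inner node names.
std::string newick(TreeNode const& node) {
    if (node.children.empty()) {
        return node.name;
    }
    std::string out = "(";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        if (i > 0) {
            out += ",";
        }
        out += newick(node.children[i]);
    }
    return out + ")";
}
```

Consider the tree root → Bacteria → {Firmicutes → {seq1, seq2}, Proteobacteria → Gamma → {seq3, seq4}}. Collapsing shrinks its Newick string from `(((seq1,seq2),((seq3,seq4))))` to `((seq1,seq2),(seq3,seq4))`.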
Furthermore, a Taxonomy contains names at every level, while a Tree usually does not contain inner node names. Thus, inner nodes are not named by default. Use keep_inner_node_names to still set the inner taxonomic labels in the tree.
Lastly, max_level can be used to convert only the first few levels (starting at 0) of the Taxonomy into the tree, stopping after that. By default (max_level = -1), the whole Taxonomy (all levels) is turned into a Tree.
Definition at line 105 of file taxonomy/functions/tree.cpp.
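The max_level cutoff can be sketched as a recursive copy that stops descending once the cutoff is reached. In this sketch, top-level taxa are at level 0 and a negative value means all levels. `SimpleTaxon` and `truncate_levels()` are hypothetical. It is an assumption that the library keeps the taxa at exactly level max_level and drops only their children, as done here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal taxon: a name and its sub-taxa.
struct SimpleTaxon {
    std::string name;
    std::vector<SimpleTaxon> children;
};

// Copy only taxa down to `max_level` (top-level taxa are level 0);
// a negative `max_level` keeps all levels, mirroring the default of -1.
std::vector<SimpleTaxon> truncate_levels(std::vector<SimpleTaxon> const& taxa, int max_level,
                                         int level = 0) {
    std::vector<SimpleTaxon> result;
    for (auto const& taxon : taxa) {
        SimpleTaxon copy{taxon.name, {}};
        if (max_level < 0 || level < max_level) {
            copy.children = truncate_levels(taxon.children, max_level, level + 1);
        }
        result.push_back(copy);
    }
    return result;
}
```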
tree::Tree taxonomy_to_tree (
    Taxonomy const & taxonomy,
    std::unordered_map< std::string, Taxopath > const & extra_taxa,
    bool keep_singleton_inner_nodes = false,
    bool keep_inner_node_names = false,
    int max_level = -1,
    bool add_extra_taxa_parents = true
)
Turn a Taxonomy into a (possibly multifurcating) Tree, and allow adding extra tips to it.
This is similar to the general version of this function, see taxonomy_to_tree(). However, it offers an additional feature: a mapping from extra taxon names to Taxa in the given Taxonomy.
This is useful if the Taxonomy is used for a set of sequences that have taxonomic assignments. One might wish to build a tree whose tips correspond to the sequences and whose topology reflects the taxonomy of these sequences. For such a use case, this function takes the Taxonomy of the sequences together with a mapping of sequence names to Taxopaths. The output tree then contains "extra taxa" made up of the sequence names. These are added as children to the Taxonomy, and hence as tips to the tree.
The parameter add_extra_taxa_parents defaults to true, meaning that the parent taxa of the extra_taxa are added to the taxonomy if they are not already present. If set to false, the taxonomy is expected to already contain all paths found in the extra_taxa, and the function throws if this is not the case.
See also taxonomy_to_tree() for a simplified version of this function that also explains some more details of the workings.
Definition at line 145 of file taxonomy/functions/tree.cpp.
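The effect of add_extra_taxa_parents can be sketched as follows. For each extra taxon, the sketch walks down its Taxopath and creates missing parent taxa if the flag is set. Otherwise it throws. `TaxNode` and `add_extra_taxon()` are hypothetical, and so is the use of std::runtime_error. This page does not state which exception type the library throws.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using Path = std::vector<std::string>;

// Hypothetical minimal taxonomy node; children are keyed by name.
struct TaxNode {
    std::map<std::string, std::unique_ptr<TaxNode>> children;
};

// Walk down `path`, creating missing taxa if `add_parents` is set and
// throwing otherwise; then attach `tip` as a new child of the last taxon.
void add_extra_taxon(TaxNode& root, std::string const& tip, Path const& path, bool add_parents) {
    TaxNode* node = &root;
    for (auto const& element : path) {
        auto it = node->children.find(element);
        if (it == node->children.end()) {
            if (!add_parents) {
                throw std::runtime_error("Taxon not found in taxonomy: " + element);
            }
            it = node->children.emplace(element, std::make_unique<TaxNode>()).first;
        }
        node = it->second.get();
    }
    node->children.emplace(tip, std::make_unique<TaxNode>());
}
```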
size_t total_taxa_count | ( | Taxonomy const & | tax | ) |
Return the total number of taxa contained in the Taxonomy, i.e., the number of (non-unique) names of all children (recursively).
Example: The Taxonomy
Tax_1 Tax_2 Tax_3 Tax_4 Tax_3 Tax_5
contains a total of 6 taxa. The name Tax_3 appears twice and is counted twice.
Definition at line 89 of file functions/taxonomy.cpp.
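A recursive count that does not merge duplicate names can be sketched with a hypothetical `CountTaxon` struct and a hypothetical `total_count()` function. The nesting of the example above cannot be recovered from this page. However, the total of 6 holds for any nesting of those six entries, because every occurrence is counted once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal taxon: a name and its sub-taxa.
struct CountTaxon {
    std::string name;
    std::vector<CountTaxon> children;
};

// Count every taxon in `taxa` and below; duplicate names are not merged,
// so each occurrence contributes one.
std::size_t total_count(std::vector<CountTaxon> const& taxa) {
    std::size_t count = 0;
    for (auto const& taxon : taxa) {
        count += 1 + total_count(taxon.children);
    }
    return count;
}
```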
bool validate | ( | Taxonomy const & | taxonomy, |
bool | stop_at_first_error = false |
||
) |
Validate the internal data structures of a Taxonomy and its child Taxa.
The function validates the correctness of internal pointers, in particular the parent pointers of each Taxon. If the structure is broken, a message is logged to LOG_INFO and the function returns false.
taxonomy | The Taxonomy object to validate. |
stop_at_first_error | Optional, defaults to false. By default, all errors are reported. If set to true, only the first error is logged and the function immediately returns false. If no errors are found, it runs through and returns true. |
Definition at line 269 of file functions/taxonomy.cpp.
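The following sketch shows a parent-pointer check of the kind described for validate(). `PTaxon`, `add_child()` and `validate_parents()` are hypothetical. Errors are printed to std::cerr rather than to LOG_INFO, and for simplicity the root is itself a `PTaxon`. The stop_at_first_error flag mirrors the parameter above: when it is set, the first error ends the traversal.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical taxon with an explicit parent pointer.
struct PTaxon {
    std::string name;
    PTaxon* parent = nullptr;
    std::vector<std::unique_ptr<PTaxon>> children;
};

// Append a new child and set its parent pointer correctly.
PTaxon& add_child(PTaxon& parent, std::string const& name) {
    auto child = std::make_unique<PTaxon>();
    child->name = name;
    child->parent = &parent;
    parent.children.push_back(std::move(child));
    return *parent.children.back();
}

// Check that every child's parent pointer points back to its actual parent.
// Errors are printed to std::cerr here (the library logs to LOG_INFO instead).
bool validate_parents(PTaxon const& node, bool stop_at_first_error) {
    bool ok = true;
    for (auto const& child : node.children) {
        if (child->parent != &node) {
            std::cerr << "Invalid parent pointer at taxon " << child->name << "\n";
            ok = false;
            if (stop_at_first_error) {
                return false;
            }
        }
        if (!validate_parents(*child, stop_at_first_error)) {
            ok = false;
            if (stop_at_first_error) {
                return false;
            }
        }
    }
    return ok;
}
```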
bool validate_pruned_taxonomy | ( | Taxonomy const & | taxonomy | ) |
Validate that the pruning status of a Taxonomy is valid.
This function expects the Taxa of the Taxonomy to have data type EntropyTaxonData. It then checks whether the pruning states are all correctly set.
That means:
If any of those conditions is not met, information about the faulty Taxon is written to LOG_INFO, and the function returns false.
Definition at line 491 of file taxonomy/functions/entropy.cpp.
using BFS = BreadthFirstSearch |
Alias for BreadthFirstSearch.
Definition at line 69 of file taxonomy/functions/taxonomy.hpp.
using DFS = DepthFirstSearch |
Alias for DepthFirstSearch.
Definition at line 64 of file taxonomy/functions/taxonomy.hpp.
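BFS and DFS are tags that select the traversal order for find_taxon(). The sketch below shows the general tag-dispatch pattern. It uses a hypothetical `SearchNode` struct, a hypothetical `find_node()` overload set, and locally defined tag structs with the same names. It does not reproduce the actual signature of find_taxon(). The extra `visited` parameter exists only to record the order in which nodes are visited.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Locally defined tag types for this sketch, plus the short aliases.
struct BreadthFirstSearch {};
struct DepthFirstSearch {};
using BFS = BreadthFirstSearch;
using DFS = DepthFirstSearch;

// Hypothetical minimal node: a name and its children.
struct SearchNode {
    std::string name;
    std::vector<SearchNode> children;
};

// Depth-first (pre-order) search, selected by passing DFS{}.
SearchNode const* find_node(SearchNode const& root, std::string const& name,
                            std::vector<std::string>& visited, DepthFirstSearch) {
    visited.push_back(root.name);
    if (root.name == name) {
        return &root;
    }
    for (auto const& child : root.children) {
        if (auto found = find_node(child, name, visited, DFS{})) {
            return found;
        }
    }
    return nullptr;
}

// Breadth-first search with a queue of pending nodes, selected by passing BFS{}.
SearchNode const* find_node(SearchNode const& root, std::string const& name,
                            std::vector<std::string>& visited, BreadthFirstSearch) {
    std::deque<SearchNode const*> queue{&root};
    while (!queue.empty()) {
        SearchNode const* node = queue.front();
        queue.pop_front();
        visited.push_back(node->name);
        if (node->name == name) {
            return node;
        }
        for (auto const& child : node->children) {
            queue.push_back(&child);
        }
    }
    return nullptr;
}
```

For example, searching for B1 in R → {A → A1, B → B1} visits R, A, A1, B, B1 with DFS{}, and R, A, B, A1, B1 with BFS{}.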
using NcbiNameLookup = std::unordered_map<std::string, NcbiName> |
using NcbiNodeLookup = std::unordered_map<std::string, NcbiNode> |
static |