A toolkit for working with phylogenetic data.
v0.18.0
TaxonomyReader Class Reference

#include <genesis/taxonomy/formats/taxonomy_reader.hpp>

Detailed Description

Read Taxonomy file formats.

This reader populates a Taxonomy. It supports to read

Example usage:

std::string infile = "path/to/taxonomy.txt";
Taxonomy tax;

TaxonomyReader()
    .rank_field_position( 2 )
    .expect_strict_order( true )
    .from_file( infile, tax );

It expects one taxon per input line. This line can also contain other information, for example

Archaea;Crenarchaeota;Thermoprotei;Desulfurococcales;   14  order   119

A CsvReader is used to separate the fields of the input. All of its properties are left at their default values, except for the separator char, which is set to a tab \t instead of a comma, as this is more common for taxonomy files.

Use the getter csv_reader() to access the CsvReader and change its behaviour, for example, to change the field separator char. Also, all other properties of the CsvReader can be adjusted in order to suit any char-separated input format.

Once the fields of a line are split, this reader uses its properties name_field_position() and rank_field_position() to determine which of the fields represent the taxon name and its rank, respectively. For example, given the line from above, those would have to be set to 0 and 2. All other fields of the line are ignored, which in the example are "14" and "119".
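
The field selection described above can be sketched without the library. This is a minimal illustration, not the reader's actual implementation: split_fields, NameAndRank and select_fields are hypothetical names, and the plain split ignores the quoting and escaping that a CsvReader can handle. The exception type is likewise an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split a line at every occurrence of `sep`.
// The real reader delegates this step to its CsvReader.
std::vector<std::string> split_fields( std::string const& line, char sep = '\t' )
{
    std::vector<std::string> fields;
    std::string current;
    for( char c : line ) {
        if( c == sep ) {
            fields.push_back( current );
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back( current ); // the last field has no trailing separator
    return fields;
}

// Hypothetical result type holding the two fields of interest.
struct NameAndRank
{
    std::string name;
    std::string rank;
};

// Pick the name and rank fields by zero-based position.
// A negative rank position means "no rank field"; the rank then stays empty.
NameAndRank select_fields(
    std::vector<std::string> const& fields, int name_pos, int rank_pos
) {
    if( name_pos < 0 || static_cast<std::size_t>( name_pos ) >= fields.size() ) {
        throw std::out_of_range( "Name field position is not in the line." );
    }
    if( rank_pos >= 0 && static_cast<std::size_t>( rank_pos ) >= fields.size() ) {
        throw std::out_of_range( "Rank field position is not in the line." );
    }
    NameAndRank result;
    result.name = fields[ static_cast<std::size_t>( name_pos ) ];
    if( rank_pos >= 0 ) {
        result.rank = fields[ static_cast<std::size_t>( rank_pos ) ];
    }
    return result;
}
```

With the tab-separated example line and positions 0 and 2, the sketch yields the taxonomic path as the name and "order" as the rank; the fields "14" and "119" are never looked at.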

The taxon name is expected to be a taxonomic path string. This is what we call a string consisting of the different parts of the taxonomic hierarchy, usually separated by semicolons. See Taxopath for a description of the expected format.

This string is split into its Taxa using a TaxopathParser. In order to change the behaviour of this splitting, access the parser via taxopath_parser().
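
To illustrate what splitting a taxonomic path means, here is a minimal sketch using only the standard library. The function split_taxopath is a hypothetical stand-in; the actual TaxopathParser is configurable, and its options are not reproduced here. The sketch assumes a single delimiter char and treats a trailing delimiter as optional.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a taxonomic path such as
// "Archaea;Crenarchaeota;Thermoprotei;" into its elements.
std::vector<std::string> split_taxopath( std::string const& path, char delim = ';' )
{
    std::vector<std::string> elements;
    std::string current;
    for( char c : path ) {
        if( c == delim ) {
            elements.push_back( current );
            current.clear();
        } else {
            current += c;
        }
    }
    // A trailing delimiter (as in "Archaea;") leaves `current` empty here,
    // so it does not produce an extra empty element.
    if( !current.empty() ) {
        elements.push_back( current );
    }
    return elements;
}
```

Each resulting element corresponds to one level of the hierarchy, from the outermost group to the innermost one.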

In summary, by default, this reader reads tab-separated lines and expects the taxonomy entry to be the first (or only) field in the line and to be a taxonomic path in the format described at Taxopath.

Definition at line 109 of file taxonomy_reader.hpp.

Public Member Functions

 TaxonomyReader ()
 Default constructor. More...
 
 TaxonomyReader (TaxonomyReader const &)=default
 
 TaxonomyReader (TaxonomyReader &&)=default
 
 ~TaxonomyReader ()=default
 
utils::CsvReader & csv_reader ()
 Get the CsvReader used for reading a taxonomy file. More...
 
TaxonomyReader & expect_strict_order (bool value)
 Set whether the reader expects a strict order of taxa. More...
 
bool expect_strict_order () const
 Return whether the reader currently expects a strict order of taxa. More...
 
void from_file (std::string const &fn, Taxonomy &tax) const
 Read a taxonomy file and add its contents to a Taxonomy. More...
 
void from_stream (std::istream &is, Taxonomy &tax) const
 Read taxonomy data until the end of the stream is reached, and add the contents to a Taxonomy. More...
 
void from_string (std::string const &is, Taxonomy &tax) const
 Read a string with taxonomy data and add its contents to a Taxonomy. More...
 
TaxonomyReader & name_field_position (int value)
 Set the position of the field in each line where the taxon name (Taxopath) is located. More...
 
int name_field_position () const
 Get the currently set position of the field in each line where the taxon name is located. More...
 
TaxonomyReader & operator= (TaxonomyReader const &)=default
 
TaxonomyReader & operator= (TaxonomyReader &&)=default
 
void parse_document (utils::InputStream &it, Taxonomy &tax) const
 Parse all data from an InputStream into a Taxonomy object. More...
 
Line parse_line (utils::InputStream &it) const
 Read a single line of a taxonomy file and return the contained name and rank. More...
 
TaxonomyReader & rank_field_position (int value)
 Set the position of the field in each line where the rank name is located. More...
 
int rank_field_position () const
 Get the currently set position of the field in each line where the rank name is located. More...
 
TaxopathParser & taxopath_parser ()
 Get the TaxopathParser used for parsing taxonomic path strings. More...
 

Classes

struct  Line
 Internal helper structure that stores the relevant data of one line while reading. More...
 

Constructor & Destructor Documentation

TaxonomyReader ( )

Default constructor.

Initializes the CsvReader so that tabs are used as field separators instead of commas.

Definition at line 61 of file taxonomy_reader.cpp.

~TaxonomyReader ( ) = default
TaxonomyReader ( TaxonomyReader const & ) = default
TaxonomyReader ( TaxonomyReader && ) = default

Member Function Documentation

utils::CsvReader & csv_reader ( )

Get the CsvReader used for reading a taxonomy file.

This can be used to modify the reading behaviour, particularly values like the separator chars within the lines of the file. By default, the TaxonomyReader uses a tab \t char to separate fields, which is different from the comma ',' that is used as default by the CsvReader.

It is also possible to change other properties of the CsvReader, for example escaping behaviour, if the input data needs special treatment in those regards.

See CsvReader for details about those properties.

Definition at line 191 of file taxonomy_reader.cpp.

TaxonomyReader & expect_strict_order ( bool  value)

Set whether the reader expects a strict order of taxa.

In a strictly ordered taxonomy file, the super-groups have to be listed before any sub-groups.

For example, the list

Archaea;
Archaea;Aenigmarchaeota;
Archaea;Crenarchaeota;
Archaea;Crenarchaeota;Thermoprotei;

is in strict order.

If this property is set to true, the reader expects this ordering and throws an exception if there is a violation, that is, if there is a sub-group in the list without a previous entry of its super-group (recursively). This is useful to check a file for consistency, e.g., it might happen that some super-group is misspelled by accident.

If set to false (default), the order is ignored and all super-groups are created if necessary.
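
The strict-order rule can be expressed as a small standalone check. The following is a sketch under stated assumptions, not the reader's code: first_order_violation and split_path are hypothetical names, paths are assumed to be semicolon-separated, and the function reports the first offending entry by index instead of throwing.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: split a path at ';', ignoring a trailing delimiter.
std::vector<std::string> split_path( std::string const& path )
{
    std::vector<std::string> elements;
    std::string current;
    for( char c : path ) {
        if( c == ';' ) {
            elements.push_back( current );
            current.clear();
        } else {
            current += c;
        }
    }
    if( !current.empty() ) {
        elements.push_back( current );
    }
    return elements;
}

// Return the index of the first entry whose direct super-group has not been
// listed before it, or -1 if the whole list is in strict order.
// Checking the direct super-group suffices: it can only have been accepted
// earlier if its own super-group had been listed before it.
int first_order_violation( std::vector<std::string> const& paths )
{
    std::set<std::string> seen;
    for( std::size_t i = 0; i < paths.size(); ++i ) {
        std::string full;   // normalized path of this entry
        std::string parent; // normalized path of its super-group
        for( auto const& element : split_path( paths[i] ) ) {
            parent = full;
            full += element + ";";
        }
        if( !parent.empty() && seen.count( parent ) == 0 ) {
            return static_cast<int>( i );
        }
        seen.insert( full );
    }
    return -1;
}
```

For the four-entry list shown above, the check returns -1. If the entry "Archaea;Crenarchaeota;" were removed, the entry "Archaea;Crenarchaeota;Thermoprotei;" would be reported, because its super-group was never listed.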

Definition at line 300 of file taxonomy_reader.cpp.

bool expect_strict_order ( ) const

Return whether the reader currently expects a strict order of taxa.

See the setter for more information.

Definition at line 311 of file taxonomy_reader.cpp.

void from_file ( std::string const & fn, Taxonomy & tax ) const

Read a taxonomy file and add its contents to a Taxonomy.

Definition at line 83 of file taxonomy_reader.cpp.

void from_stream ( std::istream & is, Taxonomy & tax ) const

Read taxonomy data until the end of the stream is reached, and add the contents to a Taxonomy.

Definition at line 74 of file taxonomy_reader.cpp.

void from_string ( std::string const & is, Taxonomy & tax ) const

Read a string with taxonomy data and add its contents to a Taxonomy.

Definition at line 92 of file taxonomy_reader.cpp.

TaxonomyReader & name_field_position ( int  value)

Set the position of the field in each line where the taxon name (Taxopath) is located.

This value determines at which position (zero based) the field for the taxon name is located.

For example, in a taxonomy file with entries like

Archaea;Crenarchaeota;Thermoprotei; 7   class   119

this value would have to be set to 0, as this is where the taxon name is found. This reader expects the taxon name to be a Taxopath. This is what we call a string of taxonomic hierarchy elements, usually separated by semicolons. See Taxopath for details.

By default, this value is set to 0, that is, the first field. As it does not make sense to skip this value, it cannot be set to values below zero - which is different from rank_field_position. An exception is thrown should this be attempted.
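
The difference between the two position setters can be illustrated with a small stand-in class. This is a sketch only: FieldPositions is a hypothetical type, and the exception type std::invalid_argument is an assumption, since the text above only states that an exception is thrown. The defaults (0 for the name, -1 for the rank) and the chaining style follow the description in this reference.

```cpp
#include <stdexcept>

// Hypothetical stand-in that models only the two position properties.
class FieldPositions
{
public:

    // The name field cannot be skipped, so negative values are rejected.
    // Returns a reference to the object to allow chained calls.
    FieldPositions& name_field_position( int value )
    {
        if( value < 0 ) {
            throw std::invalid_argument( "Name field position must not be negative." );
        }
        name_pos_ = value;
        return *this;
    }

    int name_field_position() const
    {
        return name_pos_;
    }

    // The rank field is optional; -1 means that no rank is read.
    FieldPositions& rank_field_position( int value )
    {
        rank_pos_ = value;
        return *this;
    }

    int rank_field_position() const
    {
        return rank_pos_;
    }

private:

    int name_pos_ = 0;  // default: first field
    int rank_pos_ = -1; // default: no rank field
};
```

Because the check happens before the assignment, a rejected value leaves the previously set position untouched.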

Definition at line 226 of file taxonomy_reader.cpp.

int name_field_position ( ) const

Get the currently set position of the field in each line where the taxon name is located.

See the setter of this function for details.

Definition at line 244 of file taxonomy_reader.cpp.

TaxonomyReader & operator= ( TaxonomyReader const & ) = default
TaxonomyReader & operator= ( TaxonomyReader && ) = default

void parse_document ( utils::InputStream & it, Taxonomy & tax ) const

Parse all data from an InputStream into a Taxonomy object.

Definition at line 105 of file taxonomy_reader.cpp.

TaxonomyReader::Line parse_line ( utils::InputStream & it ) const

Read a single line of a taxonomy file and return the contained name and rank.

The name is expected to be a taxonomic path string. See Taxopath for details on that format.

Definition at line 135 of file taxonomy_reader.cpp.

TaxonomyReader & rank_field_position ( int  value)

Set the position of the field in each line where the rank name is located.

This value determines at which position (zero based) the field for the rank name is located.

For example, in a taxonomy file with entries like

Archaea;Crenarchaeota;Thermoprotei; 7   class   119

this value would have to be set to 2, as this is where the rank name "class" is found.

If the file does not contain any rank names, or if this field should be skipped, set it to a value of -1. This is also the default.

Definition at line 263 of file taxonomy_reader.cpp.

int rank_field_position ( ) const

Get the currently set position of the field in each line where the rank name is located.

See the setter of this function for details.

Definition at line 274 of file taxonomy_reader.cpp.

TaxopathParser & taxopath_parser ( )

Get the TaxopathParser used for parsing taxonomic path strings.

The name field is expected to be a taxonomic path string. It is turned into a Taxon using the settings of the TaxopathParser. See there for details. See Taxopath for a description of the expected string format.

Definition at line 203 of file taxonomy_reader.cpp.


The documentation for this class was generated from the following files: