A toolkit for working with phylogenetic data.
v0.18.0
CsvReader Class Reference

#include <genesis/utils/formats/csv/reader.hpp>

Detailed Description

Read Comma Separated Values (CSV) data and other delimiter-separated formats.

This class provides simple facilities for reading data in a format that uses delimiter chars to separate tabulated data into fields, where one line represents one row of the table.

It supports reading from streams, files and strings, via from_stream(), from_file() and from_string().

Those functions return the table as a vector, with one entry per line (i.e., row). Each such entry is itself a vector of strings, representing the fields (values of the columns) of that row.

There are several properties that can be changed in order to customize the behaviour. By default, the reader uses the comma char to separate fields and double quotation marks to quote them. See the property functions for more information.

If the data is too big to be read at once into memory, or if you want to parse the data line by line, you can also use the parser functions parse_line() and parse_field() directly.

Definition at line 73 of file utils/formats/csv/reader.hpp.
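
The line-by-line approach mentioned above can be sketched with the standard library alone. The function below is a hypothetical stand-in and not part of genesis: sum_column_sketch() and its parameters are assumptions made for illustration, and quoting, escaping and trimming are ignored. With the actual class, the per-line step would correspond to a call to parse_line().

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in, not the genesis implementation: sums the numeric
// values of one column while reading the input one line at a time, so that
// only the current line is held in memory.
// Note: std::stod throws if the selected field is not a number.
double sum_column_sketch( std::istream& input, std::size_t column, char separator = ',' )
{
    double sum = 0.0;
    std::string line;
    while( std::getline( input, line )) {
        std::istringstream fields( line );
        std::string field;
        std::size_t index = 0;

        // Walk over the fields of this line until the wanted column is found.
        while( std::getline( fields, field, separator )) {
            if( index == column ) {
                sum += std::stod( field );
                break;
            }
            ++index;
        }
    }
    return sum;
}
```

Lines that have fewer fields than the requested column index simply contribute nothing to the sum in this sketch.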

Public Member Functions

 CsvReader ()=default
 
 CsvReader (CsvReader const &)=default
 
 CsvReader (CsvReader &&)=default
 
 ~CsvReader ()=default
 
CsvReader & comment_chars (std::string const &chars)
 Set chars that are used to mark comment lines.
 
std::string const & comment_chars () const
 Return the currently set chars that are used to mark comment lines.
 
table from_file (std::string const &fn) const
 Read a CSV file and return its contents.
 
table from_stream (std::istream &is) const
 Read CSV data until the end of the stream is reached, and return it.
 
table from_string (std::string const &fs) const
 Read a string in CSV format and return its contents.
 
CsvReader & merge_separators (bool value)
 Set whether consecutive separator chars are merged or whether each of them creates a new field.
 
bool merge_separators () const
 Return whether consecutive separator chars are currently merged.
 
CsvReader & operator= (CsvReader const &)=default
 
CsvReader & operator= (CsvReader &&)=default
 
table parse_document (utils::InputStream &input_stream) const
 Parse a whole CSV document and return its contents.
 
std::string parse_field (utils::InputStream &input_stream) const
 Parse one field (i.e., one cell) of the CSV data and return it.
 
std::vector< std::string > parse_line (utils::InputStream &input_stream) const
 Parse one line of the CSV data and return it.
 
CsvReader & quotation_chars (std::string const &chars)
 Set the chars that are used for quoting strings in fields.
 
std::string const & quotation_chars () const
 Return the currently set chars for quoting strings in fields.
 
CsvReader & separator_chars (std::string const &chars)
 Set the chars used to separate fields of the CSV data.
 
std::string const & separator_chars () const
 Return the currently set chars used to separate fields of the CSV data.
 
CsvReader & skip_empty_lines (bool value)
 Set whether to skip empty lines.
 
bool skip_empty_lines () const
 Return whether empty lines are currently skipped.
 
CsvReader & trim_chars (std::string const &chars)
 Set chars that are trimmed from the start and end of each field.
 
std::string const & trim_chars () const
 Return the currently set chars that are trimmed from the start and end of each field.
 
CsvReader & use_escapes (bool value)
 Set whether to use backslash escape sequences.
 
bool use_escapes () const
 Return whether backslash escape sequences are used.
 
CsvReader & use_twin_quotes (bool value)
 Set whether to interpret two consecutive quotation marks as a single ("escaped") one.
 
bool use_twin_quotes () const
 Return whether to interpret two consecutive quotation marks as a single ("escaped") one.
 

Public Types

typedef std::string field
 
typedef std::vector< field > row
 
typedef std::vector< row > table
 

Constructor & Destructor Documentation

CsvReader ( )
default
~CsvReader ( )
default
CsvReader ( CsvReader const &  )
default
CsvReader ( CsvReader &&  )
default

Member Function Documentation

CsvReader & comment_chars ( std::string const &  chars)

Set chars that are used to mark comment lines.

By default, no chars are used, that is, no line is interpreted as comment. Use this function to change that behaviour, e.g., use # as marker for comment lines. All lines starting with any of the set chars are then skipped while reading. The char has to be the first on the line, that is, no leading blanks are allowed.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 298 of file utils/formats/csv/reader.cpp.
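
The rule described above (a comment char only counts if it is the very first char of the line) can be sketched as follows. The helper is_comment_line_sketch() is a hypothetical illustration written with the standard library only; it is not the check used inside CsvReader.

```cpp
#include <string>

// Hypothetical helper, not part of genesis: a line counts as a comment only
// if its very first char is one of the comment chars. Leading blanks are not
// skipped, so "  # note" is treated as data.
bool is_comment_line_sketch( std::string const& line, std::string const& comment_chars )
{
    return !line.empty() && comment_chars.find( line[0] ) != std::string::npos;
}
```

With an empty set of comment chars, which mirrors the default described above, no line is classified as a comment.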

std::string const & comment_chars ( ) const

Return the currently set chars that are used to mark comment lines.

See the setter of this function for more details.

Definition at line 310 of file utils/formats/csv/reader.cpp.

CsvReader::table from_file ( std::string const &  fn) const

Read a CSV file and return its contents.

Definition at line 66 of file utils/formats/csv/reader.cpp.

CsvReader::table from_stream ( std::istream &  is) const

Read CSV data until the end of the stream is reached, and return it.

Definition at line 57 of file utils/formats/csv/reader.cpp.

CsvReader::table from_string ( std::string const &  fs) const

Read a string in CSV format and return its contents.

Definition at line 75 of file utils/formats/csv/reader.cpp.

CsvReader & merge_separators ( bool  value)

Set whether consecutive separator chars are merged or whether each of them creates a new field.

Default is false. Usually, CSV data has the same number of columns for the whole dataset. Thus, empty fields will result in consecutive separator chars. When this value is set to false, those fields are correctly parsed into empty fields.

It might however be useful to not create separate empty fields when consecutive separator chars are encountered. This is particularly the case if spaces or tabs are used as separators. In this case it makes sense to have more than one of them consecutively in order to align the columns of the data. For such datasets, this value can be set to true.

To put it in other words, this value determines whether empty strings resulting from adjacent separator chars are excluded from the output.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 467 of file utils/formats/csv/reader.cpp.
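
To make the effect of this setting concrete, here is a minimal sketch of a field splitter with a merge flag. It is a hypothetical stand-in (split_fields_sketch() is not a genesis function) and leaves out quoting, escaping and trimming. How the real reader treats separators at the very start or end of a line is not stated above, so the handling here is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in, not the genesis implementation: splits one line at
// any of the separator chars. With merge == true, empty fields that result
// from adjacent separators are dropped from the output.
std::vector<std::string> split_fields_sketch(
    std::string const& line, std::string const& separators, bool merge
) {
    std::vector<std::string> fields;
    std::string current;
    for( char c : line ) {
        if( separators.find( c ) != std::string::npos ) {
            // End of a field: keep it unless merging and it is empty.
            if( !merge || !current.empty() ) {
                fields.push_back( current );
            }
            current.clear();
        } else {
            current += c;
        }
    }
    // The last field has no separator after it.
    if( !merge || !current.empty() ) {
        fields.push_back( current );
    }
    return fields;
}
```

For the input a,,b with a comma as separator, the sketch yields three fields (the middle one empty) when merge is false, and two fields when merge is true.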

bool merge_separators ( ) const

Return whether consecutive separator chars are currently merged.

See the setter of this function for details.

Definition at line 478 of file utils/formats/csv/reader.cpp.

CsvReader& operator= ( CsvReader const &  )
default
CsvReader& operator= ( CsvReader &&  )
default

CsvReader::table parse_document ( utils::InputStream & input_stream) const

Parse a whole CSV document and return its contents.

Definition at line 88 of file utils/formats/csv/reader.cpp.

std::string parse_field ( utils::InputStream & input_stream) const

Parse one field (i.e., one cell) of the CSV data and return it.

This function reads from a given input stream until the column separator or the end of the line or the end of the stream is found. It furthermore trims the necessary chars from the beginning and end of the field, and handles quoted strings according to the settings of the CsvReader.

The stream is left at either the separator char, the new line char, or the end of the file, depending on which occurs first.

See trim_chars(), quotation_chars(), separator_chars(), use_escapes() and use_twin_quotes() to change the behaviour of this function.

Definition at line 124 of file utils/formats/csv/reader.cpp.

std::vector< std::string > parse_line ( utils::InputStream & input_stream) const

Parse one line of the CSV data and return it.

This function parses a whole line using parse_field() until the new line char (or the end of the stream) is found. The fields are returned in a vector. The stream is left at either the next char after the new line char or the end of the file, if there is no new line.

See merge_separators() to change the behaviour of this function.

Definition at line 204 of file utils/formats/csv/reader.cpp.

CsvReader & quotation_chars ( std::string const &  chars)

Set the chars that are used for quoting strings in fields.

By default, the double quotation mark char " is used as quotation mark. Any other set of chars can be used instead, for example a combination of single and double quotation marks by providing `'"` to this function.

Within a quoted part, any char can appear, even new lines. However, in order to use the quotation mark itself, it has to be escaped. See use_escapes() and use_twin_quotes() for changing the behaviour of escaping with backslashes and with twin quotation marks.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 364 of file utils/formats/csv/reader.cpp.
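
The fluent interface mentioned above relies on each setter returning a reference to the object it was called on, so that calls can be chained. The class below is a hypothetical miniature (ReaderSettingsSketch is not part of genesis) that only shows the pattern, using two of the property names from this page.

```cpp
#include <string>

// Hypothetical settings class, only to illustrate the fluent setter pattern.
class ReaderSettingsSketch
{
public:

    // Setters return *this, which is what allows chaining.
    ReaderSettingsSketch& separator_chars( std::string const& chars )
    {
        separator_chars_ = chars;
        return *this;
    }

    ReaderSettingsSketch& quotation_chars( std::string const& chars )
    {
        quotation_chars_ = chars;
        return *this;
    }

    // Getters share the name of their setter, but take no argument.
    std::string const& separator_chars() const { return separator_chars_; }
    std::string const& quotation_chars() const { return quotation_chars_; }

private:

    std::string separator_chars_ = ",";
    std::string quotation_chars_ = "\"";
};
```

With this pattern, a call such as settings.separator_chars( "\t|" ).quotation_chars( "'\"" ) sets both properties in a single expression.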

std::string const & quotation_chars ( ) const

Return the currently set chars for quoting strings in fields.

See the setter of this function for more details.

Definition at line 376 of file utils/formats/csv/reader.cpp.

CsvReader & separator_chars ( std::string const &  chars)

Set the chars used to separate fields of the CSV data.

By default, the comma char , is used. Any other set of chars can be used instead, for example a combination of tabs and bars by providing \t| to this function.

Caveat: If more than one char is used as separator, any of them separates fields. That is, the string provided to this function is not taken as a whole to separate fields, but its single chars are used.

See merge_separators() to set whether consecutive separator chars are merged.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 400 of file utils/formats/csv/reader.cpp.
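
The caveat above is easy to overlook, so here is a small sketch of the "any of these chars" semantics. split_any_sketch() is a hypothetical helper built on std::string::find_first_of; it is not the genesis parser and ignores quoting and merging of separators.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of genesis: every single char of
// `separators` ends a field. The string is never matched as a whole.
std::vector<std::string> split_any_sketch(
    std::string const& line, std::string const& separators
) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while( true ) {
        std::size_t const pos = line.find_first_of( separators, start );
        if( pos == std::string::npos ) {
            // No further separator: the rest of the line is the last field.
            fields.push_back( line.substr( start ));
            return fields;
        }
        fields.push_back( line.substr( start, pos - start ));
        start = pos + 1;
    }
}
```

With a tab and a bar as separators, an input in which a tab is directly followed by a bar therefore contains an empty field between the two, rather than a single two-char delimiter.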

std::string const & separator_chars ( ) const

Return the currently set chars used to separate fields of the CSV data.

See the setter of this function for more details.

Definition at line 412 of file utils/formats/csv/reader.cpp.

CsvReader & skip_empty_lines ( bool  value)

Set whether to skip empty lines.

Default is false. If set to true, all lines that are empty (that is, no content, or just consisting of spaces and tabs) are skipped while reading.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 429 of file utils/formats/csv/reader.cpp.
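
The notion of an empty line used above (no content, or only spaces and tabs) can be expressed in one standard library call. The helper is_empty_line_sketch() is a hypothetical illustration, not the check used inside CsvReader.

```cpp
#include <string>

// Hypothetical helper, not part of genesis: a line is considered empty if it
// has no content or consists of spaces and tabs only.
bool is_empty_line_sketch( std::string const& line )
{
    return line.find_first_not_of( " \t" ) == std::string::npos;
}
```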

bool skip_empty_lines ( ) const

Return whether empty lines are currently skipped.

See the setter of this function for more details.

Definition at line 440 of file utils/formats/csv/reader.cpp.

CsvReader & trim_chars ( std::string const &  chars)

Set chars that are trimmed from the start and end of each field.

By default, no chars are trimmed. Use this function to change that behaviour, e.g., to trim spaces and tabs. Be aware that according to some CSV definitions, blanks are considered to be part of the field and should not be trimmed.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 328 of file utils/formats/csv/reader.cpp.
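
As an illustration of what trimming a field means here, the following sketch removes a given set of chars from both ends of a string. trim_field_sketch() is a hypothetical stand-in written with the standard library; it is not the genesis implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of genesis: removes all leading and trailing
// chars that are contained in `trim_chars`. Chars in the middle are kept.
std::string trim_field_sketch( std::string const& field, std::string const& trim_chars )
{
    std::size_t const first = field.find_first_not_of( trim_chars );
    if( first == std::string::npos ) {
        // The field is empty or consists of trim chars only.
        return "";
    }
    std::size_t const last = field.find_last_not_of( trim_chars );
    return field.substr( first, last - first + 1 );
}
```

Passing an empty set of trim chars leaves the field unchanged, which mirrors the default described above.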

std::string const & trim_chars ( ) const

Return the currently set chars that are trimmed from the start and end of each field.

See the setter of this function for more details.

Definition at line 340 of file utils/formats/csv/reader.cpp.

CsvReader & use_escapes ( bool  value)

Set whether to use backslash escape sequences.

Default is false. If set to true, character sequences of \x (backslash and some other char) are turned into the respective string form, according to the rules of deescape(). Also, see parse_quoted_string() for more information on escaping.

This works inside and outside of quoted strings. In order to create new lines within a field, either the sequence \n (backslash n) can be used, or a backslash at the end of the line.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 499 of file utils/formats/csv/reader.cpp.
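
A rough sketch of what resolving backslash escape sequences can look like is given below. The exact rules are those of deescape(), which are not reproduced on this page, so the mapping used here (\n, \t and \r become control chars, any other escaped char stands for itself) is an assumption. deescape_sketch() is a hypothetical name, and the "backslash at the end of the line" case is not modelled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in, not the genesis deescape(): resolves backslash
// escape sequences in a string. The mapping below is an assumption.
std::string deescape_sketch( std::string const& text )
{
    std::string result;
    for( std::size_t i = 0; i < text.size(); ++i ) {
        if( text[i] != '\\' || i + 1 == text.size() ) {
            // Normal char, or a trailing backslash that is kept as is.
            result += text[i];
            continue;
        }
        ++i; // Look at the char after the backslash.
        switch( text[i] ) {
            case 'n': result += '\n'; break;
            case 't': result += '\t'; break;
            case 'r': result += '\r'; break;
            default:  result += text[i]; break; // e.g. \" or \\ yield the char itself
        }
    }
    return result;
}
```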

bool use_escapes ( ) const

Return whether backslash escape sequences are used.

See the setter of this function for details.

Definition at line 510 of file utils/formats/csv/reader.cpp.

CsvReader & use_twin_quotes ( bool  value)

Set whether to interpret two consecutive quotation marks as a single ("escaped") one.

Default is true. Use this setting in order to be able to escape quotation marks by doubling them. This is a common variant in CSV data. It means, whenever two consecutive quotation marks are encountered, they are turned into one (thus, the first one "escapes" the second). This works both inside and outside of regularly quoted parts. That is, the following two fields are interpreted the same:

"My ""old"" friend"
My ""old"" friend

This also works in addition to normal backslash escape sequences, see use_escapes() for more on this.

See quotation_chars() to set which chars are interpreted as quotation marks.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 539 of file utils/formats/csv/reader.cpp.
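
The two example fields above can be reproduced with a small sketch of the twin quote rule. unquote_twin_sketch() is a hypothetical helper for a single quotation char; it is not the genesis parser and does not handle separators or backslash escapes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of genesis: turns two consecutive quotation
// marks into one, inside and outside of quoted parts. A single quotation
// mark only opens or closes a quoted part and is dropped from the result.
std::string unquote_twin_sketch( std::string const& field, char quote = '"' )
{
    std::string result;
    for( std::size_t i = 0; i < field.size(); ++i ) {
        if( field[i] != quote ) {
            result += field[i];
        } else if( i + 1 < field.size() && field[i + 1] == quote ) {
            // Twin quote: keep one quotation mark and skip the other.
            result += quote;
            ++i;
        }
    }
    return result;
}
```

Applied to either of the two fields shown above, the sketch yields the same text, with single quotation marks around the word old.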

bool use_twin_quotes ( ) const

Return whether to interpret two consecutive quotation marks as a single ("escaped") one.

See the setter of this function for details.

Definition at line 550 of file utils/formats/csv/reader.cpp.

Member Typedef Documentation

typedef std::string field

Definition at line 81 of file utils/formats/csv/reader.hpp.

typedef std::vector<field> row

Definition at line 82 of file utils/formats/csv/reader.hpp.

typedef std::vector<row> table

Definition at line 83 of file utils/formats/csv/reader.hpp.
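
The three typedefs nest into a simple vector-of-vectors layout. The sketch below repeats them as free-standing aliases (in the class they are members of CsvReader) and adds a hypothetical helper, column_sketch(), to show how such a table can be traversed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Free-standing copies of the member typedefs, for illustration only.
using field = std::string;
using row   = std::vector<field>;
using table = std::vector<row>;

// Hypothetical helper, not part of genesis: collects the values of one
// column. Rows that are too short to have this column are skipped.
std::vector<field> column_sketch( table const& data, std::size_t index )
{
    std::vector<field> result;
    for( row const& r : data ) {
        if( index < r.size() ) {
            result.push_back( r[index] );
        }
    }
    return result;
}
```

Since rows are independent vectors, they are not guaranteed to have the same length, which is why the helper checks the index for every row.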


The documentation for this class was generated from the files utils/formats/csv/reader.hpp and utils/formats/csv/reader.cpp.