A library for working with phylogenetic and population genetic data.
v0.27.0
CsvReader Class Reference

#include <genesis/utils/formats/csv/reader.hpp>

Detailed Description

Read Comma/Character Separated Values (CSV) data and other delimiter-separated formats.

This class provides simple facilities for reading data in a format that uses delimiter chars to separate tabulated data into fields, where one line represents one row of the table.

The read() function returns the table as a vector, with one entry per line (i.e., row). Each such entry is itself a vector of strings, representing the fields (values of the columns) of that row.
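
As a rough illustration of this data model, the following sketch builds the same nested-vector shape, mirroring the Field, Line and Table typedefs listed under Public Types, from a small string. The helper naive_read() is hypothetical and not part of genesis. It splits on commas and new lines only, without the quoting, trimming and escaping that CsvReader supports.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Same shape as the CsvReader typedefs: one Line per row, one Field per column.
using Field = std::string;
using Line  = std::vector<Field>;
using Table = std::vector<Line>;

// Hypothetical helper for illustration only: splits text on '\n' and ','
// without any quoting, trimming, or escaping (unlike CsvReader).
Table naive_read( std::string const& text )
{
    Table table;
    std::istringstream rows( text );
    std::string row;
    while( std::getline( rows, row )) {
        Line line;
        std::string field;
        for( char c : row ) {
            if( c == ',' ) {
                line.push_back( field );  // a comma ends the current field
                field.clear();
            } else {
                field += c;
            }
        }
        line.push_back( field );          // the last field of the row
        table.push_back( line );
    }
    return table;
}
```

For real data, use read() together with utils::from_file() or utils::from_string() instead.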

There are several properties that can be changed in order to customize the behaviour. By default, the reader uses the comma char to separate fields and uses double quotation marks. See the property functions for more information.

If the data is too big to be read at once into memory, or if you want to parse the data line by line, you can also use the parser functions parse_line() and parse_field() directly.

Definition at line 70 of file utils/formats/csv/reader.hpp.

Public Member Functions

 CsvReader ()=default

 CsvReader (CsvReader &&)=default

 CsvReader (CsvReader const &)=default

 ~CsvReader ()=default

std::string const & comment_chars () const
 Return the currently set chars that are used to mark comment lines.

CsvReader & comment_chars (std::string const &chars)
 Set chars that are used to mark comment lines.

bool merge_separators () const
 Return the current setting whether consecutive separators are merged or not.

CsvReader & merge_separators (bool value)
 Set whether consecutive separator chars are merged or whether each of them creates a new field.

CsvReader & operator= (CsvReader &&)=default

CsvReader & operator= (CsvReader const &)=default

Table parse_document (utils::InputStream &input_stream) const
 Parse a whole CSV document and return its contents.

std::string parse_field (utils::InputStream &input_stream) const
 Parse one field (i.e., one cell) of the CSV data and return it.

std::vector< std::string > parse_line (utils::InputStream &input_stream) const
 Parse one line of the CSV data and return it.

std::string const & quotation_chars () const
 Return the currently set chars for quoting strings in fields.

CsvReader & quotation_chars (std::string const &chars)
 Set the chars that are used for quoting strings in fields.

Table read (std::shared_ptr< BaseInputSource > source) const
 Read CSV data from a source and return it as a table, using a vector per line, containing a vector of fields found on that line.

std::string const & separator_chars () const
 Return the currently set chars used to separate fields of the CSV data.

CsvReader & separator_chars (std::string const &chars)
 Set the chars used to separate fields of the CSV data.

bool skip_empty_lines () const
 Return whether empty lines are currently skipped.

CsvReader & skip_empty_lines (bool value)
 Set whether to skip empty lines.

std::string const & trim_chars () const
 Return the currently set chars that are trimmed from the start and end of each field.

CsvReader & trim_chars (std::string const &chars)
 Set chars that are trimmed from the start and end of each field.

bool use_escapes () const
 Return whether backslash escape sequences are used.

CsvReader & use_escapes (bool value)
 Set whether to use backslash escape sequences.

bool use_twin_quotes () const
 Return whether to interpret two consecutive quotation marks as a single ("escaped") one.

CsvReader & use_twin_quotes (bool value)
 Set whether to interpret two consecutive quotation marks as a single ("escaped") one.
 

Public Types

using Field = std::string
 
using Line = std::vector< Field >
 
using Table = std::vector< Line >
 

Constructor & Destructor Documentation

◆ CsvReader() [1/3]

CsvReader ( )
default

◆ ~CsvReader()

~CsvReader ( )
default

◆ CsvReader() [2/3]

CsvReader ( CsvReader const &  )
default

◆ CsvReader() [3/3]

CsvReader ( CsvReader &&  )
default

Member Function Documentation

◆ comment_chars() [1/2]

std::string const& comment_chars ( ) const
inline

Return the currently set chars that are used to mark comment lines.

See the setter of this function for more details.

Definition at line 181 of file utils/formats/csv/reader.hpp.

◆ comment_chars() [2/2]

CsvReader& comment_chars ( std::string const &  chars)
inline

Set chars that are used to mark comment lines.

By default, no chars are used, that is, no line is interpreted as a comment. Use this function to change that behaviour, e.g., to use # as the marker for comment lines. All lines starting with any of the set chars are then skipped while reading. The char has to be the first char on the line, that is, no leading blanks are allowed.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 169 of file utils/formats/csv/reader.hpp.
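
A minimal sketch of the documented comment rule, assuming the lines have already been split off from the input: a line counts as a comment only if its very first char is one of the comment chars. The function is_comment_line() is a hypothetical helper, not genesis code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the line's very first char is one of the
// comment chars. Leading blanks are not skipped, matching the documented rule.
// With an empty set of comment chars (the default), no line is a comment.
bool is_comment_line( std::string const& line, std::string const& comment_chars )
{
    return !line.empty() && comment_chars.find( line[0] ) != std::string::npos;
}
```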

◆ merge_separators() [1/2]

bool merge_separators ( ) const
inline

Return the current setting whether consecutive separators are merged or not.

See the setter of this function for details.

Definition at line 329 of file utils/formats/csv/reader.hpp.

◆ merge_separators() [2/2]

CsvReader& merge_separators ( bool  value)
inline

Set whether consecutive separator chars are merged or whether each of them creates a new field.

Default is false. Usually, CSV data has the same number of columns for the whole dataset. Thus, empty fields will result in consecutive separator chars. When this value is set to false, those fields are correctly parsed into empty fields.

It might however be useful to not create separate empty fields when consecutive separator chars are encountered. This is particularly the case if spaces or tabs are used as separators. In this case it makes sense to have more than one of them consecutively in order to align the columns of the data. For such datasets, this value can be set to true.

To put it in other words, this value determines whether empty strings resulting from adjacent separator chars are excluded from the output.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 318 of file utils/formats/csv/reader.hpp.
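
The difference between the two settings can be sketched with a hypothetical splitter, split_fields(), that ignores quoting and only models separator handling. It is an illustration of the documented semantics, not the genesis implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, ignoring quotes: with merge == false, every separator
// char starts a new (possibly empty) field; with merge == true, a run of
// consecutive separator chars is treated as a single separator.
std::vector<std::string> split_fields(
    std::string const& line, std::string const& separators, bool merge
) {
    std::vector<std::string> fields;
    std::string current;
    bool prev_was_sep = false;
    for( char c : line ) {
        bool const is_sep = separators.find( c ) != std::string::npos;
        if( is_sep ) {
            // Only close a field if we are not inside a run that gets merged.
            if( !( merge && prev_was_sep )) {
                fields.push_back( current );
                current.clear();
            }
        } else {
            current += c;
        }
        prev_was_sep = is_sep;
    }
    fields.push_back( current );
    return fields;
}
```

For example, the space-aligned line `x   y` (three spaces) yields four fields without merging, but two fields with merging enabled.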

◆ operator=() [1/2]

CsvReader& operator= ( CsvReader &&  )
default

◆ operator=() [2/2]

CsvReader& operator= ( CsvReader const &  )
default

◆ parse_document()

CsvReader::Table parse_document ( utils::InputStream & input_stream) const

Parse a whole CSV document and return its contents.

Definition at line 64 of file utils/formats/csv/reader.cpp.

◆ parse_field()

std::string parse_field ( utils::InputStream & input_stream) const

Parse one field (i.e., one cell) of the CSV data and return it.

This function reads from a given input stream until the column separator or the end of the line or the end of the stream is found. It furthermore trims the necessary chars from the beginning and end of the field, and handles quoted strings according to the settings of the CsvReader.

The stream is left at either the separator char, the new line char, or the end of the file, depending on which occurs first.

See trim_chars(), quotation_chars(), separator_chars(), use_escapes() and use_twin_quotes() to change the behaviour of this function.

Definition at line 86 of file utils/formats/csv/reader.cpp.
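
The stopping behaviour described above can be sketched on an in-memory string instead of a utils::InputStream, with an index pos standing in for the stream position. The function parse_field_sketch() is hypothetical. It handles separator and quotation chars, but omits trimming and escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of the parse_field() contract on an in-memory string.
// Reads from `pos` until an unquoted separator char, a new line, or the end of
// the text, and leaves `pos` at that stopping point. Quotation chars are
// removed; inside a quoted part every char, including '\n', is kept.
// Trimming and escape handling are left out for brevity.
std::string parse_field_sketch(
    std::string const& text, std::size_t& pos,
    std::string const& separator_chars = ",",
    std::string const& quotation_chars = "\""
) {
    std::string field;
    char open_quote = '\0';  // '\0' means: currently not inside a quoted part
    while( pos < text.size() ) {
        char const c = text[pos];
        if( open_quote != '\0' ) {
            if( c == open_quote ) {
                open_quote = '\0';  // closing quotation char
            } else {
                field += c;
            }
        } else if( c == '\n' || separator_chars.find( c ) != std::string::npos ) {
            break;                  // stop at the separator or the new line
        } else if( quotation_chars.find( c ) != std::string::npos ) {
            open_quote = c;         // opening quotation char
        } else {
            field += c;
        }
        ++pos;
    }
    return field;
}
```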

◆ parse_line()

std::vector< std::string > parse_line ( utils::InputStream & input_stream) const

Parse one line of the CSV data and return it.

This function parses a whole line using parse_field() until the new line char (or the end of the stream) is found. The fields are returned in a vector. The stream is left at either the next char after the new line char or the end of the file, if there is no new line.

See merge_separators() to change the behaviour of this function.

Definition at line 160 of file utils/formats/csv/reader.cpp.
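
The control flow of reading one line, namely collecting fields until the new line and then stepping past it, can be sketched as follows. The function parse_line_sketch() is a hypothetical helper that reads fields without quote handling and uses a string index in place of the input stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of the parse_line() control flow on a string buffer:
// collect fields until a new line (or the end), then step past the new line.
// Fields are read without quote handling to keep the loop visible.
std::vector<std::string> parse_line_sketch(
    std::string const& text, std::size_t& pos, char separator = ','
) {
    std::vector<std::string> fields;
    while( true ) {
        std::string field;
        while( pos < text.size() && text[pos] != separator && text[pos] != '\n' ) {
            field += text[pos];
            ++pos;
        }
        fields.push_back( field );
        if( pos < text.size() && text[pos] == separator ) {
            ++pos;      // skip the separator and read the next field
            continue;
        }
        break;          // stopped at '\n' or at the end of the text
    }
    if( pos < text.size() ) {
        ++pos;          // step past the '\n', so pos is at the next line
    }
    return fields;
}
```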

◆ quotation_chars() [1/2]

std::string const& quotation_chars ( ) const
inline

Return the currently set chars for quoting strings in fields.

See the setter of this function for more details.

Definition at line 239 of file utils/formats/csv/reader.hpp.

◆ quotation_chars() [2/2]

CsvReader& quotation_chars ( std::string const &  chars)
inline

Set the chars that are used for quoting strings in fields.

By default, the double quotation mark char " is used as quotation mark. Any other set of chars can be used instead, for example a combination of single and double quotation marks by providing '" (a single and a double quotation mark) to this function.

Within a quoted part, any char can appear, even new lines. However, in order to use the quotation mark itself, it has to be escaped. See use_escapes() and use_twin_quotes() for changing the behaviour of escaping with backslashes and with twin quotation marks.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 227 of file utils/formats/csv/reader.hpp.
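
The following sketch illustrates quoting with a set of quotation chars such as '" (single and double quotation mark): a quoted part is closed only by the same kind of char that opened it, while other quotation chars and new lines inside it are kept. The function read_quoted() is a hypothetical helper and does not handle escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: `pos` must point at an opening quotation char (one of
// `quote_chars`). Returns the text up to the matching closing char of the same
// kind and leaves `pos` just past it. Other quote kinds and new lines inside
// are kept literally. An unterminated quote simply runs to the end here; a
// real parser would likely report an error instead.
std::string read_quoted(
    std::string const& text, std::size_t& pos, std::string const& quote_chars
) {
    assert( pos < text.size() );
    char const open = text[pos];
    assert( quote_chars.find( open ) != std::string::npos );
    ++pos;
    std::string result;
    while( pos < text.size() && text[pos] != open ) {
        result += text[pos];
        ++pos;
    }
    if( pos < text.size() ) {
        ++pos;  // skip the closing quotation char
    }
    return result;
}
```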

◆ read()

CsvReader::Table read ( std::shared_ptr< BaseInputSource > source) const

Read CSV data from a source and return it as a table, using a vector per line, containing a vector of fields found on that line.

Use functions such as utils::from_file() and utils::from_string() to conveniently get an input source that can be used here.

Definition at line 54 of file utils/formats/csv/reader.cpp.

◆ separator_chars() [1/2]

std::string const& separator_chars ( ) const
inline

Return the currently set chars used to separate fields of the CSV data.

See the setter of this function for more details.

Definition at line 271 of file utils/formats/csv/reader.hpp.

◆ separator_chars() [2/2]

CsvReader& separator_chars ( std::string const &  chars)
inline

Set the chars used to separate fields of the CSV data.

By default, the comma char (,) is used. Any other set of chars can be used instead, for example a combination of tabs and vertical bars by providing \t| to this function.

Caveat: If more than one char is set, each of them separates fields on its own. That is, the string provided to this function is not treated as one multi-char delimiter; instead, each of its chars acts as a separator.

See merge_separators() to set whether consecutive separator chars are merged.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 259 of file utils/formats/csv/reader.hpp.
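
To make the caveat concrete, the hypothetical split_any_of() helper below (not part of genesis) treats its separator string as a set of chars, in the way separator_chars() is described. Quoting and merging are ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: the separator string is a *set* of chars, so any one
// of them ends a field (no quoting, no merging of consecutive separators).
std::vector<std::string> split_any_of(
    std::string const& line, std::string const& separator_chars
) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while( true ) {
        auto const end = line.find_first_of( separator_chars, start );
        if( end == std::string::npos ) {
            fields.push_back( line.substr( start ));  // last field
            break;
        }
        fields.push_back( line.substr( start, end - start ));
        start = end + 1;
    }
    return fields;
}
```

Note that `a\t|b` yields three fields, with an empty one in the middle, because the tab and the bar each end a field.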

◆ skip_empty_lines() [1/2]

bool skip_empty_lines ( ) const
inline

Return whether empty lines are currently skipped.

See the setter of this function for more details.

Definition at line 295 of file utils/formats/csv/reader.hpp.

◆ skip_empty_lines() [2/2]

CsvReader& skip_empty_lines ( bool  value)
inline

Set whether to skip empty lines.

Default is false. If set to true, all lines that are empty (that is, no content, or just consisting of spaces and tabs) are skipped while reading.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 284 of file utils/formats/csv/reader.hpp.
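
The documented notion of an empty line can be expressed as a one-line check. The function is_blank_line() is a hypothetical helper illustrating the rule, not the genesis implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the documented rule: a line counts as empty
// if it has no content or consists only of spaces and tabs.
bool is_blank_line( std::string const& line )
{
    return line.find_first_not_of( " \t" ) == std::string::npos;
}
```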

◆ trim_chars() [1/2]

std::string const& trim_chars ( ) const
inline

Return the currently set chars that are trimmed from the start and end of each field.

See the setter of this function for more details.

Definition at line 207 of file utils/formats/csv/reader.hpp.

◆ trim_chars() [2/2]

CsvReader& trim_chars ( std::string const &  chars)
inline

Set chars that are trimmed from the start and end of each field.

By default, no chars are trimmed. Use this function to change that behaviour, e.g., to trim spaces and tabs. Be aware that according to some CSV definitions, blanks are considered to be part of the field and should not be trimmed.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 195 of file utils/formats/csv/reader.hpp.
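
A minimal sketch of trimming a set of chars from both ends of a field. The function trim_field() is hypothetical; when called with an empty set, as in the default setting, it returns the field unchanged.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes any of `chars` from both ends of `field`.
// With an empty `chars` (the documented default), the field is unchanged.
std::string trim_field( std::string const& field, std::string const& chars )
{
    auto const first = field.find_first_not_of( chars );
    if( first == std::string::npos ) {
        return "";  // the field is empty or consists only of trim chars
    }
    auto const last = field.find_last_not_of( chars );
    return field.substr( first, last - first + 1 );
}
```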

◆ use_escapes() [1/2]

bool use_escapes ( ) const
inline

Return whether backslash escape sequences are used.

See the setter of this function for details.

Definition at line 357 of file utils/formats/csv/reader.hpp.

◆ use_escapes() [2/2]

CsvReader& use_escapes ( bool  value)
inline

Set whether to use backslash escape sequences.

Default is false. If set to true, character sequences of \x (backslash and some other char) are turned into the respective string form, according to the rules of deescape(). Also, see parse_quoted_string() for more information on escaping.

This works inside and outside of quoted strings. In order to create new lines within a field, either the sequence \n (backslash n) can be used, or a backslash at the end of the line.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 346 of file utils/formats/csv/reader.hpp.
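
The exact rules are those of genesis' deescape(), which are not reproduced here. As an assumption for illustration only, the hypothetical deescape_sketch() below turns \n, \t and \r into the corresponding control chars and lets any other escaped char stand for itself. This also covers a backslash placed directly before a line break, which then yields a new line within the field.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of backslash de-escaping. Assumed rules (not taken from
// genesis' deescape()): "\n", "\t" and "\r" become control chars; any other
// escaped char, including a literal line break, stands for itself.
std::string deescape_sketch( std::string const& text )
{
    std::string result;
    for( std::string::size_type i = 0; i < text.size(); ++i ) {
        if( text[i] != '\\' || i + 1 == text.size() ) {
            result += text[i];  // ordinary char, or a lone trailing backslash
            continue;
        }
        char const next = text[++i];  // the char after the backslash
        switch( next ) {
            case 'n': result += '\n'; break;
            case 't': result += '\t'; break;
            case 'r': result += '\r'; break;
            default:  result += next; break;
        }
    }
    return result;
}
```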

◆ use_twin_quotes() [1/2]

bool use_twin_quotes ( ) const
inline

Return whether to interpret two consecutive quotation marks as a single ("escaped") one.

See the setter of this function for details.

Definition at line 393 of file utils/formats/csv/reader.hpp.

◆ use_twin_quotes() [2/2]

CsvReader& use_twin_quotes ( bool  value)
inline

Set whether to interpret two consecutive quotation marks as a single ("escaped") one.

Default is true. Use this setting in order to be able to escape quotation marks by doubling them, which is a common variant in CSV data. In other words, whenever two consecutive quotation marks are encountered, they are turned into one (thus, the first one "escapes" the second). This works both inside and outside of regularly quoted parts. That is, the following two fields are interpreted the same:

"My ""old"" friend"
My ""old"" friend

This also works in addition to normal backslash escape sequences, see use_escapes() for more on this.

See quotation_chars() to set which chars are interpreted as quotation marks.

The function returns a reference to the CsvReader object in order to allow a fluent interface.

Definition at line 382 of file utils/formats/csv/reader.hpp.
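
The two example fields above can be reproduced with a hypothetical unquote_twin() helper that only models twin quotes and quote removal within one field, ignoring separators and backslash escapes. It is a sketch, not the genesis implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of twin-quote handling for a single field (no separator
// or escape handling). A doubled quote char becomes one literal quote char,
// inside or outside a quoted part; a single quote char only opens or closes
// a quoted part and is dropped.
std::string unquote_twin( std::string const& field, char quote = '"' )
{
    std::string result;
    for( std::size_t i = 0; i < field.size(); ++i ) {
        if( field[i] != quote ) {
            result += field[i];
        } else if( i + 1 < field.size() && field[i + 1] == quote ) {
            result += quote;  // twin quote: keep one, skip the other
            ++i;
        }
        // else: a single quotation char is a delimiter and is not copied.
    }
    return result;
}
```

Under this simplification, a field consisting of just "" is read as a single literal quotation mark.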

Member Typedef Documentation

◆ Field

using Field = std::string

Definition at line 78 of file utils/formats/csv/reader.hpp.

◆ Line

using Line = std::vector<Field>

Definition at line 79 of file utils/formats/csv/reader.hpp.

◆ Table

using Table = std::vector<Line>

Definition at line 80 of file utils/formats/csv/reader.hpp.


The documentation for this class was generated from the following files:

utils/formats/csv/reader.hpp
utils/formats/csv/reader.cpp