A library for working with phylogenetic and population genetic data.
v0.27.0
InputStream Class Reference

#include <genesis/utils/io/input_stream.hpp>

Detailed Description

Stream interface for reading data from an InputSource, that keeps track of line and column counters.

This class provides similar functionality to std::istream, but has a different way of handling the stream and characters. The main differences are:

  • The stream is not automatically advanced after reading a char. Otherwise, the line and column counters would already point to the next char while the current one is still being processed. Thus, advance() or the increment operator++() has to be called to move to the next char in the stream.
  • The handling of line feed chars (LF or \n, as used in Unix-like systems) and carriage return chars (CR or \r, the new line delimiter of classic Mac OS, and part of the CR+LF new line used on Windows) differs from std::istream: CR and LF chars, as well as the whole CR+LF combination, are each turned into a single line feed char (\n) by this stream. This ensures that all new line delimiters are internally represented as one LF, independently of the file format, which makes parsing much easier.

It has two member functions, line() and column(), that return the corresponding values for the current stream position. Also, at() returns a textual representation of the current position. Furthermore, the member function current() provides a checked version of the dereference operator.
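The stream semantics described above can be illustrated with a small, self-contained sketch. MiniStream below is a hypothetical stand-in that reads from an in-memory std::string instead of an InputSource; it is not the genesis implementation, only a model of the documented behavior: the stream never advances on its own, CR, LF, and CR+LF are each reported as a single \n, and line() and column() start at 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified model of the documented InputStream semantics,
// reading from an in-memory string. Not the genesis implementation.
class MiniStream
{
public:
    explicit MiniStream( std::string data )
        : data_( std::move( data ))
    {}

    // Like good(): true while there is a char left to read.
    bool good() const
    {
        return pos_ < data_.size();
    }

    // Like operator*(): the current char. A CR is reported as '\n'.
    // The stream does not move; call advance() for that.
    char operator*() const
    {
        assert( good() );
        return data_[ pos_ ] == '\r' ? '\n' : data_[ pos_ ];
    }

    // Like advance(): move past the current char and update the counters.
    // CR, LF, and CR+LF each count as exactly one new line.
    MiniStream& advance()
    {
        assert( good() );
        char const c = data_[ pos_ ];
        ++pos_;
        if( c == '\r' && pos_ < data_.size() && data_[ pos_ ] == '\n' ) {
            ++pos_;
        }
        if( c == '\r' || c == '\n' ) {
            ++line_;
            column_ = 1;
        } else {
            ++column_;
        }
        return *this;
    }

    // Like get_char(): return the current char, then advance.
    char get_char()
    {
        char const c = **this;
        advance();
        return c;
    }

    std::size_t line() const   { return line_; }
    std::size_t column() const { return column_; }

    // Like at(): the current position as "line:column".
    std::string at() const
    {
        return std::to_string( line_ ) + ":" + std::to_string( column_ );
    }

private:
    std::string data_;
    std::size_t pos_    = 0;
    std::size_t line_   = 1;
    std::size_t column_ = 1;
};
```

For the input "a\r\nb", this sketch reports 'a' at 1:1, a single '\n' at 1:2, and 'b' at 2:1. With the real class, the documented members suggest a reading loop of the form while( stream ) { char c = *stream; ...; ++stream; }.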

Implementation details inspired by fast-cpp-csv-parser by Ben Strasser, see also Acknowledgements.

Definition at line 81 of file input_stream.hpp.

Public Member Functions

 InputStream ()
 
 InputStream (self_type &&other)
 
 InputStream (self_type const &)=delete
 
 InputStream (std::shared_ptr< BaseInputSource > input_source)
 
 ~InputStream ()
 
self_type & advance ()
 Move to the next char in the stream and advance the counters.
 
std::string at () const
 Return a textual representation of the current input position in the form "line:column".
 
std::pair< char const *, size_t > buffer ()
 Direct access to the internal buffer.
 
size_t column () const
 Return the current column of the input stream.
 
char current () const
 Return the current char, with some checks.
 
bool eof () const
 Return true iff the input reached its end.
 
char get_char ()
 Extract a single char from the input.
 
std::string get_line ()
 Read the current line and move to the beginning of the next.
 
void get_line (std::string &target)
 Read the current line, append it to the target, and move to the beginning of the next line.
 
bool good () const
 Return true iff the input is good (not end of data) and can be read from.
 
void jump_unchecked (size_t n)
 Jump forward in the stream by a given number of chars.
 
size_t line () const
 Return the current line of the input stream.
 
 operator bool () const
 Return true iff the input is good (not end of data) and can be read from. Shortcut for good().
 
char operator* () const
 Dereference operator. Return the current char.
 
self_type & operator++ ()
 Move to the next char in the stream. Shortcut for advance().
 
self_type & operator= (self_type &&other)
 
self_type & operator= (self_type const &)=delete
 
template<class T >
T parse_integer ()
 Alias for parse_signed_integer().
 
template<class T >
T parse_signed_integer ()
 Read a signed integer from a stream and return it.
 
template<class T >
T parse_unsigned_integer ()
 Read an unsigned integer from a stream and return it.
 
template<>
size_t parse_unsigned_integer ()
 Read an unsigned integer from a stream into a size_t and return it.
 
char read_char_or_throw (char const criterion)
 Lexing function that reads a single char from the stream and checks whether it equals the provided one.
 
char read_char_or_throw (std::function< bool(char)> criterion)
 Lexing function that reads a single char from the stream and checks whether it fulfills the provided criterion.
 
std::string source_name () const
 Get the input source name where this stream reads from.
 

Public Types

using self_type = InputStream
 
using value_type = char
 

Static Public Attributes

static const size_t BlockLength = 1 << 22
 Block length for internal buffering. More...
 

Constructor & Destructor Documentation

◆ InputStream() [1/4]

InputStream ( )
inline

Definition at line 103 of file input_stream.hpp.

◆ InputStream() [2/4]

InputStream ( std::shared_ptr< BaseInputSource > input_source )
inline explicit

Definition at line 113 of file input_stream.hpp.

◆ ~InputStream()

~InputStream ( )
inline

Definition at line 120 of file input_stream.hpp.

◆ InputStream() [3/4]

InputStream ( self_type const &  )
delete

◆ InputStream() [4/4]

InputStream ( self_type &&  other)
inline

Definition at line 127 of file input_stream.hpp.

Member Function Documentation

◆ advance()

self_type& advance ( )
inline

Move to the next char in the stream and advance the counters.

Definition at line 180 of file input_stream.hpp.

◆ at()

std::string at ( ) const
inline

Return a textual representation of the current input position in the form "line:column".

Definition at line 481 of file input_stream.hpp.

◆ buffer()

std::pair<char const*, size_t> buffer ( )
inline

Direct access to the internal buffer.

This function returns a pointer to the internal buffer, as well as the length of the data that is currently buffered, so that pointer + length is past the end. This is meant for special file parsers that can optimize better when using it; but it is highly dangerous to use if you do not know exactly what you are doing!

The idea is as follows: With access to the buffer, parse data as needed, keeping track of how many chars have been processed. Then, use jump_unchecked() to update this stream to the new position (the char after the last one processed by the parser).

Caveat: Never parse and jump across new line characters (or, for that matter, carriage return characters, which won't be automatically converted when using the buffer directly)! This would invalidate the line counting!

Caveat: Never read past the end of the buffer; that is, never access the char at buffer + length (the first position past the returned length) or anything after it!
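The buffer()/jump_unchecked() pattern described above can be sketched as follows. scan_digits() is a hypothetical helper that operates on a pointer/length pair of the same shape that buffer() returns. It stops at the first non-digit, so it can neither cross a new line (digits are never \n or \r) nor read the char at pointer + length. Overflow checks are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: parse a run of decimal digits at the start of a
// pointer/length buffer, and return how many chars were consumed.
// Overflow is not checked in this sketch.
std::size_t scan_digits( std::pair<char const*, std::size_t> buf, std::size_t& value )
{
    assert( buf.first != nullptr || buf.second == 0 );
    char const* data = buf.first;
    std::size_t const length = buf.second;

    std::size_t i = 0;
    value = 0;
    // Test i < length first, so that data[ length ] is never accessed.
    // Only digits are consumed, so no '\n' or '\r' is ever skipped.
    while( i < length && data[ i ] >= '0' && data[ i ] <= '9' ) {
        value = 10 * value + static_cast<std::size_t>( data[ i ] - '0' );
        ++i;
    }
    return i;
}
```

With the real class, one would presumably obtain the pair via auto buf = stream.buffer(), call scan_digits( buf, value ), and then pass the returned count to stream.jump_unchecked(). The sketch does not handle a number that is cut off at the end of the currently buffered data; a real parser would need to handle that case, for example by falling back to the regular stream functions.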

Definition at line 545 of file input_stream.hpp.

◆ column()

size_t column ( ) const
inline

Return the current column of the input stream.

The counter starts with column 1 for each line of the input stream. New line characters \n are included in counting and count as the last character of a line.
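To make the counting rule concrete, the following sketch computes the (line, column) pair that the rule assigns to each char of already-normalized text. positions() is a hypothetical helper, not part of the class. For "ab\ncd", it yields (1,1), (1,2), (1,3) for a, b, and the new line, then (2,1), (2,2) for c and d.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the (line, column) pair of each char, following the
// documented rule. Input new lines are assumed to be normalized to '\n'.
std::vector<std::pair<std::size_t, std::size_t>> positions( std::string const& text )
{
    std::vector<std::pair<std::size_t, std::size_t>> result;
    std::size_t line = 1;
    std::size_t column = 1;
    for( char const c : text ) {
        result.emplace_back( line, column );
        if( c == '\n' ) {
            // The '\n' was counted as the last char of its line.
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }
    return result;
}
```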

Definition at line 472 of file input_stream.hpp.

◆ current()

char current ( ) const
inline

Return the current char, with some checks.

This function is similar to the dereference operator, but additionally performs two checks:

  • End of input: If this function is called when there is no more data left in the input, it throws a std::runtime_error.
  • Current char: This stream is meant for ASCII (or similar) text encodings with single-byte chars, and its output should be usable for lookup tables etc. Thus, this function ensures that the char is in the range [0, 127]. If not, a std::domain_error is thrown.

Usually, those two conditions are checked in the parser anyway, so in most cases the dereference operator is preferable.
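The two checks can be sketched as a free function on an in-memory string. checked_current() and its error messages are hypothetical; only the exception types and the [0, 127] range come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the checks documented for current(),
// applied to position pos of an in-memory string.
char checked_current( std::string const& data, std::size_t pos )
{
    // End of input: nothing left to read.
    if( pos >= data.size() ) {
        throw std::runtime_error( "Unexpected end of input." );
    }
    // Only chars in [0, 127] are accepted. Casting to unsigned char makes
    // the test work regardless of whether char is signed on this platform.
    char const c = data[ pos ];
    if( static_cast<unsigned char>( c ) > 127 ) {
        throw std::domain_error( "Invalid non-ASCII char in input." );
    }
    return c;
}
```

Note that std::domain_error derives from std::logic_error, not from std::runtime_error, so a caller that wants to handle both cases needs two catch clauses (or a catch of std::exception).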

Definition at line 162 of file input_stream.hpp.

◆ eof()

bool eof ( ) const
inline

Return true iff the input reached its end.

Definition at line 506 of file input_stream.hpp.

◆ get_char()

char get_char ( )
inline

Extract a single char from the input.

Return the current char and move to the next one.

Definition at line 224 of file input_stream.hpp.

◆ get_line() [1/2]

std::string get_line ( )
inline

Read the current line and move to the beginning of the next.

The function finds the end of the current line, starting from the current position, and returns the content, excluding the trailing new line char(s). The stream is left at the first char of the next line.
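The documented behavior can be sketched on an in-memory string with an explicit position. get_line_at() is a hypothetical helper; it assumes the text is already normalized so that every new line is a single \n, as the stream guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of get_line(): return the rest of the current line
// without its '\n', and move pos to the first char of the next line.
std::string get_line_at( std::string const& data, std::size_t& pos )
{
    assert( pos <= data.size() );
    std::size_t end = data.find( '\n', pos );
    if( end == std::string::npos ) {
        // Last line, without a trailing new line.
        end = data.size();
    }
    std::string line = data.substr( pos, end - pos );
    // Also skip the '\n' itself, if there is one.
    pos = ( end < data.size() ) ? end + 1 : end;
    return line;
}
```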

Definition at line 252 of file input_stream.hpp.

◆ get_line() [2/2]

void get_line ( std::string &  target)

Read the current line, append it to the target, and move to the beginning of the next line.

The function finds the end of the current line, starting from the current position, and appends the content to the given target, excluding the trailing new line char(s). The stream is left at the first char of the next line.

Definition at line 88 of file input_stream.cpp.

◆ good()

bool good ( ) const
inline

Return true iff the input is good (not end of data) and can be read from.

Definition at line 489 of file input_stream.hpp.

◆ jump_unchecked()

void jump_unchecked ( size_t  n)
inline

Jump forward in the stream by a given number of chars.

This is meant to update the stream position after using buffer() for direct parsing. See the caveats there!

In particular, we can never jump past the end of the current buffer, and must not jump across new lines. That is, this function is not meant as a way to jump to an arbitrary (later) position in a file!

Definition at line 561 of file input_stream.hpp.

◆ line()

size_t line ( ) const
inline

Return the current line of the input stream.

The counter starts at line 1 at the beginning of the input stream.

Definition at line 461 of file input_stream.hpp.

◆ operator bool()

operator bool ( ) const
inline explicit

Return true iff the input is good (not end of data) and can be read from. Shortcut for good().

Definition at line 498 of file input_stream.hpp.

◆ operator*()

char operator* ( ) const
inline

Dereference operator. Return the current char.

Definition at line 143 of file input_stream.hpp.

◆ operator++()

self_type& operator++ ( )
inline

Move to the next char in the stream. Shortcut for advance().

Definition at line 189 of file input_stream.hpp.

◆ operator=() [1/2]

InputStream & operator= ( self_type &&  other)

Definition at line 51 of file input_stream.cpp.

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
delete

◆ parse_integer()

T parse_integer ( )
inline

Alias for parse_signed_integer().

Definition at line 447 of file input_stream.hpp.

◆ parse_signed_integer()

T parse_signed_integer ( )
inline

Read a signed integer from a stream and return it.

The function expects a sequence of digits, possibly with a leading + or -. The first char after the sign (if any) has to be a digit, otherwise the function throws std::runtime_error. It stops reading at the first non-digit. If the value does not fit into the range of T, the function throws std::overflow_error or std::underflow_error, respectively.
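The documented rules can be sketched as a free function on an in-memory string. parse_signed() and its error messages are hypothetical, and the genesis implementation may well be structured differently. The overflow tests are done before each multiply-add, so no intermediate result ever leaves the range of T.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical sketch: optional '+' or '-', then at least one digit; stop at
// the first non-digit; throw overflow_error / underflow_error if the value
// does not fit into T.
template<class T>
T parse_signed( std::string const& data, std::size_t& pos )
{
    static_assert( std::is_signed<T>::value, "T must be a signed integer type" );
    bool negative = false;
    if( pos < data.size() && ( data[ pos ] == '+' || data[ pos ] == '-' )) {
        negative = ( data[ pos ] == '-' );
        ++pos;
    }
    if( pos >= data.size() || data[ pos ] < '0' || data[ pos ] > '9' ) {
        throw std::runtime_error( "Expected a digit." );
    }
    T value = 0;
    while( pos < data.size() && data[ pos ] >= '0' && data[ pos ] <= '9' ) {
        T const digit = static_cast<T>( data[ pos ] - '0' );
        if( negative ) {
            // Accumulate towards the negative side, so that the minimum
            // (e.g., -128 for signed char) is representable.
            if( value < ( std::numeric_limits<T>::min() + digit ) / 10 ) {
                throw std::underflow_error( "Integer underflow." );
            }
            value = static_cast<T>( value * 10 - digit );
        } else {
            if( value > ( std::numeric_limits<T>::max() - digit ) / 10 ) {
                throw std::overflow_error( "Integer overflow." );
            }
            value = static_cast<T>( value * 10 + digit );
        }
        ++pos;
    }
    return value;
}
```

Accumulating negative numbers towards the minimum, instead of negating at the end, avoids the asymmetry of two's complement ranges, where the magnitude of the minimum is one larger than the maximum.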

Definition at line 399 of file input_stream.hpp.

◆ parse_unsigned_integer() [1/2]

T parse_unsigned_integer ( )
inline

Read an unsigned integer from a stream and return it.

The function expects a sequence of digits. The current char in the stream has to be a digit, otherwise the function throws std::runtime_error. It stops reading at the first non-digit. If the value does not fit into the range of T, the function throws std::overflow_error.

There is also a specialization of this function for size_t, at the end of the header file, that skips the conversion check, for a little extra speed when the check is not needed.

Definition at line 366 of file input_stream.hpp.

◆ parse_unsigned_integer() [2/2]

size_t parse_unsigned_integer ( )
inline

Read an unsigned integer from a stream into a size_t and return it.

This template specialization is meant as yet another speedup for this data type, as it can work without casting and overflow checks. It is also the base function that is called by the other variants, which then perform the overflow check.
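The layering described above (a size_t base parser, plus typed wrappers that check the range) can be sketched as follows. parse_size_t() and parse_unsigned() are hypothetical names. Unlike the description above, this sketch still guards against overflowing size_t itself, and it assumes that T is no wider than size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical base parser: read decimal digits into a size_t.
std::size_t parse_size_t( std::string const& data, std::size_t& pos )
{
    if( pos >= data.size() || data[ pos ] < '0' || data[ pos ] > '9' ) {
        throw std::runtime_error( "Expected a digit." );
    }
    std::size_t value = 0;
    while( pos < data.size() && data[ pos ] >= '0' && data[ pos ] <= '9' ) {
        std::size_t const digit = static_cast<std::size_t>( data[ pos ] - '0' );
        // Guard against overflowing size_t itself.
        if( value > ( std::numeric_limits<std::size_t>::max() - digit ) / 10 ) {
            throw std::overflow_error( "Integer overflow." );
        }
        value = value * 10 + digit;
        ++pos;
    }
    return value;
}

// Hypothetical typed wrapper: parse into size_t, then check the range of T.
// Assumes T is an unsigned type no wider than size_t.
template<class T>
T parse_unsigned( std::string const& data, std::size_t& pos )
{
    static_assert( std::is_unsigned<T>::value, "T must be an unsigned integer type" );
    std::size_t const value = parse_size_t( data, pos );
    if( value > static_cast<std::size_t>( std::numeric_limits<T>::max() )) {
        throw std::overflow_error( "Integer overflow." );
    }
    return static_cast<T>( value );
}
```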

Definition at line 680 of file input_stream.hpp.

◆ read_char_or_throw() [1/2]

char read_char_or_throw ( char const  criterion)
inline

Lexing function that reads a single char from the stream and checks whether it equals the provided one.

If not, the function throws std::runtime_error. The stream is advanced by one position and the char is returned. For a similar function that checks the value of the current char but does not advance in the stream, see affirm_char_or_throw().

Definition at line 271 of file input_stream.hpp.

◆ read_char_or_throw() [2/2]

char read_char_or_throw ( std::function< bool(char)>  criterion)
inline

Lexing function that reads a single char from the stream and checks whether it fulfills the provided criterion.

If not, the function throws std::runtime_error. The stream is advanced by one position and the char is returned. For a similar function that checks the value of the current char but does not advance in the stream, see affirm_char_or_throw().

Definition at line 294 of file input_stream.hpp.
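Both overloads follow the same pattern: check, then advance by one and return the char. The sketch below models them as hypothetical free functions on an in-memory string with an explicit position; the function names mirror the members, but the signatures and error messages are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the overload that compares against a fixed char.
char read_char_or_throw( std::string const& data, std::size_t& pos, char const criterion )
{
    if( pos >= data.size() || data[ pos ] != criterion ) {
        throw std::runtime_error( std::string( "Expected char '" ) + criterion + "'." );
    }
    // Return the char and advance by one position.
    return data[ pos++ ];
}

// Hypothetical sketch of the overload that tests a predicate.
char read_char_or_throw(
    std::string const& data, std::size_t& pos, std::function<bool( char )> criterion
) {
    if( pos >= data.size() || !criterion( data[ pos ] )) {
        throw std::runtime_error( "Char does not fulfill the criterion." );
    }
    return data[ pos++ ];
}
```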

◆ source_name()

std::string source_name ( ) const
inline

Get the input source name where this stream reads from.

Depending on the type of input, this is either

  • "input string",
  • "input stream" or
  • "input file <filename>"

This is mainly useful for user output like log and error messages.

Definition at line 522 of file input_stream.hpp.

Member Typedef Documentation

◆ self_type

using self_type = InputStream

Definition at line 96 of file input_stream.hpp.

◆ value_type

using value_type = char

Definition at line 97 of file input_stream.hpp.

Member Data Documentation

◆ BlockLength

const size_t BlockLength = 1 << 22
static

Block length for internal buffering.

The buffer uses three blocks of this size (1 << 22 = 4,194,304 bytes, that is, 4 MiB each).
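The arithmetic behind the block size can be checked at compile time. block_length and total_buffer are illustrative names that recompute the documented values; they are not members of the class.

```cpp
#include <cassert>
#include <cstddef>

// Recompute the documented block size and the total buffer size.
constexpr std::size_t block_length = std::size_t{ 1 } << 22;  // same value as BlockLength
constexpr std::size_t total_buffer = 3 * block_length;       // three blocks

static_assert( block_length == std::size_t{ 4 } * 1024 * 1024, "one block is 4 MiB" );
static_assert( block_length == 4194304, "one block is 4,194,304 bytes" );
static_assert( total_buffer == 12582912, "three blocks are 12 MiB" );
```

So the buffering of a single stream occupies 12 MiB in total.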

Definition at line 94 of file input_stream.hpp.

