A library for working with phylogenetic and population genetic data.
v0.32.0
GenericInputStream< T, D > Class Template Reference

#include <genesis/utils/containers/generic_input_stream.hpp>

Detailed Description

template<class T, class D = EmptyGenericInputStreamData>
class genesis::utils::GenericInputStream< T, D >

Type erasure for iterators, using std::function to eliminate the underlying input type.

This class offers an abstraction that provides a uniform iterator type over a set of underlying iterators of different types. It expects a function (typically a lambda) that converts the underlying data into the desired type T, which is where the type erasure happens. The element currently being iterated over is stored in this class, and the lambda signals the end of the iteration by returning false.

Example:

// Convert from an iterator over VcfRecord to Variant.
auto beg = vcf_range.begin();
auto end = vcf_range.end();

// Create the conversion with type erasure via the lambda function.
auto generator = GenericInputStream<Variant>(
    [beg, end]( Variant& var ) mutable {
        if( beg != end ) {
            var = convert_to_variant( *beg );
            ++beg;
            return true;
        } else {
            return false;
        }
    }
);

// Iterate over generator.begin() and generator.end()
for( auto const& variant : generator ) {
    // process each Variant here
}

For iterators that do not offer begin() and end(), other input can be captured in the lambda instead:

// Use a pileup iterator, which does not offer begin and end.
auto it = SimplePileupInputIterator( utils::from_file( pileup_file_.value ), reader );
auto generator = GenericInputStream<Variant>(
    [it]( Variant& var ) mutable {
        if( it ) {
            var = *it;
            ++it;
            return true;
        } else {
            return false;
        }
    }
);

And accordingly for other underlying iterator types.
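To illustrate the type erasure outside of genesis, here is a minimal standalone sketch, not the library's implementation. It assumes only the contract described above: a function of type std::function<bool(T&)> that fills in the next element and returns false at the end. The helper names from_range(), from_stream(), and collect() are hypothetical; they show that a begin/end pair and a stream without begin()/end() both end up as the same erased type.

```cpp
#include <functional>
#include <istream>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Standalone sketch of the pull-function idea (not the genesis implementation):
// every source is wrapped in a std::function<bool(T&)> that fills in the next
// element and returns false once the source is exhausted.
template<class T>
using PullFunction = std::function<bool(T&)>;

// Adapter for a begin/end pair, analogous to the VcfRecord example above.
// Here the "conversion" step turns each number into a string.
template<class It>
PullFunction<std::string> from_range( It beg, It end )
{
    return [beg, end]( std::string& out ) mutable {
        if( beg == end ) {
            return false;
        }
        out = std::to_string( *beg );
        ++beg;
        return true;
    };
}

// Adapter for a stream, which has no begin()/end(), analogous to the pileup example.
// The stream is held by shared_ptr because std::function requires a copyable callable.
inline PullFunction<std::string> from_stream( std::shared_ptr<std::istream> in )
{
    return [in]( std::string& out ) {
        return static_cast<bool>( *in >> out );
    };
}

// Consumer code only sees the erased type and does not know where elements come from.
inline std::vector<std::string> collect( PullFunction<std::string> next )
{
    std::vector<std::string> result;
    std::string element;
    while( next( element )) {
        result.push_back( element );
    }
    return result;
}
```

Because collect() only sees PullFunction<std::string>, it does not need to be overloaded for new source types, which is the same motivation as for GenericInputStream<T>.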

In addition to the type T that we iterate over, for user convenience, we also offer a data storage variable of the template type D (typedef'd as GenericInputStream::Data). This data is provided at construction of the GenericInputStream, and can be accessed via the data() functions. It is a generic extra variable to store iterator-specific information. As the GenericInputStream is intended to be initializable with just a lambda function that yields the elements to traverse, there is otherwise no convenient way to access related information about the underlying iterator. For example, when iterating over a file, one might want to store the file name or other characteristics of the input in the data().
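A minimal standalone sketch of the idea behind the Data parameter, assuming a hypothetical SourceInfo struct as D and a hypothetical StreamWithData class (not the genesis GenericInputStream): the extra data travels alongside the element function and is reachable through data(), with both a mutable and a const overload, as documented for this class.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical per-source information, e.g. the file name mentioned above.
struct SourceInfo
{
    std::string file_name;
    std::size_t element_count;
};

// Standalone sketch (not the genesis class): the pull function is stored next to
// a user-defined Data object, so code that only holds the stream can still
// query information about the underlying source.
template<class T, class D>
class StreamWithData
{
public:
    StreamWithData( std::function<bool(T&)> get_element, D data )
        : get_element_( std::move( get_element ))
        , data_( std::move( data ))
    {}

    // Fill the next element; returns false at the end of the input.
    bool next( T& element )
    {
        return get_element_( element );
    }

    D& data() { return data_; }
    D const& data() const { return data_; }

private:
    std::function<bool(T&)> get_element_;
    D data_;
};
```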

The class furthermore offers filters and transformations of the underlying iterator data, via the functions add_filter(), add_transform(), and add_transform_filter(). These can be freely mixed, and are executed as one combined list in the order in which they were added (for example, first a filter, then a transformation, then a filter again). This makes it easy to skip elements of the underlying iterator without adding another layer of abstraction.
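One way to execute filters and transformations as a single ordered list is to give every step the same signature, bool(T&), and to wrap plain transformations so that they always return true. The following standalone sketch assumes this design; the StepChain class and the run() function are hypothetical, and genesis may store its steps differently.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Standalone sketch: filters, transformations, and transform-filters are all
// stored as bool(T&) steps in one list, executed in insertion order.
template<class T>
class StepChain
{
public:
    StepChain& add_filter( std::function<bool(T const&)> filter )
    {
        steps_.push_back( [filter]( T& element ) { return filter( element ); });
        return *this;
    }

    // A transformation never rejects, so its wrapper always returns true.
    StepChain& add_transform( std::function<void(T&)> transform )
    {
        steps_.push_back( [transform]( T& element ) { transform( element ); return true; });
        return *this;
    }

    StepChain& add_transform_filter( std::function<bool(T&)> step )
    {
        steps_.push_back( std::move( step ));
        return *this;
    }

    // Apply all steps in order; stop at the first step that rejects the element.
    bool apply( T& element ) const
    {
        for( auto const& step : steps_ ) {
            if( !step( element )) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<std::function<bool(T&)>> steps_;
};

// Keep only the elements that pass the whole chain, in their transformed form.
template<class T>
std::vector<T> run( std::vector<T> input, StepChain<T> const& chain )
{
    std::vector<T> output;
    for( auto& element : input ) {
        if( chain.apply( element )) {
            output.push_back( element );
        }
    }
    return output;
}
```

For example, chaining add_filter() for even numbers, add_transform() multiplying by 10, and add_filter() for values below 50 turns the input 1 to 6 into 20 and 40.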

As an additional convenience, the class offers observers for each element, as well as callbacks for the beginning and end of the iteration. These are meant to reduce code duplication in user code, and can be combined with each other. For example, an observer can be added that (via a lambda capture) collects statistics of the elements being processed, which can then be reported at the end of the iteration. This could of course also be achieved by adding this functionality to the loop body and after the loop that runs this iterator. However, in our tool https://github.com/lczech/grenedalf, for example, we have setup functions for a GenericInputStream (of type VariantInputStream) that are re-used across commands. Instead of duplicating code or repeating function calls in each of those commands, we only need to add the callbacks in the shared code that creates the iterator, and they are then used in every command automatically.
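The pattern of registering observers and callbacks once, in shared set-up code, can be sketched as follows. This is a hypothetical standalone ObservedLoop struct, not the genesis API; in particular, the real callbacks receive the GenericInputStream itself, while this sketch uses plain void() functions for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Standalone sketch (not the genesis class) of observers and begin/end callbacks
// attached to an iteration: set-up code registers them once, and every loop that
// drives the iteration gets them without repeating that code.
template<class T>
struct ObservedLoop
{
    std::vector<std::function<void()>> begin_callbacks;
    std::vector<std::function<void(T const&)>> observers;
    std::vector<std::function<void()>> end_callbacks;

    // Drive the iteration over `input`, calling `body` for each element.
    void run( std::vector<T> const& input, std::function<void(T const&)> const& body ) const
    {
        for( auto const& callback : begin_callbacks ) {
            callback();
        }
        for( auto const& element : input ) {
            for( auto const& observer : observers ) {
                observer( element );
            }
            body( element );
        }
        for( auto const& callback : end_callbacks ) {
            callback();
        }
    }
};
```

A typical use is an observer that increments a counter captured by reference, and an end callback that reports the final count; the loop body itself stays free of this bookkeeping.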

Lastly, the class offers block buffering in a separate thread, to speed up processing. This takes care of the underlying iterator processing (including potential file parsing etc.), and buffers blocks of elements, so that the user of this class has faster access to them. For example, when processing data along a genome with lots of computations per position, it makes sense to run the file reading in a separate thread and buffer positions as needed, which this class does automatically. This can be activated by setting block_size() to the intended number of elements to be buffered. By default, this is set to 0, meaning that no buffering is done. Note that small buffer sizes can induce overhead for the thread synchronisation; we hence recommend block sizes of 1000 or greater, as needed.
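The read-ahead idea behind block buffering can be sketched with std::async: while the caller processes one block, the next block is already being filled on another thread. This standalone sketch is an assumed simplification; genesis uses its own utils::ThreadPool, and the function names read_block() and for_each_buffered() are hypothetical. As in the class, a block size of 0 disables buffering.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Read up to block_size elements from the pull function into a new block.
template<class T>
std::vector<T> read_block( std::function<bool(T&)>& get_element, std::size_t block_size )
{
    std::vector<T> block;
    block.reserve( block_size );
    T element{};
    while( block.size() < block_size && get_element( element )) {
        block.push_back( element );
    }
    return block;
}

// Standalone sketch of block buffering with one read-ahead block (not the genesis code).
template<class T>
void for_each_buffered(
    std::function<bool(T&)> get_element,
    std::size_t block_size,
    std::function<void(T const&)> const& body
) {
    if( block_size == 0 ) {
        // No buffering: pull and process elements on the calling thread.
        T element{};
        while( get_element( element )) {
            body( element );
        }
        return;
    }

    // Only one background task runs at a time, and it is the only code that
    // calls get_element, so the pull function needs no locking.
    auto load = [&]() { return read_block( get_element, block_size ); };
    auto next = std::async( std::launch::async, load );
    while( true ) {
        std::vector<T> current = next.get();
        if( current.empty() ) {
            break;
        }
        // Start reading the following block while the current one is processed.
        next = std::async( std::launch::async, load );
        for( auto const& element : current ) {
            body( element );
        }
    }
}
```

Each handover between the threads costs a synchronisation via the future, which is why very small blocks add relative overhead, in line with the recommendation above.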

We are aware that with all this extra functionality, the class is slightly overloaded, and that the filters and the block buffering would typically go in separate classes for modularity. However, we are taking user convenience and speed into account here: instead of having to add filters or a block buffer wrapper around each input iterator that is then wrapped in a GenericInputStream anyway, we take care of this in one place; this also reduces levels of abstraction, and hence (hopefully) increases processing speed.

See also
VariantInputStream for a use case of this iterator that allows traversing different input file types that are all convertible to Variant.

Definition at line 163 of file generic_input_stream.hpp.

Public Member Functions

GenericInputStream ()=default

GenericInputStream (self_type &&)=default

GenericInputStream (self_type const &)=default

GenericInputStream (std::function< bool(value_type &)> get_element, Data &&data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
    Create an iterator over some underlying content.

GenericInputStream (std::function< bool(value_type &)> get_element, Data const &data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
    Create an iterator over some underlying content.

GenericInputStream (std::function< bool(value_type &)> get_element, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
    Create an iterator over some underlying content.

~GenericInputStream ()=default

self_type & add_begin_callback (std::function< void(GenericInputStream const &)> const &callback)
    Add a callback function that is executed when beginning the iteration.

self_type & add_end_callback (std::function< void(GenericInputStream const &)> const &callback)
    Add a callback function that is executed when the end of the iteration is reached.

self_type & add_filter (std::function< bool(T const &)> const &filter)
    Add a filter function that is applied to each element of the iteration.

self_type & add_on_enter_observer (std::function< void(T const &)> const &observer)
    Add an observer function that is executed when the iterator moves to a new element during the iteration.

self_type & add_on_leave_observer (std::function< void(T const &)> const &observer)
    Add an observer function that is executed when the iterator moves away from an element during the iteration.

self_type & add_transform (std::function< void(T &)> const &transform)
    Add a transformation function that is applied to each element of the iteration.

self_type & add_transform_filter (std::function< bool(T &)> const &filter)
    Add a transformation and filter function that is applied to each element of the iteration.

Iterator begin ()

size_t block_size () const
    Get the currently set block size used for buffering the input data.

self_type & block_size (size_t value)
    Set the block size used for buffering the input data.

self_type & clear_callbacks ()
    Clear all functions that have been added via add_begin_callback() and add_end_callback().

self_type & clear_filters_and_transformations ()
    Clear all filters and transformations.

self_type & clear_observers ()
    Clear all functions that are executed on incrementing to the next element.

Data & data ()
    Access the data stored in the iterator.

Data const & data () const
    Access the data stored in the iterator.

Iterator end ()

operator bool () const
    Return whether a function to get elements was assigned to this generator, that is, whether it is default constructed (false) or not (true).

self_type & operator= (self_type &&)=default

self_type & operator= (self_type const &)=default

std::shared_ptr< utils::ThreadPool > thread_pool () const
    Get the thread pool used for buffering of elements in this iterator.

self_type & thread_pool (std::shared_ptr< utils::ThreadPool > value)
    Set the thread pool used for buffering of elements in this iterator.


Public Types

using Data = D

using difference_type = std::ptrdiff_t

using iterator_category = std::input_iterator_tag

using pointer = value_type const *

using reference = value_type const &

using self_type = GenericInputStream

using value_type = T


Public Attributes

friend Iterator


Static Public Attributes

static const size_t DEFAULT_BLOCK_SIZE = 0
    Default size for block buffering.


Classes

class Iterator
    Internal iterator over the data.

Constructor & Destructor Documentation

◆ GenericInputStream() [1/6]

GenericInputStream ( )
default

◆ GenericInputStream() [2/6]

GenericInputStream ( std::function< bool(value_type &)>  get_element,
std::shared_ptr< utils::ThreadPool >  thread_pool = nullptr,
size_t  block_size = DEFAULT_BLOCK_SIZE 
)
inline

Create an iterator over some underlying content.

The constructor expects a function that takes an element by reference, assigns it its new value at each iteration, and returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator has been reached.

Definition at line 790 of file generic_input_stream.hpp.

◆ GenericInputStream() [3/6]

GenericInputStream ( std::function< bool(value_type &)>  get_element,
Data const &  data,
std::shared_ptr< utils::ThreadPool >  thread_pool = nullptr,
size_t  block_size = DEFAULT_BLOCK_SIZE 
)
inline

Create an iterator over some underlying content.

The constructor expects a function that takes an element by reference, assigns it its new value at each iteration, and returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator has been reached.

Additionally, data can be given here, which we simply store and make accessible via data(). This is a convenience so that, for example, iterators created via a make function can forward the name of their input source for user output.

Definition at line 811 of file generic_input_stream.hpp.

◆ GenericInputStream() [4/6]

GenericInputStream ( std::function< bool(value_type &)>  get_element,
Data &&  data,
std::shared_ptr< utils::ThreadPool >  thread_pool = nullptr,
size_t  block_size = DEFAULT_BLOCK_SIZE 
)
inline

Create an iterator over some underlying content.

The constructor expects a function that takes an element by reference, assigns it its new value at each iteration, and returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator has been reached.

Additionally, data can be given here, which we simply store and make accessible via data(). This is a convenience so that, for example, iterators created via a make function can forward the name of their input source for user output.

This version of the constructor takes the data by r-value reference, so that it can be moved into the stream.

Definition at line 827 of file generic_input_stream.hpp.

◆ ~GenericInputStream()

~GenericInputStream ( )
default

◆ GenericInputStream() [5/6]

GenericInputStream ( self_type const &  )
default

◆ GenericInputStream() [6/6]

GenericInputStream ( self_type &&  )
default

Member Function Documentation

◆ add_begin_callback()

self_type& add_begin_callback ( std::function< void(GenericInputStream< T, D > const &)> const &  callback)
inline

Add a callback function that is executed when beginning the iteration.

The callback needs to accept the GenericInputStream object itself, as a means to, for example, access its data(), and is meant as a reporting mechanism. For example, callbacks can be added that write properties of the underlying data sources to log. They are executed in the order added.

Similar to the functionality offered by the observers, this could also be achieved by executing these functions directly where needed, but having them as callbacks here helps to reduce code duplication.

See also add_end_callback().

Definition at line 1047 of file generic_input_stream.hpp.

◆ add_end_callback()

self_type& add_end_callback ( std::function< void(GenericInputStream< T, D > const &)> const &  callback)
inline

Add a callback function that is executed when the end of the iteration is reached.

This is similar to the add_begin_callback() functionality, but instead of executing the callback when starting the iteration, it is called when ending it. Again, this is meant as a means to reduce user code duplication, for example for logging needs.

Definition at line 1065 of file generic_input_stream.hpp.

◆ add_filter()

self_type& add_filter ( std::function< bool(T const &)> const &  filter)
inline

Add a filter function that is applied to each element of the iteration.

If the function returns false, the element is skipped in the iteration.

Note that all of add_transform(), add_filter(), and add_transform_filter() are chained in the order in which they are added - meaning that they can be mixed as needed. For example, it makes sense to first filter by some property, and then apply transformations only on those elements that passed the filter to avoid unneeded work.

Definition at line 920 of file generic_input_stream.hpp.
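A standalone illustration (not genesis code) of the point about ordering: both variants below keep the same elements for positive inputs, but the filter-first variant calls the stand-in for an expensive transformation only on the elements that pass. The Counted struct and both functions are hypothetical.

```cpp
#include <vector>

// Counts how often the stand-in for an expensive transformation is called.
struct Counted
{
    int calls = 0;
    int square( int x ) { ++calls; return x * x; }
};

// Filter first (keep x > 2), then transform only the elements that passed.
inline std::vector<int> filter_then_transform( std::vector<int> const& in, Counted& c )
{
    std::vector<int> out;
    for( int x : in ) {
        if( x > 2 ) {
            out.push_back( c.square( x ));
        }
    }
    return out;
}

// Transform first, then apply the equivalent filter on the result (x * x > 4 for positive x).
inline std::vector<int> transform_then_filter( std::vector<int> const& in, Counted& c )
{
    std::vector<int> out;
    for( int x : in ) {
        int y = c.square( x );
        if( y > 4 ) {
            out.push_back( y );
        }
    }
    return out;
}
```

For the input 1 to 5, both return 9, 16, and 25, but the filter-first variant calls square() three times and the transform-first variant five times.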

◆ add_on_enter_observer()

self_type& add_on_enter_observer ( std::function< void(T const &)> const &  observer)
inline

Add an observer function that is executed when the iterator moves to a new element during the iteration.

These functions are executed when starting and incrementing the iterator, once for each element, in the order in which they are added here. They take the element that the iterator just moved to as their argument, so that user code can react to the new element.

They are a way of adding behaviour to the iteration loop that could also simply be placed at the beginning of the loop body of the user code. Still, offering this here can reduce redundant code, such as logging elements during the iteration.

Definition at line 981 of file generic_input_stream.hpp.

◆ add_on_leave_observer()

self_type& add_on_leave_observer ( std::function< void(T const &)> const &  observer)
inline

Add an observer function that is executed when the iterator moves away from an element during the iteration.

These functions are executed when incrementing the iterator, once for each element, in the order in which they are added here. They take the element that the iterator is about to move away from as their argument, so that user code can react to it before it is left.

They are a way of adding behaviour to the iteration loop that could also simply be placed at the end of the loop body of the user code. Still, offering this here can reduce redundant code, such as logging elements during the iteration.

Definition at line 1005 of file generic_input_stream.hpp.
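The following standalone sketch records when, under our reading of the descriptions of add_on_enter_observer() and add_on_leave_observer(), the two kinds of observers fire relative to the user's loop body: enter corresponds to the start of the body, leave to its end. It is a hypothetical illustration, not the genesis Iterator; whether the real class also calls leave observers for the last element when reaching the end is an assumption here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Standalone sketch (assumed semantics, not the genesis code) of the order in
// which enter observers, the loop body, and leave observers run.
inline std::vector<std::string> trace_iteration( std::vector<int> const& input )
{
    std::vector<std::string> log;
    std::function<void(int const&)> on_enter = [&log]( int const& x ) {
        log.push_back( "enter " + std::to_string( x ));
    };
    std::function<void(int const&)> on_leave = [&log]( int const& x ) {
        log.push_back( "leave " + std::to_string( x ));
    };

    for( auto const& x : input ) {
        on_enter( x );                                  // iterator has moved to x
        log.push_back( "body " + std::to_string( x ));  // user loop body
        on_leave( x );                                  // iterator is about to move away from x
    }
    return log;
}
```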

◆ add_transform()

self_type& add_transform ( std::function< void(T &)> const &  transform)
inline

Add a transformation function that is applied to each element of the iteration.

Note that all of add_transform(), add_filter(), and add_transform_filter() are chained in the order in which they are added - meaning that they can be mixed as needed. For example, it makes sense to first filter by some property, and then apply transformations only on those elements that passed the filter to avoid unneeded work.

Definition at line 903 of file generic_input_stream.hpp.

◆ add_transform_filter()

self_type& add_transform_filter ( std::function< bool(T &)> const &  filter)
inline

Add a transformation and filter function that is applied to each element of the iteration.

This can be used to transform and filter an element at the same time, as a shortcut where several steps might be needed at once. If the function returns false, the element is skipped in the iteration.

Note that all of add_transform(), add_filter(), and add_transform_filter() are chained in the order in which they are added - meaning that they can be mixed as needed. For example, it makes sense to first filter by some property, and then apply transformations only on those elements that passed the filter to avoid unneeded work.

Definition at line 939 of file generic_input_stream.hpp.
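A typical transform-filter step parses a field and drops elements for which parsing fails, which needs both a modification and a keep/skip decision in one pass. The Record type and the parse_or_skip() and apply_transform_filter() functions below are hypothetical; parse_or_skip() has the bool(T&) signature that add_transform_filter() expects, so a function like it could be passed there with T = Record.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical element type: a raw text field plus its parsed value.
struct Record
{
    std::string text;
    int value;
};

// Transform-filter step: parses `text` into `value` (transformation) and
// rejects the element if parsing fails (filter), both in one pass.
inline bool parse_or_skip( Record& record )
{
    try {
        std::size_t pos = 0;
        record.value = std::stoi( record.text, &pos );
        return pos == record.text.size();  // reject trailing characters such as in "3x"
    } catch( std::logic_error const& ) {
        return false;                      // std::invalid_argument or std::out_of_range
    }
}

// Apply the step to every record and keep only the accepted ones.
inline std::vector<Record> apply_transform_filter(
    std::vector<Record> records, std::function<bool(Record&)> const& step
) {
    std::vector<Record> kept;
    for( auto& record : records ) {
        if( step( record )) {
            kept.push_back( record );
        }
    }
    return kept;
}
```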

◆ begin()

Iterator begin ( )
inline

Definition at line 852 of file generic_input_stream.hpp.

◆ block_size() [1/2]

size_t block_size ( ) const
inline

Get the currently set block size used for buffering the input data.

Definition at line 1128 of file generic_input_stream.hpp.

◆ block_size() [2/2]

self_type& block_size ( size_t  value)
inline

Set the block size used for buffering the input data.

Shall not be changed after iteration has started, that is, after calling begin().

By default, this is set to 0, meaning that no buffering is done. Note that small buffer sizes can induce overhead for the thread synchronisation; we hence recommend to use block sizes of 1000 or greater, as needed.

Definition at line 1142 of file generic_input_stream.hpp.

◆ clear_callbacks()

self_type& clear_callbacks ( )
inline

Clear all functions that have been added via add_begin_callback() and add_end_callback().

Definition at line 1080 of file generic_input_stream.hpp.

◆ clear_filters_and_transformations()

self_type& clear_filters_and_transformations ( )
inline

Clear all filters and transformations.

Definition at line 954 of file generic_input_stream.hpp.

◆ clear_observers()

self_type& clear_observers ( )
inline

Clear all functions that are executed on incrementing to the next element.

This clears both the on enter and on leave observers.

Definition at line 1021 of file generic_input_stream.hpp.

◆ data() [1/2]

Data& data ( )
inline

Access the data stored in the iterator.

Definition at line 886 of file generic_input_stream.hpp.

◆ data() [2/2]

Data const& data ( ) const
inline

Access the data stored in the iterator.

Definition at line 878 of file generic_input_stream.hpp.

◆ end()

Iterator end ( )
inline

Definition at line 861 of file generic_input_stream.hpp.

◆ operator bool()

operator bool ( ) const
inline

Return whether a function to get elements was assigned to this generator, that is, whether it is default constructed (false) or not (true).

Definition at line 870 of file generic_input_stream.hpp.
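This check mirrors the one built into std::function, whose explicit operator bool returns false for a default-constructed object. The following hypothetical MaybeStream class, a standalone sketch rather than the genesis code, shows how a wrapper can forward that check.

```cpp
#include <functional>
#include <utility>

// Standalone sketch: a wrapper is "valid" exactly when its element function
// has a target, i.e. when it was not default constructed.
template<class T>
class MaybeStream
{
public:
    MaybeStream() = default;

    explicit MaybeStream( std::function<bool(T&)> get_element )
        : get_element_( std::move( get_element ))
    {}

    explicit operator bool() const
    {
        return static_cast<bool>( get_element_ );
    }

private:
    std::function<bool(T&)> get_element_;
};
```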

◆ operator=() [1/2]

self_type& operator= ( self_type &&  )
default

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
default

◆ thread_pool() [1/2]

std::shared_ptr<utils::ThreadPool> thread_pool ( ) const
inline

Get the thread pool used for buffering of elements in this iterator.

Definition at line 1099 of file generic_input_stream.hpp.

◆ thread_pool() [2/2]

self_type& thread_pool ( std::shared_ptr< utils::ThreadPool >  value)
inline

Set the thread pool used for buffering of elements in this iterator.

Shall not be changed after iteration has started, that is, after calling begin().

Definition at line 1109 of file generic_input_stream.hpp.

Member Typedef Documentation

◆ Data

using Data = D

Definition at line 178 of file generic_input_stream.hpp.

◆ difference_type

using difference_type = std::ptrdiff_t

Definition at line 175 of file generic_input_stream.hpp.

◆ iterator_category

using iterator_category = std::input_iterator_tag

Definition at line 176 of file generic_input_stream.hpp.

◆ pointer

using pointer = value_type const*

Definition at line 173 of file generic_input_stream.hpp.

◆ reference

using reference = value_type const&

Definition at line 174 of file generic_input_stream.hpp.

◆ self_type

using self_type = GenericInputStream

Definition at line 171 of file generic_input_stream.hpp.

◆ value_type

using value_type = T

Definition at line 172 of file generic_input_stream.hpp.

Member Data Documentation

◆ DEFAULT_BLOCK_SIZE

const size_t DEFAULT_BLOCK_SIZE = 0
static

Default size for block buffering.

The class can buffer blocks of elements of this size, with the buffer loaded in a separate thread, in order to speed up iterating over elements that need some processing, such as input files, which is the typical use case of this class. A value of 0, the default, means that no buffering is done.

Definition at line 187 of file generic_input_stream.hpp.

◆ Iterator

friend Iterator

Definition at line 846 of file generic_input_stream.hpp.


The documentation for this class was generated from the following file:
generic_input_stream.hpp