#include <genesis/utils/containers/generic_input_stream.hpp>
Type erasure for iterators, using std::function to eliminate the underlying input type.
This class offers an abstraction to get a uniform iterator type over a set of underlying iterators of different types. It expects a function (most likely, you want to use a lambda) that converts the underlying data into the desired type T, which is where the type erasure happens. The data that is iterated over is stored here, and the lambda indicates the end of the iteration by returning false.
Example:
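Here is a minimal sketch of the kind of function the constructor expects. It assumes the elements come from a std::vector<int> that is traversed via captured beg and end iterators. The helper names make_vector_getter() and drain() are hypothetical. drain() only stands in for the iteration that a GenericInputStream<int> would perform, so that the block compiles without genesis.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: wraps the range [beg, end) of a vector into the kind of
// std::function<bool(int&)> that the GenericInputStream constructor expects.
// The vector must outlive the returned function, as only iterators are captured.
std::function<bool(int&)> make_vector_getter( std::vector<int> const& numbers )
{
    auto beg = numbers.begin();
    auto end = numbers.end();
    return [beg, end]( int& value ) mutable {
        if( beg == end ) {
            return false; // no more elements: signal the end of the iteration
        }
        value = *beg;     // assign the next element via the reference
        ++beg;
        return true;      // an element was produced
    };
}

// Stand-in for the iteration that the stream performs: call the function
// until it returns false, collecting the produced elements.
std::vector<int> drain( std::function<bool(int&)> const& get_element )
{
    std::vector<int> result;
    int value = 0;
    while( get_element( value )) {
        result.push_back( value );
    }
    return result;
}
```

With genesis, the result of make_vector_getter() would instead be passed as the first argument of the GenericInputStream<int> constructor.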
For other types of iterators, instead of beg and end, other input can be used:
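For instance, the following sketch reads whitespace-separated integers from any std::istream instead of a pair of iterators. The helper name make_stream_getter() is hypothetical. The success of the extraction operator doubles as the end signal.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <memory>
#include <sstream>
#include <vector>

// Hypothetical helper: produces whitespace-separated integers from a stream.
// The stream is owned via shared_ptr, so that the lambda, and any copies of
// it made by std::function, keep it alive.
std::function<bool(int&)> make_stream_getter( std::shared_ptr<std::istream> input )
{
    return [input]( int& value ) {
        // operator>> returns the stream; its bool conversion reports whether
        // the extraction succeeded, which doubles as the "end reached" signal.
        return static_cast<bool>( *input >> value );
    };
}
```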
And accordingly for other underlying iterator types.
In addition to the type T that we iterate over, we also offer, for user convenience, a data storage variable of the template type D (typedef'd as GenericInputStream::Data). This data is provided at construction of the GenericInputStream and can be accessed via the data() functions. It is a generic extra variable for storing iterator-specific information. Because the GenericInputStream is intended to be initializable with just a lambda function that yields the elements to traverse, there is otherwise no convenient way to access related information about the underlying iterator. For example, when iterating over a file, one might want to store the file name or other characteristics of the input in data().
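The following simplified stand-in class illustrates the idea. MiniStream is hypothetical, not the genesis implementation. It keeps a D next to the element function. SourceInfo is a hypothetical type of the kind one might store for a file.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Simplified stand-in (not the genesis class): an element function plus an
// extra data variable of type D that travels with it.
template<class T, class D>
class MiniStream
{
public:
    MiniStream( std::function<bool(T&)> get_element, D data )
        : get_element_( std::move( get_element ))
        , data_( std::move( data ))
    {}

    D&       data()       { return data_; }
    D const& data() const { return data_; }

    // Produce the next element; false once the input is exhausted.
    bool next( T& value ) { return get_element_( value ); }

private:
    std::function<bool(T&)> get_element_;
    D data_;
};

// Hypothetical per-source information, e.g. for an input file.
struct SourceInfo
{
    std::string file_name;
    std::size_t line_count = 0;
};
```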
The class furthermore offers filters and transformations of the underlying iterator data via the functions add_filter(), add_transform(), and add_transform_filter(). These can be mixed freely. They are executed as one combined list, in the order in which they were added (that is, there can be first a filter, then a transformation, then a filter again). This makes it easy to skip elements of the underlying iterator without adding another layer of abstraction.
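The ordering semantics can be sketched with a hypothetical StepChain class, which is not the genesis implementation. The sketch assumes that every step is normalised to the bool(T&) form of a transform-filter, and that evaluation stops at the first step that rejects an element.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: filters and transforms live in one list and run in
// the order added. Each is normalised to bool(T&): it may modify the element
// and returns false to skip it.
template<class T>
class StepChain
{
public:
    StepChain& add_filter( std::function<bool(T const&)> filter )
    {
        steps_.push_back( [filter]( T& e ) { return filter( e ); } );
        return *this;
    }
    StepChain& add_transform( std::function<void(T&)> transform )
    {
        steps_.push_back( [transform]( T& e ) { transform( e ); return true; } );
        return *this;
    }
    StepChain& add_transform_filter( std::function<bool(T&)> step )
    {
        steps_.push_back( std::move( step ));
        return *this;
    }

    // Stops at the first step that rejects the element; later steps are
    // then not executed for it.
    bool apply( T& element ) const
    {
        for( auto const& step : steps_ ) {
            if( !step( element )) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<std::function<bool(T&)>> steps_;
};

// Pull all elements from get_element, keeping those that pass every step.
template<class T, class Getter>
std::vector<T> run( Getter get_element, StepChain<T> const& chain )
{
    std::vector<T> out;
    T element{};
    while( get_element( element )) {
        if( chain.apply( element )) {
            out.push_back( element );
        }
    }
    return out;
}
```

With this normalisation, a filter placed before an expensive transformation spares that transformation for every rejected element.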
As an additional convenience, the class offers observers for each element, as well as callbacks for the beginning and end of the iteration. These are simplifications meant to reduce code duplication in user code, and they can be combined with each other. For example, an observer can be added that collects statistics (via a lambda capture) on the elements being processed, and these statistics can then be reported at the end of the iteration. This could of course also be achieved by adding the functionality to the loop body and after the loop that runs this iterator. However, in our tool https://github.com/lczech/grenedalf, for example, we have setup functions for a GenericInputStream (of type VariantInputStream) that are re-used across commands. Instead of duplicating code or repeating function calls in each of those commands, we only need to add the callbacks in the shared code that creates the iterator. They are then used in every command automatically.
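The statistics pattern described above can be sketched with a hypothetical ObservedStream stand-in, which is not the genesis class. attach_statistics(), also hypothetical, plays the role of the shared setup code. It attaches a counting observer and a reporting end callback once, and every caller of for_each() then gets them without extra code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in with on-enter observers and end callbacks.
template<class T>
class ObservedStream
{
public:
    explicit ObservedStream( std::function<bool(T&)> get_element )
        : get_element_( std::move( get_element ))
    {}

    ObservedStream& add_on_enter_observer( std::function<void(T const&)> observer )
    {
        observers_.push_back( std::move( observer ));
        return *this;
    }

    ObservedStream& add_end_callback( std::function<void(ObservedStream const&)> callback )
    {
        end_callbacks_.push_back( std::move( callback ));
        return *this;
    }

    // Runs the whole iteration, calling body for every element.
    void for_each( std::function<void(T const&)> const& body )
    {
        T element{};
        while( get_element_( element )) {
            for( auto const& observer : observers_ ) {
                observer( element );
            }
            body( element );
        }
        for( auto const& callback : end_callbacks_ ) {
            callback( *this );
        }
    }

private:
    std::function<bool(T&)> get_element_;
    std::vector<std::function<void(T const&)>> observers_;
    std::vector<std::function<void(ObservedStream const&)>> end_callbacks_;
};

// Hypothetical shared setup code. The counter lives in a shared_ptr so that
// it outlives this function and is shared by both lambdas.
void attach_statistics( ObservedStream<int>& stream, std::string& report )
{
    auto count = std::make_shared<std::size_t>( 0 );
    stream.add_on_enter_observer( [count]( int const& ) { ++*count; } );
    stream.add_end_callback( [count, &report]( ObservedStream<int> const& ) {
        report = "processed " + std::to_string( *count ) + " elements";
    });
}
```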
Lastly, the class offers block buffering in a separate thread for speed-up. This takes care of the underlying iterator processing (including potential file parsing etc.) and buffers blocks of elements, so that the user of this class has faster access to them. For example, when processing data along a genome with lots of computation per position, it makes sense to run the file reading in a separate thread and buffer positions as needed, which this class does automatically. Buffering is activated by setting block_size() to the intended number of elements to be buffered. By default, it is set to 0, meaning that no buffering is done. Note that small block sizes can induce overhead for the thread synchronisation. We hence recommend block sizes of 1000 or greater, as needed.
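The buffering idea can be sketched as double buffering. This sketch uses std::async instead of the utils::ThreadPool that the class actually takes, so that the block is self-contained. BlockBuffer and its members are hypothetical. While the caller works on one block, the next block is read in the background. Only one read task is ever in flight, so the element function is never called concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical double buffer: reads the next block while the caller
// processes the current one.
template<class T>
class BlockBuffer
{
public:
    BlockBuffer( std::function<bool(T&)> get_element, std::size_t block_size )
        : get_element_( std::move( get_element ))
        , block_size_( block_size )
    {
        assert( block_size_ > 0 );
        // Start reading the first block right away.
        next_ = std::async( std::launch::async, &BlockBuffer::read_block, this );
    }

    // The background task holds this pointer, so the buffer must stay in place.
    BlockBuffer( BlockBuffer const& ) = delete;
    BlockBuffer& operator=( BlockBuffer const& ) = delete;

    // Returns the next block, or an empty vector once the input is exhausted.
    std::vector<T> next_block()
    {
        if( !next_.valid() ) {
            return {};
        }
        std::vector<T> current = next_.get(); // waits for the background read
        if( current.size() == block_size_ ) {
            // The input may continue: prefetch the following block while the
            // caller processes this one.
            next_ = std::async( std::launch::async, &BlockBuffer::read_block, this );
        }
        return current;
    }

private:
    // Reads up to block_size_ elements. At most one such task runs at a time.
    std::vector<T> read_block()
    {
        std::vector<T> block;
        block.reserve( block_size_ );
        T element{};
        while( block.size() < block_size_ && get_element_( element )) {
            block.push_back( element );
        }
        return block;
    }

    std::function<bool(T&)> get_element_;
    std::size_t block_size_;
    // Declared last, hence destroyed first: the destructor of a future from
    // std::async waits for a pending read before the members it uses go away.
    std::future<std::vector<T>> next_;
};
```

Each block boundary costs one hand-over between threads. This is why very small block sizes tend to be dominated by synchronisation overhead.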
We are aware that, with all this extra functionality, the class is slightly overloaded, and that the filters and the block buffering would typically go into separate classes for modularity. However, we are taking user convenience and speed into account here. Instead of adding filters or a block buffer wrapper around each input iterator that is then wrapped in a GenericInputStream anyway, we take care of this in one place. This also reduces levels of abstraction, and hence (hopefully) increases processing speed.
Definition at line 163 of file generic_input_stream.hpp.
Public Member Functions
GenericInputStream ()=default
GenericInputStream (self_type &&)=default
GenericInputStream (self_type const &)=default
GenericInputStream (std::function< bool(value_type &)> get_element, Data &&data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
Create an iterator over some underlying content.
GenericInputStream (std::function< bool(value_type &)> get_element, Data const &data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
Create an iterator over some underlying content.
GenericInputStream (std::function< bool(value_type &)> get_element, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE)
Create an iterator over some underlying content.
~GenericInputStream ()=default
self_type & add_begin_callback (std::function< void(GenericInputStream const &)> const &callback)
Add a callback function that is executed when beginning the iteration.
self_type & add_end_callback (std::function< void(GenericInputStream const &)> const &callback)
Add a callback function that is executed when the end of the iteration is reached.
self_type & add_filter (std::function< bool(T const &)> const &filter)
Add a filter function that is applied to each element of the iteration.
self_type & add_on_enter_observer (std::function< void(T const &)> const &observer)
Add an observer function that is executed when the iterator moves to a new element during the iteration.
self_type & add_on_leave_observer (std::function< void(T const &)> const &observer)
Add an observer function that is executed when the iterator moves away from an element during the iteration.
self_type & add_transform (std::function< void(T &)> const &transform)
Add a transformation function that is applied to each element of the iteration.
self_type & add_transform_filter (std::function< bool(T &)> const &filter)
Add a transformation and filter function that is applied to each element of the iteration.
Iterator begin ()
size_t block_size () const
Get the currently set block size used for buffering the input data.
self_type & block_size (size_t value)
Set the block size used for buffering the input data.
self_type & clear_callbacks ()
Clear all functions that have been added via add_begin_callback() and add_end_callback().
self_type & clear_filters_and_transformations ()
Clear all filters and transformations.
self_type & clear_observers ()
Clear all functions that are executed on incrementing to the next element.
Data & data ()
Access the data stored in the iterator.
Data const & data () const
Access the data stored in the iterator.
Iterator end ()
operator bool () const
Return whether a function to get elements was assigned to this generator, that is, whether it is default constructed (false) or not (true).
self_type & operator= (self_type &&)=default
self_type & operator= (self_type const &)=default
std::shared_ptr< utils::ThreadPool > thread_pool () const
Get the thread pool used for buffering of elements in this iterator.
self_type & thread_pool (std::shared_ptr< utils::ThreadPool > value)
Set the thread pool used for buffering of elements in this iterator.
Public Types
using Data = D
using difference_type = std::ptrdiff_t
using iterator_category = std::input_iterator_tag
using pointer = value_type const *
using reference = value_type const &
using self_type = GenericInputStream
using value_type = T
Public Attributes
friend Iterator
Static Public Attributes
static const size_t DEFAULT_BLOCK_SIZE = 0
Default size for block buffering.
Classes
class Iterator
Internal iterator over the data.
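GenericInputStream (std::function< bool(value_type &)> get_element, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE) [inline]
Create an iterator over some underlying content.
The constructor expects a function that takes an element by reference and assigns it its new value at each iteration. The function returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator is reached.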
Definition at line 790 of file generic_input_stream.hpp.
GenericInputStream (std::function< bool(value_type &)> get_element, Data const &data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE) [inline]
Create an iterator over some underlying content.
The constructor expects a function that takes an element by reference and assigns it its new value at each iteration. The function returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator is reached.
Additionally, data can be given here, which we simply store and make accessible via data(). This is a convenience so that iterators generated via a make function can, for example, forward the name of their input source for user output.
Definition at line 811 of file generic_input_stream.hpp.
GenericInputStream (std::function< bool(value_type &)> get_element, Data &&data, std::shared_ptr< utils::ThreadPool > thread_pool=nullptr, size_t block_size=DEFAULT_BLOCK_SIZE) [inline]
Create an iterator over some underlying content.
The constructor expects a function that takes an element by reference and assigns it its new value at each iteration. The function returns true if there was an element (iteration still ongoing), or false once the end of the underlying iterator is reached.
Additionally, data can be given here, which we simply store and make accessible via data(). This is a convenience so that iterators generated via a make function can, for example, forward the name of their input source for user output.
This version of the constructor takes the data by r-value reference, so that it can be moved instead of copied.
Definition at line 827 of file generic_input_stream.hpp.
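The following generic C++ illustration (not genesis code) shows what the two data overloads provide. An l-value argument selects the const& overload and is copied. A temporary or an explicitly moved argument selects the && overload and is moved. The Holder type is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type with the same pair of overloads: const& copies the
// argument, && moves from it. The flag only records which one was chosen.
struct Holder
{
    explicit Holder( std::string const& data )
        : stored( data )
        , was_moved( false )
    {}

    explicit Holder( std::string&& data )
        : stored( std::move( data ))
        , was_moved( true )
    {}

    std::string stored;
    bool was_moved;
};
```

For large data, such as an input description with many fields, the && overload avoids one full copy per stream.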
self_type & add_begin_callback (std::function< void(GenericInputStream const &)> const &callback) [inline]
Add a callback function that is executed when beginning the iteration.
The callback needs to accept the GenericInputStream object itself, as a means to, for example, access its data(). It is meant as a reporting mechanism: for example, callbacks can be added that write properties of the underlying data sources to a log. They are executed in the order in which they were added.
Similar to the observers, this could also be achieved by executing these functions directly where needed, but having them as callbacks here helps to reduce code duplication.
See also add_end_callback().
Definition at line 1047 of file generic_input_stream.hpp.
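The following minimal stand-in shows callbacks that receive the stream itself, read its data(), and run in the order in which they were added. LoggedStream is hypothetical, not the genesis class. Its start() method stands in for the beginning of the iteration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: the data is a source name, and begin callbacks get
// the whole stream so they can read it.
class LoggedStream
{
public:
    explicit LoggedStream( std::string source_name )
        : data_( std::move( source_name ))
    {}

    std::string const& data() const { return data_; }

    LoggedStream& add_begin_callback( std::function<void(LoggedStream const&)> callback )
    {
        begin_callbacks_.push_back( std::move( callback ));
        return *this;
    }

    // Stands in for starting the iteration: runs callbacks in insertion order.
    void start() const
    {
        for( auto const& callback : begin_callbacks_ ) {
            callback( *this );
        }
    }

private:
    std::string data_;
    std::vector<std::function<void(LoggedStream const&)>> begin_callbacks_;
};
```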
self_type & add_end_callback (std::function< void(GenericInputStream const &)> const &callback) [inline]
Add a callback function that is executed when the end of the iteration is reached.
This is similar to the add_begin_callback() functionality, but instead of executing the callback when starting the iteration, it is called when ending it. Again, this is meant as a means to reduce user code duplication, for example for logging needs.
Definition at line 1065 of file generic_input_stream.hpp.
self_type & add_filter (std::function< bool(T const &)> const &filter) [inline]
Add a filter function that is applied to each element of the iteration.
If the function returns false, the element is skipped in the iteration.
Note that add_transform(), add_filter(), and add_transform_filter() are all chained in the order in which they are added, meaning that they can be mixed as needed. For example, it makes sense to first filter by some property and then apply transformations only to those elements that passed the filter, to avoid unneeded work.
Definition at line 920 of file generic_input_stream.hpp.
self_type & add_on_enter_observer (std::function< void(T const &)> const &observer) [inline]
Add an observer function that is executed when the iterator moves to a new element during the iteration.
These functions are executed when starting and incrementing the iterator, once for each element, in the order in which they are added here. They take the element that the iterator just moved to as their argument, so that user code can react to the new element.
They are a way of adding behaviour to the iteration loop that could also simply be placed at the beginning of the loop body of the user code. Still, offering this here can reduce redundant code, such as logging elements during the iteration.
Definition at line 981 of file generic_input_stream.hpp.
self_type & add_on_leave_observer (std::function< void(T const &)> const &observer) [inline]
Add an observer function that is executed when the iterator moves away from an element during the iteration.
These functions are executed when incrementing the iterator, once for each element, in the order in which they are added here. They take the element that the iterator is about to move away from as their argument, so that user code can react to that element.
They are a way of adding behaviour to the iteration loop that could also simply be placed at the end of the loop body of the user code. Still, offering this here can reduce redundant code, such as logging elements during the iteration.
Definition at line 1005 of file generic_input_stream.hpp.
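The order in which the two kinds of observers fire relative to the loop body can be sketched as follows. The function iterate_with_observers() is hypothetical. The sketch assumes that leave observers also run when the iterator is incremented past the last element.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical sketch of when observers fire relative to the loop body.
// Assumes that leave observers also run when moving past the last element.
void iterate_with_observers(
    std::function<bool(int&)> const& get_element,
    std::function<void(int const&)> const& on_enter,
    std::function<void(int const&)> const& on_leave,
    std::function<void(int const&)> const& body
) {
    int element = 0;
    bool has_element = get_element( element ); // starting the iteration
    while( has_element ) {
        on_enter( element );                   // moved to this element
        body( element );                       // the user's loop body
        on_leave( element );                   // about to move away again
        has_element = get_element( element );  // incrementing
    }
}
```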
self_type & add_transform (std::function< void(T &)> const &transform) [inline]
Add a transformation function that is applied to each element of the iteration.
Note that add_transform(), add_filter(), and add_transform_filter() are all chained in the order in which they are added, meaning that they can be mixed as needed. For example, it makes sense to first filter by some property and then apply transformations only to those elements that passed the filter, to avoid unneeded work.
Definition at line 903 of file generic_input_stream.hpp.
self_type & add_transform_filter (std::function< bool(T &)> const &filter) [inline]
Add a transformation and filter function that is applied to each element of the iteration.
This can be used to transform and filter an element at the same time, as a shortcut where several steps are needed at once. If the function returns false, the element is skipped in the iteration.
Note that add_transform(), add_filter(), and add_transform_filter() are all chained in the order in which they are added, meaning that they can be mixed as needed. For example, it makes sense to first filter by some property and then apply transformations only to those elements that passed the filter, to avoid unneeded work.
Definition at line 939 of file generic_input_stream.hpp.
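As a hypothetical example of a transform-filter over lines of text (the names are illustrative only), the function below trims trailing whitespace. In the same step, it rejects lines that end up empty or that are comments starting with '#'. apply_transform_filter() stands in for the stream applying the function to each element.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical transform-filter with the bool(T&) signature.
bool trim_and_skip_comments( std::string& line )
{
    // Transform: drop trailing whitespace (including a trailing newline).
    while( !line.empty() && std::isspace( static_cast<unsigned char>( line.back() ))) {
        line.pop_back();
    }
    // Filter: keep only non-empty lines that are not comments.
    return !line.empty() && line.front() != '#';
}

// Keeps an element only if the function returns true, using its modified value.
std::vector<std::string> apply_transform_filter(
    std::vector<std::string> lines,
    std::function<bool(std::string&)> const& transform_filter
) {
    std::vector<std::string> kept;
    for( auto& line : lines ) {
        if( transform_filter( line )) {
            kept.push_back( line );
        }
    }
    return kept;
}
```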
Iterator begin () [inline]
Definition at line 852 of file generic_input_stream.hpp.
size_t block_size () const [inline]
Get the currently set block size used for buffering the input data.
Definition at line 1128 of file generic_input_stream.hpp.
self_type & block_size (size_t value) [inline]
Set the block size used for buffering the input data.
Must not be changed after the iteration has started, that is, after calling begin().
By default, this is set to 0, meaning that no buffering is done. Note that small block sizes can induce overhead for the thread synchronisation. We hence recommend block sizes of 1000 or greater, as needed.
Definition at line 1142 of file generic_input_stream.hpp.
self_type & clear_callbacks () [inline]
Clear all functions that have been added via add_begin_callback() and add_end_callback().
Definition at line 1080 of file generic_input_stream.hpp.
self_type & clear_filters_and_transformations () [inline]
Clear all filters and transformations.
Definition at line 954 of file generic_input_stream.hpp.
self_type & clear_observers () [inline]
Clear all functions that are executed on incrementing to the next element.
This clears both the on enter and on leave observers.
Definition at line 1021 of file generic_input_stream.hpp.
Data & data () [inline]
Access the data stored in the iterator.
Definition at line 886 of file generic_input_stream.hpp.
Data const & data () const [inline]
Access the data stored in the iterator.
Definition at line 878 of file generic_input_stream.hpp.
Iterator end () [inline]
Definition at line 861 of file generic_input_stream.hpp.
operator bool () const [inline]
Return whether a function to get elements was assigned to this generator, that is, whether it is default constructed (false) or not (true).
Definition at line 870 of file generic_input_stream.hpp.
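This check can be sketched in a hypothetical stand-in class (MaybeStream, not the genesis implementation). The sketch assumes that validity simply reflects whether the stored std::function is non-empty. It makes the conversion explicit, which may differ from the actual class.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in: validity mirrors std::function's own operator bool.
template<class T>
class MaybeStream
{
public:
    MaybeStream() = default;

    explicit MaybeStream( std::function<bool(T&)> get_element )
        : get_element_( std::move( get_element ))
    {}

    // True if and only if an element function was assigned.
    explicit operator bool() const
    {
        return static_cast<bool>( get_element_ );
    }

private:
    std::function<bool(T&)> get_element_;
};
```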
std::shared_ptr< utils::ThreadPool > thread_pool () const [inline]
Get the thread pool used for buffering of elements in this iterator.
Definition at line 1099 of file generic_input_stream.hpp.
self_type & thread_pool (std::shared_ptr< utils::ThreadPool > value) [inline]
Set the thread pool used for buffering of elements in this iterator.
Must not be changed after the iteration has started, that is, after calling begin().
Definition at line 1109 of file generic_input_stream.hpp.
using Data = D
Definition at line 178 of file generic_input_stream.hpp.
using difference_type = std::ptrdiff_t
Definition at line 175 of file generic_input_stream.hpp.
using iterator_category = std::input_iterator_tag
Definition at line 176 of file generic_input_stream.hpp.
using pointer = value_type const*
Definition at line 173 of file generic_input_stream.hpp.
using reference = value_type const&
Definition at line 174 of file generic_input_stream.hpp.
using self_type = GenericInputStream
Definition at line 171 of file generic_input_stream.hpp.
using value_type = T
Definition at line 172 of file generic_input_stream.hpp.
static const size_t DEFAULT_BLOCK_SIZE = 0
Default size for block buffering.
The class can buffer blocks of elements of this size, with the buffer loaded in a separate thread. This speeds up iterating over elements that need some processing, such as input files, which is the typical use case of this class.
Definition at line 187 of file generic_input_stream.hpp.
friend Iterator
Definition at line 846 of file generic_input_stream.hpp.