#include <genesis/population/formats/variant_parallel_input_iterator.hpp>
Iterate multiple input sources that yield Variants in parallel.
This iterator traverses multiple sources of data in parallel, where each stop of the traversal is a Locus in the input sources. Using ContributionType, one can select which loci of each input contribute to the traversal: either all of its loci, or just the ones that also occur in other input sources. See also add_carrying_locus() for other ways to specify the loci to iterate over.
At each visited locus, the iterator yields the data of the underlying input sources as a vector of Optional Variants, with one Variant per input source. If a source does not have data at the current locus, the Optional is empty. Use the dereference operator*() and operator->() of the iterator, or the access functions variants() and variant_at(), to get the set of variants at the current locus() of the iteration, or use joined_variant() to get one Variant that has all sample BaseCounts joined into it.
Furthermore, using the inputs() and input_at() functions, which are also available from the iterator itself, one can access additional information about the underlying iterators, such as the file name and sample names that are being read. This is particularly useful if input sources are added as in the example below, where we use functions such as make_variant_input_iterator_from_pileup_file() to get access to the files; these functions encapsulate the underlying readers and would otherwise hide this information from us. See VariantInputIteratorData for the data structure that is used to store this additional information, and see VariantInputIterator for details on the underlying iterator.
Example:
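As a minimal standalone sketch of the traversal semantics described above: the code below does not use the genesis API. SketchLocus, SketchSource, SketchStop, traverse_parallel(), and join_counts() are hypothetical names, a "variant" is reduced to a single integer, every source is treated as carrying, and loci are ordered lexicographically by chromosome name, which is an assumption of this sketch only.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not genesis types: a locus is (chromosome, position),
// and the data at a locus is reduced to a single count value.
using SketchLocus = std::pair<std::string, std::size_t>;
using SketchSource = std::map<SketchLocus, int>;

// One stop of the traversal: the locus, plus one optional value per source.
struct SketchStop {
    SketchLocus locus;
    std::vector<std::optional<int>> variants;
};

// Visit the union of all loci (as if every source were carrying). At each
// locus, report per source either its value or an empty optional.
std::vector<SketchStop> traverse_parallel(std::vector<SketchSource> const& sources)
{
    std::set<SketchLocus> loci;
    for (auto const& src : sources) {
        for (auto const& entry : src) {
            loci.insert(entry.first);
        }
    }

    std::vector<SketchStop> stops;
    for (auto const& locus : loci) {
        SketchStop stop{locus, {}};
        for (auto const& src : sources) {
            auto const it = src.find(locus);
            if (it != src.end()) {
                stop.variants.emplace_back(it->second);
            } else {
                stop.variants.emplace_back(std::nullopt);
            }
        }
        stops.push_back(std::move(stop));
    }
    return stops;
}

// Rough analogue of joining all samples into one value: sum what is present.
int join_counts(SketchStop const& stop)
{
    int sum = 0;
    for (auto const& value : stop.variants) {
        if (value) {
            sum += *value;
        }
    }
    return sum;
}
```

Each SketchStop plays the role of one stop of the parallel iterator: the vector has one slot per input, and an empty slot marks an input without data at that locus.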
See the VariantParallelInputIterator::Iterator class for details on access to the data during traversal.
Definition at line 131 of file variant_parallel_input_iterator.hpp.
Public Member Functions | |
VariantParallelInputIterator ()=default | |
VariantParallelInputIterator (self_type &&)=default | |
VariantParallelInputIterator (self_type const &)=default | |
~VariantParallelInputIterator ()=default | |
template<class ForwardIterator > | |
self_type & | add_carrying_loci (ForwardIterator first, ForwardIterator last) |
Add a set of GenomeLoci that are used as carrying loci in the iteration. More... | |
self_type & | add_carrying_loci (std::vector< GenomeLocus > const &loci) |
Add a set of GenomeLoci that are used as carrying loci in the iteration. More... | |
self_type & | add_carrying_locus (GenomeLocus const &locus) |
Add a set of GenomeLoci that are used as carrying loci in the iteration. More... | |
self_type & | add_variant_input (std::function< bool(Variant &)> input_element_generator, ContributionType selection) |
Add an input to the parallel iterator. More... | |
self_type & | add_variant_input_iterator (VariantInputIterator const &input, ContributionType selection) |
Add an input to the parallel iterator. More... | |
Iterator | begin () |
Begin the iteration. More... | |
Iterator | end () |
End marker for the iteration. More... | |
VariantInputIterator & | input_at (size_t index) |
Get access to an input iterator that has been added to this parallel iterator. More... | |
VariantInputIterator const & | input_at (size_t index) const |
Get access to an input iterator that has been added to this parallel iterator. More... | |
size_t | input_size () const |
Return the number of input sources added. More... | |
std::vector< VariantInputIterator > & | inputs () |
Get access to the input iterators that have been added to this parallel iterator. More... | |
std::vector< VariantInputIterator > const & | inputs () const |
Get access to the input iterators that have been added to this parallel iterator. More... | |
self_type & | operator= (self_type &&)=default |
self_type & | operator= (self_type const &)=default |
Public Types | |
enum | ContributionType { kCarrying, kFollowing } |
Select which loci of an input are used. More... | |
using | self_type = VariantParallelInputIterator |
using | value_type = Variant |
Public Attributes | |
friend | Iterator |
Classes | |
class | Iterator |
Iterator over loci of the input sources. More... | |
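VariantParallelInputIterator () |
default
VariantParallelInputIterator (self_type &&) |
default
VariantParallelInputIterator (self_type const &) |
default
~VariantParallelInputIterator () |
default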
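template<class ForwardIterator > self_type & add_carrying_loci (ForwardIterator first, ForwardIterator last) |
inline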
Add a set of GenomeLoci that are used as carrying loci in the iteration.
Definition at line 684 of file variant_parallel_input_iterator.hpp.
self_type & add_carrying_loci (std::vector< GenomeLocus > const &loci) |
inline
Add a set of GenomeLoci that are used as carrying loci in the iteration.
Definition at line 674 of file variant_parallel_input_iterator.hpp.
self_type & add_carrying_locus (GenomeLocus const &locus) |
inline
Add a set of GenomeLoci that are used as carrying loci in the iteration.
This allows iterating over a pre-defined set of loci. The iterator stops at each of these loci, independently of whether any of the underlying input sources have data at this locus. That is, it acts as an "empty" input that only contributes loci, as if it were added with ContributionType::kCarrying, but without any actual variants. Duplicate loci in these additional carrying loci are ignored.
Using this is particularly useful for more complex subset operations of loci, such as intersections, complements, (symmetrical) differences, and exclusions. These cases cannot be modelled with our simple ContributionType based approach; so instead, one can externally prepare the list of loci that need to be visited, and provide these to this function. In these cases, to use exactly the list of provided loci, all actual input sources can be added as ContributionType::kFollowing, to make sure that none of them adds additional loci to the traversal.
Note that in addition to the loci added via this function, all loci of input sources that are of ContributionType::kCarrying are also visited.
Definition at line 650 of file variant_parallel_input_iterator.hpp.
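The external preparation of loci mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not genesis code: LocusKey and symmetric_difference_loci() are hypothetical names, and a locus is modelled as a plain (chromosome, position) pair rather than a GenomeLocus.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a locus: (chromosome, position).
using LocusKey = std::pair<std::string, std::size_t>;

// Collect the loci that are present in exactly one of the two inputs,
// i.e., their symmetric difference. std::set keeps its elements sorted,
// which is what std::set_symmetric_difference requires.
std::vector<LocusKey> symmetric_difference_loci(
    std::set<LocusKey> const& a,
    std::set<LocusKey> const& b
) {
    std::vector<LocusKey> result;
    std::set_symmetric_difference(
        a.begin(), a.end(),
        b.begin(), b.end(),
        std::back_inserter(result)
    );
    return result;
}
```

A list prepared this way would then be converted to GenomeLocus objects and handed to add_carrying_loci(), with all actual inputs added as ContributionType::kFollowing so that only the prepared loci are visited.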
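self_type & add_variant_input (std::function< bool(Variant &)> input_element_generator, ContributionType selection) |
inline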
Add an input to the parallel iterator.
This version of the function takes the function to obtain elements from the underlying data iterator, same as VariantInputIterator. See there and LambdaIterator for details.
Definition at line 569 of file variant_parallel_input_iterator.hpp.
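The shape of such an element-generating function, `bool(Variant &)`, can be illustrated without the library. The sketch below is hypothetical: SketchVariant and make_vector_generator() are not genesis names, and the convention that the function returns true after filling an element and false once the input is exhausted is an assumption here; see LambdaIterator for the authoritative contract.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal record; the real Variant carries more data.
struct SketchVariant {
    std::string chromosome;
    std::size_t position = 0;
};

// Build a generator with the shape bool(SketchVariant&). Each call writes the
// next element into its argument and returns true, or returns false once the
// stored data is exhausted (assumed convention).
std::function<bool(SketchVariant&)> make_vector_generator(std::vector<SketchVariant> data)
{
    std::size_t next = 0;
    return [data = std::move(data), next](SketchVariant& out) mutable {
        if (next >= data.size()) {
            return false;
        }
        out = data[next];
        ++next;
        return true;
    };
}
```

The generator owns its data and its position, so the caller only needs to invoke it repeatedly until it returns false.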
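self_type & add_variant_input_iterator (VariantInputIterator const &input, ContributionType selection) |
inline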
Add an input to the parallel iterator.
Definition at line 549 of file variant_parallel_input_iterator.hpp.
Iterator begin () |
inline
Begin the iteration.
Use this to obtain a VariantParallelInputIterator::Iterator that starts traversing the input sources.
Definition at line 529 of file variant_parallel_input_iterator.hpp.
Iterator end () |
inline
End marker for the iteration.
Definition at line 537 of file variant_parallel_input_iterator.hpp.
VariantInputIterator & input_at (size_t index) |
inline
Get access to an input iterator that has been added to this parallel iterator.
Definition at line 611 of file variant_parallel_input_iterator.hpp.
VariantInputIterator const & input_at (size_t index) const |
inline
Get access to an input iterator that has been added to this parallel iterator.
Definition at line 603 of file variant_parallel_input_iterator.hpp.
size_t input_size () const |
inline
Return the number of input sources added.
Definition at line 619 of file variant_parallel_input_iterator.hpp.
std::vector< VariantInputIterator > & inputs () |
inline
Get access to the input iterators that have been added to this parallel iterator.
This non-const version of the function can, for example, be used to bulk-add filters and transformations to the iterators, using their functions add_transform(), add_filter(), and add_transform_filter(); see utils::LambdaIterator for details.
Definition at line 595 of file variant_parallel_input_iterator.hpp.
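The bulk-adding pattern can be sketched as a loop over a mutable vector of inputs. Everything below is a hypothetical stand-in rather than genesis code: SketchRecord, SketchInputStream, its add_filter() signature, and add_min_coverage_filter() are invented for illustration, and the convention that a filter returns true to keep an element is an assumption.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical record with a single property to filter on.
struct SketchRecord {
    int coverage = 0;
};

// Hypothetical stand-in for an input that stores filter functions.
class SketchInputStream {
public:
    // Register a filter; assumed convention: return true to keep the record.
    SketchInputStream& add_filter(std::function<bool(SketchRecord const&)> filter)
    {
        filters_.push_back(std::move(filter));
        return *this;
    }

    // A record passes if every registered filter accepts it.
    bool passes(SketchRecord const& record) const
    {
        for (auto const& filter : filters_) {
            if (!filter(record)) {
                return false;
            }
        }
        return true;
    }

private:
    std::vector<std::function<bool(SketchRecord const&)>> filters_;
};

// Add the same filter to every input, via non-const access to the vector.
void add_min_coverage_filter(std::vector<SketchInputStream>& inputs, int min_coverage)
{
    for (auto& input : inputs) {
        input.add_filter([min_coverage](SketchRecord const& record) {
            return record.coverage >= min_coverage;
        });
    }
}
```

With the parallel iterator, the same loop would run over the reference returned by the non-const inputs() instead of a local vector.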
std::vector< VariantInputIterator > const & inputs () const |
inline
Get access to the input iterators that have been added to this parallel iterator.
Definition at line 580 of file variant_parallel_input_iterator.hpp.
using self_type = VariantParallelInputIterator |
Definition at line 190 of file variant_parallel_input_iterator.hpp.
using value_type = Variant |
Definition at line 191 of file variant_parallel_input_iterator.hpp.
enum ContributionType |
strong
Select which loci of an input are used.
We offer two ways an input can be traversed: either take all of its loci (carrying), or only those that also appear in other inputs (following).
For the most part, the kCarrying type acts as a set union of the input loci; all loci of all sources that are added with that type get visited. The kFollowing type on the other hand does not contribute its unique loci (i.e., the ones that are private to itself / do not appear in any other input source), but also does not change or constrain the ones that are visited by the carrying inputs.
A notable case happens if all inputs are added as type kFollowing: In the absence of a carrying set of loci, only those loci are visited that are in all inputs; in other words, in this case, the kFollowing type acts as an intersection of loci.
NB: We do not call these two types "union" and "intersection", as we feel that this might be confusing. These terms describe operations on two or more sets, and are not properties of any single set. For example, a carrying input and a following input combined yield neither the union nor the intersection of the two, but instead just all loci from the first one. Only if all inputs are of the same type, either carrying or following, does the result behave as the union or intersection of the loci, respectively.
This model does not allow more complex subset operations of loci, such as per-input intersections, complements, (symmetrical) differences, and exclusions. For these cases, one can use the add_carrying_locus() and add_carrying_loci() functions that allow a pre-defined set of loci to be iterated over; if then all actual data inputs are set to be following, only those pre-defined loci will be visited, making it possible to select an arbitrary set of loci for iteration.
Enumerator | |
---|---|
kCarrying | For a given input, stop at all its positions. Other input sources that do not have data at these loci will then have the Optional be empty in the iterator at this locus. |
kFollowing | For a given input, only stop at positions where other inputs also want to stop. In other words, this input does not contribute the loci that are unique to it to the traversal, but contributes its data only at the loci that are visited by others (or has an empty Optional Variant, if it does not have data at a visited Locus). |
Definition at line 169 of file variant_parallel_input_iterator.hpp.
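The rules above can be condensed into a small standalone model. This is a sketch of the described semantics, not the library's implementation: SketchContribution, SketchInput, and visited_positions() are hypothetical names, loci are reduced to plain positions on a single chromosome, and the optional extra_carrying parameter models loci added via add_carrying_locus() or add_carrying_loci().

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical mirror of the two contribution types.
enum class SketchContribution { kCarrying, kFollowing };

// Hypothetical input: its loci (reduced to positions) and how it contributes.
struct SketchInput {
    std::set<std::size_t> positions;
    SketchContribution type;
};

// Compute which positions get visited under the rules described above.
std::set<std::size_t> visited_positions(
    std::vector<SketchInput> const& inputs,
    std::set<std::size_t> const& extra_carrying = {}
) {
    // Extra carrying loci behave like an "empty" carrying input.
    std::set<std::size_t> visited = extra_carrying;
    bool any_carrying = !extra_carrying.empty();

    // Carrying inputs contribute all of their loci (set union).
    for (auto const& input : inputs) {
        if (input.type == SketchContribution::kCarrying) {
            any_carrying = true;
            visited.insert(input.positions.begin(), input.positions.end());
        }
    }
    if (any_carrying || inputs.empty()) {
        return visited;
    }

    // Only following inputs: keep the positions that occur in every input.
    visited = inputs.front().positions;
    for (auto const& input : inputs) {
        std::set<std::size_t> kept;
        for (auto const pos : visited) {
            if (input.positions.count(pos) > 0) {
                kept.insert(pos);
            }
        }
        visited = std::move(kept);
    }
    return visited;
}
```

In this model, one carrying and one following input yield exactly the loci of the carrying one, matching the remark that the combination is neither a union nor an intersection of the two.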
friend Iterator |
Definition at line 517 of file variant_parallel_input_iterator.hpp.