A library for working with phylogenetic and population genetic data.
v0.27.0
VariantParallelInputIterator Class Reference

#include <genesis/population/formats/variant_parallel_input_iterator.hpp>

Detailed Description

Iterate multiple input sources that yield Variants in parallel.

This iterator allows to traverse multiple sources of data in parallel, where each stop of the traversal is a Locus in the input sources. Using ContributionType, one can select the contribution of loci of each input, that is, whether all its loci get used, or just the ones that also overlap with other input sources. See also add_carrying_locus() for other ways to specify the loci to iterate over.

At each visited locus, the iterator yields the data of the underlying input sources as a vector of Optional Variants, with one Variant per input source. If a source does not have data at the current locus, the Optional is empty. Use the dereference operator*() and operator->() of the iterator or the access functions variants() and variant_at() to get the set of variants at the current locus() of the iteration, or use joined_variant() to get one Variant that has all sample BaseCounts joined into it.

Furthermore, using the inputs() and input_at() functions, which are also available from the iterator itself, one can access additional information about the underlying iterators, such as the file name and sample names that are being read. This is particularly useful if input sources are added as in the example below, where we use functions such as make_variant_input_iterator_from_pileup_file() to get access to the files, which encapsulate and hence would otherwise hide this information from us. See VariantInputIteratorData for the data structure that is used to store these additional information, and see VariantInputIterator for details on the underlying iterator.

Example:

// Add input sources to a parallel iterator, one carrying, so that all its loci are visited,
// and one following, meaning that its loci are only visited if the first one also
// as those loci.
auto parallel = VariantParallelInputIterator();
parallel.add_variant_input_iterator(
make_variant_input_iterator_from_pileup_file( "path/to/file.pileup.gz" ),
);
parallel.add_variant_input_iterator(
);
for( auto it = parallel.begin(); it != parallel.end(); ++it ) {
// Work with the iterator, which stops at every locus of the first input source.
std::cout << "At: " << it.locus() << "\n";
for( auto const& var : *it ) {
if( var ) {
// The optional has data, and the variant is valid, which means that
// the input has data at the current locus.
// For example, get the number of samples of that variant
// (which has to be constant over the genome for each input -
// we do not allow inputs to change their number of samples during iteration).
auto const s = var->samples.size();
}
}
// Or get all data combined into one Variant.
auto const joined_var = it.joined_variant();
}
// or
for( auto const& opt_variant : parallel ) {
// Work directly with the std::vector<utils::Optional<Variant>> opt_variant that is
// returned by dereferencing the iterator. Note that this does not allow access to other
// members of the iterator, such as its locus() and joined_variant() functions.
}

See the VariantParallelInputIterator::Iterator class for details on access to the data during traversal.

Definition at line 131 of file variant_parallel_input_iterator.hpp.

Public Member Functions

 VariantParallelInputIterator ()=default
 
 VariantParallelInputIterator (self_type &&)=default
 
 VariantParallelInputIterator (self_type const &)=default
 
 ~VariantParallelInputIterator ()=default
 
template<class ForwardIterator >
self_typeadd_carrying_loci (ForwardIterator first, ForwardIterator last)
 Add a set of GenomeLoci that are used as carrying loci in the iteration. More...
 
self_typeadd_carrying_loci (std::vector< GenomeLocus > const &loci)
 Add a set of GenomeLoci that are used as carrying loci in the iteration. More...
 
self_typeadd_carrying_locus (GenomeLocus const &locus)
 Add a set of GenomeLoci that are used as carrying loci in the iteration. More...
 
self_typeadd_variant_input (std::function< bool(Variant &)> input_element_generator, ContributionType selection)
 Add an input to the parallel iterator. More...
 
self_typeadd_variant_input_iterator (VariantInputIterator const &input, ContributionType selection)
 Add an input to the parallel iterator. More...
 
Iterator begin ()
 Begin the iteration. More...
 
Iterator end ()
 End marker for the iteration. More...
 
VariantInputIteratorinput_at (size_t index)
 Get access to an input iterator that has been added to this parallel iterator. More...
 
VariantInputIterator const & input_at (size_t index) const
 Get access to an input iterator that has been added to this parallel iterator. More...
 
size_t input_size () const
 Return the number of input sourced added. More...
 
std::vector< VariantInputIterator > & inputs ()
 Get access to the input iterators that have been added to this parallel iterator. More...
 
std::vector< VariantInputIterator > const & inputs () const
 Get access to the input iterators that have been added to this parallel iterator. More...
 
self_typeoperator= (self_type &&)=default
 
self_typeoperator= (self_type const &)=default
 

Public Types

enum  ContributionType { kCarrying, kFollowing }
 Select which loci of an input are used. More...
 
using self_type = VariantParallelInputIterator
 
using value_type = Variant
 

Public Attributes

friend Iterator
 

Classes

class  Iterator
 Iterator over loci of the input sources. More...
 

Constructor & Destructor Documentation

◆ VariantParallelInputIterator() [1/3]

◆ ~VariantParallelInputIterator()

◆ VariantParallelInputIterator() [2/3]

VariantParallelInputIterator ( self_type const &  )
default

◆ VariantParallelInputIterator() [3/3]

Member Function Documentation

◆ add_carrying_loci() [1/2]

self_type& add_carrying_loci ( ForwardIterator  first,
ForwardIterator  last 
)
inline

Add a set of GenomeLoci that are used as carrying loci in the iteration.

See also
VariantParallelInputIterator::add_carrying_locus( GenomeLocus const& )

Definition at line 684 of file variant_parallel_input_iterator.hpp.

◆ add_carrying_loci() [2/2]

self_type& add_carrying_loci ( std::vector< GenomeLocus > const &  loci)
inline

Add a set of GenomeLoci that are used as carrying loci in the iteration.

See also
VariantParallelInputIterator::add_carrying_locus( GenomeLocus const& )

Definition at line 674 of file variant_parallel_input_iterator.hpp.

◆ add_carrying_locus()

self_type& add_carrying_locus ( GenomeLocus const &  locus)
inline

Add a set of GenomeLoci that are used as carrying loci in the iteration.

This allows to iterate over a pre-defined set of loci. The iterator stops at each of these loci, independently of whether any of the underlying input sources have data at this locus. That means, it acts as an "empty" input that only contributes loci, as if it were added with ContributionType::kCarrying, but without any actual variants. Duplicate loci in these additional carrying loci are ignored.

Using this is particularly useful for more complex subset operations of loci, such as intersections, complements, (symmetrical) differences, and exclusions. These cases cannot be modelled with our simple ContributionType based approach; so instead, one can externally prepare the list of loci that need to be visited, and provide these to this function. In these cases, to use exactly the list of provided loci, all actual input sources can be added as ContributionType::kFollowing, to make sure that none of them adds additional loci to the traversal.

Note that in addition to the loci added via this function, all loci of input sources that are of ContributionType::kCarrying are also visited.

Definition at line 650 of file variant_parallel_input_iterator.hpp.

◆ add_variant_input()

self_type& add_variant_input ( std::function< bool(Variant &)>  input_element_generator,
ContributionType  selection 
)
inline

Add an input to the parallel iterator.

This version of the function takes the function to obtain elements from the underlying data iterator, same as VariantInputIterator. See there and LambdaIterator for details.

Definition at line 569 of file variant_parallel_input_iterator.hpp.

◆ add_variant_input_iterator()

self_type& add_variant_input_iterator ( VariantInputIterator const &  input,
ContributionType  selection 
)
inline

Add an input to the parallel iterator.

Definition at line 549 of file variant_parallel_input_iterator.hpp.

◆ begin()

Iterator begin ( )
inline

Begin the iteration.

Use this to obtain an VariantParallelInputIterator::Iterator that starts traversing the input sources.

Definition at line 529 of file variant_parallel_input_iterator.hpp.

◆ end()

Iterator end ( )
inline

End marker for the iteration.

Definition at line 537 of file variant_parallel_input_iterator.hpp.

◆ input_at() [1/2]

VariantInputIterator& input_at ( size_t  index)
inline

Get access to an input iterator that has been added to this parallel iterator.

Definition at line 611 of file variant_parallel_input_iterator.hpp.

◆ input_at() [2/2]

VariantInputIterator const& input_at ( size_t  index) const
inline

Get access to an input iterator that has been added to this parallel iterator.

Definition at line 603 of file variant_parallel_input_iterator.hpp.

◆ input_size()

size_t input_size ( ) const
inline

Return the number of input sourced added.

Definition at line 619 of file variant_parallel_input_iterator.hpp.

◆ inputs() [1/2]

std::vector<VariantInputIterator>& inputs ( )
inline

Get access to the input iterators that have been added to this parallel iterator.

This non-const version of the function can for exmple be used to bulk-add filters and transformations to the iterators, using their functions add_transform(), add_filter(), and add_transform_filter(); see utils::LambdaIterator for details.

Definition at line 595 of file variant_parallel_input_iterator.hpp.

◆ inputs() [2/2]

std::vector<VariantInputIterator> const& inputs ( ) const
inline

Get access to the input iterators that have been added to this parallel iterator.

Definition at line 580 of file variant_parallel_input_iterator.hpp.

◆ operator=() [1/2]

self_type& operator= ( self_type &&  )
default

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
default

Member Typedef Documentation

◆ self_type

◆ value_type

Definition at line 191 of file variant_parallel_input_iterator.hpp.

Member Enumeration Documentation

◆ ContributionType

enum ContributionType
strong

Select which loci of an input are used.

We offer two ways an input can be traversed over: Either take all its loci (carrying), or only those which also appear in other inputs as well (following).

For the most part, the kCarrying type acts as a set union of the input loci; all loci of all sources that are added with that type get visited. The kFollowing type on the other hand does not contribute its unique loci (i.e., the ones that are private to itself / do not appear in any other input source), but also does not change or constrain the ones that are visited by the carrying inputs.

A notable case happens if all inputs are added as type kFollowing: In the absence of a carrying set of loci, only those loci are visited that are in all inputs; in other words, in this case, the kFollowing type acts as an intersection of loci.

NB: We do not call these two types "union" and "intersection", as we feel that this might be confusing. These terms describe operations on two or more sets, and are not properties of any single set. For example, a carrying input and a following input combined do neither yield the union nor the intersection of the two, but instead just all loci from the first one. Only if all inputs are of the same type, either carrying or following, does the result behave as the union or intersection of the loci, respectively.

This model does not allow more complex subset operations of loci, such as per-input intersections, complements, (symmetrical) differences, and exclusions. For these cases, one can use the add_carrying_locus() and add_carrying_loci() functions that allow a pre-defined set of loci to be iterated over; if then all actual data inputs are set to be following, only those pre-defined loci will be visited, making it possible to select an arbitrary set of loci for iteration.

Enumerator
kCarrying 

For a given input, stop at all its positions.

Other input sources that do not have data at these loci will then have the Optional be empty in the iterator at this locus.

kFollowing 

For a given input, only stop at positions where other inputs also want to stop.

In other words, this input does not contribute the loci that are unique to it to the traversal, but contributes its data only at the loci that are visited by others (or has an empty Optional Variant, if it does not have data at a visited Locus).

Definition at line 169 of file variant_parallel_input_iterator.hpp.

Member Data Documentation

◆ Iterator

friend Iterator

Definition at line 517 of file variant_parallel_input_iterator.hpp.


The documentation for this class was generated from the following file:
genesis::population::VariantParallelInputIterator::VariantParallelInputIterator
VariantParallelInputIterator()=default
genesis::population::make_variant_input_iterator_from_sync_file
VariantInputIterator make_variant_input_iterator_from_sync_file(std::string const &filename)
Create a VariantInputIterator to iterate the contents of a PoPoolation2 sync file as Variants.
Definition: variant_input_iterator.cpp:314
genesis::population::VariantParallelInputIterator::ContributionType::kCarrying
@ kCarrying
For a given input, stop at all its positions.
genesis::population::VariantParallelInputIterator::ContributionType::kFollowing
@ kFollowing
For a given input, only stop at positions where other inputs also want to stop.
genesis::population::make_variant_input_iterator_from_pileup_file
VariantInputIterator make_variant_input_iterator_from_pileup_file(std::string const &filename, SimplePileupReader const &reader)
Create a VariantInputIterator to iterate the contents of a (m)pileup file as Variants.
Definition: variant_input_iterator.cpp:237