A library for working with phylogenetic data.
v0.25.0
VcfInputIterator Class Reference

#include <genesis/population/formats/vcf_input_iterator.hpp>

Detailed Description

Iterate an input source and parse it as a VCF/BCF file.

This simple wrapper allows easy iteration through the records/lines of a VCF/BCF file, and takes care of setting up the HtsFile, VcfHeader, and VcfRecord.

Basic usage:

auto it = VcfInputIterator( infile );
while( it ) {
    // work with it.record() or it->...
    ++it;
}

or

for( auto it = VcfInputIterator( infile );  it; ++it ) {
    // work with it
}

For details on working with the records/lines, see VcfRecord and VcfFormatIterator.

Caveat: The iterator is an input iterator that traverses a single VCF file in one go. Internally, a buffer is filled asynchronously in the background to speed up reading. However, this buffer is not copied when the iterator is copied (for speed reasons, as typical iterator use cases do not need it), so any copy of this iterator might point to old or invalid data. Hence, once an iterator is incremented, all copies of it are invalidated.

This also means that the iterator is not thread safe: Incrementing an iterator or any copy of it from multiple tasks leads to undefined behaviour, and hence needs to be synchronized externally.

All this is typically not an issue, as an iterator is traversed in a single loop over a file. Any copies are usually merely incidental, made while preparing the input etc., and are normally not accessed concurrently in the actual loop over VcfRecords.
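The external synchronization mentioned above can be sketched as follows. This is a minimal illustration, not library code: `MockRecordIterator` and `synchronized_sum` are hypothetical names, and the mock iterates over a plain vector of integers instead of VCF records. It only mimics the `explicit operator bool`, `operator*`, and `operator++` members listed on this page. The point is that the validity check, the dereference, and the increment all happen under the same mutex, and that the data is copied out before the lock is released.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a single-pass input iterator such as VcfInputIterator.
// It iterates over integers instead of VCF records.
class MockRecordIterator
{
public:
    explicit MockRecordIterator( std::vector<int> records )
        : records_( std::move( records ))
    {}

    explicit operator bool() const { return pos_ < records_.size(); }
    int const& operator*() const { return records_[pos_]; }
    MockRecordIterator& operator++() { ++pos_; return *this; }

private:
    std::vector<int> records_;
    std::size_t pos_ = 0;
};

// Sum all records using several worker threads that share one iterator.
// Every access to the iterator is guarded by the same mutex.
long synchronized_sum( MockRecordIterator& it, std::size_t num_threads )
{
    std::mutex it_mutex;
    std::vector<long> partial( num_threads, 0 );
    std::vector<std::thread> workers;

    for( std::size_t t = 0; t < num_threads; ++t ) {
        workers.emplace_back( [&it, &it_mutex, &partial, t]() {
            while( true ) {
                int value = 0;
                {
                    std::lock_guard<std::mutex> lock( it_mutex );
                    if( !it ) {
                        return;
                    }
                    // Copy the data out before incrementing and unlocking.
                    value = *it;
                    ++it;
                }
                // The actual work happens outside of the lock.
                partial[t] += value;
            }
        });
    }
    for( auto& worker : workers ) {
        worker.join();
    }

    long total = 0;
    for( auto p : partial ) {
        total += p;
    }
    return total;
}
```

With a real VcfInputIterator, the copy made under the lock would have to hold whatever per-record data the later computation needs, as the record itself is no longer valid after the increment.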

Definition at line 91 of file vcf_input_iterator.hpp.

Public Member Functions

 VcfInputIterator ()=default
 Create a default instance, with no input. This is also the past-the-end iterator.
 
 VcfInputIterator (self_type &&)=default
 
 VcfInputIterator (self_type const &)=default
 
 VcfInputIterator (std::string const &filename, size_t block_size=1024)
 Create an instance that reads from an input file name.
 
 VcfInputIterator (std::string const &filename, std::vector< std::string > const &sample_names, bool inverse_sample_names=false, size_t block_size=1024)
 Create an instance that reads from an input file name.
 
 ~VcfInputIterator ()=default
 
std::string const & filename () const
 
bool good () const
 
VcfHeader & header ()
 
VcfHeader const & header () const
 
HtsFile & hts_file ()
 
HtsFile const & hts_file () const
 
 operator bool () const
 Return true iff dereferencing is valid, i.e., iff there is a VCF record available.
 
bool operator!= (self_type const &it) const
 
VcfRecord & operator* ()
 
VcfRecord const & operator* () const
 
self_type & operator++ ()
 
self_type & operator++ (int)
 
VcfRecord * operator-> ()
 
VcfRecord const * operator-> () const
 
self_type & operator= (self_type &&)=default
 
self_type & operator= (self_type const &)=default
 
bool operator== (self_type const &it) const
 
VcfRecord & record ()
 
VcfRecord const & record () const
 

Public Types

using difference_type = std::ptrdiff_t
 
using iterator_category = std::input_iterator_tag
 
using pointer = value_type const *
 
using reference = value_type const &
 
using self_type = VcfInputIterator
 
using value_type = VcfRecord
 

Constructor & Destructor Documentation

◆ VcfInputIterator() [1/5]

VcfInputIterator ( )
default

Create a default instance, with no input. This is also the past-the-end iterator.

◆ VcfInputIterator() [2/5]

VcfInputIterator ( std::string const &  filename,
size_t  block_size = 1024 
)
inline explicit

Create an instance that reads from an input file name.

The optional parameter block_size sets the number of VcfRecords that are read asynchronously into a buffer for speed improvements. This is mostly interesting for window- or region-based analyses, where a certain number of records are needed to fill the window, on which some (potentially time-consuming) operations and computations are performed afterwards. In that time, an asynchronous thread can already read the next block of VCF records. In these scenarios, it is best to choose a block_size that is larger than the typical number of records per window/region that is being processed. E.g., if most windows contain between 1200 and 1500 VcfRecords, a good block_size is 3000 or 5000, so that subsequent windows can be filled quickly without having to wait for the reading.
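The rule of thumb above can be written down as a small helper. This is only a sketch: `suggest_block_size` and its `factor` parameter are hypothetical and not part of the library. It assumes that a multiple of the largest expected window is a reasonable choice, in line with the example of 3000 for windows of up to 1500 records, and it never goes below the documented default of 1024.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not part of the library: suggest a block_size from the
// largest number of records expected per window/region.
inline std::size_t suggest_block_size(
    std::size_t max_records_per_window,
    std::size_t factor = 2
) {
    // Assumption: do not go below the documented default block_size of 1024.
    std::size_t const default_block_size = 1024;
    return std::max( default_block_size, max_records_per_window * factor );
}
```

The result would then be passed as the second constructor argument, for instance `VcfInputIterator( infile, suggest_block_size( 1500 ))`, which requests a block size of 3000.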

Definition at line 128 of file vcf_input_iterator.hpp.

◆ VcfInputIterator() [3/5]

VcfInputIterator ( std::string const &  filename,
std::vector< std::string > const &  sample_names,
bool  inverse_sample_names = false,
size_t  block_size = 1024 
)
inline

Create an instance that reads from an input file name.

Additionally, this constructor takes a list of sample_names which are used as filter so that only those samples (columns of the VCF records) are evaluated and accessible - or, if inverse_sample_names is set to true, instead all but those samples.
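The filter semantics described above can be illustrated without reading a file. The function below is a hypothetical sketch, not the library's implementation: `select_samples` takes the sample names of a file and applies the `sample_names` list either as a keep-list or, with `inverse_sample_names` set, as a drop-list. It assumes that the retained samples stay in file order, and it silently ignores listed names that do not occur in the file; the actual class may treat such names differently.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical illustration of the sample filter semantics.
// all_samples:           sample names as they appear in the file
// sample_names:          names given to the constructor
// inverse_sample_names:  if true, keep all samples except the listed ones
std::vector<std::string> select_samples(
    std::vector<std::string> const& all_samples,
    std::vector<std::string> const& sample_names,
    bool inverse_sample_names = false
) {
    std::unordered_set<std::string> const listed(
        sample_names.begin(), sample_names.end()
    );

    std::vector<std::string> kept;
    for( auto const& name : all_samples ) {
        bool const is_listed = listed.count( name ) > 0;

        // Keep listed samples in normal mode, unlisted samples in inverse mode.
        if( is_listed != inverse_sample_names ) {
            kept.push_back( name );
        }
    }
    return kept;
}
```

For a file with samples S1, S2, and S3 and the list `{ "S2" }`, this keeps only S2 by default, and S1 and S3 when `inverse_sample_names` is true.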

The optional parameter block_size sets the number of VcfRecords that are read asynchronously into a buffer for speed improvements. This is mostly interesting for window- or region-based analyses, where a certain number of records are needed to fill the window, on which some (potentially time-consuming) operations and computations are performed afterwards. In that time, an asynchronous thread can already read the next block of VCF records. In these scenarios, it is best to choose a block_size that is larger than the typical number of records per window/region that is being processed. E.g., if most windows contain between 1200 and 1500 VcfRecords, a good block_size is 3000 or 5000, so that subsequent windows can be filled quickly without having to wait for the reading.

Definition at line 145 of file vcf_input_iterator.hpp.

◆ ~VcfInputIterator()

~VcfInputIterator ( )
default

◆ VcfInputIterator() [4/5]

VcfInputIterator ( self_type const &  )
default

◆ VcfInputIterator() [5/5]

VcfInputIterator ( self_type &&  )
default

Member Function Documentation

◆ filename()

std::string const& filename ( ) const
inline

Definition at line 209 of file vcf_input_iterator.hpp.

◆ good()

bool good ( ) const
inline

Definition at line 199 of file vcf_input_iterator.hpp.

◆ header() [1/2]

VcfHeader& header ( )
inline

Definition at line 237 of file vcf_input_iterator.hpp.

◆ header() [2/2]

VcfHeader const& header ( ) const
inline

Definition at line 231 of file vcf_input_iterator.hpp.

◆ hts_file() [1/2]

HtsFile& hts_file ( )
inline

Definition at line 225 of file vcf_input_iterator.hpp.

◆ hts_file() [2/2]

HtsFile const& hts_file ( ) const
inline

Definition at line 214 of file vcf_input_iterator.hpp.

◆ operator bool()

operator bool ( ) const
inline explicit

Return true iff dereferencing is valid, i.e., iff there is a VCF record available.
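As the conversion is declared explicit, it applies in conditions such as `while( it )` or `if( !it )`, but not in an implicit conversion such as `bool b = it;`. The sketch below shows this with a hypothetical stand-in type, `MockIterator`, that has the same kind of conversion operator; the `has_record` helper is likewise only for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-in with the same kind of conversion operator.
struct MockIterator
{
    bool has_record = false;
    explicit operator bool() const { return has_record; }
};

// An implicit conversion to bool is rejected ...
static_assert(
    !std::is_convertible<MockIterator, bool>::value,
    "explicit operator bool must not allow implicit conversion"
);

// ... while direct initialization and static_cast are allowed.
static_assert(
    std::is_constructible<bool, MockIterator>::value,
    "explicit operator bool allows direct initialization"
);

// Outside of conditions, the conversion has to be spelled out.
inline bool has_record( MockIterator const& it )
{
    return static_cast<bool>( it );
}
```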

Definition at line 193 of file vcf_input_iterator.hpp.

◆ operator!=()

bool operator!= ( self_type const &  it) const
inline

Definition at line 331 of file vcf_input_iterator.hpp.

◆ operator*() [1/2]

VcfRecord& operator* ( )
inline

Definition at line 278 of file vcf_input_iterator.hpp.

◆ operator*() [2/2]

VcfRecord const& operator* ( ) const
inline

Definition at line 271 of file vcf_input_iterator.hpp.

◆ operator++() [1/2]

self_type& operator++ ( )
inline

Definition at line 289 of file vcf_input_iterator.hpp.

◆ operator++() [2/2]

self_type& operator++ ( int  )
inline

Definition at line 295 of file vcf_input_iterator.hpp.

◆ operator->() [1/2]

VcfRecord* operator-> ( )
inline

Definition at line 264 of file vcf_input_iterator.hpp.

◆ operator->() [2/2]

VcfRecord const* operator-> ( ) const
inline

Definition at line 257 of file vcf_input_iterator.hpp.

◆ operator=() [1/2]

self_type& operator= ( self_type &&  )
default

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
default

◆ operator==()

bool operator== ( self_type const &  it) const
inline

Definition at line 301 of file vcf_input_iterator.hpp.

◆ record() [1/2]

VcfRecord& record ( )
inline

Definition at line 250 of file vcf_input_iterator.hpp.

◆ record() [2/2]

VcfRecord const& record ( ) const
inline

Definition at line 243 of file vcf_input_iterator.hpp.

Member Typedef Documentation

◆ difference_type

using difference_type = std::ptrdiff_t

Definition at line 103 of file vcf_input_iterator.hpp.

◆ iterator_category

using iterator_category = std::input_iterator_tag

Definition at line 104 of file vcf_input_iterator.hpp.

◆ pointer

using pointer = value_type const*

Definition at line 101 of file vcf_input_iterator.hpp.

◆ reference

using reference = value_type const&

Definition at line 102 of file vcf_input_iterator.hpp.

◆ self_type

using self_type = VcfInputIterator

Definition at line 99 of file vcf_input_iterator.hpp.

◆ value_type

using value_type = VcfRecord

Definition at line 100 of file vcf_input_iterator.hpp.


The documentation for this class was generated from the following file: