A library for working with phylogenetic data. v0.25.0

#include <genesis/population/formats/simple_pileup_reader.hpp>

## Detailed Description

Reader for line-by-line assessment of (m)pileup files.

This simple reader processes (m)pileup files line by line. That is, it does not take into consideration which read starts at which position, but instead gives a quick and simple tally of the bases of all reads that cover a given position. This makes it fast in cases where only per-position, but no per-read information is needed.
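The per-position tally described above can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: the `BaseTally` struct and the `tally_bases()` function are hypothetical names, and the handling of `^`, `$`, and indel markers follows the usual pileup conventions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical tally type for illustration; not the library's Sample struct.
struct BaseTally {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0, del = 0;
};

// Count the bases in one pileup "read bases" column,
// without tracking which read each base came from.
BaseTally tally_bases( char ref_base, std::string const& bases )
{
    BaseTally tally;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        char c = bases[i];
        if( c == '^' ) {
            // Read start marker, followed by one mapping quality character: skip both.
            ++i;
            continue;
        }
        if( c == '$' ) {
            // Read end marker: carries no base.
            continue;
        }
        if( c == '+' || c == '-' ) {
            // Indel: a decimal length, then that many bases. Skip all of it.
            std::size_t len = 0;
            while(
                i + 1 < bases.size() &&
                std::isdigit( static_cast<unsigned char>( bases[i + 1] ))
            ) {
                len = len * 10 + static_cast<std::size_t>( bases[i + 1] - '0' );
                ++i;
            }
            i += len;
            continue;
        }
        if( c == '.' || c == ',' ) {
            // Match to the reference on the forward or reverse strand.
            c = ref_base;
        }
        switch( std::toupper( static_cast<unsigned char>( c ))) {
            case 'A': ++tally.a; break;
            case 'C': ++tally.c; break;
            case 'G': ++tally.g; break;
            case 'T': ++tally.t; break;
            case 'N': ++tally.n; break;
            case '*': ++tally.del; break;
            default: break; // Anything else is ignored in this sketch.
        }
    }
    return tally;
}
```

For example, `tally_bases( 'T', ",.$.....,,.,.,...,,,.,..^+." )` counts every match towards `t`, since `.` and `,` both stand for the reference base.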

For each processed line, a SimplePileupReader::Record is produced, which captures the basic information of the line, as well as a tally for each sample in the line, collected in SimplePileupReader::Sample. One such sample consists of two or more columns in the file. The number of columns per sample depends on the additional information contained in the file. As we have no way of deciding this automatically, these columns have to be activated beforehand: with_quality_string() for the quality code string, and with_ancestral_base() for the ancestral base.

More columns might be needed in the future, and potentially their ordering might need to be adapted. But for now, we only have these use cases.

## Public Member Functions

self_type & operator= (self_type &&)=default

self_type & operator= (self_type const &)=default

bool parse_line (utils::InputStream &input_stream, Record &record) const

bool parse_line (utils::InputStream &input_stream, Record &record, std::vector< bool > const &sample_filter) const
Read an (m)pileup line, but only the samples at which the sample_filter is true.

sequence::QualityEncoding quality_encoding () const

self_type & quality_encoding (sequence::QualityEncoding value)
Set the type of encoding for the quality code string.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source) const
Read an (m)pileup file line by line.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
Read an (m)pileup file line by line, but only the samples at which the sample_filter is true.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source, std::vector< size_t > const &sample_indices) const
Read an (m)pileup file line by line, but only the samples at the given indices.

bool with_ancestral_base () const

self_type & with_ancestral_base (bool value)
Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

bool with_quality_string () const

self_type & with_quality_string (bool value)
Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

## Static Public Member Functions

static std::vector< bool > make_sample_filter (std::vector< size_t > const &indices)
Helper function to create a sample filter from a list of sample indices.

## Classes

struct  Record
Single line/record from a pileup file.

struct  Sample
One sample in a pileup line/record.

## Constructor & Destructor Documentation

 SimplePileupReader ( self_type const & )
## ◆ make_sample_filter()

 std::vector< bool > make_sample_filter ( std::vector< size_t > const & indices )
Helper function to create a sample filter from a list of sample indices.
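A helper of this kind could look roughly like the sketch below. This is not the library's code: `make_sample_filter_sketch()` is a hypothetical name, and sizing the filter to the largest index plus one is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a filter helper; the actual sizing rule
// used by the library is not stated in this documentation.
std::vector<bool> make_sample_filter_sketch( std::vector<std::size_t> const& indices )
{
    if( indices.empty() ) {
        return {};
    }

    // Assumption: the filter is just long enough to hold the largest index.
    std::size_t const max_index = *std::max_element( indices.begin(), indices.end() );
    std::vector<bool> filter( max_index + 1, false );

    // Mark every requested sample as wanted.
    for( std::size_t const index : indices ) {
        filter[ index ] = true;
    }
    return filter;
}
```

With indices `{ 0, 2 }`, this sketch yields the filter `{ true, false, true }`.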

## ◆ operator=() [1/2]

 self_type& operator= ( self_type && )
## ◆ operator=() [2/2]

 self_type& operator= ( self_type const & )
## ◆ parse_line() [1/2]

 bool parse_line ( utils::InputStream & input_stream, SimplePileupReader::Record & record ) const

## ◆ parse_line() [2/2]

 bool parse_line ( utils::InputStream & input_stream, SimplePileupReader::Record & record, std::vector< bool > const & sample_filter ) const

Read an (m)pileup line, but only the samples at which the sample_filter is true.

This filter does not need to contain the same number of values as the record has samples. If it is shorter, all samples after its last index will be ignored. If it is longer, the remaining entries are not used as a filter.
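The length rule above can be illustrated with a small standalone sketch. The function `kept_samples()` is a hypothetical name used only here; it returns the indices of the samples that such a filter would keep, under the behavior described in the preceding paragraph.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: which sample indices survive a filter of arbitrary length?
std::vector<std::size_t> kept_samples(
    std::size_t sample_count,
    std::vector<bool> const& filter
) {
    std::vector<std::size_t> kept;

    // Only positions present in both the record and the filter can be kept:
    // a shorter filter drops all later samples, and surplus filter entries are unused.
    std::size_t const limit = std::min( sample_count, filter.size() );
    for( std::size_t i = 0; i < limit; ++i ) {
        if( filter[i] ) {
            kept.push_back( i );
        }
    }
    return kept;
}
```

A record with four samples and the filter `{ true, false }` keeps only sample 0. A record with two samples and the filter `{ false, true, true, true }` keeps only sample 1.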

## ◆ quality_encoding() [1/2]

 sequence::QualityEncoding quality_encoding ( ) const
## ◆ quality_encoding() [2/2]

 self_type& quality_encoding ( sequence::QualityEncoding value )
Set the type of encoding for the quality code string.

If with_quality_string() is set to true (default), this encoding is used to transform the ASCII-encoded string into actual phred-scaled scores. See sequence::quality_decode_to_phred_score() for details.

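## ◆ read() [1/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source ) const

Read an (m)pileup file line by line.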

## ◆ read() [2/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const & sample_filter ) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true.

This filter does not need to contain the same number of values as the record has samples. If it is shorter, all samples after its last index will be ignored. If it is longer, the remaining entries are not used as a filter.

## ◆ read() [3/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source, std::vector< size_t > const & sample_indices ) const

Read an (m)pileup file line by line, but only the samples at the given indices.

## ◆ with_ancestral_base() [1/2]

 bool with_ancestral_base ( ) const
## ◆ with_ancestral_base() [2/2]

 self_type& with_ancestral_base ( bool value )
Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

This is a pileup extension used by Pool-HMM (Boitard et al. 2013) to denote the ancestral allele of each position directly within the pileup file. Set to true when this is present in the input.

A typical line from a pileup file looks like

2L  30  A   15  aaaAaaaAaAAaaAa PY\aVO^ZaaV[_S A


which contains the three fixed columns, and then four columns for the sample, with the last one A being the ancestral allele for that sample.
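The column layout just described can be made concrete with a small standalone sketch. It is not the library's parser: `SampleColumns`, `LineColumns`, and `split_pileup_line()` are hypothetical names, and splitting on arbitrary whitespace is a simplifying assumption.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical containers for illustration; not the library's Record and Sample.
struct SampleColumns {
    std::size_t read_depth = 0;
    std::string bases;
    std::string qualities;
    char ancestral_base = 'N';
};

struct LineColumns {
    std::string chromosome;
    std::size_t position = 0;
    char reference_base = 'N';
    std::vector<SampleColumns> samples;
};

// Split one pileup line into its three fixed columns and its per-sample columns.
// The two flags say which optional per-sample columns are expected.
LineColumns split_pileup_line(
    std::string const& line,
    bool with_quality_string,
    bool with_ancestral_base
) {
    std::istringstream in( line );
    LineColumns result;

    // Three fixed columns: chromosome, position, reference base.
    if( !( in >> result.chromosome >> result.position >> result.reference_base )) {
        throw std::runtime_error( "Malformed fixed columns" );
    }

    // Each sample: read depth, bases, then the optional columns that were activated.
    SampleColumns sample;
    while( in >> sample.read_depth ) {
        if( !( in >> sample.bases )) {
            throw std::runtime_error( "Missing bases column" );
        }
        if( with_quality_string && !( in >> sample.qualities )) {
            throw std::runtime_error( "Missing quality column" );
        }
        if( with_ancestral_base && !( in >> sample.ancestral_base )) {
            throw std::runtime_error( "Missing ancestral base column" );
        }
        result.samples.push_back( sample );
        sample = SampleColumns{};
    }
    return result;
}
```

Called on the example line with both flags set to true, this yields one sample with read depth 15 and ancestral base `A`.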

## ◆ with_quality_string() [1/2]

 bool with_quality_string ( ) const
## ◆ with_quality_string() [2/2]

 self_type& with_quality_string ( bool value )
Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

A typical line from a pileup file looks like

seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&


with the last field being quality codes. However, this last field is optional, and hence we offer this option. If true (default), the field is expected to be there; if false, it is expected not to be there. That is, at the moment, we have no automatic setting for this.

See quality_encoding() for changing the encoding that is used in this column. Default is Sanger encoding. See genesis::sequence::QualityEncoding for details.
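As a rough illustration of what decoding under the default Sanger encoding involves, the sketch below subtracts the ASCII offset of 33 from each quality character. The function `decode_sanger()` is a hypothetical name; it is not sequence::quality_decode_to_phred_score(), whose exact behavior is documented separately.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical decoder: convert Sanger (Phred+33) quality characters to phred scores.
std::vector<unsigned char> decode_sanger( std::string const& quality_codes )
{
    std::vector<unsigned char> scores;
    scores.reserve( quality_codes.size() );
    for( char const c : quality_codes ) {
        // Sanger encoding uses the printable ASCII range starting at '!' (33).
        if( c < 33 || c > 126 ) {
            throw std::invalid_argument( "Invalid Sanger quality character" );
        }
        scores.push_back( static_cast<unsigned char>( c - 33 ));
    }
    return scores;
}
```

For instance, the character `<` (ASCII 60) decodes to a phred score of 27, and `!` (ASCII 33) decodes to 0.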

