A library for working with phylogenetic data. v0.25.0

#include <genesis/population/formats/simple_pileup_reader.hpp>

## Detailed Description

Reader for line-by-line assessment of (m)pileup files.

This simple reader processes (m)pileup files line by line. That is, it does not take into account which read starts at which position. Instead, it gives a quick and simple tally of the bases of all reads that cover a given position. This makes it fast in cases where only per-position information, and no per-read information, is needed.
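The following standalone sketch illustrates what such a per-position tally involves. It is not the genesis implementation. `BaseTally`, `count_base()` and `tally_bases()` are hypothetical names, and only the common samtools read-base symbols are handled: `.`/`,` for reference matches, explicit bases for mismatches, `*` for deletions, `^` plus a mapping-quality character for read starts, `$` for read ends, and `+N`/`-N` indels. Rarer symbols make it throw.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical per-position tally of one sample; not genesis' SimplePileupReader::Sample.
struct BaseTally {
    std::size_t a = 0;
    std::size_t c = 0;
    std::size_t g = 0;
    std::size_t t = 0;
    std::size_t n = 0;
    std::size_t del = 0;
};

// Increments the counter of a nucleotide, ignoring case (lower case = reverse strand).
inline void count_base( BaseTally& tally, char base ) {
    switch( std::toupper( static_cast<unsigned char>( base ))) {
        case 'A': ++tally.a; break;
        case 'C': ++tally.c; break;
        case 'G': ++tally.g; break;
        case 'T': ++tally.t; break;
        case 'N': ++tally.n; break;
        default: throw std::runtime_error( std::string( "Invalid base: " ) + base );
    }
}

// Tallies the read bases column of one sample at one position.
inline BaseTally tally_bases( std::string const& bases, char reference_base ) {
    BaseTally tally;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        char const ch = bases[i];
        if( ch == '.' || ch == ',' ) {
            // Match to the reference on the forward (.) or reverse (,) strand.
            count_base( tally, reference_base );
        } else if( ch == '*' ) {
            // Placeholder for a base that is deleted in this read.
            ++tally.del;
        } else if( ch == '^' ) {
            // Start of a read: the next character is its mapping quality, so skip it.
            ++i;
        } else if( ch == '$' ) {
            // End of a read: nothing to count.
        } else if( ch == '+' || ch == '-' ) {
            // Indel such as "+2AG": parse the length, then skip the indel sequence.
            std::size_t length = 0;
            while( i + 1 < bases.size() && std::isdigit( static_cast<unsigned char>( bases[i + 1] ))) {
                length = length * 10 + static_cast<std::size_t>( bases[i + 1] - '0' );
                ++i;
            }
            i += length;
        } else {
            // Mismatch: an explicit base, upper case on the forward, lower case on the reverse strand.
            count_base( tally, ch );
        }
    }
    return tally;
}
```

For example, the read bases of the samtools line shown further below for with_quality_string() consist only of `.` and `,` (plus start/end markers), so all 24 reads count towards the reference base `T`.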

For each processed line, a SimplePileupReader::Record is produced, which captures the basic information of the line, as well as a tally for each sample in the line, collected in SimplePileupReader::Sample. One such sample consists of two or more columns in the file. The number of columns per sample depends on the additional information contained in the file. As we have no way of deciding this automatically, these columns have to be activated beforehand:

- with_quality_string()
- with_ancestral_base()

More columns might be needed in the future, and potentially their ordering might need to be adapted. But for now, we only have these use cases.

Definition at line 68 of file simple_pileup_reader.hpp.
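The sketch below shows how the activated columns determine the layout of a line: three fixed columns, followed by one group of columns per sample. `SplitLine`, `columns_per_sample()` and `split_pileup_line()` are hypothetical helpers, not genesis functions. For brevity they split on any whitespace, whereas real (m)pileup files are tab-separated and may contain empty fields.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for illustration only.
struct SplitLine {
    std::string chromosome;
    std::size_t position = 0;
    char reference_base = 'N';
    std::vector<std::vector<std::string>> samples;  // the raw columns of each sample
};

// Read count and read bases are always present; quality and ancestral base are optional.
inline std::size_t columns_per_sample( bool with_quality_string, bool with_ancestral_base ) {
    return 2 + ( with_quality_string ? 1 : 0 ) + ( with_ancestral_base ? 1 : 0 );
}

inline SplitLine split_pileup_line(
    std::string const& line, bool with_quality_string, bool with_ancestral_base
) {
    // Split on whitespace (simplification; real files use tabs).
    std::istringstream stream( line );
    std::vector<std::string> fields;
    for( std::string field; stream >> field; ) {
        fields.push_back( field );
    }
    if( fields.size() < 3 || fields[2].size() != 1 ) {
        throw std::runtime_error( "Malformed fixed columns" );
    }

    SplitLine result;
    result.chromosome = fields[0];
    result.position = std::stoul( fields[1] );
    result.reference_base = fields[2][0];

    // The remaining fields must form complete groups, one group per sample.
    std::size_t const width = columns_per_sample( with_quality_string, with_ancestral_base );
    if(( fields.size() - 3 ) % width != 0 ) {
        throw std::runtime_error( "Column count does not match the activated sample columns" );
    }
    for( std::size_t i = 3; i < fields.size(); i += width ) {
        auto const first = fields.begin() + static_cast<std::ptrdiff_t>( i );
        result.samples.emplace_back( first, first + static_cast<std::ptrdiff_t>( width ));
    }
    return result;
}
```

With the wrong settings, the number of remaining columns typically does not divide evenly, which is one way such a mismatch can surface as an error instead of silently misreading the file.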

## Public Member Functions

self_type & operator= (self_type &&)=default

self_type & operator= (self_type const &)=default

bool parse_line (utils::InputStream &input_stream, Record &record) const

bool parse_line (utils::InputStream &input_stream, Record &record, std::vector< bool > const &sample_filter) const
Read an (m)pileup line, but only the samples at which the sample_filter is true.

sequence::QualityEncoding quality_encoding () const

self_type & quality_encoding (sequence::QualityEncoding value)
Set the type of encoding for the quality code string.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source) const
Read an (m)pileup file line by line.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
Read an (m)pileup file line by line, but only the samples at which the sample_filter is true.

std::vector< Record > read (std::shared_ptr< utils::BaseInputSource > source, std::vector< size_t > const &sample_indices) const
Read an (m)pileup file line by line, but only the samples at the given indices.

bool with_ancestral_base () const

self_type & with_ancestral_base (bool value)
Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

bool with_quality_string () const

self_type & with_quality_string (bool value)
Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

## Static Public Member Functions

static std::vector< bool > make_sample_filter (std::vector< size_t > const &indices)
Helper function to create a sample filter from a list of sample indices.

## Classes

struct  Record
Single line/record from a pileup file.

struct  Sample
One sample in a pileup line/record.

## Constructor & Destructor Documentation

 SimplePileupReader ( self_type const & )
default

The remaining constructors and the destructor are declared as default as well.

## ◆ make_sample_filter()

 std::vector< bool > make_sample_filter ( std::vector< size_t > const & indices )
static

Helper function to create a sample filter from a list of sample indices, for use with the sample_filter overloads of parse_line() and read().

Definition at line 114 of file simple_pileup_reader.cpp.
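A minimal sketch of what such a helper might do, assuming that the filter is sized to the largest index plus one, so that all samples after the last selected one are ignored, as described for parse_line(). The name `make_sample_filter_sketch()` is hypothetical, and the exact behavior of the genesis function may differ (for example, for an empty index list).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: turns sample indices into a boolean filter.
// Assumption: the filter has length max(index) + 1; an empty input yields an empty filter.
inline std::vector<bool> make_sample_filter_sketch( std::vector<std::size_t> const& indices ) {
    if( indices.empty() ) {
        return {};
    }
    std::size_t const max_index = *std::max_element( indices.begin(), indices.end() );
    std::vector<bool> filter( max_index + 1, false );
    for( auto const idx : indices ) {
        filter[idx] = true;
    }
    return filter;
}
```

For the indices `{1, 3}`, this sketch yields the filter `{false, true, false, true}`.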

## ◆ operator=() [1/2]

 self_type& operator= ( self_type && )
default

## ◆ operator=() [2/2]

 self_type& operator= ( self_type const & )
default

## ◆ parse_line() [1/2]

 bool parse_line ( utils::InputStream & input_stream, SimplePileupReader::Record & record ) const

Definition at line 95 of file simple_pileup_reader.cpp.

## ◆ parse_line() [2/2]

 bool parse_line ( utils::InputStream & input_stream, SimplePileupReader::Record & record, std::vector< bool > const & sample_filter ) const

Read an (m)pileup line, but only the samples at which the sample_filter is true.

The filter does not need to have as many entries as the record has samples. If it is shorter, all samples beyond its last entry are ignored. If it is longer, the surplus entries are simply not used.

Definition at line 102 of file simple_pileup_reader.cpp.
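These filter semantics can be sketched with a hypothetical `apply_sample_filter()` helper (not part of genesis) that keeps only the selected entries of a per-sample vector. It only illustrates the length rules above and does not reflect how parse_line() is implemented internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: keeps the samples whose filter entry is true.
// Samples beyond the end of a shorter filter are dropped; surplus entries
// of a longer filter are never looked at.
template< typename SampleT >
std::vector<SampleT> apply_sample_filter(
    std::vector<SampleT> const& samples, std::vector<bool> const& sample_filter
) {
    std::vector<SampleT> selected;
    std::size_t const count = std::min( samples.size(), sample_filter.size() );
    for( std::size_t i = 0; i < count; ++i ) {
        if( sample_filter[i] ) {
            selected.push_back( samples[i] );
        }
    }
    return selected;
}
```

With four samples and the filter `{true, false, true}`, the first and third samples are kept and the fourth is ignored because the filter is shorter.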

## ◆ quality_encoding() [1/2]

 sequence::QualityEncoding quality_encoding ( ) const
inline

Definition at line 249 of file simple_pileup_reader.hpp.

## ◆ quality_encoding() [2/2]

 self_type& quality_encoding ( sequence::QualityEncoding value )
inline

Set the type of encoding for the quality code string.

If with_quality_string() is set to true (default), this encoding is used to transform the ASCII-encoded string into actual phred-scaled scores. See sequence::quality_decode_to_phred_score() for details.

Definition at line 261 of file simple_pileup_reader.hpp.
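With Sanger encoding, decoding amounts to subtracting an ASCII offset of 33 from each character. The hypothetical `decode_phred()` below sketches this with a configurable offset (64 would correspond to the older Illumina 1.3+ encoding). It is only an illustration; see sequence::quality_decode_to_phred_score() for how the library itself handles the supported encodings.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: decodes an ASCII quality string into phred scores
// by subtracting a fixed offset (33 for Sanger).
inline std::vector<unsigned char> decode_phred( std::string const& quality, int offset = 33 ) {
    std::vector<unsigned char> scores;
    scores.reserve( quality.size() );
    for( char const ch : quality ) {
        int const score = static_cast<unsigned char>( ch ) - offset;
        if( score < 0 ) {
            throw std::invalid_argument( "Quality character below the encoding offset" );
        }
        scores.push_back( static_cast<unsigned char>( score ));
    }
    return scores;
}
```

For example, the Sanger-encoded quality codes `<<<+;` decode to the phred scores 27, 27, 27, 10 and 26.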

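## ◆ read() [1/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source ) const

Read an (m)pileup file line by line.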

Definition at line 51 of file simple_pileup_reader.cpp.
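Reading line by line means that each line of the input yields one record. A minimal sketch of that loop using only the standard library; `FixedColumns` and `read_fixed_columns()` are hypothetical and only parse the three fixed columns, whereas SimplePileupReader::Record also carries the per-sample tallies. Skipping empty lines is an assumption of this sketch.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal record holding only the three fixed pileup columns.
struct FixedColumns {
    std::string chromosome;
    std::size_t position = 0;
    char reference_base = 'N';
};

// Reads a pileup stream line by line and parses the fixed columns of each non-empty line.
inline std::vector<FixedColumns> read_fixed_columns( std::istream& input ) {
    std::vector<FixedColumns> records;
    std::string line;
    while( std::getline( input, line )) {
        if( line.empty() ) {
            continue;
        }
        std::istringstream fields( line );
        FixedColumns record;
        std::string reference;
        if( !( fields >> record.chromosome >> record.position >> reference ) || reference.size() != 1 ) {
            throw std::runtime_error( "Malformed pileup line: " + line );
        }
        record.reference_base = reference[0];
        records.push_back( record );
    }
    return records;
}
```

Returning a `std::vector` of all records keeps the interface simple, at the cost of holding the whole file in memory; parse_line() is the per-line alternative.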

## ◆ read() [2/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const & sample_filter ) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true.

The filter does not need to have as many entries as the record has samples. If it is shorter, all samples beyond its last entry are ignored. If it is longer, the surplus entries are simply not used.

Definition at line 81 of file simple_pileup_reader.cpp.

## ◆ read() [3/3]

 std::vector< SimplePileupReader::Record > read ( std::shared_ptr< utils::BaseInputSource > source, std::vector< size_t > const & sample_indices ) const

Read an (m)pileup file line by line, but only the samples at the given indices.

Definition at line 64 of file simple_pileup_reader.cpp.

## ◆ with_ancestral_base() [1/2]

 bool with_ancestral_base ( ) const
inline

Definition at line 267 of file simple_pileup_reader.hpp.

## ◆ with_ancestral_base() [2/2]

 self_type& with_ancestral_base ( bool value )
inline

Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

This is a pileup extension used by Pool-HMM (Boitard et al., 2013) to denote the ancestral allele of each position directly within the pileup file. Set this to true when this column is present in the input.

A typical line from a pileup file looks like

    2L  30  A   15  aaaAaaaAaAAaaAa PY\aVO^ZaaV[_S A

which contains the three fixed columns (sequence name, position, and reference base), followed by four columns for the sample, with the last one, `A`, being the ancestral allele for that sample.

Definition at line 287 of file simple_pileup_reader.hpp.
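A sketch of how the ancestral base can be taken from the last column of a sample, given the sample layout described above (read count, read bases, quality codes, ancestral base). `ancestral_base()` is hypothetical, and the set of accepted characters (`ACGTN` in upper or lower case) is an assumption, not taken from the Pool-HMM specification.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: returns the ancestral base, assumed to be the last,
// single-character column of a sample.
inline char ancestral_base( std::vector<std::string> const& sample_columns ) {
    if( sample_columns.empty() || sample_columns.back().size() != 1 ) {
        throw std::runtime_error( "Expected a single-character ancestral base column" );
    }
    char const base = sample_columns.back()[0];
    // Assumption: only nucleotide codes are accepted here.
    std::string const valid = "ACGTNacgtn";
    if( valid.find( base ) == std::string::npos ) {
        throw std::runtime_error( "Invalid ancestral base" );
    }
    return base;
}
```

For the sample columns of the example line above, this returns `A`.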

## ◆ with_quality_string() [1/2]

 bool with_quality_string ( ) const
inline

Definition at line 223 of file simple_pileup_reader.hpp.

## ◆ with_quality_string() [2/2]

 self_type& with_quality_string ( bool value )
inline

Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

A typical line from a pileup file looks like

    seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&

with the last field being the quality codes. However, this last field is optional, which is why this setting exists. If `true` (default), the field is expected to be present; if `false`, it is expected to be absent. At the moment, there is no automatic detection of this column.

See quality_encoding() for changing the encoding that is used in this column. The default is Sanger encoding. See genesis::sequence::QualityEncoding for details.

Definition at line 243 of file simple_pileup_reader.hpp.