A library for working with phylogenetic and population genetic data.
v0.27.0
SimplePileupReader Class Reference

#include <genesis/population/formats/simple_pileup_reader.hpp>

Detailed Description

Reader for line-by-line assessment of (m)pileup files.

This simple reader processes (m)pileup files line by line. That is, it does not take into consideration which mapped read starts at which position, but instead gives a quick and simple tally of the bases of all reads that cover a given position. This makes it fast in cases where only per-position, but no per-read information is needed.
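To make the idea of a per-position tally concrete, here is a minimal standalone sketch of counting the bases in the read bases column of one sample. It is not the genesis implementation. The names `BaseTally` and `tally_bases` are hypothetical, and the handling of the special characters follows the usual samtools pileup conventions (`.` and `,` for reference matches, `^` plus a mapping quality character, `$`, indels as `+`/`-` followed by a length, `*` for deleted bases).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of genesis: counts per base, plus deletions ('*').
struct BaseTally {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0, del = 0;
    std::size_t total() const { return a + c + g + t + n + del; }
};

// Tally the read bases column of one sample, given the reference base of the line.
BaseTally tally_bases( std::string const& bases, char ref_base )
{
    BaseTally tally;

    // Count a single base, ignoring case. Anything that is not ACGT counts as N.
    auto add = [&]( char b ) {
        switch( std::toupper( static_cast<unsigned char>( b ))) {
            case 'A': ++tally.a; break;
            case 'C': ++tally.c; break;
            case 'G': ++tally.g; break;
            case 'T': ++tally.t; break;
            default:  ++tally.n; break;
        }
    };

    for( std::size_t i = 0; i < bases.size(); ++i ) {
        char const ch = bases[i];
        if( ch == '^' ) {
            // Start of a read: the next character is a mapping quality, not a base.
            ++i;
        } else if( ch == '$' || ch == '<' || ch == '>' ) {
            // End of a read, or a reference skip: nothing to count.
        } else if( ch == '+' || ch == '-' ) {
            // Indel: a decimal length, followed by that many bases. Skip all of it.
            std::size_t len = 0;
            while( i + 1 < bases.size() && std::isdigit( static_cast<unsigned char>( bases[i + 1] ))) {
                len = len * 10 + static_cast<std::size_t>( bases[i + 1] - '0' );
                ++i;
            }
            i += len;
        } else if( ch == '.' || ch == ',' ) {
            // Match to the reference on the forward or reverse strand.
            add( ref_base );
        } else if( ch == '*' ) {
            ++tally.del;
        } else {
            add( ch );
        }
    }
    return tally;
}
```

Calling `tally_bases` on the bases column of a line, together with the reference base from the third column, gives one tally per sample without tracking which read contributed which base.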

For each processed line, the record versions of the read and parse functions produce a SimplePileupReader::Record. It captures the basic information of the line, as well as a tally for each sample in the line, collected in SimplePileupReader::Sample. One such sample consists of two or more columns in the file.

The number of columns per sample depends on the additional information contained in the file. As we have no way of deciding this automatically, these columns have to be activated beforehand, using the with_quality_string() and with_ancestral_base() settings.

More columns might be needed in the future, and potentially their ordering might need to be adapted. But for now, we only have these use cases.
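The column layout described above can be written down as simple arithmetic. The following sketch is not library code, and the function names are hypothetical. It assumes three fixed columns per line, two mandatory columns per sample (read count and bases), and one extra column each for the quality string and the ancestral base when these are activated.

```cpp
#include <cstddef>
#include <stdexcept>

// Number of columns that one sample occupies. Read count and bases are always present;
// the quality string and the ancestral base each add one column if activated.
std::size_t columns_per_sample( bool with_quality_string, bool with_ancestral_base )
{
    return 2 + ( with_quality_string ? 1 : 0 ) + ( with_ancestral_base ? 1 : 0 );
}

// Number of samples in a line that has `total_columns` columns in total,
// assuming three fixed columns (chromosome, position, reference base).
std::size_t sample_count(
    std::size_t total_columns, bool with_quality_string, bool with_ancestral_base
) {
    std::size_t const fixed = 3;
    std::size_t const per_sample = columns_per_sample( with_quality_string, with_ancestral_base );
    if( total_columns < fixed || ( total_columns - fixed ) % per_sample != 0 ) {
        throw std::runtime_error( "Column count does not match the expected layout" );
    }
    return ( total_columns - fixed ) / per_sample;
}
```

A line with seven columns and both optional columns activated therefore holds exactly one sample. The same seven columns cannot be split evenly if only the quality string is expected, which is why the settings have to match the file.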

Alternatively, the variant versions of the read and parse functions produce a Variant per line in the mpileup file, instead of a SimplePileupReader::Record. This tends to be slightly faster, and eliminates the need for downstream conversion. That is, instead of yielding per-line tallied bases and phred quality scores, these functions directly yield the summed-up counts of bases per line.

Definition at line 77 of file simple_pileup_reader.hpp.

Public Member Functions

 SimplePileupReader ()=default
 
 SimplePileupReader (self_type &&)=default
 
 SimplePileupReader (self_type const &)=default
 
 ~SimplePileupReader ()=default
 
size_t min_base_quality () const
 Get the currently set minimum phred quality score that a base needs to have to be added to the Variant BaseCounts for a sample. More...
 
self_type & min_base_quality (size_t value)
 Set the minimum phred quality score that a base needs to have to be added to the Variant BaseCounts for a sample. More...
 
self_type & operator= (self_type &&)=default
 
self_type & operator= (self_type const &)=default
 
bool parse_line_record (utils::InputStream &input_stream, Record &record) const
 Read an (m)pileup line, as a Record. More...
 
bool parse_line_record (utils::InputStream &input_stream, Record &record, std::vector< bool > const &sample_filter) const
 Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Record. More...
 
bool parse_line_variant (utils::InputStream &input_stream, Variant &variant) const
 Read an (m)pileup line, as a Variant. More...
 
bool parse_line_variant (utils::InputStream &input_stream, Variant &variant, std::vector< bool > const &sample_filter) const
 Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Variant. More...
 
sequence::QualityEncoding quality_encoding () const
 
self_type & quality_encoding (sequence::QualityEncoding value)
 Set the type of encoding for the quality code string. More...
 
std::vector< Record > read_records (std::shared_ptr< utils::BaseInputSource > source) const
 Read an (m)pileup file line by line, as pileup Records. More...
 
std::vector< Record > read_records (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
 Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as pileup Records. More...
 
std::vector< Variant > read_variants (std::shared_ptr< utils::BaseInputSource > source) const
 Read an (m)pileup file line by line, as Variants. More...
 
std::vector< Variant > read_variants (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
 Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as Variants. More...
 
bool strict_bases () const
 
self_type & strict_bases (bool value)
 Set whether to strictly require bases to be in ACGTN. More...
 
bool with_ancestral_base () const
 
self_type & with_ancestral_base (bool value)
 Set whether to expect the base of the ancestral allele as the last part of each sample in a record line. More...
 
bool with_quality_string () const
 
self_type & with_quality_string (bool value)
 Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample. More...
 

Public Types

using self_type = SimplePileupReader
 

Classes

struct  Record
 Single line/record from a pileup file. More...
 
struct  Sample
 One sample in a pileup line/record. More...
 

Constructor & Destructor Documentation

◆ SimplePileupReader() [1/3]

SimplePileupReader ( )
default

◆ ~SimplePileupReader()

~SimplePileupReader ( )
default

◆ SimplePileupReader() [2/3]

SimplePileupReader ( self_type const &  )
default

◆ SimplePileupReader() [3/3]

SimplePileupReader ( self_type &&  )
default

Member Function Documentation

◆ min_base_quality() [1/2]

size_t min_base_quality ( ) const
inline

Get the currently set minimum phred quality score that a base needs to have to be added to the Variant BaseCounts for a sample.

This is only used for the reading and parsing functions that return Variants.

Definition at line 392 of file simple_pileup_reader.hpp.

◆ min_base_quality() [2/2]

self_type& min_base_quality ( size_t  value)
inline

Set the minimum phred quality score that a base needs to have to be added to the Variant BaseCounts for a sample.

Bases below this quality score are ignored when summing up the counts per sample. Default is 0, meaning that all bases are used.

This is only used for the reading and parsing functions that return Variants. When reading a Sample instead, all bases and their quality scores are in the output.
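As an illustration of what such a quality threshold means for the summed counts, the following standalone sketch skips every base whose phred score is below the minimum. It is not the genesis implementation. The type `Counts` and the function `count_with_min_quality` are hypothetical, and the bases are assumed to be already resolved to upper-case ACGTN.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for per-sample base counts; not the genesis BaseCounts type.
struct Counts {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0;
};

// Sum up the bases of one sample, ignoring those with a phred score below the threshold.
// Assumes one phred score per base, in the same order.
Counts count_with_min_quality(
    std::string const& bases,
    std::vector<unsigned> const& phred_scores,
    std::size_t min_base_quality
) {
    if( bases.size() != phred_scores.size() ) {
        throw std::runtime_error( "Need exactly one quality score per base" );
    }
    Counts counts;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        if( phred_scores[i] < min_base_quality ) {
            continue; // Base is below the threshold and does not contribute.
        }
        switch( bases[i] ) {
            case 'A': ++counts.a; break;
            case 'C': ++counts.c; break;
            case 'G': ++counts.g; break;
            case 'T': ++counts.t; break;
            default:  ++counts.n; break;
        }
    }
    return counts;
}
```

With a threshold of 0, no base is skipped, which corresponds to the default described above.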

Definition at line 407 of file simple_pileup_reader.hpp.

◆ operator=() [1/2]

self_type& operator= ( self_type &&  )
default

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
default

◆ parse_line_record() [1/2]

bool parse_line_record ( utils::InputStream & input_stream,
SimplePileupReader::Record & record
) const

Read an (m)pileup line, as a Record.

Note that this only handles a single line, and hence cannot check that the correct order of chromosomes and positions in the input is kept. A well-formed (m)pileup file will have the correct order, so that should not be an issue. Use the read_... functions, or the SimplePileupInputIterator for ways to read in (m)pileup data that have this check.
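When parsing line by line, a caller who needs an order check has to keep track of the previous line. The sketch below shows one way to do that. It is an assumption about what "correct order" means, not the rule that the library applies: positions have to strictly increase within a chromosome, and a chromosome must not reappear once another one has started. The class name `OrderChecker` is hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper that tracks chromosome and position across consecutive lines.
class OrderChecker
{
public:

    // Return true if the given line position keeps the assumed order, false otherwise.
    // The internal state is only updated for accepted positions.
    bool accept( std::string const& chromosome, std::size_t position )
    {
        if( has_previous_ && chromosome == chromosome_ ) {
            // Same chromosome as before: the position has to strictly increase.
            if( position <= position_ ) {
                return false;
            }
        } else {
            // New chromosome: it must not have been seen earlier in the input.
            if( ! seen_.insert( chromosome ).second ) {
                return false;
            }
            chromosome_ = chromosome;
        }
        position_ = position;
        has_previous_ = true;
        return true;
    }

private:

    std::set<std::string> seen_;
    std::string chromosome_;
    std::size_t position_ = 0;
    bool has_previous_ = false;
};
```

After each successfully parsed line, the chromosome and position of the result would be passed to `accept()`, and the caller decides how to react to a `false` return value.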

Definition at line 189 of file simple_pileup_reader.cpp.

◆ parse_line_record() [2/2]

bool parse_line_record ( utils::InputStream & input_stream,
SimplePileupReader::Record & record,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Record.

We expect this filter to contain the same number of values as the record has samples.

Note that this only handles a single line, and hence cannot check that the correct order of chromosomes and positions in the input is kept. A well-formed (m)pileup file will have the correct order, so that should not be an issue. Use the read_... functions, or the SimplePileupInputIterator for ways to read in (m)pileup data that have this check.

Definition at line 196 of file simple_pileup_reader.cpp.

◆ parse_line_variant() [1/2]

bool parse_line_variant ( utils::InputStream & input_stream,
Variant & variant
) const

Read an (m)pileup line, as a Variant.

Note that this only handles a single line, and hence cannot check that the correct order of chromosomes and positions in the input is kept. A well-formed (m)pileup file will have the correct order, so that should not be an issue. Use the read_... functions, or the SimplePileupInputIterator for ways to read in (m)pileup data that have this check.

Definition at line 208 of file simple_pileup_reader.cpp.

◆ parse_line_variant() [2/2]

bool parse_line_variant ( utils::InputStream & input_stream,
Variant & variant,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Variant.

We expect this filter to contain the same number of values as the record has samples.

Note that this only handles a single line, and hence cannot check that the correct order of chromosomes and positions in the input is kept. A well-formed (m)pileup file will have the correct order, so that should not be an issue. Use the read_... functions, or the SimplePileupInputIterator for ways to read in (m)pileup data that have this check.

Definition at line 215 of file simple_pileup_reader.cpp.

◆ quality_encoding() [1/2]

sequence::QualityEncoding quality_encoding ( ) const
inline

Definition at line 338 of file simple_pileup_reader.hpp.

◆ quality_encoding() [2/2]

self_type& quality_encoding ( sequence::QualityEncoding  value)
inline

Set the type of encoding for the quality code string.

If with_quality_string() is set to true (default), this encoding is used to transform the ASCII-encoded string into actual phred-scaled scores. See sequence::quality_decode_to_phred_score() for details.
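For orientation, the following standalone sketch shows what decoding an ASCII quality string into phred scores amounts to for two common encodings. It does not use sequence::quality_decode_to_phred_score(). The enum `Encoding` and the function `decode_phred` are hypothetical, and only the ASCII offsets 33 (Sanger) and 64 (Illumina 1.3+) are covered.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical enum for this sketch; not the genesis sequence::QualityEncoding type.
enum class Encoding { kSanger, kIllumina13 };

// Turn an ASCII-encoded quality string into phred scores by subtracting the ASCII offset.
std::vector<unsigned> decode_phred( std::string const& codes, Encoding encoding )
{
    int const offset = ( encoding == Encoding::kSanger ) ? 33 : 64;
    std::vector<unsigned> scores;
    scores.reserve( codes.size() );
    for( char ch : codes ) {
        int const score = static_cast<unsigned char>( ch ) - offset;
        if( score < 0 ) {
            throw std::invalid_argument( "Quality code is below the offset of the encoding" );
        }
        scores.push_back( static_cast<unsigned>( score ));
    }
    return scores;
}
```

The same character decodes to different scores depending on the encoding, so a wrong setting shifts every score by a constant rather than producing an obvious error.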

Definition at line 350 of file simple_pileup_reader.hpp.

◆ read_records() [1/2]

std::vector< SimplePileupReader::Record > read_records ( std::shared_ptr< utils::BaseInputSource > source) const

Read an (m)pileup file line by line, as pileup Records.

Definition at line 69 of file simple_pileup_reader.cpp.

◆ read_records() [2/2]

std::vector< SimplePileupReader::Record > read_records ( std::shared_ptr< utils::BaseInputSource > source,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as pileup Records.

We expect this filter to contain the same number of values as the record has samples.
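The semantics of such a filter can be sketched independently of the reader: one boolean per sample, and only the samples with a `true` entry are kept. The function `select_samples` below is a hypothetical helper for illustration, and throwing on a size mismatch is an assumption, not documented library behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Keep only those samples for which the filter is true.
// The filter needs to have exactly one entry per sample.
template<typename SampleT>
std::vector<SampleT> select_samples(
    std::vector<SampleT> const& samples,
    std::vector<bool> const& sample_filter
) {
    if( samples.size() != sample_filter.size() ) {
        throw std::runtime_error( "Sample filter size does not match the number of samples" );
    }
    std::vector<SampleT> result;
    for( std::size_t i = 0; i < samples.size(); ++i ) {
        if( sample_filter[i] ) {
            result.push_back( samples[i] );
        }
    }
    return result;
}
```

For a file with three samples, a filter of `{ true, false, true }` would keep the first and the third sample.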

Definition at line 107 of file simple_pileup_reader.cpp.

◆ read_variants() [1/2]

std::vector< Variant > read_variants ( std::shared_ptr< utils::BaseInputSource > source) const

Read an (m)pileup file line by line, as Variants.

Definition at line 129 of file simple_pileup_reader.cpp.

◆ read_variants() [2/2]

std::vector< Variant > read_variants ( std::shared_ptr< utils::BaseInputSource > source,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as Variants.

We expect this filter to contain the same number of values as the record has samples.

Definition at line 167 of file simple_pileup_reader.cpp.

◆ strict_bases() [1/2]

bool strict_bases ( ) const
inline

Definition at line 295 of file simple_pileup_reader.hpp.

◆ strict_bases() [2/2]

self_type& strict_bases ( bool  value)
inline

Set whether to strictly require bases to be in ACGTN.

If set to true, we expect bases to be ACGTN, and throw otherwise. If set to false, we will change any other base to be N.
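The two behaviors can be sketched in a few lines. This is a standalone illustration with a hypothetical function name, not the library code. Treating lower-case letters as valid and converting them to upper case is an assumption of the sketch.

```cpp
#include <cctype>
#include <stdexcept>

// Return the base if it is one of ACGTN (case-insensitive, converted to upper case).
// For any other character: throw in strict mode, or return 'N' otherwise.
char normalize_base( char base, bool strict )
{
    char const upper = static_cast<char>( std::toupper( static_cast<unsigned char>( base )));
    switch( upper ) {
        case 'A':
        case 'C':
        case 'G':
        case 'T':
        case 'N':
            return upper;
        default:
            break;
    }
    if( strict ) {
        throw std::runtime_error( "Invalid base, expected one of ACGTN" );
    }
    return 'N';
}
```

With this sketch, an ambiguity code such as `R` is rejected in strict mode and becomes `N` otherwise.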

Definition at line 306 of file simple_pileup_reader.hpp.

◆ with_ancestral_base() [1/2]

bool with_ancestral_base ( ) const
inline

Definition at line 356 of file simple_pileup_reader.hpp.

◆ with_ancestral_base() [2/2]

self_type& with_ancestral_base ( bool  value)
inline

Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

This is a pileup extension used by Pool-HMM (Boitard et al. 2013) to denote the ancestral allele of each position directly within the pileup file. Set to true when this is present in the input.

A typical line from a pileup file with ancestral bases looks like

2L  30  A   15  aaaAaaaAaAAaaAa PY\aVO^`ZaaV[_S A

which contains the three fixed columns, and then four columns for the sample, with the last one A being the ancestral allele for that sample.
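To show how the seven columns of such a line map to fields, here is a minimal standalone sketch that splits a single-sample line on whitespace. It is not how the reader parses its input. The struct `PileupLine` and the function `split_single_sample_line` are hypothetical, and the sketch assumes exactly one sample with both a quality string and an ancestral base.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical plain struct for the columns of a single-sample line; not a genesis type.
struct PileupLine {
    std::string chromosome;
    std::size_t position = 0;
    char        reference_base = 'N';
    std::size_t read_count = 0;
    std::string bases;
    std::string qualities;
    char        ancestral_base = 'N';
};

// Split a line with three fixed columns and four sample columns into its fields.
PileupLine split_single_sample_line( std::string const& line )
{
    std::istringstream in( line );
    PileupLine result;
    std::string ref;
    std::string anc;

    // Stream extraction skips any run of tabs or spaces between columns.
    bool const ok = static_cast<bool>(
        in >> result.chromosome >> result.position >> ref
           >> result.read_count >> result.bases >> result.qualities >> anc
    );
    if( ! ok || ref.size() != 1 || anc.size() != 1 ) {
        throw std::runtime_error( "Line does not have the expected seven columns" );
    }
    result.reference_base = ref[0];
    result.ancestral_base = anc[0];
    return result;
}
```

For the example line above, the bases and quality columns both have 15 characters, matching the read count in the fourth column.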

Definition at line 376 of file simple_pileup_reader.hpp.

◆ with_quality_string() [1/2]

bool with_quality_string ( ) const
inline

Definition at line 312 of file simple_pileup_reader.hpp.

◆ with_quality_string() [2/2]

self_type& with_quality_string ( bool  value)
inline

Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

A typical line from a pileup file looks like

seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&

with the last field being quality codes. However, this last field is optional, and hence we offer this option. If true (default), the field is expected to be there; if false, it is expected not to be there. That is, at the moment, we have no automatic setting for this.

See quality_encoding() for changing the encoding that is used in this column. Default is Sanger encoding. See genesis::sequence::QualityEncoding for details.

Definition at line 332 of file simple_pileup_reader.hpp.

Member Typedef Documentation

◆ self_type

Definition at line 159 of file simple_pileup_reader.hpp.


The documentation for this class was generated from the following files: