A library for working with phylogenetic and population genetic data.
v0.32.0
SimplePileupReader Class Reference

#include <genesis/population/format/simple_pileup_reader.hpp>

Detailed Description

Reader for line-by-line assessment of (m)pileup files.

This simple reader processes (m)pileup files line by line. That is, it does not take into consideration which mapped read starts at which position, but instead gives a quick and simple tally of the bases of all reads that cover a given position. This makes it fast in cases where only per-position, but no per-read information is needed.
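The following is a minimal standalone sketch of what such a per-position tally involves. It is not the genesis implementation. It assumes the samtools pileup conventions:

- `.` and `,` match the reference base on the forward and reverse strand.
- Letters are mismatching bases.
- `*` is a deleted base.
- `^` plus one mapping-quality character marks a read start, and `$` marks a read end.
- `+n`/`-n` followed by n bases is an indel that does not count as a base at this position.

The names `BaseTally`, `add_base()`, and `tally_pileup_bases()` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical per-position tally; not the genesis SimplePileupReader::Sample type.
struct BaseTally {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0, d = 0; // d counts deletions ('*')
    std::size_t total() const { return a + c + g + t + n + d; }
};

// Increment the counter for one upper- or lower-case nucleotide; throw on anything else.
inline void add_base(BaseTally& tally, char base) {
    switch (std::toupper(static_cast<unsigned char>(base))) {
        case 'A': ++tally.a; break;
        case 'C': ++tally.c; break;
        case 'G': ++tally.g; break;
        case 'T': ++tally.t; break;
        case 'N': ++tally.n; break;
        default: throw std::runtime_error(std::string("Invalid base: ") + base);
    }
}

// Tally the read-base column of one sample, given the reference base of the line.
inline BaseTally tally_pileup_bases(char ref_base, std::string const& bases) {
    BaseTally tally;
    for (std::size_t i = 0; i < bases.size(); ++i) {
        char const c = bases[i];
        if (c == '.' || c == ',') {
            add_base(tally, ref_base);          // match on forward or reverse strand
        } else if (c == '*') {
            ++tally.d;                          // this read has a deletion here
        } else if (c == '^') {
            ++i;                                // read start: skip the mapping quality char
        } else if (c == '$' || c == '>' || c == '<') {
            // read end or reference skip: no base to count
        } else if (c == '+' || c == '-') {
            // indel: parse its length, then skip that many bases
            std::size_t len = 0;
            while (i + 1 < bases.size() && std::isdigit(static_cast<unsigned char>(bases[i + 1]))) {
                len = len * 10 + static_cast<std::size_t>(bases[++i] - '0');
            }
            i += len;
        } else {
            add_base(tally, c);                 // mismatch (throws on unknown characters)
        }
    }
    return tally;
}
```

The sketch does not special-case zero-coverage samples, which mpileup writes as a lone `*`. It also does not cross-check the tally against the read-count column. A full reader would need both.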

For each processed line, the record versions of the read and parse functions produce a SimplePileupReader::Record. It captures the basic information of the line, as well as a tally for each sample in the line, collected in SimplePileupReader::Sample. One such sample consists of two or more columns in the file.

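The number of columns per sample depends on the additional information contained in the file. As we have no way of deciding this automatically, these columns have to be activated beforehand:

 with_quality_string(): expect a column of phred-scaled, ASCII-encoded quality codes per sample (default: true).
 with_ancestral_base(): expect a column with the base of the ancestral allele as the last part of each sample.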

More columns might be needed in the future, and potentially their ordering might need to be adapted. But for now, we only have these use cases.
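The sketch below illustrates why these settings must be known up front. It derives the number of samples in a line from its field count, using the hypothetical helpers `split_fields()`, `columns_per_sample()`, and `count_samples()`. It assumes that the three fixed columns are followed, for each sample, by two base columns (read count and read bases) plus one column each for the optional quality string and ancestral base.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split a pileup line into whitespace-separated fields (real files are tab-separated).
inline std::vector<std::string> split_fields(std::string const& line) {
    std::istringstream stream(line);
    std::vector<std::string> fields;
    std::string field;
    while (stream >> field) {
        fields.push_back(field);
    }
    return fields;
}

// Columns per sample: read count and read bases, plus the optional extra columns.
inline std::size_t columns_per_sample(bool with_quality_string, bool with_ancestral_base) {
    return 2 + (with_quality_string ? 1 : 0) + (with_ancestral_base ? 1 : 0);
}

// Number of samples after the three fixed columns (chromosome, position, reference base).
inline std::size_t count_samples(std::size_t n_fields, bool with_quality_string, bool with_ancestral_base) {
    std::size_t const per_sample = columns_per_sample(with_quality_string, with_ancestral_base);
    if (n_fields < 3 || (n_fields - 3) % per_sample != 0) {
        throw std::runtime_error("Field count does not fit the expected per-sample columns");
    }
    return (n_fields - 3) / per_sample;
}
```

Field counts alone cannot resolve the ambiguity. For example, twelve sample columns fit both four samples with three columns each and three samples with four columns each, which is one reason why there is no automatic detection.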

Alternatively, the variant versions of the read and parse functions produce one Variant per line of the (m)pileup file instead of a SimplePileupReader::Record. This tends to be slightly faster, and eliminates the need for a downstream conversion. That is, instead of yielding the per-line tallied bases and phred quality scores, these functions directly yield the summed-up base counts of each sample in the line.

Definition at line 78 of file simple_pileup_reader.hpp.

Public Member Functions

 SimplePileupReader ()=default
 
 SimplePileupReader (self_type &&)=default
 
 SimplePileupReader (self_type const &)=default
 
 ~SimplePileupReader ()=default
 
size_t min_base_quality () const
 Get the currently set minimum phred quality score that a base needs to have to be added to the Variant SampleCounts for a sample.
 
self_type & min_base_quality (size_t value)
 Set the minimum phred quality score that a base needs to have to be added to the Variant SampleCounts for a sample.
 
self_type & operator= (self_type &&)=default
 
self_type & operator= (self_type const &)=default
 
bool parse_line_record (utils::InputStream &input_stream, Record &record) const
 Read an (m)pileup line, as a Record.
 
bool parse_line_record (utils::InputStream &input_stream, Record &record, std::vector< bool > const &sample_filter) const
 Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Record.
 
bool parse_line_variant (utils::InputStream &input_stream, Variant &variant) const
 Read an (m)pileup line, as a Variant.
 
bool parse_line_variant (utils::InputStream &input_stream, Variant &variant, std::vector< bool > const &sample_filter) const
 Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Variant.
 
std::array< size_t, 128 > const & quality_code_counts () const
 Return the counts for all quality base codes found so far when parsing an input.
 
sequence::QualityEncoding quality_encoding () const
 
self_type & quality_encoding (sequence::QualityEncoding value)
 Set the type of encoding for the quality code string.
 
std::vector< Record > read_records (std::shared_ptr< utils::BaseInputSource > source) const
 Read an (m)pileup file line by line, as pileup Records.
 
std::vector< Record > read_records (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
 Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as pileup Records.
 
std::vector< Variant > read_variants (std::shared_ptr< utils::BaseInputSource > source) const
 Read an (m)pileup file line by line, as Variants.
 
std::vector< Variant > read_variants (std::shared_ptr< utils::BaseInputSource > source, std::vector< bool > const &sample_filter) const
 Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as Variants.
 
bool strict_bases () const
 
self_type & strict_bases (bool value)
 Set whether to strictly require bases to be in ACGTN.
 
bool with_ancestral_base () const
 
self_type & with_ancestral_base (bool value)
 Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.
 
bool with_quality_string () const
 
self_type & with_quality_string (bool value)
 Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.
 

Public Types

using self_type = SimplePileupReader
 

Classes

struct  Record
 Single line/record from a pileup file.
 
struct  Sample
 One sample in a pileup line/record.
 

Constructor & Destructor Documentation

◆ SimplePileupReader() [1/3]

SimplePileupReader ( )
default

◆ ~SimplePileupReader()

~SimplePileupReader ( )
default

◆ SimplePileupReader() [2/3]

SimplePileupReader ( self_type const &  )
default

◆ SimplePileupReader() [3/3]

SimplePileupReader ( self_type &&  )
default

Member Function Documentation

◆ min_base_quality() [1/2]

size_t min_base_quality ( ) const
inline

Get the currently set minimum phred quality score that a base needs to have to be added to the Variant SampleCounts for a sample.

This is only used for the reading and parsing functions that return Variants.

Definition at line 408 of file simple_pileup_reader.hpp.

◆ min_base_quality() [2/2]

self_type& min_base_quality ( size_t  value)
inline

Set the minimum phred quality score that a base needs to have to be added to the Variant SampleCounts for a sample.

Bases below this quality score are ignored when summing up the counts per sample. Default is 0, meaning that all bases are used.

This is only used for the reading and parsing functions that return Variants. When reading Records (with their Samples) instead, all bases and their quality scores are kept in the output.
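Here is a minimal sketch of the filtering described here. It assumes that the bases have already been resolved to ACGTN (matches replaced by the reference base) and decoded to phred scores. `BaseCounts` and `sum_counts()` are hypothetical names. The A, C, G, T, N ordering is an assumption, not the layout of the genesis SampleCounts type. Bases scoring below the threshold are skipped, so a threshold of 0 keeps everything.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical summed counts per sample, in the assumed order A, C, G, T, N.
using BaseCounts = std::array<std::size_t, 5>;

// Sum resolved bases into counts, ignoring bases whose phred score is below min_quality.
inline BaseCounts sum_counts(std::string const& bases, std::vector<unsigned> const& phred_scores,
                             unsigned min_quality) {
    if (bases.size() != phred_scores.size()) {
        throw std::runtime_error("Bases and quality scores differ in length");
    }
    static std::string const order = "ACGTN";
    BaseCounts counts{};
    for (std::size_t i = 0; i < bases.size(); ++i) {
        if (phred_scores[i] < min_quality) {
            continue;  // below the threshold: not counted
        }
        auto const pos = order.find(bases[i]);
        if (pos == std::string::npos) {
            throw std::runtime_error("Unexpected base");
        }
        ++counts[pos];
    }
    return counts;
}
```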

Definition at line 423 of file simple_pileup_reader.hpp.

◆ operator=() [1/2]

self_type& operator= ( self_type &&  )
default

◆ operator=() [2/2]

self_type& operator= ( self_type const &  )
default

◆ parse_line_record() [1/2]

bool parse_line_record ( utils::InputStream & input_stream,
SimplePileupReader::Record & record
) const

Read an (m)pileup line, as a Record.

Note that this only handles a single line, and hence cannot check that the chromosomes and positions in the input are in the correct order. A well-formed (m)pileup file is sorted this way, so this should not be an issue in practice. To read (m)pileup data with this check, use the read_... functions or the SimplePileupInputStream instead.
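The order check referred to here could look roughly like the following sketch. The exact rules used by the read_... functions are not stated in this reference. This hypothetical `OrderChecker` assumes two rules: positions must strictly increase within a chromosome, and a chromosome may not reappear after another one has started.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical order check across consecutive lines (assumed rules, see above).
class OrderChecker {
public:
    void check(std::string const& chromosome, std::size_t position) {
        if (chromosome != current_chromosome_) {
            // A new chromosome starts: it must not have been seen before.
            if (seen_chromosomes_.count(chromosome) > 0) {
                throw std::runtime_error("Chromosome " + chromosome + " is not contiguous");
            }
            seen_chromosomes_.insert(chromosome);
            current_chromosome_ = chromosome;
        } else if (position <= last_position_) {
            // Same chromosome: positions have to increase strictly.
            throw std::runtime_error("Positions are not strictly increasing");
        }
        last_position_ = position;
    }

private:
    std::set<std::string> seen_chromosomes_;
    std::string current_chromosome_;
    std::size_t last_position_ = 0;
};
```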

Definition at line 162 of file simple_pileup_reader.cpp.

◆ parse_line_record() [2/2]

bool parse_line_record ( utils::InputStream & input_stream,
SimplePileupReader::Record & record,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Record.

We expect this filter to contain the same number of values as the record has samples.
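This is a hypothetical sketch of applying such a filter to the fields of one line. It assumes the line has already been split into columns and that the number of columns per sample is known. Filters whose length differs from the number of samples are rejected, mirroring the expectation stated above. `filter_sample_fields()` is an illustrative name, not part of the genesis API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Keep only the sample columns whose entry in sample_filter is true.
// The first three fields are chromosome, position, and reference base.
inline std::vector<std::string> filter_sample_fields(std::vector<std::string> const& fields,
                                                     std::size_t columns_per_sample,
                                                     std::vector<bool> const& sample_filter) {
    if (columns_per_sample == 0 || fields.size() < 3 ||
        (fields.size() - 3) % columns_per_sample != 0) {
        throw std::runtime_error("Malformed line");
    }
    std::size_t const n_samples = (fields.size() - 3) / columns_per_sample;
    if (sample_filter.size() != n_samples) {
        throw std::runtime_error("Sample filter size does not match the number of samples");
    }
    std::vector<std::string> kept;
    for (std::size_t s = 0; s < n_samples; ++s) {
        if (!sample_filter[s]) {
            continue;  // sample filtered out
        }
        auto const first = fields.begin() + static_cast<std::ptrdiff_t>(3 + s * columns_per_sample);
        kept.insert(kept.end(), first, first + static_cast<std::ptrdiff_t>(columns_per_sample));
    }
    return kept;
}
```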

Note that this only handles a single line, and hence cannot check that the chromosomes and positions in the input are in the correct order. A well-formed (m)pileup file is sorted this way, so this should not be an issue in practice. To read (m)pileup data with this check, use the read_... functions or the SimplePileupInputStream instead.

Definition at line 169 of file simple_pileup_reader.cpp.

◆ parse_line_variant() [1/2]

bool parse_line_variant ( utils::InputStream & input_stream,
Variant & variant
) const

Read an (m)pileup line, as a Variant.

Note that this only handles a single line, and hence cannot check that the chromosomes and positions in the input are in the correct order. A well-formed (m)pileup file is sorted this way, so this should not be an issue in practice. To read (m)pileup data with this check, use the read_... functions or the SimplePileupInputStream instead.

Definition at line 181 of file simple_pileup_reader.cpp.

◆ parse_line_variant() [2/2]

bool parse_line_variant ( utils::InputStream & input_stream,
Variant & variant,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup line, but only the samples at which the sample_filter is true, as a Variant.

We expect this filter to contain the same number of values as the record has samples.

Note that this only handles a single line, and hence cannot check that the chromosomes and positions in the input are in the correct order. A well-formed (m)pileup file is sorted this way, so this should not be an issue in practice. To read (m)pileup data with this check, use the read_... functions or the SimplePileupInputStream instead.

Definition at line 189 of file simple_pileup_reader.cpp.

◆ quality_code_counts()

std::array<size_t, 128> const& quality_code_counts ( ) const
inline

Return the counts for all quality base codes found so far when parsing an input.

Only available when read_records() or parse_line_record() are used; not available for the Variant-based reading functions.

While parsing with_quality_string(), we keep track of the counts of each quality code found, so that we can check that the right encoding was used (for user friendliness). Counts here are simply indexed by their ASCII values.
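Below is a standalone sketch of such a histogram. It uses the same `std::array<size_t, 128>` shape as the return type above. `count_quality_codes()` and `lowest_code()` are hypothetical helpers.

The lowest code seen is a useful hint for the encoding. Codes below `;` (ASCII 59) cannot occur with the offset-64 encodings (Solexa, Illumina 1.3+ and 1.5+). Such codes therefore point to an offset of 33, as used by Sanger and Illumina 1.8+.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Histogram of quality codes, indexed by ASCII value.
using QualityCodeCounts = std::array<std::size_t, 128>;

// Add the codes of one quality string; codes outside 0-127 are silently ignored here.
inline void count_quality_codes(std::string const& quality_string, QualityCodeCounts& counts) {
    for (char const c : quality_string) {
        auto const code = static_cast<unsigned char>(c);
        if (code < counts.size()) {
            ++counts[code];
        }
    }
}

// Lowest ASCII code seen so far, or 128 if nothing has been counted.
inline std::size_t lowest_code(QualityCodeCounts const& counts) {
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (counts[i] > 0) {
            return i;
        }
    }
    return counts.size();
}
```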

Definition at line 367 of file simple_pileup_reader.hpp.

◆ quality_encoding() [1/2]

sequence::QualityEncoding quality_encoding ( ) const
inline

Definition at line 339 of file simple_pileup_reader.hpp.

◆ quality_encoding() [2/2]

self_type& quality_encoding ( sequence::QualityEncoding  value)
inline

Set the type of encoding for the quality code string.

If with_quality_string() is set to true (default), this encoding is used to transform the ASCII-encoded string into actual phred-scaled scores. See sequence::quality_decode_to_phred_score() for details.
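For phred-scaled encodings, decoding amounts to subtracting a fixed ASCII offset: 33 for Sanger (and Illumina 1.8+), and 64 for Illumina 1.3+. The sketch below shows only this arithmetic, using a hypothetical two-value `Encoding` enum. It is not sequence::quality_decode_to_phred_score(). It also deliberately leaves out Solexa encoding, whose scores are not on the phred scale.

```cpp
#include <stdexcept>

// Hypothetical subset of encodings; this is not genesis::sequence::QualityEncoding.
enum class Encoding { kSanger, kIllumina13 };

// Decode one ASCII quality code into a phred score by subtracting the encoding's offset.
inline int decode_phred(char code, Encoding encoding) {
    int const offset = (encoding == Encoding::kSanger) ? 33 : 64;
    int const score = static_cast<int>(code) - offset;
    if (score < 0) {
        throw std::invalid_argument("Quality code below the offset of the encoding");
    }
    return score;
}
```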

Definition at line 351 of file simple_pileup_reader.hpp.

◆ read_records() [1/2]

std::vector< SimplePileupReader::Record > read_records ( std::shared_ptr< utils::BaseInputSource > source) const

Read an (m)pileup file line by line, as pileup Records.

Definition at line 52 of file simple_pileup_reader.cpp.

◆ read_records() [2/2]

std::vector< SimplePileupReader::Record > read_records ( std::shared_ptr< utils::BaseInputSource > source,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as pileup Records.

We expect this filter to contain the same number of values as the record has samples.

Definition at line 87 of file simple_pileup_reader.cpp.

◆ read_variants() [1/2]

std::vector< Variant > read_variants ( std::shared_ptr< utils::BaseInputSource > source) const

Read an (m)pileup file line by line, as Variants.

Definition at line 110 of file simple_pileup_reader.cpp.

◆ read_variants() [2/2]

std::vector< Variant > read_variants ( std::shared_ptr< utils::BaseInputSource > source,
std::vector< bool > const &  sample_filter 
) const

Read an (m)pileup file line by line, but only the samples at which the sample_filter is true, as Variants.

We expect this filter to contain the same number of values as the record has samples.

Definition at line 142 of file simple_pileup_reader.cpp.

◆ strict_bases() [1/2]

bool strict_bases ( ) const
inline

Definition at line 296 of file simple_pileup_reader.hpp.

◆ strict_bases() [2/2]

self_type& strict_bases ( bool  value)
inline

Set whether to strictly require bases to be in ACGTN.

If set to true, we expect bases to be ACGTN, and throw otherwise. If set to false, we will change any other base to be N.
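Here is a minimal sketch of the two behaviors, using a hypothetical `normalize_base()`. It assumes that lower-case bases are accepted and upper-cased before the check, which this reference does not state.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Upper-case a base (assumption), keep ACGTN, and for any other character
// either throw (strict) or replace it by 'N' (non-strict).
inline char normalize_base(char base, bool strict) {
    char const upper = static_cast<char>(std::toupper(static_cast<unsigned char>(base)));
    switch (upper) {
        case 'A': case 'C': case 'G': case 'T': case 'N':
            return upper;
        default:
            if (strict) {
                throw std::runtime_error(std::string("Invalid base: ") + base);
            }
            return 'N';
    }
}
```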

Definition at line 307 of file simple_pileup_reader.hpp.

◆ with_ancestral_base() [1/2]

bool with_ancestral_base ( ) const
inline

Definition at line 372 of file simple_pileup_reader.hpp.

◆ with_ancestral_base() [2/2]

self_type& with_ancestral_base ( bool  value)
inline

Set whether to expect the base of the ancestral allele as the last part of each sample in a record line.

This is a pileup extension used by Pool-HMM (Boitard et al. 2013) to denote the ancestral allele of each position directly within the pileup file. Set to true when this column is present in the input.

A typical line from a pileup file with ancestral bases looks like

2L  30  A   15  aaaAaaaAaAAaaAa PY\aVO^`ZaaV[_S A

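This line contains the three fixed columns (chromosome, position, and reference base), followed by four columns for the sample (read count, read bases, quality codes, and ancestral base). The last of these, A, is the ancestral allele of that sample.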
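This sketch parses a single-sample line like the one above into its parts. `SampleColumns` and `parse_single_sample()` are hypothetical. The sketch assumes exactly one sample, with both the quality string and the ancestral base present. It splits on whitespace, which also handles the tab-separated format.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical view of one sample that includes an ancestral base column.
struct SampleColumns {
    std::size_t read_count = 0;
    std::string read_bases;
    std::string qualities;
    char ancestral_base = 'N';
};

// Parse: chromosome, position, reference base, read count, read bases, quality codes, ancestral base.
inline SampleColumns parse_single_sample(std::string const& line) {
    std::istringstream stream(line);
    std::string chromosome, position, reference, count, ancestral;
    SampleColumns sample;
    if (!(stream >> chromosome >> position >> reference >> count
                 >> sample.read_bases >> sample.qualities >> ancestral) || ancestral.size() != 1) {
        throw std::runtime_error("Malformed pileup line");
    }
    sample.read_count = std::stoul(count);
    sample.ancestral_base = ancestral[0];
    return sample;
}
```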

Definition at line 392 of file simple_pileup_reader.hpp.

◆ with_quality_string() [1/2]

bool with_quality_string ( ) const
inline

Definition at line 313 of file simple_pileup_reader.hpp.

◆ with_quality_string() [2/2]

self_type& with_quality_string ( bool  value)
inline

Set whether to expect a phred-scaled, ASCII-encoded quality code string per sample.

A typical line from a pileup file looks like

seq1 272 T 24  ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&

with the last field being the quality codes. However, this field is optional, hence this setting. If true (default), the field is expected to be present; if false, it is expected to be absent. There is currently no automatic detection of this field.

See quality_encoding() for changing the encoding that is used in this column. Default is Sanger encoding. See genesis::sequence::QualityEncoding for details.

Definition at line 333 of file simple_pileup_reader.hpp.

Member Typedef Documentation

◆ self_type

using self_type = SimplePileupReader

Definition at line 160 of file simple_pileup_reader.hpp.


The documentation for this class was generated from the following files:

simple_pileup_reader.hpp
simple_pileup_reader.cpp