A library for working with phylogenetic and population genetic data.
SimplePileupReader::Sample Struct Reference

#include <genesis/population/formats/simple_pileup_reader.hpp>

Detailed Description

One sample in a pileup line/record.

Each sample in a pileup file corresponds to the reads of one sample that cover a certain position on a chromosome, and consists of two to four entries/columns in the file:

  1. A read count.
  2. A list of bases (plus additional information, such as read starts and ends) from the reads that cover the given position on the chromosome.
  3. (Optionally) A list of phred-scaled ASCII-encoded quality scores for the list of bases.
  4. (Optionally) The ancestral base at the position (some pileup files have this).

As this is a simple reader, we ignore the information on read starts/ends, as well as potential insertions and deletions (indels), and instead simply tally up the number of actual bases of the reads that cover a position. The Sample struct collects this information.
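
The tallying described above can be sketched as follows. This is a minimal illustration, not the reader's actual implementation: the `BaseTally` struct and the `tally_bases()` function are hypothetical names introduced here, and the input is assumed to be an already cleaned base string such as the one stored in read_bases.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical tally struct for illustration only; not part of genesis.
struct BaseTally {
    std::size_t a = 0;
    std::size_t c = 0;
    std::size_t g = 0;
    std::size_t t = 0;
    std::size_t n = 0;
    std::size_t other = 0; // anything else, e.g. deletion or skip symbols
};

// Count the nucleotides in a cleaned base string (no indels, no read start/end markers).
// Upper and lower case letters are counted together.
inline BaseTally tally_bases( std::string const& read_bases )
{
    BaseTally tally;
    for( char ch : read_bases ) {
        switch( std::toupper( static_cast<unsigned char>( ch ))) {
            case 'A': ++tally.a; break;
            case 'C': ++tally.c; break;
            case 'G': ++tally.g; break;
            case 'T': ++tally.t; break;
            case 'N': ++tally.n; break;
            default:  ++tally.other; break;
        }
    }
    return tally;
}
```

Calling `tally_bases( sample.read_bases )` on a parsed sample would then give per-base counts for that position.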

Definition at line 102 of file simple_pileup_reader.hpp.

Public Attributes

char ancestral_base = '\0'
 Base of the ancestral allele.
std::vector< unsigned char > phred_scores
 Phred-scaled scores of the bases as given in read_bases.
std::string read_bases
 All bases (except for indels) of the reads that cover the given position.
size_t read_coverage = 0
 Total count of reads covering this position.

Member Data Documentation

◆ ancestral_base

char ancestral_base = '\0'

Base of the ancestral allele.

Only read if with_ancestral_base() is set to true. See there for details.

Definition at line 141 of file simple_pileup_reader.hpp.

◆ phred_scores

std::vector<unsigned char> phred_scores

Phred-scaled scores of the bases as given in read_bases.

This is the data from the third column of the sample. It is only parsed and filled in if with_quality_string() is set to true (default), in which case this data is expected to be present in the file.
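
As an illustration of what phred-scaled ASCII-encoded scores mean, the sketch below decodes a quality string into numeric scores. It assumes the common Phred+33 (Sanger) offset; the encoding expected by the reader may be configured differently. The function names `decode_phred()` and `phred_to_error_probability()` are hypothetical and not part of the documented API.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// Decode an ASCII quality string into numeric phred scores.
// Assumption: Phred+33 offset unless another offset is given.
inline std::vector<unsigned char> decode_phred(
    std::string const& quality, unsigned char offset = 33
) {
    std::vector<unsigned char> scores;
    scores.reserve( quality.size() );
    for( char ch : quality ) {
        auto const code = static_cast<unsigned char>( ch );
        if( code < offset ) {
            throw std::invalid_argument( "Quality character below the encoding offset" );
        }
        scores.push_back( static_cast<unsigned char>( code - offset ));
    }
    return scores;
}

// Convert a phred score Q into the error probability 10^(-Q/10).
inline double phred_to_error_probability( unsigned char score )
{
    return std::pow( 10.0, -static_cast<double>( score ) / 10.0 );
}
```

Under this assumption, the character `I` (ASCII 73) decodes to a score of 40, which corresponds to an error probability of 0.0001.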

Definition at line 134 of file simple_pileup_reader.hpp.

◆ read_bases

std::string read_bases

All bases (except for indels) of the reads that cover the given position.

These are the data of the second column of the sample, but without the read start/end and indel data. Furthermore, the pileup notation for using the reference base (. and ,) is replaced by the actual reference base here.
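
The cleaning described above can be sketched roughly as follows, assuming the usual pileup conventions: `^` followed by one mapping quality character marks a read start, `$` marks a read end, and `+N` or `-N` followed by N characters denotes an indel. The function `clean_read_bases()` is a hypothetical helper written for this page; the reader's own parsing may differ in details such as error handling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Reduce a raw pileup base column to plain bases:
// drop read start/end markers and indels, and replace '.' and ','
// by the given reference base.
inline std::string clean_read_bases( std::string const& raw, char ref_base )
{
    std::string result;
    std::size_t i = 0;
    while( i < raw.size() ) {
        char const ch = raw[i];
        if( ch == '^' ) {
            // Read start marker, followed by one mapping quality character.
            i += 2;
        } else if( ch == '$' ) {
            // Read end marker.
            ++i;
        } else if( ch == '+' || ch == '-' ) {
            // Indel: sign, length in decimal digits, then that many bases.
            ++i;
            std::size_t len = 0;
            while( i < raw.size() && std::isdigit( static_cast<unsigned char>( raw[i] ))) {
                len = len * 10 + static_cast<std::size_t>( raw[i] - '0' );
                ++i;
            }
            i += len;
        } else if( ch == '.' || ch == ',' ) {
            // Match to the reference on the forward or reverse strand.
            result += ref_base;
            ++i;
        } else {
            // Any other symbol (bases, '*', '<', '>') is kept as it is.
            result += ch;
            ++i;
        }
    }
    return result;
}
```

For example, with reference base `G`, the raw column `^F.,+2AGa$-1cT*` is reduced to `GGaT*`.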

Definition at line 125 of file simple_pileup_reader.hpp.

◆ read_coverage

size_t read_coverage = 0

Total count of reads covering this position.

This is the number given in the first column of each sample. In a well-formed pileup file, this also corresponds to the number of actual bases that are listed for the sample, that is read_bases.size().

In our simple reader, this value is almost identical to the sum of all other counters (that is, a_count + c_count + g_count + t_count + n_count + d_count), with the exception of RNA symbols (< and >) in the string, which we ignore. If those are needed as well, read_bases can be scanned again to count them.
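
A small sketch of both checks mentioned here: comparing read_coverage against read_bases.size(), and scanning read_bases again for the `<` and `>` symbols. The `Sample` struct below is a local stand-in that only mirrors the fields documented on this page so that the example compiles on its own; the helper functions are hypothetical and not part of the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Local stand-in mirroring the documented fields; not the genesis type itself.
struct Sample {
    std::size_t read_coverage = 0;
    std::string read_bases;
    std::vector<unsigned char> phred_scores;
    char ancestral_base = '\0';
};

// In a well-formed pileup file, the read count equals the number of listed bases.
inline bool coverage_matches_bases( Sample const& sample )
{
    return sample.read_coverage == sample.read_bases.size();
}

// Count the '<' and '>' symbols that the simple tally ignores.
inline std::size_t count_skip_symbols( Sample const& sample )
{
    return static_cast<std::size_t>( std::count_if(
        sample.read_bases.begin(), sample.read_bases.end(),
        []( char ch ){ return ch == '<' || ch == '>'; }
    ));
}
```

A caller could use `coverage_matches_bases()` as a sanity check after reading each sample, and `count_skip_symbols()` when the ignored symbols are needed.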

Definition at line 116 of file simple_pileup_reader.hpp.

The documentation for this struct was generated from the following file: