#include <genesis/population/format/simple_pileup_reader.hpp>

Detailed Description

One sample in a pileup line/record.

Each sample in a pileup file holds the reads of that sample that cover a certain position on a chromosome, and consists of two to four entries/columns in the file, depending on which optional columns are present:

A read count.
A list of bases (and some other information on read start and end etc) from the reads that cover the given position on the chromosome.
(Optionally) A list of phred-scaled ASCII-encoded quality scores for the list of bases.
(Optionally) The ancestral base at the position (some pileup files have this).

Because this is a simple reader, it ignores the information on read starts/ends, as well as potential insertions and deletions (indels), and instead simply tallies up the number of actual bases of the reads that cover a position. The Sample struct collects this information.

Definition at line 103 of file simple_pileup_reader.hpp.

Public Attributes
char	ancestral_base = '\0'
	Base of the ancestral allele.

std::vector< unsigned char >	phred_scores
	Phred-scaled scores of the bases as given in `read_bases`.

std::string	read_bases
	All bases (except for indels) of the reads that cover the given position.

size_t	read_depth = 0
	Total count of reads covering this position.

Member Data Documentation

◆ ancestral_base

char ancestral_base = '\0'

Base of the ancestral allele.

Only read if with_ancestral_base() is set to true. See there for details.

Definition at line 142 of file simple_pileup_reader.hpp.

◆ phred_scores

std::vector<unsigned char> phred_scores

Phred-scaled scores of the bases as given in read_bases.

This is the data from the third column of the sample. It is only parsed and filled in if with_quality_string() is set to true (default), in which case this data is expected to be present in the file.
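
As an illustration of what such scores look like once decoded, the following is a minimal sketch that turns an ASCII quality column into numeric values. It assumes the common Phred+33 (Sanger) offset, which may differ from what a given file or the reader's settings use. The functions `decode_phred33` and `phred_to_error_probability` are hypothetical helpers for this example and are not part of genesis.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of genesis): decode an ASCII quality column
// into numeric phred scores, assuming the Phred+33 (Sanger) encoding.
std::vector<unsigned char> decode_phred33( std::string const& quality_column )
{
    std::vector<unsigned char> scores;
    scores.reserve( quality_column.size() );
    for( char c : quality_column ) {
        // Printable ASCII from '!' (33) to '~' (126) is the valid range.
        if( c < 33 || c > 126 ) {
            throw std::invalid_argument( "Invalid quality character" );
        }
        scores.push_back( static_cast<unsigned char>( c - 33 ));
    }
    return scores;
}

// Convert a phred score Q into an error probability, P = 10^(-Q/10).
double phred_to_error_probability( unsigned char score )
{
    return std::pow( 10.0, -static_cast<double>( score ) / 10.0 );
}
```

With this sketch, each character of the quality column yields one entry, so the result is expected to have the same length as the corresponding list of bases in a well-formed file.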

Definition at line 135 of file simple_pileup_reader.hpp.

◆ read_bases

std::string read_bases

All bases (except for indels) of the reads that cover the given position.

These are the data of the second column of the sample, but without the read start/end and indel data. Furthermore, the pileup notation for using the reference base (. and ,) is replaced by the actual reference base here.
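
The clean-up described above can be sketched roughly as follows. This is not the genesis implementation, only an illustration under the usual pileup conventions: `^` starts a read and is followed by one mapping quality character, `$` ends a read, and `+N`/`-N` is followed by N indel bases. The function `strip_pileup_notation` and its parameters are hypothetical, and details such as case handling are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of genesis): reduce a raw pileup bases column
// to plain bases, dropping read start/end markers and indels, and replacing
// the reference notation ('.' and ',') by the given reference base.
std::string strip_pileup_notation( std::string const& raw, char ref_base )
{
    std::string bases;
    std::size_t i = 0;
    while( i < raw.size() ) {
        char const c = raw[i];
        if( c == '^' ) {
            // Read start marker, followed by one mapping quality character.
            i += 2;
        } else if( c == '$' ) {
            // Read end marker.
            ++i;
        } else if( c == '+' || c == '-' ) {
            // Indel: sign, decimal length, then that many bases to skip.
            ++i;
            std::size_t len = 0;
            while( i < raw.size() && std::isdigit( static_cast<unsigned char>( raw[i] ))) {
                len = 10 * len + static_cast<std::size_t>( raw[i] - '0' );
                ++i;
            }
            if( len > raw.size() - i ) {
                throw std::runtime_error( "Truncated indel in pileup bases" );
            }
            i += len;
        } else if( c == '.' || c == ',' ) {
            // Match to the reference on the forward or reverse strand.
            bases += ref_base;
            ++i;
        } else {
            // Any other symbol (mismatching base, '*', '<', '>') is kept as is.
            bases += c;
            ++i;
        }
    }
    return bases;
}
```

For example, under these assumptions the raw column `^I.$,+2ACa-1gT*` with reference base `G` reduces to `GGaT*`.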

Definition at line 126 of file simple_pileup_reader.hpp.

◆ read_depth

size_t read_depth = 0

Total count of reads covering this position.

This is the number given in the first column of each sample. In a well-formed pileup file, this also corresponds to the number of actual bases that are listed for the sample, that is, read_bases.size().

In this simple reader, the value is almost identical to the sum of all base counters (that is, a_count + c_count + g_count + t_count + n_count + d_count), with the exception of RNA symbols (<>) in the string, which are ignored. If those are needed as well, read_bases can be scanned again to count them.
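
The relation between the depth and the counters can be illustrated with a small tally over a base string. This is a sketch, not the genesis code: the struct `BaseTally` and the function `tally_bases` are hypothetical, and it assumes that `*` is the deletion placeholder that feeds a counter like d_count, and that `<` and `>` are the symbols left out of the sum.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tally (not part of genesis), mirroring the counters named above.
struct BaseTally
{
    std::size_t a_count = 0;
    std::size_t c_count = 0;
    std::size_t g_count = 0;
    std::size_t t_count = 0;
    std::size_t n_count = 0;
    std::size_t d_count = 0; // assumed: '*' deletion placeholders
    std::size_t skipped = 0; // '<' and '>' symbols, not part of the sum
    std::size_t other   = 0; // anything else

    // Sum of the six counters that the depth is compared against.
    std::size_t sum() const
    {
        return a_count + c_count + g_count + t_count + n_count + d_count;
    }
};

// Scan a string of bases once and count each symbol, ignoring case.
BaseTally tally_bases( std::string const& read_bases )
{
    BaseTally tally;
    for( char c : read_bases ) {
        switch( c ) {
            case 'A': case 'a': ++tally.a_count; break;
            case 'C': case 'c': ++tally.c_count; break;
            case 'G': case 'g': ++tally.g_count; break;
            case 'T': case 't': ++tally.t_count; break;
            case 'N': case 'n': ++tally.n_count; break;
            case '*':           ++tally.d_count; break;
            case '<': case '>': ++tally.skipped; break;
            default:            ++tally.other;   break;
        }
    }
    return tally;
}
```

In this sketch, `tally.sum() + tally.skipped` equals the length of the scanned string whenever `tally.other` is zero, which is the kind of consistency check the paragraph above describes.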

Definition at line 117 of file simple_pileup_reader.hpp.

The documentation for this struct was generated from the following file: