#include <genesis/population/format/simple_pileup_reader.hpp>
One sample in a pileup line/record.
Each sample in a pileup file corresponds to the reads of one sample that cover a certain position on a chromosome, and consists of two or three entries/columns in the file: the read count, the read bases, and (optionally) the quality string of those bases.
As this is a simple reader, it ignores the information on read starts/ends, as well as potential insertions and deletions (indels), and simply tallies up the actual bases of the reads that cover a position. The Sample struct collects this information.
Definition at line 103 of file simple_pileup_reader.hpp.
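The tallying described above can be illustrated with a minimal sketch. It does not use the genesis headers: SampleSketch is a hypothetical stand-in that mirrors the four documented members, and BaseTally and tally_bases() are names invented for this example. It also assumes that a deleted base appears as '*' in read_bases, as is common in pileup files.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the four documented members of Sample.
struct SampleSketch {
    std::size_t                read_depth = 0;
    std::string                read_bases;
    std::vector<unsigned char> phred_scores;
    char                       ancestral_base = '\0';
};

// Hypothetical tally type, invented for this example.
struct BaseTally {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0, d = 0, other = 0;
};

// Count the bases in read_bases, ignoring case.
// Assumption: '*' marks a deleted base; anything else unknown goes to `other`.
BaseTally tally_bases(SampleSketch const& sample)
{
    BaseTally tally;
    for (char ch : sample.read_bases) {
        switch (std::toupper(static_cast<unsigned char>(ch))) {
            case 'A': ++tally.a; break;
            case 'C': ++tally.c; break;
            case 'G': ++tally.g; break;
            case 'T': ++tally.t; break;
            case 'N': ++tally.n; break;
            case '*': ++tally.d; break;
            default:  ++tally.other; break;
        }
    }
    return tally;
}
```

Keeping an `other` bucket makes it visible when the string contains symbols that the tally does not account for.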
Public Attributes
char | ancestral_base = '\0'
Base of the ancestral allele.
std::vector< unsigned char > | phred_scores
Phred-scaled scores of the bases as given in read_bases.
std::string | read_bases
All bases (except for indels) of the reads that cover the given position.
size_t | read_depth = 0
Total count of reads covering this position.
char ancestral_base = '\0' |
Base of the ancestral allele.
Only read if with_ancestral_base() is set to true. See there for details.
Definition at line 142 of file simple_pileup_reader.hpp.
std::vector<unsigned char> phred_scores |
Phred-scaled scores of the bases as given in read_bases.
This is the data from the third column of the sample. It is only parsed and filled in if with_quality_string() is set to true (default), in which case this data is expected to be present in the file.
Definition at line 135 of file simple_pileup_reader.hpp.
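As an illustration of what Phred-scaled scores look like once decoded, here is a minimal sketch that turns a quality string into numeric scores. The function name decode_phred() is hypothetical, and the ASCII offset of 33 (Sanger encoding) is an assumption of this example; the encoding that the reader itself expects is not stated here.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert an ASCII quality string into Phred scores.
// Assumption: the scores are encoded with the given ASCII offset (33 by default).
std::vector<unsigned char> decode_phred(
    std::string const& quality, unsigned char offset = 33
) {
    std::vector<unsigned char> scores;
    scores.reserve(quality.size());
    for (char ch : quality) {
        auto const code = static_cast<unsigned char>(ch);
        if (code < offset) {
            throw std::invalid_argument("Quality character below the encoding offset");
        }
        scores.push_back(static_cast<unsigned char>(code - offset));
    }
    return scores;
}
```

With this encoding, each score lines up with the base at the same index in read_bases, so the two containers should have the same length.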
std::string read_bases |
All bases (except for indels) of the reads that cover the given position.
These are the data of the second column of the sample, but without the read start/end and indel data. Furthermore, the pileup notation for using the reference base ('.' and ',') is replaced by the actual reference base here.
Definition at line 126 of file simple_pileup_reader.hpp.
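The clean-up described for read_bases can be sketched as follows. This is not the reader's implementation; clean_read_bases() is a hypothetical helper, and the markers it handles are assumptions based on the common pileup notation: '^' followed by one mapping-quality character for a read start, '$' for a read end, and '+N' or '-N' followed by N characters for an indel.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: strip read start/end and indel notation from the raw
// second column, and replace '.' and ',' by the given reference base.
std::string clean_read_bases(std::string const& raw, char reference_base)
{
    std::string bases;
    std::size_t i = 0;
    while (i < raw.size()) {
        char const ch = raw[i];
        if (ch == '^') {
            // Read start: skip the marker and the mapping-quality character.
            i += 2;
        } else if (ch == '$') {
            // Read end: skip the marker.
            ++i;
        } else if (ch == '+' || ch == '-') {
            // Indel: parse its length, then skip that many characters.
            ++i;
            std::size_t len = 0;
            while (i < raw.size() && std::isdigit(static_cast<unsigned char>(raw[i]))) {
                len = len * 10 + static_cast<std::size_t>(raw[i] - '0');
                ++i;
            }
            i += len;
        } else if (ch == '.' || ch == ',') {
            // Match to the reference on either strand.
            bases += reference_base;
            ++i;
        } else {
            // Mismatching base or other symbol: keep it as given.
            bases += ch;
            ++i;
        }
    }
    return bases;
}
```

For example, with reference base 'C', the raw column "^].,A+2AGt$*" reduces to "CCAt*".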
size_t read_depth = 0 |
Total count of reads covering this position.
This is the number given in the first column of each sample. In a well-formed pileup file, this also corresponds to the number of actual bases that are listed for the sample, that is, read_bases.size().
In this simple reader, this value is almost identical to the sum of the base counters (that is, a_count + c_count + g_count + t_count + n_count + d_count); the exception is RNA symbols ('<' and '>') in the string, which are ignored. If those are needed as well, read_bases can be scanned again to count them.
Definition at line 117 of file simple_pileup_reader.hpp.
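A minimal sketch of the two checks mentioned above: comparing read_depth against read_bases.size(), and scanning read_bases again for the '<' and '>' symbols. DepthCheck and check_depth() are hypothetical names introduced for this example and take plain values rather than a Sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical result type, invented for this example.
struct DepthCheck {
    bool        depth_matches_bases; // read_depth == read_bases.size()
    std::size_t skip_symbols;        // number of '<' and '>' in read_bases
};

// Compare the stated depth with the listed bases and count '<' / '>' symbols.
DepthCheck check_depth(std::size_t read_depth, std::string const& read_bases)
{
    auto const skips = static_cast<std::size_t>(std::count_if(
        read_bases.begin(), read_bases.end(),
        [](char ch) { return ch == '<' || ch == '>'; }
    ));
    return DepthCheck{ read_depth == read_bases.size(), skips };
}
```

A non-zero skip_symbols value gives the amount by which the sum of the base counters can be expected to fall short of read_depth.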