#include <genesis/population/format/simple_pileup_reader.hpp>
One sample in a pileup line/record.
Each sample in a pileup file corresponds to the reads of one sample that cover a certain position on a chromosome, and consists of two or three entries/columns in the file: the read depth, the read bases, and, optionally, the quality scores of those bases.
Because this is a simple reader, it ignores the information on read starts and ends, as well as potential insertions and deletions (indels), and simply tallies the bases of the reads that cover a position. The Sample struct collects this information.
Definition at line 103 of file simple_pileup_reader.hpp.
Public Attributes
| char | ancestral_base = '\0' | Base of the ancestral allele. |
| std::vector< unsigned char > | phred_scores | Phred-scaled scores of the bases as given in read_bases. |
| std::string | read_bases | All bases (except for indels) of the reads that cover the given position. |
| size_t | read_depth = 0 | Total count of reads covering this position. |
| char ancestral_base = '\0' |
Base of the ancestral allele.
Only read if with_ancestral_base() is set to true. See there for details.
Definition at line 142 of file simple_pileup_reader.hpp.
| std::vector<unsigned char> phred_scores |
Phred-scaled scores of the bases as given in read_bases.
This is the data from the third column of the sample. It is only parsed and filled in if with_quality_string() is set to true (default), in which case this data is expected to be present in the file.
Definition at line 135 of file simple_pileup_reader.hpp.
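To illustrate what filling phred_scores from the quality column involves, here is a minimal standalone sketch. It does not use genesis; the function name decode_phred_scores and the default ASCII offset of 33 (the Sanger convention) are assumptions made for this example, and the actual reader may support other encodings.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of genesis: convert a quality string into
// Phred scores by subtracting an ASCII offset (33 is assumed here).
std::vector<unsigned char> decode_phred_scores(
    std::string const& quality_string,
    int ascii_offset = 33
) {
    std::vector<unsigned char> scores;
    scores.reserve( quality_string.size() );
    for( char c : quality_string ) {
        int const score = static_cast<unsigned char>( c ) - ascii_offset;
        if( score < 0 ) {
            // A character below the offset cannot encode a valid score.
            throw std::invalid_argument( "Quality character below the encoding offset" );
        }
        scores.push_back( static_cast<unsigned char>( score ));
    }
    return scores;
}
```

With this offset, the quality string "!+5I" decodes to the scores 0, 10, 20, and 40, one per base in read_bases.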
| std::string read_bases |
All bases (except for indels) of the reads that cover the given position.
These are the data of the second column of the sample, but without the read start/end and indel data. Furthermore, the pileup notation for using the reference base (. and ,) is replaced by the actual reference base here.
Definition at line 126 of file simple_pileup_reader.hpp.
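The clean-up described for read_bases can be sketched as follows. This is a standalone approximation under stated assumptions, not the genesis implementation: the function name clean_read_bases is hypothetical, the reference base is inserted exactly as passed in (no case conversion), and the sketch relies on the usual pileup conventions that `^` is followed by one mapping quality character, `$` marks a read end, and `+N`/`-N` is followed by N indel bases.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of genesis: strip read start/end markers and
// indel descriptions from a raw pileup bases column, and replace '.' and ','
// by the given reference base.
std::string clean_read_bases( std::string const& raw, char reference_base )
{
    std::string result;
    std::size_t i = 0;
    while( i < raw.size() ) {
        char const c = raw[i];
        if( c == '^' ) {
            // Read start: skip the marker and the mapping quality character.
            i += 2;
        } else if( c == '$' ) {
            // Read end: skip the marker.
            ++i;
        } else if( c == '+' || c == '-' ) {
            // Indel: parse the length, then skip that many bases.
            ++i;
            std::size_t len = 0;
            while( i < raw.size() && std::isdigit( static_cast<unsigned char>( raw[i] ))) {
                len = len * 10 + static_cast<std::size_t>( raw[i] - '0' );
                ++i;
            }
            if( len > raw.size() - i ) {
                throw std::runtime_error( "Indel length exceeds remaining input" );
            }
            i += len;
        } else if( c == '.' || c == ',' ) {
            // Match to the reference on the forward or reverse strand.
            result += reference_base;
            ++i;
        } else {
            // Any other symbol (bases, '*', '<', '>') is kept as is.
            result += c;
            ++i;
        }
    }
    return result;
}
```

For example, with reference base 'C', the raw column `^].,$A+2ACg-1T*` reduces to `CCAg*`: the read start and end markers and both indel descriptions are dropped, and the two reference matches become 'C'.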
| size_t read_depth = 0 |
Total count of reads covering this position.
This is the number given in the first column of each sample. In a well-formed pileup file, it also equals the number of bases listed for the sample, that is, read_bases.size().
In this simple reader, the value is almost identical to the sum of the base counters (that is, a_count + c_count + g_count + t_count + n_count + d_count). The only difference comes from RNA symbols (< and >) in the string, which are ignored. If those are needed as well, read_bases can be scanned again to count them.
Definition at line 117 of file simple_pileup_reader.hpp.
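The relation between read_depth, read_bases.size(), and the base counters can be checked with a small standalone sketch. The PileupSample and BaseTally types below are hypothetical stand-ins that only mirror the attributes and counters named on this page; they are not the genesis definitions. The sketch assumes that `*` marks a deleted base and groups every other symbol, including `<` and `>`, into a separate counter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the documented attributes of Sample.
struct PileupSample
{
    std::size_t                read_depth = 0;
    std::string                read_bases;
    std::vector<unsigned char> phred_scores;
    char                       ancestral_base = '\0';
};

// Hypothetical tally, named after the counters mentioned in the text.
struct BaseTally
{
    std::size_t a_count = 0, c_count = 0, g_count = 0, t_count = 0;
    std::size_t n_count = 0, d_count = 0;
    std::size_t other_count = 0; // e.g. '<' and '>'
};

// Count each symbol of read_bases, ignoring case for the nucleotides.
BaseTally tally_bases( PileupSample const& sample )
{
    BaseTally tally;
    for( char base : sample.read_bases ) {
        switch( base ) {
            case 'A': case 'a': ++tally.a_count; break;
            case 'C': case 'c': ++tally.c_count; break;
            case 'G': case 'g': ++tally.g_count; break;
            case 'T': case 't': ++tally.t_count; break;
            case 'N': case 'n': ++tally.n_count; break;
            case '*':           ++tally.d_count; break;
            default:            ++tally.other_count; break;
        }
    }
    return tally;
}

// In a well-formed record, the depth equals the number of listed bases.
bool is_depth_consistent( PileupSample const& sample )
{
    return sample.read_depth == sample.read_bases.size();
}
```

For a sample with read_bases set to "ACGTNaa*<" and read_depth set to 9, the depth check passes, and the six named counters sum to 8, one less than the depth because of the single `<` symbol.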