#include <genesis/sequence/counts.hpp>
Store counts of the occurrence of certain characters at the sites of Sequences.
This class is a helper for statistical analyses of Sequences, for calculating consensus sequences, and the like. It stores a Matrix of counts for a set of characters and a given sequence length. That is, it expects aligned (equal-length) sequences.
For example, we create an instance like this:
auto sequence_counts = SiteCounts( "ACGT-", 6 );
which counts the occurrences of the nucleotide characters and the gap character for Sequences of length 6. After adding several Sequences, the matrix might look like this (characters in rows, site indices in columns):
site | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
A | 3 | 0 | 1 | 3 | 0 | 0 |
C | 1 | 2 | 1 | 1 | 4 | 1 |
G | 0 | 1 | 1 | 0 | 1 | 1 |
T | 2 | 1 | 3 | 3 | 1 | 3 |
- | 2 | 4 | 2 | 1 | 2 | 2 |
The class has to be constructed with the desired set of characters and the sequence length. Each character is counted in both upper and lower case. Characters that are not in the set are simply ignored when adding Sequences.
Use add_sequence() and add_sequences() to accumulate counts. Use count_of() and count_at() to get the counter values for specific positions in the matrix.
Definition at line 85 of file counts.hpp.
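The counting rules described above (case-insensitive matching, ignored characters, weighted additions) can be illustrated with a small standalone sketch. It does not use genesis and is not its implementation: the class name MiniSiteCounts, the flat row-major storage, and throwing std::invalid_argument for length mismatches or unknown characters are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for SiteCounts (not the genesis code).
class MiniSiteCounts {
public:
    using CountsIntType = std::uint32_t;

    MiniSiteCounts(std::string const& characters, std::size_t length)
        : length_(length)
    {
        // Store the character set in upper case so lookups can be case-insensitive.
        for (char c : characters) {
            characters_ += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        counts_.assign(characters_.size() * length_, 0);
    }

    // Add one aligned sequence; each counted site is increased by `weight`.
    void add_sequence(std::string const& sites, CountsIntType weight = 1)
    {
        if (sites.size() != length_) {
            throw std::invalid_argument("sequence length does not match");
        }
        for (std::size_t site = 0; site < sites.size(); ++site) {
            std::size_t const idx = index_of(sites[site]);
            if (idx == std::string::npos) {
                continue;  // characters outside the set are ignored
            }
            counts_[idx * length_ + site] += weight;
        }
    }

    // Case-insensitive lookup; throws for characters outside the set.
    CountsIntType count_of(char character, std::size_t site) const
    {
        std::size_t const idx = index_of(character);
        if (idx == std::string::npos) {
            throw std::invalid_argument("character not in the counted set");
        }
        return count_at(idx, site);
    }

    // Direct access by character index (order of the stored set) and site index.
    CountsIntType count_at(std::size_t character_index, std::size_t site) const
    {
        return counts_.at(character_index * length_ + site);
    }

private:
    std::size_t index_of(char c) const
    {
        return characters_.find(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    }

    std::string characters_;
    std::size_t length_;
    std::vector<CountsIntType> counts_;  // row-major: one row per character
};
```

For example, after adding "AC-GTN" once and "acgtta" with weight 2 to MiniSiteCounts("ACGT-", 6), count_of('a', 0) returns 3, and the trailing N is not counted anywhere.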
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | SiteCounts () = default | Default constructor. |
| | SiteCounts (SiteCounts &&) = default | Default move constructor. |
| | SiteCounts (SiteCounts const &) = default | Default copy constructor. |
| | SiteCounts (std::string const &characters, size_t length) | Construct an object that counts the occurrences of the given characters for Sequences of the given length. |
| | ~SiteCounts () = default | Default destructor. |
| void | add_sequence (Sequence const &sequence, bool use_abundance = true) | Process a single Sequence and add its counts to the existing ones. |
| void | add_sequence (std::string const &sites, CountsIntType weight = 1) | Process a single sequence in string form and add its counts to the existing ones. |
| void | add_sequences (SequenceSet const &sequences, bool use_abundances = true) | Process a SequenceSet and add its counts to the existing ones for all contained Sequences. |
| CountsIntType | added_sequences_count () const | Return the number of processed Sequences, i.e., how many Sequences were added in total. |
| std::string | characters () const | Return the character set that is used for counting. |
| void | clear () | Clear the object, that is, delete everything. |
| void | clear_counts () | Reset all counts to 0. |
| CountsIntType | count_at (size_t character_index, size_t site_index) const | Return the count for a character and a site, given their indices. |
| CountsIntType | count_of (char character, size_t site_index) const | Return the count of a specific character at a given site. |
| size_t | length () const | Return the number of sites used for counting. |
| SiteCounts & | operator= (SiteCounts &&) = default | Default move assignment. |
| SiteCounts & | operator= (SiteCounts const &) = default | Default copy assignment. |
Public Types
using CountsIntType = uint32_t
Type of unsigned integer used internally for counting the frequencies of Sequence sites.
SiteCounts () = default
Default constructor.
This instantiates an object with no characters to count and no sequence length. It is thus empty and cannot be used for any further analyses. It is provided because a default constructor can be convenient in some cases.
SiteCounts ( std::string const & characters, size_t length )
Construct an object that counts the occurrences of the given characters for Sequences of the given length.
Definition at line 50 of file counts.cpp.
~SiteCounts () = default
Default destructor.
SiteCounts ( SiteCounts const & ) = default
Default copy constructor.
SiteCounts ( SiteCounts && ) = default
Default move constructor.
void add_sequence ( Sequence const & sequence, bool use_abundance = true )
Process a single Sequence and add its counts to the existing ones.
If use_abundance is true (default), the abundance of the Sequence is used as the weight for counting. Otherwise, a weight of 1 is used.
Definition at line 131 of file counts.cpp.
void add_sequence ( std::string const & sites, CountsIntType weight = 1 )
Process a single sequence in string form and add its counts to the existing ones.
The weight parameter allows the sequence to be weighted: the count of each counted site is increased by weight instead of by 1.
Definition at line 140 of file counts.cpp.
void add_sequences ( SequenceSet const & sequences, bool use_abundances = true )
Process a SequenceSet and add its counts to the existing ones for all contained Sequences.
If use_abundances is true (default), the abundances of the Sequences are used as weights for counting. Otherwise, a weight of 1 is used.
Definition at line 170 of file counts.cpp.
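The weighting rule behind use_abundance and use_abundances reduces to choosing one weight per Sequence. The sketch below illustrates this with a hypothetical MiniSequence struct carrying an abundance field; it is not the genesis Sequence class, and it compares characters exactly (no case folding) to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Sequence that carries an abundance value.
struct MiniSequence {
    std::string sites;
    std::uint32_t abundance = 1;
};

// Weight rule as documented: the abundance if requested, otherwise 1.
std::uint32_t weight_for(MiniSequence const& seq, bool use_abundance)
{
    return use_abundance ? seq.abundance : 1u;
}

// Weighted count of `character` at `site` across a set of sequences.
// Exact-match comparison; case folding is omitted to keep the sketch short.
std::uint32_t weighted_count(std::vector<MiniSequence> const& set,
                             char character, std::size_t site, bool use_abundances)
{
    std::uint32_t total = 0;
    for (auto const& seq : set) {
        if (seq.sites.at(site) == character) {
            total += weight_for(seq, use_abundances);
        }
    }
    return total;
}
```

With the three sequences {"ACGT", 5}, {"ACGA", 1} and {"TCGA", 2}, weighted_count(set, 'A', 0, true) gives 6, while passing false gives 2.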
SiteCounts::CountsIntType added_sequences_count () const
Return the number of processed Sequences, i.e., how many Sequences were added in total.
Definition at line 86 of file counts.cpp.
std::string characters () const
Return the character set that is used for counting.
This function returns the upper case letters of the internal list of characters that is used for counting, in the order that is also used by the count_at() function.
Definition at line 80 of file counts.cpp.
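Because count_at() uses the order reported by characters(), both functions must agree on one index per character. One plausible way to combine that fixed order with case-insensitive counting is a byte-indexed lookup table, sketched below. The helper names build_lookup and upper_characters are hypothetical, and this design is an assumption, not taken from the genesis sources.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Upper-case copy of the character set, in the original order.
std::string upper_characters(std::string const& set)
{
    std::string result;
    for (char c : set) {
        result += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return result;
}

// Map every byte to its index in the set; both cases share one index.
// Bytes that are not counted map to the sentinel value set.size().
std::array<std::size_t, 256> build_lookup(std::string const& set)
{
    std::array<std::size_t, 256> lookup;
    lookup.fill(set.size());
    for (std::size_t i = 0; i < set.size(); ++i) {
        auto const c = static_cast<unsigned char>(set[i]);
        lookup[static_cast<unsigned char>(std::toupper(c))] = i;
        lookup[static_cast<unsigned char>(std::tolower(c))] = i;
    }
    return lookup;
}
```

For the set "ACGT-", both 'G' and 'g' map to index 2, while 'N' maps to the sentinel value 5 (the size of the set), which marks a character that is not counted.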
void clear ()
Clear the object, that is, delete everything.
This function resets the object to the same state that the default constructor produces. Thus, it is no longer usable afterwards. It is mainly intended to save memory when many objects are used and then no longer needed.
For an alternative that simply resets the counts to zero but keeps the dimensions of the count matrix, see clear_counts().
Definition at line 177 of file counts.cpp.
void clear_counts ()
Reset all counts to 0.
This clears the counts so that the object is as if newly created, while keeping the counted characters and length of the count matrix. It also clears the count for added_sequences_count().
Definition at line 185 of file counts.cpp.
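The difference between clear() and clear_counts() is which state survives. The hypothetical MiniCounts struct below contrasts the two: its clear_counts() zeroes the matrix and the sequence counter but keeps the characters and length, while its clear() returns to the empty, default-constructed state and releases the matrix storage. This is an illustration of the documented behavior, not the genesis implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal state holder contrasting the two reset operations.
struct MiniCounts {
    std::string characters;
    std::size_t length = 0;
    std::vector<std::uint32_t> counts;  // characters.size() * length entries
    std::uint32_t added_sequences = 0;

    // Like clear_counts(): zero everything, keep the matrix dimensions.
    void clear_counts()
    {
        std::fill(counts.begin(), counts.end(), 0u);
        added_sequences = 0;
    }

    // Like clear(): back to the empty state; swapping with a fresh vector
    // hands the old buffer to a temporary that frees it.
    void clear()
    {
        characters.clear();
        length = 0;
        std::vector<std::uint32_t>().swap(counts);
        added_sequences = 0;
    }
};
```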
SiteCounts::CountsIntType count_at ( size_t character_index, size_t site_index ) const
Return the count for a character and a site, given their indices.
The characters are indexed in the order given by characters(). This function is mainly intended for speed when iterating over the whole Matrix, as it takes indices directly instead of looking up a character as count_of() does.
Definition at line 109 of file counts.cpp.
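Iterating the matrix by index is one way a consensus sequence (one of the uses named in the class description) could be derived. The sketch below computes a simple majority-rule consensus over a hypothetical CountsView that mimics count_at(). Breaking ties in favor of the character listed first in characters() is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical read-only view with the same indexing as count_at():
// character index first, then site index.
struct CountsView {
    std::string characters;                        // order as returned by characters()
    std::vector<std::vector<std::uint32_t>> rows;  // rows[character_index][site_index]

    std::uint32_t count_at(std::size_t character_index, std::size_t site_index) const
    {
        return rows.at(character_index).at(site_index);
    }
    std::size_t length() const { return rows.empty() ? 0 : rows.front().size(); }
};

// Majority character per site; on ties, the character listed first wins.
std::string majority_consensus(CountsView const& view)
{
    std::string result;
    if (view.characters.empty()) {
        return result;
    }
    for (std::size_t site = 0; site < view.length(); ++site) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < view.characters.size(); ++c) {
            if (view.count_at(c, site) > view.count_at(best, site)) {
                best = c;
            }
        }
        result += view.characters[best];
    }
    return result;
}
```

Fed the example matrix from the class description, majority_consensus() returns "A-TACT". At site 3, A and T are tied at 3, and A wins because it comes first in "ACGT-".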
SiteCounts::CountsIntType count_of ( char character, size_t site_index ) const
Return the count of a specific character at a given site.
If the character is not part of the set of counted characters, the function throws an exception. The lookup is case-independent. See characters() to retrieve the set of characters.
Definition at line 91 of file counts.cpp.
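The documented contract of count_of() (case-independent lookup, exception for characters outside the set) can be expressed as a small helper. The name checked_character_index is hypothetical, and the exception type std::invalid_argument is an assumption; the documentation only states that an exception is thrown.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring count_of()'s documented contract.
// `upper_set` is the character set in upper case, as characters() returns it.
std::size_t checked_character_index(std::string const& upper_set, char character)
{
    char const upper = static_cast<char>(std::toupper(static_cast<unsigned char>(character)));
    std::size_t const idx = upper_set.find(upper);
    if (idx == std::string::npos) {
        throw std::invalid_argument(std::string("character '") + character + "' is not counted");
    }
    return idx;
}
```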
size_t length () const
Return the number of sites used for counting.
This has to match the Sequence::length() property of the Sequences to be added for counting.
Definition at line 75 of file counts.cpp.
SiteCounts & operator= ( SiteCounts && ) = default
Default move assignment.
SiteCounts & operator= ( SiteCounts const & ) = default
Default copy assignment.
using CountsIntType = uint32_t
Type of unsigned integer used internally for counting the frequencies of Sequence sites.
This alias exists because the type might need to be adjusted in the future: either to a narrower type, to save memory when many SiteCounts objects are needed, or to a wider type, to allow more Sequences to be counted.
Definition at line 101 of file counts.hpp.
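To see why the width of CountsIntType matters, consider the guard needed before adding a weight to a counter. The sketch below uses an alias in the same spirit and a hypothetical checked_add() that refuses to wrap around; whether SiteCounts performs such a check is not stated in this documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Counter width chosen by a single alias; swapping uint32_t for uint16_t or
// uint64_t changes both the memory per matrix cell and the maximum count.
using CountsIntType = std::uint32_t;

// Add `weight` to `counter`, throwing instead of silently wrapping around.
CountsIntType checked_add(CountsIntType counter, CountsIntType weight)
{
    if (counter > std::numeric_limits<CountsIntType>::max() - weight) {
        throw std::overflow_error("count would overflow CountsIntType");
    }
    return counter + weight;
}
```

With uint32_t, a single counter can reach 2^32 - 1 = 4,294,967,295 before the guard triggers; changing only the alias changes that limit.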