A toolkit for working with phylogenetic data.
v0.20.0
SiteCounts Class Reference

#include <genesis/sequence/counts.hpp>

Detailed Description

Store counts of the occurrences of certain characters at the sites of Sequences.

This class is a helper for statistical analyses of Sequences and for calculating consensus sequences and the like. It stores a Matrix of counts for a set of characters and a given sequence length. That is, it expects aligned sequences (all of the same length).

For example, we create an instance like this:

auto sequence_counts = SiteCounts( "ACGT-", 6 );

which counts the occurrences of the nucleotide characters and the gap character for Sequences of length 6. After adding several Sequences, the matrix might look like this (site indices in columns, characters in rows):

site 0 1 2 3 4 5
A 3 0 1 3 0 0
C 1 2 1 1 4 1
G 0 1 1 0 1 1
T 2 1 3 3 1 3
- 2 4 2 1 2 2

The class has to be constructed with the desired set of characters and the sequence length. Characters are automatically counted in both upper and lower case. Characters that are not in the set are simply ignored when adding Sequences.

Use add_sequence() and add_sequences() to accumulate counts. Use count_of() and count_at() to get the counter values for specific positions in the matrix.

Definition at line 85 of file counts.hpp.
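The counting model described above can be illustrated with a small stand-alone sketch. `SiteCountsSketch` is a hypothetical class, not the genesis implementation: it assumes a character-major count matrix stored in one vector and a 256-entry lookup table that maps both cases of each counted character to its row. Throwing on a length mismatch is also an assumption; the text above only states that the lengths have to match.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of the counting scheme; not the genesis implementation.
class SiteCountsSketch {
public:
    using CountsIntType = std::uint32_t;

    SiteCountsSketch(std::string const& characters, std::size_t length)
        : length_(length)
        , counts_(characters.size() * length, 0u)
    {
        // Unknown characters map to npos and are ignored when adding.
        lookup_.fill(std::string::npos);
        for (std::size_t i = 0; i < characters.size(); ++i) {
            int const c = static_cast<unsigned char>(characters[i]);
            // Register both cases, so that counting is case-insensitive.
            lookup_[static_cast<unsigned char>(std::toupper(c))] = i;
            lookup_[static_cast<unsigned char>(std::tolower(c))] = i;
        }
    }

    // Add one aligned sequence; each counted site is increased by `weight`.
    // Assumption: a length mismatch is rejected with an exception.
    void add_sequence(std::string const& sites, CountsIntType weight = 1) {
        if (sites.size() != length_) {
            throw std::invalid_argument("sequence length does not match");
        }
        for (std::size_t s = 0; s < length_; ++s) {
            std::size_t const row = lookup_[static_cast<unsigned char>(sites[s])];
            if (row != std::string::npos) {
                counts_[row * length_ + s] += weight;
            }
        }
    }

    // Rows follow the order of the characters passed to the constructor.
    CountsIntType count_at(std::size_t character_index, std::size_t site_index) const {
        if (site_index >= length_) {
            throw std::out_of_range("site index out of range");
        }
        return counts_.at(character_index * length_ + site_index);
    }

    // Case-insensitive lookup; characters outside the set throw.
    CountsIntType count_of(char character, std::size_t site_index) const {
        std::size_t const row = lookup_[static_cast<unsigned char>(character)];
        if (row == std::string::npos) {
            throw std::out_of_range("character is not in the counted set");
        }
        return count_at(row, site_index);
    }

    std::size_t length() const { return length_; }

private:
    std::size_t length_;
    std::vector<CountsIntType> counts_;
    std::array<std::size_t, 256> lookup_{};
};
```

Adding "ACGT-A" with weight 1 and then "acgtNa" with weight 2 gives a count of 3 for 'A' (or 'a') at site 0, while the 'N' at site 4 is silently ignored.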

Public Member Functions

 SiteCounts ()=default
 Default constructor.
 
 SiteCounts (std::string const &characters, size_t length)
 Construct an object that counts the occurrences of the given characters for Sequences of the given length.
 
 SiteCounts (SiteCounts const &)=default
 Default copy constructor.
 
 SiteCounts (SiteCounts &&)=default
 Default move constructor.
 
 ~SiteCounts ()=default
 Default destructor.
 
void add_sequence (Sequence const &sequence, bool use_abundance=true)
 Process a single Sequence and add its counts to the existing ones.
 
void add_sequence (std::string const &sites, CountsIntType weight=1)
 Process a single sequence in string form and add its counts to the existing ones.
 
void add_sequences (SequenceSet const &sequences)
 Process a SequenceSet and add its counts to the existing ones for all contained Sequences.
 
CountsIntType added_sequences_count () const
 Return the number of processed Sequences, i.e., how many Sequences were added in total.
 
std::string characters () const
 Return the character set that is used for counting.
 
void clear ()
 Clear the object, that is, delete everything.
 
void clear_counts ()
 Reset all counts to 0.
 
CountsIntType count_at (size_t character_index, size_t site_index) const
 Return the count for a character and a site, given their indices.
 
CountsIntType count_of (char character, size_t site_index) const
 Return the count of a specific character at a given site.
 
size_t length () const
 Return the number of sites used for counting.
 
SiteCounts & operator= (SiteCounts const &)=default
 Default copy assignment.
 
SiteCounts & operator= (SiteCounts &&)=default
 Default move assignment.
 

Public Types

using CountsIntType = uint32_t
 Unsigned integer type used internally for counting the frequencies of Sequence sites.
 

Constructor & Destructor Documentation

SiteCounts ( )
default

Default constructor.

This instantiates an object with no characters to count and no sequence length. It is thus empty and cannot be used for any further analyses. It is provided because a default constructor can be useful in some cases.

SiteCounts ( std::string const & characters, size_t length )

Construct an object that counts the occurrences of the given characters for Sequences of the given length.

Definition at line 50 of file counts.cpp.

~SiteCounts ( )
default

Default destructor.

SiteCounts ( SiteCounts const &  )
default

Default copy constructor.

SiteCounts ( SiteCounts &&  )
default

Default move constructor.

Member Function Documentation

void add_sequence ( Sequence const & sequence, bool use_abundance = true )

Process a single Sequence and add its counts to the existing ones.

If use_abundance is true (default), the abundance of the Sequence is used as the weight for counting. Otherwise, a weight of 1 is used.

Definition at line 131 of file counts.cpp.
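The weighting rule can be stated in a few lines. This is a minimal sketch assuming a hypothetical `SketchSequence` type that carries its sites and an integer abundance; the real Sequence class and the way it exposes its abundance may differ.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a Sequence that carries an abundance value.
struct SketchSequence {
    std::string sites;
    std::uint32_t abundance = 1;
};

// Weight added to each counted site of the sequence, following the rule
// described above: the abundance if use_abundance is true, otherwise 1.
inline std::uint32_t counting_weight(SketchSequence const& seq, bool use_abundance = true) {
    return use_abundance ? seq.abundance : 1u;
}
```

Under this rule, adding a Sequence with an abundance of 5 should affect the count matrix in the same way as passing its sites to the string overload of add_sequence() with a weight of 5.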

void add_sequence ( std::string const & sites, CountsIntType weight = 1 )

Process a single sequence in string form and add its counts to the existing ones.

The weight parameter allows the sequence to be weighted: at each site, the count of the character found there is increased by weight instead of by 1.

Definition at line 140 of file counts.cpp.

void add_sequences ( SequenceSet const &  sequences)

Process a SequenceSet and add its counts to the existing ones for all contained Sequences.

Definition at line 170 of file counts.cpp.

SiteCounts::CountsIntType added_sequences_count ( ) const

Return the number of processed Sequences, i.e., how many Sequences were added in total.

Definition at line 86 of file counts.cpp.

std::string characters ( ) const

Return the character set that is used for counting.

This function returns the upper case letters of the internal list of characters that is used for counting, in the order that is also used by the count_at() function.

Definition at line 80 of file counts.cpp.

void clear ( )

Clear the object, that is, delete everything.

This function resets the object to the state produced by the default constructor, so it is no longer usable for counting. It is mainly intended to save memory when many objects are used and then no longer needed.

For an alternative function that simply resets the counts to zero but keeps the dimensions of the count matrix, see clear_counts().

Definition at line 177 of file counts.cpp.

void clear_counts ( )

Reset all counts to 0.

This resets the counts so that the object is as if newly constructed, while keeping the counted characters and the length of the count matrix. It also resets the counter returned by added_sequences_count().

Definition at line 185 of file counts.cpp.
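The difference between clear() and clear_counts() can be sketched with a hypothetical `CountsState` struct that holds the same kinds of data (character set, length, count matrix, number of added sequences). This only illustrates the two reset levels described above; the member names are invented for the example and are not the class's actual members.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical state holder illustrating the two reset levels.
struct CountsState {
    std::string characters;
    std::size_t length = 0;
    std::vector<std::uint32_t> counts;   // characters.size() * length entries
    std::uint32_t added_sequences = 0;

    // Like clear_counts(): zero all counters, keep characters and dimensions.
    void clear_counts() {
        std::fill(counts.begin(), counts.end(), 0u);
        added_sequences = 0;
    }

    // Like clear(): return to the empty state of a default-constructed
    // object; swapping with an empty vector releases the matrix memory.
    void clear() {
        characters.clear();
        length = 0;
        std::vector<std::uint32_t>().swap(counts);
        added_sequences = 0;
    }
};
```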

SiteCounts::CountsIntType count_at ( size_t character_index, size_t site_index ) const

Return the count for a character and a site, given their indices.

The characters are indexed in the order given by characters(). This function is thus mainly intended for speed when iterating over the whole Matrix.

Definition at line 109 of file counts.cpp.
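Iterating the whole matrix by index is what a simple consensus computation needs. The following hypothetical helper, `majority_consensus`, picks the most frequent character per site. A plain nested vector stands in for the values one would read via count_at() for every character index of characters() and every site up to length(). Breaking ties in favor of the character that comes first in characters() is an assumption of this sketch; it is not the library's consensus method.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: majority-rule consensus over a count matrix whose
// rows follow the order of `characters` and whose columns are sites.
inline std::string majority_consensus(
    std::string const& characters,
    std::vector<std::vector<std::uint32_t>> const& counts  // counts[char][site]
) {
    if (characters.empty() || counts.empty()) {
        return {};
    }
    std::size_t const length = counts.front().size();
    std::string result;
    result.reserve(length);
    for (std::size_t site = 0; site < length; ++site) {
        std::size_t best = 0;
        for (std::size_t c = 1; c < characters.size(); ++c) {
            // Strict comparison keeps the earlier character on ties.
            if (counts[c][site] > counts[best][site]) {
                best = c;
            }
        }
        result += characters[best];
    }
    return result;
}
```

Applied to the example matrix from the class description, this yields "A-TACT"; at site 3, 'A' and 'T' both have a count of 3, and the tie goes to 'A'.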

SiteCounts::CountsIntType count_of ( char character, size_t site_index ) const

Return the count of a specific character at a given site.

If the character is not part of the set of used characters, the function throws an exception. This function is case-insensitive. See characters() to retrieve the set of characters.

Definition at line 91 of file counts.cpp.

size_t length ( ) const

Return the number of sites used for counting.

This has to match the Sequence::length() property of the Sequences to be added for counting.

Definition at line 75 of file counts.cpp.

SiteCounts& operator= ( SiteCounts const &  )
default

Default copy assignment.

SiteCounts& operator= ( SiteCounts &&  )
default

Default move assignment.

Member Typedef Documentation

using CountsIntType = uint32_t

Unsigned integer type used internally for counting the frequencies of Sequence sites.

We use this alias because the type might need to be adjusted in the future: either to save memory when many SiteCounts objects are needed, so that each one has to be small; or, conversely, to allow more Sequences to be counted by using a wider type.

Definition at line 101 of file counts.hpp.
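With uint32_t, a single counter can hold at most 4294967295 (2^32 - 1), and unsigned arithmetic wraps around silently when that limit is exceeded. This documentation does not state whether SiteCounts checks for overflow; the hypothetical guard below only shows how a caller could test whether adding a weight would exceed the range of CountsIntType.

```cpp
#include <cstdint>
#include <limits>

using CountsIntType = std::uint32_t;

// Hypothetical guard: true if current + weight would not fit into
// CountsIntType. Checked before adding, because unsigned overflow wraps.
constexpr bool would_overflow(CountsIntType current, CountsIntType weight) {
    return weight > std::numeric_limits<CountsIntType>::max() - current;
}
```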

