A toolkit for working with phylogenetic data.
v0.18.0
SequenceCounts Class Reference

#include <genesis/sequence/counts.hpp>

Detailed Description

Store counts of the occurrence of certain characters in Sequences.

This class is a helper for statistical analyses of Sequences, and for calculating consensus sequences and the like. It stores a Matrix of counts, for a set of characters and a given sequence length.

For example, we create an instance like this:

auto sequence_counts = SequenceCounts( "ACGT-", 6 );

which counts the occurrences of the nucleotide characters and the gap character for Sequences of length 6. Then, after adding several Sequences, the matrix might look like this (site indices in columns, characters in rows):

site  0  1  2  3  4  5
A     3  0  1  3  0  0
C     1  2  1  1  4  1
G     0  1  1  0  1  1
T     2  1  3  3  1  3
-     2  4  2  1  2  2

The class has to be constructed with the desired set of characters and the sequence length. Characters are counted case-insensitively, so upper and lower case letters contribute to the same count. Characters that are not in the set are ignored when adding Sequences.

Use add_sequence() and add_sequences() to accumulate counts. Use count_of() and count_at() to get the counter values for specific positions in the matrix.

Definition at line 84 of file counts.hpp.
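
The counting behavior described above can be illustrated with a small, self-contained sketch. SketchCounts is a hypothetical stand-in written for this page, not the genesis implementation. Its row-major storage and the exception thrown on a length mismatch are assumptions. It only mirrors the documented behavior: case-insensitive counting, and ignoring characters that are not in the set.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not the genesis class.
class SketchCounts
{
public:
    SketchCounts( std::string const& characters, std::size_t length )
        : characters_( characters )
        , length_( length )
        , counts_( characters.size() * length, 0 )
    {
        // Store the character set in upper case, so that lookups can be
        // done case-insensitively.
        for( auto& c : characters_ ) {
            c = static_cast<char>( std::toupper( static_cast<unsigned char>( c )));
        }
    }

    // Add one sequence in string form. Characters outside of the set are ignored.
    // Assumption: a length mismatch is reported by throwing.
    void add_sequence( std::string const& sites )
    {
        if( sites.size() != length_ ) {
            throw std::runtime_error( "Sequence length does not match." );
        }
        for( std::size_t site = 0; site < length_; ++site ) {
            auto const row = row_of( sites[site] );
            if( row != std::string::npos ) {
                ++counts_[ row * length_ + site ];
            }
        }
        ++added_;
    }

    // Count of a character at a site; throws for characters not in the set.
    std::uint32_t count_of( char character, std::size_t site ) const
    {
        auto const row = row_of( character );
        if( row == std::string::npos || site >= length_ ) {
            throw std::invalid_argument( "Invalid character or site index." );
        }
        return counts_[ row * length_ + site ];
    }

    std::uint32_t added_sequences_count() const
    {
        return added_;
    }

private:
    // Row index of a character in the set, or std::string::npos if absent.
    std::size_t row_of( char c ) const
    {
        return characters_.find(
            static_cast<char>( std::toupper( static_cast<unsigned char>( c )))
        );
    }

    std::string                characters_;
    std::size_t                length_;
    std::vector<std::uint32_t> counts_; // row-major: one row per character
    std::uint32_t              added_ = 0;
};
```

With the character set "ACGT-" and length 6, adding "ACGT-N" and "acgtAA" to this sketch yields a count of 2 for 'A' at site 0, while the 'N' at site 5 is ignored.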

Public Member Functions

 SequenceCounts ()=default
 Default constructor. More...
 
 SequenceCounts (std::string const &characters, size_t length)
 Construct an object that counts the occurrences of the given characters for Sequences of length length. More...
 
 SequenceCounts (SequenceCounts const &)=default
 Default copy constructor. More...
 
 SequenceCounts (SequenceCounts &&)=default
 Default move constructor. More...
 
 ~SequenceCounts ()=default
 Default destructor. More...
 
void add_sequence (Sequence const &sequence)
 Process a single Sequence and add its counts to the existing ones. More...
 
void add_sequence (std::string const &sites)
 Process a single sequence in string form and add its counts to the existing ones. More...
 
void add_sequences (SequenceSet const &sequences)
 Process a SequenceSet and add its counts to the existing ones for all contained Sequences. More...
 
CountsIntType added_sequences_count () const
 Return the number of processed Sequences, i.e., how many Sequences were added in total. More...
 
std::string characters () const
 Return the character set that is used for counting. More...
 
void clear ()
 Clear the object, that is, delete everything. More...
 
void clear_counts ()
 Reset all counts to 0. More...
 
CountsIntType count_at (size_t character_index, size_t site_index) const
 Return the count for a character and a site, given their indices. More...
 
CountsIntType count_of (char character, size_t site_index) const
 Return the count of a specific character at a given site. More...
 
size_t length () const
 Return the number of sites used for counting. More...
 
SequenceCounts & operator= (SequenceCounts const &)=default
 Default copy assignment. More...
 
SequenceCounts & operator= (SequenceCounts &&)=default
 Default move assignment. More...
 

Public Types

using CountsIntType = uint32_t
 Type of unsigned integer used internally for counting the frequencies of Sequence sites. More...
 

Constructor & Destructor Documentation

SequenceCounts ( )
default

Default constructor.

This instantiates an object with no characters to count and no sequence length. It is thus empty and cannot be used for any further analyses. It is provided because a default constructor can be useful in some cases.

SequenceCounts ( std::string const &  characters,
size_t  length 
)

Construct an object that counts the occurrences of the given characters for Sequences of length length.

Definition at line 50 of file counts.cpp.

~SequenceCounts ( )
default

Default destructor.

SequenceCounts ( SequenceCounts const &  )
default

Default copy constructor.

SequenceCounts ( SequenceCounts &&  )
default

Default move constructor.

Member Function Documentation

void add_sequence ( Sequence const &  sequence)

Process a single Sequence and add its counts to the existing ones.

Definition at line 131 of file counts.cpp.

void add_sequence ( std::string const &  sites)

Process a single sequence in string form and add its counts to the existing ones.

Definition at line 136 of file counts.cpp.

void add_sequences ( SequenceSet const &  sequences)

Process a SequenceSet and add its counts to the existing ones for all contained Sequences.

Definition at line 166 of file counts.cpp.

SequenceCounts::CountsIntType added_sequences_count ( ) const

Return the number of processed Sequences, i.e., how many Sequences were added in total.

Definition at line 86 of file counts.cpp.

std::string characters ( ) const

Return the character set that is used for counting.

This function returns the upper case letters of the internal list of characters that is used for counting, in the order that is also used by the count_at() function.

Definition at line 80 of file counts.cpp.

void clear ( )

Clear the object, that is, delete everything.

This function resets the object to the state that the default constructor produces. The object is thus no longer usable for counting. It is mainly intended to save memory when many objects are used and then no longer needed.

For an alternative function that simply resets the counts to zero but keeps the dimensions of the count matrix, see clear_counts().

Definition at line 173 of file counts.cpp.

void clear_counts ( )

Reset all counts to 0.

This clears the counts so that the object is as if newly created, while keeping the counted characters and length of the count matrix. It also clears the count for added_sequences_count().

Definition at line 181 of file counts.cpp.
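
The difference between clear() and clear_counts() can be sketched as follows. SketchState and its members are hypothetical and only model the state that this page describes (character set, length, count matrix, and number of added Sequences). The actual class may store its data differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of the object state; not the genesis class.
struct SketchState
{
    std::string                characters;
    std::size_t                length = 0;
    std::vector<std::uint32_t> counts; // characters.size() * length entries
    std::uint32_t              added = 0;

    // Like clear_counts(): keep characters and dimensions, zero all counters,
    // including the number of added sequences.
    void clear_counts()
    {
        counts.assign( counts.size(), 0 );
        added = 0;
    }

    // Like clear(): return to the default-constructed, empty state.
    void clear()
    {
        *this = SketchState{};
    }
};
```

After clear_counts(), the sketch can be filled again with sequences of the same length. After clear(), it has no characters and no sites left.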

SequenceCounts::CountsIntType count_at ( size_t  character_index,
size_t  site_index 
) const

Return the count for a character and a site, given their indices.

The characters are indexed in the order given by characters(). This function is thus mainly intended for speed when iterating over the whole Matrix, as it avoids the per-character lookup that count_of() needs.

Definition at line 109 of file counts.cpp.
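
As an illustration of index-based iteration, the following sketch computes a simple majority consensus from any object that offers characters(), length() and count_at() with the signatures documented on this page. The function majority_consensus() and the stand-in type TableCounts are hypothetical names introduced here, and resolving ties in favor of the first character in the set is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: for each site, pick the character with the highest count.
// Ties are resolved in favor of the character that comes first in characters().
template<class Counts>
std::string majority_consensus( Counts const& counts )
{
    std::string const chars = counts.characters();
    std::string result;
    if( chars.empty() ) {
        return result;
    }
    result.reserve( counts.length() );
    for( std::size_t site = 0; site < counts.length(); ++site ) {
        std::size_t best = 0;
        for( std::size_t c = 1; c < chars.size(); ++c ) {
            if( counts.count_at( c, site ) > counts.count_at( best, site )) {
                best = c;
            }
        }
        result += chars[best];
    }
    return result;
}

// Hypothetical stand-in that offers the same three functions, backed by a
// plain table, so that the sketch can be tried without the library.
struct TableCounts
{
    std::string                             chars;
    std::vector<std::vector<std::uint32_t>> rows; // one row per character

    std::string characters() const
    {
        return chars;
    }

    std::size_t length() const
    {
        return rows.empty() ? 0 : rows.front().size();
    }

    std::uint32_t count_at( std::size_t character_index, std::size_t site_index ) const
    {
        return rows.at( character_index ).at( site_index );
    }
};

// The example matrix from the class description above.
TableCounts example_table()
{
    return TableCounts{
        "ACGT-",
        {
            { 3, 0, 1, 3, 0, 0 },
            { 1, 2, 1, 1, 4, 1 },
            { 0, 1, 1, 0, 1, 1 },
            { 2, 1, 3, 3, 1, 3 },
            { 2, 4, 2, 1, 2, 2 }
        }
    };
}
```

For the example matrix, majority_consensus( example_table() ) gives "A-TACT". Site 3 is a tie between A and T, and is resolved to A.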

SequenceCounts::CountsIntType count_of ( char  character,
size_t  site_index 
) const

Return the count of a specific character at a given site.

If the character is not part of the set of used characters, the function throws an exception. This function is case-independent. See characters() to retrieve the set of characters.

Definition at line 91 of file counts.cpp.
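
One possible way to implement such a case-independent lookup is a table that maps each ASCII code to a row index. The sketch below is an assumption made for illustration; the helper names make_lookup() and row_index_of() are hypothetical, and the exception type is chosen freely.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: map every ASCII code to the row index of its character
// in the set, or to -1 if the character is not counted. Upper and lower case
// variants of a letter map to the same row.
std::array<int, 128> make_lookup( std::string const& characters )
{
    std::array<int, 128> lookup;
    lookup.fill( -1 );
    for( std::size_t i = 0; i < characters.size(); ++i ) {
        auto const c = static_cast<unsigned char>( characters[i] );
        if( c >= 128 ) {
            throw std::invalid_argument( "Only ASCII characters are supported." );
        }
        lookup[ static_cast<std::size_t>( std::toupper( c )) ] = static_cast<int>( i );
        lookup[ static_cast<std::size_t>( std::tolower( c )) ] = static_cast<int>( i );
    }
    return lookup;
}

// Hypothetical helper: row index of a character, throwing if it is not in the set.
std::size_t row_index_of( std::array<int, 128> const& lookup, char character )
{
    auto const c = static_cast<unsigned char>( character );
    if( c >= 128 || lookup[c] < 0 ) {
        throw std::invalid_argument( "Character is not part of the counted set." );
    }
    return static_cast<std::size_t>( lookup[c] );
}
```

With the set "ACGT-", both 'g' and 'G' resolve to row 2, while 'N' leads to an exception.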

size_t length ( ) const

Return the number of sites used for counting.

This has to match the Sequence::length() property of the Sequences to be added for counting.

Definition at line 75 of file counts.cpp.

SequenceCounts& operator= ( SequenceCounts const &  )
default

Default copy assignment.

SequenceCounts& operator= ( SequenceCounts &&  )
default

Default move assignment.

Member Typedef Documentation

using CountsIntType = uint32_t

Type of unsigned integer used internally for counting the frequencies of Sequence sites.

We use this alias because we might need to adjust the type in the future: either to save memory if many objects of type SequenceCounts are needed and thus have to be small, or, on the contrary, to allow more Sequences to be counted by using a wider type.

Definition at line 100 of file counts.hpp.
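
The trade-off between memory and counting range can be made concrete with a few lines. The alias is redeclared locally here as it is shown above. The two helper functions are hypothetical, and matrix_bytes() only estimates the size of the counters themselves, assuming one counter per character and site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Local redeclaration of the alias as documented above.
using CountsIntType = std::uint32_t;

// Hypothetical helper: largest value that a single counter can hold.
constexpr unsigned long long max_count()
{
    return std::numeric_limits<CountsIntType>::max();
}

// Hypothetical helper: bytes needed for the counters of a matrix with the
// given number of characters and sites (ignoring any container overhead).
constexpr std::size_t matrix_bytes( std::size_t characters, std::size_t sites )
{
    return characters * sites * sizeof( CountsIntType );
}
```

With a 32 bit type, each counter can hold values up to 4294967295, and the counters of the "ACGT-" example with 6 sites take 120 bytes. A 64 bit type would double the memory, while a 16 bit type would halve it at the cost of a much smaller range.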


The documentation for this class was generated from the following files: