A library for working with phylogenetic and population genetic data.
v0.32.0
ReferenceGenome Class Reference

#include <genesis/sequence/reference_genome.hpp>

Detailed Description

Lookup of Sequences of a reference genome.

The class stores Sequences in the order they are added, but also stores a hash map for quickly finding a Sequence given its name/label, as well as a lookup of bases at positions in the genome.

See also
SequenceDict

Definition at line 65 of file reference_genome.hpp.
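
The storage scheme described above can be illustrated with a small stand-alone sketch. It does not use genesis: ToySequence, ToyGenome and make_toy_genome are hypothetical stand-ins, and the real class may differ in its details, for example in how it treats duplicate labels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a labelled sequence; not the genesis Sequence class.
struct ToySequence {
    std::string label;
    std::string sites;
};

// Hypothetical stand-in: a vector keeps insertion order,
// and a hash map from label to index allows quick lookup by name.
class ToyGenome {
public:
    // Append the sequence and remember its index under its label.
    // Assumption of this sketch: on a duplicate label, the first entry wins.
    void add(ToySequence seq) {
        lookup_.emplace(seq.label, sequences_.size());
        sequences_.push_back(std::move(seq));
    }

    std::size_t size() const { return sequences_.size(); }

    bool contains(std::string const& label) const {
        return lookup_.count(label) > 0;
    }

    // Sequences in the order in which they were added.
    std::vector<ToySequence> const& sequences() const { return sequences_; }

private:
    std::vector<ToySequence> sequences_;
    std::unordered_map<std::string, std::size_t> lookup_;
};

// Build a two-chromosome toy genome for demonstration.
inline ToyGenome make_toy_genome() {
    ToyGenome genome;
    genome.add({"chr2", "GGCC"});
    genome.add({"chr1", "ACGT"});
    return genome;
}
```

Iterating over sequences() yields chr2 before chr1, because the vector preserves the order of the add() calls, while contains() consults only the hash map.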

Public Member Functions

 ReferenceGenome ()
 
 ReferenceGenome (ReferenceGenome &&)=default
 
 ReferenceGenome (ReferenceGenome const &)=delete
 
 ~ReferenceGenome ()=default
 
const_reference add (Sequence &&seq, bool also_look_up_first_word=true)
 Add a Sequence to the ReferenceGenome by moving it, and return a const_reference to it.
 
const_reference add (Sequence const &seq, bool also_look_up_first_word=true)
 Add a Sequence to the ReferenceGenome by copying it, and return a const_reference to it.
 
const_iterator begin () const
 
const_iterator cbegin () const
 
const_iterator cend () const
 
void clear ()
 Remove all Sequences from the ReferenceGenome, leaving it with a size() of 0.
 
bool contains (std::string const &label) const
 
bool empty () const
 
const_iterator end () const
 
const_iterator find (std::string const &label) const
 Return an iterator to the Sequence with the given label, or an iterator to end() if no Sequence with that label is present.
 
const_reference get (std::string const &label) const
 Same as find(), but returns the sequence directly, or throws if not present.
 
char get_base (const_iterator it, size_t position, bool to_upper=true) const
 Get a particular base at the given sequence iterator and position.
 
char get_base (std::string const &chromosome, size_t position, bool to_upper=true) const
 Get a particular base at a given chromosome and position.
 
ReferenceGenome & operator= (ReferenceGenome &&)=default
 
ReferenceGenome & operator= (ReferenceGenome const &)=delete
 
size_t size () const
 

Public Types

using const_iterator = typename std::vector< Sequence >::const_iterator
 
using const_reference = Sequence const &
 
using iterator = typename std::vector< Sequence >::iterator
 
using reference = Sequence &
 

Constructor & Destructor Documentation

◆ ReferenceGenome() [1/3]

ReferenceGenome ( )
inline

Definition at line 83 of file reference_genome.hpp.

◆ ~ReferenceGenome()

~ReferenceGenome ( )
default

◆ ReferenceGenome() [2/3]

ReferenceGenome ( ReferenceGenome const &  )
delete

◆ ReferenceGenome() [3/3]

ReferenceGenome ( ReferenceGenome &&  )
default

Member Function Documentation

◆ add() [1/2]

const_reference add ( Sequence &&  seq,
bool  also_look_up_first_word = true 
)
inline

Add a Sequence to the ReferenceGenome by moving it, and return a const_reference to it.

If also_look_up_first_word is set (true by default), an additional lookup name is registered for the added sequence: besides its full name, the sequence can also be found by its first word, that is, the part of the name up to the first tab or space character, if there is one. Typical FASTA indexing tools appear to do the same. The sequence is still stored under its original name; the extra entry only affects lookups via find() or get().

Definition at line 232 of file reference_genome.hpp.
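
The first-word rule can be sketched as a small helper. This is a stand-alone illustration under the assumption that "first word" means everything before the first space or tab; first_word is a hypothetical name and not part of genesis.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the part of a label before the first space or tab.
// If the label contains neither, the whole label is returned.
inline std::string first_word(std::string const& label) {
    auto const end = label.find_first_of(" \t");
    return label.substr(0, end);  // end == npos keeps the full string
}
```

With this rule, a FASTA header such as "chr1 Homo sapiens chromosome 1" would also be reachable under the shorter key "chr1".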

◆ add() [2/2]

const_reference add ( Sequence const &  seq,
bool  also_look_up_first_word = true 
)
inline

Add a Sequence to the ReferenceGenome by copying it, and return a const_reference to it.

If also_look_up_first_word is set (true by default), an additional lookup name is registered for the added sequence: besides its full name, the sequence can also be found by its first word, that is, the part of the name up to the first tab or space character, if there is one. Typical FASTA indexing tools appear to do the same. The sequence is still stored under its original name; the extra entry only affects lookups via find() or get().

Definition at line 222 of file reference_genome.hpp.

◆ begin()

const_iterator begin ( ) const
inline

Definition at line 297 of file reference_genome.hpp.

◆ cbegin()

const_iterator cbegin ( ) const
inline

Definition at line 307 of file reference_genome.hpp.

◆ cend()

const_iterator cend ( ) const
inline

Definition at line 312 of file reference_genome.hpp.

◆ clear()

void clear ( )
inline

Remove all Sequences from the ReferenceGenome, leaving it with a size() of 0.

Definition at line 284 of file reference_genome.hpp.

◆ contains()

bool contains ( std::string const &  label) const
inline

Definition at line 116 of file reference_genome.hpp.

◆ empty()

bool empty ( ) const
inline

Return whether the ReferenceGenome is empty, i.e. whether its size() is 0.

Definition at line 111 of file reference_genome.hpp.

◆ end()

const_iterator end ( ) const
inline

Definition at line 302 of file reference_genome.hpp.

◆ find()

const_iterator find ( std::string const &  label) const
inline

Return an iterator to the Sequence with the given label, or an iterator to end() if no Sequence with that label is present.

Definition at line 125 of file reference_genome.hpp.

◆ get()

const_reference get ( std::string const &  label) const
inline

Same as find(), but returns the sequence directly, or throws if not present.

Definition at line 152 of file reference_genome.hpp.
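
The difference between find() and get() can be sketched with stand-in types. NamedSites, SketchGenome and make_sketch_genome are hypothetical names, and the exception type is an assumption of this sketch, since the documentation above only states that get() throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for this sketch; not genesis types.
struct NamedSites {
    std::string label;
    std::string sites;
};

struct SketchGenome {
    using const_iterator = std::vector<NamedSites>::const_iterator;

    std::vector<NamedSites> sequences;
    std::unordered_map<std::string, std::size_t> lookup;

    // Iterator to the entry with the given label, or the end iterator if absent.
    const_iterator find(std::string const& label) const {
        auto const hit = lookup.find(label);
        if (hit == lookup.end()) {
            return sequences.cend();
        }
        return sequences.cbegin() + static_cast<std::ptrdiff_t>(hit->second);
    }

    // Same lookup, but hands out the entry itself and throws if it is absent.
    // The exception type is an assumption of this sketch.
    NamedSites const& get(std::string const& label) const {
        auto const it = find(label);
        if (it == sequences.cend()) {
            throw std::invalid_argument("no sequence with label '" + label + "'");
        }
        return *it;
    }
};

// Build a small genome and fill the label-to-index map.
inline SketchGenome make_sketch_genome() {
    SketchGenome genome;
    genome.sequences.push_back({"chr1", "ACGT"});
    genome.sequences.push_back({"chr2", "GGCC"});
    for (std::size_t i = 0; i < genome.sequences.size(); ++i) {
        genome.lookup[genome.sequences[i].label] = i;
    }
    return genome;
}
```

Callers that expect a label to be missing occasionally would compare the result of find() against the end iterator, while callers that treat a missing label as an error can use get() and let the exception propagate.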

◆ get_base() [1/2]

char get_base ( const_iterator  it,
size_t  position,
bool  to_upper = true 
) const
inline

Get a particular base at the given sequence iterator and position.

This overload is intended as an optimization for callers that cache an iterator returned from find(). That way, the label lookup does not have to be repeated for every position in the genome.

Definition at line 185 of file reference_genome.hpp.
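
The caching idea can be sketched without genesis as follows. SitesByLabel, base_at and count_gc are hypothetical names; the point is only that the label is resolved once, and every per-position access then goes through the cached iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in: maps a chromosome label to its sites.
using SitesByLabel = std::unordered_map<std::string, std::string>;

// Base at a 1-based position of an entry that has already been looked up.
inline char base_at(SitesByLabel::const_iterator it, std::size_t position) {
    return it->second.at(position - 1);
}

// Count G and C bases on one chromosome, resolving the label only once.
inline std::size_t count_gc(SitesByLabel const& genome, std::string const& chromosome) {
    auto const it = genome.find(chromosome);  // the only hash lookup
    if (it == genome.end()) {
        throw std::invalid_argument("unknown chromosome: " + chromosome);
    }
    std::size_t count = 0;
    for (std::size_t pos = 1; pos <= it->second.size(); ++pos) {
        char const base = base_at(it, pos);  // no further label lookup
        if (base == 'G' || base == 'C') {
            ++count;
        }
    }
    return count;
}
```

For a loop over millions of positions on the same chromosome, this replaces one string hash per position with a single hash up front.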

◆ get_base() [2/2]

char get_base ( std::string const &  chromosome,
size_t  position,
bool  to_upper = true 
) const
inline

Get a particular base at a given chromosome and position.

Reference genomes are often used to look up a particular base, so we offer this functionality here directly. The function throws if either the chromosome is not part of the genome, or if the position is outside of the size of the chromosome.

Important: The position uses 1-based indexing. This differs from indexing directly into the sites of the Sequence, which is 0-based, but is more in line with the usage in our population functions.

Definition at line 174 of file reference_genome.hpp.
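
The 1-based convention and the to_upper flag can be illustrated with a stand-alone helper. base_at_1based is a hypothetical name, and both the exception type and the rejection of position 0 are assumptions of this sketch rather than documented behavior of get_base().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: 1-based base lookup in a string of sites.
// Valid positions are 1 .. sites.size(); anything else throws.
inline char base_at_1based(std::string const& sites, std::size_t position, bool to_upper = true) {
    if (position == 0 || position > sites.size()) {
        throw std::out_of_range("position outside of the sequence");
    }
    char const base = sites[position - 1];  // shift to the 0-based string index
    if (!to_upper) {
        return base;
    }
    return static_cast<char>(std::toupper(static_cast<unsigned char>(base)));
}
```

In this sketch, position 1 refers to the first character of the sites string, so base_at_1based("acgt", 1) yields 'A', while passing to_upper = false returns the character as stored.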

◆ operator=() [1/2]

ReferenceGenome& operator= ( ReferenceGenome &&  )
default

◆ operator=() [2/2]

ReferenceGenome& operator= ( ReferenceGenome const &  )
delete

◆ size()

size_t size ( ) const
inline

Return the number of Sequences in the ReferenceGenome.

Definition at line 103 of file reference_genome.hpp.

Member Typedef Documentation

◆ const_iterator

using const_iterator = typename std::vector<Sequence>::const_iterator

Definition at line 74 of file reference_genome.hpp.

◆ const_reference

using const_reference = Sequence const&

Definition at line 77 of file reference_genome.hpp.

◆ iterator

using iterator = typename std::vector<Sequence>::iterator

Definition at line 73 of file reference_genome.hpp.

◆ reference

using reference = Sequence&

Definition at line 76 of file reference_genome.hpp.


The documentation for this class was generated from the following file: