A library for working with phylogenetic and population genetic data.
v0.32.0
GenomeLocusSet Class Reference

#include <genesis/population/genome_locus_set.hpp>

Detailed Description

List of positions/coordinates in a genome, for each chromosome.

The data structure stores a list of genome positions/coordinates, and allows fast querying, that is, whether a certain position on a chromosome is stored here. Internally, we use a Bitvector for each chromosome, marking its positions as set or not set.

Positions are 1-based. We also offer the special case to add a whole chromosome, in which case the is_covered() function will return true for all positions on that chromosome (without checking that the position is in fact within the length of the chromosome - as we do not use information on the lengths of chromosomes in this class). We use position 0 to mark this special whole-chromosome case - be aware of that when adding positions to the list. See also GenomeRegionList, GenomeLocus and GenomeRegion for related classes that have the same special cases.

See also
GenomeLocus
GenomeRegion
GenomeRegionList

Definition at line 75 of file genome_locus_set.hpp.

Public Member Functions

 GenomeLocusSet ()=default
 
 GenomeLocusSet (GenomeLocusSet &&)=default
 
 GenomeLocusSet (GenomeLocusSet const &)=default
 
 ~GenomeLocusSet ()=default
 
void add (GenomeLocus const &locus)
 Add a single GenomeLocus, that is, an interval covering one position on a chromosome. More...
 
void add (GenomeLocus const &start, GenomeLocus const &end)
 Add an interval between two GenomeLoci on the same chromosome. More...
 
void add (GenomeRegion const &region)
 Add a GenomeRegion to the list. More...
 
void add (GenomeRegionList const &list)
 Add a complete GenomeRegionList to this list. More...
 
void add (std::string const &chromosome)
 Add a whole chromosome to the list, so that all its positions are considered to be covered. More...
 
void add (std::string const &chromosome, size_t position)
 Add a single locus (position, coordinate) to the list. More...
 
void add (std::string const &chromosome, size_t start, size_t end)
 Add a region to the list, given its chromosome, and start and end positions. More...
 
void add (std::string const &chromosome, utils::Bitvector &&values)
 Add a chromosome to the list, given the full Bitvector representation of loci. More...
 
void add (std::string const &chromosome, utils::Bitvector const &values)
 Add a chromosome to the list, given the full Bitvector representation of loci. More...
 
bool any_covered (std::string const &chromosome) const
 Return if the given chromosome has any loci covered. More...
 
const_iterator begin () const
 Return an iterator to the beginning of the map of chromosome names to Bitvectors. More...
 
size_t chromosome_count () const
 Return the number of chromosomes for which there are positions stored. More...
 
std::vector< std::string > chromosome_names () const
 Get a list of all stored chromosome names. More...
 
utils::Bitvectorchromosome_positions (std::string const &chromosome)
 For a given chromosome, return the Bitvector that stores its positions. More...
 
utils::Bitvector const & chromosome_positions (std::string const &chromosome) const
 For a given chromosome, return the Bitvector that stores its positions. More...
 
void clear ()
 Remove all stored regions from all chromosomes. More...
 
void clear (std::string const &chromosome)
 Remove the regions of the specified chromosome. More...
 
bool empty () const
 Return whether there are chromosomes with positions stored. More...
 
const_iterator end () const
 Return an iterator to the end of the map of chromosome names to Bitvectors. More...
 
const_iterator find (std::string const &chromosome)
 Find a chromosome in the map. More...
 
bool has_chromosome (std::string const &chromosome) const
 Return whether a chromosome is stored. More...
 
void invert (sequence::SequenceDict const &sequence_dict)
 Invert all chromosome regions. More...
 
bool is_covered (std::string const &chromosome) const
 Return whether a whole chromosome is covered. More...
 
bool is_covered (std::string const &chromosome, size_t position) const
 Return whether a given position on a chromosome is part of any of the regions stored. More...
 
size_t next_covered (std::string const &chromosome, size_t start_position) const
 Return the next position (including the start_position) that is covered. More...
 
GenomeLocusSetoperator= (GenomeLocusSet &&)=default
 
GenomeLocusSetoperator= (GenomeLocusSet const &)=default
 
void set_intersect (GenomeLocusSet const &rhs)
 Compute the intersection with another GenomeLocusSet rhs. More...
 
void set_union (GenomeLocusSet const &rhs)
 Compute the union with another GenomeLocusSet rhs. More...
 

Static Public Member Functions

static bool is_covered (const_iterator const &it, size_t position)
 Return whether a given position on the provided iterator is covered. More...
 
static bool is_covered (utils::Bitvector const &bitvector, size_t position)
 Return whether a given position on the provided bitvector is covered. More...
 
static size_t next_covered (const_iterator const &it, size_t start_position)
 Return the next position (including the start_position) that is covered. More...
 
static size_t next_covered (utils::Bitvector const &bitvector, size_t start_position)
 Return the next position (including the start_position) that is covered. More...
 

Public Types

using const_iterator = typename std::unordered_map< std::string, utils::Bitvector >::const_iterator
 
using const_reference = value_type const &
 
using iterator = typename std::unordered_map< std::string, utils::Bitvector >::iterator
 
using key_type = std::string
 
using mapped_type = utils::Bitvector
 
using reference = value_type &
 
using value_type = std::pair< std::string const, utils::Bitvector >
 

Static Public Attributes

static const size_t npos = std::numeric_limits<size_t>::max()
 Position value to indicate that next_covered did not find any covered position. More...
 

Constructor & Destructor Documentation

◆ GenomeLocusSet() [1/3]

GenomeLocusSet ( )
default

◆ ~GenomeLocusSet()

~GenomeLocusSet ( )
default

◆ GenomeLocusSet() [2/3]

GenomeLocusSet ( GenomeLocusSet const &  )
default

◆ GenomeLocusSet() [3/3]

GenomeLocusSet ( GenomeLocusSet &&  )
default

Member Function Documentation

◆ add() [1/9]

void add ( GenomeLocus const &  locus)
inline

Add a single GenomeLocus, that is, an interval covering one position on a chromosome.

Definition at line 163 of file genome_locus_set.hpp.

◆ add() [2/9]

void add ( GenomeLocus const &  start,
GenomeLocus const &  end 
)
inline

Add an interval between two GenomeLoci on the same chromosome.

Definition at line 171 of file genome_locus_set.hpp.

◆ add() [3/9]

void add ( GenomeRegion const &  region)
inline

Add a GenomeRegion to the list.

This function ensures that regions are valid (start < end).

Definition at line 192 of file genome_locus_set.hpp.

◆ add() [4/9]

void add ( GenomeRegionList const &  list)
inline

Add a complete GenomeRegionList to this list.

Definition at line 204 of file genome_locus_set.hpp.

◆ add() [5/9]

void add ( std::string const &  chromosome)
inline

Add a whole chromosome to the list, so that all its positions are considered to be covered.

Definition at line 127 of file genome_locus_set.hpp.

◆ add() [6/9]

void add ( std::string const &  chromosome,
size_t  position 
)
inline

Add a single locus (position, coordinate) to the list.

Definition at line 136 of file genome_locus_set.hpp.

◆ add() [7/9]

void add ( std::string const &  chromosome,
size_t  start,
size_t  end 
)

Add a region to the list, given its chromosome, and start and end positions.

All positions between start and end are set. The chromosome cannot be empty, and we expect start < end (or both equal to 0, for the special case denoting that the whole chromosome is to be considered covered). Both start and end are 1-based, and inclusive, that is, the interval between them is closed.

Definition at line 46 of file genome_locus_set.cpp.

◆ add() [8/9]

void add ( std::string const &  chromosome,
utils::Bitvector &&  values 
)

Add a chromosome to the list, given the full Bitvector representation of loci.

This assumes that the data of the Bitvector has been assembled according to the specifications of this class, i.e., respecting the special role of the 0th bit.

Definition at line 104 of file genome_locus_set.cpp.

◆ add() [9/9]

void add ( std::string const &  chromosome,
utils::Bitvector const &  values 
)

Add a chromosome to the list, given the full Bitvector representation of loci.

This assumes that the data of the Bitvector has been assembled according to the specifications of this class, i.e., respecting the special role of the 0th bit.

Definition at line 99 of file genome_locus_set.cpp.

◆ any_covered()

bool any_covered ( std::string const &  chromosome) const
inline

Return if the given chromosome has any loci covered.

Definition at line 378 of file genome_locus_set.hpp.

◆ begin()

const_iterator begin ( ) const
inline

Return an iterator to the beginning of the map of chromosome names to Bitvectors.

Definition at line 467 of file genome_locus_set.hpp.

◆ chromosome_count()

size_t chromosome_count ( ) const
inline

Return the number of chromosomes for which there are positions stored.

Definition at line 495 of file genome_locus_set.hpp.

◆ chromosome_names()

std::vector<std::string> chromosome_names ( ) const
inline

Get a list of all stored chromosome names.

Definition at line 503 of file genome_locus_set.hpp.

◆ chromosome_positions() [1/2]

utils::Bitvector& chromosome_positions ( std::string const &  chromosome)
inline

For a given chromosome, return the Bitvector that stores its positions.

Note that this exposes the underlying container, and hence has to be used with caution. In particular position 0 is considered special: Any chromosome for which we have stored an interval that covers 0 is considered to be fully covered for all its positions.

Definition at line 547 of file genome_locus_set.hpp.

◆ chromosome_positions() [2/2]

utils::Bitvector const& chromosome_positions ( std::string const &  chromosome) const
inline

For a given chromosome, return the Bitvector that stores its positions.

Definition at line 534 of file genome_locus_set.hpp.

◆ clear() [1/2]

void clear ( )
inline

Remove all stored regions from all chromosomes.

Definition at line 237 of file genome_locus_set.hpp.

◆ clear() [2/2]

void clear ( std::string const &  chromosome)
inline

Remove the regions of the specified chromosome.

Definition at line 245 of file genome_locus_set.hpp.

◆ empty()

bool empty ( ) const
inline

Return whether there are chromosomes with positions stored.

Definition at line 487 of file genome_locus_set.hpp.

◆ end()

const_iterator end ( ) const
inline

Return an iterator to the end of the map of chromosome names to Bitvectors.

Definition at line 475 of file genome_locus_set.hpp.

◆ find()

const_iterator find ( std::string const &  chromosome)
inline

Find a chromosome in the map.

Return an iterator to the entry, or to end() if the chromosome is not part of the set.

Definition at line 525 of file genome_locus_set.hpp.

◆ has_chromosome()

bool has_chromosome ( std::string const &  chromosome) const
inline

Return whether a chromosome is stored.

Definition at line 515 of file genome_locus_set.hpp.

◆ invert()

void invert ( sequence::SequenceDict const &  sequence_dict)

Invert all chromosome regions.

This needs a means of getting the length of each chromosome, in order to know how many positions towards the end of each chromosome need to be inverted. If the given sequence_dict does not contain a chromosome that is present in this set here, or the set contains set positions beyond the dict, an exception is thrown.

Definition at line 262 of file genome_locus_set.cpp.

◆ is_covered() [1/4]

static bool is_covered ( const_iterator const &  it,
size_t  position 
)
inlinestatic

Return whether a given position on the provided iterator is covered.

See also
is_covered( std::string const& chromosome, size_t position ) const

This overload is meant to work in combination with find(), to avoid having to look up the chromosome every time, in cases where this does not change. The function does not check that the iterator is valid - in fact, it is static, and hence not tied to any object.

Definition at line 332 of file genome_locus_set.hpp.

◆ is_covered() [2/4]

bool is_covered ( std::string const &  chromosome) const
inline

Return whether a whole chromosome is covered.

If the special 0th bit is set, we take that as the whole chromosome being covered, i.e., in cases where no individual positions were specified.

Definition at line 360 of file genome_locus_set.hpp.

◆ is_covered() [3/4]

bool is_covered ( std::string const &  chromosome,
size_t  position 
) const
inline

Return whether a given position on a chromosome is part of any of the regions stored.

Note that position is 1-based.

Definition at line 342 of file genome_locus_set.hpp.

◆ is_covered() [4/4]

static bool is_covered ( utils::Bitvector const &  bitvector,
size_t  position 
)
inlinestatic

Return whether a given position on the provided bitvector is covered.

See also
is_covered( std::string const& chromosome, size_t position ) const

This overload of the function accepts a Bitvector directly, without checking that it belongs to any of the chromosomes stored (hence, it is a static function). The Bitvector is expected to follow the convention of that class, that is, bit 0 is used to indicate that the whole chromosome is covered, and all other bits correspond to 1-based positions. This can for instance be used in combination with chromosome_positions().

Definition at line 296 of file genome_locus_set.hpp.

◆ next_covered() [1/3]

static size_t next_covered ( const_iterator const &  it,
size_t  start_position 
)
inlinestatic

Return the next position (including the start_position) that is covered.

See also
next_covered( std::string const& chromosome, size_t start_position ) const
is_covered( std::string const& chromosome, size_t position ) const

This overload is meant to work in combination with find(), to avoid having to look up the chromosome every time, in cases where this does not change. The function does not check that the iterator is valid - in fact, it is static, and hence not tied to any object.

Definition at line 435 of file genome_locus_set.hpp.

◆ next_covered() [2/3]

size_t next_covered ( std::string const &  chromosome,
size_t  start_position 
) const
inline

Return the next position (including the start_position) that is covered.

The function finds the next position after or including the start_position that is covered. If the whole chromosome is covered (the 0th bit being true as the indicator for that), then the start_position is returned. If no position after the start_position is covered on the chromosome at all, or the chromosome is not in the set, then GenomeLocusSet::npos is returned.

Definition at line 449 of file genome_locus_set.hpp.

◆ next_covered() [3/3]

static size_t next_covered ( utils::Bitvector const &  bitvector,
size_t  start_position 
)
inlinestatic

Return the next position (including the start_position) that is covered.

See also
next_covered( std::string const& chromosome, size_t start_position ) const
is_covered( std::string const& chromosome, size_t position ) const

This overload of the function accepts a Bitvector directly, without checking that it belongs to any of the chromosomes stored (hence, it is a static function). The Bitvector is expected to follow the convention of that class, that is, bit 0 is used to indicate that the whole chromosome is covered, and all other bits correspond to 1-based positions. This can for instance be used in combination with chromosome_positions().

Definition at line 403 of file genome_locus_set.hpp.

◆ operator=() [1/2]

GenomeLocusSet& operator= ( GenomeLocusSet &&  )
default

◆ operator=() [2/2]

GenomeLocusSet& operator= ( GenomeLocusSet const &  )
default

◆ set_intersect()

void set_intersect ( GenomeLocusSet const &  rhs)

Compute the intersection with another GenomeLocusSet rhs.

Any chromosomes that end up having no positions covered are removed.

Definition at line 139 of file genome_locus_set.cpp.

◆ set_union()

void set_union ( GenomeLocusSet const &  rhs)

Compute the union with another GenomeLocusSet rhs.

Definition at line 219 of file genome_locus_set.cpp.

Member Typedef Documentation

◆ const_iterator

using const_iterator = typename std::unordered_map<std::string, utils::Bitvector>::const_iterator

Definition at line 91 of file genome_locus_set.hpp.

◆ const_reference

using const_reference = value_type const&

Definition at line 88 of file genome_locus_set.hpp.

◆ iterator

using iterator = typename std::unordered_map<std::string, utils::Bitvector>::iterator

Definition at line 90 of file genome_locus_set.hpp.

◆ key_type

using key_type = std::string

Definition at line 83 of file genome_locus_set.hpp.

◆ mapped_type

Definition at line 84 of file genome_locus_set.hpp.

◆ reference

Definition at line 87 of file genome_locus_set.hpp.

◆ value_type

using value_type = std::pair<std::string const, utils::Bitvector>

Definition at line 85 of file genome_locus_set.hpp.

Member Data Documentation

◆ npos

const size_t npos = std::numeric_limits<size_t>::max()
static

Position value to indicate that next_covered did not find any covered position.

Definition at line 96 of file genome_locus_set.hpp.


The documentation for this class was generated from the following files: