#include <genesis/population/genome_locus_set.hpp>
List of positions/coordinates in a genome, for each chromosome.
The data structure stores a list of genome positions/coordinates, and allows fast querying, that is, whether a certain position on a chromosome is stored here. Internally, we use a Bitvector for each chromosome, marking its positions as set or not set.
Positions are 1-based. We also offer the special case to add a whole chromosome, in which case the is_covered() function will return true
for all positions on that chromosome (without checking that the position is in fact within the length of the chromosome - as we do not use information on the lengths of chromosomes in this class). We use position 0
to mark this special whole-chromosome case - be aware of that when adding positions to the list. See also GenomeRegionList, GenomeLocus and GenomeRegion for related classes that have the same special cases.
Definition at line 75 of file genome_locus_set.hpp.
Public Member Functions | |
GenomeLocusSet ()=default | |
GenomeLocusSet (GenomeLocusSet &&)=default | |
GenomeLocusSet (GenomeLocusSet const &)=default | |
~GenomeLocusSet ()=default | |
void | add (GenomeLocus const &locus) |
Add a single GenomeLocus, that is, an interval covering one position on a chromosome. More... | |
void | add (GenomeLocus const &start, GenomeLocus const &end) |
Add an interval between two GenomeLoci on the same chromosome. More... | |
void | add (GenomeRegion const ®ion) |
Add a GenomeRegion to the list. More... | |
void | add (GenomeRegionList const &list) |
Add a complete GenomeRegionList to this list. More... | |
void | add (std::string const &chromosome) |
Add a whole chromosome to the list, so that all its positions are considered to be covered. More... | |
void | add (std::string const &chromosome, size_t position) |
Add a single locus (position, coordinate) to the list. More... | |
void | add (std::string const &chromosome, size_t start, size_t end) |
Add a region to the list, given its chromosome, and start and end positions. More... | |
void | add (std::string const &chromosome, utils::Bitvector &&values) |
Add a chromosome to the list, given the full Bitvector representation of loci. More... | |
void | add (std::string const &chromosome, utils::Bitvector const &values) |
Add a chromosome to the list, given the full Bitvector representation of loci. More... | |
bool | any_covered (std::string const &chromosome) const |
Return if the given chromosome has any loci covered. More... | |
const_iterator | begin () const |
Return an iterator to the beginning of the map of chromosome names to Bitvectors. More... | |
size_t | chromosome_count () const |
Return the number of chromosomes for which there are positions stored. More... | |
std::vector< std::string > | chromosome_names () const |
Get a list of all stored chromosome names. More... | |
utils::Bitvector & | chromosome_positions (std::string const &chromosome) |
For a given chromosome, return the Bitvector that stores its positions. More... | |
utils::Bitvector const & | chromosome_positions (std::string const &chromosome) const |
For a given chromosome, return the Bitvector that stores its positions. More... | |
void | clear () |
Remove all stored regions from all chromosomes. More... | |
void | clear (std::string const &chromosome) |
Remove the regions of the specified chromosome . More... | |
bool | empty () const |
Return whether there are chromosomes with positions stored. More... | |
const_iterator | end () const |
Return an iterator to the end of the map of chromosome names to Bitvectors. More... | |
const_iterator | find (std::string const &chromosome) |
Find a chromosome in the map. More... | |
bool | has_chromosome (std::string const &chromosome) const |
Return whether a chromosome is stored. More... | |
void | invert (sequence::SequenceDict const &sequence_dict) |
Invert all chromosome regions. More... | |
bool | is_covered (std::string const &chromosome) const |
Return whether a whole chromosome is covered. More... | |
bool | is_covered (std::string const &chromosome, size_t position) const |
Return whether a given position on a chromosome is part of any of the regions stored. More... | |
size_t | next_covered (std::string const &chromosome, size_t start_position) const |
Return the next position (including the start_position ) that is covered. More... | |
GenomeLocusSet & | operator= (GenomeLocusSet &&)=default |
GenomeLocusSet & | operator= (GenomeLocusSet const &)=default |
void | set_intersect (GenomeLocusSet const &rhs) |
Compute the intersection with another GenomeLocusSet rhs . More... | |
void | set_union (GenomeLocusSet const &rhs) |
Compute the union with another GenomeLocusSet rhs . More... | |
Static Public Member Functions | |
static bool | is_covered (const_iterator const &it, size_t position) |
Return whether a given position on the provided iterator is covered. More... | |
static bool | is_covered (utils::Bitvector const &bitvector, size_t position) |
Return whether a given position on the provided bitvector is covered. More... | |
static size_t | next_covered (const_iterator const &it, size_t start_position) |
Return the next position (including the start_position ) that is covered. More... | |
static size_t | next_covered (utils::Bitvector const &bitvector, size_t start_position) |
Return the next position (including the start_position ) that is covered. More... | |
Public Types | |
using | const_iterator = typename std::unordered_map< std::string, utils::Bitvector >::const_iterator |
using | const_reference = value_type const & |
using | iterator = typename std::unordered_map< std::string, utils::Bitvector >::iterator |
using | key_type = std::string |
using | mapped_type = utils::Bitvector |
using | reference = value_type & |
using | value_type = std::pair< std::string const, utils::Bitvector > |
Static Public Attributes | |
static const size_t | npos = std::numeric_limits<size_t>::max() |
Position value to indicate that next_covered did not find any covered position. More... | |
|
default |
|
default |
|
default |
|
default |
|
inline |
Add a single GenomeLocus, that is, an interval covering one position on a chromosome.
Definition at line 163 of file genome_locus_set.hpp.
|
inline |
Add an interval between two GenomeLoci on the same chromosome.
Definition at line 171 of file genome_locus_set.hpp.
|
inline |
Add a GenomeRegion to the list.
This function ensures that regions are valid (start < end
).
Definition at line 192 of file genome_locus_set.hpp.
|
inline |
Add a complete GenomeRegionList to this list.
Definition at line 204 of file genome_locus_set.hpp.
|
inline |
Add a whole chromosome to the list, so that all its positions are considered to be covered.
Definition at line 127 of file genome_locus_set.hpp.
|
inline |
Add a single locus (position, coordinate) to the list.
Definition at line 136 of file genome_locus_set.hpp.
void add | ( | std::string const & | chromosome, |
size_t | start, | ||
size_t | end | ||
) |
Add a region to the list, given its chromosome, and start
and end
positions.
All positions between start
and end
are set. The chromosome
cannot be empty, and we expect start
< end
(or both equal to 0, for the special case denoting that the whole chromosome is to be considered covered). Both start
and end
are 1-based, and inclusive, that is, the interval between them is closed.
Definition at line 46 of file genome_locus_set.cpp.
void add | ( | std::string const & | chromosome, |
utils::Bitvector && | values | ||
) |
Add a chromosome to the list, given the full Bitvector representation of loci.
This assumes that the data of the Bitvector has been assembled according to the specifications of this class, i.e., respecting the special role of the 0th bit.
Definition at line 104 of file genome_locus_set.cpp.
void add | ( | std::string const & | chromosome, |
utils::Bitvector const & | values | ||
) |
Add a chromosome to the list, given the full Bitvector representation of loci.
This assumes that the data of the Bitvector has been assembled according to the specifications of this class, i.e., respecting the special role of the 0th bit.
Definition at line 99 of file genome_locus_set.cpp.
|
inline |
Return if the given chromosome
has any loci covered.
Definition at line 378 of file genome_locus_set.hpp.
|
inline |
Return an iterator to the beginning of the map of chromosome names to Bitvectors.
Definition at line 467 of file genome_locus_set.hpp.
|
inline |
Return the number of chromosomes for which there are positions stored.
Definition at line 495 of file genome_locus_set.hpp.
|
inline |
Get a list of all stored chromosome names.
Definition at line 503 of file genome_locus_set.hpp.
|
inline |
For a given chromosome, return the Bitvector that stores its positions.
Note that this exposes the underlying container, and hence has to be used with caution. In particular position 0 is considered special: Any chromosome for which we have stored an interval that covers 0 is considered to be fully covered for all its positions.
Definition at line 547 of file genome_locus_set.hpp.
|
inline |
For a given chromosome, return the Bitvector that stores its positions.
Definition at line 534 of file genome_locus_set.hpp.
|
inline |
Remove all stored regions from all chromosomes.
Definition at line 237 of file genome_locus_set.hpp.
|
inline |
Remove the regions of the specified chromosome
.
Definition at line 245 of file genome_locus_set.hpp.
|
inline |
Return whether there are chromosomes with positions stored.
Definition at line 487 of file genome_locus_set.hpp.
|
inline |
Return an iterator to the end of the map of chromosome names to Bitvectors.
Definition at line 475 of file genome_locus_set.hpp.
|
inline |
Find a chromosome in the map.
Return an iterator to the entry, or to end() if the chromosome is not part of the set.
Definition at line 525 of file genome_locus_set.hpp.
|
inline |
Return whether a chromosome is stored.
Definition at line 515 of file genome_locus_set.hpp.
void invert | ( | sequence::SequenceDict const & | sequence_dict | ) |
Invert all chromosome regions.
This needs a means of getting the length of each chromosome, in order to know how many positions towards the end of each chromosome need to be inverted. If the given sequence_dict
does not contain a chromosome that is present in this set here, or the set contains set positions beyond the dict, an exception is thrown.
Definition at line 262 of file genome_locus_set.cpp.
|
inlinestatic |
Return whether a given position on the provided iterator is covered.
This overload is meant to work in combination with find(), to avoid having to look up the chromosome every time, in cases where this does not change. The function does not check that the iterator is valid - in fact, it is static, and hence not tied to any object.
Definition at line 332 of file genome_locus_set.hpp.
|
inline |
Return whether a whole chromosome is covered.
If the special 0th bit is set, we take that as the whole chromosome being covered, i.e., in cases where no individual positions were specified.
Definition at line 360 of file genome_locus_set.hpp.
|
inline |
Return whether a given position on a chromosome is part of any of the regions stored.
Note that position
is 1-based.
Definition at line 342 of file genome_locus_set.hpp.
|
inlinestatic |
Return whether a given position on the provided bitvector
is covered.
This overload of the function accepts a Bitvector directly, without checking that it belongs to any of the chromosomes stored (hence, it is a static function). The Bitvector is expected to follow the convention of that class, that is, bit 0 is used to indicate that the whole chromosome is covered, and all other bits correspond to 1-based positions. This can for instance be used in combination with chromosome_positions().
Definition at line 296 of file genome_locus_set.hpp.
|
inlinestatic |
Return the next position (including the start_position
) that is covered.
This overload is meant to work in combination with find(), to avoid having to look up the chromosome every time, in cases where this does not change. The function does not check that the iterator is valid - in fact, it is static, and hence not tied to any object.
Definition at line 435 of file genome_locus_set.hpp.
|
inline |
Return the next position (including the start_position
) that is covered.
The function finds the next position after or including the start_position
that is covered. If the whole chromosome is covered (the 0th bit being true as the indicator for that), then the start_position
is returned. If no position after the start_position
is covered on the chromosome at all, or the chromosome is not in the set, then GenomeLocusSet::npos is returned.
Definition at line 449 of file genome_locus_set.hpp.
|
inlinestatic |
Return the next position (including the start_position
) that is covered.
This overload of the function accepts a Bitvector directly, without checking that it belongs to any of the chromosomes stored (hence, it is a static function). The Bitvector is expected to follow the convention of that class, that is, bit 0 is used to indicate that the whole chromosome is covered, and all other bits correspond to 1-based positions. This can for instance be used in combination with chromosome_positions().
Definition at line 403 of file genome_locus_set.hpp.
|
default |
|
default |
void set_intersect | ( | GenomeLocusSet const & | rhs | ) |
Compute the intersection with another GenomeLocusSet rhs
.
Any chromosomes that end up having no positions covered are removed.
Definition at line 139 of file genome_locus_set.cpp.
void set_union | ( | GenomeLocusSet const & | rhs | ) |
Compute the union with another GenomeLocusSet rhs
.
Definition at line 219 of file genome_locus_set.cpp.
using const_iterator = typename std::unordered_map<std::string, utils::Bitvector>::const_iterator |
Definition at line 91 of file genome_locus_set.hpp.
using const_reference = value_type const& |
Definition at line 88 of file genome_locus_set.hpp.
using iterator = typename std::unordered_map<std::string, utils::Bitvector>::iterator |
Definition at line 90 of file genome_locus_set.hpp.
using key_type = std::string |
Definition at line 83 of file genome_locus_set.hpp.
using mapped_type = utils::Bitvector |
Definition at line 84 of file genome_locus_set.hpp.
using reference = value_type& |
Definition at line 87 of file genome_locus_set.hpp.
using value_type = std::pair<std::string const, utils::Bitvector> |
Definition at line 85 of file genome_locus_set.hpp.
|
static |
Position value to indicate that next_covered did not find any covered position.
Definition at line 96 of file genome_locus_set.hpp.