#include <genesis/population/genome_region_list.hpp>
List of regions in a genome, for each chromosome.
The data structure stores a list of genome regions, such as those read from BED or GFF files. It allows fast querying of whether a given position on a chromosome is part of one of the stored regions. Furthermore, the class allows iterating through the regions of each chromosome.
Positions in the interval of each region are 1-based and inclusive, that is, we use closed intervals.
Internally, we use an IntervalTree to represent the regions of each chromosome, stored in a map from chromosome name to IntervalTree. This makes access and querying of contained positions as fast as possible, and avoids storing the chromosome name string with every region.
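The layout described above can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `RegionListSketch` and `ClosedInterval` are hypothetical names, and a plain `std::vector` with a linear scan replaces the IntervalTree that the class actually uses. It shows the map keyed by chromosome name and the closed, 1-based interval check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; not the genesis implementation.
struct ClosedInterval {
    std::size_t start; // 1-based, inclusive
    std::size_t end;   // 1-based, inclusive
};

class RegionListSketch {
public:
    // Store the closed interval [start, end] under its chromosome name.
    // A single-position region has start == end.
    void add(std::string const& chromosome, std::size_t start, std::size_t end) {
        assert(!chromosome.empty() && start >= 1 && start <= end);
        regions_[chromosome].push_back({start, end});
    }

    // A position is covered if it lies within any stored closed interval.
    // This sketch scans linearly; an interval tree answers the query faster.
    bool is_covered(std::string const& chromosome, std::size_t position) const {
        auto const it = regions_.find(chromosome);
        if (it == regions_.end()) {
            return false;
        }
        for (auto const& interval : it->second) {
            if (interval.start <= position && position <= interval.end) {
                return true;
            }
        }
        return false;
    }

private:
    // The chromosome name is stored once as the map key, not with every region.
    std::map<std::string, std::vector<ClosedInterval>> regions_;
};
```

Because the intervals are closed, both end points count as covered: after adding `("chr1", 100, 200)` to the sketch, positions 100 and 200 are covered, while 201 is not.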
Definition at line 82 of file genome_region_list.hpp.
Public Member Functions | |
GenomeRegionList ()=default | |
GenomeRegionList (GenomeRegionList &&)=default | |
GenomeRegionList (GenomeRegionList const &)=default | |
~GenomeRegionList ()=default | |
void | add (GenomeLocus const &locus, bool overlap=false) |
Add a single Locus, that is, an interval covering one position on a chromosome. More... | |
void | add (GenomeLocus const &start, GenomeLocus const &end, bool overlap=false) |
Add an interval between two Loci on the same chromosome. More... | |
void | add (GenomeRegion const &region, bool overlap=false) |
Add a GenomeRegion to the list. More... | |
void | add (GenomeRegionList const &other, bool overlap=false) |
Add a complete GenomeRegionList to this list. More... | |
void | add (std::string const &chromosome, numerical_type start, numerical_type end, bool overlap=false) |
Add a GenomeRegion to the list, given its chromosome, and start and end positions. More... | |
size_t | chromosome_count () const |
Return the number of chromosomes for which there are regions stored. More... | |
std::map< std::string, tree_type > & | chromosome_map () |
Access the underlying container directly. More... | |
std::map< std::string, tree_type > const & | chromosome_map () const |
Access the underlying container directly. More... | |
std::vector< std::string > | chromosome_names () const |
Get a list of all stored chromosome names. More... | |
tree_type & | chromosome_regions (std::string const &chromosome) |
For a given chromosome, return the IntervalTree that stores its regions. More... | |
tree_type const & | chromosome_regions (std::string const &chromosome) const |
For a given chromosome, return the IntervalTree that stores its regions. More... | |
void | clear () |
Remove all stored regions from all chromosomes. More... | |
void | clear (std::string const &chromosome) |
Remove the regions of the specified chromosome. More... | |
bool | empty () const |
Return whether the list is empty, that is, whether no chromosomes with regions are stored. More... | |
bool | has_chromosome (std::string const &chromosome) const |
Return whether a chromosome is stored. More... | |
bool | is_covered (std::string const &chromosome, numerical_type position) const |
Return whether a given position on a chromosome is part of any of the regions stored. More... | |
GenomeRegionList & | operator= (GenomeRegionList &&)=default |
GenomeRegionList & | operator= (GenomeRegionList const &)=default |
size_t | region_count (std::string const &chromosome) const |
Return the number of regions stored for the specified chromosome. More... | |
size_t | total_region_count () const |
Return the number of regions stored in total, across all chromosomes. More... | |
Public Types | |
using | const_iterator = typename tree_type::const_iterator |
using | data_type = EmptyGenomeData |
using | iterator = typename tree_type::iterator |
using | numerical_type = size_t |
using | self_type = GenomeRegionList |
using | tree_type = genesis::utils::IntervalTree< data_type, numerical_type, genesis::utils::IntervalClosed > |
void add (GenomeLocus const &locus, bool overlap=false) | inline |
Add a single Locus, that is, an interval covering one position on a chromosome.
If overlap is set, we first check whether a region already in the list overlaps with the one that is to be added. If so, the new region is merged with the existing one instead of being inserted separately. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate specific parts of the genome, such as genes.
Definition at line 183 of file genome_region_list.hpp.
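The effect of the overlap flag can be sketched on a plain vector of closed intervals. This is an illustration under assumptions, not the library's code: `Span` and `add_span` are hypothetical names, and "overlapping" is taken here to mean sharing at least one position, so directly adjacent intervals such as [1, 5] and [6, 10] are not merged in this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical closed interval, 1-based and inclusive on both ends.
struct Span {
    std::size_t start;
    std::size_t end;
};

// Add `added` to `spans`. With overlap == true, every stored span that shares
// a position with the new one is absorbed into it, so a single merged span
// is stored. With overlap == false, the span is simply appended.
void add_span(std::vector<Span>& spans, Span added, bool overlap = false) {
    assert(added.start <= added.end);
    if (overlap) {
        // Repeat until stable: growing `added` can make it reach spans
        // that did not overlap with it in an earlier pass.
        bool merged = true;
        while (merged) {
            merged = false;
            for (auto it = spans.begin(); it != spans.end();) {
                if (it->start <= added.end && added.start <= it->end) {
                    added.start = std::min(added.start, it->start);
                    added.end = std::max(added.end, it->end);
                    it = spans.erase(it);
                    merged = true;
                } else {
                    ++it;
                }
            }
        }
    }
    spans.push_back(added);
}
```

For coverage-style use, merging keeps the number of stored spans small. For annotation-style use, such as genes, leaving the flag unset keeps each span as a separate entry even when spans overlap.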
void add (GenomeLocus const &start, GenomeLocus const &end, bool overlap=false) | inline |
Add an interval between two Loci on the same chromosome.
If overlap is set, we first check whether a region already in the list overlaps with the one that is to be added. If so, the new region is merged with the existing one instead of being inserted separately. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate specific parts of the genome, such as genes.
Definition at line 193 of file genome_region_list.hpp.
void add (GenomeRegion const &region, bool overlap=false) | inline |
Add a GenomeRegion to the list.
This function ensures that regions are valid (start < end), and keeps the list sorted.
If overlap is set, we first check whether a region already in the list overlaps with the one that is to be added. If so, the new region is merged with the existing one instead of being inserted separately. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate specific parts of the genome, such as genes.
Definition at line 216 of file genome_region_list.hpp.
void add (GenomeRegionList const &other, bool overlap=false) | inline |
Add a complete GenomeRegionList to this list.
This function copies all entries of the given list.
If overlap is set, we first check whether a region already in the list overlaps with the one that is to be added. If so, the new region is merged with the existing one instead of being inserted separately. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate specific parts of the genome, such as genes.
Definition at line 232 of file genome_region_list.hpp.
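void add (std::string const &chromosome, numerical_type start, numerical_type end, bool overlap=false) | inline |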
Add a GenomeRegion to the list, given its chromosome, and start and end positions.
The chromosome cannot be empty, and we expect start < end. Both start and end are 1-based and inclusive, that is, the interval between them is closed.
If overlap is set, we first check whether a region already in the list overlaps with the one that is to be added. If so, the new region is merged with the existing one instead of being inserted separately. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate specific parts of the genome, such as genes.
Definition at line 130 of file genome_region_list.hpp.
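Since start and end are 1-based and inclusive, coordinates taken from a 0-based, half-open source (the convention of the BED format) need shifting before they describe the same positions. The following is a minimal sketch of that arithmetic. `ClosedRange`, `from_half_open_zero_based`, and `closed_length` are hypothetical helpers, not part of the class, and whether a given file reader already performs this conversion is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical closed interval, 1-based and inclusive on both ends.
struct ClosedRange {
    std::size_t start;
    std::size_t end;
};

// Convert a 0-based, half-open interval [start0, end_exclusive) into the
// equivalent 1-based closed interval: the start shifts by one, while the
// exclusive end already equals the last included 1-based position.
ClosedRange from_half_open_zero_based(std::size_t start0, std::size_t end_exclusive) {
    if (start0 >= end_exclusive) {
        throw std::invalid_argument("empty or inverted interval");
    }
    return {start0 + 1, end_exclusive};
}

// Number of positions in a closed interval: both end points are counted.
std::size_t closed_length(ClosedRange range) {
    return range.end - range.start + 1;
}
```

For example, the half-open interval [0, 100) covers the first 100 positions and maps to the closed interval [1, 100]. The half-open interval [41, 42) maps to the single position 42.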
size_t chromosome_count () const | inline |
Return the number of chromosomes for which there are regions stored.
Definition at line 320 of file genome_region_list.hpp.
std::map< std::string, tree_type > & chromosome_map () | inline |
Access the underlying container directly.
Expose the map from chromosome names to the IntervalTree that stores the regions of each chromosome. This is okay to expose, as this class is merely a thin convenience wrapper around it anyway. If the class ever changes to be more than that, we might remove access to this.
Definition at line 405 of file genome_region_list.hpp.
std::map< std::string, tree_type > const & chromosome_map () const | inline |
Access the underlying container directly.
Expose the map from chromosome names to the IntervalTree that stores the regions of each chromosome. This is okay to expose, as this class is merely a thin convenience wrapper around it anyway. If the class ever changes to be more than that, we might remove access to this.
Definition at line 397 of file genome_region_list.hpp.
std::vector< std::string > chromosome_names () const | inline |
Get a list of all stored chromosome names.
Definition at line 328 of file genome_region_list.hpp.
tree_type & chromosome_regions (std::string const &chromosome) | inline |
For a given chromosome, return the IntervalTree that stores its regions.
Definition at line 358 of file genome_region_list.hpp.
tree_type const & chromosome_regions (std::string const &chromosome) const | inline |
For a given chromosome, return the IntervalTree that stores its regions.
Definition at line 349 of file genome_region_list.hpp.
void clear () | inline |
Remove all stored regions from all chromosomes.
Definition at line 248 of file genome_region_list.hpp.
void clear (std::string const &chromosome) | inline |
Remove the regions of the specified chromosome.
Definition at line 256 of file genome_region_list.hpp.
bool empty () const | inline |
Return whether the list is empty, that is, whether no chromosomes with regions are stored.
Definition at line 312 of file genome_region_list.hpp.
bool has_chromosome (std::string const &chromosome) const | inline |
Return whether a chromosome is stored.
Definition at line 340 of file genome_region_list.hpp.
bool is_covered (std::string const &chromosome, numerical_type position) const | inline |
Return whether a given position on a chromosome is part of any of the regions stored.
Definition at line 273 of file genome_region_list.hpp.
size_t region_count (std::string const &chromosome) const | inline |
Return the number of regions stored for the specified chromosome.
Definition at line 366 of file genome_region_list.hpp.
size_t total_region_count () const | inline |
Return the number of regions stored in total, across all chromosomes.
Definition at line 379 of file genome_region_list.hpp.
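How the three counts relate can be sketched on a plain map from chromosome name to a list of intervals. The names below are hypothetical and the container is a simplification of the map of IntervalTrees. Returning 0 for an unknown chromosome is a choice made for this example; the class itself may handle that case differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified container: chromosome name -> closed (start, end) pairs.
using RegionMapSketch =
    std::map<std::string, std::vector<std::pair<std::size_t, std::size_t>>>;

// Number of chromosomes that have an entry in the map.
std::size_t chromosome_count_sketch(RegionMapSketch const& regions) {
    return regions.size();
}

// Number of regions for one chromosome; 0 if the chromosome is unknown
// (an assumption of this sketch).
std::size_t region_count_sketch(RegionMapSketch const& regions, std::string const& chromosome) {
    auto const it = regions.find(chromosome);
    return it == regions.end() ? 0 : it->second.size();
}

// Total number of regions: the per-chromosome counts summed over all entries.
std::size_t total_region_count_sketch(RegionMapSketch const& regions) {
    std::size_t total = 0;
    for (auto const& entry : regions) {
        total += entry.second.size();
    }
    return total;
}
```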
using const_iterator = typename tree_type::const_iterator |
Definition at line 99 of file genome_region_list.hpp.
using data_type = EmptyGenomeData |
Definition at line 91 of file genome_region_list.hpp.
using iterator = typename tree_type::iterator |
Definition at line 98 of file genome_region_list.hpp.
using numerical_type = size_t |
Definition at line 92 of file genome_region_list.hpp.
using self_type = GenomeRegionList |
Definition at line 96 of file genome_region_list.hpp.
using tree_type = genesis::utils::IntervalTree< data_type, numerical_type, genesis::utils::IntervalClosed > |
Definition at line 95 of file genome_region_list.hpp.