A library for working with phylogenetic and population genetic data.
v0.27.0
GenomeRegionList Class Reference

#include <genesis/population/genome_region_list.hpp>

Detailed Description

List of regions in a genome, for each chromosome.

The data structure stores a list of genome regions, such as coming from BED or GFF files. It allows fast querying, that is, whether a certain position on a chromosome is part of one of the stored regions. Furthermore, the class allows to iterate through the regions of each chromosome.

Positions in the interval of each region are 1-based and inclusive, that is, we used closed intervals.

Interally, we use an IntervalTree to represent the regions of each chromosome, stored in a map from chromosome name to IntervalTree. This is so that access and querying of contained positions is as fast as possible, and so that we do not store the chromosome name string with every region.

See also
GenomeLocus
GenomeRegion

Definition at line 82 of file genome_region_list.hpp.

Public Member Functions

 GenomeRegionList ()=default
 
 GenomeRegionList (GenomeRegionList &&)=default
 
 GenomeRegionList (GenomeRegionList const &)=default
 
 ~GenomeRegionList ()=default
 
void add (GenomeLocus const &locus, bool overlap=false)
 Add a single Locus, that is, an interval covering one position on a chromosome. More...
 
void add (GenomeLocus const &start, GenomeLocus const &end, bool overlap=false)
 Add an interval between two Loci on the same chromosome. More...
 
void add (GenomeRegion const &region, bool overlap=false)
 Add a GenomeRegion to the list. More...
 
void add (GenomeRegionList const &other, bool overlap=false)
 Add a complete GenomeRegionList to this list. More...
 
void add (std::string const &chromosome, numerical_type start, numerical_type end, bool overlap=false)
 Add a GenomeRegion to the list, given its chromosome, and start and end positions. More...
 
size_t chromosome_count () const
 Return the number of chromosomes for which there are regions stored. More...
 
std::map< std::string, tree_type > & chromosome_map ()
 Access the underlying container directly. More...
 
std::map< std::string, tree_type > const & chromosome_map () const
 Access the underlying container directly. More...
 
std::vector< std::string > chromosome_names () const
 Get a list of all stored chromosome names. More...
 
tree_typechromosome_regions (std::string const &chromosome)
 For a given chromosome, return the IntervalTree that stores its regions. More...
 
tree_type const & chromosome_regions (std::string const &chromosome) const
 For a given chromosome, return the IntervalTree that stores its regions. More...
 
void clear ()
 Remove all stored regions from all chromosomes. More...
 
void clear (std::string const &chromosome)
 Remove the regions of the specified chromosome. More...
 
bool empty () const
 Return whether there are chromosomes with regions stored. More...
 
bool has_chromosome (std::string const &chromosome) const
 Return whether a chromosome is stored. More...
 
bool is_covered (std::string const &chromosome, numerical_type position) const
 Return whether a given position on a chromosome is part of any of the regions stored. More...
 
GenomeRegionListoperator= (GenomeRegionList &&)=default
 
GenomeRegionListoperator= (GenomeRegionList const &)=default
 
size_t region_count (std::string const &chromosome) const
 Return the number of regions stored for the specified chromosome. More...
 
size_t total_region_count () const
 Return the number of regions stored in total, across all chromosomes. More...
 

Public Types

using const_iterator = typename tree_type::const_iterator
 
using data_type = EmptyGenomeData
 
using iterator = typename tree_type::iterator
 
using numerical_type = size_t
 
using self_type = GenomeRegionList
 
using tree_type = genesis::utils::IntervalTree< data_type, numerical_type, genesis::utils::IntervalClosed >
 

Constructor & Destructor Documentation

◆ GenomeRegionList() [1/3]

GenomeRegionList ( )
default

◆ ~GenomeRegionList()

~GenomeRegionList ( )
default

◆ GenomeRegionList() [2/3]

GenomeRegionList ( GenomeRegionList const &  )
default

◆ GenomeRegionList() [3/3]

Member Function Documentation

◆ add() [1/5]

void add ( GenomeLocus const &  locus,
bool  overlap = false 
)
inline

Add a single Locus, that is, an interval covering one position on a chromosome.

If overlap is set, we first check if there is a region already in the list that overlaps with the one that is to be added. If so, the new region is merged with existing one, instead of inserting it. This is more useful of the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes.

Definition at line 183 of file genome_region_list.hpp.

◆ add() [2/5]

void add ( GenomeLocus const &  start,
GenomeLocus const &  end,
bool  overlap = false 
)
inline

Add an interval between two Loci on the same chromosome.

If overlap is set, we first check if there is a region already in the list that overlaps with the one that is to be added. If so, the new region is merged with existing one, instead of inserting it. This is more useful of the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes.

Definition at line 193 of file genome_region_list.hpp.

◆ add() [3/5]

void add ( GenomeRegion const &  region,
bool  overlap = false 
)
inline

Add a GenomeRegion to the list.

This function ensures that regions are valid (start < end), and keeps the list sorted.

If overlap is set, we first check if there is a region already in the list that overlaps with the one that is to be added. If so, the new region is merged with existing one, instead of inserting it. This is more useful of the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes.

Definition at line 216 of file genome_region_list.hpp.

◆ add() [4/5]

void add ( GenomeRegionList const &  other,
bool  overlap = false 
)
inline

Add a complete GenomeRegionList to this list.

This function copies all entries of the list.

If overlap is set, we first check if there is a region already in the list that overlaps with the one that is to be added. If so, the new region is merged with existing one, instead of inserting it. This is more useful of the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes.

Definition at line 232 of file genome_region_list.hpp.

◆ add() [5/5]

void add ( std::string const &  chromosome,
numerical_type  start,
numerical_type  end,
bool  overlap = false 
)
inline

Add a GenomeRegion to the list, given its chromosome, and start and end positions.

The chromosome cannot be empty, and we expect start < end. Both start and end are 1-based, and inclusive, that is, the interval between them is closed.

If overlap is set, we first check if there is a region already in the list that overlaps with the one that is to be added. If so, the new region is merged with existing one, instead of inserting it. This is more useful of the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes.

Definition at line 130 of file genome_region_list.hpp.

◆ chromosome_count()

size_t chromosome_count ( ) const
inline

Return the number of chromosomes for which there are regions stored.

Definition at line 320 of file genome_region_list.hpp.

◆ chromosome_map() [1/2]

std::map<std::string, tree_type>& chromosome_map ( )
inline

Access the underlying container directly.

Expose the map from chromosome names to the IntervalTree that stores the regions of each chromosome. This is okay to expose, as this class is merely a thin convenience wrapper around it anyway. If the class ever changes to be more than that, we might remove access to this.

Definition at line 405 of file genome_region_list.hpp.

◆ chromosome_map() [2/2]

std::map<std::string, tree_type> const& chromosome_map ( ) const
inline

Access the underlying container directly.

Expose the map from chromosome names to the IntervalTree that stores the regions of each chromosome. This is okay to expose, as this class is merely a thin convenience wrapper around it anyway. If the class ever changes to be more than that, we might remove access to this.

Definition at line 397 of file genome_region_list.hpp.

◆ chromosome_names()

std::vector<std::string> chromosome_names ( ) const
inline

Get a list of all stored chromosome names.

Definition at line 328 of file genome_region_list.hpp.

◆ chromosome_regions() [1/2]

tree_type& chromosome_regions ( std::string const &  chromosome)
inline

For a given chromosome, return the IntervalTree that stores its regions.

Definition at line 358 of file genome_region_list.hpp.

◆ chromosome_regions() [2/2]

tree_type const& chromosome_regions ( std::string const &  chromosome) const
inline

For a given chromosome, return the IntervalTree that stores its regions.

Definition at line 349 of file genome_region_list.hpp.

◆ clear() [1/2]

void clear ( )
inline

Remove all stored regions from all chromosomes.

Definition at line 248 of file genome_region_list.hpp.

◆ clear() [2/2]

void clear ( std::string const &  chromosome)
inline

Remove the regions of the specified chromosome.

Definition at line 256 of file genome_region_list.hpp.

◆ empty()

bool empty ( ) const
inline

Return whether there are chromosomes with regions stored.

Definition at line 312 of file genome_region_list.hpp.

◆ has_chromosome()

bool has_chromosome ( std::string const &  chromosome) const
inline

Return whether a chromosome is stored.

Definition at line 340 of file genome_region_list.hpp.

◆ is_covered()

bool is_covered ( std::string const &  chromosome,
numerical_type  position 
) const
inline

Return whether a given position on a chromosome is part of any of the regions stored.

Definition at line 273 of file genome_region_list.hpp.

◆ operator=() [1/2]

GenomeRegionList& operator= ( GenomeRegionList &&  )
default

◆ operator=() [2/2]

GenomeRegionList& operator= ( GenomeRegionList const &  )
default

◆ region_count()

size_t region_count ( std::string const &  chromosome) const
inline

Return the number of regions stored for the specified chromosome.

Definition at line 366 of file genome_region_list.hpp.

◆ total_region_count()

size_t total_region_count ( ) const
inline

Return the number of regions stored in total, across all chromosomes.

Definition at line 379 of file genome_region_list.hpp.

Member Typedef Documentation

◆ const_iterator

Definition at line 99 of file genome_region_list.hpp.

◆ data_type

Definition at line 91 of file genome_region_list.hpp.

◆ iterator

using iterator = typename tree_type::iterator

Definition at line 98 of file genome_region_list.hpp.

◆ numerical_type

using numerical_type = size_t

Definition at line 92 of file genome_region_list.hpp.

◆ self_type

Definition at line 96 of file genome_region_list.hpp.

◆ tree_type


The documentation for this class was generated from the following file: