A library for working with phylogenetic and population genetic data.
v0.32.0
DiversityPoolCalculator Class Reference

#include <genesis/population/function/diversity_pool_calculator.hpp>

Detailed Description

Compute Theta Pi, Theta Watterson, and Tajima's D in their pool-sequencing corrected versions according to Kofler et al.

This is an efficient high-level helper meant to compute these statistics over input iterator ranges. See theta_pi_pool(), theta_watterson_pool(), and tajima_d_pool() for details on the functions it computes.

The provided DiversityPoolSettings cover most of the options offered by PoPoolation. In particular, min_count, min_read_depth, and max_read_depth should be set. PoPoolation calls these read depths "coverage", which we consider a misnomer.

We expect the samples provided to the process() function to already be filtered (with the appropriate filter status set for the Variant and the SampleCounts) and transformed as needed. Typically, for example, we want to use a SampleCountsFilter with settings that match the DiversityPoolSettings:

filter.min_count = settings.min_count;
filter.min_read_depth = settings.min_read_depth;
filter.max_read_depth = settings.max_read_depth;
filter.only_snps = true;

That is, the settings for the pool statistics should match the settings used for filtering the samples. The function filter_sample_counts() can be used to transform and filter the input coming from a file, so that base counts and samples that do not meet these criteria are filtered out.
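
To make the effect of these settings more concrete, here is a simplified, self-contained model of such a per-sample filter. It is a sketch under stated assumptions, not the genesis implementation: SimpleCountsFilter and apply_simple_filter() are hypothetical names, counts below min_count are assumed to be zeroed out before the other checks, and a max_read_depth of 0 is assumed to mean "no upper limit".

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the filter settings shown above. The field names
// mirror the snippet, but this struct is NOT the genesis SampleCountsFilter.
struct SimpleCountsFilter {
    std::size_t min_count = 0;
    std::size_t min_read_depth = 0;
    std::size_t max_read_depth = 0;  // assumption: 0 means "no upper limit"
    bool only_snps = false;
};

// Counts of A, C, G, T at one position in one pool sample.
using BaseCounts = std::array<std::size_t, 4>;

// Returns true if the (possibly modified) counts pass the filter.
// Assumption: counts below min_count are zeroed out first, and the read depth
// and SNP checks are applied to the remaining counts.
bool apply_simple_filter(BaseCounts& counts, SimpleCountsFilter const& filter) {
    // Remove alleles that are supported by too few reads.
    for (auto& c : counts) {
        if (c < filter.min_count) {
            c = 0;
        }
    }

    // Read depth is the sum of the remaining nucleotide counts.
    std::size_t depth = 0;
    std::size_t alleles = 0;
    for (auto c : counts) {
        depth += c;
        if (c > 0) {
            ++alleles;
        }
    }

    if (depth < filter.min_read_depth) {
        return false;
    }
    if (filter.max_read_depth > 0 && depth > filter.max_read_depth) {
        return false;
    }
    // A SNP needs at least two alleles left after the min_count step.
    if (filter.only_snps && alleles < 2) {
        return false;
    }
    return true;
}
```

For example, with min_count = 2, min_read_depth = 4, and only_snps = true, the counts {5, 0, 3, 1} pass (the single T read is dropped), whereas {10, 1, 0, 0} fail, because only one allele remains.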

There are multiple ways to apply this filtering. Typically, we want to process a VariantInputStream, which accepts input from a variety of file formats, all converted into Variants at each position in the genome. Internally, this is a genesis::utils::GenericInputStream, which offers add_transform_filter() for this purpose. The convenience function make_sample_counts_filter_numerical_tagging() creates such a filter/transform function from a SampleCountsFilter settings instance.

Alternatively, genesis::utils::make_filter_range() can be used to achieve the same effect, at the cost of a bit more manual "wiring" of the components. This has the advantage that SampleCountsFilterStats can be provided, e.g., per window of the analysis, to capture the number of sites that pass the read depth filters etc. These numbers can then be used for the normalization in get_theta_pi_relative() and get_theta_watterson_relative(). Otherwise (when filtering directly in the VariantInputStream), these numbers are lost, and the relative values would instead have to be computed using, e.g., the full window sizes, rather than only the sufficiently covered positions.

With either way of filtering, call process() for all SNP positions of interest to compute the values for Theta Pi and Theta Watterson of this sample. The values are accumulated internally.

Once all samples have been processed, the getter functions get_theta_pi_absolute(), get_theta_pi_relative(), get_theta_watterson_absolute(), and get_theta_watterson_relative() can be used to obtain Theta Pi and Theta Watterson directly. Tajima's D requires additional computation, which is why the corresponding function is called compute_tajima_d().
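
For orientation, the classic (not pool-corrected) Tajima's D combines Theta Pi and Theta Watterson as sketched below, using the standard constants of Tajima's original formulation for n sequences. The pool-sequencing corrected version computed here differs from this, so classic_tajima_d() is a hypothetical illustration only. It returns NaN where the classic statistic is undefined (fewer than four sequences, or no segregating sites).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Classic Tajima's D for n sequences, S segregating sites, and pi = the
// average number of pairwise differences summed over all sites.
// NOT the pool-sequencing corrected version of Kofler et al.
double classic_tajima_d(std::size_t n, std::size_t S, double pi) {
    // The variance constants vanish for n < 4, and S == 0 gives 0 / 0.
    if (n < 4 || S == 0) {
        return std::numeric_limits<double>::quiet_NaN();
    }

    // a1 = sum of 1/i and a2 = sum of 1/i^2, for i = 1 .. n-1.
    double a1 = 0.0;
    double a2 = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        double const di = static_cast<double>(i);
        a1 += 1.0 / di;
        a2 += 1.0 / (di * di);
    }

    double const nd = static_cast<double>(n);
    double const b1 = (nd + 1.0) / (3.0 * (nd - 1.0));
    double const b2 = 2.0 * (nd * nd + nd + 3.0) / (9.0 * nd * (nd - 1.0));
    double const c1 = b1 - 1.0 / a1;
    double const c2 = b2 - (nd + 2.0) / (a1 * nd) + a2 / (a1 * a1);
    double const e1 = c1 / a1;
    double const e2 = c2 / (a1 * a1 + a2);

    double const s = static_cast<double>(S);
    double const theta_w = s / a1;  // Watterson's estimator
    return (pi - theta_w) / std::sqrt(e1 * s + e2 * s * (s - 1.0));
}
```

The sign carries the interpretation: D is zero when the two estimators agree, positive when Theta Pi exceeds Theta Watterson, and negative otherwise.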

See

R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925

for details on the equations. Unfortunately, the paper does not explain its equations, but a somewhat hidden document in the PoPoolation code repository sheds some light on them. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf

Definition at line 122 of file diversity_pool_calculator.hpp.

Public Member Functions

 DiversityPoolCalculator (DiversityPoolCalculator &&)=default
 
 DiversityPoolCalculator (DiversityPoolCalculator const &)=default
 
 DiversityPoolCalculator (DiversityPoolSettings const &settings, size_t pool_size)
 
 ~DiversityPoolCalculator ()=default
 
bool enable_tajima_d () const
 
self_type & enable_tajima_d (bool value)
 
bool enable_theta_pi () const
 
self_type & enable_theta_pi (bool value)
 
bool enable_theta_watterson () const
 
self_type & enable_theta_watterson (bool value)
 
SampleCountsFilterStats get_filter_stats () const
 Get the sum of filter statistics of all samples processed here.
 
Result get_result (double window_avg_denom) const
 Convenience function to obtain all results at once.
 
bool only_passing_samples () const
 
self_type & only_passing_samples (bool value)
 
DiversityPoolCalculator & operator= (DiversityPoolCalculator &&)=default
 
DiversityPoolCalculator & operator= (DiversityPoolCalculator const &)=default
 
void process (SampleCounts const &sample)
 Process a sample by computing its Theta Pi and Theta Watterson values.
 
void reset ()
 

Public Types

using self_type = DiversityPoolCalculator
 

Classes

struct  Result
 Data struct to collect all diversity statistics computed here.
 

Constructor & Destructor Documentation

◆ DiversityPoolCalculator() [1/3]

DiversityPoolCalculator ( DiversityPoolSettings const &  settings,
size_t  pool_size 
)
inline

Definition at line 151 of file diversity_pool_calculator.hpp.

◆ ~DiversityPoolCalculator()

◆ DiversityPoolCalculator() [2/3]

◆ DiversityPoolCalculator() [3/3]

Member Function Documentation

◆ enable_tajima_d() [1/2]

bool enable_tajima_d ( ) const
inline

Definition at line 219 of file diversity_pool_calculator.hpp.

◆ enable_tajima_d() [2/2]

self_type& enable_tajima_d ( bool  value)
inline

Definition at line 213 of file diversity_pool_calculator.hpp.

◆ enable_theta_pi() [1/2]

bool enable_theta_pi ( ) const
inline

Definition at line 197 of file diversity_pool_calculator.hpp.

◆ enable_theta_pi() [2/2]

self_type& enable_theta_pi ( bool  value)
inline

Definition at line 191 of file diversity_pool_calculator.hpp.

◆ enable_theta_watterson() [1/2]

bool enable_theta_watterson ( ) const
inline

Definition at line 208 of file diversity_pool_calculator.hpp.

◆ enable_theta_watterson() [2/2]

self_type& enable_theta_watterson ( bool  value)
inline

Definition at line 202 of file diversity_pool_calculator.hpp.
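
The paired enable_*() overloads above follow a getter/setter pattern in which the setter returns self_type&, so that calls can be chained. A minimal sketch with a hypothetical ToyDiversityOptions class; the default values are assumptions and are not taken from genesis.

```cpp
// Hypothetical toy class illustrating the getter/setter overload pairs.
class ToyDiversityOptions {
public:
    using self_type = ToyDiversityOptions;

    // Getter: no argument, returns the current value.
    bool enable_theta_pi() const { return enable_theta_pi_; }

    // Setter: takes the new value and returns *this, which enables chaining.
    self_type& enable_theta_pi(bool value) {
        enable_theta_pi_ = value;
        return *this;
    }

    bool enable_tajima_d() const { return enable_tajima_d_; }

    self_type& enable_tajima_d(bool value) {
        enable_tajima_d_ = value;
        return *this;
    }

private:
    bool enable_theta_pi_ = true;   // assumed default
    bool enable_tajima_d_ = true;   // assumed default
};
```

Usage: `opts.enable_theta_pi(true).enable_tajima_d(false);`. Given the self_type& return types in the signatures above, the real class presumably supports the same chaining style, although its defaults and any dependencies between the flags are not documented here.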

◆ get_filter_stats()

SampleCountsFilterStats get_filter_stats ( ) const
inline

Get the sum of filter statistics of all samples processed here.

With each call to process(), the filter stats are increased according to the filter status of the provided sample.

Definition at line 330 of file diversity_pool_calculator.hpp.
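
Summing filter statistics amounts to keeping one counter per filter status and adding the counters up element-wise. The sketch below uses a hypothetical ToyFilterStatus enum and ToyFilterStats struct; the actual status categories of SampleCountsFilterStats are not reproduced here.

```cpp
#include <array>
#include <cstddef>

// Hypothetical filter outcomes, for illustration only.
enum class ToyFilterStatus : std::size_t {
    kPassed = 0,
    kBelowMinReadDepth,
    kAboveMaxReadDepth,
    kNotSnp,
    kNumStatus  // number of entries, used for sizing the array
};

// One counter per status; can be summed across samples or windows.
struct ToyFilterStats {
    std::array<std::size_t, static_cast<std::size_t>(ToyFilterStatus::kNumStatus)> counts{};

    // Records one filter outcome.
    void add(ToyFilterStatus status) {
        ++counts[static_cast<std::size_t>(status)];
    }

    std::size_t operator[](ToyFilterStatus status) const {
        return counts[static_cast<std::size_t>(status)];
    }

    // Total number of recorded outcomes across all statuses.
    std::size_t sum() const {
        std::size_t total = 0;
        for (auto c : counts) {
            total += c;
        }
        return total;
    }

    // Element-wise accumulation, e.g., to merge the stats of several samples.
    ToyFilterStats& operator+=(ToyFilterStats const& other) {
        for (std::size_t i = 0; i < counts.size(); ++i) {
            counts[i] += other.counts[i];
        }
        return *this;
    }
};
```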

◆ get_result()

Result get_result ( double  window_avg_denom) const
inline

Convenience function to obtain all results at once.

The function fills the Result with the diversity statistics that have been computed, as determined by enable_theta_pi() and enable_theta_watterson(). It computes the relative variants of those statistics using the provided window averaging denominator window_avg_denom, and computes Tajima's D if enable_tajima_d() is set.

Definition at line 291 of file diversity_pool_calculator.hpp.

◆ only_passing_samples() [1/2]

bool only_passing_samples ( ) const
inline

Definition at line 186 of file diversity_pool_calculator.hpp.

◆ only_passing_samples() [2/2]

self_type& only_passing_samples ( bool  value)
inline

Definition at line 180 of file diversity_pool_calculator.hpp.

◆ operator=() [1/2]

DiversityPoolCalculator& operator= ( DiversityPoolCalculator &&  )
default

◆ operator=() [2/2]

DiversityPoolCalculator& operator= ( DiversityPoolCalculator const &  )
default

◆ process()

void process ( SampleCounts const &  sample)
inline

Process a sample by computing its Theta Pi and Theta Watterson values.

The values are accumulated internally, so that they can afterwards be retrieved via the getter functions, such as get_result().

Definition at line 244 of file diversity_pool_calculator.hpp.

◆ reset()

void reset ( )
inline

Definition at line 228 of file diversity_pool_calculator.hpp.

Member Typedef Documentation

◆ self_type

