#include <genesis/population/function/diversity_pool_calculator.hpp>

Detailed Description

Compute Theta Pi, Theta Watterson, and Tajima's D in their pool-sequencing corrected versions according to Kofler et al.

This is an efficient high-level helper that computes these statistics on input iterator ranges. See theta_pi_pool(), theta_watterson_pool(), and tajima_d_pool() for details on the functions it computes.

The provided DiversityPoolSettings cover most options offered by PoPoolation. In particular, we want to set min_count, as well as min_read_depth and max_read_depth. PoPoolation calls these read depths "coverage", which is arguably a misnomer.

We do expect here that the input samples that are provided to the process() function are already filtered (with the appropriate filter status set for the Variant and the SampleCounts) and transformed as needed. For example, typically, we want to use a SampleCountsFilter with settings that match the DiversityPoolSettings:

filter.min_count = settings.min_count;
filter.min_read_depth = settings.min_read_depth;
filter.max_read_depth = settings.max_read_depth;
filter.only_snps = true;

That is, the settings for the pool statistics should match the settings used for filtering the samples. The function filter_sample_counts() can be used to transform and filter the input coming from a file, in order to filter out base counts and samples that do not match these filters.
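The requirement that both sets of values agree can be sketched as follows. This is a minimal, self-contained illustration and not the genesis API: SketchDiversitySettings, SketchCountsFilter, and make_matching_filter() are hypothetical stand-ins that only mirror the field names used above, and the default values are made up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration only; these are NOT the genesis types.
struct SketchDiversitySettings {
    std::size_t min_count      = 2;   // made-up default
    std::size_t min_read_depth = 4;   // made-up default
    std::size_t max_read_depth = 500; // made-up default
};

struct SketchCountsFilter {
    std::size_t min_count      = 0;
    std::size_t min_read_depth = 0;
    std::size_t max_read_depth = 0;
    bool        only_snps      = false;
};

// Derive the filter from the settings in a single place,
// so that the two sets of thresholds cannot drift apart.
inline SketchCountsFilter make_matching_filter( SketchDiversitySettings const& settings )
{
    // Sketch-only sanity check on the depth range.
    assert( settings.min_read_depth <= settings.max_read_depth );

    SketchCountsFilter filter;
    filter.min_count      = settings.min_count;
    filter.min_read_depth = settings.min_read_depth;
    filter.max_read_depth = settings.max_read_depth;
    filter.only_snps      = true;
    return filter;
}
```

Building the filter from the settings, rather than typing the thresholds twice, is one way to make sure that filtering and statistics use the same values.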

There are multiple ways that this filtering can be applied. Typically, we want to process a VariantInputStream, which allows us to use input from a variety of file formats, all converted into Variants at each position in the genome. Internally, this is a genesis::utils::GenericInputStream, which offers add_transform_filter() for this purpose. The make_sample_counts_filter_numerical_tagging() function is a convenience function that creates such a filter/transform function from a SampleCountsFilter settings instance.

Alternatively, genesis::utils::make_filter_range() can be used to achieve the same effect, but requires a bit more manual "wiring" of the components first. This has the advantage that SampleCountsFilterStats can be provided, e.g., per window of the analysis, to capture the number of sites that pass the read depth filters etc. These numbers can then be used for get_theta_pi_relative() and get_theta_watterson_relative(), respectively. Otherwise (when filtering directly in the VariantInputStream), these numbers are lost, and the relative values would instead need to be computed using, e.g., the full window sizes, instead of taking only sufficiently covered positions into account for the normalization.
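The difference between the two normalizations can be illustrated with a small sketch. The helper relative_value() below is hypothetical and not part of genesis; it only shows the division of an accumulated (absolute) value by a chosen denominator. Returning NaN for a zero denominator is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper, for illustration only: turn an accumulated (absolute)
// value into a relative one by dividing by a denominator of the caller's choice,
// e.g., the number of sites that passed the filters, or the full window size.
inline double relative_value( double absolute_sum, std::size_t denominator )
{
    // Sketch-only check: the Theta values discussed here are sums of
    // non-negative per-site terms.
    assert( absolute_sum >= 0.0 );

    // Without any usable sites, the relative value is undefined in this sketch.
    if( denominator == 0 ) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return absolute_sum / static_cast<double>( denominator );
}
```

With made-up numbers, an absolute value of 4.0 in a window of 1000 positions, of which 800 pass the read depth filters, gives 4.0 / 800 = 0.005 when normalizing by passing sites, but 4.0 / 1000 = 0.004 when normalizing by the full window size.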

With either way of filtering, for all SNP positions of interest, call process() to compute the values for Theta Pi and Theta Watterson of this sample. The values are internally accumulated.

Once all samples have been processed, the getter functions get_theta_pi_absolute(), get_theta_pi_relative(), get_theta_watterson_absolute(), and get_theta_watterson_relative() can be used to obtain Theta Pi and Theta Watterson directly. For Tajima's D, more computation is needed, which is why the corresponding function is called compute_tajima_d().
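The accumulate-then-query pattern described above can be sketched with a self-contained stand-in. SketchDiversityAccumulator and its member names are hypothetical, and the per-site formula is the classic uncorrected heterozygosity estimate n/(n-1) * (1 - sum of squared allele frequencies) computed from read counts, used here only as a placeholder. It is not the pool-sequencing corrected Theta Pi of Kofler et al. that this class computes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in, for illustration only; NOT the genesis class.
class SketchDiversityAccumulator
{
public:

    // Process the nucleotide counts (A, C, G, T) of one position.
    void process( std::array<std::size_t, 4> const& counts )
    {
        std::size_t depth = 0;
        for( auto c : counts ) {
            depth += c;
        }
        // The placeholder estimator is undefined for fewer than two reads.
        if( depth < 2 ) {
            return;
        }

        // Sum of squared allele frequencies at this position.
        double sum_sq = 0.0;
        for( auto c : counts ) {
            double const f = static_cast<double>( c ) / static_cast<double>( depth );
            sum_sq += f * f;
        }

        // Placeholder: classic heterozygosity, WITHOUT pool-sequencing correction.
        double const n = static_cast<double>( depth );
        theta_pi_sum_ += n / ( n - 1.0 ) * ( 1.0 - sum_sq );
    }

    // Sum over all processed positions.
    double theta_pi_absolute() const
    {
        return theta_pi_sum_;
    }

    // Sum divided by a denominator chosen by the caller.
    double theta_pi_relative( std::size_t denominator ) const
    {
        assert( denominator > 0 );
        return theta_pi_sum_ / static_cast<double>( denominator );
    }

    // Clear the accumulated value, e.g., before starting a new window.
    void reset()
    {
        theta_pi_sum_ = 0.0;
    }

private:

    double theta_pi_sum_ = 0.0;
};
```

In this sketch, processing the counts {5, 5, 0, 0} and {10, 0, 0, 0} accumulates 5/9 from the first position and 0 from the second, so the absolute value is 5/9 and the relative value over two positions is 5/18.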

See

R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925

for details on the equations. The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf

Definition at line 122 of file diversity_pool_calculator.hpp.

Public Member Functions
	DiversityPoolCalculator (DiversityPoolCalculator &&)=default

	DiversityPoolCalculator (DiversityPoolCalculator const &)=default

	DiversityPoolCalculator (DiversityPoolSettings const &settings, size_t pool_size)

	~DiversityPoolCalculator ()=default

bool	enable_tajima_d () const

self_type &	enable_tajima_d (bool value)

bool	enable_theta_pi () const

self_type &	enable_theta_pi (bool value)

bool	enable_theta_watterson () const

self_type &	enable_theta_watterson (bool value)

SampleCountsFilterStats	get_filter_stats () const
	Get the sum of filter statistics of all samples processed here.

Result	get_result (double window_avg_denom) const
	Convenience function to obtain all results at once.

bool	only_passing_samples () const

self_type &	only_passing_samples (bool value)

DiversityPoolCalculator &	operator= (DiversityPoolCalculator &&)=default

DiversityPoolCalculator &	operator= (DiversityPoolCalculator const &)=default

void	process (SampleCounts const &sample)
	Process a sample by computing its Theta Pi and Theta Watterson values.

void	reset ()

Public Types
using	self_type = DiversityPoolCalculator

Classes
struct	Result
	Data struct to collect all diversity statistics computed here.

Constructor & Destructor Documentation

◆ DiversityPoolCalculator() [1/3]

DiversityPoolCalculator	(	DiversityPoolSettings const &	settings,
		size_t	pool_size
	)

inline

Definition at line 151 of file diversity_pool_calculator.hpp.

◆ ~DiversityPoolCalculator()

~DiversityPoolCalculator ( )

default

◆ DiversityPoolCalculator() [2/3]

DiversityPoolCalculator ( DiversityPoolCalculator const & )

default

◆ DiversityPoolCalculator() [3/3]

DiversityPoolCalculator ( DiversityPoolCalculator && )

default

Member Function Documentation

◆ enable_tajima_d() [1/2]

bool enable_tajima_d ( ) const

inline

Definition at line 219 of file diversity_pool_calculator.hpp.

◆ enable_tajima_d() [2/2]

self_type& enable_tajima_d ( bool value )

inline

Definition at line 213 of file diversity_pool_calculator.hpp.

◆ enable_theta_pi() [1/2]

bool enable_theta_pi ( ) const

inline

Definition at line 197 of file diversity_pool_calculator.hpp.

◆ enable_theta_pi() [2/2]

self_type& enable_theta_pi ( bool value )

inline

Definition at line 191 of file diversity_pool_calculator.hpp.

◆ enable_theta_watterson() [1/2]

bool enable_theta_watterson ( ) const

inline

Definition at line 208 of file diversity_pool_calculator.hpp.

◆ enable_theta_watterson() [2/2]

self_type& enable_theta_watterson ( bool value )

inline

Definition at line 202 of file diversity_pool_calculator.hpp.

◆ get_filter_stats()

SampleCountsFilterStats get_filter_stats ( ) const

inline

Get the sum of filter statistics of all samples processed here.

With each call to process(), the filter stats are updated according to the filter status of the provided sample.

Definition at line 330 of file diversity_pool_calculator.hpp.

◆ get_result()

Result get_result ( double window_avg_denom ) const

inline

Convenience function to obtain all results at once.

The function fills the Result with the diversity statistics that have been computed, according to enable_theta_pi() and enable_theta_watterson(). It computes the relative variants of those statistics using the provided window averaging denominator, and computes Tajima's D if enable_tajima_d() is set.

Definition at line 291 of file diversity_pool_calculator.hpp.

◆ only_passing_samples() [1/2]

bool only_passing_samples ( ) const

inline

Definition at line 186 of file diversity_pool_calculator.hpp.

◆ only_passing_samples() [2/2]

self_type& only_passing_samples ( bool value )

inline

Definition at line 180 of file diversity_pool_calculator.hpp.
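The setters documented above (enable_tajima_d(), enable_theta_pi(), enable_theta_watterson(), and only_passing_samples()) return self_type&, which suggests that calls can be chained. The following sketch shows that pattern with a hypothetical stand-in class; SketchCalculator, its default values, and configured_for_theta_pi_only() are assumptions made for illustration and are not taken from genesis.

```cpp
#include <cassert>

// Hypothetical stand-in, for illustration only; NOT the genesis class.
class SketchCalculator
{
public:

    using self_type = SketchCalculator;

    // Each option has a getter, and a setter that returns a reference to the
    // object itself, so that several setters can be chained in one statement.
    bool enable_theta_pi() const { return enable_theta_pi_; }
    self_type& enable_theta_pi( bool value ) { enable_theta_pi_ = value; return *this; }

    bool enable_theta_watterson() const { return enable_theta_watterson_; }
    self_type& enable_theta_watterson( bool value ) { enable_theta_watterson_ = value; return *this; }

    bool enable_tajima_d() const { return enable_tajima_d_; }
    self_type& enable_tajima_d( bool value ) { enable_tajima_d_ = value; return *this; }

private:

    // Made-up defaults for this sketch.
    bool enable_theta_pi_        = true;
    bool enable_theta_watterson_ = true;
    bool enable_tajima_d_        = true;
};

// Configure a calculator that only computes Theta Pi, using chained setters.
inline SketchCalculator configured_for_theta_pi_only()
{
    SketchCalculator calc;
    calc.enable_theta_pi( true )
        .enable_theta_watterson( false )
        .enable_tajima_d( false );
    assert( calc.enable_theta_pi() );
    return calc;
}
```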

◆ operator=() [1/2]

DiversityPoolCalculator& operator= ( DiversityPoolCalculator && )

default

◆ operator=() [2/2]

DiversityPoolCalculator& operator= ( DiversityPoolCalculator const & )

default

◆ process()

void process ( SampleCounts const & sample )

inline

Process a sample by computing its Theta Pi and Theta Watterson values.

The values are accumulated internally, so that they are available to the getter functions. As its signature shows, the function itself does not return them.

Definition at line 244 of file diversity_pool_calculator.hpp.

◆ reset()

void reset ( )

inline

Definition at line 228 of file diversity_pool_calculator.hpp.

Member Typedef Documentation

◆ self_type

using self_type = DiversityPoolCalculator

Definition at line 174 of file diversity_pool_calculator.hpp.

The documentation for this class was generated from the following file:

diversity_pool_calculator.hpp
