Classes | |
class | AlleleFrequencyWindow |
class | BaseFstPoolCalculator |
Base class to compute FST between two pooled samples, given two instances of SampleCounts. More... | |
class | BaseWindow |
Base class for Window and WindowView, to share common functionality. More... | |
class | BaseWindowStream |
Base class for streams of Windows over the chromosomes of a genome. More... | |
class | BedReader |
Reader for BED (Browser Extensible Data) files. More... | |
struct | CathedralPlotParameters |
Plot parameters to make a cathedral plot. More... | |
struct | CathedralPlotRecord |
Collection of the data used for making a cathedral plot. More... | |
class | ChromosomeWindowStream |
Stream for traversing each chromosome as a whole, with an inner WindowView iterator over the positions of each chromosome. More... | |
class | DiversityPoolCalculator |
Compute Theta Pi, Theta Watterson, and Tajima's D in their pool-sequencing corrected versions according to Kofler et al. More... | |
class | DiversityPoolProcessor |
Helper class to iterate over Variants and process the samples (SampleCounts), using a set of DiversityPoolCalculator instances, one for each sample. More... | |
struct | DiversityPoolSettings |
Settings used by different pool-sequencing corrected diversity statistics. More... | |
struct | EmptyAccumulator |
Empty helper data struct to serve as a dummy for Window. More... | |
struct | EmptyGenomeData |
Helper struct to define a default empty data for the classes GenomeLocus, GenomeRegion, and GenomeRegionList. More... | |
struct | FilterStats |
Counts of how many entries with a particular Filter Tag occurred in some data. More... | |
class | FilterStatus |
Tag class to assign a filter status to a Variant or SampleCounts. More... | |
class | FrequencyTableInputStream |
Iterate an input source and parse it as a table of allele frequencies or counts. More... | |
class | FstCathedralAccumulator |
Accumulate the partial pi values for a given window to produce a cathedral plot. More... | |
struct | FstCathedralPlotRecord |
Data for making one FST cathedral plot, that is, one pair of samples and one chromosome. More... | |
class | FstPoolCalculatorKarlsson |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
class | FstPoolCalculatorKofler |
Compute the F_ST statistic for pool-sequenced data of Kofler et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
class | FstPoolCalculatorUnbiased |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts. More... | |
class | FstPoolProcessor |
Helper class to iterate over Variants and process pairs of FST between their samples (SampleCounts), using a set of BaseFstPoolCalculator. More... | |
class | GenomeHeatmap |
struct | GenomeLocus |
A single locus, that is, a position (or coordinate) on a chromosome. More... | |
class | GenomeLocusSet |
List of positions/coordinates in a genome, for each chromosome. More... | |
struct | GenomeRegion |
A region (between two positions) on a chromosome. More... | |
class | GenomeRegionList |
List of regions in a genome, for each chromosome. More... | |
class | GenomeRegionReader |
Generic reader for inputs that contain a genomic region or locus per line, in different formats. More... | |
class | GenomeWindowStream |
Stream for traversing the entire genome as a single window, with an inner WindowView iterator over the positions along the chromosomes. More... | |
class | GffReader |
Reader for GFF2 and GFF3 (General Feature Format) and GTF (General Transfer Format) files. More... | |
class | HeatmapColorization |
class | HtsFile |
Wrap an ::htsFile struct. More... | |
class | IntervalWindowStream |
Stream for sliding Windows of fixed sized intervals over the chromosomes of a genome. More... | |
class | MapBimReader |
Reader for map/bim files as used by PLINK. More... | |
class | PositionWindowStream |
Stream for traversing each position along a genome individually. More... | |
class | QueueWindowStream |
Stream for Windows containing a queue of entries, i.e., sliding Windows of a fixed number of selected positions in a genome. More... | |
class | RegionWindowStream |
Stream for Windows representing regions of a genome. More... | |
struct | SampleCounts |
One set of nucleotide sample counts, for example for a given sample that represents a pool of sequenced individuals. More... | |
struct | SampleCountsFilterNumericalParams |
Filter settings to filter and transform SampleCounts. More... | |
class | SamVariantInputStream |
Input stream for SAM/BAM/CRAM files that produces a Variant per genome position. More... | |
class | SimplePileupInputStream |
Iterate an input source and parse it as a (m)pileup file. More... | |
class | SimplePileupReader |
Reader for line-by-line assessment of (m)pileup files. More... | |
class | SlidingWindowGenerator |
Generator for sliding Windows over the chromosomes of a genome. More... | |
struct | SortedSampleCounts |
Ordered array of sample counts for the four nucleotides. More... | |
class | SyncInputStream |
Iterate an input source and parse it as a sync file. More... | |
class | SyncReader |
Reader for PoPoolation2's "synchronized" files. More... | |
struct | Variant |
A single variant at a position in a chromosome, along with SampleCounts for a set of samples. More... | |
struct | VariantFilterNumericalParams |
class | VariantGaplessInputStream |
Stream adapter that visits every position in the genome. More... | |
struct | VariantInputStreamData |
Data storage for input-specific information when traversing a variant file. More... | |
struct | VariantInputStreamFromVcfParams |
Parameters to use when streaming through a VCF file as Variants. More... | |
class | VariantParallelInputStream |
Iterate multiple input sources that yield Variants in parallel. More... | |
class | VcfFormatHelper |
Provide htslib helper functions. More... | |
class | VcfFormatIterator |
Iterate the FORMAT information for the samples in a SNP/variant line in a VCF/BCF file. More... | |
class | VcfGenotype |
Simple wrapper class for one genotype field for a sample. More... | |
class | VcfHeader |
Capture the information from a header of a VCF/BCF file. More... | |
class | VcfInputStream |
Iterate an input source and parse it as a VCF/BCF file. More... | |
class | VcfRecord |
Capture the information of a single SNP/variant line in a VCF/BCF file. More... | |
struct | VcfSpecification |
Collect the four required keys that describe an INFO or FORMAT sub-field of VCF/BCF files. More... | |
class | Window |
Window over the chromosomes of a genome. More... | |
class | WindowView |
Proxy view over window-like regions of a genome. More... | |
class | WindowViewStream |
Stream wrapper that turns a BaseWindowStream over Window into a BaseWindowStream over WindowView. More... | |
Functions | |
double | a_n (double n) |
Compute a_n, the sum of reciprocals. More... | |
bool | all_finite_ (FstCathedralPlotRecord::Entry const &entry) |
size_t | allele_count (SampleCounts const &sample) |
Return the number of alleles, that is, of non-zero nucleotide counts of the sample. More... | |
size_t | allele_count (SampleCounts const &sample, size_t min_count) |
Return the number of alleles, taking a min_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least the min_count. More... | |
size_t | allele_count (SampleCounts const &sample, size_t min_count, size_t max_count) |
Return the number of alleles, taking a min_count and max_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least min_count and at most max_count. More... | |
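The counting rule described for the allele_count overloads can be sketched as follows. This is a minimal illustration, not the library's implementation: `NucleotideCounts` and `allele_count_sketch` are hypothetical names, and the stand-in struct only holds the four ACGT counts rather than the full SampleCounts type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for SampleCounts: counts for A, C, G, T only.
struct NucleotideCounts {
    std::array<std::size_t, 4> acgt;
};

// Number of nucleotides whose count lies in the closed range
// [min_count, max_count].
std::size_t allele_count_sketch(
    NucleotideCounts const& sample, std::size_t min_count, std::size_t max_count
) {
    assert(min_count <= max_count);
    std::size_t result = 0;
    for (std::size_t count : sample.acgt) {
        if (count >= min_count && count <= max_count) {
            ++result;
        }
    }
    return result;
}

// Number of non-zero nucleotide counts, expressed through the ranged version.
std::size_t allele_count_sketch(NucleotideCounts const& sample) {
    return allele_count_sketch(
        sample, 1, std::numeric_limits<std::size_t>::max()
    );
}
```

For counts A=10, C=0, G=3, T=1, the one-argument sketch yields 3 alleles, while the range [2, 5] keeps only G.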
double | alpha_star (double n) |
Compute alpha* according to Achaz 2008 and Kofler et al. 2011. More... | |
double | amnm_ (size_t poolsize, size_t nucleotide_count, size_t allele_frequency) |
Local helper function to compute values for the denominator. More... | |
template<class D > | |
size_t | anchor_position (BaseWindow< D > const &window, WindowAnchorType anchor_type=WindowAnchorType::kIntervalBegin) |
Get the position in the chromosome reported according to a specific WindowAnchorType. More... | |
template<class D , class A = EmptyAccumulator> | |
size_t | anchor_position (Window< D, A > const &window, WindowAnchorType anchor_type=WindowAnchorType::kIntervalBegin) |
Get the position in the chromosome reported according to a specific WindowAnchorType. More... | |
bool | apply_sample_counts_filter_numerical (SampleCounts &sample, SampleCountsFilterNumericalParams const &params) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_sample_counts_filter_numerical (SampleCounts &sample, SampleCountsFilterNumericalParams const &params, SampleCountsFilterStats &stats) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_sample_counts_filter_numerical (Variant &variant, SampleCountsFilterNumericalParams const &params, bool all_need_pass=false) |
bool | apply_sample_counts_filter_numerical (Variant &variant, SampleCountsFilterNumericalParams const &params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_variant_filter_numerical (Variant &variant, VariantFilterNumericalParams const &params) |
Filter a given Variant based on the numerical properties of the counts. More... | |
bool | apply_variant_filter_numerical (Variant &variant, VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Filter a given Variant based on the numerical properties of the counts. More... | |
double | b_n (double n) |
Compute b_n, the sum of squared reciprocals. More... | |
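The two sums named by a_n and b_n can be sketched as below. This assumes the standard definitions from Tajima's D, where both sums run from 1 to n-1; the listing itself only says "sum of reciprocals" and "sum of squared reciprocals", so the upper bound is an assumption. The `_sketch` names are hypothetical, and the sketch takes an integer n, whereas the listed functions take a double.

```cpp
#include <cassert>
#include <cstddef>

// a_n = sum_{i=1}^{n-1} 1/i   (assumed upper bound n-1, as in Tajima's D)
double a_n_sketch(std::size_t n) {
    assert(n >= 1);
    double sum = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        sum += 1.0 / static_cast<double>(i);
    }
    return sum;
}

// b_n = sum_{i=1}^{n-1} 1/i^2   (same assumed upper bound)
double b_n_sketch(std::size_t n) {
    assert(n >= 1);
    double sum = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        double const d = static_cast<double>(i);
        sum += 1.0 / (d * d);
    }
    return sum;
}
```

With n = 3, this gives a_n = 1 + 1/2 = 1.5 and b_n = 1 + 1/4 = 1.25.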
double | beta_star (double n) |
Compute beta* according to Achaz 2008 and Kofler et al. 2011. More... | |
genesis::utils::JsonDocument | cathedral_plot_parameters_to_json_document (CathedralPlotParameters const &parameters) |
Get a user-readable description of a CathedralPlotParameters as a JsonDocument. More... | |
genesis::utils::JsonDocument | cathedral_plot_record_to_json_document (CathedralPlotRecord const &record) |
Get a user-readable description of the data of a CathedralPlotRecord as a JsonDocument. More... | |
double | cathedral_window_width (CathedralPlotRecord const &record, size_t row) |
Compute the window width for a row in a cathedral plot. More... | |
CathedralWindowWidthMethod | cathedral_window_width_method_from_string (std::string const &method) |
Helper function to return a CathedralWindowWidthMethod from its textual representation. More... | |
std::string | cathedral_window_width_method_to_string (CathedralWindowWidthMethod method) |
Helper function to return a textual representation of the method . More... | |
template<class Record , class Accumulator > | |
void | compute_cathedral_matrix (CathedralPlotParameters const &parameters, Record &record, Accumulator accumulator=Accumulator{}) |
Template function to compute the value matrix for a cathedral plot, given a record with plot parameters and per-position data to accumulate per window. More... | |
void | compute_fst_cathedral_matrix (CathedralPlotParameters const &parameters, FstCathedralPlotRecord &record) |
Compute the matrix of values that represents the cathedral plot for FST. More... | |
std::vector< FstCathedralPlotRecord > | compute_fst_cathedral_records (VariantInputStream &iterator, FstPoolProcessor &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names=std::vector< std::string >{}, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict=nullptr) |
Compute the components of per-position FST data for all pairs of samples in the given processor , for the chromosomes in the given input iterator . More... | |
std::vector< FstCathedralPlotRecord > | compute_fst_cathedral_records_for_chromosome (VariantInputStream::Iterator &iterator, FstPoolProcessor &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names=std::vector< std::string >{}, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict=nullptr) |
Compute the components of per-position FST data for all pairs of samples in the given processor , for the current chromosome in the given input iterator . More... | |
std::pair< char, double > | consensus (SampleCounts const &sample) |
Consensus character for a SampleCounts, and its confidence. More... | |
std::pair< char, double > | consensus (SampleCounts const &sample, bool is_covered) |
Consensus character for a SampleCounts, and its confidence. More... | |
SampleCounts | convert_to_sample_counts (SimplePileupReader::Sample const &sample, unsigned char min_phred_score) |
Variant | convert_to_variant (SimplePileupReader::Record const &record, unsigned char min_phred_score) |
Variant | convert_to_variant_as_individuals (VcfRecord const &record, bool use_allelic_depth=false) |
Convert a VcfRecord to a Variant, treating each sample as an individual, and combining them all into one SampleCounts sample. More... | |
Variant | convert_to_variant_as_pool (VcfRecord const &record) |
Convert a VcfRecord to a Variant, treating each sample column as a pool of individuals. More... | |
void | convert_to_variant_as_pool_set_missing_gt_ (VcfRecord const &record, Variant &variant) |
Local helper function that sets the filter status of a Variant and its samples to missing depending on whether the genotypes of the samples are missing or not. More... | |
void | convert_to_variant_as_pool_tally_bases_ (VcfRecord const &record, std::pair< std::array< char, 6 >, size_t > const &snp_chars, VcfFormatIteratorInt const &sample_ad, SampleCounts &sample) |
Local helper function to tally up the bases from a VcfRecord into a SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
double | f_st_pool_karlsson (ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
double | f_st_pool_kofler (size_t p1_poolsize, size_t p2_poolsize, ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute the F_ST statistic for pool-sequenced data of Kofler et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
std::pair< double, double > | f_st_pool_unbiased (size_t p1_poolsize, size_t p2_poolsize, ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts. More... | |
double | f_star (double a_n, double n) |
Compute f* according to Achaz 2008 and Kofler et al. 2011. More... | |
void | fill_fst_cathedral_records_from_processor_ (FstPoolProcessor const &processor, std::vector< FstCathedralPlotRecord > &records, size_t position) |
genesis::utils::JsonDocument | fst_cathedral_plot_record_to_json_document (FstCathedralPlotRecord const &record) |
Get a user-readable description of the data of a FstCathedralPlotRecord as a JsonDocument. More... | |
std::vector< std::pair< std::string, std::string > > | fst_pool_processor_sample_names (FstPoolProcessor const &processor, std::vector< std::string > const &sample_names) |
Return a list of sample name pairs for each calculator in an FstPoolProcessor. More... | |
FstPoolCalculatorUnbiased::Estimator | fst_pool_unbiased_estimator_from_string (std::string const &str) |
std::string | fst_pool_unbiased_estimator_to_string (FstPoolCalculatorUnbiased::Estimator estimator) |
GenomeLocusSet | genome_locus_set_from_vcf_file (std::string const &file) |
Read a VCF file, and use its positions to create a GenomeLocusSet. More... | |
GenomeRegionList | genome_region_list_from_vcf_file (std::string const &file) |
Read a VCF file, and use its positions to create a GenomeRegionList. More... | |
void | genome_region_list_from_vcf_file (std::string const &file, GenomeRegionList &target) |
Read a VCF file, and add its positions to an existing GenomeRegionList. More... | |
SampleCounts::size_type | get_base_count (SampleCounts const &sample, char base) |
Get the count for a base given as a char. More... | |
std::pair< std::array< char, 6 >, size_t > | get_vcf_record_snp_ref_alt_chars_ (VcfRecord const &record) |
Local helper function that returns the REF and ALT chars of a VcfRecord for SNPs. More... | |
template<class D > | |
size_t | get_window_length (BaseWindow< D > const &window) |
Get the length of a given Window. More... | |
template<class D > | |
size_t | get_window_provided_loci_count (BaseWindow< D > const &window, std::shared_ptr< GenomeLocusSet > provided_loci) |
Get the count of provided loci in a window. More... | |
char | guess_alternative_base (Variant const &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the alternative base of a Variant. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, char ref_base, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference base. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, genesis::sequence::ReferenceGenome const &ref_genome, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference genome to obtain the base. More... | |
genesis::sequence::QualityEncoding | guess_pileup_quality_encoding (std::shared_ptr< utils::BaseInputSource > source, size_t max_lines=0) |
Guess the quality score encoding for an (m)pileup input, based on counts of how often each char appeared in the quality string (of the input pileup file for example). More... | |
char | guess_reference_base (Variant const &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference base of a Variant. More... | |
double | heterozygosity (SampleCounts const &sample, bool with_bessel=false) |
Compute classic heterozygosity. More... | |
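As an illustration of what "classic heterozygosity" usually denotes, here is a minimal sketch computing H = 1 - sum of squared nucleotide frequencies, with an optional Bessel correction of n / (n - 1). The formula, the handling of empty samples, and the name `heterozygosity_sketch` are assumptions for illustration; the listed function operates on SampleCounts and may differ in details.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// H = 1 - sum_i f_i^2, with f_i = count_i / total over the four nucleotides.
// Hypothetical stand-in; takes the ACGT counts directly.
double heterozygosity_sketch(
    std::array<std::size_t, 4> const& counts, bool with_bessel = false
) {
    std::size_t total = 0;
    for (std::size_t c : counts) {
        total += c;
    }
    if (total == 0) {
        return 0.0; // Assumption: an empty sample yields zero.
    }
    // Bessel's correction divides by (total - 1), so it needs two or more counts.
    assert(!with_bessel || total > 1);

    double const n = static_cast<double>(total);
    double sum_sq = 0.0;
    for (std::size_t c : counts) {
        double const f = static_cast<double>(c) / n;
        sum_sq += f * f;
    }
    double h = 1.0 - sum_sq;
    if (with_bessel) {
        h *= n / (n - 1.0);
    }
    return h;
}
```

For counts A=5, C=5, the uncorrected value is 0.5; a sample with a single allele gives 0.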
bool | is_covered (GenomeLocusSet const &loci, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given GenomeLocusSet . More... | |
template<class T > | |
bool | is_covered (GenomeLocusSet const &loci, T const &locus) |
Test whether the chromosome/position of a locus is within a given GenomeLocusSet . More... | |
bool | is_covered (GenomeLocusSet const &loci, VcfRecord const &variant) |
bool | is_covered (GenomeRegion const &region, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given genomic region. More... | |
template<class T > | |
bool | is_covered (GenomeRegion const &region, T const &locus) |
Test whether the chromosome/position of a locus is within a given genomic region. More... | |
bool | is_covered (GenomeRegion const &region, VcfRecord const &variant) |
bool | is_covered (GenomeRegionList const &regions, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given list of genomic regions. More... | |
template<class T > | |
bool | is_covered (GenomeRegionList const &regions, T const &locus) |
Test whether the chromosome/position of a locus is within a given list of genomic regions. More... | |
bool | is_covered (GenomeRegionList const &regions, VcfRecord const &variant) |
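The coverage test for a single region can be pictured with the sketch below. It assumes a closed interval, with both start and end positions included; `RegionSketch` and `is_covered_sketch` are hypothetical names, and the actual GenomeRegion type may carry additional semantics not shown in this listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for GenomeRegion: a chromosome and two positions.
struct RegionSketch {
    std::string chromosome;
    std::size_t start;
    std::size_t end;
};

// True if the chromosome matches and the position lies in [start, end].
// The closed interval is an assumption of this sketch.
bool is_covered_sketch(
    RegionSketch const& region, std::string const& chromosome, std::size_t position
) {
    assert(region.start <= region.end);
    return chromosome == region.chromosome
        && position >= region.start
        && position <= region.end;
}
```

A list-based variant would apply the same test to each region stored for the given chromosome.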
constexpr bool | is_valid_base (char c) |
Return whether a given base is in ACGT, case insensitive. More... | |
constexpr bool | is_valid_base_or_n (char c) |
Return whether a given base is in ACGTN, case insensitive. More... | |
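A case-insensitive base check of this kind fits in a single constexpr expression. The following is a sketch with hypothetical `_sketch` names, written so that it also compiles as C++11; it is not taken from the library source.

```cpp
// True for A, C, G, T in either case.
constexpr bool is_valid_base_sketch(char c) {
    return c == 'A' || c == 'C' || c == 'G' || c == 'T'
        || c == 'a' || c == 'c' || c == 'g' || c == 't';
}

// Additionally accepts N or n.
constexpr bool is_valid_base_or_n_sketch(char c) {
    return is_valid_base_sketch(c) || c == 'N' || c == 'n';
}

// Because the functions are constexpr, they can be checked at compile time.
static_assert(is_valid_base_sketch('g'), "lower case bases are accepted");
static_assert(!is_valid_base_sketch('N'), "N is not one of ACGT");
static_assert(is_valid_base_or_n_sketch('n'), "N is accepted in either case");
```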
std::pair< genesis::utils::JsonDocument, genesis::utils::Matrix< double > > | load_cathedral_plot_record_components_from_files (std::string const &base_path) |
Load the parts of a cathedral plot from a set of files. More... | |
CathedralPlotRecord | load_cathedral_plot_record_from_files (std::string const &base_path) |
Load the record of a cathedral plot from a set of files. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
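The three-way comparison underlying the locus_compare overloads can be sketched as follows. Two things are assumptions here: without a dictionary, chromosomes are ordered by plain string comparison, and with one, they are ordered by their index in it. A `std::vector<std::string>` serves as a hypothetical stand-in for SequenceDict, and `locus_compare_sketch` is not a library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compare two positions: -1 if l < r, 0 if equal, +1 if l > r.
int position_compare_sketch(std::size_t l_pos, std::size_t r_pos) {
    return (l_pos < r_pos) ? -1 : ((l_pos > r_pos) ? +1 : 0);
}

// Chromosome first (by name, an assumption), then position.
int locus_compare_sketch(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos
) {
    int const chr_cmp = l_chr.compare(r_chr);
    if (chr_cmp != 0) {
        return chr_cmp < 0 ? -1 : +1;
    }
    return position_compare_sketch(l_pos, r_pos);
}

// Chromosome first (by index in a given order), then position.
// The vector is a hypothetical stand-in for a SequenceDict.
int locus_compare_sketch(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos,
    std::vector<std::string> const& order
) {
    auto const l_it = std::find(order.begin(), order.end(), l_chr);
    auto const r_it = std::find(order.begin(), order.end(), r_chr);
    assert(l_it != order.end() && r_it != order.end());
    if (l_it != r_it) {
        return l_it < r_it ? -1 : +1;
    }
    return position_compare_sketch(l_pos, r_pos);
}
```

The difference matters for names such as chr2 and chr10: string comparison puts chr10 first, whereas an explicit order can keep chr2 before chr10. The equality and relational comparisons then reduce to testing the sign of the result.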
bool | locus_equal (GenomeLocus const &l, GenomeLocus const &r) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Equality comparison (== ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
bool | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Greater than comparison (> ) for two loci in a genome. More... | |
bool | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_inequal (GenomeLocus const &l, GenomeLocus const &r) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Inequality comparison (!= ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
bool | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Less than comparison (< ) for two loci in a genome. More... | |
bool | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (<) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (<) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
bool | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
bool | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
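The comparison overloads above all reduce to ordering two (chromosome, position) pairs. The following is a minimal, self-contained sketch of that idea, not the library's implementation. `SketchLocus` and the `sketch_*` functions are hypothetical names, lexicographic chromosome ordering is an assumption for the overload without a dictionary, and a plain vector of names stands in for a `SequenceDict`.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a genome locus; not the library's GenomeLocus.
struct SketchLocus {
    std::string chromosome;
    std::size_t position;
};

// Compare two positions: returns -1, 0, or +1.
inline int sketch_position_compare(std::size_t l, std::size_t r)
{
    if (l == r) {
        return 0;
    }
    return l < r ? -1 : 1;
}

// Three-way comparison of two loci: returns -1, 0, or +1.
// Assumption: chromosomes are ordered lexicographically by name.
inline int sketch_locus_compare(SketchLocus const& l, SketchLocus const& r)
{
    int const chr = l.chromosome.compare(r.chromosome);
    if (chr != 0) {
        return chr < 0 ? -1 : 1;
    }
    return sketch_position_compare(l.position, r.position);
}

// Same comparison, but chromosomes are ordered by their index in `order`,
// which stands in for a sequence dictionary (an assumption of this sketch).
inline int sketch_locus_compare(
    SketchLocus const& l, SketchLocus const& r, std::vector<std::string> const& order)
{
    auto const li = std::find(order.begin(), order.end(), l.chromosome);
    auto const ri = std::find(order.begin(), order.end(), r.chromosome);
    if (li == order.end() || ri == order.end()) {
        throw std::invalid_argument("chromosome not found in the given order");
    }
    if (li != ri) {
        return li < ri ? -1 : 1;
    }
    return sketch_position_compare(l.position, r.position);
}

// The boolean predicates follow directly from the three-way result.
inline bool sketch_locus_less(SketchLocus const& l, SketchLocus const& r)
{
    return sketch_locus_compare(l, r) < 0;
}

inline bool sketch_locus_less_or_equal(SketchLocus const& l, SketchLocus const& r)
{
    return sketch_locus_compare(l, r) <= 0;
}
```

With lexicographic ordering, chromosome "10" sorts before "2". Passing an explicit order avoids that, which is presumably why overloads taking a sequence dictionary exist.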
genesis::utils::Matrix< genesis::utils::Color > | make_cathedral_plot_heatmap (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters) |
Make a cathedral plot heat map as a color matrix. More... | |
genesis::utils::SvgDocument | make_cathedral_plot_svg (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes. More... | |
genesis::utils::SvgDocument | make_cathedral_plot_svg (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters, genesis::utils::Matrix< genesis::utils::Color > const &image) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
ChromosomeWindowStream< InputStreamIterator, DataType > | make_chromosome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator > | |
ChromosomeWindowStream< InputStreamIterator > | make_default_chromosome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, for a default use case. More... | |
template<class InputStreamIterator > | |
GenomeWindowStream< InputStreamIterator > | make_default_genome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a GenomeWindowStream for the whole genome, for a default use case. More... | |
template<class InputStreamIterator > | |
IntervalWindowStream< InputStreamIterator > | make_default_interval_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function to instantiate an IntervalWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
PositionWindowStream< InputStreamIterator > | make_default_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_position_window_view_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function that creates a PositionWindowStream with default functors and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
QueueWindowStream< InputStreamIterator > | make_default_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_queue_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t count, size_t stride=0) |
Helper function that creates a QueueWindowStream with default functors and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
RegionWindowStream< InputStreamIterator > | make_default_region_window_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function to instantiate a RegionWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_region_window_view_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function that creates a RegionWindowStream and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_sliding_interval_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function that creates an IntervalWindowStream and wraps it in a WindowViewStream. More... | |
DiversityPoolProcessor | make_diversity_pool_processor (WindowAveragePolicy window_average_policy, DiversityPoolSettings const &settings, std::vector< size_t > const &pool_sizes) |
Create a DiversityPoolProcessor to compute diversity for all samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (size_t index, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for one-to-all FST computation between one sample and all others. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (size_t index_1, size_t index_2, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for one-to-one FST computation between two samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for all-to-all computation of FST between all pairs of samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (std::vector< std::pair< size_t, size_t >> const &sample_pairs, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for computation of FST between specific pairs of samples. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
GenomeWindowStream< InputStreamIterator, DataType > | make_genome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a GenomeWindowStream for the whole genome, without the need to specify the template parameters manually. More... | |
template<class T , class R > | |
std::shared_ptr< T > | make_input_stream_with_sample_filter_ (std::string const &filename, R const &reader, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
Local helper function template that takes care of initializing an input stream and setting the sample filters, for those streams for which we do not know the number of samples prior to starting the file iteration. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
IntervalWindowStream< InputStreamIterator, DataType > | make_interval_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function to instantiate an IntervalWindowStream without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator > | |
PositionWindowStream< InputStreamIterator > | make_passing_variant_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_passing_variant_position_window_view_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function that creates a PositionWindowStream with default functions for Variant data, and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
QueueWindowStream< InputStreamIterator > | make_passing_variant_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_passing_variant_queue_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t count, size_t stride=0) |
Helper function that creates a QueueWindowStream with default functions for Variant data, and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
PositionWindowStream< InputStreamIterator, DataType > | make_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
QueueWindowStream< InputStreamIterator, DataType > | make_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
RegionWindowStream< InputStreamIterator, DataType > | make_region_window_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function to instantiate a RegionWindowStream without the need to specify the template parameters manually. More... | |
template<class GenomeMaskType > | |
std::function< void(Variant &)> | make_sample_counts_filter_by_region_tagging (std::vector< std::shared_ptr< GenomeMaskType >> const &sample_masks, SampleCountsFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream on a Variant to filter its SampleCounts by genome regions, by tagging non-covered positions with the given tag. More... | |
std::function< void(Variant &)> | make_sample_counts_filter_numerical_tagging (SampleCountsFilterNumericalParams const &params, bool all_need_pass=false) |
Return a functional to numerically filter the SampleCounts samples in a Variant, tagging the ones that do not pass the filters, and potentially tagging the Variant. More... | |
std::function< void(Variant &)> | make_sample_counts_filter_numerical_tagging (SampleCountsFilterNumericalParams const &params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
std::vector< bool > | make_sample_name_filter (std::vector< std::string > const &sample_names, std::vector< std::string > const &names_filter, bool inverse_filter=false) |
Create a filter for samples, indicating which to keep. More... | |
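As an illustration of what a per-sample keep/discard filter can look like, here is a small sketch that mirrors the parameter list of `make_sample_name_filter`. It is not the library's code. `sketch_sample_name_filter` is a hypothetical name, and the sketch silently ignores filter names that match no sample, which may differ from the real function.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: build one bool per sample, true meaning "keep".
// With inverse_filter set, the listed names are dropped instead of kept.
inline std::vector<bool> sketch_sample_name_filter(
    std::vector<std::string> const& sample_names,
    std::vector<std::string> const& names_filter,
    bool inverse_filter = false)
{
    // Put the filter names into a set for constant-time lookup.
    std::unordered_set<std::string> const listed(names_filter.begin(), names_filter.end());

    std::vector<bool> keep(sample_names.size(), false);
    for (std::size_t i = 0; i < sample_names.size(); ++i) {
        bool const is_listed = listed.count(sample_names[i]) > 0;
        // Keep listed samples normally, unlisted samples when inverted.
        keep[i] = (is_listed != inverse_filter);
    }
    return keep;
}
```

The resulting vector has the same shape as the `std::vector< bool > const &sample_filter` argument accepted by several of the input stream factory functions in this listing.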
std::vector< std::string > | make_sample_name_list_ (std::string const &source_name, size_t size) |
Local helper to fill the sample names of file formats without sample names. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (GenomeRegion const &region, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a genome region, by excluding non-covered positions from the stream. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (std::shared_ptr< GenomeLocusSet > loci, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (std::shared_ptr< GenomeRegionList > regions, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream. More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (GenomeRegion const &region, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a genome region, by tagging non-covered positions with the given tag. More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (std::shared_ptr< GenomeLocusSet > loci, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag. More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (std::shared_ptr< GenomeRegionList > regions, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag. More... | |
std::function< bool(Variant &)> | make_variant_filter_numerical_excluding (VariantFilterNumericalParams const &params) |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters. More... | |
std::function< bool(Variant &)> | make_variant_filter_numerical_excluding (VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (SampleCountsFilterNumericalParams const &sample_count_params, VariantFilterNumericalParams const &variant_params, bool all_need_pass=false) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (SampleCountsFilterNumericalParams const &sample_count_params, VariantFilterNumericalParams const &variant_params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (VariantFilterNumericalParams const &params) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::ReferenceGenome > ref_genome) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::ReferenceGenome > ref_genome, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::SequenceDict > seq_dict) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::SequenceDict > seq_dict, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_frequency_table_file (std::string const &filename, char separator_char='\t', FrequencyTableInputStream const &reader=FrequencyTableInputStream{}) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_frequency_table_file (std::string const &filename, std::vector< std::string > const &sample_names_filter, bool inverse_sample_names_filter=false, char separator_char='\t', FrequencyTableInputStream const &reader=FrequencyTableInputStream{}) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_individual_vcf_file (std::string const &filename, VariantInputStreamFromVcfParams const &params=VariantInputStreamFromVcfParams{}, bool use_allelic_depth=false) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as an individual, and combining them all into one SampleCounts sample. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, std::vector< bool > const &sample_filter, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices=false, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file_ (std::string const &filename, SimplePileupReader const &reader, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
Local helper function that takes care of the three functions below. More... | |
VariantInputStream | make_variant_input_stream_from_pool_vcf_file (std::string const &filename, VariantInputStreamFromVcfParams const &params=VariantInputStreamFromVcfParams{}) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as a pool of individuals. More... | |
VariantInputStream | make_variant_input_stream_from_sam_file (std::string const &filename, SamVariantInputStream const &reader=SamVariantInputStream{}) |
Create a VariantInputStream to iterate the contents of a SAM/BAM/CRAM file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename, std::vector< bool > const &sample_filter) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices=false) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file_ (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
VariantInputStream | make_variant_input_stream_from_variant_gapless_input_stream (VariantGaplessInputStream const &gapless_input) |
Create a VariantInputStream that wraps a VariantGaplessInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_variant_parallel_input_stream (VariantParallelInputStream const &parallel_input, VariantParallelInputStream::JoinedVariantParams const &joined_variant_params=VariantParallelInputStream::JoinedVariantParams{}) |
Create a VariantInputStream to iterate multiple input sources at once, using a VariantParallelInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_vcf_file_ (std::string const &filename, VariantInputStreamFromVcfParams const &params, bool pool_samples, bool use_allelic_depth) |
Local helper function that takes care of both main functions below. More... | |
VariantInputStream | make_variant_input_stream_from_vector (std::vector< Variant > const &variants) |
Create a VariantInputStream to iterate the contents of a std::vector containing Variants. More... | |
std::function< void(Variant &)> | make_variant_input_stream_sample_name_filter_transform (std::vector< bool > const &sample_filter) |
std::function< void(Variant &)> | make_variant_input_stream_sample_subsampling_transform (size_t max_depth, SubsamplingMethod method) |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_length_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict) |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_order_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict, bool check_sequence_lengths) |
VariantInputStream | make_variant_merging_input_stream (VariantInputStream const &input, std::unordered_map< std::string, std::string > const &sample_name_to_group, bool allow_ungrouped_samples=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Create a VariantInputStream that merges samples from its underlying input. More... | |
VariantMergeGroupAssignment | make_variant_merging_input_stream_group_assignment_ (VariantInputStream const &variant_input, std::unordered_map< std::string, std::string > const &sample_name_to_group, bool allow_ungrouped_samples) |
Helper function to create a mapping from sample indices to group indices. More... | |
template<class T > | |
WindowViewStream< typename T::InputStreamType, typename T::DataType > | make_window_view_stream (T &&window_iterator) |
Create a WindowViewStream that iterates some underlying BaseWindowStream. More... | |
template<class T > | |
WindowViewStream< typename T::InputStreamType, typename T::DataType > | make_window_view_stream (T const &window_iterator) |
Create a WindowViewStream that iterates some underlying BaseWindowStream. More... | |
SampleCounts | merge (SampleCounts const &p1, SampleCounts const &p2) |
Merge the counts of two SampleCounts instances. More... | |
SampleCounts | merge (std::vector< SampleCounts > const &p, SampleCountsFilterPolicy filter_policy) |
Merge the counts of a vector of SampleCounts instances. More... | |
void | merge_inplace (SampleCounts &p1, SampleCounts const &p2) |
Merge the counts of two SampleCounts instances, by adding the counts of the second (p2) to the first (p1). More... | |
SampleCounts | merge_sample_counts (Variant const &v, SampleCountsFilterPolicy filter_policy) |
Merge the counts of a vector of SampleCounts instances. More... | |
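Merging counts, as described for the functions above, amounts to summing each count per category. The sketch below shows this under simplifying assumptions. `SketchCounts` is a hypothetical six-field struct (A, C, G, T, N, D, as named elsewhere in this listing), and the filter policy argument of the real overloads is not modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-sample base counts; not the library's SampleCounts.
struct SketchCounts {
    std::size_t a, c, g, t, n, d;
};

// Add the counts of p2 onto p1, modifying p1.
inline void sketch_merge_inplace(SketchCounts& p1, SketchCounts const& p2)
{
    p1.a += p2.a;
    p1.c += p2.c;
    p1.g += p2.g;
    p1.t += p2.t;
    p1.n += p2.n;
    p1.d += p2.d;
}

// Return the sum of two count sets, leaving both inputs untouched.
inline SketchCounts sketch_merge(SketchCounts const& p1, SketchCounts const& p2)
{
    SketchCounts result = p1;
    sketch_merge_inplace(result, p2);
    return result;
}

// Sum a whole vector of count sets. Assumption: every entry is included,
// whereas the real overload takes a filter policy to select entries.
inline SketchCounts sketch_merge(std::vector<SketchCounts> const& samples)
{
    SketchCounts result = {0, 0, 0, 0, 0, 0};
    for (std::size_t i = 0; i < samples.size(); ++i) {
        sketch_merge_inplace(result, samples[i]);
    }
    return result;
}
```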
double | n_base (size_t read_depth, size_t poolsize) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, using a faster closed form expression. More... | |
double | n_base_matrix (size_t read_depth, size_t poolsize) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, following their approach. More... | |
template<typename T > | |
std::array< size_t, 4 > | nucleotide_sorting_order (std::array< T, 4 > const &values) |
Return the sorting order of four values, for instance of the four nucleotides ACGT, in descending order (largest first). More... | |
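A "sorting order" here means the indices of the values arranged so that the largest value comes first. The sketch below computes this with a stable index sort. It is an illustration only: `sketch_sorting_order` is a hypothetical name, and resolving ties by keeping the lower index first is an assumption of this sketch, not documented behavior of `nucleotide_sorting_order`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical sketch: return the indices 0..3 ordered by descending value.
template<typename T>
std::array<std::size_t, 4> sketch_sorting_order(std::array<T, 4> const& values)
{
    std::array<std::size_t, 4> order = {{0, 1, 2, 3}};
    // Sort indices rather than values; stable_sort keeps equal values
    // in their original index order.
    std::stable_sort(
        order.begin(), order.end(),
        [&values](std::size_t l, std::size_t r) { return values[l] > values[r]; }
    );
    return order;
}
```

For counts ordered as A, C, G, T, the first entry of the result is the index of the most frequent nucleotide, and the second entry is the index of the runner-up.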
constexpr size_t | nucleotide_sum (SampleCounts const &sample) |
Count of the pure nucleotide bases at this position, that is, the sum of all A, C, G, and T. More... | |
bool | operator!= (GenomeLocus const &l, GenomeLocus const &r) |
Inequality comparison (!=) for two loci in a genome. More... | |
bool | operator!= (GenomeRegion const &a, GenomeRegion const &b) |
Inequality comparison (!=) for two GenomeRegions. More... | |
bool | operator< (GenomeLocus const &l, GenomeLocus const &r) |
Less than comparison (<) for two loci in a genome. More... | |
std::ostream & | operator<< (std::ostream &os, GenomeLocus const &locus) |
std::ostream & | operator<< (std::ostream &os, GenomeRegion const &region) |
std::ostream & | operator<< (std::ostream &os, SampleCounts const &bs) |
Output stream operator for SampleCounts instances. More... | |
bool | operator<= (GenomeLocus const &l, GenomeLocus const &r) |
Less than or equal comparison (<=) for two loci in a genome. More... | |
bool | operator== (GenomeLocus const &l, GenomeLocus const &r) |
Equality comparison (==) for two loci in a genome. More... | |
bool | operator== (GenomeRegion const &a, GenomeRegion const &b) |
Equality comparison (==) for two GenomeRegions. More... | |
bool | operator> (GenomeLocus const &l, GenomeLocus const &r) |
Greater than comparison (>) for two loci in a genome. More... | |
bool | operator>= (GenomeLocus const &l, GenomeLocus const &r) |
Greater than or equal comparison (>=) for two loci in a genome. More... | |
GenomeRegion | parse_genome_region (std::string const &region, bool zero_based=false, bool end_exclusive=false) |
Parse a genomic region. More... | |
GenomeRegionList | parse_genome_regions (std::string const &regions, bool zero_based=false, bool end_exclusive=false) |
Parse a set/list of genomic regions. More... | |
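The `zero_based` and `end_exclusive` flags suggest that the parser normalizes different coordinate conventions. The sketch below shows one way such a conversion could work, under several assumptions: only the `chromosome:start-end` form is handled, the result is a 1-based closed interval, and `SketchRegion` and `sketch_parse_region` are hypothetical names. The syntax actually accepted by `parse_genome_region` is not shown in this listing and may be broader.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a genome region; not the library's GenomeRegion.
struct SketchRegion {
    std::string chromosome;
    std::size_t start; // 1-based, inclusive (assumption of this sketch)
    std::size_t end;   // 1-based, inclusive (assumption of this sketch)
};

// Parse "chromosome:start-end" and normalize to a 1-based closed interval.
inline SketchRegion sketch_parse_region(
    std::string const& text, bool zero_based = false, bool end_exclusive = false)
{
    std::string::size_type const colon = text.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("expected chromosome:start-end");
    }
    // Search for the dash only after the colon, so that chromosome
    // names containing a dash are still handled.
    std::string::size_type const dash = text.find('-', colon + 1);
    if (dash == std::string::npos) {
        throw std::invalid_argument("expected chromosome:start-end");
    }

    SketchRegion region;
    region.chromosome = text.substr(0, colon);
    // std::stoul throws on non-numeric input, but accepts trailing characters.
    region.start = std::stoul(text.substr(colon + 1, dash - colon - 1));
    region.end = std::stoul(text.substr(dash + 1));

    // Shift zero-based coordinates to one-based ones.
    if (zero_based) {
        ++region.start;
        ++region.end;
    }
    // Turn an exclusive end into an inclusive one.
    if (end_exclusive) {
        if (region.end == 0) {
            throw std::invalid_argument("exclusive end must be positive");
        }
        --region.end;
    }
    if (region.start == 0 || region.start > region.end) {
        throw std::invalid_argument("invalid interval");
    }
    return region;
}
```

With both flags set, a BED-style interval such as `chr1:0-10` becomes positions 1 to 10 inclusive in this sketch.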
genesis::utils::Matrix< double > | pij_matrix_ (size_t max_read_depth, size_t poolsize) |
genesis::utils::Matrix< double > const & | pij_matrix_resolver_ (size_t max_read_depth, size_t poolsize) |
std::vector< FstCathedralPlotRecord > | prepare_fst_cathedral_records_for_chromosome_ (std::string const &chromosome, FstPoolProcessor const &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names) |
std::string | print_sample_counts_filter_category_stats (SampleCountsFilterCategoryStats const &stats, bool verbose) |
std::string | print_sample_counts_filter_category_stats (SampleCountsFilterStats const &stats, bool verbose=false) |
std::ostream & | print_sample_counts_filter_category_stats (std::ostream &os, SampleCountsFilterCategoryStats const &stats, bool verbose) |
std::ostream & | print_sample_counts_filter_category_stats (std::ostream &os, SampleCountsFilterStats const &stats, bool verbose=false) |
std::string | print_sample_counts_filter_stats (SampleCountsFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::ostream & | print_sample_counts_filter_stats (std::ostream &os, SampleCountsFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::ostream & | print_variant_filter_category_stats (std::ostream &os, VariantFilterCategoryStats const &stats, bool verbose) |
std::ostream & | print_variant_filter_category_stats (std::ostream &os, VariantFilterStats const &stats, bool verbose=false) |
std::string | print_variant_filter_category_stats (VariantFilterCategoryStats const &stats, bool verbose) |
std::string | print_variant_filter_category_stats (VariantFilterStats const &stats, bool verbose=false) |
std::ostream & | print_variant_filter_stats (std::ostream &os, VariantFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::string | print_variant_filter_stats (VariantFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
GenomeLocusSet | read_mask_fasta (std::shared_ptr< utils::BaseInputSource > source, size_t mask_min=0, bool invert=false) |
Read an input source as a mask fasta file, and return its content as a GenomeLocusSet. More... | |
genesis::sequence::SequenceDict | reference_locus_set_to_dict (GenomeLocusSet const &set) |
void | resample_counts (SampleCounts &sample, size_t target_depth) |
Resample all counts in a SampleCounts sample to a new target_depth. More... | |
void | resample_counts (Variant &variant, size_t target_depth) |
Resample all counts in a SampleCounts sample to a new target_depth. More... | |
template<typename Distribution > | |
void | resample_counts_ (SampleCounts &sample, size_t max_depth, Distribution distribution, bool skip_if_below_target_depth) |
Local helper function to avoid code duplication. Takes the distribution (with or without replacement) and performs the resampling of base counts. More... | |
void | rescale_counts (SampleCounts &sample, size_t target_depth) |
Transform a SampleCounts sample by re-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max if max_depth is exceeded for the sample. More... | |
void | rescale_counts (Variant &variant, size_t target_depth) |
Transform a SampleCounts sample by re-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max if max_depth is exceeded for the sample. More... | |
void | rescale_counts_ (SampleCounts &sample, size_t target_depth, bool skip_if_below_target_depth) |
template<class Data , class Accumulator = EmptyAccumulator> | |
void | run_vcf_window (SlidingWindowGenerator< Data, Accumulator > &generator, std::string const &vcf_file, std::function< Data(VcfRecord const &)> conversion, std::function< bool(VcfRecord const &)> condition={}) |
Convenience function to iterate over a whole VCF file. More... | |
std::string | sam_flag_to_string (int flags) |
Turn a set of flags for sam/bam/cram reads into their textual representation. More... | |
SampleCountsFilterCategoryStats | sample_counts_filter_stats_category_counts (SampleCountsFilterStats const &stats) |
Generate summary counts for a SampleCountsFilterStats counter. More... | |
size_t | sample_counts_filter_stats_category_counts (SampleCountsFilterStats const &stats, SampleCountsFilterTagCategory category) |
Overload that only reports back a single category sum of the filter stats. More... | |
SampleCountsFilterTagCategory | sample_counts_filter_tag_to_category (SampleCountsFilterTag tag) |
For a given tag , return its category tag. More... | |
template<typename T > | |
std::array< size_t, 6 > | sample_counts_sorting_order (std::array< T, 6 > const &v) |
Return the sorting order of six values, for instance of the four nucleotides ACGT and the N and D counts of a SampleCounts object, in descending order (largest first). More... | |
constexpr size_t | sample_counts_sum (SampleCounts const &sample) |
Sum up all the base counts at this sample, that is, the sum of all A, C, G, T, as well as the N and D count for undetermined and deleted counts. More... | |
void | save_cathedral_plot_record_to_files (CathedralPlotRecord const &record, std::string const &base_path) |
Convenience function to save the record of a cathedral plot in a set of files. More... | |
void | save_cathedral_plot_record_to_files (genesis::utils::JsonDocument const &record_document, genesis::utils::Matrix< double > const &record_value_matrix, std::string const &base_path) |
Save the record of a cathedral plot in a set of files. More... | |
void | save_cathedral_plot_record_to_targets (genesis::utils::JsonDocument const &record_document, genesis::utils::Matrix< double > const &record_value_matrix, std::shared_ptr< genesis::utils::BaseOutputTarget > json_target, std::shared_ptr< genesis::utils::BaseOutputTarget > csv_target) |
Save the record of a cathedral plot in a set of output targets. More... | |
void | set_base_count (SampleCounts &sample, char base, SampleCounts::size_type value) |
Set the count for a base given as a char. More... | |
template<> | |
void | SimplePileupReader::process_ancestral_base_< SimplePileupReader::Sample > (utils::InputStream &input_stream, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::process_quality_string_< SimplePileupReader::Sample > (utils::InputStream &input_stream, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_sample_read_bases_< SimplePileupReader::Sample > (std::string const &read_bases, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_sample_read_depth_< SimplePileupReader::Sample > (size_t read_depth, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_target_alternative_base_< SimplePileupReader::Record > (SimplePileupReader::Record &target) const |
std::pair< SortedSampleCounts, SortedSampleCounts > | sorted_average_sample_counts (SampleCounts const &sample_a, SampleCounts const &sample_b) |
Return the sorted base counts of both input samples, ordered by the average frequencies of the nucleotide counts in the two samples. More... | |
SortedSampleCounts | sorted_sample_counts (SampleCounts const &sample) |
Return the order of base counts (nucleotides), largest one first. More... | |
SortedSampleCounts | sorted_sample_counts (Variant const &variant, bool reference_first, SampleCountsFilterPolicy filter_policy) |
Get a list of bases sorted by their counts. More... | |
SortedSampleCounts | sorted_sample_counts_ (Variant const &variant, bool reference_first, SampleCounts const &total) |
Local helper function that takes an already computed total from merge_sample_counts(), so that it can be re-used internally here. More... | |
int | string_to_sam_flag (std::string const &value) |
Parse a string as a set of flags for sam/bam/cram reads. More... | |
void | subsample_counts_with_replacement (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A, C, G, T, as well as N and D) with replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_with_replacement (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A, C, G, T, as well as N and D) with replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_without_replacement (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A, C, G, T, as well as N and D) without replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_without_replacement (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A, C, G, T, as well as N and D) without replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subscale_counts (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by sub-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subscale_counts (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by sub-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max_depth if max_depth is exceeded for the sample. More... | |
double | tajima_d_pool (DiversityPoolSettings const &settings, double theta_pi, double theta_watterson, size_t poolsize, double window_avg_denom, size_t empirical_min_read_depth) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | tajima_d_pool (DiversityPoolSettings const &settings, double theta_pi, double theta_watterson, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | tajima_d_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
double | tajima_d_pool_denominator (DiversityPoolSettings const &settings, double theta, size_t poolsize, double window_avg_denom, size_t empirical_min_read_depth) |
Compute the denominator for the pool-sequencing correction of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | theta_pi (ForwardIterator begin, ForwardIterator end, bool with_bessel=true, bool only_passing_samples=true) |
Compute classic theta pi, that is, the sum of heterozygosities. More... | |
template<class ForwardIterator > | |
double | theta_pi_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute theta pi with pool-sequencing correction according to Kofler et al, that is, the sum of heterozygosities divided by the correction denominator. More... | |
double | theta_pi_pool (DiversityPoolSettings const &settings, size_t poolsize, SampleCounts const &sample) |
Compute theta pi with pool-sequencing correction according to Kofler et al, for a single SampleCounts. More... | |
double | theta_pi_pool_denominator (DiversityPoolSettings const &settings, size_t poolsize, size_t nucleotide_count) |
Compute the denominator for the pool-sequencing correction of theta pi according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | theta_pi_within_pool (size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute classic theta pi (within a population), that is, the sum of heterozygosities including Bessel's correction for total nucleotide sum at each position, and Bessel's correction for the pool size. More... | |
template<class ForwardIterator > | |
double | theta_watterson_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute theta watterson with pool-sequencing correction according to Kofler et al. More... | |
double | theta_watterson_pool (DiversityPoolSettings const &settings, size_t poolsize, SampleCounts const &sample) |
Compute theta watterson with pool-sequencing correction according to Kofler et al, for a single SampleCounts sample. More... | |
double | theta_watterson_pool_denominator (DiversityPoolSettings const &settings, size_t poolsize, size_t nucleotide_count) |
Compute the denominator for the pool-sequencing correction of theta watterson according to Kofler et al. More... | |
std::string | to_string (GenomeLocus const &locus) |
std::string | to_string (GenomeRegion const &region) |
std::ostream & | to_sync (SampleCounts const &bs, std::ostream &os, bool use_status_and_missing=true) |
Output a SampleCounts instance to a stream in the PoPoolation2 sync format. More... | |
std::ostream & | to_sync (Variant const &var, std::ostream &os, bool use_status_and_missing=true) |
Output a Variant instance to a stream in the PoPoolation2 sync format. More... | |
size_t | total_nucleotide_sum (Variant const &variant, SampleCountsFilterPolicy filter_policy) |
Count of the pure nucleotide bases at this position, that is, the sum of all A , C , G , and T . More... | |
size_t | total_sample_counts_sum (Variant const &variant, SampleCountsFilterPolicy filter_policy) |
Sum up all the base counts at this sample, that is, the sum of all A, C, G, T, as well as the N and D count for undetermined and deleted counts. More... | |
void | transform_zero_out_by_max_count (SampleCounts &sample, size_t max_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if max_count is exceeded for that nucleotide. More... | |
void | transform_zero_out_by_max_count (Variant &variant, size_t max_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if max_count is exceeded for that nucleotide. More... | |
void | transform_zero_out_by_min_count (SampleCounts &sample, size_t min_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if min_count is not reached for that nucleotide. More... | |
void | transform_zero_out_by_min_count (Variant &variant, size_t min_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if min_count is not reached for that nucleotide. More... | |
void | validate_cathedral_plot_record (CathedralPlotRecord const &record) |
Check a Cathedral Plot record for internal consistency. More... | |
VariantFilterCategoryStats | variant_filter_stats_category_counts (VariantFilterStats const &stats) |
Generate summary counts for a VariantFilterStats counter. More... | |
size_t | variant_filter_stats_category_counts (VariantFilterStats const &stats, VariantFilterTagCategory category) |
Overload that only reports back a single category sum of the filter stats. More... | |
VariantFilterTagCategory | variant_filter_tag_to_category (VariantFilterTag tag) |
For a given tag , return its category tag. More... | |
std::string | vcf_genotype_string (std::vector< VcfGenotype > const &genotypes) |
Return the VCF-like string representation of a set of VcfGenotype entries. More... | |
size_t | vcf_genotype_sum (std::vector< VcfGenotype > const &genotypes) |
Return the sum of genotypes for a set of VcfGenotype entries, typically used to construct a genotype matrix with entries 0,1,2. More... | |
std::string | vcf_hl_type_to_string (int hl_type) |
Internal helper function to convert htslib-internal BCF_HL_* header line type values to their string representation as used in the VCF header ("FILTER", "INFO", "FORMAT", etc). More... | |
std::string | vcf_value_special_to_string (int vl_type_num) |
std::string | vcf_value_special_to_string (VcfValueSpecial vl_type_num) |
std::string | vcf_value_type_to_string (int ht_type) |
std::string | vcf_value_type_to_string (VcfValueType ht_type) |
template<class D > | |
double | window_average_denominator (WindowAveragePolicy policy, BaseWindow< D > const &window, std::shared_ptr< GenomeLocusSet > provided_loci, VariantFilterStats const &variant_filter_stats, SampleCountsFilterStats const &sample_counts_filter_stats) |
Get the denominator to use for averaging an estimator across a window. More... | |
Enumerations | |
enum | CathedralWindowWidthMethod { kExponential, kGeometric, kLinear } |
Interpolation algorithm for window sizes across the rows of a cathedral plot. More... | |
enum | SampleCountsFilterPolicy { kAll, kOnlyPassing } |
Policy helper to decide how to treat filtered SampleCounts. More... | |
enum | SampleCountsFilterTag : FilterStatus::IntType { kPassed = 0, kMaskedPosition, kMaskedRegion, kMissing, kNotPassed, kInvalid, kEmpty, kBelowMinReadDepth, kAboveMaxReadDepth, kAboveDeletionsCountLimit, kNotSnp, kNotBiallelicSnp, kEnd } |
enum | SampleCountsFilterTagCategory : FilterStatus::IntType { kPassed = 0, kMasked, kMissingInvalid, kNumeric, kEnd } |
List of filter categories for a SampleCounts. More... | |
enum | SlidingWindowType { kInterval, kVariants, kChromosome } |
SlidingWindowType of a Window, that is, whether we slide along a fixed size interval of the genome, slide along a fixed number of variants, or represent a whole chromosome. More... | |
enum | SubsamplingMethod { kSubscale, kSubsampleWithReplacement, kSubsampleWithoutReplacement } |
Select which method to use for reducing the max read depth of a SampleCounts sample or a Variant. More... | |
enum | TajimaDenominatorPolicy { kEmpiricalMinReadDepth, kProvidedMinReadDepth, kWithPopoolationBugs, kPoolsize, kUncorrected } |
Select how to compute the denominator for the pool sequencing correction of Tajima's D. More... | |
enum | VariantFilterTag : FilterStatus::IntType { kPassed = 0, kMaskedPosition, kMaskedRegion, kMissing, kNotPassed, kInvalid, kNoSamplePassed, kNotAllSamplesPassed, kEmpty, kBelowMinReadDepth, kAboveMaxReadDepth, kAboveDeletionsCountLimit, kNotSnp, kNotBiallelicSnp, kBelowSnpMinCount, kAboveSnpMaxCount, kBelowMinAlleleFreq, kEnd } |
List of filters that we apply to a Variant, to indicate whether the Variant passed or not. More... | |
enum | VariantFilterTagCategory : FilterStatus::IntType { kPassed = 0, kMasked, kMissingInvalid, kSamplesFailed, kNumeric, kInvariant, kEnd } |
List of filter categories for a Variant. More... | |
enum | VcfHeaderLine : int { kFilter = 0, kInfo = 1, kFormat = 2, kContig = 3, kStructured = 4, kGeneric = 5 } |
Specification for the values determining header line types of VCF/BCF files. More... | |
enum | VcfValueSpecial : int { kFixed = 0, kVariable = 1, kAllele = 2, kGenotype = 3, kReference = 4 } |
Specification for special markers for the number of values expected for key-value-pairs of VCF/BCF files. More... | |
enum | VcfValueType : int { kFlag = 0, kInteger = 1, kFloat = 2, kString = 3 } |
Specification for the data type of the values expected in key-value-pairs of VCF/BCF files. More... | |
enum | WindowAnchorType { kIntervalBegin, kIntervalEnd, kIntervalMidpoint, kVariantFirst, kVariantLast, kVariantMedian, kVariantMean, kVariantMidpoint } |
Position in the genome that is used for reporting when emitting or using a window. More... | |
enum | WindowAveragePolicy { kWindowLength, kAvailableLoci, kValidLoci, kValidSnps, kSum, kProvidedLoci } |
Select the method to use for computing window averages of statistic estimators. More... | |
Variables | |
std::function< void(Variant &)> | make_variant_input_stream_sample_name_filter_transform (std::vector< bool > const &sample_filter) |
Helper function to create a Variant transform to filter out samples. More... | |
std::function< void(Variant &)> | make_variant_input_stream_sample_subsampling_transform (size_t max_depth, SubsamplingMethod method=SubsamplingMethod::kSubscale) |
Create a Variant transformation function that subscales or subsamples the base counts to be below a given max_depth . More... | |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_length_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict) |
Helper function to check that some Variant input has positions that agree with those reported in a SequenceDict. More... | |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_order_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict={}, bool check_sequence_lengths=true) |
Helper function to check that some Variant input is sorted properly. More... | |
static const std::unordered_map< std::string, int > | sam_flag_name_to_int_ |
Map from sam flags to their numerical value, for different types of naming of the flags. More... | |
double a_n | ( | double | n | ) |
Compute a_n, the sum of reciprocals.
This is the sum of reciprocals up to n-1, which is \( a_n = \sum_{i=1}^{n-1} \frac{1}{i} \).
See Equation 3.6 in
Hahn, M. W. (2018). Molecular Population Genetics. https://global.oup.com/academic/product/molecular-population-genetics-9780878939657
for details.
Note that we are implementing this for double n, instead of an unsigned integer type, as some variants of the tajima_d() computation actually use n_base() to get an "effective" pool size. That is kind of wrong, but we have implemented it here for comparability with PoPoolation. In these cases, we round n to the nearest integer first. For any actual integer numbers of pool sizes, double has enough precision to accurately store that integer value, so there is no loss of accuracy in those cases.
Definition at line 285 of file diversity_pool_functions.cpp.
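The summation and rounding described above can be sketched as follows. This is a minimal stand-alone illustration, not the code in diversity_pool_functions.cpp; the name a_n_sketch and the choice to return 0 for arguments below 1 are assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch: a_n = sum_{i=1}^{n-1} 1/i for a pool size given as double.
double a_n_sketch(double n)
{
    // Assumption of this sketch: arguments below 1 (or NaN) yield an empty sum.
    if (!(n >= 1.0)) {
        return 0.0;
    }

    // Round to the nearest integer first, as described for "effective" pool sizes.
    auto const limit = static_cast<std::size_t>(std::llround(n));

    // Sum the reciprocals 1/1 + 1/2 + ... + 1/(limit-1).
    double sum = 0.0;
    for (std::size_t i = 1; i < limit; ++i) {
        sum += 1.0 / static_cast<double>(i);
    }
    return sum;
}
```

With this sketch, a non-integer argument such as 3.6 is treated like 4, so both give 1 + 1/2 + 1/3.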
bool genesis::population::all_finite_ | ( | FstCathedralPlotRecord::Entry const & | entry | ) |
Definition at line 49 of file fst_cathedral.cpp.
size_t allele_count | ( | SampleCounts const & | sample | ) |
Return the number of alleles, that is, of non-zero nucleotide counts of the sample.
This looks at all four nucleotide counts (ACGT), and returns the number of them that are non-zero. The result hence is between 0 and 4, with 0 = no allele had any counts and 4 = all alleles have a non-zero count.
Definition at line 309 of file population/function/functions.cpp.
size_t allele_count | ( | SampleCounts const & | sample, |
size_t | min_count | ||
) |
Return the number of alleles, taking a min_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least the min_count.
This looks at all four nucleotide counts (ACGT), and returns the number of them that are at least the min_count. If min_count == 0, we instead call the allele_count(SampleCounts const&) overload of this function that does not consider minimum counts.
Definition at line 329 of file population/function/functions.cpp.
size_t allele_count | ( | SampleCounts const & | sample, |
size_t | min_count, | ||
size_t | max_count | ||
) |
Return the number of alleles, taking a min_count and max_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least min_count and at most max_count.
This looks at all four nucleotide counts (ACGT), and returns the number of them that are at least the min_count and at most max_count. If either bound is zero, that bound is not taken into account.
Definition at line 351 of file population/function/functions.cpp.
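The three overloads above can be illustrated with a small stand-alone sketch. The type NucleotideCounts and the function name allele_count_sketch are hypothetical stand-ins for SampleCounts and allele_count(). The sketch also assumes that a count of zero is never reported as an allele, which is how the min_count == 0 fallback above reads.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the A, C, G, T counts of a SampleCounts object.
using NucleotideCounts = std::array<std::size_t, 4>;

// Number of nucleotides with a non-zero count (result is between 0 and 4).
std::size_t allele_count_sketch(NucleotideCounts const& counts)
{
    std::size_t result = 0;
    for (auto const count : counts) {
        result += (count > 0) ? 1 : 0;
    }
    return result;
}

// Number of nucleotides whose count lies within [min_count, max_count].
// A bound of zero is ignored; zero counts are never reported (assumption).
std::size_t allele_count_sketch(
    NucleotideCounts const& counts, std::size_t min_count, std::size_t max_count
) {
    std::size_t result = 0;
    for (auto const count : counts) {
        bool const present = count > 0;
        bool const min_ok = (min_count == 0) || (count >= min_count);
        bool const max_ok = (max_count == 0) || (count <= max_count);
        result += (present && min_ok && max_ok) ? 1 : 0;
    }
    return result;
}
```

For counts of 10, 0, 2, and 5, the plain overload gives 3, a minimum of 3 gives 2, and the range 3 to 8 gives 1.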
double alpha_star | ( | double | n | ) |
Compute alpha* according to Achaz 2008 and Kofler et al. 2011.
This is needed for the computation of tajima_d_pool() according to
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
The equation is based on
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
See there for details.
Definition at line 330 of file diversity_pool_functions.cpp.
double genesis::population::amnm_ | ( | size_t | poolsize, |
size_t | nucleotide_count, | ||
size_t | allele_frequency | ||
) |
Local helper function to compute values for the denominator.
This computes the sum over all r poolsizes of 1/r times a binomial:
\( \sum_{m=b}^{C-b} \frac{1}{k} {C \choose m} \left(\frac{k}{n}\right)^m \left(\frac{n-k}{n}\right)^{C-m} \)
This is needed in the pool seq correction denominators of Theta Pi and Theta Watterson.
Definition at line 65 of file diversity_pool_functions.cpp.
size_t genesis::population::anchor_position | ( | BaseWindow< D > const & | window, |
WindowAnchorType | anchor_type = WindowAnchorType::kIntervalBegin |
||
) |
Get the position in the chromosome reported according to a specific WindowAnchorType.
This overload accepts both Window and WindowView, and dispatches as needed. For WindowView, only interval-based anchor types are available. Furthermore, Window has an additional template parameter A, which we need to ignore here to fit the BaseWindow signature. Hence, when using a Window with a non-defaulted A template parameter, the dispatch cannot be done with this function.
Definition at line 157 of file population/window/functions.hpp.
size_t genesis::population::anchor_position | ( | Window< D, A > const & | window, |
WindowAnchorType | anchor_type = WindowAnchorType::kIntervalBegin |
||
) |
Get the position in the chromosome reported according to a specific WindowAnchorType.
When a window is filled with data, we need to report the position in the genome at which the window is. There are several ways that this position can be computed. Typically, just the first position of the window is used (that is, for an interval, the beginning of the interval, and for variants, the position of the first variant).
However, it might be desirable to report a different position, for example when plotting the results. When using WindowType::kVariants for example, one might want to plot the values computed per window at the midpoint genome position of the variants in that window.
Definition at line 82 of file population/window/functions.hpp.
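How an anchor type selects the reported position can be sketched as below. WindowSketch, AnchorSketch, and anchor_position_sketch are hypothetical names covering only a subset of the WindowAnchorType values. The integer midpoint formula is an assumption of this example and is not taken from functions.hpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical subset of anchor types, for illustration only.
enum class AnchorSketch {
    kIntervalBegin, kIntervalEnd, kIntervalMidpoint, kVariantFirst, kVariantLast
};

// Hypothetical minimal window: an interval plus the sorted variant positions in it.
struct WindowSketch {
    std::size_t first_position = 0;
    std::size_t last_position = 0;
    std::vector<std::size_t> variant_positions;
};

std::size_t anchor_position_sketch(
    WindowSketch const& window, AnchorSketch anchor = AnchorSketch::kIntervalBegin
) {
    switch (anchor) {
        case AnchorSketch::kIntervalBegin:
            return window.first_position;
        case AnchorSketch::kIntervalEnd:
            return window.last_position;
        case AnchorSketch::kIntervalMidpoint:
            // Integer midpoint, rounded down (assumption of this sketch).
            return window.first_position
                + (window.last_position - window.first_position) / 2;
        case AnchorSketch::kVariantFirst:
        case AnchorSketch::kVariantLast:
            // Variant-based anchors need at least one variant in the window.
            if (window.variant_positions.empty()) {
                throw std::runtime_error("Variant anchor on a window without variants");
            }
            return anchor == AnchorSketch::kVariantFirst
                ? window.variant_positions.front()
                : window.variant_positions.back();
    }
    throw std::invalid_argument("Unknown anchor type");
}
```

A plotting routine would typically pick the midpoint anchor, while tabular output often uses the default interval begin.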
bool apply_sample_counts_filter_numerical | ( | SampleCounts & | sample, |
SampleCountsFilterNumericalParams const & | params | ||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
The function applies the filter using the given params settings. If a filter fails, the function sets SampleCounts::status to the corresponding SampleCountsFilterTag value and increments the stats counter for that filter; only the first failing filter is recorded. It returns false if any filter failed, and true if all passed.
This overload simply omits the incrementing of the SampleCountsFilterStats counter.
Definition at line 217 of file sample_counts_filter_numerical.cpp.
bool apply_sample_counts_filter_numerical | ( | SampleCounts & | sample, |
SampleCountsFilterNumericalParams const & | params, | ||
SampleCountsFilterStats & | stats | ||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
The function applies the filter using the given params settings. If a filter fails, the function sets SampleCounts::status to the corresponding SampleCountsFilterTag value and increments the stats counter for that filter; only the first failing filter is recorded. It returns false if any filter failed, and true if all passed.
Definition at line 115 of file sample_counts_filter_numerical.cpp.
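The "first failing filter wins" behaviour can be sketched in isolation. Everything here is hypothetical: TagSketch mirrors only two of the SampleCountsFilterTag values, ParamsSketch has only two read depth settings with 0 meaning "disabled", and the order in which the checks run is an assumption of this example.

```cpp
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical subset of filter tags.
enum class TagSketch : std::size_t {
    kPassed = 0, kBelowMinReadDepth, kAboveMaxReadDepth, kEnd
};

// Hypothetical sample: A, C, G, T counts plus a filter status.
struct SampleSketch {
    std::array<std::size_t, 4> counts{};
    TagSketch status = TagSketch::kPassed;
};

// Hypothetical settings; a value of 0 disables the respective filter.
struct ParamsSketch {
    std::size_t min_read_depth = 0;
    std::size_t max_read_depth = 0;
};

// One counter per tag.
using StatsSketch = std::array<std::size_t, static_cast<std::size_t>(TagSketch::kEnd)>;

bool apply_filter_sketch(
    SampleSketch& sample, ParamsSketch const& params, StatsSketch& stats
) {
    auto const depth = std::accumulate(
        sample.counts.begin(), sample.counts.end(), std::size_t{0}
    );

    // Record the failing tag in both the sample status and the stats counter.
    auto fail = [&](TagSketch tag) {
        sample.status = tag;
        ++stats[static_cast<std::size_t>(tag)];
        return false;
    };

    // Checks run in a fixed order; the first failure ends the function.
    if (params.min_read_depth > 0 && depth < params.min_read_depth) {
        return fail(TagSketch::kBelowMinReadDepth);
    }
    if (params.max_read_depth > 0 && depth > params.max_read_depth) {
        return fail(TagSketch::kAboveMaxReadDepth);
    }
    return true; // Only failures are counted in this sketch.
}
```

The overload without a stats argument would correspond to the same logic with the counter increment left out.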
bool apply_sample_counts_filter_numerical | ( | Variant & | variant, |
SampleCountsFilterNumericalParams const & | params, | ||
bool | all_need_pass = false |
||
) |
This overload simply omits the incrementing of the SampleCountsFilterStats counter.
Definition at line 277 of file sample_counts_filter_numerical.cpp.
bool apply_sample_counts_filter_numerical | ( | Variant & | variant, |
SampleCountsFilterNumericalParams const & | params, | ||
VariantFilterStats & | variant_stats, | ||
SampleCountsFilterStats & | sample_count_stats, | ||
bool | all_need_pass = false |
||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
This function applies the version of this function for SampleCounts to all Variant::samples. If all_need_pass is set, the function returns true iff all individual samples passed all filters; otherwise it returns false and sets the Variant::status to VariantFilterTag::kNotAllSamplesPassed. If all_need_pass is not set, the function returns true if any sample passed the filters. In either case, all samples of the variant are always processed (no short-circuit, as we want all of them to have the count transformations applied to them). If all of them fail the filter settings, the Variant::status is set to VariantFilterTag::kNoSamplePassed, independently of all_need_pass.
Definition at line 229 of file sample_counts_filter_numerical.cpp.
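The return value and status logic for a whole Variant can be sketched as follows. VariantSketch, VariantTagSketch, and apply_to_all_samples_sketch are hypothetical. Each sample is reduced to a single depth value, and the per-sample filter is passed in as a predicate so that the example stays self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical subset of variant filter tags.
enum class VariantTagSketch { kPassed, kNoSamplePassed, kNotAllSamplesPassed };

// Hypothetical variant: one depth value per sample stands in for SampleCounts.
struct VariantSketch {
    std::vector<std::size_t> sample_depths;
    VariantTagSketch status = VariantTagSketch::kPassed;
};

bool apply_to_all_samples_sketch(
    VariantSketch& variant,
    std::function<bool(std::size_t)> const& sample_passes,
    bool all_need_pass = false
) {
    // Evaluate every sample; there is deliberately no early exit.
    std::size_t passed = 0;
    for (auto const depth : variant.sample_depths) {
        if (sample_passes(depth)) {
            ++passed;
        }
    }

    // No sample passed: this status is set independently of all_need_pass.
    if (passed == 0) {
        variant.status = VariantTagSketch::kNoSamplePassed;
        return false;
    }

    // Some, but not all, samples passed while all were required.
    if (all_need_pass && passed < variant.sample_depths.size()) {
        variant.status = VariantTagSketch::kNotAllSamplesPassed;
        return false;
    }
    return true;
}
```

In this sketch a variant without any samples is reported as kNoSamplePassed, which is a choice of the example and not something the description above specifies.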
bool apply_variant_filter_numerical | ( | Variant & | variant, |
VariantFilterNumericalParams const & | params | ||
) |
Filter a given Variant based on the numerical properties of the counts.
The function applies the filter using the given params settings. If a filter fails, the function sets Variant::status to the corresponding VariantFilterTag value and increments the stats counter for that filter; only the first failing filter is recorded. It returns false if any filter failed, and true if all passed.
This overload simply omits the incrementing of the VariantFilterStats counter.
Definition at line 232 of file variant_filter_numerical.cpp.
bool apply_variant_filter_numerical | ( | Variant & | variant, |
VariantFilterNumericalParams const & | params, | ||
VariantFilterStats & | stats | ||
) |
Filter a given Variant based on the numerical properties of the counts.
The function applies the filter using the given params settings. If a filter fails, the function sets Variant::status to the corresponding VariantFilterTag value and increments the stats counter for that filter; only the first failing filter is recorded. It returns false if any filter failed, and true if all passed.
Definition at line 50 of file variant_filter_numerical.cpp.
double b_n | ( | double | n | ) |
Compute b_n, the sum of squared reciprocals.
This is the sum of squared reciprocals up to n-1, which is \( b_n = \sum_{i=1}^{n-1} \frac{1}{i^2} \).
See
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
for details. The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
See also the note in a_n() about the usage of double here for the argument.
Definition at line 307 of file diversity_pool_functions.cpp.
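A minimal stand-alone sketch of this sum is given below. The name b_n_sketch and the handling of arguments below 1 are assumptions of the example; the rounding follows the note in a_n().

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch: b_n = sum_{i=1}^{n-1} 1/i^2 for a pool size given as double.
double b_n_sketch(double n)
{
    // Assumption of this sketch: arguments below 1 (or NaN) yield an empty sum.
    if (!(n >= 1.0)) {
        return 0.0;
    }

    // Round to the nearest integer first, as for a_n().
    auto const limit = static_cast<std::size_t>(std::llround(n));

    // Sum the squared reciprocals 1/1^2 + 1/2^2 + ... + 1/(limit-1)^2.
    double sum = 0.0;
    for (std::size_t i = 1; i < limit; ++i) {
        auto const d = static_cast<double>(i);
        sum += 1.0 / (d * d);
    }
    return sum;
}
```

Unlike the harmonic sum in a_n(), this series converges: for growing n the value approaches, but never reaches, pi^2 / 6.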
double beta_star | ( | double | n | ) |
Compute beta* according to Achaz 2008 and Kofler et al. 2011.
This is needed for the computation of tajima_d_pool() according to
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
The equation is based on
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
See there for details.
Definition at line 358 of file diversity_pool_functions.cpp.
genesis::utils::JsonDocument cathedral_plot_parameters_to_json_document | ( | CathedralPlotParameters const & | parameters | ) |
Get a user-readable description of a CathedralPlotParameters as a JsonDocument.
Definition at line 173 of file cathedral_plot.cpp.
genesis::utils::JsonDocument cathedral_plot_record_to_json_document | ( | CathedralPlotRecord const & | record | ) |
Get a user-readable description of the data of a CathedralPlotRecord as a JsonDocument.
This is meant for user output, so that cathedral plots can be generated from a data matrix, without having to recompute the matrix.
Definition at line 196 of file cathedral_plot.cpp.
double cathedral_window_width | ( | CathedralPlotRecord const & | record, |
size_t | row | ||
) |
Compute the window width for a row in a cathedral plot.
This uses the chromosome length and the intended plot dimensions to compute window widths where the first row of the image has a width corresponding to the whole image width, the last row has a window width corresponding to a single pixel, and the rows in between are interpolated using one of the CathedralWindowWidthMethod methods.
Definition at line 86 of file cathedral_plot.cpp.
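The interpolation idea can be sketched as below. This is a guess at the general shape, not the formulas in cathedral_plot.cpp. The sketch assumes that the widest window spans the whole chromosome and that the narrowest spans one pixel's share of it. Both interpolation rules and all names are hypothetical, and the log-scale variant is not claimed to match either kExponential or kGeometric.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical interpolation choices for this sketch.
enum class WidthMethodSketch { kLinear, kLogScale };

double window_width_sketch(
    double chromosome_length, std::size_t image_width, std::size_t image_height,
    std::size_t row, WidthMethodSketch method
) {
    if (image_width == 0 || image_height == 0 || row >= image_height) {
        throw std::invalid_argument("Invalid plot dimensions or row");
    }

    // Assumed end points: whole chromosome in the first row, one pixel in the last.
    double const max_width = chromosome_length;
    double const min_width = chromosome_length / static_cast<double>(image_width);
    if (image_height == 1) {
        return max_width;
    }

    // Fraction of the way from the first row (0.0) to the last row (1.0).
    double const t = static_cast<double>(row) / static_cast<double>(image_height - 1);

    switch (method) {
        case WidthMethodSketch::kLinear:
            // Widths shrink by the same amount per row.
            return max_width + t * (min_width - max_width);
        case WidthMethodSketch::kLogScale:
            // Widths shrink by the same factor per row.
            return max_width * std::pow(min_width / max_width, t);
    }
    throw std::invalid_argument("Unknown method");
}
```

For a chromosome of length 1000, an image of 100 by 5 pixels, and the middle row, the linear rule gives a width of 505 and the log-scale rule a width of about 100, which shows how strongly the choice affects the resolution in the upper rows.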
CathedralWindowWidthMethod cathedral_window_width_method_from_string | ( | std::string const & | method | ) |
Helper function to return a CathedralWindowWidthMethod from its textual representation.
Definition at line 152 of file cathedral_plot.cpp.
std::string cathedral_window_width_method_to_string | ( | CathedralWindowWidthMethod | method | ) |
Helper function to return a textual representation of the method.
Definition at line 136 of file cathedral_plot.cpp.
void genesis::population::compute_cathedral_matrix | ( | CathedralPlotParameters const & | parameters, |
Record & | record, | ||
Accumulator | accumulator = Accumulator{} |
||
) |
Template function to compute the value matrix for a cathedral plot, given a record with plot parameters and per-position data to accumulate per window.
The function computes the accumulated values across windows for each pixel in a cathedral plot, which can then be visualized as a heat map.
The function expects a cathedral plot record
, containing data needed to compute the values per pixel. It expects record
to contain an iterable container std::vector<Entry> entries
whose contained elements have a member position
, and also contain the data that is needed by the accumulator
. See FstCathedralPlotRecord for an example.
The accumulator
needs to have functions accumulate()
and dissipate()
that each take an element of the record
entries. These functions are meant to accumulate values, and then un-do this again, which is what we use to speed up the computation here. Also, the accumulator
needs to have an aggregate()
function that uses the currently accumulated data to compute the value for a given window. See FstCathedralAccumulator for an example. We take this as an (optional) argument, so that it can be set up with other parameters as needed.
record
for that case.
Definition at line 197 of file cathedral_plot.hpp.
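The accumulate/dissipate/aggregate contract described above can be sketched for a single row of the plot. This is a simplified illustration under stated assumptions, not the library's code: Entry, MeanAccumulator, and cathedral_row_sketch() are hypothetical names, the window of each pixel is assumed to be centered on that pixel, and the entries are assumed to be sorted by position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-position entry: a position plus the data the accumulator needs.
struct Entry {
    std::size_t position;
    double value;
};

// Hypothetical accumulator that reports the mean of the values in the window.
struct MeanAccumulator {
    double sum = 0.0;
    std::size_t count = 0;

    void accumulate(Entry const& e) { sum += e.value; ++count; }
    void dissipate(Entry const& e) { assert(count > 0); sum -= e.value; --count; }
    double aggregate() const {
        return count == 0 ? 0.0 : sum / static_cast<double>(count);
    }
};

// Compute one row of values: one window per pixel, all with the same width.
// Entries entering the window are accumulated, entries leaving it are dissipated,
// so each entry is touched at most twice per row.
template<class Accumulator>
std::vector<double> cathedral_row_sketch(
    std::vector<Entry> const& entries, double chromosome_length,
    std::size_t pixels, double window_width, Accumulator acc = Accumulator{}
) {
    std::vector<double> row(pixels, 0.0);
    std::size_t head = 0; // first entry not yet accumulated
    std::size_t tail = 0; // first entry not yet dissipated
    for (std::size_t p = 0; p < pixels; ++p) {
        double const center = (static_cast<double>(p) + 0.5)
            * chromosome_length / static_cast<double>(pixels);
        double const lo = center - window_width / 2.0;
        double const hi = center + window_width / 2.0;
        while (head < entries.size() && static_cast<double>(entries[head].position) < hi) {
            acc.accumulate(entries[head++]);
        }
        while (tail < head && static_cast<double>(entries[tail].position) < lo) {
            acc.dissipate(entries[tail++]);
        }
        row[p] = acc.aggregate();
    }
    return row;
}
```

Because the window bounds only move forward from one pixel to the next, the accumulated state can be reused instead of being recomputed for every pixel, which is the speed-up that the paired accumulate() and dissipate() functions are meant to enable.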
|
inline |
Compute the matrix of values that represents the cathedral plot for FST.
This is merely a shortcut to call compute_cathedral_matrix() with the arguments for a cathedral plot of FST, using the result of compute_fst_cathedral_records(). The returned matrix can then be plotted as a heatmap.
Definition at line 216 of file fst_cathedral.hpp.
std::vector< FstCathedralPlotRecord > compute_fst_cathedral_records | ( | VariantInputStream & | iterator, |
FstPoolProcessor & | processor, | ||
FstPoolCalculatorUnbiased::Estimator | fst_estimator, | ||
std::vector< std::string > const & | sample_names = std::vector< std::string >{} , |
||
std::shared_ptr< genesis::sequence::SequenceDict > const & | sequence_dict = nullptr |
||
) |
Compute the components of per-position FST data for all pairs of samples in the given processor
, for the chromosomes in the given input iterator
.
The result contains entries for all pairs of samples and all chromosomes, in one vector. This is a convenience function that calls compute_fst_cathedral_records_for_chromosome() for each chromosome. However, we do not recommend this for larger datasets, as the resulting data can be quite memory-intensive. It might hence be better to use the per-chromosome function instead, and process the returned data before starting with the next chromosome.
Definition at line 275 of file fst_cathedral.cpp.
std::vector< FstCathedralPlotRecord > compute_fst_cathedral_records_for_chromosome | ( | VariantInputStream::Iterator & | iterator, |
FstPoolProcessor & | processor, | ||
FstPoolCalculatorUnbiased::Estimator | fst_estimator, | ||
std::vector< std::string > const & | sample_names = std::vector< std::string >{} , |
||
std::shared_ptr< genesis::sequence::SequenceDict > const & | sequence_dict = nullptr |
||
) |
Compute the components of per-position FST data for all pairs of samples in the given processor
, for the current chromosome in the given input iterator
.
The result contains entries for all pairs of samples. The computation starts at the current position in iterator
, uses that chromosome, and iterates until its end or until the next chromosome is found, and stops there. See compute_fst_cathedral_records() for a helper function that does this for all chromosomes in the input.
This expects the processor to only contain FstPoolCalculatorUnbiased calculators, as those are the only ones for which we can compute cathedral plots with our current implementation.
If given sample_names
, we use those to set the sample names in the resulting FstCathedralPlotRecord objects, so that downstream we can keep track of them.
If given a sequence_dict
, we use the information in there to set the chromosome length; otherwise, we use the last position found in the data for that.
Definition at line 219 of file fst_cathedral.cpp.
std::pair< char, double > consensus | ( | SampleCounts const & | sample | ) |
Consensus character for a SampleCounts, and its confidence.
This is simply the character (out of ACGT
) that appears most often (or, for ties, the lexicographically smallest character), unless all of (A
, C
, G
, T
) are zero, in which case the consensus character is N
. The confidence is the count of the consensus character, divided by the total count of all four nucleotides.
Definition at line 428 of file population/function/functions.cpp.
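The consensus rule can be sketched in a few lines. BaseCounts and consensus_sketch() are hypothetical names for this illustration; returning a confidence of 0.0 together with N is an assumption, as the text above does not state the confidence for that case.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal stand-in for the nucleotide counts of a sample.
struct BaseCounts {
    std::size_t a, c, g, t;
};

// Return the most frequent base and its share of the total count.
// Ties go to the lexicographically smallest base; all-zero counts yield 'N'.
std::pair<char, double> consensus_sketch(BaseCounts const& s) {
    char const bases[4] = {'A', 'C', 'G', 'T'};
    std::size_t const counts[4] = {s.a, s.c, s.g, s.t};
    std::size_t const total = s.a + s.c + s.g + s.t;
    if (total == 0) {
        return {'N', 0.0}; // assumption: confidence 0.0 for an uncovered position
    }

    // Strict comparison keeps the earlier (lexicographically smaller) base on ties.
    std::size_t best = 0;
    for (std::size_t i = 1; i < 4; ++i) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }
    assert(counts[best] > 0);
    return {bases[best], static_cast<double>(counts[best]) / static_cast<double>(total)};
}
```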
std::pair< char, double > consensus | ( | SampleCounts const & | sample, |
bool | is_covered | ||
) |
Consensus character for a SampleCounts, and its confidence.
This is simply the character (out of ACGT
) that appears most often (or, for ties, the lexicographically smallest character). If is_covered
is false (meaning, the position is not well covered by reads), the consensus character is N
. The confidence is the count of the consensus character, divided by the total count of all four nucleotides.
Definition at line 469 of file population/function/functions.cpp.
SampleCounts convert_to_sample_counts | ( | SimplePileupReader::Sample const & | sample, |
unsigned char | min_phred_score | ||
) |
Definition at line 46 of file simple_pileup_common.cpp.
Variant convert_to_variant | ( | SimplePileupReader::Record const & | record, |
unsigned char | min_phred_score | ||
) |
Definition at line 146 of file simple_pileup_common.cpp.
Variant convert_to_variant_as_individuals | ( | VcfRecord const & | record, |
bool | use_allelic_depth = false |
||
) |
Convert a VcfRecord to a Variant, treating each sample as an individual, and combining them all into one SampleCounts sample.
In this function, we assume that the data that was used to create the VCF file was the typical use case of VCF, where each sample (column) in the file corresponds to an individual. When using this function, all samples (individuals) are combined into one, as our targeted output type Variant is used to describe allele counts of several individuals (e.g., in a pool). As all columns are combined, the resulting Variant only contains a single SampleCounts object. We only consider biallelic SNP positions here.
We offer two ways of combining the samples (columns) of the input VCF record into the SampleCounts:
If use_allelic_depth is false (default), individuals simply contribute to the SampleCounts according to their ploidy. That is, an individual with genotype A/T will contribute one count each for A and T.
If use_allelic_depth is true, we instead use the "AD" FORMAT field to obtain the actual counts for the reference and alternative allele, and use these to sum up the SampleCounts data.
Definition at line 453 of file vcf_common.cpp.
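The ploidy-based mode can be illustrated with a small tally over genotypes. This is only a sketch of the counting idea: AlleleCounts, Genotype, and tally_genotypes_sketch() are hypothetical names, genotypes are given directly as allele characters instead of being parsed from a VCF record, and skipping non-ACGT alleles (such as a missing call) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical combined counts for all individuals at one position.
struct AlleleCounts {
    std::size_t a = 0, c = 0, g = 0, t = 0;
};

// Alleles of one individual, e.g. {'A', 'T'} for a diploid A/T genotype.
using Genotype = std::vector<char>;

// Each individual contributes one count per allele copy, i.e. according to its ploidy.
AlleleCounts tally_genotypes_sketch(std::vector<Genotype> const& individuals) {
    AlleleCounts result;
    std::size_t total = 0;
    std::size_t skipped = 0;
    for (auto const& genotype : individuals) {
        for (char allele : genotype) {
            ++total;
            switch (allele) {
                case 'A': ++result.a; break;
                case 'C': ++result.c; break;
                case 'G': ++result.g; break;
                case 'T': ++result.t; break;
                default:  ++skipped;  break; // assumption: ignore missing or other alleles
            }
        }
    }
    assert(result.a + result.c + result.g + result.t + skipped == total);
    return result;
}
```

Three diploid individuals with genotypes A/T, A/A, and T/T hence yield three counts each for A and T.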
Convert a VcfRecord to a Variant, treating each sample column as a pool of individuals.
This assumes that the data that was used to create the VCF file was actually a pool of individuals (e.g., from pool sequencing) for each sample (column) of the VCF file. We do not actually recommend using variant calling software on pool-seq data, as it induces frequency shifts due to the statistical models employed by variant callers that were not built for pool sequencing data. It however seems to be a commonly used approach, and hence we offer this function here. For this type of data, the VCF allelic depth ("AD") information contains the counts of the reference and alternative base, which in this context can be interpreted as describing the allele frequencies of each pool of individuals. This requires the VCF to have the "AD" FORMAT field.
Only SNP data (no indels) are allowed in this function; use VcfRecord::is_snp() to test this.
Definition at line 393 of file vcf_common.cpp.
void genesis::population::convert_to_variant_as_pool_set_missing_gt_ | ( | VcfRecord const & | record, |
Variant & | variant | ||
) |
Local helper function that sets the filter status of a Variant and its samples to missing depending on whether the genotypes of the samples are missing or not.
Definition at line 344 of file vcf_common.cpp.
void genesis::population::convert_to_variant_as_pool_tally_bases_ | ( | VcfRecord const & | record, |
std::pair< std::array< char, 6 >, size_t > const & | snp_chars, | ||
VcfFormatIteratorInt const & | sample_ad, | ||
SampleCounts & | sample | ||
) |
Local helper function to tally up the bases from a VcfRecord into a SampleCounts.
Definition at line 279 of file vcf_common.cpp.
double genesis::population::f_st_pool_karlsson | ( | ForwardIterator1 | p1_begin, |
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al. as used in PoPoolation2, for two ranges of SampleCounts objects.
The approach is called the "asymptotically unbiased" estimator in PoPoolation2 [1], and follows Karlsson et al [2].
[1] PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq).
Kofler R, Pandey RV, Schlotterer C.
Bioinformatics, 2011, 27(24), 3435–3436. https://doi.org/10.1093/bioinformatics/btr589
[2] Efficient mapping of mendelian traits in dogs through genome-wide association.
Karlsson EK, Baranowska I, Wade CM, Salmon Hillbertz NHC, Zody MC, Anderson N, Biagi TM, Patterson N, Pielberg GR, Kulbokas EJ, Comstock KE, Keller ET, Mesirov JP, Von Euler H, Kämpe O, Hedhammar Å, Lander ES, Andersson G, Andersson L, Lindblad-Toh K.
Nature Genetics, 2007, 39(11), 1321–1328. https://doi.org/10.1038/ng.2007.10
Definition at line 267 of file fst_pool_functions.hpp.
double genesis::population::f_st_pool_kofler | ( | size_t | p1_poolsize, |
size_t | p2_poolsize, | ||
ForwardIterator1 | p1_begin, | ||
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute the F_ST statistic for pool-sequenced data of Kofler et al. as used in PoPoolation2, for two ranges of SampleCounts objects.
The approach is called the "classical" or "conventional" estimator in PoPoolation2 [1], and follows Hartl and Clark [2].
[1] PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq).
Kofler R, Pandey RV, Schlotterer C.
Bioinformatics, 2011, 27(24), 3435–3436. https://doi.org/10.1093/bioinformatics/btr589
[2] Principles of Population Genetics.
Hartl DL, Clark AG.
Sinauer, 2007.
Definition at line 188 of file fst_pool_functions.hpp.
std::pair<double, double> genesis::population::f_st_pool_unbiased | ( | size_t | p1_poolsize, |
size_t | p2_poolsize, | ||
ForwardIterator1 | p1_begin, | ||
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts objects.
This is our novel approach for estimating F_ST, using pool-sequencing corrected estimates of Pi within, Pi between, and Pi total, to compute F_ST following the definitions of Nei [1] and Hudson [2], respectively. These are returned here as a pair in that order. See https://github.com/lczech/pool-seq-pop-gen-stats for details.
[1] Analysis of Gene Diversity in Subdivided Populations.
Nei M.
Proceedings of the National Academy of Sciences, 1973, 70(12), 3321–3323. https://doi.org/10.1073/PNAS.70.12.3321
[2] Estimation of levels of gene flow from DNA sequence data.
Hudson RR, Slatkin M, Maddison WP.
Genetics, 1992, 132(2), 583–589. https://doi.org/10.1093/GENETICS/132.2.583
Definition at line 333 of file fst_pool_functions.hpp.
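The final step, from the three Pi estimates to the returned pair, might look like the following sketch. The formulas used here are the textbook forms of the two definitions (Nei: one minus Pi within over Pi total; Hudson: one minus Pi within over Pi between) and are an assumption about how the values are combined. The function fst_from_pi_sketch() is hypothetical, and the pool-sequencing corrections that produce the Pi values are not shown.

```cpp
#include <cassert>
#include <utility>

// Combine already-estimated Pi values into the pair (F_ST Nei, F_ST Hudson).
std::pair<double, double> fst_from_pi_sketch(
    double pi_within, double pi_between, double pi_total
) {
    assert(pi_total > 0.0 && pi_between > 0.0);
    double const fst_nei    = 1.0 - pi_within / pi_total;
    double const fst_hudson = 1.0 - pi_within / pi_between;
    return {fst_nei, fst_hudson};
}
```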
double f_star | ( | double | a_n, |
double | n | ||
) |
Compute f*
according to Achaz 2008 and Kofler et al. 2011.
This is computed as \( f_{star} = \frac{n - 3}{a_n \cdot (n-1) - n} \), and needed for the computation of alpha_star() and beta_star(). See there for some more details, and see
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
for the original equations.
Definition at line 324 of file diversity_pool_functions.cpp.
void genesis::population::fill_fst_cathedral_records_from_processor_ | ( | FstPoolProcessor const & | processor, |
std::vector< FstCathedralPlotRecord > & | records, | ||
size_t | position | ||
) |
Definition at line 177 of file fst_cathedral.cpp.
genesis::utils::JsonDocument fst_cathedral_plot_record_to_json_document | ( | FstCathedralPlotRecord const & | record | ) |
Get a user-readable description of the data of a FstCathedralPlotRecord as a JsonDocument.
Definition at line 307 of file fst_cathedral.cpp.
|
inline |
Return a list of sample name pairs for each calculator in an FstPoolProcessor.
The function takes a processor
, and the original list of sample_names
of the samples in the calculators in the processor
, and uses their indices (as stored in the processor
) to get pairs of sample names.
Definition at line 578 of file fst_pool_processor.hpp.
|
inline |
Definition at line 462 of file fst_pool_unbiased.hpp.
|
inline |
Definition at line 446 of file fst_pool_unbiased.hpp.
GenomeLocusSet genome_locus_set_from_vcf_file | ( | std::string const & | file | ) |
Read a VCF file, and use its positions to create a GenomeLocusSet.
This is for example useful to restrict some analysis to the loci of known variants. Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct the resulting set. The VCF file does not have to be sorted for this.
Definition at line 580 of file vcf_common.cpp.
GenomeRegionList genome_region_list_from_vcf_file | ( | std::string const & | file | ) |
Read a VCF file, and use its positions to create a GenomeRegionList.
This is for example useful to restrict some analysis to the loci of known variants; however, for that use case, it is recommended to use genome_locus_set_from_vcf_file() instead, as testing genome coordinate coverage is way faster with that.
Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct intervals of consecutive positions along the chromosomes, i.e., if the file contains positions 1
, 2
, and 3
, but not 4
, an interval spanning 1-3
is inserted into the list.
The VCF file does not have to be sorted for this.
Definition at line 600 of file vcf_common.cpp.
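The merging of consecutive positions into intervals can be sketched for a single chromosome as follows. This is an illustration of the idea only: merge_consecutive_positions_sketch() is a hypothetical function that sorts a copy of the positions first, which is one simple way to handle unsorted input and not necessarily how the library does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Turn positions of one chromosome into closed intervals of consecutive positions.
// The input is taken by value, so that it can be sorted without changing the caller's data.
std::vector<std::pair<std::size_t, std::size_t>> merge_consecutive_positions_sketch(
    std::vector<std::size_t> positions
) {
    std::sort(positions.begin(), positions.end());
    std::vector<std::pair<std::size_t, std::size_t>> intervals;
    for (std::size_t const pos : positions) {
        if (!intervals.empty() && pos <= intervals.back().second + 1) {
            // Duplicate or directly adjacent position: extend the current interval.
            intervals.back().second = std::max(intervals.back().second, pos);
        } else {
            intervals.emplace_back(pos, pos);
        }
    }
    assert(std::is_sorted(intervals.begin(), intervals.end()));
    return intervals;
}
```

For the positions 1, 2, 3, and 5, this yields the intervals 1-3 and 5-5.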
void genome_region_list_from_vcf_file | ( | std::string const & | file, |
GenomeRegionList & | target | ||
) |
Read a VCF file, and add its positions to an existing GenomeRegionList.
This is for example useful to restrict some analysis to the loci of known variants; however, for that use case, it is recommended to use genome_locus_set_from_vcf_file() instead, as testing genome coordinate coverage is way faster with that.
Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct intervals of consecutive positions along the chromosomes, i.e., if the file contains positions 1
, 2
, and 3
, but not 4
, an interval spanning 1-3
is inserted into the list.
The VCF file does not have to be sorted for this.
Definition at line 607 of file vcf_common.cpp.
SampleCounts::size_type get_base_count | ( | SampleCounts const & | sample, |
char | base | ||
) |
Get the count for a base
given as a char.
The given base
has to be one of ACGTDN
(case insensitive), or *#.
for deletions as well.
Definition at line 50 of file population/function/functions.cpp.
std::pair<std::array<char, 6>, size_t> genesis::population::get_vcf_record_snp_ref_alt_chars_ | ( | VcfRecord const & | record | ) |
Local helper function that returns the REF and ALT chars of a VcfRecord for SNPs.
This function expects the record
to only contain SNP REF and ALT (single nucleotides), and throws when not. It then fills the resulting array with these chars. That is, result[0] is the REF char, result[1] the first ALT char, and so forth.
To keep it speedy, we always return an array that is large enough for all ACGTND
, and return the number of used entries as the second value of the pair.
Definition at line 235 of file vcf_common.cpp.
|
inline |
Get the length of a given Window.
This is needed for the special case of a WindowView over the whole genome, which we indicate by WindowView::is_whole_genome() being set. In this case, the length is not contiguous along a single chromosome. In all other window cases, we simply use the first and last position of the window, via BaseWindow::width().
Definition at line 146 of file window_average.hpp.
|
inline |
Get the count of provided loci in a window.
Definition at line 166 of file window_average.hpp.
char guess_alternative_base | ( | Variant const & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the alternative base of a Variant.
If the Variant already has an alternative_base
in ACGT
and force
is not true
, this original base is returned (meaning that this function is idempotent; it does not change the alternative base if there already is one). However, if the alternative_base
is N
or any other char not in ACGT
, or if force
is true
, the base with the highest count that is not the reference base is returned instead. This also means that the reference base has to be set to a value in ACGT
, as otherwise the concept of an alternative base is meaningless anyway. If the reference base is not one of ACGT
, the returned alternative base is N
. Furthermore, if all three non-reference bases have count 0, the returned alternative base is N
.
Definition at line 495 of file population/function/functions.cpp.
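The count-based fallback described above can be sketched as follows. BaseCounts and guess_alternative_base_sketch() are hypothetical names; the sketch works on already summed counts and leaves out the force flag, the filter policy, and the check for an existing alternative base. Resolving ties in favor of the first base in ACGT order is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical minimal stand-in for the summed nucleotide counts of a Variant.
struct BaseCounts {
    std::size_t a, c, g, t;
};

// Return the most abundant base that is not the reference base.
// Returns 'N' if the reference is not in ACGT, or if all other counts are zero.
char guess_alternative_base_sketch(BaseCounts const& s, char ref_base) {
    char const bases[4] = {'A', 'C', 'G', 'T'};
    std::size_t const counts[4] = {s.a, s.c, s.g, s.t};
    char const ref = static_cast<char>(std::toupper(static_cast<unsigned char>(ref_base)));

    bool ref_is_acgt = false;
    for (char const b : bases) {
        ref_is_acgt = ref_is_acgt || (b == ref);
    }
    if (!ref_is_acgt) {
        return 'N';
    }

    char alt = 'N';
    std::size_t best = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        if (bases[i] != ref && counts[i] > best) {
            best = counts[i];
            alt = bases[i];
        }
    }
    assert(alt != ref);
    return alt;
}
```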
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them.
This uses the same approach as guess_reference_base() and guess_alternative_base(), but is more efficient than calling both in sequence. See there for details.
Definition at line 515 of file population/function/functions.cpp.
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
char | ref_base, | ||
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference base.
This uses the same approach as guess_and_set_ref_and_alt_bases( Variant&, bool ), but additionally considers the given ref_base
. If the reference base contains a value in ACGT
(case insensitive) at the position of the variant
, it is used as the reference. Note that the function throws an exception if the Variant already has a reference base set that the given reference base does not code for, in order to notify users that something is off. That is, we do check for ambiguity codes: if the given reference base is an ambiguity code that contains the base already set in the Variant, this is okay. An exception is thrown on mismatch only.
If the reference base is N
though, the function behaves the same as its reference-free overload of the function. For the alternative base, it always uses the most abundant base that is not the reference, same as its alternative function.
Definition at line 566 of file population/function/functions.cpp.
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
genesis::sequence::ReferenceGenome const & | ref_genome, | ||
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference genome to obtain the base.
This simply calls guess_and_set_ref_and_alt_bases( Variant&, char, bool ) with the base given by the ref_genome
. See there for details.
Definition at line 634 of file population/function/functions.cpp.
genesis::sequence::QualityEncoding guess_pileup_quality_encoding | ( | std::shared_ptr< utils::BaseInputSource > | source, |
size_t | max_lines = 0 |
||
) |
Guess the quality score encoding for an (m)pileup input, based on counts of how often each char appeared in the quality string (of the input pileup file for example).
The function reads and parses the input source as a pileup file, counts all quality score chars as they appear in there, and then guesses the encoding that was used. If max_lines
is set to a value greater than 0, only that many lines are read. If max_chars
is set to a value greater than 0, only that many quality score characters are read.
Definition at line 178 of file simple_pileup_common.cpp.
char guess_reference_base | ( | Variant const & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference base of a Variant.
If the Variant already has a reference_base
in ACGT
, this base is returned (meaning that this function is idempotent; it does not change the reference base if there already is one). However, if the reference_base
is N
or any other value not in ACGT
, or if force
is true
, the base with the highest count is returned instead, unless all counts are 0, in which case the returned reference base is N
.
Definition at line 478 of file population/function/functions.cpp.
double heterozygosity | ( | SampleCounts const & | sample, |
bool | with_bessel = false |
||
) |
Compute classic heterozygosity.
This is computed as \( h = \frac{n}{n-1} \left( 1 - \sum p^2 \right) \) with n
the total nucleotide_sum() (sum of A
,C
,G
,T
in the sample), and p
their respective nucleotide frequencies. The leading factor \( \frac{n}{n-1} \) is Bessel's correction, which is applied if with_bessel is set to true, and omitted if it is set to false (default).
See Equation 3.1 in
Hahn, M. W.
(2018). Molecular Population Genetics.
https://global.oup.com/academic/product/molecular-population-genetics-9780878939657
for details.
Definition at line 150 of file diversity_pool_functions.cpp.
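A direct evaluation of the formula can be sketched as follows. BaseCounts and heterozygosity_sketch() are hypothetical names; returning 0.0 for a sample without any counts is an assumption made to keep the example well-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal stand-in for the nucleotide counts of a sample.
struct BaseCounts {
    std::size_t a, c, g, t;
};

// h = (1 - sum of squared frequencies), optionally scaled by n / (n - 1).
double heterozygosity_sketch(BaseCounts const& s, bool with_bessel = false) {
    std::size_t const counts[4] = {s.a, s.c, s.g, s.t};
    double const n = static_cast<double>(s.a + s.c + s.g + s.t);
    if (n == 0.0) {
        return 0.0; // assumption: no data, no heterozygosity
    }

    double sum_of_squares = 0.0;
    for (std::size_t const count : counts) {
        double const p = static_cast<double>(count) / n;
        sum_of_squares += p * p;
    }

    double h = 1.0 - sum_of_squares;
    if (with_bessel) {
        assert(n > 1.0); // the correction is undefined for a single count
        h *= n / (n - 1.0);
    }
    return h;
}
```

For counts of 5 each for A and C, both frequencies are 0.5, so that h = 1 - 0.5 = 0.5 without the correction, and 0.5 * 10/9 with it.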
|
inline |
Test whether the chromosome/position is within a given GenomeLocusSet
.
Definition at line 124 of file function/genome_region.hpp.
bool genesis::population::is_covered | ( | GenomeLocusSet const & | loci, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given GenomeLocusSet
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 168 of file function/genome_region.hpp.
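The template mechanism described above, accepting any type with public chromosome and position members, can be sketched with a simple region type. Region, Locus, and is_covered_sketch() are hypothetical names for this illustration; treating the region as a closed interval with inclusive start and end is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical region on one chromosome, with inclusive start and end (assumption).
struct Region {
    std::string chromosome;
    std::size_t start;
    std::size_t end;
};

// Hypothetical locus type; any type with these two public members works below.
struct Locus {
    std::string chromosome;
    std::size_t position;
};

// Base overload: explicit chromosome and position.
bool is_covered_sketch(
    Region const& region, std::string const& chromosome, std::size_t position
) {
    assert(region.start <= region.end);
    return region.chromosome == chromosome
        && region.start <= position && position <= region.end;
}

// Template overload: forwards the members of any locus-like type.
template<class T>
bool is_covered_sketch(Region const& region, T const& locus) {
    return is_covered_sketch(region, locus.chromosome, locus.position);
}
```

The template only requires the two member names, so the same call works for any user-defined type that offers them, without a common base class.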
|
inline |
Definition at line 189 of file function/genome_region.hpp.
bool is_covered | ( | GenomeRegion const & | region, |
std::string const & | chromosome, | ||
size_t | position | ||
) |
Test whether the chromosome/position is within a given genomic region
.
Definition at line 207 of file genome_region.cpp.
bool genesis::population::is_covered | ( | GenomeRegion const & | region, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given genomic region
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 141 of file function/genome_region.hpp.
|
inline |
Definition at line 179 of file function/genome_region.hpp.
|
inline |
Test whether the chromosome/position is within a given list of genomic regions
.
Definition at line 116 of file function/genome_region.hpp.
bool genesis::population::is_covered | ( | GenomeRegionList const & | regions, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given list of genomic regions
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 155 of file function/genome_region.hpp.
|
inline |
Definition at line 184 of file function/genome_region.hpp.
|
inlineconstexpr |
Return whether a given base is in ACGT
, case insensitive.
Definition at line 56 of file population/function/functions.hpp.
|
inlineconstexpr |
Return whether a given base is in ACGTN
, case insensitive.
Definition at line 71 of file population/function/functions.hpp.
std::pair< genesis::utils::JsonDocument, genesis::utils::Matrix< double > > load_cathedral_plot_record_components_from_files | ( | std::string const & | base_path | ) |
Load the parts of a cathedral plot from a set of files.
Reverse of save_cathedral_plot_record_to_files(), returning the files as a Json document, and a Matrix of values for the heatmap. See load_cathedral_plot_record_from_files() for the convenience function that actually loads and fills the CathedralPlotRecord from that.
Definition at line 276 of file cathedral_plot.cpp.
CathedralPlotRecord load_cathedral_plot_record_from_files | ( | std::string const & | base_path | ) |
Load the record of a cathedral plot from a set of files.
See save_cathedral_plot_record_to_files(). This reads a json and a csv file using the base_path
with the extensions .json
and .csv
. For convenience, it is also possible to specify one of the two file paths directly, and the respective other will be inferred.
Definition at line 311 of file cathedral_plot.cpp.
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.
The latter type of overloads allows for more flexibility in the sorting order of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
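The comparison logic can be sketched with two overloads, one using the chromosome names directly and one using a given chromosome order. This is an illustration under stated assumptions: locus_compare_sketch() and ChromosomeOrder are hypothetical names, the map from chromosome name to rank merely stands in for the role of a SequenceDict, and comparing names lexicographically in the first overload is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Compare chromosome names first, then positions.
// Returns < 0 if left is before right, > 0 if right is before left, 0 if equal.
int locus_compare_sketch(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos
) {
    int const chr_cmp = l_chr.compare(r_chr);
    if (chr_cmp != 0) {
        return chr_cmp < 0 ? -1 : 1;
    }
    return static_cast<int>(l_pos > r_pos) - static_cast<int>(l_pos < r_pos);
}

// Hypothetical stand-in for a sequence dictionary: chromosome name to rank.
using ChromosomeOrder = std::unordered_map<std::string, std::size_t>;

// Same comparison, but chromosomes are ordered by their rank in the given order.
int locus_compare_sketch(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos,
    ChromosomeOrder const& order
) {
    auto const l_it = order.find(l_chr);
    auto const r_it = order.find(r_chr);
    assert(l_it != order.end() && r_it != order.end());
    if (l_it->second != r_it->second) {
        return l_it->second < r_it->second ? -1 : 1;
    }
    return static_cast<int>(l_pos > r_pos) - static_cast<int>(l_pos < r_pos);
}
```

With plain string comparison, chromosome "10" sorts before "2"; passing an explicit order allows "2" to come first, which is the kind of flexibility that an explicit chromosome order provides.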
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.The latter type of overloads allow to be more flexible with the sorting orders of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.The latter type of overloads allow to be more flexible with the sorting orders of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0 if the left locus is before the right locus, a value > 0 if the right locus is before the left locus, and 0 if the two loci are equal.
We offer several overloads of this function:
A locus can be given either as a GenomeLocus instance, or as a std::string for the chromosome and a size_t for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
A SequenceDict can additionally be provided, either directly or as a std::shared_ptr. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.
The overloads that take a SequenceDict allow more flexibility in the sorting order of chromosomes.
Definition at line 232 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0 if the left locus is before the right locus, a value > 0 if the right locus is before the left locus, and 0 if the two loci are equal.
We offer several overloads of this function:
A locus can be given either as a GenomeLocus instance, or as a std::string for the chromosome and a size_t for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
A SequenceDict can additionally be provided, either directly or as a std::shared_ptr. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.
The overloads that take a SequenceDict allow more flexibility in the sorting order of chromosomes.
Definition at line 249 of file function/genome_locus.hpp.
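To illustrate why a dictionary of sequences allows a different chromosome order, and how a pointer overload can fall back when the pointer is not valid, here is a hedged sketch. It does not use the library's SequenceDict; the `ChromosomeOrder` map and the `compare_loci_with_order()` function are hypothetical stand-ins that map each chromosome name to its rank.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical dictionary: chromosome name -> rank in the desired order.
using ChromosomeOrder = std::unordered_map<std::string, std::size_t>;

// Compare using the rank of each chromosome in the dictionary.
inline int compare_loci_with_order(
    ChromosomeOrder const& order,
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos
) {
    // at() throws std::out_of_range if a chromosome is not in the dictionary.
    std::size_t const l_rank = order.at( l_chr );
    std::size_t const r_rank = order.at( r_chr );
    if( l_rank != r_rank ) {
        return l_rank < r_rank ? -1 : 1;
    }
    return ( l_pos > r_pos ) - ( l_pos < r_pos );
}

// Pointer overload: use the dictionary only if the pointer is valid,
// otherwise fall back to lexicographic chromosome comparison.
inline int compare_loci_with_order(
    std::shared_ptr<ChromosomeOrder> const& order,
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos
) {
    if( order ) {
        return compare_loci_with_order( *order, l_chr, l_pos, r_chr, r_pos );
    }
    int const chr_cmp = l_chr.compare( r_chr );
    if( chr_cmp != 0 ) {
        return chr_cmp;
    }
    return ( l_pos > r_pos ) - ( l_pos < r_pos );
}
```

With a dictionary that ranks "chr2" before "chr10", the result differs from the lexicographic fallback, where "chr10" sorts before "chr2".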
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 330 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 310 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 320 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 300 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 486 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 465 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 476 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 573 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 552 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 563 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 386 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 366 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 376 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 356 of file function/genome_locus.hpp.
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 446 of file function/genome_locus.hpp.
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 414 of file function/genome_locus.hpp.
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 424 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 533 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 505 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 517 of file function/genome_locus.hpp.
genesis::utils::Matrix< genesis::utils::Color > make_cathedral_plot_heatmap | ( | CathedralPlotRecord const & | record, |
genesis::utils::HeatmapParameters const & | heatmap_parameters | ||
) |
Make a cathedral plot heat map as a color matrix.
This uses the data from a record, and the color heat map parameters.
This is meant as a high-level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_svg().
Definition at line 346 of file cathedral_plot.cpp.
genesis::utils::SvgDocument make_cathedral_plot_svg | ( | CathedralPlotRecord const & | record, |
genesis::utils::HeatmapParameters const & | heatmap_parameters | ||
) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes.
This uses the data from a record, and the color heat map parameters.
This is meant as a high-level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_heatmap().
Definition at line 435 of file cathedral_plot.cpp.
genesis::utils::SvgDocument make_cathedral_plot_svg | ( | CathedralPlotRecord const & | record, |
genesis::utils::HeatmapParameters const & | heatmap_parameters, | ||
genesis::utils::Matrix< genesis::utils::Color > const & | image | ||
) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes.
This uses the data from a record, and the color heat map parameters.
This is meant as a high-level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_heatmap().
Definition at line 354 of file cathedral_plot.cpp.
ChromosomeWindowStream<InputStreamIterator, DataType> genesis::population::make_chromosome_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, without the need to specify the template parameters manually.
Definition at line 451 of file chromosome_window_stream.hpp.
ChromosomeWindowStream<InputStreamIterator> genesis::population::make_default_chromosome_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the ChromosomeWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the ChromosomeWindowStream. For example, a data type that this works for is Variant data.
Definition at line 470 of file chromosome_window_stream.hpp.
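The relation between a general helper that takes functors and a default helper that reads public chromosome and position members can be sketched as follows. This is a self-contained illustration only: `Entry`, `group_by_chromosome()`, and `group_by_chromosome_default()` are hypothetical names and do not reproduce the ChromosomeWindowStream interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry type with the two public members the default helper relies on.
struct Entry
{
    std::string chromosome;
    std::size_t position;
};

// General form: the caller supplies a functor that extracts the chromosome name.
// Consecutive entries with the same chromosome end up in the same group.
template<class Data, class ChromosomeFunction>
std::vector<std::vector<Data>> group_by_chromosome(
    std::vector<Data> const& entries,
    ChromosomeFunction chromosome_function
) {
    std::vector<std::vector<Data>> groups;
    for( auto const& entry : entries ) {
        bool const new_group = groups.empty() ||
            chromosome_function( groups.back().front() ) != chromosome_function( entry );
        if( new_group ) {
            groups.emplace_back();
        }
        groups.back().push_back( entry );
    }
    return groups;
}

// Default form: assumes Data has a public member named chromosome.
template<class Data>
std::vector<std::vector<Data>> group_by_chromosome_default(
    std::vector<Data> const& entries
) {
    return group_by_chromosome(
        entries,
        []( Data const& data ) { return data.chromosome; }
    );
}
```

The default form only fixes the functor; any type without a public chromosome member would need the general form with a custom functor.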
GenomeWindowStream<InputStreamIterator> genesis::population::make_default_genome_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a GenomeWindowStream for the whole genome, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the GenomeWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the GenomeWindowStream. For example, a data type that this works for is Variant data.
This helper function creates a GenomeWindowStream from the given pair of iterators, so that the whole genome is traversed without stopping at individual chromosomes in each iteration.
Definition at line 465 of file genome_window_stream.hpp.
IntervalWindowStream<InputStreamIterator> genesis::population::make_default_interval_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | width = 0, | |
size_t | stride = 0 | |
) |
Helper function to instantiate an IntervalWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the IntervalWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the IntervalWindowStream. For example, a data type that this works for is Variant data.
Definition at line 522 of file interval_window_stream.hpp.
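The roles of the width and stride parameters can be illustrated with a small, self-contained sketch of sliding intervals over sorted positions of a single chromosome. It is not the IntervalWindowStream implementation: `IntervalWindow` and `sliding_interval_windows()` are hypothetical, and treating a stride of 0 as "same as width" is an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical window: a closed interval of 1-based positions and the entries inside it.
struct IntervalWindow
{
    std::size_t first;
    std::size_t last;
    std::vector<std::size_t> positions;
};

// Slide an interval of the given width over sorted positions, advancing by stride.
// A stride of 0 is treated as equal to the width (non-overlapping windows),
// which is an assumption of this sketch.
inline std::vector<IntervalWindow> sliding_interval_windows(
    std::vector<std::size_t> const& sorted_positions,
    std::size_t width,
    std::size_t stride = 0
) {
    if( width == 0 ) {
        throw std::invalid_argument( "width must be greater than 0" );
    }
    if( stride == 0 ) {
        stride = width;
    }
    std::vector<IntervalWindow> result;
    if( sorted_positions.empty() ) {
        return result;
    }
    std::size_t const last_pos = sorted_positions.back();
    for( std::size_t start = 1; start <= last_pos; start += stride ) {
        IntervalWindow window{ start, start + width - 1, {} };
        // Simple linear scan per window; fine for illustration.
        for( auto const pos : sorted_positions ) {
            if( pos >= window.first && pos <= window.last ) {
                window.positions.push_back( pos );
            }
        }
        result.push_back( window );
    }
    return result;
}
```

A stride smaller than the width yields overlapping windows, so the same position can appear in several of them.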
PositionWindowStream<InputStreamIterator> genesis::population::make_default_position_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the PositionWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the PositionWindowStream. For example, a data type that this works for is Variant data.
The PositionWindowStream::entry_selection_function is set so that all entries are selected to be considered in the iteration. This can be re-set afterwards if a different criterion is needed. See also make_passing_variant_position_window_stream() and make_passing_variant_position_window_view_stream() for specializations of this for data type Variant, which instead only select entries that have Variant::status passing.
Definition at line 373 of file position_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_default_position_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function that creates a PositionWindowStream with default functors and wraps it in a WindowViewStream.
See make_default_position_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of PositionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 410 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator> genesis::population::make_default_queue_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count = 0, | |
size_t | stride = 0 | |
) |
Helper function to instantiate a QueueWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the QueueWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the QueueWindowStream. For example, a data type that this works for is Variant data.
The QueueWindowStream::entry_selection_function is set so that all entries are selected to be considered towards the QueueWindowStream::count() of entries per window. This can be re-set afterwards if a different criterion is needed. See also make_passing_variant_queue_window_stream() and make_passing_variant_queue_window_view_stream() for specializations of this for data type Variant, which instead only select entries that have Variant::status passing.
Definition at line 855 of file queue_window_stream.hpp.
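In contrast to interval windows, a queue-style window holds a fixed number of entries regardless of how far apart they are. The following self-contained sketch shows that idea; `queue_windows()` is a hypothetical function, and both the "stride 0 means stride equals count" rule and the shorter final window are assumptions of this sketch rather than documented behaviour.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Split entries into windows of `count` entries each, advancing by `stride` entries.
// A stride of 0 is treated as equal to count (assumption of this sketch).
// The final window may hold fewer than `count` entries.
inline std::vector<std::vector<std::size_t>> queue_windows(
    std::vector<std::size_t> const& entries,
    std::size_t count,
    std::size_t stride = 0
) {
    if( count == 0 ) {
        throw std::invalid_argument( "count must be greater than 0" );
    }
    if( stride == 0 ) {
        stride = count;
    }
    std::vector<std::vector<std::size_t>> result;
    for( std::size_t first = 0; first < entries.size(); first += stride ) {
        std::size_t const last = std::min( first + count, entries.size() );
        result.emplace_back(
            entries.begin() + static_cast<std::ptrdiff_t>( first ),
            entries.begin() + static_cast<std::ptrdiff_t>( last )
        );
    }
    return result;
}
```

Here every entry counts towards the window size; a selection criterion that skips some entries would need an extra predicate when filling each window.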
WindowViewStream<InputStreamIterator> genesis::population::make_default_queue_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count, | ||
size_t | stride = 0 | |
) |
Helper function that creates a QueueWindowStream with default functors and wraps it in a WindowViewStream.
See make_default_queue_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of QueueWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 895 of file queue_window_stream.hpp.
RegionWindowStream<InputStreamIterator> genesis::population::make_default_region_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
std::shared_ptr< GenomeRegionList > | region_list | ||
) |
Helper function to instantiate a RegionWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are iterating over are of the same type, that is, we do no conversion in the entry_input_function functor of the RegionWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the RegionWindowStream. For example, a data type that this works for is Variant data.
Definition at line 842 of file region_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_default_region_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
std::shared_ptr< GenomeRegionList > | region_list | ||
) |
Helper function that creates a RegionWindowStream and wraps it in a WindowViewStream.
See make_default_region_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of RegionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 877 of file region_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_default_sliding_interval_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | width = 0, | |
size_t | stride = 0 | |
) |
Helper function that creates an IntervalWindowStream and wraps it in a WindowViewStream.
See make_default_interval_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of IntervalWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 558 of file interval_window_stream.hpp.
|
inline |
Create a DiversityPoolProcessor to compute diversity for all samples.
The function expects the settings to use for all samples, as well as the list of pool sizes of all samples. It then yields a processor that can be provided with all Variants of interest along the genome, and computes diversity for each sample.
Compared to the corresponding make_fst_pool_processor() functions, this function does little, and is provided mainly for symmetry with the FST functions.
Definition at line 356 of file diversity_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for one-to-all FST computation between one sample and all others.
The function expects the pool sizes of all samples, as well as the index of the Variant::samples SampleCounts object for which FST to all other samples shall be calculated, and any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between the given index and all other samples.
Definition at line 521 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for one-to-one FST computation between two samples.
The function expects the pool sizes of all samples, as well as two indices of the Variant::samples SampleCounts objects between which FST shall be calculated, and any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between the given pair of samples.
Definition at line 550 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for all-to-all computation of FST between all pairs of samples.
The function expects the pool sizes of all samples, as well as any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between all pairs of their samples.
Definition at line 454 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for computation of FST between specific pairs of samples.
The function expects the pool sizes of all samples, as well as the pairs of indices of the Variant::samples SampleCounts between which FST shall be calculated, and any additional args
to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between all provided pairs of their samples.
Definition at line 484 of file fst_pool_processor.hpp.
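The four FstPoolProcessor factory functions above differ mainly in which pairs of sample indices they set up. The following is a minimal, self-contained sketch of that pair selection only; it is not the library's implementation. The helper names all_to_all_pairs and one_to_all_pairs are hypothetical, and the sketch assumes that all-to-all uses each unordered pair once and that one-to-all skips the pair of a sample with itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using IndexPair = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: every unordered pair (i, j) with i < j.
std::vector<IndexPair> all_to_all_pairs( std::size_t sample_count )
{
    std::vector<IndexPair> pairs;
    for( std::size_t i = 0; i < sample_count; ++i ) {
        for( std::size_t j = i + 1; j < sample_count; ++j ) {
            pairs.emplace_back( i, j );
        }
    }
    return pairs;
}

// Hypothetical helper: the given sample paired with every other sample.
std::vector<IndexPair> one_to_all_pairs( std::size_t sample_count, std::size_t index )
{
    assert( index < sample_count );
    std::vector<IndexPair> pairs;
    for( std::size_t j = 0; j < sample_count; ++j ) {
        if( j != index ) {
            pairs.emplace_back( index, j );
        }
    }
    return pairs;
}
```

Under these assumptions, four samples give six pairs in the all-to-all case and three pairs in the one-to-all case. The one-to-one and specific-pairs variants correspond to passing one pair or an explicit list of pairs.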
GenomeWindowStream<InputStreamIterator, DataType> genesis::population::make_genome_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a GenomeWindowStream for the whole genome, without the need to specify the template parameters manually.
This helper function creates a GenomeWindowStream from the given pair of iterators, so that the whole genome is traversed without stopping at individual chromosomes in each iteration.
Definition at line 443 of file genome_window_stream.hpp.
std::shared_ptr<T> genesis::population::make_input_stream_with_sample_filter_ | ( | std::string const & | filename, |
R const & | reader, | ||
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Local helper function template that takes care of initializing an input stream, and setting the sample filters, for those streams for which we do not know the number of samples prior to starting the file iteration.
The template arguments are: T
the returned type of input stream, and R
the underlying reader type. This is very specific for the use case here, and currently is only meant for how we work with the SimplePileupReader and the SyncReader and their streams. Both their streams accept a reader to take settings from.
Definition at line 67 of file variant_input_stream_sources.cpp.
IntervalWindowStream<InputStreamIterator, DataType> genesis::population::make_interval_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | width = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate an IntervalWindowStream without the need to specify the template parameters manually.
The three functors entry_input_function
, chromosome_function
, and position_function
of the IntervalWindowStream have to be set in the returned stream before using it. See make_default_interval_window_stream() for an alternative make function that sets these three functors to reasonable defaults that work for the Variant data type.
Definition at line 501 of file interval_window_stream.hpp.
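The requirement that three functors be set before use can be illustrated with a toy stand-in. This sketch is not the library class: ToyWindowStream, ToyRecord, check_functors and make_toy_record_stream are hypothetical, and the functor signatures are assumptions chosen for illustration. Only the three functor names are taken from the documentation above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Toy stand-in for a window stream; NOT the library class.
template<class Input, class Data>
struct ToyWindowStream
{
    // Names follow the documentation; the signatures are assumptions.
    std::function<Data( Input const& )>        entry_input_function;
    std::function<std::string( Input const& )> chromosome_function;
    std::function<std::size_t( Input const& )> position_function;

    // Refuse to run until all three functors have been set.
    void check_functors() const
    {
        if( !entry_input_function || !chromosome_function || !position_function ) {
            throw std::runtime_error( "window stream functors have to be set before use" );
        }
    }
};

// Hypothetical input record, for illustration only.
struct ToyRecord
{
    std::string chromosome;
    std::size_t position;
    double      value;
};

// Plays the role of a "default" make function: sets all three functors.
inline ToyWindowStream<ToyRecord, double> make_toy_record_stream()
{
    ToyWindowStream<ToyRecord, double> stream;
    stream.entry_input_function = []( ToyRecord const& r ){ return r.value; };
    stream.chromosome_function  = []( ToyRecord const& r ){ return r.chromosome; };
    stream.position_function    = []( ToyRecord const& r ){ return r.position; };
    return stream;
}
```

In this toy setup, a default-constructed ToyWindowStream fails check_functors(), while the stream returned by make_toy_record_stream() passes it, mirroring the relationship between the plain and the default make functions.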
PositionWindowStream<InputStreamIterator> genesis::population::make_passing_variant_position_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a PositionWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of type Variant. It is hence a more specialized version of make_default_position_window_stream(). Here, we check the Variant::status, and only select those Variants that have a passing FilterStatus to yield a window. The PositionWindowStream::entry_selection_function is set accordingly.
Definition at line 433 of file position_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_passing_variant_position_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function that creates a PositionWindowStream with default functions for Variant data, and wraps it in a WindowViewStream.
See make_passing_variant_position_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of PositionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...()
functions individually.
Definition at line 469 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator> genesis::population::make_passing_variant_queue_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate a QueueWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of type Variant. It is hence a more specialized version of make_default_queue_window_stream(). Here, we check the Variant::status, and only select those Variants that have a passing FilterStatus towards the QueueWindowStream::count() of each window. The QueueWindowStream::entry_selection_function is set accordingly.
Definition at line 919 of file queue_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_passing_variant_queue_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count, | ||
size_t | stride = 0 |
||
) |
Helper function that creates a QueueWindowStream with default functions for Variant data, and wraps it in a WindowViewStream.
See make_passing_variant_queue_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of QueueWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...()
functions individually.
Definition at line 958 of file queue_window_stream.hpp.
PositionWindowStream<InputStreamIterator, DataType> genesis::population::make_position_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, without the need to specify the template parameters manually.
Definition at line 345 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator, DataType> genesis::population::make_queue_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate a QueueWindowStream without the need to specify the template parameters manually.
This still requires setting the four functionals needed for processing the input stream, as described in QueueWindowStream.
Definition at line 824 of file queue_window_stream.hpp.
RegionWindowStream<InputStreamIterator, DataType> genesis::population::make_region_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
std::shared_ptr< GenomeRegionList > | region_list | ||
) |
Helper function to instantiate a RegionWindowStream without the need to specify the template parameters manually.
The three functors entry_input_function
, chromosome_function
, and position_function
of the RegionWindowStream have to be set in the returned stream before using it. See make_default_region_window_stream() for an alternative make function that sets these three functors to reasonable defaults that work for the Variant data type.
Definition at line 822 of file region_window_stream.hpp.
|
inline |
Filter function to be used with VariantInputStream on a Variant to filter its SampleCounts by genome regions, by tagging non-covered positions with the given tag
.
This function is similar to make_variant_filter_by_region_tagging(), but instead of setting the status of the whole Variant, it applies per-sample filters instead, and sets their status flags. The function expects a set of GenomeLocusSet or GenomeRegionList pointers to be given, one for each sample of the Variant. The template parameter GenomeMaskType allows either of those two mask types to be used.
Definition at line 66 of file sample_counts_filter_positional.hpp.
|
inline |
Return a functional to numerically filter the SampleCounts samples in a Variant, tagging the ones that do not pass the filters, and potentially tagging the Variant.
The function uses apply_sample_counts_filter_numerical(), modifying the samples, and tagging whether the filtering determined that the samples should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream. alternative that instead excludes the Variant::status from the stream.
Definition at line 280 of file sample_counts_filter_numerical.hpp.
|
inline |
This overload also includes the statistics of the failing or passing filter.
Definition at line 294 of file sample_counts_filter_numerical.hpp.
std::vector< bool > make_sample_name_filter | ( | std::vector< std::string > const & | sample_names, |
std::vector< std::string > const & | names_filter, | ||
bool | inverse_filter = false |
||
) |
Create a filter for samples, indicating which to keep.
The resulting bool vector has the same length as the input sample_names
vector, and is true
for all samples that are meant to be kept, and false
otherwise. By default, with inverse_filter == false
, sample names that are in the names_filter
are kept, and those that are not are not kept. With inverse_filter == true
, this is reversed.
The function also checks that sample_names
and names_filter
are unique (as otherwise the filtering might be wrong), and that the names in the names_filter
actually appear in the sample_names
.
Definition at line 46 of file variant_input_stream.cpp.
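The described behaviour can be sketched in a few lines of standard C++. This is a re-implementation for illustration under the rules stated above, not the library's code; the name sketch_sample_name_filter and the choice of exception type are assumptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch of the documented behaviour; not the library's implementation.
std::vector<bool> sketch_sample_name_filter(
    std::vector<std::string> const& sample_names,
    std::vector<std::string> const& names_filter,
    bool inverse_filter = false
) {
    // Both lists have to be free of duplicates.
    std::unordered_set<std::string> const sample_set( sample_names.begin(), sample_names.end() );
    std::unordered_set<std::string> const filter_set( names_filter.begin(), names_filter.end() );
    if( sample_set.size() != sample_names.size() || filter_set.size() != names_filter.size() ) {
        throw std::invalid_argument( "sample names and filter names have to be unique" );
    }

    // Every name in the filter has to refer to an existing sample.
    for( auto const& name : names_filter ) {
        if( sample_set.count( name ) == 0 ) {
            throw std::invalid_argument( "filter name not found in sample names: " + name );
        }
    }

    // Keep listed samples by default; keep unlisted samples if inverted.
    std::vector<bool> keep;
    keep.reserve( sample_names.size() );
    for( auto const& name : sample_names ) {
        bool const listed = filter_set.count( name ) > 0;
        keep.push_back( listed != inverse_filter );
    }
    assert( keep.size() == sample_names.size() );
    return keep;
}
```

For sample names S1, S2, S3 and a filter of S1 and S3, the sketch yields true, false, true; with inverse_filter set to true, it yields false, true, false.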
std::vector<std::string> genesis::population::make_sample_name_list_ | ( | std::string const & | source_name, |
size_t | size | ||
) |
Local helper to fill the sample names of file formats without sample names.
We want to use a standardized format for that: the file base name, followed by consecutive numbers for each sample, separated by a character.
Definition at line 133 of file variant_input_stream_sources.cpp.
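A minimal sketch of such a naming scheme is shown below. The documentation does not state which separator character is used or at which number the count starts, so the dot separator and the one-based numbering here are assumptions, as is the name sketch_sample_name_list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch only: separator and one-based numbering are assumptions.
std::vector<std::string> sketch_sample_name_list(
    std::string const& source_name,
    std::size_t size,
    char separator = '.'
) {
    std::vector<std::string> names;
    names.reserve( size );
    for( std::size_t i = 0; i < size; ++i ) {
        // Base name, separator, then a consecutive number per sample.
        names.push_back( source_name + separator + std::to_string( i + 1 ) );
    }
    assert( names.size() == size );
    return names;
}
```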
|
inline |
Filter function to be used with VariantInputStream to filter by a genome region, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given region
(if complement
is false
, default), or only over Variants that are outside of the region
(if complement
is true
).
Definition at line 66 of file variant_filter_positional.hpp.
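The role of the complement flag can be sketched with a small predicate factory. The types ToyLocus and ToyRegion and the function sketch_filter_by_region are hypothetical stand-ins, and the closed-interval convention is an assumption of this sketch; the library's own region types are not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-ins for a position and a region on a chromosome.
struct ToyLocus
{
    std::string chromosome;
    std::size_t position;
};

struct ToyRegion
{
    std::string chromosome;
    std::size_t start; // inclusive (assumption)
    std::size_t end;   // inclusive (assumption)
};

// Return a predicate that is true for loci to keep in the stream.
std::function<bool( ToyLocus const& )> sketch_filter_by_region(
    ToyRegion const& region,
    bool complement = false
) {
    assert( region.start <= region.end );
    return [region, complement]( ToyLocus const& locus ) {
        bool const covered =
            locus.chromosome == region.chromosome &&
            locus.position >= region.start &&
            locus.position <= region.end;
        // Without complement, keep covered loci; with it, keep the others.
        return covered != complement;
    };
}
```

A predicate of this shape is what a filter hook expects: it returns true for entries that stay in the stream and false for entries that are skipped.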
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given regions
(if complement
is false
, default), or only over Variants that are outside of the regions
(if complement
is true
).
Definition at line 103 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given regions
(if complement
is false
, default), or only over Variants that are outside of the regions
(if complement
is true
).
Definition at line 85 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a genome region, by tagging non-covered positions with the given tag
.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given region
(if complement
is false
, default), or only over Variants that are outside of the region
(if complement
is true
).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 138 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag
.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given regions
(if complement
is false
, default), or only over Variants that are outside of the regions
(if complement
is true
).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 209 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag
.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given regions
(if complement
is false
, default), or only over Variants that are outside of the regions
(if complement
is true
).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 177 of file variant_filter_positional.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), which returns true
or false
depending on whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform_filter() to exclude positions fully from the stream.
See make_variant_filter_numerical_tagging() for an alternative that instead simply sets the Variant::status to an appropriate VariantFilterTag, but does not exclude it from the stream.
Definition at line 206 of file variant_filter_numerical.hpp.
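The difference between the excluding and the tagging flavour can be sketched with toy types. Everything below (ToyVariant, ToyStatus, apply_toy_filter, the make_toy_... factories and the two run_... loops) is hypothetical and only mimics the pattern described above: one shared function that tags and reports the result, wrapped either as a bool-returning filter or as a void transform.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class ToyStatus { kPassing, kBelowMinDepth };

struct ToyVariant
{
    std::size_t position;
    std::size_t depth;
    ToyStatus   status;
};

// Tag a failing variant and report whether it should be kept.
bool apply_toy_filter( ToyVariant& variant, std::size_t min_depth )
{
    if( variant.depth < min_depth ) {
        variant.status = ToyStatus::kBelowMinDepth;
        return false;
    }
    return true;
}

// Excluding flavour: the bool result decides whether the entry stays.
std::function<bool( ToyVariant& )> make_toy_filter_excluding( std::size_t min_depth )
{
    return [min_depth]( ToyVariant& v ){ return apply_toy_filter( v, min_depth ); };
}

// Tagging flavour: the result is dropped, only the status is changed.
std::function<void( ToyVariant& )> make_toy_filter_tagging( std::size_t min_depth )
{
    return [min_depth]( ToyVariant& v ){ apply_toy_filter( v, min_depth ); };
}

// Toy stream loops standing in for a filtering and a transforming stream.
std::vector<ToyVariant> run_excluding(
    std::vector<ToyVariant> input, std::function<bool( ToyVariant& )> const& filter
) {
    std::vector<ToyVariant> output;
    for( auto& v : input ) {
        if( filter( v ) ) {
            output.push_back( v );
        }
    }
    assert( output.size() <= input.size() );
    return output;
}

std::vector<ToyVariant> run_tagging(
    std::vector<ToyVariant> input, std::function<void( ToyVariant& )> const& transform
) {
    for( auto& v : input ) {
        transform( v );
    }
    return input;
}
```

With the excluding flavour, downstream code never sees failing entries. With the tagging flavour, every position is still visited and downstream code can inspect the status, which is useful when the number of filtered positions matters.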
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), which returns true
or false
depending on whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform_filter() to exclude positions fully from the stream.
See make_variant_filter_numerical_tagging() for an alternative that instead simply sets the Variant::status to an appropriate VariantFilterTag, but does not exclude it from the stream.
This overload also includes the statistics of the failing or passing filter.
Definition at line 219 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload additionally runs apply_sample_counts_filter_numerical() on all samples, i.e., it additionally does the same as make_sample_counts_filter_numerical_tagging(). This is meant as a convenience function that just does all the typical numerical filtering at once.
Definition at line 273 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload additionally runs apply_sample_counts_filter_numerical() on all samples, i.e., it additionally does the same as make_sample_counts_filter_numerical_tagging(). This is meant as a convenience function that just does all the typical numerical filtering at once. The Variant filter is also set to fitting non-passing values if
This overload also includes the statistics of the failing or passing filter.
Definition at line 296 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
Definition at line 244 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload also includes the statistics of the failing or passing filter.
Definition at line 257 of file variant_filter_numerical.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 97 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
This overload additionally sets a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 143 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
This overload additionally sets the reference genome for the gapless iteration.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 111 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
This overload additionally sets the reference genome for the gapless iteration, as well as a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 160 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
This overload additionally sets the sequence dictionary for the gapless iteration.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 127 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input
, and wrap it again in a VariantInputStream.
This overload additionally sets the sequence dictionary for the gapless iteration, as well as a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 179 of file variant_input_stream_adapters.hpp.
VariantInputStream make_variant_input_stream_from_frequency_table_file | ( | std::string const & | filename, |
char | separator_char = '\t' , |
||
FrequencyTableInputStream const & | reader = FrequencyTableInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants.
Optionally, this takes a reader
with settings to be used.
Definition at line 412 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_frequency_table_file | ( | std::string const & | filename, |
std::vector< std::string > const & | sample_names_filter, | ||
bool | inverse_sample_names_filter = false , |
||
char | separator_char = '\t' , |
||
FrequencyTableInputStream const & | reader = FrequencyTableInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants.
Additionally, this version of the function takes a list of sample names, sample_names_filter
, which is used as a filter so that only those samples (columns of the frequency table) are evaluated and accessible; or, if inverse_sample_names_filter
is set to true
, instead all but those samples.
Optionally, this takes a reader
with settings to be used.
Definition at line 422 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_individual_vcf_file | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params = VariantInputStreamFromVcfParams{} , |
||
bool | use_allelic_depth = false |
||
) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as an individual, and combining them all into one SampleCounts sample.
See convert_to_variant_as_individuals( VcfRecord const&, bool ) for details on the conversion from VcfRecord to Variant.
Definition at line 591 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
Optionally, this takes a reader
with settings to be used.
Definition at line 302 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
std::vector< bool > const & | sample_filter, | ||
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
This uses only the samples at the indices where the sample_filter
is true
. Optionally, this takes a reader
with settings to be used.
Definition at line 322 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices = false , |
||
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
This uses only the samples at the zero-based indices given in the sample_indices
list. If inverse_sample_indices
is true
, this list is inverted, that is, all sample indices but the ones listed are included in the output.
For example, given a list { 0, 2 }
and a file with 4 samples, only the first and the third sample will be in the output. When however inverse_sample_indices
is also set, then the output will contain the second and fourth sample.
Optionally, this takes a reader
with settings to be used.
Definition at line 311 of file variant_input_stream_sources.cpp.
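The relationship between an index list and a bool filter, including the inversion, can be sketched as follows. The helper sketch_indices_to_filter is hypothetical and not part of the library; it only reproduces the example given above with the list { 0, 2 } and four samples.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sketch: turn zero-based sample indices into a per-sample keep flag.
std::vector<bool> sketch_indices_to_filter(
    std::size_t sample_count,
    std::vector<std::size_t> const& sample_indices,
    bool inverse_sample_indices = false
) {
    // Start with "drop all" normally, or "keep all" if the list is inverted.
    std::vector<bool> filter( sample_count, inverse_sample_indices );
    for( auto const index : sample_indices ) {
        if( index >= sample_count ) {
            throw std::out_of_range( "sample index exceeds number of samples" );
        }
        // Listed samples get the opposite flag of the default.
        filter[ index ] = !inverse_sample_indices;
    }
    assert( filter.size() == sample_count );
    return filter;
}
```

For four samples and the list { 0, 2 }, the sketch keeps the first and third sample; with inverse_sample_indices set to true, it keeps the second and fourth sample instead.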
VariantInputStream genesis::population::make_variant_input_stream_from_pileup_file_ | ( | std::string const & | filename, |
SimplePileupReader const & | reader, | ||
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Local helper function that takes care of the three functions below.
Definition at line 261 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pool_vcf_file | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params = VariantInputStreamFromVcfParams{} |
||
) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as a pool of individuals.
See convert_to_variant_as_pool( VcfRecord const& ) for details on the conversion from VcfRecord to Variant.
This function requires the VCF to have the "AD" FORMAT field. It only iterates over those VCF record lines that actually have the "AD" FORMAT provided, as this is the information that we use to convert the samples to Variants. All records without that field are skipped.
Definition at line 582 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sam_file | ( | std::string const & | filename, |
SamVariantInputStream const & | reader = SamVariantInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a SAM/BAM/CRAM file as Variants.
An instance of SamVariantInputStream can be provided from which the settings are copied.
Depending on the settings used in the reader
, this can either produce a single sample (one SampleCounts object in the resulting Variant at each position in the genome), or split the input file by the read group (RG) tag (potentially also allowing for an "unaccounted" group of reads).
The other make_variant_input_stream_...
functions offer settings to sub-set (filter) the samples based on their names or indices. This can be achieved here as well, but has to be done directly in the reader
, instead of providing the filter arguments to this function.
Definition at line 188 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename | ) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
Definition at line 381 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename, |
std::vector< bool > const & | sample_filter | ||
) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
This uses only the samples at the indices where the sample_filter
is true
.
Definition at line 399 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices = false |
||
) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
This uses only the samples at the zero-based indices given in the sample_indices
list. If inverse_sample_indices
is true
, this list is inverted, that is, all sample indices but the ones listed are included in the output.
For example, given a list { 0, 2 }
and a file with 4 samples, only the first and the third sample will be in the output. When however inverse_sample_indices
is also set, then the output will contain the second and fourth sample.
Definition at line 389 of file variant_input_stream_sources.cpp.
VariantInputStream genesis::population::make_variant_input_stream_from_sync_file_ | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Definition at line 336 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_variant_gapless_input_stream | ( | VariantGaplessInputStream const & | gapless_input | ) |
Create a VariantInputStream that wraps a VariantGaplessInputStream.
See also make_variant_gapless_input_stream()
Definition at line 121 of file variant_input_stream_adapters.cpp.
VariantInputStream make_variant_input_stream_from_variant_parallel_input_stream | ( | VariantParallelInputStream const & | parallel_input, |
VariantParallelInputStream::JoinedVariantParams const & | joined_variant_params = VariantParallelInputStream::JoinedVariantParams{} |
||
) |
Create a VariantInputStream to iterate multiple input sources at once, using a VariantParallelInputStream.
This wraps multiple input sources into one stream that traverses all of them in parallel, and is here then yet again turned into a Variant per position, using VariantParallelInputStream::Iterator::joined_variant() to combine all input sources into one. See there for the meaning of the joined_variant_params
parameter of this function.
As this is iterating multiple files, we leave the VariantInputStreamData::file_path and VariantInputStreamData::source_name empty, and fill the VariantInputStreamData::sample_names with the sample names of the underlying input sources of the parallel stream, checking for duplicates to avoid downstream trouble.
Definition at line 55 of file variant_input_stream_adapters.cpp.
VariantInputStream genesis::population::make_variant_input_stream_from_vcf_file_ | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params, | ||
bool | pool_samples, | ||
bool | use_allelic_depth | ||
) |
Local helper function that takes care of both main functions below.
Definition at line 486 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_vector | ( | std::vector< Variant > const & | variants | ) |
Create a VariantInputStream to iterate the contents of a std::vector
containing Variants.
This is a simple wrapper to bring a vector of in-memory Variants into the input stream format that we use for file streaming as well. Meant as a speed-up for small files that fit into memory, in cases where they for example have to be processed multiple times.
The user needs to make sure that the lifetime of the given input variants
vector is longer than the stream returned here, and that the vector is not modified after calling this function.
Definition at line 147 of file variant_input_stream_sources.cpp.
std::function<void(Variant&)> genesis::population::make_variant_input_stream_sample_name_filter_transform | ( | std::vector< bool > const & | sample_filter | ) |
Definition at line 103 of file variant_input_stream.cpp.
std::function<void(Variant&)> genesis::population::make_variant_input_stream_sample_subsampling_transform | ( | size_t | max_depth, |
SubsamplingMethod | method | ||
) |
Definition at line 143 of file variant_input_stream.cpp.
std::function<void(Variant const&)> genesis::population::make_variant_input_stream_sequence_length_observer | ( | std::shared_ptr< genesis::sequence::SequenceDict > | sequence_dict | ) |
Definition at line 229 of file variant_input_stream.cpp.
std::function<void(Variant const&)> genesis::population::make_variant_input_stream_sequence_order_observer | ( | std::shared_ptr< genesis::sequence::SequenceDict > | sequence_dict, |
bool | check_sequence_lengths | ||
) |
Definition at line 175 of file variant_input_stream.cpp.
VariantInputStream make_variant_merging_input_stream | ( | VariantInputStream const & | input, |
std::unordered_map< std::string, std::string > const & | sample_name_to_group, | ||
bool | allow_ungrouped_samples = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Create a VariantInputStream that merges samples from its underlying input
.
This provides an on-the-fly merging of input samples by simply summing up their SampleCounts. It takes a mapping of sample names to group names, and creates a VariantInputStream with the group names as new sample names, each of which merges the input of its respective samples.
If allow_ungrouped_samples
is set to true
, any sample that does not occur in the map will be added as-is, with its original sample name, and as its own "group". By default, we throw an exception in this case, in order to make sure that the behavior is intended.
Definition at line 276 of file variant_input_stream_adapters.cpp.
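The grouping logic described above can be sketched in isolation. This is a minimal standalone illustration, not the library's implementation: `CountsSketch` and `merge_by_group_sketch` are hypothetical names, counts are a plain array of six values (A, C, G, T, N, D), and the output order of groups (first occurrence) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-sample counts: A, C, G, T, N, D.
using CountsSketch = std::array<std::size_t, 6>;

// Merge samples into named groups by summing their counts.
// Groups appear in the output in order of first occurrence (an assumption).
std::vector<std::pair<std::string, CountsSketch>> merge_by_group_sketch(
    std::vector<std::string> const& sample_names,
    std::vector<CountsSketch> const& samples,
    std::unordered_map<std::string, std::string> const& sample_name_to_group,
    bool allow_ungrouped_samples = false
) {
    if( sample_names.size() != samples.size() ) {
        throw std::invalid_argument( "Names and samples differ in size" );
    }
    std::vector<std::pair<std::string, CountsSketch>> groups;
    std::unordered_map<std::string, std::size_t> group_index;

    for( std::size_t i = 0; i < samples.size(); ++i ) {
        // Look up the group; fall back to the sample name itself if allowed.
        auto const it = sample_name_to_group.find( sample_names[i] );
        if( it == sample_name_to_group.end() && !allow_ungrouped_samples ) {
            throw std::runtime_error( "Sample without group: " + sample_names[i] );
        }
        std::string const& group
            = ( it == sample_name_to_group.end() ) ? sample_names[i] : it->second;

        // Create the group on first use, then add the sample counts to it.
        auto const inserted = group_index.emplace( group, groups.size() );
        if( inserted.second ) {
            groups.emplace_back( group, CountsSketch{} );
        }
        auto& target = groups[ inserted.first->second ].second;
        for( std::size_t k = 0; k < target.size(); ++k ) {
            target[k] += samples[i][k];
        }
    }
    return groups;
}
```

With `allow_ungrouped_samples` left at `false`, a sample name missing from the map raises an exception, mirroring the default behavior described above.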
VariantMergeGroupAssignment genesis::population::make_variant_merging_input_stream_group_assignment_ | ( | VariantInputStream const & | variant_input, |
std::unordered_map< std::string, std::string > const & | sample_name_to_group, | ||
bool | allow_ungrouped_samples | ||
) |
Helper function to create a mapping from sample indices to group indices.
Definition at line 180 of file variant_input_stream_adapters.cpp.
WindowViewStream<typename T::InputStreamType, typename T::DataType> genesis::population::make_window_view_stream | ( | T && | window_iterator | ) |
Create a WindowViewStream that iterates some underlying BaseWindowStream.
The template parameter T
is expected to be a BaseWindowStream.
This serves as an abstraction to be able to use WindowViewStream everywhere, instead of having to switch between WindowViewStream and WindowStream depending on the type of windowing that is being done. See WindowViewStream for details.
This overload of the function takes the underlying iterator by r-value ref, so that it can be provided directly without copy.
Definition at line 337 of file window_view_stream.hpp.
WindowViewStream<typename T::InputStreamType, typename T::DataType> genesis::population::make_window_view_stream | ( | T const & | window_iterator | ) |
Create a WindowViewStream that iterates some underlying BaseWindowStream.
The template parameter T
is expected to be a BaseWindowStream.
This serves as an abstraction to be able to use WindowViewStream everywhere, instead of having to switch between WindowViewStream and WindowStream depending on the type of windowing that is being done. See WindowViewStream for details.
Definition at line 317 of file window_view_stream.hpp.
SampleCounts merge | ( | SampleCounts const & | p1, |
SampleCounts const & | p2 | ||
) |
Merge the counts of two SampleCounts objects.
Definition at line 400 of file population/function/functions.cpp.
SampleCounts merge | ( | std::vector< SampleCounts > const & | p, |
SampleCountsFilterPolicy | filter_policy | ||
) |
Merge the counts of a vector of SampleCounts objects.
Definition at line 407 of file population/function/functions.cpp.
void merge_inplace | ( | SampleCounts & | p1, |
SampleCounts const & | p2 | ||
) |
Merge the counts of two SampleCounts objects, by adding the counts of the second (p2
) to the first (p1
).
Definition at line 383 of file population/function/functions.cpp.
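As a rough illustration of the in-place merge and its copying counterpart, here is a standalone sketch. `CountsSketch` is a hypothetical stand-in for SampleCounts that only carries the six count fields named elsewhere on this page; the real type may hold more.

```cpp
#include <cstddef>

// Hypothetical stand-in for SampleCounts, reduced to its six count fields.
struct CountsSketch {
    std::size_t a_count;
    std::size_t c_count;
    std::size_t g_count;
    std::size_t t_count;
    std::size_t n_count;
    std::size_t d_count;
};

// Add the counts of p2 to p1, modifying p1.
void merge_inplace_sketch( CountsSketch& p1, CountsSketch const& p2 )
{
    p1.a_count += p2.a_count;
    p1.c_count += p2.c_count;
    p1.g_count += p2.g_count;
    p1.t_count += p2.t_count;
    p1.n_count += p2.n_count;
    p1.d_count += p2.d_count;
}

// Non-modifying variant: copy the first operand, then merge into the copy.
CountsSketch merge_sketch( CountsSketch const& p1, CountsSketch const& p2 )
{
    CountsSketch result = p1;
    merge_inplace_sketch( result, p2 );
    return result;
}
```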
|
inline |
Merge the counts of a vector of SampleCounts objects.
Definition at line 282 of file population/function/functions.hpp.
double n_base | ( | size_t | read_depth, |
size_t | poolsize | ||
) |
Compute the n_base
term used for Tajima's D in Kofler et al. 2011, using a faster closed form expression.
This term is the expected number of distinct individuals sequenced, which is equivalent to finding the expected number of distinct values selected from a set of integers.
The computation in PoPoolation is slow, see n_base_matrix(). We here instead use a closed form expression following the reasoning of https://math.stackexchange.com/a/72351 (see there for the derivation of the equation).
Definition at line 501 of file diversity_pool_functions.cpp.
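A minimal sketch of such a closed form, assuming it is the standard expectation for the number of distinct values seen when drawing `read_depth` times with replacement from `poolsize` equally likely individuals, that is, n * (1 - ((n - 1) / n)^k). The name `n_base_sketch` is hypothetical, and the library's exact expression may differ in form.

```cpp
#include <cmath>
#include <cstddef>

// Expected number of distinct individuals sequenced, when drawing
// read_depth reads uniformly with replacement from poolsize individuals.
// Assumed closed form: n * ( 1 - (( n - 1 ) / n )^k ).
double n_base_sketch( std::size_t read_depth, std::size_t poolsize )
{
    if( poolsize == 0 ) {
        return 0.0;
    }
    double const n = static_cast<double>( poolsize );
    double const k = static_cast<double>( read_depth );
    return n * ( 1.0 - std::pow(( n - 1.0 ) / n, k ));
}
```

For a pool of two individuals and two reads, this gives 2 * (1 - 0.25) = 1.5: both reads hit the same individual half of the time.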
double n_base_matrix | ( | size_t | read_depth, |
size_t | poolsize | ||
) |
Compute the n_base
term used for Tajima's D in Kofler et al. 2011, following their approach.
This term is the expected number of distinct individuals sequenced, which is equivalent to finding the expected number of distinct values selected from a set of integers.
The computation of this term in PoPoolation uses a recursive dynamic programming approach to sum over different possibilities of selecting sets of integers. This gets rather slow for larger inputs, and there is an equivalent closed form that we here use instead. See n_base() for details. We here merely offer the original PoPoolation implementation as a point of reference.
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
Definition at line 467 of file diversity_pool_functions.cpp.
std::array<size_t, 4> genesis::population::nucleotide_sorting_order | ( | std::array< T, 4 > const & | values | ) |
Return the sorting order of four values, for instance of the four nucleotides ACGT
, in descending order (largest first).
The input are four values, either counts or frequencies. The output are the indices into this array that are sorted so that the largest one comes first:
auto const data = std::array<T, 4>{ 15, 10, 20, 5 };
auto const order = nucleotide_sorting_order( data );
yields { 2, 0, 1, 3 }
, so that data[order[0]] = data[2] = 20
is the largest value, data[order[1]] = data[0] = 15
the second largest, and so forth.
Usage with actual data might be as follows:
SampleCounts sample = ...;
auto const data = std::array<T, 4>{ sample.a_count, sample.c_count, sample.g_count, sample.t_count };
auto const order = nucleotide_sorting_order( data );
// ...
See also sample_counts_sorting_order() for an equivalent function that also considers the "any" (N
) and "deletion" (D
) counts of a SampleCounts object.
Definition at line 128 of file population/function/functions.hpp.
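The index ordering described above can be reproduced with a generic index sort. This is a sketch under the assumption that ties keep their original order; `sorting_order_sketch` is a hypothetical name, and the library may well use a faster fixed-size sorting scheme instead.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Return the indices of the four values, sorted so that the index of the
// largest value comes first. Ties keep their original order (assumption).
template<typename T>
std::array<std::size_t, 4> sorting_order_sketch( std::array<T, 4> const& values )
{
    std::array<std::size_t, 4> order;
    std::iota( order.begin(), order.end(), std::size_t{ 0 } );
    std::stable_sort(
        order.begin(), order.end(),
        [&values]( std::size_t lhs, std::size_t rhs ) {
            return values[lhs] > values[rhs];
        }
    );
    return order;
}
```

Applied to the values 15, 10, 20, 5, this yields the indices 2, 0, 1, 3, matching the example in the description.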
|
inline constexpr |
Count of the pure nucleotide bases at this position, that is, the sum of all A
, C
, G
, and T
.
This is simply the sum of a_count + c_count + g_count + t_count
, which we often use as the read depth at the given site.
NB: In PoPoolation, this variable is called eucov
.
Definition at line 296 of file population/function/functions.hpp.
|
inline |
Inequality comparison (!=
) for two loci in a genome.
Definition at line 396 of file function/genome_locus.hpp.
bool operator!= | ( | GenomeRegion const & | a, |
GenomeRegion const & | b | ||
) |
Inequality comparison (!=
) for two GenomeRegions.
Definition at line 56 of file genome_region.cpp.
|
inline |
Less than comparison (<
) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 451 of file function/genome_locus.hpp.
|
inline |
Definition at line 68 of file function/genome_locus.hpp.
std::ostream & operator<< | ( | std::ostream & | os, |
GenomeRegion const & | region | ||
) |
Definition at line 65 of file genome_region.cpp.
std::ostream & operator<< | ( | std::ostream & | os, |
SampleCounts const & | bs | ||
) |
Output stream operator for SampleCounts instances.
Definition at line 649 of file population/function/functions.cpp.
|
inline |
Less than or equal comparison (<=
) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 538 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==
) for two loci in a genome.
Definition at line 340 of file function/genome_locus.hpp.
bool operator== | ( | GenomeRegion const & | a, |
GenomeRegion const & | b | ||
) |
Equality comparison (==
) for two GenomeRegions.
Definition at line 51 of file genome_region.cpp.
|
inline |
Greater than comparison (>
) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 491 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=
) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 578 of file function/genome_locus.hpp.
GenomeRegion parse_genome_region | ( | std::string const & | region, |
bool | zero_based = false , |
||
bool | end_exclusive = false |
||
) |
Parse a genomic region.
Accepted formats are "chromosome", "chromosome:position", "chromosome:start-end", and "chromosome:start..end".
By default, we expect positions (coordinates) to be 1-based and inclusive (closed interval), but this can be changed with the additional parameters zero_based
and end_exclusive
.
Definition at line 107 of file genome_region.cpp.
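To make the accepted formats concrete, here is a small standalone parser sketch for the default case only (1-based, inclusive). `RegionSketch`, `parse_position_sketch`, and `parse_region_sketch` are hypothetical names, and representing "whole chromosome" as start and end of 0 is an assumption of this sketch, not a statement about GenomeRegion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a region; not the library's GenomeRegion.
struct RegionSketch {
    std::string chromosome;
    std::size_t start = 0; // 0 means "whole chromosome" in this sketch
    std::size_t end   = 0;
};

// Parse a non-empty string of decimal digits as a position.
std::size_t parse_position_sketch( std::string const& text )
{
    if( text.empty() || text.find_first_not_of( "0123456789" ) != std::string::npos ) {
        throw std::invalid_argument( "Invalid position: '" + text + "'" );
    }
    return static_cast<std::size_t>( std::stoull( text ));
}

// Accepts "chr", "chr:pos", "chr:start-end", and "chr:start..end".
RegionSketch parse_region_sketch( std::string const& text )
{
    RegionSketch result;
    auto const colon = text.find( ':' );
    result.chromosome = text.substr( 0, colon );
    if( result.chromosome.empty() ) {
        throw std::invalid_argument( "Empty chromosome name" );
    }
    if( colon == std::string::npos ) {
        return result; // chromosome only
    }

    // Find the range separator, trying ".." before "-".
    std::string const range = text.substr( colon + 1 );
    std::size_t sep = range.find( ".." );
    std::size_t sep_len = 2;
    if( sep == std::string::npos ) {
        sep = range.find( '-' );
        sep_len = 1;
    }
    if( sep == std::string::npos ) {
        // Single position: start and end coincide.
        result.start = result.end = parse_position_sketch( range );
        return result;
    }
    result.start = parse_position_sketch( range.substr( 0, sep ));
    result.end   = parse_position_sketch( range.substr( sep + sep_len ));
    if( result.start > result.end ) {
        throw std::invalid_argument( "Region start is after its end" );
    }
    return result;
}
```

Supporting zero_based or end_exclusive input would amount to shifting the parsed start or end by one before storing them; that step is left out here.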
GenomeRegionList parse_genome_regions | ( | std::string const & | regions, |
bool | zero_based = false , |
||
bool | end_exclusive = false |
||
) |
Parse a set/list of genomic regions.
The individual regions need to be separated by commas (surrounding white space is okay), and each region needs to follow the format as explained in parse_genome_region(). See there for details.
Definition at line 190 of file genome_region.cpp.
genesis::utils::Matrix<double> genesis::population::pij_matrix_ | ( | size_t | max_read_depth, |
size_t | poolsize | ||
) |
Definition at line 395 of file diversity_pool_functions.cpp.
genesis::utils::Matrix<double> const& genesis::population::pij_matrix_resolver_ | ( | size_t | max_read_depth, |
size_t | poolsize | ||
) |
Definition at line 429 of file diversity_pool_functions.cpp.
std::vector<FstCathedralPlotRecord> genesis::population::prepare_fst_cathedral_records_for_chromosome_ | ( | std::string const & | chromosome, |
FstPoolProcessor const & | processor, | ||
FstPoolCalculatorUnbiased::Estimator | fst_estimator, | ||
std::vector< std::string > const & | sample_names | ||
) |
Definition at line 125 of file fst_cathedral.cpp.
std::string print_sample_counts_filter_category_stats | ( | SampleCountsFilterCategoryStats const & | stats, |
bool | verbose | ||
) |
Definition at line 254 of file sample_counts_filter.cpp.
|
inline |
Definition at line 311 of file sample_counts_filter.hpp.
std::ostream & print_sample_counts_filter_category_stats | ( | std::ostream & | os, |
SampleCountsFilterCategoryStats const & | stats, | ||
bool | verbose | ||
) |
Definition at line 230 of file sample_counts_filter.cpp.
|
inline |
Definition at line 301 of file sample_counts_filter.hpp.
std::string print_sample_counts_filter_stats | ( | SampleCountsFilterStats const & | stats, |
bool | verbose | ||
) |
Print a textual representation of the counts collected.
Definition at line 217 of file sample_counts_filter.cpp.
std::ostream & print_sample_counts_filter_stats | ( | std::ostream & | os, |
SampleCountsFilterStats const & | stats, | ||
bool | verbose | ||
) |
Print a textual representation of the counts collected.
Definition at line 169 of file sample_counts_filter.cpp.
std::ostream & print_variant_filter_category_stats | ( | std::ostream & | os, |
VariantFilterCategoryStats const & | stats, | ||
bool | verbose | ||
) |
Definition at line 269 of file variant_filter.cpp.
|
inline |
Definition at line 383 of file variant_filter.hpp.
std::string print_variant_filter_category_stats | ( | VariantFilterCategoryStats const & | stats, |
bool | verbose | ||
) |
Definition at line 299 of file variant_filter.cpp.
|
inline |
Definition at line 393 of file variant_filter.hpp.
std::ostream & print_variant_filter_stats | ( | std::ostream & | os, |
VariantFilterStats const & | stats, | ||
bool | verbose | ||
) |
Print a textual representation of the counts collected.
Definition at line 193 of file variant_filter.cpp.
std::string print_variant_filter_stats | ( | VariantFilterStats const & | stats, |
bool | verbose | ||
) |
Print a textual representation of the counts collected.
Definition at line 256 of file variant_filter.cpp.
GenomeLocusSet read_mask_fasta | ( | std::shared_ptr< utils::BaseInputSource > | source, |
size_t | mask_min = 0 , |
||
bool | invert = false |
||
) |
Read an input source as a mask fasta file, and return its content as a GenomeLocusSet.
The input is expected to be a FASTA-like mask file, e.g., to filter positions with. This mask file contains a sequence of integer digits (between 0 and 9) for each position on a chromosome that specify if a site at that position should be filtered/masked or not.
An example mask file would look like:
>1
0000011111222...
>2
2222211111000...
In this example, the first 5 sites of chromosome 1 are not masked, whereas sites from position 6 onwards would be filtered out. Sites from the 11th position onwards on chromosome 2 are kept as well. (NB: The vcftools documentation as of 2023-03-29 states though that these sites "would be filtered out as well", which seems like an error.)
The mask_min
argument specifies a threshold mask value between 0 and 9 to filter positions by. The default threshold is 0, meaning only sites with that value or lower will be kept. The invert
argument flips the interpretation of masked/unmasked.
Our internal representation of this data is to set the masked/filtered positions to true
in the underlying Bitvector, and the non-masked/kept positions to false
by default. The argument invert
flips this. The special position 0
of the GenomeLocusSet is always set to false
.
See https://vcftools.github.io/man_latest.html for details.
Definition at line 69 of file function/genome_locus_set.cpp.
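The thresholding rule can be sketched for a single chromosome as follows. This is an illustration only: `mask_from_digits_sketch` is a hypothetical name, a `std::vector<bool>` stands in for the Bitvector, and the sketch assumes that index 0 is reserved and positions are 1-based, as the description of the special position 0 suggests.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Turn the digit sequence of one chromosome into a mask.
// Index 0 is reserved and always false; position i of the chromosome
// (1-based) is stored at index i. true means masked/filtered.
std::vector<bool> mask_from_digits_sketch(
    std::string const& digits,
    std::size_t mask_min = 0,
    bool invert = false
) {
    std::vector<bool> masked( digits.size() + 1, false );
    for( std::size_t i = 0; i < digits.size(); ++i ) {
        if( digits[i] < '0' || digits[i] > '9' ) {
            throw std::invalid_argument( "Mask contains a non-digit character" );
        }
        // Sites with a value above the threshold are masked.
        std::size_t const value = static_cast<std::size_t>( digits[i] - '0' );
        bool const is_masked = value > mask_min;
        masked[ i + 1 ] = invert ? !is_masked : is_masked;
    }
    return masked;
}
```

For the first sequence of the example and the default threshold of 0, positions 1 to 5 stay unmasked and positions 6 onwards are masked.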
genesis::sequence::SequenceDict reference_locus_set_to_dict | ( | GenomeLocusSet const & | set | ) |
Definition at line 50 of file function/genome_locus_set.cpp.
void resample_counts | ( | SampleCounts & | sample, |
size_t | target_depth | ||
) |
Resample all counts in a SampleCounts sample
to a new target_depth
.
This samples with replacement from a multinomial distribution (see multinomial_distribution()) based on the previous counts of the sample
. This is the same as subsample_counts_with_replacement(), but performs the resampling regardless of whether the sum of counts exceeds the specified read depth.
The function can be seen as a way of creating in-silico replicates of a given population sample. There is no equivalent without replacement, as that could not sample more counts than there are in the original population anyway, meaning that subsample_counts_without_replacement() already covers that distribution.
Definition at line 248 of file subsample.cpp.
void resample_counts | ( | Variant & | variant, |
size_t | target_depth | ||
) |
Resample all counts in a SampleCounts sample
to a new target_depth
.
This samples with replacement from a multinomial distribution (see multinomial_distribution()) based on the previous counts of the sample
. This is the same as subsample_counts_with_replacement(), but performs the resampling regardless of whether the sum of counts exceeds the specified read depth.
The function can be seen as a way of creating in-silico replicates of a given population sample. There is no equivalent without replacement, as that could not sample more counts than there are in the original population anyway, meaning that subsample_counts_without_replacement() already covers that distribution.
This overload acts on all Variant::samples in the given variant
.
Definition at line 258 of file subsample.cpp.
void genesis::population::resample_counts_ | ( | SampleCounts & | sample, |
size_t | max_depth, | ||
Distribution | distribution, | ||
bool | skip_if_below_target_depth | ||
) |
Local helper function to avoid code duplication. Takes the distribution (with or without replacement) and performs the resampling of base counts.
Definition at line 192 of file subsample.cpp.
void rescale_counts | ( | SampleCounts & | sample, |
size_t | target_depth | ||
) |
Transform a SampleCounts sample
by re-scaling the base counts (A
, C
, G
, T
, as well as N
and D
) to sum up to the given target_depth
.
This is identical to subscale_counts(), but performs the transformation regardless of whether the sum of counts exceeds the specified read depth. In other words, this simply performs a linear re-scaling of the counts so that they sum to the given target_depth
.
Definition at line 166 of file subsample.cpp.
void rescale_counts | ( | Variant & | variant, |
size_t | target_depth | ||
) |
Transform a SampleCounts sample
by re-scaling the base counts (A
, C
, G
, T
, as well as N
and D
) to sum up to the given target_depth
.
This is identical to subscale_counts(), but performs the transformation regardless of whether the sum of counts exceeds the specified read depth. In other words, this simply performs a linear re-scaling of the counts so that they sum to the given target_depth
.
This overload acts on all Variant::samples in the given variant
.
Definition at line 173 of file subsample.cpp.
void genesis::population::rescale_counts_ | ( | SampleCounts & | sample, |
size_t | target_depth, | ||
bool | skip_if_below_target_depth | ||
) |
Definition at line 49 of file subsample.cpp.
void genesis::population::run_vcf_window | ( | SlidingWindowGenerator< Data, Accumulator > & | generator, |
std::string const & | vcf_file, | ||
std::function< Data(VcfRecord const &)> | conversion, | ||
std::function< bool(VcfRecord const &)> | condition = {} |
||
) |
Convenience function to iterate over a whole VCF file.
This is a convenience function that takes care of iterating a VCF file record by record (that is, line by line), using a provided conversion
function to extract the D
/Data
from the VcfRecord. It furthermore takes care of finishing all chromosomes properly, using their lengths as provided in the VCF header.
Before calling the function, of course, all necessary plugin functions have to be set in the SlidingWindowGenerator instance, so that the data is processed as intended. In particular, take care of setting SlidingWindowGenerator::emit_incomplete_windows() to the desired value.
Furthermore, the function offers a condition
function that can be used to skip records that do not fulfill a given condition. That is, if condition
is used, it needs to return true
for records that shall be processed, and false
for those that shall be skipped.
Definition at line 73 of file vcf_window.hpp.
std::string sam_flag_to_string | ( | int | flags | ) |
Turn a set of flags for sam/bam/cram reads into their textual representation.
This is useful for user output. We here use the format of names as used by htslib and samtools, where names are upper case and words in flag names are separated by underscores. This ensures compatibility of the output with existing tools.
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details.
Definition at line 132 of file sam_flags.cpp.
SampleCountsFilterCategoryStats sample_counts_filter_stats_category_counts | ( | SampleCountsFilterStats const & | stats | ) |
Generate summary counts for a SampleCountsFilterStats counter.
The given stats
contain counts for different reasons of filters that could have failed when filtering a SampleCounts. This function summarizes those stats into three basic categories, and gives their sums.
This is meant as a broad summary, for instance for user output, where it might not be relevant exactly which numerical filter got triggered how often, but where we rather want an overview of which classes or categories of filters got triggered how often.
Definition at line 95 of file sample_counts_filter.cpp.
size_t sample_counts_filter_stats_category_counts | ( | SampleCountsFilterStats const & | stats, |
SampleCountsFilterTagCategory | category | ||
) |
Overload that only reports back a single category sum of the filter stats.
Definition at line 118 of file sample_counts_filter.cpp.
SampleCountsFilterTagCategory sample_counts_filter_tag_to_category | ( | SampleCountsFilterTag | tag | ) |
For a given tag
, return its category tag.
Definition at line 62 of file sample_counts_filter.cpp.
std::array<size_t, 6> genesis::population::sample_counts_sorting_order | ( | std::array< T, 6 > const & | v | ) |
Return the sorting order of six values, for instance of the four nucleotides ACGT
and the N
and D
counts of a SampleCounts object, in descending order (largest first).
Same as nucleotide_sorting_order(), but also taking N
and D
into account. See there for details.
Definition at line 165 of file population/function/functions.hpp.
|
inline constexpr |
Sum up all the base counts at this sample
, that is, the sum of all A
, C
, G
, T
, as well as the N
and D
count for undetermined and deleted counts.
This is simply the sum of a_count + c_count + g_count + t_count + n_count + d_count
, of the SampleCounts object. See nucleotide_sum() for a function that only sums ACGT
, but not N
and D
.
Definition at line 318 of file population/function/functions.hpp.
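The relation between this full sum and the nucleotide-only sum of nucleotide_sum() can be sketched as two small functions. `CountsSketch` is a hypothetical stand-in for SampleCounts that carries only the six count fields named above, and the function names are likewise made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for SampleCounts, reduced to its six count fields.
struct CountsSketch {
    std::size_t a_count;
    std::size_t c_count;
    std::size_t g_count;
    std::size_t t_count;
    std::size_t n_count;
    std::size_t d_count;
};

// Sum of the pure nucleotide counts A, C, G, T.
constexpr std::size_t nucleotide_sum_sketch( CountsSketch const& sample )
{
    return sample.a_count + sample.c_count + sample.g_count + sample.t_count;
}

// Sum of all six counts, including N and D.
constexpr std::size_t counts_sum_sketch( CountsSketch const& sample )
{
    return nucleotide_sum_sketch( sample ) + sample.n_count + sample.d_count;
}
```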
void save_cathedral_plot_record_to_files | ( | CathedralPlotRecord const & | record, |
std::string const & | base_path | ||
) |
Convenience function to save the record of a cathedral plot in a set of files.
This overload wrapper simply calls cathedral_plot_record_to_json_document() to obtain a Json representation of the given record
, and uses that to save the files. This however only uses the basic fields of CathedralPlotRecord, and omits other fields that might be interesting for users. See for instance fst_cathedral_plot_record_to_json_document() for a function that also stores specialized fields.
Note however that when loading files again via load_cathedral_plot_record_from_files(), we only load the fields of CathedralPlotRecord and omit those extra ones, as they are not used in the plotting routines anyway.
Definition at line 267 of file cathedral_plot.cpp.
void save_cathedral_plot_record_to_files | ( | genesis::utils::JsonDocument const & | record_document, |
genesis::utils::Matrix< double > const & | record_value_matrix, | ||
std::string const & | base_path | ||
) |
Save the record of a cathedral plot in a set of files.
To allow for flexibility, the CathedralPlotRecord, or its derived classes such as FstCathedralPlotRecord, are expected to be converted to a Json document first, with cathedral_plot_record_to_json_document() and related functions.
We then store that meta-data, as well as the value matrix computed with compute_cathedral_matrix() in two files, which use the given base_path
, and append extensions .json
and .csv
, respectively. The resulting files can be loaded again with load_cathedral_plot_record_from_files().
Definition at line 253 of file cathedral_plot.cpp.
void save_cathedral_plot_record_to_targets | ( | genesis::utils::JsonDocument const & | record_document, |
genesis::utils::Matrix< double > const & | record_value_matrix, | ||
std::shared_ptr< genesis::utils::BaseOutputTarget > | json_target, | ||
std::shared_ptr< genesis::utils::BaseOutputTarget > | csv_target | ||
) |
Save the record of a cathedral plot in a set of output targets.
This overload allows specifying the targets directly, instead of creating fitting targets according to a file base_path
. See save_cathedral_plot_record_to_files() for the file-based versions of this function.
Definition at line 225 of file cathedral_plot.cpp.
void set_base_count | ( | SampleCounts & | sample, |
char | base, | ||
SampleCounts::size_type | value | ||
) |
Set the count for a base
given as a char.
The given base
has to be one of ACGTDN
(case insensitive), or *#.
for deletions as well.
Definition at line 86 of file population/function/functions.cpp.
void genesis::population::SimplePileupReader::process_ancestral_base_< SimplePileupReader::Sample > | ( | utils::InputStream & | input_stream, |
SimplePileupReader::Sample & | sample | ||
) | const |
Definition at line 922 of file simple_pileup_reader.cpp.
void genesis::population::SimplePileupReader::process_quality_string_< SimplePileupReader::Sample > | ( | utils::InputStream & | input_stream, |
SimplePileupReader::Sample & | sample | ||
) | const |
Definition at line 714 of file simple_pileup_reader.cpp.
void genesis::population::SimplePileupReader::set_sample_read_bases_< SimplePileupReader::Sample > | ( | std::string const & | read_bases, |
SimplePileupReader::Sample & | sample | ||
) | const |
Definition at line 692 of file simple_pileup_reader.cpp.
void genesis::population::SimplePileupReader::set_sample_read_depth_< SimplePileupReader::Sample > | ( | size_t | read_depth, |
SimplePileupReader::Sample & | sample | ||
) | const |
Definition at line 438 of file simple_pileup_reader.cpp.
void genesis::population::SimplePileupReader::set_target_alternative_base_< SimplePileupReader::Record > | ( | SimplePileupReader::Record & | target | ) | const |
Definition at line 417 of file simple_pileup_reader.cpp.
std::pair< SortedSampleCounts, SortedSampleCounts > sorted_average_sample_counts | ( | SampleCounts const & | sample_a, |
SampleCounts const & | sample_b | ||
) |
Return the sorted base counts of both input samples, ordered by the average frequencies of the nucleotide counts in the two samples.
Both returned counts will be in the same order, with the nucleotide first that has the highest average count in the two samples, etc.
Definition at line 164 of file population/function/functions.cpp.
SortedSampleCounts sorted_sample_counts | ( | SampleCounts const & | sample | ) |
Return the order of base counts (nucleotides), largest one first.
Definition at line 134 of file population/function/functions.cpp.
SortedSampleCounts sorted_sample_counts | ( | Variant const & | variant, |
bool | reference_first, | ||
SampleCountsFilterPolicy | filter_policy | ||
) |
Get a list of bases sorted by their counts.
If reference_first
is set to true
, the first entry in the resulting array is always the reference base of the Variant, while the other three bases are sorted by counts. If reference_first
is set to false
, all four bases are sorted by their counts.
Definition at line 298 of file population/function/functions.cpp.
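The effect of reference_first can be sketched on plain count totals. Everything here is hypothetical scaffolding: `BaseCountSketch` and `sorted_counts_sketch` are made-up names, the input is an already merged array of A, C, G, T counts, ties keep the ACGT order, and throwing on a reference base outside of ACGT is an assumption of the sketch rather than documented behavior.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <stdexcept>
#include <utility>

// A base character together with its count.
using BaseCountSketch = std::pair<char, std::size_t>;

// Sort the four bases by count, largest first. If reference_first is set,
// the reference base is placed first and only the other three are sorted.
std::array<BaseCountSketch, 4> sorted_counts_sketch(
    std::array<std::size_t, 4> const& acgt,
    char reference_base,
    bool reference_first
) {
    std::array<BaseCountSketch, 4> result = {{
        { 'A', acgt[0] }, { 'C', acgt[1] }, { 'G', acgt[2] }, { 'T', acgt[3] }
    }};

    auto first = result.begin();
    if( reference_first ) {
        auto const ref = std::find_if(
            result.begin(), result.end(),
            [reference_base]( BaseCountSketch const& entry ) {
                return entry.first == reference_base;
            }
        );
        if( ref == result.end() ) {
            throw std::invalid_argument( "Reference base is not in ACGT" );
        }
        // Move the reference base to the front, keeping the order of the rest.
        std::rotate( result.begin(), ref, ref + 1 );
        ++first;
    }

    // Sort the remaining entries by count, in descending order.
    std::stable_sort(
        first, result.end(),
        []( BaseCountSketch const& lhs, BaseCountSketch const& rhs ) {
            return lhs.second > rhs.second;
        }
    );
    return result;
}
```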
SortedSampleCounts genesis::population::sorted_sample_counts_ | ( | Variant const & | variant, |
bool | reference_first, | ||
SampleCounts const & | total | ||
) |
Local helper function that takes an already computed total
from merge_sample_counts(), so that it can be re-used internally here.
Definition at line 235 of file population/function/functions.cpp.
int string_to_sam_flag | ( | std::string const & | value | ) |
Parse a string as a set of flags for sam/bam/cram reads.
The given string can either be the numeric value as specified by the sam standard, or given as a list of flag names or values, which can be separated by comma, space, vertical bar, or plus sign, and where each flag name is treated case-insensitively and without regarding non-alphanumeric characters. This is a more lenient parsing than what htslib and samtools offer.
For example, it accepts:
1 0x12 PROPER_PAIR,MREVERSE ProperPair + MateReverse PROPER_PAIR | 0x20
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details.
Definition at line 81 of file sam_flags.cpp.
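A lenient parser of this kind could look roughly as follows. This is a sketch, not the library's code: `string_to_flag_sketch` is a hypothetical name, the bit values are those of the SAM specification, and the set of accepted long aliases (here only `MATEREVERSE`) is an assumption made to cover the examples above. Numeric tokens are parsed with base auto-detection, so a leading 0 would be read as octal.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

int string_to_flag_sketch( std::string const& value )
{
    // Flag names in upper case, with non-alphanumeric characters removed.
    // The alias MATEREVERSE is an assumption of this sketch.
    static std::unordered_map<std::string, int> const names = {
        { "PAIRED", 0x1 }, { "PROPERPAIR", 0x2 }, { "UNMAP", 0x4 },
        { "MUNMAP", 0x8 }, { "REVERSE", 0x10 }, { "MREVERSE", 0x20 },
        { "MATEREVERSE", 0x20 }, { "READ1", 0x40 }, { "READ2", 0x80 },
        { "SECONDARY", 0x100 }, { "QCFAIL", 0x200 }, { "DUP", 0x400 },
        { "SUPPLEMENTARY", 0x800 }
    };

    int result = 0;
    std::size_t pos = 0;
    while( pos < value.size() ) {
        // Extract the next token, delimited by comma, space, bar, or plus.
        auto const end = std::min( value.find_first_of( ", |+", pos ), value.size() );
        std::string const token = value.substr( pos, end - pos );
        pos = end + 1;
        if( token.empty() ) {
            continue;
        }

        // Numeric tokens: decimal, or hexadecimal with a 0x prefix.
        if( std::isdigit( static_cast<unsigned char>( token[0] ))) {
            std::size_t used = 0;
            int const flag = std::stoi( token, &used, 0 );
            if( used != token.size() ) {
                throw std::invalid_argument( "Invalid flag value: " + token );
            }
            result |= flag;
            continue;
        }

        // Named tokens: normalize, then look up.
        std::string key;
        for( char c : token ) {
            if( std::isalnum( static_cast<unsigned char>( c ))) {
                key += static_cast<char>( std::toupper( static_cast<unsigned char>( c )));
            }
        }
        auto const it = names.find( key );
        if( it == names.end() ) {
            throw std::invalid_argument( "Unknown flag name: " + token );
        }
        result |= it->second;
    }
    return result;
}
```

Under these assumptions, `PROPER_PAIR,MREVERSE`, `ProperPair + MateReverse`, and `PROPER_PAIR | 0x20` all evaluate to the same value, 0x22.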
void subsample_counts_with_replacement | ( | SampleCounts & | sample, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by subsampling the nucleotide counts (A
, C
, G
, T
, as well as N
and D
) with replacement to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, the counts are resampled with replacement so that their sum is the given max_depth
. This uses multinomial_distribution() for the sampling. If the count sum does not exceed max_depth, nothing is done.
Definition at line 228 of file subsample.cpp.
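Sampling with replacement from a multinomial distribution can be sketched with the standard library alone. This illustration draws one read at a time via `std::discrete_distribution`, which is simple but not necessarily how multinomial_distribution() works internally; `CountsSketch` and `subsample_with_replacement_sketch` are hypothetical names, and the caller supplies the random engine.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <random>

// Hypothetical stand-in for the six counts: A, C, G, T, N, D.
using CountsSketch = std::array<std::size_t, 6>;

// If the sum of counts exceeds max_depth, redraw max_depth reads, where each
// read picks a category with probability proportional to its original count.
void subsample_with_replacement_sketch(
    CountsSketch& counts, std::size_t max_depth, std::mt19937& rng
) {
    std::size_t const total
        = std::accumulate( counts.begin(), counts.end(), std::size_t{ 0 } );
    if( total <= max_depth ) {
        return; // nothing to do
    }

    // The original counts serve as (unnormalized) category weights.
    std::discrete_distribution<std::size_t> pick( counts.begin(), counts.end() );
    CountsSketch drawn{};
    for( std::size_t i = 0; i < max_depth; ++i ) {
        ++drawn[ pick( rng ) ];
    }
    counts = drawn;
}
```

Because draws are independent, a category can end up with more counts than it originally had, which is the defining difference to sampling without replacement.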
void subsample_counts_with_replacement | ( | Variant & | variant, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by subsampling the nucleotide counts (A
, C
, G
, T
, as well as N
and D
) with replacement to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, the counts are resampled with replacement so that their sum is the given max_depth
. This uses multinomial_distribution() for the sampling. If the count sum does not exceed max_depth, nothing is done.
This overload acts on all Variant::samples in the given variant
.
Definition at line 238 of file subsample.cpp.
void subsample_counts_without_replacement | ( | SampleCounts & | sample, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by subsampling the nucleotide counts (A
, C
, G
, T
, as well as N
and D
) without replacement to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, the counts are resampled without replacement so that their sum is the given max_depth
. This uses multivariate_hypergeometric_distribution() for the sampling. If the count sum does not exceed max_depth, nothing is done.
Definition at line 268 of file subsample.cpp.
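Sampling without replacement corresponds to drawing reads from an urn and not putting them back. The following sketch does this one read at a time, which yields a multivariate hypergeometric sample but is likely slower than a dedicated multivariate_hypergeometric_distribution() routine; `CountsSketch` and `subsample_without_replacement_sketch` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <random>

// Hypothetical stand-in for the six counts: A, C, G, T, N, D.
using CountsSketch = std::array<std::size_t, 6>;

// If the sum of counts exceeds max_depth, draw max_depth reads from the
// original reads without putting any of them back.
void subsample_without_replacement_sketch(
    CountsSketch& counts, std::size_t max_depth, std::mt19937& rng
) {
    std::size_t remaining
        = std::accumulate( counts.begin(), counts.end(), std::size_t{ 0 } );
    if( remaining <= max_depth ) {
        return; // nothing to do
    }

    CountsSketch pool = counts;
    CountsSketch drawn{};
    for( std::size_t i = 0; i < max_depth; ++i ) {
        // Pick one of the remaining reads uniformly at random.
        std::uniform_int_distribution<std::size_t> pick( 0, remaining - 1 );
        std::size_t r = pick( rng );

        // Find the category that this read belongs to.
        std::size_t k = 0;
        while( r >= pool[k] ) {
            r -= pool[k];
            ++k;
        }

        // Move the read from the pool to the drawn sample.
        --pool[k];
        ++drawn[k];
        --remaining;
    }
    counts = drawn;
}
```

Here, no category can ever exceed its original count, in contrast to the variant with replacement.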
void subsample_counts_without_replacement | ( | Variant & | variant, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by subsampling the nucleotide counts (A
, C
, G
, T
, as well as N
and D
) without replacement to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, the counts are resampled without replacement so that their sum is the given max_depth
. This uses multivariate_hypergeometric_distribution() for the sampling. If the count sum does not exceed max_depth, nothing is done.
This overload acts on all Variant::samples in the given variant
.
Definition at line 278 of file subsample.cpp.
void subscale_counts | ( | SampleCounts & | sample, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by sub-scaling the base counts (A
, C
, G
, T
, as well as N
and D
) to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, all counts are scaled proportionally so that their sum is max_depth
. If the sum is below max_depth
, nothing happens.
This transformation is used to limit the max read depth without filtering out the sample completely. This is for instance useful when computing diversity estimators, which have a runtime and memory cost that depends on the read depth. Hence, sub-scaling can reduce the overall runtime and memory usage, without significantly altering the results.
Definition at line 149 of file subsample.cpp.
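Proportional scaling of integer counts needs a rule for rounding. The sketch below uses the largest-remainder method so that the result sums to exactly max_depth; this rounding rule is an assumption, as the description above does not state how the library resolves fractions. `CountsSketch` and `subscale_sketch` are hypothetical names, and the products are assumed not to overflow `std::size_t`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical stand-in for the six counts: A, C, G, T, N, D.
using CountsSketch = std::array<std::size_t, 6>;

// If the sum of counts exceeds max_depth, scale all counts proportionally
// so that they sum to exactly max_depth (largest-remainder rounding).
void subscale_sketch( CountsSketch& counts, std::size_t max_depth )
{
    std::size_t const total
        = std::accumulate( counts.begin(), counts.end(), std::size_t{ 0 } );
    if( total <= max_depth ) {
        return; // nothing to do
    }

    // Integer part of each proportional share, plus its remainder.
    CountsSketch remainders{};
    std::size_t assigned = 0;
    for( std::size_t i = 0; i < counts.size(); ++i ) {
        std::size_t const scaled = counts[i] * max_depth;
        counts[i]     = scaled / total;
        remainders[i] = scaled % total;
        assigned += counts[i];
    }

    // Hand out the few missing units to the largest remainders.
    while( assigned < max_depth ) {
        auto const it = std::max_element( remainders.begin(), remainders.end() );
        ++counts[ static_cast<std::size_t>( it - remainders.begin() ) ];
        *it = 0;
        ++assigned;
    }
}
```

For counts of 10, 20, 30, 40 and a max_depth of 50, every count is simply halved, giving 5, 10, 15, 20.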
void subscale_counts | ( | Variant & | variant, |
size_t | max_depth | ||
) |
Transform a SampleCounts sample
by sub-scaling the base counts (A
, C
, G
, T
, as well as N
and D
) to sum up to max_depth
if max_depth
is exceeded for the sample.
If the sum of counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, SampleCounts::t_count, SampleCounts::n_count, and SampleCounts::d_count) exceeds the given max_depth
, all counts are scaled proportionally so that their sum is max_depth
. If the sum is below max_depth
, nothing happens.
This transformation is used to limit the max read depth without filtering out the sample completely. This is for instance useful when computing diversity estimators, which have a runtime and memory cost that depends on the read depth. Hence, sub-scaling can reduce the overall runtime and memory usage, without significantly altering the results.
This overload acts on all Variant::samples in the given variant
.
Definition at line 156 of file subsample.cpp.
|
inline |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al.
The argument empirical_min_read_depth
is only needed when using settings
with TajimaDenominatorPolicy::kEmpiricalMinReadDepth
Definition at line 553 of file diversity_pool_functions.hpp.
double genesis::population::tajima_d_pool | ( | DiversityPoolSettings const & | settings, |
double | theta_pi, | ||
double | theta_watterson, | ||
size_t | poolsize, | ||
ForwardIterator | begin, | ||
ForwardIterator | end, | ||
bool | only_passing_samples = true |
||
) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al.
The provided range between begin
and end
is expected to be already filtered and transformed as needed. We use the full size of that range as the number of SNPs; hence, when instead calling this function with a range that still contains non-SNP positions, the result might be wrong. See DiversityPoolCalculator for details on this.
Definition at line 585 of file diversity_pool_functions.hpp.
double genesis::population::tajima_d_pool | ( | DiversityPoolSettings const & | settings, |
size_t | poolsize, | ||
ForwardIterator | begin, | ||
ForwardIterator | end, | ||
bool | only_passing_samples = true |
||
) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al.
This overload first computes theta_pi and theta_watterson, and is hence inefficient in cases where those have already been computed elsewhere.
Same as tajima_d_pool( DiversityPoolSettings const&, size_t, double, double, ForwardIterator, ForwardIterator ), we also expect the range to be filtered already. See there, and see DiversityPoolCalculator for details.
Definition at line 632 of file diversity_pool_functions.hpp.
double tajima_d_pool_denominator | ( | DiversityPoolSettings const & | settings, |
double | theta, | ||
size_t | poolsize, | ||
double | window_avg_denom, | ||
size_t | empirical_min_read_depth | ||
) |
Compute the denominator for the pool-sequencing correction of Tajima's D according to Kofler et al.
The argument window_avg_denom
is meant to be the total number of valid positions that have been processed to get the values for theta_pi
and theta_watterson
, that is, the sum of all SNP positions as well as all other (invariant) positions that have passed all filters. That is for instance given when using window_average_denominator() to determine that number.
Interestingly, PoPoolation only uses the number of SNPs here, which seems wrong. We are unsure why PoPoolation does that, as it is using pileup files, which contain data for all positions, and so the correct number (including the invariant positions) should be available for their code as well.
The argument empirical_min_read_depth
is needed when using the TajimaDenominatorPolicy::kEmpiricalMinReadDepth policy. We always request it as an argument, to make sure that this function cannot accidentally be misused without having kept track of that number.
Definition at line 520 of file diversity_pool_functions.cpp.
double genesis::population::theta_pi | ( | ForwardIterator | begin, |
ForwardIterator | end, | ||
bool | with_bessel = true , |
||
bool | only_passing_samples = true |
||
) |
Compute classic theta pi, that is, the sum of heterozygosities.
The function simply sums heterozygosity() for all samples in the given range. If with_bessel
is set, Bessel's correction for the total nucleotide count is used.
Definition at line 178 of file diversity_pool_functions.hpp.
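To make the computation concrete, the sketch below sums a per-position heterozygosity of the form 1 - sum(f_i^2) over the four nucleotide frequencies, optionally multiplied by n / (n - 1) as Bessel's correction for the nucleotide count n. The Nucleotides type and both function names are hypothetical, and returning 0 for positions with fewer than two counts is an assumption of this sketch rather than documented behaviour.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the nucleotide counts of one sample: A, C, G, T.
using Nucleotides = std::array<std::size_t, 4>;

// Heterozygosity of a single position: 1 - sum of squared allele frequencies.
double heterozygosity_sketch(Nucleotides const& c, bool with_bessel)
{
    double const n = static_cast<double>(c[0] + c[1] + c[2] + c[3]);
    if (n <= 1.0) {
        return 0.0; // Assumption: too few counts to be heterozygous.
    }
    double h = 1.0;
    for (std::size_t count : c) {
        double const f = static_cast<double>(count) / n;
        h -= f * f;
    }
    if (with_bessel) {
        h *= n / (n - 1.0); // Bessel's correction for the nucleotide count.
    }
    return h;
}

// Classic theta pi: the sum of heterozygosities over all positions in the range.
double theta_pi_sketch(std::vector<Nucleotides> const& positions, bool with_bessel = true)
{
    double sum = 0.0;
    for (Nucleotides const& c : positions) {
        sum += heterozygosity_sketch(c, with_bessel);
    }
    return sum;
}
```

An invariant position contributes zero to the sum, so only polymorphic positions change the result.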
double genesis::population::theta_pi_pool | ( | DiversityPoolSettings const & | settings, |
size_t | poolsize, | ||
ForwardIterator | begin, | ||
ForwardIterator | end, | ||
bool | only_passing_samples = true |
||
) |
Compute theta pi with pool-sequencing correction according to Kofler et al, that is, the sum of heterozygosities divided by the correction denominator.
The function sums heterozygosity() for all samples in the given range, including Bessel's correction for the total nucleotide count at each position, and divides each by the respective theta_pi_pool_denominator() to correct for error from pool sequencing.
The provided range between begin
and end
is expected to be already filtered and transformed as needed. See DiversityPoolCalculator for details on this.
Definition at line 250 of file diversity_pool_functions.hpp.
|
inline |
Compute theta pi with pool-sequencing correction according to Kofler et al, for a single SampleCounts.
The function computes the heterozygosity() for the given sample
, including Bessel's correction for the total nucleotide count at each position, and divides it by the theta_pi_pool_denominator() to correct for error from pool sequencing.
Definition at line 288 of file diversity_pool_functions.hpp.
double theta_pi_pool_denominator | ( | DiversityPoolSettings const & | settings, |
size_t | poolsize, | ||
size_t | nucleotide_count | ||
) |
Compute the denominator for the pool-sequencing correction of theta pi according to Kofler et al.
We here compute the denominator for a given poolsize
, with a fixed DiversityPoolSettings::min_count. Values are identical for each given nucleotide_count
, and hence cached internally for speedup.
Definition at line 172 of file diversity_pool_functions.cpp.
double genesis::population::theta_pi_within_pool | ( | size_t | poolsize, |
ForwardIterator | begin, | ||
ForwardIterator | end, | ||
bool | only_passing_samples = true |
||
) |
Compute classic theta pi (within a population), that is, the sum of heterozygosities including Bessel's correction for total nucleotide sum at each position, and Bessel's correction for the pool size.
This is the same computation as used for the within-population theta pi in the FST computation of f_st_pool_unbiased(). It does not use the pool-seq correction of Kofler et al.
Definition at line 203 of file diversity_pool_functions.hpp.
double genesis::population::theta_watterson_pool | ( | DiversityPoolSettings const & | settings, |
size_t | poolsize, | ||
ForwardIterator | begin, | ||
ForwardIterator | end, | ||
bool | only_passing_samples = true |
||
) |
Compute theta watterson with pool-sequencing correction according to Kofler et al.
The provided range between begin
and end
is expected to be already filtered and transformed as needed. See DiversityPoolCalculator for details on this.
Definition at line 323 of file diversity_pool_functions.hpp.
|
inline |
Compute theta watterson with pool-sequencing correction according to Kofler et al, for a single SampleCounts sample.
Definition at line 356 of file diversity_pool_functions.hpp.
double theta_watterson_pool_denominator | ( | DiversityPoolSettings const & | settings, |
size_t | poolsize, | ||
size_t | nucleotide_count | ||
) |
Compute the denominator for the pool-sequencing correction of theta watterson according to Kofler et al.
We here compute the denominator for a given poolsize
, with a fixed DiversityPoolSettings::min_count. Values are identical for each given nucleotide_count
, and hence cached internally for speedup.
Definition at line 231 of file diversity_pool_functions.cpp.
|
inline |
Definition at line 52 of file function/genome_locus.hpp.
std::string to_string | ( | GenomeRegion const & | region | ) |
Definition at line 72 of file genome_region.cpp.
std::ostream & to_sync | ( | SampleCounts const & | bs, |
std::ostream & | os, | ||
bool | use_status_and_missing = true |
||
) |
Output a SampleCounts instance to a stream in the PoPoolation2 sync format.
This is one column from that file, outputting the counts separated by colons, in the order A:T:C:G:N:D
, with D
being deletions (*
in pileup).
If use_status_and_missing
is set to true
(default), any sample for which the SampleCounts::status is not passing (any status value other than 0) is considered to be filtered out. Rather than writing the counts, we then use the "missing" or "masked" extension of the sync file format to denote this, which is .:.:.:.:.:.
instead of the actual counts.
Definition at line 43 of file sync_common.cpp.
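The column format described above can be sketched as follows. The struct is a hypothetical, simplified stand-in for SampleCounts (with the status reduced to a plain integer, where 0 means passing), and the function name is made up for this example; only the A:T:C:G:N:D order and the .:.:.:.:.:. missing marker are taken from the description.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for SampleCounts.
struct SampleCountsSketch {
    std::size_t a_count = 0;
    std::size_t c_count = 0;
    std::size_t g_count = 0;
    std::size_t t_count = 0;
    std::size_t n_count = 0;
    std::size_t d_count = 0;
    int status = 0; // 0 means passing, anything else means filtered out.
};

// Write one sync column. Note the order A:T:C:G:N:D, which differs from
// the alphabetical order of the count members.
std::ostream& to_sync_sketch(
    SampleCountsSketch const& s, std::ostream& os, bool use_status_and_missing = true
) {
    if (use_status_and_missing && s.status != 0) {
        os << ".:.:.:.:.:."; // "missing" / "masked" extension of the format
        return os;
    }
    os << s.a_count << ':' << s.t_count << ':' << s.c_count << ':'
       << s.g_count << ':' << s.n_count << ':' << s.d_count;
    return os;
}
```

With use_status_and_missing set to false, the counts of a filtered sample are written as they are.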
std::ostream & to_sync | ( | Variant const & | var, |
std::ostream & | os, | ||
bool | use_status_and_missing = true |
||
) |
Output a Variant instance to a stream in the PoPoolation2 sync format.
The format is a tab-delimited file with one variant per line:
Each population column outputs counts separated by colons, in the order A:T:C:G:N:D
, with D
being deletions (*
in pileup).
See https://sourceforge.net/p/popoolation2/wiki/Tutorial/ for details.
If use_status_and_missing
is set to true
(default), any variant for which the Variant::status is not passing (any status value other than 0) is considered to be filtered out. Rather than writing the counts, we then use the "missing" or "masked" extension of the sync file format to denote this, which is .:.:.:.:.:.
instead of the actual counts. This is first applied to the status of the Variant, in which case all samples are affected. It is then also propagated to the SampleCounts themselves, and their status is checked, with the same effect, but per sample. This allows deciding at a granular level whether the whole Variant is filtered, or only individual samples.
Definition at line 54 of file sync_common.cpp.
|
inline |
Count of the pure nucleotide bases at this position, that is, the sum of all A
, C
, G
, and T
.
See nucleotide_sum() for details. This function gives the sum over all samples in the Variant.
Definition at line 306 of file population/function/functions.hpp.
|
inline |
Sum up all the base counts at this sample
, that is, the sum of all A
, C
, G
, T
, as well as the N
and D
counts for undetermined and deleted bases.
See sample_counts_sum() for details. This function gives the sum over all samples in the Variant.
Definition at line 335 of file population/function/functions.hpp.
void transform_zero_out_by_max_count | ( | SampleCounts & | sample, |
size_t | max_count, | ||
bool | also_n_and_d_counts = true |
||
) |
Transform a SampleCounts sample
by setting any nucleotide count (A
, C
, G
, T
) to zero if max_count
is exceeded for that nucleotide.
This transformation is used as a type of quality control. All nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, and SampleCounts::t_count) that are above the given max_count
are set to zero.
If also_n_and_d_counts
is set (default), this filtering is also done for SampleCounts::n_count and SampleCounts::d_count, although they are not taken into account in the statistics.
Definition at line 78 of file sample_counts_filter_numerical.cpp.
void transform_zero_out_by_max_count | ( | Variant & | variant, |
size_t | max_count, | ||
bool | also_n_and_d_counts = true |
||
) |
Transform a SampleCounts sample
by setting any nucleotide count (A
, C
, G
, T
) to zero if max_count
is exceeded for that nucleotide.
This transformation is used as a type of quality control. All nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, and SampleCounts::t_count) that are above the given max_count
are set to zero.
If also_n_and_d_counts
is set (default), this filtering is also done for SampleCounts::n_count and SampleCounts::d_count, although they are not taken into account in the statistics.
This overload acts on all Variant::samples in the given variant
.
Definition at line 101 of file sample_counts_filter_numerical.cpp.
void transform_zero_out_by_min_count | ( | SampleCounts & | sample, |
size_t | min_count, | ||
bool | also_n_and_d_counts = true |
||
) |
Transform a SampleCounts sample
by setting any nucleotide count (A
, C
, G
, T
) to zero if min_count
is not reached for that nucleotide.
This transformation is used as a type of quality control. All nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, and SampleCounts::t_count) that are below the given min_count
are set to zero.
If also_n_and_d_counts
is set (default), this filtering is also done for SampleCounts::n_count and SampleCounts::d_count, although they are not taken into account in the statistics.
Definition at line 50 of file sample_counts_filter_numerical.cpp.
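The two zero-out transformations (by min_count here, and by max_count further above) can be sketched as below. The struct is a hypothetical, simplified stand-in for SampleCounts, and the function names are made up for this example. The comparisons follow the wording of the descriptions: counts strictly below min_count, or strictly above max_count, are set to zero.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for SampleCounts.
struct SampleCountsSketch {
    std::size_t a_count = 0;
    std::size_t c_count = 0;
    std::size_t g_count = 0;
    std::size_t t_count = 0;
    std::size_t n_count = 0;
    std::size_t d_count = 0;
};

// Set every count that is below min_count to zero.
void zero_out_by_min_count_sketch(
    SampleCountsSketch& s, std::size_t min_count, bool also_n_and_d_counts = true
) {
    auto zero_if_below = [min_count](std::size_t& c) { if (c < min_count) { c = 0; } };
    zero_if_below(s.a_count);
    zero_if_below(s.c_count);
    zero_if_below(s.g_count);
    zero_if_below(s.t_count);
    if (also_n_and_d_counts) {
        zero_if_below(s.n_count);
        zero_if_below(s.d_count);
    }
}

// Set every count that is above max_count to zero.
void zero_out_by_max_count_sketch(
    SampleCountsSketch& s, std::size_t max_count, bool also_n_and_d_counts = true
) {
    auto zero_if_above = [max_count](std::size_t& c) { if (c > max_count) { c = 0; } };
    zero_if_above(s.a_count);
    zero_if_above(s.c_count);
    zero_if_above(s.g_count);
    zero_if_above(s.t_count);
    if (also_n_and_d_counts) {
        zero_if_above(s.n_count);
        zero_if_above(s.d_count);
    }
}
```

Each count is tested on its own, so a sample can keep some nucleotide counts while others are zeroed.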
void transform_zero_out_by_min_count | ( | Variant & | variant, |
size_t | min_count, | ||
bool | also_n_and_d_counts = true |
||
) |
Transform a SampleCounts sample
by setting any nucleotide count (A
, C
, G
, T
) to zero if min_count
is not reached for that nucleotide.
This transformation is used as a type of quality control. All nucleotide counts (that is, SampleCounts::a_count, SampleCounts::c_count, SampleCounts::g_count, and SampleCounts::t_count) that are below the given min_count
are set to zero.
If also_n_and_d_counts
is set (default), this filtering is also done for SampleCounts::n_count and SampleCounts::d_count, although they are not taken into account in the statistics.
This overload acts on all Variant::samples in the given variant
.
Definition at line 68 of file sample_counts_filter_numerical.cpp.
void validate_cathedral_plot_record | ( | CathedralPlotRecord const & | record | ) |
Check a Cathedral Plot record
for internal consistency.
This checks that if a CathedralPlotRecord::value_matrix is given, its dimensions are the same as the ones provided in the CathedralPlotParameters, and that the number of window widths matches the matrix height.
Definition at line 59 of file cathedral_plot.cpp.
VariantFilterCategoryStats variant_filter_stats_category_counts | ( | VariantFilterStats const & | stats | ) |
Generate summary counts for a VariantFilterStats counter.
The given stats
contain counts for different reasons of filters that could have failed when filtering a Variant. This function summarizes those stats into some basic categories, and gives their sums.
This is meant as a broad summary, for instance for user output, where it might not be overly relevant which exact numerical filter got triggered how often by a particular filter, but rather we want to have an overview of which classes or categories of filters got triggered how often.
Definition at line 103 of file variant_filter.cpp.
size_t variant_filter_stats_category_counts | ( | VariantFilterStats const & | stats, |
VariantFilterTagCategory | category | ||
) |
Overload that only reports back a single category sum of the filter stats.
Definition at line 131 of file variant_filter.cpp.
VariantFilterTagCategory variant_filter_tag_to_category | ( | VariantFilterTag | tag | ) |
For a given tag
, return its category tag.
Definition at line 63 of file variant_filter.cpp.
std::string vcf_genotype_string | ( | std::vector< VcfGenotype > const & | genotypes | ) |
Return the VCF-like string representation of a set of VcfGenotype entries.
The VcfFormatIterator::get_values() function returns all genotype entries for a given sample of a record/line. Here, we return a string representation similar to VCF of these genotypes, for example 0|0
or ./1
.
Definition at line 674 of file vcf_common.cpp.
size_t vcf_genotype_sum | ( | std::vector< VcfGenotype > const & | genotypes | ) |
Return the sum of genotypes for a set of VcfGenotype entries, typically used to construct a genotype matrix with entries 0,1,2.
The function takes the given genotypes
, encodes the reference as 0 and any alternative as 1, and then sums this over the values. For diploid organisms, this yields the possible results 0 (homozygote for the reference), 1 (heterozygote), or 2 (homozygote for the alternative), which is typically used in genotype matrices.
Definition at line 688 of file vcf_common.cpp.
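The encoding can be illustrated with a short sketch. It does not use VcfGenotype; instead, it assumes each genotype call is given as a plain allele index, where 0 is the reference and any larger value is one of the alternative alleles. The function name is hypothetical, and missing calls are not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Sum of genotypes: the reference allele (index 0) counts as 0,
// any alternative allele (index > 0) counts as 1.
std::size_t genotype_sum_sketch(std::vector<unsigned> const& allele_indices)
{
    std::size_t sum = 0;
    for (unsigned allele : allele_indices) {
        if (allele > 0) {
            ++sum;
        }
    }
    return sum;
}
```

For a diploid call, a genotype such as 0|1 then gives 1, and a call with two different alternative alleles, such as 1/2, gives 2.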
std::string vcf_hl_type_to_string | ( | int | hl_type | ) |
Internal helper function to convert htslib-internal BCF_HL_* header line type values to their string representation as used in the VCF header ("FILTER", "INFO", "FORMAT", etc).
Definition at line 208 of file vcf_common.cpp.
std::string vcf_value_special_to_string | ( | int | vl_type_num | ) |
Definition at line 180 of file vcf_common.cpp.
std::string vcf_value_special_to_string | ( | VcfValueSpecial | vl_type_num | ) |
Definition at line 175 of file vcf_common.cpp.
std::string vcf_value_type_to_string | ( | int | ht_type | ) |
Definition at line 150 of file vcf_common.cpp.
std::string vcf_value_type_to_string | ( | VcfValueType | ht_type | ) |
Definition at line 145 of file vcf_common.cpp.
|
inline |
Get the denominator to use for averaging an estimator across a window.
This simply uses the policy
to make a selection of which of the given input numbers to select. The function is meant as the one place where we make this choice, in order for all estimators to work the same.
The function takes all possible stats and numbers as input, in order to guarantee that they are all available. This also enforces correct usage of the calculators and processors, as neither number can be omitted by accident.
Definition at line 257 of file window_average.hpp.
Counts of how many SampleCounts with each SampleCountsFilterTagCategory occurred in some data.
This is a convenient summary of the SampleCountsFilterStats, where not the full level of detail is needed, for instance for user output.
Definition at line 232 of file sample_counts_filter.hpp.
Counts of how many SampleCounts with each SampleCountsFilterTag occurred in some data.
Definition at line 224 of file sample_counts_filter.hpp.
Counts of how many Variants with each VariantFilterTagCategory occurred in some data.
This is a convenient summary of the VariantFilterStats, where not the full level of detail is needed, for instance for user output.
Definition at line 314 of file variant_filter.hpp.
Counts of how many Variants with each VariantFilterTag occurred in some data.
Definition at line 306 of file variant_filter.hpp.
Iterate Variants, using a variety of input file formats.
This generic stream is an abstraction that is agnostic to the underlying file format, and can be used with anything that can be converted to a Variant per genome position. It offers to iterate a whole input file, and transform and filter the Variant as needed in order to make downstream processing as easy as possible.
This is useful for downstream processing, where we just want to work with the Variants along the genome, but want to allow different file formats for their input. Use this stream to achieve this. For example, use the make_variant_input_stream_...()
functions to get such an iterator for different input file types.
The stream furthermore offers a data field of type VariantInputStreamData, which gets filled with basic data about the input file and sample names (if available in the file format). Use the data() function to access this data while iterating.
Definition at line 108 of file stream/variant_input_stream.hpp.
using VariantWindowStream = BaseWindowStream< VariantInputStream::Iterator, VariantInputStream::value_type, Window<VariantInputStream::value_type> > |
Typedef for a uniform Window stream type.
This typedef is used for any Window stream over a VariantInputStream. It's simply a more convenient name than the full template specialization.
Definition at line 65 of file variant_window_stream.hpp.
using VariantWindowViewStream = BaseWindowStream< VariantInputStream::Iterator, VariantInputStream::value_type, WindowView<VariantInputStream::value_type> > |
Typedef for our uniform WindowView stream type.
This typedef is used for any WindowView stream over a VariantInputStream. It's simply a more convenient name than the full template specialization.
In particular, we use this type as an abstraction that captures streams over both Window and WindowView, for instance when using make_window_view_stream() to wrap a WindowStream into a WindowViewStream. Because we want to model different types of window streams, some of which use Window, some of which use WindowView, this abstraction allows us to have a single type.
Definition at line 82 of file variant_window_stream.hpp.
using VcfFormatIteratorFloat = VcfFormatIterator<float, double> |
Definition at line 67 of file vcf_format_iterator.hpp.
using VcfFormatIteratorGenotype = VcfFormatIterator<int32_t, VcfGenotype> |
Definition at line 68 of file vcf_format_iterator.hpp.
using VcfFormatIteratorInt = VcfFormatIterator<int32_t, int32_t> |
Definition at line 66 of file vcf_format_iterator.hpp.
using VcfFormatIteratorString = VcfFormatIterator<char*, std::string> |
Definition at line 65 of file vcf_format_iterator.hpp.
|
strong |
Interpolation algorithm for window sizes across the rows of a cathedral plot.
See cathedral_window_width() for details. We highly recommend using kExponential
, as this offers the best visualization results where the window widths across rows interpolate exponentially between whole genome and individual pixel of the plot, and hence best show the effects of different orders of magnitude of windows on the computed statistic.
In contrast, kGeometric
decays too fast, where most of the plot has very small window sizes, while kLinear
does the opposite, and simply shows triangles of large window sizes, omitting most of the fine structure of the statistics for small windows.
Enumerator | |
---|---|
kExponential | |
kGeometric | |
kLinear |
Definition at line 80 of file cathedral_plot.hpp.
|
strong |
Policy helper to decide how to treat filtered SampleCounts.
In several functions where we need to take the SampleCounts samples of a Variant into account, we need to decide on whether we want to use all of them, or only those that are passing. For instance, when merging samples, this is important. This policy allows to select the needed behaviour.
Enumerator | |
---|---|
kAll | |
kOnlyPassing |
Definition at line 211 of file sample_counts_filter.hpp.
|
strong |
Enumerator | |
---|---|
kPassed | Sample has passed all filters. |
kMaskedPosition | Position has been masked out from processing. This can be due to, e.g., a RegionLocus set from a fasta file, see read_mask_fasta(). We distinguish this from kMaskedRegion purely for semantic reasons. Both filters are due to some user-specified position-based filter, and created by similar functions. However, we generally mean to indicate that a masked position is due to some fine-grained positional filter, while masked regions are meant to indicate filters for larger regions such as chromosomes or genes. |
kMaskedRegion | Position is part of a masked region. See kMaskedPosition for details on the distinction between the two. |
kMissing | Position is missing in the input data. |
kNotPassed | Generic indicator that the sample has not passed a filter. Not used at the moment internally, but included here as a simple catch-all value if no further distinction for the filter that failed is needed. |
kInvalid | Generic indicator that the sample is invalid. Not used at the moment internally. Similar to kNotPassed, this is a generic value for invalid samples. |
kEmpty | Zero nucleotide counts, after zeroing out counts based on the min_count and max_count. |
kBelowMinReadDepth | Sum of counts across all nucleotide counts is below the min read depth threshold. |
kAboveMaxReadDepth | Sum of counts across all nucleotide counts is above the max read depth threshold. |
kAboveDeletionsCountLimit | Too many deletions at the position. |
kNotSnp | Invariant position, not a SNP. |
kNotBiallelicSnp | SNP position, but not biallelic, i.e., has more than one alternative. |
kEnd |
Definition at line 54 of file sample_counts_filter.hpp.
|
strong |
List of filter categories for a SampleCounts.
We summarize certain filters into categories. This is more useful for users than the full list of detailed filter tags above. Most of the time, we are mainly interested in these categories; it might not be worth having the detailed tag list in the first place.
Enumerator | |
---|---|
kPassed | SampleCounts has passed all filters. |
kMasked | Position is masked. |
kMissingInvalid | Position is missing or otherwise invalid. |
kNumeric | Any of the numeric variant filters failed. |
kEnd | End of the enum values. |
Definition at line 171 of file sample_counts_filter.hpp.
|
strong |
SlidingWindowType of a Window, that is, whether we slide along a fixed-size interval of the genome, along a fixed number of variants, or over a whole chromosome.
Enumerator | |
---|---|
kInterval | Windows of this type are defined by a fixed start and end position on a chromosome. The amount of data contained in between these two loci can differ, depending on the number of variant positions found in the underlying data iterator. |
kVariants | Windows of this type are defined as containing a fixed number of entries (usually Variants or other data), and hence can span window widths of differing sizes. |
kChromosome | Windows of this type contain positions across a whole chromosome. The window contains all data from a whole chromosome. Moving to the next window then is equivalent to moving to the next chromosome. Note that this might need a lot of memory to keep all the data at once. |
Definition at line 55 of file sliding_window_generator.hpp.
|
strong |
Select which method to use for reducing the max read depth of a SampleCounts sample or a Variant.
See make_variant_input_stream_sample_subsampling_transform() for usage.
Enumerator | |
---|---|
kSubscale | Use transform_subscale() |
kSubsampleWithReplacement | Use transform_subsample_with_replacement() |
kSubsampleWithoutReplacement | Use transform_subsample_without_replacement() |
Definition at line 99 of file function/variant_input_stream.hpp.
|
strong |
Select how to compute the denominator for the pool sequencing correction of Tajima's D.
This boils down to which read depth to use for computing the expected number of individuals sequenced, or, as an alternative, to drop that term completely and use a different strategy.
Enumerator | |
---|---|
kEmpiricalMinReadDepth | Use the empirical minimum read depth found in each window to compute the expected number of individuals in n_base(). This is a conservative estimator that in our assessment makes more sense to use than the user-provided minimum read depth setting (which is what PoPoolation does). We recommend this most of the time. |
kProvidedMinReadDepth | Fix the bugs of the original PoPoolation, but still use their way of computing the empirical pool size via n_base() using the user-provided minimum read depth. With the bugs of PoPoolation fixed, they still use the user-provided min_read_depth (see DiversityPoolSettings) as input for the n_base() function to compute the empirical pool size. We think that this is not ideal, and gives wrong estimates of the number of individuals sequenced. Still, we offer this behaviour here, as a means to compute what we think PoPoolation intended to compute without their more obvious bugs. |
kWithPopoolationBugs | Replicate the original behaviour of PoPoolation <= 1.2.2. The idea is the same as in kProvidedMinReadDepth, but re-introduces the bugs of PoPoolation. There are major bugs (as far as we are aware) in the PoPoolation implementation up until (and including) version 1.2.2:
Using this option, one can voluntarily activate these bugs here as well, in order to get results that are comparable with PoPoolation results. |
kPoolsize | Instead of using n_base() to obtain the number of individuals sequenced (empirical pool size), simply use the poolsize directly. This is another estimator, that does not use n_base() at all, and just assumes that the number of individuals sequenced is equal to the pool size. This is good under high read depths. |
kUncorrected | Do not correct Tajima's D at all. Deriving a valid correction for Tajima's D in the context of pool sequencing is very tricky, and coming up with estimators that correct for all biases and noises is hard. It involves knowing about the covariance of frequencies across sites, which again has a demographic component (How has the randomness from pool sequencing affected the sites?), and a pool sequencing component (How does the randomness in the allele frequencies at the sites vary?), which seems rather complicated to derive and use. So instead, we here simply use no correction at all. Hence, values cannot be interpreted absolutely, and are not comparable to values of classic (non-pool-sequence) Tajima's D. Still, knowing their sign, and comparing them relative to each other across windows, might yield valuable insight. |
Definition at line 61 of file diversity_pool_functions.hpp.
|
strong |
List of filters that we apply to a Variant, to indicate whether the Variant passed or not.
This can be used with VariantFilterStatus to indicate the type of filter that did not pass. This includes reasons such as missing data (nothing in the input file for this position), or filtering or masking out regions of the input, so that those can be considered downstream as well. We currently do not use all of them internally; some are meant as generic filter types that can be added on later if additional filters are needed.
We typically only want a single filter to fail, and do not consider any more filters once a Variant has been filtered out. That makes it easy to keep track of the reason, and speeds up processing by skipping filter calculations for Variants that won't be considered anyway.
The only value for now that we assign a fixed integer value is VariantFilterTag::kPassed, which is 0
, indicating that all filters passed. Other values of this enum are not meant to have a stable numerical value (for now), as we might extend this list later on. Hence, compare to the enum directly instead. However, we do provide a special VariantFilterTag::kEnd enum value, which can be used to iterate the enum values, and facilitate index-based access for counting; see VariantFilterStats for an example.
Enumerator | |
---|---|
kPassed | Variant has passed all filters. |
kMaskedPosition | Position has been masked out from processing. This can be due to, e.g., a RegionLocus set from a fasta file, see read_mask_fasta(). We distinguish this from kMaskedRegion purely for semantic reasons. Both filters are due to some user-specified position-based filter, and created by similar functions. However, we generally mean to indicate that a masked position is due to some fine-grained positional filter, while masked regions are meant to indicate filters for larger regions such as chromosomes or genes. |
kMaskedRegion | Position is part of a masked region. See kMaskedPosition for details on the distinction between the two. |
kMissing | Position is missing in the input data. |
kNotPassed | Generic indicator that the Variant has not passed a filter. A simple catch-all value if no further distinction for the filter that failed is needed. |
kInvalid | Generic indicator that the Variant is invalid. Similar to kNotPassed, this is a generic value for invalid Variants. |
kNoSamplePassed | None of the SampleCounts of the Variant passed their filters. If all samples of the Variant did not pass their respective filters, this value can be set to skip the whole Variant. |
kNotAllSamplesPassed | Some of the SampleCounts of the Variant did not pass their filters. For algorithms that need every sample to pass its filters, this can be used to indicate that some samples did not pass, and that this Variant hence also needs to be skipped. In many algorithms though this is not needed: if at least some samples are valid, we continue with the Variant and process it as far as possible. |
kEmpty | All counts across all samples are zero. This corresponds to a zero read depth position that is however not missing in the data. |
kBelowMinReadDepth | Sum of counts across all samples is below the min read depth threshold. |
kAboveMaxReadDepth | Sum of counts across all samples is above the max read depth threshold. |
kAboveDeletionsCountLimit | Too many deletions at the position. |
kNotSnp | Invariant position, not a SNP. This is a generic filter type if a position is invariant, for instance used when no min or max counts are given. Whenever those are given though, we use one of the more specific types below that indicate why exactly the position is not a SNP. |
kNotBiallelicSnp | SNP position, but not biallelic, i.e., has more than one alternative. This counts how many Variants were SNPs but not biallelic. It hence indicates how many Variants were filtered out because of the only_biallelic_snps filter setting. |
kBelowSnpMinCount | Sum of nucleotides is below VariantFilterNumericalParams::snp_min_count. That is, the variant would have counted as a SNP if the snp_min_count setting wasn't used. This is hence useful to see how many Variants were filtered out because of that setting. Note though that, for simplicity, we no longer distinguish between biallelic and multiallelic SNPs here. This counts any position that was filtered out for not being a SNP according to the only_snps and/or only_biallelic_snps settings after considering snp_min_count. |
kAboveSnpMaxCount | Sum of nucleotides is above VariantFilterNumericalParams::snp_max_count. Same as kBelowSnpMinCount, but for the snp_max_count setting instead. |
kBelowMinAlleleFreq | Did not reach minimum allele frequency. This flags positions that are not a SNP according to the minimum allele frequency setting. |
kEnd | End of the enum values. This value is solely provided as a stable name for referencing when iterating the values in the enum. See VariantFilterStats for an example of where this is used. Note: For that reason, this value has to remain the last in the enum. |
Definition at line 71 of file variant_filter.hpp.
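The index-based counting that kEnd enables can be sketched as follows. This is a minimal illustration only: DemoFilterTag, DemoFilterStats, count_tags() and total_count() are hypothetical names with a shortened tag list, not the definitions from variant_filter.hpp.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical, shortened stand-in for a filter tag enum (assumption, not the library type).
enum class DemoFilterTag : std::size_t {
    kPassed = 0,
    kMissing,
    kBelowMinReadDepth,
    kNotSnp,
    kEnd // has to remain last, so that its value equals the number of real tags
};

// One counter per tag; the array size is derived from kEnd.
using DemoFilterStats = std::array<std::size_t, static_cast<std::size_t>(DemoFilterTag::kEnd)>;

// Count how often each tag occurs in a list of tags.
inline DemoFilterStats count_tags(std::vector<DemoFilterTag> const& tags)
{
    DemoFilterStats stats{}; // value-initialized, so all counters start at zero
    for (auto tag : tags) {
        ++stats[static_cast<std::size_t>(tag)];
    }
    return stats;
}

// Sum all counters by iterating the index range [0, kEnd).
inline std::size_t total_count(DemoFilterStats const& stats)
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i < static_cast<std::size_t>(DemoFilterTag::kEnd); ++i) {
        sum += stats[i];
    }
    return sum;
}
```

Because the array size and the loop bound both come from kEnd, adding a tag before kEnd extends the counters without further changes.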
List of filter categories for a Variant.
We summarize certain filters into categories. This is more useful for users than having all of the detailed filter tags above. Most of the time, we are interested only in these categories; it might not be worth having the detailed tag list in the first place.
Enumerator | |
---|---|
kPassed | Variant has passed all filters. |
kMasked | Position is masked. |
kMissingInvalid | Position is missing or otherwise invalid. |
kSamplesFailed | Either all or some of the samples failed their respective filters. |
kNumeric | Any of the numeric variant filters failed. |
kInvariant | Position is not a SNP according to the SNP filters. |
kEnd | End of the enum values. |
Definition at line 261 of file variant_filter.hpp.
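A minimal sketch of how detailed tags might be summarized into categories. The grouping below is an assumption inferred from the enumerator names and descriptions; DemoTag, DemoCategory and demo_to_category() are hypothetical and cover only a subset of the tags.

```cpp
#include <stdexcept>

// Hypothetical subset of detailed filter tags (assumption, not the library enum).
enum class DemoTag {
    kPassed, kMaskedPosition, kMaskedRegion, kMissing, kInvalid,
    kNoSamplePassed, kNotAllSamplesPassed, kBelowMinReadDepth, kNotSnp
};

// Hypothetical categories, named after the enumerators listed above.
enum class DemoCategory {
    kPassed, kMasked, kMissingInvalid, kSamplesFailed, kNumeric, kInvariant
};

// Map each detailed tag to one category. The grouping is an assumption.
inline DemoCategory demo_to_category(DemoTag tag)
{
    switch (tag) {
        case DemoTag::kPassed:              return DemoCategory::kPassed;
        case DemoTag::kMaskedPosition:
        case DemoTag::kMaskedRegion:        return DemoCategory::kMasked;
        case DemoTag::kMissing:
        case DemoTag::kInvalid:             return DemoCategory::kMissingInvalid;
        case DemoTag::kNoSamplePassed:
        case DemoTag::kNotAllSamplesPassed: return DemoCategory::kSamplesFailed;
        case DemoTag::kBelowMinReadDepth:   return DemoCategory::kNumeric;
        case DemoTag::kNotSnp:              return DemoCategory::kInvariant;
    }
    // Only reachable if an out-of-range value was cast to DemoTag.
    throw std::invalid_argument("Unknown filter tag");
}
```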
Specification for the values determining header line types of VCF/BCF files.
This list contains the types of header lines that htslib uses for identification, as specified in the VCF header. Corresponds to the BCF_HL_* macro constants defined by htslib. We statically assert that these have the same values.
Enumerator | |
---|---|
kFilter | |
kInfo | |
kFormat | |
kContig | |
kStructured | |
kGeneric | |
Definition at line 71 of file vcf_common.hpp.
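The static assertion pattern mentioned above can be sketched like this. To keep the example compilable without htslib, the DEMO_BCF_HL_* constants are stand-ins whose values are assumed to mirror the BCF_HL_* macros; real code would include the htslib header and compare against the macros themselves. DemoHeaderLine is a hypothetical enum name.

```cpp
// Stand-ins for the htslib BCF_HL_* macros (assumed values, for illustration only).
constexpr int DEMO_BCF_HL_FLT  = 0;
constexpr int DEMO_BCF_HL_INFO = 1;
constexpr int DEMO_BCF_HL_FMT  = 2;
constexpr int DEMO_BCF_HL_CTG  = 3;
constexpr int DEMO_BCF_HL_STR  = 4;
constexpr int DEMO_BCF_HL_GEN  = 5;

// Hypothetical enum mirroring the header line types listed above.
enum class DemoHeaderLine : int {
    kFilter     = 0,
    kInfo       = 1,
    kFormat     = 2,
    kContig     = 3,
    kStructured = 4,
    kGeneric    = 5
};

// Compile-time checks: the build fails if the two sets of values ever diverge.
static_assert(static_cast<int>(DemoHeaderLine::kFilter)     == DEMO_BCF_HL_FLT,  "kFilter mismatch");
static_assert(static_cast<int>(DemoHeaderLine::kInfo)       == DEMO_BCF_HL_INFO, "kInfo mismatch");
static_assert(static_cast<int>(DemoHeaderLine::kFormat)     == DEMO_BCF_HL_FMT,  "kFormat mismatch");
static_assert(static_cast<int>(DemoHeaderLine::kContig)     == DEMO_BCF_HL_CTG,  "kContig mismatch");
static_assert(static_cast<int>(DemoHeaderLine::kStructured) == DEMO_BCF_HL_STR,  "kStructured mismatch");
static_assert(static_cast<int>(DemoHeaderLine::kGeneric)    == DEMO_BCF_HL_GEN,  "kGeneric mismatch");
```

Keeping the checks at compile time means the enum can be cast to the integer values that the C API expects without a runtime lookup table.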
Specification for special markers for the number of values expected for key-value-pairs of VCF/BCF files.
This list contains the special markers for the number of values of the INFO and FORMAT key-value pairs, as specified in the VCF header, and used in the record lines. Corresponds to the BCF_VL_* macro constants defined by htslib. We statically assert that these have the same values.
Enumerator | |
---|---|
kFixed | Fixed number of values expected. In VCF, this is denoted simply by an integer number. This simply specifies that there is a fixed number of values to be expected; we do not further define how many exactly are expected here (the integer value). This is taken care of in a separate variable that is provided whenever a fixed-size value is needed, see for example VcfSpecification. |
kVariable | Variable number of possible values, or unknown, or unbounded. In VCF, this is denoted by '.'. |
kAllele | One value per alternate allele. In VCF, this is denoted as 'A'. |
kGenotype | One value for each possible genotype (more relevant to the FORMAT tags). In VCF, this is denoted as 'G'. |
kReference | One value for each possible allele (including the reference). In VCF, this is denoted as 'R'. |
Definition at line 106 of file vcf_common.hpp.
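As an illustration of the notation in the table above, the following sketch classifies the textual Number entry of a VCF header line. DemoValueSpecial and classify_number_field() are hypothetical names; the function is not part of the documented API and only covers the markers listed here.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the markers listed above.
enum class DemoValueSpecial { kFixed, kVariable, kAllele, kGenotype, kReference };

// Classify the Number entry of an INFO or FORMAT header line.
inline DemoValueSpecial classify_number_field(std::string const& number)
{
    if (number == ".") { return DemoValueSpecial::kVariable; }
    if (number == "A") { return DemoValueSpecial::kAllele; }
    if (number == "G") { return DemoValueSpecial::kGenotype; }
    if (number == "R") { return DemoValueSpecial::kReference; }

    // Anything else has to be a non-negative integer, meaning a fixed number of values.
    if (number.empty()) {
        throw std::invalid_argument("Empty Number field");
    }
    for (unsigned char c : number) {
        if (!std::isdigit(c)) {
            throw std::invalid_argument("Invalid Number field: " + number);
        }
    }
    return DemoValueSpecial::kFixed;
}
```

Note that kFixed only records that the count is fixed; the integer itself would have to be stored separately, as described for kFixed above.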
Specification for the data type of the values expected in key-value-pairs of VCF/BCF files.
This list contains the types of data in values of the INFO and FORMAT key-value pairs, as specified in the VCF header, and used in the record lines. Corresponds to the BCF_HT_* macro constants defined by htslib. We statically assert that these have the same values.
Enumerator | |
---|---|
kFlag | |
kInteger | |
kFloat | |
kString | |
Definition at line 89 of file vcf_common.hpp.
Position in the genome that is used for reporting when emitting or using a window.
See anchor_position() for details. The interval-based types are available for any BaseWindow type, that is, for Window and for WindowView for instance. The variant-based types however require random access to the data in the window, and hence are not applicable to WindowView.
Enumerator | |
---|---|
kIntervalBegin | |
kIntervalEnd | |
kIntervalMidpoint | |
kVariantFirst | |
kVariantLast | |
kVariantMedian | |
kVariantMean | |
kVariantMidpoint | |
Definition at line 57 of file population/window/functions.hpp.
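The enumerators above are undocumented, so the following sketch spells out one plausible reading of them. All semantics here are assumptions (inclusive 1-based interval, integer rounding down, upper median), and DemoAnchor, DemoWindow and demo_anchor_position() are hypothetical names rather than the behavior of anchor_position().

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical enum mirroring the enumerators listed above.
enum class DemoAnchor {
    kIntervalBegin, kIntervalEnd, kIntervalMidpoint,
    kVariantFirst, kVariantLast, kVariantMedian, kVariantMean, kVariantMidpoint
};

// Hypothetical window: an inclusive interval plus the sorted positions of its entries.
struct DemoWindow {
    std::size_t first;
    std::size_t last;
    std::vector<std::size_t> positions;
};

inline std::size_t demo_anchor_position(DemoWindow const& window, DemoAnchor anchor)
{
    // Interval-based types only need the interval boundaries.
    switch (anchor) {
        case DemoAnchor::kIntervalBegin:    return window.first;
        case DemoAnchor::kIntervalEnd:      return window.last;
        case DemoAnchor::kIntervalMidpoint: return window.first + (window.last - window.first) / 2;
        default: break;
    }

    // Variant-based types need random access to the entries in the window.
    auto const& pos = window.positions;
    if (pos.empty()) {
        throw std::runtime_error("Variant-based anchor requested for an empty window");
    }
    switch (anchor) {
        case DemoAnchor::kVariantFirst:    return pos.front();
        case DemoAnchor::kVariantLast:     return pos.back();
        case DemoAnchor::kVariantMedian:   return pos[pos.size() / 2];
        case DemoAnchor::kVariantMidpoint: return pos.front() + (pos.back() - pos.front()) / 2;
        case DemoAnchor::kVariantMean: {
            std::size_t sum = 0;
            for (auto p : pos) { sum += p; }
            return sum / pos.size();
        }
        default: break;
    }
    throw std::invalid_argument("Unknown anchor type");
}
```

The second half of the function shows why the variant-based types do not fit a streaming view: they need the first, last, or all entries of the window at once.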
Select the method to use for computing window averages of statistic estimators.
When computing diversity metrics or FST in windows, we often want to compute an average across a window. Data might have positions that are missing, have low read depth, fail some other filter, or simply might only consist of the SNPs, if some SNP calling was applied before. Hence, we need different strategies to compute the per-site average across windows.
Enumerator | |
---|---|
kWindowLength | Use the window length. This does not take any characteristics of the data into account. This might underestimate diversity in regions with low coverage, as then, we might have positions with no coverage, so that we do not compute a value there, but they are still used in the denominator for computing the relative value. |
kAvailableLoci | Use the number of positions for which there was data at all, independent of all filter settings. This can be useful when SNP calling was applied beforehand. Technically, this simply uses the sum of the variant filter stats to get the number of positions that have been processed in total, except for any missing data. As the stats get incremented each time a position is processed, this sum reflects every entry that was exposed to some filter and not already completely removed, via e.g., region filters. |
kValidLoci | Use the number of positions that passed all quality and numerical filters, excluding the SNP-related filters. That is, these positions are of high quality, and include both the SNPs and the invariant positions. In the absence of any particular circumstances, this is the recommended option. This can also be used in combination with a mask file, in order to specify loci that are to be considered valid, even in the absence of actual data in the input. Technically, this uses the passing positions of the sample stats (that passed all, including the SNP-related, filters), as well as the sum of all SNP-related variant filter stats. That sum is the number of positions that passed all previous filters, such as numerical ones, but failed in the last step, i.e., are not SNPs according to the SNP-related filter settings. See variant_filter_stats_category_counts() for a function that does something similar. |
kValidSnps | Use the number of SNPs only. This will overestimate the average, but might be useful depending on the given type of data. Note that if the data only consists of SNPs in the first place, this is identical to kValidLoci anyway. Technically, this uses the sum of passing sample filters. We could use the passing variant filter count here instead, but that might be too big, as it could contain positions that passed the total variant filters, but failed for a particular sample. So we are conservative here, and only use the number of positions that passed everything. |
kSum | Simply report the total sum, with no averaging, i.e., the absolute value of the metric. |
kProvidedLoci | Use exactly the provided loci as set in the window of a GenomeLocusSet. This bypasses all the above data-based ways of determining the denominator for window averaging, and instead uses a user-provided mask in the form of a GenomeLocusSet. Within the window, all positions that are set to
Definition at line 65 of file window_average.hpp.
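The effect of the different policies on the denominator can be sketched as follows. The counts are assumed to be precomputed per window; DemoAveragePolicy, DemoWindowCounts and demo_window_average() are hypothetical names and do not reflect how the library derives these numbers from its filter statistics.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical enum mirroring the policies listed above.
enum class DemoAveragePolicy {
    kWindowLength, kAvailableLoci, kValidLoci, kValidSnps, kSum, kProvidedLoci
};

// Hypothetical per-window counts, assumed to be known already.
struct DemoWindowCounts {
    std::size_t window_length;  // number of positions spanned by the window
    std::size_t available_loci; // positions with any data, excluding missing ones
    std::size_t valid_loci;     // positions passing all non-SNP filters
    std::size_t valid_snps;     // positions passing all filters, i.e., SNPs
    std::size_t provided_loci;  // positions set in a user-provided mask
};

// Divide the summed statistic by the denominator that the policy selects.
inline double demo_window_average(
    double sum, DemoWindowCounts const& counts, DemoAveragePolicy policy
) {
    std::size_t denominator = 0;
    switch (policy) {
        case DemoAveragePolicy::kWindowLength:  denominator = counts.window_length;  break;
        case DemoAveragePolicy::kAvailableLoci: denominator = counts.available_loci; break;
        case DemoAveragePolicy::kValidLoci:     denominator = counts.valid_loci;     break;
        case DemoAveragePolicy::kValidSnps:     denominator = counts.valid_snps;     break;
        case DemoAveragePolicy::kProvidedLoci:  denominator = counts.provided_loci;  break;
        case DemoAveragePolicy::kSum:           return sum; // no averaging at all
    }
    if (denominator == 0) {
        // No position to average over; report "not a number" instead of dividing by zero.
        return std::numeric_limits<double>::quiet_NaN();
    }
    return sum / static_cast<double>(denominator);
}
```

With the same sum, a smaller denominator yields a larger per-site value, which is why kValidSnps overestimates and kWindowLength can underestimate the average.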
std::function<void(Variant&)> make_variant_input_stream_sample_name_filter_transform(std::vector< bool > const &sample_filter) |
Helper function to create a Variant transform to filter out samples.
The function expects a bool vector indicating which samples within a Variant to keep. The vector needs to have the same length as the Variant has samples. It can be created for instance with make_sample_name_filter() based on sample names. However, as Variant does not store the names itself, those might need to be accessed from the VariantInputStream data() function, which yields a VariantInputStreamData object.
Using this to filter samples by their name is likely somewhat slower than doing it directly in the parsers, which we also offer. However, this way offers a unified and simple way to achieve the filtering, as it is applied down the line, and hence can be used on any VariantInputStream.
Definition at line 85 of file function/variant_input_stream.hpp.
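The idea of such a transform can be sketched with simplified stand-in types. DemoVariant and make_demo_sample_filter() are hypothetical; in particular, each sample is reduced to a single int here, and throwing on a length mismatch is an assumption about error handling.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Variant; each int stands for the counts of one sample.
struct DemoVariant {
    std::vector<int> samples;
};

// Create a transform that keeps only the samples flagged as true in the filter.
inline std::function<void(DemoVariant&)> make_demo_sample_filter(std::vector<bool> const& keep)
{
    // Capture the filter by value, so that the transform stays valid after the caller returns.
    return [keep](DemoVariant& variant) {
        if (variant.samples.size() != keep.size()) {
            throw std::runtime_error("Sample filter and Variant differ in number of samples");
        }
        std::vector<int> kept;
        for (std::size_t i = 0; i < keep.size(); ++i) {
            if (keep[i]) {
                kept.push_back(variant.samples[i]);
            }
        }
        variant.samples = std::move(kept);
    };
}
```

Since the result is a plain std::function acting on one variant at a time, it does not depend on where the variants come from.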
std::function<void(Variant&)> make_variant_input_stream_sample_subsampling_transform(size_t max_depth, SubsamplingMethod method=SubsamplingMethod::kSubscale) |
Create a Variant transformation function that subscales or subsamples the base counts to be below a given max_depth.
This is intended to be used as a transformation on a VariantInputStream, see add_transform() for details. The function creates a transformation function to be used on a stream, and subsamples or subscales the SampleCounts of each Variant in the stream, so that max_depth is not exceeded. This is useful for instance when computing the pool sequencing diversity estimators, which have computational terms depending on read depth; reducing high read depth can hence help to improve computational time.
By default, we use SubsamplingMethod::kSubscale, which is the closest to a lossless reduction of the read depth that can be achieved with integer counts. The two other methods instead resample from a distribution based on the given counts of the Variant, and can hence also be used to create in-silico alternative populations based on the original sample.
Definition at line 134 of file function/variant_input_stream.hpp.
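To illustrate what scaling integer counts down to a maximum depth can look like, here is a minimal sketch using largest-remainder rounding. This is an assumed technique chosen for the example, not necessarily what SubsamplingMethod::kSubscale does internally; DemoCounts and make_demo_subscale_transform() are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <numeric>

// Hypothetical stand-in for the nucleotide counts of one sample: A, C, G, T.
using DemoCounts = std::array<std::size_t, 4>;

// Create a transform that rescales counts so that their sum does not exceed max_depth.
inline std::function<void(DemoCounts&)> make_demo_subscale_transform(std::size_t max_depth)
{
    return [max_depth](DemoCounts& counts) {
        std::size_t const total = std::accumulate(counts.begin(), counts.end(), std::size_t{0});
        if (total <= max_depth) {
            return; // already within the limit, nothing to do
        }

        // Scale each count by max_depth / total, rounding down, and remember the remainders.
        // Note: the product can overflow for extremely large counts; ignored in this sketch.
        std::array<std::size_t, 4> remainders{};
        std::size_t new_total = 0;
        for (std::size_t i = 0; i < counts.size(); ++i) {
            std::size_t const scaled = counts[i] * max_depth;
            remainders[i] = scaled % total;
            counts[i] = scaled / total;
            new_total += counts[i];
        }

        // Hand out the units lost to rounding, largest remainder first.
        while (new_total < max_depth) {
            std::size_t best = 0;
            for (std::size_t i = 1; i < remainders.size(); ++i) {
                if (remainders[i] > remainders[best]) {
                    best = i;
                }
            }
            ++counts[best];
            remainders[best] = 0;
            ++new_total;
        }
    };
}
```

The result keeps the proportions of the counts as closely as integers allow and is deterministic, in contrast to resampling from a distribution.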
std::function<void(Variant const&)> make_variant_input_stream_sequence_length_observer(std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict) |
Helper function to check that some Variant input has positions that agree with those reported in a SequenceDict.
Similar to make_variant_input_stream_sequence_order_observer(), but without the sequence order check. Meant for situations where this order check is either not necessary, or already done in some other way, for example in a VariantParallelInputStream.
See make_variant_input_stream_sequence_order_observer() for details on usage.
Definition at line 217 of file function/variant_input_stream.hpp.
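A length check of this kind can be sketched with simplified stand-ins. DemoLocus and make_demo_length_observer() are hypothetical, a std::map replaces the SequenceDict, and the 1-based position convention as well as the exception type are assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the locus of a Variant.
struct DemoLocus {
    std::string chromosome;
    std::size_t position; // assumed to be 1-based
};

// Create an observer that throws if a locus lies outside of the known sequence lengths.
inline std::function<void(DemoLocus const&)> make_demo_length_observer(
    std::map<std::string, std::size_t> lengths
) {
    // The map plays the role of a sequence dictionary: chromosome name to length.
    return [lengths](DemoLocus const& locus) {
        auto const it = lengths.find(locus.chromosome);
        if (it == lengths.end()) {
            throw std::runtime_error("Unknown chromosome: " + locus.chromosome);
        }
        if (locus.position == 0 || locus.position > it->second) {
            throw std::runtime_error("Position out of range on chromosome " + locus.chromosome);
        }
    };
}
```

The observer takes its argument by const reference, matching the idea that it only inspects the stream and never modifies it.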
std::function<void(Variant const&)> make_variant_input_stream_sequence_order_observer(std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict={}, bool check_sequence_lengths=true) |
Helper function to check that some Variant input is sorted properly.
The function creates a std::function that can be used with a VariantInputStream to check the order (and length) of the Variants being processed.
By default, the different types of VariantInputStreams that we create for different file types with the make_variant_input_stream_from_...() functions simply iterate over their respective input files as they are. However, we might want to check that their order is correct, or that their lengths fit our expectation.
This function by default checks this, using lexicographical order for the chromosomes, and numerical order for the positions within chromosomes. This however might not always be the order as present in the input source. In order to provide a custom order, the function optionally takes a SequenceDict, which is used for the order instead. See there for details on how to create such a dictionary from, e.g., .fai, .dict, or reference genome .fasta files.
Furthermore, when a sequence_dict is provided, using the check_sequence_lengths parameter, we can also check that the positions within each chromosome that we encounter in the input source fit with the expectations of that dictionary. This serves as an additional sanity check of the input files.
If any of these checks fail, an exception is thrown.
Exemplary usage:
    // Get a sequence dict from a fai file
    auto sequence_dict = read_sequence_fai( from_file( fai_file ));

    // Create a VariantInputStream, for example from a sync file
    auto variant_stream = make_variant_input_stream_from_sync_file( sync_file );

    // Add the observer that checks order, using the sequence dict
    variant_stream.add_observer(
        make_variant_input_stream_sequence_order_observer( sequence_dict, true )
    );

    // Use the iterator as usual.
    for( auto const& variant : variant_stream ) {
        // ...
    }
Definition at line 202 of file function/variant_input_stream.hpp.
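The default order check described above (lexicographical chromosomes, increasing positions) can be sketched as a stateful observer. DemoLocus and make_demo_order_observer() are hypothetical stand-ins; treating repeated positions as an error and the choice of exception type are assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the locus of a Variant.
struct DemoLocus {
    std::string chromosome;
    std::size_t position; // assumed to be 1-based
};

// Create an observer that throws if the loci it sees are not in sorted order.
inline std::function<void(DemoLocus const&)> make_demo_order_observer()
{
    // Shared state, so that copies of the std::function track the same previous locus.
    // The initial value sorts before any real locus with a non-empty chromosome name.
    auto previous = std::make_shared<DemoLocus>(DemoLocus{"", 0});
    return [previous](DemoLocus const& current) {
        bool const same_chromosome = (current.chromosome == previous->chromosome);
        bool const chromosome_went_back = (current.chromosome < previous->chromosome);
        bool const position_went_back =
            same_chromosome && current.position <= previous->position;
        if (chromosome_went_back || position_went_back) {
            throw std::runtime_error(
                "Input not sorted at " + current.chromosome + ":" +
                std::to_string(current.position)
            );
        }
        *previous = current;
    };
}
```

A custom chromosome order, such as one taken from a dictionary, would replace the string comparison with a comparison of dictionary indices.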
Map from sam flags to their numerical value, for different types of naming of the flags.
Definition at line 58 of file sam_flags.cpp.