Classes | |
class | AlleleFrequencyWindow |
class | BaseFstPoolCalculator |
Base class to compute FST between two pooled samples, given two instances of SampleCounts. More... | |
class | BaseWindow |
Base class for Window and WindowView, to share common functionality. More... | |
class | BaseWindowStream |
Base class for streams of Windows over the chromosomes of a genome. More... | |
class | BedReader |
Reader for BED (Browser Extensible Data) files. More... | |
struct | CathedralPlotParameters |
Plot parameters to make a cathedral plot. More... | |
struct | CathedralPlotRecord |
Collection of the data used for making a cathedral plot. More... | |
class | ChromosomeWindowStream |
Stream for traversing each chromosome as a whole, with an inner WindowView iterator over the positions of each chromosome. More... | |
class | DiversityPoolCalculator |
Compute Theta Pi, Theta Watterson, and Tajima's D in their pool-sequencing corrected versions according to Kofler et al. More... | |
class | DiversityPoolProcessor |
Helper class to iterate over Variants and process the samples (SampleCounts), using a set of DiversityPoolCalculator instances, one for each sample. More... | |
struct | DiversityPoolSettings |
Settings used by different pool-sequencing corrected diversity statistics. More... | |
struct | EmptyAccumulator |
Empty helper data struct to serve as a dummy for Window. More... | |
struct | EmptyGenomeData |
Helper struct to define a default empty data for the classes GenomeLocus, GenomeRegion, and GenomeRegionList. More... | |
struct | FilterStats |
Counts of how many entries with a particular Filter Tag occurred in some data. More... | |
class | FilterStatus |
Tag class to assign a filter status to a Variant or SampleCounts. More... | |
class | FrequencyTableInputStream |
Iterate an input source and parse it as a table of allele frequencies or counts. More... | |
class | FstCathedralAccumulator |
Accumulate the partial pi values for a given window to produce a cathedral plot. More... | |
struct | FstCathedralPlotRecord |
Data for making one FST cathedral plot, that is, one pair of samples and one chromosome. More... | |
class | FstPoolCalculatorKarlsson |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
class | FstPoolCalculatorKofler |
Compute the F_ST statistic for pool-sequenced data of Kofler et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
class | FstPoolCalculatorUnbiased |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts. More... | |
class | FstPoolProcessor |
Helper class to iterate over Variants and process pairs of FST between their samples (SampleCounts), using a set of BaseFstPoolCalculator. More... | |
class | GenomeHeatmap |
struct | GenomeLocus |
A single locus, that is, a position (or coordinate) on a chromosome. More... | |
class | GenomeLocusSet |
List of positions/coordinates in a genome, for each chromosome. More... | |
struct | GenomeRegion |
A region (between two positions) on a chromosome. More... | |
class | GenomeRegionList |
List of regions in a genome, for each chromosome. More... | |
class | GenomeRegionReader |
Generic reader for inputs that contain a genomic region or locus per line, in different formats. More... | |
class | GenomeWindowStream |
Stream for traversing the entire genome as a single window, with an inner WindowView iterator over the positions along the chromosomes. More... | |
class | GffReader |
Reader for GFF2 and GFF3 (General Feature Format) and GTF (General Transfer Format) files. More... | |
class | HeatmapColorization |
class | HtsFile |
Wrap an ::htsFile struct. More... | |
class | IntervalWindowStream |
Stream for sliding Windows of fixed-size intervals over the chromosomes of a genome. More... | |
class | MapBimReader |
Reader for map/bim files as used by PLINK. More... | |
class | PositionWindowStream |
Stream for traversing each position along a genome individually. More... | |
class | QueueWindowStream |
Stream for Windows containing a queue of entries, i.e., sliding Windows of a fixed number of selected positions in a genome. More... | |
class | RegionWindowStream |
Stream for Windows representing regions of a genome. More... | |
struct | SampleCounts |
One set of nucleotide sample counts, for example for a given sample that represents a pool of sequenced individuals. More... | |
struct | SampleCountsFilterNumericalParams |
Filter settings to filter and transform SampleCounts. More... | |
class | SamVariantInputStream |
Input stream for SAM/BAM/CRAM files that produces a Variant per genome position. More... | |
class | SimplePileupInputStream |
Iterate an input source and parse it as a (m)pileup file. More... | |
class | SimplePileupReader |
Reader for line-by-line assessment of (m)pileup files. More... | |
class | SlidingWindowGenerator |
Generator for sliding Windows over the chromosomes of a genome. More... | |
struct | SortedSampleCounts |
Ordered array of sample counts for the four nucleotides. More... | |
class | SyncInputStream |
Iterate an input source and parse it as a sync file. More... | |
class | SyncReader |
Reader for PoPoolation2's "synchronized" files. More... | |
struct | Variant |
A single variant at a position in a chromosome, along with SampleCounts for a set of samples. More... | |
struct | VariantFilterNumericalParams |
class | VariantGaplessInputStream |
Stream adapter that visits every position in the genome. More... | |
struct | VariantInputStreamData |
Data storage for input-specific information when traversing a variant file. More... | |
struct | VariantInputStreamFromVcfParams |
Parameters to use when streaming through a VCF file as Variants. More... | |
class | VariantParallelInputStream |
Iterate multiple input sources that yield Variants in parallel. More... | |
class | VcfFormatHelper |
Provide htslib helper functions. More... | |
class | VcfFormatIterator |
Iterate the FORMAT information for the samples in a SNP/variant line in a VCF/BCF file. More... | |
class | VcfGenotype |
Simple wrapper class for one genotype field for a sample. More... | |
class | VcfHeader |
Capture the information from a header of a VCF/BCF file. More... | |
class | VcfInputStream |
Iterate an input source and parse it as a VCF/BCF file. More... | |
class | VcfRecord |
Capture the information of a single SNP/variant line in a VCF/BCF file. More... | |
struct | VcfSpecification |
Collect the four required keys that describe an INFO or FORMAT sub-field of VCF/BCF files. More... | |
class | Window |
Window over the chromosomes of a genome. More... | |
class | WindowView |
Proxy view over window-like regions of a genome. More... | |
class | WindowViewStream |
Stream wrapper that turns a BaseWindowStream over Window into a BaseWindowStream over WindowView. More... | |
Functions | |
double | a_n (double n) |
Compute a_n , the sum of reciprocals. More... | |
bool | all_finite_ (FstCathedralPlotRecord::Entry const &entry) |
size_t | allele_count (SampleCounts const &sample) |
Return the number of alleles, that is, of non-zero nucleotide counts of the sample . More... | |
size_t | allele_count (SampleCounts const &sample, size_t min_count) |
Return the number of alleles, taking a min_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least the min_count . More... | |
size_t | allele_count (SampleCounts const &sample, size_t min_count, size_t max_count) |
Return the number of alleles, taking a min_count and max_count into consideration, that is, we compute the number of nucleotide counts of the sample that are at least min_count and at most max_count . More... | |
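The three `allele_count` overloads above differ only in which thresholds they apply. The following is a minimal standalone sketch of that idea, not the library's implementation: `NucleotideCounts` and `count_alleles` are hypothetical names, and treating `max_count == 0` as "no upper limit" is an assumption made here for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the four nucleotide counts (A, C, G, T) of one sample.
// This is not the library's SampleCounts type.
using NucleotideCounts = std::array<std::size_t, 4>;

// Count the nucleotides whose count is at least min_count and at most max_count.
// Assumption of this sketch: max_count == 0 means "no upper limit".
// With the default min_count of 1, this counts the non-zero nucleotide counts.
std::size_t count_alleles(
    NucleotideCounts const& counts,
    std::size_t min_count = 1,
    std::size_t max_count = 0
) {
    assert(max_count == 0 || min_count <= max_count);
    std::size_t result = 0;
    for (auto const c : counts) {
        if (c >= min_count && (max_count == 0 || c <= max_count)) {
            ++result;
        }
    }
    return result;
}
```

For counts of 10, 0, 3, and 1, the default call counts three alleles, a `min_count` of 2 leaves two, and additionally setting `max_count` to 5 leaves one.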
double | alpha_star (double n) |
Compute alpha* according to Achaz 2008 and Kofler et al. 2011. More... | |
double | amnm_ (size_t poolsize, size_t nucleotide_count, size_t allele_frequency) |
Local helper function to compute values for the denominator. More... | |
template<class D > | |
size_t | anchor_position (BaseWindow< D > const &window, WindowAnchorType anchor_type=WindowAnchorType::kIntervalBegin) |
Get the position in the chromosome reported according to a specific WindowAnchorType. More... | |
template<class D , class A = EmptyAccumulator> | |
size_t | anchor_position (Window< D, A > const &window, WindowAnchorType anchor_type=WindowAnchorType::kIntervalBegin) |
Get the position in the chromosome reported according to a specific WindowAnchorType. More... | |
bool | apply_sample_counts_filter_numerical (SampleCounts &sample, SampleCountsFilterNumericalParams const &params) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_sample_counts_filter_numerical (SampleCounts &sample, SampleCountsFilterNumericalParams const &params, SampleCountsFilterStats &stats) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_sample_counts_filter_numerical (Variant &variant, SampleCountsFilterNumericalParams const &params, bool all_need_pass=false) |
bool | apply_sample_counts_filter_numerical (Variant &variant, SampleCountsFilterNumericalParams const &params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
Filter a given SampleCounts based on the numerical properties of the counts. More... | |
bool | apply_variant_filter_numerical (Variant &variant, VariantFilterNumericalParams const &params) |
Filter a given Variant based on the numerical properties of the counts. More... | |
bool | apply_variant_filter_numerical (Variant &variant, VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Filter a given Variant based on the numerical properties of the counts. More... | |
double | b_n (double n) |
Compute b_n , the sum of squared reciprocals. More... | |
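The entries for `a_n` and `b_n` describe sums of reciprocals and of squared reciprocals. As a rough standalone sketch, assuming the textbook definitions used for Watterson's estimator and Tajima's D (sums running from 1 to n - 1), they could be written as below. The names `harmonic_a` and `harmonic_b` are hypothetical, and the library's own functions take a `double` and may handle bounds differently.

```cpp
#include <cassert>
#include <cstddef>

// Sum of reciprocals 1/1 + 1/2 + ... + 1/(n-1).
// The upper bound n - 1 follows the textbook definition; it is an assumption here.
double harmonic_a(std::size_t n)
{
    assert(n >= 1);
    double sum = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        sum += 1.0 / static_cast<double>(i);
    }
    return sum;
}

// Sum of squared reciprocals 1/1^2 + 1/2^2 + ... + 1/(n-1)^2.
double harmonic_b(std::size_t n)
{
    assert(n >= 1);
    double sum = 0.0;
    for (std::size_t i = 1; i < n; ++i) {
        double const d = static_cast<double>(i);
        sum += 1.0 / (d * d);
    }
    return sum;
}
```

Under these definitions, `harmonic_a(5)` evaluates to 25/12 and `harmonic_b(5)` to 205/144, up to floating-point rounding.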
double | beta_star (double n) |
Compute beta* according to Achaz 2008 and Kofler et al. 2011. More... | |
genesis::utils::JsonDocument | cathedral_plot_parameters_to_json_document (CathedralPlotParameters const &parameters) |
Get a user-readable description of a CathedralPlotParameters as a JsonDocument. More... | |
genesis::utils::JsonDocument | cathedral_plot_record_to_json_document (CathedralPlotRecord const &record) |
Get a user-readable description of the data of a CathedralPlotRecord as a JsonDocument. More... | |
double | cathedral_window_width (CathedralPlotRecord const &record, size_t row) |
Compute the window width for a row in a cathedral plot. More... | |
CathedralWindowWidthMethod | cathedral_window_width_method_from_string (std::string const &method) |
Helper function to return a CathedralWindowWidthMethod from its textual representation. More... | |
std::string | cathedral_window_width_method_to_string (CathedralWindowWidthMethod method) |
Helper function to return a textual representation of the method . More... | |
template<class Record , class Accumulator > | |
void | compute_cathedral_matrix (CathedralPlotParameters const &parameters, Record &record, Accumulator accumulator=Accumulator{}) |
Template function to compute the value matrix for a cathedral plot, given a record with plot parameters and per-position data to accumulate per window. More... | |
void | compute_fst_cathedral_matrix (CathedralPlotParameters const &parameters, FstCathedralPlotRecord &record) |
Compute the matrix of values that represents the cathedral plot for FST. More... | |
std::vector< FstCathedralPlotRecord > | compute_fst_cathedral_records (VariantInputStream &iterator, FstPoolProcessor &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names=std::vector< std::string >{}, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict=nullptr) |
Compute the components of per-position FST data for all pairs of samples in the given processor , for the chromosomes in the given input iterator . More... | |
std::vector< FstCathedralPlotRecord > | compute_fst_cathedral_records_for_chromosome (VariantInputStream::Iterator &iterator, FstPoolProcessor &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names=std::vector< std::string >{}, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict=nullptr) |
Compute the components of per-position FST data for all pairs of samples in the given processor , for the current chromosome in the given input iterator . More... | |
std::pair< char, double > | consensus (SampleCounts const &sample) |
Consensus character for a SampleCounts, and its confidence. More... | |
std::pair< char, double > | consensus (SampleCounts const &sample, bool is_covered) |
Consensus character for a SampleCounts, and its confidence. More... | |
SampleCounts | convert_to_sample_counts (SimplePileupReader::Sample const &sample, unsigned char min_phred_score) |
Variant | convert_to_variant (SimplePileupReader::Record const &record, unsigned char min_phred_score) |
Variant | convert_to_variant_as_individuals (VcfRecord const &record, bool use_allelic_depth=false) |
Convert a VcfRecord to a Variant, treating each sample as an individual, and combining them all into one SampleCounts sample. More... | |
Variant | convert_to_variant_as_pool (VcfRecord const &record) |
Convert a VcfRecord to a Variant, treating each sample column as a pool of individuals. More... | |
void | convert_to_variant_as_pool_set_missing_gt_ (VcfRecord const &record, Variant &variant) |
Local helper function that sets the filter status of a Variant and its samples to missing depending on whether the genotypes of the samples are missing or not. More... | |
void | convert_to_variant_as_pool_tally_bases_ (VcfRecord const &record, std::pair< std::array< char, 6 >, size_t > const &snp_chars, VcfFormatIteratorInt const &sample_ad, SampleCounts &sample) |
Local helper function to tally up the bases from a VcfRecord into a SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
double | f_st_pool_karlsson (ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
double | f_st_pool_kofler (size_t p1_poolsize, size_t p2_poolsize, ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute the F_ST statistic for pool-sequenced data of Kofler et al. as used in PoPoolation2, for two ranges of SampleCounts. More... | |
template<class ForwardIterator1 , class ForwardIterator2 > | |
std::pair< double, double > | f_st_pool_unbiased (size_t p1_poolsize, size_t p2_poolsize, ForwardIterator1 p1_begin, ForwardIterator1 p1_end, ForwardIterator2 p2_begin, ForwardIterator2 p2_end, bool only_passing_samples=true) |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts. More... | |
double | f_star (double a_n, double n) |
Compute f* according to Achaz 2008 and Kofler et al. 2011. More... | |
void | fill_fst_cathedral_records_from_processor_ (FstPoolProcessor const &processor, std::vector< FstCathedralPlotRecord > &records, size_t position) |
genesis::utils::JsonDocument | fst_cathedral_plot_record_to_json_document (FstCathedralPlotRecord const &record) |
Get a user-readable description of the data of a FstCathedralPlotRecord as a JsonDocument. More... | |
std::vector< std::pair< std::string, std::string > > | fst_pool_processor_sample_names (FstPoolProcessor const &processor, std::vector< std::string > const &sample_names) |
Return a list of sample name pairs for each calculator in an FstPoolProcessor. More... | |
FstPoolCalculatorUnbiased::Estimator | fst_pool_unbiased_estimator_from_string (std::string const &str) |
std::string | fst_pool_unbiased_estimator_to_string (FstPoolCalculatorUnbiased::Estimator estimator) |
GenomeLocusSet | genome_locus_set_from_vcf_file (std::string const &file) |
Read a VCF file, and use its positions to create a GenomeLocusSet. More... | |
GenomeRegionList | genome_region_list_from_vcf_file (std::string const &file) |
Read a VCF file, and use its positions to create a GenomeRegionList. More... | |
void | genome_region_list_from_vcf_file (std::string const &file, GenomeRegionList &target) |
Read a VCF file, and add its positions to an existing GenomeRegionList. More... | |
SampleCounts::size_type | get_base_count (SampleCounts const &sample, char base) |
Get the count for a base given as a char. More... | |
std::pair< std::array< char, 6 >, size_t > | get_vcf_record_snp_ref_alt_chars_ (VcfRecord const &record) |
Local helper function that returns the REF and ALT chars of a VcfRecord for SNPs. More... | |
template<class D > | |
size_t | get_window_length (BaseWindow< D > const &window) |
Get the length of a given Window. More... | |
template<class D > | |
size_t | get_window_provided_loci_count (BaseWindow< D > const &window, std::shared_ptr< GenomeLocusSet > provided_loci) |
Get the count of provided loci in a window. More... | |
char | guess_alternative_base (Variant const &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the alternative base of a Variant. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, char ref_base, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference base. More... | |
void | guess_and_set_ref_and_alt_bases (Variant &variant, genesis::sequence::ReferenceGenome const &ref_genome, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference genome to obtain the base. More... | |
genesis::sequence::QualityEncoding | guess_pileup_quality_encoding (std::shared_ptr< utils::BaseInputSource > source, size_t max_lines=0) |
Guess the quality score encoding for an (m)pileup input, based on counts of how often each char appeared in the quality string (of the input pileup file for example). More... | |
char | guess_reference_base (Variant const &variant, bool force=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Guess the reference base of a Variant. More... | |
double | heterozygosity (SampleCounts const &sample, bool with_bessel=false) |
Compute classic heterozygosity. More... | |
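To illustrate what "classic heterozygosity" with an optional Bessel correction usually means, here is a small standalone sketch. It assumes the common definition H = 1 - sum of squared allele frequencies, scaled by n / (n - 1) when the correction is requested. The names `NucleotideCounts` and `heterozygosity_sketch` are hypothetical, and the handling of empty or single-read samples is a choice made here, not taken from the library.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the four nucleotide counts (A, C, G, T) of one sample.
using NucleotideCounts = std::array<std::size_t, 4>;

// Heterozygosity as 1 - sum(f_i^2), where f_i are the nucleotide frequencies.
// With with_bessel, the result is multiplied by n / (n - 1), n being the total count.
double heterozygosity_sketch(NucleotideCounts const& counts, bool with_bessel = false)
{
    std::size_t total = 0;
    for (auto const c : counts) {
        total += c;
    }

    // Assumption of this sketch: no data yields a heterozygosity of zero.
    if (total == 0) {
        return 0.0;
    }

    double sum_sq = 0.0;
    for (auto const c : counts) {
        double const f = static_cast<double>(c) / static_cast<double>(total);
        sum_sq += f * f;
    }
    double h = 1.0 - sum_sq;

    if (with_bessel) {
        // Assumption of this sketch: the correction is undefined for n < 2, so return zero.
        if (total < 2) {
            return 0.0;
        }
        h *= static_cast<double>(total) / static_cast<double>(total - 1);
    }
    return h;
}
```

For counts of 5, 5, 0, and 0, this gives 0.5 without the correction and 0.5 * 10 / 9 with it. A sample with a single observed nucleotide gives zero in both cases.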
bool | is_covered (GenomeLocusSet const &loci, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given GenomeLocusSet . More... | |
template<class T > | |
bool | is_covered (GenomeLocusSet const &loci, T const &locus) |
Test whether the chromosome/position of a locus is within a given GenomeLocusSet . More... | |
bool | is_covered (GenomeLocusSet const &loci, VcfRecord const &variant) |
bool | is_covered (GenomeRegion const &region, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given genomic region . More... | |
template<class T > | |
bool | is_covered (GenomeRegion const &region, T const &locus) |
Test whether the chromosome/position of a locus is within a given genomic region . More... | |
bool | is_covered (GenomeRegion const &region, VcfRecord const &variant) |
bool | is_covered (GenomeRegionList const &regions, std::string const &chromosome, size_t position) |
Test whether the chromosome/position is within a given list of genomic regions . More... | |
template<class T > | |
bool | is_covered (GenomeRegionList const &regions, T const &locus) |
Test whether the chromosome/position of a locus is within a given list of genomic regions . More... | |
bool | is_covered (GenomeRegionList const &regions, VcfRecord const &variant) |
constexpr bool | is_valid_base (char c) |
Return whether a given base is in ACGT , case insensitive. More... | |
constexpr bool | is_valid_base_or_n (char c) |
Return whether a given base is in ACGTN , case insensitive. More... | |
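A case-insensitive `constexpr` base check like the two above can be written without `<cctype>`, whose functions are not `constexpr`. The following is a minimal sketch under that assumption; `to_lower_ascii`, `is_acgt`, and `is_acgt_or_n` are hypothetical names, not the library's functions.

```cpp
// Lower-case an ASCII letter by hand, so that the callers can remain constexpr.
// Each function has a single return statement, which keeps it valid in C++11.
constexpr char to_lower_ascii(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Whether c is one of A, C, G, T in either case.
constexpr bool is_acgt(char c)
{
    return to_lower_ascii(c) == 'a' || to_lower_ascii(c) == 'c'
        || to_lower_ascii(c) == 'g' || to_lower_ascii(c) == 't';
}

// Whether c is one of A, C, G, T, N in either case.
constexpr bool is_acgt_or_n(char c)
{
    return is_acgt(c) || to_lower_ascii(c) == 'n';
}

// Because the functions are constexpr, they can be checked at compile time.
static_assert(is_acgt('g'), "lower-case bases are accepted");
static_assert(!is_acgt('N') && is_acgt_or_n('N'), "N is only accepted by the second check");
```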
std::pair< genesis::utils::JsonDocument, genesis::utils::Matrix< double > > | load_cathedral_plot_record_components_from_files (std::string const &base_path) |
Load the parts of a cathedral plot from a set of files. More... | |
CathedralPlotRecord | load_cathedral_plot_record_from_files (std::string const &base_path) |
Load the record of a cathedral plot from a set of files. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
int | locus_compare (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Three-way comparison (spaceship operator <=> ) for two loci in a genome. More... | |
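The `locus_compare` overloads above all reduce to one operation: order two loci first by chromosome, then by position. The standalone sketch below assumes that, without a sequence dictionary, chromosome names are compared lexicographically as strings. The names `compare_loci` and `loci_less` are hypothetical, and the exact return values of the library's functions are not confirmed here.

```cpp
#include <cstddef>
#include <string>

// Three-way comparison of two loci, each given as chromosome name and position.
// Returns -1 if the left locus comes first, +1 if the right one does, 0 if equal.
// Assumption of this sketch: chromosomes are ordered lexicographically by name.
int compare_loci(
    std::string const& l_chromosome, std::size_t l_position,
    std::string const& r_chromosome, std::size_t r_position
) {
    int const chr_cmp = l_chromosome.compare(r_chromosome);
    if (chr_cmp != 0) {
        return chr_cmp < 0 ? -1 : 1;
    }
    if (l_position != r_position) {
        return l_position < r_position ? -1 : 1;
    }
    return 0;
}

// The relational helpers can then be expressed through the three-way result.
bool loci_less(
    std::string const& l_chromosome, std::size_t l_position,
    std::string const& r_chromosome, std::size_t r_position
) {
    return compare_loci(l_chromosome, l_position, r_chromosome, r_position) < 0;
}
```

Lexicographic ordering places "chr10" before "chr2", which is rarely the order of a reference genome. The overloads that accept a SequenceDict presumably exist to impose the chromosome order of a given reference instead.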
bool | locus_equal (GenomeLocus const &l, GenomeLocus const &r) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | locus_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Equality comparison (== ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
bool | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Greater than comparison (> ) for two loci in a genome. More... | |
bool | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than comparison (> ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
int | locus_greater_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
bool | locus_inequal (GenomeLocus const &l, GenomeLocus const &r) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | locus_inequal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Inequality comparison (!= ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
bool | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Less than comparison (< ) for two loci in a genome. More... | |
bool | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than comparison (< ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (GenomeLocus const &l, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, GenomeLocus const &r, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
bool | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
bool | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, ::genesis::sequence::SequenceDict const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
int | locus_less_or_equal (std::string const &l_chromosome, size_t l_position, std::string const &r_chromosome, size_t r_position, std::shared_ptr< genesis::sequence::SequenceDict > const &sequence_dict) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
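The comparison overloads above differ mainly in whether a sequence dictionary is passed in. The following is a minimal standalone sketch of that distinction, not the library's implementation: `Locus`, `ChromosomeOrder`, `less_by_name`, `less_by_dict`, and `less_or_equal_by_dict` are hypothetical names, and it assumes that chromosome names are compared as plain strings when no dictionary is given, and by their rank in the dictionary otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical stand-in for a genome locus; not the library's GenomeLocus.
struct Locus {
    std::string chromosome;
    std::size_t position;
};

// Without a dictionary: order by chromosome name (as a string), then by position.
bool less_by_name(Locus const& l, Locus const& r)
{
    return std::tie(l.chromosome, l.position) < std::tie(r.chromosome, r.position);
}

// Hypothetical stand-in for a sequence dictionary: chromosome name -> rank in the reference.
using ChromosomeOrder = std::unordered_map<std::string, std::size_t>;

// With a dictionary: order by the rank of the chromosome, then by position.
bool less_by_dict(Locus const& l, Locus const& r, ChromosomeOrder const& order)
{
    // Both chromosomes are expected to be present in the dictionary.
    assert(order.count(l.chromosome) == 1 && order.count(r.chromosome) == 1);
    std::size_t const l_rank = order.at(l.chromosome);
    std::size_t const r_rank = order.at(r.chromosome);
    return std::tie(l_rank, l.position) < std::tie(r_rank, r.position);
}

// l <= r is equivalent to !(r < l) for a strict total order.
bool less_or_equal_by_dict(Locus const& l, Locus const& r, ChromosomeOrder const& order)
{
    return !less_by_dict(r, l, order);
}
```

In this sketch, a name-based comparison places "chr10" before "chr2", whereas a dictionary listing "chr2" first yields the reference order.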
genesis::utils::Matrix< genesis::utils::Color > | make_cathedral_plot_heatmap (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters) |
Make a cathedral plot heat map as a color matrix. More... | |
genesis::utils::SvgDocument | make_cathedral_plot_svg (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes. More... | |
genesis::utils::SvgDocument | make_cathedral_plot_svg (CathedralPlotRecord const &record, genesis::utils::HeatmapParameters const &heatmap_parameters, genesis::utils::Matrix< genesis::utils::Color > const &image) |
Make a cathedral plot heat map and add it into an SVG document with legend and axes. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
ChromosomeWindowStream< InputStreamIterator, DataType > | make_chromosome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator > | |
ChromosomeWindowStream< InputStreamIterator > | make_default_chromosome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a ChromosomeWindowStream for each chromosome, for a default use case. More... | |
template<class InputStreamIterator > | |
GenomeWindowStream< InputStreamIterator > | make_default_genome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a GenomeWindowStream for the whole genome, for a default use case. More... | |
template<class InputStreamIterator > | |
IntervalWindowStream< InputStreamIterator > | make_default_interval_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function to instantiate an IntervalWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
PositionWindowStream< InputStreamIterator > | make_default_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_position_window_view_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function that creates a PositionWindowStream with default functors and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
QueueWindowStream< InputStreamIterator > | make_default_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_queue_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t count, size_t stride=0) |
Helper function that creates a QueueWindowStream with default functors and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
RegionWindowStream< InputStreamIterator > | make_default_region_window_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function to instantiate a RegionWindowStream for a default use case. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_region_window_view_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function that creates a RegionWindowStream and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_default_sliding_interval_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function that creates an IntervalWindowStream and wraps it in a WindowViewStream. More... | |
DiversityPoolProcessor | make_diversity_pool_processor (WindowAveragePolicy window_average_policy, DiversityPoolSettings const &settings, std::vector< size_t > const &pool_sizes) |
Create a DiversityPoolProcessor to compute diversity for all samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (size_t index, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for one-to-all FST computation between one sample and all others. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (size_t index_1, size_t index_2, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for one-to-one FST computation between two samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for all-to-all computation of FST between all pairs of samples. More... | |
template<class Calculator , typename... Args> | |
FstPoolProcessor | make_fst_pool_processor (std::vector< std::pair< size_t, size_t >> const &sample_pairs, std::vector< size_t > const &pool_sizes, Args... args) |
Create an FstPoolProcessor for computation of FST between specific pairs of samples. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
GenomeWindowStream< InputStreamIterator, DataType > | make_genome_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a GenomeWindowStream for the whole genome, without the need to specify the template parameters manually. More... | |
template<class T , class R > | |
std::shared_ptr< T > | make_input_stream_with_sample_filter_ (std::string const &filename, R const &reader, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
Local helper function template that takes care of initializing an input stream, and setting the sample filters, for those streams for which we do not know the number of samples prior to starting the file iteration. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
IntervalWindowStream< InputStreamIterator, DataType > | make_interval_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t width=0, size_t stride=0) |
Helper function to instantiate an IntervalWindowStream without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator > | |
PositionWindowStream< InputStreamIterator > | make_passing_variant_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_passing_variant_position_window_view_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function that creates a PositionWindowStream with default functions for Variant data, and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator > | |
QueueWindowStream< InputStreamIterator > | make_passing_variant_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected. More... | |
template<class InputStreamIterator > | |
WindowViewStream< InputStreamIterator > | make_passing_variant_queue_window_view_stream (InputStreamIterator begin, InputStreamIterator end, size_t count, size_t stride=0) |
Helper function that creates a QueueWindowStream with default functions for Variant data, and wraps it in a WindowViewStream. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
PositionWindowStream< InputStreamIterator, DataType > | make_position_window_stream (InputStreamIterator begin, InputStreamIterator end) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
QueueWindowStream< InputStreamIterator, DataType > | make_queue_window_stream (InputStreamIterator begin, InputStreamIterator end, size_t count=0, size_t stride=0) |
Helper function to instantiate a QueueWindowStream without the need to specify the template parameters manually. More... | |
template<class InputStreamIterator , class DataType = typename InputStreamIterator::value_type> | |
RegionWindowStream< InputStreamIterator, DataType > | make_region_window_stream (InputStreamIterator begin, InputStreamIterator end, std::shared_ptr< GenomeRegionList > region_list) |
Helper function to instantiate a RegionWindowStream without the need to specify the template parameters manually. More... | |
template<class GenomeMaskType > | |
std::function< void(Variant &)> | make_sample_counts_filter_by_region_tagging (std::vector< std::shared_ptr< GenomeMaskType >> const &sample_masks, SampleCountsFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream on a Variant to filter its SampleCounts by genome regions, by tagging non-covered positions with the given tag . More... | |
std::function< void(Variant &)> | make_sample_counts_filter_numerical_tagging (SampleCountsFilterNumericalParams const &params, bool all_need_pass=false) |
Return a functional to numerically filter the SampleCounts samples in a Variant, tagging the ones that do not pass the filters, and potentially tagging the Variant. More... | |
std::function< void(Variant &)> | make_sample_counts_filter_numerical_tagging (SampleCountsFilterNumericalParams const &params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
std::vector< bool > | make_sample_name_filter (std::vector< std::string > const &sample_names, std::vector< std::string > const &names_filter, bool inverse_filter=false) |
Create a filter for samples, indicating which to keep. More... | |
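As an illustration of what a boolean sample filter of this kind looks like, here is a minimal standalone sketch. It is not the library's implementation: `name_filter` is a hypothetical function that assumes the result has one entry per sample name, set to true for samples to keep, and that the inverse flag flips the selection. How the library treats filter names that match no sample is not covered here; the sketch simply ignores them.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: build a keep/drop mask over sample_names.
// An entry is true if the sample is listed in names_filter (or not listed, if inverse is set).
std::vector<bool> name_filter(
    std::vector<std::string> const& sample_names,
    std::vector<std::string> const& names_filter,
    bool inverse = false
) {
    // Set for fast membership tests of the requested names.
    std::unordered_set<std::string> const selected(names_filter.begin(), names_filter.end());

    std::vector<bool> keep;
    keep.reserve(sample_names.size());
    for (auto const& name : sample_names) {
        bool const listed = selected.count(name) > 0;
        // With inverse == true, keep exactly the samples that are not listed.
        keep.push_back(listed != inverse);
    }
    return keep;
}
```

A mask of this shape matches the `std::vector< bool > const &sample_filter` parameters accepted by several of the stream factory functions in this listing.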
std::vector< std::string > | make_sample_name_list_ (std::string const &source_name, size_t size) |
Local helper to fill the sample names of file formats without sample names. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (GenomeRegion const &region, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a genome region, by excluding non-covered positions from the stream. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (std::shared_ptr< GenomeLocusSet > loci, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream. More... | |
std::function< bool(Variant const &)> | make_variant_filter_by_region_excluding (std::shared_ptr< GenomeRegionList > regions, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream. More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (GenomeRegion const &region, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a genome region, by tagging non-covered positions with the given tag . More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (std::shared_ptr< GenomeLocusSet > loci, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag . More... | |
std::function< void(Variant &)> | make_variant_filter_by_region_tagging (std::shared_ptr< GenomeRegionList > regions, VariantFilterTag tag, bool complement=false) |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag . More... | |
std::function< bool(Variant &)> | make_variant_filter_numerical_excluding (VariantFilterNumericalParams const &params) |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters. More... | |
std::function< bool(Variant &)> | make_variant_filter_numerical_excluding (VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (SampleCountsFilterNumericalParams const &sample_count_params, VariantFilterNumericalParams const &variant_params, bool all_need_pass=false) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (SampleCountsFilterNumericalParams const &sample_count_params, VariantFilterNumericalParams const &variant_params, VariantFilterStats &variant_stats, SampleCountsFilterStats &sample_count_stats, bool all_need_pass=false) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (VariantFilterNumericalParams const &params) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
std::function< void(Variant &)> | make_variant_filter_numerical_tagging (VariantFilterNumericalParams const &params, VariantFilterStats &stats) |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::ReferenceGenome > ref_genome) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::ReferenceGenome > ref_genome, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::SequenceDict > seq_dict) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_gapless_input_stream (VariantInputStream const &input, std::shared_ptr<::genesis::sequence::SequenceDict > seq_dict, std::shared_ptr< GenomeLocusSet > genome_locus_set) |
Create a VariantGaplessInputStream from a VariantInputStream input , and wrap it again in a VariantInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_frequency_table_file (std::string const &filename, char separator_char='\t', FrequencyTableInputStream const &reader=FrequencyTableInputStream{}) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_frequency_table_file (std::string const &filename, std::vector< std::string > const &sample_names_filter, bool inverse_sample_names_filter=false, char separator_char='\t', FrequencyTableInputStream const &reader=FrequencyTableInputStream{}) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_individual_vcf_file (std::string const &filename, VariantInputStreamFromVcfParams const &params=VariantInputStreamFromVcfParams{}, bool use_allelic_depth=false) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as an individual, and combining them all into one SampleCounts sample. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, std::vector< bool > const &sample_filter, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices=false, SimplePileupReader const &reader=SimplePileupReader{}) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_pileup_file_ (std::string const &filename, SimplePileupReader const &reader, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
Local helper function that takes care of the three functions below. More... | |
VariantInputStream | make_variant_input_stream_from_pool_vcf_file (std::string const &filename, VariantInputStreamFromVcfParams const &params=VariantInputStreamFromVcfParams{}) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as a pool of individuals. More... | |
VariantInputStream | make_variant_input_stream_from_sam_file (std::string const &filename, SamVariantInputStream const &reader=SamVariantInputStream{}) |
Create a VariantInputStream to iterate the contents of a SAM/BAM/CRAM file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename, std::vector< bool > const &sample_filter) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices=false) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants. More... | |
VariantInputStream | make_variant_input_stream_from_sync_file_ (std::string const &filename, std::vector< size_t > const &sample_indices, bool inverse_sample_indices, std::vector< bool > const &sample_filter) |
VariantInputStream | make_variant_input_stream_from_variant_gapless_input_stream (VariantGaplessInputStream const &gapless_input) |
Create a VariantInputStream that wraps a VariantGaplessInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_variant_parallel_input_stream (VariantParallelInputStream const &parallel_input, VariantParallelInputStream::JoinedVariantParams const &joined_variant_params=VariantParallelInputStream::JoinedVariantParams{}) |
Create a VariantInputStream to iterate multiple input sources at once, using a VariantParallelInputStream. More... | |
VariantInputStream | make_variant_input_stream_from_vcf_file_ (std::string const &filename, VariantInputStreamFromVcfParams const &params, bool pool_samples, bool use_allelic_depth) |
Local helper function that takes care of both main functions below. More... | |
VariantInputStream | make_variant_input_stream_from_vector (std::vector< Variant > const &variants) |
Create a VariantInputStream to iterate the contents of a std::vector containing Variants. More... | |
std::function< void(Variant &)> | make_variant_input_stream_sample_name_filter_transform (std::vector< bool > const &sample_filter) |
std::function< void(Variant &)> | make_variant_input_stream_sample_subsampling_transform (size_t max_depth, SubsamplingMethod method) |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_length_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict) |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_order_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict, bool check_sequence_lengths) |
VariantInputStream | make_variant_merging_input_stream (VariantInputStream const &input, std::unordered_map< std::string, std::string > const &sample_name_to_group, bool allow_ungrouped_samples=false, SampleCountsFilterPolicy filter_policy=SampleCountsFilterPolicy::kOnlyPassing) |
Create a VariantInputStream that merges samples from its underlying input . More... | |
VariantMergeGroupAssignment | make_variant_merging_input_stream_group_assignment_ (VariantInputStream const &variant_input, std::unordered_map< std::string, std::string > const &sample_name_to_group, bool allow_ungrouped_samples) |
Helper function to create a mapping from sample indices to group indices. More... | |
template<class T > | |
WindowViewStream< typename T::InputStreamType, typename T::DataType > | make_window_view_stream (T &&window_iterator) |
Create a WindowViewStream that iterates some underlying BaseWindowStream. More... | |
template<class T > | |
WindowViewStream< typename T::InputStreamType, typename T::DataType > | make_window_view_stream (T const &window_iterator) |
Create a WindowViewStream that iterates some underlying BaseWindowStream. More... | |
SampleCounts | merge (SampleCounts const &p1, SampleCounts const &p2) |
Merge the counts of two SampleCounts objects. More... | |
SampleCounts | merge (std::vector< SampleCounts > const &p, SampleCountsFilterPolicy filter_policy) |
Merge the counts of a vector of SampleCounts objects. More... | |
void | merge_inplace (SampleCounts &p1, SampleCounts const &p2) |
Merge the counts of two SampleCounts objects, by adding the counts of the second (p2 ) to the first (p1 ). More... | |
SampleCounts | merge_sample_counts (Variant const &v, SampleCountsFilterPolicy filter_policy) |
Merge the counts of a vector of SampleCounts objects. More... | |
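Merging as described in the entries above amounts to adding counts per category. The following minimal standalone sketch shows that idea under stated assumptions: `Counts`, `merge_into`, and `merged` are hypothetical names, the six fields mirror the A, C, G, T, N, and D counts mentioned elsewhere in this listing, and the filter policy argument of the library overloads is not modelled at all.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-sample base counts; not the library's SampleCounts.
struct Counts {
    std::size_t a = 0;
    std::size_t c = 0;
    std::size_t g = 0;
    std::size_t t = 0;
    std::size_t n = 0;
    std::size_t d = 0;
};

// In-place variant: add the counts of other to target.
void merge_into(Counts& target, Counts const& other)
{
    target.a += other.a;
    target.c += other.c;
    target.g += other.g;
    target.t += other.t;
    target.n += other.n;
    target.d += other.d;
}

// Returning variant for a list of samples: start from zero and add each sample.
Counts merged(std::vector<Counts> const& samples)
{
    Counts result;
    for (auto const& sample : samples) {
        merge_into(result, sample);
    }
    return result;
}
```

Expressing the returning variant through the in-place one keeps the per-field additions in a single place.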
double | n_base (size_t read_depth, size_t poolsize) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, using a faster closed form expression. More... | |
double | n_base_matrix (size_t read_depth, size_t poolsize) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, following their approach. More... | |
template<typename T > | |
std::array< size_t, 4 > | nucleotide_sorting_order (std::array< T, 4 > const &values) |
Return the sorting order of four values, for instance of the four nucleotides ACGT , in descending order (largest first). More... | |
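A sorting order in this sense is a list of indices into the input, arranged so that the referenced values are descending. The sketch below shows one simple way to obtain it, using an index sort with `std::stable_sort`. It is not the library's implementation: `descending_order` is a hypothetical name, and keeping tied values in their original index order is an assumption of this sketch.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>

// Hypothetical sketch: indices of the four values, largest value first.
template<typename T>
std::array<std::size_t, 4> descending_order(std::array<T, 4> const& values)
{
    // Start with the identity order 0, 1, 2, 3.
    std::array<std::size_t, 4> order;
    std::iota(order.begin(), order.end(), std::size_t{0});

    // Sort the indices by the values they refer to; stable, so ties keep index order.
    std::stable_sort(
        order.begin(), order.end(),
        [&values](std::size_t lhs, std::size_t rhs) {
            return values[lhs] > values[rhs];
        }
    );
    return order;
}
```

For counts given in the order A, C, G, T, the first index of the result identifies the most frequent nucleotide and the second index the runner-up.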
constexpr size_t | nucleotide_sum (SampleCounts const &sample) |
Count of the pure nucleotide bases at this position, that is, the sum of all A , C , G , and T . More... | |
bool | operator!= (GenomeLocus const &l, GenomeLocus const &r) |
Inequality comparison (!= ) for two loci in a genome. More... | |
bool | operator!= (GenomeRegion const &a, GenomeRegion const &b) |
Inequality comparison (!= ) for two GenomeRegions. More... | |
bool | operator< (GenomeLocus const &l, GenomeLocus const &r) |
Less than comparison (< ) for two loci in a genome. More... | |
std::ostream & | operator<< (std::ostream &os, GenomeLocus const &locus) |
std::ostream & | operator<< (std::ostream &os, GenomeRegion const &region) |
std::ostream & | operator<< (std::ostream &os, SampleCounts const &bs) |
Output stream operator for SampleCounts instances. More... | |
bool | operator<= (GenomeLocus const &l, GenomeLocus const &r) |
Less than or equal comparison (<= ) for two loci in a genome. More... | |
bool | operator== (GenomeLocus const &l, GenomeLocus const &r) |
Equality comparison (== ) for two loci in a genome. More... | |
bool | operator== (GenomeRegion const &a, GenomeRegion const &b) |
Equality comparison (== ) for two GenomeRegions. More... | |
bool | operator> (GenomeLocus const &l, GenomeLocus const &r) |
Greater than comparison (> ) for two loci in a genome. More... | |
bool | operator>= (GenomeLocus const &l, GenomeLocus const &r) |
Greater than or equal comparison (>= ) for two loci in a genome. More... | |
GenomeRegion | parse_genome_region (std::string const &region, bool zero_based=false, bool end_exclusive=false) |
Parse a genomic region. More... | |
GenomeRegionList | parse_genome_regions (std::string const &regions, bool zero_based=false, bool end_exclusive=false) |
Parse a set/list of genomic regions. More... | |
genesis::utils::Matrix< double > | pij_matrix_ (size_t max_read_depth, size_t poolsize) |
genesis::utils::Matrix< double > const & | pij_matrix_resolver_ (size_t max_read_depth, size_t poolsize) |
std::vector< FstCathedralPlotRecord > | prepare_fst_cathedral_records_for_chromosome_ (std::string const &chromosome, FstPoolProcessor const &processor, FstPoolCalculatorUnbiased::Estimator fst_estimator, std::vector< std::string > const &sample_names) |
std::string | print_sample_counts_filter_category_stats (SampleCountsFilterCategoryStats const &stats, bool verbose) |
std::string | print_sample_counts_filter_category_stats (SampleCountsFilterStats const &stats, bool verbose=false) |
std::ostream & | print_sample_counts_filter_category_stats (std::ostream &os, SampleCountsFilterCategoryStats const &stats, bool verbose) |
std::ostream & | print_sample_counts_filter_category_stats (std::ostream &os, SampleCountsFilterStats const &stats, bool verbose=false) |
std::string | print_sample_counts_filter_stats (SampleCountsFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::ostream & | print_sample_counts_filter_stats (std::ostream &os, SampleCountsFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::ostream & | print_variant_filter_category_stats (std::ostream &os, VariantFilterCategoryStats const &stats, bool verbose) |
std::ostream & | print_variant_filter_category_stats (std::ostream &os, VariantFilterStats const &stats, bool verbose=false) |
std::string | print_variant_filter_category_stats (VariantFilterCategoryStats const &stats, bool verbose) |
std::string | print_variant_filter_category_stats (VariantFilterStats const &stats, bool verbose=false) |
std::ostream & | print_variant_filter_stats (std::ostream &os, VariantFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
std::string | print_variant_filter_stats (VariantFilterStats const &stats, bool verbose=false) |
Print a textual representation of the counts collected. More... | |
GenomeLocusSet | read_mask_fasta (std::shared_ptr< utils::BaseInputSource > source, size_t mask_min=0, bool invert=false) |
Read an input source as a mask fasta file, and return its content as a GenomeLocusSet. More... | |
genesis::sequence::SequenceDict | reference_locus_set_to_dict (GenomeLocusSet const &set) |
void | resample_counts (SampleCounts &sample, size_t target_depth) |
Resample all counts in a SampleCounts sample to a new target_depth . More... | |
void | resample_counts (Variant &variant, size_t target_depth) |
Resample all counts in a SampleCounts sample to a new target_depth . More... | |
template<typename Distribution > | |
void | resample_counts_ (SampleCounts &sample, size_t max_depth, Distribution distribution, bool skip_if_below_target_depth) |
Local helper function to avoid code duplication. Takes the distribution (with or without replacement) and performs the resampling of base counts. More... | |
void | rescale_counts (SampleCounts &sample, size_t target_depth) |
Transform a SampleCounts sample by re-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max if max_depth is exceeded for the sample. More... | |
void | rescale_counts (Variant &variant, size_t target_depth) |
Transform a SampleCounts sample by re-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max if max_depth is exceeded for the sample. More... | |
void | rescale_counts_ (SampleCounts &sample, size_t target_depth, bool skip_if_below_target_depth) |
template<class Data , class Accumulator = EmptyAccumulator> | |
void | run_vcf_window (SlidingWindowGenerator< Data, Accumulator > &generator, std::string const &vcf_file, std::function< Data(VcfRecord const &)> conversion, std::function< bool(VcfRecord const &)> condition={}) |
Convenience function to iterate over a whole VCF file. More... | |
std::string | sam_flag_to_string (int flags) |
Turn a set of flags for sam/bam/cram reads into their textual representation. More... | |
SampleCountsFilterCategoryStats | sample_counts_filter_stats_category_counts (SampleCountsFilterStats const &stats) |
Generate summary counts for a SampleCountsFilterStats counter. More... | |
size_t | sample_counts_filter_stats_category_counts (SampleCountsFilterStats const &stats, SampleCountsFilterTagCategory category) |
Overload that only reports back a single category sum of the filter stats. More... | |
SampleCountsFilterTagCategory | sample_counts_filter_tag_to_category (SampleCountsFilterTag tag) |
For a given tag , return its category tag. More... | |
template<typename T > | |
std::array< size_t, 6 > | sample_counts_sorting_order (std::array< T, 6 > const &v) |
Return the sorting order of six values, for instance of the four nucleotides ACGT and the N and D counts of a SampleCounts object, in descending order (largest first). More... | |
constexpr size_t | sample_counts_sum (SampleCounts const &sample) |
Sum up all the base counts at this sample , that is, the sum of all A , C , G , T , as well as the N and D count for undetermined and deleted counts. More... | |
void | save_cathedral_plot_record_to_files (CathedralPlotRecord const &record, std::string const &base_path) |
Convenience function to save the record of a cathedral plot in a set of files. More... | |
void | save_cathedral_plot_record_to_files (genesis::utils::JsonDocument const &record_document, genesis::utils::Matrix< double > const &record_value_matrix, std::string const &base_path) |
Save the record of a cathedral plot in a set of files. More... | |
void | save_cathedral_plot_record_to_targets (genesis::utils::JsonDocument const &record_document, genesis::utils::Matrix< double > const &record_value_matrix, std::shared_ptr< genesis::utils::BaseOutputTarget > json_target, std::shared_ptr< genesis::utils::BaseOutputTarget > csv_target) |
Save the record of a cathedral plot in a set of output targets. More... | |
void | set_base_count (SampleCounts &sample, char base, SampleCounts::size_type value) |
Set the count for a base given as a char. More... | |
template<> | |
void | SimplePileupReader::process_ancestral_base_< SimplePileupReader::Sample > (utils::InputStream &input_stream, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::process_quality_string_< SimplePileupReader::Sample > (utils::InputStream &input_stream, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_sample_read_bases_< SimplePileupReader::Sample > (std::string const &read_bases, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_sample_read_depth_< SimplePileupReader::Sample > (size_t read_depth, SimplePileupReader::Sample &sample) const |
template<> | |
void | SimplePileupReader::set_target_alternative_base_< SimplePileupReader::Record > (SimplePileupReader::Record &target) const |
std::pair< SortedSampleCounts, SortedSampleCounts > | sorted_average_sample_counts (SampleCounts const &sample_a, SampleCounts const &sample_b) |
Return the sorted base counts of both input samples, ordered by the average frequencies of the nucleotide counts in the two samples. More... | |
SortedSampleCounts | sorted_sample_counts (SampleCounts const &sample) |
Return the order of base counts (nucleotides), largest one first. More... | |
SortedSampleCounts | sorted_sample_counts (Variant const &variant, bool reference_first, SampleCountsFilterPolicy filter_policy) |
Get a list of bases sorted by their counts. More... | |
SortedSampleCounts | sorted_sample_counts_ (Variant const &variant, bool reference_first, SampleCounts const &total) |
Local helper function that takes an already computed total from merge_sample_counts(), so that it can be re-used internally here. More... | |
int | string_to_sam_flag (std::string const &value) |
Parse a string as a set of flags for sam/bam/cram reads. More... | |
void | subsample_counts_with_replacement (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A , C , G , T , as well as N and D ) with replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_with_replacement (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A , C , G , T , as well as N and D ) with replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_without_replacement (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A , C , G , T , as well as N and D ) without replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subsample_counts_without_replacement (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by subsampling the nucleotide counts (A , C , G , T , as well as N and D ) without replacement to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subscale_counts (SampleCounts &sample, size_t max_depth) |
Transform a SampleCounts sample by sub-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max_depth if max_depth is exceeded for the sample. More... | |
void | subscale_counts (Variant &variant, size_t max_depth) |
Transform a SampleCounts sample by sub-scaling the base counts (A , C , G , T , as well as N and D ) to sum up to max_depth if max_depth is exceeded for the sample. More... | |
double | tajima_d_pool (DiversityPoolSettings const &settings, double theta_pi, double theta_watterson, size_t poolsize, double window_avg_denom, size_t empirical_min_read_depth) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | tajima_d_pool (DiversityPoolSettings const &settings, double theta_pi, double theta_watterson, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | tajima_d_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute the pool-sequencing corrected version of Tajima's D according to Kofler et al. More... | |
double | tajima_d_pool_denominator (DiversityPoolSettings const &settings, double theta, size_t poolsize, double window_avg_denom, size_t empirical_min_read_depth) |
Compute the denominator for the pool-sequencing correction of Tajima's D according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | theta_pi (ForwardIterator begin, ForwardIterator end, bool with_bessel=true, bool only_passing_samples=true) |
Compute classic theta pi, that is, the sum of heterozygosities. More... | |
template<class ForwardIterator > | |
double | theta_pi_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute theta pi with pool-sequencing correction according to Kofler et al, that is, the sum of heterozygosities divided by the correction denominator. More... | |
double | theta_pi_pool (DiversityPoolSettings const &settings, size_t poolsize, SampleCounts const &sample) |
Compute theta pi with pool-sequencing correction according to Kofler et al, for a single SampleCounts. More... | |
double | theta_pi_pool_denominator (DiversityPoolSettings const &settings, size_t poolsize, size_t nucleotide_count) |
Compute the denominator for the pool-sequencing correction of theta pi according to Kofler et al. More... | |
template<class ForwardIterator > | |
double | theta_pi_within_pool (size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute classic theta pi (within a population), that is, the sum of heterozygosities including Bessel's correction for total nucleotide sum at each position, and Bessel's correction for the pool size. More... | |
template<class ForwardIterator > | |
double | theta_watterson_pool (DiversityPoolSettings const &settings, size_t poolsize, ForwardIterator begin, ForwardIterator end, bool only_passing_samples=true) |
Compute theta watterson with pool-sequencing correction according to Kofler et al. More... | |
double | theta_watterson_pool (DiversityPoolSettings const &settings, size_t poolsize, SampleCounts const &sample) |
Compute theta watterson with pool-sequencing correction according to Kofler et al, for a single SampleCounts sample. More... | |
double | theta_watterson_pool_denominator (DiversityPoolSettings const &settings, size_t poolsize, size_t nucleotide_count) |
Compute the denominator for the pool-sequencing correction of theta watterson according to Kofler et al. More... | |
std::string | to_string (GenomeLocus const &locus) |
std::string | to_string (GenomeRegion const &region) |
std::ostream & | to_sync (SampleCounts const &bs, std::ostream &os, bool use_status_and_missing=true) |
Output a SampleCounts instance to a stream in the PoPoolation2 sync format. More... | |
std::ostream & | to_sync (Variant const &var, std::ostream &os, bool use_status_and_missing=true) |
Output a Variant instance to a stream in the PoPoolation2 sync format. More... | |
size_t | total_nucleotide_sum (Variant const &variant, SampleCountsFilterPolicy filter_policy) |
Count of the pure nucleotide bases at this position, that is, the sum of all A , C , G , and T . More... | |
size_t | total_sample_counts_sum (Variant const &variant, SampleCountsFilterPolicy filter_policy) |
Sum up all the base counts at this sample , that is, the sum of all A , C , G , T , as well as the N and D count for undetermined and deleted counts. More... | |
void | transform_zero_out_by_max_count (SampleCounts &sample, size_t max_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if max_count is exceeded for that nucleotide. More... | |
void | transform_zero_out_by_max_count (Variant &variant, size_t max_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if max_count is exceeded for that nucleotide. More... | |
void | transform_zero_out_by_min_count (SampleCounts &sample, size_t min_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if min_count is not reached for that nucleotide. More... | |
void | transform_zero_out_by_min_count (Variant &variant, size_t min_count, bool also_n_and_d_counts=true) |
Transform a SampleCounts sample by setting any nucleotide count (A , C , G , T ) to zero if min_count is not reached for that nucleotide. More... | |
void | validate_cathedral_plot_record (CathedralPlotRecord const &record) |
Check a Cathedral Plot record for internal consistency. More... | |
VariantFilterCategoryStats | variant_filter_stats_category_counts (VariantFilterStats const &stats) |
Generate summary counts for a VariantFilterStats counter. More... | |
size_t | variant_filter_stats_category_counts (VariantFilterStats const &stats, VariantFilterTagCategory category) |
Overload that only reports back a single category sum of the filter stats. More... | |
VariantFilterTagCategory | variant_filter_tag_to_category (VariantFilterTag tag) |
For a given tag , return its category tag. More... | |
std::string | vcf_genotype_string (std::vector< VcfGenotype > const &genotypes) |
Return the VCF-like string representation of a set of VcfGenotype entries. More... | |
size_t | vcf_genotype_sum (std::vector< VcfGenotype > const &genotypes) |
Return the sum of genotypes for a set of VcfGenotype entries, typically used to construct a genotype matrix with entries 0,1,2. More... | |
std::string | vcf_hl_type_to_string (int hl_type) |
Internal helper function to convert htslib-internal BCF_HL_* header line type values to their string representation as used in the VCF header ("FILTER", "INFO", "FORMAT", etc). More... | |
std::string | vcf_value_special_to_string (int vl_type_num) |
std::string | vcf_value_special_to_string (VcfValueSpecial vl_type_num) |
std::string | vcf_value_type_to_string (int ht_type) |
std::string | vcf_value_type_to_string (VcfValueType ht_type) |
template<class D > | |
double | window_average_denominator (WindowAveragePolicy policy, BaseWindow< D > const &window, std::shared_ptr< GenomeLocusSet > provided_loci, VariantFilterStats const &variant_filter_stats, SampleCountsFilterStats const &sample_counts_filter_stats) |
Get the denominator to use for averaging an estimator across a window. More... | |
Enumerations | |
enum | CathedralWindowWidthMethod { kExponential, kGeometric, kLinear } |
Interpolation algorithm for window sizes across the rows of a cathedral plot. More... | |
enum | SampleCountsFilterPolicy { kAll, kOnlyPassing } |
Policy helper to decide how to treat filtered SampleCounts. More... | |
enum | SampleCountsFilterTag : FilterStatus::IntType { kPassed = 0, kMaskedPosition, kMaskedRegion, kMissing, kNotPassed, kInvalid, kEmpty, kBelowMinReadDepth, kAboveMaxReadDepth, kAboveDeletionsCountLimit, kNotSnp, kNotBiallelicSnp, kEnd } |
enum | SampleCountsFilterTagCategory : FilterStatus::IntType { kPassed = 0, kMasked, kMissingInvalid, kNumeric, kEnd } |
List of filter categories for a SampleCounts. More... | |
enum | SlidingWindowType { kInterval, kVariants, kChromosome } |
SlidingWindowType of a Window, that is, whether we slide along a fixed size interval of the genome, along a fixed number of variants, or whether the window represents a whole chromosome. More... | |
enum | SubsamplingMethod { kSubscale, kSubsampleWithReplacement, kSubsampleWithoutReplacement } |
Select which method to use for reducing the max read depth of a SampleCounts sample or a Variant. More... | |
enum | TajimaDenominatorPolicy { kEmpiricalMinReadDepth, kProvidedMinReadDepth, kWithPopoolationBugs, kPoolsize, kUncorrected } |
Select how to compute the denominator for the pool sequencing correction of Tajima's D. More... | |
enum | VariantFilterTag : FilterStatus::IntType { kPassed = 0, kMaskedPosition, kMaskedRegion, kMissing, kNotPassed, kInvalid, kNoSamplePassed, kNotAllSamplesPassed, kEmpty, kBelowMinReadDepth, kAboveMaxReadDepth, kAboveDeletionsCountLimit, kNotSnp, kNotBiallelicSnp, kBelowSnpMinCount, kAboveSnpMaxCount, kBelowMinAlleleFreq, kEnd } |
List of filters that we apply to a Variant, to indicate whether the Variant passed or not. More... | |
enum | VariantFilterTagCategory : FilterStatus::IntType { kPassed = 0, kMasked, kMissingInvalid, kSamplesFailed, kNumeric, kInvariant, kEnd } |
List of filter categories for a Variant. More... | |
enum | VcfHeaderLine : int { kFilter = 0, kInfo = 1, kFormat = 2, kContig = 3, kStructured = 4, kGeneric = 5 } |
Specification for the values determining header line types of VCF/BCF files. More... | |
enum | VcfValueSpecial : int { kFixed = 0, kVariable = 1, kAllele = 2, kGenotype = 3, kReference = 4 } |
Specification for special markers for the number of values expected for key-value-pairs of VCF/BCF files. More... | |
enum | VcfValueType : int { kFlag = 0, kInteger = 1, kFloat = 2, kString = 3 } |
Specification for the data type of the values expected in key-value-pairs of VCF/BCF files. More... | |
enum | WindowAnchorType { kIntervalBegin, kIntervalEnd, kIntervalMidpoint, kVariantFirst, kVariantLast, kVariantMedian, kVariantMean, kVariantMidpoint } |
Position in the genome that is used for reporting when emitting or using a window. More... | |
enum | WindowAveragePolicy { kWindowLength, kAvailableLoci, kValidLoci, kValidSnps, kSum, kProvidedLoci } |
Select the method to use for computing window averages of statistic estimators. More... | |
Variables | |
std::function< void(Variant &)> | make_variant_input_stream_sample_name_filter_transform (std::vector< bool > const &sample_filter) |
Helper function to create a Variant transform to filter out samples. More... | |
std::function< void(Variant &)> | make_variant_input_stream_sample_subsampling_transform (size_t max_depth, SubsamplingMethod method=SubsamplingMethod::kSubscale) |
Create a Variant transformation function that subscales or subsamples the base counts to be below a given max_depth . More... | |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_length_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict) |
Helper function to check that some Variant input has positions that agree with those reported in a SequenceDict. More... | |
std::function< void(Variant const &)> | make_variant_input_stream_sequence_order_observer (std::shared_ptr< genesis::sequence::SequenceDict > sequence_dict={}, bool check_sequence_lengths=true) |
Helper function to check that some Variant input is sorted properly. More... | |
static const std::unordered_map< std::string, int > | sam_flag_name_to_int_ |
Map from sam flags to their numerical value, for different types of naming of the flags. More... | |
double a_n | ( | double | n | ) |
Compute a_n
, the sum of reciprocals.
This is the sum of reciprocals up to n-1
, which is \( a_n = \sum_{i=1}^{n-1} \frac{1}{i} \).
See Equation 3.6 in
Hahn, M. W. (2018). Molecular Population Genetics. https://global.oup.com/academic/product/molecular-population-genetics-9780878939657
for details.
Note that we are implementing this for double
n
, instead of an unsigned integer type, as some variants of the tajima_d() computation actually use n_base() to get an "effective" pool size. That is kind of wrong, but we have implemented it here for comparability with PoPoolation. In these cases, we round n
to the nearest integer first. For any actual integer numbers of pool sizes, double
has enough precision to accurately store that integer value, so there is no loss of accuracy in those cases.
Definition at line 285 of file diversity_pool_functions.cpp.
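As a rough illustration of the sum described above, the following stand-alone sketch computes a_n for a double argument by rounding to the nearest integer first. The name a_n_sketch and the treatment of values that round below 2 are assumptions made for this example; the code is not taken from diversity_pool_functions.cpp.

```cpp
#include <cmath>

// Illustrative sketch only; a_n_sketch is a hypothetical name, not the library function.
double a_n_sketch( double n )
{
    // Round to the nearest integer first, as described in the text above.
    long long const rounded = std::llround( n );

    // Assumption of this sketch: values that round below 2 yield an empty sum.
    if( rounded < 2 ) {
        return 0.0;
    }

    // Sum of reciprocals 1/1 + 1/2 + ... + 1/(n-1).
    double sum = 0.0;
    for( long long i = 1; i < rounded; ++i ) {
        sum += 1.0 / static_cast<double>( i );
    }
    return sum;
}
```

For example, a_n_sketch( 4.0 ) sums 1 + 1/2 + 1/3, and a non-integer "effective" pool size such as 3.6 is rounded to 4 before summing.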
bool genesis::population::all_finite_ | ( | FstCathedralPlotRecord::Entry const & | entry | ) |
Definition at line 49 of file fst_cathedral.cpp.
size_t allele_count | ( | SampleCounts const & | sample | ) |
Return the number of alleles, that is, of non-zero nucleotide counts of the sample
.
This looks at all four nucleotide counts (ACGT
), and returns the number of them that are non zero. The result hence is between 0 and 4, with 0 = no allele had any counts and 4 = all alleles have a non-zero count.
Definition at line 309 of file population/function/functions.cpp.
size_t allele_count | ( | SampleCounts const & | sample, |
size_t | min_count | ||
) |
Return the number of alleles, taking a min_count
into consideration, that is, we compute the number of nucleotide counts of the sample
that are at least the min_count
.
This looks at all four nucleotide counts (ACGT
), and returns the number of them that are at least the min_count
. If min_count == 0
, we instead call the allele_count(SampleCounts const&) overload of this function that does not consider minimum counts.
Definition at line 329 of file population/function/functions.cpp.
size_t allele_count | ( | SampleCounts const & | sample, |
size_t | min_count, | ||
size_t | max_count | ||
) |
Return the number of alleles, taking a min_count
and max_count
into consideration, that is, we compute the number of nucleotide counts of the sample
that are at least min_count
and at most max_count
.
This looks at all four nucleotide counts (ACGT
), and returns the number of them that are at least the min_count
and at most max_count
. If either of them is zero, they are not taken into account though.
Definition at line 351 of file population/function/functions.cpp.
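The three allele_count() overloads can be summarized in one stand-alone sketch. NucleotideCounts and allele_count_sketch are hypothetical names used only here, and the rule that a zero count is never counted as an allele, even when both limits are disabled, is an assumption drawn from the descriptions above rather than from the library source.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the four nucleotide counts of a SampleCounts object.
struct NucleotideCounts
{
    std::size_t a;
    std::size_t c;
    std::size_t g;
    std::size_t t;
};

// Count the nucleotides (ACGT) that are present and within the given limits.
// A limit of zero means that this limit is not taken into account.
std::size_t allele_count_sketch(
    NucleotideCounts const& sample,
    std::size_t min_count = 0,
    std::size_t max_count = 0
) {
    std::array<std::size_t, 4> const counts = {{ sample.a, sample.c, sample.g, sample.t }};
    std::size_t result = 0;
    for( auto const count : counts ) {
        bool const present   = count > 0;
        bool const above_min = ( min_count == 0 ) || ( count >= min_count );
        bool const below_max = ( max_count == 0 ) || ( count <= max_count );
        if( present && above_min && below_max ) {
            ++result;
        }
    }
    return result;
}
```

With counts A=10, C=0, G=3, T=1, the sketch reports three alleles without limits, two with min_count = 2, and one with min_count = 2 and max_count = 5.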
double alpha_star | ( | double | n | ) |
Compute alpha*
according to Achaz 2008 and Kofler et al. 2011.
This is needed for the computation of tajima_d_pool() according to
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
The equation is based on
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
See there for details.
Definition at line 330 of file diversity_pool_functions.cpp.
double genesis::population::amnm_ | ( | size_t | poolsize, |
size_t | nucleotide_count, | ||
size_t | allele_frequency | ||
) |
Local helper function to compute values for the denominator.
This computes the sum over all r poolsizes of 1/r times a binomial:
\( \sum_{m=b}^{C-b} \frac{1}{k} {C \choose m} \left(\frac{k}{n}\right)^m \left(\frac{n-k}{n}\right)^{C-m} \)
This is needed in the pool seq correction denominators of Theta Pi and Theta Watterson.
Definition at line 65 of file diversity_pool_functions.cpp.
size_t genesis::population::anchor_position | ( | BaseWindow< D > const & | window, |
WindowAnchorType | anchor_type = WindowAnchorType::kIntervalBegin |
||
) |
Get the position in the chromosome reported according to a specific WindowAnchorType.
This overload accepts both Window and WindowView, and dispatches as needed. For WindowView, only interval-based anchor types are available. Furthermore, Window has an additional template parameter A
, which we need to ignore here to fit the BaseWindow signature. Hence, when using a Window with a non-defaulted A
template parameter, the dispatch cannot be done with this function.
Definition at line 157 of file population/window/functions.hpp.
size_t genesis::population::anchor_position | ( | Window< D, A > const & | window, |
WindowAnchorType | anchor_type = WindowAnchorType::kIntervalBegin |
||
) |
Get the position in the chromosome reported according to a specific WindowAnchorType.
When a window is filled with data, we need to report the position in the genome at which the window is. There are several ways that this position can be computed. Typically, just the first position of the window is used (that is, for an interval, the beginning of the interval, and for variants, the position of the first variant).
However, it might be desirable to report a different position, for example when plotting the results. When using SlidingWindowType::kVariants for example, one might want to plot the values computed per window at the midpoint genome position of the variants in that window.
Definition at line 82 of file population/window/functions.hpp.
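To make the idea of anchor types concrete, here is a small stand-alone sketch that reports a position for a few of the anchor types. WindowSketch, AnchorSketch, and anchor_position_sketch are hypothetical names, and the midpoint formula is an assumption of this example; the library function may compute the positions differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical subset of anchor types, for illustration only.
enum class AnchorSketch
{
    kIntervalBegin,
    kIntervalEnd,
    kIntervalMidpoint,
    kVariantFirst,
    kVariantLast
};

// Hypothetical window: an interval plus the sorted positions of its variants.
struct WindowSketch
{
    std::size_t first_position;
    std::size_t last_position;
    std::vector<std::size_t> variant_positions;
};

std::size_t anchor_position_sketch(
    WindowSketch const& window,
    AnchorSketch anchor = AnchorSketch::kIntervalBegin
) {
    switch( anchor ) {
        case AnchorSketch::kIntervalBegin:
            return window.first_position;
        case AnchorSketch::kIntervalEnd:
            return window.last_position;
        case AnchorSketch::kIntervalMidpoint:
            // Assumption: integer midpoint of the interval, rounded down.
            return window.first_position
                + ( window.last_position - window.first_position ) / 2;
        case AnchorSketch::kVariantFirst:
        case AnchorSketch::kVariantLast:
            // Variant-based anchors need at least one variant in the window.
            if( window.variant_positions.empty() ) {
                throw std::runtime_error( "Window contains no variants." );
            }
            return anchor == AnchorSketch::kVariantFirst
                ? window.variant_positions.front()
                : window.variant_positions.back();
    }
    throw std::invalid_argument( "Unknown anchor type." );
}
```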
bool apply_sample_counts_filter_numerical | ( | SampleCounts & | sample, |
SampleCountsFilterNumericalParams const & | params | ||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
The function applies the filter using the given params
settings. If any filter fails, the function sets the SampleCounts::status to the corresponding SampleCountsFilterTag value, and increments the counter for the stats
for the failing filter, both for the first filter that fails. It returns whether any filter failed (in which case, false
is returned), or all passed (true
).
This overload simply omits the incrementing of the SampleCountsFilterStats counter.
Definition at line 217 of file sample_counts_filter_numerical.cpp.
bool apply_sample_counts_filter_numerical | ( | SampleCounts & | sample, |
SampleCountsFilterNumericalParams const & | params, | ||
SampleCountsFilterStats & | stats | ||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
The function applies the filter using the given params
settings. If any filter fails, the function sets the SampleCounts::status to the corresponding SampleCountsFilterTag value, and increments the counter for the stats
for the failing filter, both for the first filter that fails. It returns whether any filter failed (in which case, false
is returned), or all passed (true
).
Definition at line 115 of file sample_counts_filter_numerical.cpp.
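The "first failing filter wins" behavior described above can be sketched as follows. All types and names here (TagSketch, ParamsSketch, SampleSketch, StatsSketch, apply_filter_sketch) are hypothetical, the two read depth limits are only a small selection of possible filters, and treating a limit of zero as "disabled" is an assumption of this example.

```cpp
#include <cstddef>
#include <map>

// Hypothetical, reduced set of filter tags.
enum class TagSketch { kPassed, kEmpty, kBelowMinReadDepth, kAboveMaxReadDepth };

// Hypothetical filter settings; a value of 0 disables the respective limit.
struct ParamsSketch
{
    std::size_t min_read_depth;
    std::size_t max_read_depth;
};

// Hypothetical sample with a read depth and a filter status.
struct SampleSketch
{
    std::size_t read_depth;
    TagSketch   status;
};

// Counter of how often each tag was the reason for failing.
using StatsSketch = std::map<TagSketch, std::size_t>;

bool apply_filter_sketch( SampleSketch& sample, ParamsSketch const& params, StatsSketch& stats )
{
    // Record the first failing filter: set the status, count it, and report failure.
    auto fail = [&]( TagSketch tag ) -> bool {
        sample.status = tag;
        ++stats[ tag ];
        return false;
    };

    if( sample.read_depth == 0 ) {
        return fail( TagSketch::kEmpty );
    }
    if( params.min_read_depth > 0 && sample.read_depth < params.min_read_depth ) {
        return fail( TagSketch::kBelowMinReadDepth );
    }
    if( params.max_read_depth > 0 && sample.read_depth > params.max_read_depth ) {
        return fail( TagSketch::kAboveMaxReadDepth );
    }
    return true;
}
```

Because the function returns at the first failing check, an empty sample is only ever counted as kEmpty in this sketch, even though it would also be below the minimum read depth.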
bool apply_sample_counts_filter_numerical | ( | Variant & | variant, |
SampleCountsFilterNumericalParams const & | params, | ||
bool | all_need_pass = false |
||
) |
This overload simply omits the incrementing of the SampleCountsFilterStats counter.
Definition at line 277 of file sample_counts_filter_numerical.cpp.
bool apply_sample_counts_filter_numerical | ( | Variant & | variant, |
SampleCountsFilterNumericalParams const & | params, | ||
VariantFilterStats & | variant_stats, | ||
SampleCountsFilterStats & | sample_count_stats, | ||
bool | all_need_pass = false |
||
) |
Filter a given SampleCounts based on the numerical properties of the counts.
This function applies the version of this function for SampleCounts to all Variant::samples. If all_need_pass
is set, the function returns true
iff all individual samples passed all filters, and false
otherwise, and sets the Variant::status to VariantFilterTag::kNotAllSamplesPassed. If all_need_pass
is not set, the function returns true
if any sample passed the filters. In either case, all samples of the variant
are always processed (no short-circuit, as we want all of them to have the count transformations applied to them). If all of them fail the filter settings, the Variant::status is set to VariantFilterTag::kNoSamplePassed, independently of all_need_pass
.
Definition at line 229 of file sample_counts_filter_numerical.cpp.
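The interplay of all_need_pass, kNoSamplePassed, and kNotAllSamplesPassed might be easier to follow in code. The sketch below uses hypothetical types and a single minimum read depth check as a placeholder for the per-sample filter; it only illustrates the control flow described above, including the absence of short-circuiting.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, reduced set of variant status values.
enum class VariantStatusSketch { kPassed, kNoSamplePassed, kNotAllSamplesPassed };

// Hypothetical sample: a read depth and whether it passed its filter.
struct PoolSampleSketch
{
    std::size_t read_depth;
    bool        passed;
};

// Placeholder per-sample filter: pass if the read depth reaches the minimum.
bool filter_sample_sketch( PoolSampleSketch& sample, std::size_t min_read_depth )
{
    sample.passed = sample.read_depth >= min_read_depth;
    return sample.passed;
}

bool filter_variant_sketch(
    std::vector<PoolSampleSketch>& samples,
    std::size_t min_read_depth,
    VariantStatusSketch& status,
    bool all_need_pass = false
) {
    // Process every sample, without short-circuiting, so that each one is updated.
    std::size_t passed = 0;
    for( auto& sample : samples ) {
        if( filter_sample_sketch( sample, min_read_depth )) {
            ++passed;
        }
    }

    // If no sample passed, this status is set independently of all_need_pass.
    if( passed == 0 ) {
        status = VariantStatusSketch::kNoSamplePassed;
        return false;
    }
    if( all_need_pass && passed < samples.size() ) {
        status = VariantStatusSketch::kNotAllSamplesPassed;
        return false;
    }
    status = VariantStatusSketch::kPassed;
    return true;
}
```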
bool apply_variant_filter_numerical | ( | Variant & | variant, |
VariantFilterNumericalParams const & | params | ||
) |
Filter a given Variant based on the numerical properties of the counts.
The function applies the filter using the given params
settings. If any filter fails, the function sets the Variant::status to the corresponding VariantFilterTag value, and increments the counter for the stats
for the failing filter, both for the first filter that fails. It returns whether any filter failed (in which case, false
is returned), or all passed (true
).
This overload simply omits the incrementing of the VariantFilterStats counter.
Definition at line 232 of file variant_filter_numerical.cpp.
bool apply_variant_filter_numerical | ( | Variant & | variant, |
VariantFilterNumericalParams const & | params, | ||
VariantFilterStats & | stats | ||
) |
Filter a given Variant based on the numerical properties of the counts.
The function applies the filter using the given params
settings. If any filter fails, the function sets the Variant::status to the corresponding VariantFilterTag value, and increments the counter for the stats
for the failing filter, both for the first filter that fails. It returns whether any filter failed (in which case, false
is returned), or all passed (true
).
Definition at line 50 of file variant_filter_numerical.cpp.
double b_n | ( | double | n | ) |
Compute b_n
, the sum of squared reciprocals.
This is the sum of squared reciprocals up to n-1
, which is \( b_n = \sum_{i=1}^{n-1} \frac{1}{i^2} \).
See
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
for details. The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
See also the note in a_n() about the usage of double
here for the argument.
Definition at line 307 of file diversity_pool_functions.cpp.
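A stand-alone sketch of this sum, under the same rounding convention as noted for a_n(), could look as follows. The name b_n_sketch is hypothetical, and the code is an illustration rather than the implementation in diversity_pool_functions.cpp.

```cpp
#include <cmath>

// Illustrative sketch only; b_n_sketch is a hypothetical name, not the library function.
double b_n_sketch( double n )
{
    // Round to the nearest integer first; values below 2 give an empty sum.
    long long const rounded = std::llround( n );

    // Sum of squared reciprocals 1/1^2 + 1/2^2 + ... + 1/(n-1)^2.
    double sum = 0.0;
    for( long long i = 1; i < rounded; ++i ) {
        double const d = static_cast<double>( i );
        sum += 1.0 / ( d * d );
    }
    return sum;
}
```

Unlike the sum of reciprocals, this sum is bounded: it approaches \( \pi^2 / 6 \) from below as n grows.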
double beta_star | ( | double | n | ) |
Compute beta*
according to Achaz 2008 and Kofler et al. 2011.
This is needed for the computation of tajima_d_pool() according to
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
The equation is based on
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
See there for details.
Definition at line 358 of file diversity_pool_functions.cpp.
genesis::utils::JsonDocument cathedral_plot_parameters_to_json_document | ( | CathedralPlotParameters const & | parameters | ) |
Get a user-readable description of a CathedralPlotParameters as a JsonDocument.
Definition at line 173 of file cathedral_plot.cpp.
genesis::utils::JsonDocument cathedral_plot_record_to_json_document | ( | CathedralPlotRecord const & | record | ) |
Get a user-readable description of the data of a CathedralPlotRecord as a JsonDocument.
This is meant for user output, so that cathedral plots can be generated from a data matrix, without having to recompute the matrix.
Definition at line 196 of file cathedral_plot.cpp.
double cathedral_window_width | ( | CathedralPlotRecord const & | record, |
size_t | row | ||
) |
Compute the window width for a row in a cathedral plot.
This uses the chromosome length and the intended plot dimensions to compute window widths where the first row of the image has a window width corresponding to the whole image width, the last row has a window width corresponding to a single pixel, and the rows in between are interpolated using one of the CathedralWindowWidthMethod methods.
Definition at line 86 of file cathedral_plot.cpp.
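As a minimal sketch of the interpolation idea, the following function interpolates linearly between the widest window (the whole chromosome, first row) and the narrowest window (one pixel's share of the chromosome, last row). The function name, its parameters, and the exact formula are assumptions for illustration; the kLinear method of the library may differ in its details, and the exponential and geometric methods are not sketched here.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative linear interpolation of window widths across the rows of a plot.
// All names and the exact formula are assumptions of this sketch.
double linear_window_width_sketch(
    double      chromosome_length,
    std::size_t image_width,
    std::size_t image_height,
    std::size_t row
) {
    assert( image_width > 0 && image_height > 0 && row < image_height );

    // First row: window spans the whole chromosome. Last row: one pixel's worth.
    double const widest    = chromosome_length;
    double const narrowest = chromosome_length / static_cast<double>( image_width );
    if( image_height == 1 ) {
        return widest;
    }

    // Fraction of the way from the first row (0.0) to the last row (1.0).
    double const fraction
        = static_cast<double>( row ) / static_cast<double>( image_height - 1 );
    return widest - fraction * ( widest - narrowest );
}
```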
CathedralWindowWidthMethod cathedral_window_width_method_from_string | ( | std::string const & | method | ) |
Helper function to return a CathedralWindowWidthMethod from its textual representation.
Definition at line 152 of file cathedral_plot.cpp.
std::string cathedral_window_width_method_to_string | ( | CathedralWindowWidthMethod | method | ) |
Helper function to return a textual representation of the method
.
Definition at line 136 of file cathedral_plot.cpp.
void genesis::population::compute_cathedral_matrix | ( | CathedralPlotParameters const & | parameters, |
Record & | record, | ||
Accumulator | accumulator = Accumulator{} |
||
) |
Template function to compute the value matrix for a cathedral plot, given a record with plot parameters and per-position data to accumulate per window.
The function computes the accumulated values across windows for each pixel in a cathedral plot, which can then be visualized as a heat map.
The function expects a cathedral plot record
, containing data needed to compute the values per pixel. It expects record
to contain an iterable container std::vector<Entry> entries
whose contained elements have a member position
, and also contain the data that is needed by the accumulator
. See FstCathedralPlotRecord for an example.
The accumulator
needs to have functions accumulate()
and dissipate()
that each take an element of the record
entries. These functions are meant to accumulate values, and then un-do this again, which is what we use to speed up the computation here. Also, the accumulator
needs to have an aggregate()
function that uses the currently accumulated data to compute the value for a given window. See FstCathedralAccumulator for an example. We take this as an (optional) argument, so that it can be set up with other parameters as needed.
record
for that case.
Definition at line 197 of file cathedral_plot.hpp.
|
inline |
Compute the matrix of values that represents the cathedral plot for FST.
This is merely a shortcut to call compute_cathedral_matrix() with the arguments for a cathedral plot of FST, using the result of compute_fst_cathedral_records(). The returned matrix can then be plotted as a heatmap.
Definition at line 216 of file fst_cathedral.hpp.
std::vector< FstCathedralPlotRecord > compute_fst_cathedral_records | ( | VariantInputStream & | iterator, |
FstPoolProcessor & | processor, | ||
FstPoolCalculatorUnbiased::Estimator | fst_estimator, | ||
std::vector< std::string > const & | sample_names = std::vector< std::string >{} , |
||
std::shared_ptr< genesis::sequence::SequenceDict > const & | sequence_dict = nullptr |
||
) |
Compute the components of per-position FST data for all pairs of samples in the given processor
, for the chromosomes in the given input iterator
.
The result contains entries for all pairs of samples and all chromosomes, in one vector. This is a convenience function that calls compute_fst_cathedral_records_for_chromosome() for each chromosome. We do not, however, recommend this for larger datasets, as the resulting data can be quite memory-intensive. It might hence be better to use the per-chromosome function instead, and process the returned data before starting with the next chromosome.
Definition at line 275 of file fst_cathedral.cpp.
std::vector< FstCathedralPlotRecord > compute_fst_cathedral_records_for_chromosome | ( | VariantInputStream::Iterator & | iterator, |
FstPoolProcessor & | processor, | ||
FstPoolCalculatorUnbiased::Estimator | fst_estimator, | ||
std::vector< std::string > const & | sample_names = std::vector< std::string >{} , |
||
std::shared_ptr< genesis::sequence::SequenceDict > const & | sequence_dict = nullptr |
||
) |
Compute the components of per-position FST data for all pairs of samples in the given processor
, for the current chromosome in the given input iterator
.
The result contains entries for all pairs of samples. The computation starts at the current position in iterator
, uses that chromosome, and iterates until its end or until the next chromosome is found, and stops there. See compute_fst_cathedral_records() for a helper function that does this for all chromosomes in the input.
This expects the processor to only contain FstPoolCalculatorUnbiased calculators, as those are the only ones for which we can compute cathedral plots with our current implementation.
If given sample_names
, we use those to set the sample names in the resulting FstCathedralPlotRecord objects, so that downstream we can keep track of them.
If given a sequence_dict
, we use the information in there to set the chromosome length; otherwise, we use the last position found in the data for that.
Definition at line 219 of file fst_cathedral.cpp.
std::pair< char, double > consensus | ( | SampleCounts const & | sample | ) |
Consensus character for a SampleCounts, and its confidence.
This is simply the character (out of ACGT
) that appears most often (or, for ties, the lexicographically smallest character), unless all of (A
, C
, G
, T
) are zero, in which case the consensus character is N
. The confidence is the count of the consensus character, divided by the total count of all four nucleotides.
Definition at line 428 of file population/function/functions.cpp.
std::pair< char, double > consensus | ( | SampleCounts const & | sample, |
bool | is_covered | ||
) |
Consensus character for a SampleCounts, and its confidence.
This is simply the character (out of ACGT
) that appears most often (or, for ties, the lexicographically smallest character). If is_covered
is false (meaning, the position is not well covered by reads), the consensus character is N
. The confidence is the count of the consensus character, divided by the total count of all four nucleotides.
Definition at line 469 of file population/function/functions.cpp.
SampleCounts convert_to_sample_counts | ( | SimplePileupReader::Sample const & | sample, |
unsigned char | min_phred_score | ||
) |
Definition at line 46 of file simple_pileup_common.cpp.
Variant convert_to_variant | ( | SimplePileupReader::Record const & | record, |
unsigned char | min_phred_score | ||
) |
Definition at line 146 of file simple_pileup_common.cpp.
Variant convert_to_variant_as_individuals | ( | VcfRecord const & | record, |
bool | use_allelic_depth = false |
||
) |
Convert a VcfRecord to a Variant, treating each sample as an individual, and combining them all into one SampleCounts sample.
In this function, we assume that the data that was used to create the VCF file was the typical use case of VCF, where each sample (column) in the file corresponds to an individual. When using this function, all samples (individuals) are combined into one, as our targeted output type Variant is used to describe allele counts of several individuals (e.g., in a pool). As all columns are combined, the resulting Variant only contains a single SampleCounts object. We only consider biallelic SNP positions here.
We offer two ways of combining the samples (columns) of the input VCF record into the SampleCounts:
If use_allelic_depth is false (default), individuals simply contribute to the SampleCounts according to their ploidy. That is, an individual with genotype A/T contributes one count each for A and T.
If use_allelic_depth is true, we instead use the "AD" FORMAT field to obtain the actual counts for the reference and alternative allele, and use these to sum up the SampleCounts data.
Definition at line 453 of file vcf_common.cpp.
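The genotype-based mode (without allelic depth) can be illustrated with a standalone sketch. It does not use the VCF classes of the library; the `GenotypeSketch` encoding as a list of allele indices and the function `tally_genotypes_sketch()` are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each individual's genotype as a list of allele indices (hypothetical encoding):
// 0 = reference allele, 1 = alternative allele, -1 = missing call.
using GenotypeSketch = std::vector<int>;

// Returns (reference count, alternative count), summed over all individuals.
// Each individual contributes as many counts as it has called alleles (its ploidy).
std::pair<std::size_t, std::size_t> tally_genotypes_sketch(
    std::vector<GenotypeSketch> const& individuals
) {
    std::size_t ref = 0;
    std::size_t alt = 0;
    for( auto const& genotype : individuals ) {
        for( int allele : genotype ) {
            if( allele == 0 ) {
                ++ref;
            } else if( allele == 1 ) {
                ++alt;
            }
            // Missing calls and other allele indices are skipped in this sketch.
        }
    }
    return { ref, alt };
}
```

Three diploid individuals with genotypes 0/1, 1/1, and 0/. would hence yield two reference and three alternative counts in the combined sample.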
Convert a VcfRecord to a Variant, treating each sample column as a pool of individuals.
This assumes that the data that was used to create the VCF file was actually a pool of individuals (e.g., from pool sequencing) for each sample (column) of the VCF file. We do not actually recommend using variant calling software on pool-seq data, as it induces frequency shifts due to the statistical models employed by variant callers that were not built for pool sequencing data. It however seems to be a commonly used approach, and hence we offer this function here. For this type of data, the VCF allelic depth ("AD") information contains the counts of the reference and alternative base, which in this context can be interpreted as describing the allele frequencies of each pool of individuals. This requires the VCF to have the "AD" FORMAT field.
Only SNP data (no indels) are allowed in this function; use VcfRecord::is_snp() to test this.
Definition at line 393 of file vcf_common.cpp.
void genesis::population::convert_to_variant_as_pool_set_missing_gt_ | ( | VcfRecord const & | record, |
Variant & | variant | ||
) |
Local helper function that sets the filter status of a Variant and its samples to missing depending on whether the genotypes of the samples are missing or not.
Definition at line 344 of file vcf_common.cpp.
void genesis::population::convert_to_variant_as_pool_tally_bases_ | ( | VcfRecord const & | record, |
std::pair< std::array< char, 6 >, size_t > const & | snp_chars, | ||
VcfFormatIteratorInt const & | sample_ad, | ||
SampleCounts & | sample | ||
) |
Local helper function to tally up the bases from a VcfRecord into a SampleCounts.
Definition at line 279 of file vcf_common.cpp.
double genesis::population::f_st_pool_karlsson | ( | ForwardIterator1 | p1_begin, |
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute the F_ST statistic for pool-sequenced data of Karlsson et al as used in PoPoolation2, for two ranges of SampleCounts objects.
The approach is called the "asymptotically unbiased" estimator in PoPoolation2 [1], and follows Karlsson et al [2].
[1] PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq).
Kofler R, Pandey RV, Schlotterer C.
Bioinformatics, 2011, 27(24), 3435–3436. https://doi.org/10.1093/bioinformatics/btr589
[2] Efficient mapping of mendelian traits in dogs through genome-wide association.
Karlsson EK, Baranowska I, Wade CM, Salmon Hillbertz NHC, Zody MC, Anderson N, Biagi TM, Patterson N, Pielberg GR, Kulbokas EJ, Comstock KE, Keller ET, Mesirov JP, Von Euler H, Kämpe O, Hedhammar Å, Lander ES, Andersson G, Andersson L, Lindblad-Toh K.
Nature Genetics, 2007, 39(11), 1321–1328. https://doi.org/10.1038/ng.2007.10
Definition at line 267 of file fst_pool_functions.hpp.
double genesis::population::f_st_pool_kofler | ( | size_t | p1_poolsize, |
size_t | p2_poolsize, | ||
ForwardIterator1 | p1_begin, | ||
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute the F_ST statistic for pool-sequenced data of Kofler et al as used in PoPoolation2, for two ranges of SampleCounts objects.
The approach is called the "classical" or "conventional" estimator in PoPoolation2 [1], and follows Hartl and Clark [2].
[1] PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq).
Kofler R, Pandey RV, Schlotterer C.
Bioinformatics, 2011, 27(24), 3435–3436. https://doi.org/10.1093/bioinformatics/btr589
[2] Principles of Population Genetics.
Hartl DL, Clark AG.
Sinauer, 2007.
Definition at line 188 of file fst_pool_functions.hpp.
std::pair<double, double> genesis::population::f_st_pool_unbiased | ( | size_t | p1_poolsize, |
size_t | p2_poolsize, | ||
ForwardIterator1 | p1_begin, | ||
ForwardIterator1 | p1_end, | ||
ForwardIterator2 | p2_begin, | ||
ForwardIterator2 | p2_end, | ||
bool | only_passing_samples = true |
||
) |
Compute our unbiased F_ST statistic for pool-sequenced data for two ranges of SampleCounts objects.
This is our novel approach for estimating F_ST, using pool-sequencing corrected estimates of Pi within, Pi between, and Pi total, to compute F_ST following the definitions of Nei [1] and Hudson [2], respectively. These are returned here as a pair in that order. See https://github.com/lczech/pool-seq-pop-gen-stats for details.
[1] Analysis of Gene Diversity in Subdivided Populations.
Nei M.
Proceedings of the National Academy of Sciences, 1973, 70(12), 3321–3323. https://doi.org/10.1073/PNAS.70.12.3321
[2] Estimation of levels of gene flow from DNA sequence data.
Hudson RR, Slatkin M, Maddison WP.
Genetics, 1992, 132(2), 583–589. https://doi.org/10.1093/GENETICS/132.2.583
Definition at line 333 of file fst_pool_functions.hpp.
double f_star | ( | double | a_n, |
double | n | ||
) |
Compute f*
according to Achaz 2008 and Kofler et al. 2011.
This is computed as \( f_{star} = \frac{n - 3}{a_n \cdot (n-1) - n} \), and needed for the computation of alpha_star() and beta_star(). See there for some more details, and see
G. Achaz.
Testing for neutrality in samples with sequencing errors.
(2008) Genetics, 179(3), 1409–1424. https://doi.org/10.1534/genetics.107.082198
for the original equations.
Definition at line 324 of file diversity_pool_functions.cpp.
void genesis::population::fill_fst_cathedral_records_from_processor_ | ( | FstPoolProcessor const & | processor, |
std::vector< FstCathedralPlotRecord > & | records, | ||
size_t | position | ||
) |
Definition at line 177 of file fst_cathedral.cpp.
genesis::utils::JsonDocument fst_cathedral_plot_record_to_json_document | ( | FstCathedralPlotRecord const & | record | ) |
Get a user-readable description of the data of a FstCathedralPlotRecord as a JsonDocument.
Definition at line 307 of file fst_cathedral.cpp.
|
inline |
Return a list of sample name pairs for each calculator in an FstPoolProcessor.
The function takes a processor
, and the original list of sample_names
of the samples in the calculators in the processor
, and uses their indices (as stored in the processor
) to get pairs of sample names.
Definition at line 578 of file fst_pool_processor.hpp.
|
inline |
Definition at line 462 of file fst_pool_unbiased.hpp.
|
inline |
Definition at line 446 of file fst_pool_unbiased.hpp.
GenomeLocusSet genome_locus_set_from_vcf_file | ( | std::string const & | file | ) |
Read a VCF file, and use its positions to create a GenomeLocusSet.
This is for example useful to restrict some analysis to the loci of known variants. Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct the resulting set. The VCF file does not have to be sorted for this.
Definition at line 580 of file vcf_common.cpp.
GenomeRegionList genome_region_list_from_vcf_file | ( | std::string const & | file | ) |
Read a VCF file, and use its positions to create a GenomeRegionList.
This is for example useful to restrict some analysis to the loci of known variants; however, for that use case, it is recommended to use genome_locus_set_from_vcf_file() instead, as testing genome coordinate coverage is way faster with that.
Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct intervals of consecutive positions along the chromosomes, i.e., if the file contains positions 1
, 2
, and 3
, but not 4
, an interval spanning 1-3
is inserted into the list.
The VCF file does not have to be sorted for this.
Definition at line 600 of file vcf_common.cpp.
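The construction of intervals from consecutive positions can be sketched for a single chromosome as follows. This standalone example sorts the positions first and then merges neighbors; that is one simple way to handle unsorted input and not necessarily how the library does it. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Merge the positions of one chromosome into closed intervals [first, last]
// of consecutive positions. The input does not need to be sorted.
std::vector<std::pair<std::size_t, std::size_t>> merge_consecutive_sketch(
    std::vector<std::size_t> positions
) {
    std::sort( positions.begin(), positions.end() );
    std::vector<std::pair<std::size_t, std::size_t>> intervals;
    for( auto const pos : positions ) {
        if( !intervals.empty() && pos <= intervals.back().second + 1 ) {
            // Position is adjacent to (or a duplicate within) the current interval.
            intervals.back().second = std::max( intervals.back().second, pos );
        } else {
            intervals.emplace_back( pos, pos );
        }
    }
    return intervals;
}
```

Given the positions 3, 1, 2, and 5, this yields the two intervals 1-3 and 5-5.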
void genome_region_list_from_vcf_file | ( | std::string const & | file, |
GenomeRegionList & | target | ||
) |
Read a VCF file, and add its positions to an existing GenomeRegionList.
This is for example useful to restrict some analysis to the loci of known variants; however, for that use case, it is recommended to use genome_locus_set_from_vcf_file() instead, as testing genome coordinate coverage is way faster with that.
Note that the whole file has to be read still; it can hence be better to only do this once and convert to a faster file format, such as simple genome region lists, see GenomeRegionReader.
This ignores all sample information, and simply uses the CHROM
and POS
data to construct intervals of consecutive positions along the chromosomes, i.e., if the file contains positions 1
, 2
, and 3
, but not 4
, an interval spanning 1-3
is inserted into the list.
The VCF file does not have to be sorted for this.
Definition at line 607 of file vcf_common.cpp.
SampleCounts::size_type get_base_count | ( | SampleCounts const & | sample, |
char | base | ||
) |
Get the count for a base
given as a char.
The given base
has to be one of ACGTDN
(case insensitive), or *#.
for deletions as well.
Definition at line 50 of file population/function/functions.cpp.
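A lookup of this kind is typically a switch over the character. The sketch below uses a hypothetical `CountsSketch` struct with one member per symbol; throwing on any other character is an assumption of this sketch and is not stated above.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in with one counter per supported symbol.
struct CountsSketch {
    std::size_t a = 0, c = 0, g = 0, t = 0, n = 0, d = 0;
};

std::size_t get_base_count_sketch( CountsSketch const& sample, char base ) {
    switch( base ) {
        case 'a': case 'A': return sample.a;
        case 'c': case 'C': return sample.c;
        case 'g': case 'G': return sample.g;
        case 't': case 'T': return sample.t;
        case 'n': case 'N': return sample.n;
        // Deletions can be given as D, or as one of the symbols * # .
        case 'd': case 'D': case '*': case '#': case '.': return sample.d;
        default:
            // Assumption: invalid input is reported via an exception.
            throw std::invalid_argument( "invalid base character" );
    }
}
```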
std::pair<std::array<char, 6>, size_t> genesis::population::get_vcf_record_snp_ref_alt_chars_ | ( | VcfRecord const & | record | ) |
Local helper function that returns the REF and ALT chars of a VcfRecord for SNPs.
This function expects the record
to only contain SNP REF and ALT (single nucleotides), and throws when not. It then fills the resulting array with these chars. That is, result[0] is the REF char, result[1] the first ALT char, and so forth.
To keep it speedy, we always return an array that is large enough for all ACGTND
, and return the number of used entries as the second value of the pair.
Definition at line 235 of file vcf_common.cpp.
|
inline |
Get the length of a given Window.
This is needed for the special case of a WindowView over the whole genome, which we indicate by WindowView::is_whole_genome() being set. In this case, the length is not contiguous along a single chromosome. In all other window cases, we simply use the first and last position of the window, via BaseWindow::width().
Definition at line 146 of file window_average.hpp.
|
inline |
Get the count of provided loci in a window.
Definition at line 166 of file window_average.hpp.
char guess_alternative_base | ( | Variant const & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the alternative base of a Variant.
If the Variant already has an alternative_base
in ACGT
and force
is not true
, this original base is returned (meaning that this function is idempotent; it does not change the alternative base if there already is one). However, if the alternative_base
is N
or any other char not in ACGT
, or if force
is true
, the base with the highest count that is not the reference base is returned instead. This also means that the reference base has to be set to a value in ACGT
, as otherwise the concept of an alternative base is meaningless anyway. If the reference base is not one of ACGT
, the returned alternative base is N
. Furthermore, if all three non-reference bases have count 0, the returned alternative base is N
.
Definition at line 495 of file population/function/functions.cpp.
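The selection rule for the alternative base can be sketched on a single set of counts. The names below are hypothetical, the sketch ignores the `force` and `filter_policy` handling, and resolving ties in `ACGT` order is an assumption.

```cpp
#include <array>
#include <cctype>
#include <cstddef>

// Hypothetical stand-in for the summed nucleotide counts of a Variant.
struct AltCountsSketch {
    std::size_t a = 0, c = 0, g = 0, t = 0;
};

// Return the most abundant base that is not the reference base, or 'N' if the
// reference is not in ACGT or if all non-reference counts are zero.
char guess_alternative_base_sketch( AltCountsSketch const& counts, char reference_base ) {
    char const ref = static_cast<char>(
        std::toupper( static_cast<unsigned char>( reference_base ))
    );
    if( ref != 'A' && ref != 'C' && ref != 'G' && ref != 'T' ) {
        return 'N';
    }
    std::array<char, 4> const bases = {{ 'A', 'C', 'G', 'T' }};
    std::array<std::size_t, 4> const values = {{ counts.a, counts.c, counts.g, counts.t }};

    char best = 'N';
    std::size_t best_count = 0;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        // Skip the reference; strict '>' keeps 'N' if all other counts are zero.
        if( bases[i] != ref && values[i] > best_count ) {
            best = bases[i];
            best_count = values[i];
        }
    }
    return best;
}
```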
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them.
This uses the same approach as guess_reference_base() and guess_alternative_base(), but is more efficient than calling both in sequence. See there for details.
Definition at line 515 of file population/function/functions.cpp.
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
char | ref_base, | ||
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference base.
This uses the same approach as guess_and_set_ref_and_alt_bases( Variant&, bool ), but additionally considers the given ref_base
. If the reference base contains a value in ACGT
(case insensitive) at the position of the variant
, it is used as the reference. Note that the function throws an exception should the reference base already be set to a different value that is not code for the base of the Variant, in order to notify users that something is off. That is, we do check for ambiguity codes, and if the reference base is an ambiguous base that contains the one already set in the Variant, this is okay. An exception is thrown on mismatch only.
If the reference base is N
though, the function behaves the same as its reference-free overload. For the alternative base, it always uses the most abundant base that is not the reference, same as the reference-free overload does.
Definition at line 566 of file population/function/functions.cpp.
void guess_and_set_ref_and_alt_bases | ( | Variant & | variant, |
genesis::sequence::ReferenceGenome const & | ref_genome, | ||
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference and alternative bases for a Variant, and set them, using a given reference genome to obtain the base.
This simply calls guess_and_set_ref_and_alt_bases( Variant&, char, bool ) with the base given by the ref_genome
. See there for details.
Definition at line 634 of file population/function/functions.cpp.
genesis::sequence::QualityEncoding guess_pileup_quality_encoding | ( | std::shared_ptr< utils::BaseInputSource > | source, |
size_t | max_lines = 0 |
||
) |
Guess the quality score encoding for an (m)pileup input, based on counts of how often each char appeared in the quality string (of the input pileup file for example).
The function reads and parses the input source as a pileup file, counts all quality score chars as they appear in there, and then guesses the encoding that was used. If max_lines
is set to a value greater than 0, only that many lines are read. If max_chars
is set to a value greater than 0, only that many quality score characters are read.
Definition at line 178 of file simple_pileup_common.cpp.
char guess_reference_base | ( | Variant const & | variant, |
bool | force = false , |
||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing |
||
) |
Guess the reference base of a Variant.
If the Variant already has a reference_base
in ACGT
, this base is returned (meaning that this function is idempotent; it does not change the reference base if there already is one). However, if the reference_base
is N
or any other value not in ACGT
, or if force
is true
, the base with the highest count is returned instead, unless all counts are 0, in which case the returned reference base is N
.
Definition at line 478 of file population/function/functions.cpp.
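The decision logic for the reference base can be sketched as follows, again on a single set of counts rather than on a Variant with several samples. All names are hypothetical, the `filter_policy` parameter is left out, and resolving ties in `ACGT` order is an assumption.

```cpp
#include <array>
#include <cctype>
#include <cstddef>

// Hypothetical stand-in for the summed nucleotide counts of a Variant.
struct RefCountsSketch {
    std::size_t a = 0, c = 0, g = 0, t = 0;
};

char guess_reference_base_sketch(
    RefCountsSketch const& counts, char current_reference, bool force = false
) {
    char const ref = static_cast<char>(
        std::toupper( static_cast<unsigned char>( current_reference ))
    );
    bool const ref_is_acgt = ( ref == 'A' || ref == 'C' || ref == 'G' || ref == 'T' );
    if( ref_is_acgt && !force ) {
        // An existing valid reference base is kept as is.
        return ref;
    }
    std::array<char, 4> const bases = {{ 'A', 'C', 'G', 'T' }};
    std::array<std::size_t, 4> const values = {{ counts.a, counts.c, counts.g, counts.t }};

    // Otherwise, use the base with the highest count, or 'N' if all counts are zero.
    char best = 'N';
    std::size_t best_count = 0;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        if( values[i] > best_count ) {
            best = bases[i];
            best_count = values[i];
        }
    }
    return best;
}
```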
double heterozygosity | ( | SampleCounts const & | sample, |
bool | with_bessel = false |
||
) |
Compute classic heterozygosity.
This is computed as \( h = \frac{n}{n-1} \left( 1 - \sum p^2 \right) \) with n
the total nucleotide_sum() (sum of A
,C
,G
,T
in the sample), and p
their respective nucleotide frequencies, with with_bessel
, or without Bessel's correction in the beginning of the equation when with_bessel
is set to false
(default).
See Equation 3.1 in
Hahn, M. W.
(2018). Molecular Population Genetics.
https://global.oup.com/academic/product/molecular-population-genetics-9780878939657
for details.
Definition at line 150 of file diversity_pool_functions.cpp.
|
inline |
Test whether the chromosome/position is within a given GenomeLocusSet
.
Definition at line 124 of file function/genome_region.hpp.
bool genesis::population::is_covered | ( | GenomeLocusSet const & | loci, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given GenomeLocusSet
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 168 of file function/genome_region.hpp.
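The template approach described above can be illustrated with a standalone sketch: any type that exposes public `chromosome` and `position` members can be passed. `LocusSetSketch`, `LocusSketch`, and `is_covered_sketch()` are hypothetical stand-ins, not the classes of the library.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical minimal locus set: chromosome name mapped to a set of positions.
struct LocusSetSketch {
    std::unordered_map<std::string, std::set<std::size_t>> loci;

    void add( std::string const& chromosome, std::size_t position ) {
        loci[chromosome].insert( position );
    }

    bool is_covered( std::string const& chromosome, std::size_t position ) const {
        auto const it = loci.find( chromosome );
        return it != loci.end() && it->second.count( position ) > 0;
    }
};

// Works with any type that has public members `chromosome` and `position`.
template<class T>
bool is_covered_sketch( LocusSetSketch const& set, T const& locus ) {
    return set.is_covered( locus.chromosome, locus.position );
}

// Example of a type that satisfies the requirements of the template.
struct LocusSketch {
    std::string chromosome;
    std::size_t position;
};
```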
|
inline |
Definition at line 189 of file function/genome_region.hpp.
bool is_covered | ( | GenomeRegion const & | region, |
std::string const & | chromosome, | ||
size_t | position | ||
) |
Test whether the chromosome/position is within a given genomic region
.
Definition at line 207 of file genome_region.cpp.
bool genesis::population::is_covered | ( | GenomeRegion const & | region, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given genomic region
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 141 of file function/genome_region.hpp.
|
inline |
Definition at line 179 of file function/genome_region.hpp.
|
inline |
Test whether the chromosome/position is within a given list of genomic regions
.
Definition at line 116 of file function/genome_region.hpp.
bool genesis::population::is_covered | ( | GenomeRegionList const & | regions, |
T const & | locus | ||
) |
Test whether the chromosome/position of a locus
is within a given list of genomic regions
.
This is a function template, so that it can accept any data structure that contains public member variables chromosome
(std::string
) and position
(size_t
), such as Variant or GenomeLocus.
Definition at line 155 of file function/genome_region.hpp.
|
inline |
Definition at line 184 of file function/genome_region.hpp.
|
inlineconstexpr |
Return whether a given base is in ACGT
, case insensitive.
Definition at line 56 of file population/function/functions.hpp.
|
inlineconstexpr |
Return whether a given base is in ACGTN
, case insensitive.
Definition at line 71 of file population/function/functions.hpp.
std::pair< genesis::utils::JsonDocument, genesis::utils::Matrix< double > > load_cathedral_plot_record_components_from_files | ( | std::string const & | base_path | ) |
Load the parts of a cathedral plot from a set of files.
Reverse of save_cathedral_plot_record_to_files(), returning the files as a Json document, and a Matrix of values for the heatmap. See load_cathedral_plot_record_from_files() for the convenience function that actually loads and fills the CathedralPlotRecord from that.
Definition at line 276 of file cathedral_plot.cpp.
CathedralPlotRecord load_cathedral_plot_record_from_files | ( | std::string const & | base_path | ) |
Load the record of a cathedral plot from a set of files.
See save_cathedral_plot_record_to_files(). This reads a json and a csv file using the base_path
with the extensions .json
and .csv
. For convenience, it is also possible to specify one of the two file paths directly, and the respective other will be inferred.
Definition at line 311 of file cathedral_plot.cpp.
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.
The latter type of overloads allows for more flexibility with the sorting orders of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
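The comparison semantics described above can be sketched in a few lines. This standalone example compares chromosome names lexicographically, which is an assumption; it does not cover the variants that take a SequenceDict to define the chromosome order. The names `LocusPairSketch` and `locus_compare_sketch()` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns a value < 0 if the left locus comes first, > 0 if the right locus
// comes first, and 0 if both loci are equal.
int locus_compare_sketch(
    std::string const& l_chromosome, std::size_t l_position,
    std::string const& r_chromosome, std::size_t r_position
) {
    // Chromosomes are compared first (lexicographically in this sketch).
    int const chr_cmp = l_chromosome.compare( r_chromosome );
    if( chr_cmp != 0 ) {
        return chr_cmp < 0 ? -1 : 1;
    }
    // Same chromosome: compare the positions.
    if( l_position < r_position ) {
        return -1;
    }
    if( l_position > r_position ) {
        return 1;
    }
    return 0;
}

// Hypothetical locus type, with an overload that mixes both ways of specifying loci.
struct LocusPairSketch {
    std::string chromosome;
    std::size_t position;
};

int locus_compare_sketch(
    LocusPairSketch const& l, std::string const& r_chromosome, std::size_t r_position
) {
    return locus_compare_sketch( l.chromosome, l.position, r_chromosome, r_position );
}
```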
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.The latter type of overloads allow to be more flexible with the sorting orders of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>
) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0
if the left locus is before the right locus, a value > 0
if the right locus is before the left locus, and 0
if the two loci are equal.
We offer several overloads of this function:
std::string
for the chromosome, and a size_t
for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.std::shared_ptr
. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with.The latter type of overloads allow to be more flexible with the sorting orders of chromosomes.
Definition at line 272 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0 if the left locus is before the right locus, a value > 0 if the right locus is before the left locus, and 0 if the two loci are equal.
We offer several overloads of this function:
Loci can be given as a GenomeLocus instance, or as a std::string for the chromosome, and a size_t for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
A SequenceDict can additionally be provided, either directly or as a std::shared_ptr. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with. The latter type of overloads allows for more flexibility with the sorting order of chromosomes.
Definition at line 232 of file function/genome_locus.hpp.
|
inline |
Three-way comparison (spaceship operator <=>) for two loci in a genome.
We generally compare loci based on their chromosome first, and then, if both chromosomes are identical, based on their position within that chromosome. The comparison returns a value < 0 if the left locus is before the right locus, a value > 0 if the right locus is before the left locus, and 0 if the two loci are equal.
We offer several overloads of this function:
Loci can be given as a GenomeLocus instance, or as a std::string for the chromosome, and a size_t for the position. There are overloads for every combination of those two ways of specifying loci. This makes the functions convenient to use in algorithms where not all loci are stored as a GenomeLocus instance.
A SequenceDict can additionally be provided, either directly or as a std::shared_ptr. In the latter case, it is only used when the pointer is valid; otherwise, the overload without SequenceDict is used instead. This is meant as a simplification for situations where one might or might not have a SequenceDict to work with. The latter type of overloads allows for more flexibility with the sorting order of chromosomes.
Definition at line 249 of file function/genome_locus.hpp.
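The fallback behaviour for an optional dictionary, as described above, might look like the following sketch. ChromosomeOrder is a hypothetical minimal replacement for SequenceDict, and compare_loci() and three_way() are illustrative names; only the dispatch pattern (use the dictionary when the pointer is valid, otherwise fall back to the overload without it) is taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal dictionary: the rank of a chromosome is its index in the given list.
class ChromosomeOrder
{
public:
    explicit ChromosomeOrder( std::vector<std::string> const& names )
    {
        for( std::size_t i = 0; i < names.size(); ++i ) {
            ranks_[ names[i] ] = i;
        }
    }

    std::size_t rank( std::string const& name ) const
    {
        auto const it = ranks_.find( name );
        assert( it != ranks_.end() ); // unknown chromosomes are a usage error in this sketch
        return it->second;
    }

private:
    std::unordered_map<std::string, std::size_t> ranks_;
};

// Generic three-way helper: -1, 0, or +1.
template<typename T>
int three_way( T const& l, T const& r )
{
    return ( r < l ) - ( l < r );
}

// Overload without dictionary: lexicographic chromosome order.
inline int compare_loci(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos
) {
    int const c = three_way( l_chr, r_chr );
    return c != 0 ? c : three_way( l_pos, r_pos );
}

// Overload with dictionary: chromosome order as listed in the dictionary.
inline int compare_loci(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos,
    ChromosomeOrder const& order
) {
    int const c = three_way( order.rank( l_chr ), order.rank( r_chr ));
    return c != 0 ? c : three_way( l_pos, r_pos );
}

// Overload with optional dictionary: only used if the pointer is valid.
inline int compare_loci(
    std::string const& l_chr, std::size_t l_pos,
    std::string const& r_chr, std::size_t r_pos,
    std::shared_ptr<ChromosomeOrder> const& order
) {
    if( order ) {
        return compare_loci( l_chr, l_pos, r_chr, r_pos, *order );
    }
    return compare_loci( l_chr, l_pos, r_chr, r_pos );
}
```

With a dictionary listing "chr2" before "chr10", the locus on "chr10" sorts after the one on "chr2", whereas plain lexicographic comparison puts "chr10" first.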
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 330 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 310 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 320 of file function/genome_locus.hpp.
|
inline |
Equality comparison (==) for two loci in a genome.
Definition at line 300 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 486 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 465 of file function/genome_locus.hpp.
|
inline |
Greater than comparison (>) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 476 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 573 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 552 of file function/genome_locus.hpp.
|
inline |
Greater than or equal comparison (>=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 563 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 386 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 366 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 376 of file function/genome_locus.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.
Definition at line 356 of file function/genome_locus.hpp.
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 446 of file function/genome_locus.hpp.
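As an illustration of how the relational comparisons relate to the three-way comparison, the sketch below derives a less-than predicate from a three-way function and uses it to sort loci. All names (Locus, compare_loci, locus_less, sorted_loci) are hypothetical and chosen for this example only; the library's own functions may differ in signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a locus; not the library's GenomeLocus class.
struct Locus
{
    std::string chromosome;
    std::size_t position;
};

// Three-way comparison: chromosome first (lexicographically), then position.
inline int compare_loci( Locus const& l, Locus const& r )
{
    int const chr_cmp = l.chromosome.compare( r.chromosome );
    if( chr_cmp != 0 ) {
        return chr_cmp < 0 ? -1 : 1;
    }
    return ( r.position < l.position ) - ( l.position < r.position );
}

// Relational predicates are thin wrappers around the three-way result.
inline bool locus_less( Locus const& l, Locus const& r )
{
    return compare_loci( l, r ) < 0;
}

inline bool locus_greater_or_equal( Locus const& l, Locus const& r )
{
    return compare_loci( l, r ) >= 0;
}

// Sort a copy of the given loci into genome order.
inline std::vector<Locus> sorted_loci( std::vector<Locus> loci )
{
    std::sort( loci.begin(), loci.end(), locus_less );
    assert( std::is_sorted( loci.begin(), loci.end(), locus_less ));
    return loci;
}
```

Because every relational predicate is defined via the same three-way function, the predicates cannot disagree with each other about the ordering.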
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 414 of file function/genome_locus.hpp.
|
inline |
Less than comparison (<) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 424 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 533 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 505 of file function/genome_locus.hpp.
|
inline |
Less than or equal comparison (<=) for two loci in a genome.
See locus_compare() for notes on the chromosome comparison order and the available overloads.
Definition at line 517 of file function/genome_locus.hpp.
genesis::utils::Matrix< genesis::utils::Color > make_cathedral_plot_heatmap(
    CathedralPlotRecord const & record,
    genesis::utils::HeatmapParameters const & heatmap_parameters
)
Make a cathedral plot heat map as a color matrix.
This uses the data from a record, and the color heat map parameters.
This is meant as a high level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_svg().
Definition at line 346 of file cathedral_plot.cpp.
genesis::utils::SvgDocument make_cathedral_plot_svg(
    CathedralPlotRecord const & record,
    genesis::utils::HeatmapParameters const & heatmap_parameters
)
Make a cathedral plot heat map and add it into an SVG document with legend and axes.
This uses the data from a record, and the color heat map parameters.
This is meant as a high level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_heatmap().
Definition at line 435 of file cathedral_plot.cpp.
genesis::utils::SvgDocument make_cathedral_plot_svg(
    CathedralPlotRecord const & record,
    genesis::utils::HeatmapParameters const & heatmap_parameters,
    genesis::utils::Matrix< genesis::utils::Color > const & image
)
Make a cathedral plot heat map and add it into an SVG document with legend and axes.
This uses the data from a record, and the color heat map parameters.
This is meant as a high level function for convenience, and to show how such a heat map can be made. See also make_cathedral_plot_heatmap().
Definition at line 354 of file cathedral_plot.cpp.
ChromosomeWindowStream<InputStreamIterator, DataType> genesis::population::make_chromosome_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end
)
Helper function to instantiate a ChromosomeWindowStream for each chromosome, without the need to specify the template parameters manually.
Definition at line 451 of file chromosome_window_stream.hpp.
ChromosomeWindowStream<InputStreamIterator> genesis::population::make_default_chromosome_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end
)
Helper function to instantiate a ChromosomeWindowStream for each chromosome, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the ChromosomeWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the ChromosomeWindowStream. For example, a data type that this works for is Variant data.
Definition at line 470 of file chromosome_window_stream.hpp.
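To illustrate what "default functors" means here, the following sketch wires up chromosome and position accessors for any data type with public chromosome and position members, and uses them to split a sorted range into one group per chromosome. Entry, ChromosomeGrouper, and make_default_grouper() are hypothetical names for this example, and the assumption that the input is sorted by chromosome and position is ours; the actual ChromosomeWindowStream is iterator-based and more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical data type with the two public members the default functors rely on.
struct Entry
{
    std::string chromosome;
    std::size_t position;
    double      value;
};

// Hypothetical simplified stand-in for a per-chromosome stream.
template<typename Data>
struct ChromosomeGrouper
{
    std::function<std::string( Data const& )> chromosome_function;
    std::function<std::size_t( Data const& )> position_function;

    // Split a range that is sorted by chromosome and position into one group per chromosome.
    template<typename Iterator>
    std::vector<std::vector<Data>> run( Iterator begin, Iterator end ) const
    {
        assert( chromosome_function && position_function );
        std::vector<std::vector<Data>> groups;
        for( auto it = begin; it != end; ++it ) {
            bool const new_chromosome =
                groups.empty() ||
                chromosome_function( groups.back().back() ) != chromosome_function( *it )
            ;
            if( new_chromosome ) {
                groups.emplace_back();
            } else {
                // Within a chromosome, positions are expected to be sorted.
                assert( position_function( groups.back().back() ) <= position_function( *it ));
            }
            groups.back().push_back( *it );
        }
        return groups;
    }
};

// Default setup: read the public members directly, without any conversion.
template<typename Data>
ChromosomeGrouper<Data> make_default_grouper()
{
    ChromosomeGrouper<Data> grouper;
    grouper.chromosome_function = []( Data const& d ){ return d.chromosome; };
    grouper.position_function   = []( Data const& d ){ return d.position; };
    return grouper;
}
```

A data type with differently named members would need its own functors instead of the defaults.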
GenomeWindowStream<InputStreamIterator> genesis::population::make_default_genome_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end
)
Helper function to instantiate a GenomeWindowStream for the whole genome, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the GenomeWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the GenomeWindowStream. For example, a data type that this works for is Variant data.
This helper function creates a GenomeWindowStream from the given pair of iterators, so that the whole genome is traversed without stopping at individual chromosomes in each iteration.
Definition at line 465 of file genome_window_stream.hpp.
IntervalWindowStream<InputStreamIterator> genesis::population::make_default_interval_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    size_t width = 0,
    size_t stride = 0
)
Helper function to instantiate an IntervalWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the IntervalWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the IntervalWindowStream. For example, a data type that this works for is Variant data.
Definition at line 522 of file interval_window_stream.hpp.
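The roles of width and stride can be sketched on a single chromosome as follows. This is an assumption-laden illustration, not the library's algorithm: IntervalWindow and interval_windows() are hypothetical names, positions are taken to be 1-based and sorted, windows are closed intervals, and a stride of 0 is treated here as "use the width", which yields non-overlapping windows.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical window: closed interval [first, last] and the positions that fall into it.
struct IntervalWindow
{
    std::size_t first;
    std::size_t last;
    std::vector<std::size_t> positions;
};

// Slide windows of the given width over sorted, 1-based positions of one chromosome,
// moving the window start by stride in each step.
// Assumption of this sketch: stride == 0 means stride = width.
inline std::vector<IntervalWindow> interval_windows(
    std::vector<std::size_t> const& positions,
    std::size_t width,
    std::size_t stride = 0
) {
    assert( width > 0 );
    if( stride == 0 ) {
        stride = width;
    }

    std::vector<IntervalWindow> result;
    if( positions.empty() ) {
        return result;
    }

    std::size_t const last_position = positions.back();
    for( std::size_t start = 1; start <= last_position; start += stride ) {
        IntervalWindow window{ start, start + width - 1, {} };
        for( auto const pos : positions ) {
            if( pos >= window.first && pos <= window.last ) {
                window.positions.push_back( pos );
            }
        }
        result.push_back( std::move( window ));
    }
    return result;
}
```

With a stride smaller than the width, consecutive windows overlap and a position can be reported in more than one window.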
PositionWindowStream<InputStreamIterator> genesis::population::make_default_position_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end
)
Helper function to instantiate a PositionWindowStream for each position as an individual window, for a default use case.
This helper assumes that the underlying type of the input data stream and of the data that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the PositionWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the PositionWindowStream. For example, a data type that this works for is Variant data.
The PositionWindowStream::entry_selection_function is set so that all entries are selected to be considered in the iteration. This can be re-set afterwards if a different criterion is needed. See also make_passing_variant_position_window_stream() and make_passing_variant_position_window_view_stream() for specializations of this for data type Variant, which instead only select entries that have Variant::status passing.
Definition at line 373 of file position_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_default_position_window_view_stream(
    InputStreamIterator begin,
    InputStreamIterator end
)
Helper function that creates a PositionWindowStream with default functors and wraps it in a WindowViewStream.
See make_default_position_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of PositionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 410 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator> genesis::population::make_default_queue_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    size_t count = 0,
    size_t stride = 0
)
Helper function to instantiate a QueueWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of the same type, that is, we do no conversion in the entry_input_function functor of the QueueWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the QueueWindowStream. For example, a data type that this works for is Variant data.
The QueueWindowStream::entry_selection_function is set so that all entries are selected to be considered towards the QueueWindowStream::count() of entries per window. This can be re-set afterwards if a different criterion is needed. See also make_passing_variant_queue_window_stream() and make_passing_variant_queue_window_view_stream() for specializations of this for data type Variant, which instead only select entries that have Variant::status passing.
Definition at line 855 of file queue_window_stream.hpp.
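The interplay of the entry count and the entry selection function can be sketched as below. The names Entry and queue_windows() are hypothetical, the stride is fixed to the count (non-overlapping windows), and we assume that entries which are not selected still end up in the window but do not count towards its size; the library may handle these details differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical entry with a flag that a selection function can inspect.
struct Entry
{
    std::size_t position;
    bool        passing;
};

// Group entries into windows that each contain `count` selected entries.
// By default, every entry is selected, mirroring the default described above.
inline std::vector<std::vector<Entry>> queue_windows(
    std::vector<Entry> const& entries,
    std::size_t count,
    std::function<bool( Entry const& )> const& select = []( Entry const& ){ return true; }
) {
    assert( count > 0 );
    std::vector<std::vector<Entry>> windows;
    std::vector<Entry> current;
    std::size_t selected = 0;

    for( auto const& entry : entries ) {
        // Assumption: unselected entries are kept in the window, but not counted.
        current.push_back( entry );
        if( select( entry )) {
            ++selected;
        }
        if( selected == count ) {
            windows.push_back( std::move( current ));
            current.clear();
            selected = 0;
        }
    }

    // Trailing window with fewer than `count` selected entries.
    if( !current.empty() ) {
        windows.push_back( std::move( current ));
    }
    return windows;
}
```

Passing a selection function such as one that checks the passing flag corresponds to the idea behind the make_passing_variant_...() variants mentioned above.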
WindowViewStream<InputStreamIterator> genesis::population::make_default_queue_window_view_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    size_t count,
    size_t stride = 0
)
Helper function that creates a QueueWindowStream with default functors and wraps it in a WindowViewStream.
See make_default_queue_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of QueueWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 895 of file queue_window_stream.hpp.
RegionWindowStream<InputStreamIterator> genesis::population::make_default_region_window_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    std::shared_ptr< GenomeRegionList > region_list
)
Helper function to instantiate a RegionWindowStream for a default use case.
This helper assumes that the underlying type of the input data stream and of the Windows that we are iterating over are of the same type, that is, we do no conversion in the entry_input_function functor of the RegionWindowStream. It further assumes that this data type has public member variables chromosome and position that are accessed by the chromosome_function and position_function functors of the RegionWindowStream. For example, a data type that this works for is Variant data.
Definition at line 842 of file region_window_stream.hpp.
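Conceptually, a region-based stream yields one window per region, filled with the entries that fall into it. The sketch below shows that idea with hypothetical types (Region, Entry) and a hypothetical region_windows() function; regions are closed intervals here, and a plain vector of regions stands in for GenomeRegionList.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical closed interval [start, end] on a chromosome.
struct Region
{
    std::string chromosome;
    std::size_t start;
    std::size_t end;
};

// Hypothetical data entry with the two public members used by the default functors.
struct Entry
{
    std::string chromosome;
    std::size_t position;
};

// Produce one window per region, containing all entries covered by that region.
// Overlapping regions receive their shared entries independently.
inline std::vector<std::vector<Entry>> region_windows(
    std::vector<Entry> const& entries,
    std::shared_ptr<std::vector<Region>> const& region_list
) {
    assert( region_list );
    std::vector<std::vector<Entry>> windows;
    for( auto const& region : *region_list ) {
        assert( region.start <= region.end );
        std::vector<Entry> window;
        for( auto const& entry : entries ) {
            bool const covered =
                entry.chromosome == region.chromosome &&
                entry.position >= region.start &&
                entry.position <= region.end
            ;
            if( covered ) {
                window.push_back( entry );
            }
        }
        windows.push_back( std::move( window ));
    }
    return windows;
}
```

This quadratic scan is for clarity only; a streaming implementation would exploit the sorting of the input instead.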
WindowViewStream<InputStreamIterator> genesis::population::make_default_region_window_view_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    std::shared_ptr< GenomeRegionList > region_list
)
Helper function that creates a RegionWindowStream and wraps it in a WindowViewStream.
See make_default_region_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of RegionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 877 of file region_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_default_sliding_interval_window_view_stream(
    InputStreamIterator begin,
    InputStreamIterator end,
    size_t width = 0,
    size_t stride = 0
)
Helper function that creates an IntervalWindowStream and wraps it in a WindowViewStream.
See make_default_interval_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of IntervalWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 558 of file interval_window_stream.hpp.
|
inline |
Create a DiversityPoolProcessor to compute diversity for all samples.
The function expects the settings to use for all samples, as well as the list of pool sizes of all samples. It then yields a processor that can be provided with all Variants of interest along the genome, and computes diversity for each sample.
Compared to the corresponding make_fst_pool_processor() functions, this function does little work itself, and is provided mainly for symmetry with the FST functions.
Definition at line 356 of file diversity_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for one-to-all FST computation between one sample and all others.
The function expects the pool sizes of all samples, as well as the index of the Variant::samples SampleCounts object for which FST to all other samples shall be calculated, and any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between the given index and all other samples.
Definition at line 521 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for one-to-one FST computation between two samples.
The function expects the pool sizes of all samples, as well as two indices of the Variant::samples SampleCounts objects between which FST shall be calculated, and any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between the given pair of samples.
Definition at line 550 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for all-to-all computation of FST between all pairs of samples.
The function expects the pool sizes of all samples, as well as any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between all pairs of their samples.
Definition at line 454 of file fst_pool_processor.hpp.
|
inline |
Create an FstPoolProcessor for computation of FST between specific pairs of samples.
The function expects the pool sizes of all samples, the pairs of indices of the SampleCounts objects in Variant::samples between which FST shall be calculated, and any additional args to be provided to each processor after the pair of pool sizes. It then yields a processor that can be provided with all Variants of interest along the genome, and computes FST between all provided pairs of their samples.
Definition at line 484 of file fst_pool_processor.hpp.
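The pairing modes of the make_fst_pool_processor() overloads above differ only in which pairs of sample indices get a calculator. The following is a minimal sketch of that index bookkeeping, not the genesis implementation. The names sketch_all_to_all_pairs() and sketch_one_to_all_pairs() are hypothetical, and it is an assumption here that each unordered pair is listed once and that a sample is never paired with itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical pair type: two indices into Variant::samples.
using SketchSamplePair = std::pair<std::size_t, std::size_t>;

// All-to-all: every pair (i, j) with i < j.
// Assumption: each unordered pair appears once, without self-pairs.
std::vector<SketchSamplePair> sketch_all_to_all_pairs( std::size_t sample_count )
{
    std::vector<SketchSamplePair> pairs;
    for( std::size_t i = 0; i < sample_count; ++i ) {
        for( std::size_t j = i + 1; j < sample_count; ++j ) {
            pairs.emplace_back( i, j );
        }
    }
    return pairs;
}

// One-to-all: the given index paired with every other sample.
// Assumption: the pair (index, index) is skipped.
std::vector<SketchSamplePair> sketch_one_to_all_pairs(
    std::size_t sample_count, std::size_t index
) {
    assert( index < sample_count );
    std::vector<SketchSamplePair> pairs;
    for( std::size_t j = 0; j < sample_count; ++j ) {
        if( j != index ) {
            pairs.emplace_back( index, j );
        }
    }
    return pairs;
}
```

The one-to-one and specific-pairs modes correspond to passing a single pair or an explicit list of such pairs.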
GenomeWindowStream<InputStreamIterator, DataType> genesis::population::make_genome_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a GenomeWindowStream for the whole genome, without the need to specify the template parameters manually.
This helper function creates a GenomeWindowStream from the given pair of iterators, so that the whole genome is traversed without stopping at individual chromosomes in each iteration.
Definition at line 443 of file genome_window_stream.hpp.
std::shared_ptr<T> genesis::population::make_input_stream_with_sample_filter_ | ( | std::string const & | filename, |
R const & | reader, | ||
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Local helper function template that takes care of initializing an input stream, and setting the sample filters, for those streams for which we do not know the number of samples prior to starting the file iteration.
The template arguments are: T, the returned type of input stream, and R, the underlying reader type. This is very specific to the use case here, and is currently only meant for how we work with the SimplePileupReader and the SyncReader and their streams. Both of their streams accept a reader to take settings from.
Definition at line 67 of file variant_input_stream_sources.cpp.
IntervalWindowStream<InputStreamIterator, DataType> genesis::population::make_interval_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | width = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate an IntervalWindowStream without the need to specify the template parameters manually.
The three functors entry_input_function, chromosome_function, and position_function of the IntervalWindowStream have to be set in the returned stream before using it. See make_default_interval_window_stream() for an alternative make function that sets these three functors to reasonable defaults that work for the Variant data type.
Definition at line 501 of file interval_window_stream.hpp.
PositionWindowStream<InputStreamIterator> genesis::population::make_passing_variant_position_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a PositionWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of type Variant. It is hence a more specialized version of make_default_position_window_stream(). Here, we check the Variant::status, and only those Variants that have a passing FilterStatus are selected to yield a window. The PositionWindowStream::entry_selection_function is set accordingly.
Definition at line 433 of file position_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_passing_variant_position_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function that creates a PositionWindowStream with default functions for Variant data, and wraps it in a WindowViewStream.
See make_passing_variant_position_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of PositionWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 469 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator> genesis::population::make_passing_variant_queue_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate a QueueWindowStream for a default use case with underlying data of type Variant, where only Variants with passing status are selected.
This helper assumes that the underlying type of the input data stream and of the Windows that we are sliding over are of type Variant. It is hence a more specialized version of make_default_queue_window_stream(). Here, we check the Variant::status, and only those Variants that have a passing FilterStatus are counted towards the QueueWindowStream::count() of each window. The QueueWindowStream::entry_selection_function is set accordingly.
Definition at line 919 of file queue_window_stream.hpp.
WindowViewStream<InputStreamIterator> genesis::population::make_passing_variant_queue_window_view_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count, | ||
size_t | stride = 0 |
||
) |
Helper function that creates a QueueWindowStream with default functions for Variant data, and wraps it in a WindowViewStream.
See make_passing_variant_queue_window_stream() for the base functionality, and see make_window_view_stream() for the wrapping behaviour.
Note that because this is a simple wrapper around the constructor of QueueWindowStream, we lose access to that class itself, so that its more specialized member functions cannot be called any more. If this is needed, use the two aforementioned make_...() functions individually.
Definition at line 958 of file queue_window_stream.hpp.
PositionWindowStream<InputStreamIterator, DataType> genesis::population::make_position_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end | ||
) |
Helper function to instantiate a PositionWindowStream for each position as an individual window, without the need to specify the template parameters manually.
Definition at line 345 of file position_window_stream.hpp.
QueueWindowStream<InputStreamIterator, DataType> genesis::population::make_queue_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
size_t | count = 0 , |
||
size_t | stride = 0 |
||
) |
Helper function to instantiate a QueueWindowStream without the need to specify the template parameters manually.
The four functionals needed for processing the input stream still have to be set afterwards, as described in QueueWindowStream.
Definition at line 824 of file queue_window_stream.hpp.
RegionWindowStream<InputStreamIterator, DataType> genesis::population::make_region_window_stream | ( | InputStreamIterator | begin, |
InputStreamIterator | end, | ||
std::shared_ptr< GenomeRegionList > | region_list | ||
) |
Helper function to instantiate a RegionWindowStream without the need to specify the template parameters manually.
The three functors entry_input_function, chromosome_function, and position_function of the RegionWindowStream have to be set in the returned stream before using it. See make_default_region_window_stream() for an alternative make function that sets these three functors to reasonable defaults that work for the Variant data type.
Definition at line 822 of file region_window_stream.hpp.
|
inline |
Filter function to be used with VariantInputStream on a Variant to filter its SampleCounts by genome regions, by tagging non-covered positions with the given tag.
This function is similar to make_variant_filter_by_region_tagging(), but instead of setting the status of the whole Variant, it applies per-sample filters instead, and sets their status flags. The function expects a set of GenomeLocusSet or GenomeRegionList pointers to be given, one for each sample of the Variant. The template parameter GenomeMaskType allows either of those two mask types to be used.
Definition at line 66 of file sample_counts_filter_positional.hpp.
|
inline |
Return a functional to numerically filter the SampleCounts samples in a Variant tagging the ones that do not pass the filters, and potentially tagging the Variant.
The function uses apply_sample_counts_filter_numerical(), modifying the samples, and tagging whether the filtering determined that the samples should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream. alternative that instead excludes the Variant::status from the stream.
Definition at line 280 of file sample_counts_filter_numerical.hpp.
|
inline |
This overload also includes the statistics of the failing or passing filter.
Definition at line 294 of file sample_counts_filter_numerical.hpp.
std::vector< bool > make_sample_name_filter | ( | std::vector< std::string > const & | sample_names, |
std::vector< std::string > const & | names_filter, | ||
bool | inverse_filter = false |
||
) |
Create a filter for samples, indicating which to keep.
The resulting bool vector has the same length as the input sample_names vector, and is true for all samples that are meant to be kept, and false otherwise. By default, with inverse_filter == false, sample names that are in the names_filter are kept, and all others are dropped. With inverse_filter == true, this is reversed.
The function also checks that the entries of sample_names and of names_filter are unique (as otherwise the filtering might be wrong), and that the names in the names_filter actually appear in the sample_names.
Definition at line 46 of file variant_input_stream.cpp.
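The behavior described for make_sample_name_filter() can be illustrated with a self-contained sketch. This is not the genesis source; the function name sketch_sample_name_filter() and the choice of std::invalid_argument for failed checks are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical re-implementation of the documented behavior.
std::vector<bool> sketch_sample_name_filter(
    std::vector<std::string> const& sample_names,
    std::vector<std::string> const& names_filter,
    bool inverse_filter = false
) {
    // Both lists must be free of duplicates.
    std::unordered_set<std::string> const all_names( sample_names.begin(), sample_names.end() );
    if( all_names.size() != sample_names.size() ) {
        throw std::invalid_argument( "Sample names are not unique." );
    }
    std::unordered_set<std::string> const wanted( names_filter.begin(), names_filter.end() );
    if( wanted.size() != names_filter.size() ) {
        throw std::invalid_argument( "Filter names are not unique." );
    }

    // Every filter name has to refer to an existing sample.
    for( auto const& name : names_filter ) {
        if( all_names.count( name ) == 0 ) {
            throw std::invalid_argument( "Filter name not found in sample names: " + name );
        }
    }

    // One entry per sample; true means "keep".
    std::vector<bool> keep( sample_names.size(), false );
    for( std::size_t i = 0; i < sample_names.size(); ++i ) {
        bool const listed = wanted.count( sample_names[i] ) > 0;
        keep[i] = ( listed != inverse_filter );
    }
    assert( keep.size() == sample_names.size() );
    return keep;
}
```

For sample names A, B, C and a filter of A and C, the sketch marks the first and third sample as kept; with inverse_filter set, only the second one.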
std::vector<std::string> genesis::population::make_sample_name_list_ | ( | std::string const & | source_name, |
size_t | size | ||
) |
Local helper to fill the sample names of file formats without sample names.
We want to use a standardized format for that: the file base name, followed by consecutive numbers for each sample, separated by a character.
Definition at line 133 of file variant_input_stream_sources.cpp.
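The naming scheme for files without sample names might look like the sketch below. The text above does not say which separator character is used or at which number the counting starts, so the '.' separator and the one-based numbering here are assumptions, as is the function name sketch_sample_name_list().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: base name + separator + running number per sample.
// Assumptions: the separator is '.', and numbering starts at 1.
std::vector<std::string> sketch_sample_name_list(
    std::string const& source_name,
    std::size_t size,
    char separator = '.'
) {
    std::vector<std::string> names;
    names.reserve( size );
    for( std::size_t i = 0; i < size; ++i ) {
        names.push_back( source_name + separator + std::to_string( i + 1 ));
    }
    assert( names.size() == size );
    return names;
}
```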
|
inline |
Filter function to be used with VariantInputStream to filter by a genome region, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given region (if complement is false, default), or only over Variants that are outside of the region (if complement is true).
Definition at line 66 of file variant_filter_positional.hpp.
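The effect of the complement flag can be sketched with a small predicate factory. The types SketchRegion and SketchLocus and the function sketch_make_region_filter() are hypothetical stand-ins rather than the genesis classes, and the closed interval with inclusive start and end is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-ins for a genome region and a position on a chromosome.
struct SketchRegion {
    std::string chromosome;
    std::size_t start; // assumed inclusive
    std::size_t end;   // assumed inclusive
};
struct SketchLocus {
    std::string chromosome;
    std::size_t position;
};

// Returns a predicate: true means the locus stays in the stream.
std::function<bool( SketchLocus const& )> sketch_make_region_filter(
    SketchRegion region, bool complement = false
) {
    assert( region.start <= region.end );
    return [region, complement]( SketchLocus const& locus ) {
        bool const covered =
            locus.chromosome == region.chromosome &&
            locus.position >= region.start &&
            locus.position <= region.end;
        // With complement == true, the decision is flipped.
        return covered != complement;
    };
}
```

A predicate of this shape is what a stream's filter hook would call once per entry; entries for which it returns false are skipped.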
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given regions (if complement is false, default), or only over Variants that are outside of the regions (if complement is true).
Definition at line 103 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by excluding non-covered positions from the stream.
This function can be used as a filter with VariantInputStream::add_filter(), in order to only iterate over Variants that are in the given regions (if complement is false, default), or only over Variants that are outside of the regions (if complement is true).
Definition at line 85 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a genome region, by tagging non-covered positions with the given tag.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given region (if complement is false, default), or only over Variants that are outside of the region (if complement is true).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 138 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given regions (if complement is false, default), or only over Variants that are outside of the regions (if complement is true).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 209 of file variant_filter_positional.hpp.
|
inline |
Filter function to be used with VariantInputStream to filter by a list of genome regions, by tagging non-covered positions with the given tag.
This function can be used as a filter with VariantInputStream::add_transform(), in order to only iterate over Variants that are in the given regions (if complement is false, default), or only over Variants that are outside of the regions (if complement is true).
The two tag options are VariantFilterTag::kMaskedPosition and VariantFilterTag::kMaskedRegion, which we check, in order to avoid accidental mistakes. We distinguish between those in the sense that a masked region is meant to be a larger part, where only certain chromosomes or genes are not masked, while a masked position is meant to be a finer scale, that can be decided per position, such as to mark synonymous vs non-synonymous SNPs.
Definition at line 177 of file variant_filter_positional.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), which returns true or false depending on whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform_filter() to exclude positions fully from the stream.
See make_variant_filter_numerical_tagging() for an alternative that instead simply sets the Variant::status to an appropriate VariantFilterTag, but does not exclude it from the stream.
Definition at line 206 of file variant_filter_numerical.hpp.
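The difference between the excluding and the tagging style of filter can be sketched as follows. Everything here is a hypothetical stand-in: SketchVariant, SketchStatus, the single minimum-depth criterion, and the sketch_...() functions are assumptions for illustration and do not mirror the genesis types or filter settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical minimal stand-ins for a Variant and its filter status.
enum class SketchStatus { kPassing, kFailed };
struct SketchVariant {
    std::size_t position;
    std::size_t total_depth;
    SketchStatus status;
};

// Excluding style: a predicate; entries that yield false are dropped.
std::function<bool( SketchVariant const& )> sketch_make_excluding_filter( std::size_t min_depth )
{
    return [min_depth]( SketchVariant const& variant ) {
        return variant.total_depth >= min_depth;
    };
}

// Tagging style: a transform; entries stay, but failing ones get marked.
std::function<void( SketchVariant& )> sketch_make_tagging_transform( std::size_t min_depth )
{
    return [min_depth]( SketchVariant& variant ) {
        if( variant.total_depth < min_depth ) {
            variant.status = SketchStatus::kFailed;
        }
    };
}

// Tiny driver that mimics what a stream does with an excluding filter.
std::vector<SketchVariant> sketch_apply_excluding(
    std::vector<SketchVariant> const& input,
    std::function<bool( SketchVariant const& )> const& keep
) {
    std::vector<SketchVariant> output;
    std::copy_if( input.begin(), input.end(), std::back_inserter( output ), keep );
    assert( output.size() <= input.size() );
    return output;
}
```

With the excluding style, downstream code never sees the failing entries; with the tagging style, it sees all entries and can decide based on the status, which is what allows statistics on how many positions were filtered.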
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, excluding the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), which returns true or false depending on whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform_filter() to exclude positions fully from the stream.
See make_variant_filter_numerical_tagging() for an alternative that instead simply sets the Variant::status to an appropriate VariantFilterTag, but does not exclude it from the stream.
This overload also includes the statistics of the failing or passing filter.
Definition at line 219 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload additionally runs apply_sample_counts_filter_numerical() on all samples, i.e., it additionally does the same as make_sample_counts_filter_numerical_tagging(). This is meant as a convenience function that does all the typical numerical filtering at once.
Definition at line 273 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload additionally runs apply_sample_counts_filter_numerical() on all samples, i.e., it additionally does the same as make_sample_counts_filter_numerical_tagging(). This is meant as a convenience function that does all the typical numerical filtering at once. The Variant filter is also set to fitting non-passing values if
This overload also includes the statistics of the failing or passing filter.
Definition at line 296 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
Definition at line 244 of file variant_filter_numerical.hpp.
|
inline |
Return a functional to numerically filter Variants in a VariantInputStream, tagging the ones that do not pass the filters.
The function uses apply_variant_filter_numerical(), tagging whether the filtering determined that the Variant should be kept. It can hence be used with GenericInputStream::add_transform() to mark filtered positions in the stream.
See make_variant_filter_numerical_excluding() for an alternative that instead excludes the Variant from the stream.
This overload also includes the statistics of the failing or passing filter.
Definition at line 257 of file variant_filter_numerical.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 97 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
This overload additionally sets a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 143 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
This overload additionally sets the reference genome for the gapless iteration.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 111 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
This overload additionally sets the reference genome for the gapless iteration, as well as a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 160 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
This overload additionally sets the sequence dictionary for the gapless iteration.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 127 of file variant_input_stream_adapters.hpp.
|
inline |
Create a VariantGaplessInputStream from a VariantInputStream input, and wrap it again in a VariantInputStream.
This overload additionally sets the sequence dictionary for the gapless iteration, as well as a genome locus set to filter the positions.
See also make_variant_input_stream_from_variant_gapless_input_stream()
Definition at line 179 of file variant_input_stream_adapters.hpp.
VariantInputStream make_variant_input_stream_from_frequency_table_file | ( | std::string const & | filename, |
char | separator_char = '\t' , |
||
FrequencyTableInputStream const & | reader = FrequencyTableInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants.
Optionally, this takes a reader with settings to be used.
Definition at line 412 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_frequency_table_file | ( | std::string const & | filename, |
std::vector< std::string > const & | sample_names_filter, | ||
bool | inverse_sample_names_filter = false , |
||
char | separator_char = '\t' , |
||
FrequencyTableInputStream const & | reader = FrequencyTableInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a frequency table file as Variants.
Additionally, this version of the function takes a list of names, sample_names_filter, which is used as a filter so that only those samples (columns of the frequency table) are evaluated and accessible; or, if inverse_sample_names_filter is set to true, all but those samples.
Optionally, this takes a reader with settings to be used.
Definition at line 422 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_individual_vcf_file | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params = VariantInputStreamFromVcfParams{} , |
||
bool | use_allelic_depth = false |
||
) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as an individual, and combining them all into one SampleCounts sample.
See convert_to_variant_as_individuals( VcfRecord const&, bool ) for details on the conversion from VcfRecord to Variant.
Definition at line 591 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
Optionally, this takes a reader with settings to be used.
Definition at line 302 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
std::vector< bool > const & | sample_filter, | ||
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
This uses only the samples at the indices where the sample_filter is true. Optionally, this takes a reader with settings to be used.
Definition at line 322 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pileup_file | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices = false , |
||
SimplePileupReader const & | reader = SimplePileupReader{} |
||
) |
Create a VariantInputStream to iterate the contents of a (m)pileup file as Variants.
This uses only the samples at the zero-based indices given in the sample_indices list. If inverse_sample_indices is true, this list is inverted, that is, all sample indices but the ones listed are included in the output.
For example, given a list { 0, 2 } and a file with 4 samples, only the first and the third sample will be in the output. If, however, inverse_sample_indices is also set, the output will contain the second and fourth sample.
Optionally, this takes a reader with settings to be used.
Definition at line 311 of file variant_input_stream_sources.cpp.
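The index-based sample selection described above, including the { 0, 2 } example, can be expressed as a conversion from an index list to a bool filter. The sketch below is a hypothetical helper, not the genesis code; the name sketch_indices_to_filter() and the std::out_of_range error for invalid indices are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: turn zero-based sample indices into a keep/drop filter.
std::vector<bool> sketch_indices_to_filter(
    std::vector<std::size_t> const& sample_indices,
    std::size_t sample_count,
    bool inverse_sample_indices = false
) {
    // Start with "drop all" (or "keep all" when inverted), then flip the listed ones.
    std::vector<bool> keep( sample_count, inverse_sample_indices );
    for( auto const index : sample_indices ) {
        if( index >= sample_count ) {
            throw std::out_of_range( "Sample index exceeds the number of samples." );
        }
        keep[index] = !inverse_sample_indices;
    }
    assert( keep.size() == sample_count );
    return keep;
}
```

Note that the number of samples has to be known for this conversion, which for some file formats is only the case once the file iteration has started.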
VariantInputStream genesis::population::make_variant_input_stream_from_pileup_file_ | ( | std::string const & | filename, |
SimplePileupReader const & | reader, | ||
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Local helper function that takes care of the three functions below.
Definition at line 261 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_pool_vcf_file | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params = VariantInputStreamFromVcfParams{} |
||
) |
Create a VariantInputStream to iterate the contents of a VCF file as Variants, treating each sample as a pool of individuals.
See convert_to_variant_as_pool( VcfRecord const& ) for details on the conversion from VcfRecord to Variant.
This function requires the VCF to have the "AD" FORMAT field. It only iterates over those VCF record lines that actually have the "AD" FORMAT provided, as this is the information that we use to convert the samples to Variants. All records without that field are skipped.
Definition at line 582 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sam_file | ( | std::string const & | filename, |
SamVariantInputStream const & | reader = SamVariantInputStream{} |
||
) |
Create a VariantInputStream to iterate the contents of a SAM/BAM/CRAM file as Variants.
An instance of SamVariantInputStream can be provided from which the settings are copied.
Depending on the settings used in the reader, this can either produce a single sample (one SampleCounts object in the resulting Variant at each position in the genome), or split the input file by the read group (RG) tag (potentially also allowing for an "unaccounted" group of reads).
The other make_variant_input_stream_... functions offer settings to sub-set (filter) the samples based on their names or indices. This can be achieved here as well, but has to be done directly in the reader, instead of providing the filter arguments to this function.
Definition at line 188 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename | ) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
Definition at line 381 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename, |
std::vector< bool > const & | sample_filter | ||
) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
This uses only the samples at the indices where the sample_filter is true.
Definition at line 399 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_sync_file | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices = false |
||
) |
Create a VariantInputStream to iterate the contents of a PoPoolation2 sync file as Variants.
This uses only the samples at the zero-based indices given in the sample_indices list. If inverse_sample_indices is true, this list is inverted, that is, all sample indices but the ones listed are included in the output.
For example, given a list { 0, 2 } and a file with 4 samples, only the first and the third sample will be in the output. If, however, inverse_sample_indices is also set, the output will contain the second and fourth sample.
Definition at line 389 of file variant_input_stream_sources.cpp.
VariantInputStream genesis::population::make_variant_input_stream_from_sync_file_ | ( | std::string const & | filename, |
std::vector< size_t > const & | sample_indices, | ||
bool | inverse_sample_indices, | ||
std::vector< bool > const & | sample_filter | ||
) |
Definition at line 336 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_variant_gapless_input_stream | ( | VariantGaplessInputStream const & | gapless_input | ) |
Create a VariantInputStream that wraps a VariantGaplessInputStream.
See also make_variant_gapless_input_stream()
Definition at line 121 of file variant_input_stream_adapters.cpp.
VariantInputStream make_variant_input_stream_from_variant_parallel_input_stream | ( | VariantParallelInputStream const & | parallel_input, |
VariantParallelInputStream::JoinedVariantParams const & | joined_variant_params = VariantParallelInputStream::JoinedVariantParams{} |
||
) |
Create a VariantInputStream to iterate multiple input sources at once, using a VariantParallelInputStream.
This wraps multiple input sources into one stream that traverses all of them in parallel, which is then turned back into one Variant per position, using VariantParallelInputStream::Iterator::joined_variant() to combine all input sources into one. See there for the meaning of the joined_variant_params parameter of this function.
As this is iterating multiple files, we leave the VariantInputStreamData::file_path and VariantInputStreamData::source_name empty, and fill the VariantInputStreamData::sample_names with the sample names of the underlying input sources of the parallel stream, checking for duplicates to avoid downstream trouble.
Definition at line 55 of file variant_input_stream_adapters.cpp.
VariantInputStream genesis::population::make_variant_input_stream_from_vcf_file_ | ( | std::string const & | filename, |
VariantInputStreamFromVcfParams const & | params, | ||
bool | pool_samples, | ||
bool | use_allelic_depth | ||
) |
Local helper function that takes care of both main functions below.
Definition at line 486 of file variant_input_stream_sources.cpp.
VariantInputStream make_variant_input_stream_from_vector | ( | std::vector< Variant > const & | variants | ) |
Create a VariantInputStream to iterate the contents of a std::vector containing Variants.
This is a simple wrapper to bring a vector of in-memory Variants into the input stream format that we use for file streaming as well. It is meant as a speed-up for small files that fit into memory, for example in cases where they have to be processed multiple times.
The user needs to make sure that the lifetime of the given input variants vector is longer than the stream returned here, and that the vector is not modified after calling this function.
Definition at line 147 of file variant_input_stream_sources.cpp.
std::function<void(Variant&)> genesis::population::make_variant_input_stream_sample_name_filter_transform | ( | std::vector< bool > const & | sample_filter | ) |
Definition at line 103 of file variant_input_stream.cpp.
std::function<void(Variant&)> genesis::population::make_variant_input_stream_sample_subsampling_transform | ( | size_t | max_depth, |
SubsamplingMethod | method | ||
) |
Definition at line 143 of file variant_input_stream.cpp.
std::function<void(Variant const&)> genesis::population::make_variant_input_stream_sequence_length_observer | ( | std::shared_ptr< genesis::sequence::SequenceDict > | sequence_dict | ) |
Definition at line 229 of file variant_input_stream.cpp.
std::function<void(Variant const&)> genesis::population::make_variant_input_stream_sequence_order_observer | ( | std::shared_ptr< genesis::sequence::SequenceDict > | sequence_dict, |
bool | check_sequence_lengths | ||
) |
Definition at line 175 of file variant_input_stream.cpp.
VariantInputStream make_variant_merging_input_stream | ( | VariantInputStream const & | input, |
std::unordered_map< std::string, std::string > const & | sample_name_to_group, | ||
bool | allow_ungrouped_samples = false, | ||
SampleCountsFilterPolicy | filter_policy = SampleCountsFilterPolicy::kOnlyPassing | ||
) |
Create a VariantInputStream that merges samples from its underlying input.
This provides an on-the-fly merging of input samples by simply summing up their SampleCounts. It takes a mapping of sample names to group names, and creates a VariantInputStream with the group names as new sample names, where each group merges the counts of its respective input samples.
If allow_ungrouped_samples is set to true, any sample that does not occur in the map will be added as-is, with its original sample name, and as its own "group". By default, we throw an exception in this case, in order to make sure that the behavior is intended.
Definition at line 276 of file variant_input_stream_adapters.cpp.
VariantMergeGroupAssignment genesis::population::make_variant_merging_input_stream_group_assignment_ | ( | VariantInputStream const & | variant_input, |
std::unordered_map< std::string, std::string > const & | sample_name_to_group, | ||
bool | allow_ungrouped_samples | ||
) |
Helper function to create a mapping from sample indices to group indices.
Definition at line 180 of file variant_input_stream_adapters.cpp.
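The mapping from sample indices to group indices can be illustrated with a small standalone sketch. It does not use the library types: GroupAssignment and assign_groups are hypothetical names, the order of groups (by first appearance) is an assumption, and the sketch does not guard against an ungrouped sample name colliding with an existing group name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: for each input sample, the index of its group,
// plus the group names in order of first appearance.
struct GroupAssignment {
    std::vector<std::size_t> group_of_sample;
    std::vector<std::string> group_names;
};

GroupAssignment assign_groups(
    std::vector<std::string> const& sample_names,
    std::unordered_map<std::string, std::string> const& sample_name_to_group,
    bool allow_ungrouped_samples = false
) {
    GroupAssignment result;
    std::unordered_map<std::string, std::size_t> group_index;
    for( auto const& name : sample_names ) {
        std::string group;
        auto const it = sample_name_to_group.find( name );
        if( it != sample_name_to_group.end() ) {
            group = it->second;
        } else if( allow_ungrouped_samples ) {
            // Ungrouped samples become their own group, under their own name.
            group = name;
        } else {
            throw std::runtime_error( "Sample '" + name + "' has no group" );
        }

        // Register the group on first sight; otherwise reuse its index.
        auto const ins = group_index.emplace( group, result.group_names.size() );
        if( ins.second ) {
            result.group_names.push_back( group );
        }
        result.group_of_sample.push_back( ins.first->second );
    }
    return result;
}
```

With such an assignment, merging a Variant amounts to adding the counts of sample i to the group at index group_of_sample[i].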
WindowViewStream<typename T::InputStreamType, typename T::DataType> genesis::population::make_window_view_stream | ( | T && | window_iterator | ) |
Create a WindowViewStream that iterates some underlying BaseWindowStream.
The template parameter T is expected to be a BaseWindowStream.
This serves as an abstraction to be able to use WindowViewStream everywhere, instead of having to switch between WindowViewStream and WindowStream depending on the type of windowing that is being done. See WindowViewStream for details.
This overload of the function takes the underlying iterator by rvalue reference, so that it can be provided directly without a copy.
Definition at line 337 of file window_view_stream.hpp.
WindowViewStream<typename T::InputStreamType, typename T::DataType> genesis::population::make_window_view_stream | ( | T const & | window_iterator | ) |
Create a WindowViewStream that iterates some underlying BaseWindowStream.
The template parameter T is expected to be a BaseWindowStream.
This serves as an abstraction to be able to use WindowViewStream everywhere, instead of having to switch between WindowViewStream and WindowStream depending on the type of windowing that is being done. See WindowViewStream for details.
Definition at line 317 of file window_view_stream.hpp.
SampleCounts merge | ( | SampleCounts const & | p1, |
SampleCounts const & | p2 | ||
) |
Merge the counts of two SampleCounts objects.
Definition at line 400 of file population/function/functions.cpp.
SampleCounts merge | ( | std::vector< SampleCounts > const & | p, |
SampleCountsFilterPolicy | filter_policy | ||
) |
Merge the counts of a vector of SampleCounts objects.
Definition at line 407 of file population/function/functions.cpp.
void merge_inplace | ( | SampleCounts & | p1, |
SampleCounts const & | p2 | ||
) |
Merge the counts of two SampleCounts objects, by adding the counts of the second (p2) to the first (p1).
Definition at line 383 of file population/function/functions.cpp.
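As an illustration of what merging by summation means, here is a minimal sketch on a simplified stand-in type. Counts is a hypothetical struct, not the library's SampleCounts; the n_count and d_count field names are assumptions, and the filter policy of the vector overload is left out.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for SampleCounts (hypothetical). The names n_count and
// d_count for the "any" and "deletion" counts are assumptions.
struct Counts {
    std::size_t a_count = 0;
    std::size_t c_count = 0;
    std::size_t g_count = 0;
    std::size_t t_count = 0;
    std::size_t n_count = 0;
    std::size_t d_count = 0;
};

// Add the counts of p2 to p1.
void merge_inplace( Counts& p1, Counts const& p2 ) {
    p1.a_count += p2.a_count;
    p1.c_count += p2.c_count;
    p1.g_count += p2.g_count;
    p1.t_count += p2.t_count;
    p1.n_count += p2.n_count;
    p1.d_count += p2.d_count;
}

// Return the merged counts, leaving both inputs untouched.
Counts merge( Counts const& p1, Counts const& p2 ) {
    Counts result = p1;
    merge_inplace( result, p2 );
    return result;
}

// Merge all elements of a vector; an empty vector yields all-zero counts.
Counts merge( std::vector<Counts> const& p ) {
    Counts result;
    for( auto const& elem : p ) {
        merge_inplace( result, elem );
    }
    return result;
}
```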
|
inline |
Merge the counts of a vector of SampleCounts objects.
Definition at line 282 of file population/function/functions.hpp.
double n_base | ( | size_t | read_depth, |
size_t | poolsize | ||
) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, using a faster closed-form expression.
This term is the expected number of distinct individuals sequenced, which is equivalent to finding the expected number of distinct values selected from a set of integers.
The computation in PoPoolation is slow, see n_base_matrix(). We here instead use a closed-form expression following the reasoning of https://math.stackexchange.com/a/72351, where the derivation of the equation is given.
Definition at line 501 of file diversity_pool_functions.cpp.
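For orientation, the standard closed form for the expected number of distinct values when drawing C times uniformly with replacement from n values is n * (1 - ((n - 1) / n)^C). The sketch below evaluates it with the read depth as C and the pool size as n. That n_base() uses exactly this expression is an assumption based on the description above, and expected_distinct_individuals is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Expected number of distinct individuals hit by `read_depth` reads, assuming
// each read samples one of `poolsize` individuals uniformly with replacement.
double expected_distinct_individuals( std::size_t read_depth, std::size_t poolsize ) {
    assert( poolsize > 0 );
    double const n = static_cast<double>( poolsize );
    double const c = static_cast<double>( read_depth );

    // Probability that a given individual is missed by all reads: ((n-1)/n)^c.
    // Summing the complement over all n individuals gives the expectation.
    return n * ( 1.0 - std::pow(( n - 1.0 ) / n, c ));
}
```

For example, two reads from a pool of two individuals give 2 * (1 - 0.25) = 1.5 expected distinct individuals.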
double n_base_matrix | ( | size_t | read_depth, |
size_t | poolsize | ||
) |
Compute the n_base term used for Tajima's D in Kofler et al. 2011, following their approach.
This term is the expected number of distinct individuals sequenced, which is equivalent to finding the expected number of distinct values selected from a set of integers.
The computation of this term in PoPoolation uses a recursive dynamic programming approach to sum over different possibilities of selecting sets of integers. This gets rather slow for larger inputs, and there is an equivalent closed form that we here use instead. See n_base() for details. We here merely offer the original PoPoolation implementation as a point of reference.
R. Kofler, P. Orozco-terWengel, N. De Maio, R. V. Pandey, V. Nolte, A. Futschik, C. Kosiol, C. Schlötterer.
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing Data from Pooled Individuals.
(2011) PLoS ONE, 6(1), e15925. https://doi.org/10.1371/journal.pone.0015925
The paper unfortunately does not explain their equations, but there is a hidden document in their code repository that illuminates the situation a bit. See https://sourceforge.net/projects/popoolation/files/correction_equations.pdf
Definition at line 467 of file diversity_pool_functions.cpp.
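To make the equivalence between a dynamic programming computation and the closed form tangible, here is a small sketch. It is not the PoPoolation code: it is a generic iterative recurrence over the probability of having seen exactly k distinct individuals, under the assumption of uniform sampling with replacement, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dynamic programming: prob[k] is the probability that the reads drawn so far
// hit exactly k distinct individuals out of `poolsize`.
double expected_distinct_dp( std::size_t read_depth, std::size_t poolsize ) {
    assert( poolsize > 0 );
    double const n = static_cast<double>( poolsize );
    std::vector<double> prob( poolsize + 1, 0.0 );
    prob[0] = 1.0;
    for( std::size_t r = 0; r < read_depth; ++r ) {
        // Go from high k to low k, so that prob[k-1] still holds the old value.
        for( std::size_t k = poolsize; k >= 1; --k ) {
            double const kd = static_cast<double>( k );
            // Either the read hits one of the k known individuals,
            // or it hits a new one, moving from k-1 to k.
            prob[k] = prob[k] * kd / n + prob[k-1] * ( n - ( kd - 1.0 )) / n;
        }
        prob[0] = 0.0;
    }
    double expected = 0.0;
    for( std::size_t k = 1; k <= poolsize; ++k ) {
        expected += static_cast<double>( k ) * prob[k];
    }
    return expected;
}

// Closed form for the same expectation, for comparison.
double expected_distinct_closed( std::size_t read_depth, std::size_t poolsize ) {
    assert( poolsize > 0 );
    double const n = static_cast<double>( poolsize );
    return n * ( 1.0 - std::pow(( n - 1.0 ) / n, static_cast<double>( read_depth )));
}
```

The recurrence costs on the order of read_depth times poolsize operations, while the closed form is a single power evaluation, which is the motivation for preferring the latter.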
std::array<size_t, 4> genesis::population::nucleotide_sorting_order | ( | std::array< T, 4 > const & | values | ) |
Return the sorting order of four values, for instance of the four nucleotides ACGT, in descending order (largest first).
The input is an array of four values, either counts or frequencies. The output is the array of indices into the input, sorted so that the index of the largest value comes first:
auto const data = std::array<T, 4>{ 15, 10, 20, 5 };
auto const order = nucleotide_sorting_order( data );
yields { 2, 0, 1, 3 }, so that data[order[0]] = data[2] = 20 is the largest value, data[order[1]] = data[0] = 15 the second largest, and so forth.
Usage with actual data might be as follows:
SampleCounts sample = ...;
auto const data = std::array<T, 4>{ sample.a_count, sample.c_count, sample.g_count, sample.t_count };
auto const order = nucleotide_sorting_order( data );
// ...
See also sample_counts_sorting_order() for an equivalent function that also considers the "any" (N) and "deletion" (D) counts of a SampleCounts object.
Definition at line 128 of file population/function/functions.hpp.
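The index-sorting idea can be reproduced with the standard library alone. The following is a sketch under assumptions, not the library's implementation: sorting_order_sketch is a hypothetical name, and breaking ties in favor of the lower index (via a stable sort) is a choice made here for determinism.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Return the indices 0..3, ordered so that values[order[0]] is the largest.
// Ties keep their original index order, because the sort is stable.
template<typename T>
std::array<std::size_t, 4> sorting_order_sketch( std::array<T, 4> const& values ) {
    std::array<std::size_t, 4> order = {{ 0, 1, 2, 3 }};
    std::stable_sort(
        order.begin(), order.end(),
        [&values]( std::size_t lhs, std::size_t rhs ) {
            return values[lhs] > values[rhs];
        }
    );
    return order;
}
```

Applied to the values { 15, 10, 20, 5 } from the example above, this returns { 2, 0, 1, 3 }.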
|
inline constexpr |
Count of the pure nucleotide bases at this position, that is, the sum of all A, C, G, and T.
This is simply the sum of a_count + c_count + g_count + t_count, which we often use as the read depth at the given site.
NB: In PoPoolation, this variable is called eucov.
Definition at line 296 of file population/function/functions.hpp.
|
inline |
Inequality comparison (!=) for two loci in a genome.