A library for working with phylogenetic and population genetic data.
v0.27.0
SamVariantInputIterator Class Reference

#include <genesis/population/formats/sam_variant_input_iterator.hpp>

Detailed Description

Input iterator for SAM/BAM/CRAM files that produces a Variant per genome position.

We expect the input file to be sorted by position. Positions with no reads overlapping are skipped.

Example usage:

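// Read all mapped reads of the file as a single sample (the default).
auto sam_it = SamVariantInputIterator( "/path/to/file.sam" );
// Discard reads with a mapping quality below 40.
sam_it.min_map_qual( 40 );
// Each iteration yields the Variant of one covered position.
for( auto const& var : sam_it ) {
    std::cout << var.chromosome << "\t" << var.position << "\t";
    // Write the BaseCounts of every sample stored in the Variant.
    for( auto const& bs : var.samples ) {
        std::cout << "\t";
        to_sync( bs, std::cout );
    }
    std::cout << "\n";
}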

By default, as above, all reads are treated as belonging to a single sample. In that case, the inner loop over samples in the example above only ever visits one BaseCounts object stored in the Variant. Reads can also be split by read group (@RG); see split_by_rg() and with_unaccounted_rg() for details. In that case, the Variant contains one BaseCounts object per read group, plus potentially a special one for unaccounted reads with no proper RG. This can be filtered further with rg_tag_filter(), so that only certain RG tags are produced as samples.

Definition at line 102 of file sam_variant_input_iterator.hpp.

Public Member Functions

 SamVariantInputIterator ()
 Create a default instance, with no input. This is also the past-the-end iterator.
 
 SamVariantInputIterator (SamVariantInputIterator &&)=default
 
 SamVariantInputIterator (SamVariantInputIterator const &)=default
 
 SamVariantInputIterator (std::string const &input_file)
 
 SamVariantInputIterator (std::string const &input_file, std::unordered_set< std::string > const &rg_tag_filter, bool inverse_rg_tag_filter=false)
 
 ~SamVariantInputIterator ()=default
 
Iterator begin () const
 
Iterator end () const
 
uint32_t flags_exclude_all () const
 
self_type & flags_exclude_all (uint32_t value)
 Do not use reads with all bits set in value present in the FLAG field of the read.
 
uint32_t flags_exclude_any () const
 
self_type & flags_exclude_any (uint32_t value)
 Do not use reads with any bits set in value present in the FLAG field of the read.
 
uint32_t flags_include_all () const
 
self_type & flags_include_all (uint32_t value)
 Only use reads with all bits set in value present in the FLAG field of the read.
 
uint32_t flags_include_any () const
 
self_type & flags_include_any (uint32_t value)
 Only use reads with any bits set in value present in the FLAG field of the read.
 
std::string const & input_file () const
 
self_type & input_file (std::string const &value)
 Set the input file.
 
bool inverse_rg_tag_filter () const
 
self_type & inverse_rg_tag_filter (bool value)
 Reverse the meaning of the list of sample names given by rg_tag_filter().
 
int max_accumulation_depth () const
 
self_type & max_accumulation_depth (int value)
 Set the maximum depth (coverage) at a given position that is actually processed.
 
int max_depth () const
 
self_type & max_depth (int value)
 Set the maximum depth (coverage) at a given position to be considered.
 
int min_base_qual () const
 
self_type & min_base_qual (int value)
 Set the minimum phred-scaled per-base quality score for a nucleotide to be considered.
 
int min_depth () const
 
self_type & min_depth (int value)
 Set the minimum depth (coverage) at a given position to be considered.
 
int min_map_qual () const
 
self_type & min_map_qual (int value)
 Set the minimum phred-scaled mapping quality score for a read in the input file to be considered.
 
SamVariantInputIterator & operator= (SamVariantInputIterator &&)=default
 
SamVariantInputIterator & operator= (SamVariantInputIterator const &)=default
 
std::unordered_set< std::string > const & rg_tag_filter () const
 
self_type & rg_tag_filter (std::unordered_set< std::string > const &value)
 Set the sample names used for filtering reads by their RG read group tag.
 
bool split_by_rg () const
 
self_type & split_by_rg (bool value)
 If set to true, instead of reading all mapped reads as a single sample, split them by the @RG read group tag.
 
bool with_unaccounted_rg () const
 
self_type & with_unaccounted_rg (bool value)
 Decide whether to add a sample for reads without a read group, when splitting by @RG tag.
 

Public Types

using difference_type = std::ptrdiff_t
 
using iterator_category = std::input_iterator_tag
 
using pointer = value_type const *
 
using reference = value_type const &
 
using self_type = SamVariantInputIterator
 
using value_type = Variant
 

Classes

class  Iterator
 Iterator over loci of the input sources.
 

Constructor & Destructor Documentation

◆ SamVariantInputIterator() [1/5]

Create a default instance, with no input. This is also the past-the-end iterator.

Definition at line 346 of file sam_variant_input_iterator.hpp.
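
The convention that a default-constructed instance doubles as the past-the-end iterator is also used in the C++ standard library. The following minimal sketch shows it with std::istream_iterator; it is an analogy only and does not use or reproduce the genesis implementation. The function name read_all_ints is hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Collect all whitespace-separated integers from a string.
// Illustration only: this uses std::istream_iterator, not the genesis class.
std::vector<int> read_all_ints( std::string const& text )
{
    std::istringstream in( text );
    std::istream_iterator<int> first( in );
    // A default-constructed std::istream_iterator is the past-the-end iterator.
    std::istream_iterator<int> last;
    return std::vector<int>( first, last );
}
```

Iteration stops as soon as the reading iterator compares equal to the default-constructed one, which is the same role that end() plays in a range-based for loop.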

◆ SamVariantInputIterator() [2/5]

SamVariantInputIterator ( std::string const &  input_file)
inline explicit

Definition at line 350 of file sam_variant_input_iterator.hpp.

◆ SamVariantInputIterator() [3/5]

SamVariantInputIterator ( std::string const &  input_file,
std::unordered_set< std::string > const &  rg_tag_filter,
bool  inverse_rg_tag_filter = false 
)

Definition at line 933 of file sam_variant_input_iterator.cpp.

◆ ~SamVariantInputIterator()

◆ SamVariantInputIterator() [4/5]

◆ SamVariantInputIterator() [5/5]

Member Function Documentation

◆ begin()

Iterator begin ( ) const
inline

Definition at line 374 of file sam_variant_input_iterator.hpp.

◆ end()

Iterator end ( ) const
inline

Definition at line 386 of file sam_variant_input_iterator.hpp.

◆ flags_exclude_all() [1/2]

uint32_t flags_exclude_all ( ) const
inline

Definition at line 474 of file sam_variant_input_iterator.hpp.

◆ flags_exclude_all() [2/2]

self_type& flags_exclude_all ( uint32_t  value)
inline

Do not use reads with all bits set in value present in the FLAG field of the read.

This is equivalent to the -G setting in samtools view.

The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and allow different capitalization and delimiters such as dashes and underscores in the flag names as well.

See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.

See also
flags_include_all( uint32_t ), flags_include_any( uint32_t ), flags_exclude_any( uint32_t )

Definition at line 497 of file sam_variant_input_iterator.hpp.
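
The numeric notations described above (hex, octal, decimal) can be turned into the uint32_t expected by the setters with standard library facilities. The following is a minimal sketch; parse_numeric_flag is a hypothetical helper and not part of the library, and it does not handle lists of flag names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a numeric FLAG value given as text.
// Flag names such as "UNMAP" are not supported in this sketch.
uint32_t parse_numeric_flag( std::string const& text )
{
    std::size_t pos = 0;
    // Base 0 lets std::stoul detect the "0x" (hex) and leading "0" (octal) prefixes.
    unsigned long const value = std::stoul( text, &pos, 0 );
    // Reject trailing characters and values that do not fit into 32 bits.
    if( pos != text.size() || value > 0xFFFFFFFFul ) {
        throw std::invalid_argument( "Invalid flag value: " + text );
    }
    return static_cast<uint32_t>( value );
}
```

For instance, the strings "0x10", "020", and "16" all denote the same bit in this scheme.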

◆ flags_exclude_any() [1/2]

uint32_t flags_exclude_any ( ) const
inline

Definition at line 504 of file sam_variant_input_iterator.hpp.

◆ flags_exclude_any() [2/2]

self_type& flags_exclude_any ( uint32_t  value)
inline

Do not use reads with any bits set in value present in the FLAG field of the read.

This is equivalent to the -F / --excl-flags / --exclude-flags setting in samtools view.

The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and allow different capitalization and delimiters such as dashes and underscores in the flag names as well.

See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.

See also
flags_include_all( uint32_t ), flags_include_any( uint32_t ), flags_exclude_all( uint32_t )

Definition at line 527 of file sam_variant_input_iterator.hpp.

◆ flags_include_all() [1/2]

uint32_t flags_include_all ( ) const
inline

Definition at line 416 of file sam_variant_input_iterator.hpp.

◆ flags_include_all() [2/2]

self_type& flags_include_all ( uint32_t  value)
inline

Only use reads with all bits set in value present in the FLAG field of the read.

This is equivalent to the -f / --require-flags setting in samtools view.

The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and allow different capitalization and delimiters such as dashes and underscores in the flag names as well.

See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.

See also
flags_include_any( uint32_t ), flags_exclude_all( uint32_t ), flags_exclude_any( uint32_t )

Definition at line 439 of file sam_variant_input_iterator.hpp.

◆ flags_include_any() [1/2]

uint32_t flags_include_any ( ) const
inline

Definition at line 445 of file sam_variant_input_iterator.hpp.

◆ flags_include_any() [2/2]

self_type& flags_include_any ( uint32_t  value)
inline

Only use reads with any bits set in value present in the FLAG field of the read.

This is equivalent to the --rf / --incl-flags / --include-flags setting in samtools view.

The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and allow different capitalization and delimiters such as dashes and underscores in the flag names as well.

See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.

See also
flags_include_all( uint32_t ), flags_exclude_all( uint32_t ), flags_exclude_any( uint32_t )

Definition at line 468 of file sam_variant_input_iterator.hpp.
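
The four flag settings (include all, include any, exclude all, exclude any) can be read as bit mask tests on the FLAG field. The sketch below follows the wording of the descriptions and the corresponding samtools view options; passes_flag_filters is a hypothetical function, and treating a value of 0 as "filter not active" is an assumption, not a statement about the library's implementation.

```cpp
#include <cstdint>

// Hypothetical predicate: decide whether a read with the given FLAG field is used.
// Assumption: a filter value of 0 means that the respective filter is not active.
bool passes_flag_filters(
    uint32_t flag,
    uint32_t include_all,
    uint32_t include_any,
    uint32_t exclude_all,
    uint32_t exclude_any
) {
    // Only use reads with all bits of include_all set.
    if( ( flag & include_all ) != include_all ) {
        return false;
    }
    // Only use reads with at least one bit of include_any set.
    if( include_any != 0 && ( flag & include_any ) == 0 ) {
        return false;
    }
    // Do not use reads that have all bits of exclude_all set.
    if( exclude_all != 0 && ( flag & exclude_all ) == exclude_all ) {
        return false;
    }
    // Do not use reads that have any bit of exclude_any set.
    if( ( flag & exclude_any ) != 0 ) {
        return false;
    }
    return true;
}
```

With exclude_any set to 0x4 (the "read unmapped" bit in the SAM specification), any read whose FLAG has that bit set is dropped, regardless of its other bits.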

◆ input_file() [1/2]

std::string const& input_file ( ) const
inline

Definition at line 395 of file sam_variant_input_iterator.hpp.

◆ input_file() [2/2]

self_type& input_file ( std::string const &  value)
inline

Set the input file.

This overwrites the file if it was already given in the constructor. It must not be called after iteration has started.

Definition at line 406 of file sam_variant_input_iterator.hpp.

◆ inverse_rg_tag_filter() [1/2]

bool inverse_rg_tag_filter ( ) const
inline

Definition at line 705 of file sam_variant_input_iterator.hpp.

◆ inverse_rg_tag_filter() [2/2]

self_type& inverse_rg_tag_filter ( bool  value)
inline

Reverse the meaning of the list of sample names given by rg_tag_filter().

See there for details.

Definition at line 715 of file sam_variant_input_iterator.hpp.

◆ max_accumulation_depth() [1/2]

int max_accumulation_depth ( ) const
inline

Definition at line 609 of file sam_variant_input_iterator.hpp.

◆ max_accumulation_depth() [2/2]

self_type& max_accumulation_depth ( int  value)
inline

Set the maximum depth (coverage) at a given position that is actually processed.

The max_depth() setting excludes sites that have depth/coverage above a given value. However, one might want to keep those sites in the iteration, and yet limit the number of bases being tallied up. This setting is mostly meant as a memory saver, in order to avoid piling up too many sites at the same time. When set to a value greater than 0, only that many bases are considered, and any further reads overlapping the site are not taken into account.

Definition at line 624 of file sam_variant_input_iterator.hpp.
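
The effect of such a cap can be sketched as a tally that stops after a fixed number of bases. This is an illustration of the idea only; tally_with_cap is a hypothetical helper, the input is a plain string of bases rather than actual reads, and a cap of 0 is assumed to mean "no limit".

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: tally A, C, G, T counts for the bases piled up at one position,
// processing at most max_accumulation_depth bases (0 is assumed to mean no limit).
std::array<std::size_t, 4> tally_with_cap(
    std::string const& bases,
    int max_accumulation_depth
) {
    std::array<std::size_t, 4> counts{{ 0, 0, 0, 0 }};
    std::size_t processed = 0;
    for( char const base : bases ) {
        if(
            max_accumulation_depth > 0 &&
            processed >= static_cast<std::size_t>( max_accumulation_depth )
        ) {
            // The cap is reached: further bases at this site are not taken into account.
            break;
        }
        switch( base ) {
            case 'A': ++counts[0]; break;
            case 'C': ++counts[1]; break;
            case 'G': ++counts[2]; break;
            case 'T': ++counts[3]; break;
            default: break; // Other characters are not counted in this sketch.
        }
        ++processed;
    }
    return counts;
}
```

In contrast to max_depth(), the position is still reported here; only the counts are truncated.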

◆ max_depth() [1/2]

int max_depth ( ) const
inline

Definition at line 592 of file sam_variant_input_iterator.hpp.

◆ max_depth() [2/2]

self_type& max_depth ( int  value)
inline

Set the maximum depth (coverage) at a given position to be considered.

Positions in the genome with more than the given maximum depth are skipped. If set to 0 (default), the value is not used as a threshold.

Definition at line 603 of file sam_variant_input_iterator.hpp.

◆ min_base_qual() [1/2]

int min_base_qual ( ) const
inline

Definition at line 555 of file sam_variant_input_iterator.hpp.

◆ min_base_qual() [2/2]

self_type& min_base_qual ( int  value)
inline

Set the minimum phred-scaled per-base quality score for a nucleotide to be considered.

Any base that has a quality score below the given value is not taken into account in the per-position tally of counts.

Definition at line 566 of file sam_variant_input_iterator.hpp.
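
A per-base quality threshold of this kind can be sketched as follows. The function count_base and its parallel-array input (one quality score per base) are hypothetical and only illustrate that bases below the threshold do not contribute to the tally.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: count occurrences of `target` among the bases at one position,
// ignoring every base whose phred-scaled quality is below min_base_qual.
std::size_t count_base(
    std::string const& bases,
    std::vector<int> const& quals,
    char target,
    int min_base_qual
) {
    if( bases.size() != quals.size() ) {
        throw std::invalid_argument( "Need exactly one quality score per base." );
    }
    std::size_t count = 0;
    for( std::size_t i = 0; i < bases.size(); ++i ) {
        // Bases below the threshold are skipped; bases at the threshold are kept.
        if( quals[i] >= min_base_qual && bases[i] == target ) {
            ++count;
        }
    }
    return count;
}
```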

◆ min_depth() [1/2]

int min_depth ( ) const
inline

Definition at line 576 of file sam_variant_input_iterator.hpp.

◆ min_depth() [2/2]

self_type& min_depth ( int  value)
inline

Set the minimum depth (coverage) at a given position to be considered.

Positions in the genome with fewer than the given minimum depth are skipped.

Definition at line 586 of file sam_variant_input_iterator.hpp.
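
Taken together, min_depth() and max_depth() describe an accepted depth range for a position. A minimal sketch, assuming that a max_depth of 0 disables the upper threshold as documented for that setting; depth_is_accepted is a hypothetical name.

```cpp
// Hypothetical predicate: is a position with the given depth kept in the iteration?
// A max_depth of 0 means that no upper threshold is applied.
bool depth_is_accepted( int depth, int min_depth, int max_depth )
{
    // Positions with fewer reads than the minimum depth are skipped.
    if( depth < min_depth ) {
        return false;
    }
    // Positions with more reads than the maximum depth are skipped, if a maximum is set.
    if( max_depth > 0 && depth > max_depth ) {
        return false;
    }
    return true;
}
```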

◆ min_map_qual() [1/2]

int min_map_qual ( ) const
inline

Definition at line 537 of file sam_variant_input_iterator.hpp.

◆ min_map_qual() [2/2]

self_type& min_map_qual ( int  value)
inline

Set the minimum phred-scaled mapping quality score for a read in the input file to be considered.

Any read with a mapping quality below the given value is discarded completely, and its bases are not taken into account.

Definition at line 549 of file sam_variant_input_iterator.hpp.
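
Phred-scaled scores relate to error probabilities via Q = -10 * log10( P ), so a threshold of 40 corresponds to an error probability of 0.0001. The sketch below shows this conversion and the threshold test; both function names are hypothetical and not part of the library.

```cpp
#include <cmath>

// Convert a phred-scaled quality score to the error probability it encodes,
// using P = 10^( -Q / 10 ).
double phred_to_error_probability( int phred )
{
    return std::pow( 10.0, -static_cast<double>( phred ) / 10.0 );
}

// Hypothetical predicate: a read is used if its mapping quality is not below the minimum.
bool read_is_used( int map_qual, int min_map_qual )
{
    return map_qual >= min_map_qual;
}
```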

◆ operator=() [1/2]

SamVariantInputIterator& operator= ( SamVariantInputIterator &&  )
default

◆ operator=() [2/2]

SamVariantInputIterator& operator= ( SamVariantInputIterator const &  )
default

◆ rg_tag_filter() [1/2]

std::unordered_set<std::string> const& rg_tag_filter ( ) const
inline

Definition at line 677 of file sam_variant_input_iterator.hpp.

◆ rg_tag_filter() [2/2]

self_type& rg_tag_filter ( std::unordered_set< std::string > const &  value)
inline

Set the sample names used for filtering reads by their RG read group tag.

Only used when split_by_rg() is set to true. Reads that have an RG read group tag that appears in the header of the input file, but is not present in the value list given here (or in the constructor of the class), will be ignored. That is, they will also not appear in the "unaccounted" sample, independently of the setting of with_unaccounted_rg(). The unaccounted sample will only contain data from those reads that do not have an RG tag at all, or one that does not appear in the header.

See also inverse_rg_tag_filter() to invert this setting. That is, instead of only using samples based on the RG tags given in this list, use all but the given RG tags.

When the given value list is empty, the filtering by RG read group tag is deactivated (which is also the default), independently of the inverse_rg_tag_filter() setting.

Definition at line 699 of file sam_variant_input_iterator.hpp.
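
The interplay of the filter list, its inversion, and the empty-list case can be sketched as a selection over the RG tags found in the file header. The function select_rg_samples and its vector-based input are hypothetical; the sketch only mirrors the rules stated above.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: select which RG tags from the header become samples.
std::vector<std::string> select_rg_samples(
    std::vector<std::string> const& header_rg_tags,
    std::unordered_set<std::string> const& rg_tag_filter,
    bool inverse_rg_tag_filter
) {
    std::vector<std::string> result;
    for( auto const& tag : header_rg_tags ) {
        bool keep = true;
        // An empty filter list deactivates filtering, independently of the inversion.
        if( ! rg_tag_filter.empty() ) {
            bool const listed = rg_tag_filter.count( tag ) > 0;
            // Normal mode keeps listed tags; inverse mode keeps all but the listed tags.
            keep = inverse_rg_tag_filter ? ! listed : listed;
        }
        if( keep ) {
            result.push_back( tag );
        }
    }
    return result;
}
```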

◆ split_by_rg() [1/2]

bool split_by_rg ( ) const
inline

Definition at line 634 of file sam_variant_input_iterator.hpp.

◆ split_by_rg() [2/2]

self_type& split_by_rg ( bool  value)
inline

If set to true, instead of reading all mapped reads as a single sample, split them by the @RG read group tag.

This way, multiple BaseCounts objects are created in the resulting Variant, one for each read group, and potentially an additional one for the unaccounted reads that do not have a read group, if with_unaccounted_rg() is also set.

Definition at line 647 of file sam_variant_input_iterator.hpp.

◆ with_unaccounted_rg() [1/2]

bool with_unaccounted_rg ( ) const
inline

Definition at line 653 of file sam_variant_input_iterator.hpp.

◆ with_unaccounted_rg() [2/2]

self_type& with_unaccounted_rg ( bool  value)
inline

Decide whether to add a sample for reads without a read group, when splitting by @RG tag.

If split_by_rg() and this option are both set to true, a special sample for the reads without a read group is added as the last BaseCounts object of the Variant. If this option is set to false, however, all reads without a read group tag or with an invalid read group tag (one that does not appear in the header) are ignored. If split_by_rg() is not set to true, this option is ignored entirely.

See also rg_tag_filter() to sub-set the reads by RG, that is, to ignore reads that have a proper RG tag set, but that belong to a sample that shall be ignored.

Definition at line 671 of file sam_variant_input_iterator.hpp.
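
The rules for assigning a read to a sample, as described for split_by_rg(), with_unaccounted_rg(), and rg_tag_filter(), can be summarized in one function. This is a sketch under stated assumptions: sample_index_for_read is a hypothetical helper, an empty string stands for a read without an RG tag, and a return value of -1 stands for a read that is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the index of the sample that a read contributes to,
// or -1 if the read is ignored. An empty read_rg means the read has no RG tag.
int sample_index_for_read(
    std::string const& read_rg,
    std::vector<std::string> const& header_rg_tags,  // all RG tags in the header
    std::vector<std::string> const& sample_rg_tags,  // RG tags kept after filtering
    bool split_by_rg,
    bool with_unaccounted_rg
) {
    // Without splitting, all reads belong to the single sample.
    if( ! split_by_rg ) {
        return 0;
    }
    // Reads whose RG tag is one of the kept samples go to that sample.
    for( std::size_t i = 0; i < sample_rg_tags.size(); ++i ) {
        if( sample_rg_tags[i] == read_rg ) {
            return static_cast<int>( i );
        }
    }
    // The RG tag is in the header, but was filtered out: the read is ignored,
    // and does not end up in the unaccounted sample either.
    bool const in_header = std::find(
        header_rg_tags.begin(), header_rg_tags.end(), read_rg
    ) != header_rg_tags.end();
    if( in_header ) {
        return -1;
    }
    // No RG tag, or one that is not in the header: use the unaccounted sample,
    // which comes last, if it is requested.
    return with_unaccounted_rg ? static_cast<int>( sample_rg_tags.size() ) : -1;
}
```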

Member Typedef Documentation

◆ difference_type

using difference_type = std::ptrdiff_t

Definition at line 114 of file sam_variant_input_iterator.hpp.

◆ iterator_category

using iterator_category = std::input_iterator_tag

Definition at line 115 of file sam_variant_input_iterator.hpp.

◆ pointer

using pointer = value_type const*

Definition at line 112 of file sam_variant_input_iterator.hpp.

◆ reference

using reference = value_type const&

Definition at line 113 of file sam_variant_input_iterator.hpp.

◆ self_type

using self_type = SamVariantInputIterator

◆ value_type

using value_type = Variant

Definition at line 111 of file sam_variant_input_iterator.hpp.

