#include <genesis/population/formats/sam_variant_input_iterator.hpp>
Input iterator for SAM/BAM/CRAM files that produces a Variant per genome position.
We expect the input file to be sorted by position. Positions with no reads overlapping are skipped.
Exemplary usage:
    auto sam_it = SamVariantInputIterator( "/path/to/file.sam" );
    sam_it.min_map_qual( 40 );
    for( auto const& var : sam_it ) {
        std::cout << var.chromosome << "\t" << var.position << "\t";
        for( auto const& bs : var.samples ) {
            std::cout << "\t";
            to_sync( bs, std::cout );
        }
        std::cout << "\n";
    }
By default, as above, all reads are considered to belong to the same sample. In that case, the inner loop over samples above only ever visits the single BaseCounts object stored in the Variant. Reads can also be split by read group (@RG), see split_by_rg() and with_unaccounted_rg() for details. In that case, the Variant contains one BaseCounts object per read group, plus potentially a special one for unaccounted reads with no proper RG. The read groups can be filtered further by setting rg_tag_filter(), so that only certain RG tags are produced as samples.
Definition at line 102 of file sam_variant_input_iterator.hpp.
Public Member Functions
SamVariantInputIterator ()
Create a default instance, with no input. This is also the past-the-end iterator.
SamVariantInputIterator (SamVariantInputIterator &&)=default
SamVariantInputIterator (SamVariantInputIterator const &)=default
SamVariantInputIterator (std::string const &input_file)
SamVariantInputIterator (std::string const &input_file, std::unordered_set< std::string > const &rg_tag_filter, bool inverse_rg_tag_filter=false)
~SamVariantInputIterator ()=default
Iterator | begin () const
Iterator | end () const
uint32_t | flags_exclude_all () const
self_type & | flags_exclude_all (uint32_t value)
Do not use reads with all bits set in value present in the FLAG field of the read.
uint32_t | flags_exclude_any () const
self_type & | flags_exclude_any (uint32_t value)
Do not use reads with any bits set in value present in the FLAG field of the read.
uint32_t | flags_include_all () const
self_type & | flags_include_all (uint32_t value)
Only use reads with all bits set in value present in the FLAG field of the read.
uint32_t | flags_include_any () const
self_type & | flags_include_any (uint32_t value)
Only use reads with any bits set in value present in the FLAG field of the read.
std::string const & | input_file () const
self_type & | input_file (std::string const &value)
Set the input file.
bool | inverse_rg_tag_filter () const
self_type & | inverse_rg_tag_filter (bool value)
Reverse the meaning of the list of sample names given by rg_tag_filter().
int | max_accumulation_depth () const
self_type & | max_accumulation_depth (int value)
Set the maximum depth (coverage) at a given position that is actually processed.
int | max_depth () const
self_type & | max_depth (int value)
Set the maximum depth (coverage) at a given position to be considered.
int | min_base_qual () const
self_type & | min_base_qual (int value)
Set the minimum phred-scaled per-base quality score for a nucleotide to be considered.
int | min_depth () const
self_type & | min_depth (int value)
Set the minimum depth (coverage) at a given position to be considered.
int | min_map_qual () const
self_type & | min_map_qual (int value)
Set the minimum phred-scaled mapping quality score for a read in the input file to be considered.
SamVariantInputIterator & | operator= (SamVariantInputIterator &&)=default
SamVariantInputIterator & | operator= (SamVariantInputIterator const &)=default
std::unordered_set< std::string > const & | rg_tag_filter () const
self_type & | rg_tag_filter (std::unordered_set< std::string > const &value)
Set the sample names used for filtering reads by their RG read group tag.
bool | split_by_rg () const
self_type & | split_by_rg (bool value)
If set to true, instead of reading all mapped reads as a single sample, split them by the @RG read group tag.
bool | with_unaccounted_rg () const
self_type & | with_unaccounted_rg (bool value)
Decide whether to add a sample for reads without a read group, when splitting by @RG tag.
Public Types
using | difference_type = std::ptrdiff_t
using | iterator_category = std::input_iterator_tag
using | pointer = value_type const *
using | reference = value_type const &
using | self_type = SamVariantInputIterator
using | value_type = Variant
Classes
class | Iterator
Iterator over loci of the input sources.
SamVariantInputIterator ( )
inline
Create a default instance, with no input. This is also the past-the-end iterator.
Definition at line 346 of file sam_variant_input_iterator.hpp.
SamVariantInputIterator ( std::string const & input_file )
inline explicit
Definition at line 350 of file sam_variant_input_iterator.hpp.
SamVariantInputIterator ( std::string const & input_file, std::unordered_set< std::string > const & rg_tag_filter, bool inverse_rg_tag_filter = false )
Definition at line 933 of file sam_variant_input_iterator.cpp.
|
default |
|
default |
|
default |
Iterator begin ( ) const
inline
Definition at line 374 of file sam_variant_input_iterator.hpp.
Iterator end ( ) const
inline
Definition at line 386 of file sam_variant_input_iterator.hpp.
uint32_t flags_exclude_all ( ) const
inline
Definition at line 474 of file sam_variant_input_iterator.hpp.
self_type & flags_exclude_all ( uint32_t value )
inline
Do not use reads for which all bits set in value are present in the FLAG field of the read.
This is equivalent to the -G setting in samtools view.
The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and also allow different capitalization and delimiters such as dashes and underscores in the flag names.
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.
Definition at line 497 of file sam_variant_input_iterator.hpp.
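The numeric notations described above (0x for hex, a leading 0 for octal, decimal otherwise) are the usual C conventions, so they can be illustrated with the standard library alone. The following is a minimal sketch, not the parser used by this class: parse_numeric_flag_value is a hypothetical helper name, and lists of flag names are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the library): parse a numeric FLAG value.
// Passing base 0 lets std::stoul pick the base from the prefix:
// "0x..." is read as hex, a leading "0" as octal, anything else as decimal.
inline std::uint32_t parse_numeric_flag_value( std::string const& text )
{
    std::size_t pos = 0;
    unsigned long const value = std::stoul( text, &pos, 0 );

    // Reject trailing characters, such as the names in a flag name list.
    if( pos != text.size() ) {
        throw std::invalid_argument( "Not a numeric flag value: " + text );
    }
    return static_cast<std::uint32_t>( value );
}
```

With this helper, "0x10", "020", and "16" all yield the same value, 16.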
uint32_t flags_exclude_any ( ) const
inline
Definition at line 504 of file sam_variant_input_iterator.hpp.
self_type & flags_exclude_any ( uint32_t value )
inline
Do not use reads for which any of the bits set in value are present in the FLAG field of the read.
This is equivalent to the -F / --excl-flags / --exclude-flags setting in samtools view.
The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and also allow different capitalization and delimiters such as dashes and underscores in the flag names.
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.
Definition at line 527 of file sam_variant_input_iterator.hpp.
uint32_t flags_include_all ( ) const
inline
Definition at line 416 of file sam_variant_input_iterator.hpp.
self_type & flags_include_all ( uint32_t value )
inline
Only use reads for which all bits set in value are present in the FLAG field of the read.
This is equivalent to the -f / --require-flags setting in samtools view.
The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and also allow different capitalization and delimiters such as dashes and underscores in the flag names.
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.
Definition at line 439 of file sam_variant_input_iterator.hpp.
uint32_t flags_include_any ( ) const
inline
Definition at line 445 of file sam_variant_input_iterator.hpp.
self_type & flags_include_any ( uint32_t value )
inline
Only use reads for which any of the bits set in value are present in the FLAG field of the read.
This is equivalent to the --rf / --incl-flags / --include-flags setting in samtools view.
The value can be specified in hex by beginning with 0x (i.e., /^0x[0-9A-F]+/), in octal by beginning with 0 (i.e., /^0[0-7]+/), as a decimal number not beginning with '0', or as a comma-, plus-, or space-separated list of flag names. We are more lenient in parsing flag names than samtools, and also allow different capitalization and delimiters such as dashes and underscores in the flag names.
See http://www.htslib.org/doc/samtools-flags.html and https://broadinstitute.github.io/picard/explain-flags.html for details on the flag values, and see https://www.htslib.org/doc/samtools-view.html for their usage in samtools.
Definition at line 468 of file sam_variant_input_iterator.hpp.
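The four flag settings (flags_include_all(), flags_include_any(), flags_exclude_all(), flags_exclude_any()) can be read as bit mask tests on the FLAG field of a read. The sketch below only illustrates that documented meaning and is not the implementation of this class. FlagFilters and passes_flag_filters are hypothetical names, and it is an assumption here that a value of 0 deactivates the respective filter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four settings; 0 is assumed to mean "not used".
struct FlagFilters
{
    std::uint32_t include_all = 0; // like samtools view -f
    std::uint32_t include_any = 0; // like samtools view --rf
    std::uint32_t exclude_all = 0; // like samtools view -G
    std::uint32_t exclude_any = 0; // like samtools view -F
};

// Return true if a read with the given FLAG value passes all active filters.
inline bool passes_flag_filters( std::uint32_t flag, FlagFilters const& f )
{
    // Only use reads that have all of the given bits set.
    if( f.include_all != 0 && ( flag & f.include_all ) != f.include_all ) {
        return false;
    }
    // Only use reads that have at least one of the given bits set.
    if( f.include_any != 0 && ( flag & f.include_any ) == 0 ) {
        return false;
    }
    // Do not use reads that have all of the given bits set.
    if( f.exclude_all != 0 && ( flag & f.exclude_all ) == f.exclude_all ) {
        return false;
    }
    // Do not use reads that have at least one of the given bits set.
    if( f.exclude_any != 0 && ( flag & f.exclude_any ) != 0 ) {
        return false;
    }
    return true;
}
```

For example, setting exclude_any to 0x4 | 0x400 rejects reads that are unmapped (0x4) or marked as duplicates (0x400), while a read with only the paired bit (0x1) set still passes.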
std::string const & input_file ( ) const
inline
Definition at line 395 of file sam_variant_input_iterator.hpp.
self_type & input_file ( std::string const & value )
inline
Set the input file.
This overwrites the file if it was already given in the constructor. Shall not be called after iteration has been started.
Definition at line 406 of file sam_variant_input_iterator.hpp.
bool inverse_rg_tag_filter ( ) const
inline
Definition at line 705 of file sam_variant_input_iterator.hpp.
self_type & inverse_rg_tag_filter ( bool value )
inline
Reverse the meaning of the list of sample names given by rg_tag_filter().
See there for details.
Definition at line 715 of file sam_variant_input_iterator.hpp.
int max_accumulation_depth ( ) const
inline
Definition at line 609 of file sam_variant_input_iterator.hpp.
self_type & max_accumulation_depth ( int value )
inline
Set the maximum depth (coverage) at a given position that is actually processed.
The max_depth() setting excludes sites that have depth/coverage above a given value. However, one might want to keep those sites in the iteration, and yet limit the number of bases being tallied up. This setting is mostly meant as a memory saver, in order to avoid piling up too many sites at the same time. When set to a value greater than 0, only that many bases are considered, and any further reads overlapping the site are not taken into account.
Definition at line 624 of file sam_variant_input_iterator.hpp.
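The effect of this cap can be pictured as a tally that stops once the limit is reached. This is a simplified sketch under stated assumptions, not the pileup code of this class: SimpleCounts is a hypothetical stand-in for BaseCounts, tally_with_cap is a hypothetical name, and the bases overlapping one position are passed in as a plain string.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Counts of A, C, G, T at one position; a simplified stand-in for BaseCounts.
using SimpleCounts = std::array<std::size_t, 4>;

// Tally the bases seen at one position. If max_accumulation_depth is greater
// than 0, stop after that many bases have been tallied; 0 means no cap.
inline SimpleCounts tally_with_cap( std::string const& bases, int max_accumulation_depth )
{
    assert( max_accumulation_depth >= 0 );
    SimpleCounts counts{{ 0, 0, 0, 0 }};
    std::size_t tallied = 0;

    for( char const base : bases ) {
        if( max_accumulation_depth > 0 &&
            tallied >= static_cast<std::size_t>( max_accumulation_depth )
        ) {
            break; // cap reached, further bases are not taken into account
        }
        switch( base ) {
            case 'A': ++counts[0]; break;
            case 'C': ++counts[1]; break;
            case 'G': ++counts[2]; break;
            case 'T': ++counts[3]; break;
            default: continue; // other characters are skipped in this sketch
        }
        ++tallied;
    }
    return counts;
}
```

In contrast to max_depth(), the position itself is kept in this picture; only the number of bases contributing to its counts is limited.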
int max_depth ( ) const
inline
Definition at line 592 of file sam_variant_input_iterator.hpp.
self_type & max_depth ( int value )
inline
Set the maximum depth (coverage) at a given position to be considered.
Positions in the genome with more than the given maximum depth are skipped. If set to 0 (default), the value is not used as a threshold.
Definition at line 603 of file sam_variant_input_iterator.hpp.
int min_base_qual ( ) const
inline
Definition at line 555 of file sam_variant_input_iterator.hpp.
self_type & min_base_qual ( int value )
inline
Set the minimum phred-scaled per-base quality score for a nucleotide to be considered.
Any base that has a quality score below the given value is not taken into account in the per-position tally of counts.
Definition at line 566 of file sam_variant_input_iterator.hpp.
int min_depth ( ) const
inline
Definition at line 576 of file sam_variant_input_iterator.hpp.
self_type & min_depth ( int value )
inline
Set the minimum depth (coverage) at a given position to be considered.
Positions in the genome with fewer than the given minimum depth are skipped.
Definition at line 586 of file sam_variant_input_iterator.hpp.
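Taken together, min_depth() and max_depth() describe a simple range check on the depth of each position. The following sketch only restates that documented behavior; depth_is_accepted is a hypothetical name and not a function of this class.

```cpp
#include <cassert>

// Return true if a position with the given depth would be kept.
// Positions below min_depth are skipped. Positions above max_depth are
// skipped as well, unless max_depth is 0, in which case it is not used.
inline bool depth_is_accepted( int depth, int min_depth, int max_depth )
{
    assert( min_depth >= 0 && max_depth >= 0 );
    if( depth < min_depth ) {
        return false;
    }
    if( max_depth > 0 && depth > max_depth ) {
        return false;
    }
    return true;
}
```

Both thresholds are inclusive in this sketch: a position whose depth is exactly equal to min_depth or max_depth is kept.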
int min_map_qual ( ) const
inline
Definition at line 537 of file sam_variant_input_iterator.hpp.
self_type & min_map_qual ( int value )
inline
Set the minimum phred-scaled mapping quality score for a read in the input file to be considered.
Any read that is below the given value of mapping quality will be completely discarded, and its bases not taken into account.
Definition at line 549 of file sam_variant_input_iterator.hpp.
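For choosing a threshold, it helps to recall that a phred-scaled quality Q corresponds to an error probability of 10^(-Q/10). The sketch below shows this conversion and the comparison described above. Both function names are hypothetical and not part of this class.

```cpp
#include <cassert>
#include <cmath>

// Convert a phred-scaled quality score Q to its error probability 10^(-Q/10).
inline double phred_to_error_probability( int quality )
{
    assert( quality >= 0 );
    return std::pow( 10.0, -quality / 10.0 );
}

// A read is kept if its mapping quality is not below the threshold.
inline bool read_passes_min_map_qual( int map_qual, int min_map_qual )
{
    return map_qual >= min_map_qual;
}
```

A threshold of 40, as in the usage example of the class description, hence corresponds to a probability of 1 in 10,000 that the mapping position is wrong, and a threshold of 20 to 1 in 100.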
|
default |
|
default |
std::unordered_set< std::string > const & rg_tag_filter ( ) const
inline
Definition at line 677 of file sam_variant_input_iterator.hpp.
self_type & rg_tag_filter ( std::unordered_set< std::string > const & value )
inline
Set the sample names used for filtering reads by their RG read group tag.
Only used when split_by_rg() is set to true. Reads that have an RG read group tag that appears in the header of the input file, but is not present in the value list given here (or in the constructor of the class), will be ignored. That is, they will also not appear in the "unaccounted" sample, independently of the setting of with_unaccounted_rg(). The unaccounted sample will only contain data from those reads that do not have an RG tag at all, or one that does not appear in the header.
See also inverse_rg_tag_filter() to invert this setting. That is, instead of only using samples based on the RG tags given in this list, use all but the given RG tags.
When the given value list is empty, the filtering by RG read group tag is deactivated (which is also the default), independently of the inverse_rg_tag_filter() setting.
Definition at line 699 of file sam_variant_input_iterator.hpp.
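The interplay of rg_tag_filter(), inverse_rg_tag_filter(), and with_unaccounted_rg() can be summarized as a decision per read. The following is a sketch of the documented rules only, assuming that split_by_rg() is set to true; it is not the code of this class. ReadFate, RgSettings, and classify_read are hypothetical names, and a read without an RG tag is represented by an empty string.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// What happens to a read when splitting by read group.
enum class ReadFate { kSample, kUnaccounted, kIgnored };

// Hypothetical bundle of the relevant settings and of the @RG header tags.
struct RgSettings
{
    std::unordered_set<std::string> header_rg_tags; // RG tags listed in the header
    std::unordered_set<std::string> rg_tag_filter;  // empty means no filtering
    bool inverse_rg_tag_filter = false;
    bool with_unaccounted_rg = false;
};

inline ReadFate classify_read( std::string const& read_rg, RgSettings const& s )
{
    // Reads without an RG tag, or with one that is not in the header,
    // can only ever end up in the unaccounted sample.
    bool const known = !read_rg.empty() && s.header_rg_tags.count( read_rg ) > 0;
    if( !known ) {
        return s.with_unaccounted_rg ? ReadFate::kUnaccounted : ReadFate::kIgnored;
    }

    // An empty filter list deactivates filtering, also for the inverse setting.
    if( s.rg_tag_filter.empty() ) {
        return ReadFate::kSample;
    }

    // Reads with a proper RG tag that is filtered out are ignored,
    // and never moved to the unaccounted sample.
    bool const listed = s.rg_tag_filter.count( read_rg ) > 0;
    bool const keep = s.inverse_rg_tag_filter ? !listed : listed;
    return keep ? ReadFate::kSample : ReadFate::kIgnored;
}
```

For instance, with header tags S1 and S2 and a filter list containing only S1, reads of S2 are ignored, whereas reads without any RG tag go to the unaccounted sample if with_unaccounted_rg() is set.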
bool split_by_rg ( ) const
inline
Definition at line 634 of file sam_variant_input_iterator.hpp.
self_type & split_by_rg ( bool value )
inline
If set to true, instead of reading all mapped reads as a single sample, split them by the @RG read group tag.
This way, multiple BaseCounts objects are created in the resulting Variant, one for each read group, and potentially an additional one for the unaccounted reads that do not have a read group, if with_unaccounted_rg() is also set.
Definition at line 647 of file sam_variant_input_iterator.hpp.
bool with_unaccounted_rg ( ) const
inline
Definition at line 653 of file sam_variant_input_iterator.hpp.
self_type & with_unaccounted_rg ( bool value )
inline
Decide whether to add a sample for reads without a read group, when splitting by @RG tag.
If split_by_rg() and this option are both set to true, a special sample for the reads without a read group is added as the last BaseCounts object of the Variant. If this option is set to false, however, all reads without a read group tag or with an invalid read group tag (one that does not appear in the header) are ignored. If split_by_rg() is not set to true, this option is ignored completely.
See also rg_tag_filter() to subset the reads by RG, that is, to ignore reads that have a proper RG tag set, but that belong to a sample that shall be ignored.
Definition at line 671 of file sam_variant_input_iterator.hpp.
using difference_type = std::ptrdiff_t
Definition at line 114 of file sam_variant_input_iterator.hpp.
using iterator_category = std::input_iterator_tag
Definition at line 115 of file sam_variant_input_iterator.hpp.
using pointer = value_type const*
Definition at line 112 of file sam_variant_input_iterator.hpp.
using reference = value_type const&
Definition at line 113 of file sam_variant_input_iterator.hpp.
using self_type = SamVariantInputIterator
Definition at line 110 of file sam_variant_input_iterator.hpp.
using value_type = Variant
Definition at line 111 of file sam_variant_input_iterator.hpp.