#include <genesis/population/format/vcf_record.hpp>
Capture the information of a single SNP/variant line in a VCF/BCF file.
The basic usage to iterate the records/lines of a VCF/BCF file is:
// Prepare and read all input data and instanticate our classes. auto file = HtsFile( infile ); auto header = VcfHeader( file ); auto record = VcfRecord( header ); // Iterate the file line by line. while( record.read_next( file )) { // Work with the record by calling record.*() functions. }
Within the loop, the respective values of the record can be accessed, for example the simple first columns via get_chromosome(), get_position(), etc. For the reference and alternative alleles, as well as their combination (which we here call the "variants"), we offer access functions. Filters can be tested via has_filter(), or their absence (that is: PASS
) via has_pass().
Testing whether certain INFO fileds are set can be done via has_info() or assert_info(). Accessing the values of the INFO column is a bit more involved, as one needs to know the data type in advance, and call the respective get_info_*()
for the data type (string, float, int, flag). This is a bit unfortunate, and we here simply mimic the behaviour of htslib. A more complex any
wrapper migth be possible, but as one would need to cast in the end anyway, we leave it at that solution for now.
The most involved part is the per-sample access to the values as indicated by the FORMAT column. Testing that a particular FORMAT key/id/field is present in the record can be done via has_format() and assert_format(). Accessing the values per sample is then done with an iterator that has to be obtained for each FORMAT id separately via the begin_format_*()
/ end_format_*()
function pair, or for range-based for loops via the get_format_*()
function. See there, and in particular, see the VcfFormatIterator class documentation for details on usage.
Note that VCF for some reason does not seem to support flags in the FORMAT/sample data (the VCF 4.2 Specification does not state this explicitly, but seems to implicitly assume it; they probably do not allow flags because that would lead to variable length per-sample columns). Hence, we also cannot support per-sample flags.
Definition at line 107 of file vcf_record.hpp.
Public Member Functions | |
VcfRecord () | |
Create a default (empty) instance. More... | |
VcfRecord (VcfHeader &header) | |
Create an instance based on a VCF/BCF header. More... | |
VcfRecord (VcfHeader &header, ::bcf1_t *bcf1) | |
Create an instance by copy. More... | |
VcfRecord (VcfRecord &&) | |
VcfRecord (VcfRecord const &)=delete | |
~VcfRecord () | |
void | assert_format (char const *id) const |
Assert that an FORMAT entry with a given id is present in the record. More... | |
void | assert_format (std::string const &id) const |
Assert that an FORMAT entry with a given id is present in the record. More... | |
void | assert_info (char const *) const |
Assert that an INFO entry with a given id is present in the record. More... | |
void | assert_info (std::string const &id) const |
Assert that an INFO entry with a given id is present in the record. More... | |
std::string | at () const |
Return a textual representation of the current record chromosome position. More... | |
VcfFormatIteratorFloat | begin_format_float (std::string const &id) const |
Get the begin iterator over the samples that accesses a certain FORMAT id as a float value. More... | |
VcfFormatIteratorGenotype | begin_format_genotype () const |
Get the begin iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More... | |
VcfFormatIteratorInt | begin_format_int (std::string const &id) const |
Get the begin iterator over the samples that accesses a certain FORMAT id as an int value. More... | |
VcfFormatIteratorString | begin_format_string (std::string const &id) const |
Get the begin iterator over the samples that accesses a certain FORMAT id as a string value. More... | |
::bcf1_t * | data () |
Return the internal htslib bcf1_t record data struct pointer. More... | |
::bcf1_t const * | data () const |
Return the internal htslib bcf1_t record data struct pointer. More... | |
VcfFormatIteratorFloat | end_format_float () const |
Get the end iterator over the samples that accesses a certain FORMAT id as a float value. More... | |
VcfFormatIteratorGenotype | end_format_genotype () const |
Get the end iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More... | |
VcfFormatIteratorInt | end_format_int () const |
Get the end iterator over the samples that accesses a certain FORMAT id as an int value. More... | |
VcfFormatIteratorString | end_format_string () const |
Get the end iterator over the samples that accesses a certain FORMAT id as a string value. More... | |
std::string | get_alternative (size_t index) const |
Get a particular alternative allele (ALT , fifth column of the line). More... | |
std::vector< std::string > | get_alternatives () const |
Get the alternative alleles/sequences of the variant (ALT , fifth column of the line). More... | |
size_t | get_alternatives_count () const |
Get the number of alternative alleles/sequences of the variant (ALT , fifth column of the line). More... | |
std::string | get_chromosome () const |
Get the name of a chromosome/contig/sequence (CHROM , first column of the line). More... | |
std::vector< std::string > | get_filter_ids () const |
Get the list of all filter values (PASS or the names of the non-passing filters) that are applied to the record. More... | |
genesis::utils::Range< VcfFormatIteratorFloat > | get_format_float (std::string const &id) const |
Get an iterator pair over the samples that accesses a certain FORMAT id as an float value. More... | |
genesis::utils::Range< VcfFormatIteratorGenotype > | get_format_genotype () const |
Get an iterator pair over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More... | |
std::vector< std::string > | get_format_ids () const |
Get the list of all format IDs (FORMAT column) that the record contains. More... | |
genesis::utils::Range< VcfFormatIteratorInt > | get_format_int (std::string const &id) const |
Get an iterator pair over the samples that accesses a certain FORMAT id as an int value. More... | |
genesis::utils::Range< VcfFormatIteratorString > | get_format_string (std::string const &id) const |
Get an iterator pair over the samples that accesses a certain FORMAT id as a string value. More... | |
std::string | get_id () const |
Get the ID string of the variant (ID , third column of the line). More... | |
bool | get_info_flag (std::string const &id) const |
Return whehter an INFO flag is set, that is, whether the info value for a given key id is present in the INFO fields. More... | |
std::vector< double > | get_info_float (std::string const &id) const |
Return the info value for the given key id as a vector of float/double. More... | |
void | get_info_float (std::string const &id, std::vector< double > &destination) const |
Write the info value for the given key id to a given destination vector of float/double. More... | |
std::vector< std::string > | get_info_ids () const |
Get the list of all info IDs (INFO column) that the record contains. More... | |
std::vector< int32_t > | get_info_int (std::string const &id) const |
Return the info value for the given key id as a vector of int. More... | |
void | get_info_int (std::string const &id, std::vector< int32_t > &destination) const |
Write the info value for the given key id to a given destination vector of int. More... | |
std::string | get_info_string (std::string const &id) const |
Return the info value for the given key id as a string. More... | |
void | get_info_string (std::string const &id, std::string &destination) const |
Write the info value for the given key id to a given destination string. More... | |
size_t | get_position () const |
Get the position within the chromosome/contig (POS , second column of the line). More... | |
double | get_quality () const |
Get the quality score (QUAL , sixth column of the line). More... | |
std::string | get_reference () const |
Get the reference allele/sequence of the variant (REF , fourth column of the line). More... | |
std::string | get_variant (size_t index) const |
Get a particular variant (REF or ALT allele). More... | |
size_t | get_variant_count () const |
Get the total number of variants (REF and ALT alleles) in the record/line. More... | |
VariantType | get_variant_type (size_t alt_index) const |
Get the variant type of a particular alternative allele/sequence. More... | |
VariantType | get_variant_types () const |
Get the or'ed (union) value of all variant types of the alternative alleles/sequences of the record. More... | |
std::vector< std::string > | get_variants () const |
Shortcut to get both the reference (REF , fourth column of the line) and the alternative (ALT , fifth column of the line) alleles/sequences of the line. More... | |
bool | has_filter (std::string const &filter) const |
Return whether the record has a given filter set. More... | |
bool | has_format (char const *id) const |
Return whether the record has a given FORMAT id present. More... | |
bool | has_format (std::string const &id) const |
Return whether the record has a given FORMAT id present. More... | |
bool | has_info (char const *id) const |
Return whether the record has a given INFO id present. More... | |
bool | has_info (std::string const &id) const |
Return whether the record has a given INFO id present. More... | |
VcfHeader & | header () |
Return the VcfHeader instance associated with this record. More... | |
VcfHeader const & | header () const |
Return the VcfHeader instance associated with this record. More... | |
bool | is_snp () const |
Return whether this variant is a SNP. More... | |
bool | is_snp_or_alt_del () const |
Return whether this variant is a SNP, or a deletion in the alternative. More... | |
VcfRecord & | operator= (VcfRecord &&) |
VcfRecord & | operator= (VcfRecord const &)=delete |
bool | pass_filter () const |
Return whether the record passes the filters, that is, whether PASS is set, or no filters were applied yet (‘FILTER == ’.'`). More... | |
bool | read_next (HtsFile &source) |
Read the next record/line from the given source , and replace the content of this VcfRecord instance. More... | |
void | swap (VcfRecord &other) |
void | unpack () const |
Unpack the htslib bcf1_t record data. More... | |
Public Types | |
enum | VariantType : int { kRef = 0, kSnp = 1, kMnp = 2, kIndel = 4, kOther = 8, kBreakend = 16, kOverlap = 32 } |
Types of variants of alleles that can occur in a record. More... | |
Friends | |
bool | operator& (VcfRecord::VariantType a, VcfRecord::VariantType b) |
And-operator for VariantTypes. More... | |
VcfRecord | ( | ) |
Create a default (empty) instance.
Definition at line 97 of file vcf_record.cpp.
Create an instance based on a VCF/BCF header.
This is the most common use case, where we create a record instance for a given VCF/BCF file, using its header information for access to details laters.
Definition at line 105 of file vcf_record.cpp.
Create an instance by copy.
This calls ::bcf_dup()
from htslib to create a copy of the given record. The provided bcf1
record hence has to be freed elsewhere.
Definition at line 114 of file vcf_record.cpp.
~VcfRecord | ( | ) |
Definition at line 123 of file vcf_record.cpp.
Definition at line 133 of file vcf_record.cpp.
void assert_format | ( | char const * | id | ) | const |
Assert that an FORMAT entry with a given id
is present in the record.
This is the same as has_format(), but throws an exception in case that the FORMAT ID is not present.
Definition at line 539 of file vcf_record.cpp.
void assert_format | ( | std::string const & | id | ) | const |
Assert that an FORMAT entry with a given id
is present in the record.
This is the same as has_format(), but throws an exception in case that the FORMAT ID is not present.
Definition at line 534 of file vcf_record.cpp.
void assert_info | ( | char const * | id | ) | const |
Assert that an INFO entry with a given id
is present in the record.
This is the same as has_info(), but throws an exception in case that the INFO ID is not present.
Definition at line 415 of file vcf_record.cpp.
void assert_info | ( | std::string const & | id | ) | const |
Assert that an INFO entry with a given id
is present in the record.
This is the same as has_info(), but throws an exception in case that the INFO ID is not present.
Definition at line 410 of file vcf_record.cpp.
std::string at | ( | ) | const |
Return a textual representation of the current record chromosome position.
This is either CHROM:POS
or CHROM:POS (ID)
, depending on whether the ID of the record is set (that is, not equal to ‘’.'`). See get_chromosome(), get_position(), and get_id() for details on the individual parts.
Definition at line 196 of file vcf_record.cpp.
VcfFormatIteratorFloat begin_format_float | ( | std::string const & | id | ) | const |
Get the begin iterator over the samples that accesses a certain FORMAT id
as a float value.
This can be used in a for loop over the samples, using the end_format_*()
iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*()
for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 607 of file vcf_record.cpp.
VcfFormatIteratorGenotype begin_format_genotype | ( | ) | const |
Get the begin iterator over the samples that accesses the FORMAT genotype (GT
field/key/id) as a set of VcfGenotype values.
The GT
genotype field is a special case of htslib, that we hence also have to treat specially here. Instead of strings or ints, we model each call as a VcfGenotype instance; see there for details.
The simplest use case is via the get_format_genotype() function; see there for some convenience examples as well.
This can be used in a for loop over the samples, using the end_format_*()
iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*()
for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 552 of file vcf_record.cpp.
VcfFormatIteratorInt begin_format_int | ( | std::string const & | id | ) | const |
Get the begin iterator over the samples that accesses a certain FORMAT id
as an int value.
This can be used in a for loop over the samples, using the end_format_*()
iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*()
for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 588 of file vcf_record.cpp.
VcfFormatIteratorString begin_format_string | ( | std::string const & | id | ) | const |
Get the begin iterator over the samples that accesses a certain FORMAT id
as a string value.
This can be used in a for loop over the samples, using the end_format_*()
iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*()
for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 569 of file vcf_record.cpp.
|
inline |
Return the internal htslib bcf1_t
record data struct pointer.
Definition at line 194 of file vcf_record.hpp.
|
inline |
Return the internal htslib bcf1_t
record data struct pointer.
Definition at line 202 of file vcf_record.hpp.
VcfFormatIteratorFloat end_format_float | ( | ) | const |
Get the end iterator over the samples that accesses a certain FORMAT id
as a float value.
See begin_format_float() for details.
Definition at line 612 of file vcf_record.cpp.
VcfFormatIteratorGenotype end_format_genotype | ( | ) | const |
Get the end iterator over the samples that accesses the FORMAT genotype (GT
field/key/id) as a set of VcfGenotype values.
See begin_format_genotype() for details.
Definition at line 557 of file vcf_record.cpp.
VcfFormatIteratorInt end_format_int | ( | ) | const |
Get the end iterator over the samples that accesses a certain FORMAT id
as an int value.
See begin_format_int() for details.
Definition at line 593 of file vcf_record.cpp.
VcfFormatIteratorString end_format_string | ( | ) | const |
Get the end iterator over the samples that accesses a certain FORMAT id
as a string value.
See begin_format_string() for details.
Definition at line 574 of file vcf_record.cpp.
std::string get_alternative | ( | size_t | index | ) | const |
Get a particular alternative allele (ALT
, fifth column of the line).
This is equivalent to calling get_alternatives() to obtain a particular alternative at an index
, but faster if only a particular entry is needed or when iterating them. Valid indices are in the range [ 0, get_alternatives_count() )
.
Definition at line 224 of file vcf_record.cpp.
std::vector< std::string > get_alternatives | ( | ) | const |
Get the alternative alleles/sequences of the variant (ALT
, fifth column of the line).
Definition at line 212 of file vcf_record.cpp.
size_t get_alternatives_count | ( | ) | const |
Get the number of alternative alleles/sequences of the variant (ALT
, fifth column of the line).
This simply gives their count, which is identical to get_alternatives().size()
. See also get_variant_count(), which gives the number of total variants (reference and alternative alleles), and which hence is get_alternatives_count() + 1
. Although these functions only differ in the extra counting of the reference allele, we offer both so that using them reflects the intention (are we interested in just the alternative, or all variants?).
Definition at line 239 of file vcf_record.cpp.
std::string get_chromosome | ( | ) | const |
Get the name of a chromosome/contig/sequence (CHROM
, first column of the line).
Definition at line 170 of file vcf_record.cpp.
std::vector< std::string > get_filter_ids | ( | ) | const |
Get the list of all filter values (PASS
or the names of the non-passing filters) that are applied to the record.
For example, the lines
#CHROM POS ID REF ALT QUAL FILTER [...] 20 14370 rs6054257 G A 29 PASS [...] 20 17330 . T A 3 q10 [...]
would return {"PASS"}
and {"q10"}
, respectively.
Definition at line 344 of file vcf_record.cpp.
genesis::utils::Range< VcfFormatIteratorFloat > get_format_float | ( | std::string const & | id | ) | const |
Get an iterator pair over the samples that accesses a certain FORMAT id
as an float value.
This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*()
and end_format_*()
for the respective begin and end iterators, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 617 of file vcf_record.cpp.
genesis::utils::Range< VcfFormatIteratorGenotype > get_format_genotype | ( | ) | const |
Get an iterator pair over the samples that accesses the FORMAT genotype (GT
field/key/id) as a set of VcfGenotype values.
See begin_format_genotype() for why we offer a special function to get the genotype field.
In order to simply print the VCF-like genotype information for the samples of a record, the vcf_genotype_string() convenience function can be used:
for( auto gt_iterator : record_iterator->get_format_genotype() ) { std::cout << vcf_genotype_string( gt_iterator.get_values() ) << "\n"; }
This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*()
and end_format_*()
for the respective begin and end iterators, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 562 of file vcf_record.cpp.
std::vector< std::string > get_format_ids | ( | ) | const |
Get the list of all format IDs (FORMAT
column) that the record contains.
For example, the line
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT [...] 20 14370 rs6054257 G A,CG 29 PASS NS=3;DP=14 GT:GQ:DP:HQ
would return a list containing {"GT", "GQ", "DP", "HQ"}
.
Definition at line 512 of file vcf_record.cpp.
genesis::utils::Range< VcfFormatIteratorInt > get_format_int | ( | std::string const & | id | ) | const |
Get an iterator pair over the samples that accesses a certain FORMAT id
as an int value.
This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*()
and end_format_*()
for the respective begin and end iterators, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 598 of file vcf_record.cpp.
genesis::utils::Range< VcfFormatIteratorString > get_format_string | ( | std::string const & | id | ) | const |
Get an iterator pair over the samples that accesses a certain FORMAT id
as a string value.
This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*()
and end_format_*()
for the respective begin and end iterators, and see the other begin_format_*()
, end_format_*()
, and get_format_*()
functions for the equivalents for other FORMAT data types.
Definition at line 579 of file vcf_record.cpp.
std::string get_id | ( | ) | const |
Get the ID string of the variant (ID
, third column of the line).
Another instance where the overloaded term "ID" is used in VCF and in htslib. Here, it stands for the identifier list of the line for SNP databases.
Definition at line 190 of file vcf_record.cpp.
bool get_info_flag | ( | std::string const & | id | ) | const |
Return whehter an INFO flag is set, that is, whether the info value for a given key id
is present in the INFO fields.
This is meant for flags and returns whether the flag has been set or not in the record.
Definition at line 502 of file vcf_record.cpp.
std::vector< double > get_info_float | ( | std::string const & | id | ) | const |
Return the info value for the given key id
as a vector of float/double.
While htslib uses float
, we use double
throughout genesis, and hence return this here.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 454 of file vcf_record.cpp.
void get_info_float | ( | std::string const & | id, |
std::vector< double > & | destination | ||
) | const |
Write the info value for the given key id
to a given destination
vector of float/double.
If the destination
vector is re-used between calles for different records, this is the faster variant that saves on memory allocations.
While htslib uses float
, we use double
throughout genesis, and hence return this here.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 461 of file vcf_record.cpp.
std::vector< std::string > get_info_ids | ( | ) | const |
Get the list of all info IDs (INFO
column) that the record contains.
For example, the line
#CHROM POS ID REF ALT QUAL FILTER INFO [...] 20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 [...]
would return a list containing {"NS", "DP", "AF", "DB", "H2"}
.
Definition at line 386 of file vcf_record.cpp.
std::vector< int32_t > get_info_int | ( | std::string const & | id | ) | const |
Return the info value for the given key id
as a vector of int.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 478 of file vcf_record.cpp.
void get_info_int | ( | std::string const & | id, |
std::vector< int32_t > & | destination | ||
) | const |
Write the info value for the given key id
to a given destination
vector of int.
If the destination
vector is re-used between calles for different records, this is the faster variant that saves on memory allocations.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 485 of file vcf_record.cpp.
std::string get_info_string | ( | std::string const & | id | ) | const |
Return the info value for the given key id
as a string.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 424 of file vcf_record.cpp.
void get_info_string | ( | std::string const & | id, |
std::string & | destination | ||
) | const |
Write the info value for the given key id
to a given destination
string.
If the destination
string is re-used between calles for different records, this is the faster variant that saves on memory allocations.
The function throws an exception if the requested id
is not present (in the header or the record) or if the value behind that id
is of a different type.
Definition at line 431 of file vcf_record.cpp.
size_t get_position | ( | ) | const |
Get the position within the chromosome/contig (POS
, second column of the line).
We report the position as given in the VCF/BCF file, that is, 1-based!
Definition at line 181 of file vcf_record.cpp.
double get_quality | ( | ) | const |
Get the quality score (QUAL
, sixth column of the line).
Definition at line 335 of file vcf_record.cpp.
std::string get_reference | ( | ) | const |
Get the reference allele/sequence of the variant (REF
, fourth column of the line).
Definition at line 202 of file vcf_record.cpp.
std::string get_variant | ( | size_t | index | ) | const |
Get a particular variant (REF
or ALT
allele).
This is equivalent to calling get_variants() to obtain a particular alternative at an index
, but faster if only a particular entry is needed or when iterating them. Valid indices are in the range [ 0, get_variant_count() )
.
Note that compared to get_alternative() and get_alternatives(), the indices of the alternative alleles are shifted, as the reference allele is at index 0
.
Definition at line 260 of file vcf_record.cpp.
size_t get_variant_count | ( | ) | const |
Get the total number of variants (REF
and ALT
alleles) in the record/line.
This is equivalent to get_variants().size()
, and to get_alternatives_count() + 1
(taking the REF
into account).
Definition at line 275 of file vcf_record.cpp.
VcfRecord::VariantType get_variant_type | ( | size_t | alt_index | ) | const |
Get the variant type of a particular alternative allele/sequence.
The alt_index
is the 0-based index of the alternative allele/sequence for which we want the variant type. This corresponds to the indices in the get_alternatives() result vector, which are hence in the range [ 0, get_alternatives_count() )
.
Definition at line 287 of file vcf_record.cpp.
VcfRecord::VariantType get_variant_types | ( | ) | const |
Get the or'ed (union) value of all variant types of the alternative alleles/sequences of the record.
This can be used to simply test whether a particular type of variant appears at all in a given record:
if( record.get_variant_types() & VcfRecord::VariantType::kSnp ) { ... }
See operator& ( VariantType a, VariantType b ) for details.
This is a simple wrapper for ::bcf_get_variant_types()
from htslib, which however is not ideally named, as "variants" seems to mean REF+ALT in VCF terminology, but the flag for REF has value 0 and hence is (in a sense) always set in the result. Of course, this makes sense, as we always have a reference variant. But technically, we cannot test for this, so this function only is useful for alternative alleles, and not all variants. Still, we follow their terminology here.
Definition at line 282 of file vcf_record.cpp.
std::vector< std::string > get_variants | ( | ) | const |
Shortcut to get both the reference (REF
, fourth column of the line) and the alternative (ALT
, fifth column of the line) alleles/sequences of the line.
This simply combines get_reference() and get_alternatives(). Note that hence the indices of the alternatives are shifted as compared to get_alternatives(), as the reference allele is at index 0
.
Definition at line 248 of file vcf_record.cpp.
bool has_filter | ( | std::string const & | filter | ) | const |
Return whether the record has a given filter
set.
For example, provided with filter == "q10"
, the function returns whether the q10
filter is set for the record (indicating that the record failed that filter test).
Definition at line 354 of file vcf_record.cpp.
bool has_format | ( | char const * | id | ) | const |
Return whether the record has a given FORMAT id
present.
Definition at line 528 of file vcf_record.cpp.
bool has_format | ( | std::string const & | id | ) | const |
Return whether the record has a given FORMAT id
present.
Definition at line 522 of file vcf_record.cpp.
bool has_info | ( | char const * | id | ) | const |
Return whether the record has a given INFO id
present.
Definition at line 401 of file vcf_record.cpp.
bool has_info | ( | std::string const & | id | ) | const |
Return whether the record has a given INFO id
present.
Definition at line 396 of file vcf_record.cpp.
|
inline |
Return the VcfHeader instance associated with this record.
This can also be used to obtain internal htslib bcf_hdr_t
header data struct pointer, if needed for any functionality that is not offered by our wrapper.
Definition at line 213 of file vcf_record.hpp.
|
inline |
Return the VcfHeader instance associated with this record.
This can also be used to obtain internal htslib bcf_hdr_t
header data struct pointer, if needed for any functionality that is not offered by our wrapper.
Definition at line 224 of file vcf_record.hpp.
bool is_snp | ( | ) | const |
Return whether this variant is a SNP.
This is simply a wrapper for the htslib function ::bcf_is_snp()
. It returns true
iff the reference and all alternative alleles/sequence are single characters (and none of them is a ‘’*'` missing allele character).
Definition at line 304 of file vcf_record.cpp.
bool is_snp_or_alt_del | ( | ) | const |
Return whether this variant is a SNP, or a deletion in the alternative.
This is similar to is_snp(), but also allows the alternatives to have a deletion.
Definition at line 309 of file vcf_record.cpp.
Definition at line 140 of file vcf_record.cpp.
bool pass_filter | ( | ) | const |
Return whether the record passes the filters, that is, whether PASS
is set, or no filters were applied yet (‘FILTER == ’.'`).
This is identical to calling has_filter() with the argument "PASS"
, and is the preferred method to test whether a record line has passed all filters (return value true
), or not (return value false
).
Definition at line 371 of file vcf_record.cpp.
bool read_next | ( | HtsFile & | source | ) |
Read the next record/line from the given source
, and replace the content of this VcfRecord instance.
This is the function used to iterate a VCF/BCF file. It returns true
on success, and can hence be used as follows:
while( record.read_next( file )) { // work with the current record line }
Within that loop, all functions of the record instance can be used.
Definition at line 630 of file vcf_record.cpp.
void swap | ( | VcfRecord & | other | ) |
Definition at line 149 of file vcf_record.cpp.
void unpack | ( | ) | const |
Unpack the htslib bcf1_t
record data.
Some external functions might want to work with the bcf1_t
record data struct pointer directly, but need to unpack the data first. This is not possible when the VcfRecord is passed by const reference for example, so we offer this function that unpacks a const VcfRecord, so that its internal htslib data is available.
Definition at line 165 of file vcf_record.cpp.
|
friend |
And-operator for VariantTypes.
The function get_variant_types() returns the or'ed (union) values of all variant types that appear in the alternative alleles of the record. Hence, this and-operator can be used to disentangle them and test whether a particular variant occurs in the record:
if( record.get_variant_types() & Record::VariantType::kSnp ) { ... }
See get_variant_types() for details.
It's a bit ugly to return a bool from such a comparison, but for now, it works. Should more complex use cases arise in the future, this might change.
Definition at line 147 of file vcf_record.hpp.
|
strong |
Types of variants of alleles that can occur in a record.
Corresponds to the VCF_*
macro constants defined by htslib. We statically assert that these have the same values.
Enumerator | |
---|---|
kRef | |
kSnp | |
kMnp | |
kIndel | |
kOther | |
kBreakend | |
kOverlap |
Definition at line 121 of file vcf_record.hpp.