A library for working with phylogenetic and population genetic data.
v0.27.0
VcfRecord Class Reference

#include <genesis/population/formats/vcf_record.hpp>

Detailed Description

Capture the information of a single SNP/variant line in a VCF/BCF file.

The basic usage to iterate the records/lines of a VCF/BCF file is:

// Prepare and read all input data and instanticate our classes.
auto file = HtsFile( infile );
auto header = VcfHeader( file );
auto record = VcfRecord( header );

// Iterate the file line by line.
while( record.read_next( file )) {
    // Work with the record by calling record.*() functions.
}

Within the loop, the respective values of the record can be accessed, for example the simple first columns via get_chromosome(), get_position(), etc. For the reference and alternative alleles, as well as their combination (which we here call the "variants"), we offer access functions. Filters can be tested via has_filter(), or their absence (that is: PASS) via has_pass().

Testing whether certain INFO fileds are set can be done via has_info() or assert_info(). Accessing the values of the INFO column is a bit more involved, as one needs to know the data type in advance, and call the respective get_info_*() for the data type (string, float, int, flag). This is a bit unfortunate, and we here simply mimic the behaviour of htslib. A more complex any wrapper migth be possible, but as one would need to cast in the end anyway, we leave it at that solution for now.

The most involved part is the per-sample access to the values as indicated by the FORMAT column. Testing that a particular FORMAT key/id/field is present in the record can be done via has_format() and assert_format(). Accessing the values per sample is then done with an iterator that has to be obtained for each FORMAT id separately via the begin_format_*() / end_format_*() function pair, or for range-based for loops via the get_format_*() function. See there, and in particular, see the VcfFormatIterator class documentation for details on usage.

Note that VCF for some reason does not seem to support flags in the FORMAT/sample data (the VCF 4.2 Specification does not state this explicitly, but seems to implicitly assume it; they probably do not allow flags because that would lead to variable length per-sample columns). Hence, we also cannot support per-sample flags.

Definition at line 107 of file vcf_record.hpp.

Public Member Functions

 VcfRecord ()
 Create a default (empty) instance. More...
 
 VcfRecord (VcfHeader &header)
 Create an instance based on a VCF/BCF header. More...
 
 VcfRecord (VcfHeader &header, ::bcf1_t *bcf1)
 Create an instance by copy. More...
 
 VcfRecord (VcfRecord &&)
 
 VcfRecord (VcfRecord const &)=delete
 
 ~VcfRecord ()
 
void assert_format (char const *id) const
 Assert that an FORMAT entry with a given id is present in the record. More...
 
void assert_format (std::string const &id) const
 Assert that an FORMAT entry with a given id is present in the record. More...
 
void assert_info (char const *) const
 Assert that an INFO entry with a given id is present in the record. More...
 
void assert_info (std::string const &id) const
 Assert that an INFO entry with a given id is present in the record. More...
 
std::string at () const
 Return a textual representation of the current record chromosome position. More...
 
VcfFormatIteratorFloat begin_format_float (std::string const &id) const
 Get the begin iterator over the samples that accesses a certain FORMAT id as a float value. More...
 
VcfFormatIteratorGenotype begin_format_genotype () const
 Get the begin iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More...
 
VcfFormatIteratorInt begin_format_int (std::string const &id) const
 Get the begin iterator over the samples that accesses a certain FORMAT id as an int value. More...
 
VcfFormatIteratorString begin_format_string (std::string const &id) const
 Get the begin iterator over the samples that accesses a certain FORMAT id as a string value. More...
 
::bcf1_t * data ()
 Return the internal htslib bcf1_t record data struct pointer. More...
 
::bcf1_t const * data () const
 Return the internal htslib bcf1_t record data struct pointer. More...
 
VcfFormatIteratorFloat end_format_float () const
 Get the end iterator over the samples that accesses a certain FORMAT id as a float value. More...
 
VcfFormatIteratorGenotype end_format_genotype () const
 Get the end iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More...
 
VcfFormatIteratorInt end_format_int () const
 Get the end iterator over the samples that accesses a certain FORMAT id as an int value. More...
 
VcfFormatIteratorString end_format_string () const
 Get the end iterator over the samples that accesses a certain FORMAT id as a string value. More...
 
std::string get_alternative (size_t index) const
 Get a particular alternative allele (ALT, fifth column of the line). More...
 
std::vector< std::string > get_alternatives () const
 Get the alternative alleles/sequences of the variant (ALT, fifth column of the line). More...
 
size_t get_alternatives_count () const
 Get the number of alternative alleles/sequences of the variant (ALT, fifth column of the line). More...
 
std::string get_chromosome () const
 Get the name of a chromosome/contig/sequence (CHROM, first column of the line). More...
 
std::vector< std::string > get_filter_ids () const
 Get the list of all filter values (PASS or the names of the non-passing filters) that are applied to the record. More...
 
genesis::utils::Range< VcfFormatIteratorFloatget_format_float (std::string const &id) const
 Get an iterator pair over the samples that accesses a certain FORMAT id as an float value. More...
 
genesis::utils::Range< VcfFormatIteratorGenotypeget_format_genotype () const
 Get an iterator pair over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values. More...
 
std::vector< std::string > get_format_ids () const
 Get the list of all format IDs (FORMAT column) that the record contains. More...
 
genesis::utils::Range< VcfFormatIteratorIntget_format_int (std::string const &id) const
 Get an iterator pair over the samples that accesses a certain FORMAT id as an int value. More...
 
genesis::utils::Range< VcfFormatIteratorStringget_format_string (std::string const &id) const
 Get an iterator pair over the samples that accesses a certain FORMAT id as a string value. More...
 
std::string get_id () const
 Get the ID string of the variant (ID, third column of the line). More...
 
bool get_info_flag (std::string const &id) const
 Return whehter an INFO flag is set, that is, whether the info value for a given key id is present in the INFO fields. More...
 
std::vector< double > get_info_float (std::string const &id) const
 Return the info value for the given key id as a vector of float/double. More...
 
void get_info_float (std::string const &id, std::vector< double > &destination) const
 Write the info value for the given key id to a given destination vector of float/double. More...
 
std::vector< std::string > get_info_ids () const
 Get the list of all info IDs (INFO column) that the record contains. More...
 
std::vector< int32_t > get_info_int (std::string const &id) const
 Return the info value for the given key id as a vector of int. More...
 
void get_info_int (std::string const &id, std::vector< int32_t > &destination) const
 Write the info value for the given key id to a given destination vector of int. More...
 
std::string get_info_string (std::string const &id) const
 Return the info value for the given key id as a string. More...
 
void get_info_string (std::string const &id, std::string &destination) const
 Write the info value for the given key id to a given destination string. More...
 
size_t get_position () const
 Get the position within the chromosome/contig (POS, second column of the line). More...
 
double get_quality () const
 Get the quality score (QUAL, sixth column of the line). More...
 
std::string get_reference () const
 Get the reference allele/sequence of the variant (REF, fourth column of the line). More...
 
std::string get_variant (size_t index) const
 Get a particular variant (REF or ALT allele). More...
 
size_t get_variant_count () const
 Get the total number of variants (REF and ALT alleles) in the record/line. More...
 
VariantType get_variant_type (size_t alt_index) const
 Get the variant type of a particular alternative allele/sequence. More...
 
VariantType get_variant_types () const
 Get the or'ed (union) value of all variant types of the alternative alleles/sequences of the record. More...
 
std::vector< std::string > get_variants () const
 Shortcut to get both the reference (REF, fourth column of the line) and the alternative (ALT, fifth column of the line) alleles/sequences of the line. More...
 
bool has_filter (std::string const &filter) const
 Return whether the record has a given filter set. More...
 
bool has_format (char const *id) const
 Return whether the record has a given FORMAT id present. More...
 
bool has_format (std::string const &id) const
 Return whether the record has a given FORMAT id present. More...
 
bool has_info (char const *id) const
 Return whether the record has a given INFO id present. More...
 
bool has_info (std::string const &id) const
 Return whether the record has a given INFO id present. More...
 
VcfHeaderheader ()
 Return the VcfHeader instance associated with this record. More...
 
VcfHeader const & header () const
 Return the VcfHeader instance associated with this record. More...
 
bool is_snp () const
 Return whether this variant is a SNP. More...
 
VcfRecordoperator= (VcfRecord &&)
 
VcfRecordoperator= (VcfRecord const &)=delete
 
bool pass_filter () const
 Return whether the record passes the filters, that is, whether PASS is set. More...
 
bool read_next (HtsFile &source)
 Read the next record/line from the given source, and replace the content of this VcfRecord instance. More...
 
void swap (VcfRecord &other)
 
void unpack () const
 Unpack the htslib bcf1_t record data. More...
 

Public Types

enum  VariantType : int {
  kRef = 0, kSnp = 1, kMnp = 2, kIndel = 4,
  kOther = 8, kBreakend = 16, kOverlap = 32
}
 Types of variants of alleles that can occur in a record. More...
 

Friends

bool operator& (VcfRecord::VariantType a, VcfRecord::VariantType b)
 And-operator for VariantTypes. More...
 

Constructor & Destructor Documentation

◆ VcfRecord() [1/5]

VcfRecord ( )

Create a default (empty) instance.

Definition at line 97 of file vcf_record.cpp.

◆ VcfRecord() [2/5]

VcfRecord ( VcfHeader header)
explicit

Create an instance based on a VCF/BCF header.

This is the most common use case, where we create a record instance for a given VCF/BCF file, using its header information for access to details laters.

Definition at line 105 of file vcf_record.cpp.

◆ VcfRecord() [3/5]

VcfRecord ( VcfHeader header,
::bcf1_t *  bcf1 
)

Create an instance by copy.

This calls ::bcf_dup() from htslib to create a copy of the given record. The provided bcf1 record hence has to be freed elsewhere.

Definition at line 114 of file vcf_record.cpp.

◆ ~VcfRecord()

~VcfRecord ( )

Definition at line 123 of file vcf_record.cpp.

◆ VcfRecord() [4/5]

VcfRecord ( VcfRecord const &  )
delete

◆ VcfRecord() [5/5]

VcfRecord ( VcfRecord &&  other)

Definition at line 133 of file vcf_record.cpp.

Member Function Documentation

◆ assert_format() [1/2]

void assert_format ( char const *  id) const

Assert that an FORMAT entry with a given id is present in the record.

This is the same as has_format(), but throws an exception in case that the FORMAT ID is not present.

Definition at line 511 of file vcf_record.cpp.

◆ assert_format() [2/2]

void assert_format ( std::string const &  id) const

Assert that an FORMAT entry with a given id is present in the record.

This is the same as has_format(), but throws an exception in case that the FORMAT ID is not present.

Definition at line 506 of file vcf_record.cpp.

◆ assert_info() [1/2]

void assert_info ( char const *  id) const

Assert that an INFO entry with a given id is present in the record.

This is the same as has_info(), but throws an exception in case that the INFO ID is not present.

Definition at line 387 of file vcf_record.cpp.

◆ assert_info() [2/2]

void assert_info ( std::string const &  id) const

Assert that an INFO entry with a given id is present in the record.

This is the same as has_info(), but throws an exception in case that the INFO ID is not present.

Definition at line 382 of file vcf_record.cpp.

◆ at()

std::string at ( ) const

Return a textual representation of the current record chromosome position.

This is either CHROM:POS or CHROM:POS (ID), depending on whether the ID of the record is set (that is, not equal to ‘’.'`). See get_chromosome(), get_position(), and get_id() for details on the individual parts.

Definition at line 196 of file vcf_record.cpp.

◆ begin_format_float()

VcfFormatIteratorFloat begin_format_float ( std::string const &  id) const

Get the begin iterator over the samples that accesses a certain FORMAT id as a float value.

This can be used in a for loop over the samples, using the end_format_*() iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*() for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 579 of file vcf_record.cpp.

◆ begin_format_genotype()

VcfFormatIteratorGenotype begin_format_genotype ( ) const

Get the begin iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values.

The GT genotype field is a special case of htslib, that we hence also have to treat specially here. Instead of strings or ints, we model each call as a VcfGenotype instance; see there for details.

The simplest use case is via the get_format_genotype() function; see there for some convenience examples as well.

This can be used in a for loop over the samples, using the end_format_*() iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*() for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 524 of file vcf_record.cpp.

◆ begin_format_int()

VcfFormatIteratorInt begin_format_int ( std::string const &  id) const

Get the begin iterator over the samples that accesses a certain FORMAT id as an int value.

This can be used in a for loop over the samples, using the end_format_*() iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*() for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 560 of file vcf_record.cpp.

◆ begin_format_string()

VcfFormatIteratorString begin_format_string ( std::string const &  id) const

Get the begin iterator over the samples that accesses a certain FORMAT id as a string value.

This can be used in a for loop over the samples, using the end_format_*() iterator for the end condition. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see get_format_*() for a combined range iterator pair to be used in range-based for loops, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 541 of file vcf_record.cpp.

◆ data() [1/2]

::bcf1_t* data ( )
inline

Return the internal htslib bcf1_t record data struct pointer.

Definition at line 194 of file vcf_record.hpp.

◆ data() [2/2]

::bcf1_t const* data ( ) const
inline

Return the internal htslib bcf1_t record data struct pointer.

Definition at line 202 of file vcf_record.hpp.

◆ end_format_float()

VcfFormatIteratorFloat end_format_float ( ) const

Get the end iterator over the samples that accesses a certain FORMAT id as a float value.

See begin_format_float() for details.

Definition at line 584 of file vcf_record.cpp.

◆ end_format_genotype()

VcfFormatIteratorGenotype end_format_genotype ( ) const

Get the end iterator over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values.

See begin_format_genotype() for details.

Definition at line 529 of file vcf_record.cpp.

◆ end_format_int()

VcfFormatIteratorInt end_format_int ( ) const

Get the end iterator over the samples that accesses a certain FORMAT id as an int value.

See begin_format_int() for details.

Definition at line 565 of file vcf_record.cpp.

◆ end_format_string()

VcfFormatIteratorString end_format_string ( ) const

Get the end iterator over the samples that accesses a certain FORMAT id as a string value.

See begin_format_string() for details.

Definition at line 546 of file vcf_record.cpp.

◆ get_alternative()

std::string get_alternative ( size_t  index) const

Get a particular alternative allele (ALT, fifth column of the line).

This is equivalent to calling get_alternatives() to obtain a particular alternative at an index, but faster if only a particular entry is needed or when iterating them. Valid indices are in the range [ 0, get_alternatives_count() ).

Definition at line 224 of file vcf_record.cpp.

◆ get_alternatives()

std::vector< std::string > get_alternatives ( ) const

Get the alternative alleles/sequences of the variant (ALT, fifth column of the line).

Definition at line 212 of file vcf_record.cpp.

◆ get_alternatives_count()

size_t get_alternatives_count ( ) const

Get the number of alternative alleles/sequences of the variant (ALT, fifth column of the line).

This simply gives their count, which is identical to get_alternatives().size(). See also get_variant_count(), which gives the number of total variants (reference and alternative alleles), and which hence is get_alternatives_count() + 1. Although these functions only differ in the extra counting of the reference allele, we offer both so that using them reflects the intention (are we interested in just the alternative, or all variants?).

Definition at line 239 of file vcf_record.cpp.

◆ get_chromosome()

std::string get_chromosome ( ) const

Get the name of a chromosome/contig/sequence (CHROM, first column of the line).

Definition at line 170 of file vcf_record.cpp.

◆ get_filter_ids()

std::vector< std::string > get_filter_ids ( ) const

Get the list of all filter values (PASS or the names of the non-passing filters) that are applied to the record.

For example, the lines

#CHROM POS      ID         REF   ALT    QUAL  FILTER  [...]
20     14370    rs6054257  G     A      29    PASS    [...]
20     17330    .          T     A      3     q10     [...]

would return {"PASS"} and {"q10"}, respectively.

Definition at line 318 of file vcf_record.cpp.

◆ get_format_float()

genesis::utils::Range< VcfFormatIteratorFloat > get_format_float ( std::string const &  id) const

Get an iterator pair over the samples that accesses a certain FORMAT id as an float value.

This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*() and end_format_*() for the respective begin and end iterators, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 589 of file vcf_record.cpp.

◆ get_format_genotype()

genesis::utils::Range< VcfFormatIteratorGenotype > get_format_genotype ( ) const

Get an iterator pair over the samples that accesses the FORMAT genotype (GT field/key/id) as a set of VcfGenotype values.

See begin_format_genotype() for why we offer a special function to get the genotype field.

In order to simply print the VCF-like genotype information for the samples of a record, the vcf_genotype_string() convenience function can be used:

for( auto gt_iterator : record_iterator->get_format_genotype() ) {
    std::cout << vcf_genotype_string( gt_iterator.get_values() ) << "\n";
}

This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*() and end_format_*() for the respective begin and end iterators, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 534 of file vcf_record.cpp.

◆ get_format_ids()

std::vector< std::string > get_format_ids ( ) const

Get the list of all format IDs (FORMAT column) that the record contains.

For example, the line

#CHROM POS      ID        REF ALT  QUAL FILTER INFO       FORMAT      [...]
20     14370    rs6054257 G   A,CG 29   PASS   NS=3;DP=14 GT:GQ:DP:HQ

would return a list containing {"GT", "GQ", "DP", "HQ"}.

Definition at line 484 of file vcf_record.cpp.

◆ get_format_int()

genesis::utils::Range< VcfFormatIteratorInt > get_format_int ( std::string const &  id) const

Get an iterator pair over the samples that accesses a certain FORMAT id as an int value.

This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*() and end_format_*() for the respective begin and end iterators, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 570 of file vcf_record.cpp.

◆ get_format_string()

genesis::utils::Range< VcfFormatIteratorString > get_format_string ( std::string const &  id) const

Get an iterator pair over the samples that accesses a certain FORMAT id as a string value.

This can directly be used in a range-based for loop over the samples. See the VcfRecord and the VcfFormatIterator class descriptions for details on usage. Also, see begin_format_*() and end_format_*() for the respective begin and end iterators, and see the other begin_format_*(), end_format_*(), and get_format_*() functions for the equivalents for other FORMAT data types.

Definition at line 551 of file vcf_record.cpp.

◆ get_id()

std::string get_id ( ) const

Get the ID string of the variant (ID, third column of the line).

Another instance where the overloaded term "ID" is used in VCF and in htslib. Here, it stands for the identifier list of the line for SNP databases.

Definition at line 190 of file vcf_record.cpp.

◆ get_info_flag()

bool get_info_flag ( std::string const &  id) const

Return whehter an INFO flag is set, that is, whether the info value for a given key id is present in the INFO fields.

This is meant for flags and returns whether the flag has been set or not in the record.

Definition at line 474 of file vcf_record.cpp.

◆ get_info_float() [1/2]

std::vector< double > get_info_float ( std::string const &  id) const

Return the info value for the given key id as a vector of float/double.

While htslib uses float, we use double throughout genesis, and hence return this here.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 426 of file vcf_record.cpp.

◆ get_info_float() [2/2]

void get_info_float ( std::string const &  id,
std::vector< double > &  destination 
) const

Write the info value for the given key id to a given destination vector of float/double.

If the destination vector is re-used between calles for different records, this is the faster variant that saves on memory allocations.

While htslib uses float, we use double throughout genesis, and hence return this here.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 433 of file vcf_record.cpp.

◆ get_info_ids()

std::vector< std::string > get_info_ids ( ) const

Get the list of all info IDs (INFO column) that the record contains.

For example, the line

#CHROM POS      ID         REF   ALT    QUAL  FILTER  INFO                     [...]
20     14370    rs6054257  G     A      29    PASS    NS=3;DP=14;AF=0.5;DB;H2  [...]

would return a list containing {"NS", "DP", "AF", "DB", "H2"}.

Definition at line 358 of file vcf_record.cpp.

◆ get_info_int() [1/2]

std::vector< int32_t > get_info_int ( std::string const &  id) const

Return the info value for the given key id as a vector of int.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 450 of file vcf_record.cpp.

◆ get_info_int() [2/2]

void get_info_int ( std::string const &  id,
std::vector< int32_t > &  destination 
) const

Write the info value for the given key id to a given destination vector of int.

If the destination vector is re-used between calles for different records, this is the faster variant that saves on memory allocations.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 457 of file vcf_record.cpp.

◆ get_info_string() [1/2]

std::string get_info_string ( std::string const &  id) const

Return the info value for the given key id as a string.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 396 of file vcf_record.cpp.

◆ get_info_string() [2/2]

void get_info_string ( std::string const &  id,
std::string &  destination 
) const

Write the info value for the given key id to a given destination string.

If the destination string is re-used between calles for different records, this is the faster variant that saves on memory allocations.

The function throws an exception if the requested id is not present (in the header or the record) or if the value behind that id is of a different type.

Definition at line 403 of file vcf_record.cpp.

◆ get_position()

size_t get_position ( ) const

Get the position within the chromosome/contig (POS, second column of the line).

We report the position as given in the VCF/BCF file, that is, 1-based!

Definition at line 181 of file vcf_record.cpp.

◆ get_quality()

double get_quality ( ) const

Get the quality score (QUAL, sixth column of the line).

Definition at line 309 of file vcf_record.cpp.

◆ get_reference()

std::string get_reference ( ) const

Get the reference allele/sequence of the variant (REF, fourth column of the line).

Definition at line 202 of file vcf_record.cpp.

◆ get_variant()

std::string get_variant ( size_t  index) const

Get a particular variant (REF or ALT allele).

This is equivalent to calling get_variants() to obtain a particular alternative at an index, but faster if only a particular entry is needed or when iterating them. Valid indices are in the range [ 0, get_variant_count() ).

Note that compared to get_alternative() and get_alternatives(), the indices of the alternative alleles are shifted, as the reference allele is at index 0.

Definition at line 260 of file vcf_record.cpp.

◆ get_variant_count()

size_t get_variant_count ( ) const

Get the total number of variants (REF and ALT alleles) in the record/line.

This is equivalent to get_variants().size(), and to get_alternatives_count() + 1 (taking the REF into account).

Definition at line 275 of file vcf_record.cpp.

◆ get_variant_type()

VcfRecord::VariantType get_variant_type ( size_t  alt_index) const

Get the variant type of a particular alternative allele/sequence.

The alt_index is the 0-based index of the alternative allele/sequence for which we want the variant type. This corresponds to the indices in the get_alternatives() result vector, which are hence in the range [ 0, get_alternatives_count() ).

Definition at line 287 of file vcf_record.cpp.

◆ get_variant_types()

VcfRecord::VariantType get_variant_types ( ) const

Get the or'ed (union) value of all variant types of the alternative alleles/sequences of the record.

This can be used to simply test whether a particular type of variant appears at all in a given record:

if( record.get_variant_types() & VcfRecord::VariantType::kSnp ) {
    ...
}

See operator& ( VariantType a, VariantType b ) for details.

This is a simple wrapper for ::bcf_get_variant_types() from htslib, which however is not ideally named, as "variants" seems to mean REF+ALT in VCF terminology, but the flag for REF has value 0 and hence is (in a sense) always set in the result. Of course, this makes sense, as we always have a reference variant. But technically, we cannot test for this, so this function only is useful for alternative alleles, and not all variants. Still, we follow their terminology here.

Definition at line 282 of file vcf_record.cpp.

◆ get_variants()

std::vector< std::string > get_variants ( ) const

Shortcut to get both the reference (REF, fourth column of the line) and the alternative (ALT, fifth column of the line) alleles/sequences of the line.

This simply combines get_reference() and get_alternatives(). Note that hence the indices of the alternatives are shifted as compared to get_alternatives(), as the reference allele is at index 0.

Definition at line 248 of file vcf_record.cpp.

◆ has_filter()

bool has_filter ( std::string const &  filter) const

Return whether the record has a given filter set.

For example, provided with filter == "q10", the function returns whether the q10 filter is set for the record (indicating that the record failed that filter test).

Definition at line 328 of file vcf_record.cpp.

◆ has_format() [1/2]

bool has_format ( char const *  id) const

Return whether the record has a given FORMAT id present.

Definition at line 500 of file vcf_record.cpp.

◆ has_format() [2/2]

bool has_format ( std::string const &  id) const

Return whether the record has a given FORMAT id present.

Definition at line 494 of file vcf_record.cpp.

◆ has_info() [1/2]

bool has_info ( char const *  id) const

Return whether the record has a given INFO id present.

Definition at line 373 of file vcf_record.cpp.

◆ has_info() [2/2]

bool has_info ( std::string const &  id) const

Return whether the record has a given INFO id present.

Definition at line 368 of file vcf_record.cpp.

◆ header() [1/2]

VcfHeader& header ( )
inline

Return the VcfHeader instance associated with this record.

This can also be used to obtain internal htslib bcf_hdr_t header data struct pointer, if needed for any functionality that is not offered by our wrapper.

Definition at line 213 of file vcf_record.hpp.

◆ header() [2/2]

VcfHeader const& header ( ) const
inline

Return the VcfHeader instance associated with this record.

This can also be used to obtain internal htslib bcf_hdr_t header data struct pointer, if needed for any functionality that is not offered by our wrapper.

Definition at line 224 of file vcf_record.hpp.

◆ is_snp()

bool is_snp ( ) const

Return whether this variant is a SNP.

This is simply a wrapper for the htslib function ::bcf_is_snp(). It returns true iff the reference and all alternative alleles/sequence are single characters (and none of them is a ‘’*'` missing allele character).

Definition at line 304 of file vcf_record.cpp.

◆ operator=() [1/2]

VcfRecord & operator= ( VcfRecord &&  other)

Definition at line 140 of file vcf_record.cpp.

◆ operator=() [2/2]

VcfRecord& operator= ( VcfRecord const &  )
delete

◆ pass_filter()

bool pass_filter ( ) const

Return whether the record passes the filters, that is, whether PASS is set.

This is identical to calling has_filter() with the argument "PASS", and is the preferred method to test whether a record line has passed all filters (return value true), or not (return value false).

Definition at line 345 of file vcf_record.cpp.

◆ read_next()

bool read_next ( HtsFile source)

Read the next record/line from the given source, and replace the content of this VcfRecord instance.

This is the function used to iterate a VCF/BCF file. It returns true on success, and can hence be used as follows:

while( record.read_next( file )) {
    // work with the current record line
}

Within that loop, all functions of the record instance can be used.

Definition at line 602 of file vcf_record.cpp.

◆ swap()

void swap ( VcfRecord other)

Definition at line 149 of file vcf_record.cpp.

◆ unpack()

void unpack ( ) const

Unpack the htslib bcf1_t record data.

Some external functions might want to work with the bcf1_t record data struct pointer directly, but need to unpack the data first. This is not possible when the VcfRecord is passed by const reference for example, so we offer this function that unpacks a const VcfRecord, so that its internal htslib data is available.

Definition at line 165 of file vcf_record.cpp.

Friends And Related Function Documentation

◆ operator&

bool operator& ( VcfRecord::VariantType  a,
VcfRecord::VariantType  b 
)
friend

And-operator for VariantTypes.

The function get_variant_types() returns the or'ed (union) values of all variant types that appear in the alternative alleles of the record. Hence, this and-operator can be used to disentangle them and test whether a particular variant occurs in the record:

if( record.get_variant_types() & Record::VariantType::kSnp ) {
    ...
}

See get_variant_types() for details.

It's a bit ugly to return a bool from such a comparison, but for now, it works. Should more complex use cases arise in the future, this might change.

Definition at line 147 of file vcf_record.hpp.

Member Enumeration Documentation

◆ VariantType

enum VariantType : int
strong

Types of variants of alleles that can occur in a record.

Corresponds to the VCF_* macro constants defined by htslib. We statically assert that these have the same values.

Enumerator
kRef 
kSnp 
kMnp 
kIndel 
kOther 
kBreakend 
kOverlap 

Definition at line 121 of file vcf_record.hpp.


The documentation for this class was generated from the following files: