A library for working with phylogenetic data. v0.25.0

#include <genesis/population/formats/vcf_header.hpp>

## Detailed Description

Capture the information from a header of a VCF/BCF file.

Unfortunately, the terminology used in htslib and in the VCF v4.2 specification to describe different parts of a VCF file seems diffuse; in particular, the words "ID", "key", and "tag" are overloaded, as well as "column", "field", and "sub-field", and the usage of "type" for almost anything.

Given the excerpt from a VCF file

##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
...
#CHROM POS   ID        REF ALT QUAL FILTER INFO                    FORMAT      NA00001
20     14370 rs6054257 G   A   29   PASS   NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51


let's try to untangle the terminology. We mark the terms used in our wrapper classes in bold.

• The main body of a VCF file consists of 8 tab-separated fixed columns per record: "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", as well as an optional "FORMAT" column followed by an arbitrary number of sample columns ("NA00001" in the example above).
• Each record line consists of fields that correspond to these columns; that is, a field is the particular entry for a given column and a given record line.
• Some fields (typically, corresponding to the columns "INFO", "FILTER", and "FORMAT") are further divided into sub-fields (also called "fields" in the VCF standard) that are identified by their ID (not to be confused with the field/column "ID", which just happens to have the same name for maximum confusion).
• Some of these sub-fields furthermore form key-**value** pairs, where the key is given by an ID; in the example above, "NS" is the ID of an INFO sub-field, and contains the key-value-pair "NS=3" in the given record line. IDs can however also be simple flags, in which case they are not followed by a value. In htslib, the ID/key of a sub-field/field is also sometimes called the tag.
• We are aware that we here use ID and key as two terms for a similar thing. This is our attempt to bridge the gap between intuition and htslib: an ID here is used as a key for a key-value-pair.

Hopefully, this helps to make sense of the terminology.
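To make the distinction between key-value pairs and flags concrete, here is a minimal sketch that splits the INFO field of the example record into its sub-fields. It uses only the C++ standard library; the function names `parse_info_field()` and `info_field_example()` are hypothetical and are not part of genesis or htslib, which parse records differently.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of genesis or htslib): split the INFO field
// of one record line into its sub-fields. Flags, i.e. IDs without a value,
// are stored with an empty value string.
std::map<std::string, std::string> parse_info_field( std::string const& field )
{
    std::map<std::string, std::string> result;
    std::istringstream stream( field );
    std::string sub_field;

    // Sub-fields within the INFO field are separated by semicolons.
    while( std::getline( stream, sub_field, ';' )) {
        if( sub_field.empty() ) {
            continue;
        }
        auto const pos = sub_field.find( '=' );
        if( pos == std::string::npos ) {
            // A flag such as "DB": the ID alone, without a value.
            result[ sub_field ] = "";
        } else {
            // A key-value pair such as "NS=3": the ID acts as the key.
            result[ sub_field.substr( 0, pos ) ] = sub_field.substr( pos + 1 );
        }
    }
    return result;
}

// Usage with the INFO field of the example record above.
void info_field_example()
{
    auto const info = parse_info_field( "NS=3;DP=14;AF=0.5;DB;H2" );
    assert( info.size() == 5 );
    assert( info.at( "NS" ) == "3" );  // key-value pair
    assert( info.at( "DB" ).empty() ); // flag
}
```

An empty string is used for flags only to keep the sketch short; a real implementation would likely need to tell a flag apart from a key with an empty value.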

Definition at line 102 of file vcf_header.hpp.

## Public Member Functions

Create a default (empty) instance.

Create an instance given a pointer to the htslib-internal header struct.

Create an instance given an HtsFile object.

Create an instance with a specific mode.

void assert_filter (std::string const &id) const
Assert that a FILTER entry with a given ID is defined in the header of the VCF/BCF file.

void assert_format (std::string const &id) const
Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file.

void assert_format (std::string const &id, VcfValueType type) const
Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, and that its value(s) has/have a specified data type.

void assert_format (std::string const &id, VcfValueType type, size_t number) const
Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified number of values.

void assert_format (std::string const &id, VcfValueType type, VcfValueSpecial special) const
Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified special type of number of values.

void assert_info (std::string const &id) const
Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file.

void assert_info (std::string const &id, VcfValueType type) const
Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, and that its value(s) has/have a specified data type.

void assert_info (std::string const &id, VcfValueType type, size_t number) const
Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified number of values.

void assert_info (std::string const &id, VcfValueType type, VcfValueSpecial special) const
Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified special type of number of values.

::bcf_hdr_t * data ()
Return the internal htslib ::bcf_hdr_t data struct pointer.

::bcf_hdr_t const * data () const
Return the internal htslib ::bcf_hdr_t data struct pointer.

size_t get_chromosome_length (std::string const &chrom_name) const
Get the length of a chromosome/contig/sequence, given its name.

std::unordered_map< std::string, std::string > get_chromosome_values (std::string const &chrom_name) const
Get all key-value-pairs describing a particular chromosome/contig/sequence, given its name.

std::vector< std::string > get_chromosomes () const
Get a list of the chromosome/contig/sequence names used in the file.

std::vector< std::string > get_filter_ids () const
Get a list of the ID names of all FILTER entries in the header.

std::unordered_map< std::string, std::string > get_filter_values (std::string const &id) const
Get all key-value pairs describing a particular filter header line, given its ID.

std::vector< std::string > get_format_ids () const
Get a list of the ID names of all FORMAT fields in the header.

VcfSpecification get_format_specification (std::string const &id) const
Get the required specification key-value-pairs for a given FORMAT entry.

std::unordered_map< std::string, std::string > get_format_values (std::string const &id) const
Get all key-value pairs describing a particular format field, given its ID.

std::vector< std::string > get_info_ids () const
Get a list of the ID names of all INFO fields in the header.

VcfSpecification get_info_specification (std::string const &id) const
Get the required specification key-value-pairs for a given INFO entry.

std::unordered_map< std::string, std::string > get_info_values (std::string const &id) const
Get all key-value pairs describing a particular info header line, given its ID.

size_t get_sample_count () const
Get the number of samples (columns) in the file.

size_t get_sample_index (std::string const &name) const
Get the index of a sample, given its name.

std::string get_sample_name (size_t index) const
Get the name of a sample given its index.

std::vector< std::string > get_sample_names () const
Return a list of the sample names (column headers) of the VCF/BCF file.

bool has_filter (std::string const &id) const
Return whether a FILTER entry with a given ID is defined in the header of the VCF/BCF file.

bool has_format (std::string const &id) const
Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file.

bool has_format (std::string const &id, VcfValueType type) const
Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, and whether its value(s) has/have a specified data type.

bool has_format (std::string const &id, VcfValueType type, size_t number) const
Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified number of values.

bool has_format (std::string const &id, VcfValueType type, VcfValueSpecial special) const
Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified special type of number of values.

bool has_info (std::string const &id) const
Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file.

bool has_info (std::string const &id, VcfValueType type) const
Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, and whether its value(s) has/have a specified data type.

bool has_info (std::string const &id, VcfValueType type, size_t number) const
Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified number of values.

bool has_info (std::string const &id, VcfValueType type, VcfValueSpecial special) const
Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified special type of number of values.

void set_samples (std::vector< std::string > const &sample_names, bool inverse_sample_names=false)
Specify a subset of samples to be parsed.

std::string version () const
Return the VCF/BCF version string, e.g. "VCFv4.2".

## Friends

template<typename S , typename T >
class VcfFormatIterator

class VcfRecord

## Constructor & Destructor Documentation

default

Create a default (empty) instance.

 VcfHeader ( std::string const & mode )
explicit

Create an instance with a specific mode.

For the mode param, see the ::hts_open() documentation of htslib.

Definition at line 72 of file vcf_header.cpp.

 VcfHeader ( HtsFile & hts_file )
explicit

Create an instance given an HtsFile object.

The HtsFile has to be newly created and cannot have been used to read from before, otherwise its internal file pointer is already past the header.

Definition at line 80 of file vcf_header.cpp.

 VcfHeader ( ::bcf_hdr_t * bcf_hdr )
explicit

Create an instance given a pointer to the htslib-internal header struct.

This copies the header using ::bcf_hdr_dup() from htslib, and then manages the lifetime of the newly created ::bcf_hdr_t instance only.

Definition at line 92 of file vcf_header.cpp.

Definition at line 100 of file vcf_header.cpp.

delete

Definition at line 107 of file vcf_header.cpp.

## ◆ assert_filter()

 void assert_filter ( std::string const & id ) const

Assert that a FILTER entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 212 of file vcf_header.cpp.

## ◆ assert_format() [1/4]

 void assert_format ( std::string const & id ) const

Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 300 of file vcf_header.cpp.

## ◆ assert_format() [2/4]

 void assert_format ( std::string const & id, VcfValueType type ) const

Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, and that its value(s) has/have a specified data type.

Definition at line 305 of file vcf_header.cpp.

## ◆ assert_format() [3/4]

 void assert_format ( std::string const & id, VcfValueType type, size_t number ) const

Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified number of values.

This check requires that the number of values is fixed to the given number. That is, here, we do not use any of the special cases (number of values depending on number of alleles, etc), but require a fixed given number instead.

Definition at line 315 of file vcf_header.cpp.

## ◆ assert_format() [4/4]

 void assert_format ( std::string const & id, VcfValueType type, VcfValueSpecial special ) const

Assert that a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified special type of number of values.

The last check for the kind of number of values is typically used to require one of the special cases (number of values depending on number of alleles, etc).

Definition at line 310 of file vcf_header.cpp.
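The difference between the fixed-number overloads and the special-number overloads can be illustrated with a small standalone sketch. The types `NumberKind` and `NumberSpec` and all functions below are hypothetical stand-ins written for this illustration; they are not the VcfValueSpecial or VcfSpecification types of this library. The sketch assumes the `Number` values defined in the VCF v4.2 specification: an integer, `A` (one value per alternate allele), `R` (one value per allele, including the reference), `G` (one value per genotype), and `.` (unknown or variable).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the kinds of "Number" a header line can declare.
enum class NumberKind { Fixed, PerAltAllele, PerAllele, PerGenotype, Variable };

// Hypothetical stand-in for the parsed "Number" entry of a header line.
struct NumberSpec {
    NumberKind  kind;
    std::size_t count; // only meaningful if kind == NumberKind::Fixed
};

// Interpret the string after "Number=" in an INFO or FORMAT header line.
// std::stoul throws if the value is neither a special symbol nor a number.
NumberSpec parse_number( std::string const& value )
{
    if( value == "A" ) return { NumberKind::PerAltAllele, 0 };
    if( value == "R" ) return { NumberKind::PerAllele, 0 };
    if( value == "G" ) return { NumberKind::PerGenotype, 0 };
    if( value == "." ) return { NumberKind::Variable, 0 };
    return { NumberKind::Fixed, static_cast<std::size_t>( std::stoul( value )) };
}

// Fixed-number check: the entry must declare exactly `number` values.
bool has_fixed_number( NumberSpec const& spec, std::size_t number )
{
    return spec.kind == NumberKind::Fixed && spec.count == number;
}

// Special-number check: the entry must declare the given kind of number.
bool has_special_number( NumberSpec const& spec, NumberKind kind )
{
    return spec.kind == kind;
}

// Usage: "Number=1" passes only the fixed check, "Number=A" only the special one.
void number_check_example()
{
    assert( has_fixed_number( parse_number( "1" ), 1 ));
    assert( !has_special_number( parse_number( "1" ), NumberKind::PerAltAllele ));
    assert( has_special_number( parse_number( "A" ), NumberKind::PerAltAllele ));
    assert( !has_fixed_number( parse_number( "A" ), 1 ));
}
```

In this sketch, a header entry declared with `Number=A` never satisfies a fixed-number check, even for a record that happens to have exactly one alternate allele, which mirrors the behavior described for the `size_t number` overloads.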

## ◆ assert_info() [1/4]

 void assert_info ( std::string const & id ) const

Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 241 of file vcf_header.cpp.

## ◆ assert_info() [2/4]

 void assert_info ( std::string const & id, VcfValueType type ) const

Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, and that its value(s) has/have a specified data type.

Definition at line 246 of file vcf_header.cpp.

## ◆ assert_info() [3/4]

 void assert_info ( std::string const & id, VcfValueType type, size_t number ) const

Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified number of values.

This check requires that the number of values is fixed to the given number. That is, here, we do not use any of the special cases (number of values depending on number of alleles, etc), but require a fixed given number instead.

Definition at line 256 of file vcf_header.cpp.

## ◆ assert_info() [4/4]

 void assert_info ( std::string const & id, VcfValueType type, VcfValueSpecial special ) const

Assert that an INFO entry with a given ID is defined in the header of the VCF/BCF file, that its value(s) has/have a specified data type, and a specified special type of number of values.

The last check for the kind of number of values is typically used to require one of the special cases (number of values depending on number of alleles, etc).

Definition at line 251 of file vcf_header.cpp.

## ◆ data() [1/2]

 ::bcf_hdr_t* data ( )
inline

Return the internal htslib ::bcf_hdr_t data struct pointer.

Definition at line 177 of file vcf_header.hpp.

## ◆ data() [2/2]

 ::bcf_hdr_t const* data ( ) const
inline

Return the internal htslib ::bcf_hdr_t data struct pointer.

Definition at line 185 of file vcf_header.hpp.

## ◆ get_chromosome_length()

 size_t get_chromosome_length ( std::string const & chrom_name ) const

Get the length of a chromosome/contig/sequence, given its name.

This information is potentially stored auto-magically in the htslib struct.

Definition at line 174 of file vcf_header.cpp.

## ◆ get_chromosome_values()

 std::unordered_map< std::string, std::string > get_chromosome_values ( std::string const & chrom_name ) const

Get all key-value-pairs describing a particular chromosome/contig/sequence, given its name.

For example, if the header contains a line

##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>


then the result will contain entries that map from "ID" to "20", from "length" to "62435964", and so forth. (Note that ID is also included in the result, for simplicity.)

Definition at line 193 of file vcf_header.cpp.
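The shape of the returned map can be illustrated with a standalone sketch that parses such a structured header line by hand. This is not how the class obtains the values (it wraps htslib); the functions `parse_header_line_values()` and `contig_line_example()` are hypothetical, and the sketch assumes that double quotes only serve to protect commas and equals signs inside values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: parse the part between '<' and '>' of a structured
// header line such as ##contig=<ID=20,length=62435964,species="Homo sapiens">
// into key-value pairs. Surrounding double quotes are dropped from values.
std::unordered_map<std::string, std::string> parse_header_line_values(
    std::string const& line
) {
    std::unordered_map<std::string, std::string> result;
    auto const open  = line.find( '<' );
    auto const close = line.rfind( '>' );
    if( open == std::string::npos || close == std::string::npos || close < open ) {
        return result;
    }

    std::string key;
    std::string token;
    bool in_quotes = false;
    bool have_key  = false;
    for( std::size_t i = open + 1; i <= close; ++i ) {
        char const c = line[i];
        if( c == '"' ) {
            // Toggle quoted mode; the quote character itself is not stored.
            in_quotes = !in_quotes;
        } else if( !in_quotes && !have_key && c == '=' ) {
            // The first unquoted '=' ends the key.
            key = token;
            token.clear();
            have_key = true;
        } else if( !in_quotes && ( c == ',' || i == close )) {
            // An unquoted ',' or the closing '>' ends the value.
            if( have_key ) {
                result[ key ] = token;
            }
            key.clear();
            token.clear();
            have_key = false;
        } else {
            token += c;
        }
    }
    return result;
}

// Usage with the contig line shown above.
void contig_line_example()
{
    auto const values = parse_header_line_values(
        "##contig=<ID=20,length=62435964,assembly=B36,"
        "md5=f126cdf8a6e0c7f379d618ff66beb2da,species=\"Homo sapiens\",taxonomy=x>"
    );
    assert( values.size() == 6 );
    assert( values.at( "ID" ) == "20" );
    assert( values.at( "species" ) == "Homo sapiens" );
}
```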

## ◆ get_chromosomes()

 std::vector< std::string > get_chromosomes ( ) const

Get a list of the chromosome/contig/sequence names used in the file.

These correspond to the field entries in the CHROM column of the records.

Definition at line 140 of file vcf_header.cpp.

## ◆ get_filter_ids()

 std::vector< std::string > get_filter_ids ( ) const

Get a list of the ID names of all FILTER entries in the header.

For example, if the header contains a line

##FILTER=<ID=q10,Description="Quality below 10">


then the list contains an entry q10.

Definition at line 202 of file vcf_header.cpp.

## ◆ get_filter_values()

 std::unordered_map< std::string, std::string > get_filter_values ( std::string const & id ) const

Get all key-value pairs describing a particular filter header line, given its ID.

For example, if the header contains a line

##FILTER=<ID=q10,Description="Quality below 10">


then the result will contain entries that map from "ID" to "q10", and from "Description" to "Quality below 10". (Note that ID is also included in the result, for simplicity.)

Definition at line 207 of file vcf_header.cpp.

## ◆ get_format_ids()

 std::vector< std::string > get_format_ids ( ) const

Get a list of the ID names of all FORMAT fields in the header.

For example, if the header contains a line

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">


then the list contains an entry GT.

Definition at line 285 of file vcf_header.cpp.

## ◆ get_format_specification()

 VcfSpecification get_format_specification ( std::string const & id ) const

Get the required specification key-value-pairs for a given FORMAT entry.

See also get_format_values() for a function that returns all given key-value-pairs of the FORMAT entry.

Definition at line 290 of file vcf_header.cpp.

## ◆ get_format_values()

 std::unordered_map< std::string, std::string > get_format_values ( std::string const & id ) const

Get all key-value pairs describing a particular format field, given its ID.

For example, if the header contains a line

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">


then the result will contain entries that map from "ID" to "GT", from "Number" to "1", and so forth. (Note that ID is also included in the result, for simplicity.)

See also get_format_specification() for a function that returns only the required key-value-pairs for FORMAT entries.

Definition at line 295 of file vcf_header.cpp.

## ◆ get_info_ids()

 std::vector< std::string > get_info_ids ( ) const

Get a list of the ID names of all INFO fields in the header.

For example, if the header contains a line

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">


then the list contains an entry DP.

Definition at line 226 of file vcf_header.cpp.

## ◆ get_info_specification()

 VcfSpecification get_info_specification ( std::string const & id ) const

Get the required specification key-value-pairs for a given INFO entry.

See also get_info_values() for a function that returns all given key-value-pairs of the INFO entry.

Definition at line 231 of file vcf_header.cpp.

## ◆ get_info_values()

 std::unordered_map< std::string, std::string > get_info_values ( std::string const & id ) const

Get all key-value pairs describing a particular info header line, given its ID.

For example, if the header contains a line

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">


then the result will contain entries that map from "ID" to "DP", from "Number" to "1", and so forth. (Note that ID is also included in the result, for simplicity.)

See also get_info_specification() for a function that returns only the required key-value-pairs for INFO entries.

Definition at line 236 of file vcf_header.cpp.

## ◆ get_sample_count()

 size_t get_sample_count ( ) const

Get the number of samples (columns) in the file.

Definition at line 344 of file vcf_header.cpp.

## ◆ get_sample_index()

 size_t get_sample_index ( std::string const & name ) const

Get the index of a sample, given its name.

If the sample name does not exist in the VCF file, an exception is thrown.

Definition at line 361 of file vcf_header.cpp.

## ◆ get_sample_name()

 std::string get_sample_name ( size_t index ) const

Get the name of a sample given its index.

This corresponds to the name given in the #CHROM line of the VCF file, using indices in the range [ 0, get_sample_count() ).

Definition at line 350 of file vcf_header.cpp.

## ◆ get_sample_names()

 std::vector< std::string > get_sample_names ( ) const

Return a list of the sample names (column headers) of the VCF/BCF file.

These are the names that correspond to the column headers for samples in the #CHROM line of a VCF file.

Definition at line 373 of file vcf_header.cpp.
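How the sample functions relate to the #CHROM line can be sketched without htslib. The helpers below are hypothetical illustrations, not the implementation used by this class. They assume a tab-separated #CHROM line that contains the 8 fixed columns and the FORMAT column before the first sample column.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: extract the sample names from a tab-separated
// #CHROM line. Assumes 8 fixed columns plus FORMAT before the samples.
std::vector<std::string> sample_names_from_chrom_line( std::string const& line )
{
    std::vector<std::string> columns;
    std::istringstream stream( line );
    std::string column;
    while( std::getline( stream, column, '\t' )) {
        columns.push_back( column );
    }

    // Columns 0-8 are CHROM ... INFO and FORMAT; samples start at column 9.
    std::vector<std::string> names;
    for( std::size_t i = 9; i < columns.size(); ++i ) {
        names.push_back( columns[i] );
    }
    return names;
}

// Hypothetical helper: look up the index of a sample in [ 0, names.size() ),
// and throw if the name does not exist.
std::size_t sample_index(
    std::vector<std::string> const& names, std::string const& name
) {
    for( std::size_t i = 0; i < names.size(); ++i ) {
        if( names[i] == name ) {
            return i;
        }
    }
    throw std::runtime_error( "Sample name '" + name + "' not found." );
}

// Usage with a #CHROM line that has two sample columns.
void sample_names_example()
{
    auto const names = sample_names_from_chrom_line(
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002"
    );
    assert( names.size() == 2 );
    assert( names[0] == "NA00001" );
    assert( sample_index( names, "NA00002" ) == 1 );
}
```

The exception type `std::runtime_error` is a choice made for this sketch; the documentation above only states that an exception is thrown for an unknown sample name.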

## ◆ has_filter()

 bool has_filter ( std::string const & id ) const

Return whether a FILTER entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 217 of file vcf_header.cpp.

## ◆ has_format() [1/4]

 bool has_format ( std::string const & id ) const

Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 320 of file vcf_header.cpp.

## ◆ has_format() [2/4]

 bool has_format ( std::string const & id, VcfValueType type ) const

Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, and whether its value(s) has/have a specified data type.

Definition at line 325 of file vcf_header.cpp.

## ◆ has_format() [3/4]

 bool has_format ( std::string const & id, VcfValueType type, size_t number ) const

Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified number of values.

This check requires that the number of values is fixed to the given number. That is, here, we do not use any of the special cases (number of values depending on number of alleles, etc), but require a fixed given number instead.

Definition at line 335 of file vcf_header.cpp.

## ◆ has_format() [4/4]

 bool has_format ( std::string const & id, VcfValueType type, VcfValueSpecial special ) const

Return whether a FORMAT entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified special type of number of values.

The last check for the kind of number of values is typically used to require one of the special cases (number of values depending on number of alleles, etc).

Definition at line 330 of file vcf_header.cpp.

## ◆ has_info() [1/4]

 bool has_info ( std::string const & id ) const

Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file.

Definition at line 261 of file vcf_header.cpp.

## ◆ has_info() [2/4]

 bool has_info ( std::string const & id, VcfValueType type ) const

Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, and whether its value(s) has/have a specified data type.

Definition at line 266 of file vcf_header.cpp.

## ◆ has_info() [3/4]

 bool has_info ( std::string const & id, VcfValueType type, size_t number ) const

Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified number of values.

This check requires that the number of values is fixed to the given number. That is, here, we do not use any of the special cases (number of values depending on number of alleles, etc), but require a fixed given number instead.

Definition at line 276 of file vcf_header.cpp.

## ◆ has_info() [4/4]

 bool has_info ( std::string const & id, VcfValueType type, VcfValueSpecial special ) const

Return whether an INFO entry with a given ID is defined in the header of the VCF/BCF file, whether its value(s) has/have a specified data type, and a specified special type of number of values.

The last check for the kind of number of values is typically used to require one of the special cases (number of values depending on number of alleles, etc).

Definition at line 271 of file vcf_header.cpp.

## ◆ operator=() [1/2]

Definition at line 116 of file vcf_header.cpp.

delete

## ◆ set_samples()

 void set_samples ( std::vector< std::string > const & sample_names, bool inverse_sample_names = false )

Specify a subset of samples to be parsed.

Only the specified set of sample columns is parsed and used to populate the VcfRecords when reading the VCF/BCF file (or, if inverse_sample_names is true, only the specified samples are excluded). This can yield drastic speedups in parsing large files.

This function wraps the bcf_hdr_set_samples() function of htslib, see there for details.

Definition at line 384 of file vcf_header.cpp.
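The effect of the inverse_sample_names flag can be sketched as a plain set operation on the sample names. The function `select_samples()` below is a hypothetical illustration and does not call htslib. It assumes that the kept samples retain their order from the header, which should be checked against the htslib documentation.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: compute which samples remain after a selection.
// With inverse == false, only the listed samples are kept; with
// inverse == true, the listed samples are excluded instead.
std::vector<std::string> select_samples(
    std::vector<std::string> const& all_samples,
    std::vector<std::string> const& sample_names,
    bool inverse = false
) {
    std::unordered_set<std::string> const listed(
        sample_names.begin(), sample_names.end()
    );

    std::vector<std::string> kept;
    for( auto const& name : all_samples ) {
        bool const is_listed = listed.count( name ) > 0;

        // Keep listed samples in normal mode, unlisted ones in inverse mode.
        if( is_listed != inverse ) {
            kept.push_back( name );
        }
    }
    return kept;
}

// Usage: select one of three samples, then exclude it instead.
void select_samples_example()
{
    std::vector<std::string> const all = { "NA00001", "NA00002", "NA00003" };
    assert(( select_samples( all, { "NA00002" } )
        == std::vector<std::string>{ "NA00002" } ));
    assert(( select_samples( all, { "NA00002" }, true )
        == std::vector<std::string>{ "NA00001", "NA00003" } ));
}
```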

## ◆ version()

 std::string version ( ) const

Return the VCF/BCF version string, e.g. "VCFv4.2".

Definition at line 131 of file vcf_header.cpp.

## ◆ VcfFormatIterator

 friend class VcfFormatIterator
friend

Definition at line 119 of file vcf_header.hpp.

## ◆ VcfRecord

 friend class VcfRecord
friend

Definition at line 116 of file vcf_header.hpp.

The documentation for this class was generated from the following files: