A library for working with phylogenetic and population genetic data.
v0.32.0
MapBimReader Class Reference

#include <genesis/population/format/map_bim_reader.hpp>

Detailed Description

Reader for map/bim files as used by PLINK.

This reader processes map/bim files, as used for example by PLINK. The two formats are very similar, so a single reader handles both. See https://www.cog-genomics.org/plink/2.0/formats#bim and https://www.cog-genomics.org/plink/2.0/formats#map for details on these formats.

A map file contains four columns (1-4 below), of which column 3 (position in morgans or centimorgans) is optional. A bim file extends this with two additional columns for the alleles.

  1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name
  2. Variant identifier
  3. Position in morgans or centimorgans (optional; safe to use dummy value of '0')
  4. Base-pair coordinate (1-based; limited to 2^31-2)
  5. Allele 1 (corresponding to clear bits in PLINK .bed; usually minor)
  6. Allele 2 (corresponding to set bits in PLINK .bed; usually major)

For example:

2       2_71210 0       71210   C       A
2       2_71228 0       71228   G       C
2       2_71282 0       71282   T       C

All lines must have the same number of columns (so either no lines contain the centimorgans column, or all of them do).
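The column layout above can be illustrated with a minimal, self-contained sketch that does not use the library itself. The names MapBimRecord and parse_map_bim_line are hypothetical and exist only for this illustration, and the mapping from column count to layout (3 or 4 columns for map, 5 or 6 for bim) is an assumption derived from the description above, not the reader's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not the library's MapBimReader::Feature.
struct MapBimRecord
{
    std::string  chromosome;
    std::string  variant_id;
    double       genetic_position = 0.0; // morgans or centimorgans; 0 if absent
    std::int64_t coordinate = 0;         // 1-based base-pair coordinate
    std::string  allele_1;               // empty for map files
    std::string  allele_2;               // empty for map files
};

// Split one line on whitespace and assign the columns by their count.
MapBimRecord parse_map_bim_line( std::string const& line )
{
    std::istringstream iss( line );
    std::vector<std::string> cols;
    for( std::string tok; iss >> tok; ) {
        cols.push_back( tok );
    }

    // Assumption: map has 3 or 4 columns, bim has 5 or 6 (column 3 is optional).
    if( cols.size() < 3 || cols.size() > 6 ) {
        throw std::runtime_error(
            "Expected 3 to 6 columns, found " + std::to_string( cols.size() )
        );
    }
    bool const has_genetic_position = ( cols.size() == 4 || cols.size() == 6 );
    bool const has_alleles          = ( cols.size() >= 5 );

    MapBimRecord rec;
    std::size_t i = 0;
    rec.chromosome = cols[i++];
    rec.variant_id = cols[i++];
    if( has_genetic_position ) {
        rec.genetic_position = std::stod( cols[i++] );
    }
    rec.coordinate = std::stoll( cols[i++] );
    if( has_alleles ) {
        rec.allele_1 = cols[i++];
        rec.allele_2 = cols[i++];
    }
    return rec;
}
```

Passing the first example line to parse_map_bim_line() would fill all six fields, while a three-column map line leaves the genetic position at 0 and both alleles empty. A complete reader would additionally check that every line of a file has the same number of columns.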

According to the PLINK standard, variants with negative base-pair coordinates are skipped. We do the same here by default. This behaviour can be changed with the skip_negative_coordinates() setting, so that downstream processes can decide how to handle such entries themselves, if needed. The format description, however, does not mention 0 as a coordinate value. As this is a tricky special case, we throw an exception when it occurs, to make sure that users are aware of it. Also, we internally use 0 as an undefined or "any" coordinate, so blindly accepting it here might lead to unexpected behaviour downstream; hence the exception.
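The coordinate policy described above can be sketched as a small standalone function. The name keep_coordinate is hypothetical, and the behaviour with skipping turned off (negative values are passed through unchanged) is an assumption based on the description, not a copy of the library's code.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: returns true if a record with this coordinate
// should be kept, false if it should be skipped.
bool keep_coordinate( std::int64_t coordinate, bool skip_negative = true )
{
    // Zero is not covered by the format description, so it is rejected loudly.
    if( coordinate == 0 ) {
        throw std::runtime_error( "Coordinate 0 is not allowed in map/bim input." );
    }
    // Negative coordinates are skipped by default; with skipping turned off,
    // this sketch assumes that they are kept for downstream handling.
    if( coordinate < 0 ) {
        return !skip_negative;
    }
    return true;
}
```

With the default setting, keep_coordinate( -5 ) returns false, keep_coordinate( -5, false ) returns true, and keep_coordinate( 0 ) throws in both cases.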

Definition at line 90 of file map_bim_reader.hpp.

Public Member Functions

 MapBimReader ()=default
 
 MapBimReader (MapBimReader &&)=default
 
 MapBimReader (MapBimReader const &)=default
 
 ~MapBimReader ()=default
 
MapBimReader & operator= (MapBimReader &&)=default
 
MapBimReader & operator= (MapBimReader const &)=default
 
std::vector< Feature > read (std::shared_ptr< utils::BaseInputSource > source) const
 Read a map/bim input source, and return its content as a list of Feature structs. More...
 
GenomeLocusSet read_as_genome_locus_set (std::shared_ptr< utils::BaseInputSource > source) const
 Read an input source, and return its content as a GenomeLocusSet. More...
 
GenomeRegionList read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, bool merge=true) const
 Read a map/bim input source, and return its content as a GenomeRegionList. More...
 
void read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, GenomeRegionList &target, bool merge=true) const
 Read a map/bim input source, and add its content to an existing GenomeRegionList. More...
 
bool skip_negative_coordinates () const
 
MapBimReader & skip_negative_coordinates (bool value)
 

Classes

struct  Feature
 Store all values that can typically appear in the columns of a map/bim file. More...
 

Constructor & Destructor Documentation

◆ MapBimReader() [1/3]

MapBimReader ( )
default

◆ ~MapBimReader()

~MapBimReader ( )
default

◆ MapBimReader() [2/3]

MapBimReader ( MapBimReader const &  )
default

◆ MapBimReader() [3/3]

MapBimReader ( MapBimReader &&  )
default

Member Function Documentation

◆ operator=() [1/2]

MapBimReader& operator= ( MapBimReader &&  )
default

◆ operator=() [2/2]

MapBimReader& operator= ( MapBimReader const &  )
default

◆ read()

std::vector< MapBimReader::Feature > read ( std::shared_ptr< utils::BaseInputSource > source ) const

Read a map/bim input source, and return its content as a list of Feature structs.

Definition at line 50 of file map_bim_reader.cpp.

◆ read_as_genome_locus_set()

GenomeLocusSet read_as_genome_locus_set ( std::shared_ptr< utils::BaseInputSource > source ) const

Read an input source, and return its content as a GenomeLocusSet.

This only uses the columns chromosome and coordinate and ignores everything else.

This is the recommended way to read an input for testing whether genome coordinates are covered (filtered / to be considered) for downstream analyses.
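To illustrate what such a coverage test amounts to, here is a minimal sketch of a per-chromosome position set. The class SimpleLocusSet is a hypothetical stand-in written for this example; it is not the library's GenomeLocusSet, and its member names are assumptions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a locus set; not the library's GenomeLocusSet.
class SimpleLocusSet
{
public:
    // Record that a position on a chromosome is part of the set.
    void add( std::string const& chromosome, std::uint64_t position )
    {
        loci_[ chromosome ].insert( position );
    }

    // Test whether a position on a chromosome has been added before.
    bool is_covered( std::string const& chromosome, std::uint64_t position ) const
    {
        auto const it = loci_.find( chromosome );
        return it != loci_.end() && it->second.count( position ) > 0;
    }

private:
    std::unordered_map<std::string, std::unordered_set<std::uint64_t>> loci_;
};
```

Only the chromosome and coordinate columns of each input line would be fed into add(); variant identifiers, genetic positions, and alleles play no role in the lookup.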

Definition at line 60 of file map_bim_reader.cpp.

◆ read_as_genome_region_list() [1/2]

GenomeRegionList read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source,
bool  merge = true 
) const

Read a map/bim input source, and return its content as a GenomeRegionList.

This only uses the columns chromosome and coordinate and ignores everything else.

If merge is set, adjacent coordinates of the file are merged into contiguous intervals. This is useful if the region list is used to determine coverage, and is the default here. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.
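The merging of adjacent coordinates can be sketched for a single chromosome as follows. This is a standalone illustration under the assumption that positions arrive in sorted order; the names Interval and merge_adjacent are hypothetical, and the library's GenomeRegionList may implement this differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Closed interval [first, last] of 1-based positions on one chromosome.
using Interval = std::pair<std::size_t, std::size_t>;

// Merge sorted positions into contiguous intervals: a position that repeats
// or directly follows the end of the last interval extends that interval.
std::vector<Interval> merge_adjacent( std::vector<std::size_t> const& sorted_positions )
{
    std::vector<Interval> result;
    for( std::size_t const pos : sorted_positions ) {
        if( !result.empty() && pos <= result.back().second + 1 ) {
            result.back().second = std::max( result.back().second, pos );
        } else {
            result.emplace_back( pos, pos );
        }
    }
    return result;
}
```

For the positions 5, 6, 7, 10, 12, 13, this yields the three intervals [5, 7], [10, 10], and [12, 13]. Without merging, each position would instead be stored as its own single-position interval.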

Definition at line 70 of file map_bim_reader.cpp.

◆ read_as_genome_region_list() [2/2]

void read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source,
GenomeRegionList & target,
bool  merge = true 
) const

Read a map/bim input source, and add its content to an existing GenomeRegionList.

This only uses the columns chromosome and coordinate and ignores everything else.

If merge is set, adjacent coordinates of the file are merged into contiguous intervals. This is useful if the region list is used to determine coverage, and is the default here. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.

Definition at line 79 of file map_bim_reader.cpp.

◆ skip_negative_coordinates() [1/2]

bool skip_negative_coordinates ( ) const
inline

Definition at line 184 of file map_bim_reader.hpp.

◆ skip_negative_coordinates() [2/2]

MapBimReader& skip_negative_coordinates ( bool  value)
inline

Definition at line 189 of file map_bim_reader.hpp.

