#include <genesis/population/format/map_bim_reader.hpp>
Reader for map/bim files as used by PLINK.
This reader processes map/bim files, as used for example by PLINK. The two formats are similar enough that a single reader handles both. See https://www.cog-genomics.org/plink/2.0/formats#bim and https://www.cog-genomics.org/plink/2.0/formats#map for details on these formats.
A map file contains four columns (the first four in the example below), of which column 3 (position in morgans or centimorgans) is optional. A bim file extends this with two additional columns for the alleles.
For example:
2 2_71210 0 71210 C A
2 2_71228 0 71228 G C
2 2_71282 0 71282 T C
All lines must have the same number of columns (so either no lines contain the centimorgans column, or all of them do).
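The column layout described above can be illustrated with a small standalone sketch. It is not the genesis implementation: the `MapBimRecord` struct and the `parse_map_bim_line()` function are hypothetical names introduced here, and accepting exactly 3, 4, or 6 whitespace-separated columns is an assumption derived from the description (map without or with the morgans column, or bim).

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for this sketch; not the genesis Feature struct.
struct MapBimRecord {
    std::string  chromosome;
    std::string  variant_id;
    double       position_morgans = 0.0; // optional column 3
    std::int64_t coordinate = 0;         // base-pair coordinate
    std::string  allele_1;               // bim only
    std::string  allele_2;               // bim only
};

// Split one line on whitespace and map the columns onto the record.
// Assumption: only 3, 4, or 6 columns are accepted.
MapBimRecord parse_map_bim_line(std::string const& line)
{
    std::istringstream iss(line);
    std::vector<std::string> cols;
    std::string token;
    while (iss >> token) {
        cols.push_back(token);
    }
    if (cols.size() != 3 && cols.size() != 4 && cols.size() != 6) {
        throw std::runtime_error("Unexpected number of columns in map/bim line");
    }

    MapBimRecord rec;
    rec.chromosome = cols[0];
    rec.variant_id = cols[1];

    // With four or more columns, column 3 is the (centi)morgans position,
    // and the base-pair coordinate moves to column 4.
    std::size_t coord_index = 2;
    if (cols.size() >= 4) {
        rec.position_morgans = std::stod(cols[2]);
        coord_index = 3;
    }
    rec.coordinate = std::stoll(cols[coord_index]);

    if (cols.size() == 6) {
        rec.allele_1 = cols[4];
        rec.allele_2 = cols[5];
    }
    return rec;
}
```

A caller that wants to enforce the "same number of columns on all lines" rule could remember the column count of the first line and compare every later line against it.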
According to the PLINK standard, entries with negative base-pair coordinates are skipped, and we do the same here by default. This behaviour can be changed with the skip_negative_coordinates() setting, so that downstream processes can decide for themselves how to handle such entries, if needed. The format description, however, does not mention 0 as a coordinate value. As this is a tricky special case, we throw an exception when it occurs, to make sure that users are aware of it. We also internally use 0 as an undefined or "any" coordinate, so blindly accepting it here might lead to unexpected behaviour downstream; hence the exception.
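The coordinate policy described here can be summarised as a small decision function. This is a sketch under stated assumptions, not the code used by the reader: `keep_coordinate()` is a hypothetical helper, and its `skip_negative` parameter merely mirrors the role of the skip_negative_coordinates() setting.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper for illustration only.
// Returns true if an entry with this coordinate is kept, false if it is skipped.
// Throws for coordinate 0, which the format description does not define.
bool keep_coordinate(std::int64_t coordinate, bool skip_negative = true)
{
    if (coordinate == 0) {
        throw std::runtime_error("Coordinate 0 is not supported in map/bim input");
    }
    if (coordinate < 0 && skip_negative) {
        return false; // default: follow PLINK and skip the entry
    }
    return true; // positive, or negative with skipping disabled
}
```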
Definition at line 90 of file map_bim_reader.hpp.
Public Member Functions
MapBimReader ()=default
MapBimReader (MapBimReader &&)=default
MapBimReader (MapBimReader const &)=default
~MapBimReader ()=default
MapBimReader & operator= (MapBimReader &&)=default
MapBimReader & operator= (MapBimReader const &)=default
std::vector< Feature > read (std::shared_ptr< utils::BaseInputSource > source) const
Read a map/bim input source, and return its content as a list of Feature structs.
GenomeLocusSet read_as_genome_locus_set (std::shared_ptr< utils::BaseInputSource > source) const
Read an input source, and return its content as a GenomeLocusSet.
GenomeRegionList read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, bool merge=true) const
Read a map/bim input source, and return its content as a GenomeRegionList.
void read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, GenomeRegionList &target, bool merge=true) const
Read a map/bim input source, and add its content to an existing GenomeRegionList.
bool skip_negative_coordinates () const
MapBimReader & skip_negative_coordinates (bool value)
Classes
struct Feature
Store all values that can typically appear in the columns of a map/bim file.
std::vector< MapBimReader::Feature > read ( std::shared_ptr< utils::BaseInputSource > source ) const
Read a map/bim input source, and return its content as a list of Feature structs.
Definition at line 50 of file map_bim_reader.cpp.
GenomeLocusSet read_as_genome_locus_set ( std::shared_ptr< utils::BaseInputSource > source ) const
Read an input source, and return its content as a GenomeLocusSet.
This only uses the columns chromosome and coordinate and ignores everything else.
This is the recommended way to read an input for testing whether genome coordinates are covered (filtered / to be considered) for downstream analyses.
Definition at line 60 of file map_bim_reader.cpp.
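To illustrate what "testing whether genome coordinates are covered" amounts to, here is a minimal standalone sketch of a locus set keyed by chromosome and coordinate. `SimpleLocusSet` is a hypothetical stand-in written for this example; it does not reflect how GenomeLocusSet is implemented or what its interface looks like.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a locus set; not the genesis GenomeLocusSet.
class SimpleLocusSet {
public:
    // Record a single position, using only chromosome and coordinate.
    void add(std::string const& chromosome, std::int64_t coordinate)
    {
        loci_[chromosome].insert(coordinate);
    }

    // Check whether a position was previously added.
    bool is_covered(std::string const& chromosome, std::int64_t coordinate) const
    {
        auto it = loci_.find(chromosome);
        return it != loci_.end() && it->second.count(coordinate) > 0;
    }

private:
    std::unordered_map<std::string, std::set<std::int64_t>> loci_;
};
```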
GenomeRegionList read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source, bool merge = true ) const
Read a map/bim input source, and return its content as a GenomeRegionList.
This only uses the columns chromosome and coordinate and ignores everything else.
If merge is set, adjacent coordinates of the file are merged into contiguous intervals. This is useful if the region list is used to determine coverage, and is the default here. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.
Definition at line 70 of file map_bim_reader.cpp.
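The effect of merging can be sketched as follows for the coordinates of a single chromosome. This is an illustration only: `Interval` and `merge_adjacent()` are hypothetical names, and treating "adjacent" as "coordinates that differ by at most 1" is an assumption of this sketch rather than a statement about how GenomeRegionList::add() behaves.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Closed interval [start, end]; hypothetical type for this sketch.
struct Interval {
    std::int64_t start;
    std::int64_t end;
};

// Merge single coordinates of one chromosome into contiguous intervals.
// Assumption: two coordinates are adjacent if they differ by at most 1.
std::vector<Interval> merge_adjacent(std::vector<std::int64_t> coords)
{
    std::sort(coords.begin(), coords.end());
    std::vector<Interval> result;
    for (auto c : coords) {
        if (!result.empty() && c <= result.back().end + 1) {
            // Extends (or duplicates) the current interval.
            result.back().end = std::max(result.back().end, c);
        } else {
            // Gap to the previous interval: start a new one.
            result.push_back({c, c});
        }
    }
    return result;
}
```

Without merging, each coordinate would remain its own single-position interval, which keeps the file's granularity but produces a longer list.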
void read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source, GenomeRegionList & target, bool merge = true ) const
Read a map/bim input source, and add its content to an existing GenomeRegionList.
This only uses the columns chromosome and coordinate and ignores everything else.
If merge is set, adjacent coordinates of the file are merged into contiguous intervals. This is useful if the region list is used to determine coverage, and is the default here. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.
Definition at line 79 of file map_bim_reader.cpp.
inline
Definition at line 184 of file map_bim_reader.hpp.
inline
Definition at line 189 of file map_bim_reader.hpp.