A library for working with phylogenetic and population genetic data.
v0.32.0
GffReader Class Reference

#include <genesis/population/format/gff_reader.hpp>

Detailed Description

Reader for GFF2 and GFF3 (General Feature Format) and GTF (General Transfer Format) files.

See https://uswest.ensembl.org/info/website/upload/gff.html for the format description. Lines starting with track or browser (each followed by a white space character) are ignored, as are comment lines starting with # (which includes directives starting with ##) and empty lines.
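The skipping rules above can be sketched as a small predicate. This is only an illustration in standard C++, not the code used by GffReader; the name is_ignored_line is hypothetical, and treating both space and tab as the trailing white space is an assumption.

```cpp
#include <string>

// Hypothetical helper, not part of genesis: returns true if a line would be
// skipped under the rules described above.
bool is_ignored_line( std::string const& line )
{
    // Empty lines carry no feature. Lines consisting only of white space are
    // not treated as empty in this sketch.
    if( line.empty() ) {
        return true;
    }

    // Comments ("#") and directives ("##") both start with '#'.
    if( line[0] == '#' ) {
        return true;
    }

    // Track and browser lines: the keyword followed by a white space character.
    // Assumption: both space and tab count as white space here.
    auto starts_with_keyword = []( std::string const& text, std::string const& keyword ) {
        return text.size() > keyword.size()
            && text.compare( 0, keyword.size(), keyword ) == 0
            && ( text[ keyword.size() ] == ' ' || text[ keyword.size() ] == '\t' );
    };
    return starts_with_keyword( line, "track" ) || starts_with_keyword( line, "browser" );
}
```

A sequence name that merely begins with one of the keywords, such as tracker, is not skipped in this sketch, because the keyword has to be followed by white space.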

We currently do not support the underlying ontology features, and simply store the ninth field of the file as a string in Feature::attributes_group. This is also how we support all three formats, GFF2, GFF3, and GTF, in one parser: we simply ignore the parts that differ between them. If needed, this last field has to be parsed by the user.
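As a rough idea of what such user-side parsing could look like, the following sketch splits a GFF3-style attribute string of the form key=value;key=value into pairs. The function parse_gff3_attributes is hypothetical and not part of genesis. It assumes GFF3 syntax only (GTF and GFF2 use a different key "value"; layout) and does not decode URL-escaped characters.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of genesis: split a GFF3 attribute string
// such as "ID=gene1;Name=abc" into (key, value) pairs, in order of appearance.
std::vector<std::pair<std::string, std::string>> parse_gff3_attributes(
    std::string const& attributes
) {
    std::vector<std::pair<std::string, std::string>> result;
    std::size_t pos = 0;
    while( pos < attributes.size() ) {
        // Find the end of the current ';'-separated item.
        auto end = attributes.find( ';', pos );
        if( end == std::string::npos ) {
            end = attributes.size();
        }
        std::string const item = attributes.substr( pos, end - pos );
        pos = end + 1;

        // Skip empty items (e.g. from a trailing ';') and items without '='.
        auto const eq = item.find( '=' );
        if( item.empty() || eq == std::string::npos ) {
            continue;
        }
        result.emplace_back( item.substr( 0, eq ), item.substr( eq + 1 ));
    }
    return result;
}
```

The string stored in Feature::attributes_group would be the input here. A vector of pairs is used instead of a map so that repeated keys and their order are kept.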

See also http://gmod.org/wiki/GFF2, http://gmod.org/wiki/GFF3, and http://genome.ucsc.edu/FAQ/FAQformat.html#format3 for additional information, and https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md for a thorough specification of GFF3.

Definition at line 69 of file gff_reader.hpp.

Public Member Functions

 GffReader ()=default
 
 GffReader (GffReader &&)=default
 
 GffReader (GffReader const &)=default
 
 ~GffReader ()=default
 
GffReader & operator= (GffReader &&)=default
 
GffReader & operator= (GffReader const &)=default
 
bool parse_line (utils::InputStream &input_stream, Feature &feature) const
 
std::vector< Feature > read (std::shared_ptr< utils::BaseInputSource > source) const
 Read a GFF2/GFF3/GTF input source, and return its content as a list of Feature structs. More...
 
GenomeLocusSet read_as_genome_locus_set (std::shared_ptr< utils::BaseInputSource > source) const
 Read an input source, and return its content as a GenomeLocusSet. More...
 
GenomeRegionList read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, bool merge=false) const
 Read a GFF2/GFF3/GTF input source, and return its content as a GenomeRegionList. More...
 
void read_as_genome_region_list (std::shared_ptr< utils::BaseInputSource > source, GenomeRegionList &target, bool merge=false) const
 Read a GFF2/GFF3/GTF input source, and add its content to an existing GenomeRegionList. More...
 

Classes

struct  Feature
 

Constructor & Destructor Documentation

◆ GffReader() [1/3]

GffReader ( )
default

◆ ~GffReader()

~GffReader ( )
default

◆ GffReader() [2/3]

GffReader ( GffReader const &  )
default

◆ GffReader() [3/3]

GffReader ( GffReader &&  )
default

Member Function Documentation

◆ operator=() [1/2]

GffReader& operator= ( GffReader &&  )
default

◆ operator=() [2/2]

GffReader& operator= ( GffReader const &  )
default

◆ parse_line()

bool parse_line ( utils::InputStream & input_stream,
GffReader::Feature & feature 
) const

Definition at line 100 of file gff_reader.cpp.

◆ read()

std::vector< GffReader::Feature > read ( std::shared_ptr< utils::BaseInputSource > source ) const

Read a GFF2/GFF3/GTF input source, and return its content as a list of Feature structs.

Definition at line 51 of file gff_reader.cpp.

◆ read_as_genome_locus_set()

GenomeLocusSet read_as_genome_locus_set ( std::shared_ptr< utils::BaseInputSource > source ) const

Read an input source, and return its content as a GenomeLocusSet.

This only uses the columns seqname, start, and end, and ignores everything else.

This is the recommended way to read an input for testing whether genome coordinates are covered (filtered / to be considered) for downstream analyses.
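To illustrate which parts of a line matter for this function, here is a minimal sketch that pulls seqname, start, and end out of one tab-separated line. In the nine-column layout these are columns 1, 4, and 5. The Locus struct and extract_locus function are hypothetical and only mirror the idea; they are not the types or code used by genesis.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type for this sketch, not the genesis GenomeLocus class.
struct Locus
{
    std::string seqname;
    std::size_t start;
    std::size_t end;
};

// Hypothetical helper: extract seqname (column 1), start (column 4), and
// end (column 5) from a tab-separated GFF/GTF line, ignoring all other columns.
Locus extract_locus( std::string const& line )
{
    // Split the line at tab characters.
    std::vector<std::string> columns;
    std::istringstream stream( line );
    std::string column;
    while( std::getline( stream, column, '\t' )) {
        columns.push_back( column );
    }
    if( columns.size() < 5 ) {
        throw std::runtime_error( "Line has fewer than five columns" );
    }

    // std::stoull throws if a coordinate is not a valid number.
    return Locus{
        columns[0],
        static_cast<std::size_t>( std::stoull( columns[3] )),
        static_cast<std::size_t>( std::stoull( columns[4] ))
    };
}
```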

Definition at line 63 of file gff_reader.cpp.

◆ read_as_genome_region_list() [1/2]

GenomeRegionList read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source,
bool  merge = false 
) const

Read a GFF2/GFF3/GTF input source, and return its content as a GenomeRegionList.

This only uses the columns seqname, start, and end, and ignores everything else.

If merge is set, the individual regions of the file are merged if they overlap. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.
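The effect of merging can be pictured with a small standalone sketch. It is not the GenomeRegionList implementation; the Region struct and merge_overlapping function are hypothetical. It assumes closed intervals, and that only regions sharing at least one position are merged, while directly adjacent regions are kept apart.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical type for this sketch: a closed interval [start, end] on a chromosome.
struct Region
{
    std::string chromosome;
    std::size_t start;
    std::size_t end;
};

// Hypothetical helper: merge regions on the same chromosome that overlap.
std::vector<Region> merge_overlapping( std::vector<Region> regions )
{
    // Sort by chromosome, then start, then end, so that overlapping
    // regions end up next to each other.
    std::sort( regions.begin(), regions.end(), []( Region const& a, Region const& b ) {
        return std::tie( a.chromosome, a.start, a.end )
             < std::tie( b.chromosome, b.start, b.end );
    });

    std::vector<Region> merged;
    for( auto const& region : regions ) {
        bool const overlaps = !merged.empty()
            && merged.back().chromosome == region.chromosome
            && region.start <= merged.back().end;
        if( overlaps ) {
            // Extend the previous region instead of adding a new one.
            merged.back().end = std::max( merged.back().end, region.end );
        } else {
            merged.push_back( region );
        }
    }
    return merged;
}
```

With merging, two overlapping gene annotations become a single covered stretch, which suits coverage queries but loses the boundaries of the individual genes.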

Definition at line 75 of file gff_reader.cpp.

◆ read_as_genome_region_list() [2/2]

void read_as_genome_region_list ( std::shared_ptr< utils::BaseInputSource > source,
GenomeRegionList & target,
bool  merge = false 
) const

Read a GFF2/GFF3/GTF input source, and add its content to an existing GenomeRegionList.

This only uses the columns seqname, start, and end, and ignores everything else.

If merge is set, the individual regions of the file are merged if they overlap. This is more useful if the region list is used to determine coverage, and less useful if regions are meant to indicate some specific parts of the genome, such as genes. See the overlap flag of GenomeRegionList::add( GenomeLocus const&, bool ) for details.

Definition at line 84 of file gff_reader.cpp.


The documentation for this class was generated from the following files: