#include <genesis/utils/io/gzip_block_ostream.hpp>
Inherits ostream.
Output stream that writes blocks of gzip-compressed data to an underlying wrapped stream, using parallel compression.
The gzip format specifies that concatenated blocks of gzip-compressed data (including the gzip header) are still valid gzip files, and are equivalent to concatenating the decompressed data. This is for example used in compressed vcf files (.vcf.gz, Variant Calling Format) to achieve random access into compressed data, by maintaining an index table of offsets to the beginning of individual compressed blocks.
We use a similar technique here to speed up compression, by compressing different gzip blocks in parallel threads. This gives an almost linear speedup, at the cost of a ~3% increase in resulting file size due to the additional gzip headers of each block. This downside can be alleviated by using larger blocks, though. By default, we use 64 KiB blocks.
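The block-wise approach described above can be sketched without any compression library. The following is a minimal illustration, not the implementation of this class: `process_block` is a hypothetical stand-in that only tags each block with its length instead of gzip-compressing it, `process_in_blocks` and `kBlockSize` are names made up for this sketch, and `std::async` is used in place of a thread pool.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Same value as the documented default block size: 1ul << 16 = 65536 bytes.
constexpr std::size_t kBlockSize = 1ul << 16;

// Hypothetical stand-in for compressing one block into a self-contained gzip member.
// It only prefixes the block with its length, so that the sketch runs without zlib.
std::string process_block( std::string const& block )
{
    return "[" + std::to_string( block.size() ) + "]" + block;
}

// Split the input into fixed-size blocks, process the blocks concurrently,
// and concatenate the results in the original block order.
std::string process_in_blocks( std::string const& data, std::size_t block_size = kBlockSize )
{
    // Launch one asynchronous task per block.
    std::vector<std::future<std::string>> jobs;
    for( std::size_t pos = 0; pos < data.size(); pos += block_size ) {
        jobs.push_back( std::async(
            std::launch::async, process_block, data.substr( pos, block_size )
        ));
    }

    // Collect the results front to back. Each get() waits for its own block,
    // so the output order matches the input order regardless of which task finishes first.
    std::string result;
    for( auto& job : jobs ) {
        result += job.get();
    }
    return result;
}
```

Because every block is processed independently, the per-block work can run in parallel, while collecting the results in submission order keeps the output equivalent to processing the input sequentially. With real gzip compression, the fixed per-block cost is what makes larger blocks reduce the size overhead.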
Exemplary usage:
    // Wrapped output stream to write to. Use binary mode, so that compressed output works.
    std::ofstream ofile;
    ofile.open( "path/to/test.txt.gz", std::ios_base::binary );

    // Prepare stream
    GzipBlockOStream ostr( ofile );

    // Write data to stream
    ostr << "some data\n";
By default, the global thread pool of Options::get().global_thread_pool() is used for compressing gzip blocks in parallel. An alternative pool can be provided instead if needed.
Furthermore, note that some file managers might not display the original (uncompressed) file size correctly when viewing the resulting gz file, as they might use only the size of one block instead of the full uncompressed file size. This should not affect decompression or any other downstream processes, though. As this class is a stream, we usually do not know beforehand how large the resulting file will be, so there is not much we can do about this.
The class could also be extended in the future to achieve indexing similar to compressed vcf. NB: We have not yet tested compatibility with the vcf format, as they might employ additional tricks to achieve their goals.
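To illustrate what such an index could look like, here is a small sketch of an offset table with a lookup function. This is not part of the class and not the index format used by compressed vcf files. `BlockIndexEntry` and `find_block` are hypothetical names, and the sketch assumes that the index is non-empty, sorted by uncompressed offset, and that its first entry starts at offset 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical index entry: where a block starts in the uncompressed data,
// and where its gzip block starts in the compressed file.
struct BlockIndexEntry
{
    std::size_t uncompressed_offset;
    std::size_t compressed_offset;
};

// Find the block that contains a given position in the uncompressed data.
// Assumes a non-empty index, sorted by uncompressed_offset, with first entry at offset 0.
BlockIndexEntry const& find_block(
    std::vector<BlockIndexEntry> const& index, std::size_t uncompressed_pos
) {
    // Find the first block that starts after the position.
    // The block directly before it is the one containing the position.
    auto it = std::upper_bound(
        index.begin(), index.end(), uncompressed_pos,
        []( std::size_t pos, BlockIndexEntry const& entry ) {
            return pos < entry.uncompressed_offset;
        }
    );
    return *( it - 1 );
}
```

A reader using such a table would seek to `compressed_offset` in the file, decompress only that block, and then skip `uncompressed_pos - uncompressed_offset` bytes of the decompressed block to reach the requested position.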
Definition at line 89 of file gzip_block_ostream.hpp.
Public Member Functions
GzipBlockOStream (std::ostream &os, std::size_t block_size=GZIP_DEFAULT_BLOCK_SIZE, GzipCompressionLevel compression_level=GzipCompressionLevel::kDefaultCompression, std::shared_ptr< ThreadPool > thread_pool=nullptr)
GzipBlockOStream (std::streambuf *sbuf_p, std::size_t block_size=GZIP_DEFAULT_BLOCK_SIZE, GzipCompressionLevel compression_level=GzipCompressionLevel::kDefaultCompression, std::shared_ptr< ThreadPool > thread_pool=nullptr)
virtual ~GzipBlockOStream ()
Static Public Attributes
static const std::size_t GZIP_DEFAULT_BLOCK_SIZE = 1ul << 16
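explicit GzipBlockOStream (std::ostream &os, std::size_t block_size=GZIP_DEFAULT_BLOCK_SIZE, GzipCompressionLevel compression_level=GzipCompressionLevel::kDefaultCompression, std::shared_ptr< ThreadPool > thread_pool=nullptr)
Definition at line 673 of file gzip_block_ostream.cpp.
explicit GzipBlockOStream (std::streambuf *sbuf_p, std::size_t block_size=GZIP_DEFAULT_BLOCK_SIZE, GzipCompressionLevel compression_level=GzipCompressionLevel::kDefaultCompression, std::shared_ptr< ThreadPool > thread_pool=nullptr)
Definition at line 684 of file gzip_block_ostream.cpp.
virtual ~GzipBlockOStream ()
Definition at line 700 of file gzip_block_ostream.cpp.
static const std::size_t GZIP_DEFAULT_BLOCK_SIZE = 1ul << 16
Definition at line 110 of file gzip_block_ostream.hpp.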