Data Storage
StreamingFast Firehose data storage

Data and the locations where it is stored are important facets of Firehose deployment and operation.
Key Firehose data storage topics include data stores, serialization, merged blocks files (also called 100-blocks files), and one-block files.

Firehose Stores are abstractions sitting on top of Object Storage.
Note: Object Storage is a data storage technique that manages data as objects, as opposed to other data storage architectures such as hierarchical file systems.

Stores utilize the Firehose dstore abstraction library to provide support for local file systems, Azure, Google Cloud, Amazon S3, and other Amazon S3 API compatible object storage solutions such as MinIO or Ceph.
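As an illustration of how a single store abstraction can cover several backends, the following minimal sketch picks a backend from the scheme of a store URL. It is not the dstore API: the function name `describe_store`, the backend labels, and the scheme names (`file`, `gs`, `s3`, `az`) are assumptions made for this example and should be checked against the dstore documentation.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table, for illustration only.
BACKENDS = {
    "file": "local",
    "gs": "gcs",
    "s3": "s3",      # also covers S3 API compatible systems such as MinIO or Ceph
    "az": "azure",
}


def describe_store(url):
    """Return (backend, bucket, path) for a store URL.

    For local stores the bucket is empty and the path stays absolute.
    """
    parsed = urlparse(url)
    scheme = parsed.scheme or "file"
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported store scheme: {scheme!r}")
    path = parsed.path if scheme == "file" else parsed.path.lstrip("/")
    return BACKENDS[scheme], parsed.netloc, path


if __name__ == "__main__":
    print(describe_store("s3://my-bucket/merged-blocks"))
    print(describe_store("file:///var/firehose/merged-blocks"))
```

The calling code only deals with the returned backend and path, which is the point of placing an abstraction between Firehose components and the underlying storage.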

For production deployments outside of cloud providers, StreamingFast recommends Ceph as the distributed storage instead of its compatible Amazon S3 API system.

Firehose primarily utilizes Protocol Buffers version 3 for serialization.

Merged blocks files are also referred to as 100-blocks files and merged bundles. These terms are used interchangeably within Firehose.
Merged blocks are binary files that use the dbin packing format to store a series of bstream block objects, serialized as Protocol Buffers.
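To make the idea of packing a series of serialized objects into one binary file more concrete, here is a minimal sketch of length-prefixed framing. It is a simplified stand-in, not the actual dbin layout: the 4-byte big-endian length prefix, the absence of a file header, and the function names are all assumptions for illustration, and the payloads are arbitrary bytes rather than real Protocol Buffers messages.

```python
import io
import struct


def write_frames(stream, payloads):
    """Write each payload preceded by its length as a 4-byte big-endian integer."""
    for payload in payloads:
        stream.write(struct.pack(">I", len(payload)))
        stream.write(payload)


def read_frames(stream):
    """Yield payloads back from a stream produced by write_frames."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise EOFError("truncated length prefix")
        (size,) = struct.unpack(">I", header)
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frames(buf, [b"block-1", b"block-2"])
    buf.seek(0)
    for frame in read_frames(buf):
        print(frame)
```

Framing of this kind lets a reader stream blocks one at a time without loading the whole file or knowing the message schema in advance.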

When catching up, Firehose creates merged blocks with Extractor components that have been started with a special flag to work in catch-up mode.

In high-availability Firehose configurations, merged blocks will be created by the Merger component. The Extractor component will provide the Merger component with one-block files.

The Merger component will also collate all of the one-block files into a single bundle of blocks.
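The collation step can be sketched as grouping one-block entries by the bundle they belong to. This is only an illustration of the arithmetic, not the Merger's implementation: it assumes bundles are aligned on multiples of 100, and it represents each one-block file as a hypothetical `(block_num, block_id)` tuple.

```python
from collections import defaultdict

BUNDLE_SIZE = 100  # assumed bundle width, matching the "100-blocks" name


def bundle_base(block_num):
    """Return the lowest block number of the bundle containing block_num."""
    return block_num - (block_num % BUNDLE_SIZE)


def collate(one_block_files):
    """Group (block_num, block_id) entries into bundles keyed by base block number."""
    bundles = defaultdict(list)
    for num, block_id in one_block_files:
        bundles[bundle_base(num)].append((num, block_id))
    for blocks in bundles.values():
        blocks.sort()  # order by block number, then by id
    return dict(bundles)


if __name__ == "__main__":
    files = [(201, "c1"), (199, "a9"), (200, "c0"), (100, "a0")]
    for base, blocks in sorted(collate(files).items()):
        print(base, blocks)
```

Entries may arrive out of order from several Extractor components, so the sketch sorts each bundle after grouping rather than relying on arrival order.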

Up to one hundred blocks can be contained within a single 100-blocks file.
A 100-blocks file can include multiple versions of a given block number, such as fork blocks, ensuring continuity through the previous block link.
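The role of the previous block link can be illustrated with a small sketch that walks backward from a chosen head block and keeps only the blocks on that branch. The `Block` fields and the `canonical_segment` function are hypothetical names for this example and do not mirror the bstream types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    id: str
    previous_id: str


def canonical_segment(blocks, head_id):
    """Follow previous_id links from head_id and return that branch, oldest first.

    Blocks on other forks, even with the same number, are left out.
    """
    by_id = {b.id: b for b in blocks}
    chain = []
    current = by_id.get(head_id)
    while current is not None:
        chain.append(current)
        current = by_id.get(current.previous_id)  # None once the link leaves this set
    chain.reverse()
    return chain


if __name__ == "__main__":
    bundle = [
        Block(100, "100a", "99a"),
        Block(101, "101a", "100a"),
        Block(101, "101b", "100a"),  # fork: second version of block 101
        Block(102, "102a", "101a"),
    ]
    print([b.id for b in canonical_segment(bundle, "102a")])
```

Because both versions of block 101 are kept in the bundle, a consumer can follow either branch later, depending on which head the chain settles on.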

Nearly all components in Firehose rely on or utilize 100-blocks files. For example, the bstream library consumes 100-blocks files.
Protocol-specific decoded block objects, such as Ethereum blocks, are what circulate among all processes that work with executed block data in Firehose.

In high-availability configurations, one-block files are transient and ensure that the Merger component gathers all visible forks from every Extractor component.
Important: One-block files contain only one bstream.Block as a serialized Protocol Buffer.

One-block files are consumed by the Merger component and bundled into 100-blocks files. The resulting 100-blocks files are then stored to dstore storage and consumed by most of the other Firehose processes.