Data Storage

This section describes what data is stored where in a Firehose deployment.


Stores

Stores used by Firehose are abstractions on top of object storage.

They use the Firehose dstore abstraction library to support Azure, GCP, and S3 (including on-premise solutions that expose the S3 API, such as MinIO or Ceph), as well as local filesystems.

Note

For production deployments outside of cloud providers, we recommend Ceph, accessed through its S3-compatible API, as the distributed storage system.
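To illustrate how a single store URL can target any of these backends, the sketch below switches on the URL scheme the way a dstore-style abstraction does. The `openStore` helper and the mapping shown are assumptions for the example only, not the actual dstore API.

```go
package main

import (
	"fmt"
	"net/url"
)

// openStore is a hypothetical helper sketching how a dstore-style
// abstraction could pick an object-storage backend from a store URL.
// The real dstore API differs; this only illustrates the idea.
func openStore(storeURL string) (string, error) {
	u, err := url.Parse(storeURL)
	if err != nil {
		return "", fmt.Errorf("invalid store URL %q: %w", storeURL, err)
	}

	switch u.Scheme {
	case "gs":
		return "Google Cloud Storage bucket " + u.Host, nil
	case "s3":
		return "S3-compatible bucket " + u.Host + " (AWS, MinIO, Ceph, ...)", nil
	case "az":
		return "Azure Blob Storage container " + u.Host, nil
	case "file", "":
		return "local filesystem path " + u.Path, nil
	default:
		return "", fmt.Errorf("unsupported store scheme %q", u.Scheme)
	}
}

func main() {
	for _, s := range []string{
		"gs://my-bucket/merged-blocks",
		"s3://my-bucket/one-blocks",
		"file:///data/merged-blocks",
	} {
		backend, err := openStore(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-35s -> %s\n", s, backend)
	}
}
```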

Artifacts

The Firehose stack uses Protocol Buffers version 3 for nearly all of its serialization.
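As a generic illustration of proto3 serialization in Go, the following sketch round-trips a message with google.golang.org/protobuf; a well-known message type stands in for the Firehose-specific ones, which are not reproduced here.

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/known/timestamppb"
)

func main() {
	// Any proto3 message serializes the same way; timestamppb.Timestamp
	// stands in for the Firehose-specific message types.
	original := timestamppb.Now()

	data, err := proto.Marshal(original) // compact binary wire format
	if err != nil {
		panic(err)
	}

	decoded := &timestamppb.Timestamp{}
	if err := proto.Unmarshal(data, decoded); err != nil {
		panic(err)
	}

	fmt.Printf("%d bytes on the wire, round-tripped to %s\n",
		len(data), decoded.AsTime())
}
```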

Merged Blocks Files

Also called 100-blocks files or merged bundles; these terms are used interchangeably.

These are binary files that use the dbin packing format to store a series of bstream.Block objects (defined here), serialized as Protocol Buffers.

They are produced by Extractors in catch-up mode (enabled via flags), or by the Merger in an HA setup. In the latter case, the Extractor contributes one-block files to the Merger instead, and the Merger collates them into a single bundle.

These 100-blocks files can contain more than 100 blocks, because they can include multiple versions of a given block number (e.g. fork blocks); continuity is ensured through the previous-block link.
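To make this concrete, the sketch below uses a simplified block shape (hypothetical, not the real bstream.Block type) to show a bundle that holds more than 100 entries because of a fork, and how following previous-block links recovers one continuous chain.

```go
package main

import "fmt"

// block is a simplified stand-in for bstream.Block, carrying only the
// fields needed to illustrate fork handling in a 100-blocks bundle.
type block struct {
	Number     uint64
	ID         string
	PreviousID string
}

// canonicalChain walks previous-block links backwards from the tip,
// keeping exactly one version of each block number.
func canonicalChain(bundle []block, tipID string) []block {
	byID := make(map[string]block, len(bundle))
	for _, b := range bundle {
		byID[b.ID] = b
	}

	var chain []block
	for id := tipID; id != ""; {
		b, ok := byID[id]
		if !ok {
			break // parent lives in an earlier bundle
		}
		chain = append([]block{b}, chain...)
		id = b.PreviousID
	}
	return chain
}

func main() {
	// A bundle covering numbers 100-199 that contains two competing
	// versions of block 150, both of which were recorded.
	bundle := []block{
		{Number: 150, ID: "150a", PreviousID: "149"},
		{Number: 150, ID: "150b", PreviousID: "149"}, // forked version
		{Number: 151, ID: "151", PreviousID: "150a"},
		// ... remaining blocks of the bundle elided ...
	}

	for _, b := range canonicalChain(bundle, "151") {
		fmt.Printf("block %d (%s)\n", b.Number, b.ID)
	}
}
```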

They are consumed by the bstream library, used by almost all components.

The protocol-specific decoded block objects (for Ethereum, for example) are what circulate among all processes that work with executed block data.

One Block Files

These are transient files whose purpose is to ensure that, in an HA setup, the Merger gathers all visible forks from the Extractor instances.

They each contain a single bstream.Block, serialized as Protocol Buffers (see links above).

The Merger consumes them, bundles them into 100-blocks files (merged blocks files), and stores them in dstore storage for consumption by most other processes.
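As a rough sketch of that collation step (hypothetical names; the real Merger also handles forks, completeness checks, and dstore I/O), grouping one-block entries into their 100-blocks bundle comes down to bucketing by the bundle's base block number.

```go
package main

import (
	"fmt"
	"sort"
)

// bundleBase returns the first block number of the 100-blocks bundle
// that a given block number belongs to (e.g. 1234 -> 1200).
func bundleBase(blockNum uint64) uint64 {
	return blockNum - blockNum%100
}

func main() {
	// Block numbers seen as individual one-block files.
	oneBlocks := []uint64{99, 100, 101, 150, 199, 200, 201}

	// Bucket them into the bundles a merger-like process would produce.
	bundles := map[uint64][]uint64{}
	for _, num := range oneBlocks {
		base := bundleBase(num)
		bundles[base] = append(bundles[base], num)
	}

	var bases []uint64
	for base := range bundles {
		bases = append(bases, base)
	}
	sort.Slice(bases, func(i, j int) bool { return bases[i] < bases[j] })

	for _, base := range bases {
		fmt.Printf("bundle %d: blocks %v\n", base, bundles[base])
	}
}
```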