Data Storage
This section is about what is stored where in a Firehose deployment.
Stores
Stores used by the Firehose are abstractions on top of object storage. They use the Firehose dstore abstraction library to support Azure, GCP, and S3 (including on-premise solutions that support the S3 API, like minio or ceph), as well as local filesystems.
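To illustrate, here is a minimal sketch of opening stores over those backends with the dstore library. The constructor and method names used here (NewDBinStore, FileExists) and the exact URL forms are assumptions about the dstore API and may differ between versions:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/streamingfast/dstore"
)

func main() {
	ctx := context.Background()

	// The same code path covers cloud buckets, S3-compatible systems
	// (minio, ceph) and plain local directories; only the URL changes.
	urls := []string{
		"gs://my-bucket/merged-blocks",      // GCP
		"s3://my-bucket/merged-blocks",      // AWS S3, minio, ceph
		"az://myaccount.mycontainer/blocks", // Azure (assumed URL form)
		"file:///var/lib/firehose/blocks",   // local filesystem
	}

	for _, url := range urls {
		// NewDBinStore is assumed to build a store for .dbin(.zst) objects.
		store, err := dstore.NewDBinStore(url)
		if err != nil {
			log.Fatalf("opening store %q: %s", url, err)
		}

		// FileExists is assumed to check for one object by its base name.
		exists, err := store.FileExists(ctx, "0000000000")
		if err != nil {
			log.Fatalf("checking bundle in %q: %s", url, err)
		}
		fmt.Printf("%s -> bundle 0000000000 exists: %v\n", url, exists)
	}
}
```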
Note
For production deployments outside of cloud providers, we recommend ceph, through its S3-compatible API, as the distributed storage system.
Artifacts
The Firehose stack uses Protocol Buffers version 3 for serialization, pretty much throughout.
Merged Blocks Files
Also called 100-blocks files or merged bundles; these terms are used interchangeably.
These are binary files that use the dbin packing format to store a series of bstream.Block objects (defined in the bstream library), serialized as Protocol Buffers.
They are produced by Extractors running in catch-up mode (enabled with certain flags), or by the Merger in a high-availability (HA) setup. In the latter case, each Extractor contributes one-block files to the Merger instead, and the Merger collates all of them into a single bundle.
These 100-blocks files can contain more than 100 blocks, because they can include multiple versions of a given block number (e.g. fork blocks), ensuring continuity through the previous block link. They are consumed by the bstream library, which is used by almost all components. The protocol-specific decoded block objects (for Ethereum, as an example) are what circulate among all processes that work with executed block data.
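As a sketch of what consuming such a decoded block could look like, the snippet below unmarshals one serialized bstream.Block and unpacks its chain-specific payload into an Ethereum block. The import paths, the field names (Payload as a google.protobuf.Any), and the pbeth.Block shape are assumptions and may not match the exact Firehose versions in use:

```go
package main

import (
	"fmt"
	"log"
	"os"

	pbbstream "github.com/streamingfast/bstream/pb/sf/bstream/v1"                    // assumed import path
	pbeth "github.com/streamingfast/firehose-ethereum/types/pb/sf/ethereum/type/v2" // assumed import path
	"google.golang.org/protobuf/proto"
)

func main() {
	// Assume this file holds one Protocol Buffers-encoded bstream.Block,
	// already extracted from a merged blocks (100-blocks) file.
	raw, err := os.ReadFile("block.binpb")
	if err != nil {
		log.Fatal(err)
	}

	var blk pbbstream.Block
	if err := proto.Unmarshal(raw, &blk); err != nil {
		log.Fatalf("decoding bstream.Block: %s", err)
	}

	// The payload carries the chain-specific block; here we assume an
	// Ethereum block wrapped in a google.protobuf.Any field named Payload.
	var ethBlock pbeth.Block
	if err := blk.Payload.UnmarshalTo(&ethBlock); err != nil {
		log.Fatalf("decoding Ethereum payload: %s", err)
	}

	fmt.Printf("block #%d (%x), %d transactions\n",
		ethBlock.Number, ethBlock.Hash, len(ethBlock.TransactionTraces))
}
```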
One Block Files
These are transient files whose purpose is to ensure that the Merger gathers all visible forks produced by the Extractor instances in an HA setup. Each one contains a single bstream.Block, serialized as Protocol Buffers (see above). The Merger consumes them, bundles them into merged blocks files (100-blocks files), and stores those to dstore storage for consumption by most other processes.
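For a concrete picture of that flow, here is a rough, hypothetical sketch of a merger-like collation step on top of dstore. The Walk, OpenObject, and WriteObject calls, the file naming scheme, and the naive concatenation are illustration-only assumptions; the real Merger also deduplicates forks and writes proper dbin-packed bundles:

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"log"

	"github.com/streamingfast/dstore"
)

// collateBundle gathers the one-block files for a 100-block span and writes
// them out as a single merged object. This is a conceptual sketch only: the
// real Merger also deduplicates forks, validates previous-block links and
// writes proper dbin-packed bundles rather than concatenating raw files.
func collateBundle(ctx context.Context, oneBlocks, merged dstore.Store, baseBlockNum uint64) error {
	var bundle bytes.Buffer

	// Hypothetical naming: one-block files prefixed with a zero-padded
	// block number, so prefix "00000010" matches blocks 1000 through 1099.
	prefix := fmt.Sprintf("%08d", baseBlockNum/100)

	err := oneBlocks.Walk(ctx, prefix, func(filename string) error {
		obj, err := oneBlocks.OpenObject(ctx, filename)
		if err != nil {
			return err
		}
		defer obj.Close()
		_, err = io.Copy(&bundle, obj)
		return err
	})
	if err != nil {
		return err
	}

	// Merged bundles are assumed to be named after their first block number.
	return merged.WriteObject(ctx, fmt.Sprintf("%010d", baseBlockNum), &bundle)
}

func main() {
	ctx := context.Background()

	oneBlocks, err := dstore.NewDBinStore("file:///var/lib/firehose/one-blocks")
	if err != nil {
		log.Fatal(err)
	}
	merged, err := dstore.NewDBinStore("file:///var/lib/firehose/merged-blocks")
	if err != nil {
		log.Fatal(err)
	}

	if err := collateBundle(ctx, oneBlocks, merged, 1000); err != nil {
		log.Fatal(err)
	}
}
```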