As a complement to the Data Flow section, below we’ll discuss in more detail each of the components which constitute the Firehose system.


To serve a production-grade Firehose, some infrastructure is required.

You will need to run a full protocol node with the Firehose instrumentation enabled, as well as an object store to hold the merged block files.

For a highly-available setup (which the system is designed to allow), you will need a few more components.

```mermaid
flowchart BT
    dstore[("Object Store")]
    click dstore "/operate/concepts/data-storage/" "More on Data Storage"
    extractors[["firehose-enabled blockchain node(s)"]]
    relayers[["relayer(s)"]]
    merger
    click merger "/operate/concepts/components/" "More on the merger"
    firehoses[["firehose servers"]]
    relayers -->|gRPC| extractors
    firehoses -->|gRPC| relayers
    user([User]) -->|gRPC| firehoses
    extractors -.->|one-block files| merger
    merger -.->|writes merged blocks| dstore
    dstore -.->|consumes merged blocks| relayers
    dstore -.->|consumes merged blocks| firehoses
```

Stay tuned for a video series on the Firehose components.

Extractor (mindreader)


The Extractor process uses the node-manager library to run a blockchain node (e.g. nodeos) instance as a sub-process, and reads the data it produces.

This is the primary source of all data that flows through the system. Extractor nodes can be considered simple full nodes, with no archive mode or other special features enabled. You can bootstrap them just as you would regular full nodes, with the same backup strategy.

The only major difference is that they spew out lots of data when the Firehose is enabled.

The prior name of the extractor is mindreader, a play on deepmind instrumentation.
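The read-from-a-sub-process mechanic can be sketched as follows. This is a hypothetical, simplified illustration, not the node-manager implementation: it assumes instrumentation lines carry a `DMLOG ` prefix on the node's output, and the `parseDeepMindLine` helper is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDeepMindLine separates instrumentation output from ordinary node
// logs. We assume here that deep-mind lines carry a "DMLOG " prefix;
// every other line is a regular log line and is ignored.
func parseDeepMindLine(line string) (payload string, ok bool) {
	const prefix = "DMLOG "
	if !strings.HasPrefix(line, prefix) {
		return "", false
	}
	return strings.TrimPrefix(line, prefix), true
}

func main() {
	// In the real Extractor, these lines would come from the node
	// sub-process's stdout, scanned line by line.
	for _, line := range []string{
		"INFO: produced block 42",
		"DMLOG BLOCK 42 0xabc",
	} {
		if payload, ok := parseDeepMindLine(line); ok {
			fmt.Println("instrumentation:", payload)
		}
	}
}
```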

High Availability

You will want more than one Extractor if you want to ensure blocks always flow through your system.

Firehose is designed to deduplicate identical data produced by multiple Extractors (when two Extractor instances execute the same block), and to aggregate any forked blocks seen by one Extractor but not by another.

See the merger for details.

By adding more Extractors, and dispersing them geographically, they end up racing to push blocks to the Relayer, increasing performance of the overall system.



Merger

The Merger collects the one-block files written by one or more Extractors to a one-block object store, and merges them to produce merged block files (a.k.a. 100-block files).

One core feature of the Merger is the capacity to merge all forks visited by any backing Extractor node.

A merged block file is produced once all 100 blocks are collected, and once we are reasonably sure no more forks will occur (bstream's ForkableHandler supports seeing fork data in future merged block files anyway).
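The 100-block grouping means every block maps to exactly one bundle, identified by its base block number. A minimal sketch of that mapping (the function name is illustrative, not the merger's actual API):

```go
package main

import "fmt"

// bundleBaseNum returns the first block number of the 100-block bundle
// that contains blockNum: block 429 belongs to bundle 400, block 500
// starts bundle 500.
func bundleBaseNum(blockNum uint64) uint64 {
	return blockNum - blockNum%100
}

func main() {
	fmt.Println(bundleBaseNum(429)) // 400
	fmt.Println(bundleBaseNum(500)) // 500
}
```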

Detailed Behavior

  • On boot, without a merged-seen.gob file, it finds the last merged block in storage and starts at the next bundle.
  • On boot, with a merged-seen.gob file, the merger tries to resume where it left off.
  • It gathers one-block files and puts them together inside a bundle:
    • The bundle is written when the first block of the next bundle is older than 25 seconds.
    • The bundle is written when it contains at least one fully-linked segment of 100 blocks.
  • The merger keeps a list of all seen (merged) blocks within the {merger-max-fixable-fork} window.
    • “Seen” blocks are blocks that the merger has either merged itself, or discovered by loading a bundle merged by someone else (another Extractor).
  • The merger deletes one-block files:
    • That are older than {merger-max-fixable-fork}.
    • That have already been seen (merged), as recorded in the merged-seen.gob file.
  • If the merger cannot complete a bundle (missing blocks, a hole…), it looks at the destination storage to see if the merged block file already exists. If it does, it loads those blocks to fill its seen-blocks cache and continues to the next bundle.
  • Any one-block file that was not included in a previous bundle will be included in a later one (e.g. bundle 500 might include block 429).
    • Blocks older than {merger-max-fixable-fork} will, instead, be deleted.
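The two one-block deletion rules above can be sketched as a single decision function. This is a simplified illustration with invented names; the real merger tracks block IDs (so competing forks at the same height are distinguished), while this sketch keys on block numbers for brevity:

```go
package main

import "fmt"

// shouldDeleteOneBlock applies the merger's two deletion rules to a
// one-block file: delete it if it falls outside the fixable-fork window,
// or if its block has already been merged (recorded as "seen").
func shouldDeleteOneBlock(blockNum, lowestFixableBlock uint64, seen map[uint64]bool) bool {
	if blockNum < lowestFixableBlock { // older than {merger-max-fixable-fork}
		return true
	}
	return seen[blockNum] // already merged, by us or by someone else
}

func main() {
	seen := map[uint64]bool{420: true}
	fmt.Println(shouldDeleteOneBlock(100, 400, seen)) // outside the window
	fmt.Println(shouldDeleteOneBlock(420, 400, seen)) // already merged
	fmt.Println(shouldDeleteOneBlock(429, 400, seen)) // kept for a later bundle
}
```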

High Availability

This component is required when you want highly available Extractor nodes. You only need one Merger, because the whole system can survive Merger downtime, and it only produces files periodically anyway.

Systems that need blocks will, on startup, usually connect to Relayers, receive real-time blocks, and fall back to merged block files only when the Relayers cannot satisfy the requested range. If Relayers hold 200-300 blocks in RAM, that is how long the Merger can be down while still sustaining restarts from other components; at one block per second, for example, that gives roughly three to five minutes.

Once the other components are live, in general, they won’t read from merged block files.



Relayer

The Relayer serves executed block data to most other components.

It feeds from all available Extractor nodes (in order to get a complete view of all possible forks). Its role is to fan out that block information.

The Relayer serves its block data through a streaming gRPC interface called BlockStream::Blocks (defined here). It is the same interface that the Extractor exposes to the Relayers.
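The general shape of that streaming interface looks roughly like the following. This is an illustrative sketch, not the canonical definition: package, message, and field names here are assumptions, and only the service/method names come from the text above.

```protobuf
syntax = "proto3";

package example.bstream.v1; // illustrative package name

service BlockStream {
  // A long-lived server-streaming call: the Relayer (or Extractor)
  // pushes each executed block to the subscriber as it is seen.
  rpc Blocks(BlockRequest) returns (stream Block);
}

message BlockRequest {
  // How many recent blocks to replay from memory upon connection
  // (hypothetical field).
  int64 burst = 1;
}

message Block {
  uint64 number = 1;
  string id = 2;
  string previous_id = 3;
  bytes payload = 4; // chain-specific encoded block (hypothetical field)
}
```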

High Availability

Relayers feed from all of the Extractor nodes, to get a complete view of all possible forks.

Firehose gRPC Server


Firehose serves historical requests by consuming merged blocks directly from the data storage location, and serves live blocks from the Relayer (itself connected to one or more Extractor instances).
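That historical/live split can be sketched as a simple source-selection rule, following the Relayer-buffer behavior described in the Merger section. The function and its parameters are illustrative, not the Firehose server's actual API:

```go
package main

import "fmt"

// source picks where the Firehose reads a requested block from: the live
// Relayer stream if the block is recent enough to still be in the
// Relayer's in-memory buffer, otherwise the merged block files in the
// object store.
func source(requestedBlock, relayerLowestBlock uint64) string {
	if requestedBlock >= relayerLowestBlock {
		return "relayer"
	}
	return "merged-block files"
}

func main() {
	// Assume the Relayer currently buffers blocks 1200 and up.
	fmt.Println(source(500, 1200))  // merged-block files
	fmt.Println(source(1250, 1200)) // relayer
}
```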

High Availability

Firehoses can be horizontally scaled; their throughput is limited by the network between them and their clients, as well as between them and the Relayers.

Firehoses can connect to a subset, or all of the Relayers.

Having Firehoses connect to all Relayers increases the likelihood of seeing all forks, and of being able to navigate those forks for clients that request them while the blocks are still in memory. All forks eventually end up in the merged block files, so in the worst case, fork navigation is delayed when forked blocks do not reach all Firehoses.