
Architecture

Firehose architecture overview and core concepts

Overview

Firehose is a distributed system designed to extract, process, and serve blockchain data at scale. This section covers the core architectural concepts that apply to all blockchain implementations.

Core Principles

Chain-Agnostic Design

  • 90% Universal: Core components work across all blockchains

  • 10% Chain-Specific: Only reader nodes differ between chains

  • Consistent Interface: Same gRPC API regardless of blockchain
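
Because the interface is identical everywhere, a client written against one chain's Firehose endpoint can point at any other chain's endpoint unchanged; only the block payload inside each response differs. Below is a minimal Go sketch of such a client, assuming the generated sf.firehose.v2 gRPC bindings from github.com/streamingfast/pbgo and a placeholder endpoint; exact package paths and field names should be verified against the release you use.

    package main

    import (
        "context"
        "crypto/tls"
        "fmt"
        "io"
        "log"

        pbfirehose "github.com/streamingfast/pbgo/sf/firehose/v2"
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
    )

    func main() {
        // Any Firehose endpoint works here: Ethereum, Solana, NEAR, ...
        // The request and response shapes never change per chain.
        conn, err := grpc.Dial("firehose.example.com:443",
            grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        stream, err := pbfirehose.NewStreamClient(conn).Blocks(context.Background(),
            &pbfirehose.Request{
                StartBlockNum: 17_000_000, // same request shape for every chain
                StopBlockNum:  17_000_010,
            })
        if err != nil {
            log.Fatal(err)
        }

        for {
            resp, err := stream.Recv()
            if err == io.EOF {
                break
            }
            if err != nil {
                log.Fatal(err)
            }
            // resp.Block is a google.protobuf.Any; its type URL names the
            // chain-specific schema (e.g. sf.ethereum.type.v2.Block).
            fmt.Println("got block payload:", resp.Block.TypeUrl)
        }
    }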

Scalable Architecture

  • Horizontal Scaling: Add more components as needed

  • Component Isolation: Each service can be scaled independently

  • Storage Flexibility: Support for local, cloud, and distributed storage

Real-Time Processing

  • Live Streaming: Sub-second latency for new blocks

  • Historical Access: Efficient querying of past data

  • Fault Tolerance: Automatic recovery from failures

System Components

The Firehose system consists of several key components that work together to provide a complete blockchain data pipeline; each is documented in detail in its own section:

  • Reader Node - Wraps blockchain nodes and extracts block data

  • Merger - Combines individual blocks into larger files

  • Relayer - Provides real-time streaming and high availability

  • Firehose - Serves the Firehose gRPC API to clients

  • Substreams - High-performance parallel data transformation engine

  • High Availability - Redundancy and failover strategies

Related sections describe how data moves through the Firehose system from blockchain nodes to client applications, and the storage patterns, formats, and strategies used for different types of blockchain data.

Deployment Patterns

Single-Machine Deployment

All components run on one machine for development or small-scale use.

Note: The blockchain node runs as a subprocess of the Reader component, which manages the node's lifecycle and extracts block data.
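
As an illustration of what this looks like in practice, chain binaries in the fireeth style accept a config file that launches every component inside one process. The component names and flags below are a sketch, not a canonical config; verify them against the binary for your chain.

    # firehose.yaml - everything in one process (illustrative sketch)
    start:
      args:
      - reader-node   # wraps the chain node and extracts blocks
      - merger        # bundles one-block files into merged files
      - relayer       # fans out live blocks
      - firehose      # serves the gRPC API
      flags:
        data-dir: ./firehose-data
        reader-node-path: /usr/local/bin/geth   # the wrapped node binary

A single command in the style of fireeth -c firehose.yaml start then brings the whole pipeline up.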

Distributed Deployment

Components are spread across multiple machines for production scale.
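
A sketch of the same components split across machines: they coordinate through shared object storage for block files and through the relayer's live feed, so each service can be placed and scaled independently. The commands below are illustrative of fireeth-style binaries, not exact invocations.

    # machine A - extraction (the only chain-specific piece)
    fireeth -c common.yaml start reader-node

    # machine B - live fan-out
    fireeth -c common.yaml start relayer

    # machine C - bundling one-block files into merged files
    fireeth -c common.yaml start merger

    # machines D..N - gRPC serving, scaled horizontally behind a load balancer
    fireeth -c common.yaml start firehose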

Key Features

Universal Block Format

  • Consistent protobuf schemas across chains

  • Rich metadata and transaction details

  • Efficient binary serialization
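
On the wire, every chain's block travels inside the same envelope: a protobuf Any whose payload is the chain-specific schema. A consumer unwraps it once it knows which chain it is talking to. A short Go sketch follows, assuming the Ethereum block bindings published by the firehose-ethereum project (the import path is an assumption):

    import (
        pbeth "github.com/streamingfast/firehose-ethereum/types/pb/sf/ethereum/type/v2"
        "google.golang.org/protobuf/types/known/anypb"
    )

    // decodeEthBlock unwraps the universal Any envelope into the
    // chain-specific protobuf block (here: Ethereum).
    func decodeEthBlock(payload *anypb.Any) (*pbeth.Block, error) {
        block := &pbeth.Block{}
        if err := payload.UnmarshalTo(block); err != nil {
            return nil, err
        }
        return block, nil
    }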

Streaming API

  • gRPC-based streaming interface

  • Real-time and historical data access

  • Fork-aware streaming with automatic reorg handling

  • Cursor-based resumption for reliable data delivery
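
Cursors and fork steps work together: every response carries an opaque cursor, and persisting it lets a consumer reconnect exactly where it left off, while the step field says whether a block is new or being undone by a reorg. A hedged Go sketch using the same assumed sf.firehose.v2 bindings as in the earlier example; loadCursor and saveCursor are hypothetical persistence helpers.

    // resumeFrom streams blocks from the last saved cursor so a restarted
    // consumer neither misses nor double-processes a block.
    func resumeFrom(ctx context.Context, client pbfirehose.StreamClient) error {
        stream, err := client.Blocks(ctx, &pbfirehose.Request{
            Cursor: loadCursor(), // empty cursor means "start fresh"
        })
        if err != nil {
            return err
        }
        for {
            resp, err := stream.Recv()
            if err != nil {
                return err // reconnect and retry with the saved cursor
            }
            switch resp.Step {
            case pbfirehose.ForkStep_STEP_NEW:
                // apply this block's effects
            case pbfirehose.ForkStep_STEP_UNDO:
                // a reorg happened: revert this block's effects
            }
            saveCursor(resp.Cursor) // persist after every message
        }
    }

    // Hypothetical persistence helpers; swap in a database or file
    // write in real use.
    func loadCursor() string       { return "" }
    func saveCursor(cursor string) {}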

Storage Efficiency

  • Compressed block files

  • Incremental merging strategy

  • Cloud storage integration
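
Block files are written through URL-addressed stores, so pointing the same deployment at local disk, S3, GCS, or another backend is a flag change rather than a code change. The flag names below follow the fireeth-style convention and are illustrative:

    # local disk for development
    --common-one-block-store-url=file:///var/firehose/one-blocks
    --common-merged-blocks-store-url=file:///var/firehose/merged-blocks

    # cloud object storage for production
    --common-merged-blocks-store-url=s3://my-bucket/merged-blocks?region=us-east-1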

Operational Excellence

  • Comprehensive metrics and monitoring

  • Automated recovery mechanisms

  • Horizontal scaling capabilities
