Distributed Deployment

This guide shows how to deploy Firehose components as separate processes using shared object storage. This approach is recommended for production environments where you need scalability, high availability, and proper service isolation.

Overview

In this deployment, each component (reader-node, merger, relayer, firehose, substreams-tier1, substreams-tier2) runs as a separate process. Components communicate through shared object storage and gRPC endpoints.

┌─────────────────-┐    ┌─────────────────┐    ┌─────────────────┐
│  Reader          │    │   Processing    │    │   Serving       │
│  Process         │    │   Components    │    │   Components    │
├─────────────────-┤    ├─────────────────┤    ├─────────────────┤
│ ┌─────────────┐  │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │dummy-blockchain│    │ │   Merger    │ │    │ │  Firehose   │ │
│ │ (subprocess)│  │    │ │   Relayer   │ │    │ │  Substreams │ │
│ │   Reader    │  │    │ │             │ │    │ │             │ │
│ └─────────────┘  │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────-┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │ Shared Object   │
                    │ Storage         │
                    │ (Cloud Storage) │
                    └─────────────────┘

While this guide shows all components running on a single machine for simplicity, in production you would typically deploy these across multiple machines with proper ingress, DNS, and service discovery.

Prerequisites

Shared Object Storage: Set up cloud storage (AWS S3, Google Cloud Storage, etc.) or On-Premise Storage Solution compatible with S3 like Ceph (recommended).
Binaries: Install firecore and dummy-blockchain as described in Prerequisites

Storage Configuration

First, configure your shared object storage. For this example, we'll use a local filesystem path that simulates cloud storage:

# Create shared storage directory (in production, this would be cloud storage)
mkdir -p ./shared-storage/firehose-data
export SHARED_STORAGE_URL="file://$(pwd)/shared-storage/firehose-data"

# For cloud storage, you would use URLs like:
# export SHARED_STORAGE_URL="gs://your-bucket/firehose-data"
# export SHARED_STORAGE_URL="s3://your-bucket/firehose-data"

Component 1: Reader Node

The Reader manages the blockchain node and extracts block data.

# Terminal 1: Start the Reader
firecore start reader-node \
  --config-file="" \
  --data-dir="./reader-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --reader-node-path="dummy-blockchain" \
  --reader-node-data-dir="./reader-data/node" \
  --reader-node-arguments="start --tracer=firehose --store-dir=./reader-data/node --block-rate=120 --genesis-height=0 --genesis-block-burst=100" \
  --common-one-block-store-url="${SHARED_STORAGE_URL}/one-blocks" \
  --reader-node-grpc-listen-addr=":10010" \
  --reader-node-manager-api-addr=":10013"

The Reader runs the dummy-blockchain as a subprocess and extracts block data to shared storage. See Reader Component for details.

Verify Reader Operation

# Check that one-block files are being created
ls ./shared-storage/firehose-data/one-blocks/

# Inspect a one-block file
firecore tools print one-block ./shared-storage/firehose-data/one-blocks 1 --output=text

Component 2: Merger

The Merger combines one-block files into merged block files for efficient storage.

# Terminal 2: Start the Merger
firecore start merger \
  --config-file="" \
  --data-dir="./merger-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --common-one-block-store-url="${SHARED_STORAGE_URL}/one-blocks" \
  --common-merged-blocks-store-url="${SHARED_STORAGE_URL}/merged-blocks" \
  --merger-grpc-listen-addr=":10011"

The Merger processes one-block files from shared storage and creates optimized merged block files. Learn more about Merger Component.

Verify Merger Operation

# Check that merged block files are being created
ls ./shared-storage/firehose-data/merged-blocks/

# Inspect a merged block file
firecore tools print merged-blocks ./shared-storage/firehose-data/merged-blocks 100 --output=text

Component 3: Relayer

The Relayer provides live block streaming capabilities.

# Terminal 3: Start the Relayer
firecore start relayer \
  --config-file="" \
  --data-dir="./relayer-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --common-one-block-store-url="${SHARED_STORAGE_URL}/one-blocks" \
  --relayer-grpc-listen-addr=":10012" \
  --relayer-source="reader-node:localhost:10010"

The Relayer connects to the Reader to stream live blocks and provides real-time data access. See Relayer Component for more details.

Verify Relayer Operation

# Stream live blocks from the relayer
firecore tools relayer stream localhost:10012 -o text +3

Component 4: Firehose

The Firehose component serves historical and live block data via gRPC.

# Terminal 4: Start Firehose
firecore start firehose \
  --config-file="" \
  --data-dir="./firehose-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --common-one-block-store-url="${SHARED_STORAGE_URL}/one-blocks" \
  --common-merged-blocks-store-url="${SHARED_STORAGE_URL}/merged-blocks" \
  --firehose-grpc-listen-addr=":10015" \
  --common-live-blocks-addr="localhost:10012"

Verify Firehose Operation

# Test the Firehose gRPC API
grpcurl -plaintext -d '{"start_block_num": 1, "stop_block_num": 5}' \
  localhost:10015 sf.firehose.v2.Stream/Blocks

Component 5: Substreams Tier 1

Substreams Tier 1 serves as the entry point for Substreams requests, handling live blocks directly and delegating historical block processing to Tier 2 workers.

# Terminal 5: Start Substreams Tier 1
firecore start substreams-tier1 \
  --config-file="" \
  --data-dir="./substreams-tier1-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --common-one-block-store-url="${SHARED_STORAGE_URL}/one-blocks" \
  --common-merged-blocks-store-url="${SHARED_STORAGE_URL}/merged-blocks" \
  --substreams-tier1-grpc-listen-addr=":10016" \
  --common-live-blocks-addr="localhost:10012" \
  --substreams-tier1-subrequests-endpoint="localhost:10017"

Component 6: Substreams Tier 2

Substreams Tier 2 workers handle the actual block processing for historical data. Tier 1 delegates work to Tier 2 workers.

Start Tier 2 before Tier 1, as Tier 1 connects to Tier 2 workers via --substreams-tier1-subrequests-endpoint.

# Terminal 6: Start Substreams Tier 2
firecore start substreams-tier2 \
  --config-file="" \
  --data-dir="./substreams-tier2-data" \
  --advertise-block-id-encoding="hex" \
  --advertise-chain-name="acme-dummy-blockchain" \
  --common-merged-blocks-store-url="${SHARED_STORAGE_URL}/merged-blocks" \
  --substreams-tier2-grpc-listen-addr=":10017"

Verify Substreams Operation

# List available Substreams services
grpcurl -plaintext localhost:10016 list | grep substreams
grpcurl -plaintext localhost:10017 list | grep substreams

Load Balancer / API Gateway

In production, you would typically put a load balancer or API gateway in front of your services:

# Example nginx configuration for load balancing
# upstream firehose_backend {
#     server localhost:10015;
# }
#
# upstream substreams_backend {
#     server localhost:10016;
#     server localhost:10017;
# }
#
# server {
#     listen 80;
#     location /firehose/ {
#         grpc_pass grpc://firehose_backend;
#     }
#     location /substreams/ {
#         grpc_pass grpc://substreams_backend;
#     }
# }

Monitoring and Health Checks

Monitor each component's health:

# Check component health via gRPC health checks
grpcurl -plaintext localhost:10011 grpc.health.v1.Health/Check  # Merger
grpcurl -plaintext localhost:10012 grpc.health.v1.Health/Check  # Relayer
grpcurl -plaintext localhost:10015 grpc.health.v1.Health/Check  # Firehose
grpcurl -plaintext localhost:10016 grpc.health.v1.Health/Check  # Substreams Tier1
grpcurl -plaintext localhost:10017 grpc.health.v1.Health/Check  # Substreams Tier2

Production Considerations

Service Discovery

In production, components need to discover each other. Consider using:

Kubernetes: Service discovery via DNS
Consul: Service mesh with health checking
AWS ELB/ALB: Load balancing with health checks

Storage

Replace the local filesystem with proper cloud storage:

# AWS S3
--common-one-block-store-url="s3://your-bucket/one-blocks"
--common-merged-blocks-store-url="s3://your-bucket/merged-blocks"

# Google Cloud Storage
--common-one-block-store-url="gs://your-bucket/one-blocks"
--common-merged-blocks-store-url="gs://your-bucket/merged-blocks"

High Availability

For high availability:

Run multiple instances of each component
Use health checks and automatic restarts
Implement proper monitoring and alerting
Use redundant storage with replication

Security

Secure your deployment:

TLS encryption for gRPC communications
Authentication and authorization
Network segmentation and firewalls
Secrets management for storage credentials

Scaling

Scale components based on load:

Reader: Usually one per blockchain network
Merger: Can run multiple instances for different block ranges
Relayer: Multiple instances for high availability
Firehose: Scale horizontally based on query load
Substreams: Scale both tiers based on processing needs

Next Steps

Adapt for your blockchain: Use these patterns with your target blockchain
Production deployment: Implement proper orchestration (Kubernetes, Docker Swarm)
Monitoring: Set up comprehensive monitoring and alerting
Performance tuning: Optimize based on your specific requirements

You now have a distributed Firehose deployment! This architecture can be adapted to any cloud provider or orchestration platform for production use.

PreviousSingle Machine Deployment NextChain Specific

Last updated 22 days ago

Was this helpful?

hashtagOverview

hashtagPrerequisites

hashtagStorage Configuration

hashtagComponent 1: Reader Node

hashtagVerify Reader Operation

hashtagComponent 2: Merger

hashtagVerify Merger Operation

hashtagComponent 3: Relayer

hashtagVerify Relayer Operation

hashtagComponent 4: Firehose

hashtagVerify Firehose Operation

hashtagComponent 5: Substreams Tier 1

hashtagComponent 6: Substreams Tier 2

hashtagVerify Substreams Operation

hashtagLoad Balancer / API Gateway

hashtagMonitoring and Health Checks

hashtagProduction Considerations

hashtagService Discovery

hashtagStorage

hashtagHigh Availability

hashtagSecurity

hashtagScaling

hashtagNext Steps

Overview

Prerequisites

Storage Configuration

Component 1: Reader Node

Verify Reader Operation

Component 2: Merger

Verify Merger Operation

Component 3: Relayer

Verify Relayer Operation

Component 4: Firehose

Verify Firehose Operation

Component 5: Substreams Tier 1

Component 6: Substreams Tier 2

Verify Substreams Operation

Load Balancer / API Gateway

Monitoring and Health Checks

Production Considerations

Service Discovery

Storage

High Availability

Security

Scaling

Next Steps