Single-Machine Deployment
StreamingFast Firehose NEAR Setup
This document shows how to launch a Firehose on NEAR instance on a single machine that runs every component. Installation consists of a few fairly simple tasks: obtaining specific binaries and completing some configuration steps.
Running on a single machine is quick and easy and can work fairly well if you only use it for your own needs. For a production-grade setup, especially one shared across many users, we highly recommend splitting each Firehose component into its own container, with shared storage for file access properly set up between them. This enables horizontal scaling with fine control over which component should be scaled out.
To bootstrap our instance, we are going to start Firehose on NEAR from a snapshot provided by the NEAR foundation. This will make our Firehose installation serve fairly recent blocks from the NEAR network. A caveat of using a snapshot like this is that historical blocks, i.e. blocks that were produced before the snapshot was taken, will not be available. To get access to historical blocks, see the Backfill section.
It's important to understand that we are going to run a NEAR full node (a.k.a. NEAR RPC node). Operating a blockchain full node is a complex task that requires fast disk(s) and a powerful machine. The firenear binary is going to launch near-firehose-indexer, which is a thin wrapper around neard that outputs Firehose Instrumentation Logs for NEAR. The near-firehose-indexer process acts just like neard would, synchronizing with the network. How to properly and efficiently operate a NEAR full node is not the responsibility of Firehose. If you have problems syncing or a slow ingestion rate, first look at the official Full Node documentation and seek help with neard in mind.
Knowledge of neard and how to run a NEAR full node is required here. We even suggest that you try operating a NEAR full node outside of Firehose beforehand. Starting a Firehose on NEAR instance is quite easy once you have run a NEAR full node before; the neard data folder can even be used to bootstrap the firehose-near instance instead of fetching a NEAR snapshot.
Requirements
This tutorial has been tested on an Ubuntu 22.04 machine, and you will need to compile near-firehose-indexer manually.
Hardware requirements should follow the NEAR full node requirements found at https://near-nodes.io/rpc/hardware-rpc. Firehose also requires extra disk space for the Firehose NEAR blocks it produces, for the index files used by filtered block streams, and for Substreams, if enabled. The exact space usage is hard to state, especially for Substreams, where it depends heavily on usage. Firehose NEAR Mainnet blocks weigh ~600 GiB in addition to the space taken by the NEAR node itself, so a minimum of 2 TiB is recommended.
Installation
firenear
First, install the firenear binary:
The command above downloads the latest firehose-near tarball, extracts it in the current folder, copies it to /usr/local/bin, and creates a symlink for versioning.
And validate that everything is working as expected:
It should print:
Firehose Instrumented Node Binary
The second step is to obtain the Firehose instrumented node binary. In the case of NEAR, that is near-firehose-indexer. This binary is actually a NEAR Indexer binary: essentially, it's neard configured to index blocks and transactions as they are synced by the node and to emit Firehose logs from them.
To avoid any compatibility issues, we are going to compile the binary directly on the machine that will execute the binary.
It's important that you pick the currently active version to sync with the network; the NEAR latest stable releases page lists the most recent version needed to sync with Mainnet. As new versions of neard are published, we will make new versions of near-firehose-indexer available, so be sure to subscribe to near-firehose-indexer releases to be informed when a new release is out.
It's important that you subscribe to NEAR update announcements to ensure you continue to synchronize correctly with the network. Some version upgrades are hard forks, which means that if you don't upgrade in time, you will be unable to follow the canonical chain once the hard fork is activated on the network. It's your responsibility to monitor new NEAR releases and hard forks.
Install required build dependencies:
Install and configure rustup:
Follow the instructions and don't forget to run source "$HOME/.cargo/env" at the end to make the binaries available.
Now let's clone near-firehose-indexer and check out the correct version. In this tutorial we are going to use 1.30.1-fire because it's the latest version at the time of writing; be sure to use the correct version (the latest tag can be found with curl -s https://api.github.com/repos/streamingfast/near-firehose-indexer/releases/latest | grep tag_name):
Then let's compile it:
This will take several minutes to complete, depending on your machine size. To finish, let's copy the binary to a location where it will be available:
Finally, let's verify that it worked correctly:
It should print:
If you see 1.27.0 printed while you downloaded 1.30.1, it's just a mistake in this release, where the version was not updated properly.
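When choosing which tag to check out, the release lookup quoted in this section can be scripted. The following is a minimal sketch that pulls the tag_name value out of the GitHub API response; the function name extract_tag_name is hypothetical, and the parsing assumes the usual "tag_name": "<value>" layout of that JSON field.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "tag_name" field from GitHub
# release JSON read on stdin. Uses only grep, head and sed (no JSON tool).
extract_tag_name() {
  grep '"tag_name"' \
    | head -n 1 \
    | sed -E 's/.*"tag_name"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/'
}

# Usage (requires network access):
#   curl -s https://api.github.com/repos/streamingfast/near-firehose-indexer/releases/latest | extract_tag_name
```

The printed tag can then be compared with the version reported by the compiled binary, keeping in mind the version mismatch noted above.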
Snapshot
We are now going to download a NEAR Mainnet snapshot to our local disk. Instructions for downloading a NEAR snapshot for Mainnet or Testnet are given at https://near-nodes.io/intro/node-data-snapshots. We are going to use s5cmd because it improves download speed considerably. We only give the instructions briefly here, so please refer to https://near-nodes.io/intro/node-data-snapshots for further details.
The NEAR snapshot should be put under a folder named data within the NEAR home directory. In our instructions, the NEAR home directory is /data/node, so we are going to transfer all NEAR snapshot data under /data/node/data. Feel free to adjust those paths to your own setup.
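The directory layout described here can be prepared before the download starts. This is a small sketch only: the function name prepare_near_home is hypothetical, and the default path simply mirrors the /data/node home directory used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: create the NEAR home directory and its "data"
# sub-folder, then print the path where snapshot files should be placed.
prepare_near_home() {
  near_home="${1:-/data/node}"   # default mirrors the tutorial's path
  mkdir -p "$near_home/data" || return 1
  printf '%s\n' "$near_home/data"
}

# Usage:
#   prepare_near_home              # prints /data/node/data
#   prepare_near_home /mnt/near    # prints /mnt/near/data
```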
Running
Now that we have our NEAR full node snapshot at /data/node/data, we need to set up the configuration files required to run the node. Essentially, we are following the https://near-nodes.io/rpc/run-rpc-node-without-nearup#mainnet instructions; please refer to them for further details.
The files that are needed:
config.json
genesis.json
node_key.json
Those files should be put in the NEAR home directory, which is /data/node in our case. Those files are provided by NEAR directly:
The config.json already comes with pre-defined boot nodes as well as the tracked shards that we are interested in. Remember that those files come from NEAR directly, so please refer to their documentation for further details about the meaning of those config values.
We will now generate a unique node_key.json for this node. For this, we are going to use firenear tools generate-node-key. In the https://near-nodes.io/rpc/run-rpc-node-without-nearup#mainnet instructions, the node_key.json file is generated by running neard init, which essentially generates an ED25519 public/secret key pair and serializes it to a JSON file. We provide firenear tools generate-node-key as a convenience to avoid downloading yet another binary.
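Before starting the node, it can help to confirm that the three files listed above are actually in place. The sketch below does only that; check_near_home is a hypothetical name, and the default directory is the /data/node home used in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: report which of the required NEAR configuration
# files are present in the given NEAR home directory.
# Returns 0 only when all three files exist.
check_near_home() {
  near_home="${1:-/data/node}"
  missing=0
  for f in config.json genesis.json node_key.json; do
    if [ -f "$near_home/$f" ]; then
      printf 'ok       %s\n' "$f"
    else
      printf 'missing  %s\n' "$f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_near_home /data/node || echo "NEAR home is incomplete"
```

The check only looks at file presence; it does not validate the content of any of the files.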
Now, we are going to sanity check that everything is in order by running near-firehose-indexer. This sanity check also lets us see at what block the snapshot is currently syncing, which is required later when we start the firenear binary.
If everything works properly, you should see something like:
As soon as you see some output, you can press Ctrl-C, as we are now going to restart everything through firenear directly, which will launch near-firehose-indexer as a sub-process and manage it.
Before continuing, however, find the first line of the form INFO stats: #84349610 Waiting for peers and note the block number that you see; in our case it's 84349610. We will use this value soon to compute the first streamable block of our setup for firehose-near.
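If the sanity-check output was saved to a file, the block number can be pulled out mechanically rather than by eye. The following is a minimal sketch under that assumption; first_stats_block is a hypothetical name, and the pattern only relies on the INFO stats: #<number> form quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the block number of the first
# "INFO stats: #<number>" line found on stdin.
first_stats_block() {
  grep -o -E 'INFO stats: #[[:space:]]*[0-9]+' \
    | head -n 1 \
    | grep -o -E '[0-9]+$'
}

# Usage (indexer.log is an assumed file holding the captured output):
#   first_stats_block < indexer.log
```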
We will now create a config file for firehose-near and explain some of the configuration values within the file itself. Let's create a file /data/firehose.yaml with the following content:
The configuration file can actually be turned entirely into flags passed directly to the binary if you prefer. The args should be joined together with , and passed to firenear start directly, while each configuration value should be prefixed with -- and passed as a flag.
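To illustrate the mapping from configuration file to command line, here is a sketch that assembles the command string from a list of args and a set of key=value settings. Everything passed to it in the usage line is a placeholder: component-a, component-b, some-setting and other-setting are hypothetical names, not real firenear components or flags, and build_start_command is itself a made-up helper. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print a "firenear start" command line.
#   $1    space-separated list of args (joined with commas)
#   $2..  configuration values in key=value form (each prefixed with --)
build_start_command() {
  components=$(printf '%s' "$1" | tr ' ' ',')
  shift
  cmd="firenear start $components"
  for kv in "$@"; do
    cmd="$cmd --$kv"
  done
  printf '%s\n' "$cmd"
}

# Usage with placeholder names only:
#   build_start_command "component-a component-b" "some-setting=value" "other-setting=42"
```

Check the exact flag names against firenear start --help before relying on the generated command.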
Let's now start the firenear stack:
If everything is working, you should see logs like this:
Logs that have the date format Feb 03 02:54:00.930 come from the NEAR node directly and not from firenear. The logs with the date format 2023-02-03T02:54:12.537Z are those from firenear. You will see a number of logs like Feb 03 02:56:41.062 INFO network: Error connecting to addr=142.132.150.14:24567 err=Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }; those simply state that neard tried to connect to a remote node but the connection was refused.
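Since the two log sources differ only by timestamp format, the combined output can be labeled line by line. This is a rough sketch: label_log_source is a hypothetical name, and the matching relies solely on the two date formats described above, so any other line is labeled unknown.

```shell
#!/bin/sh
# Hypothetical helper: prefix each log line read on stdin with its likely
# origin, judged only by the timestamp format at the start of the line.
label_log_source() {
  while IFS= read -r line; do
    case "$line" in
      # ISO-like timestamp, e.g. 2023-02-03T02:54:12.537Z
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T*)
        printf '%-8s | %s\n' "firenear" "$line" ;;
      # Month-day-time timestamp, e.g. Feb 03 02:54:00.930
      [A-Z][a-z][a-z]\ [0-9][0-9]\ [0-9][0-9]:*)
        printf '%-8s | %s\n' "neard" "$line" ;;
      *)
        printf '%-8s | %s\n' "unknown" "$line" ;;
    esac
  done
}

# Usage (firenear.log is an assumed file holding the captured output):
#   label_log_source < firenear.log | grep '^firenear'
```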
Logs like 2023-02-03T02:56:40.594Z INFO (merger) reading from blocks store: file does not (yet?) exist, retrying in {"filename": "/data/storage/merged-blocks/0084350700.dbin.zst", "base_filename": "0084350700", "retry_delay": "4s"} are also normal: we have not yet produced this file because the node is still catching up.
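To relate a block number to the merged file the merger is waiting for, the file name can be derived from the block number. The sketch below assumes that each merged bundle covers 100 blocks and is named after its first block, zero-padded to 10 digits, which matches the 0084350700.dbin.zst name in the log above; the bundle size is an assumption, not something stated in this document, and merged_bundle_name is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the merged bundle file name expected to
# contain the given block, ASSUMING bundles of 100 blocks named after
# their first block number, zero-padded to 10 digits.
merged_bundle_name() {
  block="$1"
  base=$(( block / 100 * 100 ))   # round down to the bundle boundary
  printf '%010d.dbin.zst\n' "$base"
}

# Usage:
#   merged_bundle_name 84350742
```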
Now, the node is waiting for peers in order to download the missing block headers and replay the blocks. Once synchronization starts properly, you will see files being produced in /data/storage/one-blocks and in /data/storage/merged-blocks. It can take many minutes (even dozens of minutes) before you connect to good peers that provide good data to you. This is outside of Firehose's control; read the NEAR documentation to try to improve P2P connectivity for your node.
If you have been stuck for a long time on Feb 03 02:54:02.982 INFO stats: #84349610 Waiting for peers 2 peers ⬇ 12.8 kB/s ⬆ 1.58 kB/s 0.00 bps 0 gas/s CPU: 112%, Mem: 497 MB, you may want to press Ctrl-C and start again; sometimes it helps in getting better peers.
Once peering is good, there is still some wait time to download missing headers and state, which depends on how old the snapshot you started from is. You should see logs like:
These give some information about the completion rate. Now it's time to wait: monitor the /data/storage/merged-blocks/ folder until at least one merged bundle is produced, then move on to the next steps. This step can be quite long depending on the peering and the machine used, as well as external network factors.
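Rather than checking the folder by hand, the wait can be scripted with a simple polling loop. This is a sketch under assumptions: wait_for_merged_bundle is a hypothetical name, the defaults (30 second delay, unlimited attempts) are arbitrary, and the .dbin.zst extension is taken from the merger log shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: poll a directory until at least one merged bundle
# (*.dbin.zst) exists, then print the first one found.
#   $1 directory to watch (default: /data/storage/merged-blocks)
#   $2 seconds between checks (default: 30)
#   $3 maximum number of checks, 0 meaning no limit (default: 0)
wait_for_merged_bundle() {
  dir="${1:-/data/storage/merged-blocks}"
  delay="${2:-30}"
  tries="${3:-0}"
  n=0
  while :; do
    for f in "$dir"/*.dbin.zst; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
        return 0
      fi
    done
    n=$((n + 1))
    if [ "$tries" -gt 0 ] && [ "$n" -ge "$tries" ]; then
      return 1
    fi
    sleep "$delay"
  done
}

# Usage:
#   wait_for_merged_bundle /data/storage/merged-blocks 60 && echo "ready"
```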
Operator Notes
NEAR uses a fast replay mode when the node is too far behind the canonical head of the network. This happens when your node is more than 1 epoch (or 2 epochs, it's not 100% clear) behind the rest of the network: the node does not process the blocks in between and instead "jumps" to a recent block by downloading a state snapshot. Firehose cannot work if blocks are missing, so the NEAR node needs to synchronize continuously with the network for Firehose to generate blocks properly and never create a hole. If your node is down for too long, you will need to fill the hole somehow; see the Backfill section below for details.
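A hole in the produced blocks can be spotted by scanning the merged bundle file names for a break in the sequence. The sketch below assumes bundles of 100 blocks named after their first block (an assumption, consistent with the 0084350700.dbin.zst name seen in the merger logs); find_bundle_holes is a hypothetical helper, and it only inspects file names, not file contents.

```shell
#!/bin/sh
# Hypothetical helper: read merged bundle file names (one per line) on
# stdin and print every block range missing between consecutive bundles,
# ASSUMING each bundle covers exactly 100 blocks.
find_bundle_holes() {
  sed -n -E 's|^(.*/)?([0-9]+)\.dbin\.zst$|\2|p' \
    | sort -n \
    | awk '
        NR > 1 && $1 + 0 != prev + 100 {
          printf "hole: %d-%d\n", prev + 100, $1 - 1
        }
        { prev = $1 + 0 }
      '
}

# Usage:
#   ls /data/storage/merged-blocks | find_bundle_holes
```

No output means the listed bundles are contiguous; it says nothing about blocks before the first bundle or after the last one.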
Verifying
To verify that everything is good, we are going to install grpcurl, a curl-like command line tool for the gRPC protocol:
Let's now perform a Firehose stream blocks request:
If you see some output, your instance is working as expected. It will now continue to synchronize with the network.
If instead you see something like Failed to dial target host "localhost:9000": dial tcp 127.0.0.1:9000: connect: connection refused, it means firenear has not produced a single block yet. Wait until blocks are present in /data/storage/merged-blocks and then try again.
Backfill
Now that you are synchronizing blocks live with the network, you need to backfill blocks that you did not process so far. This can be achieved in two ways:
Download existing blocks from a trusted provider
Configure an archive node instead of a full node, and replay blocks you are missing
Download existing blocks
You can reach out to us on Discord to discuss exchanging our NEAR blocks. Note that you will need to pay the egress cost associated with the transfer.
Archive node
Now that you know how to sync Firehose for NEAR, you can repeat a procedure similar to this tutorial but with an archive node instead. Archive nodes are able to replay all blocks from genesis, so when you start with an archive node, you can specify the start block that you desire.
Docker Images
Docker images are available and come in two flavors: one that contains only firenear, and another, which we call the bundled image, that contains firenear as well as near-firehose-indexer, which is essentially the neard codebase wrapped in the NEAR indexer framework.
Both kinds of image are pushed to the repository ghcr.io/streamingfast/firehose-near; the image's tag can be used to determine which version it is:
Image containing only firenear contains a single tag like ghcr.io/streamingfast/firehose-near:v1.0.0
Image containing the bundle firenear and near-firehose-indexer contains two tags separated by a dash - character like ghcr.io/streamingfast/firehose-near:v1.0.0-1.30.1-fire, which essentially means that firehose-near version v1.0.0 is bundled with near-firehose-indexer version 1.30.1-fire.
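The tag convention can be decoded with plain shell parameter expansion. The following is a sketch only: describe_image_tag is a hypothetical name, and it assumes the firenear version itself contains no dash, so a pre-release tag with a dash in it would be split incorrectly.

```shell
#!/bin/sh
# Hypothetical helper: split a firehose-near image tag into the firenear
# version and, for bundled images, the near-firehose-indexer version.
# ASSUMES the firenear version part contains no dash.
describe_image_tag() {
  tag="$1"
  case "$tag" in
    *-*)
      # Everything before the first dash / everything after it.
      printf 'firenear=%s near-firehose-indexer=%s\n' "${tag%%-*}" "${tag#*-}"
      ;;
    *)
      printf 'firenear=%s\n' "$tag"
      ;;
  esac
}

# Usage:
#   describe_image_tag v1.0.0-1.30.1-fire
#   describe_image_tag v1.0.0
```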