The Firehose gets upstreamed into Go Ethereum

StreamingFast
4 min read · Apr 25, 2024

Today, The Graph merges with Go Ethereum.

Six years ago, StreamingFast set out to rethink from first principles how people should process blockchain data post-consensus. Today marks an important milestone: the new Live Tracer, carefully designed to enable the Firehose technology, makes its way into the core of Go Ethereum in the v1.14.0 release. This culminates in the recognition and incorporation of a core piece of our technology and approach.

This initiative was seeded at a lunch at ETH Bogotá with Sina (from the Go Ethereum team). One of the first reasons he gave for implementing the Live Tracer was:

“to enable the Firehose integration to reach the users of geth without the need to maintain forks of the Go Ethereum repository.” — Sina

Users of geth include all of its forks: e.g. Polygon, BNB, Optimism, Arbitrum and countless others. We at StreamingFast have maintained a great number of these forks for years.

The release of the Live Tracer means Firehose becomes a first-class citizen — a testament to its importance as a standard for blockchain data extraction. This also means that integrating the Firehose on geth-based chains will no longer require the cumbersome maintenance of individual Firehose upgrades.

The Firehose, which powers The Graph Network, is the first and deepest integration into the new Live Tracer, bringing multiple benefits, including the lowest latency and a cursor to handle forks. It also gave birth to Substreams, a transformation engine that taps into the architecture of the Firehose, and enables a plethora of sinks to feed blockchain data into a multitude of data systems, including subgraphs.
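
To illustrate how a cursor can help a consumer handle forks, here is a minimal sketch. It assumes a stream of (step, block id, cursor) tuples, where the step labels "new" and "undo" and the cursor strings are illustrative names chosen for this example, not the actual Firehose wire format.

```python
def apply_steps(steps):
    """Apply a stream of fork-aware steps and return (canonical chain, last cursor).

    Each step is a tuple (step, block_id, cursor). The labels "new" and "undo"
    are hypothetical names used only for this sketch.
    """
    chain = []      # block ids currently considered canonical
    cursor = None   # last cursor seen; persisting it would allow resuming
    for step, block_id, step_cursor in steps:
        if step == "new":
            chain.append(block_id)
        elif step == "undo":
            # An undo reverts the most recently applied block on the old fork.
            if chain and chain[-1] == block_id:
                chain.pop()
        cursor = step_cursor
    return chain, cursor


# Block 2a is first received, then replaced by 2b after a fork.
steps = [
    ("new", "1a", "c1"),
    ("new", "2a", "c2"),
    ("undo", "2a", "c3"),
    ("new", "2b", "c4"),
]
chain, cursor = apply_steps(steps)
```

A consumer that stores the last cursor alongside its own data could, under these assumptions, reconnect and continue from exactly where it stopped, including in the middle of a fork.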

What is the Live Tracer?

The Live Tracer is a way to get deep data traces (called Extended data details in the context of the Firehose) of transactions and block execution happening on geth-powered blockchains. It provides much richer, much more reliable data, for example by providing deltas for balances, nonces, and state changes. Because of the rigor of the implementation (e.g. absolute ordering of all events in the block), it enables new indexing paradigms, like pattern matching of events based on state changes, calls, and parent call trees, or triggering of events based on changes to the actual variables in a smart contract.
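
As a rough illustration of what ordered deltas make possible, the sketch below models trace events as plain records and selects the storage changes of one contract in execution order. The `TraceEvent` type, its field names, and the sample addresses are assumptions made up for this example; they are not the Live Tracer's or the Firehose's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical record for one change observed during block execution."""
    ordinal: int     # position of the event within the block (absolute ordering)
    kind: str        # "balance", "nonce", or "storage"
    address: str     # account or contract the change applies to
    old: int         # value before the change
    new: int         # value after the change
    call_index: int  # index of the call that produced the change


def storage_changes_for(events, contract):
    """Return the storage changes of `contract`, sorted by execution order."""
    hits = [e for e in events if e.kind == "storage" and e.address == contract]
    return sorted(hits, key=lambda e: e.ordinal)


# Events may arrive in any order; the ordinal restores the execution order.
events = [
    TraceEvent(3, "storage", "0xtoken", 10, 7, call_index=1),
    TraceEvent(1, "nonce", "0xalice", 4, 5, call_index=0),
    TraceEvent(2, "balance", "0xalice", 100, 90, call_index=0),
    TraceEvent(4, "storage", "0xtoken", 0, 3, call_index=1),
]
matched = storage_changes_for(events, "0xtoken")
```

Because every event carries both the old and the new value, a trigger such as "fire when this variable decreases" reduces to a comparison on a single record.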

Hooked to the Firehose architecture, this tracing data can flow into your systems with a much simpler and more reliable interface, and less latency than any other solution.

Because it contains state changes, it can also be used to reconstruct state at any point in time, and it can do that at high speeds: it’s also a replication protocol.
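
The idea of rebuilding state from recorded changes can be sketched as a simple replay. The tuple layout `(block_number, key, old_value, new_value)` and the sample keys are assumptions for illustration only, and the sketch assumes the changes are already sorted by block number.

```python
def replay(changes, up_to_block):
    """Rebuild a key/value state by replaying changes up to a block (inclusive).

    `changes` is assumed to be sorted by block number; each entry is a
    hypothetical tuple (block_number, key, old_value, new_value).
    """
    state = {}
    for block_number, key, _old, new in changes:
        if block_number > up_to_block:
            break  # later changes are not part of the requested snapshot
        state[key] = new
    return state


changes = [
    (1, "alice.balance", 0, 100),
    (2, "alice.balance", 100, 90),
    (2, "bob.balance", 0, 10),
    (5, "alice.balance", 90, 40),
]
at_block_2 = replay(changes, up_to_block=2)
at_block_5 = replay(changes, up_to_block=5)
```

The same log yields a snapshot at any height simply by changing `up_to_block`, which is what makes a stream of state changes usable as a replication mechanism.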

A bit of history

Years ago, when working on the first protocol to attain 0.5s block times, StreamingFast made a radical decision: break away from JSON-RPC, the interface that was the lingua franca of blockchain data access. We knew that old paradigm wasn’t going to cut it.

We set out to design a better solution from first principles: a way to tap into the core of the blockchain’s execution engine and stream it out through a standardized interface, in order to replicate the node and feed data into various downstream systems, that is, any system needing a comprehensive understanding of what’s happening on a blockchain network.

To do this, the team needed to roll up its sleeves and start the extremely courageous task of prying open complex blockchain source code repositories, in C++, in Rust, in Go, and, in there, create a replication protocol that would work for all chains.

For broad adoption, reusability and binary efficiency, we chose to represent every blockchain’s data model as Protocol Buffer messages. Unity in data representation is particularly needed in this space, which is rife with countless serialization and encoding algorithms, codecs and methods.

A very fruitful collaboration ensued between a handful of individuals from our team (shoutout to Matthieu Vachon for doing a lot of the hard work at StreamingFast) and the bright minds behind the Go Ethereum codebase (in particular Sina and Felix), which ultimately led to the final Tracer that landed in the main trunk on March 16th, 2024.

This milestone also crystallizes the technology stack chosen by The Graph as a solid foundation for current and future solutions in the EVM ecosystem.

About the Firehose

For those with prior knowledge of The Graph and subgraphs, here is the Firehose’s positioning in the stack:

  • Subgraphs handle the whole flow of ETLQ (Extract using JSON-RPC, Transform with AssemblyScript, Load into Postgres, and Query through GraphQL).
  • Firehose is a next gen extraction layer (which just merged into geth). It collects history in flat files and streams new blocks in real-time. Start at block 1 or 100000, -100000 or -1 (from head), and stream forever.
  • Substreams is a large scale, parallelized data transformation engine (with modules written in Rust instead of AssemblyScript), similar to BigQuery... but it’s a streaming engine, building on top of the Firehose.
  • Substreams has multiple sinks: Postgres, MySQL, MongoDB, ClickHouse, files, third-party streaming engines, large scale analytics data stores, realtime notifications to different destinations, webhooks, and... subgraphs!
  • For The Graph, any future integrations first start with a Firehose integration, especially for non-EVM chains. graph-node has native support for EVM and Substreams (from which it supports all other protocols).
  • So the Firehose/Substreams is really an expansion of the world of possibilities offered by The Graph Network.
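
The start-block convention mentioned in the Firehose bullet above (a positive block number, or a negative number counted from the head) can be sketched as follows. The function name is hypothetical, and the exact meaning of a negative value (here `head + start`, so -1 is one block behind the head) is an assumption made for this example rather than a statement of the Firehose's precise semantics.

```python
def resolve_start_block(start, head):
    """Turn a requested start block into an absolute block number.

    Non-negative values are taken as absolute block numbers. Negative values
    are assumed to be relative to the chain head (head + start), clamped at 0.
    """
    if start >= 0:
        return start
    return max(head + start, 0)


head = 19_000_000  # illustrative head block number
absolute = resolve_start_block(100_000, head)
relative = resolve_start_block(-100_000, head)
near_head = resolve_start_block(-1, head)
```

Once the start block is resolved, a consumer would read historical blocks up to the head and then keep receiving new blocks as they are produced.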

The inclusion of the Firehose protocol within Ethereum enshrines this approach into arguably the largest developer platform the world has ever seen, and is a testament to the great work that StreamingFast engineers have poured into this stack over the years.

Interested in leveraging the Firehose and Substreams technology to turbo boost your project? Make sure to join StreamingFast’s Discord server to ask any questions that may come up.

StreamingFast

StreamingFast is a protocol infrastructure company that provides a massively scalable architecture for streaming blockchain data.