Substreams allows you to stream historical data from your favourite blockchain and apply all the transformations that your project needs.
Why is Substreams so performant?
When needed, Substreams processes your transformations in parallel, making it an ultra-fast tool! When parallelization happens, Substreams spins up several jobs (the number depends on the workload), each consuming data independently.
The core component of the parallelization engine is the Substreams Scheduler, which is responsible for deciding how many jobs to create and for orchestrating the data produced by each job.
Making Substreams even faster
One of the most powerful features of Substreams is that you can use the output of a Substreams module as an input for another Substreams module. This creates a dependency tree. For example, the following Substreams diagram includes two store modules (store_tokens and store_pool_count) that receive the same input (map_pools_created).
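This kind of dependency tree is declared in a project's `substreams.yaml` manifest, where each module lists the modules whose output it consumes. The sketch below is a minimal, hypothetical manifest fragment matching the diagram: the protobuf type `example.Pools`, the update policies, and the value types are illustrative assumptions, not taken from a real package.

```yaml
# Hypothetical substreams.yaml fragment: two stores share one map input.
modules:
  - name: map_pools_created
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block   # raw blockchain data as input
    output:
      type: proto:example.Pools             # hypothetical protobuf type

  - name: store_tokens
    kind: store
    updatePolicy: set                       # illustrative policy
    valueType: string
    inputs:
      - map: map_pools_created              # consumes the map's output

  - name: store_pool_count
    kind: store
    updatePolicy: add                       # illustrative policy
    valueType: int64
    inputs:
      - map: map_pools_created              # same input as store_tokens
```

Because both stores name `map_pools_created` as their input, the Scheduler can see the shared dependency and execute that map module only once.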
In the past, the execution model of the Substreams Scheduler computed the result of the map_pools_created module twice (once for each store). Since the release of the new Scheduler, however, the result of the map_pools_created module is computed once and reused for both stores, improving performance and reducing the number of calculations.
- From 18 to 6 jobs: in a large Substreams module, the number of jobs created has been reduced from 18 to 6.
- A 2.74x performance increase: for the same compute resources, a 2.74x performance increase has been observed.