Building a Decentralized Brain with AI & Crypto

StreamingFast
Apr 16, 2024

Thanks to Sam Green of Semiotic Labs and Yaniv Tal of Geo for contributing to this post.

TL;DR:

  • The rapid adoption of AI underscores the urgent need for decentralized solutions that prevent a handful of tech giants from centralizing control. The best path forward combines AI and blockchain to ensure data openness and verifiability.
  • Retrieval Augmented Generation (RAG) and knowledge graphs improve LLMs’ accuracy by providing up-to-date and contextually relevant information, with knowledge graphs offering superior data organization and retrieval capabilities.
  • Decentralized knowledge graphs are the next big paradigm shift. They can leverage blockchain technology to ensure open access to information, while enhancing trust through verifiability and transparent governance.
  • Geo, a pioneering decentralized knowledge graph launching soon on The Graph, exemplifies the integration of blockchain and AI to create a more accessible, reliable, and user-governed internet.
  • Information will be organized and generated at an exponential rate thanks to human-in-the-loop verification and AI-powered content generation, ensuring trust and transparency while maintaining a human touch.

This past year’s explosive mainstream adoption of LLMs, along with the discussions around the risks of this technology, has made it evident that AI will dramatically influence culture, politics, and truth seeking. It is therefore imperative that we, as a global community, don’t allow control to be wielded by a handful of tech giants through data moats, but instead work together towards building a decentralized alternative.

By ensuring that data remains open and public, we can build the trust layer that will enable verifiability of data accuracy in a way that is simply not possible in today’s big tech landscape. Rather than being impacted by the biases, assumptions, and opinions of a few large corporations, we can work together toward building a decentralized brain, truly accessible and owned by all. AI, and its integration into our lives, should be architected from the ground up as a public good, and not within walled gardens.

Yann LeCun — Chief AI Scientist at Meta

The Role of Retrieval Augmented Generation (RAG)

When discussing LLMs and information retrieval, it helps to use our own brains as an analogy and to view our interactions with AI in terms of working memory and explicit memory. LLMs are great at explicit memory. By encoding data in their weights during training, LLMs can parse a vast amount of content and are fairly good at memorizing it. This is not without limitations, though. Because a model cannot store everything it was trained on (its weights cannot hold that volume of data verbatim), it can produce the hallucinations we have all seen from LLMs: a laughable guess in response to a seemingly trivial question. And since a model cannot be continually retrained on every new piece of information as it becomes available, the LLM is blind to recent innovations and discoveries. This is why Retrieval Augmented Generation (RAG) is such a good complement to LLMs.

RAG is the process by which a system first references a dataset of information that is outside of the LLM’s training knowledge in order to add any new information and context to the LLM prompt before responding. RAG can be seen as the working memory of an artificial brain. By integrating up-to-date knowledge through an external knowledge base and vector databases, RAG aims to refine the accuracy and relevance of AI-generated content. However, reliance on unstructured information can complicate extracting relevant data, leading to potential information redundancy and the challenge of ensuring the correct context is used when a prompt is answered.
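To make the retrieve-then-augment flow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any production system works: the `DOCUMENTS` list, the word-overlap scoring, and the prompt template are all hypothetical stand-ins, and a real pipeline would typically use embeddings and a vector database rather than counting shared words.

```python
import re

# Hypothetical external knowledge base: text the model may never have been trained on.
DOCUMENTS = [
    "Geo is a decentralized knowledge graph that leverages The Graph.",
    "The Graph is a decentralized protocol for indexing and querying blockchain data.",
    "RAG adds retrieved context to a prompt before the model responds.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context to the question (the 'augmentation' step)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


prompt = build_prompt("What is Geo?", DOCUMENTS)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. The model itself is deliberately left out, since the point here is only the working-memory step that happens before generation.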

Knowledge Graphs: An Improvement Over Vector Databases

Knowledge graphs represent an opportunity to augment the capabilities of RAG within LLMs. Knowledge graphs outperform vector databases through their ability to offer deeper semantic analysis, unmatched effectiveness in data retrieval, and enhanced ease of verifiability. Knowledge graphs excel in understanding and navigating the complexities of natural language, allowing for a nuanced exploration of data relationships that closely mirror human cognition. This semantic depth ensures that LLMs can access more accurate and contextually relevant information, significantly improving generated content quality. In comparison, vector databases rely on document chunking methods that either remove context or increase risks of hallucination by retrieving irrelevant information. With knowledge graphs, it is possible to quickly find a relevant entity and then traverse the graph to retrieve all relevant context.
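The "find an entity, then traverse" idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the `TRIPLES` list, the relation names, and the `max_hops` cutoff are hypothetical and chosen only for illustration, and a real knowledge graph would sit behind a query layer rather than an in-memory list.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples, loosely based on statements in this post.
TRIPLES = [
    ("Geo", "is_a", "decentralized knowledge graph"),
    ("Geo", "leverages", "The Graph"),
    ("The Graph", "is_a", "decentralized indexing protocol"),
    ("The Graph", "has_token", "GRT"),
    ("StreamingFast", "is_a", "protocol infrastructure company"),
]


def outgoing(entity, triples):
    """Return every triple whose subject is the given entity."""
    return [t for t in triples if t[0] == entity]


def collect_context(start, triples, max_hops=2):
    """Breadth-first traversal from `start`, gathering facts up to `max_hops` away."""
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand entities at the hop limit
        for subj, rel, obj in outgoing(entity, triples):
            facts.append((subj, rel, obj))
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return facts


for fact in collect_context("Geo", TRIPLES):
    print(fact)
```

Starting from "Geo", the traversal returns the facts about Geo and, one hop further, the facts about The Graph, while the unrelated StreamingFast triple is never touched. That is the contrast with chunk retrieval: the context is bounded by relationships rather than by text similarity.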

Moreover, the structured nature of knowledge graphs makes them highly effective for organizing vast amounts of data, even as the dataset is constantly being appended to. This structural advantage supports a more precise retrieval process, directly benefiting RAG applications by supplying them with the most relevant data points for any given query. When this is used in conjunction with the information found in an LLM’s “explicit memory”, your prompt can be served from both “memory buckets,” each serving its unique purpose to provide a more accurate and contextualized response.

Decentralized Knowledge Graphs: A Paradigm Shift

We believe that the best marriage of blockchain and AI is the decentralized knowledge graph: a way to bridge all of the world’s data and connect it in an easily explorable way through thoughtful creation, curation, organization, and composability. Knowledge graphs are typically built in a centralized manner by a company or group with a unique knowledge base that must be linked and continually updated. This is a great tool that serves a specific purpose very well, but it doesn’t meet what we envision as the real potential of this technology: to underpin tomorrow’s internet.

While there has been a lot of hype and fanfare for the many ways that blockchain and AI can become interconnected, our position is that decentralized knowledge graphs will be unparalleled in importance, paradigm-shifting potential, and cultural relevance.

We’re extremely excited by the work being done on Geo, a decentralized knowledge graph that leverages The Graph (the world’s leading decentralized protocol for indexing and querying blockchain data). Geo is pioneering how this intersection of technologies can be built from the ground up within a truly web3 ethos — making the world’s knowledge openly accessible to all, without gatekeepers.

Geo: Pioneering Decentralized Knowledge Networks

Geo aims not just to organize and structure the world’s data in a searchable database, but also to make that data highly composable. As with any compendium, though, it is paramount that you can easily retrieve the information you seek. We can envision a future in which you interact with Geo through Agents. These Agents would allow users to ask a question, and the knowledge graph would retrieve relevant content, databases, or APIs, which can then be used to feed a Large Language Model (LLM) on the fly. Rather than the current model of running a search and then paging one-by-one through the relevant results that are returned, imagine having an Agent answer you after it has loaded up all of the relevant information that connects to your query.

Now, of course, the quality of the information fed into an Agent is extremely important, which is where blockchain offers other great tools: identity and reputation. By marking each piece of information with an attestation of the original author, while having a traceable and publicly verifiable reputation of said author, you are able to control both the types of sources and the quality of sources you engage with. Additionally, since everything is extremely composable, how we engage with this information can be tailored to meet our interests and needs, without compromising the data being served.
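As a rough illustration of controlling "both the types of sources and the quality of sources," the sketch below filters claims by an author's reputation and by source type. Everything in it is an assumption made for the example: the `Claim` fields, the author identifiers, the `REPUTATION` scores, and the 0-to-1 scale are hypothetical and do not describe Geo's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    author: str       # identifier of the attested author (hypothetical format)
    source_type: str  # e.g. "research", "news", "opinion" (hypothetical categories)


# Hypothetical, publicly readable reputation scores in the range [0, 1].
REPUTATION = {"alice.eth": 0.92, "bob.eth": 0.35}

CLAIMS = [
    Claim("Knowledge graphs support multi-hop retrieval.", "alice.eth", "research"),
    Claim("This token will definitely go up.", "bob.eth", "opinion"),
    Claim("A new indexing service launched today.", "alice.eth", "news"),
]


def select_claims(claims, reputation, min_reputation, allowed_types):
    """Keep claims whose author meets the reputation bar and whose type is allowed."""
    return [
        claim
        for claim in claims
        if reputation.get(claim.author, 0.0) >= min_reputation
        and claim.source_type in allowed_types
    ]


selected = select_claims(CLAIMS, REPUTATION, min_reputation=0.5, allowed_types={"research", "news"})
for claim in selected:
    print(claim.author, "-", claim.text)
```

Because the threshold and allowed types are plain parameters, two users could run the same query over the same data and get differently tailored context without either one altering what is stored.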

Building the Decentralized Brain with The Graph

The overarching vision is to build a decentralized brain that can store information from various sources, which humans then curate into independent communities called Spaces. This shared brain can now reason, utilizing all of this information since it is well structured, allowing AI to make well-informed decisions. Once this decentralized brain exists, it can be connected to the real world through APIs, becoming a real autonomous agent executing actions for the user, automating away mundane tasks, and allowing humans to focus on more meaningful work. This interconnected graph of knowledge now has the ability to pull in data from multiple dynamic data sources.

A New Ecosystem of Data Contribution and Verification

The Graph is uniquely positioned to implement this architecture within The World of Data Services through the new Interconnected Graph of Data. Amongst many other services, The Graph will add LLM data services, meaning Indexers would provide open-source model inference. These models will have direct access to verifiable data through the Interconnected Graph of Data, including tooling to make it easy to access. For the first time, an open, composable, low-latency, and fully integrated stack will be available, enabling developers to build Agents more powerful than ever.

From Information Retrieval to Knowledge Creation: The Role of LLMs and Humans

We must take a different approach to building out tomorrow’s decentralized brain, one that enhances resiliency and reliability, improves an LLM’s ability to provide meaningful responses, and makes RAG easier to apply. Looking at a potential design and architecture within the New Era of The Graph, we can point to how a thoughtfully designed knowledge graph can become a foundation for a better tomorrow.

  1. Information is added to the interconnected graph (and thus a Geo Space) by a cryptographically verified contributor with a traceable reputation
  2. Alternatively, information can be aggregated from a verifiable third party data source
  3. An LLM is then able to build logical connections between this newly added information and points of data that are already stored in its working memory; these connections are then served to, and validated by, humans within Geo
  4. An Agent receives a prompt from a human and can use RAG to retrieve the most relevant information from the interconnected graph
  5. The next user who is looking to contribute material to a Geo Space would now be better informed through direct access to relevant data, and should therefore be able to create even better additional content
  6. The Agent can be the UX itself. The user can request information, then craft new content themselves and submit it through the Agent, which will help them to edit, add, access, and link to other relevant information
  7. To complete the knowledge loop, a Curator persona could be introduced. Reputable human-in-the-loop tagging of information can help to inform the knowledge graph as to which data is most valuable. We envision that this role could be incentivized using GRT, a reimagining of the current Curator role within The Graph

It’s also easy to see how you can start to have a system where, rather than humans leveraging LLMs to retrieve information, you would see LLMs leveraging humans to help them build upon the knowledge graph. LLMs can create information and propose it to reputable humans for human-in-the-loop verification. This would speed up the pace of information aggregation without losing either the human verification of data or, more importantly, the human touch. By not allowing LLMs to add data directly, we are filtering out potential hallucinations from entering the knowledge compendium while leveraging LLMs’ ability to take on the mundane for us.
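The propose-then-verify rule can be sketched as a small gatekeeping structure. This is a minimal illustration under assumptions, not Geo's design: the `Proposal` and `ReviewedGraph` classes, the reviewer identifiers, and the status strings are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    triple: tuple                      # (subject, relation, object)
    proposed_by: str                   # e.g. an LLM agent identifier (hypothetical)
    status: str = "pending"            # "pending", then "accepted" or "rejected"
    reviewed_by: Optional[str] = None


@dataclass
class ReviewedGraph:
    trusted_reviewers: set
    accepted: list = field(default_factory=list)
    proposals: list = field(default_factory=list)

    def propose(self, triple, proposed_by):
        """Queue a triple for review; nothing is written to the graph yet."""
        proposal = Proposal(triple, proposed_by)
        self.proposals.append(proposal)
        return proposal

    def review(self, proposal, reviewer, approve):
        """Only a trusted human reviewer can move a proposal into the graph."""
        if reviewer not in self.trusted_reviewers:
            raise PermissionError(f"{reviewer} is not a trusted reviewer")
        if proposal.status != "pending":
            raise ValueError("proposal was already reviewed")
        proposal.reviewed_by = reviewer
        proposal.status = "accepted" if approve else "rejected"
        if approve:
            self.accepted.append(proposal.triple)


graph = ReviewedGraph(trusted_reviewers={"curator-1"})
good = graph.propose(("Geo", "leverages", "The Graph"), "llm-agent")
bad = graph.propose(("Geo", "is_a", "centralized database"), "llm-agent")
graph.review(good, "curator-1", approve=True)
graph.review(bad, "curator-1", approve=False)
print(graph.accepted)
```

The key design choice is that `propose` never touches `accepted`: the only write path into the graph runs through `review`, so a hallucinated triple stays in the proposal queue until a human rejects it.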

Trust and Transparency in Decentralized Knowledge Graphs

Integrating blockchain technology with knowledge graphs brings an added layer of trust through easier verifiability. Each piece of data can be attributed to a verifiable source, maintaining a clear record of its origin and modifications, and those involved. This transparency bolsters the data’s credibility and fosters a secure environment for its use, making decentralized knowledge graphs a superior choice for advancing RAG technology in LLMs.
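One way to picture a "clear record of origin and modifications" is a hash-linked revision history, sketched below with Python's standard `hashlib` and `json` modules. This is a simplified stand-in rather than the actual mechanism: the record fields and author names are hypothetical, and a real deployment would rely on digital signatures and on-chain storage instead of an in-memory list.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Hash a record deterministically by serializing it with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_revision(history: list, author: str, content: str) -> None:
    """Add a revision that commits to the hash of the previous one."""
    prev = history[-1]["hash"] if history else None
    body = {"author": author, "content": content, "prev": prev}
    history.append({**body, "hash": record_hash(body)})


def verify(history: list) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = None
    for entry in history:
        body = {"author": entry["author"], "content": entry["content"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != record_hash(body):
            return False
        prev = entry["hash"]
    return True


history = []
append_revision(history, "alice.eth", "Geo is a knowledge graph.")
append_revision(history, "bob.eth", "Geo is a decentralized knowledge graph.")
print(verify(history))  # True for an untouched history

history[0]["content"] = "Geo is a spreadsheet."
print(verify(history))  # False once an earlier revision is altered
```

Because each entry commits to the one before it, quietly rewriting an old revision invalidates the chain from that point on, which is the property that makes attribution checkable by anyone.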

Yaniv Tal — Co-Founder of The Graph and Geo

The New Era of The Graph, with Geo acting as a browser through which to harness the world’s information, is uniquely positioned to be at the forefront of this exciting new revolution of the internet. Not only does it fill the world’s need for a decentralized knowledge graph, it also allows us as the global user base to take part in the governance of such an important tool. A truly open and decentralized brain requires open and transparent governance, which would be impossible to achieve if it were being built in a centralized manner.

Onwards and upwards.

StreamingFast

StreamingFast is a protocol infrastructure company that provides a massively scalable architecture for streaming blockchain data.