We previously discussed how AI and Web3 can complement each other across vertical industries such as computational networks, intermediary platforms, and consumer applications. When focusing on data resources as a vertical field, emerging Web3 projects offer new possibilities for the acquisition, sharing, and utilization of data.
Data has become the key driver of innovation and decision-making across industries. UBS predicts that global data volume will grow roughly tenfold between 2020 and 2030, reaching 660 ZB. By 2025, an estimated 463 EB (exabytes; 1 EB = 1 billion GB) of data will be generated globally every day. The Data-as-a-Service (DaaS) market is expanding rapidly: according to Grand View Research, the global DaaS market was valued at $14.36 billion in 2023 and is expected to grow at a compound annual growth rate (CAGR) of 28.1%, reaching $76.8 billion by 2030.
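As a rough sanity check on that projection, the short calculation below compounds the 2023 figure at the quoted CAGR; it lands in the same ballpark as the cited 2030 value, with the residual gap attributable to rounding and to whichever base year the report actually compounds from (not stated here).

```python
# Rough sanity check of the DaaS projection quoted above.
# Figures: $14.36B in 2023, 28.1% CAGR, horizon 2030 (7 compounding periods).
base_value_2023 = 14.36        # USD billions, as cited from Grand View Research
cagr = 0.281                   # 28.1% compound annual growth rate
years = 2030 - 2023

projected_2030 = base_value_2023 * (1 + cagr) ** years
print(f"Projected 2030 market size: ${projected_2030:.1f}B")
# ≈ $81B -- the same order of magnitude as the cited $76.8B; the exact figure
# depends on which base year the report applies the CAGR from.
```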
AI model training relies heavily on large datasets to identify patterns and adjust parameters. After training, datasets are also needed to test the models’ performance and generalization capabilities. Additionally, AI agents, as emerging intelligent application forms, require real-time and reliable data sources to ensure accurate decision-making and task execution.
(Source: Leewayhertz)
Demand for business analytics is also becoming more diverse and widespread, serving as a core tool driving enterprise innovation. For instance, social media platforms and market research firms need reliable user behaviour data to formulate strategies and analyze trends, integrating diverse data from multiple social platforms to build a more comprehensive picture.
For the Web3 ecosystem, reliable and authentic data is also needed on-chain to support new financial products. As more innovative assets are tokenized, flexible and reliable data interfaces are required to support product development and risk management, allowing smart contracts to execute based on verifiable real-time data.
In addition, use cases in scientific research, IoT, and other fields highlight the skyrocketing demand for diverse, authentic, and real-time data. Traditional systems may struggle to cope with the rapidly growing data volume and ever-changing demands.
A typical data ecosystem includes data collection, storage, processing, analysis, and application. In centralized models, data collection and storage are concentrated under a core IT team with strict access controls. For example, Google’s data ecosystem spans sources such as its search engine, Gmail, and the Android operating system. These platforms collect user data, store it in globally distributed data centers, and process it with algorithms to support the development and optimization of a wide range of products and services.
In financial markets, LSEG’s data business (formerly Refinitiv) gathers real-time and historical data from global exchanges, banks, and other major financial institutions, while drawing on the Reuters News network to collect market-related news. It processes this information with proprietary algorithms and models to generate analytics and risk-assessment products as value-added services.
(Source: kdnuggets.com)
While traditional data architectures are effective for professional services, the limitations of centralized models are becoming increasingly evident, particularly around coverage of emerging data sources, transparency, and user privacy protection.
For example, the 2021 GameStop event revealed the limitations of traditional financial data providers in analyzing social media sentiment. Investor sentiment on platforms like Reddit swiftly influenced market trends, but data terminals like Bloomberg and Reuters failed to capture these dynamics in time, leading to delayed market forecasts.
Beyond these issues, traditional data providers face challenges related to cost efficiency and flexibility. Although they are actively addressing these problems, emerging Web3 technologies provide new perspectives and possibilities to tackle them.
Since the launch of decentralized storage solutions like IPFS (InterPlanetary File System) in 2014, a series of emerging projects have aimed to address the limitations of traditional data ecosystems. Decentralized data solutions have evolved into a multi-layered, interconnected ecosystem covering all stages of the data lifecycle, including data generation, storage, exchange, processing and analysis, verification and security, as well as privacy and ownership.
As data exchange and utilization increase, ensuring authenticity, credibility, and privacy has become critical. This drives the Web3 ecosystem to innovate in data verification and privacy protection, leading to groundbreaking solutions.
Many Web3 technologies and native projects focus on addressing issues of data authenticity and privacy protection. Beyond the widespread adoption of technologies like Zero-Knowledge Proofs (ZK) and Multi-Party Computation (MPC), TLS Notary has emerged as a noteworthy new verification method.
Introduction to TLS Notary
The Transport Layer Security (TLS) protocol is a widely used encryption protocol for network communications. Its primary purpose is to ensure the security, integrity, and confidentiality of data transmission between a client and a server. TLS is a common encryption standard in modern network communications, applied in scenarios such as HTTPS, email, and instant messaging.
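For a concrete anchor, the snippet below opens an ordinary TLS-protected connection using only Python’s standard library; everything discussed in the rest of this section builds on top of exactly this kind of session.

```python
# Minimal TLS client using only the Python standard library. The handshake
# (certificate verification, key exchange, cipher negotiation) happens inside
# the ssl-wrapped socket; the application just reads and writes plaintext.
import socket
import ssl

hostname = "example.com"
context = ssl.create_default_context()   # verifies the server certificate by default

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls_sock:
        print("Negotiated protocol:", tls_sock.version())    # e.g. 'TLSv1.3'
        request = f"GET / HTTP/1.1\r\nHost: {hostname}\r\nConnection: close\r\n\r\n"
        tls_sock.sendall(request.encode())
        print(tls_sock.recv(200))   # first bytes of the encrypted-in-transit response
```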
(TLS Encryption Principles, Source: TechTarget)
When TLS Notary was first introduced a decade ago, its goal was to verify the authenticity of TLS sessions by introducing a third-party “notary” outside of the client (prover) and server.
TLS Notary uses key-splitting: the master key of a TLS session is divided into two parts, held separately by the client and the notary. This design allows the notary to participate as a trusted third party in the verification process without ever accessing the actual communication content. The mechanism aims to detect man-in-the-middle attacks, prevent fraudulent certificates, and ensure that communication data is not tampered with in transit, while still enabling a trusted third party to confirm the legitimacy of the communication without compromising privacy.
Thus, TLS Notary offers secure data verification and effectively balances verification needs with privacy protection.
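To make the key-splitting idea concrete, here is a deliberately simplified sketch. The production protocol runs secure multi-party computation over the TLS key schedule rather than plain XOR sharing, but the intuition is the same: neither the client nor the notary ever holds the complete session secret on its own.

```python
# Conceptual illustration only -- NOT the real TLS Notary construction, which uses
# MPC over the TLS key schedule. The point being sketched: a session secret split
# into two shares, so neither party alone can decrypt or forge the session.
import secrets

def split_secret(master_secret: bytes) -> tuple[bytes, bytes]:
    """Split a secret into two XOR shares; each share alone reveals nothing."""
    client_share = secrets.token_bytes(len(master_secret))
    notary_share = bytes(a ^ b for a, b in zip(master_secret, client_share))
    return client_share, notary_share

def reconstruct(client_share: bytes, notary_share: bytes) -> bytes:
    """Only the combination of both shares recovers the original secret."""
    return bytes(a ^ b for a, b in zip(client_share, notary_share))

master_secret = secrets.token_bytes(32)            # e.g. a 256-bit session secret
client_share, notary_share = split_secret(master_secret)

assert reconstruct(client_share, notary_share) == master_secret
assert client_share != master_secret and notary_share != master_secret
print("Neither the client nor the notary alone holds the full session key.")
```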
In 2022, the TLS Notary project was restructured by the Ethereum Foundation’s Privacy and Scaling Explorations (PSE) research lab. The new version of the protocol was rewritten from scratch in Rust and integrates more advanced cryptographic protocols such as MPC. These updates enable users to prove to a third party that data received from a server is authentic, without revealing the data’s content. While retaining its core verification capabilities, the new TLS Notary significantly enhances privacy protection, making it better suited to current and future data privacy requirements.
In recent years, TLS Notary technology has continued to evolve, resulting in various derivatives that further enhance its privacy and verification capabilities.
Web3 projects leverage these cryptographic technologies to enhance data verification and privacy protection, tackling issues like data monopolies, silos, and trusted transmission. Users can securely verify ownership of social media accounts, shopping records for financial loans, banking credit history, professional background, and academic credentials without compromising their privacy. Examples include:
(Projects Working on TLS Oracles, Source: Bastian Wetzel)
Data verification in Web3 is an essential link in the data ecosystem, with vast application prospects. The flourishing of this ecosystem is steering the digital economy toward a more open, dynamic, and user-centric model. However, the development of authenticity verification technologies is only the beginning of constructing next-generation data infrastructure.
Some projects have combined the aforementioned data verification technologies with further exploration of upstream data ecosystems, such as data traceability, distributed data collection, and trusted transmission. Below, we highlight three representative projects—OpenLayer, Grass, and Vana—that showcase unique potential in building next-generation data infrastructure.
OpenLayer, one of the projects in the a16z Crypto 2024 Spring Startup Accelerator, positions itself as the first modular authentic data layer. It aims to provide an innovative modular solution for coordinating data collection, verification, and transformation, addressing the needs of both Web2 and Web3 companies. OpenLayer has garnered support from well-known funds and angel investors, including Geometry Ventures and LongHash Ventures.
Traditional data layers face multiple challenges: lack of reliable verification mechanisms, reliance on centralized architectures that limit accessibility, lack of interoperability and flow between different systems, and the absence of fair data value distribution mechanisms.
A more specific issue is the growing scarcity of AI training data. On the public internet, many websites now deploy anti-scraping measures to prevent large-scale harvesting by AI companies. With private, proprietary data the situation is even more complex: valuable data is often locked away because of its sensitive nature and the absence of effective incentive mechanisms, so users cannot safely monetize their private data and are reluctant to share sensitive information.
To address these problems, OpenLayer combines data verification technologies to build a Modular Authentic Data Layer. Through decentralization and economic incentives, it coordinates the processes of data collection, verification, and transformation, providing a safer, more efficient, and flexible data infrastructure for Web2 and Web3 companies.
OpenLayer provides a modular platform that simplifies data collection, trustworthy verification, and transformation processes.
a) OpenNodes
OpenNodes are the core components responsible for decentralized data collection in the OpenLayer ecosystem. Users collect data through mobile apps, browser extensions, and other channels, and different operators/nodes can optimize their rewards by taking on the tasks best suited to their hardware specifications.
OpenNodes support three main types of data.
Developers can easily add new data types, specify data sources, and define requirements and retrieval methods. Users can provide anonymized data in exchange for rewards. This design allows the system to expand continuously to meet new data demands, and the diverse data sources make OpenLayer suitable for various application scenarios while lowering the threshold for data provision.
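OpenLayer has not published a public task schema, so the structure below is purely hypothetical; it is only meant to illustrate the kind of declarative specification described above, where a developer names a data type, its source, and the requirements and retrieval method nodes should use.

```python
# Hypothetical illustration of a declarative data-collection task. OpenLayer's
# actual task format is not public; every field name here is invented purely to
# show how a developer might register a new data type for nodes to fulfil.
from dataclasses import dataclass, field

@dataclass
class DataTask:
    data_type: str                                      # what kind of data is requested
    source: str                                         # where nodes should retrieve it
    requirements: dict = field(default_factory=dict)    # freshness, format, region, ...
    retrieval_method: str = "browser_extension"         # or "mobile_app", "api", ...
    reward_per_record: float = 0.0                      # incentive paid to contributors

# Example: a developer registering a (hypothetical) e-commerce price-feed task.
task = DataTask(
    data_type="ecommerce_price_snapshot",
    source="https://example-shop.com/api/prices",       # placeholder URL
    requirements={"max_age_seconds": 300, "anonymized": True},
    reward_per_record=0.002,
)
print(task)
```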
b) OpenValidators
OpenValidators handle the verification of collected data, enabling data consumers to confirm the accuracy of user-provided data against its source. Verification methods use cryptographic proofs, and results can be retrospectively validated. Multiple providers can offer verification services for the same type of proof, allowing developers to select the best-suited provider for their needs.
In initial use cases, particularly for public or private data from internet APIs, OpenLayer employs TLS Notary as a verification solution. It exports data from any web application and verifies its authenticity without compromising privacy.
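OpenLayer’s proof formats are likewise not public, so the sketch below is neither TLS Notary nor OpenLayer’s API; it only illustrates, with an ordinary Ed25519 signature over a hash of the payload, the general pattern the paragraph describes: data travels with a proof bound to it, and the consumer checks that binding before trusting the data.

```python
# Hypothetical consumer-side check of an attested payload (requires `pip install cryptography`).
# This is NOT the TLS Notary protocol -- just the generic attest-then-verify pattern.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# --- Attestation side (a validator / notary in this sketch) -------------------
validator_key = Ed25519PrivateKey.generate()
validator_pub = validator_key.public_key()

payload = b'{"account": "anon-123", "balance_usd": 1520}'   # invented example data
attestation = validator_key.sign(hashlib.sha256(payload).digest())

# --- Consumer side -------------------------------------------------------------
def verify_attested(data: bytes, proof: bytes) -> bool:
    """Return True only if the proof covers exactly this payload."""
    try:
        validator_pub.verify(proof, hashlib.sha256(data).digest())
        return True
    except InvalidSignature:
        return False

print(verify_attested(payload, attestation))                 # True
print(verify_attested(payload + b"tampered", attestation))   # False -- data was altered
```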
Beyond TLS Notary, thanks to its modular design, the verification system can easily integrate other methods to accommodate diverse data types and verification needs.
c) OpenConnect
OpenConnect is the module responsible for data transformation and usability within the OpenLayer ecosystem. It processes data from various sources and ensures interoperability across different systems to meet diverse application requirements. For example, it provides privacy-preserving anonymization of users’ private account data while strengthening security during data sharing to reduce leaks and misuse. To meet the real-time data demands of AI and blockchain applications, it also supports efficient real-time data transformation.
Currently, through its integration with EigenLayer, OpenLayer AVS (Actively Validated Service) operators monitor data request tasks, collect the data, verify it, and report the results back to the system. Operators stake or restake assets on EigenLayer to provide an economic guarantee for their behaviour, and malicious behaviour results in their assets being slashed. As one of the earliest AVS projects on the EigenLayer mainnet, OpenLayer has attracted over 50 operators and $4 billion in restaked assets.
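The economic-security loop described above can be summarized in a toy model. This is a conceptual sketch, not EigenLayer’s or OpenLayer’s actual contract logic, and the slashing fraction is invented.

```python
# Toy model of the stake-and-slash loop: operators back their reports with stake,
# and provably incorrect reports forfeit part of it. Parameters are illustrative.
from dataclasses import dataclass

SLASH_FRACTION = 0.5   # invented penalty; real values are protocol-defined

@dataclass
class Operator:
    name: str
    stake: float        # restaked assets backing the operator's honesty

    def report(self, claimed_result: bool, ground_truth: bool) -> None:
        """Submit a verification result; misreporting costs part of the stake."""
        if claimed_result != ground_truth:
            penalty = self.stake * SLASH_FRACTION
            self.stake -= penalty
            print(f"{self.name}: slashed {penalty:.1f}, remaining stake {self.stake:.1f}")
        else:
            print(f"{self.name}: honest report, stake intact at {self.stake:.1f}")

honest = Operator("honest-op", stake=100.0)
cheater = Operator("cheating-op", stake=100.0)

honest.report(claimed_result=True, ground_truth=True)    # stake intact
cheater.report(claimed_result=False, ground_truth=True)  # gets slashed
```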
Grass, the flagship project developed by Wynd Network, is designed to create a decentralized network crawler and AI training data platform. By the end of 2023, Grass completed a $3.5 million seed funding round led by Polychain Capital and Tribe Capital. In September 2024, it secured Series A funding, with $5 million led by HackVC and additional participation from Polychain, Delphi, Lattice, and Brevan Howard.
As AI training increasingly relies on diverse and expansive data sources, Grass addresses this need by creating a distributed web crawler node network. This network leverages decentralized physical infrastructure and idle user bandwidth to collect and provide verifiable datasets for AI training. Nodes route web requests through user internet connections, accessing public websites and compiling structured datasets. Initial data cleaning and formatting are performed using edge computing technology, ensuring high-quality outputs.
Grass utilizes the Solana Layer 2 Data Rollup architecture to enhance processing efficiency. Validators receive, verify, and batch-process web transactions from nodes, generating Zero-Knowledge (ZK) proofs to confirm data authenticity. Verified data is stored on the Grass Data Ledger (L2), with corresponding proofs linked to the Solana L1 blockchain.
a) Grass Nodes:
Users install the Grass app or browser extension, allowing their idle bandwidth to power decentralized web crawling. Nodes route web requests, access public websites, and compile structured datasets. Using edge computing, they perform initial data cleaning and formatting. Users earn GRASS tokens as rewards based on their bandwidth contribution and the volume of data provided.
b) Routers:
Acting as intermediaries, routers connect Grass nodes to validators. They manage the node network, relay bandwidth, and are incentivized based on the total verified bandwidth they facilitate.
c) Validators:
Validators receive and verify the web transactions relayed by routers. They generate ZK proofs to confirm the validity of the data, using unique key sets to establish secure TLS connections and negotiate encryption suites. While Grass currently relies on centralized validators, there are plans to transition to a decentralized validator committee.
d) ZK Processors:
These processors validate node session data proofs and batch all web request proofs for submission to Solana Layer 1.
e) Grass Data Ledger (Grass L2):
The Grass Data Ledger stores comprehensive datasets and links them to their corresponding L1 proofs on Solana, ensuring transparency and traceability.
f) Edge Embedding Models:
These models transform unstructured web data into structured datasets suitable for AI training.
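Putting the components above together, the sketch below walks one request through the pipeline (node → router → validator → ZK processor → Solana L1, with the dataset landing on the Data Ledger). The function names and data shapes are invented for illustration and are not Grass’s actual interfaces; a hash stands in for the real ZK proof.

```python
# Conceptual walk-through of the Grass pipeline described above. All names are
# invented; a SHA-256 digest stands in for the real ZK proof.
import hashlib

def node_scrape(url: str) -> dict:
    """A node routes the request over the user's idle bandwidth and does
    initial cleaning/formatting at the edge."""
    raw_html = f"<html>public content of {url}</html>"   # placeholder fetch
    return {"url": url, "cleaned": raw_html.strip()}

def router_relay(record: dict) -> dict:
    """Routers relay node traffic to validators and account for verified bandwidth."""
    record["relayed_bytes"] = len(record["cleaned"])
    return record

def validator_verify(record: dict) -> dict:
    """Validators check the web transaction and emit a proof commitment."""
    record["proof"] = hashlib.sha256(record["cleaned"].encode()).hexdigest()
    return record

def settle(record: dict, data_ledger: list, l1_proofs: list) -> None:
    """ZK processors batch proofs to Solana L1; the dataset lands on the Data Ledger (L2)."""
    l1_proofs.append(record["proof"])
    data_ledger.append(record)

data_ledger, solana_l1 = [], []
settle(validator_verify(router_relay(node_scrape("https://example.com"))),
       data_ledger, solana_l1)
print(len(data_ledger), "dataset(s) stored,", len(solana_l1), "proof(s) anchored on L1")
```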
(Source: Grass)
Grass and OpenLayer share a commitment to leveraging distributed networks to provide companies with access to open internet data and authenticated private data. Both utilize incentive mechanisms to promote data sharing and the production of high-quality datasets, but their technical architectures and business models differ.
Technical Architecture:
Grass uses a Solana Layer 2 Data Rollup architecture with centralized validation, currently relying on a single validator. OpenLayer, as an early adopter of EigenLayer’s AVS (Actively Validated Service), employs a decentralized validation mechanism backed by economic incentives and slashing penalties, and its modular design emphasizes scalability and flexibility in data verification services.
Product Focus:
Both projects allow users to monetize data through nodes, but their business use cases diverge:
Grass primarily targets AI companies and data scientists needing large-scale, structured datasets, as well as research institutions and enterprises requiring web-based data. OpenLayer caters to Web3 developers needing off-chain data sources, AI companies requiring real-time, verifiable streams, and businesses pursuing innovative strategies like verifying competitor product usage.
While both projects currently occupy distinct niches, their functionalities may converge as the industry evolves.
Both projects could, for example, integrate data labelling as a critical step in preparing training datasets. Grass, with its vast network of over 2.2 million active nodes, could quickly deploy Reinforcement Learning from Human Feedback (RLHF) services to optimize AI models. OpenLayer, with its expertise in real-time data verification and processing, could maintain an edge in data credibility and quality, particularly for private datasets.
Despite the potential overlap, their unique strengths and technological approaches may allow them to dominate different niches within the decentralized data ecosystem.
(Source: IOSG, David)
Vana is a user-centric data pool network designed to provide high-quality data for AI and related applications. Compared to OpenLayer and Grass, Vana takes a distinct technological and business approach. In September 2024, Vana secured $5 million in funding led by Coinbase Ventures, following an $18 million Series A round in which Paradigm served as the lead investor, with participation from Polychain and Casey Caruso.
Originally launched in 2018 as an MIT research project, Vana is a Layer 1 blockchain dedicated to private user data. Its innovations in data ownership and value distribution allow users to profit from AI models trained on their data. Vana achieves this through trustless, private, and attributable Data Liquidity Pools (DLPs) and an innovative Proof of Contribution mechanism that facilitates the flow and monetization of private data.
Vana introduces a unique concept of Data Liquidity Pools (DLPs), which are at the core of the Vana network. Each DLP is an independent peer-to-peer network aggregating specific types of data assets. Users can upload their private data—such as shopping records, browsing habits, and social media activity—into designated DLPs and decide whether to authorize specific third-party usage.
Data within these pools undergoes de-identification to protect user privacy while remaining usable for commercial applications, such as AI model training and market research. Users contributing data to a DLP are rewarded with corresponding DLP tokens. These tokens represent the user’s contribution to the pool, grant governance rights, and entitle the user to a share of future profits.
Unlike the traditional one-time sale of data, Vana allows data to participate continuously in the economic cycle, enabling users to receive ongoing rewards with transparent, visualized usage tracking.
The Proof of Contribution (PoC) mechanism is a cornerstone of Vana’s approach to ensuring data quality. Each DLP can define a unique PoC function tailored to its characteristics, verifying the authenticity and completeness of submitted data and evaluating its contribution to improving AI model performance. This mechanism quantifies user contributions, recording them for reward allocation. Similar to the “Proof of Work” concept in cryptocurrency, PoC rewards users based on data quality, quantity, and usage frequency. Smart contracts automate this process, ensuring contributors are compensated fairly and transparently.
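As an illustration of how such a mechanism could be wired up, the sketch below scores contributions in a single hypothetical DLP. Vana lets each pool define its own PoC logic, so the fields and weights here are invented and only show how quality, quantity, and usage might combine into a reward share.

```python
# Hypothetical Proof-of-Contribution scoring for one DLP. Each DLP defines its own
# PoC function in practice; the weights and fields below are invented.
from dataclasses import dataclass

@dataclass
class Contribution:
    contributor: str
    records: int           # quantity of submitted data points
    quality_score: float   # 0..1, e.g. from validation or model-improvement tests
    usage_count: int       # how often downstream consumers used this data

def proof_of_contribution(c: Contribution) -> float:
    """Toy PoC score: quality-weighted quantity plus a usage bonus."""
    return c.records * c.quality_score + 0.5 * c.usage_count

contributions = [
    Contribution("alice", records=1_000, quality_score=0.9, usage_count=40),
    Contribution("bob",   records=5_000, quality_score=0.3, usage_count=5),
]

total = sum(proof_of_contribution(c) for c in contributions)
for c in contributions:
    share = proof_of_contribution(c) / total
    print(f"{c.contributor}: {share:.1%} of this epoch's DLP token rewards")
```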
a) Data Liquidity Layer
This core layer enables the contribution, verification, and recording of data into DLPs, transforming data into transferable digital assets on-chain. DLP creators deploy smart contracts to set purposes, verification methods, and contribution parameters. Data contributors submit data for validation, and the PoC module evaluates data quality and assigns governance rights and rewards.
b) Data Portability Layer
Serving as Vana’s application layer, this platform facilitates collaboration between data contributors and developers. It provides infrastructure for building distributed AI training models and AI DApps using the liquidity in DLPs.
c) Connectome
A decentralized ledger that underpins the Vana ecosystem, Connectome acts as a real-time data flow map. It records all real-time data transactions using Proof-of-Stake consensus, ensuring the efficient transfer of DLP tokens and enabling cross-DLP data access. Fully EVM-compatible, it allows interoperability with other networks, protocols, and DeFi applications.
(Source: Vana)
Vana provides a fresh approach by focusing on the liquidity and empowerment of user data. This decentralized data exchange model not only supports AI training and data marketplaces but also enables seamless cross-platform data sharing and ownership in the Web3 ecosystem. Ultimately, it fosters an open internet where users can own and manage their data and the intelligent products created from it.
In 2006, data scientist Clive Humby famously remarked, “Data is the new oil.” Over the past two decades, we have witnessed the rapid evolution of technologies that “refine” this resource, such as big data analytics and machine learning, which have unlocked unprecedented value from data. According to IDC, by 2025, the global data sphere will expand to 163 ZB, with the majority coming from individuals. As IoT, wearable devices, AI, and personalized services become more widespread, much of the data required for commercial use will originate from individuals.
Web3 data solutions overcome the limitations of traditional infrastructure by leveraging distributed node networks. These networks enable broader, more efficient data collection while improving the real-time accessibility and verifiability of specific datasets. Web3 technologies ensure data authenticity and integrity while protecting user privacy, fostering a fairer data utilization model. This decentralized architecture democratizes data access and empowers users to share in the economic benefits of the data economy.
Both OpenLayer and Grass rely on user-node models to enhance specific data collection processes, while Vana enables users to monetize their private data directly. These approaches not only improve efficiency but also let ordinary users share in the value created by the data economy, creating a win-win scenario for users and developers.
Through tokenomics, Web3 data solutions redesign incentive models, establishing a fairer value distribution mechanism. These systems attract significant user participation, hardware resources, and capital investment, optimizing the operation of the entire data network.
Web3 solutions offer modularity and scalability, allowing for technological iteration and ecosystem expansion. For example: OpenLayer’s modular design provides flexibility for future advancements; Grass’ distributed architecture optimizes AI model training by providing diverse and high-quality datasets.
From data generation, storage, and verification to exchange and analysis, Web3-driven solutions address the shortcomings of traditional infrastructures. By enabling users to monetize their data, these solutions fundamentally transform the data economy.
As technologies evolve and application scenarios expand, decentralized data layers are poised to become a cornerstone of next-generation infrastructure. They will support a wide range of data-driven industries while empowering users to take control of their data and its economic potential.