Have you ever wondered why social media platforms like Reddit and X (formerly Twitter) can be used for free? The answer lies in the posts you make, the likes you give, and even the time you spend scrolling.
In the past, these platforms sold your attention as a commodity to advertisers. Now, they have found a bigger buyer: AI companies. Reports indicate that a single data licensing agreement between Reddit and Google can generate $60 million annually for the former. Yet none of this wealth flows back to us, the people who created the data.
What’s even more disturbing is that the AI trained on our data may eventually replace our jobs. While AI might also create new employment opportunities, the concentration of wealth resulting from this data monopoly undoubtedly exacerbates social inequality. It seems we are sliding into a cyberpunk world controlled by a handful of tech giants.
So, how can ordinary people protect their interests in this AI era? After the rise of AI, many view blockchain as humanity’s last line of defense against it. Based on this thinking, some innovators have begun exploring solutions. They propose that first, we must reclaim ownership and control of our data; second, we should use this data to collaboratively train an AI model that truly serves the common people.
This idea may seem idealistic, but history shows us that every technological revolution starts with a “crazy” concept. Today, a new public chain project called “Vana” is turning this vision into reality. As the first decentralized data liquidity network, Vana aims to transform your data into freely circulating tokens, thereby promoting a truly user-controlled decentralized artificial intelligence.
In fact, the birth of Vana can be traced back to a classroom at the MIT Media Lab, where two young individuals with a vision to change the world—Anna Kazlauskas and Art Abal—met.
Left: Anna Kazlauskas; Right: Art Abal.
Anna Kazlauskas majored in computer science and economics at MIT, and her interest in data and cryptocurrency dates back to 2015. At that time, she was involved in early Ethereum mining, which gave her a profound understanding of the potential of decentralized technology. Subsequently, Anna conducted data research at international financial institutions such as the Federal Reserve, European Central Bank, and World Bank, experiences that led her to realize that data would become a new form of currency in the future.
Meanwhile, Art Abal pursued a master’s degree in public policy at Harvard University and conducted in-depth research on data impact assessments at the Belfer Center for Science and International Affairs. Before joining Vana, Art led innovative data collection methods at Appen, an AI training data provider, contributing significantly to the emergence of many generative AI tools today. His insights into data ethics and AI accountability infused Vana with a strong sense of social responsibility.
When Anna and Art met in a class at the MIT Media Lab, they quickly discovered their shared passion for data democratization and user data rights. They recognized that to truly address issues of data ownership and AI fairness, a new paradigm was needed—one that would allow users to genuinely control their own data.
This common vision motivated them to co-found Vana. Their goal is to create a platform that not only advocates for users’ data sovereignty but also ensures that users can derive economic benefits from their data. Through the Data Liquidity Pool (DLP) mechanism and the Proof of Contribution system, Vana enables users to securely contribute private data and then co-own and benefit from the AI models trained on that data, thus promoting user-driven AI development.
Vana’s vision quickly gained recognition in the industry. To date, Vana has announced it has completed a total of $25 million in funding, including a $5 million strategic round led by Coinbase Ventures, an $18 million Series A round led by Paradigm, and a $2 million seed round led by Polychain. Other notable investors include Casey Caruso, Packy McCormick, Manifold, GSR, and DeFiance Capital.
In a world where data is the new oil, the emergence of Vana offers an important opportunity to reclaim data sovereignty. So, how does this promising project operate? Let’s delve into Vana’s technical architecture and innovative concepts together.
Vana’s technical architecture is a meticulously designed ecosystem aimed at democratizing data and maximizing its value. Its core components include the Data Liquidity Pool (DLP), Proof of Contribution mechanism, Nagoya Consensus, user self-custody of data, and a decentralized application layer. Together, these elements create an innovative platform that protects user privacy while unlocking the potential value of data.
The Data Liquidity Pool (DLP) serves as the fundamental unit within the Vana network and can be likened to “liquidity mining” but for data. Each DLP is essentially a smart contract designed to aggregate specific types of data assets. For example, the Reddit Data DAO (r/datadao) is a successful DLP case, attracting over 140,000 Reddit users and aggregating users’ Reddit posts, comments, and voting histories.
After users submit their data to a DLP, they can earn specific tokens associated with that DLP, such as RDAT for the Reddit Data DAO (r/datadao). These tokens not only represent the user’s contribution to the data pool but also grant governance rights and future profit-sharing benefits within the DLP. Notably, Vana allows each DLP to issue its own tokens, offering a flexible value-capture mechanism for different types of data assets.
In Vana’s ecosystem, the top 16 DLPs receive additional VANA token emissions, further incentivizing the formation and competition of high-quality data pools. This approach cleverly transforms scattered personal data into liquid digital assets, laying the groundwork for data valorization and liquidity.
Proof of Contribution is Vana’s key mechanism for ensuring data quality. Each DLP can tailor a unique Proof of Contribution function based on its specific needs. This function not only verifies the authenticity and completeness of data but also assesses its contribution to improving AI model performance.
For instance, the ChatGPT Data DAO’s Proof of Contribution considers four critical dimensions: authenticity, ownership, quality, and uniqueness. Authenticity is verified via data export links provided by OpenAI; ownership is confirmed through users’ email verification; quality assessment leverages LLM scoring on randomly sampled conversations; and uniqueness is determined by calculating data feature vectors and comparing them with existing data.
This multidimensional evaluation ensures that only high-quality, valuable data is accepted and rewarded. Proof of Contribution serves as the foundation for data pricing and is essential for maintaining data quality across the ecosystem.
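To make the four dimensions concrete, here is a minimal Python sketch of how a Proof of Contribution function could combine them. It is not the ChatGPT Data DAO’s actual implementation: the names `ContributionChecks`, `max_similarity`, and `contribution_score`, the use of cosine similarity, and the rule that multiplies quality by uniqueness are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(vector: list, existing: list) -> float:
    """Highest similarity between a new vector and any already-accepted one."""
    return max((cosine_similarity(vector, e) for e in existing), default=0.0)


@dataclass
class ContributionChecks:
    """Outcome of the four checks (field names are hypothetical)."""
    authentic: bool    # export link verified as coming from the provider
    owned: bool        # contributor passed email verification
    quality: float     # LLM score on sampled conversations, in [0, 1]
    similarity: float  # max similarity to existing data, in [0, 1]


def contribution_score(checks: ContributionChecks) -> float:
    """Assumed rule: authenticity and ownership are hard gates;
    quality is then scaled by uniqueness (1 - similarity)."""
    if not (checks.authentic and checks.owned):
        return 0.0
    return checks.quality * (1.0 - checks.similarity)
```

Under this rule, a submission that fails either gate scores zero regardless of quality, and a near-duplicate of existing data earns little even if its quality is high.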
The Nagoya Consensus is the core of the Vana network. It is inspired by, and builds on, Bittensor’s Yuma Consensus. The mechanism revolves around a collective evaluation of data quality by a set of validation nodes, which arrive at a final score through weighted averaging.
What sets it apart is the “two-layer evaluation” approach: not only do validation nodes assess data quality, but they also score other nodes’ rating behaviors. This adds a layer of fairness and accuracy, deterring misconduct. For instance, if a validation node assigns a high score to low-quality data, other nodes can penalize this misjudgment with a corrective score.
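The two layers can be sketched in a few lines of Python. This is a simplified illustration, not the Nagoya Consensus as implemented: the function names and the choice to derive each validator’s weight from the normalized average of the peer scores it receives are assumptions.

```python
from __future__ import annotations


def validator_weights(peer_scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Layer 2: turn peer ratings of validators into normalized weights.

    peer_scores[rater][target] is the score `rater` gives to `target`'s
    rating behavior. Self-ratings are ignored.
    """
    received: dict[str, list[float]] = {}
    for rater, ratings in peer_scores.items():
        for target, score in ratings.items():
            if target != rater:
                received.setdefault(target, []).append(score)
    means = {v: sum(s) / len(s) for v, s in received.items()}
    total = sum(means.values())
    if total == 0:
        return {v: 0.0 for v in means}
    return {v: m / total for v, m in means.items()}


def final_data_score(data_scores: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Layer 1: weighted average of the validators' data-quality scores."""
    total_weight = sum(weights.get(v, 0.0) for v in data_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(s * weights.get(v, 0.0) for v, s in data_scores.items())
    return weighted / total_weight
```

In this sketch, a validator that peers rate poorly (for example, one that gave a high score to low-quality data) ends up with little or no weight, so its inflated score barely moves the final result.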
A cycle spans 1800 blocks (roughly 3 hours), and nodes are rewarded each cycle based on their cumulative scores. This mechanism incentivizes honesty among validators and allows misbehaving nodes to be identified and removed quickly, ensuring the network’s healthy operation.
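The cycle arithmetic is easy to check: 1800 blocks in roughly 3 hours implies a block time of about 6 seconds. The Python sketch below works through that calculation and shows one possible reward rule. The pro-rata split in `split_rewards` is an assumption; the text above only says that rewards are based on cumulative scores.

```python
BLOCKS_PER_CYCLE = 1800
CYCLE_HOURS = 3  # "roughly", per the description above


def implied_block_time_seconds() -> float:
    """Block time implied by 1800 blocks per ~3-hour cycle."""
    return CYCLE_HOURS * 3600 / BLOCKS_PER_CYCLE


def cycle_of(block_number: int) -> int:
    """Index of the cycle that a given block falls into."""
    return block_number // BLOCKS_PER_CYCLE


def split_rewards(cumulative_scores: dict, cycle_reward: float) -> dict:
    """Assumed rule: share the cycle's reward in proportion to each
    node's cumulative score."""
    total = sum(cumulative_scores.values())
    if total <= 0:
        return {node: 0.0 for node in cumulative_scores}
    return {node: cycle_reward * score / total
            for node, score in cumulative_scores.items()}
```

For example, with cumulative scores of 3 and 1, a cycle reward of 100 would be split 75 to 25 under this rule.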
One of Vana’s significant innovations lies in its unique data management approach. In the Vana network, users’ original data is never truly “on-chain.” Instead, users can choose their storage locations, such as Google Drive, Dropbox, or even personal servers running on a MacBook.
When users submit data to a DLP, they are essentially providing a URL pointing to the encrypted data and an optional content integrity hash. This information is recorded in Vana’s data registration contract. Validators can request decryption keys to download and verify the data when needed.
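The registration step can be pictured with a small in-memory model in Python. This is only a sketch of the idea, not Vana’s data registration contract: `FileRecord`, `DataRegistry`, the choice of SHA-256, and hashing the encrypted bytes (rather than the plaintext) are all assumptions, and the example URL is a placeholder.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class FileRecord:
    owner: str
    url: str                            # where the encrypted file lives
    content_hash: Optional[str] = None  # optional integrity hash


class DataRegistry:
    """In-memory stand-in for an on-chain registry (hypothetical)."""

    def __init__(self) -> None:
        self._records = []

    def register(self, owner: str, url: str,
                 encrypted_blob: Optional[bytes] = None) -> int:
        """Record a pointer to the data; the data itself is never stored."""
        digest = sha256_hex(encrypted_blob) if encrypted_blob is not None else None
        self._records.append(FileRecord(owner, url, digest))
        return len(self._records) - 1

    def verify(self, file_id: int, downloaded_blob: bytes) -> bool:
        """Check downloaded bytes against the recorded hash, if one exists."""
        record = self._records[file_id]
        if record.content_hash is None:
            return True  # no hash was registered, so nothing to compare
        return sha256_hex(downloaded_blob) == record.content_hash
```

A validator that downloads the file from the registered URL can call `verify` before decrypting, so a file that was swapped or altered after registration is caught early.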
This design cleverly addresses issues of data privacy and control. Users maintain complete control over their data while still participating in the data economy. This not only ensures data security but also opens up possibilities for broader data application scenarios in the future.
The top layer of Vana is an open application ecosystem. Here, developers can leverage the data liquidity accumulated in DLPs to build various innovative applications, while data contributors can derive tangible economic value from these applications.
For example, a development team might train a specialized AI model using data from the Reddit Data DAO. Users who contributed data can not only utilize the model once it’s trained but also receive a share of the profits generated by the model according to their contribution. In fact, such an AI model has already been developed; further details can be found in the article “Rebounding from the Bottom: Why the Old Token r/datadao in the AI Track is Coming Back to Life?”
This model not only incentivizes contributions of high-quality data but also creates a truly user-driven AI development ecosystem. Users transition from mere data providers to co-owners and beneficiaries of AI products.
Through this approach, Vana is reshaping the data economy landscape. In this new paradigm, users shift from passive data providers to active participants and co-beneficiaries in ecosystem building. This not only creates new avenues for individual value acquisition but also injects renewed vitality and innovation into the entire AI industry.
Vana’s technical architecture addresses core issues in the current data economy, such as data ownership, privacy protection, and value distribution, while paving the way for future data-driven innovations. As more data DAOs join the network and additional applications are built on the platform, Vana has the potential to become the foundational infrastructure for the next generation of decentralized AI and the data economy.
With the launch of the Satori testnet on June 11, Vana has showcased a prototype of its ecosystem to the public. This serves not only as a platform for technical validation but also as a preview of the operational model for the future mainnet. Currently, the Vana ecosystem offers participants three main pathways: running DLP validation nodes, creating new DLPs, or submitting data to existing DLPs to participate in “data mining.”
Validation nodes are the gatekeepers of the Vana network, responsible for verifying the quality of data submitted to DLPs. Operating a validation node requires not only technical expertise but also sufficient computing resources. According to Vana’s technical documentation, the minimum hardware requirements for a validation node are one CPU core, 8GB of RAM, and 10GB of high-speed SSD storage.
Users interested in becoming validators must first select a DLP and then register as a validator through that DLP’s smart contract. Once registered and approved, validators can run validation nodes specific to that DLP. Validators can operate nodes for multiple DLPs simultaneously, but each DLP sets its own minimum staking requirement.
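The per-DLP staking rule can be illustrated with a toy Python model. It is not the interface of any real DLP contract: the class name, method name, and stake figures are hypothetical, and the approval step mentioned above is omitted.

```python
class DataLiquidityPool:
    """Toy stand-in for a DLP's validator registry (hypothetical)."""

    def __init__(self, name: str, min_stake: float) -> None:
        self.name = name
        self.min_stake = min_stake
        self.validators = {}  # validator address -> staked amount

    def register_validator(self, address: str, stake: float) -> None:
        """Accept the validator only if it meets this pool's own minimum."""
        if stake < self.min_stake:
            raise ValueError(
                f"{self.name}: stake {stake} is below the minimum {self.min_stake}"
            )
        self.validators[address] = stake


# The same address can validate for several pools, each with its own minimum.
pool_a = DataLiquidityPool("pool-a", min_stake=100.0)
pool_b = DataLiquidityPool("pool-b", min_stake=500.0)
pool_a.register_validator("0xabc", 150.0)  # accepted by pool-a
# pool_b.register_validator("0xabc", 150.0) would raise ValueError
```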
For users with unique data resources or innovative ideas, creating a new DLP is an attractive option. Establishing a DLP requires a deep understanding of Vana’s technical architecture, particularly the proof of contribution and Nagoya consensus mechanisms.
The creators of a new DLP must design specific data contribution goals, validation methods, and reward parameters. Additionally, they need to implement a proof of contribution function that accurately assesses data value. Although this process can be complex, Vana provides detailed templates and documentation to support creators.
For most users, submitting data to existing DLPs to participate in “data mining” may be the most straightforward way to engage. Currently, 13 DLPs have been officially recommended, covering a range of fields from social media data to financial prediction data.
· Finquarium: Gathers financial prediction data.
· GPT Data DAO: Focuses on ChatGPT chat data exports.
· Reddit Data DAO: Concentrates on Reddit user data and has officially launched.
· Volara: Specializes in the collection and utilization of Twitter data.
· Flirtual: Collects dating data.
· ResumeDataDAO: Focuses on LinkedIn data exports.
· SixGPT: Collects and manages LLM chat data.
· YKYR: Gathers Google Analytics data.
· Sydintel: Crowdsources intelligence to reveal the dark corners of the internet.
· MindDAO: Collects time series data related to user well-being.
· Kleo: Builds the most comprehensive browsing history dataset globally.
· DataPIG: Focuses on token investment preference data.
· ScrollDAO: Collects and utilizes Instagram data.
Some of these DLPs are still in development, while others are already online, but all are in the pre-mining phase. Users can only officially submit data for mining once the mainnet is launched. However, users can secure participation eligibility in various ways ahead of time. For example, they can participate in relevant challenge activities in the Vana Telegram App or pre-register on the official websites of each DLP.
Vana’s emergence marks a paradigm shift in the data economy. In the current AI wave, data has become the “oil” of the new era, and Vana seeks to reshape the models for mining, refining, and distributing this resource.
Essentially, Vana is building a solution to the “tragedy of the commons” in data. Through clever incentive design and technological innovation, it transforms personal data—an apparently limitless supply that is hard to monetize—into a manageable, priceable, and tradable digital asset. This not only opens new pathways for ordinary users to participate in AI profit sharing but also provides a potential blueprint for the development of decentralized AI.
However, Vana’s success faces numerous uncertainties. Technically, it must find a balance between openness and security; economically, it needs to prove that its model can generate sustainable value; and socially, it must tackle potential data ethics and regulatory challenges.
On a deeper level, Vana represents both a reflection on and a challenge to the existing data monopolies and AI development models. It raises an important question: In the AI era, do we choose to reinforce the current data oligarchs, or do we attempt to build a more open, fair, and diverse data ecosystem?
Regardless of whether Vana ultimately succeeds, its emergence offers us a window to rethink data value, AI ethics, and technological innovation. In the future, projects like Vana may become vital bridges connecting Web3 ideals with AI realities, guiding the next phase of digital economic development.
This article is reproduced from [BlockBeats]. The copyright belongs to the original author [Weird thinking]. If you have any objections to the reprint, please contact the Gate Learn team, and the team will handle it as soon as possible according to relevant procedures.
Disclaimer: The views and opinions expressed in this article represent only the author’s personal views and do not constitute any investment advice.
Other language versions of this article are translated by the Gate Learn team. Unless Gate.io is credited, the translated article may not be reproduced, distributed, or plagiarized.