As a distributed ledger, a blockchain needs to store historical data on all nodes to ensure the security and sufficient decentralization of data storage. Since the correctness of each state change depends on the previous state (the transaction source), a blockchain should in principle store every historical record from the first transaction to the current one in order to guarantee transaction correctness. Taking Ethereum as an example, even if the average block size is estimated at only 20 KB, the total size of Ethereum blocks has already reached 370 GB. Beyond the blocks themselves, a full node also needs to record state and transaction receipts; counting this part, the total storage requirement of a single node has exceeded 1 TB, which concentrates node operation in the hands of a few.
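As a rough sanity check on the figures above, the cited total is consistent with the estimated average block size multiplied by Ethereum's block height. Both inputs here are approximate estimates from the text, not measured values:

```python
# Back-of-the-envelope check: average block size x block height ~ total block data.
# AVG_BLOCK_KB and BLOCK_HEIGHT are rough assumptions, not on-chain measurements.
AVG_BLOCK_KB = 20          # assumed average Ethereum block size in KB
BLOCK_HEIGHT = 18_500_000  # approximate block height at the time of writing

total_gb = AVG_BLOCK_KB * BLOCK_HEIGHT / 1024 / 1024
print(f"~{total_gb:.0f} GB of raw block data")  # same order of magnitude as the 370 GB cited
```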
Ethereum’s latest block height, image source: Etherscan
Compared with databases or linked-list storage structures, blockchain's incomparable advantage comes from the ability to verify newly generated data against historical data. Therefore, ensuring the security of historical data is the first issue to consider in DA-layer storage. When judging the data security of a blockchain system, we often analyze it from the amount of data redundancy and the way data availability is verified.
On the premise of ensuring basic security, the next core goal of the DA layer is to reduce costs and increase efficiency. The first task is to reduce storage costs regardless of hardware performance differences, that is, to reduce the space occupied by storing a unit of data. At this stage, the main ways to reduce storage costs in blockchain are to adopt sharding and to use reward-based storage, which keeps data effectively stored while reducing the number of backups. However, it is not difficult to see from these methods that there is a trade-off between storage cost and data security: reducing storage occupancy often means a decrease in security. An excellent DA layer therefore needs to strike a balance between storage cost and data security. In addition, if the DA layer is a separate public chain, it needs to reduce costs by minimizing the intermediate steps of data exchange; each transfer step leaves behind index data for subsequent queries, so the longer the call path, the more index data accumulates and the higher the storage cost. Finally, the cost of data storage is directly linked to its durability: generally speaking, the higher the storage cost, the harder it is for a public chain to store data persistently.
After reducing costs, the next step is to increase efficiency, that is, the ability to quickly retrieve data from the DA layer when it is needed. This process involves two steps. The first is to locate the nodes that store the data; this mainly concerns public chains that have not achieved network-wide data consistency, and if the public chain synchronizes data across all nodes, the time consumed by this step can be ignored. Second, in current mainstream blockchain systems, including Bitcoin, Ethereum, and Filecoin, nodes store data in the LevelDB database. In LevelDB, data is stored in three forms. Newly written data is first stored in a Memtable; when the Memtable is full, it is converted into an Immutable Memtable. Both types live in memory, but the Immutable Memtable can no longer be modified and only allows reads. The hot storage used in the IPFS network keeps data in this layer, so it can be read quickly from memory when called. However, an ordinary node's RAM is typically on the GB scale and fills up quickly, and if the node crashes or another abnormal situation occurs, the data in memory is permanently lost. For data to be stored persistently, it must be written as SST files to a solid-state drive (SSD); but when reading, the data must first be loaded back into memory, which greatly reduces indexing speed. Finally, for systems that use sharded storage, data restoration requires sending requests to multiple nodes and reassembling the pieces, which further reduces reading speed.
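The LevelDB write path described above can be illustrated with a minimal sketch: fresh writes land in an in-memory memtable; once full it is frozen and flushed to an on-disk "SST" file, after which reads must fall back to disk. The class, file layout, and tiny size limit are illustrative assumptions, not LevelDB's actual implementation:

```python
# Toy LSM-style store mimicking the Memtable -> Immutable Memtable -> SST flow.
import json, os, tempfile

MEMTABLE_LIMIT = 3  # tiny limit for illustration; real memtables are MB-sized

class TinyLSM:
    def __init__(self, dirpath):
        self.dir = dirpath
        self.memtable = {}   # mutable, in memory (fast reads and writes)
        self.immutable = []  # frozen memtables: read-only, awaiting flush
        self.sst_files = []  # persisted "SST" files on disk

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.immutable.append(self.memtable)  # freeze: no more writes to it
            self.memtable = {}
            self._flush()

    def _flush(self):
        # Persist the oldest frozen memtable to disk so it survives a crash.
        frozen = self.immutable.pop(0)
        path = os.path.join(self.dir, f"sst_{len(self.sst_files)}.json")
        with open(path, "w") as f:
            json.dump(frozen, f)
        self.sst_files.append(path)

    def get(self, key):
        # Memory first (fast path), then newest-to-oldest SST files (slow path:
        # disk data must be loaded back into memory before it can be read).
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sst_files):
            with open(path) as f:
                data = json.load(f)
            if key in data:
                return data[key]
        return None

db = TinyLSM(tempfile.mkdtemp())
for i in range(5):
    db.put(f"k{i}", i)
print(db.get("k1"), db.get("k4"))  # k1 comes from an SST file, k4 from the memtable
```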
LevelDB data storage method, image source: Leveldb-handbook
With the development of DeFi and the various problems of CEXs, users' demand for cross-chain transactions of decentralized assets keeps growing. Whether the cross-chain mechanism is hash locking, a notary scheme, or a relay chain, simultaneous confirmation of historical data on both chains cannot be avoided. The crux of the problem lies in the separation of data across the two chains: different decentralized systems cannot communicate directly. A solution proposed at this stage is therefore to change the DA-layer storage model: store the historical data of multiple public chains on the same trusted public chain, so that verification only requires calling data on that single chain. This requires the DA layer to be able to establish secure communication with different types of public chains, in other words, to have good versatility.
Data storage method after Sharding, image source: Kernel Ventures
DAS technology is a further optimization of sharded storage. During sharding, because nodes store shares in a simple random fashion, a certain block may end up not stored by anyone. Second, for fragmented data, confirming its authenticity and integrity during restoration is also critical. In DAS, these two problems are solved with erasure codes and KZG polynomial commitments.
Data verification ensures that the data retrieved from a node is accurate and complete. To minimize the amount of data and the computational cost of verification, the DA layer now mainly uses tree structures as the verification method. The simplest form is the Merkle Tree, which records data as a complete binary tree: keeping only the Merkle root and the hashes of the sibling subtrees along a node's path is enough to verify that node, with verification time complexity of O(log N) (log N here defaults to log2(N)). Although this greatly simplifies verification, the amount of proof data still grows as the data grows. To solve this, another verification method, the Verkle Tree, has been proposed: each node stores not only its value but also a vector commitment, so a node's authenticity can be verified quickly using its value and the commitment proof, without reading the values of its sibling nodes. The number of computations per verification then depends only on the depth of the Verkle Tree, a fixed constant, which greatly accelerates verification. However, computing a vector commitment requires the participation of all sibling nodes in the same layer, which greatly increases the cost of writing and changing data. For data such as historical data, which is permanently stored, cannot be tampered with, and is only read but never written, the Verkle Tree is extremely suitable. In addition, both Merkle Trees and Verkle Trees have K-ary variants; the mechanisms are similar, simply changing the number of subtrees under each node, and a performance comparison can be seen in the table below.
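The Merkle verification described above can be shown in a short sketch, assuming SHA-256 and a complete binary tree: holding only the Merkle root, a leaf is verified with just the log2(N) sibling hashes along its path. Function names and the leaf data are illustrative:

```python
# Minimal Merkle tree with proof generation and O(log N) verification.
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_tree(leaves):
    level = [h(x) for x in leaves]
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree  # tree[-1][0] is the Merkle root

def make_proof(tree, index):
    # Collect the sibling hash on the other side of the path at each level.
    proof = []
    for level in tree[:-1]:
        proof.append((level[index ^ 1], index % 2))  # (sibling hash, am-I-right-child)
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root

leaves = [b"tx0", b"tx1", b"tx2", b"tx3"]  # N = 4, so each proof holds log2(4) = 2 hashes
tree = build_tree(leaves)
root = tree[-1][0]
proof = make_proof(tree, 2)
print(verify(root, b"tx2", proof))  # True
print(verify(root, b"txX", proof))  # False: a tampered leaf fails verification
```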
Time performance comparison of data verification methods, image source: Verkle Trees
The continuous expansion of the blockchain ecosystem has brought a continuous increase in the number of public chains. Since each public chain has its own advantages and irreplaceability in its field, it is almost impossible for Layer 1 chains to consolidate in the short term. However, with the development of DeFi and the various problems of CEXs, users' demand for decentralized cross-chain asset trading keeps growing. Therefore, DA-layer multi-chain data storage, which can eliminate the security issues in cross-chain data interaction, has received more and more attention. To accept historical data from different public chains, however, the DA layer needs a decentralized protocol for standardized storage and verification of data streams. For example, kvye, a storage middleware based on Arweave, actively fetches data from chains and stores all on-chain data in Arweave in a standard form, minimizing differences in the transmission process. By contrast, a Layer 2 that provides DA-layer data storage for one specific public chain interacts with data through internal shared nodes; although this lowers interaction costs and improves security, it is relatively limited and can only serve that specific public chain.
This type of storage solution has no settled name yet; its most prominent representative is DankSharding on Ethereum, so this article uses "DankSharding-like" to refer to it. This type of solution mainly uses the two DA storage technologies mentioned above, sharding and DAS. First, the data is divided into an appropriate number of shares through sharding, and then each node extracts a data block via DAS for storage. If there are enough nodes in the network, a larger shard count N can be chosen, so each node bears only 1/N of the original storage pressure, expanding overall storage capacity N times. At the same time, to prevent the extreme case where some block is not stored by any node, DankSharding encodes the data with an erasure code, so the full data can be restored from any half of it. The last step is the verification process, which uses the Verkle tree structure and polynomial commitments to achieve fast verification.
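The "restore everything from any half" property of the erasure code can be shown with a toy Reed-Solomon-style sketch, not the production scheme: k data chunks define a degree-(k-1) polynomial, and publishing 2k evaluations means any k of them suffice to interpolate the polynomial and recover all the original data. The field modulus and chunk values here are illustrative assumptions:

```python
# Toy erasure code via Lagrange interpolation over a prime field GF(P).
P = 2**31 - 1  # a small prime for illustration; real systems use large pairing-friendly fields

def interpolate_at(points, x):
    """Evaluate the unique polynomial through `points` at position x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

data = [101, 202, 303, 404]              # k = 4 original chunks
k = len(data)
points = list(enumerate(data))           # chunk i = polynomial evaluation at x = i
coded = [(x, interpolate_at(points, x)) for x in range(2 * k)]  # extend to 2k chunks

# Simulate losing half the chunks: any k surviving evaluations still recover the data.
survivors = [coded[1], coded[3], coded[5], coded[6]]
recovered = [interpolate_at(survivors, x) for x in range(k)]
print(recovered)  # [101, 202, 303, 404]
```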
For main-chain DA, one of the simplest data handling methods is short-term storage of historical data. In essence, the blockchain acts as a public ledger, letting changes to its contents be witnessed by the entire network without needing permanent storage. Taking Solana as an example, although its historical data is synchronized to Arweave, mainnet nodes only retain the transaction data of the past two days. On a public chain based on account records, the historical data at each moment preserves the final state of accounts on the blockchain, which is enough to provide a verification basis for changes at the next moment. Projects with special needs for older data can store it themselves on other decentralized public chains or with a trusted third party. In other words, those with additional data needs pay for historical data storage.
EthStorage contract, image source: Kernel Ventures
Celestia data reading method, image source: Celestia Core
In terms of technical principles, main-chain DA borrows many sharding-like techniques from storage public chains, and some third-party DAs directly use a storage public chain for part of their storage tasks; for example, the specific transaction data in Celestia is placed on the LL-IPFS network. Among third-party DA solutions, besides building a separate public chain to solve Layer 1's storage problem, a more direct way is to connect a storage public chain to Layer 1 and host Layer 1's huge historical data there. For high-performance blockchains, the volume of historical data is even larger: running at full speed, the high-performance public chain Solana approaches 4 PB of data, far beyond what an ordinary node can store. The solution Solana chose is to store historical data on the decentralized storage network Arweave and keep only 2 days of data on mainnet nodes for verification. To secure this process, Solana and Arweave designed a dedicated storage bridge protocol, Solar Bridge. Data verified by Solana nodes is synchronized to Arweave, and a corresponding tag is returned; with this tag, Solana nodes can retrieve the blockchain's historical data at any time. On Arweave, network nodes are not required to maintain data consistency as a threshold for participating in network operations; instead, reward-based storage is adopted. First of all, Arweave does not build blocks in a traditional chain structure but in something closer to a graph: a new block points not only to the previous block but also to a randomly selected earlier block, the Recall Block. The Recall Block's position is determined by the hash of the previous block and its block height, and is unknown until the previous block is mined.
However, to generate a new block, a node must hold the Recall Block's data in order to compute a hash of the specified difficulty under the PoW mechanism. Only the first miner to find a qualifying hash gets the reward, which encourages miners to store as much historical data as possible. At the same time, the fewer nodes that store a certain historical block, the fewer competitors a node faces when generating a qualifying nonce, which encourages miners to store the blocks that are rarely backed up in the network. Finally, to ensure that nodes store data in Arweave permanently, it introduces WildFire's node scoring mechanism: nodes prefer to communicate with peers that can provide more historical data faster, while lower-rated nodes often cannot obtain the latest blocks and transactions promptly and thus cannot gain an edge in the PoW competition.
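The Recall Block selection described above can be sketched as follows. This is an assumed illustration of the idea (the exact hash input layout in Arweave may differ): mixing the previous block's hash with its height yields an unpredictable but deterministic index into the chain's history.

```python
# Sketch of Recall Block selection: deterministic once the previous block is
# mined, but unpredictable beforehand, so miners are pushed to store everything.
import hashlib

def recall_block_height(prev_block_hash: bytes, prev_height: int) -> int:
    # Mix the previous block's hash with its height, then reduce modulo the
    # current chain length to pick one historical block index.
    digest = hashlib.sha256(prev_block_hash + prev_height.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % prev_height

prev_hash = hashlib.sha256(b"block at height 1000").digest()
idx = recall_block_height(prev_hash, 1000)
print(idx)  # some height in [0, 1000); a miner must hold this block's data to mine
```

Because the index only becomes known when the previous block appears, a miner cannot pre-select which historical blocks to keep, which is what makes broad storage the rational strategy.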
Arweave block construction method, image source: Arweave Yellow-Paper
Next, we will compare the advantages and disadvantages of the five storage solutions based on the four dimensions of DA performance indicators.
Storage solution performance comparison, image source: Kernel Ventures
The current blockchain is transforming from Crypto into the more inclusive Web3, a process that brings more than just a richer set of on-chain projects. To accommodate so many projects running simultaneously on Layer 1 while preserving the experience of GameFi and SocialFi projects, Layer 1s represented by Ethereum have adopted methods such as Rollups and Blobs to improve TPS, and among new blockchains the number of high-performance chains keeps growing. But higher TPS means not only higher performance but also greater storage pressure on the network. To cope with massive historical data, various main-chain-based and third-party DA methods have been proposed to adapt to the growth of on-chain storage pressure. Each improvement has its advantages and disadvantages, and different applicability in different situations.
Blockchains focused on payments have extremely high requirements for the security of historical data and do not pursue particularly high TPS. If such a public chain is still in the preparation stage, a DankSharding-like storage method can be adopted, achieving a huge increase in storage capacity while ensuring security. But for a public chain like Bitcoin, already established with a large number of nodes, rash changes at the consensus layer carry huge risks; in that case a main-chain-dedicated DA, which offers higher security among off-chain storage options, can balance security and storage. It is worth noting, however, that blockchain functions are not static but constantly changing. For example, Ethereum's early functions were mainly limited to payments and simple automated handling of assets and transactions via smart contracts, but as the blockchain landscape expanded, various SocialFi and DeFi projects were gradually added, pushing Ethereum in a more comprehensive direction. Recently, with the explosion of the inscription ecosystem on Bitcoin, the Bitcoin network's transaction fees have surged nearly 20-fold since August, reflecting that its current transaction speed cannot meet demand and that traders can only raise fees to get transactions processed as quickly as possible. Now the Bitcoin community faces a trade-off: accept high fees and slow transactions, or reduce network security to increase transaction speed and defeat the original intention of the payment system. If the Bitcoin community chooses the latter, then as data pressure grows, the corresponding storage solution will also need to adjust.
Bitcoin mainnet transaction fees fluctuate, image source: OKLINK
Public chains with comprehensive functions pursue higher TPS, and their historical data grows even faster. A DankSharding-like solution can hardly keep up with rapid TPS growth in the long run, so a more appropriate approach is to migrate data to a third-party DA for storage. Among these, a main-chain-dedicated DA has the highest compatibility and may have the edge if only a single public chain's storage is considered. But today, when Layer 1 public chains are flourishing, cross-chain asset transfer and data interaction have become a common pursuit of the blockchain community. If the long-term development of the whole blockchain ecosystem is taken into account, storing the historical data of different public chains on the same chain can eliminate many security issues in data exchange and verification. Among third-party DAs, therefore, a modular DA may be a better choice than a general storage public chain: with comparable versatility, a modular DA focuses on providing DA-layer services for blockchains and introduces more refined index data to manage historical data, reasonably classifying data from different public chains, which gives it an advantage over storage public chains. The above, however, does not account for the cost of adjusting the consensus layer on an existing public chain; that process is extremely risky, and if problems occur it may lead to systemic vulnerabilities and cost the chain its community consensus. Therefore, as a transitional solution during blockchain scaling, the simplest short-term main-chain storage may be more suitable. Finally, the discussion above is based on performance during actual operation.
However, if a public chain's goal is to develop its ecosystem and attract more projects and participants, it may also prefer projects supported and funded by its own foundation. For example, even when overall performance is equal to or slightly lower than that of storage public chain solutions, the Ethereum community will still tend toward Layer 2 projects backed by the Ethereum Foundation, such as EthStorage, in order to keep developing the Ethereum ecosystem.
All in all, today's blockchains are becoming more and more functionally complex, which brings greater storage requirements. As long as there are enough Layer 1 validation nodes, historical data does not need to be backed up by every node in the network; relative security is guaranteed once the number of backups reaches a certain threshold. At the same time, the division of labor among public chains has become increasingly fine-grained: Layer 1 handles consensus and execution, Rollups handle computation and verification, and a separate blockchain handles data storage, so each part can focus on one function without being limited by the performance of the others. Yet exactly how much storage, or what proportion of nodes storing historical data, strikes the balance between security and efficiency, and how to ensure secure interoperability between different blockchains, remain questions that blockchain developers must think through and keep improving. For investors, main-chain-dedicated DA projects on Ethereum deserve attention, because Ethereum already has enough supporters at this stage and does not need to rely on other communities to expand its influence; what it needs more is to improve and develop its own community and attract more projects into its ecosystem. For public chains in a catch-up position, however, such as Solana and Aptos, a single chain lacks such a complete ecosystem, so they may be more inclined to join forces with other communities to build a large cross-chain ecosystem and expand their influence. For the emerging Layer 1s, a general-purpose third-party DA therefore deserves more attention.
Kernel Ventures is a research- and development-community-driven crypto venture capital fund with over 70 early-stage investments, focused on infrastructure, middleware, dApps, especially ZK, Rollups, DEXs, modular blockchains, and the vertical areas that will onboard the next billions of crypto users, such as account abstraction, data availability, and scalability. For the past seven years, we have been committed to supporting the growth of core development communities and university blockchain associations around the world.
As a distributed ledger, blockchain needs to store historical data on all nodes to ensure the security and sufficient decentralization of data storage. Since the correctness of each state change is related to the previous state (transaction source), to ensure the correctness of transactions, a blockchain should in principle store all historical records from the first transaction to the current transaction. Taking Ethereum as an example, even if the average block size is estimated to be 20 kb, the current total size of Ethereum blocks has reached 370 GB. In addition to the block itself, a full node also needs to record status and transaction receipts. Counting this part, the total storage capacity of a single node has exceeded 1 TB, which concentrates the operation of the node to a few people.
Ethereum’s latest block height, image source: Etherscan
Compared with database or linked list storage structures, the non-comparability of blockchain comes from the ability to verify newly generated data through historical data. Therefore, ensuring the security of historical data is the first issue to be considered in DA layer storage. When judging the data security of blockchain systems, we often analyze it from the amount of data redundancy and the verification method of data availability.
On the premise of ensuring basic security, the next core goal that the DA layer needs to achieve is to reduce costs and increase efficiency. The first is to reduce storage costs, regardless of hardware performance differences, that is, to reduce the memory usage caused by storing unit-size data. At this stage, the main ways to reduce storage costs in blockchain are to adopt sharding technology and use reward-based storage to ensure that data is effectively stored and reduce the number of data backups. However, it is not difficult to see from the above improvement methods that there is a game relationship between storage cost and data security. Reducing storage occupancy often means a decrease in security. Therefore, an excellent DA layer needs to achieve a balance between storage cost and data security. In addition, if the DA layer is a separate public chain, it needs to reduce the cost by minimizing the intermediate process of data exchange. In each transfer process, index data needs to be left for subsequent query calls. Therefore, The longer the call process, the more index data will be left and the storage cost will increase. Finally, the cost of data storage is directly linked to the durability of the data. Generally speaking, the higher the storage cost of data, the more difficult it is for the public chain to store data persistently.
After achieving cost reduction, the next step is to increase efficiency, which is the ability to quickly call data out of the DA layer when it needs to be used. This process involves two steps. The first is to search for nodes that store data. This process is mainly for public chains that have not achieved data consistency across the entire network. If the public chain achieves data synchronization for nodes across the entire network, this can be ignored. The time consumption of a process. Secondly, in the current mainstream blockchain systems, including Bitcoin, Ethereum, and Filecoin, the node storage method is the Leveldb database. In Leveldb, data is stored in three ways. First, the data written immediately will be stored in Memtable-type files. When the Memtable storage is full, the file type will be changed from Memtable to Immutable Memtable. Both types of files are stored in memory, but Immutable Memtable files can no longer be changed, only data can be read from them. The hot storage used in the IPFS network stores data in this part. When it is called, it can be quickly read from the memory. However, the mobile memory of an ordinary node is often GB level, and it is easy to write slowly, When a node crashes or other abnormal situation occurs, the data in the memory will be permanently lost. If you want the data to be stored persistently, you need to store it in the form of an SST file on a solid-state drive (SSD). However, when reading the data, you need to read the data into the memory first, which greatly reduces the data indexing speed. Finally, for systems that use shared storage, data restoration requires sending data requests to multiple nodes and restoring them. This process will also reduce the data reading speed.
Leveldb data storage method, picture source: Leveldb-handbook
With the development of DeFi and various problems with CEX, users’ requirements for cross-chain transactions of decentralized assets are also growing. Regardless of the cross-chain mechanism of hash locking, notary public, or relay chain, the simultaneous determination of historical data on both chains cannot be avoided. The key to this problem lies in the separation of data on the two chains, and direct communication cannot be achieved in different decentralized systems. Therefore, a solution is proposed at this stage by changing the DA layer storage method, which not only stores the historical data of multiple public chains on the same trusted public chain but only needs to call the data on this public chain during verification. Can. This requires the DA layer to be able to establish secure communication methods with different types of public chains, which means that the DA layer has good versatility.
Data storage method after Sharding, image source: Kernel Ventures
DAS technology is based on further optimization of Sharding storage methods. During the Sharding process, due to the simple random storage of nodes, a certain Block may be lost. Secondly, for fragmented data, it is also very important to confirm the authenticity and integrity of the data during the restoration process. In DAS, these two problems are solved through Eraser code and KZG polynomial commitment.
Data validation ensures that the data called from a node are accurate and complete. To minimize the amount of data and computational cost required in the validation process, the DA layer now uses a tree structure as the mainstream validation method. The simplest form is to use Merkle Tree for verification, which uses the form of complete binary tree records, only need to keep a Merkle Root and the hash value of the subtree on the other side of the path of the node can be verified, the time complexity of the verification is O(logN) level (the logN is default log2(N)). Although the validation process has been greatly simplified, the amount of data for the validation process in general still grows with the increase of data. To solve the problem of increasing validation volume, another validation method, Verkle Tree, is proposed at this stage, in which each node in the Verkle Tree not only stores the value but also attaches a Vector Commitment, which can quickly validate the authenticity of the data by using the value of the original node and the commitment proof, without the need to call the values of other sister nodes, which makes the computation of each validation easier and faster. This makes the number of computations for each verification only related to the depth of the Verkle Tree, which is a fixed constant, thus greatly accelerating the verification speed. However, the calculation of Vector Commitment requires the participation of all sister nodes in the same layer, which greatly increases the cost of writing and changing data. However, for data such as historical data, which is permanently stored and cannot be tampered with, also, can only be read but not written, the Verkle Tree is extremely suitable. In addition, Merkle Tree and Verkle Tree itself have a K-ary form of variants, the specific implementation of the mechanism is similar, just change the number of subtrees under each node, the specific performance comparison can be seen in the following table.
Time performance comparison of data verification methods, picture source: Verkle Trees
The continuous expansion of the blockchain ecosystem has brought about a continuous increase in the number of public chains. Due to the advantages and irreplaceability of each public chain in their respective fields, it is almost impossible for Layer 1 public chains to unify in a short time. However, with the development of DeFi and various problems with CEX, users’ requirements for decentralized cross-chain trading assets are also growing. Therefore, DA layer multi-chain data storage that can eliminate security issues in cross-chain data interactions has received more and more attention. However, to accept historical data from different public chains, the DA layer needs to provide a decentralized protocol for standardized storage and verification of data streams. For example, kvye, a storage middleware based on Arweave, actively grabs data from the chain and all Data on the chain is stored in Arweave in a standard form to minimize differences in the data transmission process. Relatively speaking, Layer2, which specifically provides DA layer data storage for a certain public chain, interacts with data through internal shared nodes. Although it reduces the cost of interaction and improves security, it has relatively large limitations and can only provide data to Specific public chains that provide services.
This type of storage solution has no definite name yet, and the most prominent representative is DankSharding on Ethereum, so this article uses the class DankSharding to refer to this type of solution. This type of solution mainly uses the two DA storage technologies mentioned above, Sharding and DAS. First, the data is divided into appropriate shares through Sharding, and then each node extracts a data block in the form of DAS for storage. If there are enough nodes in the entire network, we can choose a larger number of shards N, so that the storage pressure of each node is only 1/N of the original, thereby achieving N times expansion of the overall storage space. At the same time, to prevent the extreme situation that a certain Block is not stored in any block, DankSharding encodes the data using an Eraser Code, and only half of the data can be completely restored. The last step is the data verification process, which uses the Verkle tree structure and polynomial commitment to achieve fast verification.
For main-chain DA, one of the simplest data processing methods is short-term storage of historical data. In essence, the blockchain plays the role of a public ledger, allowing changes to the ledger to be witnessed by the entire network without requiring permanent storage. Taking Solana as an example, although its historical data is synchronized to Arweave, mainnet nodes only retain the transaction data of the past two days. On an account-based public chain, the state at each moment retains the final status of every account, which is enough to provide a verification basis for changes at the next moment. Projects with special needs for older data can store it themselves on other decentralized public chains or entrust it to a trusted third party. In other words, those with additional data needs must pay for historical data storage themselves.
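The reason pruning is safe on account-based chains is that validating the next change requires only the current final state, not the transactions that produced it. The toy transfer function below (all names hypothetical, state reduced to a balance map) illustrates that principle:

```python
def apply_transfer(state: dict, sender: str, receiver: str, amount: int) -> dict:
    """Validate and apply a transfer against the *current* account state
    only. Once final balances are known, no historical transactions are
    needed to check the legality of the next state change."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if state.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    new_state = dict(state)  # keep the old state immutable
    new_state[sender] -= amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state

# The final state alone is the verification basis for the next transfer;
# how "alice" came to hold 100 is irrelevant and can be pruned.
state = {"alice": 100, "bob": 5}
state = apply_transfer(state, "alice", "bob", 30)
```

A UTXO-based chain has no such compact final state per account, which is one reason short-term pruning fits account models more naturally.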
EthStorage contract, image source: Kernel Ventures
Celestia data reading method, image source: Celestia Core
In terms of technical principles, main-chain DA borrows many sharding-like technologies from storage public chains, and some third-party DAs directly use a storage public chain to complete part of the storage task; for example, Celestia places specific transaction data on the LL-IPFS network. In third-party DA solutions, besides building a separate public chain to solve Layer 1's storage problem, a more direct approach is to connect a storage public chain to Layer 1 and let it hold Layer 1's huge historical data. For high-performance blockchains, the volume of historical data is even larger: running at full speed, the data volume of the high-performance public chain Solana approaches 4 PB, far beyond the storage capacity of ordinary nodes. The solution Solana chose is to store historical data on the decentralized storage network Arweave and retain only two days of data on mainnet nodes for verification. To secure the storage process, Solana and Arweave designed a dedicated storage bridge protocol, Solar Bridge. Data verified by Solana nodes is synchronized to Arweave, which returns a corresponding tag; through this tag, Solana nodes can retrieve the blockchain's historical data at any time. On Arweave, network nodes are not required to maintain a consistent full copy of the data as a threshold for participating in network operation; instead, storage is rewarded. Arweave also does not use a traditional chain structure to build blocks but something closer to a graph structure: a new block points not only to the previous block but also to a randomly selected, previously generated block called the Recall Block. The Recall Block's position is determined by the hash of the previous block and its block height, so it is unknown until the previous block has been mined.
When producing a new block, a node must possess the Recall Block's data in order to compute a hash of the specified difficulty under the PoW mechanism, and only the first miner to find a qualifying hash receives the reward; this encourages miners to store as much historical data as possible. Moreover, the fewer nodes that store a given historical block, the fewer competitors a node faces when searching for a qualifying nonce, which encourages miners to store the blocks that are rarely kept in the network. Finally, to ensure that nodes store data in Arweave permanently, it introduces WildFire's node scoring mechanism: nodes prefer to communicate with peers that can provide more historical data faster, while low-rated nodes often cannot obtain the latest block and transaction data promptly and thus cannot gain an advantage in the PoW competition.
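The recall-block incentive described above can be sketched in a few lines. This is a simplified model, not Arweave's actual algorithm (names like `recall_block_index` and `can_mine` are hypothetical, and real Arweave derives the index differently): the point is only that the index is deterministic yet unpredictable before the previous block exists, and that mining is gated on holding that block's data.

```python
import hashlib

def recall_block_index(prev_block_hash: bytes, height: int) -> int:
    """Derive the recall block's position from the previous block's hash
    and the new block's height. Deterministic once the previous block is
    known, but unpredictable before it is mined."""
    seed = hashlib.sha256(prev_block_hash + height.to_bytes(8, "big")).digest()
    return int.from_bytes(seed, "big") % height  # some earlier block in [0, height)

def can_mine(stored_blocks: set, prev_block_hash: bytes, height: int) -> bool:
    """A miner may attempt the PoW for the next block only if it holds the
    recall block's data, which is what rewards broad storage of history."""
    return recall_block_index(prev_block_hash, height) in stored_blocks
```

Because the recall index is uniform over all past heights, a miner's chance of being eligible for any given block is proportional to the fraction of history it stores, and rarely-stored blocks offer less PoW competition when they are selected.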
Arweave block construction method, image source: Arweave Yellow-Paper
Next, we compare the advantages and disadvantages of the five storage solutions across the four dimensions of DA performance indicators.
Storage solution performance comparison, image source: Kernel Ventures
The current blockchain is undergoing a transformation from Crypto to the more inclusive Web3, a process that brings not only a greater richness of on-chain projects but also greater demands on infrastructure. To accommodate so many projects running simultaneously on Layer 1 while preserving the experience of GameFi and SocialFi projects, Layer 1s represented by Ethereum have adopted methods such as Rollups and Blobs to improve TPS, and among new blockchains the number of high-performance chains also keeps growing. But higher TPS means not only higher performance but also greater storage pressure on the network. To cope with massive historical data, various DA methods based on the main chain and on third parties have been proposed to adapt to the growth of on-chain storage pressure. Each improvement has its own advantages and disadvantages and different applicability in different situations.
Blockchains focused on payments have extremely high requirements for the security of historical data and do not pursue particularly high TPS. If such a public chain is still in the preparation stage, a DankSharding-like storage method can be adopted, achieving a huge increase in storage capacity while ensuring security. However, for a public chain like Bitcoin that has already taken shape and has a large number of nodes, rash improvements at the consensus layer carry huge risks, so a main chain-dedicated DA, the off-chain storage option with higher security, can be used to balance security and storage. It is worth noting, though, that the functions of a blockchain are not static but constantly changing. For example, Ethereum's early functions were mainly limited to payments and the simple automated processing of assets and transactions via smart contracts, but as the blockchain landscape expanded, various SocialFi and DeFi projects were gradually added, pushing Ethereum in a more comprehensive direction. Recently, with the explosion of the inscription ecosystem on Bitcoin, transaction fees on the Bitcoin network have surged nearly 20 times since August. This reflects that the Bitcoin network's transaction speed at this stage cannot meet transaction demand, and traders can only raise fees to get transactions processed as quickly as possible. Now the Bitcoin community must make a trade-off: accept high fees and slow transactions, or reduce network security to increase transaction speed and thereby defeat the original intention of the payment system. If the Bitcoin community chooses the latter, then in the face of growing data pressure, the corresponding storage solution will also need to be adjusted.
Bitcoin mainnet transaction fee fluctuations, image source: OKLINK
Public chains with comprehensive functions pursue higher TPS, and their historical data grows even faster. In the long run it is difficult to adapt to rapid TPS growth with a DankSharding-like solution, so a more appropriate way is to migrate the data to a third-party DA for storage. Among these, main chain-specific DA has the highest compatibility and may have the advantage if only the storage needs of a single public chain are considered. But today, with Layer 1 public chains flourishing, cross-chain asset transfer and data interaction have become a common pursuit of the blockchain community. If the long-term development of the whole blockchain ecosystem is taken into account, storing the historical data of different public chains on the same chain can eliminate many security issues in the data exchange and verification process, so modular DA or storage public chain DA might be the better choice. With comparable versatility, modular DA focuses on providing DA-layer services for blockchains and introduces more refined index data for managing historical data; it can reasonably classify data from different public chains and therefore holds an advantage over storage public chains. However, the above solutions do not account for the cost of adjusting the consensus layer of an existing public chain, a process that is extremely risky: a single failure may introduce systemic vulnerabilities and cost the chain its community consensus. Therefore, as a transitional solution during blockchain expansion, the simplest short-term storage on the main chain may be more suitable. Finally, the discussion above is based on performance during actual operation.
However, if a public chain's goal is to develop its ecosystem and attract more project teams and participants, it may also prefer projects supported and funded by its own foundation. For example, even when overall performance is equivalent to or slightly below that of storage public chain solutions, the Ethereum community will still tend to favor Layer 2 projects backed by the Ethereum Foundation, such as EthStorage, in order to continue developing the Ethereum ecosystem.
All in all, the functions of today's blockchains are becoming more and more complex, which brings greater storage space requirements. When Layer 1 has enough verification nodes, historical data does not need to be backed up by every node in the network; relative security can be guaranteed once the number of backups reaches a certain threshold. At the same time, the division of labor among public chains has become increasingly fine-grained: Layer 1 handles consensus and execution, Rollups handle computation and verification, and a separate blockchain handles data storage, so each part can focus on one function without being limited by the performance of the others. However, exactly how much storage, or what proportion of nodes storing historical data, achieves the balance between security and efficiency, and how to ensure secure interoperability between different blockchains, are questions blockchain developers must continue to think about and improve on. Investors may pay particular attention to main chain-specific DA projects on Ethereum, because Ethereum already has enough supporters at this stage and does not need to rely on other communities to expand its influence; what it needs more is to improve and develop its own community and attract more projects to the Ethereum ecosystem. For public chains in a catch-up position, such as Solana and Aptos, the single chain itself lacks such a complete ecosystem, so they may be more inclined to join forces with other communities to build a large cross-chain ecosystem and expand their influence. For emerging Layer 1s, therefore, general-purpose third-party DA deserves more attention.
Kernel Ventures is a crypto venture capital fund driven by a research and development community, with over 70 early-stage investments focused on infrastructure, middleware, and dApps, especially ZK, Rollups, DEXs, modular blockchains, and the vertical areas that will onboard the next billions of crypto users, such as account abstraction, data availability, and scalability. For the past seven years, we have been committed to supporting the growth of core development communities and university blockchain associations around the world.