Vitalik: How should the Surge stage of the Ethereum protocol develop
Note: This article is the second installment in the series "Possible futures of the Ethereum protocol" recently published by Ethereum founder Vitalik Buterin: "Part 2: The Surge". Compiled by Deng Tong of Golden Finance; the full text follows:
At the beginning, there were two scaling strategies in the Ethereum roadmap.
One was sharding: each node only needs to verify and store a small portion of transactions, rather than verifying and storing every transaction in the chain. This is also how other peer-to-peer networks (e.g. BitTorrent) work, so of course we could make blockchains work the same way.
The other was layer 2 protocols: networks that sit on top of the Ethereum blockchain, allowing them to benefit fully from its security while keeping most data and computation off the main chain. "Layer 2 protocols" meant state channels in 2015, Plasma in 2017, and rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth.
Fortunately, by 2019, sharding research had solved the problem of verifying "data availability" at scale. As a result, the two paths converged, and we got the rollup-centric roadmap, which remains Ethereum's scaling strategy today.
The Surge, 2023 Roadmap Edition.
The rollup-centric roadmap proposes a simple division of labor: Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This is a pattern that recurs everywhere in society: the court system (L1) is not there to be ultra-fast and efficient, but to protect contracts and property rights, while entrepreneurs (L2) build on top of that sturdy base layer and take humanity to (metaphorical and literal) Mars.
This year, the rollup-centric roadmap has seen significant successes: Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM rollups have reached stage 1. A highly heterogeneous and pluralistic implementation of sharding, in which each L2 acts as a "shard" with its own internal rules and logic, is now a reality. But, as we have seen, this path comes with some unique challenges. So our task now is to complete the rollup-centric roadmap and address these problems, while preserving the robustness and decentralization that make Ethereum L1 unique.
The scalability trilemma
The scalability trilemma is an idea proposed in 2017, which argues that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost of running a node), scalability (more specifically: handling a large number of transactions), and security (more specifically: an attacker needing to compromise a large fraction of the nodes in the network to make even a single transaction fail).
It is worth noting that the trilemma is not a theorem, and the post introducing it does not come with a mathematical proof. It does give a heuristic mathematical argument: if a decentralization-friendly node (e.g. a consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is seen by only 1/k of the nodes, which means an attacker only needs to corrupt a few nodes to push a bad transaction through, or (ii) your nodes will have to become powerful and your chain will not be decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard: it requires somehow thinking outside the framework that the argument implies.
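As a purely illustrative rendering of this argument (the numbers below are assumptions, not measurements), the trade-off can be written out directly:

```python
# Illustrative numbers only: N is what a decentralization-friendly node can verify,
# k is the multiple by which the chain's throughput exceeds that.
N = 1_000   # transactions per second one consumer laptop can verify (assumed)
k = 50      # claimed scaling factor of the chain (assumed)

chain_tps = k * N                 # 50,000 TPS
fraction_per_node = 1 / k         # option (i): each tx is checked by only 2% of nodes
node_speedup_needed = k           # option (ii): every node must become 50x more powerful
print(chain_tps, fraction_per_node, node_speedup_needed)
```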
For many years, some high-performance chains have claimed that they solve the trilemma without doing anything clever at the architecture level, usually by applying software engineering tricks to optimize the node. This is always misleading, and running a node in such chains is always far more difficult than in Ethereum. This post explores many of the subtleties of why this is the case (and why L1 client software engineering alone cannot scale Ethereum itself).
However, the combination of data availability sampling (DAS) and SNARKs does solve the trilemma: it allows a client to verify that some amount of data is available and that some number of computational steps were executed correctly, while downloading only a small fraction of that data and running a much smaller amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property of non-scalable chains: even a 51% attack cannot force the network to accept bad blocks.
Another way to solve the trilemma is the Plasma architecture, which uses clever incentives to push the responsibility for watching data availability onto users. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could do securely, but the mainstreaming of SNARKs has made Plasma architectures far more viable for a wider range of use cases.
Further progress on data availability sampling
What problem are we trying to solve?
Since the Dencun upgrade went live on March 13, 2024, the Ethereum blockchain has had three "blobs" of roughly 125 kB per 12-second slot, or about 375 kB of data availability bandwidth per slot. Assuming transaction data is published directly on-chain, and that a transaction takes about 180 bytes, the maximum TPS of rollups on Ethereum is approximately:
375000 / 12 / 180 = 173.6 TPS
If we add Ethereum's calldata (theoretical maximum: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS from blobs.
This is a major improvement over Ethereum L1 alone, but it is not enough. We want more scalability. Our medium-term target is 16 MB per slot, which, combined with improvements in rollup data compression, would give us roughly 58,000 TPS.
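To make these bandwidth figures easy to reproduce, here is a small calculation sketch. The 180-byte transaction size is the ERC20-transfer figure used later in this article; the 23-byte post-compression size is not from the article, it is simply the per-transaction size implied by the quoted ~58,000 TPS target at 16 MB per slot.

```python
# Reproducing the bandwidth arithmetic used in this section.
SLOT_SECONDS = 12

def max_tps(bytes_per_slot: float, bytes_per_tx: float) -> float:
    return bytes_per_slot / SLOT_SECONDS / bytes_per_tx

print(max_tps(3 * 125_000, 180))    # ~173.6 TPS: 3 blobs of ~125 kB today
print(max_tps(8 * 125_000, 180))    # ~463 TPS:  8-blob PeerDAS target
print(max_tps(16 * 125_000, 180))   # ~926 TPS:  16-blob PeerDAS target
print(max_tps(16_000_000, 180))     # ~7,407 TPS: 16 MB per slot, uncompressed txs
print(max_tps(16_000_000, 23))      # ~58,000 TPS: assumed ~23 bytes per tx after compression
```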
What is PeerDAS and how does it work?
PeerDAS is a relatively simple implementation of "one-dimensional sampling". Every blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast "shares" of the polynomial, where each share consists of 16 evaluations at 16 adjacent coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with the currently proposed parameters: any 64 of the 128 possible samples) can reconstruct the blob.
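To make the reconstruction threshold concrete, here is a toy sketch using a tiny prime field and a degree-3 "blob" in place of Ethereum's degree-4096 polynomial over a 253-bit field: any 4 of the 8 extended evaluations recover the data, mirroring "any 4096 of 8192".

```python
# Toy illustration (not the production code path): a blob is treated as the
# evaluations of a low-degree polynomial, so any `degree + 1` of the extended
# evaluations suffice to reconstruct it.
P = 97  # toy prime modulus (Ethereum uses a ~253-bit prime)

def lagrange_interpolate(points, x, p=P):
    """Evaluate, at x, the unique polynomial through `points` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i == j:
                continue
            num = num * (x - xj) % p
            den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

# "Blob": 4 field elements = evaluations of a degree-3 polynomial at x = 0..3.
blob = [13, 42, 7, 88]
points = list(enumerate(blob))

# "Extend" to 8 evaluations (x = 0..7); the extra ones are the erasure-coded half.
extended = [lagrange_interpolate(points, x) for x in range(8)]

# Any 4 of the 8 evaluations recover the original blob (here we drop the first half).
subset = list(zip(range(4, 8), extended[4:]))
recovered = [lagrange_interpolate(subset, x) for x in range(4)]
assert recovered == blob
```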
PeerDAS works by having each client listen on a small number of subnets, where the i-th subnet broadcasts the i-th sample of every blob, and additionally request blobs it needs on other subnets by asking peers in the global p2p network (who listen on different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without additionally asking peers. The current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (i.e. "clients") to use PeerDAS.
In theory, we can scale 1D sampling quite far: if we increase the blob count maximum to 256 (and so the target to 128), we reach the 16 MB target, and data availability sampling would cost each node only 16 samples * 128 blobs * 512 bytes per sample = 1 MB of data bandwidth per slot. This is just barely within our tolerance: it is feasible, but it means bandwidth-constrained clients cannot sample. We could optimize this somewhat by decreasing the blob count and increasing the blob size, but that makes reconstruction more expensive.
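The per-slot sampling cost quoted above, plus the standard DAS soundness intuition, can be written out as follows (a sketch; the parameter values follow the text, the probability bound is the usual argument that with a 2x erasure code an unreconstructable blob can answer fewer than half of random queries):

```python
SAMPLES_PER_NODE = 16
BLOBS_PER_SLOT = 128     # target blob count at the hypothetical 16 MB scale
SAMPLE_SIZE = 512        # bytes per sample

print(SAMPLES_PER_NODE * BLOBS_PER_SLOT * SAMPLE_SIZE)  # 1,048,576 bytes ~ 1 MB per slot

# Probability that an unreconstructable blob still answers all of one node's samples:
print(0.5 ** SAMPLES_PER_NODE)                           # < 1.6e-05
```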
So ultimately we want to go further and do 2D sampling, which samples randomly not only within blobs but also across blobs. The linearity of KZG commitments is used to "extend" the set of blobs in a block with a list of new "virtual blobs" that redundantly encode the same information.
2D sampling. Source: a16z
Crucially, computing the extension of the commitments does not require having the blobs, so this scheme is fundamentally friendly to distributed block construction. The node that actually constructs the block only needs to hold the blob KZG commitments, and can itself rely on DAS to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.
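Here is a toy model of the linearity argument. The "commitment" below is a stand-in that is linear in the blob, structurally like KZG but without the elliptic-curve hiding; the field size, secret value, and blob contents are all made up for illustration.

```python
# commit(blob) = sum_j blob[j] * tau**j  (mod p) -- linear in the blob.
P = 10007          # toy prime field
TAU = 1234         # toy "trusted setup" secret; real KZG hides tau inside a group

def commit(blob):
    return sum(v * pow(TAU, j, P) for j, v in enumerate(blob)) % P

def lagrange_coeffs(xs, x, p=P):
    """Coefficients c_i such that f(x) = sum_i c_i * f(xs[i]) for deg(f) < len(xs)."""
    coeffs = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        coeffs.append(num * pow(den, -1, p) % p)
    return coeffs

# Four real blobs of four field elements each.
blobs = [[3, 1, 4, 1], [5, 9, 2, 6], [8, 7, 3, 2], [0, 5, 8, 9]]
xs = list(range(len(blobs)))

# Extend column-wise to a 5th "virtual blob" (row index 4).
c = lagrange_coeffs(xs, 4)
virtual_blob = [sum(ci * blobs[i][col] for i, ci in enumerate(c)) % P
                for col in range(4)]

# Linearity: the virtual blob's commitment equals the same extrapolation
# applied directly to the commitments -- no blob data needed.
assert commit(virtual_blob) == sum(ci * commit(blobs[i]) for i, ci in enumerate(c)) % P
```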
What are the connections with existing research?
Original post introducing data availability (2018):
Follow-up paper:
Explainer post on DAS (Paradigm):
2D availability with KZG commitments:
PeerDAS on ethresear.ch and paper:
EIP-7594:
SubnetDAS on ethresear.ch:
Nuances of data recoverability in 2D sampling:
What is left to do, and what are the trade-offs?
The next step is to finish the implementation and rollout of PeerDAS. From there, continually increasing the blob count on PeerDAS is an incremental task, while carefully watching the network and improving the software to ensure safety. In the meantime, we want more academic work on formalizing PeerDAS and other versions of DAS, and on their interaction with issues such as the safety of the fork choice rule.
Further into the future, we need more work to figure out the ideal version of 2D DAS and prove its security properties. We also want to eventually migrate away from KZG to a quantum-resistant alternative that does not require a trusted setup. Currently, we do not know of any candidates that are friendly to distributed block building. Even the expensive "brute force" technique of using recursive STARKs to generate validity proofs for reconstructing rows and columns is not enough, because while a STARK is technically O(log(n) * log(log(n))) hashes in size (with STIR), in practice a STARK is almost as large as an entire blob.
In the long run, the realistic paths as I see them are:
We can view these options along a spectrum of trade-offs.
Note that this choice exists even if we decide to scale execution on L1 directly. This is because if L1 is to handle a large volume of TPS, L1 blocks will become very large, and clients will want an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) on L1 as well.
How does it interact with other parts of the roadmap?
If data compression (see below) is implemented, the need for 2D DAS is reduced, or at least delayed; if Plasma is widely used, it is reduced further. DAS also poses a challenge to distributed block construction protocols and mechanisms: although DAS is in theory friendly to distributed reconstruction, in practice it needs to be combined with inclusion list proposals and the fork choice mechanics around them.
Data Compression
What problem are we trying to solve?
Each transaction in Rollup will occupy a large amount of on-chain data space: transferring ERC20 tokens requires approximately 180 bytes. Even with ideal data availability sampling, this will still limit the scalability of the second layer protocol. With each slot being 16 MB, we get:
16000000 / 12 / 180 = 7407 TPS
What if we could tackle the denominator as well as the numerator, and make each transaction in a rollup take up fewer bytes on-chain?
What is it and how does it work?
I think the best explanation is this picture from two years ago:
The simplest gain is zero-byte compression: replace every long sequence of zero bytes with two bytes saying how many zero bytes there are (a minimal sketch follows below). Going further, we can exploit specific properties of transactions:
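Here is a minimal sketch of the zero-byte idea just mentioned (one possible encoding, not the exact scheme any particular rollup uses): every run of zero bytes is replaced by the pair (0x00, run_length), and other bytes pass through unchanged.

```python
def compress_zero_bytes(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i < len(data) and data[i] == 0 and run < 255:
                run += 1
                i += 1
            out += bytes([0, run])      # (0x00, run_length)
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def decompress_zero_bytes(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])   # expand the run of zeros
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# An address padded to 32 bytes is mostly zeros, so it compresses well.
tx_field = bytes(12) + bytes.fromhex("ab" * 20)
compressed = compress_zero_bytes(tx_field)
assert decompress_zero_bytes(compressed) == tx_field
assert len(compressed) < len(tx_field)   # 22 bytes instead of 32
```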
What are the connections with existing research?
Exploration from sequence.xyz:
For L2 optimized contract calldata, from ScopeLift:
Another strategy: validity-proof-based rollups (a.k.a. ZK rollups) publish state diffs instead of transactions:
BLS Wallet - Achieving BLS aggregation through ERC-4337:
What is left to do, and what are the trade-offs?
The main remaining task is to actually implement the schemes above. The main trade-offs are:
How does it interact with other parts of the roadmap?
Adoption of ERC-4337, and eventually enshrining parts of it in the L2 EVM, can greatly accelerate the deployment of aggregation techniques. Enshrining parts of ERC-4337 on L1 can speed up its deployment on L2s.
Generalized Plasma
What problem are we trying to solve?
Even with 16 MB blobs and data compression, 58,000 TPS may not be enough to fully take over consumer payments, decentralized social media, or other high-bandwidth domains, and this becomes especially true once we start considering privacy, which could reduce scalability by 3-8x. For high-volume, low-value applications, one option today is the validium, which keeps data off-chain and has an interesting security model where the operator cannot steal users' funds, but can disappear and temporarily or permanently freeze all users' funds. But we can do better.
What is it and how does it work?
Plasma is a scaling solution in which an operator publishes blocks off-chain and puts only the Merkle roots of those blocks on-chain (unlike rollups, which put the full block on-chain). For each block, the operator sends each user a Merkle branch proving what did or did not happen to that user's assets. Users can withdraw their assets by providing a Merkle branch. Importantly, the branch does not have to be rooted in the latest state, so even if data availability fails, users can recover their assets by withdrawing the latest state available to them. If a user submits an invalid branch (e.g. exiting an asset they have already sent to someone else, or the operator creating an asset out of thin air), an on-chain challenge mechanism adjudicates who the asset rightly belongs to.
Plasma Cash chain map. The transaction spending coin i is placed in the i-th position of the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4, and George owns coin 6.
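Below is a minimal sketch of the Merkle-branch check that a Plasma exit relies on, using a coin-indexed tree in the spirit of the figure above. Assumptions: a plain binary sha256 Merkle tree and a toy leaf encoding; real Plasma constructions differ in tree layout and in what each leaf commits to.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    layer = [h(leaf) for leaf in leaves]
    branch = []
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify_branch(leaf, index, branch, root):
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# The operator posts only the root on-chain and hands each user their branch.
leaves = [b"coin1:eve", b"coin2:alice", b"coin3:bob", b"coin4:david"]
root = merkle_root(leaves)                 # goes on-chain
eve_branch = merkle_branch(leaves, 0)      # kept by Eve off-chain
assert verify_branch(b"coin1:eve", 0, eve_branch, root)   # basis for Eve's exit claim
```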
Early versions of Plasma could only handle the payments use case and could not be effectively generalized further. However, if we require every root to be verified with a SNARK, Plasma becomes much more powerful. Every challenge game can be greatly simplified, because we eliminate most of the possible paths by which the operator could cheat. New paths also open up that allow Plasma techniques to extend to a much wider range of asset classes. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without waiting for a one-week challenge period.
One way to make an EVM Plasma chain (not the only way): use a ZK-SNARK to build a parallel UTXO tree that reflects the balance changes made by the EVM and defines a unique mapping of what counts as "the same coin" at different points in history. A Plasma construction can then be built on top of that.
One important insight is that a Plasma system does not need to be perfect. Even if you can only protect a subset of assets (e.g. just coins that have not moved in the past week), you have already greatly improved on the status quo of the ultra-scalable EVM, which is a validium.
Another type of structure is the hybrid Plasma/rollups structure, such as Intmax. These structures place a very small amount of data for each user on-chain (e.g., 5 bytes). By doing so, it is possible to achieve properties that fall between Plasma and Rollup: in the case of Intmax, you can achieve a very high level of scalability and privacy, even in a 16 MB world. The theoretical capacity limit is approximately 16,000,000 / 12 / 5 = 266,667 TPS.
What are the connections with existing research?
Original Plasma paper:
Plasma Cash:
Plasma Cashflow:
Intmax(2023):
What is left to do, and what are the trade-offs?
The main remaining task is to bring Plasma systems into production. As noted above, "Plasma vs. validium" is not a binary: any validium can improve its security properties at least a little by adding Plasma features to its exit mechanism. The research task is to obtain optimal properties for the EVM (in terms of trust requirements, worst-case L1 gas cost, and DoS vulnerability), as well as alternative application-specific constructions. In addition, Plasma is conceptually more complex than rollups, which needs to be addressed directly through research and by building better general frameworks.
The main drawback of Plasma designs is that they depend more on operators and are harder to make "based", although hybrid Plasma/rollup designs can often avoid this weakness.
How does it interact with other parts of the roadmap?
The more effective Plasma solutions are, the less pressure there is on L1 to provide high-performance data availability. Moving activity to L2 also reduces MEV pressure on L1.
Mature L2 Proof System
What problem are we trying to solve?
Today, most rollups are not actually trustless; a security council has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system does not exist at all, or if it does, it has only an "advisory" function. The furthest along are (i) some application-specific rollups, such as Fuel, which are trustless, and (ii) as of the time of writing, Optimism and Arbitrum, two full-EVM rollups that have reached a partial-trustlessness milestone known as "stage 1". The reason rollups have not gone further is concern about bugs in the code. We need trustless rollups, so we need to tackle this problem head-on.
What is it and how does it work?
First, let's review the "stage" system introduced in the original article. There are more detailed requirements, but the summary is as follows:
Our goal is to reach stage 2. The main challenge in getting to stage 2 is gaining enough confidence that the proof system is actually trustworthy enough. There are two main ways to do this:
Schematic diagram of a multi-prover setup, combining an optimistic proof system, a validity proof system, and a security council.
What are the connections with existing research?
EVM K Semantics (formal verification work started in 2017):
Presentation on the idea of multi-provers (2022):
Taiko's plan to use multi-proofs:
What is left to do, and what are the trade-offs?
On the formal verification side, there is a lot to do. We need to create a formally verified version of an entire SNARK prover for the EVM. This is an extremely complex project, though one we have already started. There is one trick that significantly simplifies the task: we can make a formally verified SNARK prover for a minimal virtual machine, e.g. RISC-V or Cairo, and then implement the EVM inside that minimal VM (and formally prove its equivalence to some other EVM specification).
For multi-provers, there are two main remaining pieces. First, we need enough confidence in at least two different proof systems, both that each is reasonably secure on its own and that, if they break, they break for different and unrelated reasons (so they do not break at the same time). Second, we need a very high level of assurance in the underlying logic that merges the proof systems. This is a much smaller piece of code. There are ways to make it very small - simply store funds in a multisig contract whose signers are contracts representing the individual proof systems - but this comes at the cost of high on-chain gas costs. Some balance between efficiency and safety needs to be found.
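As a toy illustration of how small that merging logic can be, here is a sketch of one possible acceptance rule (the rule itself is hypothetical, chosen only to mirror the optimistic prover + validity prover + security council combination in the diagram above):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    optimistic: bool      # did the fraud-proof system accept the state root?
    validity: bool        # did the ZK (validity) proof verify?
    council_votes: int    # council members approving an override
    council_size: int

def accept_state_root(v: Verdict, council_threshold: float = 0.75) -> bool:
    if v.optimistic and v.validity:
        return True                      # both systems agree: trustless path
    if not v.optimistic and not v.validity:
        return False                     # both reject
    # The two systems disagree -- only a supermajority of the council can decide.
    return v.council_votes >= council_threshold * v.council_size

assert accept_state_root(Verdict(True, True, 0, 8))
assert not accept_state_root(Verdict(True, False, 5, 8))   # 5/8 < 75%
assert accept_state_root(Verdict(False, True, 6, 8))       # 6/8 >= 75%
```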
How does it interact with other parts of the roadmap?
Moving the activity to L2 can reduce the MEV pressure on L1.
Cross-L2 Interoperability Improvement
What problem are we trying to solve?
One of the major challenges of the L2 ecosystem today is that it is hard for users to navigate. Furthermore, the easiest ways of doing so often reintroduce trust assumptions: centralized bridges, RPC clients, and so on. If we are serious about the idea that L2s are part of Ethereum, we need to make using the L2 ecosystem feel like using a single unified Ethereum ecosystem.
An example of pathologically bad (and even dangerous: I personally lost $100 to a chain-selection mistake here) cross-L2 UX. While this is not Polymarket's fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending tokens from L1 to L2, or from one L2 to another, should feel just like sending tokens within the same L1.
What is it and how does it work?
There are many categories of cross-L2 interoperability improvements. In general, the way to come up with them is to note that, in theory, a rollup-centric Ethereum is the same thing as an L1 with execution sharding, and then ask where the current Ethereum L2 landscape falls short of that ideal in practice. Here are a few:
How a light client updates its view of the Ethereum header chain. Once you have the headers, you can use Merkle proofs to verify any state object. And once you have the right L1 state objects, you can use Merkle proofs (and possibly signatures, if you want to check preconfirmations) to verify any state object on L2. Helios already does the former; extending it to the latter is a standardization challenge.
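A minimal sketch of that two-step chain of proofs (the proof format and the function names here are stand-ins, not Helios's or any L2's actual interfaces): the light client trusts only an L1 header/state root, proves the rollup contract's stored L2 state root out of L1 state, then proves the target object out of the L2 state.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(root: bytes, key: bytes, value: bytes, proof: bytes) -> bool:
    # Stand-in for a real Merkle/Verkle state-proof check against `root`.
    return h(key, value, proof) == root

def verify_l2_object(l1_state_root, rollup_contract, l2_state_root, proof_l1,
                     l2_key, l2_value, proof_l2):
    # Step 1: the rollup contract's stored L2 state root is part of L1 state.
    if not verify(l1_state_root, rollup_contract, l2_state_root, proof_l1):
        return False
    # Step 2: the target object is part of that L2 state.
    return verify(l2_state_root, l2_key, l2_value, proof_l2)

# Toy data wired up so both checks pass.
proof_l1, proof_l2 = b"p1", b"p2"
l2_root = h(b"l2-account:alice", b"balance=5", proof_l2)
l1_root = h(b"rollup-contract", l2_root, proof_l1)
assert verify_l2_object(l1_root, b"rollup-contract", l2_root, proof_l1,
                        b"l2-account:alice", b"balance=5", proof_l2)
```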
Schematic diagram of how keystore wallets work.
What are the connections with existing research?
Chain-specific addresses: ERC-3770:
ERC-7683:
RIP-7755:
Keystore wallet design:
Helios:
ERC-3668 (sometimes referred to as CCIP-read):
Justin Drake's proposal for "based (shared) preconfirmations":
L1SLOAD (RIP-7728):
REMOTESTATICCALL in Optimism:
AggLayer, including the idea of a shared token bridge:
What is left to do, and what are the trade-offs?
Many of the examples above face the dilemma of when to standardize and which layers to standardize. If you standardize too early, you risk entrenching an inferior solution. If you standardize too late, you risk unnecessary fragmentation. In some cases there is a short-term solution that is weaker but easier to implement, and a long-term solution that is "eventually right" but will take a considerable amount of time to get there.
What is unique about this section is that these tasks are not just technical problems: they are also (perhaps primarily!) social problems. They require L2s and wallets, as well as L1, to cooperate. Our ability to handle this successfully is a test of our ability to come together as a community.
How does it interact with other parts of the roadmap?
Most of these proposals are "higher-layer" constructions and so do not have a large effect on L1 considerations. One exception is shared sequencing, which has significant implications for MEV.
Scaling execution on L1
What problem are we trying to solve?
If L2s become highly scalable and successful, but L1 remains capable of processing only a very small volume of transactions, many risks may arise for Ethereum.
For these reasons, it is valuable to continue scaling L1 itself and to ensure that it can continue to accommodate a growing number of use cases.
What is it and how does it work?
The simplest way to scale is to just increase the gas limit. However, that risks centralizing L1 and thereby weakening the other key property that makes Ethereum L1 so powerful: its credibility as a robust base layer. There is an ongoing debate about how far a simple gas limit increase is sustainable, and this will also change depending on which other technologies get implemented to make larger blocks easier to verify (e.g. history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective strategy for increasing the L1 gas limit will involve speeding up these verification technologies.
Another scaling strategy involves identifying specific functionalities and types of computation that can be made cheaper without harming the network's decentralization or its security properties. Examples include:
These improvements will be discussed in more detail in future articles about Splurge.
Finally, the third strategy is native rollups (or "enshrined rollups"): essentially, creating many parallel copies of the EVM, resulting in a model equivalent to what rollups can provide, but much more natively integrated into the protocol.
What are the connections with existing research?
Polynya's ETH L1 Scaling Roadmap:
Multi-dimensional Gas Pricing:
EIP-7706:
EOF:
EVM-MAX:
SIMD:
Native Rollup:
Interview with Max Resnick on the value of scaling L1:
Justin Drake on using SNARKs and native rollups for scaling:
What is left to do, and what are the trade-offs?
There are three strategies for L1 scaling, which can be executed separately or in parallel:
It is worth understanding that these are different technologies with different trade-offs. For example, native rollups have many of the same weaknesses as regular rollups when it comes to composability: you cannot send a single transaction that synchronously performs operations across many of them, the way you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits that come from making L1 easier to verify, such as increasing the share of users who run verifying nodes and increasing the number of solo stakers. Making specific operations in the EVM cheaper can (depending on how it is done) increase total EVM complexity.
A key question that any L1 scaling roadmap needs to answer is: what is the ultimate vision of what belongs on L1 and what belongs on L2? Clearly, it is absurd for everything to happen on L1: the potential use cases run into hundreds of thousands of transactions per second, which would make L1 completely unverifiable (unless we go the native rollup route). But we do need some guiding principle, so that we do not end up in a situation where we raise the gas limit tenfold, severely damage Ethereum L1's decentralization, and then find that rather than 99% of activity being on L2, 90% of activity is on L2, so the outcome looks almost the same, except for an irreversible loss of much of what makes Ethereum L1 special.
One proposed view of the "division of labor" between L1 and L2.
How does it interact with other parts of the roadmap?
Bringing more users onto L1 means improving not just scale but other aspects of L1 as well. It means more MEV will stay on L1 (rather than becoming a problem only for L2s), making the need to handle it explicitly even more urgent. It greatly increases the value of having fast slot times on L1. And it also depends heavily on verification of L1 ("The Verge") going smoothly.
Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc, and Georgios Konstantopoulos