Vitalik: How should the Surge stage of the Ethereum protocol develop
Note: This article is the second installment in the series "Possible futures of the Ethereum protocol" recently published by Ethereum founder Vitalik Buterin: "Part 2: The Surge". Compiled by Deng Tong of Golden Finance; the full text follows:
At the beginning, there were two scaling strategies in the Ethereum roadmap.
One was sharding: each node only needs to verify and store a small portion of transactions, rather than verifying and storing every transaction in the chain. This is also how other peer-to-peer networks (e.g. BitTorrent) work, so of course we could make blockchains work the same way.
The other was layer 2 protocols: networks that sit on top of the Ethereum blockchain, allowing them to benefit fully from its security while keeping most data and computation off the main chain. "Layer 2 protocols" meant state channels in 2015, Plasma in 2017, and rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth.
Fortunately, by 2019, sharding research had solved the problem of verifying "data availability" at scale. As a result, the two paths converged, and we got the rollup-centric roadmap, which remains Ethereum's scaling strategy today.
The Surge, 2023 Roadmap Edition.
The rollup-centric roadmap proposes a simple division of labor: Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This is a pattern that recurs everywhere in society: the court system (L1) is not there to be ultra-fast and efficient, but to protect contracts and property rights, while entrepreneurs (L2) build on top of that sturdy base layer and take humanity to (metaphorical and literal) Mars.
This year, the rollup-centric roadmap has seen significant successes: Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM rollups have reached stage 1. A highly heterogeneous and pluralistic implementation of sharding, in which each L2 acts as a "shard" with its own internal rules and logic, is now a reality. But, as we have seen, this path comes with some unique challenges. So our task now is to complete the rollup-centric roadmap and address these problems, while preserving the robustness and decentralization that make Ethereum L1 unique.
The scalability trilemma
The scalability trilemma is an idea proposed in 2017, which argues that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost of running a node), scalability (more specifically: handling a large number of transactions), and security (more specifically: an attacker needing to compromise a large fraction of the nodes in the network to make even a single transaction fail).
It is worth noting that the trilemma is not a theorem, and the post introducing it does not come with a mathematical proof. It does give a heuristic mathematical argument: if a decentralization-friendly node (e.g. a consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is seen by only 1/k of the nodes, which means an attacker only needs to corrupt a few nodes to push a bad transaction through, or (ii) your nodes will have to become powerful and your chain will not be decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard: it requires somehow thinking outside the framework that the argument implies.
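As a purely illustrative rendering of this argument (the numbers below are assumptions, not measurements), the trade-off can be written out directly:

```python
# Illustrative numbers only: N is what a decentralization-friendly node can verify,
# k is the multiple by which the chain's throughput exceeds that.
N = 1_000   # transactions per second one consumer laptop can verify (assumed)
k = 50      # claimed scaling factor of the chain (assumed)

chain_tps = k * N                 # 50,000 TPS
fraction_per_node = 1 / k         # option (i): each tx is checked by only 2% of nodes
node_speedup_needed = k           # option (ii): every node must become 50x more powerful
print(chain_tps, fraction_per_node, node_speedup_needed)
```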
For many years, some high-performance chains have claimed that they solve the trilemma without doing anything clever at the architecture level, usually by applying software engineering tricks to optimize the node. This is always misleading, and running a node in such chains is always far more difficult than in Ethereum. This post explores many of the subtleties of why this is the case (and why L1 client software engineering alone cannot scale Ethereum itself).
However, the combination of data availability sampling (DAS) and SNARKs does solve the trilemma: it allows a client to verify that some amount of data is available and that some number of computational steps were executed correctly, while downloading only a small fraction of that data and running a much smaller amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property of non-scalable chains: even a 51% attack cannot force the network to accept bad blocks.
Another way to solve the trilemma is the Plasma architecture, which uses clever incentives to push the responsibility for watching data availability onto users. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could do securely, but the mainstreaming of SNARKs has made Plasma architectures far more viable for a wider range of use cases.
Further progress on data availability sampling
What problem are we trying to solve?
Since the Dencun upgrade went live on March 13, 2024, the Ethereum blockchain has had three "blobs" of roughly 125 kB per 12-second slot, or about 375 kB of data availability bandwidth per slot. Assuming transaction data is published directly on-chain, and that a transaction takes about 180 bytes, the maximum TPS of rollups on Ethereum is approximately:
375000 / 12 / 180 = 173.6 TPS
If we add Ethereum's calldata (theoretical maximum: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS from blobs.
This is a major improvement over Ethereum L1 alone, but it is not enough. We want more scalability. Our medium-term target is 16 MB per slot, which, combined with improvements in rollup data compression, would give us roughly 58,000 TPS.
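To make these bandwidth figures easy to reproduce, here is a small calculation sketch. The 180-byte transaction size is the ERC20-transfer figure used later in this article; the 23-byte post-compression size is not from the article, it is simply the per-transaction size implied by the quoted ~58,000 TPS target at 16 MB per slot.

```python
# Reproducing the bandwidth arithmetic used in this section.
SLOT_SECONDS = 12

def max_tps(bytes_per_slot: float, bytes_per_tx: float) -> float:
    return bytes_per_slot / SLOT_SECONDS / bytes_per_tx

print(max_tps(3 * 125_000, 180))    # ~173.6 TPS: 3 blobs of ~125 kB today
print(max_tps(8 * 125_000, 180))    # ~463 TPS:  8-blob PeerDAS target
print(max_tps(16 * 125_000, 180))   # ~926 TPS:  16-blob PeerDAS target
print(max_tps(16_000_000, 180))     # ~7,407 TPS: 16 MB per slot, uncompressed txs
print(max_tps(16_000_000, 23))      # ~58,000 TPS: assumed ~23 bytes per tx after compression
```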
What is PeerDAS and how does it work?
PeerDAS is a relatively simple implementation of "one-dimensional sampling". Every blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast "shares" of the polynomial, where each share consists of 16 evaluations at 16 adjacent coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with the currently proposed parameters: any 64 of the 128 possible samples) can reconstruct the blob.
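To make the reconstruction threshold concrete, here is a toy sketch using a tiny prime field and a degree-3 "blob" in place of Ethereum's degree-4096 polynomial over a 253-bit field: any 4 of the 8 extended evaluations recover the data, mirroring "any 4096 of 8192".

```python
# Toy illustration (not the production code path): a blob is treated as the
# evaluations of a low-degree polynomial, so any `degree + 1` of the extended
# evaluations suffice to reconstruct it.
P = 97  # toy prime modulus (Ethereum uses a ~253-bit prime)

def lagrange_interpolate(points, x, p=P):
    """Evaluate, at x, the unique polynomial through `points` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i == j:
                continue
            num = num * (x - xj) % p
            den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

# "Blob": 4 field elements = evaluations of a degree-3 polynomial at x = 0..3.
blob = [13, 42, 7, 88]
points = list(enumerate(blob))

# "Extend" to 8 evaluations (x = 0..7); the extra ones are the erasure-coded half.
extended = [lagrange_interpolate(points, x) for x in range(8)]

# Any 4 of the 8 evaluations recover the original blob (here we drop the first half).
subset = list(zip(range(4, 8), extended[4:]))
recovered = [lagrange_interpolate(subset, x) for x in range(4)]
assert recovered == blob
```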
PeerDAS works by having each client listen on a small number of subnets, where the i-th subnet broadcasts the i-th sample of every blob, and additionally request blobs it needs on other subnets by asking peers in the global p2p network (who listen on different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without additionally asking peers. The current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (i.e. "clients") to use PeerDAS.
In theory, we can scale 1D sampling quite far: if we increase the blob count maximum to 256 (and so the target to 128), we reach the 16 MB target, and data availability sampling would cost each node only 16 samples * 128 blobs * 512 bytes per sample = 1 MB of data bandwidth per slot. This is just barely within our tolerance: it is feasible, but it means bandwidth-constrained clients cannot sample. We could optimize this somewhat by decreasing the blob count and increasing the blob size, but that makes reconstruction more expensive.
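The per-slot sampling cost quoted above, plus the standard DAS soundness intuition, can be written out as follows (a sketch; the parameter values follow the text, the probability bound is the usual argument that with a 2x erasure code an unreconstructable blob can answer fewer than half of random queries):

```python
SAMPLES_PER_NODE = 16
BLOBS_PER_SLOT = 128     # target blob count at the hypothetical 16 MB scale
SAMPLE_SIZE = 512        # bytes per sample

print(SAMPLES_PER_NODE * BLOBS_PER_SLOT * SAMPLE_SIZE)  # 1,048,576 bytes ~ 1 MB per slot

# Probability that an unreconstructable blob still answers all of one node's samples:
print(0.5 ** SAMPLES_PER_NODE)                           # < 1.6e-05
```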
So ultimately we want to go further and do 2D sampling, which samples randomly not only within blobs but also across blobs. The linearity of KZG commitments is used to "extend" the set of blobs in a block with a list of new "virtual blobs" that redundantly encode the same information.
2D sampling. Source: a16z
Crucially, computing the extension of the commitments does not require having the blobs, so this scheme is fundamentally friendly to distributed block construction. The node that actually constructs the block only needs to hold the blob KZG commitments, and can itself rely on DAS to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.
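Here is a toy model of the linearity argument. The "commitment" below is a stand-in that is linear in the blob, structurally like KZG but without the elliptic-curve hiding; the field size, secret value, and blob contents are all made up for illustration.

```python
# commit(blob) = sum_j blob[j] * tau**j  (mod p) -- linear in the blob.
P = 10007          # toy prime field
TAU = 1234         # toy "trusted setup" secret; real KZG hides tau inside a group

def commit(blob):
    return sum(v * pow(TAU, j, P) for j, v in enumerate(blob)) % P

def lagrange_coeffs(xs, x, p=P):
    """Coefficients c_i such that f(x) = sum_i c_i * f(xs[i]) for deg(f) < len(xs)."""
    coeffs = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        coeffs.append(num * pow(den, -1, p) % p)
    return coeffs

# Four real blobs of four field elements each.
blobs = [[3, 1, 4, 1], [5, 9, 2, 6], [8, 7, 3, 2], [0, 5, 8, 9]]
xs = list(range(len(blobs)))

# Extend column-wise to a 5th "virtual blob" (row index 4).
c = lagrange_coeffs(xs, 4)
virtual_blob = [sum(ci * blobs[i][col] for i, ci in enumerate(c)) % P
                for col in range(4)]

# Linearity: the virtual blob's commitment equals the same extrapolation
# applied directly to the commitments -- no blob data needed.
assert commit(virtual_blob) == sum(ci * commit(blobs[i]) for i, ci in enumerate(c)) % P
```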
What are the connections with existing research?
Original post introducing data availability (2018):
Follow-up paper:
Explainer post on DAS (Paradigm):
2D availability with KZG commitments:
PeerDAS on ethresear.ch and paper:
EIP-7594:
SubnetDAS on ethresear.ch:
Nuances of data recoverability in 2D sampling:
What is left to do, and what are the trade-offs?
The next step is to finish the implementation and rollout of PeerDAS. From there, continually increasing the blob count on PeerDAS is an incremental task, while carefully watching the network and improving the software to ensure safety. In the meantime, we want more academic work on formalizing PeerDAS and other versions of DAS, and on their interaction with issues such as the safety of the fork choice rule.
Further into the future, we need more work to figure out the ideal version of 2D DAS and prove its security properties. We also want to eventually migrate away from KZG to a quantum-resistant alternative that does not require a trusted setup. Currently, we do not know of any candidates that are friendly to distributed block building. Even the expensive "brute force" technique of using recursive STARKs to generate validity proofs for reconstructing rows and columns is not enough, because while a STARK is technically O(log(n) * log(log(n))) hashes in size (with STIR), in practice a STARK is almost as large as an entire blob.
In the long run, the realistic paths as I see them are:
We can view these options along a spectrum of trade-offs.
Note that this choice exists even if we decide to scale execution on L1 directly. This is because if L1 is to handle a large volume of TPS, L1 blocks will become very large, and clients will want an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) on L1 as well.
How does it interact with other parts of the roadmap?
If data compression (see below) is implemented, the need for 2D DAS is reduced, or at least delayed; if Plasma is widely used, it is reduced further. DAS also poses a challenge to distributed block construction protocols and mechanisms: although DAS is in theory friendly to distributed reconstruction, in practice it needs to be combined with inclusion list proposals and the fork choice mechanics around them.
Data Compression
What problem are we trying to solve?
Each transaction in Rollup will occupy a large amount of on-chain data space: transferring ERC20 tokens requires approximately 180 bytes. Even with ideal data availability sampling, this will still limit the scalability of the second layer protocol. With each slot being 16 MB, we get:
16000000 / 12 / 180 = 7407 TPS
What if we could tackle the denominator as well as the numerator, and make each transaction in a rollup take up fewer bytes on-chain?
What is it and how does it work?
I think the best explanation is this picture from two years ago:
The simplest gain is zero-byte compression: replace every long sequence of zero bytes with two bytes saying how many zero bytes there are (a minimal sketch follows below). Going further, we can exploit specific properties of transactions:
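Here is a minimal sketch of the zero-byte idea just mentioned (one possible encoding, not the exact scheme any particular rollup uses): every run of zero bytes is replaced by the pair (0x00, run_length), and other bytes pass through unchanged.

```python
def compress_zero_bytes(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            while i < len(data) and data[i] == 0 and run < 255:
                run += 1
                i += 1
            out += bytes([0, run])      # (0x00, run_length)
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def decompress_zero_bytes(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])   # expand the run of zeros
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

# An address padded to 32 bytes is mostly zeros, so it compresses well.
tx_field = bytes(12) + bytes.fromhex("ab" * 20)
compressed = compress_zero_bytes(tx_field)
assert decompress_zero_bytes(compressed) == tx_field
assert len(compressed) < len(tx_field)   # 22 bytes instead of 32
```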
What are the connections with existing research?
Exploration from sequence.xyz:
For L2 optimized contract calldata, from ScopeLift:
Another strategy: validity-proof-based rollups (a.k.a. ZK rollups) publish state diffs instead of transactions:
BLS Wallet - Achieving BLS aggregation through ERC-4337:
What is left to do, and what are the trade-offs?
The main remaining task is to actually implement the schemes above. The main trade-offs are:
How does it interact with other parts of the roadmap?
Adoption of ERC-4337, and eventually enshrining parts of it in the L2 EVM, can greatly accelerate the deployment of aggregation techniques. Enshrining parts of ERC-4337 on L1 can speed up its deployment on L2s.
Generalized Plasma
What problem are we trying to solve?
Even with 16 MB blobs and data compression, 58,000 TPS may not be enough to fully take over consumer payments, decentralized social media, or other high-bandwidth domains, and this becomes especially true once we start considering privacy, which could reduce scalability by 3-8x. For high-volume, low-value applications, one option today is the validium, which keeps data off-chain and has an interesting security model where the operator cannot steal users' funds, but can disappear and temporarily or permanently freeze all users' funds. But we can do better.
What is it and how does it work?
Plasma is a scaling solution in which an operator publishes blocks off-chain and puts only the Merkle roots of those blocks on-chain (unlike rollups, which put the full block on-chain). For each block, the operator sends each user a Merkle branch proving what did or did not happen to that user's assets. Users can withdraw their assets by providing a Merkle branch. Importantly, the branch does not have to be rooted in the latest state, so even if data availability fails, users can recover their assets by withdrawing the latest state available to them. If a user submits an invalid branch (e.g. exiting an asset they have already sent to someone else, or the operator creating an asset out of thin air), an on-chain challenge mechanism adjudicates who the asset rightly belongs to.
Plasma Cash chain map. The transaction spending coin i is placed in the i-th position of the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4, and George owns coin 6.
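Below is a minimal sketch of the Merkle-branch check that a Plasma exit relies on, using a coin-indexed tree in the spirit of the figure above. Assumptions: a plain binary sha256 Merkle tree and a toy leaf encoding; real Plasma constructions differ in tree layout and in what each leaf commits to.

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    layer = [h(leaf) for leaf in leaves]
    branch = []
    while len(layer) > 1:
        branch.append(layer[index ^ 1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify_branch(leaf, index, branch, root):
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# The operator posts only the root on-chain and hands each user their branch.
leaves = [b"coin1:eve", b"coin2:alice", b"coin3:bob", b"coin4:david"]
root = merkle_root(leaves)                 # goes on-chain
eve_branch = merkle_branch(leaves, 0)      # kept by Eve off-chain
assert verify_branch(b"coin1:eve", 0, eve_branch, root)   # basis for Eve's exit claim
```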
Early versions of Plasma could only handle the payments use case and could not be effectively generalized further. However, if we require every root to be verified with a SNARK, Plasma becomes much more powerful. Every challenge game can be greatly simplified, because we eliminate most of the possible paths by which the operator could cheat. New paths also open up that allow Plasma techniques to extend to a much wider range of asset classes. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without waiting for a one-week challenge period.
One way to make an EVM Plasma chain (not the only way): use a ZK-SNARK to build a parallel UTXO tree that reflects the balance changes made by the EVM and defines a unique mapping of what counts as "the same coin" at different points in history. A Plasma construction can then be built on top of that.
One important insight is that a Plasma system does not need to be perfect. Even if you can only protect a subset of assets (e.g. just coins that have not moved in the past week), you have already greatly improved on the status quo of the ultra-scalable EVM, which is a validium.
Another type of structure is the hybrid Plasma/rollups structure, such as Intmax. These structures place a very small amount of data for each user on-chain (e.g., 5 bytes). By doing so, it is possible to achieve properties that fall between Plasma and Rollup: in the case of Intmax, you can achieve a very high level of scalability and privacy, even in a 16 MB world. The theoretical capacity limit is approximately 16,000,000 / 12 / 5 = 266,667 TPS.
What are the connections with existing research?
Original Plasma paper:
Plasma Cash:
Plasma Cashflow:
Intmax(2023):
What is left to do, and what are the trade-offs?
The main remaining task is to bring Plasma systems into production. As noted above, "Plasma vs. validium" is not a binary: any validium can improve its security properties at least a little by adding Plasma features to its exit mechanism. The research task is to obtain optimal properties for the EVM (in terms of trust requirements, worst-case L1 gas cost, and DoS vulnerability), as well as alternative application-specific constructions. In addition, Plasma is conceptually more complex than rollups, which needs to be addressed directly through research and by building better general frameworks.
The main drawback of Plasma designs is that they depend more on operators and are harder to make "based", although hybrid Plasma/rollup designs can often avoid this weakness.
How does it interact with other parts of the roadmap?
The more effective Plasma solutions are, the less pressure there is on L1 to provide high-performance data availability. Moving activity to L2 also reduces MEV pressure on L1.
Mature L2 Proof System
What problem are we trying to solve?
Today, most rollups are not actually trustless; a security council has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system does not exist at all, or if it does, it has only an "advisory" function. The furthest along are (i) some application-specific rollups, such as Fuel, which are trustless, and (ii) as of the time of writing, Optimism and Arbitrum, two full-EVM rollups that have reached a partial-trustlessness milestone known as "stage 1". The reason rollups have not gone further is concern about bugs in the code. We need trustless rollups, so we need to tackle this problem head-on.
What is it and how does it work?
First, let's review the "stage" system introduced in the original article. There are more detailed requirements, but the summary is as follows:
Our goal is to reach stage 2. The main challenge in getting to stage 2 is gaining enough confidence that the proof system is actually trustworthy enough. There are two main ways to do this:
Schematic diagram of a multi-prover setup, combining an optimistic proof system, a validity proof system, and a security council.
What are the connections with existing research?
EVM K Semantics (formal verification work started in 2017):
Presentation on the idea of multi-provers (2022):
Taiko's plan to use multi-proofs:
What is left to do, and what are the trade-offs?
On the formal verification side, there is a lot to do. We need to create a formally verified version of an entire SNARK prover for the EVM. This is an extremely complex project, though one we have already started. There is one trick that significantly simplifies the task: we can make a formally verified SNARK prover for a minimal virtual machine, e.g. RISC-V or Cairo, and then implement the EVM inside that minimal VM (and formally prove its equivalence to some other EVM specification).
For multi-provers, there are two main remaining pieces. First, we need enough confidence in at least two different proof systems, both that each is reasonably secure on its own and that, if they break, they break for different and unrelated reasons (so they do not break at the same time). Second, we need a very high level of assurance in the underlying logic that merges the proof systems. This is a much smaller piece of code. There are ways to make it very small - simply store funds in a multisig contract whose signers are contracts representing the individual proof systems - but this comes at the cost of high on-chain gas costs. Some balance between efficiency and safety needs to be found.
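As a toy illustration of how small that merging logic can be, here is a sketch of one possible acceptance rule (the rule itself is hypothetical, chosen only to mirror the optimistic prover + validity prover + security council combination in the diagram above):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    optimistic: bool      # did the fraud-proof system accept the state root?
    validity: bool        # did the ZK (validity) proof verify?
    council_votes: int    # council members approving an override
    council_size: int

def accept_state_root(v: Verdict, council_threshold: float = 0.75) -> bool:
    if v.optimistic and v.validity:
        return True                      # both systems agree: trustless path
    if not v.optimistic and not v.validity:
        return False                     # both reject
    # The two systems disagree -- only a supermajority of the council can decide.
    return v.council_votes >= council_threshold * v.council_size

assert accept_state_root(Verdict(True, True, 0, 8))
assert not accept_state_root(Verdict(True, False, 5, 8))   # 5/8 < 75%
assert accept_state_root(Verdict(False, True, 6, 8))       # 6/8 >= 75%
```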
How does it interact with other parts of the roadmap?
Moving the activity to L2 can reduce the MEV pressure on L1.
Cross-L2 Interoperability Improvement
What problem are we trying to solve?
One of the major challenges of the L2 ecosystem today is that it is hard for users to navigate. Furthermore, the easiest ways of doing so often reintroduce trust assumptions: centralized bridges, RPC clients, and so on. If we are serious about the idea that L2s are part of Ethereum, we need to make using the L2 ecosystem feel like using a single unified Ethereum ecosystem.
An example of pathologically bad (and even dangerous: I personally lost $100 to a chain-selection mistake here) cross-L2 UX. While this is not Polymarket's fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending tokens from L1 to L2, or from one L2 to another, should feel just like sending tokens within the same L1.
What is it and how does it work?
There are many categories of cross-L2 interoperability improvements. In general, the way to come up with them is to note that, in theory, a rollup-centric Ethereum is the same thing as an L1 with execution sharding, and then ask where the current Ethereum L2 landscape falls short of that ideal in practice. Here are a few:
How a light client updates its view of the Ethereum header chain. Once you have the headers, you can use Merkle proofs to verify any state object. And once you have the right L1 state objects, you can use Merkle proofs (and possibly signatures, if you want to check preconfirmations) to verify any state object on L2. Helios already does the former; extending it to the latter is a standardization challenge.
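A minimal sketch of that two-step chain of proofs (the proof format and the function names here are stand-ins, not Helios's or any L2's actual interfaces): the light client trusts only an L1 header/state root, proves the rollup contract's stored L2 state root out of L1 state, then proves the target object out of the L2 state.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def verify(root: bytes, key: bytes, value: bytes, proof: bytes) -> bool:
    # Stand-in for a real Merkle/Verkle state-proof check against `root`.
    return h(key, value, proof) == root

def verify_l2_object(l1_state_root, rollup_contract, l2_state_root, proof_l1,
                     l2_key, l2_value, proof_l2):
    # Step 1: the rollup contract's stored L2 state root is part of L1 state.
    if not verify(l1_state_root, rollup_contract, l2_state_root, proof_l1):
        return False
    # Step 2: the target object is part of that L2 state.
    return verify(l2_state_root, l2_key, l2_value, proof_l2)

# Toy data wired up so both checks pass.
proof_l1, proof_l2 = b"p1", b"p2"
l2_root = h(b"l2-account:alice", b"balance=5", proof_l2)
l1_root = h(b"rollup-contract", l2_root, proof_l1)
assert verify_l2_object(l1_root, b"rollup-contract", l2_root, proof_l1,
                        b"l2-account:alice", b"balance=5", proof_l2)
```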
Schematic diagram of how keystore wallets work.
What are the connections with existing research?
Chain-specific addresses: ERC-3770:
ERC-7683:
RIP-7755:
Keystore wallet design:
Helios:
ERC-3668 (sometimes referred to as CCIP-read):
Justin Drake's proposal for "based (shared) preconfirmations":
L1SLOAD (RIP-7728):
REMOTESTATICCALL in Optimism:
AggLayer, including the idea of a shared token bridge:
What is left to do, and what are the trade-offs?
Many of the examples above face the dilemma of when to standardize and which layers to standardize. If you standardize too early, you risk entrenching an inferior solution. If you standardize too late, you risk unnecessary fragmentation. In some cases there is a short-term solution that is weaker but easier to implement, and a long-term solution that is "eventually right" but will take a considerable amount of time to get there.
What is unique about this section is that these tasks are not just technical problems: they are also (perhaps primarily!) social problems. They require L2s and wallets, as well as L1, to cooperate. Our ability to handle this successfully is a test of our ability to come together as a community.
How does it interact with other parts of the roadmap?
Most of these proposals are "higher-layer" constructions and so do not have a large effect on L1 considerations. One exception is shared sequencing, which has significant implications for MEV.
Scaling execution on L1
What problem are we trying to solve?
If L2s become highly scalable and successful, but L1 remains capable of processing only a very small volume of transactions, many risks may arise for Ethereum.
For these reasons, it is valuable to continue scaling L1 itself and to ensure that it can continue to accommodate a growing number of use cases.
What is it and how does it work?
The simplest way to scale is to just increase the gas limit. However, that risks centralizing L1 and thereby weakening the other key property that makes Ethereum L1 so powerful: its credibility as a robust base layer. There is an ongoing debate about how far a simple gas limit increase is sustainable, and this will also change depending on which other technologies get implemented to make larger blocks easier to verify (e.g. history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective strategy for increasing the L1 gas limit will involve speeding up these verification technologies.
Another scaling strategy involves identifying specific functionalities and types of computation that can be made cheaper without harming the network's decentralization or its security properties. Examples include:
These improvements will be discussed in more detail in future articles about Splurge.
Finally, the third strategy is native rollups (or "enshrined rollups"): essentially, creating many parallel copies of the EVM, resulting in a model equivalent to what rollups can provide, but much more natively integrated into the protocol.
What are the connections with existing research?
Polynya's ETH L1 Scaling Roadmap:
Multi-dimensional Gas Pricing:
EIP-7706:
EOF:
EVM-MAX:
SIMD:
Native Rollup:
Interview with Max Resnick on the value of scaling L1:
Justin Drake on using SNARKs and native rollups for scaling:
What is left to do, and what are the trade-offs?
There are three strategies for L1 scaling, which can be executed separately or in parallel:
It is worth understanding that these are different technologies with different trade-offs. For example, native rollups have many of the same weaknesses as regular rollups when it comes to composability: you cannot send a single transaction that synchronously performs operations across many of them, the way you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits that come from making L1 easier to verify, such as increasing the share of users who run verifying nodes and increasing the number of solo stakers. Making specific operations in the EVM cheaper can (depending on how it is done) increase total EVM complexity.
A key question that any L1 scaling roadmap needs to answer is: what is the ultimate vision of what belongs on L1 and what belongs on L2? Clearly, it is absurd for everything to happen on L1: the potential use cases run into hundreds of thousands of transactions per second, which would make L1 completely unverifiable (unless we go the native rollup route). But we do need some guiding principle, so that we do not end up in a situation where we raise the gas limit tenfold, severely damage Ethereum L1's decentralization, and then find that rather than 99% of activity being on L2, 90% of activity is on L2, so the outcome looks almost the same, except for an irreversible loss of much of what makes Ethereum L1 special.
One proposed view of the "division of labor" between L1 and L2.
How does it interact with other parts of the roadmap?
Bringing more users onto L1 means improving not just scale but other aspects of L1 as well. It means more MEV will stay on L1 (rather than becoming a problem only for L2s), making the need to handle it explicitly even more urgent. It greatly increases the value of having fast slot times on L1. And it also depends heavily on verification of L1 ("The Verge") going smoothly.
Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc, and Georgios Konstantopoulos