Possible futures of Ethereum, Part 2: The Surge

Advanced · 10/22/2024, 4:43:33 AM
Ethereum's scaling strategy has evolved from sharding and layer 2 protocols to a rollup-centric approach. The current roadmap proposes a division of labor between L1 and L2: L1 serves as a robust foundation layer, while L2 is responsible for ecosystem expansion. Recent achievements include EIP-4844 blobs increasing L1 data bandwidth, and multiple EVM rollups reaching stage 1. Future goals include achieving 100,000+ TPS, maintaining L1 decentralization, ensuring some L2s inherit Ethereum's core properties, and maximizing interoperability between L2s. Key research areas include data availability sampling, data compression, and cross-L2 interoperability.

At the beginning, Ethereum had two scaling strategies in its roadmap. One (eg. see this early paper from 2015) was “sharding”: instead of verifying and storing all of the transactions in the chain, each node would only need to verify and store a small fraction of the transactions. This is how any other peer-to-peer network (eg. BitTorrent) works too, so surely we could make blockchains work the same way. Another was layer 2 protocols: networks that would sit on top of Ethereum in a way that allows them to fully benefit from its security, while keeping most data and computation off the main chain. “Layer 2 protocols” meant state channels in 2015, Plasma in 2017, and then rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth. Fortunately, by 2019 sharding research had solved the problem of verifying “data availability” at scale. As a result, the two paths converged, and we got the rollup-centric roadmap which continues to be Ethereum’s scaling strategy today.

The Surge, 2023 roadmap edition.

The rollup-centric roadmap proposes a simple division of labor: the Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This is a pattern that recurs everywhere in society: the court system (L1) is not there to be ultra-fast and efficient, it’s there to protect contracts and property rights, and it’s up to entrepreneurs (L2) to build on top of that sturdy base layer and take humanity to (metaphorical and literal) Mars.

This year, the rollup-centric roadmap has seen important successes: Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM rollups are now at stage 1. A very heterogeneous and pluralistic implementation of sharding, where each L2 acts as a “shard” with its own internal rules and logic, is now reality. But as we have seen, taking this path has some unique challenges of its own. And so now our task is to bring the rollup-centric roadmap to completion, and solve these problems, while preserving the robustness and decentralization that makes the Ethereum L1 special.

The Surge: key goals

  • 100,000+ TPS on L1+L2
  • Preserve decentralization and robustness of L1
  • At least some L2s fully inherit Ethereum’s core properties (trustless, open, censorship resistant)
  • Maximum interoperability between L2s. Ethereum should feel like one ecosystem, not 34 different blockchains.

Aside: the scalability trilemma

The scalability trilemma was an idea introduced in 2017, which argued that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost to run a node), scalability (more specifically: high number of transactions processed), and security (more specifically: an attacker needing to corrupt a large portion of the nodes in the whole network to make even a single transaction fail).

Notably, the trilemma is not a theorem, and the post introducing the trilemma did not come with a mathematical proof. It did give a heuristic mathematical argument: if a decentralization-friendly node (eg. consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is only seen by 1/k of nodes, which implies an attacker only needs to corrupt a few nodes to push a bad transaction through, or (ii) your nodes are going to be beefy and your chain not decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard - it requires somehow thinking outside of the box that the argument implies.

For many years, it has been common for some high-performance chains to claim that they solve the trilemma without doing anything clever at a fundamental architecture level, typically by using software engineering tricks to optimize the node. This is always misleading, and running a node in such chains always ends up being far more difficult than it is in Ethereum. This post gets into some of the many subtleties of why this is the case (and hence, why L1 client software engineering alone cannot scale Ethereum itself).

However, the combination of data availability sampling and SNARKs does solve the trilemma: it allows a client to verify that some quantity of data is available, and some number of steps of computation were carried out correctly, while downloading only a small portion of that data and running a much smaller amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property that non-scalable chains have, which is that even a 51% attack cannot force bad blocks to get accepted by the network.

Another way to solve the trilemma is Plasma architectures, which use clever techniques to push the responsibility to watch for data availability to the user in an incentive-compatible way. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could safely do, but the mainstreaming of SNARKs makes Plasma architectures far more viable for a wider array of use cases than before.

Further progress in data availability sampling

What problem are we solving?

As of March 13, 2024, when the Dencun upgrade went live, the Ethereum blockchain has three ~125 kB “blobs” per 12-second slot, or ~375 kB per slot of data availability bandwidth. Assuming transaction data is published onchain directly, an ERC20 transfer is ~180 bytes, and so the maximum TPS of rollups on Ethereum is:

375000 / 12 / 180 = 173.6 TPS

If we add Ethereum’s calldata (theoretical max: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS from blobs alone.

This is a major increase over the Ethereum L1, but it is not enough. We want much more scalability. Our medium-term target is 16 MB per slot, which if combined with improvements in rollup data compression would give us ~58,000 TPS.
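To make the arithmetic above easy to check, here is a minimal sketch that reproduces the TPS figures in this section. The slot time, blob size and ~180-byte ERC20 transfer are figures from the text; the ~23-byte compressed transfer size in the last line is an assumption reverse-engineered from the ~58,000 TPS figure, not stated anywhere explicitly.

```python
# Back-of-envelope TPS arithmetic from this section.

SLOT_SECONDS = 12
TRANSFER_BYTES = 180          # approximate size of an ERC20 transfer onchain
BLOB_BYTES = 125_000          # one EIP-4844 blob is ~125 kB

def tps(bytes_per_slot: float, bytes_per_tx: float = TRANSFER_BYTES) -> float:
    """Transactions per second given a per-slot data budget."""
    return bytes_per_slot / SLOT_SECONDS / bytes_per_tx

print(tps(3 * BLOB_BYTES))    # today, 3 blobs/slot   -> ~173.6 TPS
print(tps(8 * BLOB_BYTES))    # PeerDAS target of 8   -> ~463 TPS
print(tps(16 * BLOB_BYTES))   # PeerDAS target of 16  -> ~926 TPS
print(tps(16_000_000))        # 16 MB/slot target     -> ~7,407 TPS
print(tps(16_000_000, 23))    # + ~23-byte transfers  -> ~58,000 TPS
```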

What is it and how does it work?

PeerDAS is a relatively simple implementation of “1D sampling”. Each blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast “shares” of the polynomial, where each share consists of the evaluations at 16 adjacent coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with current proposed parameters: any 64 of the 128 possible samples) can recover the blob.

PeerDAS works by having each client listen on a small number of subnets, where the i’th subnet broadcasts the i’th sample of any blob, and additionally fetches the blobs it needs on other subnets by asking its peers in the global p2p network (who would be listening to different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without the additional layer of asking peers. A current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (ie. “clients”) to use PeerDAS.

Theoretically, we can scale 1D sampling pretty far: if we increase the blob count maximum to 256 (so, the target to 128), then we would get to our 16 MB target while data availability sampling would only cost each node 16 samples * 128 blobs * 512 bytes per sample per blob = 1 MB of data bandwidth per slot. This is just barely within our tolerance: it’s doable, but it would mean bandwidth-constrained clients cannot sample. We could optimize this somewhat by decreasing blob count and increasing blob size, but this would make reconstruction more expensive.
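To make the sample-and-recover idea concrete, here is a toy sketch of 1D erasure coding and reconstruction. It uses a tiny prime field and naive Lagrange interpolation, and omits KZG commitments entirely; real blobs are degree-4096 polynomials over a ~253-bit field, as described above.

```python
# Toy 1D erasure coding: k data values define a degree-(k-1) polynomial,
# which we evaluate at 2k points; any k of those evaluations recover the data.

import random

P = 65537  # tiny prime field, for illustration only

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial through (xs[i], ys[i]), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend(blob):
    """Erasure-code k values into 2k evaluations of the same polynomial."""
    xs = list(range(len(blob)))
    return [lagrange_eval(xs, blob, x) for x in range(2 * len(blob))]

def reconstruct(samples, k):
    """Recover the original blob from any k of the 2k (index, value) samples."""
    xs = [i for i, _ in samples[:k]]
    ys = [v for _, v in samples[:k]]
    return [lagrange_eval(xs, ys, x) for x in range(k)]

blob = [random.randrange(P) for _ in range(8)]   # k = 8 in the toy version
shares = list(enumerate(extend(blob)))           # 2k = 16 evaluations
random.shuffle(shares)                           # receive them in any order
assert reconstruct(shares, len(blob)) == blob    # any 8 of the 16 suffice
```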

And so ultimately we want to go further, and do 2D sampling, which works by random sampling not just within blobs, but also between blobs. The linear properties of KZG commitments are used to “extend” the set of blobs in a block with a list of new “virtual blobs” that redundantly encode the same information.

2D sampling. Source: a16z crypto

Crucially, computing the extension of the commitments does not require having the blobs, so the scheme is fundamentally friendly to distributed block construction. The node actually constructing the block would only need to have the blob KZG commitments, and can itself rely on DAS to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.

What is left to do, and what are the tradeoffs?

The immediate next step is to finish the implementation and rollout of PeerDAS. From there, it’s a progressive grind to keep increasing the blob count on PeerDAS while carefully watching the network and improving the software to ensure safety. At the same time, we want more academic work on formalizing PeerDAS and other versions of DAS and its interactions with issues such as fork choice rule safety.

Further into the future, we need much more work figuring out the ideal version of 2D DAS and proving its safety properties. We also want to eventually migrate away from KZG to a quantum-resistant, trusted-setup-free alternative. Currently, we do not know of candidates that are friendly to distributed block building. Even the expensive “brute force” technique of using recursive STARKs to generate proofs of validity for reconstructing rows and columns does not suffice, because while technically a STARK is O(log(n) * log(log(n))) hashes in size (with STIR), in practice a STARK is almost as big as a whole blob.

The realistic paths I see for the long term are:

  • Implement ideal 2D DAS
  • Stick with 1D DAS, sacrificing sampling bandwidth efficiency and accepting a lower data cap for the sake of simplicity and robustness
  • (Hard pivot) abandon DA, and fully embrace Plasma as the primary layer 2 architecture we focus on

We can view these along a tradeoff spectrum:

Note that this choice exists even if we decide to scale execution on L1 directly. This is because if L1 is to process lots of TPS, L1 blocks will become very big, and clients will want an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) at L1.

How does it interact with other parts of the roadmap?

The need for 2D DAS is somewhat lessened, or at least delayed, if data compression (see below) is implemented, and it’s lessened even further if Plasma is widely used. DAS also poses a challenge to distributed block building protocols and mechanisms: while DAS is theoretically friendly to distributed reconstruction, this needs to be combined in practice with inclusion list proposals and their surrounding fork choice mechanics.

Data compression

What problem are we solving?

Each transaction in a rollup takes a significant amount of data space onchain: an ERC20 transfer takes about 180 bytes. Even with ideal data availability sampling, this puts a cap on scalability of layer 2 protocols. With 16 MB per slot, we get:

16000000 / 12 / 180 = 7407 TPS

What if in addition to tackling the numerator, we can also tackle the denominator, and make each transaction in a rollup take fewer bytes onchain?

What is it and how does it work?

The best explanation in my opinion is this diagram from two years ago:

The simplest gains are just zero-byte compression: replacing each long sequence of zero bytes with two bytes representing how many zero bytes there are. To go further, we take advantage of the specific properties of transactions (a toy sketch of these techniques follows the list below):

  • Signature aggregation - we switch from ECDSA signatures to BLS signatures, which have the property that many signatures can be combined together into a single signature that attests for the validity of all of the original signatures. This is not considered for L1 because the computational costs of verification, even with aggregation, are higher, but in a data-scarce environment like L2s, they arguably make sense. The aggregation feature of ERC-4337 presents one path for implementing this.
  • Replacing addresses with pointers - if an address was used before, we can replace the 20-byte address with a 4-byte pointer to a location in history. This is needed to achieve the biggest gains, though it takes effort to implement, because it requires (at least a portion of) the blockchain’s history to effectively become part of the state.
  • Custom serialization for transaction values - most transaction values have very few digits, eg. 0.25 ETH is represented as 250,000,000,000,000,000 wei. Gas max-basefees and priority fees work similarly. We can thus represent most currency values very compactly with a custom decimal floating point format, or even a dictionary of especially common values.
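Here is a toy sketch of three of these ideas, under illustrative assumptions: the 4-byte pointer scheme and the decimal floating point encoding below are made up for clarity, not any rollup’s actual wire format, and BLS aggregation is omitted since it needs a pairing library.

```python
# Toy transaction-data compression: zero-byte RLE, address pointers,
# and a compact decimal floating point encoding for wei values.

def compress_zero_bytes(data: bytes) -> bytes:
    """Replace each run of zero bytes with 0x00 + run length (1-255)."""
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == 0:
            j = i
            while j < len(data) and data[j] == 0 and j - i < 255:
                j += 1
            out += bytes([0, j - i])
            i = j
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

seen_addresses: list[bytes] = []   # toy "history": addresses already onchain

def compress_address(addr: bytes) -> bytes:
    """20-byte address -> 4-byte pointer into history, if seen before."""
    if addr in seen_addresses:
        return seen_addresses.index(addr).to_bytes(4, "big")
    seen_addresses.append(addr)
    return addr                    # first use: pay the full 20 bytes

def compress_value(wei: int) -> bytes:
    """Decimal float: mantissa bytes + one exponent byte, eg. 0.25 ETH -> 2 bytes."""
    exponent = 0
    while wei and wei % 10 == 0:
        wei //= 10
        exponent += 1
    mantissa = wei.to_bytes((wei.bit_length() + 7) // 8 or 1, "big")
    return mantissa + bytes([exponent])

assert compress_zero_bytes(b"\x01" + b"\x00" * 12 + b"\x02") == b"\x01\x00\x0c\x02"
assert compress_value(250_000_000_000_000_000) == bytes([25, 16])  # 0.25 ETH
```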

What is left to do, and what are the tradeoffs?

The main thing left to do is to actually implement the above schemes. The main tradeoffs are:

  • Switching to BLS signatures takes significant effort, and reduces compatibility with trusted hardware chips that can increase security. A ZK-SNARK wrapper around other signature schemes could be used to replace this.
  • Dynamic compression (eg. replacing addresses with pointers) complicates client code.
  • Posting state diffs to chain instead of transactions reduces auditability, and makes a lot of software (eg. block explorers) not work.

How does it interact with other parts of the roadmap?

Adoption of ERC-4337, and eventually the enshrinement of parts of it in L2 EVMs, can greatly hasten the deployment of aggregation techniques. Enshrinement of parts of ERC-4337 on L1 can hasten its deployment on L2s.

Generalized Plasma

What problem are we solving?

Even with 16 MB blobs and data compression, 58,000 TPS is not necessarily enough to fully take over consumer payments, decentralized social or other high-bandwidth sectors, and this becomes especially true if we start taking privacy into account, which could drop scalability by 3-8x. For high-volume, low-value applications, one option today is a validium, which keeps data off-chain and has an interesting security model where the operator cannot steal users’ funds, but they can disappear and temporarily or permanently freeze all users’ funds. But we can do better.

What is it and how does it work?

Plasma is a scaling solution that involves an operator publishing blocks offchain, and putting the Merkle roots of those blocks onchain (as opposed to rollups, where the full block is put onchain). For each block, the operator sends to each user a Merkle branch proving what happened, or did not happen, to that user’s assets. Users can withdraw their assets by providing a Merkle branch. Importantly, this branch does not have to be rooted in the latest state - for this reason, even if data availability fails, the user can still recover their assets by withdrawing the latest state they have that is available. If a user submits an invalid branch (eg. exiting an asset that they already sent to someone else, or the operator themselves creating an asset out of thin air), an onchain challenge mechanism can adjudicate who the asset rightfully belongs to.

A diagram of a Plasma Cash chain. Transactions spending coin i are put into the i’th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4 and George owns coin 6.
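A toy sketch of the Merkle mechanics that exits rely on: only the root goes onchain, and any user holding a branch can later prove what the tree said about their coin. The tree shape and leaf encoding here are illustrative assumptions (and real constructions would also carry ownership signatures and a challenge game).

```python
# Toy Merkle tree: operator posts only the root; a user exits with a branch.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    layer = [h(leaf) for leaf in leaves]       # assumes power-of-two leaf count
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves: list[bytes], index: int) -> list[bytes]:
    """The sibling hashes on the path from leaf `index` up to the root."""
    layer, branch = [h(leaf) for leaf in leaves], []
    while len(layer) > 1:
        branch.append(layer[index ^ 1])        # sibling at this level
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify_branch(leaf: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

coins = [b"coin0:alice", b"coin1:eve", b"coin2:bob", b"coin3:carol"]
root = merkle_root(coins)                       # only this goes onchain
proof = merkle_branch(coins, 1)                 # operator sends this to Eve
assert verify_branch(b"coin1:eve", 1, proof, root)  # Eve can prove coin 1 is hers
```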

Early versions of Plasma were only able to handle the payments use case, and were not able to effectively generalize further. If we require each root to be verified with a SNARK, however, Plasma becomes much more powerful. Each challenge game can be simplified significantly, because we take away most possible paths for the operator to cheat. New paths also open up to allow Plasma techniques to be extended to a much more general class of assets. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without needing to wait for a one-week challenge period.

One way (not the only way) to make an EVM plasma chain: use a ZK-SNARK to construct a parallel UTXO tree that reflects the balance changes made by the EVM, and defines a unique mapping of what is “the same coin” at different points in history. A Plasma construction can then be built on top of that.

One key insight is that the Plasma system does not need to be perfect. Even if you can only protect a subset of assets (eg. even just coins that have not moved in the past week), you’ve already greatly improved on the status quo of ultra-scalable EVM, which is a validium.

Another class of constructions is hybrid plasma/rollups, such as Intmax. These constructions put a very small amount of data per user onchain (eg. 5 bytes), and by doing so, get properties that are somewhere between plasma and rollups: in the Intmax case, you get a very high level of scalability and privacy, though even in the 16 MB world capacity is theoretically capped to roughly 16,000,000 / 12 / 5 = 266,667 TPS.

What is left to do, and what are the tradeoffs?

The main remaining task is to bring Plasma systems to production. As mentioned above, “plasma vs validium” is not a binary: any validium can have its safety properties improved at least a little bit by adding Plasma features into the exit mechanism. The research part is in getting optimal properties (in terms of trust requirements, worst-case L1 gas cost, and vulnerability to DoS) for an EVM, as well as alternative application-specific constructions. Additionally, the greater conceptual complexity of Plasma relative to rollups needs to be addressed directly, both through research and through construction of better generalized frameworks.

The main tradeoff in using Plasma designs is that they depend more on operators and are harder to make “based”, though hybrid plasma/rollup designs can often avoid this weakness.

How does it interact with other parts of the roadmap?

The more effective Plasma solutions can be, the less pressure there is for the L1 to have a high-performance data availability functionality. Moving activity to L2 also reduces MEV pressure on L1.

Maturing L2 proof systems

What problem are we solving?

Today, most rollups are not yet actually trustless; there is a security council that has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system is not even live at all, or, if it is, it has only an “advisory” functionality. The furthest ahead are (i) a few application-specific rollups, such as Fuel, which are trustless, and (ii) as of the time of this writing, Optimism and Arbitrum, two full-EVM rollups that have achieved a partial-trustlessness milestone known as “stage 1”. The reason why rollups have not gone further is concern about bugs in the code. We need trustless rollups, and so we need to tackle this problem head on.

What is it and how does it work?

First, let us recap the “stage” system, originally introduced in this post. There are more detailed requirements, but the summary is:

  • Stage 0: it must be possible for a user to run a node and sync the chain. It’s ok if validation is fully trusted/centralized.
  • Stage 1: there must be a (trustless) proof system that ensures that only valid transactions get accepted. It’s allowed for there to be a security council that can override the proof system, but only with a 75% threshold vote. Additionally, a quorum-blocking portion of the council (so, 26%+) must be outside the main company building the rollup. An upgrade mechanism with weaker features (eg. a DAO) is allowed, but it must have a delay long enough that if it approves a malicious upgrade, users can exit their funds before it comes online.
  • Stage 2: there must be a (trustless) proof system that ensures that only valid transactions get accepted. Security councils are only allowed to intervene in the event of provable bugs in the code, eg. if two redundant proof systems disagree with each other or if one proof system accepts two different post-state roots for the same block (or accepts nothing for a sufficiently long period of time eg. a week). An upgrade mechanism is allowed, but it must have a very long delay.

The goal is to reach Stage 2. The main challenge in reaching stage 2 is getting enough confidence that the proof system actually is trustworthy enough. There are two major ways to do this:

  • Formal verification: we can use modern mathematical and computational techniques to prove that an (optimistic or validity) proof system only accepts blocks that pass the EVM specification. These techniques have existed for decades, but recent advancements such as Lean 4 have made them much more practical, and advancements in AI-assisted proving could potentially accelerate this trend further.
  • Multi-provers: make multiple proof systems, and put funds into a 2-of-3 (or larger) multisig between those proof systems and a security council (and/or other gadget with trust assumptions, eg. TEEs). If the proof systems agree, the security council has no power; if they disagree, the security council can only choose between one of them, it can’t unilaterally impose its own answer. A minimal sketch of this adjudication logic follows the diagram below.

Stylized diagram of a multi-prover, combining one optimistic proof system, one validity proof system and a security council.
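A minimal sketch of the merging logic (in Python rather than a contract language), assuming each proof system outputs a claimed post-state root for the same block:

```python
# Multi-prover adjudication: agreement binds everyone; on disagreement the
# council may only pick one of the two proven roots, never impose a third.

from typing import Optional

def settle(optimistic_root: bytes,
           validity_root: bytes,
           council_choice: Optional[bytes]) -> bytes:
    if optimistic_root == validity_root:
        return optimistic_root                 # agreement: council has no say
    if council_choice in (optimistic_root, validity_root):
        return council_choice                  # disagreement: council picks one
    raise ValueError("council may only choose between the two proven roots")

assert settle(b"root_a", b"root_a", None) == b"root_a"
assert settle(b"root_a", b"root_b", b"root_b") == b"root_b"
```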

What is left to do, and what are the tradeoffs?

For formal verification, a lot. We need to create a formally verified version of an entire SNARK prover of an EVM. This is an incredibly complex project, though it is one that we have already started. There is one trick that significantly simplifies the task: we can make a formally verified SNARK prover of a minimal VM, eg. RISC-V or Cairo, and then write an implementation of the EVM in that minimal VM (and formally prove its equivalence to some other EVM specification).

For multi-provers, there are two main remaining pieces. First, we need to get enough confidence in at least two different proof systems, both that they are reasonably safe individually and that if they break, they would break for different and unrelated reasons (and so they would not break at the same time). Second, we need to get a very high level of assurance in the underlying logic that merges the proof systems. This is a much smaller piece of code. There are ways to make it extremely small - just store funds in a Safe multisig contract whose signers are contracts representing individual proof systems - but this has the tradeoff of high onchain gas costs. Some balance between efficiency and safety will need to be found.

How does it interact with other parts of the roadmap?

Moving activity to L2 reduces MEV pressure on L1.

Cross-L2 interoperability improvements

What problem are we solving?

One major challenge with the L2 ecosystem today is that it is difficult for users to navigate. Furthermore, the easiest ways of doing so often re-introduce trust assumptions: centralized bridges, RPC clients, and so forth. If we are serious about the idea that L2s are part of Ethereum, we need to make using the L2 ecosystem feel like using a unified Ethereum ecosystem.

An example of pathologically bad (and even dangerous: I personally lost $100 to a chain-selection mistake here) cross-L2 UX - though this is not Polymarket’s fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending coins from L1 to L2, or from one L2 to another, should feel just like sending coins within the same L1.

What is it and how does it work?

There are many categories of cross-L2 interoperability improvements. In general, the way to come up with these is to notice that in theory, a rollup-centric Ethereum is the same thing as L1 execution sharding, and then ask where the current Ethereum L2-verse falls short of that ideal in practice. Here are a few:

  • Chain-specific addresses: the chain (L1, Optimism, Arbitrum…) should be part of the address. Once this is implemented, cross-L2 sending flows can be implemented by just putting the address into the “send” field, at which point the wallet can figure out how to do the send (including using bridging protocols) in the background. A toy sketch of this flow appears below, after the light client diagram.
  • Chain-specific payment requests: it should be easy and standardized to make a message of the form “send me X tokens of type Y on chain Z”. This has two primary use cases: (i) payments, whether person-to-person or person-to-merchant-service, and (ii) dapps requesting funds, eg. the Polymarket example above.
  • Cross-chain swaps and gas payment: there should be a standardized open protocol for expressing cross-chain operations such as “I am sending 1 ETH on Optimism to whoever sends me 0.9999 ETH on Arbitrum”, and “I am sending 0.0001 ETH on Optimism to whoever includes this transaction on Arbitrum”. ERC-7683 is one attempt at the former, and RIP-7755 is one attempt at the latter, though both are also more general than just these specific use cases.
  • Light clients: users should be able to actually verify the chains that they are interacting with, and not just trust RPC providers. A16z crypto’s Helios does this for Ethereum itself, but we need to extend this trustlessness to L2s. ERC-3668 (CCIP-read) is one strategy for doing this.

How a light client can update its view of the Ethereum header chain. Once you have the header chain, you can use Merkle proofs to validate any state object. And once you have the right L1 state objects, you can use Merkle proofs (and possibly signatures, if you want to check preconfirmations) to validate any state object on L2. Helios does the former already. Extending to the latter is a standardization challenge.
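Returning to chain-specific addresses (the first item in the list above): here is a toy sketch of the wallet-side flow. The "chain:address" string format is purely hypothetical, invented for illustration (actual proposals differ in details); the point is that the user pastes a single string and the wallet, not the user, picks the route.

```python
# Toy chain-specific address parsing and routing inside a wallet.

KNOWN_CHAINS = {"ethereum", "optimism", "arbitrum"}

def parse_address(full: str) -> tuple[str, str]:
    """Split a hypothetical 'chain:0x...' string into (chain, address)."""
    chain, _, addr = full.partition(":")
    if chain not in KNOWN_CHAINS or not addr.startswith("0x"):
        raise ValueError(f"unrecognized chain-specific address: {full!r}")
    return chain, addr

def send(from_chain: str, to: str, amount_wei: int) -> str:
    to_chain, addr = parse_address(to)
    if to_chain == from_chain:
        return f"direct transfer of {amount_wei} wei to {addr} on {from_chain}"
    # cross-chain case: the wallet selects a bridging route in the background
    return f"bridge {amount_wei} wei from {from_chain} to {addr} on {to_chain}"

print(send("optimism", "arbitrum:0xAbC0000000000000000000000000000000000001", 10**18))
```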

  • Keystore wallets: today, if you want to update the keys that control your smart contract wallet, you have to do it on all N chains on which that wallet exists. Keystore wallets are a technique that allows the keys to exist in one place (either on L1, or later potentially on an L2), and then be read from any L2 that has a copy of the wallet. This means that updates only need to happen once. To be efficient, keystore wallets require L2s to have a standardized way to costlessly read L1; two proposals for this are L1SLOAD and REMOTESTATICCALL.

A stylized diagram of how keystore wallets work.
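A toy model of the keystore pattern: the key lives in one place (a dict standing in for L1 contract storage), and wallets on every L2 consult it at verification time. The l1_read function stands in for a cheap L1-read mechanism like the proposed L1SLOAD; its interface here is invented for illustration.

```python
# Toy keystore wallet: update the key once on L1, every L2 wallet sees it.

L1_KEYSTORE: dict[str, str] = {"alice-keystore": "old-key"}

def l1_read(slot: str) -> str:
    """Stand-in for an L2 opcode/precompile that reads L1 state."""
    return L1_KEYSTORE[slot]

class L2Wallet:
    def __init__(self, keystore_slot: str):
        self.keystore_slot = keystore_slot     # no local copy of the key

    def is_valid_signer(self, key: str) -> bool:
        return key == l1_read(self.keystore_slot)

wallet_on_optimism = L2Wallet("alice-keystore")
wallet_on_arbitrum = L2Wallet("alice-keystore")

L1_KEYSTORE["alice-keystore"] = "new-key"      # one update, on L1 only
assert wallet_on_optimism.is_valid_signer("new-key")   # every L2 sees it
assert wallet_on_arbitrum.is_valid_signer("new-key")
```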

  • More radical “shared token bridge” ideas: imagine a world where all L2s are validity proof rollups, that commit to Ethereum every slot. Even in this world, moving assets from one L2 to another L2 “natively” would require withdrawing and depositing, which requires paying a substantial amount of L1 gas. One way to solve this is to create a shared minimal rollup, whose only function would be to maintain the balances of how many tokens of which type are owned by which L2, and allow those balances to be updated en masse by a series of cross-L2 send operations initiated by any of the L2s. This would allow cross-L2 transfers to happen without needing to pay L1 gas per transfer, and without needing liquidity-provider-based techniques like ERC-7683. A toy sketch of this follows the list below.
  • Synchronous composability: allow synchronous calls to happen either between a specific L2 and L1, or between multiple L2s. This could be helpful in improving financial efficiency of defi protocols. The former could be done without any cross-L2 coordination; the latter would require shared sequencing. Based rollups are automatically friendly to all of these techniques.
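A toy sketch of the shared minimal rollup idea above: its entire state is a (L2, token) → balance map, and a whole batch of cross-L2 sends updates it at once, so no per-transfer L1 gas is paid. All names and numbers here are illustrative.

```python
# Toy shared token bridge: state is (L2, token) -> balance, updated in batches.

from collections import defaultdict

balances = defaultdict(int)        # (L2, token) -> balance held by that L2
balances[("optimism", "ETH")] = 1_000
balances[("arbitrum", "ETH")] = 500

def apply_batch(sends: list[tuple[str, str, str, int]]) -> None:
    """Apply a batch of (from_l2, to_l2, token, amount) cross-L2 sends."""
    for src, dst, token, amount in sends:
        if balances[(src, token)] < amount:
            raise ValueError(f"{src} lacks {amount} {token}")
        balances[(src, token)] -= amount
        balances[(dst, token)] += amount
    # in the real design, a single commitment to the updated balances is what
    # gets settled on Ethereum, amortizing L1 costs across the whole batch

apply_batch([("optimism", "arbitrum", "ETH", 200),
             ("arbitrum", "optimism", "ETH", 50)])
assert balances[("optimism", "ETH")] == 850
assert balances[("arbitrum", "ETH")] == 650
```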

What is left to do, and what are the tradeoffs?

Many of the examples above face standard dilemmas of when to standardize and what layers to standardize. If you standardize too early, you risk entrenching an inferior solution. If you standardize too late, you risk creating needless fragmentation. In some cases, there is both a short-term solution that has weaker properties but is easier to implement, and a long-term solution that is “ultimately right” but will take quite a few years to get there.

One way in which this section is unique is that these tasks are not just technical problems: they are also (perhaps even primarily!) social problems. They require L2s and wallets and L1 to cooperate. Our ability to handle this problem successfully is a test of our ability to stick together as a community.

How does it interact with other parts of the roadmap?

Most of these proposals are “higher-layer” constructions, and so do not greatly affect L1 considerations. One exception is shared sequencing, which has heavy impacts on MEV.

Scaling execution on L1

What problem are we solving?

If L2s become very scalable and successful but L1 remains capable of processing only a very low volume of transactions, there are many risks to Ethereum that might arise:

  1. The economic situation of ETH the asset becomes more risky, which in turn affects long-run security of the network.
  2. Many L2s benefit from being closely tied to a highly developed financial ecosystem on L1, and if this ecosystem greatly weakens, the incentive to become an L2 (instead of being an independent L1) weakens as well.
  3. It will take a long time before L2s have exactly the same security assurances as L1.
  4. If an L2 fails (eg. due to a malicious or disappearing operator), users would still need to go through L1 in order to recover their assets. Hence, L1 needs to be powerful enough to be able to at least occasionally actually handle a highly complex and chaotic wind-down of an L2.

For these reasons, it is valuable to continue scaling L1 itself, and making sure that it can continue to accommodate a growing number of uses.

What is it and how does it work?

The easiest way to scale is to simply increase the gas limit. However, this risks centralizing the L1, and thus weakening the other important property that makes the Ethereum L1 so powerful: its credibility as a robust base layer. There is an ongoing debate about what degree of simple gas limit increase is sustainable, and this also changes based on which other technologies get implemented to make larger blocks easier to verify (eg. history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is simply the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective L1 gas limit increase strategy would involve accelerating these verification technologies.

Another scaling strategy involves identifying specific features and types of computation that can be made cheaper without harming the decentralization of the network or its security properties. Examples of this include:

  • EOF - a new EVM bytecode format that is more friendly to static analysis, allowing for faster implementations. EOF bytecode could be given lower gas costs to take these efficiencies into account.
  • Multidimensional gas pricing - establishing separate basefees and limits for computation, data and storage can increase the Ethereum L1’s average capacity without increasing its maximum capacity (and hence without creating new security risks). A sketch of this pricing rule follows the list below.
  • Reduce gas costs of specific opcodes and precompiles - historically, we have had several rounds of increasing gas costs for certain operations that were underpriced in order to avoid denial of service attacks. What we have had less of, and could do much more, is reducing gas costs for operations that are overpriced. For example, addition is much cheaper than multiplication, but the costs of the ADD and MUL opcodes are currently the same. We could make ADD cheaper, and even simpler opcodes such as PUSH even cheaper.
  • EVM-MAX and SIMD: EVM-MAX (“modular arithmetic extensions”) is a proposal to allow more efficient native big-number modular math as a separate module of the EVM. Values computed by EVM-MAX computations would only be accessible by other EVM-MAX opcodes, unless deliberately exported; this allows greater room to store these values in optimized formats. SIMD (“single instruction multiple data”) is a proposal to allow efficiently executing the same instruction on an array of values. The two together can create a powerful coprocessor alongside the EVM that could be used to much more efficiently implement cryptographic operations. This would be especially useful for privacy protocols, and for L2 proof systems, so it would help both L1 and L2 scaling.
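As a sketch of the multidimensional gas pricing idea above: each resource gets its own EIP-1559-style basefee that adjusts toward its own per-resource target. The targets and the 1/8 adjustment rate below are illustrative assumptions, not proposed parameters.

```python
# Per-resource EIP-1559-style basefee update: each dimension adjusts
# independently toward its own target usage.

TARGETS = {"computation": 15_000_000, "data": 1_000_000, "storage": 50_000}
ADJUSTMENT_QUOTIENT = 8  # same shape as EIP-1559's basefee update rule

def update_basefees(basefees: dict[str, int], used: dict[str, int]) -> dict[str, int]:
    """Move each resource's basefee toward equilibrium at its own target."""
    new = {}
    for resource, fee in basefees.items():
        target = TARGETS[resource]
        delta = fee * (used[resource] - target) // target // ADJUSTMENT_QUOTIENT
        new[resource] = max(fee + delta, 1)
    return new

fees = {"computation": 10**10, "data": 10**9, "storage": 5 * 10**9}
# a block heavy on data but light on computation:
fees = update_basefees(fees, {"computation": 5_000_000,
                              "data": 2_000_000,
                              "storage": 50_000})
print(fees)  # computation basefee falls, data basefee rises, storage unchanged
```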

These improvements will be discussed in more detail in a future post on the Splurge.

Finally, a third strategy is native rollups (or “enshrined rollups”): essentially, creating many copies of the EVM that run in parallel, leading to a model that is equivalent to what rollups can provide, but much more natively integrated into the protocol.

What is left to do, and what are the tradeoffs?

There are three strategies for L1 scaling, which can be pursued individually or in parallel:

  • Improve technology (eg. client code, stateless clients, history expiry) to make the L1 easier to verify, and then raise the gas limit
  • Make specific operations cheaper, increasing average capacity without increasing worst-case risks
  • Native rollups (ie. “create N parallel copies of the EVM”, though potentially giving developers a lot of flexibility in the parameters of the copies they deploy)

It’s worth understanding that these are different techniques that have different tradeoffs. For example, native rollups have many of the same weaknesses in composability as regular rollups: you cannot send a single transaction that synchronously performs operations across many of them, like you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits that can be achieved by making the L1 easier to verify, such as increasing the portion of users that run verifying nodes, and increasing the number of solo stakers. Making specific operations in the EVM cheaper, depending on how it’s done, can increase total EVM complexity.

A big question that any L1 scaling roadmap needs to answer is: what is the ultimate vision for what belongs on L1 and what belongs on L2? Clearly, it’s absurd for everything to go on L1: the potential use cases go into the hundreds of thousands of transactions per second, and that would make the L1 completely unviable to verify (unless we go the native rollup route). But we do need some guiding principle, so that we can make sure that we are not creating a situation where we increase the gas limit 10x, heavily damage the Ethereum L1’s decentralization, and find that we’ve only gotten to a world where instead of 99% of activity being on L2, 90% of activity is on L2, and so the result otherwise looks almost the same, except for an irreversible loss of much of what makes Ethereum L1 special.

One proposed view of a “division of labor” between L1 and L2s.

How does it interact with other parts of the roadmap?

Bringing more users onto L1 implies improving not just scale, but also other aspects of L1. It means that more MEV will remain on L1 (as opposed to becoming a problem just for L2s), and so there will be an even more pressing need to handle it explicitly. It greatly increases the value of having fast slot times on L1. And it’s also heavily dependent on verification of L1 (“the Verge”) going well.


Possible futures of Ethereum, Part 2: The Surge

Advanced10/22/2024, 4:43:33 AM
Ethereum's scaling strategy has evolved from sharding and layer 2 protocols to a rollup-centric approach. The current roadmap proposes a division of labor between L1 and L2: L1 serves as a robust foundation layer, while L2 is responsible for ecosystem expansion. Recent achievements include EIP-4844 blobs increasing L1 data bandwidth, and multiple EVM rollups reaching stage 1. Future goals include achieving 100,000+ TPS, maintaining L1 decentralization, ensuring some L2s inherit Ethereum's core properties, and maximizing interoperability between L2s. Key research areas include data availability sampling, data compression, and cross-L2 interoperability.

At the beginning, Ethereum had two scaling strategies in its roadmap. One (eg. see this early paper from 2015) was “sharding”: instead of verifying and storing all of the transactions in the chain, each node would only need to verify and store a small fraction of the transactions. This is how any other peer-to-peer network (eg. BitTorrent) works too, so surely we could make blockchains work the same way. Another was layer 2 protocols: networks that would sit on top of Ethereum in a way that allow them to fully benefit from its security, while keeping most data and computation off the main chain. “Layer 2 protocols” meant state channels in 2015, Plasma in 2017, and then rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth. Fortunately, by 2019 sharding research had solved the problem of verifying “data availability” at scale. As a result, the two paths converged, and we got the rollup-centric roadmap which continues to be Ethereum’s scaling strategy today.

The Surge, 2023 roadmap edition.

The rollup-centric roadmap proposes a simple division of labor: the Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This is a pattern that recurs everywhere in society: the court system (L1) is not there to be ultra-fast and efficient, it’s there to protect contracts and property rights, and it’s up to entrepreneurs (L2) to build on top of that sturdy base layer and take humanity to (metaphorical and literal) Mars.

This year, the rollup-centric roadmap has seen important successes: Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM rollups are now at stage 1. A very heterogeneous and pluralistic implementation of sharding, where each L2 acts as a “shard” with its own internal rules and logic, is now reality. But as we have seen, taking this path has some unique challenges of its own. And so now our task is to bring the rollup-centric roadmap to completion, and solve these problems, while preserving the robustness and decentralization that makes the Ethereum L1 special.

The Surge: key goals

  • 100,000+ TPS on L1+L2
  • Preserve decentralization and robustness of L1
  • At least some L2s fully inherit Ethereum’s core properties (trustless, open, censorship resistant)
  • Maximum interoperability between L2s. Ethereum should feel like one ecosystem, not 34 different blockchains.

In this chapter

Aside: the scalability trilemma

The scalability trilemma was an idea introduced in 2017, which argued that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost to run a node), scalability (more specifically: high number of transactions processed), and security (more specifically: an attacker needing to corrupt a large portion of the nodes in the whole network to make even a single transaction fail).

Notably, the trilemma is not a theorem, and the post introducing the trilemma did not come with a mathematical proof. It did give a heuristic mathematical argument: if a decentralization-friendly node (eg. consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is only seen by 1/k of nodes, which implies an attacker only needs to corrupt a few nodes to push a bad transaction through, or (ii) your nodes are going to be beefy and your chain not decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard - it requires somehow thinking outside of the box that the argument implies.

For many years, it has been common for some high-performance chains to claim that they solve the trilemma without doing anything clever at a fundamental architecture level, typically by using software engineering tricks to optimize the node. This is always misleading, and running a node in such chains always ends up far more difficult than in Ethereum. This post gets into some of the many subtleties why this is the case (and hence, why L1 client software engineering alone cannot scale Ethereum itself).

However, the combination of data availability sampling and SNARKs does solve the trilemma: it allows a client to verify that some quantity of data is available, and some number of steps of computation were carried out correctly, while downloading only a small portion of that data and running a much smaller amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property that non-scalable chains have, which is that even a 51% attack cannot force bad blocks to get accepted by the network.

Another way to solve the trilemma is Plasma architectures, which use clever techniques to push the responsibility to watch for data availability to the user in an incentive-compatible way. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could safely do, but the mainstreaming of SNARKs makes Plasma architectures far more viable for a wider array of use cases than before.

Further progress in data availability sampling

What problem are we solving?

As of 2024 March 13, when the Dencun upgrade went live, the Ethereum blockchain has three ~125 kB “blobs” per 12-second slot, or ~375 kB per slot of data availability bandwidth. Assuming transaction data is published onchain directly, an ERC20 transfer is ~180 bytes, and so the maximum TPS of rollups on Ethereum is:

375000 / 12 / 180 = 173.6 TPS

If we add Ethereum’s calldata (theoretical max: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS in calldata.

This is a major increase over the Ethereum L1, but it is not enough. We want much more scalability. Our medium-term target is 16 MB per slot, which if combined with improvements in rollup data compression would give us ~58,000 TPS.

What is it and how does it work?

PeerDAS is a relatively simple implementation of “1D sampling”. Each blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast “shares” of the polynomial, where each share consists of 16 evaluations at an adjacent 16 coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with current proposed parameters: any 64 of the 128 possible samples) can recover the blob.

PeerDAS works by having each client listen on a small number of subnets, where the i’th subnet broadcasts the i’th sample of any blob, and additionally asks for blobs on other subnets that it needs by asking its peers in the global p2p network (who would be listening to different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without the additional layer of asking peers. A current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (ie. “clients”) to use PeerDAS.

Theoretically, we can scale 1D sampling pretty far: if we increase the blob count maximum to 256 (so, the target to 128), then we would get to our 16 MB target while data availability sampling would only cost each node 16 samples 128 blobs 512 bytes per sample per blob = 1 MB of data bandwidth per slot. This is just barely within our reach of tolerance: it’s doable, but it would mean bandwidth-constrained clients cannot sample. We could optimize this somewhat by decreasing blob count and increasing blob size, but this would make reconstruction more expensive.

And so ultimately we want to go further, and do 2D sampling, which works by random sampling not just within blobs, but also between blobs. The linear properties of KZG commitments are used to “extend” the set of blobs in a block with a list of new “virtual blobs” that redundantly encode the same information.

2D sampling. Source: a16z crypto

Crucially, computing the extension of the commitments does not require having the blobs, so the scheme is fundamentally friendly to distributed block construction. The node actually constructing the block would only need to have the blob KZG commitments, and can themslves rely on DAS to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.

What is left to do, and what are the tradeoffs?

The immediate next step is to finish the implementation and rollout of PeerDAS. From there, it’s a progressive grind to keep increasing the blob count on PeerDAS while carefully watching the network and improving the software to ensure safety. At the same time, we want more academic work on formalizing PeerDAS and other versions of DAS and its interactions with issues such as fork choice rule safety.

Further into the future, we need much more work figuring out the ideal version of 2D DAS and proving its safety properties. We also want to eventually migrate away from KZG to a quantum-resistant, trusted-setup-free alternative. Currently, we do not know of candidates that are friendly to distributed block building. Even the expensive “brute force” technique of using recursive STARKs to generate proofs of validity for reconstructing rows and columns does not suffice, because while technically a STARK is O(log(n) * log(log(n)) hashes in size (with STIR), in practice a STARK is almost as big as a whole blob.

The realistic paths I see for the long term are:

  • Implement ideal 2D DAS
  • Stick with 1D DAS, sacrificing sampling bandwidth efficiency and accepting a lower data cap for the sake of simplicity and robustness
  • (Hard pivot) abandon DA, and fully embrace Plasma as a primary layer 2 architecture we are focusing on

We can view these along a tradeoff spectrum:

Note that this choice exists even if we decide to scale execution on L1 directly. This is because if L1 is to process lots of TPS, L1 blocks will become very big, and clients will want an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) at L1.

How does it interact with other parts of the roadmap?

The need for 2D DAS is somewhat lessened, or at least delayed, if data compression (see below) is implemented, and it’s lessened even further if Plasma is widely used. DAS also poses a challenge to distributed block building protocols and mechanisms: while DAS is theoretically friendly to distributed reconstruction, this needs to be combined in practice with inclusion list proposals and their surrounding fork choice mechanics.

Data compression

What problem are we solving?

Each transaction in a rollup takes a significant amount of data space onchain: an ERC20 transfer takes about 180 bytes. Even with ideal data availability sampling, this puts a cap on scalability of layer 2 protocols. With 16 MB per slot, we get:

16000000 / 12 / 180 = 7407 TPS

What if in addition to tackling the numerator, we can also tackle the denominator, and make each transaction in a rollup take fewer bytes onchain?

What is it and how does it work?

The best explanation in my opinion is this diagram from two years ago:

The simplest gains are just zero-byte compression: replacing each long sequence of zero bytes with two bytes representing how many zero bytes there are. To go further, we take advantage of the specific properties of transactions:

  • Signature aggregation - we switch from ECDSA signatures to BLS signatures, which have the property that many signatures can be combined together into a single signature that attests for the validity of all of the original signatures. This is not considered for L1 because the computational costs of verification, even with aggregation, are higher, but in a data-scarce environment like L2s, they arguably make sense. The aggregation feature of ERC-4337 presents one path for implementing this.
  • Replacing addresses with pointers - if an address was used before, we can replace the 20-byte address with a 4-byte pointer to a location in history. This is needed to achieve the biggest gains, though it takes effort to implement, because it requires (at least a portion of) the blockchain’s history to effectively become part of the state.
  • Custom serialization for transaction values - most transaction values have very few digits, eg. 0.25 ETH is represented as 250,000,000,000,000,000 wei. Gas max-basefees and priority fees work similarly. We can thus represent most currency values very compactly with a custom decimal floating point format, or even a dictionary of especially common values.

What is left to do, and what are the tradeoffs?

The main thing left to do is to actually implement the above schemes. The main tradeoffs are:

  • Switching to BLS signatures takes significant effort, and reduces compatibility with trusted hardware chips that can increase security. A ZK-SNARK wrapper around other signature schemes could be used to replace this.
  • Dynamic compression (eg. replacing addresses with pointers) complicates client code.
  • Posting state diffs to chain instead of transactions reduces auditability, and makes a lot of software (eg. block explorers) not work.

How does it interact with other parts of the roadmap?

Adoption of ERC-4337, and eventually the enshrinement of parts of it in L2 EVMs, can greatly hasten the deployment of aggregation techniques. Enshrinement of parts of ERC-4337 on L1 can hasten its deployment on L2s.

Generalized Plasma

What problem are we solving?

Even with 16 MB blobs and data compression, 58,000 TPS is not necessarily enough to fully take over consumer payments, decentralized social or other high-bandwidth sectors, and this becomes especially true if we start taking privacy into account, which could drop scalability by 3-8x. For high-volume, low-value applications, one option today is a validium, which keeps data off-chain and has an interesting security model where the operator cannot steal users’ funds, but they can disappear and temporarily or permanently freeze all users’ funds. But we can do better.

What is it and how does it work?

Plasma is a scaling solution that involves an operator publishing blocks offchain, and putting the Merkle roots of those blocks onchain (as opposed to rollups, where the full block is put onchain). For each block, the operator sends to each user a Merkle branch proving what happened, or did not happen, to that user’s assets. Users can withdraw their assets by providing a Merkle branch. Importantly, this branch does not have to be rooted in the latest state - for this reason, even if data availability fails, the user can still recover their assets by withdrawing the latest state they have that is available. If a user submits an invalid branch (eg. exiting an asset that they already sent to someone else, or the operator themselves creating an asset out of thin air), an onchain challenge mechanism can adjudicate who the asset rightfully belongs to.

A diagram of a Plasma Cash chain. Transactions spending coin i are put into the i’th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4 and George owns coin 6.

Early versions of Plasma were only able to handle the payments use case, and were not able to effectively generalize further. If we require each root to be verified with a SNARK, however, Plasma becomes much more powerful. Each challenge game can be simplified significantly, because we take away most possible paths for the operator to cheat. New paths also open up to allow Plasma techniques to be extended to a much more general class of assets. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without needing to wait for a one-week challenge period.

One way (not the only way) to make an EVM plasma chain: use a ZK-SNARK to construct a parallel UTXO tree that reflects the balance changes made by the EVM, and defines a unique mapping of what is “the same coin” at different points in history. A Plasma construction can then be built on top of that.

One key insight is that the Plasma system does not need to be perfect. Even if you can only protect a subset of assets (eg. even just coins that have not moved in the past week), you’ve already greatly improved on the status quo of ultra-scalable EVM, which is a validium.

Another class of constructions is hybrid plasma/rollups, such as Intmax. These constructions put a very small amount of data per user onchain (eg. 5 bytes), and by doing so, get properties that are somewhere between plasma and rollups: in the Intmax case, you get a very high level of scalability and privacy, though even in the 16 MB world capacity is theoretically capped to roughly 16,000,000 / 12 / 5 = 266,667 TPS.

What is left to do, and what are the tradeoffs?

The main remaining task is to bring Plasma systems to production. As mentioned above, “plasma vs validium” is not a binary: any validium can have its safety properties improved at least a little bit by adding Plasma features into the exit mechanism. The research part is in getting optimal properties (in terms of trust requirements, and worst-case L1 gas cost, and vulnerability to DoS) for an EVM, as well as alternative application specific constructions. Additionally, the greater conceptual complexity of Plasma relative to rollups needs to be addressed directly, both through research and through construction of better generalized frameworks.

The main tradeoff in using Plasma designs is that they depend more on operators and are harder to make “based“, though hybrid plasma/rollup designs can often avoid this weakness.

How does it interact with other parts of the roadmap?

The more effective Plasma solutions can be, the less pressure there is for the L1 to have a high-performance data availability functionality. Moving activity to L2 also reduces MEV pressure on L1.

Maturing L2 proof systems

What problem are we solving?

Today, most rollups are not yet actually trustless; there is a security council that has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system is not even live at all, or if it is it only has an “advisory” functionality. The furthest ahead are (i) a few application-specific rollups, such as Fuel, which are trustless, and (ii) as of the time of this writing, Optimism and Arbitrum, two full-EVM rollups that have achieved a partial-trustlessness milestone known as “stage 1”. The reason why rollups have not gone further is concern about bugs in the code. We need trustless rollups, and so we need to tackle this problem head on.

What is it and how does it work?

First, let us recap the “stage” system, originally introduced in this post. There are more detailed requirements, but the summary is:

  • Stage 0: it must be possible for a user to run a node and sync the chain. It’s ok if validation is fully trusted/centralized.
  • Stage 1: there must be a (trustless) proof system that ensures that only valid transactions get accepted. It’s allowed for there to be a security council that can override the proof system, but only with a 75% threshold vote. Additionally, a quorum-blocking portion of the council (so, 26%+) must be outside the main company building the rollup. An upgrade mechanism with weaker features (eg. a DAO) is allowed, but it must have a delay long enough that if it approves a malicious upgrade, users can exit their funds before it comes online.
  • Stage 2: there must be a (trustless) proof system that ensures that only valid transactions get accepted. Security councils are only allowed to intervene in the event of provable bugs in the code, eg. if two redundant proof systems disagree with each other or if one proof system accepts two different post-state roots for the same block (or accepts nothing for a sufficiently long period of time eg. a week). An upgrade mechanism is allowed, but it must have a very long delay.

The goal is to reach stage 2. The main challenge is gaining enough confidence that the proof system is actually trustworthy. There are two major ways to do this:

  • Formal verification: we can use modern mathematical and computational techniques to prove that an (optimistic or validity) proof system only accepts blocks that pass the EVM specification. These techniques have existed for decades, but recent advancements such as Lean 4 have made them much more practical, and advancements in AI-assisted proving could potentially accelerate this trend further.
  • Multi-provers: make multiple proof systems, and put funds into a 2-of-3 (or larger) multisig between those proof systems and a security council (and/or other gadget with trust assumptions, eg. TEEs). If the proof systems agree, the security council has no power; if they disagree, the security council can only choose between one of them, it can’t unilaterally impose its own answer.

Stylized diagram of a multi-prover, combining one optimistic proof system, one validity proof system and a security council.
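A minimal sketch of that merge logic, with hypothetical interfaces (resolve_state_root and the Optional-bytes roots are invented for illustration): if the two proof systems agree the council is powerless, and if they disagree the council can only endorse one of the answers the proof systems produced:

```python
from typing import Optional

def resolve_state_root(optimistic_root: Optional[bytes],
                       validity_root: Optional[bytes],
                       council_choice: Optional[bytes]) -> Optional[bytes]:
    if optimistic_root is not None and optimistic_root == validity_root:
        return optimistic_root  # agreement: the council has no power at all
    candidates = {r for r in (optimistic_root, validity_root) if r is not None}
    if council_choice is not None and council_choice in candidates:
        # Disagreement (or one system down): the council breaks the tie, but
        # can only endorse a root that one of the proof systems produced.
        return council_choice
    return None  # unresolved: nothing is finalized until the dispute settles
```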

What is left to do, and what are the tradeoffs?

For formal verification, a lot. We need to create a formally verified version of an entire SNARK prover of an EVM. This is an incredibly complex project, though it is one that we have already started. There is one trick that significantly simplifies the task: we can make a formally verified SNARK prover of a minimal VM, eg. RISC-V or Cairo, and then write an implementation of the EVM in that minimal VM (and formally prove its equivalence to some other EVM specification).
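As a toy Lean 4 illustration of that trick (everything here, specAdd and implAdd included, is invented for illustration; the real project would state this kind of equivalence for a full EVM interpreter running inside a minimal VM):

```lean
-- Specification: ordinary addition.
def specAdd (a b : Nat) : Nat := a + b

-- "Implementation": the same operation written in a lower-level style
-- (repeated increment), standing in for an EVM written for a minimal VM.
def implAdd : Nat → Nat → Nat
  | 0, b => b
  | n + 1, b => implAdd n b + 1

-- The equivalence theorem one would prove, here by induction.
theorem impl_eq_spec (a b : Nat) : implAdd a b = specAdd a b := by
  induction a with
  | zero => simp [implAdd, specAdd]
  | succ n ih => simp [implAdd, specAdd] at ih ⊢; rw [ih]; omega
```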

For multi-provers, there are two main remaining pieces. First, we need to get enough confidence in at least two different proof systems, both that they are reasonably safe individually and that if they break, they would break for different and unrelated reasons (and so they would not break at the same time). Second, we need to get a very high level of assurance in the underlying logic that merges the proof systems. This is a much smaller piece of code. There are ways to make it extremely small - just store funds in a Safe multisig contract whose signers are contracts representing individual proof systems - but this has the tradeoff of high onchain gas costs. Some balance between efficiency and safety will need to be found.

How does it interact with other parts of the roadmap?

Moving activity to L2 reduces MEV pressure on L1.

Cross-L2 interoperability improvements

What problem are we solving?

One major challenge with the L2 ecosystem today is that it is difficult for users to navigate. Furthermore, the easiest ways of doing so often re-introduce trust assumptions: centralized bridges, RPC clients, and so forth. If we are serious about the idea that L2s are part of Ethereum, we need to make using the L2 ecosystem feel like using a unified Ethereum ecosystem.

An example of pathologically bad (and even dangerous: I personally lost $100 to a chain-selection mistake here) cross-L2 UX - though this is not Polymarket’s fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending coins from L1 to L2, or from one L2 to another, should feel just like sending coins within the same L1.

What is it and how does it work?

There are many categories of cross-L2 interoperability improvements. In general, the way to come up with these is to notice that in theory, a rollup-centric Ethereum is the same thing as L1 execution sharding, and then ask where the current Ethereum L2-verse falls short of that ideal in practice. Here are a few:

  • Chain-specific addresses: the chain (L1, Optimism, Arbitrum…) should be part of the address. Once this is in place, a cross-L2 send can be triggered by simply putting the address into the "send" field, at which point the wallet can figure out how to do the send (including using bridging protocols) in the background; a parsing sketch follows this list.
  • Chain-specific payment requests: it should be easy and standardized to make a message of the form “send me X tokens of type Y on chain Z”. This has two primary use cases: (i) payments, whether person-to-person or person-to-merchant-service, and (ii) dapps requesting funds, eg. the Polymarket example above.
  • Cross-chain swaps and gas payment: there should be a standardized open protocol for expressing cross-chain operations such as “I am sending 1 ETH on Optimism to whoever sends me 0.9999 ETH on Arbitrum”, and “I am sending 0.0001 ETH on Optimism to whoever includes this transaction on Arbitrum”. ERC-7683 is one attempt at the former, and RIP-7755 is one attempt at the latter, though both are also more general than just these specific use cases.
  • Light clients: users should be able to actually verify the chains that they are interacting with, and not just trust RPC providers. A16z crypto’s Helios does this for Ethereum itself, but we need to extend this trustlessness to L2s. ERC-3668 (CCIP-read) is one strategy for doing this.

How a light client can update its view of the Ethereum header chain. Once you have the header chain, you can use Merkle proofs to validate any state object. And once you have the right L1 state objects, you can use Merkle proofs (and possibly signatures, if you want to check preconfirmations) to validate any state object on L2. Helios does the former already. Extending to the latter is a standardization challenge.
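A minimal sketch of the Merkle-proof step described in this caption. Real Ethereum state proofs use keccak256 and Merkle-Patricia tries, which are more involved; this binary-tree version (with an invented verify_branch helper) just shows the core mechanic:

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_branch(root: bytes, leaf: bytes, index: int,
                  branch: list[bytes]) -> bool:
    """Walk from the leaf up to the trusted root, hashing with each sibling;
    `index` encodes the left/right position at each level."""
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```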

  • Keystore wallets: today, if you want to update the keys that control your smart contract wallet, you have to do it on all N chains on which that wallet exists. Keystore wallets are a technique that allows the keys to live in one place (either on L1, or later potentially on an L2), and be read from any L2 that has a copy of the wallet. This means that updates only need to happen once. To be efficient, keystore wallets require L2s to have a standardized way to costlessly read L1; two proposals for this are L1SLOAD and REMOTESTATICCALL. A minimal sketch follows the diagram below.

A stylized diagram of how keystore wallets work.
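A minimal sketch of that flow. The l1sload helper below is a stand-in for the proposed precompile, mocked here with a dictionary, and all names and slot layouts are hypothetical; the point is only that every L2 copy of the wallet reads the same L1 slot, so a key rotation happens once:

```python
# Mocked L1 state; a real implementation would be a precompile call
# (in the style of the proposed L1SLOAD) from inside the L2's EVM.
_L1_STORAGE = {("0xKeystoreOnL1", 42): b"\x04" + b"\x11" * 64}

def l1sload(contract: str, slot: int) -> bytes:
    """Hypothetical: read one storage slot of an L1 contract from an L2."""
    return _L1_STORAGE[(contract, slot)]

def current_signing_key(keystore: str, wallet_slot: int) -> bytes:
    # Every L2 copy of the wallet reads its key configuration from the single
    # L1 keystore, so rotating a key is a one-time update in one place.
    return l1sload(keystore, wallet_slot)

key = current_signing_key("0xKeystoreOnL1", 42)
```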

  • More radical "shared token bridge" ideas: imagine a world where all L2s are validity proof rollups that commit to Ethereum every slot. Even in this world, moving assets from one L2 to another "natively" would require withdrawing and depositing, which costs a substantial amount of L1 gas. One way to solve this is to create a shared minimal rollup whose only function is to maintain a record of how many tokens of each type are owned by which L2, and to allow those balances to be updated en masse by batches of cross-L2 send operations initiated by any of the L2s. This would allow cross-L2 transfers to happen without paying L1 gas per transfer, and without liquidity-provider-based techniques like ERC-7683 (see the sketch after this list).
  • Synchronous composability: allow synchronous calls to happen either between a specific L2 and L1, or between multiple L2s. This could be helpful in improving financial efficiency of defi protocols. The former could be done without any cross-L2 coordination; the latter would require shared sequencing. Based rollups are automatically friendly to all of these techniques.
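Two of the ideas above are simple enough to sketch. First, chain-specific address parsing: the "shortName:address" shape below follows the ERC-3770 style, but the registry is a made-up stub (real wallets would use the canonical chain list):

```python
KNOWN_CHAINS = {"eth": 1, "oeth": 10, "arb1": 42161}  # shortName -> chain id

def parse_chain_address(s: str) -> tuple[int, str]:
    chain, _, addr = s.partition(":")
    if chain not in KNOWN_CHAINS:
        raise ValueError(f"unknown chain prefix: {chain!r}")
    if not (addr.startswith("0x") and len(addr) == 42):
        raise ValueError("malformed address")
    return KNOWN_CHAINS[chain], addr

# The wallet routes the send from here: same chain -> plain transfer;
# different chain -> pick a bridge or intent protocol in the background.
chain_id, addr = parse_chain_address("oeth:" + "0x" + "ab" * 20)
```

Second, the shared token bridge: its entire state can be viewed as an (L2, token) → balance table updated in batches, so that no individual transfer pays L1 gas. Again purely illustrative:

```python
balances = {("optimism", "WETH"): 1_000, ("arbitrum", "WETH"): 500}

def apply_batch(sends: list[tuple[str, str, str, int]]) -> None:
    """Each send is (source_l2, dest_l2, token, amount); one batch of these
    settles many cross-L2 transfers in a single state update."""
    for src, dst, token, amount in sends:
        assert balances.get((src, token), 0) >= amount, "insufficient balance"
        balances[(src, token)] -= amount
        balances[(dst, token)] = balances.get((dst, token), 0) + amount

apply_batch([("optimism", "arbitrum", "WETH", 250)])
```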

What is left to do, and what are the tradeoffs?

Many of the examples above face standard dilemmas of when to standardize and what layers to standardize. If you standardize too early, you risk entrenching an inferior solution. If you standardize too late, you risk creating needless fragmentation. In some cases, there is both a short-term solution that has weaker properties but is easier to implement, and a long-term solution that is “ultimately right” but will take quite a few years to get there.

One way in which this section is unique is that these tasks are not just technical problems: they are also (perhaps even primarily!) social problems. They require L2s and wallets and L1 to cooperate. Our ability to handle this problem successfully is a test of our ability to stick together as a community.

How does it interact with other parts of the roadmap?

Most of these proposals are “higher-layer” constructions, and so do not greatly affect L1 considerations. One exception is shared sequencing, which has heavy impacts on MEV.

Scaling execution on L1

What problem are we solving?

If L2s become very scalable and successful but L1 remains capable of processing only a very low volume of transactions, there are many risks to Ethereum that might arise:

  1. The economic situation of ETH the asset becomes more risky, which in turn affects long-run security of the network.
  2. Many L2s benefit from being closely tied to a highly developed financial ecosystem on L1; if this ecosystem greatly weakens, the incentive to become an L2 (instead of being an independent L1) weakens too.
  3. It will take a long time before L2s have exactly the same security assurances as L1.
  4. If an L2 fails (eg. due to a malicious or disappearing operator), users would still need to go through L1 in order to recover their assets. Hence, L1 needs to be powerful enough to actually handle, at least occasionally, a highly complex and chaotic wind-down of an L2.

For these reasons, it is valuable to continue scaling L1 itself, and making sure that it can continue to accommodate a growing number of uses.

What is it and how does it work?

The easiest way to scale is to simply increase the gas limit. However, this risks centralizing the L1, and thus weakening the other important property that makes the Ethereum L1 so powerful: its credibility as a robust base layer. There is an ongoing debate about what degree of simple gas limit increase is sustainable, and this also changes based on which other technologies get implemented to make larger blocks easier to verify (eg. history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is simply the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective L1 gas limit increase strategy would involve accelerating these verification technologies.

Another scaling strategy involves identifying specific features and types of computation that can be made cheaper without harming the decentralization of the network or its security properties. Examples of this include:

  • EOF - a new EVM bytecode format that is more friendly to static analysis, allowing for faster implementations. EOF bytecode could be given lower gas costs to take these efficiencies into account.
  • Multidimensional gas pricing - establishing separate basefees and limits for computation, data and storage can increase the Ethereum L1’s average capacity without increasing its maximum capacity (and hence without creating new security risks); see the sketch after this list.
  • Reduce gas costs of specific opcodes and precompiles - historically, we have had several rounds of increasing gas costs for certain operations that were underpriced, in order to avoid denial-of-service attacks. What we have had less of, and could do much more of, is reducing gas costs for operations that are overpriced. For example, addition is much cheaper for hardware than multiplication, yet the ADD and MUL opcodes cost nearly the same (3 and 5 gas, respectively). We could make ADD cheaper, and simpler opcodes such as PUSH cheaper still.
  • EVM-MAX and SIMD: EVM-MAX (“modular arithmetic extensions”) is a proposal to allow more efficient native big-number modular math as a separate module of the EVM. Values computed by EVM-MAX computations would only be accessible by other EVM-MAX opcodes, unless deliberately exported; this allows greater room to store these values in optimized formats. SIMD (“single instruction multiple data”) is a proposal to allow efficiently executing the same instruction on an array of values. The two together can create a powerful coprocessor alongside the EVM that could be used to much more efficiently implement cryptographic operations. This would be especially useful for privacy protocols, and for L2 proof systems, so it would help both L1 and L2 scaling.
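As a sketch of the core mechanic behind multidimensional gas pricing (the second bullet above): each resource gets its own EIP-1559-style basefee that adjusts independently toward its own per-resource target. The resource names and numbers here are illustrative, not a proposal:

```python
def update_basefee(basefee: int, used: int, target: int,
                   max_change: float = 0.125) -> int:
    # Nudge the fee toward equilibrium: up when usage exceeds the target,
    # down when below it, by at most max_change per block (as in EIP-1559).
    return max(1, round(basefee * (1 + max_change * (used - target) / target)))

resources = {  # name: (current basefee, units used this block, target units)
    "computation": (10_000_000_000, 12_000_000, 15_000_000),
    "data":        (100_000, 600_000, 393_216),
    "storage":     (2_000_000_000, 40_000, 50_000),
}
for name, (fee, used, target) in resources.items():
    print(name, "->", update_basefee(fee, used, target))
```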

These improvements will be discussed in more detail in a future post on the Splurge.

Finally, a third strategy is native rollups (or “enshrined rollups”): essentially, creating many copies of the EVM that run in parallel, leading to a model that is equivalent to what rollups can provide, but much more natively integrated into the protocol.

What is left to do, and what are the tradeoffs?

There are three strategies for L1 scaling, which can be pursued individually or in parallel:

  • Improve technology (eg. client code, stateless clients, history expiry) to make the L1 easier to verify, and then raise the gas limit
  • Make specific operations cheaper, increasing average capacity without increasing worst-case risks
  • Native rollups (ie. “create N parallel copies of the EVM”, though potentially giving developers a lot of flexibility in the parameters of the copies they deploy)

It’s worth understanding that these are different techniques with different tradeoffs. For example, native rollups have many of the same weaknesses in composability as regular rollups: you cannot send a single transaction that synchronously performs operations across many of them, the way you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits that can be achieved by making the L1 easier to verify, such as increasing the portion of users who run verifying nodes and the number of solo stakers. Making specific operations in the EVM cheaper can, depending on how it is done, increase total EVM complexity.

A big question that any L1 scaling roadmap needs to answer is: what is the ultimate vision for what belongs on L1 and what belongs on L2? Clearly, it’s absurd for everything to go on L1: the potential use cases go into the hundreds of thousands of transactions per second, and that would make the L1 completely unviable to verify (unless we go the native rollup route). But we do need some guiding principle, so that we can make sure that we are not creating a situation where we increase the gas limit 10x, heavily damage the Ethereum L1’s decentralization, and find that we’ve only gotten to a world where instead of 99% of activity being on L2, 90% of activity is on L2, and so the result otherwise looks almost the same, except for an irreversible loss of much of what makes Ethereum L1 special.

One proposed view of a “division of labor” between L1 and L2s, source.

How does it interact with other parts of the roadmap?

Bringing more users onto L1 implies improving not just scale, but also other aspects of L1. It means that more MEV will remain on L1 (as opposed to becoming a problem just for L2s), so handling it explicitly will become an even more pressing need. It greatly increases the value of having fast slot times on L1. And it is also heavily dependent on verification of L1 (“the Verge”) going well.
