Vitalik's new article: The Possible Future of Ethereum, The Verge
Original Title: "Possible futures of the Ethereum protocol, part 4: The Verge"
Original author: Vitalik Buterin
Original translation: Mensh, ChainCatcher
Special thanks to Justin Drake, Hsiao-wei Wang, Guillaume Ballet, Ignacio, Josh Rudolf, Lev Soukhanov, Ryan Sean Adams, and Uma Roy for their feedback and review.
One of the most powerful features of a blockchain is that anyone can run a node on their own computer and verify that the chain is correct. Even if 95% of the nodes running the chain's consensus (PoW, PoS) immediately agreed to change the rules and started producing blocks according to the new rules, everyone running a fully verifying node would refuse to accept the chain. Stakers who are not part of such a cabal would automatically converge on the chain that keeps following the old rules and continue building it, and fully verifying users would follow that chain.
This is the key difference between blockchains and centralized systems. However, for this property to hold, running a fully validating node needs to be actually feasible for enough people. This applies both to stakers (because stakers who do not validate the chain are not contributing to enforcing the protocol rules) and to ordinary users. Today, it is possible to run a node on a consumer laptop (including the laptop used to write this article), but it is difficult. The Verge aims to change this by making fully validating the chain so computationally affordable that every mobile wallet, browser wallet, and even smartwatch does it by default.
The Verge 2023 Roadmap
Initially, "Verge" referred to transferring the state storage of the Ethereum to the Verkle tree - a tree-like structure that allows for more compact proofs, enabling stateless verification of Ethereum blocks. A Node can verify an Ethereum block without needing to store any Ethereum state (account balances, contract code, storage space, etc.) on its hard drive, at the cost of spending a few hundred KB of proof data and a few hundred milliseconds of extra time for verification. Today, Verge represents a larger vision, focusing on achieving maximum resource efficiency verification for the Ethereum chain, which includes not only stateless verification technology, but also using SNARK to verify all Ethereum executions.
In addition to the long-standing goal of SNARK-verifying the entire chain, another new question is whether Verkle trees are the best technology in the first place. Verkle trees are vulnerable to quantum computers, so if we replace the current KECCAK Merkle Patricia tree with Verkle trees, we will have to replace the trees again later. The self-evident replacement for Merkle trees is to skip straight to using STARKs of Merkle branches in a binary tree. Historically, this has been considered impractical due to cost and technical complexity. Recently, however, Starkware proved 1.7 million Poseidon hashes per second with circle STARKs on a laptop, and proving times for more "traditional" hashes are also falling rapidly thanks to techniques like GKR. As a result, over the past year "the Verge" has become more open-ended, with several possible paths.
The Verge: Key Objectives
In this chapter
Stateless Verification: Verkle or STARKs
What problem are we trying to solve?
Today, an Ethereum client needs to store hundreds of gigabytes of state data to verify blocks, and this amount grows every year. The raw state data grows by roughly 30 GB per year, and a single client must store additional data on top of that to update the trie efficiently.
This reduces the number of users who can run a fully validating Ethereum node: although large hard drives that can hold all Ethereum state and even years of history are readily available, the computers people typically buy have only a few hundred gigabytes of storage. The state size also adds significant friction to setting up a node for the first time: the node must download the entire state, which can take hours or even days. This has various knock-on effects. For example, it makes it much harder for node operators to upgrade their node setup. Technically, it is possible to upgrade without downtime - start a new client, wait for it to sync, shut down the old client, and move the keys over - but in practice this is technically complex.
How does it work?
Stateless validation is a technique that allows nodes to verify blocks without holding the entire state. Instead, each block comes with a witness that includes: (i) the values, code, balances, and storage at the specific positions in the state that the block will access; (ii) a cryptographic proof that these values are correct.
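To make the shape of this concrete, here is a minimal Python sketch of a stateless verifier; verify_proof and execute_block are hypothetical placeholder names for the real cryptographic check and re-execution logic, and the field names are illustrative rather than any specified witness format:

```python
from dataclasses import dataclass, field
from typing import Optional

def verify_proof(root: bytes, values: dict, proof: bytes) -> bool:
    """Placeholder for the cryptographic check (Merkle / Verkle / STARK)."""
    raise NotImplementedError

def execute_block(block: object, witness: "BlockWitness") -> Optional[bytes]:
    """Placeholder: re-execute the block using only witness data."""
    raise NotImplementedError

@dataclass
class BlockWitness:
    # (i) the parts of the pre-state the block will touch
    accessed_state: dict = field(default_factory=dict)  # state path -> value
    # (ii) a cryptographic proof tying those values to the pre-state root
    proof: bytes = b""

def stateless_verify(pre_state_root: bytes, block: object, w: BlockWitness) -> bool:
    # 1. check that the claimed state values match the known root
    if not verify_proof(pre_state_root, w.accessed_state, w.proof):
        return False
    # 2. re-execute; any access outside the witness invalidates the block
    return execute_block(block, w) is not None
```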
Actually implementing stateless verification requires changing Ethereum's state tree structure. This is because the current Merkle Patricia tree is extremely unfriendly to any cryptographic proof scheme, especially in the worst case. This is true both for "raw" Merkle branches and for the possibility of "wrapping" them in a STARK. The main difficulties stem from some weaknesses of the MPT:
It is a 16-ary tree (i.e., each node has 16 children). This means that in a tree of size N, a proof needs on average 32 * (16 - 1) * log16(N) = 120 * log2(N) bytes, or about 3,840 bytes in a tree with 2^32 items. For a binary tree, only 32 * (2 - 1) * log2(N) = 32 * log2(N) bytes are needed, or about 1,024 bytes.
The code is not Merkleized. This means that proving any access to account code requires providing the entire code, up to 24,000 bytes.
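As a sanity check on the branch-size arithmetic above, a few lines of Python reproduce both numbers:

```python
import math

def avg_branch_bytes(arity: int, n_items: int, hash_len: int = 32) -> float:
    # (arity - 1) sibling hashes per level, log_arity(n_items) levels
    return hash_len * (arity - 1) * math.log(n_items, arity)

N = 2**32
print(avg_branch_bytes(16, N))  # 3840.0 bytes for the hexary MPT
print(avg_branch_bytes(2, N))   # 1024.0 bytes for a binary tree
```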
We can calculate the worst case as follows:
30,000,000 gas / 2,400 (cold account read cost) * (5 * 480 + 24,000) = 330,000,000 bytes
The branch cost drops slightly (5 * 480 instead of 8 * 480) because the top parts of branches are repeated when there are many of them. Even so, downloading this much data in one slot is completely unrealistic. If we instead try to wrap it in a STARK, we run into two problems: (i) KECCAK is relatively STARK-unfriendly; (ii) 330 MB of data means we would have to prove 5 million calls to the KECCAK round function, which is likely infeasible on all but the most powerful consumer hardware, even if we can make STARK proving of KECCAK much more efficient.
If we directly use a binary tree instead of a hexary tree and additionally Merkleize the code, the worst case becomes roughly 30,000,000 / 2,400 * 32 * (32 - 14 + 8) = 10,400,000 bytes (the 14 subtracts redundant bits shared by ~2^14 branches, and the 8 is the length of the proof going into the leaf in the code chunk). Note that this requires changing gas costs to charge for each individual code chunk accessed; EIP-4762 does this. 10.4 MB is much better, but for many nodes it is still too much data to download in one slot. Therefore, we need to introduce more powerful techniques. Here there are two leading solutions: Verkle trees and STARKed binary hash trees.
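The worst-case figures above can be reproduced directly; all of the constants (the 480-byte branch levels, the 24,000-byte code, the 14- and 8-level adjustments) come from the text:

```python
GAS_LIMIT = 30_000_000
COLD_READ = 2_400                        # gas per cold account access
accesses = GAS_LIMIT // COLD_READ        # 12,500 worst-case accesses

# Hexary MPT: ~5 non-shared branch levels of 480 bytes each, plus up to
# 24,000 bytes of un-Merkleized contract code per accessed account.
mpt_worst = accesses * (5 * 480 + 24_000)
print(mpt_worst)                         # 330,000,000 bytes (~330 MB)

# Binary tree with Merkleized code: 32-byte siblings, (32 - 14) levels
# after deduplicating the top ~2^14 branches, plus ~8 levels into the
# code chunk (per EIP-4762-style per-chunk gas charging).
bin_worst = accesses * 32 * (32 - 14 + 8)
print(bin_worst)                         # 10,400,000 bytes (~10.4 MB)
```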
Verkle Tree
Verkle trees use elliptic-curve-based vector commitments to get shorter proofs. The key unlock is that each piece of the proof corresponding to a parent-child relationship is only 32 bytes, regardless of the width of the tree. The only limit on tree width is that if the tree is too wide, proofs become computationally inefficient to produce. The implementation proposed for Ethereum has width 256.
The size of a single branch in a proof therefore becomes 32 * log256(N) = 4 * log2(N) bytes, and the theoretical maximum proof size is roughly 1,300,000 bytes (the math works out slightly differently in practice because of the uneven distribution of state chunks, but this is fine as a first approximation).
Another thing to note is that in all of the above examples, this "worst case" is not actually the worst case: a worse case is if an attacker intentionally "mines" two Addresses that have a long common prefix in the tree and reads data from one of them, which could double the length of the branch in the worst case. However, even in such a scenario, the worst-case proof length of the Verkle tree is 2.6 MB, which is basically the same as the current worst-case verification data.
We also take advantage of this to do another thing: we make accessing "adjacent" storage very cheap: either many code chunks of the same contract, or adjacent storage slots. EIP-4762 provides a definition of adjacency, and charges only 200 gas for adjacent accesses. With adjacent accesses, the worst-case proof size becomes 30,000,000 / 200 * 32 = 4,800,000 bytes, which is still roughly within tolerance. If we want to reduce this value for safety, we can slightly increase the cost of adjacent accesses.
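The same back-of-the-envelope style gives the Verkle numbers; the 4-bytes-per-level factor is the only change from the binary-hash-tree case:

```python
accesses = 30_000_000 // 2_400           # 12,500 cold accesses

# Verkle: 32 bytes per parent-child commitment at width 256, i.e.
# 4 * log2(N) bytes per branch vs 32 * log2(N) for a binary hash tree.
verkle_worst = accesses * 4 * (32 - 14 + 8)
print(verkle_worst)                      # 1,300,000 bytes (~1.3 MB)
print(2 * verkle_worst)                  # ~2.6 MB if an attacker mines
                                         # addresses with long shared prefixes

# EIP-4762 adjacent accesses: 200 gas each, ~32 bytes of proof data each.
adjacent_worst = 30_000_000 // 200 * 32
print(adjacent_worst)                    # 4,800,000 bytes (~4.8 MB)
```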
STARKed Binary Hash Tree
The principle of this technique is self-evident: you make a binary tree, grab the (at most 10.4 MB) proof of the values in the block, and then replace the proof with a STARK of the proof. That way, the proof itself contains only the data being proven, plus a fixed overhead of 100-300 kB for the actual STARK.
The main challenge here is prover time. We can do the same calculations as above, except instead of counting bytes we count hashes. A 10.4 MB proof means 330,000 hashes. If we add the possibility of an attacker mining addresses with a long common prefix in the tree, the worst case rises to around 660,000 hashes. So if we can prove about 200,000 hashes per second, we are fine.
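In code, the hash-count argument looks like this (the text rounds 325,000 up to 330,000):

```python
proof_bytes = 10_400_000                 # worst-case binary-tree witness
hashes = proof_bytes // 32               # ~325,000 hashes to re-compute
worst = 2 * hashes                       # ~650,000 with mined common prefixes

prover_rate = 200_000                    # target hashes proven per second
print(worst / prover_rate)               # ~3.3 s, comfortably within a slot
```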
On a consumer laptop, these numbers have already been reached using the Poseidon hash function, which was designed specifically for STARK-friendliness. However, Poseidon is still relatively immature, so many people do not yet trust its security. There are therefore two realistic paths forward:
Spend a large amount of effort on security analysis of Poseidon, and become comfortable enough with it to deploy it at L1.
Use a more "conservative" hash function: at the time of writing, Starkware's circle STARKs can prove only 10-30k hashes of conservative hash functions per second on a consumer laptop. However, STARK technology is improving rapidly; even today, GKR-based techniques show the potential to raise this into the 100-200k range.
Witness use cases outside of block verification
In addition to verifying blocks, there are three other key use cases that need more efficient stateless verification:
Mempools: nodes need to check that newly broadcast transactions are valid (for example, that the sender's balance and nonce are right) without holding the full state.
Inclusion lists: validators need to be able to verify statelessly that the transactions they are forced to include are valid.
Light clients: wallets and other light clients need proofs for the individual state objects they read.
All of these use cases have one thing in common: they require a fairly large number of proofs, but each proof is small. Therefore, STARK proofs do not make sense for them; instead, it is most practical to use Merkle branches directly. Another advantage of Merkle branches is that they are updatable: given a proof of a state object X rooted at block B, upon receiving a child block B2 with its witness, the proof can be updated to be rooted at B2. Verkle proofs are also natively updatable.
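Here is a toy example of the Merkle branches discussed here; verify_branch is a generic binary-Merkle check, and the final comment illustrates what "updatable" means (this is an illustration, not Ethereum's actual tree layout):

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_branch(leaf: bytes, index: int, siblings: list, root: bytes) -> bool:
    """Fold a leaf up a binary Merkle tree using its sibling hashes."""
    node = h(leaf)
    for sib in siblings:
        node = h(sib + node) if index & 1 else h(node + sib)
        index >>= 1
    return node == root

# Toy 4-leaf tree: the branch for leaf 2 is [sibling leaf, sibling subtree].
leaves = [b"a", b"b", b"c", b"d"]
l0 = [h(x) for x in leaves]
l1 = [h(l0[0] + l0[1]), h(l0[2] + l0[3])]
root = h(l1[0] + l1[1])
assert verify_branch(b"c", 2, [l0[3], l1[0]], root)
# Updatability: if a new block changes only leaf 0, just the path entries
# that depend on it (here l1[0]) and the root need to be refreshed; the
# proof for leaf 2 can be patched without re-downloading it.
```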
Connections with existing research:
What else can be done?
The main remaining work is:
More analysis on the consequences of EIP-4762 (changes in stateless gas costs)
More work on completing and testing the transition procedure, which accounts for a large part of the complexity of any stateless implementation.
More security analysis of Poseidon, Ajtai, and other "STARK-friendly" hash functions
Further development of ultra-efficient STARK protocols for "conservative" (or "traditional") hashes, e.g. based on Binius or GKR.
In addition, we will at some point need to decide among three options: (i) Verkle trees, (ii) STARK-friendly hash functions, and (iii) conservative hash functions. Their characteristics can be roughly summarized in the table below:
Apart from these "headline numbers", there are other important considerations:
If we want to achieve Verkle-style witness updatability in a quantum-secure and reasonably efficient way, another possible approach is lattice-based Merkle trees.
If the proof system turns out not to be efficient enough in the worst case, we can also compensate with the unexpected tool of multidimensional gas: separate gas limits for (i) calldata, (ii) computation, (iii) state access, and possibly other distinct resources. Multidimensional gas adds complexity, but in exchange it much more tightly bounds the ratio between the average case and the worst case. With multidimensional gas, the theoretical maximum number of branches to prove might drop from 12,500 to, say, 3,000. This would make BLAKE3 (barely) sufficient even today. A minimal sketch of the idea follows.
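The sketch below uses hypothetical limit values; the point is only that each resource is capped independently rather than drawing from one shared gas pool:

```python
from dataclasses import dataclass

@dataclass
class Resources:
    calldata: int
    computation: int
    state_access: int

# Illustrative per-resource block limits (hypothetical numbers): capping
# state accesses separately is what shrinks the provable worst case from
# ~12,500 branches (all gas spent on cold reads) to e.g. ~3,000.
LIMITS = Resources(calldata=1_000_000, computation=30_000_000, state_access=3_000)

def within_limits(used: Resources) -> bool:
    # Each dimension is checked on its own.
    return (used.calldata <= LIMITS.calldata
            and used.computation <= LIMITS.computation
            and used.state_access <= LIMITS.state_access)

print(within_limits(Resources(200_000, 12_000_000, 2_500)))  # True
```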
Multidimensional gas allows the resource constraints of the Block to be closer to the resource constraints of the underlying hardware.
Another unexpected tool is to delay the state-root computation until the slot after the block. This gives us a full 12 seconds to compute the state root, meaning that even in the most extreme case, a prover throughput of only 60,000 hashes per second suffices. This again puts BLAKE3 at just barely meeting the requirement.
The downside of this approach is that it increases light-client latency by one slot, though there are cleverer techniques that reduce this latency to just the proof-generation latency. For example, the proof can be broadcast on the network as soon as any node generates it, rather than waiting for the next block.
How does it interact with other parts of the roadmap?
Solving statelessness greatly increases the ease of solo staking. This becomes much more valuable if technologies that lower the minimum solo-staking balance, such as Orbit SSF, or application-layer strategies such as squad staking, become available.
If EOF is introduced at the same time, multidimensional gas analysis becomes easier. This is because the main execution complexity of multidimensional gas comes from handling child calls that do not pass along the parent call's full gas, and EOF can make such child calls illegal, rendering this problem trivial (and native account abstraction would provide an in-protocol alternative for the current main use case of partial-gas child calls).
There is an important synergy between stateless validation and history expiry. Today, clients must store nearly 1 TB of historical data, several times the size of the state. Even if clients become stateless, the dream of nearly storage-free clients cannot be realized unless we also relieve them of the responsibility of storing history. The first step here is EIP-4444, which also involves storing historical data in torrents or the Portal Network.
Validity proof of EVM execution
What problem are we trying to solve?
The long-term goal for Ethereum block verification is clear: you should be able to verify an Ethereum block by (i) downloading the block, or even just small data-availability samples of it; and (ii) verifying a small proof that the block is valid. This would be an extremely low-resource operation, doable in a mobile client, a browser wallet, or even in another chain (without the data-availability part).
Achieving this requires SNARK or STARK proofs of (i) the consensus layer (i.e. proof of stake) and (ii) the execution layer (i.e. the EVM). The former is a challenge in itself and should be addressed while further improving the consensus layer (e.g. targeting single-slot finality). The latter requires proofs of EVM execution.
What is it, how does it work?
Formally, in the Ethereum specification, the EVM is defined as a state transition function: you have a pre-state S and a block B, and you compute a post-state S' = STF(S, B). If a user runs a light client, they do not hold S and S', or even the full block B; instead, they have a pre-state root R, a post-state root R', and a block hash H.
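In pseudocode terms, the light client's job reduces to a single succinct check; snark_verify here is a placeholder for whatever proof system is actually used:

```python
from typing import Tuple

def snark_verify(vk: bytes, public_inputs: Tuple[bytes, bytes, bytes],
                 proof: bytes) -> bool:
    """Placeholder verifier for the statement: 'there exists a block body B
    with hash H such that STF(S, B) = S'', where S and S' are the states
    committed to by the roots R and R''."""
    raise NotImplementedError

def light_client_accept(R: bytes, R_new: bytes, H: bytes,
                        proof: bytes, vk: bytes) -> bool:
    # The client never materializes S, S', or even the full block body;
    # it only checks a succinct proof over (R, R', H).
    return snark_verify(vk, (R, R_new, H), proof)
```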
Given this, you can have a light client that fully validates Ethereum's EVM execution, which keeps the client's resource requirements quite low. To get a truly fully validating Ethereum client, the same must also be done on the consensus side.
Validity proofs of EVM computation already exist and are widely used on L2. However, much work remains to make EVM validity proofs viable on L1.
What are the connections with existing research?
What else can be done?
Today, EVM validity proofs fall short in two areas: security and verification time.
A secure validity proof must guarantee that the SNARK actually verifies the EVM computation and has no vulnerabilities. The two main techniques for improving security are multi-provers and formal verification. Multi-provers means having multiple independently written validity-proof implementations, just as there are multiple clients; a client accepts a block if it is proven by a sufficiently large subset of these implementations. Formal verification uses tools normally used to prove mathematical theorems, such as Lean 4, to prove that the validity proof accepts only executions that are correct under the underlying EVM specification (e.g., the EVM K semantics or the Ethereum execution-layer specification (EELS), written in Python).
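A minimal sketch of the multi-prover acceptance rule; the prover names and the 3-of-5 threshold are illustrative, not an actual client policy:

```python
from typing import Dict

def accept_block(proof_results: Dict[str, bool], threshold: int) -> bool:
    """Accept a block only if a sufficiently large subset of independent
    proof implementations verified it."""
    return sum(proof_results.values()) >= threshold

results = {"prover_a": True, "prover_b": True, "prover_c": True,
           "prover_d": False, "prover_e": True}
print(accept_block(results, threshold=3))  # True: 4 of 5 verified
```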
Fast verification means that any Ethereum block can be verified in under 4 seconds. Today we are still far from this goal, although we are much closer than we imagined two years ago. To get there, we need progress in three directions:
Parallelization - proof generation can be split into chunks and aggregated recursively. This works even in the worst case where one very large transaction occupies the entire block, because execution can be split up by opcode (of the EVM or of an underlying VM such as RISC-V). Keeping the VM's "memory" consistent between the different parts of the proof is a key implementation challenge. But if we can achieve this kind of recursive proving, then we know that at least the prover latency problem is solved, even without any other improvements.
Gas cost changes - if an operation takes a long time to prove, it should have a high gas cost even if it is relatively fast to compute. EIP-7667 is an EIP proposed to address the worst offenders here: it significantly raises the gas costs of (traditional) hash functions, whose opcodes and precompiles are relatively cheap. To offset these increases, we can reduce the gas costs of EVM opcodes that are cheap to prove, keeping average throughput unchanged.
Data structure replacement - besides replacing the state tree with a more STARK-friendly structure, we also need to replace the transaction list, receipt tree, and other structures that are expensive to prove. The EIPs by Etan Kissling that move the transaction and receipt structures to SSZ are a step in this direction.
In addition, the two tools mentioned in the previous section (multidimensional gas and delayed state roots) can also help here. However, it is worth noting that, unlike with stateless verification - where using those tools means we already have enough technology for what we need today - even with these techniques, full ZK-EVM verification still requires more work; it just requires less.
One thing not mentioned above is prover hardware: using GPUs, FPGAs, and ASICs to generate proofs faster. Fabric Cryptography, Cysic, and Accseal are three companies making progress here. This is valuable for L2, but it is unlikely to be decisive for L1, because there is a strong desire for L1 to remain highly decentralized, which means proof generation must be within reach of ordinary Ethereum users and not bottlenecked by a single company's hardware. L2 can make more aggressive trade-offs.
There is still more work to be done in these areas:
The possible costs are:
How does it interact with other parts of the roadmap?
The core technology required to achieve L1 EVM validity proof is largely shared with two other fields:
Achieving validity proofs on L1 will ultimately enable easy solo staking: even the weakest computers (including phones and smartwatches) will be able to stake. This further increases the value of removing the other constraints on solo staking (such as the 32 ETH minimum).
In addition, L1 EVM validity proofs make it possible to substantially raise the L1 gas limit.
Validity proofs of consensus
What problem are we trying to solve?
If we want to fully verify an Ethereum block with a SNARK, EVM execution is not the only part we need to prove. We also need to prove the consensus: the part of the system that handles deposits, withdrawals, signatures, validator balance updates, and the other elements of Ethereum's proof of stake.
Consensus is much simpler than the EVM, but it faces the challenge that we don't have L2 EVM rollups as a reason why most of the work would be done anyway. So any implementation of proving Ethereum consensus has to start from scratch, although the proof systems themselves are shared work that can be built upon.
What is it, how does it work?
The beacon chain, like the EVM, is defined as a state transition function. The main computation consists of three parts: ECADDs (for verifying BLS signatures), pairings (for verifying BLS signatures), and SHA256 hashes (for reading and updating state).
In each block, we need to prove 1-16 BLS12-381 ECADDs per validator (possibly more than one, because a signature can be included in multiple aggregates). This can be mitigated by subset precomputation, so overall we can say each validator needs one BLS12-381 ECADD. Today, there are about 30,000 validator signatures per slot. In the future, with single-slot finality, this may change in either direction: on the "brute force" route, the number of validators per slot could rise to 1 million; with Orbit SSF, it would stay at 32,768, or even drop to 8,192.
How BLS aggregation helps: verifying an aggregate signature requires only one ECADD per participant, rather than one ECMUL. But 30,000 ECADDs is still a large amount of proving work.
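The following toy sketch, assuming the py_ecc library's BLS12-381 interface, shows why aggregate verification costs one ECADD per participant plus a single pairing check. Real BLS hashes the message to the curve; a fixed G2 point and toy scalar keys stand in here:

```python
from py_ecc.bls12_381 import G1, G2, add, multiply, pairing

sks = [3, 5, 7]                          # toy secret keys
pks = [multiply(G1, sk) for sk in sks]   # pk_i = sk_i * G1
Hm = multiply(G2, 1234)                  # stand-in for hash-to-curve H(m)
sigs = [multiply(Hm, sk) for sk in sks]  # sig_i = sk_i * H(m)

agg_pk, agg_sig = pks[0], sigs[0]
for pk, sig in zip(pks[1:], sigs[1:]):
    agg_pk = add(agg_pk, pk)             # one ECADD per extra participant
    agg_sig = add(agg_sig, sig)          # one ECADD per extra participant

# A single (expensive) pairing check replaces one pairing per signer:
# e(agg_sig, G1) == e(H(m), agg_pk), since both equal e(H(m), G1)^(sum of sk_i)
assert pairing(agg_sig, G1) == pairing(Hm, agg_pk)
```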
For pairings, there are currently up to 128 attestations per slot, which means 128 pairings to verify. With EIP-7549 and further changes, this can drop to 16 per slot. The number of pairings is small, but each is extremely costly: each takes thousands of times longer to run (or prove) than an ECADD.
One major challenge in proving BLS12-381 operations is that there is no convenient curve whose order equals the BLS12-381 field size, which adds considerable overhead to any proof system. The Verkle trees proposed for Ethereum, on the other hand, are built with the Bandersnatch curve, which makes BLS12-381 itself the natural curve to use in a SNARK system for proving Verkle branches. A fairly naive implementation can prove 100 G1 additions per second; clever techniques like GKR will almost certainly be needed to make proving fast enough.
For SHA256 hashes, the current worst case is the epoch-transition block, where the entire validator short-balance tree and a large number of validator balances are updated. The short-balance tree has one byte per validator, so about 1 MB of data is re-hashed, corresponding to 32,768 SHA256 calls. If a thousand validators' balances cross a threshold, the effective balances in the validator records must be updated, corresponding to a thousand Merkle branches, so perhaps another ten thousand hashes. The shuffling mechanism requires 90 bits per validator (and thus 11 MB of data), but this can be computed at any point during an epoch. With single-slot finality, these numbers may go up or down depending on the details. Shuffling becomes unnecessary, although Orbit may partially restore the need for it.
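The hash-count estimate can be written out directly; the 10-hash branch depth is an assumption chosen to match the "ten thousand hashes" figure in the text:

```python
HASH_BYTES = 32

# ~1 MB of the one-byte-per-validator short-balance tree gets re-hashed
rehashed_bytes = 2**20
sha256_calls = rehashed_bytes // HASH_BYTES       # 32,768 calls

# ~1,000 effective-balance updates, each a Merkle branch of ~10 hashes
branch_depth = 10                                 # assumed depth
balance_update_hashes = 1_000 * branch_depth      # "perhaps ten thousand"

print(sha256_calls + balance_update_hashes)       # ~43k hashes at the boundary
```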
Another challenge is that verifying a block requires reading the state of all validators, including their public keys. For 1 million validators, just reading the public keys takes 48 million bytes, plus Merkle branches. This requires millions of hashes per epoch. If we had to prove PoS validity today, a realistic approach would be some form of incrementally verifiable computation: maintain a separate data structure inside the proof system, optimized for efficient lookups, and prove updates to that structure.
In short, there are many challenges. To address these challenges most effectively, it may be necessary to undergo a thorough redesign of the beacon chain, potentially in conjunction with transitioning to single-slot finality. The characteristics of this redesign may include:
What are the connections with existing research?
What other work needs to be done and how to prioritize it:
Realistically, it will take years to get validity proofs of Ethereum consensus. This is roughly the same timeline as is needed for single-slot finality, Orbit, changes to the signature algorithm, and the security analysis required to gain enough confidence in "aggressive" hash functions like Poseidon. So the wisest approach is to do this other work while keeping STARK-friendliness in mind.
The main trade-off is likely to be in the order of operations: between a more incremental approach to reforming the Ethereum consensus layer and a more radical "change many things at once" approach. For the EVM, an incremental approach makes sense because it minimizes disruption to backward compatibility. For the consensus layer, backward-compatibility concerns are smaller, and there are benefits to rethinking various details of the beacon chain's construction more "holistically" to optimize for SNARK-friendliness.
How does it interact with other parts of the roadmap?
When redesigning PoS for Ethereum, STARK friendliness must be the primary consideration, especially for single-slot finality, Orbit, signature scheme changes, and signature aggregation.