Paradigm: A detailed explanation of Ethereum's history growth problem and its solutions

Original authors: Storm Slivkoff, Georgios Konstantopoulos

Original compilation: Luffy, Foresight News

History growth is currently the biggest bottleneck to scaling Ethereum. Surprisingly, history growth has become a bigger problem than state growth. Within a few years, historical data will exceed the storage capacity of many Ethereum nodes.

Here's the good news:

  • History growth is a much easier problem to solve than state growth.
  • The solution is already under active development.
  • Addressing history growth will also alleviate the state growth problem.

In this post, we continue the examination of Ethereum scaling begun in Part 1, turning our attention from state growth to history growth. Using granular datasets, our goals are to 1) build a technical understanding of Ethereum's scaling bottlenecks, and 2) help inform the discussion around the optimal setting for Ethereum's gas limit.

What is history growth?

History is the collection of all blocks and transactions executed by Ethereum throughout its lifetime, that is, all data from the genesis block to the current block. History growth is the accumulation of new blocks and new transactions over time.

Figure 1 shows how history growth relates to various protocol metrics and to the hardware constraints of Ethereum nodes. History growth is limited by a different set of hardware constraints than state growth. It puts pressure on network IO, because new blocks and transactions must be transmitted throughout the network. It also puts pressure on node storage space, because each Ethereum node stores a complete copy of history. If history grows fast enough to exceed these hardware limits, nodes will no longer be able to reach stable consensus with their peers. For an overview of state growth and other scaling bottlenecks, see Part 1 of this series.

Figure 1: Ethereum scaling bottleneck

Until recently, most of each node's network throughput was used to transfer history (such as new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a large portion of node network activity. However, blobs are not considered part of history because 1) nodes store them for only 2 weeks and then discard them, and 2) they are not needed to replay the chain from genesis. Because of (1), blobs do not significantly increase the storage burden of each Ethereum node. We discuss blobs later in this article.

In this article, we focus on history growth and discuss the relationship between history and state. Because state growth and history growth are subject to some of the same hardware constraints, they are related problems, and solving one helps solve the other.

How fast is history growing?

Figure 2 shows the history growth rate since Ethereum's genesis. Each vertical line represents one month of growth. The y-axis shows the number of gigabytes of history added in that month. Transactions are categorized by their "to" address, and their size is measured in RLP-encoded bytes. Contracts that cannot be easily identified are classified as "unknown". The "Other" category includes a range of smaller categories such as infrastructure and games.

Figure 2: Ethereum history growth rate over time

A few key takeaways from the chart above:

  • History is growing 6 to 8 times faster than state: the history growth rate recently peaked at 36.0 GiB/month and is currently 19.3 GiB/month. The state growth rate peaked at about 6.0 GiB/month and is currently 2.5 GiB/month. History and state are compared in terms of both growth rate and cumulative size later in this article.
  • Prior to Dencun, the history growth rate had been accelerating: while state has grown at a roughly linear rate for many years (see Part 1), history has grown superlinearly. A growth rate that rises linearly produces quadratic growth in total size, so a growth rate that rises superlinearly produces faster-than-quadratic growth in total size. This acceleration stopped abruptly after Dencun, the first time Ethereum has experienced a significant drop in its history growth rate.
  • Most recent history growth has come from rollups: each L2 posts a copy of its transactions back to mainnet. This generated a large amount of history and made rollups the most significant contributor to history growth over the past year. However, Dencun allows L2s to post their transaction data using blobs instead of history, so rollups no longer generate most of Ethereum's history. We cover rollups in more detail later in this article.
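
The second takeaway rests on a simple relationship between a growth rate and the total size it produces. As a minimal sketch with made-up numbers (the rates below are illustrative and are not taken from the dataset behind Figure 2), a flat monthly rate accumulates into a linearly growing total, while a monthly rate that rises linearly accumulates into a quadratically growing total:

```python
def cumulative_size(monthly_rates):
    """Running total of a series of monthly growth amounts."""
    total = 0.0
    totals = []
    for rate in monthly_rates:
        total += rate
        totals.append(total)
    return totals

# Hypothetical rates in GiB/month (illustrative only).
constant_rate = [2.0] * 48                         # flat rate -> linear total
linear_rate = [0.5 * (m + 1) for m in range(48)]   # rising rate -> quadratic total

constant_total = cumulative_size(constant_rate)
linear_total = cumulative_size(linear_rate)

# Doubling the elapsed time (24 -> 48 months) doubles the total under a
# flat rate, but roughly quadruples it under a linearly rising rate.
flat_ratio = constant_total[47] / constant_total[23]
rising_ratio = linear_total[47] / linear_total[23]
print(flat_ratio, rising_ratio)
```

A rate that rises faster than linearly would push the second ratio above 4, which is the "faster-than-quadratic" case described above.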

Who is the biggest contributor to Ethereum's history growth?

The amount of history generated by different contract categories reveals how Ethereum's usage patterns have evolved over time. Figure 3 shows the relative contributions of the various contract categories. It shows the same data as Figure 2, normalized.

Figure 3: Contribution of different contract categories to history growth

This data reveals four different periods of Ethereum usage patterns:

  • Early era (purple): There was little on-chain activity in Ethereum's first few years. Many of these early contracts are now difficult to identify and are marked as "unknown" in the chart.
  • ERC-20 era (green): The ERC-20 standard was finalized at the end of 2015 but did not gain significant traction until 2017 and 2018. ERC-20 contracts became the largest source of history growth in 2019.
  • DEX / DeFi era (brown): DEX and DeFi contracts appeared on-chain as early as 2016 and began to gain traction in 2017. But it was not until the DeFi summer of 2020 that they became the largest category of history growth. DeFi and DEX contracts accounted for more than 50% of history growth for parts of 2021 and 2022.
  • Rollup era (gray): In early 2023, L2 rollups began executing more transactions than mainnet. In the months leading up to Dencun, they generated about 2/3 of Ethereum's history growth.

Each era represents a more complex pattern of Ethereum usage than the one before it. This growing complexity can be viewed as a form of Ethereum scaling that is not captured by simple metrics such as transactions per second.

In the most recent month of data (April 2024), rollups no longer produce the majority of history. It is unclear whether future history will come mainly from DEXs and DeFi, or whether some new usage pattern will emerge.

What about blobs?

The Dencun hard fork dramatically changed the dynamics of history growth by introducing blobs, which allow rollups to post data cheaply as blobs instead of as history. Figure 4 zooms in on history growth just before and after the Dencun upgrade. The chart is similar to Figure 2, except that each vertical line represents a day instead of a month.

Figure 4: The impact of Dencun on history growth

From this chart, we can draw several key conclusions:

  • History growth from rollups has dropped by about 2/3 since Dencun: most rollups have switched from calldata to blobs, which greatly reduces the amount of history they generate. However, as of April 2024, some rollups have not yet switched from calldata to blobs.
  • Total history growth has dropped by about 1/3 since Dencun: Dencun reduced history growth only for rollups. History growth from other contract categories increased slightly. Even after Dencun, history growth is still 8 times state growth (see the next section for details).

Although blobs have slowed history growth, they are still a new feature of Ethereum. It is unclear at what level the history growth rate will stabilize in the presence of blobs.

How much history growth is acceptable?

Increasing the gas limit will increase the history growth rate. Therefore, proposals to increase the gas limit, such as Pump the Gas, must take into account the relationship between history growth and the hardware bottlenecks of each node.

To determine an acceptable history growth rate, one must first understand how long current node hardware can keep up, in terms of both networking and storage. Networking hardware can likely sustain the status quo indefinitely, since the history growth rate is unlikely to return to its pre-Dencun peak until the gas limit is raised. The storage burden of history, however, increases over time. Under the current storage strategy, every node's storage disk will inevitably fill up with history.

Figure 5 shows the storage burden of Ethereum nodes over time and projects how that burden will grow over the next 3 years. The projection is based on the growth rate in April 2024. This growth rate may rise or fall as usage patterns or the gas limit change in the future.

Figure 5: The size of the history, state, and full node storage burden

From this graph, we can draw several key conclusions:

  • History takes up approximately 3 times as much storage space as state. This gap will widen over time, as history is growing about 8 times faster than state.
  • 1.8 TiB is the critical threshold at which many nodes will be forced to upgrade their storage drives. 2 TB is a common drive size, and it provides only 1.8 TiB of usable space. Note that a TB (10^12 bytes, one trillion) is a different unit than a TiB (1024^4 bytes). For many node operators, the "true" critical threshold is even lower, because after the Merge a validator must run a consensus client alongside the execution client.
  • The critical threshold will be reached in 2 to 3 years. Raising the gas limit by any amount will bring that date forward accordingly. Reaching this threshold will place a significant maintenance burden on node operators and require the purchase of additional hardware (e.g., a $300 NVMe drive).
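
The unit conversion and the 2 to 3 year estimate can be checked with a few lines of arithmetic. The following is a back-of-the-envelope sketch, not the projection method behind Figure 5: it assumes a constant combined growth rate equal to the current history and state rates quoted in this article (19.3 and 2.5 GiB/month) and a starting point of the 1.2 TiB full-node burden also quoted in this article. The function names are illustrative.

```python
TB = 10**12      # terabyte: one trillion bytes
TiB = 1024**4    # tebibyte
GiB = 1024**3    # gibibyte

def usable_tib(drive_tb):
    """Capacity of a drive marketed in TB, expressed in TiB."""
    return drive_tb * TB / TiB

def months_until_full(current_gib, threshold_gib, growth_gib_per_month):
    """Months until storage reaches the threshold at a constant growth rate."""
    return (threshold_gib - current_gib) / growth_gib_per_month

threshold_tib = usable_tib(2)      # a 2 TB drive holds roughly 1.82 TiB
current_gib = 1.2 * 1024           # 1.2 TiB full-node burden (from this article)
growth = 19.3 + 2.5                # history + state growth, GiB/month (assumed constant)

months = months_until_full(current_gib, threshold_tib * 1024, growth)
years = months / 12
print(f"{threshold_tib:.2f} TiB usable, full in about {years:.1f} years")
```

Under these assumptions the result falls inside the 2 to 3 year window; a higher gas limit would raise `growth` and shorten it.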

Unlike state data, historical data is append-only and is accessed much less frequently. It is therefore theoretically possible to store historical data separately from state data, on a cheaper storage medium. Some clients, such as Geth, already make this possible.

Beyond storage capacity, network IO is the other major constraint on history growth. Unlike storage capacity, network IO limits will not cause problems for nodes in the short term, but they will become important for future gas limit increases.

To understand how much history growth the network capacity of a typical Ethereum node can support, one needs to characterize the relationship between history growth and various network health metrics, such as reorg rate, missed slots, missed finality, missed attestations, missed sync committee duties, and block submission latency. Analysis of these metrics is beyond the scope of this article, but more information can be found in previous investigations of consensus layer health. In addition, the Ethereum Foundation's Xatu project has been building public datasets to accelerate this kind of analysis.

How can the history growth problem be solved?

History growth is a much easier problem to solve than state growth. It can be addressed almost entirely by a proposal that is already a candidate for adoption, EIP-4444. This EIP changes each node from storing Ethereum's entire history to storing only the most recent year of history. Once EIP-4444 is implemented, data storage will no longer be a bottleneck for Ethereum scaling, and in the long run it will no longer constrain gas limit increases. EIP-4444 is necessary for the long-term sustainability of the network; without it, history will grow fast enough that node hardware must be upgraded regularly.

Figure 6 shows the effect of EIP-4444 on each node's storage burden over the next 3 years. It is the same as Figure 5, with the addition of lighter lines showing the storage burden once EIP-4444 is implemented.

Figure 6: Impact of EIP-4444 on Ethereum Node storage burden

Some key conclusions can be seen from this graph:

  • EIP-4444 will halve the current storage burden, reducing it from 1.2 TiB to 633 GiB.
  • EIP-4444 will stabilize the storage burden of history. Assuming a constant history growth rate, historical data is discarded at the same rate at which it is generated.
  • After EIP-4444, it would take many years for the node storage burden to return to where it is today. This is because state growth will be the only factor increasing the storage burden, and state grows more slowly than history.
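
The stabilizing effect described in the second point can be illustrated with a small rolling-window simulation. This is a minimal sketch, assuming a constant growth rate of 19.3 GiB/month (the current rate quoted earlier) and, for simplicity, starting from zero stored history. The function name and parameters are illustrative and are not part of EIP-4444, and the sketch does not attempt to reproduce the 633 GiB figure.

```python
from collections import deque

def simulate_history_storage(monthly_growth_gib, months, retention_months=None):
    """Return stored history (GiB) after each month.

    retention_months=None keeps everything (today's behaviour);
    otherwise only the most recent `retention_months` months are kept.
    """
    window = deque()
    stored = []
    for _ in range(months):
        window.append(monthly_growth_gib)
        if retention_months is not None and len(window) > retention_months:
            window.popleft()  # expire the oldest month of history
        stored.append(sum(window))
    return stored

rate = 19.3  # GiB/month, assumed constant
keep_all = simulate_history_storage(rate, months=36)
one_year = simulate_history_storage(rate, months=36, retention_months=12)
print(round(keep_all[-1], 1), round(one_year[-1], 1))
```

Under these assumptions the retained history levels off at 12 months of growth once the window is full, while the keep-everything case continues to climb.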

After EIP-4444 is implemented, history growth will still impose some storage burden, since nodes will store one year of history. However, this burden will be manageable even if Ethereum reaches global scale. Once history preservation methods have proven reliable, the one-year expiry period of EIP-4444 could be shortened to months, weeks, or even less.

How should Ethereum history be preserved?

EIP-4444 raises the question: if history is not stored by Ethereum nodes themselves, how should it be preserved? History plays a central role in the verification, accounting, and analysis of Ethereum, so preserving it is crucial. Fortunately, history preservation is a simple problem that requires only 1 of n data providers to be honest. This is in contrast to the state consensus problem, which requires 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of a historical dataset by 1) replaying all transactions since genesis and 2) checking that these transactions reproduce the same state root as the current tip of the chain.
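
The replay-and-compare check can be sketched with a toy model. Everything here is a simplifying assumption made for illustration: transactions are plain balance transfers rather than EVM executions, and the "state root" is a SHA-256 hash of the sorted balances rather than Ethereum's Merkle Patricia trie root. The point is only that a node holding a trusted root can accept history from any untrusted provider.

```python
import hashlib
import json

def state_root(state):
    """Toy commitment: SHA-256 over the sorted account balances.
    (Ethereum actually uses a Merkle Patricia trie with Keccak-256.)"""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def replay(genesis_state, blocks):
    """Apply every transfer in every block, starting from genesis."""
    state = dict(genesis_state)
    for block in blocks:
        for sender, recipient, amount in block:
            if state.get(sender, 0) < amount:
                raise ValueError("invalid transfer in supplied history")
            state[sender] -= amount
            state[recipient] = state.get(recipient, 0) + amount
    return state

def history_is_authentic(genesis_state, blocks, trusted_root):
    """True if replaying the supplied history reproduces the trusted root."""
    try:
        return state_root(replay(genesis_state, blocks)) == trusted_root
    except ValueError:
        return False

genesis = {"alice": 10, "bob": 0}
honest_history = [[("alice", "bob", 4)], [("bob", "alice", 1)]]
tampered_history = [[("alice", "bob", 5)], [("bob", "alice", 1)]]

# The root a node already trusts from the tip of the chain.
trusted_root = state_root(replay(genesis, honest_history))

print(history_is_authentic(genesis, honest_history, trusted_root))
print(history_is_authentic(genesis, tampered_history, trusted_root))
```

Because the check depends only on the trusted root, a single honest provider is enough: data from dishonest providers simply fails verification.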

There are many ways to preserve history.

  • Torrents / P2P: Torrents are the simplest and most robust approach. Ethereum nodes can periodically package up portions of history and share them as public torrent files. For example, a node might create a new history torrent file every 100,000 blocks. Node clients such as Erigon already do this, though in a non-standardized way. Standardizing the process requires all node clients to use the same data format, the same parameters, and the same P2P network. Nodes would be able to choose whether to participate in this network based on their storage and bandwidth capabilities. The advantage of torrents is that they use long-established ("Lindy") open standards that are already supported by a large number of data tools.
  • Portal Network: The Portal Network is a new network designed specifically for hosting Ethereum data. Its approach is similar to torrents, with some extra features that make data verification easier. The advantage of the Portal Network is that these additional verification layers give light clients the tools to efficiently verify and query shared datasets.
  • Cloud hosting: Cloud storage services such as AWS S3 or Cloudflare R2 offer an inexpensive, high-performance option for preserving history. However, this approach introduces more legal and operational risk, as there is no guarantee that these cloud services will always be willing and able to host crypto data.
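
To make the torrent example concrete, here is a minimal sketch of how a client might decide which block ranges are ready to be packaged. The 100,000-block interval comes from the example above; the function names and the file-naming scheme are hypothetical and do not describe what Erigon or any other client actually does.

```python
def archive_ranges(latest_block, blocks_per_archive=100_000):
    """Yield (first_block, last_block) for every completed archive."""
    start = 0
    while start + blocks_per_archive - 1 <= latest_block:
        yield (start, start + blocks_per_archive - 1)
        start += blocks_per_archive

def archive_name(first_block, last_block):
    """Hypothetical naming scheme, zero-padded so names sort by block number."""
    return f"history-{first_block:09d}-{last_block:09d}"

# With the chain at block 250,000, two archives are complete and the
# third (blocks 200,000 onward) is still being filled.
for first, last in archive_ranges(250_000):
    print(archive_name(first, last))
```

Deterministic ranges like these matter for standardization: if every client cuts archives at the same block boundaries and in the same format, identical files can be seeded by nodes running different clients.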

The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate on specific implementation details so that they can be integrated directly into each node client. In particular, performing a full sync from genesis (instead of a snap sync) will require retrieving history from a history provider instead of from Ethereum nodes. These changes do not technically require a hard fork, so they can be implemented before Ethereum's next hard fork, Pectra.

All of these history preservation methods can also be used by L2s to preserve the blob data they post to mainnet. Compared to history preservation, blob preservation is 1) more difficult, because the total amount of data is larger, and 2) less important, because blobs are not needed to replay mainnet history. However, blob preservation is still necessary for each L2 to replay its own history, so some form of blob preservation is important for the Ethereum ecosystem as a whole. In addition, if L2s develop robust blob storage infrastructure, they may also be able to store historical L1 data with little extra effort.

It is helpful to directly compare the datasets stored by various node configurations before and after EIP-4444. Figure 7 shows the storage burden of different Ethereum node types. State data is accounts and contracts, historical data is blocks and transactions, and archive data is an optional set of data indexes. The byte counts in this table are based on a recent Reth snapshot, but the numbers for other node clients should be roughly comparable.

Figure 7: Storage burden for different Ethereum Node types

In other words:

  • Archive nodes store state data, historical data, and archive data. Archive nodes are used by those who want to easily query historical chain state.
  • Full nodes store only historical data and state data. Most nodes today are full nodes. The storage burden of a full node is about half that of an archive node.
  • Full nodes after EIP-4444 store only state data and the most recent year of historical data. This reduces the node storage burden from 1.2 TiB to 633 GiB and brings the storage space used by history to a steady-state value.
  • Stateless nodes, also known as "light nodes", do not store any of these datasets and can begin verifying immediately at the tip of the chain. This node type becomes possible once Verkle tries or another state commitment scheme is added to Ethereum.

Finally, there are additional EIPs that would limit the history growth rate rather than merely accommodate the current growth rate. This helps the network stay within network IO constraints in the short term and storage constraints in the long term. Although EIP-4444 remains necessary for the long-term sustainability of the network, these other EIPs would help Ethereum scale more efficiently in the future:

  • EIP-7623: Reprices calldata so that certain calldata-heavy transactions become more expensive. Making these usage patterns more expensive will push some of them to switch from calldata to blobs, which will reduce the history growth rate.
  • EIP-4488: Imposes a limit on the total amount of calldata that can be included in each block, placing a tighter cap on the history growth rate.

These EIPs are easier to implement than EIP-4444, so they may serve as short-term options before EIP-4444 goes into production.
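
As a rough illustration of how calldata repricing targets data-heavy transactions, the sketch below models a per-token floor price in the style of EIP-7623. The constants (4 gas per token as the standard price, 10 gas per token as the floor, 4 tokens per non-zero byte) reflect our understanding of the EIP text and have changed between drafts, so treat them as assumptions and check the specification before relying on them. Contract creation costs are ignored.

```python
TX_BASE_GAS = 21_000
STANDARD_TOKEN_COST = 4          # gas per calldata token (assumed from EIP-7623)
TOTAL_COST_FLOOR_PER_TOKEN = 10  # floor gas per calldata token (assumed)

def calldata_tokens(calldata: bytes) -> int:
    """Zero bytes count as 1 token, non-zero bytes as 4."""
    zero_bytes = calldata.count(0)
    return zero_bytes + 4 * (len(calldata) - zero_bytes)

def gas_used(calldata: bytes, execution_gas: int) -> int:
    """Gas for a non-creation transaction under the floor rule."""
    tokens = calldata_tokens(calldata)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = TOTAL_COST_FLOOR_PER_TOKEN * tokens
    return TX_BASE_GAS + max(standard, floor)

# A transaction that only posts data pays the floor price ...
data_heavy = gas_used(bytes([1]) * 1000, execution_gas=0)
# ... while one that mostly executes code pays the standard price.
compute_heavy = gas_used(bytes([1]) * 100, execution_gas=100_000)
print(data_heavy, compute_heavy)
```

The design choice this illustrates is that only transactions whose gas is dominated by calldata hit the floor, so ordinary contract interactions are largely unaffected while bulk data posting becomes more expensive.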

Conclusion

The goal of this article has been to use data to understand 1) how history growth works and 2) how the problem can be solved. Much of the data in this article is difficult to obtain through conventional means, so we wanted to make it public to offer some new insight into the history growth problem.

History growth has not received enough attention as a bottleneck to Ethereum scaling. Even without increasing the gas limit, Ethereum's current history retention practices will force many nodes to upgrade their hardware within a few years. Fortunately, this is not a difficult problem to solve, and a clear solution already exists in EIP-4444. We believe implementation of this EIP should be accelerated to make room for future gas limit increases.
