History in Ethereum consists of all blocks and transactions executed over its lifetime. This data is needed to sync the chain from the genesis block to its current state. Historical growth refers to the accumulation of new blocks and transactions over time.
Figure 1 shows the relationship between historical growth, various protocol metrics, and Ethereum node hardware limitations. Unlike state growth, historical growth is constrained by a different set of hardware limitations. This growth pressures network I/O because new blocks and transactions must be transmitted across the network. It also strains node storage space, as each Ethereum node stores a complete copy of the historical record. If historical growth outpaces these hardware limitations, nodes will no longer be able to reach stable consensus with their peers. For an overview of state growth and other scaling bottlenecks, refer to Part 1 of this series.
Figure 1: Ethereum scaling bottlenecks
Until recently, most of each node’s network throughput was used to transmit historical records (e.g., new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork, and blobs now make up a significant portion of node network activity. However, blobs are not considered part of the historical record because 1) nodes store them for only about 18 days (4,096 epochs) before discarding them, and 2) they are not required to replay the chain from Genesis. Because of (1), blobs do not significantly increase the storage burden on each Ethereum node. We will discuss blobs in more detail later in this article.
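For a sense of scale, here is a minimal sketch of the blob retention arithmetic. It assumes the Dencun parameters (128 KiB per blob, a target of 3 and a maximum of 6 blobs per block, 12-second slots) and the consensus-spec retention constant MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096 epochs. Later forks may change the blob counts, and the helper names are illustrative.

```python
# Dencun-era constants (EIP-4844 / Deneb consensus specs); later forks may differ.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096  # blob retention window, in epochs
BYTES_PER_BLOB = 4096 * 32                    # 4096 field elements x 32 bytes = 128 KiB
GIB = 1024 ** 3


def retention_days() -> float:
    """Length of the blob retention window in days."""
    seconds = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
    return seconds / 86_400


def retained_blob_bytes(blobs_per_block: int) -> int:
    """Blob data held at once at a constant blob rate, if every slot has a block."""
    slots = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH
    return slots * blobs_per_block * BYTES_PER_BLOB


print(f"retention window: {retention_days():.1f} days")                   # 18.2 days
print(f"at target (3 blobs/block): {retained_blob_bytes(3) / GIB:.0f} GiB")  # 48 GiB
print(f"at max (6 blobs/block):    {retained_blob_bytes(6) / GIB:.0f} GiB")  # 96 GiB
```

Because old blobs are pruned as new ones arrive, this amount stays flat over time instead of accumulating the way history does.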
In this article, we will focus on the growth of historical data and its relationship with state growth. Since state and historical growth share some overlapping hardware constraints, they are interrelated issues; addressing one can help mitigate the other.
Figure 2 shows the rate of historical growth over time since Ethereum’s genesis. Each vertical bar represents one month’s growth. The Y-axis represents the number of GBs added to history in that month. Transactions are categorized by their “destination address” and their size is determined using their RLP byte representation. Contracts that cannot be easily identified are classified as “unknown.” The “other” category includes a long tail of smaller categories like infrastructure and gaming.
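The size metric above can be sketched as follows: encode each transaction’s fields with RLP, take the byte length, and sum by the category of the destination address. This is a minimal encoder written for illustration, not the article’s pipeline. The address-to-category map is a hypothetical input. Typed transactions (EIP-2718) are serialized as a one-byte type prefix followed by an RLP payload, and this sketch measures only the RLP part.

```python
from collections import defaultdict


def _length_prefix(length: int, offset: int) -> bytes:
    """RLP header: short form below 56 bytes, otherwise length-of-length form."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for byte strings, non-negative ints, and nested lists."""
    if isinstance(item, int):
        # Integers become big-endian bytes with no leading zeros (0 -> empty string).
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            return data  # a single byte below 0x80 is its own encoding
        return _length_prefix(len(data), 0x80) + data
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def history_bytes_by_category(txs, labels):
    """Sum RLP sizes of transactions, grouped by the label of their destination address.

    txs: iterable of (to_address, fields) pairs; fields is the list to encode.
    labels: hypothetical mapping from address to category; unlabeled -> "unknown".
    """
    totals = defaultdict(int)
    for to_address, fields in txs:
        category = labels.get(to_address, "unknown")
        totals[category] += len(rlp_encode(fields))
    return dict(totals)
```

Using the encoded length rather than, say, gas used ties each category directly to the bytes it adds to every node’s copy of history.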
Several key conclusions can be drawn from this chart:
The amount of historical data generated by each contract category reveals how Ethereum usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This uses the same data as Figure 2, normalized to 100%.
The data reveals four distinct eras of Ethereum usage patterns:
Early Era (Purple): In Ethereum’s initial years, there was minimal on-chain activity. Most of these early contracts are now difficult to identify and are labeled as “unknown” in the chart.
ERC-20 Era (Green): The ERC-20 standard was finalized in late 2015 but did not gain significant traction until 2017 and 2018. By 2019, ERC-20 contracts became the largest category in historical growth.
DEX/DeFi Era (Brown): DEX and DeFi contracts appeared on-chain as early as 2016 and began gaining attention in 2017. However, they did not become the largest historical category until the DeFi Summer of 2020. DeFi and DEX contracts peaked at over 50% of historical growth at various times in 2021 and 2022.
Rollup Era (Gray): In early 2023, L2 rollups started consistently executing more transactions than mainnet. Around the same time, their contracts began generating a large share of historical data, accounting for about two-thirds of Ethereum’s historical growth in the months leading up to Dencun.
Each era represents increasingly complex usage patterns on Ethereum. This growing complexity can itself be seen as a form of Ethereum scaling, one that simple metrics like transactions per second do not capture.
In the most recent data (April 2024), rollups no longer generate the majority of historical records. It is unclear whether future historical growth will be dominated by DEX and DeFi or if new usage patterns will emerge.
The introduction of blobs in the Dencun hard fork significantly altered the dynamics of historical growth by allowing rollups to post their data in inexpensive blobs instead of in call data, which becomes part of the historical record. Figure 4 zooms in on the historical growth rate around the date of the Dencun upgrade. This chart is similar to Figure 2, but each vertical bar represents one day instead of one month.
Several key conclusions can be drawn from this chart:
Historical Growth from Rollups Has Decreased by About Two-Thirds Since Dencun: Most rollups have shifted from using call data to blobs, significantly reducing the amount of historical data they generate. However, as of April 2024, some rollups have yet to switch from call data to blobs.
Total Historical Growth Has Decreased by About One-Third Since Dencun: Dencun has primarily reduced the historical growth from rollups. Historical growth from other contract categories has slightly increased. Even after Dencun, historical growth remains about eight times that of state growth (details in the next section).
Despite this reduction, blobs are still a new addition to Ethereum, and it is currently unclear where historical growth will stabilize now that blobs are available.
Increasing the gas limit will raise the historical growth rate. Therefore, proposals to raise the gas limit (such as Pump the Gas) must consider the relationship between historical growth and hardware bottlenecks on each node.
To determine an acceptable historical growth rate, it helps to first examine how long the network and storage hardware of today’s nodes can sustain the status quo. Network hardware can likely sustain it indefinitely, because without a gas limit increase, historical growth is unlikely to return to pre-Dencun levels. Storage is different: the storage burden of history only increases over time, and under the current retention policy, each node’s drive will eventually fill up with history.
Figure 5 illustrates the storage burden of Ethereum nodes over time and also predicts how this burden may grow over the next 3 years. Predictions are made using growth rates from April 2024. This rate may increase or decrease in the future due to changes in usage patterns or gas limits.
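Holding a single month’s growth rate constant, as the forecast above does, amounts to a linear extrapolation. A minimal sketch of that kind of projection follows, with state and history tracked separately. The helper names are hypothetical, and the inputs in the usage line are placeholders, not the article’s measured rates.

```python
import math


def project_storage_gib(state_gib, history_gib, state_rate, history_rate, months):
    """Linear projection of (month, state, history, total) storage in GiB.

    Rates are GiB added per month and are held constant for the whole horizon.
    """
    rows = []
    for m in range(months + 1):
        state = state_gib + state_rate * m
        history = history_gib + history_rate * m
        rows.append((m, state, history, state + history))
    return rows


def months_until(total_gib, monthly_rate_gib, threshold_gib):
    """Months until total storage reaches threshold_gib at a constant growth rate."""
    if total_gib >= threshold_gib:
        return 0.0
    if monthly_rate_gib <= 0:
        return math.inf
    return (threshold_gib - total_gib) / monthly_rate_gib


# Placeholder inputs, not measured values:
for month, state, history, total in project_storage_gib(100, 300, 1, 8, 2):
    print(month, state, history, total)
```

Any change in usage patterns or gas limits changes the rates, so a projection like this is only as good as its constant-rate assumption.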
Several key conclusions can be drawn from this chart:
Storage Space Used by History is About Three Times That of State: This disparity will increase over time as the growth rate of history is approximately eight times that of state.
Critical Threshold Around 1.8 TiB: Many nodes will be forced to upgrade their storage drives at this point. A 2 TB drive, a common size, provides only about 1.8 TiB of usable space. Note that TB (terabytes, 1000^4 bytes) and TiB (tebibytes, 1024^4 bytes) are different units. For many node operators, the “real” critical threshold is even lower, because since the Merge, validators must run a consensus client alongside an execution client.
Threshold Reached in 2-3 Years: Any increase in the gas limit will accelerate this timeline. Reaching this threshold will impose a significant maintenance burden on node operators, who will need to buy additional hardware, such as a $300 NVMe drive.
Separate Storage for Historical Data: Unlike state data, historical data is append-only and accessed much less frequently. Therefore, it could theoretically be stored separately from state data on cheaper storage media. Some clients, like Geth, already support this separation.
Network IO as a Hardware Limitation: Besides storage capacity, network IO is another major hardware constraint for historical growth. Unlike storage capacity, network IO limits won’t pose immediate issues for nodes but will become significant for future gas limit increases.
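The unit mismatch behind the 1.8 TiB threshold is easy to check: drives are sold in decimal terabytes, while disk usage is typically reported in binary units. A small sketch, with illustrative helper names:

```python
TB = 1000 ** 4   # terabyte: decimal unit used by drive manufacturers
TIB = 1024 ** 4  # tebibyte: binary unit commonly reported by operating systems
GIB = 1024 ** 3


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB


def headroom_gib(drive_tb: float, used_gib: float) -> float:
    """Free space in GiB left on a drive_tb-terabyte drive after used_gib is consumed."""
    return drive_tb * TB / GIB - used_gib


print(f"2 TB drive = {tb_to_tib(2):.2f} TiB")  # 2 TB drive = 1.82 TiB
```

The roughly 9% gap between the two units is what turns a nominal 2 TB drive into about 1.8 TiB of usable space.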
To understand how much historical growth the network capacity of a typical Ethereum node can support, one needs to characterize the relationship between historical growth and various network health metrics, such as reorganization rate, missed slots, lack of finality, missed attestations, missed sync committee duties, and block proposal delays. That analysis is beyond the scope of this article, but previous investigations into consensus layer health have examined these metrics [1] [2] [3] [4]. Additionally, the Ethereum Foundation’s Xatu project has been building public datasets to facilitate such analyses.
Historical growth is an easier problem to address than state growth, and the proposed EIP-4444 almost completely solves it. This EIP changes the requirement for each node from retaining the entire Ethereum history to retaining only one year of history. Once EIP-4444 is implemented, data storage will no longer be a bottleneck for Ethereum scaling, even with significant long-term gas limit increases. EIP-4444 is essential for the network’s long-term sustainability: without it, historical data will grow quickly enough to force regular hardware upgrades on node operators.
Figure 6 shows how EIP-4444 would affect the storage burden of each node over the next 3 years. It is the same as Figure 5, with finer lines added to show the storage burden after EIP-4444 is implemented.
Several key conclusions can be drawn from this chart:
EIP-4444 Will Halve the Current Storage Burden: The storage burden will decrease from 1.2 TiB to 633 GiB.
EIP-4444 Will Stabilize the Historical Storage Burden: Assuming a constant rate of historical growth, historical data will be discarded at the same rate it is generated.
Post-EIP-4444, It Will Take Many Years for the Storage Burden to Reach Today’s Level: This is because state growth, which is slower than historical growth, will be the only factor increasing storage burden.
EIP-4444 Will Still Impose Some Storage Burden Due to One Year of Historical Data: However, this burden will be manageable even if Ethereum scales globally. Once the method of handling historical data proves reliable, the one-year retention period in EIP-4444 could be shortened to a few months, weeks, or even less.
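The stabilizing effect described above can be illustrated with a rolling-window model: each month’s new history enters the window, and once the retention period is full, the oldest month is dropped. This is a simplification at monthly granularity that assumes a 12-month window. The helper names and the constant 20 GiB/month input are hypothetical.

```python
from collections import deque
from itertools import accumulate


def retained_history_gib(monthly_additions_gib, retention_months=12):
    """History held each month when only the last retention_months are kept.

    The deque's maxlen evicts the oldest month automatically once the window is full.
    """
    window = deque(maxlen=retention_months)
    totals = []
    for added in monthly_additions_gib:
        window.append(added)
        totals.append(sum(window))
    return totals


def full_history_gib(monthly_additions_gib):
    """History held each month under the current keep-everything policy."""
    return list(accumulate(monthly_additions_gib))


additions = [20] * 36  # hypothetical constant growth, 3 years
print(retained_history_gib(additions)[-1], full_history_gib(additions)[-1])  # 240 720
```

With a constant rate, the pruned total plateaus at retention × rate after the first year, while the unpruned total keeps growing linearly.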
EIP-4444 raises a question: if Ethereum nodes no longer preserve history, how should it be preserved? History is crucial for Ethereum’s verification, accounting, and analysis, so it must be preserved. Fortunately, preserving history is straightforward: it requires only a 1-of-N honesty assumption (a single honest data provider suffices), whereas the state consensus problem needs 1/3 to 2/3 of participants to be honest. Node operators can verify the authenticity of any historical dataset by 1) replaying all transactions from Genesis and 2) checking that these transactions reproduce the same state root as the current chain tip.
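The replay check can be illustrated with a toy model: balances stand in for Ethereum state, simple transfers stand in for transactions, and a SHA-256 hash of canonical JSON stands in for the state root. Ethereum actually commits to state with a Merkle-Patricia trie and executes transactions in the EVM, so this sketch only captures the logic of the check. All names are hypothetical.

```python
import hashlib
import json


def toy_state_root(state):
    """Toy commitment to balances: SHA-256 of canonical JSON (not a Merkle-Patricia trie)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_transfer(state, tx):
    """Apply a (sender, recipient, amount) transfer in place, rejecting invalid ones."""
    sender, recipient, amount = tx
    if amount < 0 or state.get(sender, 0) < amount:
        raise ValueError(f"invalid transfer {tx}")
    state[sender] = state.get(sender, 0) - amount
    state[recipient] = state.get(recipient, 0) + amount


def verify_history(genesis, txs, trusted_root):
    """Replay txs from genesis and compare against a root trusted from the chain tip."""
    state = dict(genesis)
    try:
        for tx in txs:
            apply_transfer(state, tx)
    except ValueError:
        return False  # a dataset that cannot even be replayed is not authentic
    return toy_state_root(state) == trusted_root
```

The trust model follows from this structure: the node only needs the trusted root from consensus, so any single provider can supply the history, and a tampered copy fails the comparison.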
There are multiple methods to preserve history, each of which should be deployed in parallel to maximize preservation chances:
Torrents / P2P: Torrents are the simplest and most robust method. Ethereum nodes can periodically package parts of history and share them as public torrent files. For example, a node might create a new history torrent file every 100,000 blocks. Some node clients, like Erigon, already perform this process in a non-standardized way. To standardize this process, all node clients must use the same data format, parameters, and P2P network. Nodes can choose to participate in this network based on their storage and bandwidth capacity. The advantage of torrents is the use of mature open standards supported by a large data tools ecosystem.
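The example of one history file every 100,000 blocks can be sketched as a deterministic mapping from block numbers to chunk files. The file-naming scheme and helper names are hypothetical. The point is that every client must agree on the same chunk size and format; otherwise, nodes produce different files for the same blocks and cannot seed each other’s torrents.

```python
CHUNK_SIZE = 100_000  # blocks per history file, the example interval from the text


def chunk_index(block_number: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Index of the chunk that contains block_number."""
    return block_number // chunk_size


def chunk_range(index: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Inclusive (first, last) block numbers covered by a chunk."""
    first = index * chunk_size
    return first, first + chunk_size - 1


def chunk_filename(index: int, chunk_size: int = CHUNK_SIZE) -> str:
    first, last = chunk_range(index, chunk_size)
    return f"history-{first:09d}-{last:09d}.torrent"  # hypothetical naming scheme


def sealed_chunks(head_block: int, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Chunks whose last block is at or below head_block, i.e. complete and publishable."""
    return list(range((head_block + 1) // chunk_size))
```

Only sealed chunks are published, so every node that reaches the same head produces byte-identical files for them.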
Portal Network: The Portal Network is a new network designed specifically to host Ethereum data. This approach is similar to torrents but provides additional features to make data verification easier. The advantage of the Portal Network is that these extra layers of verification offer lightweight clients efficient verification and query utilities for shared datasets.
Cloud Hosting: Cloud storage services like AWS S3 or Cloudflare R2 offer cheap and high-performance options for preserving history. However, this method comes with more legal and business operation risks, as these cloud services might not always be willing or able to host cryptocurrency data.
The remaining implementation challenges are more social than technical. The Ethereum community needs to coordinate on specific implementation details so they can be directly integrated into every node client. Notably, fully syncing from Genesis (instead of snapshot syncing) will require retrieving history from historical data providers rather than Ethereum nodes. These changes do not require a hard fork and can be implemented before Ethereum’s next hard fork, Pectra.
L2s can also use all of these methods to preserve the blob data they publish to mainnet. Compared with L1 history, blob preservation is 1) more challenging, because the total data volume is larger, and 2) less critical, because blobs are not required to replay mainnet history. However, each L2 needs its blobs to replay its own history, so some form of blob preservation is crucial for the Ethereum ecosystem as a whole. Moreover, if L2s develop robust blob storage infrastructure, they can easily store L1 historical data as well.
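The data volume an archive of blobs must absorb can be estimated with a rough sketch. It assumes Dencun parameters (128 KiB per blob, a target of 3 and a maximum of 6 blobs per block, 12-second slots) and assumes that every slot has a block, so the results are upper bounds for each blob rate. The helper name is illustrative.

```python
BYTES_PER_BLOB = 4096 * 32      # 128 KiB per blob (Dencun)
SLOTS_PER_DAY = 86_400 // 12    # 7,200 slots at 12 s per slot
TIB = 1024 ** 4


def blob_bytes_per_year(blobs_per_block: int, days: int = 365) -> int:
    """Blob data published per year if every slot carries blobs_per_block blobs."""
    return blobs_per_block * BYTES_PER_BLOB * SLOTS_PER_DAY * days


for label, blobs in (("target", 3), ("max", 6)):
    print(f"{label}: {blob_bytes_per_year(blobs) / TIB:.2f} TiB/year")
# target: 0.94 TiB/year
# max: 1.88 TiB/year
```

Unlike a node, which keeps only the last few weeks of blobs, a full blob archive accumulates this amount every year.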
A direct comparison of the datasets stored by various node configurations before and after EIP-4444 is helpful. Figure 7 shows the storage burden of Ethereum node types. State data includes accounts and contracts, historical data includes blocks and transactions, and archive data is a set of optional data indexes. The byte counts in the table are based on recent reth snapshots, but the figures should be roughly comparable across other node clients.
Figure 7: Storage burden of Ethereum node types
Finally, other ecosystem proposals aim to limit the historical growth rate rather than merely adapt to it. These help keep growth within network IO limits in the short term and storage limits in the long term. While EIP-4444 is essential for the network’s long-term sustainability, these EIPs will help Ethereum scale more efficiently in the future:
EIP-7623: This proposal suggests repricing call data so that transactions with excessive call data become more expensive. Making these usage patterns more costly will encourage some to switch from call data to blobs, thereby reducing the historical growth rate.
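A minimal sketch of the repricing follows. It assumes the parameters of EIP-7623 as adopted for the Pectra upgrade: STANDARD_TOKEN_COST = 4 and TOTAL_COST_FLOOR_PER_TOKEN = 10, with each zero byte counting as 1 token and each nonzero byte as 4. Earlier drafts may have used different values. For simplicity, the sketch ignores contract creation, access lists, and refunds.

```python
TX_BASE_COST = 21_000
STANDARD_TOKEN_COST = 4          # gas per calldata token under standard pricing
TOTAL_COST_FLOOR_PER_TOKEN = 10  # gas per token under the EIP-7623 floor


def tokens_in_calldata(calldata: bytes) -> int:
    """Zero bytes count as 1 token, nonzero bytes as 4 tokens."""
    zero_bytes = calldata.count(0)
    return zero_bytes + (len(calldata) - zero_bytes) * 4


def gas_used_legacy(calldata: bytes, execution_gas: int) -> int:
    """Pre-EIP-7623 pricing: 4 gas per zero byte, 16 gas per nonzero byte."""
    return TX_BASE_COST + STANDARD_TOKEN_COST * tokens_in_calldata(calldata) + execution_gas


def gas_used_eip7623(calldata: bytes, execution_gas: int) -> int:
    """EIP-7623: charge the larger of standard pricing and the calldata floor."""
    tokens = tokens_in_calldata(calldata)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = TOTAL_COST_FLOOR_PER_TOKEN * tokens
    return TX_BASE_COST + max(standard, floor)


data_heavy = b"\x01" * 1000  # 1,000 nonzero bytes, no execution
print(gas_used_legacy(data_heavy, 0), gas_used_eip7623(data_heavy, 0))  # 37000 61000
```

The floor only binds when a transaction spends little gas on execution relative to its call data, so ordinary transactions pay the same as before while data-posting transactions become more expensive.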
EIP-4488: This proposal lowers the gas cost of call data but caps the total amount of call data that can be included in each block, placing a stricter bound on the rate of historical growth.
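The per-block cap can be sketched as a simple validity check. This assumes the parameter values in the EIP-4488 draft: a base limit of 1,048,576 bytes of call data per block plus a 300-byte stipend per transaction. The helper names are illustrative, and the daily bound assumes every slot has a block.

```python
BASE_MAX_CALLDATA_PER_BLOCK = 1_048_576  # bytes
CALLDATA_PER_TX_STIPEND = 300            # extra bytes allowed per transaction


def max_block_calldata(tx_count: int) -> int:
    """Total call data a block with tx_count transactions may carry."""
    return BASE_MAX_CALLDATA_PER_BLOCK + CALLDATA_PER_TX_STIPEND * tx_count


def block_calldata_ok(calldata_sizes: list[int]) -> bool:
    """True if the block's total call data fits under the cap."""
    return sum(calldata_sizes) <= max_block_calldata(len(calldata_sizes))


def max_calldata_per_day(tx_count_per_block: int, slots_per_day: int = 7_200) -> int:
    """Upper bound on call data added to history per day if every slot has a full block."""
    return max_block_calldata(tx_count_per_block) * slots_per_day
```

Unlike a pure repricing, a hard cap bounds worst-case growth regardless of how much users are willing to pay.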
These EIPs are easier to implement than EIP-4444 and can serve as interim measures before EIP-4444 is ready for production.
The goal of this article is to provide a data-driven understanding of how historical growth operates and how to address this issue. Much of the data presented in this article has traditionally been difficult to access, so we aim to offer novel insights into the historical growth problem.
Historical growth has not received sufficient attention as a bottleneck to Ethereum’s scalability. Even without gas limit increases, the current practice of retaining all history on every node would force many nodes to upgrade their hardware within a few years. Fortunately, this is not a particularly difficult problem to solve, and EIP-4444 outlines a clear solution. We believe implementation of this EIP should be accelerated to make room for future gas limit increases.
If you are interested in Ethereum scalability research, please contact storm@paradigm.xyz and georgios@paradigm.xyz. We would love to hear your perspectives on this issue and explore potential collaborations. The data and code used in this article can be found on GitHub.
Big thanks to Thomas Thiery, Tim Beiko, Toni Wahrstaetter, Oliver Nordbjerg, and Roman Krasiuk for their review and feedback. Thank you to Achal Srinivasan for the graphics in Figure 1 and Figure 7.