Forward the Original Title: "How Can AI Revolutionize Ethereum? Another Perspective on 'AI+Blockchain'"
Over the past year, as generative AI has repeatedly exceeded expectations, a wave of AI-driven productivity revolution has swept through the cryptocurrency community. Many AI-concept projects have minted fortunes in the secondary market, and more and more developers are starting to build their own "AI+Crypto" projects.
However, on closer inspection, these projects exhibit severe homogeneity: most focus only on improving "production relations," such as organizing computing power through decentralized networks or building a "decentralized Hugging Face." Few attempt genuine integration and innovation at the level of the underlying technology. We believe the reason lies in a "domain bias" between the AI and blockchain fields: despite their extensive intersection, few people truly understand both. AI developers, for example, find it challenging to understand Ethereum's technical implementation and the historical state of its infrastructure, let alone propose in-depth optimizations.
Taking machine learning (ML), the most basic branch of AI, as an example, it is a technology where machines can make decisions based on data without explicit programming instructions. Machine learning has shown tremendous potential in data analysis and pattern recognition and has become commonplace in Web 2. However, due to its early limitations, even in the forefront of blockchain technology innovation like Ethereum, its architecture, network, and governance mechanisms have not yet effectively utilized machine learning as a tool to solve complex problems.
“Great innovations often arise from interdisciplinary fields.” The purpose of writing this article is to help AI developers better understand the blockchain world and provide new ideas for developers in the Ethereum community. In this article, we first introduce the technical implementation of Ethereum and then propose a solution to apply machine learning, a fundamental AI algorithm, to the Ethereum network to enhance its security, efficiency, and scalability. We hope that this case serves as a starting point to present some different perspectives from the market and stimulate more innovative cross-combinations of “AI+Blockchain” in the developer ecosystem.
Basic data structure
The essence of blockchain is a chain of blocks, and the key to distinguishing chains lies in the chain configuration, an essential part of any blockchain genesis. For Ethereum, chain configuration is used to differentiate between different chains within Ethereum, identifying important upgrade protocols and milestone events. For instance, the DAOForkBlock signifies Ethereum’s hard fork height after the DAO attack, while ConstantinopleBlock marks the block height for the Constantinople upgrade. For major upgrades containing numerous improvement proposals, special fields are set to identify the corresponding block heights. Additionally, Ethereum comprises various test networks and the main network, uniquely identified by ChainID to denote their respective network ecosystems.
The genesis block serves as the zeroth block of the entire blockchain, directly or indirectly referenced by other blocks. Therefore, nodes must load the correct genesis block information upon initialization, with no arbitrary modifications allowed. The genesis block’s configuration information includes the aforementioned chain configuration, along with additional details such as relevant mining rewards, timestamps, difficulty, and gas limits. It’s worth noting that Ethereum’s consensus mechanism has shifted from proof-of-work mining to proof-of-stake.
Ethereum accounts are divided into external accounts and contract accounts. External accounts are controlled by a unique private key, while contract accounts lack private key control and can only be operated by calling contract code execution through external accounts. Each account corresponds to a leaf node in the Ethereum world state, storing the account’s state (various account information and code details).
Transactions: As a decentralized platform primarily for transactions and contracts, Ethereum's blocks consist of packaged transactions and related metadata. A block is divided into two parts: the block header and the block body. The block header contains the evidence linking all blocks into a chain, including the previous block hash, the state root committing to the entire Ethereum world state, the transaction root, the receipt root, and additional data such as difficulty and nonce. The block body stores the transaction list and the list of uncle block headers (since Ethereum transitioned to proof-of-stake, uncle block references no longer exist).
Transaction receipts provide the results of transaction execution and additional information, which cannot be directly obtained by examining the transaction itself. Specifically, they contain consensus content, transaction information, and block information, indicating whether the transaction processing was successful and providing transaction logs and gas consumption details. Analyzing the information in receipts helps debug smart contract code and optimize gas consumption, while providing confirmation that the transaction has been processed by the network and enabling the viewing of transaction results and impacts.
In Ethereum, gas fees can be understood simply as transaction fees. Sending tokens, executing smart contracts, transferring Ether, or performing any other on-chain operation consumes Ethereum's computational resources, and you must pay gas to incentivize the network to process your transaction. Under the legacy pricing model the fee can be understood as Fee = Gas Used * Gas Price, where the price per unit of gas is set by the transaction initiator and largely determines how quickly the transaction is included in a block; these fees go to validators (formerly miners under proof-of-work), though under EIP-1559 the base-fee portion is burned and only the priority tip reaches the block proposer. Setting the gas price too low may leave a transaction unexecuted, and a gas limit must also be set as an upper bound to avoid unexpected gas consumption caused by errors in smart contracts.
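The legacy formula above can be sketched in a few lines. This is an illustrative helper, not a client implementation; the gas values are examples (a plain ETH transfer consumes 21,000 gas), and real fees are reported in wei via the transaction receipt.

```python
# Illustrative sketch of the legacy fee formula: Fee = Gas Used * Gas Price.

GWEI = 10**9     # 1 gwei = 1e9 wei
ETHER = 10**18   # 1 ether = 1e18 wei

def legacy_fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Total fee in wei under legacy (pre-EIP-1559) pricing."""
    return gas_used * gas_price_gwei * GWEI

# A plain ETH transfer consumes 21,000 gas; assume a 30 gwei gas price.
fee = legacy_fee_wei(gas_used=21_000, gas_price_gwei=30)
print(fee / ETHER)  # 0.00063 (fee expressed in ether)
```

Under EIP-1559 the same arithmetic applies per unit of gas, but the effective price is split into a burned base fee plus a priority tip.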
Transaction pool
In Ethereum, there are a large number of transactions, and compared to centralized systems, the throughput of decentralized systems in terms of transactions per second is significantly lower. With a large number of transactions entering nodes, nodes need to maintain a transaction pool to manage these transactions correctly. Transaction broadcasting occurs through peer-to-peer communication. Specifically, a node will broadcast executable transactions to its neighboring nodes, which will further propagate the transaction to their neighboring nodes, allowing a transaction to spread throughout the Ethereum network within 6 seconds.
Transactions in the transaction pool are divided into executable and non-executable transactions. Executable transactions, which have higher priority, are ready to be executed and included in blocks, whereas transactions generally enter the pool as non-executable and are promoted to executable later. Executable and non-executable transactions are recorded in the pending container and the queue container, respectively.
Additionally, the transaction pool maintains a list of local transactions. Local transactions have various advantages, including higher priority, immunity to transaction volume restrictions, and immediate reloading into the transaction pool upon node restart. The local persistence storage of local transactions is achieved through a journal, ensuring that unfinished local transactions are not lost and are periodically updated.
Before a transaction is queued, its validity is verified through various checks, including defenses against DOS attacks, rejection of negative-value transactions, and verification of the transaction's gas limit. The pool's structure can be summarized simply as queue + pending (together holding all transactions). After the validity checks, further checks follow: whether the transaction queue has reached its limit, and whether a remote transaction (a non-local one) is the lowest-priced in the pool, in which case it replaces the current lowest-priced transaction. For replacing executable transactions, only a new transaction that raises the fee by at least 10% is allowed to displace one waiting to be executed, and the displaced transaction is demoted to the non-executable set. In addition, invalid and over-limit transactions are removed during pool maintenance, and eligible transactions are promoted.
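The replace-by-fee rule described above can be sketched as a small predicate. The 10% bump matches the commonly cited default; the function and constant names here are illustrative, not taken from any client's source.

```python
# Sketch of the replacement rule: a transaction with the same sender and
# nonce may displace a pending one only if it raises the gas price by at
# least a configured minimum bump (10% in the description above).

PRICE_BUMP_PERCENT = 10

def may_replace(old_gas_price: int, new_gas_price: int,
                bump_percent: int = PRICE_BUMP_PERCENT) -> bool:
    """Return True if the new transaction prices out the old one."""
    threshold = old_gas_price * (100 + bump_percent) // 100
    return new_gas_price >= threshold

print(may_replace(100, 109))  # False: only a 9% bump
print(may_replace(100, 110))  # True: meets the 10% minimum
```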
Consensus Mechanism
In its early stages, Ethereum's consensus was based on hash computation against a difficulty target: a block was valid only if its hash satisfied the target difficulty. Since Ethereum's consensus algorithm has transitioned from Proof of Work (PoW) to Proof of Stake (PoS), I will briefly outline the PoS algorithm here. Ethereum completed the beacon chain merge in September 2022, fully implementing PoS. In PoS-based Ethereum, block time is a stable 12 seconds. Users stake their ETH to gain the right to become validators, and a random selection process among the participating stakers chooses the active validator set. In each epoch, which consists of 32 slots, one validator is selected as the proposer for each slot, while the other validators assigned to that slot serve as a committee that validates the proposed block's legitimacy and judges the legitimacy of blocks from the previous epoch. PoS significantly stabilizes and speeds up block production while largely avoiding wasted computing resources.
Signature Algorithm
Ethereum adopts the same signature algorithm standard as Bitcoin, based on the secp256k1 curve. Specifically, the signature algorithm is ECDSA, where the signature is computed over the hash of the original message. The signature consists of the components R+S+V. Each signing operation introduces a random nonce, and R+S is the raw ECDSA output. The trailing field V, known as the recovery field, indicates which candidate point recovers the correct public key from the message and signature, since the R value alone can correspond to multiple coordinate solutions on the elliptic curve.
The entire process can be summarized as follows: the transaction data and the signer's relevant information are RLP-encoded and hashed, and the signature is produced by signing that hash with the private key via ECDSA over the secp256k1 curve. Finally, the signature is attached to the transaction data to obtain a signed transaction that can be broadcast.
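The signing flow above can be made concrete with a pure-Python, educational ECDSA over secp256k1. This is deliberately simplified: it is not constant-time, omits RLP encoding, and uses sha256 as a stand-in for the keccak256 hash Ethereum actually applies; the domain parameters are the standard secp256k1 constants.

```python
# Educational ECDSA over secp256k1 (the curve named above). NOT for
# production: not constant-time; sha256 stands in for keccak256 over RLP.
import hashlib
import random

# secp256k1 domain parameters
P  = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N  = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G  = (Gx, Gy)

def point_add(a, b):
    """Elliptic-curve point addition; None is the point at infinity."""
    if a is None: return b
    if b is None: return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = (3 * a[0] * a[0]) * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def sign(priv, msg):
    z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % N
    while True:
        k = random.randrange(1, N)            # per-signature random nonce
        x, _ = scalar_mult(k, G)
        r = x % N
        if r == 0: continue
        s = (pow(k, -1, N) * (z + r * priv)) % N
        if s == 0: continue
        return r, s                           # Ethereum appends V (recovery id)

def verify(pub, msg, sig):
    r, s = sig
    z = int.from_bytes(hashlib.sha256(msg).digest(), "big") % N
    w = pow(s, -1, N)
    u1, u2 = (z * w) % N, (r * w) % N
    x, _ = point_add(scalar_mult(u1, G), scalar_mult(u2, pub))
    return x % N == r

priv = random.randrange(1, N)
pub = scalar_mult(priv, G)
sig = sign(priv, b"example transaction bytes")
print(verify(pub, b"example transaction bytes", sig))  # True
```

Production code would additionally compute the V recovery byte so the public key (and hence the sender address) can be recovered from the signature alone.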
Ethereum’s data structure not only relies on traditional blockchain technology but also incorporates the Merkle Patricia Tree (MPT), also known as the Merkle Compressed Prefix Tree, for efficient storage and verification of large amounts of data. MPT combines the cryptographic hash function of the Merkle tree and the key path compression feature of the Patricia tree, providing a solution that ensures data integrity and supports rapid searches.
Merkle Patricia Trie (MPT)
In Ethereum, MPT is used to store all state and transaction data, ensuring that any change to the data is reflected in the root hash of the tree. This means that by verifying the root hash, the integrity and accuracy of the data can be proven without checking the entire database. MPT consists of four types of nodes: leaf nodes, extension nodes, branch nodes, and empty nodes, which together form a tree capable of adapting to dynamic data changes. Whenever data is updated, MPT reflects these changes by adding, deleting, or modifying nodes, while updating the root hash of the tree. Since each node is identified by the hash of its contents, any minor change to the data leads to a completely different root hash, ensuring the security and consistency of the data. Additionally, the design of MPT supports "light client" verification: a node can verify the existence or status of specific information by storing only the root hash of the tree and the necessary path nodes, greatly reducing storage and processing requirements.
Through MPT, Ethereum not only achieves efficient management and rapid access to data but also ensures the security and decentralization of the network, supporting the operation and development of the entire Ethereum network.
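The light-client idea above is easiest to see in a simplified binary Merkle tree: a verifier holding only the root hash can check one leaf using a short path of sibling hashes. Ethereum's real MPT uses keccak256 over RLP-encoded nodes with key-path compression; the sha256 hash and binary layout below are simplified stand-ins.

```python
# Simplified binary Merkle tree illustrating light-client verification:
# one leaf is proven against the root via a path of sibling hashes.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes (with a flag: sibling is on the left) up to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1                    # sibling sits next to the node
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(root, leaf, proof):
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
root = merkle_root(txs)
proof = merkle_proof(txs, 2)
print(verify_proof(root, b"tx-c", proof))  # True
```

A proof is logarithmic in the number of leaves, which is why a light client can verify inclusion without downloading the full data set.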
State Machine
The core architecture of Ethereum incorporates the concept of a state machine, where the Ethereum Virtual Machine (EVM) is the runtime environment for executing all smart contract code, and Ethereum itself can be viewed as a globally shared state transition system. The execution of each block can be seen as a state transition process, moving from one globally shared state to another. This design ensures the consistency and decentralization of the Ethereum network and makes the execution results of smart contracts predictable and tamper-proof.
In Ethereum, the state refers to the current information of all accounts, including the balance of each account, stored data, and the code of smart contracts. Whenever a transaction occurs, the EVM calculates and transforms the state based on the transaction content, and this process is efficiently and securely recorded through MPT. Each state transition not only changes the account data but also leads to the updating of MPT, reflected in the change of the root hash of the tree.
The relationship between EVM and MPT is crucial because MPT provides the assurance of data integrity for Ethereum’s state transitions. When the EVM executes transactions and changes account states, relevant MPT nodes are updated to reflect these changes. Since each node of MPT is linked through hashes, any modification to the state will cause a change in the root hash, which is then included in the new block, ensuring the consistency and security of the entire Ethereum state. Now, let’s introduce the Ethereum Virtual Machine (EVM).
EVM
The Ethereum Virtual Machine (EVM) is the fundamental component responsible for executing smart contracts and facilitating state transitions within the Ethereum network. It is thanks to the EVM that Ethereum can be envisioned as a world computer. The EVM is Turing complete, which means that smart contracts deployed on Ethereum can execute arbitrarily complex logic computations. The introduction of the gas mechanism in Ethereum prevents scenarios like infinite loops within contracts, ensuring network stability and security.
At a more technical level, the EVM is a stack-based virtual machine that executes smart contracts using Ethereum-specific bytecode. Developers typically write smart contracts in high-level languages such as Solidity, which are then compiled into bytecode understandable by the EVM for execution. The EVM is the key innovation of the Ethereum blockchain, supporting not only the execution of smart contracts but also providing a solid foundation for the development of decentralized applications (DApps). Through the EVM, Ethereum is shaping a decentralized, secure, and open digital future.
Figure 1: Historical Review of Ethereum
Smart contracts are computer programs running on the Ethereum blockchain. They allow developers to create and deploy a wide range of applications, including but not limited to lending apps, decentralized exchanges, insurance, secondary financing, social networks, and NFTs. The security of smart contracts is crucial because these applications directly hold and control cryptocurrency: any vulnerability or malicious attack on a contract is a direct threat to funds and can cause significant economic losses. For example, on February 26, 2024, the DeFi lending protocol Blueberry Protocol was attacked through a flaw in its smart contract logic, resulting in a loss of approximately $1,400,000.
Smart contract vulnerabilities are multifaceted, covering unreasonable business logic, improper access control, insufficient data validation, reentrancy attacks, and DOS (Denial of Service) attacks, among others. These vulnerabilities can disrupt contract execution and undermine the effective operation of smart contracts. A DOS attack, for example, consumes network resources by flooding the network with transactions, causing normal users' transactions to be processed slowly and degrading the user experience. It can also drive up gas fees: when network resources are scarce, users may need to pay higher fees to have their transactions prioritized.
In addition, users on Ethereum face investment risks that threaten the safety of their funds. For instance, there are "rugs": cryptocurrencies considered to have little or no value or long-term growth potential. Because of their low price and market capitalization, they are vulnerable to manipulation and volatility, and they are often exploited for pump-and-dump schemes and honeypot scams that entice investors with false projects before stealing their funds. Another common risk is the rug pull, where creators suddenly remove all liquidity from a project, causing the token's value to plummet. These scams are frequently marketed through fake partnerships and endorsements; once the token price rises, the scammers sell their holdings and disappear, leaving investors with worthless tokens. Investing in such tokens also diverts attention and resources from legitimate cryptocurrencies with real utility and growth potential. Besides rugs, "air coins" and pyramid-scheme coins are likewise vehicles for quick profit, and users without professional knowledge and experience find it particularly difficult to distinguish them from legitimate cryptocurrencies.
Efficiency
Two very direct indicators of Ethereum's efficiency are transaction speed and gas fees. Transaction speed is the number of transactions the network can process per unit of time and directly reflects its processing capacity: the faster the speed, the higher the efficiency. Every transaction on Ethereum requires gas fees to compensate validators for processing it, and lower gas fees indicate higher efficiency.
A decrease in transaction speed tends to increase gas fees: when processing slows, limited block space intensifies competition to enter the next block. To stand out, users typically raise their gas fees, since validators prioritize transactions that pay more. Higher gas fees, in turn, reduce user satisfaction.
Transactions are just basic activities on Ethereum. In this ecosystem, users can also engage in various activities such as lending, staking, investing, insurance, etc. These can be done through specific DApps. However, given the variety of DApps and the lack of personalized recommendation services similar to traditional industries, users may feel confused when choosing suitable applications and products. This situation can lead to a decrease in user satisfaction, affecting the overall efficiency of the Ethereum ecosystem.
Take lending as an example. Some DeFi lending platforms use over-collateralization mechanisms to maintain the security and stability of their platforms. This means that borrowers need to provide more assets as collateral, which cannot be used by borrowers for other activities during the borrowing period. This leads to a decrease in the utilization of borrower funds, thereby reducing market liquidity.
Machine learning algorithms are sets of instructions or rules that analyze data, learn the patterns within it, and make predictions or decisions based on that learning, improving automatically from the provided data without explicit programming instructions from humans. Machine learning models such as the RFM model, Generative Adversarial Networks (GANs), the decision tree model, the K-Nearest Neighbors algorithm (KNN), and the DBSCAN clustering algorithm can play an important role in Ethereum: applied there, they can help optimize transaction processing efficiency, enhance the security of smart contracts, segment users to provide more personalized services, and contribute to maintaining network stability.
Among statistical classification methods, the Bayes classifier stands out for its efficiency, aiming to minimize the probability of classification error (or the average risk under a given cost framework). Its design philosophy is rooted in Bayes' theorem, which makes it possible to determine the probability that an object belongs to a certain class given its features, and to decide by computing the object's posterior probability.
Specifically, the Bayes classifier first considers the prior probability of an object, then applies the Bayesian formula to consider observed data comprehensively, thereby updating beliefs about object classification. Among all possible classifications, the Bayes classifier selects the class with the highest posterior probability and assigns the object to this class. The core advantage of this approach is its ability to naturally handle uncertainty and incomplete information, making it a powerful and flexible tool applicable to a wide range of scenarios.
Figure 2: Bayes Classifier
As illustrated in Figure 2, in supervised machine learning, the Bayesian classifier utilizes data and a probability model based on the Bayes theorem to make classification decisions. By considering the likelihood, prior probabilities of classes and features, the Bayes classifier computes the posterior probability of data points belonging to each class and assigns data points to the class with the highest posterior probability. In the scatter plot on the right, the classifier attempts to find a curve to separate points of different colours, thus minimizing classification errors.
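The setup in Figure 2 can be sketched as a minimal Gaussian naive Bayes classifier: fit per-class priors and per-feature means/variances, then assign each point to the class with the highest posterior. The toy 2-D data below is invented for illustration.

```python
# Minimal Gaussian naive Bayes: priors + per-feature Gaussians per class,
# prediction by maximum log-posterior.
import math
from collections import defaultdict

def fit(samples, labels):
    """Return per-class (prior, [(mean, var) per feature]) estimates."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for cls, rows in grouped.items():
        n = len(rows)
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n + 1e-9  # smoothing
            stats.append((mean, var))
        model[cls] = (n / len(samples), stats)
    return model

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def predict(model, x):
    def log_post(cls):
        prior, stats = model[cls]
        return math.log(prior) + sum(
            log_gauss(v, m, s) for v, (m, s) in zip(x, stats))
    return max(model, key=log_post)

# Toy data: two well-separated clusters in 2-D, as in the scatter plot.
X = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
y = ["blue", "blue", "blue", "red", "red", "red"]
model = fit(X, y)
print(predict(model, (1.1, 0.9)))  # blue
print(predict(model, (5.1, 5.0)))  # red
```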
The decision tree algorithm is commonly used in classification and regression tasks. It adopts a hierarchical decision-making approach, recursively splitting nodes on the features with the highest information gain learned from the training data. In essence, the algorithm autonomously learns decision rules from data to predict a variable's value, decomposing a complex decision process into a sequence of simple sub-decisions that form a tree-like structure.
As shown in Figure 3, each node represents a decision, with criteria for judging certain attributes, while branches represent decision results. Each leaf node represents the final predicted result and category. From the perspective of algorithm composition, decision tree models are intuitive, easy to understand, and possess strong interpretability.
Figure 3: Decision tree model
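The split criterion described above can be shown concretely: compute the entropy of the labels before and after splitting on a candidate feature, and take the difference (information gain). The boolean features and labels below are invented for illustration.

```python
# Information gain for a candidate decision-tree split on a boolean feature.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    left = [l for r, l in zip(rows, labels) if r[feature_index]]
    right = [l for r, l in zip(rows, labels) if not r[feature_index]]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Feature 0 perfectly separates the labels; feature 1 is uninformative.
rows   = [(True, True), (True, False), (False, True), (False, False)]
labels = ["risky", "risky", "safe", "safe"]
print(information_gain(rows, labels, 0))  # 1.0 bit
print(information_gain(rows, labels, 1))  # 0.0 bits
```

A tree-building algorithm such as ID3 simply greedily picks the split with the highest gain at each node and recurses.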
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm that is robust to noise and particularly effective on non-convex datasets. It can discover clusters of arbitrary shape without specifying the number of clusters in advance, and it is robust to outliers in the dataset. The algorithm also effectively identifies outliers in noisy data, where noise (outlier) points are defined as points lying in low-density regions, as shown in Figure 4.
Figure 4: DBSCAN algorithm identifies noise
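A compact DBSCAN sketch matches the description above: core points have at least `min_pts` neighbours within `eps`, clusters grow outward from core points, and anything unreachable is labelled noise (-1). This pure-Python version uses an O(n²) neighbourhood lookup and invented sample points.

```python
# Compact DBSCAN: density-based clustering with explicit noise labelling.
import math

def dbscan(points, eps=1.0, min_pts=3):
    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)    # None = unvisited, -1 = noise
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # noise (may later join a cluster's border)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, previously marked noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:   # j is itself a core point: keep expanding
                queue.extend(jn)
    return labels

pts = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4),  # dense cluster
       (10, 10), (10.5, 10), (10, 10.5),         # second cluster
       (50, 50)]                                  # isolated noise point
print(dbscan(pts, eps=1.0, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Note that neither the number of clusters nor their shape is specified in advance, which is exactly what makes DBSCAN suited to the arbitrary-shape clusters mentioned above.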
The KNN (K-Nearest Neighbors) algorithm can be used for both classification and regression tasks. In classification problems, the algorithm determines the category of the item to be classified based on a voting mechanism, while in regression problems, it calculates the average or weighted average of the values of the k nearest samples to make predictions.
As shown in Figure 5, the working principle of the KNN algorithm in classification is to find the K nearest neighbors of a new data point and then predict the category of the new data point based on the categories of these neighbors. If K=1, then the new data point is simply assigned to the category of its nearest neighbor. If K>1, then typically a voting method is used to determine the category of the new data point, meaning it will be assigned to the category that the majority of its neighbors belong to. When the KNN algorithm is used for regression problems, the basic idea is the same, but the result is the average value of the output values of the K nearest neighbors.
Figure 5: KNN algorithm used for classification
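The voting procedure in Figure 5 can be written in a few lines: rank the training points by Euclidean distance to the query and take a majority vote among the K closest. The labels below ("low-risk"/"high-risk") are invented for illustration.

```python
# Minimal KNN classifier: K nearest neighbours by Euclidean distance,
# category decided by majority vote.
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["low-risk", "low-risk", "low-risk",
     "high-risk", "high-risk", "high-risk"]
print(knn_predict(X, y, (1.5, 1.5), k=3))  # low-risk
print(knn_predict(X, y, (8.5, 8.5), k=3))  # high-risk
```

For regression, the same ranking step applies, but the prediction becomes the (possibly distance-weighted) mean of the K neighbours' values instead of a vote.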
Generative AI is an AI technology that can generate new content (such as text, images, music, etc.) based on input requirements. It is rooted in the advancements of machine learning and deep learning, particularly in fields like natural language processing and image recognition. Generative AI learns patterns and correlations from large amounts of data and then generates entirely new output based on this learned information. The key to generative AI lies in model training, which requires excellent data for learning and training. During this process, the model gradually improves its ability to generate new content by analyzing and understanding the structure, patterns, and relationships within the dataset.
As shown in Figure 6, the introduction of multi-head attention mechanisms and self-attention, along with residual connections and fully connected neural networks, combined with previous word embedding techniques, has greatly enhanced the performance of generative models related to natural language processing.
Figure 6: Transformer model
The RFM model is an analytical model based on user purchasing behavior, which can identify user segments of different value by analyzing their transaction behavior. This model stratifies users based on their Recency (R), Frequency (F), and Monetary value (M) of purchases. As shown in Figure 7, these three indicators collectively form the core of the RFM model. The model scores users based on these three dimensions and ranks them according to their scores to identify the most valuable user segments. Moreover, the model effectively segments customers into different groups to achieve the functionality of user stratification.
Figure 7: RFM layered model
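RFM scoring as described above can be sketched with rank-based 1-5 scores on each dimension. The user records, field names, and scoring scheme below are all invented for illustration; real deployments typically score by quintiles over much larger populations.

```python
# Sketch of RFM scoring: rank each user 1-5 on Recency (smaller is better),
# Frequency, and Monetary value, producing an (R, F, M) tuple per user.
from datetime import date

users = {
    "alice": {"last_tx": date(2024, 3, 1),  "tx_count": 42, "volume": 1000.0},
    "bob":   {"last_tx": date(2023, 11, 5), "tx_count": 3,  "volume": 50.0},
    "carol": {"last_tx": date(2024, 2, 20), "tx_count": 17, "volume": 400.0},
}

def rank_score(value, all_values, reverse=False):
    """Rank-based 1-5 score; reverse=True means smaller is better (recency)."""
    ranked = sorted(all_values, reverse=reverse)  # index 0 = worst
    rank = ranked.index(value)
    return 1 + (rank * 5) // len(ranked)          # map rank onto 1..5

def rfm_scores(users, today):
    recencies = [(today - u["last_tx"]).days for u in users.values()]
    freqs = [u["tx_count"] for u in users.values()]
    money = [u["volume"] for u in users.values()]
    out = {}
    for name, u in users.items():
        r = rank_score((today - u["last_tx"]).days, recencies, reverse=True)
        f = rank_score(u["tx_count"], freqs)
        m = rank_score(u["volume"], money)
        out[name] = (r, f, m)
    return out

print(rfm_scores(users, date(2024, 3, 10)))
```

On-chain, "recency" maps naturally to the timestamp of a user's last transaction, "frequency" to transaction count, and "monetary value" to transferred volume, which is what makes the model directly transferable to Ethereum address data.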
In addressing the security challenges of Ethereum using machine learning techniques, we conducted research in four main areas:
Identifying and Filtering Malicious Transactions Based on Bayes Classifier
By constructing a Bayes classifier, potential spam transactions, including but not limited to those causing DOS attacks through large-scale, frequent, small transactions, can be identified and filtered. This approach effectively maintains the health of the network by analyzing transaction characteristics such as gas prices and transaction frequency, ensuring the stable operation of the Ethereum network.
Generating Secure Smart Contract Code Based on Generative Networks
Generative Adversarial Networks (GANs) and Transformer-based generative networks can be used to generate smart contract code that meets specific requirements while keeping the code as secure as possible. The two differ in the data they rely on during training: the former mainly learns from insecure code samples, while the latter does the opposite.
By training GANs on existing secure contract patterns, using the adversarial setup to generate potentially insecure code, and then learning to recognize those weaknesses, it is possible to automatically generate higher-quality, safer smart contract code. Likewise, a Transformer-based generative model trained on a large corpus of secure contract examples can generate contract code that meets specific requirements and optimizes gas consumption, significantly improving the efficiency and security of smart contract development.
Risk Analysis of Smart Contracts Based on Decision Trees
Utilizing decision trees to analyze smart contract features, such as function call frequency, transaction value, source code complexity, etc., can effectively identify potential risk levels of contracts. By analyzing contract operation patterns and code structures, possible vulnerabilities and risk points can be predicted, providing developers and users with security assessments. This method is expected to significantly improve the security of smart contracts in the Ethereum ecosystem, thereby reducing losses caused by vulnerabilities or malicious code.
Building a Cryptocurrency Evaluation Model to Reduce Investment Risks
By analyzing cryptocurrency transaction data, social media activities, market performance, and other multidimensional information using machine learning algorithms, it’s possible to construct an evaluation model that predicts the likelihood of junk coins. This model can provide valuable references for investors, helping them avoid investment risks and promote the healthy development of the cryptocurrency market.
Basic data structure
The essence of blockchain is a chain of blocks, and the key to distinguishing chains lies in the chain configuration, an essential part of any blockchain genesis. For Ethereum, chain configuration is used to differentiate between different chains within Ethereum, identifying important upgrade protocols and milestone events. For instance, the DAOForkBlock signifies Ethereum’s hard fork height after the DAO attack, while ConstantinopleBlock marks the block height for the Constantinople upgrade. For major upgrades containing numerous improvement proposals, special fields are set to identify the corresponding block heights. Additionally, Ethereum comprises various test networks and the main network, uniquely identified by ChainID to denote their respective network ecosystems.
The genesis block serves as the zeroth block of the entire blockchain, directly or indirectly referenced by other blocks. Therefore, nodes must load the correct genesis block information upon initialization, with no arbitrary modifications allowed. The genesis block’s configuration information includes the aforementioned chain configuration, along with additional details such as relevant mining rewards, timestamps, difficulty, and gas limits. It’s worth noting that Ethereum’s consensus mechanism has shifted from proof-of-work mining to proof-of-stake.
Ethereum accounts are divided into external accounts and contract accounts. External accounts are controlled by a unique private key, while contract accounts lack private key control and can only be operated by calling contract code execution through external accounts. Each account corresponds to a leaf node in the Ethereum world state, storing the account’s state (various account information and code details).
Transactions: As a decentralized platform built around transactions and contracts, Ethereum packages transactions, together with related metadata, into blocks. A block is divided into two parts: the block header and the block body. The block header contains the evidence that links all blocks into a chain, including the hash of the previous block, the roots committing to the entire Ethereum world state, the transaction root and the receipt root, and additional data such as difficulty and nonce. The block body stores the transaction list and the list of uncle block headers (since Ethereum has transitioned to proof-of-stake, uncle block references no longer exist).
Transaction receipts provide the results of transaction execution and additional information, which cannot be directly obtained by examining the transaction itself. Specifically, they contain consensus content, transaction information, and block information, indicating whether the transaction processing was successful and providing transaction logs and gas consumption details. Analyzing the information in receipts helps debug smart contract code and optimize gas consumption, while providing confirmation that the transaction has been processed by the network and enabling the viewing of transaction results and impacts.
In Ethereum, gas fees can be understood simply as transaction fees. Whenever you send tokens, execute a smart contract, transfer Ether, or perform any other operation on the blockchain, the transaction consumes Ethereum's computational resources, and you must pay gas fees to incentivize the network to process it. The gas fees are ultimately paid to miners (validators since the Merge) as transaction fees, following the formula Fee = Gas Used * Gas Price, where the price per unit of gas is set by the transaction initiator and largely determines how quickly the transaction is included in a block. Setting the gas price too low may leave a transaction unexecuted, and a gas limit must also be set as an upper bound to avoid unexpected gas consumption caused by errors in smart contracts.
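The formula above can be made concrete with a minimal sketch. The numbers are illustrative: a plain ETH transfer consumes 21,000 gas, while the 30 gwei price is just an example value.

```python
# Illustrative gas fee calculation: Fee = Gas Used * Gas Price.
GWEI = 10**9      # 1 gwei  = 10^9  wei
ETHER = 10**18    # 1 ether = 10^18 wei

def transaction_fee_wei(gas_used: int, gas_price_gwei: float) -> int:
    """Fee in wei = gas used * gas price (converted from gwei to wei)."""
    return int(gas_used * gas_price_gwei * GWEI)

# A plain ETH transfer consumes 21,000 gas; 30 gwei is a sample price.
fee = transaction_fee_wei(gas_used=21_000, gas_price_gwei=30)
print(fee / ETHER)  # the same fee expressed in ether
```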
Transaction Pool
Ethereum processes a large volume of transactions, yet the throughput of a decentralized system in transactions per second is far lower than that of centralized systems. As transactions flow into nodes, each node must maintain a transaction pool to manage them correctly. Transactions are broadcast through peer-to-peer communication: a node broadcasts executable transactions to its neighboring nodes, which propagate them further to their own neighbors, allowing a transaction to spread throughout the Ethereum network within about 6 seconds.
Transactions in the transaction pool are divided into executable and non-executable transactions. Every transaction entering the pool starts out non-executable and may later be promoted to executable status; executable transactions, which have higher priority, are the candidates for execution and inclusion in blocks. Executable and non-executable transactions are recorded in the pending container and the queue container, respectively.
Additionally, the transaction pool maintains a list of local transactions. Local transactions have various advantages, including higher priority, immunity to transaction volume restrictions, and immediate reloading into the transaction pool upon node restart. The local persistence storage of local transactions is achieved through a journal, ensuring that unfinished local transactions are not lost and are periodically updated.
Before a transaction is queued, its validity is verified through a series of checks, including defenses against DOS attacks, rejection of negative-value transactions, and verification of transaction gas limits. Structurally, the pool is simply queue + pending, which together hold all transactions. After the validity checks, further checks are performed: whether the transaction pool has reached its capacity, and, if so, whether the new remote transaction (a non-local transaction) is priced higher than the cheapest transaction currently in the pool, in which case it replaces that lowest-priced transaction. For replacing executable transactions, only a transaction that raises the gas price by at least 10% is allowed to replace one waiting to be executed, and the replaced transaction is moved to the non-executable set. Additionally, invalid and over-limit transactions are removed during pool maintenance, and eligible transactions are promoted.
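The queue/pending split and the price-bump replacement rule can be sketched as follows. This is a toy model under simplifying assumptions (one gas-price field per transaction, no pool size limits, no local-transaction journal), not geth's actual implementation; the class and method names are ours.

```python
PRICE_BUMP = 1.10  # a replacement must raise the gas price by at least 10%

class ToyTxPool:
    """Toy sketch of the pending/queue split in an Ethereum transaction pool."""

    def __init__(self):
        self.pending = {}     # (sender, nonce) -> gas_price, executable txs
        self.queue = {}       # (sender, nonce) -> gas_price, non-executable txs
        self.next_nonce = {}  # sender -> lowest nonce not yet pending

    def add(self, sender, nonce, gas_price):
        key = (sender, nonce)
        if key in self.pending:
            # Only a >=10% gas-price bump may replace a pending transaction.
            if gas_price < self.pending[key] * PRICE_BUMP:
                return "rejected"
            self.pending[key] = gas_price
            return "replaced"
        # New transactions enter the queue first (non-executable) and are
        # promoted once every lower nonce of the sender is present.
        self.queue[key] = gas_price
        self._promote(sender)
        return "accepted"

    def _promote(self, sender):
        # Move consecutive-nonce transactions from queue to pending.
        nonce = self.next_nonce.get(sender, 0)
        while (sender, nonce) in self.queue:
            self.pending[(sender, nonce)] = self.queue.pop((sender, nonce))
            nonce += 1
        self.next_nonce[sender] = nonce

pool = ToyTxPool()
pool.add("alice", 0, 10)           # promoted straight to pending
pool.add("alice", 2, 10)           # nonce gap -> stays in the queue
print(pool.add("alice", 0, 10.5))  # < 10% bump  -> rejected
print(pool.add("alice", 0, 12))    # >= 10% bump -> replaced
pool.add("alice", 1, 10)           # fills the gap, promotes nonces 1 and 2
print(sorted(pool.pending))        # [('alice', 0), ('alice', 1), ('alice', 2)]
```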
Consensus Mechanism
In its early stages, Ethereum's consensus was based on proof-of-work hash computation: a block was valid only if its hash met the target difficulty. Since Ethereum's consensus algorithm has transitioned from Proof of Work (PoW) to Proof of Stake (PoS), we will briefly outline the PoS algorithm here. Ethereum completed the Merge with the beacon chain in September 2022, implementing PoS. Specifically, in PoS-based Ethereum, block time is stable at 12 seconds. Users stake their ETH to gain the right to become validators, and a random selection process among the participating stakers chooses a set of validators. In each epoch, which comprises 32 slots, one validator is selected as the proposer for each slot, while a committee of other validators assigned to that slot verifies the legitimacy of the proposed block and judges the legitimacy of blocks from the previous epoch. PoS significantly stabilizes and speeds up block production while largely avoiding the waste of computational resources.
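The slot/epoch structure can be sketched as below. This toy schedule mirrors only the 32-slot epoch and one-proposer-per-slot shape; real Ethereum selects proposers via RANDAO-based, stake-weighted shuffling and splits attesters into many committees, none of which is modeled here.

```python
import random

SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12

def schedule_epoch(validators, seed):
    """Toy proposer/committee schedule: one proposer per slot, all other
    validators attest. Only the 32-slot epoch structure is realistic."""
    rng = random.Random(seed)
    schedule = []
    for slot in range(SLOTS_PER_EPOCH):
        proposer = rng.choice(validators)
        committee = [v for v in validators if v != proposer]
        schedule.append({"slot": slot, "proposer": proposer,
                         "committee": committee})
    return schedule

validators = [f"validator-{i}" for i in range(8)]
epoch = schedule_epoch(validators, seed=42)
print(len(epoch))                          # 32 slots per epoch
print(SLOTS_PER_EPOCH * SECONDS_PER_SLOT)  # 384 seconds per epoch
```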
Signature Algorithm
Ethereum adopts the same signature standard as Bitcoin, based on the secp256k1 curve. Specifically, the signature algorithm is ECDSA, where the signature is computed over the hash of the original message. The signature consists of R + S + V components. Each computation introduces a random number, and R + S is the original output of ECDSA. The trailing field V, known as the recovery identifier, indicates which candidate solution should be used to recover the public key from the message and signature, since deriving the curve point from the R value can yield multiple valid solutions.
The entire process can be summarized as follows: the transaction data and the signer's relevant information are RLP-encoded and hashed, and the signature is produced by signing this hash with the private key via ECDSA over the secp256k1 elliptic curve. The signature is then appended to the transaction data to produce a signed transaction that can be broadcast.
Ethereum’s data structure not only relies on traditional blockchain technology but also incorporates the Merkle Patricia Tree (MPT), also known as the Merkle Compressed Prefix Tree, for efficient storage and verification of large amounts of data. MPT combines the cryptographic hash function of the Merkle tree and the key path compression feature of the Patricia tree, providing a solution that ensures data integrity and supports rapid searches.
Merkle Patricia Trie (MPT)
In Ethereum, MPT is used to store all state and transaction data, ensuring that any changes to the data are reflected in the root hash of the tree. This means that by verifying the root hash, the integrity and accuracy of the data can be proven without checking the entire database. MPT consists of four types of nodes: leaf nodes, extension nodes, branch nodes, and empty nodes, which together form a tree capable of adapting to dynamic data changes. Whenever data is updated, MPT reflects these changes by adding, deleting, or modifying nodes, while updating the root hash of the tree. Since each node is referenced by the hash of its contents, any minor change to the data leads to a significant change in the root hash, ensuring the security and consistency of the data. Additionally, the design of MPT supports "light client" verification, allowing nodes to verify the existence or status of specific information by storing only the root hash of the tree and the necessary path nodes, greatly reducing storage and processing requirements.
Through MPT, Ethereum not only achieves efficient management and rapid access to data but also ensures the security and decentralization of the network, supporting the operation and development of the entire Ethereum network.
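The root-hash idea behind MPT can be illustrated with a plain binary Merkle tree. This is a deliberate simplification: the real MPT also compresses key paths and distinguishes four node types, and Ethereum hashes with keccak-256 rather than the stdlib sha256 used here.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Ethereum uses keccak-256; sha256 stands in here because it is in
    # the standard library and the principle is identical.
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a plain binary Merkle tree (a simplification of
    Ethereum's Merkle Patricia Trie)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node if the level is odd
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

state_a = [b"alice:100", b"bob:50", b"carol:7"]
state_b = [b"alice:100", b"bob:51", b"carol:7"]  # one balance changed

# Any change to the underlying data changes the root hash.
print(merkle_root(state_a) != merkle_root(state_b))  # True
```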
State Machine
The core architecture of Ethereum incorporates the concept of a state machine, where the Ethereum Virtual Machine (EVM) is the runtime environment for executing all smart contract code, and Ethereum itself can be viewed as a globally shared state transition system. The execution of each block can be seen as a state transition process, moving from one globally shared state to another. This design ensures the consistency and decentralization of the Ethereum network and makes the execution results of smart contracts predictable and tamper-proof.
In Ethereum, the state refers to the current information of all accounts, including the balance of each account, stored data, and the code of smart contracts. Whenever a transaction occurs, the EVM calculates and transforms the state based on the transaction content, and this process is efficiently and securely recorded through MPT. Each state transition not only changes the account data but also leads to the updating of MPT, reflected in the change of the root hash of the tree.
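A toy version of this state-transition loop, with gas accounting, nonces, and contract code omitted, and a JSON hash standing in for the MPT root:

```python
import hashlib
import json

def state_root(state: dict) -> str:
    """Hash commitment to the full state (a stand-in for the MPT root)."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_transaction(state, sender, recipient, value):
    """One state transition: debit sender, credit recipient."""
    if state.get(sender, 0) < value:
        raise ValueError("insufficient balance")
    new_state = dict(state)
    new_state[sender] -= value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state

genesis = {"alice": 100, "bob": 0}
root_0 = state_root(genesis)
state_1 = apply_transaction(genesis, "alice", "bob", 30)
root_1 = state_root(state_1)

print(state_1)           # {'alice': 70, 'bob': 30}
print(root_0 != root_1)  # every state transition yields a new root hash
```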
The relationship between EVM and MPT is crucial because MPT provides the assurance of data integrity for Ethereum’s state transitions. When the EVM executes transactions and changes account states, relevant MPT nodes are updated to reflect these changes. Since each node of MPT is linked through hashes, any modification to the state will cause a change in the root hash, which is then included in the new block, ensuring the consistency and security of the entire Ethereum state. Now, let’s introduce the Ethereum Virtual Machine (EVM).
EVM
The Ethereum Virtual Machine (EVM) is the fundamental component responsible for executing smart contracts and facilitating state transitions within the Ethereum network. It is thanks to the EVM that Ethereum can be envisioned as a world computer. The EVM is Turing complete, which means that smart contracts deployed on Ethereum can execute arbitrarily complex logic computations. The introduction of the gas mechanism in Ethereum prevents scenarios like infinite loops within contracts, ensuring network stability and security.
At a more technical level, the EVM is a stack-based virtual machine that executes smart contracts using Ethereum-specific bytecode. Developers typically write smart contracts in high-level languages such as Solidity, which are then compiled into bytecode understandable by the EVM for execution. The EVM is the key innovation of the Ethereum blockchain, supporting not only the execution of smart contracts but also providing a solid foundation for the development of decentralized applications (DApps). Through the EVM, Ethereum is shaping a decentralized, secure, and open digital future.
Figure 1: Historical Review of Ethereum
Smart contracts are computer programs running on the Ethereum blockchain. They allow developers to create and deploy various applications, including but not limited to lending apps, decentralized exchanges, insurance, secondary financing, social networks, and NFTs. The security of smart contracts is crucial for these applications. These applications are directly responsible for handling and controlling cryptocurrencies, and any vulnerabilities or malicious attacks on smart contracts pose a direct threat to fund security, potentially resulting in significant economic losses. For example, on February 26, 2024, the DeFi lending protocol Blueberry Protocol suffered an attack due to smart contract logic flaws, resulting in a loss of approximately $1,400,000.
Smart contract vulnerabilities are multifaceted, covering unreasonable business logic, improper access control, insufficient data validation, re-entry attacks, and DOS (Denial of Service) attacks, among other aspects. These vulnerabilities can cause problems with contract execution, affecting the effective operation of smart contracts. Taking DOS attacks as an example, this type of attack consumes network resources by sending a large number of transactions, causing transactions initiated by normal users to be processed slowly, leading to a decline in user experience. Additionally, this can also lead to an increase in transaction gas fees. When network resources are scarce, users may need to pay higher fees to prioritize their transactions for processing.
In addition, users on Ethereum face investment risks that threaten the safety of their funds. For instance, there are "rug" tokens, a term for cryptocurrencies considered to have little or no value or long-term growth potential. Rugs are often exploited as tools for scams or for pump-and-dump price manipulation. Investing in them carries high risk and may result in significant financial losses: because of their low price and market capitalization, they are vulnerable to manipulation and volatility, and they are frequently used in pump-and-dump schemes and honeypot scams that entice investors with false projects and then steal their funds. Another common risk is the rug pull, where creators suddenly remove all liquidity from a project, causing the token's value to plummet. These scams often involve marketing through fake partnerships and endorsements; once the token price rises, the scammers sell their holdings and disappear, leaving investors with worthless tokens. Moreover, investing in rugs diverts attention and resources away from legitimate cryptocurrencies with real utility and growth potential. Besides rugs, "air coins" and pyramid-scheme coins are other quick-profit vehicles, and for users lacking professional knowledge and experience, distinguishing them from legitimate cryptocurrencies is particularly challenging.
Efficiency
Two very direct indicators of Ethereum efficiency are transaction speed and gas fees. Transaction speed refers to the number of transactions the Ethereum network can process in a unit of time. This indicator directly reflects the processing capacity of the Ethereum network; the faster the speed, the higher the efficiency. Every transaction on Ethereum requires a certain amount of gas fees to compensate miners for transaction verification. Lower gas fees indicate higher efficiency in Ethereum.
A decrease in transaction speed can lead to higher gas fees: when processing slows, limited block space intensifies competition among transactions to enter the next block. To stand out, users typically raise their gas fees, since miners prioritize transactions offering higher fees for verification. Higher gas fees, in turn, degrade the user experience.
Transactions are just basic activities on Ethereum. In this ecosystem, users can also engage in various activities such as lending, staking, investing, insurance, etc. These can be done through specific DApps. However, given the variety of DApps and the lack of personalized recommendation services similar to traditional industries, users may feel confused when choosing suitable applications and products. This situation can lead to a decrease in user satisfaction, affecting the overall efficiency of the Ethereum ecosystem.
Take lending as an example. Some DeFi lending platforms use over-collateralization mechanisms to maintain the security and stability of their platforms. This means that borrowers need to provide more assets as collateral, which cannot be used by borrowers for other activities during the borrowing period. This leads to a decrease in the utilization of borrower funds, thereby reducing market liquidity.
Machine learning algorithms are sets of instructions or rules used to analyze data, learn patterns within it, and make predictions or decisions based on what they learn. They improve automatically from the data provided, without explicit programming instructions from humans. Machine learning models such as the RFM model, Generative Adversarial Networks (GANs), decision tree models, the K-Nearest Neighbors algorithm (KNN), and the DBSCAN clustering algorithm can play an important role in Ethereum: applying them can help optimize transaction processing efficiency, enhance the security of smart contracts, segment users to provide more personalized services, and help maintain the stability of the network.
Among statistical classification methods, the Bayes classifier stands out for its efficiency, aiming to minimize the probability of classification error or, under a specific cost framework, the average risk. Its design philosophy is rooted in Bayes' theorem, which makes it possible to determine the probability that an object belongs to a given class based on its features, by computing the object's posterior probability.
Specifically, the Bayes classifier first considers the prior probability of an object, then applies the Bayesian formula to consider observed data comprehensively, thereby updating beliefs about object classification. Among all possible classifications, the Bayes classifier selects the class with the highest posterior probability and assigns the object to this class. The core advantage of this approach is its ability to naturally handle uncertainty and incomplete information, making it a powerful and flexible tool applicable to a wide range of scenarios.
Figure 2: Bayes Classifier
As illustrated in Figure 2, in supervised machine learning, the Bayesian classifier utilizes data and a probability model based on the Bayes theorem to make classification decisions. By considering the likelihood, prior probabilities of classes and features, the Bayes classifier computes the posterior probability of data points belonging to each class and assigns data points to the class with the highest posterior probability. In the scatter plot on the right, the classifier attempts to find a curve to separate points of different colours, thus minimizing classification errors.
The decision tree algorithm is commonly used in classification and regression tasks. It adopts a hierarchical decision-making approach, using the features with the highest information gain in the known data to split tree nodes and thereby train the decision tree. In essence, the algorithm autonomously learns decision rules from data to predict variable values. In implementation, a decision tree decomposes a complex decision process into several simpler sub-decisions, forming a tree-like structure.
As shown in Figure 3, each node represents a decision, with criteria for judging certain attributes, while branches represent decision results. Each leaf node represents the final predicted result and category. From the perspective of algorithm composition, decision tree models are intuitive, easy to understand, and possess strong interpretability.
Figure 3: Decision tree model
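A minimal sketch of the information-gain split selection described above, reduced to a depth-1 "decision stump", the building block of a full tree. The sample features (gas price, transaction size) and labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def best_split(samples, labels, feature_count):
    """Choose the (feature, threshold) pair with the highest information gain."""
    base = entropy(labels)
    best = (None, None, -1.0)
    for f in range(feature_count):
        for threshold in sorted({s[f] for s in samples}):
            left = [l for s, l in zip(samples, labels) if s[f] <= threshold]
            right = [l for s, l in zip(samples, labels) if s[f] > threshold]
            if not left or not right:
                continue
            gain = base - (len(left) / len(labels)) * entropy(left) \
                        - (len(right) / len(labels)) * entropy(right)
            if gain > best[2]:
                best = (f, threshold, gain)
    return best

# Hypothetical samples: (gas_price_gwei, tx_size_bytes) -> priority label.
samples = [(5, 100), (8, 120), (40, 110), (60, 90)]
labels = ["low", "low", "high", "high"]

feature, threshold, gain = best_split(samples, labels, feature_count=2)
print(feature, threshold, gain)  # 0 8 1.0  (split on gas price <= 8)
```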
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm that is robust to noise and particularly effective for datasets whose clusters have irregular, non-convex shapes. The algorithm can discover clusters of arbitrary shape without specifying the number of clusters beforehand and is robust against outliers in the dataset. It also effectively identifies outlier points in noisy data, where noise or outlier points are defined as points in low-density regions, as shown in Figure 4.
Figure 4: DBSCAN algorithm identifying noise
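A minimal DBSCAN implementation conveys the core loop: points in dense neighbourhoods seed clusters, while points in low-density regions are labeled as noise. The 2-D "transaction behaviour" coordinates below are made up for illustration, and the brute-force O(n²) neighbour search is for clarity only.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (incl. itself)."""
    px = points[i]
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(px, q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns labels[i] = cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1                 # low-density point: mark as noise
            continue
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:       # j is a core point: keep expanding
                seeds.extend(more)
        cluster += 1
    return labels

# Two dense groups of hypothetical user-behaviour points plus one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=2, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```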
The KNN (K-Nearest Neighbors) algorithm can be used for both classification and regression tasks. In classification problems, the algorithm determines the category of the item to be classified based on a voting mechanism, while in regression problems, it calculates the average or weighted average of the values of the k nearest samples to make predictions.
As shown in Figure 5, the working principle of the KNN algorithm in classification is to find the K nearest neighbors of a new data point and then predict the category of the new data point based on the categories of these neighbors. If K=1, then the new data point is simply assigned to the category of its nearest neighbor. If K>1, then typically a voting method is used to determine the category of the new data point, meaning it will be assigned to the category that the majority of its neighbors belong to. When the KNN algorithm is used for regression problems, the basic idea is the same, but the result is the average value of the output values of the K nearest neighbors.
Figure 5: KNN algorithm used for classification
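Both the voting and averaging variants described above fit in a short sketch. The training data (transaction count, average value) and labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Majority vote among the k nearest training samples.
    train: list of ((features...), label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Average of the k nearest neighbours' values (the regression case)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / k

# Hypothetical (tx_count, avg_value) -> credit label, for illustration only.
train = [((2, 10), "low"), ((3, 12), "low"),
         ((40, 200), "high"), ((45, 180), "high")]

print(knn_classify(train, (5, 15), k=3))   # low
print(knn_regress([((2,), 600), ((3,), 650), ((40,), 300)], (4,), k=2))  # 625.0
```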
Generative AI is an AI technology that can generate new content (such as text, images, music, etc.) based on input requirements. It is rooted in the advancements of machine learning and deep learning, particularly in fields like natural language processing and image recognition. Generative AI learns patterns and correlations from large amounts of data and then generates entirely new output based on this learned information. The key to generative AI lies in model training, which requires excellent data for learning and training. During this process, the model gradually improves its ability to generate new content by analyzing and understanding the structure, patterns, and relationships within the dataset.
As shown in Figure 6, the introduction of multi-head attention mechanisms and self-attention, along with residual connections and fully connected neural networks, combined with previous word embedding techniques, has greatly enhanced the performance of generative models related to natural language processing.
Figure 6: Transformer model
The RFM model is an analytical model based on user purchasing behavior, which can identify user segments of different value by analyzing their transaction behavior. This model stratifies users based on their Recency (R), Frequency (F), and Monetary value (M) of purchases. As shown in Figure 7, these three indicators collectively form the core of the RFM model. The model scores users based on these three dimensions and ranks them according to their scores to identify the most valuable user segments. Moreover, the model effectively segments customers into different groups to achieve the functionality of user stratification.
Figure 7: RFM layered model
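The three-dimensional scoring can be sketched as below. The fixed thresholds are hypothetical; a real system would derive them from quantiles of the user population.

```python
from datetime import date

def score(value, thresholds, reverse=False):
    """Map a raw value to a 1..len(thresholds)+1 score.
    reverse=True means smaller is better (used for recency)."""
    s = 1 + sum(value > t for t in thresholds)
    return (len(thresholds) + 2 - s) if reverse else s

def rfm(last_tx_date, tx_count, total_value, today):
    """Toy RFM scoring with made-up fixed thresholds."""
    recency_days = (today - last_tx_date).days
    r = score(recency_days, (7, 30), reverse=True)  # recent = high score
    f = score(tx_count, (5, 20))
    m = score(total_value, (1.0, 10.0))             # e.g. total ETH moved
    return r, f, m

today = date(2024, 3, 1)
whale = rfm(date(2024, 2, 28), tx_count=50, total_value=120.0, today=today)
dormant = rfm(date(2023, 6, 1), tx_count=2, total_value=0.5, today=today)

print(whale)    # (3, 3, 3) -> high-value, active user
print(dormant)  # (1, 1, 1) -> churned, low-value user
```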
In addressing the security challenges of Ethereum using machine learning techniques, we conducted research in four main areas:
Identifying and Filtering Malicious Transactions Based on Bayes Classifier
By constructing a Bayes classifier, potential spam transactions, including but not limited to those causing DOS attacks through large-scale, frequent, small transactions, can be identified and filtered. This approach effectively maintains the health of the network by analyzing transaction characteristics such as gas prices and transaction frequency, ensuring the stable operation of the Ethereum network.
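As a sketch of the idea, a discrete Bayes classifier with made-up priors and likelihoods can score a single transaction feature and flag the more probable class; a real filter would combine many features and estimate these probabilities from data.

```python
def bayes_posterior(priors, likelihoods, observation):
    """Posterior P(class | observation) via Bayes' theorem.
    priors:      {class: P(class)}
    likelihoods: {class: {observation: P(observation | class)}}"""
    joint = {c: priors[c] * likelihoods[c][observation] for c in priors}
    evidence = sum(joint.values())  # P(observation)
    return {c: joint[c] / evidence for c in joint}

def classify(priors, likelihoods, observation):
    # Assign the class with the highest posterior probability.
    posterior = bayes_posterior(priors, likelihoods, observation)
    return max(posterior, key=posterior.get)

# Hypothetical numbers: most transactions are normal, but a very low
# gas price is far more likely under the "spam" class.
priors = {"normal": 0.9, "spam": 0.1}
likelihoods = {
    "normal": {"low_gas": 0.05, "high_gas": 0.95},
    "spam":   {"low_gas": 0.80, "high_gas": 0.20},
}

print(classify(priors, likelihoods, "low_gas"))   # spam
print(classify(priors, likelihoods, "high_gas"))  # normal
```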
Generating Smart Contract Code Based on Generative Networks
Generative Adversarial Networks (GANs) and Transformer-based generative networks can be used to generate smart contract code that meets specific requirements while remaining as secure as possible. The two differ, however, in the kind of data they rely on during training: the former depends mainly on insecure code samples, while the latter depends mainly on secure ones.
By training GANs to learn existing secure contract patterns and constructing self-adversarial models to generate potential insecure code, then learning to identify these insecurities, it’s possible to automatically generate high-quality, safer smart contract code. Leveraging Transformer-based generative network models, by learning from a large number of secure contract examples, one can generate contract code that meets specific requirements and optimizes gas consumption, thereby significantly improving the efficiency and security of smart contract development.
Risk Analysis of Smart Contracts Based on Decision Trees
Utilizing decision trees to analyze smart contract features, such as function call frequency, transaction value, source code complexity, etc., can effectively identify potential risk levels of contracts. By analyzing contract operation patterns and code structures, possible vulnerabilities and risk points can be predicted, providing developers and users with security assessments. This method is expected to significantly improve the security of smart contracts in the Ethereum ecosystem, thereby reducing losses caused by vulnerabilities or malicious code.
Building a Cryptocurrency Evaluation Model to Reduce Investment Risks
By analyzing cryptocurrency transaction data, social media activities, market performance, and other multidimensional information using machine learning algorithms, it’s possible to construct an evaluation model that predicts the likelihood of junk coins. This model can provide valuable references for investors, helping them avoid investment risks and promote the healthy development of the cryptocurrency market.
In addition, the application of machine learning has the potential to further enhance the efficiency of Ethereum. We can delve into the following three key dimensions:
Optimizing Transaction Pool Queuing Models with Decision Trees
Based on decision trees, it’s possible to effectively optimize the queuing mechanism of Ethereum transaction pools. By analyzing transaction characteristics such as gas prices and transaction sizes, decision trees can optimize transaction selection and queuing order. This method can significantly improve transaction processing efficiency, effectively reduce network congestion, and lower user transaction waiting times.
User Stratification and Personalized Service Provision
The RFM model (Recency, Frequency, Monetary value), widely used as an analytical tool in customer relationship management, can effectively stratify users by evaluating the recency of the user’s last transaction, transaction frequency, and transaction amount. Applying the RFM model on the Ethereum platform can help identify high-value user groups, optimize resource allocation, and provide more personalized services, thereby enhancing user satisfaction and overall platform efficiency.
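The RFM computation itself is straightforward. The sketch below scores one address from a hypothetical transaction log; the fixed cutoffs stand in for the quantile-based scoring a production system would use.

```python
from datetime import date

# Minimal RFM scoring for a single address. The transaction log and
# score cutoffs are illustrative; real systems derive cutoffs from
# quantiles over the whole user base.

def rfm_score(txs, today):
    """txs: list of (date, value_eth) pairs for one address."""
    recency_days = (today - max(d for d, _ in txs)).days
    frequency = len(txs)
    monetary = sum(v for _, v in txs)
    r = 3 if recency_days <= 7 else (2 if recency_days <= 30 else 1)
    f = 3 if frequency >= 50 else (2 if frequency >= 10 else 1)
    m = 3 if monetary >= 100 else (2 if monetary >= 10 else 1)
    return r, f, m

txs = [(date(2024, 5, 1), 2.0)] * 12 + [(date(2024, 5, 28), 150.0)]
print(rfm_score(txs, today=date(2024, 5, 30)))  # -> (3, 2, 3)
```

A (3, 2, 3) address, for instance, is recently active and high-value but only moderately frequent, which would place it in a different service tier than a (1, 1, 1) dormant account.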
The DBSCAN algorithm can also analyze user transaction behavior to identify distinct user groups on Ethereum, enabling more customized financial services for each segment. This user stratification strategy can optimize marketing strategies and improve customer satisfaction and service efficiency.
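DBSCAN is attractive here because it discovers the number of clusters on its own and marks outliers explicitly. Below is a compact from-scratch version run on synthetic 2-D user features (e.g. average transaction value vs. activity level); the data and `eps`/`min_pts` values are illustrative assumptions.

```python
# A compact from-scratch DBSCAN. Points, eps, and min_pts are
# synthetic/illustrative; real inputs would be normalized on-chain
# behavioral features per address.

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        labels[i] = cluster             # i is a core point: start a cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels

users = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),   # small retail-like cluster
         (8.0, 8.0), (8.1, 7.9), (7.9, 8.2),   # whale-like cluster
         (4.0, 12.0)]                          # outlier
print(dbscan(users, eps=0.5, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The `-1` label is what distinguishes this from k-means-style approaches: addresses that fit no behavioral pattern are surfaced as anomalies rather than forced into a cluster, which is itself useful for monitoring.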
Credit Scoring Based on KNN
The K-Nearest Neighbors (KNN) algorithm can analyze Ethereum user transaction histories and behavior patterns to score user credit, which plays an extremely important role in financial activities such as lending. Credit scoring helps financial institutions and lending platforms assess borrowers’ repayment ability and credit risk more accurately, thereby making more precise lending decisions. This can avoid over-lending and improve market liquidity.
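A from-scratch KNN classifier makes the mechanism transparent. In the sketch below, each user is a tuple of hypothetical on-chain features (repayment ratio, account age in years, default count) with a fabricated label; a real deployment would also normalize features so no single one dominates the distance.

```python
# Minimal k-nearest-neighbors credit classifier. Training history and
# feature choice are fabricated for illustration; features should be
# scaled in practice so distances are comparable across dimensions.

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest labeled neighbors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in train
    )
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# (repayment ratio, account age in years, default count) -> label
history = [
    ((0.95, 3.0, 0), "good"),
    ((0.90, 2.5, 0), "good"),
    ((0.98, 4.0, 1), "good"),
    ((0.40, 0.5, 3), "bad"),
    ((0.55, 1.0, 2), "bad"),
]
print(knn_predict(history, query=(0.92, 2.8, 0)))  # -> good
```

Because KNN is lazy (no training phase), new repayment outcomes can be folded into the scoring set immediately, which suits fast-moving on-chain lending markets.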
From the perspective of macro-level fund allocation, Ethereum, as the world's largest distributed computer, can never have too much investment at its infrastructure layer; it needs to attract developers from more diverse backgrounds to participate in building it. In this article, by reviewing Ethereum's technical implementations and the challenges it faces, we have envisioned a series of intuitive potential applications of machine learning, and we eagerly await AI developers in the community turning these visions into tangible value.
As on-chain computing power gradually increases, we can anticipate the development of more sophisticated models for network management, transaction monitoring, security audits, and various other aspects, ultimately enhancing the efficiency and security of the Ethereum network.
Looking further ahead, AI/agent-driven governance mechanisms could also become a major point of innovation within the Ethereum ecosystem. Such mechanisms would bring about more efficient, transparent, and automated decision-making processes, resulting in a more flexible and reliable governance structure for the Ethereum platform. These future directions will not only drive innovation in Ethereum technology but also provide users with a higher-quality on-chain experience.