Combating Fraud in Decentralized Finance: A Comprehensive Feature Engineering Scheme for Machine Learning-Based Detection

Teng Yuan

Abstract

This paper presents a systematic study of the growing problem of fraudulent transactions in Decentralized Finance (DeFi). It first examines the DeFi ecosystem, covering Ethereum's architecture, its key technical elements, and the layered architecture of DeFi applications. On this basis, it discusses core issues such as smart contract security and token system design, and analyzes the technical challenges that DeFi fraudulent transaction detection currently faces. Existing research is limited in how well it models the complex, dynamic associations between transaction entities and the evolution of fraud patterns over time. To address this, the paper proposes a machine learning-based method for detecting fraudulent DeFi addresses.


The method designs a comprehensive feature engineering scheme covering transaction behavior, capital flow, and account attributes. It extracts 35 features closely related to fraud detection and reduces them to 27 through wrapper-based feature selection, yielding a feature set that is both complete and concise.
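
As an illustration of what wrapper-based feature selection means, the following minimal sketch scores candidate feature subsets by the accuracy of a classifier trained on them. It assumes greedy forward selection with a small k-nearest-neighbor classifier and leave-one-out accuracy; the abstract does not state which wrapper variant, classifier, or scoring metric the paper uses, and the toy data and all function names here are hypothetical.

```python
def knn_predict(train_X, train_y, x, cols, k=1):
    """Predict a 0/1 label for x by majority vote of its k nearest neighbors,
    measuring squared Euclidean distance over the selected columns only."""
    dists = sorted(
        (sum((row[c] - x[c]) ** 2 for c in cols), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0


def loo_accuracy(X, y, cols, k=1):
    """Leave-one-out accuracy of the k-NN classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, X[i], cols, k) == y[i]
    return correct / len(X)


def forward_selection(X, y, n_keep):
    """Greedy wrapper: repeatedly add the feature that gives the best
    leave-one-out accuracy when combined with the features chosen so far."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_keep:
        best = max(remaining, key=lambda c: loo_accuracy(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 do not.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 4.0],
    [0.3, 4.0, 2.0],
    [0.9, 5.1, 1.1],
    [1.0, 1.1, 4.1],
    [1.1, 4.1, 2.1],
]
y = [0, 0, 0, 1, 1, 1]

selected = forward_selection(X, y, n_keep=1)
```

On this toy data the wrapper keeps only column 0, and adding an uninformative column lowers the leave-one-out accuracy. The paper's reduction from 35 to 27 features follows the same principle of letting model performance decide which features stay.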


Unlike existing research, which relies mainly on the analysis of ordinary transactions and ERC20 token transactions, this paper additionally introduces internal transaction features. Internal transactions, the calls and value transfers triggered by smart contract execution, are not recorded on the blockchain as standalone transactions, yet they contain rich information about user behavior. Capturing and using this information further improves the performance of the fraud detection models.
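
A minimal sketch of how per-address features could be aggregated from the three transaction types is shown below. The record layout (`from`, `to`, `value`, `kind`) and the feature names are assumptions made for illustration; the abstract does not list the paper's actual features or data schema.

```python
from collections import defaultdict


def address_features(transfers):
    """Aggregate per-address send/receive counts and values, kept separate
    for each transaction kind ("normal", "erc20", "internal").

    Each transfer is a dict with hypothetical keys:
    "from", "to", "value" (integer amount), and "kind".
    """
    feats = defaultdict(lambda: defaultdict(int))
    for t in transfers:
        kind = t["kind"]
        feats[t["from"]][f"{kind}_sent_count"] += 1
        feats[t["from"]][f"{kind}_sent_value"] += t["value"]
        feats[t["to"]][f"{kind}_recv_count"] += 1
        feats[t["to"]][f"{kind}_recv_value"] += t["value"]
    # Convert nested defaultdicts to plain dicts for downstream use.
    return {addr: dict(f) for addr, f in feats.items()}


# Hypothetical records: one ordinary transaction and two internal transfers.
transfers = [
    {"from": "0xA", "to": "0xB", "value": 5, "kind": "normal"},
    {"from": "0xB", "to": "0xC", "value": 3, "kind": "internal"},
    {"from": "0xB", "to": "0xC", "value": 2, "kind": "internal"},
]

features = address_features(transfers)
```

Keeping the internal-transaction aggregates in their own columns lets a model pick up behavior, such as funds forwarded by a contract, that would be invisible if only ordinary and ERC20 transactions were counted.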


Four machine learning models, K-Nearest Neighbor, Random Forest, XGBoost, and LightGBM, are trained and tested on both the feature set from existing research and the feature set constructed in this paper, and their performance is compared. Experimental results show that, for most models, the feature set of this paper outperforms the existing feature set, which supports the effectiveness of the proposed feature engineering scheme. Among the four models, LightGBM achieves the best overall performance.
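
The comparison protocol, every model evaluated on every feature set, can be sketched as a small harness. This is only an illustration under stated assumptions: F1 is assumed as the metric (the abstract does not name one), the two models are simple stand-ins rather than the four models used in the paper, and the toy data and names are hypothetical and do not reproduce the paper's results.

```python
def f1_score(y_true, y_pred):
    """F1 score for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compare(models, feature_sets, X_train, y_train, X_test, y_test):
    """Evaluate each model on each feature set; return {(model, set): F1}."""
    results = {}
    for set_name, cols in feature_sets.items():
        tr = [[row[c] for c in cols] for row in X_train]
        te = [[row[c] for c in cols] for row in X_test]
        for model_name, fit_predict in models.items():
            y_pred = fit_predict(tr, y_train, te)
            results[(model_name, set_name)] = f1_score(y_test, y_pred)
    return results


def nearest_neighbor(X_train, y_train, X_test):
    """Stand-in model: 1-nearest-neighbor by squared Euclidean distance."""
    preds = []
    for x in X_test:
        best = min(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )
        preds.append(y_train[best])
    return preds


def always_fraud(X_train, y_train, X_test):
    """Stand-in baseline: flag every address as fraudulent."""
    return [1] * len(X_test)


# Hypothetical toy data: column 0 is uninformative, column 1 is informative.
X_train = [[1.0, 0.0], [2.0, 0.1], [1.1, 5.0], [2.1, 5.1]]
y_train = [0, 0, 1, 1]
X_test = [[1.04, 0.2], [2.04, 4.9]]
y_test = [0, 1]

results = compare(
    models={"1nn": nearest_neighbor, "always_fraud": always_fraud},
    feature_sets={"baseline": [0], "extended": [0, 1]},
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
)
```

Holding the train/test split fixed and varying only the feature columns isolates the effect of the feature set from the effect of the model, which is what a claim of the form "better under most models" requires.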


Overall, this paper explores DeFi fraudulent transaction detection, and the proposed theoretical methods, technical models, and practical systems provide technical support for the security governance of the DeFi ecosystem.
