Balancing Interpretability and Performance: Optimizing Random Forest Algorithm Based on Point-to-Point Federated Learning

Chao Gao, Xinhui Yang, Youguang Guo

Abstract

Federated learning is widely applied in collaborative scenarios involving multiple data owners. However, most state-of-the-art federated learning algorithms are black-box models, making it difficult for users to understand how decisions are made. Random forest models are widely used in medical contexts owing to their strong interpretability, but when faced with multicenter data, the heterogeneity of the data at each center often causes their predictive performance to fall short of expectations. To mitigate this challenge, the present study introduces DFLRF (Decentralized Federated Learning Random Forest), a federated learning algorithm based on random forests. Building on the conventional random forest, DFLRF uses federated learning to disseminate decision tree models among clients: it evaluates and consolidates tree models from all client sites, thereby accounting for data disparities across centers. The algorithm selects the optimal decision trees according to model loss, guaranteeing the predictive performance of the final federated random forest. Evaluated on a public dataset, DFLRF improves AUC by 1.5% and recall by 6% over baseline algorithms while retaining superior interpretability.
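The aggregation idea described above, in which locally trained trees are exchanged among peers, scored on every client's data, and only the lowest-loss trees are kept, can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: the number of clients, trees per client, selection size `k`, and the use of log-loss and majority voting are all illustrative assumptions.

```python
# Hypothetical sketch of a DFLRF-style aggregation step (not the authors' code).
# Each client trains local decision trees on its own (heterogeneous) partition;
# trees are exchanged peer-to-peer, every client scores every tree on its local
# data, and the trees with the lowest average loss form the federated forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
# Simulate three clients by partitioning the data (assumed setup).
parts = np.array_split(rng.permutation(len(X)), 3)
clients = [(X[idx], y[idx]) for idx in parts]

# Local training: each client fits a few trees on bootstrap samples.
trees = []
for Xc, yc in clients:
    for seed in range(5):
        boot = rng.integers(0, len(Xc), len(Xc))
        trees.append(
            DecisionTreeClassifier(max_depth=4, random_state=seed)
            .fit(Xc[boot], yc[boot])
        )

# Federated evaluation: average each tree's log-loss across all clients,
# so trees that generalize across centers are preferred.
losses = [
    np.mean([log_loss(yc, t.predict_proba(Xc)[:, 1], labels=[0, 1])
             for Xc, yc in clients])
    for t in trees
]

# Keep the k trees with the smallest cross-client loss (k is illustrative).
k = 7
forest = [trees[i] for i in np.argsort(losses)[:k]]

def forest_predict(models, X):
    """Majority vote over the selected trees."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

acc = (forest_predict(forest, X) == y).mean()
```

Because every model in the final forest is an ordinary decision tree, the ensemble remains inspectable path-by-path, which is the interpretability property the abstract emphasizes.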
