Employee Turnover Prediction Based on Ensemble Learning DGNK Model

Lihe Ma, Kechao Wang, Yan Wang, Lin Liu, Ning Sha, Lin Ma

doi:10.52783/jes.1120

Published: Apr 18, 2024

Keywords:

Ensemble learning, Oversampling, AUC

Abstract

Employee turnover can have significant negative impacts on an enterprise. It results in the loss of valuable talent and knowledge and incurs substantial costs for hiring, onboarding, and training new employees. Predicting an employee’s intent to quit therefore allows organizations to take proactive measures to prevent it, and early detection of turnover intention helps an enterprise develop and enhance its core competitiveness. This study aims to predict employees’ intention to quit. More than 1,400 samples with 31 features describing a company’s employees were collected from the Kaggle website as the data set. A two-layer DGNK model was designed, with decision tree, gradient boosting, naive Bayes, and k-nearest neighbor models as the primary classifiers and gradient boosting as the secondary classifier, to build the predictive model of employee turnover intention. The experimental results show that the DGNK model based on two-layer ensemble learning performs best, while the naive Bayes model performs worst. In conclusion, this study highlights the importance of predicting employee turnover intention as an effective strategy to enhance organizational performance and competitive advantage. The results also suggest that machine learning models such as DGNK can play a crucial role in achieving this goal.
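The two-layer structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's StackingClassifier as the stacking mechanism, uses synthetic data from make_classification as a stand-in for the Kaggle data set (the 84/16 class ratio is an assumption), and picks hyperparameters purely for illustration. The keywords list oversampling, but the abstract does not say how it was applied, so it is left out here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the employee data: 1,400 rows, 31 features.
# The class imbalance (about 16% positives) is an assumption for illustration.
X, y = make_classification(
    n_samples=1400, n_features=31, n_informative=10,
    weights=[0.84, 0.16], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Layer 1: the four primary classifiers named in the abstract (D, G, N, K).
primary = [
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("nb", GaussianNB()),
    # k-NN is distance based, so features are standardized first.
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# Layer 2: gradient boosting trained on out-of-fold probabilities
# produced by the primary classifiers (5-fold cross-validation).
dgnk = StackingClassifier(
    estimators=primary,
    final_estimator=GradientBoostingClassifier(random_state=0),
    stack_method="predict_proba",
    cv=5,
)
dgnk.fit(X_train, y_train)

# AUC on the held-out split, matching the metric listed in the keywords.
auc = roc_auc_score(y_test, dgnk.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```

Training the secondary classifier on out-of-fold predictions rather than on in-sample predictions keeps it from simply learning which primary classifier overfits the training data most.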

Issue

Vol. 20 No. 2 (2024)

Section

Articles

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Author Biography

¹,* Lihe Ma

² Kechao Wang

³ Yan Wang

⁴ Lin Liu

⁵ Ning Sha

⁶ Lin Ma

¹ School of Information Engineering, Harbin University, Harbin, China; Heilongjiang Provincial Key Laboratory of the Intelligent Perception and Intelligent Software, Harbin, China

² School of Information Engineering, Harbin University, Harbin, China; Heilongjiang Provincial Key Laboratory of the Intelligent Perception and Intelligent Software, Harbin, China

³ Heilongjiang Government Affairs Big Data Center, Harbin, China

⁴ School of Information Engineering, Harbin University, Harbin, China; Heilongjiang Provincial Key Laboratory of the Intelligent Perception and Intelligent Software, Harbin, China

⁵ Heilongjiang Government Affairs Big Data Center, Harbin, China

⁶ Heilongjiang Government Affairs Big Data Center, Harbin, China

*Corresponding author: Lihe Ma
