Optimizing Diabetes and Heart Disease Prediction through Machine Learning Algorithms Incorporating Lifestyle Factors
Abstract
Lifestyle diseases, such as hypertension, diabetes, heart disease, asthma, and obesity, are becoming a significant global public health concern. This paper explores the use of machine learning (ML) models to predict lifestyle diseases, focusing on diabetes and heart disease. Using two publicly available datasets, PIMA Diabetes and Cleveland Heart Disease, we developed eight distinct ML models: k-Nearest Neighbors (kNN), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Deep Neural Network (DNN), AdaBoost, and XGBoost. Our approach emphasizes data preprocessing to ensure high-quality input for model training, including the handling of missing values and standard (z-score) scaling for normalization. The importance of each feature was assessed using the ANOVA F-value. We used stratified sampling to split each dataset so that the class proportions are preserved in the training and test sets. Our findings indicate that DNN and XGBoost achieved the highest predictive performance on the PIMA Diabetes dataset, with recall values of 0.89 and 0.92 and AUC scores of 0.836 and 0.83, respectively. For the Cleveland Heart Disease dataset, AdaBoost emerged as the most reliable model, with a precision of 0.85, a recall of 0.909, and a high AUC of 0.924. Overall, this research highlights the potential of ML techniques to improve the early detection of lifestyle diseases, while also addressing the challenges of dataset quality and model interpretability.
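The abstract names the preprocessing steps (missing-value handling, standard scaling, ANOVA F-value feature scoring, stratified splitting) without detailing them. The following is a minimal, standard-library Python sketch of how such steps typically fit together; it is not the authors' code. Assumptions, none of which come from the paper: missing values are encoded as `None` and filled with the training-set column median, the split is 80/20 with a fixed seed, the toy data is invented, and all function names are hypothetical. Imputation and scaling parameters are fitted on the training split only, so no test information leaks into preprocessing. In practice, scikit-learn's `train_test_split(..., stratify=y)`, `StandardScaler`, and `f_classif` cover the same ground.

```python
import random
import statistics
from collections import defaultdict

# Sketch only: function names, median imputation, and the 80/20 split are
# assumptions for illustration, not the method reported in the paper.


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Split rows so that each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return (*take(train_idx), *take(test_idx))


def fit_medians(X):
    """Per-column medians of the observed (non-None) training values."""
    return [statistics.median(v for v in col if v is not None) for col in zip(*X)]


def impute(X, medians):
    """Replace None (our assumed missing-value marker) with the training median."""
    return [[m if v is None else v for v, m in zip(row, medians)] for row in X]


def fit_scaler(X):
    """Per-column mean and population standard deviation for z-score scaling."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return means, stds


def scale(X, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def anova_f(values, y):
    """One-way ANOVA F-value of a single feature across the class labels in y."""
    groups = defaultdict(list)
    for v, label in zip(values, y):
        groups[label].append(v)
    n, k = len(values), len(groups)
    grand = statistics.fmean(values)
    means = {c: statistics.fmean(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented toy data: two features, binary label, None marks a missing value.
X = [[1.0, 5.0], [2.0, None], [3.0, 4.0], [8.0, 5.5], [9.0, 4.5],
     [10.0, None], [1.5, 5.2], [9.5, 4.8], [2.5, 4.9], [8.5, 5.1]]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

X_train, y_train, X_test, y_test = stratified_split(X, y, test_frac=0.2)
medians = fit_medians(X_train)  # fitted on the training split only
X_train, X_test = impute(X_train, medians), impute(X_test, medians)
means, stds = fit_scaler(X_train)
X_train, X_test = scale(X_train, means, stds), scale(X_test, means, stds)
f_scores = [anova_f(col, y_train) for col in zip(*X_train)]
print(f_scores)  # feature 0 separates the classes, so its F-value is much larger
```

Because the F-value is unchanged by a per-feature linear rescaling, scoring features before or after standard scaling gives the same ranking.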
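The results above are reported as precision, recall, and AUC. As a hedged illustration of what these numbers measure, the sketch below computes them from scratch for a binary classifier. The labels, scores, and 0.5 decision threshold are hypothetical. AUC is computed here with the pairwise (Mann-Whitney) formulation, chosen for clarity; it is quadratic in the number of cases, so a sort-based method or scikit-learn's `roc_auc_score` is preferable for larger test sets.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve: the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores; the 0.5 threshold is an assumption.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

print(precision_recall(y_true, y_pred))  # (1.0, 0.6666666666666666)
print(roc_auc(y_true, scores))           # 0.8888888888888888
```

Note that precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.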

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.