Diabetes Prediction Using Machine Learning

Payal Gupta, Ritu Sindhu

Abstract

Diabetes, a condition characterized by persistently high blood sugar levels, is difficult to predict accurately because labelled data are limited and the available data often contain outliers and missing values. To address these challenges, we introduce a robust prediction framework that combines outlier rejection, missing-value imputation, data standardization, feature selection, and K-fold cross-validation. We evaluate a range of machine learning classifiers: k-Nearest Neighbours, Decision Tree, Random Forest, AdaBoost, Naive Bayes, XGBoost, and Multilayer Perceptron (MLP). We also propose a weighted ensembling technique to improve prediction accuracy, in which each classifier's weight is determined by its performance as measured by the Area Under the ROC Curve (AUC). Hyperparameters are tuned using grid search. All experiments are conducted on the Pima Indian Diabetes Dataset under uniform conditions. The proposed ensemble classifier achieves the best performance, with a sensitivity of 0.789, specificity of 0.934, false omission rate of 0.092, diagnostic odds ratio of 66.234, and AUC of 0.950, outperforming the existing methods discussed in the literature by 2.00% in terms of AUC. These results suggest that the framework is promising for improving diabetes prediction. We have released our source code for diabetes prediction to the public.
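As an illustration of the AUC-weighted ensembling and the reported diagnostic metrics, the following is a minimal sketch in plain Python. It is not the authors' released code: the abstract does not give the exact weighting formula, so normalizing the AUC scores to sum to one is an assumption, and the function names (auc_weights, weighted_ensemble, diagnostic_metrics) are hypothetical.

```python
from typing import Sequence


def auc_weights(aucs: Sequence[float]) -> list[float]:
    """Turn per-classifier AUC scores into weights that sum to 1.

    ASSUMPTION: simple normalization; the paper may use another scheme.
    """
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC scores must sum to a positive value")
    return [a / total for a in aucs]


def weighted_ensemble(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> list[float]:
    """Weighted average of positive-class probabilities.

    probas[i][j] is classifier i's predicted probability for sample j.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    n_samples = len(probas[0])
    return [
        sum(w * p[j] for w, p in zip(weights, probas))
        for j in range(n_samples)
    ]


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the metric is undefined (zero denominator)."""
    return num / den if den else float("nan")


def diagnostic_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> dict[str, float]:
    """Compute the metrics named in the abstract from binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),          # TP / (TP + FN)
        "specificity": _ratio(tn, tn + fp),          # TN / (TN + FP)
        "false_omission_rate": _ratio(fn, fn + tn),  # FN / (FN + TN)
        "diagnostic_odds_ratio": _ratio(tp * tn, fp * fn),
    }


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the paper.
    weights = auc_weights([0.90, 0.85, 0.80])
    scores = weighted_ensemble(
        [[0.9, 0.2, 0.6], [0.8, 0.3, 0.4], [0.7, 0.1, 0.5]], weights
    )
    labels = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
    print(diagnostic_metrics([1, 0, 1], labels))
```

In practice the per-classifier AUC values would come from held-out folds of the K-fold cross-validation, so that the weights are not fitted on the same samples used to evaluate the ensemble.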
