Analysis of Diabetic Prediction & Regression System using Machine Learning Algorithms

Ujwala Ghodeswar, Minal Keote


This paper analyses linear regression and classifier machine learning algorithms for the prediction of diabetes. Classification and regression analysis is performed using a linear classifier, logistic regression, Naïve Bayes, Random Forest, XGBoost, and decision tree algorithms. The diabetes dataset is taken from Kaggle and contains data for 768 patients with 9 features; the feature Outcome is the dependent (output) variable. The data are split into 80% for training and 20% for testing, and training and testing accuracy are calculated for each algorithm. Each model is run for a number of iterations, and the highest accuracy obtained is reported. The Naïve Bayes algorithm gives the highest accuracy compared with the decision tree, linear regression, and logistic regression algorithms. K-fold cross-validation is used to reduce overfitting. Performance is further evaluated using precision, recall, F1 score, and support, all of which are derived from the confusion matrix. The AUC and the confusion matrix are also used to validate the results. The results show that Naïve Bayes, XGBoost, and Random Forest outperform the other algorithms in terms of precision and accuracy. These three algorithms are therefore used to predict diabetes on this dataset using AUC and precision-recall analysis.
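The evaluation workflow described above (80/20 split, K-fold cross-validation, and metrics derived from the confusion matrix) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn, uses a synthetic stand-in with 768 rows and 8 predictors instead of the Kaggle file, picks K = 5 and fixed random seeds arbitrarily, and leaves out XGBoost so that the sketch needs no extra dependency. The variable names (`models`, `results`) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix,
                             precision_recall_fscore_support,
                             roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the Kaggle diabetes data
# (768 patients, 8 predictor features plus a binary Outcome).
X, y = make_classification(n_samples=768, n_features=8,
                           n_informative=5, random_state=0)

# 80% training / 20% testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A subset of the algorithms named in the abstract (XGBoost omitted here).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # K-fold cross-validation on the training split (K = 5 is an assumption).
    cv_acc = cross_val_score(model, X_train, y_train,
                             cv=5, scoring="accuracy").mean()

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    # Confusion matrix entries for the binary Outcome.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0)

    results[name] = {
        "cv_accuracy": cv_acc,
        "train_accuracy": model.score(X_train, y_train),
        "test_accuracy": model.score(X_test, y_test),
        "confusion": (tn, fp, fn, tp),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc_score(y_test, y_prob),
    }

for name, r in results.items():
    print(f"{name}: test_acc={r['test_accuracy']:.3f} "
          f"precision={r['precision']:.3f} recall={r['recall']:.3f} "
          f"auc={r['auc']:.3f}")
```

Comparing `train_accuracy` with `cv_accuracy` and `test_accuracy` for each model gives a rough indication of overfitting. The numbers printed here come from synthetic data and say nothing about the results reported in the paper.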
