Design and Development of Lung Cancer Prediction Model for Performance Enhancement Using Boosting Ensembled Machine Learning Classifiers with Shuffle-Split Cross Validations


Venkat P. Patil, Pravin Kshirsagar, Bhuvan Unhelkar, Prasun Chakrabarti


Clinical research on lung cancer has recently made significant progress because of advances in imaging and sequencing techniques. At the same time, there are limits to how much data the human brain can process and apply effectively. Integrating and analysing these large, complex datasets has allowed lung cancer to be characterised from many perspectives, and machine learning-based technologies are critical to this process. This study evaluates a range of boosting algorithm models on a standard lung cancer dataset, with the aim of identifying the most effective boosting algorithms and cross-validation strategies for improving lung cancer prediction performance. The boosting-based machine learning classifiers applied are Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost, XGB), Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM, LGBM). Each classifier is combined with three cross-validation approaches (Stratified KFold, Shuffle and Split, and Stratified Shuffle Split), and the impact of these hybrid combinations on the accuracy of lung cancer prediction is investigated. Efficacy is assessed using a variety of performance metrics, including accuracy, precision, recall, F-score, ROC AUC score, and cross-validation score. The study presents a hybrid approach that could accurately predict the development of lung cancer: combining the GB model, a boosting-based machine learning classifier, with the Shuffle and Split cross-validation approach yielded more accurate lung cancer predictions.
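
The pairing of boosting classifiers with cross-validation strategies described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the lung cancer data, and includes only the two boosting classifiers that ship with scikit-learn (Gradient Boosting and AdaBoost) so that it stays self-contained. The hyperparameters, split counts, and the helper name `compare` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import (
    ShuffleSplit,
    StratifiedKFold,
    StratifiedShuffleSplit,
    cross_validate,
)

# Synthetic binary-classification data (assumption: stand-in for the real dataset).
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=6, random_state=0
)

# Two of the five boosting families named in the abstract.
models = {
    "GB": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# The three cross-validation strategies named in the abstract.
splitters = {
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "ShuffleSplit": ShuffleSplit(n_splits=5, test_size=0.2, random_state=0),
    "StratifiedShuffleSplit": StratifiedShuffleSplit(
        n_splits=5, test_size=0.2, random_state=0
    ),
}

# Metrics listed in the abstract, using scikit-learn's scorer names.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]


def compare(models, splitters, X, y):
    """Return the mean test score per metric for every (model, splitter) pair."""
    results = {}
    for model_name, model in models.items():
        for split_name, cv in splitters.items():
            scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
            results[(model_name, split_name)] = {
                metric: float(scores[f"test_{metric}"].mean())
                for metric in scoring
            }
    return results


results = compare(models, splitters, X, y)
for (model_name, split_name), metrics in results.items():
    summary = ", ".join(f"{name}={value:.3f}" for name, value in metrics.items())
    print(f"{model_name} + {split_name}: {summary}")
```

Scores on synthetic data say nothing about the study's findings; the sketch only shows how each model and splitter combination can be evaluated on the same set of metrics.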
