Machine Learning in DNA Microarray Analysis for Cancer Classification
Abstract
Successful cancer treatment depends on early detection, and gene expression data offers one way to achieve it. Because misdiagnosis can complicate treatment and raise fatality rates, accurate classification of this data is crucial. Gene expression data typically contains many features, each representing a different gene. The resulting high dimensionality raises computational complexity and resource requirements. Moreover, redundant, strongly correlated features among those selected can cause multicollinearity. Existing approaches have limitations that may compromise overall classification accuracy, including poor performance caused by degraded data quality, excessive storage requirements, overfitting, and lack of robustness. This study uses an effective machine learning (ML) framework to overcome these issues and improve classification results. The data is first collected from five cancer gene databases, and data transformation techniques are then used to enhance it. Min-Max scaling is applied to pre-process the data. Linear discriminant analysis (LDA) is used to retain the most important genes and remove superfluous or irrelevant ones. XGBoost is then used to classify the malignant and non-cancerous classes based on the selected gene set. Resolving the dimensionality and overfitting issues significantly enhances the performance of the proposed model. The implementation is in Python, and the XGBoost model achieves an overall accuracy of 99.37% across all datasets. The model's performance is also assessed using measures including accuracy, recall, and F1 score. The proposed model performs notably better than existing approaches.
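As an illustration only, the pre-processing and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal sketch and not the authors' implementation: the function names `min_max_scale` and `binary_metrics`, the toy expression values, and the labels are all hypothetical. The LDA and XGBoost stages are omitted because the abstract does not specify their configuration.

```python
def min_max_scale(rows):
    """Rescale each column (one gene per column) to the [0, 1] range.

    `rows` is a list of samples, each a list of expression values.
    A constant column is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and F1 for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical toy data: 3 samples x 2 genes.
expression = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = min_max_scale(expression)

# Hypothetical labels: 1 = malignant, 0 = non-cancerous.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Scaling each gene independently keeps genes with large raw expression ranges from dominating those with small ranges. In practice the minimum and maximum would be computed on the training split only and reused on the test split, so that no information leaks from the test data.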
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.