Enhancing Software Defect Prediction accuracy using Modified Entropy Calculation in Random Forest Algorithm

Main Article Content

Ranjeetsingh Suryawanshi, Amol Kadam


Imagine you need to classify software defects in a large dataset. How do you choose the best algorithm for the task? Many candidates exist, including Random Forest, Support Vector Machine, Neural Networks, Naive Bayes, K-Nearest Neighbours, Decision Tree, and Logistic Regression. One of the most widely used is the Random Forest algorithm, which combines multiple Decision Trees to make predictions. However, this algorithm relies on a calculation called entropy, which measures the uncertainty in the data. Entropy is computed with the natural logarithm, which can be time-consuming to evaluate. Is there a better way to calculate entropy? In this research, we explore a different way to calculate the natural logarithm: its Taylor series expansion. A Taylor series is an infinite sum of terms, built from a function's derivatives at a single point, that approximates a sufficiently smooth function near that point. We modified the Random Forest algorithm by replacing the natural logarithm in the entropy formula with its Taylor series expansion. We tested the modified algorithm on a dataset and compared its performance with that of the original entropy formula. We found that the modification improved the accuracy of the algorithm on software defect prediction.
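As a rough illustration of the idea described above, the following Python sketch computes entropy with a truncated Taylor series in place of the natural logarithm. The abstract does not state the expansion point or the number of terms, so the choices here are assumptions: the series ln(1 + x) = x - x^2/2 + x^3/3 - ... with x = p - 1, which converges for 0 < p <= 1, and ten terms by default. The names ln_taylor and entropy are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def ln_taylor(p, terms=10):
    """Approximate ln(p) for 0 < p <= 1 with a truncated Taylor series.

    Assumption: uses ln(1 + x) = x - x^2/2 + x^3/3 - ... with x = p - 1.
    The article may use a different expansion or number of terms.
    """
    x = p - 1.0
    total = 0.0
    power = 1.0
    for k in range(1, terms + 1):
        power *= x                      # power now holds x**k
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * power / k
    return total


def entropy(labels, log_fn=math.log):
    """Entropy of a label sequence: -sum(p * log(p)) over class proportions.

    log_fn is the logarithm to use; pass ln_taylor to get the
    series-based variant instead of the exact natural logarithm.
    """
    n = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = count / n
        result -= p * log_fn(p)
    return result


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = defective module, 0 = clean module.
    labels = [0, 0, 0, 1]
    print("exact entropy: ", entropy(labels))
    print("Taylor entropy:", entropy(labels, ln_taylor))
```

With the default ten terms the approximation is tight for class proportions near 1 and looser for proportions near 0, where this series converges slowly. Raising terms narrows the gap at the cost of more arithmetic.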

Article Details

Author Biography


Ranjeetsingh Suryawanshi [1]

Amol Kadam [2]

[1], [2] Bharati Vidyapeeth Deemed To Be University, College of Engineering, India



*Correspondence: ranjeetsinghsuryawanshi@gmail.com

