BALANCING SARCASTIC HINGLISH SHORT TEXT DATA USING AUGMENTATION TECHNIQUES WITH HANDLING SPELLING VARIATIONS

Rajshree Singh, Reena Srivastava

Abstract

Real-world datasets are frequently imbalanced because their classes are not evenly distributed. Even with traditional class-balancing methods such as re-sampling and re-weighting, class imbalance remains a significant obstacle for current deep learning. The main objective of this study is to propose a data augmentation technique that balances the data by increasing the sample sizes of the minority classes. The study is implemented in Python using multiple machine learning methods. The classification models used are Logistic Regression, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees Classifier, AdaBoost classifier, and Gradient Boost classifier. Precision, recall, and F-score were used to determine the most effective model. According to the analysis, the Naïve Bayes approach yields the best results, with an F1-score of 95.85% using the parameters Wn = 3, Cn = 3, and CWn = 3.
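The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It computes precision, recall, and F1 for a single class treated as the positive (minority) class. The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study; the sketch only shows how the three scores relate to each other.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy labels (not data from the study):
# 1 = sarcastic (minority class), 0 = non-sarcastic (majority class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

On imbalanced data, scores of this kind are more informative than plain accuracy, because a model that mostly predicts the majority class can still be accurate overall while performing poorly on the minority class.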
