Assessment of Classification Models for Identifying Cyberbullying Detection

Main Article Content

Mritunjaykumar Ojha, Nilesh M. Patil, Manuj Joshi

Abstract

One of the most critical challenges in cybersecurity is dealing with cyberbullying. The increasingly complicated dynamics of social media, marked by complexity, variety, subjectivity, and a multimodal nature, make cyberbullying difficult to identify. This has led to the need for automated mechanisms that can identify these harmful behaviors. This study assesses how well various classification methods detect cyberbullying. For training and testing, our study uses cybersecurity-related data. The selected models are the Linear SVC, Random Forest, Decision Tree, Logistic Regression, and Stochastic Gradient classifiers. We use hyperparameter tuning to improve model performance, and then we report the results in terms of accuracy, precision, recall, and F1 score. The results show that the Stochastic Gradient classifier performs best, with an F1 score of 94.39%, recall of 91.94%, accuracy of 92.81%, and precision of 96.97%. The investigation examines the advantages and disadvantages of each approach, offering useful information for the cybersecurity field. In addition, suggestions for further studies are made to strengthen the resilience of cyber defenses. This work advances the effectiveness of cybersecurity measures by identifying the best models for detecting threats and offering directions for improvement as cyber threats change. Other feature extraction techniques, namely Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Word2Vec, are combined with the Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) algorithms to build the model [8].
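As a quick consistency check on the figures reported above, the F1 score can be recomputed from the stated precision and recall, since F1 is their harmonic mean. The short sketch below is illustrative only; the function and variable names are our own and do not come from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (works for fractions or percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the Stochastic Gradient classifier (in percent).
reported_precision = 96.97
reported_recall = 91.94
reported_f1 = 94.39

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(recomputed_f1, 2))  # 94.39
```

The recomputed value agrees with the reported F1 score to two decimal places. Accuracy cannot be checked in the same way, because it also depends on the number of true negatives, which the abstract does not give.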
Our objectives are to analyze the effectiveness of several classification techniques for identifying cyberbullying (the Random Forest, Decision Tree, Linear SVC, Logistic Regression, and Stochastic Gradient classifiers), to improve model performance through hyperparameter tuning, and to evaluate the outcomes using the F1 score, accuracy, precision, recall, and other key performance metrics.
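The workflow described in these objectives could be sketched as follows with scikit-learn. This is a minimal illustration under several assumptions, not the authors' implementation: the TF-IDF features, the parameter grids, the two-fold cross-validation, and the toy sentences in `TOY_TEXTS` are all hypothetical placeholders, since the abstract does not specify the features, search space, or dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data (1 = bullying, 0 = not bullying); NOT the study's dataset.
TOY_TEXTS = [
    "you are so stupid and ugly", "nobody likes you, loser",
    "shut up, you idiot", "you are worthless and dumb",
    "have a great day everyone", "thanks for the helpful answer",
    "see you at the game tonight", "congratulations on the new job",
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

# The five classifier families named in the abstract, each with an
# assumed (illustrative) hyperparameter grid.
CANDIDATES = {
    "Linear SVC": (LinearSVC(random_state=0), {"clf__C": [0.1, 1.0]}),
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"clf__n_estimators": [10, 50]}),
    "Decision Tree": (DecisionTreeClassifier(random_state=0),
                      {"clf__max_depth": [2, None]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"clf__C": [0.1, 1.0]}),
    "Stochastic Gradient": (SGDClassifier(random_state=0),
                            {"clf__alpha": [1e-4, 1e-3]}),
}


def tune_all(texts, labels, cv=2):
    """Grid-search each candidate on TF-IDF features, scored by cross-validated F1."""
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", estimator)])
        search = GridSearchCV(pipeline, grid, scoring="f1", cv=cv)
        search.fit(texts, labels)
        results[name] = (search.best_params_, search.best_score_)
    return results


results = tune_all(TOY_TEXTS, TOY_LABELS)
for name, (params, score) in results.items():
    print(f"{name}: best params {params}, mean CV F1 {score:.2f}")
```

On a real corpus, the same loop would typically be followed by an evaluation on a held-out test set reporting accuracy, precision, recall, and F1 for each tuned model. Scores obtained on the toy sentences above carry no meaning.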

Article Details

Section
Articles