A Multi-Dimensional Strategy for Spam Email Classification Leveraging Origin, Text, and Image Features in a Hybrid Model

Pramod P. Ghogare, Husain H. Dawoodi, Manoj P. Patil

Abstract

This article proposes a novel method for spam email classification that integrates origin-based, content-based, and image-based approaches. Features are extracted from the email after fine-tuned, customized Natural Language Processing (NLP), and each extracted feature is passed to a machine learning classifier. A weight is assigned to each feature's classification result, and the final spam classification is determined by considering all the feature weights. The proposed hybrid model achieves accuracy greater than that of the existing personal email provider and substantially reduces false positives compared with classification based on individual features. It also outperforms classification that uses each section of an email separately on accuracy, recall, precision, and F-score. These results indicate that considering features from multiple sections of an email helps provide accurate classification with the fewest false positives. Whereas previous research focused on individual sections of the email and considered only a few features at a time, this article classifies spam email by integrating significant features from the email's origin, content, and images.
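The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the combination rule, so the normalized weighted average below is an assumption, and the function names, per-section scores, weights, and the 0.5 threshold are all hypothetical values chosen for illustration, not the authors' settings.

```python
from typing import Dict


def weighted_spam_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-section spam scores (each in [0, 1]) into one score.

    Assumption: a normalized weighted average. Sections missing from
    `scores` (for example, an email with no image) are skipped and the
    remaining weights are renormalized.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("at least one scored section needs a positive weight")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


def is_spam(scores: Dict[str, float], weights: Dict[str, float], threshold: float = 0.5) -> bool:
    """Flag the email as spam when the combined score reaches the threshold."""
    return weighted_spam_score(scores, weights) >= threshold


# Hypothetical weights for the three sections named in the abstract.
WEIGHTS = {"origin": 0.3, "text": 0.5, "image": 0.2}

# Hypothetical classifier outputs for one email.
example_scores = {"origin": 0.9, "text": 0.4, "image": 0.8}
print(is_spam(example_scores, WEIGHTS))
```

In this sketch the text classifier alone would score the example email below the threshold, while the origin and image scores push the combined score above it, which is the kind of cross-section evidence the hybrid approach relies on.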
