Adaptive Multimodal Sentiment Analysis: Improving Fusion Accuracy with Dynamic Attention for Missing Modality
Abstract
Multimodal sentiment analysis (MSA) combines data from different sources, such as text and images, to improve the accuracy of sentiment predictions. Our research tackles two critical challenges in MSA: handling missing data and fusing modalities effectively through attention mechanisms. Missing data is a significant obstacle in real-time applications, so we propose a method that handles cases where either the text or the image is missing by using the knowledge in the available modality to fill the gap. Our approach begins with feature extraction: for text, we use natural language processing models to obtain rich, context-aware representations; for images, we employ deep convolutional neural networks to capture detailed visual features. After extraction, we compute a sentiment score for each modality to identify the most relevant one. These sentiment scores determine the attention weights, allowing the model to focus dynamically on the most informative features of each modality. We then concatenate the text and image features according to these attention weights, producing a robust fusion of the two sources, and feed the fused features into a classifier to predict the overall sentiment. Our method outperforms previous approaches, demonstrating the effectiveness of attention-based fusion networks for multimodal sentiment analysis, and underscores the importance of handling missing data to maintain robust performance in real-time scenarios. The framework shows the potential for improving sentiment analysis in practical applications by intelligently combining multimodal data with attention-weighted fusion.
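The attention-weighted fusion step described above can be illustrated with a minimal PyTorch sketch. The module name AttentionWeightedFusion, the feature dimensions (a 768-dimensional text embedding, a 2048-dimensional image embedding), and the learned placeholder vectors used when a modality is missing are illustrative assumptions, not the authors' exact implementation.

# Minimal sketch of sentiment-score-driven attention fusion (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionWeightedFusion(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=256, num_classes=3):
        super().__init__()
        # Project both modalities into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Per-modality sentiment scorers (one scalar score per modality).
        self.text_scorer = nn.Linear(hidden_dim, 1)
        self.image_scorer = nn.Linear(hidden_dim, 1)
        # Learned placeholder vectors stand in for a missing modality (assumed imputation scheme).
        self.text_placeholder = nn.Parameter(torch.zeros(hidden_dim))
        self.image_placeholder = nn.Parameter(torch.zeros(hidden_dim))
        # Classifier over the attention-weighted, concatenated features.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, text_feat=None, image_feat=None):
        batch = text_feat.size(0) if text_feat is not None else image_feat.size(0)
        # Use the available modality; fill a missing one with its learned placeholder.
        t = self.text_proj(text_feat) if text_feat is not None \
            else self.text_placeholder.expand(batch, -1)
        v = self.image_proj(image_feat) if image_feat is not None \
            else self.image_placeholder.expand(batch, -1)
        # Sentiment scores decide how much attention each modality receives.
        scores = torch.cat([self.text_scorer(t), self.image_scorer(v)], dim=1)
        weights = F.softmax(scores, dim=1)  # shape (batch, 2)
        # Concatenate the attention-weighted modality features and classify.
        fused = torch.cat([weights[:, :1] * t, weights[:, 1:] * v], dim=1)
        return self.classifier(fused)

# Example: a batch of 4 samples where the image modality is missing.
model = AttentionWeightedFusion()
text_features = torch.randn(4, 768)  # e.g. sentence embeddings from a text encoder
logits = model(text_feat=text_features, image_feat=None)
print(logits.shape)  # torch.Size([4, 3])

In this sketch, a softmax over the two per-modality sentiment scores yields the attention weights, and the weighted text and image features are concatenated before classification, mirroring the fusion step outlined in the abstract.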
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.