Speech Emotion Recognition Using Deep Federated Learning Techniques

Kshirod Sarmah, Hem Chandra Das, Dipen Nath, Dharmeswar Tarang

Abstract

Speech Emotion Recognition (SER) plays a pivotal role in human-computer interaction and affective computing applications. Traditional SER approaches often face challenges related to privacy, bias, and scalability because they rely on centralized data aggregation. In this paper, we propose an approach that applies Deep Federated Learning (DFL) techniques to SER in order to address these challenges. By decentralizing the training process and performing model updates locally on user devices, DFL preserves user privacy while still aggregating knowledge from diverse data sources. The methodology comprises data preparation, federated learning setup, initialization of pre-trained models, and iterative model updates. To enhance detection accuracy and protect local client privacy on edge devices, the model is additionally combined with a deep federated learning protocol. Evaluation with F1-score, accuracy, precision, and recall confirms the model's effectiveness across a range of emotion categories. The results show that the proposed DFL-based model compares favorably with several baseline audiovisual emotion recognition models, and that introducing federated learning increased classification accuracy by approximately 3% to 4%. This work advances SER technology by presenting a practical, privacy-preserving method that is both reliable and effective.
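
The abstract does not specify how the locally updated models are combined. As an illustration only, the sketch below shows one common aggregation rule, sample-weighted federated averaging (FedAvg-style), on flat parameter vectors. The function name `federated_average`, the list-of-floats model representation, and the weighting by client sample count are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Hypothetical helper: each client contributes in proportion to the
    number of local training samples it holds (FedAvg-style weighting).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same model shape")

    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # this client's fraction of all samples
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Two clients with 1 and 3 local samples respectively; only the
# parameter vectors are shared, never the raw audio.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)  # [2.5, 3.5]
```

In an iterative setup of the kind the abstract describes, the server would send `global_model` back to the clients, each client would train further on its own data, and the averaging step would be repeated for each round.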
