A Novel, Robust, and Deep Learning-Based Speaker Recognition System Using Three Different Datasets
Abstract
Speaker recognition methods have become increasingly popular across various domains such as security, domestic services, smart terminals, speech communications, and access control. However, current applications face challenges in accurately recognising speakers from short speech segments, which are common in modern interactive devices like smartphones and smart speakers. This paper introduces a novel, highly accurate, and robust approach to this problem by leveraging Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN). Three databases, namely SITW 2016, NIST 2008, and TIMIT, are used to evaluate system performance across different training and testing durations. According to the experimental results, the LSTM-RNN model, with its temporal learning and memory capabilities, performs significantly better than the CNN, particularly for short utterance durations. The proposed model achieves classification accuracies of 84.3%, 95.09%, and 94% for a 10 s training duration, and 85.4%, 96.47%, and 95.24% for a 20 s training duration, on the TIMIT, SITW 2016, and NIST 2008 datasets, respectively.
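The abstract does not give implementation details of the LSTM-RNN classifier. Purely as an illustrative sketch, an utterance-level speaker classifier built from an LSTM over per-frame acoustic features might look like the following; the feature dimension, hidden size, layer count, and class name are assumptions for the example, not values reported in the paper.

```python
import torch
import torch.nn as nn

class LSTMSpeakerClassifier(nn.Module):
    """Illustrative LSTM speaker classifier over frame-level features (e.g. MFCCs).
    All sizes below are assumptions for the sketch, not the authors' configuration."""

    def __init__(self, num_speakers: int, feat_dim: int = 40, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden, num_speakers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim) -- frame-level features from one short utterance
        _, (h_n, _) = self.lstm(feats)
        # h_n[-1]: final hidden state of the top LSTM layer, one embedding per utterance
        return self.classifier(h_n[-1])

# Example: 10 s of speech at a 10 ms frame shift gives roughly 1000 frames.
model = LSTMSpeakerClassifier(num_speakers=630)   # TIMIT contains 630 speakers
logits = model(torch.randn(8, 1000, 40))          # batch of 8 utterances, 40-dim features
print(logits.shape)                               # torch.Size([8, 630])
```

The last hidden state of the LSTM serves as a fixed-length utterance embedding, which is one simple way a recurrent model can exploit temporal context from short segments before speaker classification.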
Article Details

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.