Decoding Stress with Computer Vision-Based Approach Using Audio Signals for Psychological Event Identification during COVID-19

Ankit Kumar, Snehal Godse, Sagar Kolekar, Dilip Kumar Jang Bahadur Saini, Deepak Pandita, Pulkit

Abstract

Interpreting psychological events can be costly and complex, yet such experiences are readily reflected in a person's verbal and nonverbal cues. The proposed model investigates a computer vision-based method that uses an individual's audio signal to identify stressful psychological events. A standard questionnaire containing a series of interrogation-style questions about the second stage of COVID-19 events is used to elicit stress symptoms, and the speech signals produced by each person's responses are recorded. These speech signals are then converted into frequency components using the fast Fourier transform (FFT). A long short-term memory (LSTM) module processes each frequency component, producing temporal information from each frequency band and extracting the features of the speech signals into temporal frames. Each temporal frame is then classified as stressed or unstressed by VGG 16, a feed-forward convolutional neural network with a 16-layer architecture. The proposed model identified stress in speech signals with 98% accuracy. Because medical research is typically costly and time-consuming, the time and cost implications of the proposed model are relevant.

Keywords: LSTM; VGG 16; CNN model; data preprocessing; speech signal.
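The FFT step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `frame_spectra`, the 16 kHz sample rate, the 400-sample frame length, the 160-sample hop, and the Hann window are all assumptions chosen for illustration, and a synthetic tone stands in for recorded speech.

```python
import numpy as np

def frame_spectra(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames and return each frame's FFT magnitude.

    frame_len and hop are illustrative assumptions, not values from the article.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real-input FFT along each frame: rows are time frames, columns are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for a recorded answer: one second of a 440 Hz tone (assumed 16 kHz)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spectra = frame_spectra(tone)
freqs = np.fft.rfftfreq(400, d=1 / sample_rate)
peak_hz = freqs[spectra.mean(axis=0).argmax()]  # strongest frequency bin on average
```

Each row of `spectra` is the magnitude spectrum of one short time frame, which is the kind of time-by-frequency array that a sequence model and an image classifier could then consume.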
