IICVoiC - An Intelligent Image Captioning and Voice Converter for Visually Impaired
Abstract
Vision is the most crucial sense for human beings, and visual impairment makes everyday life and daily chores challenging. According to a World Health Organization report, more than 2.2 billion people worldwide suffer from some form of vision impairment. This highlights a crucial need to improve the quality of life of the visually impaired, and assistive technology plays an essential role in this endeavor. Image captioning is one such approach that, since the advent of deep learning, has had great potential to enhance the daily lives of visually impaired people. Automatically describing the contents of an image with proper linguistic properties is a fundamental and challenging problem in artificial intelligence, as it must combine advanced computer vision with natural language processing methods. Moreover, this semantic knowledge must be expressed in a natural language (here, English), which requires a language model in addition to visual understanding. The proposed Intelligent Image Captioning and Voice Converter for the Visually Impaired (IICVoiC) generates descriptive captions that are converted to audio for a visually impaired person to listen to. Converting an image into audio requires pre-processing the image, identifying its features, and generating a caption from them. The workflow is to first denoise the input image and then extract features from the pre-processed image using a CNN. In parallel, the captions in the dataset are pre-processed, and the respective topics are predicted from the pre-processed captions. An LSTM model is then created and trained on the extracted features, predicted topics, and pre-processed captions. From this model, a caption is generated for a given input image, and finally the generated text is converted into audio.
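The workflow in the abstract can be sketched as a sequence of stages. The following is a minimal, illustrative outline only: every function name is hypothetical, and simple stubs stand in for the CNN encoder, topic predictor, LSTM decoder, and text-to-speech engine that the actual system would use.

```python
# High-level sketch of the IICVoiC pipeline stages described above.
# All stage functions are illustrative stubs, not the paper's models.

def denoise_image(image):
    # Stage 1: pre-process (denoise) the raw input image (toy version).
    return [max(pixel - 1, 0) for pixel in image]

def extract_features(image):
    # Stage 2: CNN feature extraction (stub: summary statistics).
    return {"mean": sum(image) / len(image), "max": max(image)}

def predict_topic(features):
    # Stage 3: topic prediction (stub: threshold on brightness).
    return "outdoor" if features["max"] > 128 else "indoor"

def generate_caption(features, topic):
    # Stage 4: LSTM caption decoder (stub: template fill-in).
    article = "an" if topic[0] in "aeiou" else "a"
    return f"{article} {topic} scene"

def text_to_speech(caption):
    # Stage 5: text-to-audio conversion (stub: marker string
    # instead of an actual waveform).
    return f"<audio:{caption}>"

def iicvoic_pipeline(image):
    # Chain the five stages: denoise -> features -> topic ->
    # caption -> audio.
    image = denoise_image(image)
    features = extract_features(image)
    topic = predict_topic(features)
    caption = generate_caption(features, topic)
    return text_to_speech(caption)

print(iicvoic_pipeline([200, 150, 90, 40]))  # → <audio:an outdoor scene>
```

In a real implementation, the stubs would be replaced by a pretrained CNN for feature extraction, a trained LSTM decoder conditioned on those features and the predicted topic, and a speech-synthesis engine for the final audio output.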
Article Details
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.