Recurrent Neural Networks for Image Captioning: A Case Study with LSTM


Shailaja Sanjay Mohite, Suganthini. C, Arunarani AR, Lalitha Devi K, Manish Sharma, R. N. Patil, Anurag Shrivastava

Abstract

This research investigates the viability of Long Short-Term Memory (LSTM) networks, a subtype of Recurrent Neural Networks (RNNs), for image captioning. Leveraging the MS COCO dataset, the study compares the performance of LSTM-based RNNs against a vanilla RNN, a Gated Recurrent Unit (GRU), attention mechanisms, and transformer-based models. Experimental results demonstrate that the LSTM-based RNN exhibits competitive performance, achieving a BLEU-4 score of 0.72, a METEOR score of 0.68, and a CIDEr score of 2.1. The comparative analysis reveals its superiority over the vanilla RNN and GRU, highlighting its capability to capture long-range dependencies within sequential image data. Moreover, the study examines the effect of attention mechanisms and transformer architectures, demonstrating their potential to improve context-aware caption generation. The transformer-based model outperforms all other models, achieving a BLEU-4 score of 0.78, a METEOR score of 0.72, and a CIDEr score of 2.5. These findings offer valuable insights into the evolving landscape of image captioning methods, establishing LSTM-based RNNs as robust and efficient approaches for capturing temporal sequences in visual content. In doing so, the study provides a framework for future developments in hybrid models and generation pipelines that push the boundaries of intelligent image perception and understanding.
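The abstract does not include implementation details; the following is a minimal sketch, in PyTorch, of the kind of CNN-encoder/LSTM-decoder captioning model the study evaluates. All names and hyperparameters here (LSTMCaptioner, feature_size, embed_size, hidden_size, vocab_size) are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of an LSTM-based image captioner (not the authors' code).
# Assumes pooled features from a pretrained CNN (e.g., 2048-dim ResNet output)
# and a toy vocabulary; all hyperparameters are placeholders.
import torch
import torch.nn as nn


class LSTMCaptioner(nn.Module):
    def __init__(self, feature_size=2048, embed_size=256,
                 hidden_size=512, vocab_size=10000):
        super().__init__()
        # Project CNN image features into the LSTM's input space.
        self.feature_proj = nn.Linear(feature_size, embed_size)
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # features: (batch, feature_size) pooled CNN activations
        # captions: (batch, seq_len) token ids of the ground-truth caption
        img = self.feature_proj(features).unsqueeze(1)  # (batch, 1, embed)
        words = self.embed(captions)                    # (batch, seq_len, embed)
        inputs = torch.cat([img, words], dim=1)         # image is the first "token"
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                          # logits over the vocabulary


if __name__ == "__main__":
    model = LSTMCaptioner()
    feats = torch.randn(4, 2048)              # stand-in for ResNet features
    caps = torch.randint(0, 10000, (4, 12))   # stand-in for tokenized captions
    print(model(feats, caps).shape)           # torch.Size([4, 13, 10000])
```

Likewise, the reported BLEU-4 figures could in principle be checked with standard tooling such as NLTK; the captions below are placeholders, and the paper's actual evaluation pipeline is not specified.

```python
# Hedged example: sentence-level BLEU-4 with NLTK (illustrative captions only).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "dog", "runs", "across", "the", "grass"]]
candidate = ["a", "dog", "is", "running", "on", "grass"]
# Equal weights over 1- to 4-grams give the standard BLEU-4 score; smoothing
# avoids a zero score on short sentences with no 4-gram overlap.
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")
```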
