Enhance the Performance of Arabic Image Captioning Using Conditional Generative Adversarial Networks

Saliha Al-Malki, Alaa Khadidos, Abdulrhman Alshareef

Abstract

Image captioning (IC) is the process of automatically generating textual descriptions of images. Most existing work addresses IC with English captions, whereas few studies investigate the task in Arabic because of the scarcity of public Arabic datasets. Recently, conditional generative adversarial networks (CGANs) have proven able to generate diverse, natural, human-like captions in IC tasks, narrowing the gap between machine-generated and human-written captions. This approach has been used extensively in English-language work; in contrast, to our knowledge, no existing Arabic work has applied a CGAN to improve the performance of Arabic image captioning (AIC) models. In this paper, we aim to improve AIC, demonstrate the effectiveness of using a CGAN to generate diverse, human-like Arabic captions, and study the impact of different training datasets on the proposed model. We employ a reinforcement learning technique that has not previously been applied in AIC work. We evaluate the model quantitatively, using the bilingual evaluation understudy (BLEU) metric to assess the accuracy of generated captions against the ground-truth captions, and qualitatively, using human evaluation. The experimental results show that, with the CGAN approach and a larger dataset, AIC can be improved to produce more reasonably human-like Arabic captions than other current methods.
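As a minimal illustration of the quantitative evaluation described above, the following sketch scores Arabic candidate captions against reference captions with NLTK's corpus-level BLEU. This is not the authors' code: the captions are hypothetical placeholders, and the paper's actual evaluation pipeline is not detailed in the abstract.

```python
# A minimal sketch of BLEU-based evaluation for Arabic captions using NLTK.
# Illustrative only; the example captions below are hypothetical.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference (ground-truth) captions per image, pre-tokenized.
references = [
    [["رجل", "يركب", "دراجة", "في", "الشارع"],   # "a man rides a bicycle in the street"
     ["شخص", "على", "دراجة", "في", "الطريق"]],   # "a person on a bicycle on the road"
]

# One generated (candidate) caption per image.
candidates = [
    ["رجل", "يركب", "دراجة"],                    # "a man rides a bicycle"
]

# Smoothing avoids zero scores when higher-order n-grams have no matches.
smooth = SmoothingFunction().method1

for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))   # uniform weights for BLEU-n
    score = corpus_bleu(references, candidates,
                        weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```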
