Hindi Abstractive Text Summarization using Transliteration with Pre-trained Model

Jeetendra Kumar, Shashi Shekhar, Rashmi Gupta

Abstract

Automatic text summarization is a subarea of natural language processing that generates a summary of a text while preserving its key points. Research on summarizing text in low-resource languages is very limited. Hindi is spoken mainly in central and northern India, yet only a few studies have addressed abstractive summarization of Hindi. The matras (dependent vowel signs) of Hindi script make the text difficult to tokenize, which in turn makes abstractive summarization of Hindi difficult. The proposed method performs abstractive Hindi text summarization by transliterating the text and fine-tuning a pre-trained model. The model is trained to generate both summaries and headlines. Summary quality is evaluated with ROUGE and BERTScore. A new performance measure based on a semantic similarity score is also proposed to quantify the semantic similarity between reference summaries and predicted summaries. Our best results are a ROUGE score of 55.16, a BERTScore of 0.80, and a similarity score of 0.98. In addition to these automatic measures, a human evaluation of the predicted summaries found that the generated summaries and headlines were of a human-acceptable quality.
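
The abstract does not state which ROUGE variant or tokenizer was used. As a minimal sketch, assuming ROUGE-1 F1 computed over whitespace-separated tokens of transliterated text, the unigram overlap between a reference summary and a predicted summary could be measured as follows. The function name `rouge1_f1` and the example sentences are hypothetical and are not taken from the article.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Return ROUGE-1 F1 (0.0 to 1.0) using whitespace tokens.

    Assumption: plain whitespace tokenization of transliterated text;
    the article's actual preprocessing is not described in the abstract.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())

    # Clipped unigram overlap: each token counts at most as often
    # as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical transliterated reference and predicted headline.
reference = "bharat ne match jeeta"
predicted = "bharat ne aaj match jeeta"
score = rouge1_f1(reference, predicted)
print(f"ROUGE-1 F1: {score:.4f}")
```

This sketch returns a value between 0 and 1, whereas the ROUGE score quoted in the abstract appears to be on a 0 to 100 scale. It captures only surface word overlap, which is why measures of semantic similarity are reported alongside it.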
