A BERT-based Prototypical Networks for Few-Shot Arabic Short-text Topic Detection

Amani Aljehani, Syed Hamid Hasan

Abstract

With the rapid growth of social media platforms, messaging apps, and online forums, vast amounts of short Arabic text are generated daily, covering a wide range of topics and discussions. Automatically detecting the topics of these short texts is crucial for various applications. State-of-the-art deep learning models perform well on this task. However, these approaches can struggle to learn the semantic space, and their effectiveness depends heavily on the availability of large annotated training datasets, which are scarce for Arabic. In this paper, we propose a few-shot learning model for Arabic short-text topic detection that generalizes from a few examples to new, unseen classes by leveraging prior knowledge from related tasks. The model was evaluated on three publicly available short-text datasets (SemEval, ASND, and AITD). The experimental results show that the proposed model outperforms baseline models when only 1, 5, and 10 examples are used during the training phase, providing empirical evidence for the effectiveness of few-shot learning in text classification tasks.
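
The abstract does not spell out the architecture, but the title names prototypical networks over BERT representations. As a rough illustration of that general idea, the sketch below averages the embeddings of the few labeled examples of each topic into a prototype and assigns a query to the nearest one. The function names, the topic labels, and the toy two-dimensional vectors (standing in for BERT sentence embeddings) are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each label into one prototype.

    `support` is a list of (embedding, label) pairs; in a real system the
    embeddings would come from an encoder such as BERT (assumed here).
    """
    grouped = defaultdict(list)
    for vec, label in support:
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    labels = list(prototypes)
    logits = [
        -sum((q - p) ** 2 for q, p in zip(query, prototypes[label]))
        for label in labels
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Toy 2-way, 2-shot episode with hypothetical topic labels.
support = [
    ([1.0, 0.0], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.0, 1.0], "politics"),
    ([0.2, 0.8], "politics"),
]
prototypes = class_prototypes(support)
probs = classify([0.9, 0.1], prototypes)
predicted = max(probs, key=probs.get)
```

Because a prototype is only a mean over the available examples, the same procedure applies unchanged to the 1-, 5-, and 10-example settings mentioned above, and a new topic can be added by supplying a few labeled examples rather than retraining a classifier head.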
