NBESM: A Novel BERT Integrated with Sequence Model for Classifying the News in Telugu


Chinni Bala Vijaya Durga, Gatram Rama Mohan Babu

Abstract

News classification helps individuals understand the political and social landscape of their region, enabling them to make informed decisions and participate in civic activities. Accurate classification ensures that readers can distinguish between news, opinion, and other types of content, maintaining the credibility and ethics of journalism. Many people prefer consuming news in their native language, and by classifying news in regional languages, media outlets can cater to the language preferences of their audience. Traditional approaches have relied on Bag-of-Words representations and word embeddings, but word embeddings are limited to the vocabulary seen during training: words absent from the training data have no pre-defined embeddings, and handling such out-of-vocabulary words is challenging. Later, researchers began experimenting with recurrent neural networks such as RNNs, GRUs, and LSTMs. Although LSTMs are designed to capture long-range dependencies, they can still struggle with very long sequences, where retaining relevant information over extended spans becomes difficult. The proposed research integrates an LSTM with a BERT transformer: transformers excel at capturing contextual relationships across text, while LSTMs capture sequential patterns, and this integration is particularly useful for tasks that require modeling both local context and long-range dependencies.
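
The abstract does not specify the model configuration, but a minimal sketch of such a BERT + LSTM hybrid classifier, assuming PyTorch and the Hugging Face `transformers` library, could look like the following. The checkpoint name, hidden sizes, and class count here are illustrative placeholders, not the paper's actual settings.

```python
# Minimal sketch of a BERT + LSTM hybrid news classifier.
# Assumptions (not from the paper): multilingual BERT checkpoint,
# bidirectional LSTM with 256 hidden units, 5 news categories.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertLstmClassifier(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",
                 lstm_hidden=256, num_classes=5):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        # The LSTM runs over BERT's per-token contextual embeddings,
        # modeling sequential patterns on top of transformer features.
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        token_embeddings = self.bert(
            input_ids=input_ids,
            attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(token_embeddings)
        # Use the last time step of the LSTM output as the sequence summary.
        return self.classifier(lstm_out[:, -1, :])

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertLstmClassifier()
batch = tokenizer(["తెలుగు వార్తల ఉదాహరణ"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```

Feeding BERT's per-token contextual embeddings into a bidirectional LSTM lets the classifier combine the transformer's global context with the LSTM's sequential modeling, which is the integration the abstract describes.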
