Enabling Natural Language Processing and AI Research in Low-Resource Languages: Development and Description of an Assamese UPoS Tagged Dataset

Kuwali Talukdar, Shikhar Kumar Sarma

doi:10.52783/jes.1506

Published: Apr 4, 2024

Keywords:

Assamese; PoS Tags; UPoS Tags; NLP; Assamese UPoS Tagset

Abstract

This paper describes in detail a Universal Parts of Speech (UPoS) tagged dataset for the Assamese language. A PoS tagged dataset is crucial for experimentation and for creating resources in Natural Language Processing (NLP) and AI research. With the growing use of Universal Dependencies standards, datasets tagged with Universal PoS tags are becoming essential for contemporary experiments in the NLP community. NLP research in Assamese, an Indo-Aryan language, is relatively new, and the language is considered a low-resource language. The UPoS tagged Assamese dataset was created with the aim of enriching this low-resource language for NLP and AI tasks. The dataset contains 283,506 tokens across 20,280 sentences, tagged with the 17 standard UPoS tags of core lexical categories. The raw data were taken from an open-source corpus originally tagged with the BIS tagset. The original 453,457 tokens in 29,504 sentences were reduced by data filtering to this clean resource of 283,506 tokens. Lexical categories were mapped from the BIS tagset to the UPoS tagset with linguistic expertise, and the resulting mapping was used for a first-level conversion of BIS tags to UPoS tags. Linguistic experts then validated the converted output, and inter-annotator agreements and disagreements were recorded. A second level of validation settled the recorded disagreements, producing the final version of the dataset. This Assamese UPoS tagged dataset is the first of its kind with UPoS annotations and is intended to serve the wider Assamese NLP research community for model training using Machine Learning and Deep Learning techniques.
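
The first-level conversion described in the abstract can be pictured with a minimal sketch. The abstract does not give the authors' mapping table, so the `BIS_TO_UPOS` entries below are an assumed, partial illustration; the function name `convert_sentence`, the fallback to `X`, and the review list are likewise hypothetical choices made for this example. Only the set of 17 UPoS tags is taken from the Universal Dependencies standard.

```python
from typing import Dict, List, Tuple

# The 17 Universal PoS tags defined by Universal Dependencies.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# ASSUMPTION: an illustrative, partial BIS -> UPoS table.
# This is not the mapping used by the authors of the dataset.
BIS_TO_UPOS: Dict[str, str] = {
    "N_NN": "NOUN",      # common noun
    "N_NNP": "PROPN",    # proper noun
    "PR_PRP": "PRON",    # personal pronoun
    "V_VM": "VERB",      # main verb
    "V_VAUX": "AUX",     # auxiliary verb
    "JJ": "ADJ",         # adjective
    "RB": "ADV",         # adverb
    "PSP": "ADP",        # postposition
    "CC_CCD": "CCONJ",   # coordinating conjunction
    "CC_CCS": "SCONJ",   # subordinating conjunction
    "QT_QTC": "NUM",     # cardinal quantifier
    "RD_PUNC": "PUNCT",  # punctuation
    "RD_SYM": "SYM",     # symbol
}


def convert_sentence(
    tagged: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Convert (token, BIS tag) pairs to (token, UPoS tag) pairs.

    Tokens whose BIS tag has no entry in the table receive the UPoS tag
    "X", and their positions are returned so that a linguist can review
    them in a later validation pass.
    """
    converted: List[Tuple[str, str]] = []
    needs_review: List[int] = []
    for index, (token, bis_tag) in enumerate(tagged):
        upos_tag = BIS_TO_UPOS.get(bis_tag)
        if upos_tag is None:
            upos_tag = "X"
            needs_review.append(index)
        converted.append((token, upos_tag))
    return converted, needs_review


if __name__ == "__main__":
    # Placeholder tokens stand in for Assamese words.
    sentence = [("w1", "N_NN"), ("w2", "V_VM"), ("w3", "UNLISTED"), ("w4", "RD_PUNC")]
    converted, needs_review = convert_sentence(sentence)
    print(converted)
    print(needs_review)
```

Returning the positions of unmapped tokens, rather than dropping or silently relabelling them, mirrors the two-stage process in the abstract: an automatic conversion first, followed by expert validation of the cases the table cannot decide.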

Issue

Vol. 20 No. 3s (2024)

Section

Articles

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
