Automated Feature Extraction from Version Control Artifacts in GitHub Repositories

Maitreya Vaghulade

doi:10.52783/jes.7091

Published: Jul 10, 2024

Keywords:

Web Scraping, Bidirectional Encoder Representations from Transformers (BERT), Bidirectional and Auto-Regressive Transformers (BART), Generative Pre-trained Transformer (GPT), Software Engineering Documentation, Summarization.

Maitreya Vaghulade, Urav Dalal, Sean Fargose, Devang Shah, Kush Maniar, Kiran Bhowmick, Meera Narvekar

Abstract

Managing and tracking implemented features in large-scale open-source projects with numerous contributors is challenging. This research proposes an automated system that extracts features from version control artifacts. A substantial dataset of popular open-source GitHub repositories is collected for development and evaluation. To ensure complete and current information, the system uses Selenium to scrape commits, release notes, README files, and closed or merged pull requests. As a preprocessing step, the proposed method splits the scraped data into manageable chunks. Each chunk is summarized using BART (Bidirectional and Auto-Regressive Transformers) and BERT (Bidirectional Encoder Representations from Transformers), two state-of-the-art large language models, to extract features from the scraped version control artifacts automatically. The text of each chunk is then corrected using GPT-4 (Generative Pre-trained Transformer 4), and the corrected chunks are combined into a comprehensive synopsis. The method aims to reduce the workload of manual feature tracking, streamline contribution management, improve project visibility, ease GitHub adoption, and foster productive contributor interactions. Automating feature extraction lets developers focus on coding rather than extensive documentation and yields well-structured, informative feature updates for GitHub repositories. Furthermore, the automated feature extraction can be integrated into the CI/CD pipeline, enabling continuous monitoring and documentation of implemented features throughout the software development lifecycle.
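
The chunk, summarize, correct, and combine flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state chunk sizes, model checkpoints, or prompts, so everything named here is an assumption made for illustration: the function names split_into_chunks and summarize_artifacts, the word-based chunking rule, the default of 200 words per chunk, and the newline used to join results are all hypothetical. The summarization and correction models are passed in as plain callables so that the sketch runs without any model installed.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Word-based splitting is an assumption; the abstract only says the
    data is split into manageable portions.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_artifacts(
    text: str,
    summarize: Callable[[str], str],
    correct: Callable[[str], str] = lambda s: s,
    max_words: int = 200,
) -> str:
    """Summarize each chunk, correct each result, and join them.

    summarize stands in for a BART- or BERT-based summarizer and
    correct stands in for a GPT-4 correction step; both are
    placeholders supplied by the caller, not the authors' code.
    """
    chunks = split_into_chunks(text, max_words)
    corrected = [correct(summarize(chunk)) for chunk in chunks]
    return "\n".join(corrected)


if __name__ == "__main__":
    # Toy stand-in summarizer: keep only the first five words of a chunk.
    def toy_summarizer(chunk: str) -> str:
        return " ".join(chunk.split()[:5])

    scraped = "Add login page. Fix crash on startup. Update README with usage notes."
    print(summarize_artifacts(scraped, toy_summarizer, max_words=6))
```

In a real setting, the summarize callable might wrap a pretrained summarization model (for example, a BART checkpoint loaded through a library such as Hugging Face Transformers) and the correct callable might call a GPT-4 endpoint. Those choices are not specified in the abstract and are mentioned here only as one plausible arrangement.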

Issue

Journal of Electrical Systems, Vol. 20 No. 10s (2024)

Section

Articles

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
