Design and Implementation of Scalable Test Platforms for LLM Deployments


Reena Chandra

Abstract

As the adoption of Large Language Models (LLMs) and machine learning (ML) accelerates across domains, there is a growing need for scalable, cost-efficient, and reproducible deployment frameworks. This study introduces a cloud-native benchmarking architecture that integrates Google Colab for rapid model development with Amazon Web Services (AWS) for deployment simulation. Four ML models, namely Random Forest, XGBoost, LightGBM, and a Multi-Layer Perceptron (MLP), are trained and evaluated on a tabular classification dataset, then aligned with suitable AWS services (Lambda, EC2, SageMaker) based on their computational and concurrency profiles. The models are assessed on classification metrics, latency, cold start behaviour, and cost per inference. Findings reveal that XGBoost is optimal for stateless, serverless deployment via AWS Lambda, while the MLP is better suited to EC2 due to its memory demands. LightGBM benefits from SageMaker's managed scalability. The framework demonstrates the viability of surrogate model benchmarking for LLM scenarios using lightweight ML models, and offers a reproducible, low-cost pipeline to support MLOps practices in cloud environments.
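The benchmarking loop described in the abstract can be outlined in a short Python sketch. The synthetic dataset, hyperparameters, and metric choices below are illustrative assumptions rather than the paper's exact configuration; the sketch only shows how the four surrogate models might be trained and timed side by side before being mapped to AWS services.

```python
# Minimal sketch of a Colab-side benchmarking loop, under assumed settings.
# Dataset, hyperparameters, and metrics are placeholders, not the paper's setup.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Synthetic tabular classification data stands in for the study's dataset.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
    "LightGBM": LGBMClassifier(n_estimators=200),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)

    # Time batch inference and report a per-sample latency estimate.
    start = time.perf_counter()
    preds = model.predict(X_test)
    latency_ms = (time.perf_counter() - start) / len(X_test) * 1e3

    print(
        f"{name:12s} acc={accuracy_score(y_test, preds):.3f} "
        f"f1={f1_score(y_test, preds):.3f} latency={latency_ms:.4f} ms/sample"
    )
```

In the study's framework, per-sample latency and memory profiles gathered this way would then inform the mapping of each model to Lambda, EC2, or SageMaker; cold start and cost-per-inference measurements require the deployment-side simulation described in the paper.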
