
Computational Efficiency of Large Language Models (LLMs) in Resource-Constrained Environments

Mission Franklin

Abstract


Large Language Models (LLMs) such as GPT-3 and BERT have significantly improved the performance of natural language processing applications, including chatbots, machine translation, content generation, and virtual assistants. Despite their high accuracy and advanced language understanding capabilities, these models demand substantial computational resources: they typically need powerful GPUs and large memory capacity, and they consume considerable energy. These requirements make LLMs difficult to deploy in resource-constrained environments such as mobile devices, Internet of Things (IoT) systems, embedded platforms, and edge computing infrastructures.

This study focuses on improving the computational efficiency of LLMs while maintaining acceptable performance levels. It examines the major challenges of deploying LLMs in low-resource environments and reviews common optimization techniques: model compression methods (pruning, quantization, and knowledge distillation) and parameter-efficient fine-tuning. The study also explores the trade-off between model accuracy and computational efficiency and highlights the importance of lightweight and scalable AI solutions for edge computing applications.

The findings suggest that optimization methods can significantly reduce model size, inference time, memory usage, and energy consumption, making LLMs more practical for real-world deployment on low-power devices. However, balancing efficiency and performance remains a major challenge, because overly aggressive reductions in model size or precision can degrade accuracy. The study concludes by recommending practical strategies for developing efficient, accessible, and scalable LLMs suitable for diverse resource-constrained environments.
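To make the size and accuracy trade-off concrete, the following is a minimal sketch of one of the techniques named in the abstract, quantization, written in plain Python. It is not taken from the study: the function names, the symmetric per-tensor int8 scheme, the toy weight values, and the one-billion-parameter model size are all illustrative assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration).

    Maps each float weight to an integer in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in quantized]


def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Toy weights (hypothetical values, not from any real model).
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is bounded by about half the scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Hypothetical 1-billion-parameter model: 32-bit floats versus 8-bit integers.
fp32_gib = weight_memory_gib(1_000_000_000, 32)
int8_gib = weight_memory_gib(1_000_000_000, 8)

print(f"scale={scale:.6f}, max rounding error={max_error:.6f}")
print(f"fp32 weights: {fp32_gib:.2f} GiB, int8 weights: {int8_gib:.2f} GiB")
```

Under these assumptions, storing weights in 8 bits instead of 32 cuts weight memory by a factor of four, while the rounding error introduced per weight grows with the scale. This is one simple way to see why more aggressive compression can cost accuracy.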




