Computational Efficiency of Large Language Models (LLMs) in Resource-Constrained Environments
Abstract
Large Language Models (LLMs) such as GPT-3 and BERT have significantly improved the performance of natural language processing applications, including chatbots, machine translation, content generation, and virtual assistants. Despite their high accuracy and advanced language-understanding capabilities, these models demand substantial computational resources: powerful GPUs, large memory capacity, and considerable energy. These requirements make LLMs difficult to deploy in resource-constrained environments such as mobile devices, Internet of Things (IoT) systems, embedded platforms, and edge computing infrastructures.
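To make the memory requirement concrete, the back-of-the-envelope sketch below estimates the storage needed for model weights alone at several numeric precisions. It assumes a dense model whose footprint is simply parameter count times bytes per parameter, and it ignores activations, the key-value cache, and optimizer state, so real deployments need more memory than these figures. The 175-billion-parameter count for GPT-3 comes from Brown et al. (2020) and the roughly 110-million count for BERT-base from Devlin et al. (2019); the function name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope estimate of weight storage only (no activations,
# KV cache, or optimizer state). Byte widths are the standard sizes of
# each numeric format; int4 assumes two weights packed per byte.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    models = {"GPT-3 (175B)": 175_000_000_000, "BERT-base (~110M)": 110_000_000}
    for name, n in models.items():
        row = ", ".join(f"{d}: {weight_memory_gb(n, d):,.2f} GB" for d in BYTES_PER_PARAM)
        print(f"{name}: {row}")
```

For GPT-3 the fp16 weights alone come to about 350 GB, and even int4 storage (about 87.5 GB) exceeds the memory of typical mobile or embedded hardware, which is why compression is usually combined with smaller base models at the edge.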
This study examines how to improve the computational efficiency of LLMs while maintaining acceptable performance. It identifies the main challenges of deploying LLMs in low-resource environments and reviews common optimization techniques: model compression methods such as pruning, quantization, and knowledge distillation, which shrink the model used at inference time, and parameter-efficient fine-tuning, which reduces the cost of adapting a pretrained model to new tasks. The study also explores the trade-off between model accuracy and computational efficiency and highlights the importance of lightweight, scalable AI solutions for edge computing applications.
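Parameter-efficient fine-tuning can be illustrated with low-rank adaptation (LoRA; Hu et al., 2022). The minimal NumPy sketch below freezes a pretrained weight matrix W and adds two small matrices A and B, so the adapted layer computes W x + (alpha / r) B A x and only A and B would be trained. The class name `LoRALinear`, the layer sizes, the rank r, and the scaling alpha are illustrative assumptions rather than values from this study, and the training loop is omitted: the sketch shows only the forward pass, the trainable-parameter count, and merging the update back into W.

```python
import numpy as np


class LoRALinear:
    """Hypothetical minimal LoRA layer: frozen W plus a low-rank update B @ A."""

    def __init__(self, weight: np.ndarray, rank: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight            # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank       # LoRA scaling factor alpha / r
        # A gets a small random Gaussian init and B starts at zero, so the
        # adapted layer initially reproduces the pretrained layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def forward(self, x: np.ndarray) -> np.ndarray:
        """x has shape (batch, d_in); returns shape (batch, d_out)."""
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        """Only A and B are updated during fine-tuning."""
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        """Fold the low-rank update into W so inference uses a single matrix."""
        return self.weight + self.scale * self.B @ self.A


if __name__ == "__main__":
    d = 1024
    W = np.random.default_rng(1).normal(size=(d, d))
    layer = LoRALinear(W, rank=8)
    full, lora = W.size, layer.trainable_params()
    print(f"full fine-tuning: {full:,} params; LoRA (r=8): {lora:,} params ({lora / full:.2%})")
```

With d = 1024 and r = 8, the adapter holds 16,384 trainable parameters instead of the 1,048,576 in the full matrix (about 1.6%). Because the update can be merged into W after training, the adapted layer costs the same at inference as the original one.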
The findings suggest that these optimization methods can substantially reduce model size, inference latency, memory usage, and energy consumption, making LLMs more practical to deploy on low-power devices. Balancing efficiency against performance nevertheless remains a major challenge, because aggressive compression, such as very low-bit quantization or high pruning ratios, can degrade model accuracy. The study concludes by recommending practical strategies for developing efficient, accessible, and scalable LLMs suited to diverse resource-constrained environments.
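The accuracy-efficiency trade-off can be observed even in a toy setting. The sketch below applies symmetric, per-tensor uniform quantization (one common post-training scheme among those surveyed by Gholami et al., 2021) to a random Gaussian weight matrix at several bit widths, and reports storage relative to fp32 alongside the relative reconstruction error. The random matrix is an assumed stand-in for a real layer, the function names are hypothetical, and reconstruction error is only a proxy: task accuracy must still be measured on a downstream benchmark.

```python
import numpy as np


def quantize_symmetric(w: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor uniform quantization to signed integers.

    Returns integer codes q and a scale such that w is approximately q * scale.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1                # 127 for 8 bits, 7 for 4 bits
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:                        # all-zero tensor: nothing to scale
        return np.zeros(w.shape, dtype=np.int32), 1.0
    scale = max_abs / qmax                    # map the largest |w| onto qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def tradeoff(w: np.ndarray, bit_widths=(8, 4, 3, 2)) -> list[tuple[int, float, float]]:
    """For each bit width, return (bits, storage ratio vs fp32, relative L2 error).

    The storage ratio counts bit width only and ignores the scale and packing overhead.
    """
    results = []
    for bits in bit_widths:
        q, scale = quantize_symmetric(w, bits)
        w_hat = q.astype(np.float64) * scale  # dequantized weights
        rel_err = float(np.linalg.norm(w - w_hat) / np.linalg.norm(w))
        results.append((bits, bits / 32, rel_err))
    return results


if __name__ == "__main__":
    weights = np.random.default_rng(0).normal(0.0, 0.02, size=(512, 512))
    for bits, ratio, err in tradeoff(weights):
        print(f"{bits}-bit: {ratio:.1%} of fp32 storage, relative error {err:.4f}")
```

Each halving of the bit width halves storage, but the error grows much faster than linearly at the low end: with a single per-tensor scale, outlier weights stretch the grid, so at 2 or 3 bits most small weights collapse to zero. Finer-grained scales (per channel or per group) are a common way to push this limit.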
References
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019 (pp. 4171–4186). https://doi.org/10.18653/v1/N19-1423
Fan, L., Li, L., Ma, Z., Lee, S., Yu, H., & Hemphill, L. (2023). A bibliometric review of large language models research from 2017 to 2023. arXiv preprint arXiv:2304.02020. https://doi.org/10.48550/arXiv.2304.02020
Gholami, A., Kim, S., Dong, Z., Yao, Z., Mahoney, M. W., & Keutzer, K. (2021). A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630. https://arxiv.org/abs/2103.13630
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning (pp. 2790–2799). https://arxiv.org/abs/1902.00751
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR 2022). https://arxiv.org/abs/2106.09685
Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large language models: A survey. arXiv preprint arXiv:2402.06196. https://doi.org/10.48550/arXiv.2402.06196
Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., … Mian, A. (2023). A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435. https://doi.org/10.48550/arXiv.2307.06435
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. https://arxiv.org/abs/1910.01108
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053. https://arxiv.org/abs/1909.08053