
Large Language Models: A Comprehensive Survey on Architectures, Applications, and Challenges

Vinod Veeramachaneni

Abstract


This survey provides an in-depth exploration of Large Language Models (LLMs), examining notable architectures such as GPT-3, GPT-4, LLaMA, and PaLM. The paper traces the architectural evolution from traditional neural language models to cutting-edge transformer-based systems. Detailed insights are provided on training methodologies, including pre-training, fine-tuning, and instruction-tuning, which have enhanced the versatility and performance of LLMs across a range of applications such as natural language processing, text summarization, and code generation. This survey also discusses the current challenges LLMs face, such as bias in model outputs, ethical concerns, and the computational demands of scaling these models. Through this analysis, we highlight the potential of LLMs to revolutionize industries while underscoring the need for efficient training techniques to mitigate their resource-intensive nature. Our findings indicate that while LLMs offer transformative capabilities, addressing ethical and practical limitations will be critical to their future development.
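
To make the fine-tuning and instruction-tuning methodologies mentioned above concrete, the minimal sketch below fine-tunes a small causal language model on instruction-style text using the Hugging Face transformers and datasets libraries. The model choice (gpt2), the toy examples, and the hyperparameters are illustrative assumptions rather than the survey's experimental setup.

# Minimal sketch of supervised (instruction-style) fine-tuning of a causal LM.
# Model name, data, and hyperparameters are illustrative assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Hypothetical instruction-response pairs (instruction-tuning style data).
examples = [
    {"text": "Instruction: Summarize the sentence.\nResponse: ..."},
    {"text": "Instruction: Translate the sentence to French.\nResponse: ..."},
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small stand-in model
tokenizer.pad_token = tokenizer.eos_token           # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=128)
    out["labels"] = out["input_ids"].copy()         # causal LM: labels mirror inputs
    return out

dataset = Dataset.from_list(examples).map(
    tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()

In practice, instruction-tuning is run at much larger scale and typically masks the loss on the instruction portion of each example; this sketch only illustrates the basic training loop.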



