
Algorithmic Prompt Engineering: Improving Accuracy and Diversity in LLM-Based Code Synthesis

I. V. Shashikala, T. Aditya Sai Srinivas, M. Bhavya Sree, K. Shirisha, H. B. Varshini Reddy, K. Dora Babu

Abstract


Recent advancements in Large Language Models (LLMs), notably GPT-3.5 and GPT-4, have transformed AI-assisted code generation; however, their performance appears to hinge on the formulation of input prompts. This study investigates a range of prompt engineering strategies aimed at optimizing code generation outcomes. Specifically, we compare generic, context-sensitive, and dynamically constructed prompts using a benchmark of more than fifty programming tasks spanning Python, Java, and C++. A central contribution of this work lies in the development of algorithmic techniques for dynamic prompt construction, which incorporate iterative refinement informed by automated feedback. Empirical findings indicate that dynamic prompting achieves superior results, with code accuracy reaching approximately 93% and average debugging time reduced by nearly two-thirds relative to generic strategies. Beyond accuracy, the analysis suggests notable gains in code relevance, structural efficiency, and solution diversity. These insights offer a set of evidence-based practices for practitioners and provide a conceptual basis for future research on adaptive and automated prompt design in software development environments.
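As a concrete illustration of the dynamic prompting strategy described above, the sketch below shows one plausible shape of such a feedback-driven refinement loop in Python. It is an illustrative sketch under stated assumptions, not the authors' implementation: the generate callable stands in for whatever LLM API is used, and the automated feedback signal is simplified to the stderr of a failing unit-test run.

import subprocess
import sys
import tempfile
from typing import Callable, Tuple

def run_tests(code: str, test_code: str) -> Tuple[bool, str]:
    # Execute the candidate solution together with its unit tests in a
    # subprocess; return (passed, feedback), where feedback is the
    # captured stderr of a failing run.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.returncode == 0, result.stderr

def dynamic_prompting(
    task: str,
    test_code: str,
    generate: Callable[[str], str],  # any LLM completion function (assumption)
    max_rounds: int = 5,
) -> str:
    # Start from a context-rich prompt, then iteratively fold automated
    # test feedback back into the prompt until the tests pass or the
    # round budget is exhausted.
    prompt = (
        "Write a Python solution for the following task.\n"
        f"Task: {task}\n"
        f"It must pass these tests:\n{test_code}"
    )
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt)
        passed, feedback = run_tests(code, test_code)
        if passed:
            return code
        # Dynamic prompt reconstruction: condition the next attempt
        # on the observed failure.
        prompt += (
            "\n\nYour previous attempt failed with:\n"
            f"{feedback}\nRevise the code to fix this."
        )
    return code

Keeping the model call behind a generic callable makes the harness model-agnostic, so the same refinement loop can drive GPT-3.5, GPT-4, or any other backend being compared.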


