Complexity Analysis of LLM-Generated Recursive Code: A Systematic Evaluation
DOI: https://doi.org/10.21015/vtse.v13i4.2269

Abstract
Programming is an essential skill, but it can be difficult for beginners, especially for logical concepts such as recursion. Despite the many computational and pedagogical methods developed to simplify programming instruction, recursion remains a challenging topic to understand, implement, and debug. Advances in artificial intelligence have led to large language models (LLMs), such as ChatGPT, Gemini, and DeepSeek, that can generate programming source code. Various studies have analyzed the quality of code produced by LLMs; however, the complexity of the recursive code these models generate has not been studied. To fill this gap, this study compared and analyzed recursive Python programs generated by Gemini (2.5 Pro), DeepSeek (V3.1), and ChatGPT (GPT-5). For each model, 250 generated programs were examined using Halstead and cyclomatic complexity metrics. The results showed that ChatGPT produced the least complex code, suggesting easier-to-follow recursion, while DeepSeek produced the most complex programs, with the highest Halstead and cyclomatic complexity scores; Gemini's programs fell at a medium level of difficulty. A Kruskal-Wallis test revealed statistically significant differences among the recursive code generated by ChatGPT, DeepSeek, and Gemini. Overall, the study found that each LLM has a distinct pattern: ChatGPT emphasizes simplicity, Gemini takes a balanced approach, and DeepSeek's generated code favors clarity but suffers from higher complexity. Future work will expand the dataset and include additional large language models for a more comprehensive analysis.
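To illustrate the two metric families the abstract names, the following is a minimal sketch of how Halstead volume and McCabe cyclomatic complexity might be computed for a recursive Python function using only the standard library. The simplified operator/operand classification and the decision-node list are assumptions for illustration; the abstract does not specify the study's actual measurement tooling.

```python
import ast
import io
import keyword
import math
import tokenize

# Decision-point node types that each add one branch. This is a simplified
# rule set for illustration; the study's exact tooling is not specified.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.BoolOp, ast.Assert)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES)
                   for node in ast.walk(tree))

def halstead_volume(source: str) -> float:
    """Rough Halstead volume V = N * log2(n), where N counts all
    operator/operand occurrences and n counts distinct ones."""
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP or (
                tok.type == tokenize.NAME and keyword.iskeyword(tok.string)):
            operators.append(tok.string)
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)
    N = len(operators) + len(operands)            # total occurrences
    n = len(set(operators)) + len(set(operands))  # distinct symbols
    return N * math.log2(n)

# A typical recursive program of the kind the study examines.
factorial_src = """
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)
"""

print(cyclomatic_complexity(factorial_src))  # 2: one `if` plus the base path
print(round(halstead_volume(factorial_src), 1))
```

Under these definitions, a deeply branched recursive solution (e.g., multiple base cases or nested conditionals) would score higher on both metrics than the straight-line factorial above, which is the kind of difference the study quantifies across the three models.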
References
L. C. Cheng, W. Li, and J. C. Tseng, “Effects of an automated programming assessment system on the learning performances of experienced and novice learners,” Interactive Learning Environments, vol. 31, no. 8, pp. 5347–5363, 2023.
R. Yilmaz and F. G. K. Yilmaz, “The effect of generative artificial intelligence (AI)-based tool use on students' computational thinking skills, programming self-efficacy and motivation,” Computers and Education: Artificial Intelligence, vol. 4, 2023.
F. Kalelioglu and Y. Gülbahar, “The effects of teaching programming via Scratch on problem-solving skills: A discussion from learners' perspective,” Informatics in Education, vol. 13, no. 1, pp. 33–50, 2014. DOI: https://doi.org/10.15388/infedu.2014.03
S. Biswas, “Role of ChatGPT in computer programming,” Mesopotamian Journal of Computer Science, vol. 2023, pp. 9–15, 2023.
A. Hawlitschek, S. Berndt, and S. Schulz, “Empirical research on pair programming in higher education: A literature review,” Computer Science Education, vol. 33, no. 3, pp. 400–428, 2023.
J. Denner, E. Green, and S. Campe, “Learning to program in middle school: How pair programming helps and hinders intrepid exploration,” Journal of the Learning Sciences, vol. 30, no. 4–5, pp. 611–645, 2021.
M. S. Naveed and M. Sarim, “Two-phase CS0 for introductory programming: CS0 for CS1,” Proceedings of the Pakistan Academy of Sciences: A. Physical and Computational Sciences, vol. 59, no. 1, pp. 59–70, 2022.
M. Mladenović, Ž. Žanko, and M. A. Čuvić, “The impact of using program visualization techniques on learning basic programming concepts at the K–12 level,” Computer Applications in Engineering Education, vol. 29, no. 1, pp. 145–159, 2021.
A. Baron and D. Feitelson, “Why is recursion hard to comprehend? An experiment with experienced programmers in Python,” in Proc. 2024 Conference on Innovation and Technology in Computer Science Education (ITiCSE), vol. 1, pp. 115–121, 2024.
S. Thorgeirsson, L. C. Lais, T. B. Weidmann, and Z. Su, “Recursion in secondary computer science education: A comparative study of visual programming approaches,” in Proc. 55th ACM Technical Symposium on Computer Science Education, vol. 1, pp. 1321–1327, 2024.
E. G. Daylight, “Dijkstra’s rallying cry for generalization: The advent of the recursive procedure, late 1950s–early 1960s,” The Computer Journal, vol. 54, no. 11, pp. 1756–1772, 2011. DOI: https://doi.org/10.1093/comjnl/bxr002
P. N. Johnson-Laird, M. Bucciarelli, R. Mackiewicz, and S. S. Khemlani, “Recursion in programs, thought, and language,” Psychonomic Bulletin & Review, vol. 29, pp. 430–454, 2022.
J. Gal-Ezer and D. Harel, “What (else) should CS educators know?” Communications of the ACM, vol. 41, no. 9, pp. 77–84, 1998. DOI: https://doi.org/10.1145/285070.285085
R. McCauley, S. Grissom, S. Fitzgerald, and L. Murphy, “Teaching and learning recursive programming: A review of the research literature,” Computer Science Education, vol. 25, no. 1, pp. 37–66, 2015. DOI: https://doi.org/10.1080/08993408.2015.1033205
N. Raihan et al., “On the performance of large language models on introductory programming assignments,” Journal of Intelligent Information Systems, pp. 1–25, 2025.
X. Gu et al., “On the effectiveness of large language models in domain-specific code generation,” ACM Transactions on Software Engineering and Methodology, vol. 34, no. 3, pp. 1–22, 2025.
K. Jin, C. Wang, H. V. Pham, and H. Hemmati, “Can ChatGPT support developers? An empirical evaluation of large language models for code generation,” in 21st International Conference on Mining Software Repositories, 2024.
S. Fakhoury et al., “LLM-based test-driven interactive code generation: User study and empirical evaluation,” IEEE Transactions on Software Engineering, vol. 50, no. 9, pp. 2254–2268, 2024.
S. L. France, “Navigating software development in the ChatGPT and GitHub Copilot era,” Business Horizons, vol. 67, no. 5, pp. 649–661, 2024.
H. Hassani and E. S. Silva, “The role of ChatGPT in data science: How AI-assisted conversational interfaces are revolutionizing the field,” Big Data and Cognitive Computing, vol. 7, no. 2, p. 62, 2023.
Z. Deng et al., “Exploring DeepSeek: A survey on advances, applications, challenges and future directions,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 5, pp. 872–893, 2025.
P. C. Nair, D. Gupta, and B. I. Devi, “Extracting clinical relationships from discharge summaries of supra sellar lesion patients using Gemini LLM,” Procedia Computer Science, vol. 258, pp. 2391–2404, 2025.
K. Shahzad and S. Iqbal, “Comparative analysis of ChatGPT, DeepSeek, and Gemini for automated code generation,” in 2025 18th International Conference on Engineering of Modern Electric Systems, IEEE, pp. 1–4, 2025.
Y. Wang et al., “Beyond functional correctness: Investigating coding style inconsistencies in large language models,” Proceedings of the ACM on Software Engineering, vol. 2, pp. 690–712, 2025.
A. L. Maharani, Y. S. Nugroho, and S. Islam, “Unlocking AI potential: An investigation of Python coding capabilities of ChatGPT and Gemini,” in International Conference on Smart Computing, IoT and Machine Learning (SIML), pp. 1–6, 2025.
T. Coignion, C. Quinton, and R. Rouvoy, “A performance study of LLM-generated code on LeetCode,” in Proc. 28th International Conference on Evaluation and Assessment in Software Engineering, pp. 79–89, 2024.
X. Du et al., “Evaluating large language models in class-level code generation,” in Proc. IEEE/ACM 46th International Conference on Software Engineering, pp. 1–13, 2024.
S. Almanasra and K. Suwais, “Analysis of ChatGPT-generated codes across multiple programming languages,” IEEE Access, vol. 13, pp. 23580–23596, 2025.
D. Tosi, “Studying the quality of source code generated by different AI generative engines: An empirical evaluation,” Future Internet, vol. 16, no. 6, p. 188, 2024.
L. Solovyeva, S. Weidmann, and F. Castor, “AI-powered, but power-hungry? Energy efficiency of LLM-generated code,” in Proc. IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering, pp. 49–60, 2025.
P. Lanzi and D. Loiacono, “ChatGPT and other large language models as evolutionary engines for online interactive collaborative game design,” in Proc. Genetic and Evolutionary Computation Conference, pp. 1383–1390, 2023.
M. S. Naveed, “Measuring the programming complexity of C and C++ using Halstead metric,” Univ. of Sindh Journal of Information and Communication Technology, vol. 5, no. 4, pp. 2521–5582, 2021.
B. Khan and A. Nadeem, “Evaluating the effectiveness of decomposed Halstead metrics in software fault prediction,” PeerJ Computer Science, vol. 9, p. e1647, 2023.
S. Azeem, M. S. Naveed, M. Sajid, and I. Ali, “AI vs. human programmers: Complexity and performance in code generation,” VAWKUM Transactions on Computer Sciences, vol. 13, no. 1, pp. 201–216, 2025.
M. S. Naveed, “Pedagogical suitability: A software metrics-based analysis of Java and Python,” International Journal of Innovations in Science & Technology, vol. 6, no. 4, pp. 1956–1967, 2024.
G. Hao et al., “Complementarity in software code complexity metrics,” Journal of Systems and Software, vol. 232, p. 112679, 2025.
M. S. Naveed, “Comparison of C++ and Java in implementing introductory programming algorithms,” QUEST Research Journal, vol. 19, no. 1, pp. 95–103, 2021.
D. R. Wijendra and K. P. Hewagamage, “Analysis of cognitive complexity with cyclomatic complexity metric of software,” International Journal of Computer Applications, vol. 174, pp. 14–19, 2021.
K. Munawar and M. S. Naveed, “The impact of language syntax on the complexity of programs: A case study of Java and Python,” International Journal of Innovations in Science & Technology, vol. 4, no. 3, pp. 683–695, 2022.
This work is licensed under a Creative Commons Attribution License CC BY