Influence of prompt format and complexity on the quality of responses from generative models
DOI: https://doi.org/10.56124/encriptar.v8i16.011
Keywords: Artificial intelligence, Information technology, Online learning, Cognition
Abstract
The objective of this research was to analyze the impact of prompt format and complexity on the quality of responses generated by a generative language model (GLM), also known as a large language model (LLM), using DeepSeek-R1 as a case study. For this purpose, 90 prompts were designed and distributed across three formats (declarative, interrogative, and structured) and three levels of complexity (simple, moderate, and complex), applied to general-knowledge questions and questions about common objects. The responses were evaluated by three experts in language technologies using a Likert-scale (1–5) rubric covering accuracy, coherence, and relevance, complemented with the automatic metrics ROUGE, BLEU, and BERTScore. The results showed that structured prompts generated responses that were significantly more accurate, coherent, and relevant than those produced by declarative and interrogative formats. Regarding complexity, moderate and complex prompts outperformed simple ones in coherence and semantic content, but not in lexical precision. An ANOVA followed by Tukey's post hoc test revealed significant differences for many of the criteria. In the error analysis, some responses overfitted to overly detailed prompts, other errors were associated with declarative prompts, and structured prompts produced the most consistent answers. These findings highlight prompt optimization as a critical variable determining the quality of generated responses.
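To make the evaluation pipeline described in the abstract concrete, the following is a minimal, illustrative Python sketch (not the authors' code): it scores model responses against reference answers with ROUGE-L, BLEU, and BERTScore, then compares prompt formats with a one-way ANOVA and Tukey's HSD post hoc test. The data frame below uses invented English placeholder responses, the package choices (rouge-score, nltk, bert-score, scipy, statsmodels, pandas) and column names are assumptions, and the actual study used a 90-prompt design with an expert Likert rubric that is not reproduced here.

```python
# Illustrative sketch only: automatic metrics plus ANOVA/Tukey over prompt formats.
# Requires: pandas, rouge-score, nltk, bert-score, scipy, statsmodels.
import pandas as pd
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical evaluation data: one row per prompt, with its format, the model
# response, and a reference answer (placeholders, not data from the study).
data = pd.DataFrame({
    "format": ["declarative", "declarative", "interrogative",
               "interrogative", "structured", "structured"],
    "response": [
        "Paris is a city in France.",
        "France's capital city is Paris.",
        "The capital of France is Paris.",
        "Paris, located on the Seine, is France's capital.",
        "Capital of France: Paris.",
        "The capital city of France is Paris.",
    ],
    "reference": ["Paris is the capital of France."] * 6,
})

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
smooth = SmoothingFunction().method1

def lexical_scores(response: str, reference: str) -> dict:
    """ROUGE-L F1 and sentence-level BLEU for one response/reference pair."""
    rouge_l = rouge.score(reference, response)["rougeL"].fmeasure
    bleu = sentence_bleu([reference.split()], response.split(),
                         smoothing_function=smooth)
    return {"rougeL": rouge_l, "bleu": bleu}

data = data.join(pd.DataFrame(
    [lexical_scores(r, ref) for r, ref in zip(data["response"], data["reference"])]))

# BERTScore is computed in batch; the F1 component is the usual summary value.
_, _, f1 = bert_score(data["response"].tolist(), data["reference"].tolist(), lang="en")
data["bertscore_f1"] = f1.numpy()

# One-way ANOVA across the three prompt formats, then Tukey's HSD to identify
# which pairs of formats differ (mirroring the analysis described above).
groups = [g["bertscore_f1"].values for _, g in data.groupby("format")]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_value:.4f}")
print(pairwise_tukeyhsd(endog=data["bertscore_f1"], groups=data["format"]))
```

In the study itself the same comparison would be run per criterion (accuracy, coherence, relevance) and per metric, over all 90 prompts rather than this toy sample.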
License
Copyright (c) 2025 Scientific Journal of Informatics ENCRYPT - ISSN: 2737-6389.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.