Influence of prompt format and complexity on the quality of responses from generative models
DOI: https://doi.org/10.56124/encriptar.v8i16.011
Keywords: Artificial intelligence, Information technology, Online learning, Cognition
Abstract
The objective of this research was to analyze the impact of prompt format and complexity on the quality of responses generated by a generative language model (GLM), also known as a large language model (LLM), using DeepSeek-R1 as a case study. For this purpose, 90 prompts were designed and distributed across three formats (declarative, interrogative, and structured) and three levels of complexity (simple, moderate, and complex), applied to general-knowledge questions and questions about common objects. The responses were evaluated by three experts in language technologies using a Likert-scale (1–5) rubric covering accuracy, coherence, and relevance, complemented with the automatic metrics ROUGE, BLEU, and BERTScore. The results showed that structured prompts generated responses that were significantly more accurate, coherent, and relevant than those produced by declarative and interrogative formats. Regarding complexity, moderate and complex prompts outperformed simple ones in coherence and semantic content, but not in lexical precision. An ANOVA followed by Tukey's post hoc test revealed significant differences for many of the criteria. In the error analysis, some responses overfitted to overly detailed prompts, other errors were associated with declarative prompts, and structured prompts produced the most consistent answers. These findings highlight prompt optimization as a critical variable determining the quality of generated responses.
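To make the evaluation pipeline described in the abstract concrete, the following is a minimal, illustrative Python sketch (not the authors' code): it scores model responses against reference answers with ROUGE-L, BLEU, and BERTScore, then compares prompt formats with a one-way ANOVA and Tukey's HSD post hoc test. The data frame below uses invented English placeholder responses, the package choices (rouge-score, nltk, bert-score, scipy, statsmodels, pandas) and column names are assumptions, and the actual study used a 90-prompt design with an expert Likert rubric that is not reproduced here.

```python
# Illustrative sketch only: automatic metrics plus ANOVA/Tukey over prompt formats.
# Requires: pandas, rouge-score, nltk, bert-score, scipy, statsmodels.
import pandas as pd
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical evaluation data: one row per prompt, with its format, the model
# response, and a reference answer (placeholders, not data from the study).
data = pd.DataFrame({
    "format": ["declarative", "declarative", "interrogative",
               "interrogative", "structured", "structured"],
    "response": [
        "Paris is a city in France.",
        "France's capital city is Paris.",
        "The capital of France is Paris.",
        "Paris, located on the Seine, is France's capital.",
        "Capital of France: Paris.",
        "The capital city of France is Paris.",
    ],
    "reference": ["Paris is the capital of France."] * 6,
})

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
smooth = SmoothingFunction().method1

def lexical_scores(response: str, reference: str) -> dict:
    """ROUGE-L F1 and sentence-level BLEU for one response/reference pair."""
    rouge_l = rouge.score(reference, response)["rougeL"].fmeasure
    bleu = sentence_bleu([reference.split()], response.split(),
                         smoothing_function=smooth)
    return {"rougeL": rouge_l, "bleu": bleu}

data = data.join(pd.DataFrame(
    [lexical_scores(r, ref) for r, ref in zip(data["response"], data["reference"])]))

# BERTScore is computed in batch; the F1 component is the usual summary value.
_, _, f1 = bert_score(data["response"].tolist(), data["reference"].tolist(), lang="en")
data["bertscore_f1"] = f1.numpy()

# One-way ANOVA across the three prompt formats, then Tukey's HSD to identify
# which pairs of formats differ (mirroring the analysis described above).
groups = [g["bertscore_f1"].values for _, g in data.groupby("format")]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.3f}, p = {p_value:.4f}")
print(pairwise_tukeyhsd(endog=data["bertscore_f1"], groups=data["format"]))
```

In the study itself the same comparison would be run per criterion (accuracy, coherence, relevance) and per metric, over all 90 prompts rather than this toy sample.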
License
Copyright (c) 2025 Scientific Journal of Informatics ENCRYPT - ISSN: 2737-6389.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.