Influence of prompt format and complexity on the response quality of generative models

Wilmer Orley Zambrano Vera; Jessica Johanna Morales Carrillo

doi:10.56124/encriptar.v8i16.011

Authors

Wilmer Orley Zambrano Vera, Escuela Superior Politécnica Agropecuaria de Manabí Manuel Félix López, ESPAM
Jessica Johanna Morales Carrillo, Escuela Superior Politécnica Agropecuaria de Manabí Manuel Félix López, ESPAM

Keywords:

Artificial intelligence, Information technologies, Online learning, Cognition

Abstract

The aim of this research was to analyze the impact of prompt format and complexity on the quality of responses generated by a generative language model, also known as a Large Language Model (LLM), using DeepSeek-R1 as the test case. To this end, 90 prompts were designed and distributed across three formats (declarative, interrogative, and structured) and three levels of complexity (simple, moderate, and complex), applied to general-knowledge questions and common objects, respectively. The responses were evaluated by three experts in language technologies using a Likert-scale rubric (1–5) covering accuracy, coherence, and relevance, complemented by the automatic metrics ROUGE, BLEU, and BERTScore. The results showed that structured prompts produced much more accurate, coherent, and relevant responses than the declarative and interrogative formats. With regard to complexity, moderate and complex prompts outperformed simple ones in coherence and semantic content, but not in lexical precision. ANOVA and Tukey post hoc tests revealed relevant differences for most of the criteria. The errors observed included overfitted responses caused by highly detailed prompts and other problems caused by declarative prompts, whereas structured prompts yielded better responses. These results highlight the importance of prompt optimization as a critical variable that determines the quality of the responses obtained.
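
As a rough illustration of the kind of ANOVA comparison mentioned in the abstract, the following minimal Python sketch computes a one-way F statistic across the three prompt formats. It is only a sketch under stated assumptions: the function name `one_way_anova_f` is hypothetical, the Likert ratings are invented placeholder values and not data from the study, and the abstract does not specify the software or the exact factor layout the authors used. The Tukey post hoc step is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (here: prompt formats)
    n = len(all_values)      # total number of ratings

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Likert ratings (1-5) per prompt format.
# These are illustrative placeholders, NOT results from the study.
ratings = {
    "declarative": [3, 3, 4, 2, 3],
    "interrogative": [3, 4, 3, 4, 3],
    "structured": [5, 4, 5, 4, 5],
}

f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the reported degrees of freedom would suggest that at least one format differs in mean rating; identifying which pairs differ is what a post hoc test such as Tukey's is for.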

