Enhancing LLM Performance in Specialized Spanish Domains Using RAG and PEFT QLoRA

Contributors: Escobar-Vega, Luis M.; Badillo-Rangel, Erick
Date issued: 2024-11
Date available: 2025-06-25
Citation: Badillo-Rangel, E. (2024). Enhancing LLM performance in Specialized Spanish Domains using RAG and PEFT QLoRA. Trabajo de obtención de grado, Maestría en Sistemas Computacionales. Tlaquepaque, Jalisco: ITESO.
Handle: https://hdl.handle.net/11117/11630
Language: English
License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es)
Keywords: RAG; Large Language Models; LLM; ETL; PEFT
Spanish title: Mejora del rendimiento de los LLM en dominios especializados en español utilizando RAG y PEFT QLoRA
Type: Master's thesis

Abstract: This project explores improving the performance of large language models (LLMs) in Spanish legal domains by combining Retrieval-Augmented Generation (RAG) with Parameter-Efficient Fine-Tuning (PEFT) using the QLoRA technique. Four experiments were conducted to evaluate zero-shot performance on open-ended, closed-ended, and summarization tasks: a vanilla baseline, a RAG-enhanced version of the baseline, and two fine-tuned models (one with RAG and one without). The training and retrieval data were generated synthetically through a cloud-based, serverless ETL process that follows medallion architecture principles. The experiments focused on the Ley de Impuesto sobre la Renta 2024 (Mexico's Income Tax Law). Evaluation used the BERTScore, ROUGE, and BLEU metrics to assess semantic similarity, n-gram overlap, and linguistic precision.
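The abstract lists ROUGE as one of the evaluation metrics and describes it as measuring n-gram overlap. To make that concrete, here is a minimal pure-Python sketch of ROUGE-N precision, recall, and F1 with clipped n-gram counts. It is an illustration only, not the evaluation code used in the thesis: the function names `ngrams` and `rouge_n` and the lowercase whitespace tokenization are assumptions, and reported scores would normally come from an established ROUGE implementation that also handles punctuation and, optionally, stemming.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N precision, recall and F1 using clipped n-gram counts.

    Assumption for illustration: tokens are produced by lowercasing and
    splitting on whitespace; no punctuation stripping or stemming is done.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter & Counter keeps min(count) per shared n-gram: the clipped overlap.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "el impuesto se calcula sobre la renta gravable"
    candidate = "el impuesto se calcula sobre el ingreso"
    scores = rouge_n(candidate, reference, n=1)
    print(round(scores["f1"], 4))  # 0.6667
```

In the usage example, five of the candidate's seven unigrams appear in the eight-token reference (the repeated "el" is clipped to the single occurrence in the reference), giving precision 5/7, recall 5/8, and F1 2/3. Setting `n=2` scores bigram overlap instead.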