Enhancing LLM Performance in Specialized Spanish Domains Using RAG and PEFT QLoRA

Publisher

ITESO

Abstract

This project explores improving the performance of large language models (LLMs) in Spanish legal domains by combining Retrieval-Augmented Generation (RAG) with Parameter-Efficient Fine-Tuning (PEFT) using the QLoRA technique. Four experiments were conducted to evaluate zero-shot performance across open-ended, closed-ended, and summarization tasks. These included a vanilla baseline, a RAG-enhanced version, and two fine-tuned models (with and without RAG).
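To illustrate the difference between the vanilla and RAG-enhanced conditions described above, the sketch below assembles a prompt for each. The toy bag-of-words retriever, the sample corpus, and the Spanish prompt template are hypothetical stand-ins, not the thesis's actual retrieval pipeline, which would use a proper sentence encoder and vector store.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str], use_rag: bool) -> str:
    """Vanilla condition sends the question alone; the RAG condition
    prepends retrieved passages as grounding context."""
    if not use_rag:
        return f"Pregunta: {query}\nRespuesta:"
    context = "\n".join(retrieve(query, corpus))
    return f"Contexto:\n{context}\n\nPregunta: {query}\nRespuesta:"
```

Either prompt would then be sent unchanged to the base or QLoRA-fine-tuned model, which is what allows the four experimental conditions to be compared on identical questions.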

The training and retrieval data were synthetically generated through a cloud-based, serverless ETL process aligned with medallion architecture principles. Experiments focused on the 2024 Ley del Impuesto sobre la Renta (Mexico's income tax law). Evaluation used BERTScore, ROUGE, and BLEU metrics to assess semantic similarity, n-gram overlap, and linguistic precision.
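For intuition about the n-gram overlap metrics mentioned above, here is a minimal pure-Python sketch of ROUGE-N recall and one BLEU modified-precision component. This is a simplification for illustration only: it omits stemming, BLEU's brevity penalty and geometric mean over orders, and multi-reference handling; an actual evaluation would rely on the standard `rouge-score`, `sacrebleu`, and `bert-score` packages.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: fraction of reference n-grams matched by the candidate."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    if not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped (min-count) matches
    return overlap / sum(ref.values())

def bleu_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Modified n-gram precision: one component of the full BLEU score."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    if not cand:
        return 0.0
    clipped = sum((cand & ref).values())
    return clipped / sum(cand.values())
```

For example, with reference "el impuesto se paga anualmente" and candidate "el impuesto se paga cada año", four of the five reference unigrams are matched, giving ROUGE-1 recall of 0.8, while unigram precision is 4/6.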

Keywords

RAG, Large Language Models, LLM, ETL, PEFT

Citation

Badillo-Rangel, E. (2024). Enhancing LLM Performance in Specialized Spanish Domains Using RAG and PEFT QLoRA. Master's degree project, Maestría en Sistemas Computacionales. Tlaquepaque, Jalisco: ITESO.