Enhancing LLM Performance in Specialized Spanish Domains Using RAG and PEFT QLoRA
Abstract
This project explores improving the performance of large language models (LLMs) in the Spanish legal domain by combining Retrieval-Augmented Generation (RAG) with Parameter-Efficient Fine-Tuning (PEFT) via the QLoRA technique. Four experiments evaluated zero-shot performance on open-ended, closed-ended, and summarization tasks: a vanilla baseline, a RAG-enhanced configuration, and two QLoRA fine-tuned models (with and without RAG).
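As a minimal sketch of what the QLoRA fine-tuning setup might look like with the Hugging Face transformers/peft/bitsandbytes stack: the base model name and all hyperparameters below are illustrative assumptions, not the project's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model quantized to 4-bit NF4, as QLoRA prescribes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach low-rank adapters; only these small matrices are trained,
# while the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction
```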
The training and retrieval data were synthetically generated through a cloud-based, serverless ETL pipeline organized according to medallion architecture principles. Experiments focused on the Ley de Impuesto sobre la Renta 2024. Evaluation used BERTScore, ROUGE, and BLEU to assess semantic similarity, n-gram overlap, and n-gram precision, respectively.
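A sketch of how these three metrics could be computed together, assuming the Hugging Face `evaluate` library; the example sentences are placeholders, not project data.

```python
import evaluate

predictions = ["El impuesto se calcula sobre la renta neta anual."]
references = ["El impuesto se determina sobre la renta neta del ejercicio."]

# Semantic similarity via contextual embeddings; lang="es" selects a
# Spanish-capable scoring model.
bertscore = evaluate.load("bertscore")
bs = bertscore.compute(predictions=predictions, references=references, lang="es")

# Recall-oriented n-gram overlap.
rouge = evaluate.load("rouge")
rg = rouge.compute(predictions=predictions, references=references)

# Precision-oriented n-gram match; BLEU takes a list of reference lists.
bleu = evaluate.load("bleu")
bl = bleu.compute(predictions=predictions, references=[[r] for r in references])

print(bs["f1"], rg["rougeL"], bl["bleu"])
```

Reporting the three scores side by side is what lets the study separate semantic fidelity (BERTScore) from surface-level lexical agreement (ROUGE, BLEU) across the four experimental configurations.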