A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning

Durán-González, Erika S.

dc.contributor.advisor	Gudiño-Mendoza, Gema B.
dc.contributor.author	Durán-González, Erika S.
dc.date.accessioned	2025-06-03T18:52:45Z
dc.date.available	2025-06-03T18:52:45Z
dc.date.issued	2025-05
dc.description.abstract	Traditional training of Machine Learning (ML) algorithms requires data collected from various devices to be transferred to a central server, which poses potential security and data-privacy risks. Another critical issue in machine learning is class imbalance, which arises when certain classes are underrepresented and can lead to suboptimal performance, particularly on minority-class data. Approaches such as oversampling, undersampling, and synthetic data generation have been developed to overcome this problem. Federated Learning (FL) is a promising privacy-preserving Artificial Intelligence (AI) framework that addresses the privacy challenges of traditional machine learning training. Class imbalance also occurs in federated learning, but the approaches mentioned above are not directly applicable because the class distribution is kept unknown to protect privacy. Several federated learning algorithms have been developed to address this problem. This thesis implements and compares three federated learning algorithms designed to address the class imbalance problem: Combinatorial Upper Confidence Bounds (CUCB), CLass IMBalance Federated Learning (CLIMB), and Federated Feature Distillation (FedFed). Three data distributions were tested: label imbalance, quantitative imbalance, and double imbalance. To provide common ground for comparison, the implementations use the same dataset and data pre-processing, the same neural network model, and the same training hyperparameters. The results showed that CUCB had the best convergence rate, which is due to the algorithm inferring the data distribution from the test dataset. CLIMB addresses the mismatch between local and global imbalance, making the algorithm more robust and exhibiting the best performance in all data distributions.
FedFed did not perform as anticipated, despite utilizing the latest advancements in generative AI. This implementation needs further exploration in more complex environments, for example with a larger number of clients.
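The three data distributions named in the abstract can be illustrated with a small partitioning sketch. This code is not taken from the thesis: the function name `partition_indices`, the 1:2:...:K client-size ratio, and the sort-by-label sharding are assumptions chosen only to show how label imbalance (skewed class mix per client), quantitative imbalance (unequal sample counts per client), and double imbalance (both at once) might differ.

```python
import random
from collections import Counter


def partition_indices(labels, num_clients, mode, seed=0):
    """Split sample indices across clients (hypothetical helper, not from the thesis).

    mode="label":    equal client sizes, skewed class mix per client
    mode="quantity": unequal client sizes, roughly uniform class mix
    mode="double":   unequal client sizes and skewed class mix
    """
    if mode not in ("label", "quantity", "double"):
        raise ValueError(f"unknown mode: {mode}")

    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    if mode in ("label", "double"):
        # Assumed sharding scheme: sorting by class makes each contiguous
        # shard dominated by one or two classes (the sort is stable, so the
        # order within a class stays random).
        indices.sort(key=lambda i: labels[i])

    if mode in ("quantity", "double"):
        # Assumed size ratio 1:2:...:K between clients.
        weights = [k + 1 for k in range(num_clients)]
    else:
        weights = [1] * num_clients

    total = sum(weights)
    clients, start, cumulative = [], 0, 0
    for w in weights:
        cumulative += w
        end = len(indices) * cumulative // total
        clients.append(indices[start:end])
        start = end
    return clients


# Toy data: 4 classes with 100 samples each.
toy_labels = [i % 4 for i in range(400)]
for mode in ("label", "quantity", "double"):
    parts = partition_indices(toy_labels, num_clients=4, mode=mode)
    for k, part in enumerate(parts):
        counts = dict(sorted(Counter(toy_labels[i] for i in part).items()))
        print(mode, "client", k, "size", len(part), "classes", counts)
```

In a real federated setting each client would only see its own shard, so the per-client class counts printed here would not be available to the server; that hidden distribution is what the compared algorithms have to work around.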
dc.identifier.citation	Durán-González, E. S. (2025). A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning. Trabajo de obtención de grado, Maestría en Ciencia de Datos. Tlaquepaque, Jalisco: ITESO.
dc.identifier.uri	https://hdl.handle.net/11117/11583
dc.language.iso	eng
dc.publisher	ITESO
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject	Federated Learning
dc.subject	Data Privacy
dc.subject	Imbalanced Dataset
dc.subject	Algorithm Design and Analysis
dc.title	A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning
dc.type	info:eu-repo/semantics/masterThesis
dc.type.version	info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: ITESO_MAF_MScThesis_ED.pdf
Size: 2.49 MB
Format: Adobe Portable Document Format

Collections

DMAF - Trabajos de fin de Maestría en Ciencia de Datos