A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning

dc.contributor.advisor: Gudiño-Mendoza, Gema B.
dc.contributor.author: Durán-González, Erika S.
dc.date.accessioned: 2025-06-03T18:52:45Z
dc.date.available: 2025-06-03T18:52:45Z
dc.date.issued: 2025-05
dc.description.abstract: Traditional training of Machine Learning (ML) algorithms requires data collected from various devices to be transferred to a central server, which poses potential security and data-privacy risks. Another critical issue in machine learning is class imbalance, which arises when certain classes are underrepresented and can lead to suboptimal performance, particularly on minority-class data. Approaches such as oversampling, undersampling, and synthetic data creation have been developed to overcome this problem. Federated Learning (FL) is a promising privacy-preserving Artificial Intelligence (AI) framework that addresses the challenges of traditional machine learning training. Class imbalance may also occur in federated learning, but the approaches mentioned above are not directly applicable because the class distribution is kept unknown to protect privacy. Several federated learning algorithms have been developed to address this problem. This thesis implements and compares three federated learning algorithms designed to address class imbalance: Combinatorial Upper Confidence Bounds (CUCB), CLass IMBalance Federated Learning (CLIMB), and Federated Feature Distillation (FedFed). Three data distributions were tested: label imbalance, quantitative imbalance, and double imbalance. To provide common ground for comparison, every implementation uses the same dataset and data pre-processing, the same neural network model, and the same training hyperparameters. The results showed that CUCB had the best convergence rate, which is attributed to the algorithm inferring the data distribution from the test dataset. CLIMB addresses the mismatch between local and global imbalance, which makes the algorithm more robust, and it exhibited the best performance in all data distributions. FedFed did not perform as anticipated, despite using the latest advances in generative AI. Further exploration of this implementation is needed in more complex environments, such as with a larger number of clients.
dc.identifier.citation: Durán-González, E. S. (2025). A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning. Trabajo de obtención de grado, Maestría en Ciencia de Datos. Tlaquepaque, Jalisco: ITESO.
dc.identifier.uri: https://hdl.handle.net/11117/11583
dc.language.iso: eng
dc.publisher: ITESO
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject: Federated Learning
dc.subject: Data Privacy
dc.subject: Imbalanced Dataset
dc.subject: Algorithm Design and Analysis
dc.title: A Comparative Analysis of Algorithms to Address the Imbalanced Dataset Problem in Federated Learning
dc.type: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
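
The abstract names three client data distributions (label imbalance, quantitative imbalance, and double imbalance) without defining them in this record. The sketch below shows one common way such splits could be simulated. It is an illustration under stated assumptions, not the thesis's partitioning code: the function name `partition`, the `classes_per_client` parameter, the rotating class window, and the linear size schedule are all hypothetical choices. Here "label" means each client sees only a few classes, "quantity" means clients hold different numbers of samples, and "double" combines both.

```python
import random
from collections import Counter


def partition(labels, num_clients, mode, classes_per_client=2, seed=0):
    """Assign sample indices to clients under an assumed imbalance mode.

    Hypothetical helper for illustration only. Samples that no client
    takes are left unassigned.
    """
    if mode not in ("label", "quantity", "double"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    classes = sorted(set(labels))
    share = len(labels) // num_clients  # equal share per client
    unassigned = list(range(len(labels)))
    rng.shuffle(unassigned)

    clients = []
    for k in range(num_clients):
        if mode == "quantity":
            allowed = set(classes)  # every client may hold every class
        else:
            # Assumption: a rotating window of a few classes per client.
            start = k * classes_per_client
            allowed = {classes[(start + j) % len(classes)]
                       for j in range(classes_per_client)}
        if mode == "label":
            target = share  # equal client sizes
        else:
            # Assumption: client sizes grow linearly with the client index.
            target = share * (k + 1) // num_clients
        pool = [i for i in unassigned if labels[i] in allowed]
        taken = pool[:target]  # may be shorter than target if pool is small
        taken_set = set(taken)
        unassigned = [i for i in unassigned if i not in taken_set]
        clients.append(taken)
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # balanced 10-class toy labels
    for mode in ("label", "quantity", "double"):
        print(mode)
        for k, idxs in enumerate(partition(toy_labels, 5, mode)):
            seen = sorted(Counter(toy_labels[i] for i in idxs))
            print(f"  client {k}: n={len(idxs)}, classes={seen}")
```

With a balanced toy label vector, the "label" mode gives equally sized clients that each hold two classes, while the "quantity" and "double" modes give clients of increasing size. A real experiment would replace the toy labels with the labels of the dataset under study.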

Files

Original bundle

Name: ITESO_MAF_MScThesis_ED.pdf
Size: 2.49 MB
Format: Adobe Portable Document Format
