Accelerating Post-Silicon Debug: An Ensemble Machine Learning and Explainable AI Approach for Platform Boot Failures

Publisher

ITESO

Abstract

As modern server platforms grow more complex, debugging boot failures in FPGA-controlled power-up sequences becomes increasingly difficult, especially in post-silicon environments where issues are hard to reproduce and visibility is limited. This work introduces a machine learning-based framework that automatically classifies platform boot states from Control and Status Register (CSR) data read through the Baseboard Management Controller (BMC). An ensemble model combining Neural Networks, Random Forest, Extreme Gradient Boosting (XGBoost), and a binary refinement classifier accurately differentiates four platform boot conditions. The solution integrates Explainable Artificial Intelligence (XAI) techniques to highlight the key signals that influence each decision, giving engineers insight for faster triage. A Python-based inference script connects pre-silicon training and post-silicon deployment by mapping real-time hardware readings to the model's input format. The experimental results demonstrate high accuracy, reduced overlap between boot state classes, and effective generalization to previously unseen datasets. The framework improves the speed and clarity of post-silicon debug and reduces dependence on traditional debug techniques.
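
To make the inference path described in the abstract more concrete, the following is a minimal sketch and not the thesis implementation. It shows one plausible way to turn raw CSR readings into a feature vector and to combine per-model class probabilities by soft voting, with an optional binary refinement step. Everything named here is an assumption for illustration: the register names and widths in `CSR_LAYOUT`, the labels in `BOOT_STATES`, the averaging rule, the `margin` threshold, and the idea that the refinement classifier is consulted only when the top two classes are close. The abstract does not specify any of these details.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical CSR names and bit widths; the real register map is not
# given in the abstract.
CSR_LAYOUT = [("PWR_SEQ_STATUS", 8), ("PWRGOOD_FLAGS", 8)]

# Hypothetical placeholder labels for the four boot conditions.
BOOT_STATES = ["state_0", "state_1", "state_2", "state_3"]


def csr_to_features(readings: Dict[str, int]) -> List[int]:
    """Unpack each CSR value into one 0/1 feature per bit, LSB first."""
    features: List[int] = []
    for name, width in CSR_LAYOUT:
        value = readings[name]
        features.extend((value >> bit) & 1 for bit in range(width))
    return features


def soft_vote(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by the base models."""
    n_models = len(prob_vectors)
    return [sum(column) / n_models for column in zip(*prob_vectors)]


def classify(
    prob_vectors: Sequence[Sequence[float]],
    refine: Optional[Callable[[int, int], int]] = None,
    margin: float = 0.1,
) -> str:
    """Pick the boot state with the highest averaged probability.

    Assumption: if the two best classes are closer than `margin`, a
    binary refinement callback chooses between them.
    """
    avg = soft_vote(prob_vectors)
    ranked = sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if refine is not None and avg[best] - avg[runner_up] < margin:
        best = refine(best, runner_up)
    return BOOT_STATES[best]
```

In such a sketch, the output of `csr_to_features` would be passed to each trained base model, and the resulting probability vectors would be passed to `classify`. How the readings are actually fetched from the BMC is outside the scope of this example.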

Keywords

Explainable AI, Hardware, Accuracy, Training, Servers, Neural Networks, Post-Silicon Validation, Power-Up Sequence, Ensemble Learning, Machine Learning, Boot State Classification, Field Programmable Gate Arrays

Citation

Michel-Torres, D. A. (2026). Accelerating Post-Silicon Debug: An Ensemble Machine Learning and Explainable AI Approach for Platform Boot Failures. Master's thesis, Maestría en Diseño Electrónico. Tlaquepaque, Jalisco: ITESO.