Light-weight prediction for improving energy consumption in HPC platforms
Conference paper, Year: 2024

Abstract

With the growing demand for computing resources and the struggle to provide the necessary energy, power-aware resource management is becoming a major issue for the high-performance computing (HPC) community. Integrating reliable energy management into a supercomputer's resource and job management system (RJMS) is not an easy task. The energy consumption of jobs is rarely known in advance, and each machine's workload is unique. We argue that the first step toward properly managing energy is to deeply understand the energy consumption of the workload, which involves predicting the workload's power consumption and exploiting that prediction with smart power-aware scheduling algorithms. Crucial questions are (i) how sophisticated a prediction method needs to be to provide accurate workload power predictions, and (ii) to what extent an accurate workload power prediction translates into efficient energy management. In this work, we propose a method to predict and exploit HPC workloads' power consumption, with the objective of reducing the supercomputer's power consumption while maintaining the management (scheduling) performance of the RJMS. Our method exploits workload submission logs together with power monitoring data, and relies on a mix of light-weight power prediction methods and a heuristic inspired by classical EASY Backfilling. We base this study on logs of Marconi 100, a 980-server supercomputer. We show through simulation that a light-weight history-based prediction method can provide power predictions accurate enough to improve the energy management of a large-scale supercomputer compared to energy-unaware scheduling algorithms. These improvements have no significant negative impact on performance.
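
As an illustration only, a light-weight history-based predictor of the kind mentioned in the abstract could look like the sketch below. The abstract does not specify the prediction method, so everything here is an assumption made for the example: the class name `HistoryPowerPredictor`, the per-user sliding window, and the fallback value used for users with no history.

```python
from collections import defaultdict, deque
from statistics import mean


class HistoryPowerPredictor:
    """Hypothetical predictor: estimate a job's mean power (in watts)
    as the average measured power of the same user's most recent jobs.
    This is an assumed design, not the method described in the paper."""

    def __init__(self, window=5, default_watts=0.0):
        self.default_watts = default_watts
        # One bounded queue per user; the oldest measurement is dropped
        # automatically once `window` measurements are stored.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def predict(self, user):
        """Return the predicted mean power for the user's next job."""
        past = self.history.get(user)  # .get() avoids creating an empty entry
        if not past:
            return self.default_watts  # no history yet: fall back to a default
        return mean(past)

    def observe(self, user, measured_watts):
        """Record the measured mean power of a finished job."""
        self.history[user].append(measured_watts)


predictor = HistoryPowerPredictor(window=3, default_watts=500.0)
first_guess = predictor.predict("alice")       # no history yet
for watts in (400.0, 600.0, 800.0, 1000.0):
    predictor.observe("alice", watts)
later_guess = predictor.predict("alice")       # mean of the last 3 jobs
```

A power-aware scheduler could query `predict` at submission time and call `observe` when a job finishes, keeping the per-job cost constant regardless of the size of the log.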
Dates and versions

hal-04566184 , version 1 (02-05-2024)
hal-04566184 , version 2 (29-08-2024)

Cite

Danilo Carastan-Santos, Georges da Costa, Millian Poquet, Patricia Stolf, Denis Trystram. Light-weight prediction for improving energy consumption in HPC platforms. Euro-Par 2024, Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M., Aug 2024, Madrid, Spain. pp.152-165, ⟨10.1007/978-3-031-69577-3_11⟩. ⟨hal-04566184v2⟩