The race to Exascale poses significant challenges for the collection and analysis of the vast amount of data that future HPC systems will produce, in terms of the increasing complexity of the machines, the scalability and intrusiveness of the adopted monitoring solution, and the interpretability and effective inference driven by the acquired data.

After a very successful first installment last year, we invite contributions to the 2nd ISC-HPC International Workshop on Monitoring and Operational Data Analytics (MODA). The goal is to provide a venue for sharing insight into current trends in MODA, to identify potential gaps, and to offer an outlook into the future of the involved fields (high performance computing, databases, and machine learning) and possible solutions for upcoming Exascale systems. Contributions matching the scope of the workshop will be related to:

  • Currently envisioned solutions and practices for monitoring systems at data centers and HPC sites. Significant focus will be placed on operational data collection mechanisms i) covering different system levels, from building infrastructure sensor data to CPU-core performance metrics, and ii) targeting different end-users, from system administrators to application developers and computational scientists.
  • Effective strategies for analyzing and interpreting the collected operational data. Such strategies should particularly include (but are not limited to) different visualization approaches and machine learning-based techniques, potentially inferring knowledge of the system behavior and allowing for the realization of a proactive control loop.

This workshop is not targeting new solutions proposed in the context of application performance modeling and/or application performance analysis tools. Novel contributions in the area of compiler analysis, debugging, programming models and/or sustainability of scientific software are also considered out of the scope of the workshop.

While MODA is becoming common practice at various international HPC sites, each site takes a different, insular approach that is rarely deployed in production environments and is mostly limited to visualizing system and building infrastructure metrics for health-check purposes. In this regard, we observe a gap between the collection of operational data and its meaningful and effective analysis and exploitation, which prevents the closing of the feedback loop between the monitored HPC system, its operation, and its end-users. Under these premises, the goals of the workshop can be summarized as follows:

  1. Gather and share knowledge and establish a common ground within the international community with respect to best practices in monitoring and operational data analytics.
  2. Discuss future strategies and alternatives for MODA, potentially improving existing solutions and envisioning a common baseline approach in HPC sites and data centers.
  3. Establish a debate on the usefulness and applicability of AI techniques to collected operational data for optimizing the operation of production systems (e.g., for practices such as predictive maintenance, runtime optimization, optimal resource allocation, and scheduling).