Durham HPC Days 2025

AI and simulations — We all need HPC

Kolen Cheung

June 7th, 2025

Introduction

Durham HPC Days

Theme: AI and simulations — We all need HPC

Officially, there is no gap between HPC and AI: HPC underpins progress in both simulations (high-performance and high-throughput) and AI.

My general observations of the key themes are:

Keynotes

The changing shape of science funding in the U.S.

Rich Knepper, Chair of the Coalition for Academic Scientific Computation (CASC)

  • Spent about one slide per month documenting what has affected science in the US since the new administration took office on 20 January 2025.
  • Among the strategies for dealing with budget and funding cuts, some universities ask their researchers to continue using their funds and prepare to fight the cuts in court.
  • Cited a paper studying the impact of World War II on German science, showing that such disruptions can have effects lasting decades.
  • Other countries are luring US scientists, e.g. €500 million from the EU.
  • Feels great to be here in the UK!

Finding the Fulcrum: Rethinking Supercomputing at Scale

Cristin Merritt (Chief Marketing Officer - Alces Flight Ltd.)

Molecular Simulation in Process Engineering: Impressions from the Era of Exascale Computing and Data Science

Prof Philipp Neumann

This involves topics such as load balancing, automated algorithm selection and coupled multiscale systems, all of which have been explored and covered in the open-source software packages ls1 mardyn, AutoPas and MaMiCo.

The UK’s Digital Research Infrastructure

Presenter: Afia Masood (UKRI)

Kick-off: The UK’s Knowledge Exchange Grant and Accelerate Computing initiatives

Conveners: Helen Cooper, Nick Brown, Tobias Weinzierl

Unleash the control freak in yourself for fun and profit — and for science!

  • Slow code is easy to scale: he used a strong-scaling comparison between the -O0 and -O2 (?) flags to demonstrate that while the latter appears to scale worse, it is still faster than the former in absolute time.
  • We should spend more effort finding out why; see the “prime number effect” for example.
  • MachineState provides a systematic approach to gathering as many performance-influencing factors as are known to the performance engineering community.
    • “more and more conferences and journals request artifact descriptions along with the paper to improve the reproducibility of research”
  • His opinion is that papers without such practices should not be accepted, though he restricted this to computer scientists so as to remain inclusive of RSEs.
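The “slow code is easy to scale” point can be reproduced with a toy model. This is a minimal sketch under our own assumptions, not the speaker's benchmark: runtime is modelled as T(p) = T_comp/p + T_comm·log2(p), the -O0 build is assumed to do 4x more computation than the -O2 build, and both builds share the same communication cost. All timings are hypothetical.

```python
import math

T_COMM = 0.01                        # hypothetical communication cost per tree level (s)
BUILDS = {"-O0": 40.0, "-O2": 10.0}  # hypothetical single-process compute time (s)


def runtime(t_comp: float, t_comm: float, p: int) -> float:
    """Modelled wall-clock time on p processes: perfectly divisible compute plus a
    tree-like communication term that grows with log2(p)."""
    return t_comp / p + t_comm * math.log2(p)


def speedup_and_efficiency(t_comp: float, t_comm: float, p: int) -> tuple[float, float]:
    """Strong-scaling speedup S = T(1) / T(p) and parallel efficiency S / p."""
    s = runtime(t_comp, t_comm, 1) / runtime(t_comp, t_comm, p)
    return s, s / p


for p in (1, 16, 64, 256):
    for flag, t_comp in BUILDS.items():
        s, e = speedup_and_efficiency(t_comp, T_COMM, p)
        print(f"p={p:4d} {flag}: time={runtime(t_comp, T_COMM, p):8.4f} s, "
              f"speedup={s:7.2f}, efficiency={e:5.2f}")
```

In this model the -O0 build shows the higher parallel efficiency at every p > 1, because the same communication cost is amortised over more computation, yet the -O2 build is faster at every process count.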

Thomas Gruber (Regionales RechenZentrum Erlangen - RRZE)

We find that if the number of processes is prime, SpecI2M fails to work properly, which we can attribute to short inner loops emerging from the one-dimensional domain decomposition in this case.

  • Has been rejected twice

“Prime number effect” (fig. 2, Laukemann et al. 2024)
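To see why a prime process count matters, here is a hypothetical sketch of choosing a 2-D process grid. It is not CloverLeaf's code: the helper names, the 3840 x 3840 mesh, the minimal-perimeter heuristic, and the tie-break in favour of splitting the inner x dimension are all assumptions. For a prime p the only factorisations are 1 x p and p x 1, so the decomposition is necessarily one-dimensional; if it runs along x, the inner loop shrinks to nx/p cells.

```python
def process_grid(p: int, nx: int, ny: int) -> tuple[int, int]:
    """Hypothetical helper: return (px, py) with px * py == p that minimises the
    subdomain perimeter nx/px + ny/py. Larger px is tried first, so ties are
    (by assumption) broken by splitting the inner x dimension."""
    best = None
    for px in range(p, 0, -1):
        if p % px:
            continue  # px must divide p exactly
        py = p // px
        perimeter = nx / px + ny / py
        if best is None or perimeter < best[0]:
            best = (perimeter, px, py)
    return best[1], best[2]


def inner_loop_length(p: int, nx: int, ny: int) -> float:
    """Cells per row in one subdomain, i.e. the trip count of the inner x loop."""
    px, _ = process_grid(p, nx, ny)
    return nx / px


NX = NY = 3840  # hypothetical square mesh
for p in (60, 61, 64, 67):
    px, py = process_grid(p, NX, NY)
    print(f"p={p:3d}: grid {px:2d} x {py:2d}, "
          f"inner loop ~{inner_loop_length(p, NX, NY):6.1f} cells")
```

With these assumptions, the composite counts 60 and 64 give near-square grids with inner loops of hundreds of cells, while the primes 61 and 67 collapse to a p x 1 grid with inner loops of only about 60 cells.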

Challenges and Opportunities in HPC for Numerical Relativity (NR)

Dr Katy Clough (STFC Ernest Rutherford Research Fellow)

Symposium

The UK Centre of Excellence (CoE) for the Characterisation and Co-Design of Systems, Hardware and Enabling Software (SHES)

Benchmarking of HPC systems for simulation and AI

Conveners: DiRAC, ExCALIBUR, UKRI Living Benchmarks; Lead: Mark Wilkinson

Use science benchmarks to optimise the design of large-scale computing services

Submitted talks

Isambard-AI and Isambard 3: Democratising the User Experience for AI and simulation HPC

Who: Richard Gilham, Bristol Centre for Supercomputing

The HPC Hardware Lab at Durham University

Who: Alastair Basden, Durham University

Commissioning Aire, a new HPC system at The University of Leeds

Who: Andrew Harvie, University of Leeds

Delivering Training with a mini HPC built from Raspberry Pis

Who: Jannetta Steyn, Senior Research Software Engineer, Head of Training and Community, Newcastle University

Driving energy efficiency of operation with wind turbine modelling

Who: Nick Brown, EPCC

IIRC, discussed using a RISC-based GPU-like accelerator to demonstrate better energy efficiency compared to traditional CPUs.

AI for Green HPC: How Machine Learning is Transforming Energy Efficiency

Who: Fawada Qaiser, Durham University

Case studies from leading supercomputing facilities highlight how reinforcement learning and neural networks enhance power-aware job scheduling, cooling management, and dynamic power adjustments for CPUs, GPUs, and accelerators.

HPC waste heat storage: the ICHS project at Durham University

Who: Paul Walker, Durham University

  1. HPC immersion tank
  2. exploring the use of flooded mine workings beneath the data centre

Advancing CATS, The Climate Aware Task Scheduler, for HPC and HTC application

Who: Sadie Bartholomew, NCAS

Benchmarking ML applications

Who: Adrian Jackson, EPCC

Discussed various challenges in benchmarking ML applications arising from a project delay, and solutions to them.

Having it all: Can software be portable, performant and productive?

Who: Chris Maynard, Met Office

DSL, Gung Ho, PSyclone, and LFRic.

Scientific Computing with JAX: A Case Study Evaluating Gravitational Lensing Likelihood

Who: Kolen Cheung, University of Exeter

Simulating Discrete-Event Systems on HPC: Sleptsov Net Case Study

Who: Dmitry Zaitsev, University of Derby

GPU offloads for gravity calculations in SWIFT cosmology code

Who: Sarah Johnston, Durham University

DiRAC RSEs

DiRAC RSE support for SWIFT

Gokmen Kilic (Durham)

Parallel neighbour finding algorithm

Nicolin Govender (UCL)

Key message: neighbour finding is common across many different domains, and there are potential benefits to having a general library of parallel algorithms to unify these efforts.
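A common building block behind such libraries is the cell-list (linked-cell) method. The sketch below is illustrative only and not taken from the talk; the function name and the 3-D tuple input are assumptions. Points are binned into cubes of side r_cut, so each point only needs to be compared against points in the 27 surrounding cells. Each cell is an independent unit of work, which is what makes the approach amenable to parallelisation, although this version is serial.

```python
import math
from collections import defaultdict
from itertools import product


def neighbour_pairs(points, r_cut):
    """Return sorted pairs (i, j), i < j, of 3-D points within distance r_cut."""
    # Bin every point into a cubic cell of side r_cut.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (math.floor(x / r_cut), math.floor(y / r_cut), math.floor(z / r_cut))
        cells[key].append(idx)

    # Any neighbour lies in the same cell or one of its 26 adjacent cells.
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() avoids inserting empty cells while iterating over the dict.
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= r_cut:
                        pairs.add((i, j))
    return sorted(pairs)


print(neighbour_pairs([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)], 1.0))  # [(0, 1)]
```

For roughly uniform densities this reduces the all-pairs O(N²) search to about O(N), since each cell holds a bounded number of points.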

Thoughts about mixed precision

Simon Burbidge (Leicester)

Key message: mixed precision is great and people should start exploring it.
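One classic way to exploit mixed precision is iterative refinement. This minimal NumPy sketch illustrates the general technique and is not something presented in the talk. The linear solve runs in float32, while residuals and the accumulated solution stay in float64. A production version would factorise the float32 matrix once (e.g. with scipy.linalg.lu_factor) and reuse it, rather than calling np.linalg.solve on every iteration as done here for brevity.

```python
import numpy as np


def refine_solve(a, b, iters=5):
    """Solve a @ x = b using float32 solves with float64 residuals and accumulation."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a32 = a.astype(np.float32)
    # Initial low-precision solve, promoted to float64 for accumulation.
    x = np.linalg.solve(a32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - a @ x                                    # residual in float64
        dx = np.linalg.solve(a32, r.astype(np.float32))  # correction in float32
        x += dx.astype(np.float64)
    return x


# Usage on a hypothetical well-conditioned (diagonally dominant) system.
rng = np.random.default_rng(0)
n = 100
a = rng.standard_normal((n, n)) + n * np.eye(n)
x_true = rng.standard_normal(n)
b = a @ x_true

x32 = np.linalg.solve(a.astype(np.float32), b.astype(np.float32))
x_ref = refine_solve(a, b)
print("float32 only   :", np.linalg.norm(x32 - x_true) / np.linalg.norm(x_true))
print("with refinement:", np.linalg.norm(x_ref - x_true) / np.linalg.norm(x_true))
```

For well-conditioned systems the refined solution approaches float64 accuracy; for ill-conditioned ones the float32 corrections may fail to converge, which is one of the trade-offs to explore.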

Workshops

HEP: Generative AI for Lattice QCD calculations

Gurtej Kanwar (Edinburgh)

HEP: Determining the structure of the proton with Machine Learning

Roy Stegeman (Edinburgh)

HEP: Physics-focused system design

Antonin Portelli (Edinburgh)

I will summarise how lattice QCD benchmarks, based on the Grid library, were used during the procurement process as well as for optimising system energy efficiency in production.

Benchmarking: The ReFrame framework

Tuomas Koskela (UCL)

Performance Modelling of Detrimental Task Execution Patterns in Mainstream OpenMP Runtimes

Adam Tuft (Durham)

Tuft et al. (2024)

While [OpenMP] provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how to implement its annotations.

… “quasi-standard” reference behaviour introduces performance flaws.

… we propose prescriptive clauses to constrain the OpenMP implementations.

Numerical Relativity: MHDuet: Modelling General Relativistic MHD on CPU/GPU Architectures

MHDuet is an automatically generated, efficient computational code designed to simulate the dynamics of strongly gravitating, high-density matter in astrophysical scenarios involving compact objects such as black holes and neutron stars.

… it is currently being ported to AMReX to exploit the capabilities of modern GPU-accelerated and massively parallel systems

… solves the equations of general relativistic magnetohydrodynamics (GRMHD)

Numerical Relativity: Improving eccentric gravitational waveform models with Numerical Relativity

Alice Bonino (University of Birmingham)

… whilst these provide an accurate description of the gravitational-wave signals throughout the inspiral, they are only valid up to moderate eccentricities and are not reliable as the binary approaches merger.

… appeal to Numerical Relativity simulations to help model the complete inspiral-merger-ringdown signal from eccentric binaries.

Numerical Relativity: Automated Kernel Generation for the Numerical Relativity Solver ExaGRyPE

Timothy Stokes (Durham University)

CoSeC & Creating a cohesive distributed Digital Research Infrastructure

Conveners: Stephen Longshaw (UKRI STFC), Damian Jones (UKRI STFC)

13:30 - 15:00 (talks) and 15:30 - 16:30 (panel)

Talks

UKRI DRI

Supporting digital Research Technical Professionals (dRTPs): Projects, opportunities and challenges

Lightning talks

Overview

Sponsors

VAST?

One of the presenters from a vendor (sponsor) discussed how they use Bε-trees in their storage solution and why it is superior, including “every write is a snapshot”.

It felt very much like a sales pitch, and it sounded too good to be true (i.e. that there are no compromises in other factors).

Cornelis Networks

Presented a new 400 Gbps Omni-Path product with metrics superior to those of competitor InfiniBand. More details will be presented at ISC 2025.

Cambridge RCS

Not from the lightning talk, but the DAWN supercomputer at Cambridge RCS is a gift from Intel and Dell, allegedly because Cambridge RCS has built such a good collaborative relationship with the vendors.

Tutorial

Tutorial: AMD GPUs: Simplify your HPC Application Port to GPUs - OpenMP and Managed Memory on AMD MI300A and MI300X

Presenter: Bob Robey (AMD)

Failed to demonstrate how their compilers make use of the unified memory in their hardware; in general, a very poor tutorial.

References

Laukemann, Jan, Thomas Gruber, Georg Hager, Dossay Oryspayev, and Gerhard Wellein. 2024. “CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.” 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 27, 350–60. https://doi.org/10.1109/IPDPS57955.2024.00038.
Tuft, Adam S., Tobias Weinzierl, and Michael Klemm. 2024. “Detrimental Task Execution Patterns in Mainstream OpenMP® Runtimes.” In Advancing OpenMP for Future Accelerators, edited by Alexis Espinosa, Michael Klemm, Bronis R. De Supinski, Maciej Cytowski, and Jannis Klinkenberg, vol. 15195. Lecture Notes in Computer Science. Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-72567-8_14.