Theme: AI and simulations — We all need HPC

Officially,

there is no gap between HPC and AI: HPC underpins progress in both simulations — high-performance and high throughput — and AI

My general observation of the key themes are:

Big themes:
- GPU
- AI
Smaller themes:
- Benchmarking
- People (RSE)
- Green

The changing shape of science funding in the U.S.

Rich Knepper, Chair of the Coalition for Academic Scientific Computation (CASC)

Spent about 1 slide per month to document what impacted science in the US since the new administration took place in Jan 20th, 2025.
Among strategies to deal with budget/funding cuts, some universities ask their researchers to continue use the fund and prepare to fight it in the court.
Cited a paper studying the Impact of World War II on German Science, which can have decades long impact.
Other countries are luring US scientists, e.g. €500 million from EU
Feel great to be here in the UK!

National Science Foundation grant funding through May 21

Finding the Fulcrum: Rethinking Supercomputing at Scale

Cristin Merritt (Chief Marketing Officer - Alces Flight Ltd.)

A reflection on what she learnt in 2024, launching Move the Needle, a year-long initiative advancing workforce inclusivity in HPC
Emphasize on the people in the HPC community
As demands, technologies, etc. changes, rebalancing is key
Technical skill is represented by the stand
Soft skill probably is the beam, or one-side of the beam

Molecular Simulation in Process Engineering: Impressions from the Era of Exascale Computing and Data Science

Prof Philipp Neumann

This involves topics such as load balancing, automated algorithm selection and coupled multiscale systems, all of which have been explored and covered in the open-source software packages ls1 mardyn, AutoPas and MaMiCo.

The UK’s Digital Research Infrastructure

Presenter: Afia Masood (UKRI)

Kick-off: The UK’s Knowledge Exchange Grant and Accelerate Computing initiatives

Convener: Helen Cooper, Nick Brown, Tobias Weinzierl

Knowledge Exchange DRI grant
two DRI accelerate computing grants
- Accelerated Compute Infrastructure Training (ACIT)
- SHAREing

Unleash the control freak in yourself for fun and profit — and for science!

Slow code is easy to scale: use an example of strong scaling comparison between -O0 and -O2 (?) flag to demonstrates while the scaling of the latter looks worse, it is still faster than the former in absolute time.
Should spent more effort in finding out why, see “prime number effect” for example.
MachineState provides a systematic approach to gather as many performance influencing factors as known to the performance engineering community.
- “more and more conferences and journals request artifact descriptions along with the paper to improve the reproducibility of research”
His opinion is that paper without said practices should not be accepted, with the context however applied to Computer Scientists only to maintain inclusivity with RSE.

Thomas Gruber (Regionales RechenZentrum Erlangen - RRZE)

We find that if the number of processes is prime, SpecI2M fails to work properly, which we can attribute to short inner loops emerging from the one-dimensional domain decomposition in this case.

Has been rejected twice

“Prime number effect” (fig. 2, Laukemann et al. 2024)

Challenges and Opportunities in HPC for Numerical Relativity (NR)

Dr Katy Clough (STFC Ernest Rutherford Research Fellow)

GRTeclyn
- A “lucky” experience of porting GR code using the AMReX framework which utilizes the GPU efficiently to solve wave equations.
- GR can use off-the-shelf library is that GR can be rewritten as an initial value problem looks very much like a wave equation, and with the help of a gauge, without degeneracy such that it can be solved numerically.
- While high curvature near the sigularity is a problem, since there’s no information flowing back from within the event horizon, you can afford garbage solution near within the event horizon and just mask out the singularity!
- signatures:
  - merging of 2 black holes, which has direct observational science implications
    - proxies such as analytical model fitting is built by calibrating via NR to facilitate Bayesian Inference.
  - alien warp drives: NR can predicts the signature of warp drive engine failure! Author lements the current generations of young people not watching Star Trek enough. The frequency scale is not common in other kind of science so it is pretty hopeless that we’d build an experiment to observe such!
Avocates the importance of RSE helps (4 RSEs for this porting), and more people should be made known to the resource available. She only knew because she has access to DiRAC and other colleagues told her about a call.
UK community for NR: UKNR

The UK Centre of Excellence (CoE) for the Characterisation and Co-Design of Systems, Hardware and Enabling Software (SHES)

Goal: maximize ROI of HPC by a continuous improvement process with these systems, dynamically adapting and characterising them to suit ever-changing workloads and demands

Benchmarking of HPC systems for simulation and AI

Conveners: DiRAC, ExCALIBUR, UKRI Living Benchmarks Lead: Mark Wilkinson

Use science benchmarks to optimise the design of large-scale computing services

the ExCALIBUR Benchmarking project
the ExCALIBUR BASE-II project
the UKRI Living Benchmarks project: curating open-source benchmarks that support the procurement and performance assessment of high-performance computing systems
the DiRAC-4 design process
- Mark emphasized the continuous benchmarking of the DiRAC system (Cosma 8?) from procurement to deployment. Its performance has increased by 2x in the process by efforts from RSE. He avocates for a model similar to the US, where half the funding goes to the machine, half goes to the people.

Isambard-AI and Isambard 3: Democratising the User Experience for AI and simulation HPC

Who: Richard Gilham, Bristol Centre for Supercomputing

The HPC Hardware Lab at Durham University

Who: Alastair Basden, Durham University

The Durham HPC Hardware Lab is hosted by the DiRAC COSMA HPC facility and provides UK researchers with access to cutting edge technologies and facilities, to allow testing of codes, software migration to new hardware, and study of new paradigms.
One thing I remember from it is how they set up two identical systems with a different interconnects to study the difference just between the two technologies of the interconnects.

Commissioning Aire, a new HPC system at The University of Leeds

Who: Andrew Harvie, University of Leeds

IIRC, part of the challenge is to navigate how to frame it as an upgrade to retiring systems, not a new system

Delivering Training with a mini HPC built from Raspberry Pis

Who: Jannetta Steyn, Senior Research Software Engineer, Head of Training and Community, Newcastle University

Build a mini-HPC system with 7 Raspberry Pi
While similar prior work exists, this project aims to make it reproducible (from software to hardware to 3D printed housing to “blueprint”)
Experimented with alternative SBC with funding, but eventually replace it with all the RPis she can find in her home, because RPi is better documented and supported
Primarily for educating people to build HPC without £££ (i.e. lowering entry barrier)
- Much easier to see the effect of, say, overloading the cluster and see the effects on another workload
Would love to have access to GPU in the future, but suffices for now

Driving energy efficiency of operation with wind turbine modelling

Who: Nick Brown, EPCC

IIRC, discussed using RISC-based GPU-like accelerator to demonstrates better energy efficiency comparing to traditional CPU

AI for Green HPC: How Machine Learning is Transforming Energy Efficiency

Who: Fawada Qaiser, Durham University

Case studies from leading supercomputing facilities highlight how reinforcement learning and neural networks enhance power-aware job scheduling, cooling management, and dynamic power adjustments for CPUs, GPUs, and accelerators.

HPC waste heat storage: the ICHS project at Durham University

Who: Paul Walker, Durham University

HPC immersion tank
exploring the use of flooded mine workings beneath the data centre

Advancing CATS, The Climate Aware Task Scheduler, for HPC and HTC application

Who: Sadie Bartholomew, NCAS

Climate-Aware Task Scheduler, which schedules tasks to minimise the total estimated carbon intensity of the electricity grid for the job duration using real-time data from the UK’s National Grid ESO API
Version 1 (current): at command, targeted smaller-scale tasks on local machines
Version 2 (future): integrate with Slurm

Benchmarking ML applications

Who: Adrian Jackson, EPCC

Discussed various challenges and solutions to benchmark ML applications due to project delay

Having it all: Can software be portable, performant and productive?

Who: Chris Maynard, Met Office

DSL, Gung Ho, PSyclone, and LFRic.

Scientific Computing with JAX: A Case Study Evaluating Gravitational Lensing Likelihood

Who: Kolen Cheung, University of Exeter

Link
Expanded upon that I presented in the technical
Conclude that JAX is not good on multithreading on CPU, i.e. JAX is not good for traditional HPC with CPUs only
Interesting interactions from others:
- Why JAX is bad on multithreading on CPU? Probably just a lack of interested from primarily machine learning focused community (who wouldn’t have access to GPU/TPU?)
- One like to use JAX for things like einsum a lot, and concerns about the performance implications. My feedback was
  - It can be beneficial if users have access to GPU
  - Possibly keep a few similar implementations for different target systems
  - Profile it if performance is a concern, including the jit-lag if recompile per shape change is a concern
  - Think about long term requirements (or the lack of), scale of the project, etc.
- Another RSE is very concerned as PI(s) are pushing to port to JAX as their collegues have great experience
  - JAX is not the answer to everything (and point out limitations mentioned in my talk)
  - He don’t want to port 80k lines of C++ to another framework, would rather implement autodiff/autograd in that code base (Good luck!)
- Discussed other related things such as Cython, Julia, etc.

Simulating Discrete-Event Systems on HPC: Sleptsov Net Case Study

Who: Dmitry Zaitsev, University of Derby

Sleptsov nets
- Turing-complete
- parallel algorithm for Discrete-event systems (DES)
Very unapproachable
Often ask strange questions to other speakers
- recapturing heat of data centre and turn that into electricity to power data centre, described as perpetual motion machine by a panel speaker
- (To be fair of course he meant recapturing some energy and reduces waste)

GPU offloads for gravity calculations in SWIFT cosmology code

Who: Sarah Johnston, Durham University

Praised by others to be a very approachable presentation even to outsiders
Focused on porting the part of the simulation that involve gravity, accounted for ~60% of workload
The original CPU code has 3 terms: particle-particle, particle-multipole, multipole-multipole.
The GPU code dropped the particle-multipole code with better precision
- Why? Because more particles are included in the particle-particle term as GPU has many larger parallelism available
This code is memory intensive, benefits of porting fully to the GPU is unclear. But it is started as people saying it is important to port to the GPU (probably PhD advisor(s) is primarily driving it forward?)
IIRC, currently it (the GPU implementation) is slower than the CPU (implementation)

DiRAC RSE support for SWIFT

Gokmen Kilic (Durham)

I thought the SWIFT programming language is surprisingly having a presence in HPC
SWIFT stands for SPH With Inter-dependent Fine-grained Tasking, where SPH stands for Smoothed Particle Hydrodynamics.

Parallel neighbour finding algorithm

Nicolin Govender (UCL)

Key message is that neighbour finding is common in many different domains, and there’s a potential benefits to have a general library with parallel algorithm to unify efforts.

Thoughts about mixed precision

Simon Burbidge (Leicester)

Key message is that mixed precision is great and people should start to explore.

HEP: Generative AI for Lattice QCD calculations

Gurtej Kanwar (Edinburgh)

HEP: Determining the structure of the proton with Machine Learning

Roy Stegeman (Edinburgh)

Solve the inverse problem of extracting the parton distribution functions from a finite set of data
One of the key I can remember is that it is possible to validate the output by checking symmetry requirements of the solutions, which is key to design the ML model
They didn’t have the chance to discuss the ML model at all due to time constraints

HEP: Physics-focused system design

Antonin Portelli (Edinburgh)

I will summarise how lattice QCD benchmarks, based on the Grid library, were used during the procurement process as well as for optimising system energy efficiency in production.

Benchmarking: The Reframe framework

Tuomas Koskela (UCL)

ReFrame

Performance Modelling of Detrimental Task Execution Patterns in Mainstream OpenMP Runtimes

Adam Tuft (Durham)

Tuft et al. (2024)

While [OpenMP] provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how to implement its annotations.

… “quasi-standard” reference behaviour introduces performance flaws.

… we propose prescriptive clauses to constrain the OpenMP implementations.

Numerical Relativity: MHDuet: Modelling General Relativistic MHD on CPU/GPU Architectures

MHDuet is an automatically generated, efficient computational code designed to simulate the dynamics of strongly gravitating, high-density matter in astrophysical scenarios involving compact objects such as black holes and neutron stars.

… it is currently being ported to AMReX to exploit the capabilities of modern GPU-accelerated and massively parallel systems

… solves the equations of general relativistic magnetohydrodynamics (GRMHD)

Numerical Relativity: Improving eccentric gravitational waveform models with Numerical Relativity

Alice Bonino (University of Birmingham)

… whilst these provide an accurate description of the gravitational-wave signals throughout the inspiral, they are only valid up to moderate eccentricities and are not reliable as the binary approaches merger.

… appeal to Numerical Relativity simulations to help model the complete inspiral-merger-ringdown signal from eccentric binaries.

Numerical Relativity: Automated Kernel Generation for the Numerical Relativity Solver ExaGRyPE

Timothy Stokes (Durham University)

ExaHyPE is a numerical engine used to solve hyperbolic PDE systems
- ExaGRyPE is NR based on ExaHyPE
ExaHyPe-DSL eCSE project which replaces these manual kernels with those written using a Domain Specific Language (DSL)
- think PSyclone
- Python DSL \(\rightarrow\) Python AST \(\rightarrow\) ExCALIBUR xDSL toolkit \(\rightarrow\) High-Level MLIR \(\rightarrow\) … \(\rightarrow\) Lower-Level MLIR Dialects \(\rightarrow\) LLVM-IR \(\rightarrow\) Machine Code
- complicated build system with make to expose it back as Python module
- we have a nice dicussion about if this is a right abstraction, and also mention a possible Cython-style jit-like experience in Notebook
- he also mention if he’d do it again, he’d not use the Python AST as there are edge cases to fix, better implement his own parser instead.

CoSeC & Creating a cohesive distributed Digital Research Infrastructure

Conveners: Stephen Longshaw (UKRI STFC), Damian Jones (UKRI STFC)

13:30 - 15:00 (talks) and 15:30 - 16:30 (panel) Talks

Applied AI for the Digital Humanities (CCP-AHC; Karina Rodriguez Echavarria / Jeyan Thiyagalingam)
Computational Biology (Martyn Winn – joining remotely)
Computational Engineering (CCP-NTH, CCP-Turbulence, UKTC; Wei Wang)
Computational Materials and Molecular Science (Marcello Puligheddu / Rajany RV)
Panel discussion:

The basic idea is to discuss how (and whether?) it is possible for us to create a DRI using the distributed funding approach – i.e. funding the infrastructure as lots of smaller projects (relatively) that need to interoperate to be a single cohesive overall infrastructure.
- Mark commented the “80% rule” does not work well for project like DiRAC as it would not make sense for one university to pay 20% overhead for all collaborations.
- Some call that is not for early researchers often requires people to have experience in the field. For those people sensing their field is shrinking as there’s a loss of interest, they often find themselves rejects after rejects in calls.
- Small funds (~£5,000) and large funds (~£100,000) have big gaps. Something in between is needed.
- Too many different kinds of funds, might need to be consolidated.

Supporting digital Research Technical Professionals (dRTPs): Projects, opportunities and challenges

UNIVERSE-HPC will define a training curriculum framework – spanning from undergraduate to continuing professional development level - for Research Software Engineers (RSEs) specializing in high performance computing (HPC).
two Network+ projects:
- CHARTED
- SCALE-UP
projects:
- STEP-UP - A Strategic TEchnical Platform for University Technical Professionals. Supporting “digital Research Technical Professionals” (dRTPs) and researchers working with research software, research data and research computing infrastructure, in the London region and beyond.
- DRIFT is focused on the training for research facilitators and teams
- CCP-AHC: Collaborative Computational Project (CCP) serving Arts, Humanities, and Culture researchers
  - CCP-AHC Expression of Interest for Research Software - Codes, Pipelines, and Workflows

VAST?

One of the presenter from a vendor (spondor) discussed how they use Bε-trees in their storage solution and why it is superior, including “every write is a snapshot”.

It felt very much a sales pitch and it sounds too good to be true (that it doesn’t have compromise in other factors)

Cornelis networks

Presented a new Omni-Path product at 400Gbps with superior metrics comparing to competitor InfiniBand. More details will be presented at ISC2025.

Cambridge RCS

Not from the lightning talk, but the DAWN supercomputer at Cambridge RCS is a gift from Intel and Dell, alledgedly because they have built such a good collaborative relationship with the vendors.

Tutorial: AMD GPUs: Simplify your HPC Application Port to GPUs - OpenMP and Managed Memory on AMD MI300A and MI300X

Presenter: Bob Robey (AMD)

Fail to demonstrate how their compilers utilize the unified memory in their hardware, and in general a very poor tutorial.

References

Laukemann, Jan, Thomas Gruber, Georg Hager, Dossay Oryspayev, and Gerhard Wellein. 2024. “CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.” 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 27, 350–60. https://doi.org/10.1109/IPDPS57955.2024.00038.

Tuft, Adam S., Tobias Weinzierl, and Michael Klemm. 2024. “Detrimental Task Execution Patterns in Mainstream OpenMP® Runtimes.” In Advancing OpenMP for Future Accelerators, edited by Alexis Espinosa, Michael Klemm, Bronis R. De Supinski, Maciej Cytowski, and Jannis Klinkenberg, vol. 15195. Lecture Notes in Computer Science. Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-72567-8_14.

Durham HPC Days 2025

Introduction

Durham HPC Days