Durham HPC Days 2025
AI and simulations — We all need HPC
Kolen Cheung
June 7th, 2025
Introduction
Durham HPC Days
- From Durham University
- Strong ties with DiRAC HPC facility
- Seem to always be at the week predating the ISC Conference
Theme: AI and simulations — We all need HPC
Officially,
there is no gap between HPC and AI: HPC underpins progress in both simulations — high-performance and high throughput — and AI
My general observation of the key themes are:
- Big themes:
- Smaller themes:
- Benchmarking
- People (RSE)
- Green
Keynotes
The changing shape of science funding in the U.S.
Rich Knepper, Chair of the Coalition for Academic Scientific Computation (CASC)
- Spent about 1 slide per month to document what impacted science in the US since the new administration took place in Jan 20th, 2025.
- Among strategies to deal with budget/funding cuts, some universities ask their researchers to continue use the fund and prepare to fight it in the court.
- Cited a paper studying the Impact of World War II on German Science, which can have decades long impact.
- Other countries are luring US scientists, e.g. €500 million from EU
- Feel great to be here in the UK!
Finding the Fulcrum: Rethinking Supercomputing at Scale
Cristin Merritt (Chief Marketing Officer - Alces Flight Ltd.)
- A reflection on what she learnt in 2024, launching Move the Needle, a year-long initiative advancing workforce inclusivity in HPC
- Emphasize on the people in the HPC community
- As demands, technologies, etc. changes, rebalancing is key
- Technical skill is represented by the stand
- Soft skill probably is the beam, or one-side of the beam
Molecular Simulation in Process Engineering: Impressions from the Era of Exascale Computing and Data Science
Prof Philipp Neumann
This involves topics such as load balancing, automated algorithm selection and coupled multiscale systems, all of which have been explored and covered in the open-source software packages ls1 mardyn, AutoPas and MaMiCo.
The UK’s Digital Research Infrastructure
Presenter: Afia Masood (UKRI)
Kick-off: The UK’s Knowledge Exchange Grant and Accelerate Computing initiatives
Convener: Helen Cooper, Nick Brown, Tobias Weinzierl
- Knowledge Exchange DRI grant
- two DRI accelerate computing grants
Unleash the control freak in yourself for fun and profit — and for science!
- Slow code is easy to scale: use an example of strong scaling comparison between
-O0 and -O2 (?) flag to demonstrates while the scaling of the latter looks worse, it is still faster than the former in absolute time.
- Should spent more effort in finding out why, see “prime number effect” for example.
- MachineState provides a systematic approach to gather as many performance influencing factors as known to the performance engineering community.
- “more and more conferences and journals request artifact descriptions along with the paper to improve the reproducibility of research”
- His opinion is that paper without said practices should not be accepted, with the context however applied to Computer Scientists only to maintain inclusivity with RSE.
Thomas Gruber (Regionales RechenZentrum Erlangen - RRZE)
We find that if the number of processes is prime, SpecI2M fails to work properly, which we can attribute to short inner loops emerging from the one-dimensional domain decomposition in this case.
Challenges and Opportunities in HPC for Numerical Relativity (NR)
Dr Katy Clough (STFC Ernest Rutherford Research Fellow)
- GRTeclyn
- A “lucky” experience of porting GR code using the AMReX framework which utilizes the GPU efficiently to solve wave equations.
- GR can use off-the-shelf library is that GR can be rewritten as an initial value problem looks very much like a wave equation, and with the help of a gauge, without degeneracy such that it can be solved numerically.
- While high curvature near the sigularity is a problem, since there’s no information flowing back from within the event horizon, you can afford garbage solution near within the event horizon and just mask out the singularity!
- signatures:
- merging of 2 black holes, which has direct observational science implications
- proxies such as analytical model fitting is built by calibrating via NR to facilitate Bayesian Inference.
- alien warp drives: NR can predicts the signature of warp drive engine failure! Author lements the current generations of young people not watching Star Trek enough. The frequency scale is not common in other kind of science so it is pretty hopeless that we’d build an experiment to observe such!
- Avocates the importance of RSE helps (4 RSEs for this porting), and more people should be made known to the resource available. She only knew because she has access to DiRAC and other colleagues told her about a call.
- UK community for NR: UKNR
Symposium
The UK Centre of Excellence (CoE) for the Characterisation and Co-Design of Systems, Hardware and Enabling Software (SHES)
- Goal: maximize ROI of HPC by a continuous improvement process with these systems, dynamically adapting and characterising them to suit ever-changing workloads and demands
Benchmarking of HPC systems for simulation and AI
Conveners: DiRAC, ExCALIBUR, UKRI Living Benchmarks Lead: Mark Wilkinson
Use science benchmarks to optimise the design of large-scale computing services
- the ExCALIBUR Benchmarking project
- the ExCALIBUR BASE-II project
- the UKRI Living Benchmarks project: curating open-source benchmarks that support the procurement and performance assessment of high-performance computing systems
- the DiRAC-4 design process
- Mark emphasized the continuous benchmarking of the DiRAC system (Cosma 8?) from procurement to deployment. Its performance has increased by 2x in the process by efforts from RSE. He avocates for a model similar to the US, where half the funding goes to the machine, half goes to the people.
Submitted talks
Isambard-AI and Isambard 3: Democratising the User Experience for AI and simulation HPC
Who: Richard Gilham, Bristol Centre for Supercomputing
The HPC Hardware Lab at Durham University
Who: Alastair Basden, Durham University
- The Durham HPC Hardware Lab is hosted by the DiRAC COSMA HPC facility and provides UK researchers with access to cutting edge technologies and facilities, to allow testing of codes, software migration to new hardware, and study of new paradigms.
- One thing I remember from it is how they set up two identical systems with a different interconnects to study the difference just between the two technologies of the interconnects.
Commissioning Aire, a new HPC system at The University of Leeds
Who: Andrew Harvie, University of Leeds
- IIRC, part of the challenge is to navigate how to frame it as an upgrade to retiring systems, not a new system
Delivering Training with a mini HPC built from Raspberry Pis
Who: Jannetta Steyn, Senior Research Software Engineer, Head of Training and Community, Newcastle University
- Build a mini-HPC system with 7 Raspberry Pi
- While similar prior work exists, this project aims to make it reproducible (from software to hardware to 3D printed housing to “blueprint”)
- Experimented with alternative SBC with funding, but eventually replace it with all the RPis she can find in her home, because RPi is better documented and supported
- Primarily for educating people to build HPC without £££ (i.e. lowering entry barrier)
- Much easier to see the effect of, say, overloading the cluster and see the effects on another workload
- Would love to have access to GPU in the future, but suffices for now
Driving energy efficiency of operation with wind turbine modelling
Who: Nick Brown, EPCC
IIRC, discussed using RISC-based GPU-like accelerator to demonstrates better energy efficiency comparing to traditional CPU
HPC waste heat storage: the ICHS project at Durham University
Who: Paul Walker, Durham University
- HPC immersion tank
- exploring the use of flooded mine workings beneath the data centre
Advancing CATS, The Climate Aware Task Scheduler, for HPC and HTC application
Who: Sadie Bartholomew, NCAS
- Climate-Aware Task Scheduler, which schedules tasks to minimise the total estimated carbon intensity of the electricity grid for the job duration using real-time data from the UK’s National Grid ESO API
- Version 1 (current):
at command, targeted smaller-scale tasks on local machines
- Version 2 (future): integrate with Slurm
Benchmarking ML applications
Who: Adrian Jackson, EPCC
Discussed various challenges and solutions to benchmark ML applications due to project delay
Scientific Computing with JAX: A Case Study Evaluating Gravitational Lensing Likelihood
Who: Kolen Cheung, University of Exeter
- Link
- Expanded upon that I presented in the technical
- Conclude that JAX is not good on multithreading on CPU, i.e. JAX is not good for traditional HPC with CPUs only
- Interesting interactions from others:
- Why JAX is bad on multithreading on CPU? Probably just a lack of interested from primarily machine learning focused community (who wouldn’t have access to GPU/TPU?)
- One like to use JAX for things like einsum a lot, and concerns about the performance implications. My feedback was
- It can be beneficial if users have access to GPU
- Possibly keep a few similar implementations for different target systems
- Profile it if performance is a concern, including the jit-lag if recompile per shape change is a concern
- Think about long term requirements (or the lack of), scale of the project, etc.
- Another RSE is very concerned as PI(s) are pushing to port to JAX as their collegues have great experience
- JAX is not the answer to everything (and point out limitations mentioned in my talk)
- He don’t want to port 80k lines of C++ to another framework, would rather implement autodiff/autograd in that code base (Good luck!)
- Discussed other related things such as Cython, Julia, etc.
Simulating Discrete-Event Systems on HPC: Sleptsov Net Case Study
Who: Dmitry Zaitsev, University of Derby
- Sleptsov nets
- Turing-complete
- parallel algorithm for Discrete-event systems (DES)
- Very unapproachable
- Often ask strange questions to other speakers
- recapturing heat of data centre and turn that into electricity to power data centre, described as perpetual motion machine by a panel speaker
- (To be fair of course he meant recapturing some energy and reduces waste)
GPU offloads for gravity calculations in SWIFT cosmology code
Who: Sarah Johnston, Durham University
- Praised by others to be a very approachable presentation even to outsiders
- Focused on porting the part of the simulation that involve gravity, accounted for ~60% of workload
- The original CPU code has 3 terms: particle-particle, particle-multipole, multipole-multipole.
- The GPU code dropped the particle-multipole code with better precision
- Why? Because more particles are included in the particle-particle term as GPU has many larger parallelism available
- This code is memory intensive, benefits of porting fully to the GPU is unclear. But it is started as people saying it is important to port to the GPU (probably PhD advisor(s) is primarily driving it forward?)
- IIRC, currently it (the GPU implementation) is slower than the CPU (implementation)
DiRAC RSEs
DiRAC RSE support for SWIFT
Gokmen Kilic (Durham)
- I thought the SWIFT programming language is surprisingly having a presence in HPC
- SWIFT stands for SPH With Inter-dependent Fine-grained Tasking, where SPH stands for Smoothed Particle Hydrodynamics.
Parallel neighbour finding algorithm
Nicolin Govender (UCL)
Key message is that neighbour finding is common in many different domains, and there’s a potential benefits to have a general library with parallel algorithm to unify efforts.
Thoughts about mixed precision
Simon Burbidge (Leicester)
Key message is that mixed precision is great and people should start to explore.
Workshops
HEP: Generative AI for Lattice QCD calculations
Gurtej Kanwar (Edinburgh)
HEP: Determining the structure of the proton with Machine Learning
Roy Stegeman (Edinburgh)
- Solve the inverse problem of extracting the parton distribution functions from a finite set of data
- One of the key I can remember is that it is possible to validate the output by checking symmetry requirements of the solutions, which is key to design the ML model
- They didn’t have the chance to discuss the ML model at all due to time constraints
HEP: Physics-focused system design
Antonin Portelli (Edinburgh)
I will summarise how lattice QCD benchmarks, based on the Grid library, were used during the procurement process as well as for optimising system energy efficiency in production.
Benchmarking: The Reframe framework
Tuomas Koskela (UCL)
Performance Modelling of Detrimental Task Execution Patterns in Mainstream OpenMP Runtimes
Adam Tuft (Durham)
Tuft et al. (2024)
While [OpenMP] provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how to implement its annotations.
… “quasi-standard” reference behaviour introduces performance flaws.
… we propose prescriptive clauses to constrain the OpenMP implementations.
Numerical Relativity: MHDuet: Modelling General Relativistic MHD on CPU/GPU Architectures
MHDuet is an automatically generated, efficient computational code designed to simulate the dynamics of strongly gravitating, high-density matter in astrophysical scenarios involving compact objects such as black holes and neutron stars.
… it is currently being ported to AMReX to exploit the capabilities of modern GPU-accelerated and massively parallel systems
… solves the equations of general relativistic magnetohydrodynamics (GRMHD)
Numerical Relativity: Automated Kernel Generation for the Numerical Relativity Solver ExaGRyPE
Timothy Stokes (Durham University)
- ExaHyPE is a numerical engine used to solve hyperbolic PDE systems
- ExaHyPe-DSL eCSE project which replaces these manual kernels with those written using a Domain Specific Language (DSL)
- think PSyclone
- Python DSL \(\rightarrow\) Python AST \(\rightarrow\) ExCALIBUR xDSL toolkit \(\rightarrow\) High-Level MLIR \(\rightarrow\) … \(\rightarrow\) Lower-Level MLIR Dialects \(\rightarrow\) LLVM-IR \(\rightarrow\) Machine Code
- complicated build system with make to expose it back as Python module
- we have a nice dicussion about if this is a right abstraction, and also mention a possible Cython-style jit-like experience in Notebook
- he also mention if he’d do it again, he’d not use the Python AST as there are edge cases to fix, better implement his own parser instead.
CoSeC & Creating a cohesive distributed Digital Research Infrastructure
Conveners: Stephen Longshaw (UKRI STFC), Damian Jones (UKRI STFC)
13:30 - 15:00 (talks) and 15:30 - 16:30 (panel) Talks
Applied AI for the Digital Humanities (CCP-AHC; Karina Rodriguez Echavarria / Jeyan Thiyagalingam)
Computational Biology (Martyn Winn – joining remotely)
Computational Engineering (CCP-NTH, CCP-Turbulence, UKTC; Wei Wang)
Computational Materials and Molecular Science (Marcello Puligheddu / Rajany RV)
Panel discussion:
The basic idea is to discuss how (and whether?) it is possible for us to create a DRI using the distributed funding approach – i.e. funding the infrastructure as lots of smaller projects (relatively) that need to interoperate to be a single cohesive overall infrastructure.
Mark commented the “80% rule” does not work well for project like DiRAC as it would not make sense for one university to pay 20% overhead for all collaborations.
Some call that is not for early researchers often requires people to have experience in the field. For those people sensing their field is shrinking as there’s a loss of interest, they often find themselves rejects after rejects in calls.
Small funds (~£5,000) and large funds (~£100,000) have big gaps. Something in between is needed.
Too many different kinds of funds, might need to be consolidated.
UKRI DRI
Supporting digital Research Technical Professionals (dRTPs): Projects, opportunities and challenges
- UNIVERSE-HPC will define a training curriculum framework – spanning from undergraduate to continuing professional development level - for Research Software Engineers (RSEs) specializing in high performance computing (HPC).
- two Network+ projects:
- projects:
- STEP-UP - A Strategic TEchnical Platform for University Technical Professionals. Supporting “digital Research Technical Professionals” (dRTPs) and researchers working with research software, research data and research computing infrastructure, in the London region and beyond.
- DRIFT is focused on the training for research facilitators and teams
- CCP-AHC: Collaborative Computational Project (CCP) serving Arts, Humanities, and Culture researchers
Lightning talks
VAST?
One of the presenter from a vendor (spondor) discussed how they use Bε-trees in their storage solution and why it is superior, including “every write is a snapshot”.
It felt very much a sales pitch and it sounds too good to be true (that it doesn’t have compromise in other factors)
Cornelis networks
Presented a new Omni-Path product at 400Gbps with superior metrics comparing to competitor InfiniBand. More details will be presented at ISC2025.
Cambridge RCS
Not from the lightning talk, but the DAWN supercomputer at Cambridge RCS is a gift from Intel and Dell, alledgedly because they have built such a good collaborative relationship with the vendors.
Tutorial
Tutorial: AMD GPUs: Simplify your HPC Application Port to GPUs - OpenMP and Managed Memory on AMD MI300A and MI300X
Presenter: Bob Robey (AMD)
Fail to demonstrate how their compilers utilize the unified memory in their hardware, and in general a very poor tutorial.
References
Laukemann, Jan, Thomas Gruber, Georg Hager, Dossay Oryspayev, and Gerhard Wellein. 2024.
“CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion.” 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 27, 350–60.
https://doi.org/10.1109/IPDPS57955.2024.00038.
Tuft, Adam S., Tobias Weinzierl, and Michael Klemm. 2024.
“Detrimental Task Execution Patterns in Mainstream OpenMP® Runtimes.” In
Advancing OpenMP for Future Accelerators, edited by Alexis Espinosa, Michael Klemm, Bronis R. De Supinski, Maciej Cytowski, and Jannis Klinkenberg, vol. 15195. Lecture Notes in Computer Science. Springer Nature Switzerland.
https://doi.org/10.1007/978-3-031-72567-8_14.