Blending Machine Learning and Numerical Simulation, with Applications to Climate Modelling

Jack Atkinson

Senior Research Software Engineer
ICCS - University of Cambridge

The ICCS Team and Collaborators (see end)

2024-05-08

Precursors

Slides and Materials

To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0 1.0 or Font Awesome under SIL OFL 1.1

Weather and Climate (and ML)

The first weather model

In ~1916, Lewis Fry Richardson attempted to compute a one-day forecast by hand using partial differential equations.

He went on to publish Weather Prediction by Numerical Process (Richardson 1922).

Images from the Met Office in the UK, Fair Use.

The first weather model

In Weather Prediction by Numerical Process (Richardson 1922) LFR envisaged a “Forecast Factory”.

He lived to see this realised, albeit in a different setting, on ENIAC by Charney, the von Neumanns, et al. in 1950.

ENIAC by US serviceperson, Public Domain.
The Forecast Factory by Stephen Conlin (1986), Fair Use.

Models today

Climate models are large, complex, many-part systems.

Parameterisation

Subgrid processes are the largest source of uncertainty

Microphysics by Sisi Chen, Public Domain
Staggered grid by NOAA, Public Domain
Globe grid with box by Caltech, Fair Use


Machine Learning in Science

We typically think of Deep Learning as an end-to-end process:
a black box with an input and an output.

Who’s that Pokémon?

\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\] It’s Pikachu!

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.


Challenges

  • Reproducibility
    • Ensure the net functions the same in situ
  • Re-usability
    • Make ML parameterisations available to many models
    • Facilitate easy re-training/adaptation
  • Language Interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Possible solutions

  • Implement an NN in Fortran
    • Additional work, reproducibility issues, hard for complex architectures
  • Forpy
    • Easy to add, harder to use with ML, GPL licensed, barely maintained
  • SmartSim
    • Versatile, learning curve, data copying
  • Fortran-Keras Bridge
    • Keras only, abandonware(?)

Efficiency

We consider two types:

  • Computational
  • Developer

FTorch

Approach

  • PyTorch (and TensorFlow) have C++ backends and provide APIs to access them.
  • Binding Fortran to C is straightforward since Fortran 2003 using iso_c_binding (see the minimal sketch below).
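A minimal sketch of what such a binding looks like. The C routine forward and its signature here are hypothetical, purely for illustration; they are not part of libtorch or FTorch:

module c_bridge
  ! Hedged sketch: binding Fortran to a hypothetical C routine with prototype
  !   void forward(const float* input, float* output, int n);
  use, intrinsic :: iso_c_binding, only : c_float, c_int
  implicit none
  interface
    subroutine forward(input, output, n) bind(c, name="forward")
      import :: c_float, c_int
      real(c_float), intent(in)  :: input(*)
      real(c_float), intent(out) :: output(*)
      integer(c_int), value      :: n
    end subroutine forward
  end interface
end module c_bridge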

We will:

  • Archive the PyTorch model as TorchScript
    • Statically typed subset of Python
    • Produces intermediate representation/graph of NN
    • Read and run via any Torch API
  • Provide Fortran API to abstract complex details from users
    • Wrapping the libtorch C++ API

Approach

[Figure: the tangle of "Python env" and "Python runtime" dependencies; xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5]

Highlights - Computation

  • Use the framework’s implementations directly
    • feature and future support, and reproducibility
  • No-copy access in memory (on CPU).
  • Make use of the Torch backends for GPU offload
  • Indexing issues and the associated reshape are avoided with the Torch strided accessor (illustrated below).
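A minimal illustration of the ordering issue (a sketch, not FTorch code): Fortran stores arrays column-major while Torch, like C, assumes row-major, so handing over a raw buffer without stride information would transpose the logical indices.

program layout_demo
  ! Illustrative only: shows why a Fortran array's memory order differs
  ! from the row-major order a C/Torch library would assume.
  implicit none
  integer :: a(2,3), i, j
  a = reshape([(i, i = 1, 6)], [2, 3])
  ! Memory (column-major) order: 1 2 3 4 5 6
  ! Read row-major as 2x3, that buffer is [[1,2,3],[4,5,6]],
  ! but Fortran's a is [[1,3,5],[2,4,6]]:
  print '(3i3)', ((a(i,j), j = 1, 3), i = 1, 2)
end program layout_demo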

Find it on GitHub:

github.com/Cambridge-ICCS/FTorch

Highlights - Developer

  • Easy to clone and install
    • CMake, supported on Linux/Unix and Windows™
  • Easy to link
    • Build using CMake, or link like NetCDF (instructions included)
  • User tools
    • pt2ts.py aids users in saving PyTorch models to TorchScript
  • Examples suite
    • Take users through full process from trained net to Fortran inference
  • FOSS licensed under MIT
    • Contributions via GitHub welcome


Some code

Creating Tensors

use, intrinsic :: iso_fortran_env, only : sp => real32

! Use the FTorch Library
use :: ftorch

implicit none

! Fortran variables
real(sp), dimension(1,3,224,224), target :: in_data
real(sp), dimension(1, 1000), target :: out_data
integer, parameter :: n_inputs = 1
integer :: in_layout(4) = [1,2,3,4]
integer :: out_layout(2) = [1,2]

! Torch Tensors
type(torch_tensor), dimension(1) :: in_tensors
type(torch_tensor) :: out_tensor

! Populate Fortran data
call random_number(in_data)

! Create input/output Torch tensors from the Fortran arrays (no copy on CPU)
in_tensors(1) = torch_tensor_from_array(in_data, in_layout, torch_kCPU)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)

Loading and running a model

Loading

! Define a Torch module
type(torch_module) :: model

! Load in from TorchScript
model = torch_module_load('/path/to/saved/model.pt')


Running

! Infer
call torch_module_forward(model, in_tensors, n_inputs, out_tensor)

Cleaning up

! Cleanup
call torch_module_delete(model)
call torch_tensor_delete(in_tensors(1))
call torch_tensor_delete(out_tensor)

! Use Fortran array `out_data` elsewhere in code

GPU Acceleration

Cast Tensors to GPU in Fortran:

! Load in from TorchScript
model = torch_module_load('/path/to/saved/gpu/model.pt', torch_kCUDA, device_index=0)

! Cast Fortran data to Tensors
in_tensors(1) = torch_tensor_from_array(in_data, in_layout, torch_kCUDA, device_index=0)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)

Case Studies

MiMA

CESM

Derecho by NCAR
Dawn by Joe Bishop, with permission

ML Component Design/Packaging

The challenges

Packaging:

When ML components are developed they typically comprise:

  • a normalisation routine
  • a neural network architecture
  • a de-normalisation routine
  • enforcement of physical constraints
    • e.g. conservation laws

Several constituents must be transferred to the host model (see the sketch below).
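A hedged sketch of bundling those constituents into a single routine around FTorch. The normalisation statistics, names, and the positivity constraint are illustrative assumptions; only the torch_* calls follow the API shown earlier in these slides:

module nn_param_sketch
  use, intrinsic :: iso_fortran_env, only : sp => real32
  use ftorch
  implicit none
contains
  subroutine nn_parameterisation(model, x, y)
    ! One packaged ML component: normalise -> NN core -> de-normalise
    ! -> enforce a physical constraint.
    type(torch_module), intent(in) :: model
    real(sp), target, intent(inout) :: x(:)  ! inputs (normalised in place)
    real(sp), target, intent(inout) :: y(:)  ! outputs
    real(sp), parameter :: x_mean = 0.0_sp, x_std = 1.0_sp  ! illustrative stats
    real(sp), parameter :: y_mean = 0.0_sp, y_std = 1.0_sp  ! illustrative stats
    integer :: layout(1) = [1]
    type(torch_tensor) :: in_t(1), out_t

    x = (x - x_mean) / x_std                          ! normalisation
    in_t(1) = torch_tensor_from_array(x, layout, torch_kCPU)
    out_t = torch_tensor_from_array(y, layout, torch_kCPU)
    call torch_module_forward(model, in_t, 1, out_t)  ! neural network core
    y = y * y_std + y_mean                            ! de-normalisation
    y = max(y, 0.0_sp)                                ! e.g. positivity constraint
    call torch_tensor_delete(in_t(1))
    call torch_tensor_delete(out_t)
  end subroutine nn_parameterisation
end module nn_param_sketch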

Portability:

A neural net trained on a different model/dataset requires input data in the same format as the training model to function correctly:

  • grid resolution
  • physical variables
  • data type

Case study

Neural network model trained in SAM (Yuval and O’Gorman 2020)

  • System for Atmospheric Modeling
  • Large Eddy Simulation (LES) Cloud-Resolving Model (CRM)

To be re-deployed in CAM

  • Community Atmosphere Model of CESM
  • Global model

SAM:
  • \(\Delta x \approx 100 \, m\)
  • \(\Delta z \approx 50-100 \, m\)
  • Energy, vapour, rain, ice

CAM:
  • \(\Delta x \approx 50-100 \, km\)
  • \(\Delta z \approx 10-20 \, hPa\)
  • Temperature, humidity

Outputs are tendencies of the form \(\partial / \partial t\), so they must be applied, converted, and differenced (see the worked sketch below).
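As a hedged illustration (the symbols and the forward-Euler step are our assumptions, not taken from the case study): if the net returns a tendency in SAM’s variables, the host model applies it as

\[
q^{n+1} = q^{n} + \Delta t \left( \frac{\partial q}{\partial t} \right)_{\mathrm{NN}},
\]

after converting the output from SAM’s prognostic set (energy, vapour, rain, ice) to CAM’s (temperature, humidity) on CAM’s grid.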

Software architecture for ML parameterisations

The parameterisation:

  • Pure neural net core
    • TorchScript net coupled using FTorch
    • Easily swapped out as new nets are trained or architectures change
  • Physics layer
    • Handles physical constraints and non-NN aspects of parameterisation
      e.g. Conservation laws.
  • Provided with a clear API (sketched below) of expected:
    • variables, units, and grid/resolution
    • appropriate parameter ranges
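For instance, such an API could be pinned down as an explicit Fortran interface. This is a sketch; the variable names, units, and shapes are our assumptions, not from any of the models above:

module param_api_sketch
  use, intrinsic :: iso_fortran_env, only : sp => real32
  implicit none
  abstract interface
    ! Contract an ML parameterisation is expected to satisfy.
    subroutine ml_param(temp, q, dqdt)
      import :: sp
      real(sp), intent(in)  :: temp(:,:) ! temperature [K], (column, level)
      real(sp), intent(in)  :: q(:,:)    ! specific humidity [kg kg-1], same grid
      real(sp), intent(out) :: dqdt(:,:) ! humidity tendency [kg kg-1 s-1]
    end subroutine ml_param
  end interface
end module param_api_sketch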

Software architecture for ML parameterisations

The coupling:

  • Host model
  • Interface layer
    • Passes data from/to host model
    • Handles physical variable transformations and regridding
    • Passes data to/from parameterisation

Some other HPC Thoughts

  • CPU inference best done per-thread
  • GPU inference will likely require an MPI_Gather() (sketched below)
  • Make libtorch available on the software stack
    • make FTorch available on the software stack?
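A hedged sketch of that gather pattern: each rank's NN inputs are gathered to rank 0, one batched (GPU) inference would run there, and results are scattered back. Sizes, names, and the stand-in inference are illustrative assumptions:

program gather_sketch
  use mpi
  implicit none
  integer, parameter :: n_features = 4
  real :: local_in(n_features), local_out(n_features)
  real, allocatable :: all_in(:,:), all_out(:,:)
  integer :: ierr, rank, nranks

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Gather/scatter buffers are only significant on the root rank
  if (rank == 0) then
     allocate(all_in(n_features, nranks), all_out(n_features, nranks))
  else
     allocate(all_in(1, 1), all_out(1, 1))
  end if

  call random_number(local_in)
  call MPI_Gather(local_in, n_features, MPI_REAL, all_in, n_features, &
                  MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
     ! A single batched FTorch/GPU forward pass over all_in would go
     ! here, filling all_out; identity stand-in shown for brevity.
     all_out = all_in
  end if

  call MPI_Scatter(all_out, n_features, MPI_REAL, local_out, n_features, &
                   MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program gather_sketch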

Future Work

FTorch

  • MPS and XPU support
  • Online training
  • C documentation?

Parameterisation design

  • Findability and accessibility
    • Distribution via HuggingFace
  • Reusability
    • Guidance on constructing a pipeline
  • Net architectures for ML components
    • Make normalisation part of the net?

Thanks

ICCS Research Software Engineers:

  • Chris Edsall - Director
  • Marion Weinzierl - Senior
  • Jack Atkinson - Senior
  • Matt Archer - Senior
  • Tom Meltzer - Senior
  • Surbhi Ghoel
  • Tianzhang Cai
  • Joe Wallwork
  • Amy Pike
  • James Emberton
  • Dominic Orchard - Director/Computer Science

Previous Members:

  • Paul Richmond - Sheffield
  • Jim Denholm - AstraZeneca

FTorch:

  • Jack Atkinson
  • Simon Clifford - Cambridge RSE
  • Athena Elafrou - Cambridge RSE, now NVIDIA
  • Elliott Kasoar - STFC
  • Joseph Wallwork
  • Tom Meltzer

MiMA

  • Minah Yang - NYU, DataWave
  • Dave Conelly - NYU, DataWave

CESM

  • Will Chapman - NCAR/M2LInES
  • Jim Edwards - NCAR
  • Paul O’Gorman - MIT, M2LInES
  • Judith Berner - NCAR, M2LInES
  • Qiang Sun - U Chicago, DataWave
  • Pedram Hassanzadeh - U Chicago, DataWave
  • Joan Alexander - NWRA, DataWave

Thanks for Listening

References

Richardson, Lewis F. 1922. Weather Prediction by Numerical Process. Cambridge University Press.
Yuval, Janni, and Paul A. O’Gorman. 2020. “Stable Machine-Learning Parameterization of Subgrid Processes for Climate Modeling at a Range of Resolutions.” Nature Communications 11 (1): 3295.