Blending Machine Learning and Numerical Simulation, with Applications to Climate Modelling

Jack Atkinson

Senior Research Software Engineer
ICCS - University of Cambridge

The ICCS Team and Collaborators (see end)

2024-05-08

Precursors

Slides and Materials

To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0 1.0 or Font Awesome under SIL OFL 1.1

Weather and Climate (and ML)

The first weather model

In ~1916, Lewis Fry Richardson attempted to compute a one-day forecast by hand using partial differential equations.

He went on to publish Weather Prediction by Numerical Process (Richardson 1922).

Images from the Met Office in the UK, Fair Use.

The first weather model

In Weather Prediction by Numerical Process (Richardson 1922) LFR envisaged a “Forecast Factory”.

He lived to see this realised, albeit in a different setting, on ENIAC by Charney, the von Neumanns, et al. in 1950.

ENIAC by US serviceperson, Public Domain.
The Forecast Factory by Stephen Conlin (1986), Fair Use.

Models today

Climate models are large, complex, many-part systems.

Parameterisation

Subgrid processes are the largest source of uncertainty

Microphysics by Sisi Chen, Public Domain
Staggered grid by NOAA, Public Domain
Globe grid with box by Caltech, Fair Use


Machine Learning in Science

We typically think of Deep Learning as an end-to-end process:
a black box with an input and an output.

Who’s that Pokémon?

\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\] It’s Pikachu!

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.


Challenges

  • Reproducibility
    • Ensure the net functions the same in situ
  • Re-usability
    • Make ML parameterisations available to many models
    • Facilitate easy re-training/adaptation
  • Language Interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Possible solutions

  • Implement an NN in Fortran
    • Additional work, reproducibility issues, hard for complex architectures
  • Forpy
    • Easy to add, harder to use with ML, GPL licensed, barely maintained
  • SmartSim
    • Versatile, learning curve, data copying
  • Fortran-Keras Bridge
    • Keras only, abandonware(?)

Efficiency

We consider two types:

  • Computational
  • Developer

FTorch

Approach

  • PyTorch (and TensorFlow) have C++ backends and provide APIs to access them.
  • Binding Fortran to C is straightforward since Fortran 2003 using iso_c_binding (see the minimal sketch below).
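A minimal sketch of what such a binding looks like. The C routine forward and its signature here are hypothetical, purely for illustration; they are not part of libtorch or FTorch:

module c_bridge
  ! Hedged sketch: binding Fortran to a hypothetical C routine with prototype
  !   void forward(const float* input, float* output, int n);
  use, intrinsic :: iso_c_binding, only : c_float, c_int
  implicit none
  interface
    subroutine forward(input, output, n) bind(c, name="forward")
      import :: c_float, c_int
      real(c_float), intent(in)  :: input(*)
      real(c_float), intent(out) :: output(*)
      integer(c_int), value      :: n
    end subroutine forward
  end interface
end module c_bridge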

We will:

  • Archive the PyTorch model as TorchScript
    • Statically typed subset of Python
    • Produces intermediate representation/graph of NN
    • Read and run via any Torch API
  • Provide Fortran API to abstract complex details from users
    • Wrapping the libtorch C++ API

Approach

[Figure: the tangle of "Python env" and "Python runtime" dependencies; xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5]

Highlights - Computation

  • Use the framework’s implementations directly
    • feature and future support, and reproducibility
  • No-copy access in memory (on CPU).
  • Make use of the Torch backends for GPU offload
  • Indexing issues and the associated reshape are avoided with the Torch strided accessor (illustrated below).
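A minimal illustration of the ordering issue (a sketch, not FTorch code): Fortran stores arrays column-major while Torch, like C, assumes row-major, so handing over a raw buffer without stride information would transpose the logical indices.

program layout_demo
  ! Illustrative only: shows why a Fortran array's memory order differs
  ! from the row-major order a C/Torch library would assume.
  implicit none
  integer :: a(2,3), i, j
  a = reshape([(i, i = 1, 6)], [2, 3])
  ! Memory (column-major) order: 1 2 3 4 5 6
  ! Read row-major as 2x3, that buffer is [[1,2,3],[4,5,6]],
  ! but Fortran's a is [[1,3,5],[2,4,6]]:
  print '(3i3)', ((a(i,j), j = 1, 3), i = 1, 2)
end program layout_demo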

Find it on GitHub:

github.com/Cambridge-ICCS/FTorch

Highlights - Developer

  • Easy to clone and install
    • CMake, supported on Linux/Unix and Windows™
  • Easy to link
    • Build using CMake, or link like NetCDF (instructions included)
  • User tools
    • pt2ts.py aids users in saving PyTorch models to TorchScript
  • Examples suite
    • Take users through full process from trained net to Fortran inference
  • FOSS licensed under MIT
    • Contributions via GitHub welcome


Some code

Creating Tensors

use, intrinsic :: iso_fortran_env, only : sp => real32

! Use the FTorch Library
use :: ftorch

implicit none

! Fortran variables
real(sp), dimension(1,3,224,224), target :: in_data
real(sp), dimension(1, 1000), target :: out_data
integer, parameter :: n_inputs = 1
integer :: in_layout(4) = [1,2,3,4]
integer :: out_layout(2) = [1,2]

! Torch Tensors
type(torch_tensor), dimension(1) :: in_tensors
type(torch_tensor) :: out_tensor

! Populate Fortran data
call random_number(in_data)

! Create input/output Torch tensors from the Fortran arrays (no copy on CPU)
in_tensors(1) = torch_tensor_from_array(in_data, in_layout, torch_kCPU)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)

Loading and running a model

Loading

! Define a Torch module
type(torch_module) :: model

! Load in from TorchScript
model = torch_module_load('/path/to/saved/model.pt')


Running

! Infer
call torch_module_forward(model, in_tensors, n_inputs, out_tensor)

Cleaning up

! Cleanup
call torch_module_delete(model)
call torch_tensor_delete(in_tensors(1))
call torch_tensor_delete(out_tensor)

! Use Fortran array `out_data` elsewhere in code

GPU Acceleration

Cast Tensors to GPU in Fortran:

! Load in from TorchScript
model = torch_module_load('/path/to/saved/gpu/model.pt', torch_kCUDA, device_index=0)

! Cast Fortran data to Tensors
in_tensors(1) = torch_tensor_from_array(in_data, in_layout, torch_kCUDA, device_index=0)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)

Case Studies

MiMA

CESM

Derecho by NCAR
Dawn by Joe Bishop, with permission

ML Component Design/Packaging

The challenges

Packaging:

When ML components are developed they typically comprise:

  • a normalisation routine
  • a neural network architecture
  • a de-normalisation routine
  • enforcement of physical constraints
    • e.g. conservation laws

Several constituents must be transferred to the host model (see the sketch below).
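A hedged sketch of bundling those constituents into a single routine around FTorch. The normalisation statistics, names, and the positivity constraint are illustrative assumptions; only the torch_* calls follow the API shown earlier in these slides:

module nn_param_sketch
  use, intrinsic :: iso_fortran_env, only : sp => real32
  use ftorch
  implicit none
contains
  subroutine nn_parameterisation(model, x, y)
    ! One packaged ML component: normalise -> NN core -> de-normalise
    ! -> enforce a physical constraint.
    type(torch_module), intent(in) :: model
    real(sp), target, intent(inout) :: x(:)  ! inputs (normalised in place)
    real(sp), target, intent(inout) :: y(:)  ! outputs
    real(sp), parameter :: x_mean = 0.0_sp, x_std = 1.0_sp  ! illustrative stats
    real(sp), parameter :: y_mean = 0.0_sp, y_std = 1.0_sp  ! illustrative stats
    integer :: layout(1) = [1]
    type(torch_tensor) :: in_t(1), out_t

    x = (x - x_mean) / x_std                          ! normalisation
    in_t(1) = torch_tensor_from_array(x, layout, torch_kCPU)
    out_t = torch_tensor_from_array(y, layout, torch_kCPU)
    call torch_module_forward(model, in_t, 1, out_t)  ! neural network core
    y = y * y_std + y_mean                            ! de-normalisation
    y = max(y, 0.0_sp)                                ! e.g. positivity constraint
    call torch_tensor_delete(in_t(1))
    call torch_tensor_delete(out_t)
  end subroutine nn_parameterisation
end module nn_param_sketch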

Portability:

A neural net trained on a different model/dataset requires input data in the same format as the training model to function correctly:

  • grid resolution
  • physical variables
  • data type

Case study

Neural network model trained in SAM (Yuval and O’Gorman 2020)

  • System for Atmospheric Modeling
  • Large Eddy Simulation (LES) Cloud-Resolving Model (CRM)

To be re-deployed in CAM

  • Community Atmosphere Model of CESM
  • Global model

SAM:
  • \(\Delta x \approx 100 \, m\)
  • \(\Delta z \approx 50-100 \, m\)
  • Energy, vapour, rain, ice

CAM:
  • \(\Delta x \approx 50-100 \, km\)
  • \(\Delta z \approx 10-20 \, hPa\)
  • Temperature, humidity

Outputs are tendencies of the form \(\partial / \partial t\), so they must be applied, converted, and differenced (see the worked sketch below).
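As a hedged illustration (the symbols and the forward-Euler step are our assumptions, not taken from the case study): if the net returns a tendency in SAM’s variables, the host model applies it as

\[
q^{n+1} = q^{n} + \Delta t \left( \frac{\partial q}{\partial t} \right)_{\mathrm{NN}},
\]

after converting the output from SAM’s prognostic set (energy, vapour, rain, ice) to CAM’s (temperature, humidity) on CAM’s grid.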

Software architecture for ML parameterisations

The parameterisation:

  • Pure neural net core
    • TorchScript net coupled using FTorch
    • Easily swapped out as new nets are trained or architectures change
  • Physics layer
    • Handles physical constraints and non-NN aspects of parameterisation
      e.g. Conservation laws.
  • Provided with a clear API (sketched below) of expected:
    • variables, units, and grid/resolution
    • appropriate parameter ranges
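For instance, such an API could be pinned down as an explicit Fortran interface. This is a sketch; the variable names, units, and shapes are our assumptions, not from any of the models above:

module param_api_sketch
  use, intrinsic :: iso_fortran_env, only : sp => real32
  implicit none
  abstract interface
    ! Contract an ML parameterisation is expected to satisfy.
    subroutine ml_param(temp, q, dqdt)
      import :: sp
      real(sp), intent(in)  :: temp(:,:) ! temperature [K], (column, level)
      real(sp), intent(in)  :: q(:,:)    ! specific humidity [kg kg-1], same grid
      real(sp), intent(out) :: dqdt(:,:) ! humidity tendency [kg kg-1 s-1]
    end subroutine ml_param
  end interface
end module param_api_sketch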

Software architecture for ML parameterisations

The coupling:

  • Host model
  • Interface layer
    • Passes data from/to host model
    • Handles physical variable transformations and regridding
    • Passes data to/from parameterisation

Some other HPC Thoughts

  • CPU inference best done per-thread
  • GPU inference will likely require an MPI_Gather() (sketched below)
  • Make libtorch available on the software stack
    • make FTorch available on the software stack?
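A hedged sketch of that gather pattern: each rank's NN inputs are gathered to rank 0, one batched (GPU) inference would run there, and results are scattered back. Sizes, names, and the stand-in inference are illustrative assumptions:

program gather_sketch
  use mpi
  implicit none
  integer, parameter :: n_features = 4
  real :: local_in(n_features), local_out(n_features)
  real, allocatable :: all_in(:,:), all_out(:,:)
  integer :: ierr, rank, nranks

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

  ! Gather/scatter buffers are only significant on the root rank
  if (rank == 0) then
     allocate(all_in(n_features, nranks), all_out(n_features, nranks))
  else
     allocate(all_in(1, 1), all_out(1, 1))
  end if

  call random_number(local_in)
  call MPI_Gather(local_in, n_features, MPI_REAL, all_in, n_features, &
                  MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
     ! A single batched FTorch/GPU forward pass over all_in would go
     ! here, filling all_out; identity stand-in shown for brevity.
     all_out = all_in
  end if

  call MPI_Scatter(all_out, n_features, MPI_REAL, local_out, n_features, &
                   MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program gather_sketch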

Future Work

FTorch

  • MPS and XPU support
  • Online training
  • C documentation?

Parameterisation design

  • Findability and accessibility
    • Distribution via HuggingFace
  • Reusability
    • Guidance on constructing a pipeline
  • Net architectures for ML components
    • Make normalisation part of the net?

Thanks

ICCS Research Software Engineers:

  • Chris Edsall - Director
  • Marion Weinzierl - Senior
  • Jack Atkinson - Senior
  • Matt Archer - Senior
  • Tom Meltzer - Senior
  • Surbhi Ghoel
  • Tianzhang Cai
  • Joe Wallwork
  • Amy Pike
  • James Emberton
  • Dominic Orchard - Director/Computer Science

Previous Members:

  • Paul Richmond - Sheffield
  • Jim Denholm - AstraZeneca

FTorch:

  • Jack Atkinson
  • Simon Clifford - Cambridge RSE
  • Athena Elafrou - Cambridge RSE, now NVIDIA
  • Elliott Kasoar - STFC
  • Joseph Wallwork
  • Tom Meltzer

MiMA

  • Minah Yang - NYU, DataWave
  • Dave Conelly - NYU, DataWave

CESM

  • Will Chapman - NCAR/M2LInES
  • Jim Edwards - NCAR
  • Paul O’Gorman - MIT, M2LInES
  • Judith Berner - NCAR, M2LInES
  • Qiang Sun - U Chicago, DataWave
  • Pedram Hassanzadeh - U Chicago, DataWave
  • Joan Alexander - NWRA, DataWave

Thanks for Listening

References

Richardson, Lewis F. 1922. Weather Prediction by Numerical Process. Cambridge University Press.
Yuval, Janni, and Paul A. O’Gorman. 2020. “Stable Machine-Learning Parameterization of Subgrid Processes for Climate Modeling at a Range of Resolutions.” Nature Communications 11 (1): 3295.