Facilitating Hybrid Modelling in the Geosciences

Jack Atkinson

Principal Research Software Engineer
ICCS - University of Cambridge

Joe Wallwork

Senior Research Software Engineer
ICCS - University of Cambridge

2026-01-08

Precursors

Slides and Materials

To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0 1.0 or FontAwesome under SIL OFL 1.1

Introduction

“Better Software, Better Research” by the SSI
NCAR Mesa Lab painting by Julie Leidel

Motivation

Weather and Climate Models

Large, complex, many-part systems.

Parameterisation

Subgrid processes are the largest source of uncertainty

Microphysics by Sisi Chen Public Domain
Staggered grid by NOAA under Public Domain
Globe grid with box by Caltech under Fair use

Hybrid Modelling

Neural Net by 3Blue1Brown under fair dealing.

FAIR Challenges (Wilkinson et al. 2016; Barker et al. 2022)

  • Reproducibility
    • Ensure the net functions the same in situ
  • Re-usability
    • Make ML parameterisations available to many models
    • Facilitate easy re-training/adaptation
  • Language Interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Efficiency

We consider two types:

Computational

Developer

In research, both affect ‘time-to-science’, especially when extensive research software support is unavailable.

FTorch

Approach

  • PyTorch has a C++ backend and provides an API.
  • Binding Fortran to C is straightforward from Fortran 2003 using iso_c_binding (see the sketch below).
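
A minimal sketch of this mechanism (standard Fortran, not FTorch code; the C function run_model is hypothetical):

use, intrinsic :: iso_c_binding, only : c_int, c_float

interface
   ! Binds to a hypothetical C function: float run_model(const float* x, int n);
   function run_model(x, n) result(y) bind(c, name="run_model")
     import :: c_int, c_float
     real(c_float), intent(in) :: x(*)
     integer(c_int), intent(in), value :: n
     real(c_float) :: y
   end function run_model
end interface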

We will:

  • Save PyTorch models in the portable TorchScript format
    • to be run by libtorch C++
  • Provide a Fortran API
    • wrapping the libtorch C++ API
    • abstracting complex details from users

Highlights - Developer

  • Easy to clone and install
    • CMake, supported on Linux/Unix and Windows™
  • Easy to link
    • CMake or Make FCFLAGS/LDFLAGS
  • Extensive examples suite and user tools
  • FOSS
    • licensed under MIT
    • contributions from users via GitHub welcome

Find it on:
/Cambridge-ICCS/FTorch


User and API Docs

Highlights - Computation

  • Use the framework’s implementations directly
    • ensures feature parity, future support, and reproducibility
  • Make use of the Torch backends for GPU offload
    • CUDA, HIP, MPS, and XPU enabled
  • Indexing issues and the associated reshape avoided with Torch’s strided accessor.
  • No-copy access in memory (on CPU).


Some code

Fortran

 use ftorch
 
 implicit none
 
 real, dimension(5), target :: in_data, out_data  ! Fortran data structures
 
 type(torch_tensor), dimension(1) :: input_tensors, output_tensors  ! Set up Torch data structures
 type(torch_model) :: torch_net
 
 in_data = ...  ! Prepare data in Fortran
 
 ! Create Torch input/output tensors from the Fortran arrays
 call torch_tensor_from_array(input_tensors(1), in_data, torch_kCPU)
 call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)
 
 call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCPU)  ! Load ML model
 call torch_model_forward(torch_net, input_tensors, output_tensors)      ! Infer
 
 call further_code(out_data)  ! Use output data in Fortran immediately
 
 ! Cleanup
 call torch_delete(torch_net)
 call torch_delete(input_tensors)
 call torch_delete(output_tensors)

GPU Acceleration

Cast Tensors to GPU in Fortran:

! Load the TorchScript model onto the GPU
call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCUDA, device_index=0)

! Cast Fortran data to tensors on the appropriate devices
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCUDA, device_index=0)
call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)



FTorch supports NVIDIA CUDA, AMD HIP, Intel XPU, and Apple Silicon MPS hardware.

Use of multiple devices is supported.


In MPI-decomposed HPC simulations, efficient GPU use may require gathering data (e.g. with MPI_Gather()) before transfer and batched inference; a sketch follows below.
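
A hedged sketch of that pattern (local_cols, all_cols, n, rank, and ierr are illustrative names, not FTorch API; the torch_* calls are those shown above):

! Gather per-rank data onto rank 0, then run one batched forward pass there
call MPI_Gather(local_cols, n, MPI_REAL, all_cols, n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)

if (rank == 0) then
   call torch_tensor_from_array(input_tensors(1), all_cols, torch_kCUDA, device_index=0)
   call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)
   call torch_model_forward(torch_net, input_tensors, output_tensors)
end if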

Online Training & Autograd

Ongoing work led by Joe Wallwork

  • 6-month grant running from January 2026.

What and Why?

  • Offline vs. online performance (e.g. Mansfield and Sheshadri 2024)
  • Differentiable models (e.g. Kochkov et al. 2024)
  • Avoids saving large volumes of training data.
  • Possibility to expand loss function scope to include downstream model code.

Online Training & Autograd

Progress:

  • Autograd and backwards - ✓
  • Overload operators - ✓
    • +,-,*,/,**
  • Optimisers - ✓
    • SGD, Adam, AdamW
  • Loss functions - in progress
    • sum and mean
  • Model interface - in progress

\[\begin{bmatrix}f_1\\f_2\\f_3\\f_4\end{bmatrix}=\mathbf{f}(\mathbf{x};\mathbf{a})=\mathbf{a}\bullet\mathbf{x}\equiv\begin{bmatrix}a_1x_1\\a_2x_2\\a_3x_3\\a_4x_4\end{bmatrix}\]

Starting from \(\mathbf{a}=\mathbf{x}:=\begin{bmatrix}1,1,1,1\end{bmatrix}^T\), optimise \(\mathbf{a}\) such that \(\mathbf{f}(\mathbf{x};\mathbf{a})=\begin{bmatrix}1,2,3,4\end{bmatrix}^T\).

In both cases we achieve \(\mathbf{f}(\mathbf{x};\mathbf{a})=\begin{bmatrix}1,2,3,4\end{bmatrix}^T\).
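
The same optimisation written as a plain-Fortran SGD loop (a sketch of the underlying maths only, not the FTorch autograd API; the learning rate and step count are arbitrary):

program toy_sgd
  implicit none
  real :: x(4), a(4), y(4), f(4), grad(4)
  real, parameter :: lr = 0.1
  integer :: step

  x = 1.0                     ! fixed input
  a = 1.0                     ! initial parameters
  y = [1.0, 2.0, 3.0, 4.0]    ! target output

  do step = 1, 100
    f = a * x                 ! forward pass: elementwise product
    grad = 2.0 * (f - y) * x  ! gradient of sum((f - y)**2) w.r.t. a
    a = a - lr * grad         ! SGD update
  end do

  print *, a                  ! approaches [1, 2, 3, 4]
end program toy_sgd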

Further Information

FTorch is published in JOSS!

Atkinson et al. (2025)

FTorch: a library for coupling PyTorch models to Fortran.
Journal of Open Source Software, 10(107), 7602,
doi.org/10.21105/joss.07602

Online documentation is available at: cambridge-iccs.github.io/FTorch


In addition to the comprehensive examples in the FTorch repository, we provide an online workshop at /Cambridge-ICCS/FTorch-workshop

Applications and Case Studies

MiMA - Uncertainty Quant.

  • The origins of FTorch
    • Emulation of existing parameterisation
    • Coupled to an atmospheric model using forpy in Espinosa et al. (2022)
    • Prohibitively slow and hard to implement
    • The authors asked for a faster, user-friendly implementation that could be used in future studies.
  • Follow up paper using FTorch: Uncertainty Quantification of a Machine Learning Subgrid-Scale Parameterization for Atmospheric Gravity Waves (Mansfield and Sheshadri 2024)
    • “Identical” offline networks have very different behaviours when deployed online.

ICON - Causality

  • Icosahedral Nonhydrostatic Weather and Climate Model
    • Developed by DKRZ (Deutsches Klimarechenzentrum)
    • Used by the DWD and MeteoSwiss
  • Interpretable Multiscale Machine Learning-Based Parameterizations of Convection for ICON (Heuer et al. 2024)
    • Train U-Net convection scheme on high-res simulation
    • Deploy in ICON via FTorch coupling
    • Evaluate physical realism (causality) using SHAP values
    • Online stability improved when non-causal relations are eliminated from the net

ICON - Multiple Parameterisations

Work led by Julien Savre at DLR

  • Closely affiliated fork of ICON
  • Multiple ML components in one model
  • Common FTorch interface
  • A focus on running on GPU
  • Implemented without ICCS involvement

Slide from the Cambridge Hybrid Modelling Workshop courtesy of Julien Savre

CESM - Bias Correction

Work by Will Chapman of NCAR/M2LInES

  • As representations of physics, models have inherent, sometimes systematic, biases.

  • Run CESM for 9 years, relaxing hourly to ERA5 observations (data assimilation)

  • Train CNN to predict anomaly increment at each level

    • targeting just the MJO region
    • targeting globally
  • Apply online as part of predictive runs

CESM - Non-locality

  • Parameterisations often column-based
    • computationally efficient
    • localised
  • Data-driven parameterisation of velocity fluxes from gravity waves.
    • Trained on ERA5
  • Column and global U-Net models trained.
  • Implemented online in CAM7
  • Some challenges from non-locality remain

UKCA - Online

  • Chemistry model used in UKESM and Met Office UM
    • 85-200 tracers with 300-700 interactions
    • 25% of the UM runtime
      • Solver is 40% of UKCA runtime (10% UM)

UKCA Flame Graph by Luke Abraham used with permission.

UKCA - Online

  • Time integration runs on column or slice chunks
    • Start with \(\Delta t=3600\,\mathrm{s}\).
    • If any grid-box fails, halve the step and try again globally (sketched below).
  • Propose to train an ML model to predict step-size given inputs
    • Requires large quantities of otherwise useless training data
  • A nice ‘safe’ application of machine learning in modelling
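
Schematically, the halving loop described above (try_chemistry_step is a hypothetical stand-in for the real solver call):

dt = 3600.0   ! seconds
do
   call try_chemistry_step(dt, success)   ! hypothetical: success if all grid-boxes converge
   if (success) exit
   dt = 0.5 * dt   ! any failure halves the step for the whole domain
end do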

Others

See FTorch/community/case_studies for a full list.

  • ClimSim Convection scheme in ICON for stable 20-year AMIP run
    (Heuer et al. 2025) (preprint)
  • Review paper of hybrid modelling approaches
    (Zheng et al. 2025) (preprint)
  • Implementation of a new convection trigger in the CAM model.
    Miller et al. In Preparation.
  • Embedding of ML schemes for gravity waves in the CAM model.
    ICCS & DataWave.

Some Hybrid Modelling Lessons

Data Layout/Location

  • Data Layout
    • Columns vs. slices
  • Choose an ML method suited to the data available at inference
    • e.g. CNNs want structured grids
  • Regridding will introduce overheads

  • Global methods are going to introduce MPI overheads

  • GPU data transfer is going to introduce overheads

Know your host model

  • Preprocessing and training a net offline is not the only goal

  • Think about online inference from the start

    • Is it in memory?
    • Is it chunked?
      • Load balancing!
    • Is the training grid the same as the model’s?
  • Loading weights is expensive - do it as little as possible (cf. the Looping example; see the sketch below)
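
A minimal sketch of this advice using the FTorch calls from earlier (n_steps and the per-step data preparation are illustrative):

! Load the TorchScript model once, outside the time loop
call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCPU)

do timestep = 1, n_steps
   ! ... prepare in_data for this step ...
   call torch_tensor_from_array(input_tensors(1), in_data, torch_kCPU)
   call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)
   call torch_model_forward(torch_net, input_tensors, output_tensors)
   call torch_delete(input_tensors)
   call torch_delete(output_tensors)
end do

call torch_delete(torch_net)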

Packaging and Portability

Packaging:

When ML components are developed they typically comprise:

  • normalisation routine

  • neural network architecture

  • de-normalisation routine

  • enforcement of physical constraints

    • e.g. conservation laws

Several constituents must be transferred to the host model.

Portability:

A neural net trained on a different model/dataset requires:

  • Input data in the same format as used in training:
    • grid resolution
    • physical variables
    • data type

to function correctly.

Software architecture for ML parameterisations

The parameterisation:

  • Pure neural net core
    • Easily swapped out as new nets trained or architecture changed
  • Physics layer
    • Handles physical constraints and non-NN aspects of parameterisation
      e.g. Conservation laws.
  • Provided with a clear API (sketched below) of expected:
    • variables, units, and grid/resolution
    • appropriate parameter ranges
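
For instance, such an API might be documented as an explicit interface (a hypothetical example, not FTorch API; names, units, and shapes are illustrative):

interface
   subroutine ml_convection(temp, qv, pres, heating)
     ! Expected inputs: temperature [K], specific humidity [kg/kg],
     ! pressure [Pa], all on the host model grid (levels, columns)
     real, intent(in)  :: temp(:,:), qv(:,:), pres(:,:)
     ! Output: heating tendency [K/s]
     real, intent(out) :: heating(:,:)
   end subroutine ml_convection
end interface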

Software architecture for ML parameterisations

The coupling:

  • Interface layer
    • Passes data from/to host model/parameterisation
    • Handles physical variable transformations and regridding
    • Handles generalisation

Net “Architecture”

Operate on a principle of separation between the physical model and the net.

Concatenation and Normalisation

  • part of the NN, not part of the physics

The alternative is rewriting code to perform this in the physical model:

  • taking time,
  • reducing reproducibility, and
  • adding complexity.

FTorch: Summary

  • Use of ML within traditional numerical models
    • A growing area that presents challenges
  • Language interoperation
    • FTorch provides a solution for scientists implementing torch models in Fortran
    • Designed for computational and developer efficiency
    • Has helped deliver science in climate research and beyond
      See FTorch/community/case_studies
  • Exploring options for online training and AD
  • Collaborative projects highlight various considerations when implementing hybrid models.

Thanks for Listening

Get in touch:

/Cambridge-ICCS/FTorch


Thanks to Joe Wallwork, Tom Meltzer, Elliott Kasoar,
Niccolò Zanotti and the rest of the FTorch team.

The ICCS and FTorch have received support from a number of funders.

References

Atkinson, Jack, Athena Elafrou, Elliott Kasoar, Joseph G. Wallwork, Thomas Meltzer, Simon Clifford, Dominic Orchard, and Chris Edsall. 2025. “FTorch: A Library for Coupling PyTorch Models to Fortran.” Journal of Open Source Software 10 (107): 7602. https://doi.org/10.21105/joss.07602.
Barker, Michelle, Neil P Chue Hong, Daniel S Katz, Anna-Lena Lamprecht, Carlos Martinez-Ortiz, Fotis Psomopoulos, Jennifer Harrow, et al. 2022. “Introducing the FAIR Principles for Research Software.” Scientific Data 9 (1): 622. https://doi.org/10.1038/s41597-022-01710-x.
Chapman, William E, and Judith Berner. 2025. “Improving Climate Bias and Variability via CNN-Based State-Dependent Model-Error Corrections.” Geophysical Research Letters 52 (6): e2024GL114106. https://doi.org/10.1029/2024GL114106.
Espinosa, Zachary I, Aditi Sheshadri, Gerald R Cain, Edwin P Gerber, and Kevin J DallaSanta. 2022. “Machine Learning Gravity Wave Parameterization Generalizes to Capture the QBO and Response to Increased CO2.” Geophysical Research Letters 49 (8): e2022GL098174.
Heuer, Helge, Tom Beucler, Mierk Schwabe, Julien Savre, Manuel Schlund, and Veronika Eyring. 2025. “Beyond the Training Data: Confidence-Guided Mixing of Parameterizations in a Hybrid AI-Climate Model.” arXiv Preprint arXiv:2510.08107. https://doi.org/10.48550/arXiv.2510.08107.
Heuer, Helge, Mierk Schwabe, Pierre Gentine, Marco A Giorgetta, and Veronika Eyring. 2024. “Interpretable Multiscale Machine Learning-Based Parameterizations of Convection for ICON.” Journal of Advances in Modeling Earth Systems 16 (8): e2024MS004398. https://doi.org/10.1029/2024MS004398.
Hu, Zeyuan, Akshay Subramaniam, Zhiming Kuang, Jerry Lin, Sungduk Yu, Walter M Hannah, Noah D Brenowitz, Josh Romero, and Michael S Pritchard. 2025. “Stable Machine-Learning Parameterization of Subgrid Processes in a Comprehensive Atmospheric Model Learned from Embedded Convection-Permitting Simulations.” Journal of Advances in Modeling Earth Systems 17 (7): e2024MS004618.
Ikuyajolu, Olawale James, Luke P Van Roekel, Steven R Brus, and Erin E Thomas. 2025. “NLML: A Deep Neural Network Emulator for the Exact Nonlinear Interactions in a Wind Wave Model.” Authorea Preprints. https://doi.org/10.22541/essoar.174366388.80605654/v1.
Kochkov, Dmitrii, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, et al. 2024. “Neural General Circulation Models for Weather and Climate.” Nature 632 (8027): 1060–66. https://doi.org/10.1038/s41586-024-07744-y.
Mansfield, Laura A, and Aditi Sheshadri. 2024. “Uncertainty Quantification of a Machine Learning Subgrid-Scale Parameterization for Atmospheric Gravity Waves.” Journal of Advances in Modeling Earth Systems 16 (7): e2024MS004292. https://doi.org/10.1029/2024MS004292.
Park, Hyesung, and Sungwook Chung. 2025. “Utilization of a Lightweight 3D u-Net Model for Reducing Execution Time of Numerical Weather Prediction Models.” Atmosphere 16 (1): 60.
Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9. https://doi.org/10.1038/sdata.2016.18.
Zheng, Tian, Subashree Venkatasubramanian, Shuolin Li, Amy Braverman, Xinyi Ke, Zhewen Hou, Peter Jin, and Samarth Sanjay Agrawal. 2025. “Machine Learning Workflows in Climate Modeling: Design Patterns and Insights from Case Studies.” arXiv Preprint arXiv:2510.03305. https://doi.org/10.48550/arXiv.2510.03305.