2026-01-08
To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Vectors and icons by SVG Repo under CC0 (1.0) or Font Awesome under SIL OFL 1.1



Better Software, Better Research by the SSI
NCAR Mesa Lab painting by Julie Leidel
Large, complex, many-part systems.
Subgrid processes are the largest source of uncertainty.


Microphysics by Sisi Chen under Public Domain
Staggered grid by NOAA under Public Domain
Globe grid with box by Caltech under Fair use

Neural Net by 3Blue1Brown under fair dealing.
Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.






Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
We consider 2 types:
Computational
Developer
In research both have an effect on ‘time-to-science’.
Especially when extensive research software support is unavailable.
FTorch binds Fortran to the libtorch C++ API using iso_c_binding.
We will:
link against the libtorch C++ API (adding FTorch to FCFLAGS/LDFLAGS)
use ftorch in our Fortran code
use ftorch
implicit none
real, dimension(5), target :: in_data, out_data ! Fortran data structures
type(torch_tensor), dimension(1) :: input_tensors, output_tensors ! Set up Torch data structures
type(torch_model) :: torch_net
in_data = ... ! Prepare data in Fortran
! Create Torch input/output tensors from the Fortran arrays
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCPU)
call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)
call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCPU) ! Load ML model
call torch_model_forward(torch_net, input_tensors, output_tensors) ! Infer
call further_code(out_data) ! Use output data in Fortran immediately
! Cleanup
call torch_delete(torch_net)
call torch_delete(input_tensors)
call torch_delete(output_tensors)

Cast tensors to the GPU in Fortran:
! Load the TorchScript model onto the GPU
call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCUDA, device_index=0)
! Cast Fortran data to Torch tensors on the appropriate devices
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCUDA, device_index=0)
call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)
FTorch supports NVIDIA CUDA, AMD HIP, Intel XPU, and Apple Silicon MPS hardware.
Use of multiple devices is supported.
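As an illustration only (the round-robin mapping and the variables rank and num_gpus are assumptions obtained elsewhere, e.g. from MPI and a device query; only device_index is FTorch API), each MPI rank could be pointed at a different GPU:

! Illustrative sketch: round-robin mapping of MPI ranks onto GPUs
integer :: my_device
my_device = mod(rank, num_gpus)
call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCUDA, device_index=my_device)
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCUDA, device_index=my_device)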
Effective HPC simulation requires gathering data across MPI processes (e.g. with MPI_Gather()) for efficient data transfer.
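A minimal sketch of that pattern, assuming one batched inference on rank 0 (the program and variable names are illustrative; only the MPI and FTorch calls are real):

! Illustrative sketch: gather per-rank inputs to rank 0, run one batched
! inference there with FTorch, then scatter the results back.
program mpi_inference_sketch
   use mpi
   use ftorch
   implicit none

   integer, parameter :: n = 5                       ! values held per rank
   integer :: rank, nranks, ierr
   real, dimension(n), target :: local_in, local_out
   real, dimension(:), allocatable, target :: gathered_in, gathered_out
   type(torch_tensor), dimension(1) :: in_tensors, out_tensors
   type(torch_model) :: net

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)

   allocate(gathered_in(n*nranks), gathered_out(n*nranks))
   local_in = real(rank)                             ! placeholder input data

   ! Collect every rank's input on rank 0 so the device sees one large batch
   call MPI_Gather(local_in, n, MPI_REAL, gathered_in, n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)

   if (rank == 0) then
      call torch_model_load(net, 'path/to/saved/model.pt', torch_kCPU)
      call torch_tensor_from_array(in_tensors(1), gathered_in, torch_kCPU)
      call torch_tensor_from_array(out_tensors(1), gathered_out, torch_kCPU)
      call torch_model_forward(net, in_tensors, out_tensors)
      call torch_delete(in_tensors)
      call torch_delete(out_tensors)
      call torch_delete(net)
   end if

   ! Hand each rank its slice of the batched output
   call MPI_Scatter(gathered_out, n, MPI_REAL, local_out, n, MPI_REAL, 0, MPI_COMM_WORLD, ierr)

   call MPI_Finalize(ierr)
end program mpi_inference_sketch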
Ongoing work led by Joe Wallwork
What and Why?
Progress:
backwards pass - ✓
operators: +, -, *, /, **
optimisers: SGD, Adam, AdamW
reductions: sum and mean

\[\begin{bmatrix}f_1\\f_2\\f_3\\f_4\end{bmatrix}=\mathbf{f}(\mathbf{x};\mathbf{a})=\mathbf{a}\bullet\mathbf{x}\equiv\begin{bmatrix}a_1x_1\\a_2x_2\\a_3x_3\\a_4x_4\end{bmatrix}\]
Starting from \(\mathbf{a}=\mathbf{x}:=\begin{bmatrix}1,1,1,1\end{bmatrix}^T\), optimise \(\mathbf{a}\) such that \(\mathbf{f}(\mathbf{x};\mathbf{a})=\begin{bmatrix}1,2,3,4\end{bmatrix}^T\).
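One way to make this concrete (the choice of a mean-squared-error loss and plain gradient descent here is an assumption for illustration):

\[L(\mathbf{a})=\frac{1}{4}\sum_{i=1}^{4}\left(a_i x_i - y_i\right)^2,\qquad \frac{\partial L}{\partial a_i}=\frac{1}{2}\left(a_i x_i - y_i\right)x_i,\qquad a_i \leftarrow a_i - \eta\,\frac{\partial L}{\partial a_i},\]

where \(\mathbf{y}=\begin{bmatrix}1,2,3,4\end{bmatrix}^T\). With \(\mathbf{x}=\begin{bmatrix}1,1,1,1\end{bmatrix}^T\) each component decouples and the updates converge to \(a_i=y_i\).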

In both cases we achieve \(\mathbf{f}(\mathbf{x};\mathbf{a})=\begin{bmatrix}1,2,3,4\end{bmatrix}^T\).
FTorch is published in JOSS!

Atkinson et al. (2025)
FTorch: a library for coupling PyTorch models to Fortran.
Journal of Open Source Software, 10(107), 7602,
doi.org/10.21105/joss.07602
Online documentation is available at: cambridge-iccs.github.io/FTorch
In addition to the comprehensive examples in the FTorch repository, we provide an online workshop at Cambridge-ICCS/FTorch-workshop on GitHub.
forpy, as used in Espinosa et al. (2022)
Work led by Julien Savre at DLR

Work by Will Chapman of NCAR/M2LInES
Models, as representations of physics, have inherent and sometimes systematic biases.
Run CESM for 9 years, relaxing hourly to ERA5 observations (data assimilation)
Train a CNN to predict the anomaly increment at each level
Apply online as part of predictive runs, as sketched below
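A rough sketch of the "apply online" step (the routine and variable names here are hypothetical; only the FTorch calls are real API, and the network is assumed to have been loaded once at start-up):

! Hypothetical sketch of applying a learned bias correction online each step
subroutine apply_bias_correction(bias_net, state)
   use ftorch
   implicit none
   type(torch_model), intent(in) :: bias_net              ! CNN loaded once at initialisation
   real, dimension(:), target, intent(inout) :: state     ! model column, one value per level
   real, dimension(size(state)), target :: increment      ! predicted anomaly increment
   type(torch_tensor), dimension(1) :: in_tensors, out_tensors

   ! Predict the anomaly increment for this column from the current state
   call torch_tensor_from_array(in_tensors(1), state, torch_kCPU)
   call torch_tensor_from_array(out_tensors(1), increment, torch_kCPU)
   call torch_model_forward(bias_net, in_tensors, out_tensors)

   ! Nudge the model state by the predicted increment
   state = state + increment

   call torch_delete(in_tensors)
   call torch_delete(out_tensors)
end subroutine apply_bias_correction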




UKCA Flame Graph by Luke Abraham used with permission.
See FTorch/community/case_studies for a full list.
the GloSea6 Seasonal Forecasting model
CESM, through learning model biases compared to ERA5
the WaveWatch III model
E3SM
ICON, for a stable 20-year AMIP run
the CAM model

Regridding will introduce overheads
Global methods are going to introduce MPI overheads
GPU data transfer is going to introduce overheads
Preprocessing and training a net offline is not the only goal
Think about online inference from the start
Loading weights is expensive - do it as little as possible (cf. the looping example)
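For instance (a schematic fragment, not taken from the FTorch examples; n_steps, get_input and use_output are placeholders):

! Schematic: hoist the expensive model load out of the time loop
type(torch_model) :: net
type(torch_tensor), dimension(1) :: in_tensors, out_tensors
real, dimension(5), target :: in_data, out_data
integer :: step

call torch_model_load(net, 'path/to/saved/model.pt', torch_kCPU)   ! once, at initialisation
do step = 1, n_steps
   call get_input(in_data)                                         ! placeholder
   call torch_tensor_from_array(in_tensors(1), in_data, torch_kCPU)
   call torch_tensor_from_array(out_tensors(1), out_data, torch_kCPU)
   call torch_model_forward(net, in_tensors, out_tensors)          ! cheap: every step
   call use_output(out_data)                                       ! placeholder
   call torch_delete(in_tensors)
   call torch_delete(out_tensors)
end do
call torch_delete(net)                                             ! once, at finalisation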

Packaging:
When ML components are developed they typically comprise:
a normalisation routine
a neural network architecture
a de-normalisation routine
enforcement of physical constraints
Several constituents need to be transferred to the host model.
Portability:
A neural net trained on a different model/dataset requires all of these components to function correctly.
The parameterisation:
The coupling:
Operate a principle of separation between the physical model and the net.
Concatenation and Normalisation
The alternative is re-writing code to perform these steps in the physical model.
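One possible way to realise this separation (the module and routine names are invented for illustration; only the FTorch calls are real API) is to hide everything ML-related behind a small Fortran interface, so the physical model only ever calls init/infer/finalise:

! Illustrative module: the physical model never touches Torch directly
module ml_parameterisation
   use ftorch
   implicit none
   private
   public :: ml_init, ml_infer, ml_finalise

   type(torch_model), save :: net

contains

   subroutine ml_init(model_path)
      character(len=*), intent(in) :: model_path
      call torch_model_load(net, model_path, torch_kCPU)
   end subroutine ml_init

   subroutine ml_infer(in_data, out_data)
      real, dimension(:), target, intent(in)    :: in_data
      real, dimension(:), target, intent(inout) :: out_data
      type(torch_tensor), dimension(1) :: in_tensors, out_tensors

      ! Concatenation/normalisation could live here, or (better) be baked
      ! into the saved TorchScript model so it travels with the net.
      call torch_tensor_from_array(in_tensors(1), in_data, torch_kCPU)
      call torch_tensor_from_array(out_tensors(1), out_data, torch_kCPU)
      call torch_model_forward(net, in_tensors, out_tensors)
      call torch_delete(in_tensors)
      call torch_delete(out_tensors)
   end subroutine ml_infer

   subroutine ml_finalise()
      call torch_delete(net)
   end subroutine ml_finalise

end module ml_parameterisation

Keeping normalisation and constraints with the saved model means the Fortran interface stays unchanged when the net is retrained or swapped.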
Get in touch:
Thanks to Joe Wallwork, Tom Meltzer, Elliott Kasoar,
Niccolò Zanotti and the rest of the FTorch team.
The ICCS received support from 
FTorch has been supported by 

