What’s New

Jack Atkinson

Principal Research Software Engineer
ICCS - University of Cambridge

Joe Wallwork

Senior Research Software Engineer
ICCS - University of Cambridge

2026-06-10

Precursors

Slides and Materials

To access links or follow on your own device these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0(1.0) or FontAwesome under SIL OFL 1.1

Motivation

Climate Models

Large, complex, many-part systems.

Hybrid Modelling

Replacing select component(s) with ML scheme(s).

Neural Net by 3Blue1Brown under fair dealing.

FAIR Challenges

Reproducibility
- Ensure net functions the same in-situ
Re-usability
- Make ML parameterisations available to many models
- Facilitate easy re-training/adaptation
Language Interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Efficiency

We consider 2 types:

Computational

Developer

In research both have an effect on ‘time-to-science’.
Especially when extensive research software support is unavailable.

Publication

FTorch is published in JOSS!

Atkinson et al. (2025)
FTorch: a library for coupling PyTorch models to Fortran.
Journal of Open Source Software, 10(107), 7602,
DOI: 10.21105/joss.07602

Please cite if you use FTorch!

We also have a Mailing List: FTorch-Announce on JISC for future updates and usage examples!

What’s New - v1.1

No need to specify `layout`

Previously when calling torch_tensor_from_array one had to specify the memory layout. This was used to correctly stride in memory to avoid copying.

We now assume the [1, 2, ..., n] layout by default, so layout no longer needs to be specified.

Older code continues to work in v1.1, but in a future release layout will become an optional argument, which will require a change to the order of call arguments.

Advice: Use the default argument where possible.

The ICCS fork of the CESM FTorch interface has been updated to use the default.

Pull request #348
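
To make the link between a layout and memory strides concrete, here is a minimal standalone sketch in plain Fortran (no FTorch required). The helper strides_from_layout and its stride rule are assumptions for illustration only and are not taken from the FTorch source. It shows that the default [1, 2] layout reproduces Fortran's native column-major strides.

```fortran
program layout_strides_demo
  implicit none
  integer, parameter :: array_shape(2) = [2, 3]
  integer :: strides(2)

  ! Default layout [1, 2]: the first index varies fastest (column-major)
  call strides_from_layout(array_shape, [1, 2], strides)
  print '(a, 2i3)', 'layout [1, 2] -> strides', strides
  if (any(strides /= [1, 2])) error stop 'unexpected strides for [1, 2]'

  ! Reversed layout [2, 1]: the second index varies fastest
  call strides_from_layout(array_shape, [2, 1], strides)
  print '(a, 2i3)', 'layout [2, 1] -> strides', strides
  if (any(strides /= [3, 1])) error stop 'unexpected strides for [2, 1]'

contains

  ! Hypothetical helper (assumption): layout(1) names the fastest-varying
  ! dimension, layout(2) the next fastest, and so on.
  subroutine strides_from_layout(dims, layout, strides)
    integer, intent(in) :: dims(:), layout(:)
    integer, intent(out) :: strides(:)
    integer :: i

    strides(layout(1)) = 1
    do i = 2, size(layout)
      strides(layout(i)) = strides(layout(i - 1)) * dims(layout(i - 1))
    end do
  end subroutine strides_from_layout

end program layout_strides_demo
```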

Finalisers for tensors and models

FTorch creates Torch C++ objects for manipulation under the hood. As detailed in the documentation and examples, these need proper handling, much like pairing allocate with deallocate, to prevent memory leaks.

Previously one had to explicitly delete torch objects to ensure that C++ memory was cleaned up, otherwise leakage could occur.

Now torch_delete is a finalizer, meaning it will be called whenever a tensor, model, or array of tensors goes out of scope.

Continuing to call torch_delete will still work, so old code remains valid.

Pull Request #297
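
The mechanism can be illustrated without FTorch. The following is a minimal sketch in standard Fortran of a derived type with a final procedure. The handle type, handle_create, handle_delete and the live_handles counter are all hypothetical stand-ins and are not part of the FTorch API.

```fortran
module handle_mod
  implicit none

  integer :: live_handles = 0   ! counts resources not yet released

  ! Hypothetical stand-in for a type that owns external (e.g. C++) memory
  type :: handle
    logical :: owns_resource = .false.
  contains
    final :: handle_delete
  end type handle

contains

  subroutine handle_create(h)
    type(handle), intent(inout) :: h
    h%owns_resource = .true.
    live_handles = live_handles + 1
  end subroutine handle_create

  ! Finaliser: invoked automatically when a scalar handle goes out of scope.
  ! It checks ownership first, so an explicit call beforehand is harmless.
  subroutine handle_delete(h)
    type(handle), intent(inout) :: h
    if (h%owns_resource) then
      h%owns_resource = .false.
      live_handles = live_handles - 1
    end if
  end subroutine handle_delete

end module handle_mod

program finaliser_demo
  use handle_mod
  implicit none

  call use_handle()
  print '(a, i0)', 'live handles after scope exit: ', live_handles
  if (live_handles /= 0) error stop 'handle was not released'

contains

  subroutine use_handle()
    type(handle) :: h   ! local variable: finalised on return
    call handle_create(h)
    print '(a, i0)', 'live handles inside scope: ', live_handles
  end subroutine use_handle

end program finaliser_demo
```

Because the finaliser guards on ownership, calling handle_delete(h) by hand before the scope ends would not release the resource twice, which mirrors the way explicit cleanup calls remain valid.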

Finalisers - Code Comparison

Before: explicit cleanup required

use ftorch

implicit none

real, dimension(5), target :: in_data, out_data

type(torch_tensor), dimension(1) :: input_tensors, output_tensors
type(torch_model) :: torch_net

...

! Create Torch input/output tensors from the Fortran arrays
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCPU)
call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)

call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCPU)
call torch_model_forward(torch_net, input_tensors, output_tensors)

...

! Cleanup
call torch_delete(torch_net)
call torch_delete(input_tensors)
call torch_delete(output_tensors)

After: finalizer handles cleanup

use ftorch

implicit none

real, dimension(5), target :: in_data, out_data

type(torch_tensor), dimension(1) :: input_tensors, output_tensors
type(torch_model) :: torch_net

...

! Create Torch input/output tensors from the Fortran arrays
call torch_tensor_from_array(input_tensors(1), in_data, torch_kCPU)
call torch_tensor_from_array(output_tensors(1), out_data, torch_kCPU)

call torch_model_load(torch_net, 'path/to/saved/model.pt', torch_kCPU)
call torch_model_forward(torch_net, input_tensors, output_tensors)

...

Finalisers - Aside

torch_tensor_delete has been made elemental, meaning that it applies to both tensors and arrays of tensors.

As such torch_tensor_array_delete has been removed.

This should not affect users, as the advice has always been to use the torch_delete interface rather than calling the specific routines directly.

Pull request #545
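
As a reminder of what elemental provides, here is a minimal sketch in standard Fortran. The handle type and the release routine are hypothetical and unrelated to the FTorch internals. A single elemental subroutine accepts both a scalar and an array, which is why a separate array-specific routine becomes unnecessary.

```fortran
module release_mod
  implicit none

  ! Hypothetical stand-in for an object that needs releasing
  type :: handle
    integer :: id = 0
    logical :: released = .false.
  end type handle

contains

  ! One elemental routine serves scalars and arrays of any rank
  elemental subroutine release(h)
    type(handle), intent(inout) :: h
    h%released = .true.
  end subroutine release

end module release_mod

program elemental_demo
  use release_mod
  implicit none

  type(handle) :: single
  type(handle) :: several(3)

  call release(single)    ! scalar argument
  call release(several)   ! array argument: applied to every element

  if (.not. single%released) error stop 'scalar not released'
  if (.not. all(several%released)) error stop 'array not fully released'
  print '(a)', 'scalar and array handled by the same elemental routine'
end program elemental_demo
```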

Clarification on Batching

Batching works the same as in PyTorch — add a leading batch dimension to input arrays and FTorch applies the model independently to each element.

This has always been the case, but we have clarified this in the documentation and added a worked example - 04) Batching.

Pull Request #500

Key points:

Leading dimensions are batch dimensions; trailing dimensions must match the model’s expected feature size.
All input and output tensors must share the same batch dimensions (pre-allocated).
One model can handle both single and batched inference.

Clarification on Batching

! Single inference (1D input)
real(sp), dimension(5), target :: in_single, out_single

! Batched inference (3D input)
real(sp), dimension(2,3,5), target :: in_batch, out_batch

call torch_model_load(model, "model.pt", torch_kCPU)

! Single
call torch_tensor_from_array(in_tensors(1), in_single, torch_kCPU)
call torch_tensor_from_array(out_tensors(1), out_single, torch_kCPU)
call torch_model_forward(model, in_tensors, out_tensors)

! Batched
call torch_tensor_from_array(in_tensors(1), in_batch, torch_kCPU)
call torch_tensor_from_array(out_tensors(1), out_batch, torch_kCPU)
call torch_model_forward(model, in_tensors, out_tensors)
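
The batching semantics can be sketched in plain Fortran without Torch. Here toy_model is a hypothetical stand-in for a trained network acting on a 5-element feature vector; the explicit loops spell out what "applied independently to each element" means for the two leading batch dimensions.

```fortran
program batching_demo
  implicit none
  integer, parameter :: n_features = 5
  real :: in_single(n_features), out_single(n_features)
  real :: in_batch(2, 3, n_features), out_batch(2, 3, n_features)
  integer :: i, j, k

  in_single = [(real(k), k = 1, n_features)]

  ! Give every batch entry a distinct scaling of the same feature vector
  do j = 1, 3
    do i = 1, 2
      in_batch(i, j, :) = real(10 * i + j) * in_single
    end do
  end do

  ! Single inference
  out_single = toy_model(in_single)

  ! Batched inference: the same model applied independently to each (i, j)
  do j = 1, 3
    do i = 1, 2
      out_batch(i, j, :) = toy_model(in_batch(i, j, :))
    end do
  end do

  ! Entry (2, 3) was scaled by 23, so the model output is 2 * 23 * k + 1
  if (any(abs(out_single - (2.0 * in_single + 1.0)) > 1.0e-5)) error stop 'single mismatch'
  if (any(abs(out_batch(2, 3, :) - (46.0 * in_single + 1.0)) > 1.0e-5)) error stop 'batch mismatch'
  print '(a, 3i3)', 'output shape:', shape(out_batch)

contains

  ! Hypothetical stand-in for a trained network
  pure function toy_model(x) result(y)
    real, intent(in) :: x(:)
    real :: y(size(x))
    y = 2.0 * x + 1.0
  end function toy_model

end program batching_demo
```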

Improved Error Handling

Previously errors in Torch resulted in opaque error messages that referenced locations in the Torch library at the point of failure.

Now checks in Fortran catch some input errors and raise them before they propagate to the C++ layer, and C++ errors are caught and handled in the CTorch layer to give the user more information about the point of failure.

Before: opaque Torch C++ exception

terminate called after throwing
  an instance of 'torch::Error'
  what(): Expected all tensors to
  be on the same device, but found
  at least two devices (cuda:0
  and cpu)!

After: caught at CTorch layer

[ERROR]: One of the inputs to torch_jit_module_forward is not a Tensor

Pull request #347

No passing of temporaries

Internally, FTorch assumes that data passed in is contiguous in memory. This is due to the shared-memory feature used for efficiency.

If this is violated then data could be read incorrectly by Torch.

Validation is now applied in torch_tensor_from_array to check that input data is pointer, contiguous rather than simply target.

This is considered a bugfix but if you were passing in temporaries you will now need to create an array first.

The following calls are now forbidden:

! slice — non-contiguous subsection
call torch_tensor_from_array(t, data(1:5:2), torch_kCPU)

! array expression — compiler-generated temporary
call torch_tensor_from_array(t, in_data * 2, torch_kCPU)

! function result — unnamed temporary
call torch_tensor_from_array(t, get_array(), torch_kCPU)

No passing of temporaries

What do you need to do?

Nothing, provided you always passed arrays into torch_tensor_from_array.

How shall ye know it?

The following error will be raised in compilation:

Error: There is no specific subroutine for the generic 'torch_tensor_from_array' at (1)

with the 1 identifying the input data argument.
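
If you were passing temporaries, the fix is to copy into a named array first. The sketch below is standard Fortran and does not use FTorch: from_array_stub is a hypothetical stand-in whose dummy argument is declared pointer, contiguous, which is what makes the compiler reject the commented-out calls.

```fortran
module tensor_stub_mod
  implicit none
contains

  ! Hypothetical stand-in, NOT the FTorch routine: the dummy argument is a
  ! contiguous pointer, so only contiguous named targets are accepted.
  subroutine from_array_stub(data_in, total)
    real, dimension(:), pointer, contiguous, intent(in) :: data_in
    real, intent(out) :: total
    total = sum(data_in)
  end subroutine from_array_stub

end module tensor_stub_mod

program temporaries_demo
  use tensor_stub_mod
  implicit none

  real, dimension(6), target :: in_data, doubled
  real, dimension(3), target :: strided_copy
  real :: total
  integer :: i

  in_data = [(real(i), i = 1, 6)]

  ! Rejected at compile time (left commented out so the sketch builds):
  ! call from_array_stub(in_data(1:5:2), total)   ! strided section
  ! call from_array_stub(in_data * 2, total)      ! expression, not a target

  ! Fix for the strided slice: copy into a named contiguous array
  strided_copy = in_data(1:5:2)
  call from_array_stub(strided_copy, total)
  if (abs(total - 9.0) > 1.0e-6) error stop 'unexpected sum for strided copy'

  ! Fix for the array expression: store the result in a named array
  doubled = in_data * 2
  call from_array_stub(doubled, total)
  if (abs(total - 42.0) > 1.0e-6) error stop 'unexpected sum for doubled data'

  print '(a)', 'named contiguous arrays accepted'
end program temporaries_demo
```

The same approach applies to function results: assign the result to a named array with the target attribute and pass that array instead.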

GPU Acceleration - v1.0 Recap

FTorch v1.0 allowed running inference on GPU:

! Load model onto GPU device 0
call torch_model_load(torch_net_0, 'model.pt', torch_kCUDA, device_index=0)

! Cast data to tensors on different devices
call torch_tensor_from_array(in_tensors_0(1), in_data, torch_kCUDA, device_index=0)
call torch_tensor_from_array(out_tensor_0(1), out_data, torch_kCPU)

! Inference as usual — Torch handles cross-device transfers
call torch_model_forward(torch_net_0, in_tensors_0, out_tensor_0)

Supported backends: NVIDIA CUDA, Intel XPU, Apple Silicon MPS

Multiple devices supported (e.g., device_index=0, device_index=1).

New in v1.1 — AMD GPU Support (HIP)

FTorch now supports AMD GPUs via the HIP backend.

PyTorch recommends reusing torch.cuda interfaces for HIP — FTorch builds against the CUDA backend, aliasing HIP at the CMake level.

! Same interface, compiled with HIP flags
call torch_model_load(torch_net, 'model.pt', torch_kHIP, device_index=0)
call torch_tensor_from_array(in_tensor(1), in_data, torch_kHIP, device_index=0)

Pull request #385 and Pull request #388

Source code restructure

Since the early days all FTorch source code existed in src/ftorch.F90. As features grew this became unsustainable so it is now distributed across several files:

src
├── ctorch.cpp
├── ctorch.h
├── ftorch_devices.F90
├── ftorch_model.f90
├── ftorch_optim.f90
├── ftorch_tensor.f90
├── ftorch_tensor.fypp
├── ftorch_types.f90
└── ftorch.f90

No change to usage — users still import from a parent ftorch module.

Sysadmin

CMake 3.18: minimum version bumped to match PyTorch (PR #491)
pkg-config: query compilation flags via pkg-config --libs ftorch and pkg-config --cflags ftorch (PR #464)
Static library: support for building as a static library (PR #448)
RPATH: libftorch.so now includes a RUNPATH to Torch so downstream targets find it automatically (PR #437)

Sustainability/Other

Comprehensive unit testing with pFUnit
- All new contributions to the software came with extensive unit testing to provide users with confidence.
- New features add integration tests and examples.
Extensive CI pipeline to catch issues and provide broad support.
- Linux, macOS, Windows
- GNU and Intel compilers
- CPU and GPU backends
Documentation overhaul
- cambridge-iccs.github.io/FTorch
- Improved aesthetics
- Tidied user pages and API docs, added community pages (PRs welcome)

V1.0 Contributors

10 Total, 5 new!

Jack Atkinson, Joe Wallwork - Maintainers
Mikolaj Kowalski, Tom Meltzer - ICCS RSEs
Niccolò Zanotti - PhD and placement student
Jared Frazier, Zhenkun Li, Zoltán Katona - Community Users
Dominic Orchard, Daniel Katz - code and paper review

ML Coupling Workshop

ICCS ran a two-day ML Coupling Workshop in Cambridge, September 2025, bringing together researchers, RSEs, and modelling centres working on hybrid modelling challenges.

Shared recent advances and expertise in coupling ML to large-scale scientific codes
Discussion sessions on key challenges and best practices
Summary blog post: Accelerate-C2D3

What’s New - Unreleased

Online training - Autograd

Exposed autograd functionality from Torch
- requires_grad argument on tensor construction
- torch_tensor_backward for reverse-mode differentiation
- torch_tensor_get_gradient to extract computed gradients
- torch_tensor_zero_grad to reset gradients between backward passes
Mathematical operator overloading (=, +, -, *, /, **)
- Enables tensor expressions in Fortran that build a computation graph
- Further expressions can be added - requests/PRs welcome
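
To show how operator overloading lets ordinary-looking Fortran expressions carry derivative information, here is a small standalone sketch. It swaps in forward-mode dual numbers, which is a different and simpler technique than the reverse-mode computation graph that Torch's autograd builds, and the dual type and its procedures are hypothetical and not part of FTorch.

```fortran
module dual_mod
  implicit none

  ! Hypothetical dual number: a value plus its derivative w.r.t. one input
  type :: dual
    real :: val = 0.0
    real :: grad = 0.0
  end type dual

  interface operator(+)
    module procedure add_dual
  end interface
  interface operator(-)
    module procedure sub_dual
  end interface
  interface operator(*)
    module procedure mul_dual, scale_dual
  end interface

contains

  elemental function add_dual(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%val + b%val, a%grad + b%grad)
  end function add_dual

  elemental function sub_dual(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%val - b%val, a%grad - b%grad)
  end function sub_dual

  ! Product rule: (ab)' = a'b + ab'
  elemental function mul_dual(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c = dual(a%val * b%val, a%grad * b%val + a%val * b%grad)
  end function mul_dual

  elemental function scale_dual(s, a) result(c)
    real, intent(in) :: s
    type(dual), intent(in) :: a
    type(dual) :: c
    c = dual(s * a%val, s * a%grad)
  end function scale_dual

end module dual_mod

program overloading_demo
  use dual_mod
  implicit none
  type(dual) :: a, b, q

  a = dual(2.0, 1.0)   ! seed grad = 1: differentiate with respect to a
  b = dual(3.0, 0.0)

  ! Q = 3a^3 - b^2, so Q = 15 and dQ/da = 9a^2 = 36 at a = 2, b = 3
  q = 3.0 * a * a * a - b * b

  if (abs(q%val - 15.0) > 1.0e-5) error stop 'unexpected value'
  if (abs(q%grad - 36.0) > 1.0e-5) error stop 'unexpected derivative'
  print '(a, f6.1, a, f6.1)', 'Q = ', q%val, ', dQ/da = ', q%grad
end program overloading_demo
```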

Online training - Optimizers

torch_optim derived type wrapping PyTorch optimizers
- torch_optim%zero_grad — zeroes gradients at start of each step
- torch_optim%step — takes one optimizer iteration
Optimizers exposed: SGD, Adam, AdamW
Others can be added - requests/PRs welcome
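
The zero_grad and step pattern can be sketched in plain Fortran for a single parameter vector. This is not FTorch code: the gradient of the mean-squared-error loss is derived by hand rather than obtained through autograd, the update is plain SGD, and all names are hypothetical.

```fortran
program sgd_sketch
  implicit none
  integer, parameter :: n = 4, n_steps = 200
  real, parameter :: learning_rate = 0.1
  real :: params(n), target_vec(n), grad(n)
  real :: loss, initial_loss
  integer :: step

  params = 0.0
  target_vec = [1.0, 2.0, 3.0, 4.0]
  initial_loss = mse(params, target_vec)

  do step = 1, n_steps
    grad = 0.0                                     ! zero gradients
    loss = mse(params, target_vec)                 ! forward pass
    grad = 2.0 * (params - target_vec) / real(n)   ! hand-derived d(loss)/d(params)
    params = params - learning_rate * grad         ! one plain SGD update
  end do

  loss = mse(params, target_vec)
  print '(a, es10.3)', 'initial loss: ', initial_loss
  print '(a, es10.3)', 'final loss:   ', loss
  if (loss > 1.0e-3 * initial_loss) error stop 'loss did not decrease enough'

contains

  ! Mean-squared error: mean((output - target)**2)
  pure function mse(output_vec, target_vec) result(loss)
    real, intent(in) :: output_vec(:), target_vec(:)
    real :: loss
    loss = sum((output_vec - target_vec) ** 2) / real(size(output_vec))
  end function mse

end program sgd_sketch
```

With this learning rate each step shrinks the error by a constant factor of 0.95, so the loss falls steadily towards zero.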

Testing optimisation of a single tensor against PyTorch:

Online training - Loss

torch_loss_<...> subroutines create loss tensors.
Loss functions exposed: MSE and CrossEntropy
Others can be added - requests/PRs welcome

! Old calculation
call torch_tensor_mean(loss, (output_vec - target_vec) ** 2)

! Now with loss subroutine
call torch_loss_mse(loss, output_vec, target_vec)

Online training - Model training

Full Fortran training loop now demonstrated in worked example 11
- Load a TorchScript model, train using an optimizer, and run inference
torch_model_parameters interface added to access model weights
Loss curves comparable to equivalent PyTorch training:

Training SimpleNet in Fortran:

Overhaul of pt2ts

utils is now ftorch_utils, installable via pip
pt2ts is now a command-line script
- No longer required to copy and modify script by hand.
- Provide the model definition file, the class name, and where to save:
```
pt2ts SimpleNet --model_definition_file simplenet.py \
                --output_model_file model.pt
```

Pull request #555

What’s New - Upcoming

Work In Progress

Further compiler/CI support:
- LFortran
- nvfortran
- Flang (LLVM)
Package and distribute via fpm (Fortran Package Manager)
Summer student working on UKCA project

Your Project Here?

Fortran-Enzyme

Automatic differentiation for Fortran, via Enzyme

Compiler plugin that differentiates LLVM IR, computing gradients of existing code without source-level rewriting. Currently supports C, C++, Julia, Rust. ICCS are working to add Fortran.

6-month (March-Aug) ICCS project led by Joe Wallwork.

Aiming to expose Enzyme through a Flang plugin and demonstrate differentiable Fortran on a UKCA atmospheric chemistry case study.

Eventual goal is to provide a simple tool for differentiable modelling of existing Fortran code.

/EnzymeAD/Enzyme

FTorch: Summary

Use of ML within traditional numerical models
- A growing area that presents challenges
Language interoperation
- FTorch provides a solution for scientists implementing torch models in Fortran
- Designed for computational and developer efficiency
- Has helped deliver science in climate research and beyond
  See FTorch/community/case_studies
Exploring options for online training and AD
Collaborative projects highlight various considerations when implementing hybrid models.

Thanks for Listening

Get in touch:

Jack Atkinson

jackatkinson.net

jwa34[AT]cam.ac.uk

jatkinson1000

@jatkinson1000@hachyderm.io

/Cambridge-ICCS/FTorch

Thanks to Joe Wallwork, Tom Meltzer, Elliott Kasoar,
Niccolò Zanotti and the rest of the FTorch team.

The ICCS received support from

FTorch has been supported by

References

Atkinson, Jack, Athena Elafrou, Elliott Kasoar, Joseph G. Wallwork, Thomas Meltzer, Simon Clifford, Dominic Orchard, and Chris Edsall. 2025. “FTorch: A Library for Coupling PyTorch Models to Fortran.” Journal of Open Source Software 10 (107): 7602. https://doi.org/10.21105/joss.07602.

Barker, Michelle, Neil P Chue Hong, Daniel S Katz, Anna-Lena Lamprecht, Carlos Martinez-Ortiz, Fotis Psomopoulos, Jennifer Harrow, et al. 2022. “Introducing the FAIR Principles for Research Software.” Scientific Data 9 (1): 622. https://doi.org/10.1038/s41597-022-01710-x.

Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9. https://doi.org/10.1038/sdata.2016.18.
