Reducing the Overhead of Coupled Machine Learning Models between Python and Fortran

RSECon23, Swansea

Jack Atkinson

ICCS/Cambridge

Simon Clifford

ICCS/Cambridge

Athena Elafrou

NVIDIA

Tom Meltzer

ICCS/Cambridge

Chris Edsall

ICCS/Cambridge

2023-09-05

Precursors

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo used under CC0 1.0

Slides

To access links or follow along on your own device, these slides can be found at
https://jackatkinson.net/slides

Introduction

The ICCS

The Institute of Computing for Climate Science

  • Domain-specific RSE group based at the University of Cambridge
  • Embedded support to several international climate science projects

Climate Modelling

Climate models are large, complex, many-part systems.

Machine Learning

We typically think of Deep Learning as an end-to-end process:
a black box with an input and an output.

Who’s that Pokémon?

\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\]

It’s Pikachu!

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.

Machine Learning in Science


Replacing physics-based components

Two approaches:

  • emulation, or
  • data-driven.

Additional challenges:

  • Physical compatibility
    • Physics-based models have conservation laws
      Required for accuracy and stability
  • Language interoperation

Language interoperation

Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.

Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

Solutions

Considerations

There are two types of efficiency:

  • Computational

  • Developer

An ideal solution should:

  • not generate excessive additional work,
    • not require advanced computing skills,
    • have a minimal learning curve,
  • not add excessive dependencies,
  • be easy to maintain, and
  • maximise performance.

Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge

Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge
  • e.g. inference-engine, neural-fortran, or your own implementation (see the sketch below)

  • removes the two-language problem

  • how do you ensure you port the model correctly?
  • ML libraries are highly optimised, probably more so than your code.
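
As a rough sketch of what this option involves (using neural-fortran; the names nf, network, input, dense, and predict follow that project’s README and may differ between versions, and the trained weights would still need porting by hand):

program nf_sketch
  ! Sketch only: a small fully-connected net defined directly in Fortran
  ! using neural-fortran. API names are assumptions from its README.
  use nf, only: network, input, dense
  implicit none

  type(network) :: net
  real, allocatable :: x(:), y(:)

  ! the Fortran equivalent of a trained Python model; weights must be
  ! ported across from the original training framework
  net = network([input(4), dense(8), dense(1)])

  x = [0.1, 0.2, 0.3, 0.4]
  y = net % predict(x)
  print *, y
end program nf_sketch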

Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge
  • brings Python types into Fortran

  • easy to add the forpy module file and compile

  • verbose, with a learning curve
  • need to manage and link a Python environment
  • increases dependencies

Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge
  • pass data between workers through a network glue layer
  • may be necessary for certain architectures

  • steep learning curve
  • involves data copying

Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge

  • pure Fortran

  • TensorFlow (Keras) only
  • inactive and incomplete

Possible solutions

[Figure: coupling via Python also means managing a Python environment and a Python runtime; illustrated by xkcd #1987.]

xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5

Interfacing Libraries

Both PyTorch and TensorFlow have C++ backends and provide APIs to access them.
Binding Fortran to C has been straightforward since Fortran 2003 using the iso_c_binding intrinsic module.
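
As a minimal sketch of what iso_c_binding provides (the C function run_net here is hypothetical, for illustration only):

! Binding a hypothetical C function
!     void run_net(const float* input, int64_t n, float* output);
! from Fortran via the iso_c_binding intrinsic module.
interface
   subroutine run_net(input, n, output) bind(c, name="run_net")
      use, intrinsic :: iso_c_binding, only: c_float, c_int64_t
      real(c_float), intent(in) :: input(*)
      integer(c_int64_t), value :: n
      real(c_float), intent(out) :: output(*)
   end subroutine run_net
end interface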


PyTorch

  • C++ API
  • Archive model as TorchScript
    • Statically typed subset of Python
  • Read and run via any Torch API

TensorFlow

  • C++ and C APIs
  • Archive model as Keras SavedModel
  • process_model tool provided to extract the required opaque parameters and use the API

Performant - Computational

No-copy access to data in memory: tensors wrap existing Fortran arrays rather than duplicating them.

Indexing issues (Fortran is column-major, C/C++ row-major) and the associated reshape can be avoided with the Torch accessor.
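
A minimal sketch of the no-copy idea, using the same torch_tensor_from_blob call as the code examples later in these slides (torch_kFloat32 is assumed here to match a default real array):

real, dimension(:,:), target :: SST
type(torch_tensor) :: SST_T
integer(c_int), parameter :: dims_T = 2
integer(c_int64_t) :: shape_T(dims_T)

shape_T = shape(SST)
! The tensor aliases the Fortran array's memory: no data is copied.
SST_T = torch_tensor_from_blob(c_loc(SST), dims_T, shape_T, &
                               torch_kFloat32, torch_kCPU)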

Performant - Ease of use

Installation

CMake

  • Install libtorch, or the TensorFlow C API
  • Clone
  • Build using CMake (instructions provided)
  • Install
  • Link (see the sketch below)

CMake is a trademark of Kitware.
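
As a hedged sketch of the final link step from a downstream CMake project (the package and target names here are assumptions; consult the libraries’ own instructions for the exact spelling):

# Hypothetical downstream CMakeLists.txt fragment; package and target
# names depend on the library version.
find_package(FTorch REQUIRED)
add_executable(my_model my_model.f90)
target_link_libraries(my_model PRIVATE FTorch::ftorch)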

Tools

  • pt2ts.py script facilitates saving models to TorchScript.
  • process_model extracts TF model data.

Examples

  • Guide users through the process from saving a Python model to running it in Fortran.
  • User-defined and preloaded (ResNet-18) cases.

Support

  • Use frameworks’ implementations directly
    • feature support
    • future support
    • direct translation of Python models

Licensing and FOSS

The libraries are licensed under MIT and available as FOSS.

  • Highly permissive for use by all
  • Open-source development on GitHub using issues and PRs

Case Study

Gravity wave parameterisation in MiMA (Jucker and Gerber 2017)

  • Neural Net
    • Emulating Alexander and Dunkerton (1999) gravity wave parameterisation.
    • Fully-connected multi-layer net with identical PyTorch and TensorFlow versions
    • Initially interfaced (slowly) using forpy (Espinosa et al. 2022)

Coding example

Replace the forpy-coupled net with our directly coupled approach.

Test both PyTorch and TensorFlow.


Given a Fortran program with model inputs in arrays, the original coupling using forpy requires 67 lines of boilerplate code, whilst our library takes 39.


A fork of MiMA with these implementations of the interfaces is at:
https://github.com/DataWaveProject/MiMA-machine-learning

e.g. Loading a Torch model (forpy)

integer :: ie
type(module_py) :: run_emulator
type(list) :: paths
type(object) :: model
type(tuple) :: args
type(str) :: py_model_dir

ie = forpy_initialize()

ie = str_create(py_model_dir, trim('/path/to/saved/model'))
ie = get_sys_path(paths)
ie = paths%append(py_model_dir)

! import the Python module as `run_emulator`
ie = import_py(run_emulator, trim(model_name))
if (ie .ne. 0) then
    call err_print
    call error_mesg(__FILE__, __LINE__, "forpy model not loaded")
end if

! use the Python module `run_emulator` to load a trained model
ie = call_py(model, run_emulator, "name_of_init_function")
if (ie .ne. 0) then
    call err_print
    call error_mesg(__FILE__, __LINE__, "call to `initialize` failed")
end if

e.g. Loading a Torch model (our library)

type(torch_module) :: model

model = torch_module_load('/path/to/saved/model.pt'//c_null_char)

Conclusions

Take-away messages

  • Machine learning has many potential applications in scientific computing
  • Leveraging it effectively requires care
  • We have developed libraries allowing easy and efficient deployment of ML within Fortran models
  • For new projects we advise using PyTorch

Future work

  • Provide a tagged first release on GitHub
    • Publication through JOSS and Zenodo
  • Further test GPU functionalities
  • Implement functionalities beyond inference?
    • Online training is likely to become important

Get involved

  • Inform potential users
    • Further testing and feedback wanted!
  • Developers welcome

Closing slide, thanks, and questions

The ICCS is funded by Schmidt Futures.

References

Alexander, M. J., and T. J. Dunkerton. 1999. “A Spectral Parameterization of Mean-Flow Forcing Due to Breaking Gravity Waves.” Journal of the Atmospheric Sciences 56 (24): 4167–82.
Espinosa, Zachary I., Aditi Sheshadri, Gerald R. Cain, Edwin P. Gerber, and Kevin J. DallaSanta. 2022. “Machine Learning Gravity Wave Parameterization Generalizes to Capture the QBO and Response to Increased CO2.” Geophysical Research Letters 49 (8): e2022GL098174.
Jucker, Martin, and E. P. Gerber. 2017. “Untangling the Annual Cycle of the Tropical Tropopause Layer with an Idealized Moist Model.” Journal of Climate 30 (18): 7339–58.

Code

The libraries can be found at:
https://github.com/Cambridge-ICCS/fortran-pytorch-lib
https://github.com/Cambridge-ICCS/fortran-tf-lib

Their implementation in the MiMA model can be found at:
https://github.com/DataWaveProject/MiMA-machine-learning

Benchmarking of PyTorch can be found at:
https://github.com/Cambridge-ICCS/fortran-pytorch-lib-benchmark/

Bonus Content

Code Example - PyTorch

  • Take the trained model
  • Save as TorchScript

import torch
import my_ml_model

# initialise (or load) the trained PyTorch model
trained_model = my_ml_model.initialize()

# script the model and save it as a TorchScript archive
scripted_model = torch.jit.script(trained_model)
scripted_model.save("my_torchscript_model.pt")

Code Example - PyTorch

Necessary imports:

use, intrinsic :: iso_c_binding, only: c_int, c_int64_t, c_float, c_char, &
                                       c_null_char, c_ptr, c_loc
use ftorch

Loading a PyTorch model:

model = torch_module_load('/path/to/saved/model.pt'//c_null_char)

Code Example - PyTorch

Tensor creation from Fortran arrays:

! Fortran variables
real, dimension(:,:), target  :: SST, model_output
! C/Torch variables
integer(c_int), parameter :: dims_T = 2
integer(c_int64_t) :: shape_T(dims_T)
integer(c_int), parameter :: n_inputs = 1
type(torch_tensor), dimension(n_inputs), target :: model_inputs
type(torch_tensor) :: model_output_T

shape_T = shape(SST)

! wrap the input array as a Torch tensor; the dtype must match the
! Fortran kind (default real is 32-bit, hence torch_kFloat32)
model_inputs(1) = torch_tensor_from_blob(c_loc(SST), dims_T, shape_T, &
                                         torch_kFloat32, torch_kCPU)

! wrap the output array so Torch writes results straight into it
model_output_T = torch_tensor_from_blob(c_loc(model_output), dims_T, shape_T, &
                                        torch_kFloat32, torch_kCPU)

Code Example - PyTorch

Running the model:

call torch_module_forward(model, model_inputs, n_inputs, model_output_T)

Cleaning up:

call torch_tensor_delete(model_inputs(1))
call torch_tensor_delete(model_output_T)
call torch_module_delete(model)

Results

Timings (real seconds) for computing gravity wave drag in situ.

             Forpy       Direct      Direct/Forpy
PyTorch       94.43 s    134.81 s    142.8 %
TensorFlow   667.16 s    170.31 s     25.5 %

Timing data (real seconds) for benchmarking gravity wave drag with PyTorch on CSD3.

Intel   Forpy      Direct     Direct/Forpy
Mean    0.3126 s   0.3509 s   112.3 %
Std     0.0420 s   0.0547 s   -

GCC     Forpy      Direct     Direct/Forpy
Mean    0.3405 s   0.3669 s   107.7 %
Std     0.0449 s   0.0586 s   -