ML&DL Seminars, LSCE - IPSL, Paris
NVIDIA
2023-11-28
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
To access links or follow along on your own device, these slides can be found at
https://jackatkinson.net/slides
The Institute of Computing for Climate Science
Climate models are large, complex, many-part systems.
We typically think of Deep Learning as an end-to-end process;
a black box with an input and an output.
Who’s that Pokémon?
\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\] It’s Pikachu!
Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.
Many large scientific models are written in Fortran (or C, or C++).
Much machine learning is conducted in Python.
Mathematical Bridge by cmglee used under CC BY-SA 3.0
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
There are 2 types of efficiency:
Computational
Developer
An ideal solution should:
inference-engine, neural-fortran, own custom solution, etc.
forpy: obtain the .mod file and compile.
Other suggestions include FIFO pipes and YAC (Arnold et al. 2023).
Requires a Python env and Python runtime.
xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5
FTorch binds Fortran to the libtorch library using iso_c_binding. We will:
No-copy access in memory (CPU); a minimal sketch of the binding pattern follows this list.
Indexing issues and the associated reshape can be avoided with a Torch accessor.
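To make the no-copy point concrete, here is a generic, illustrative sketch of the iso_c_binding pattern (not the actual FTorch source): only the address and shape of an existing Fortran array are handed to a hypothetical C entry point, consume_blob, so no data are copied.

! Illustrative iso_c_binding sketch (not the actual FTorch source):
! hand the address and shape of an existing Fortran array to a
! hypothetical C entry point so the library sees the data with no copy.
module no_copy_example
  use, intrinsic :: iso_c_binding, only : c_ptr, c_loc, c_int, c_int64_t, c_float
  implicit none
  interface
    ! Hypothetical C-side routine wrapping the external library
    subroutine consume_blob(data, ndims, shape) bind(c, name="consume_blob")
      import :: c_ptr, c_int, c_int64_t
      type(c_ptr), value :: data
      integer(c_int), value :: ndims
      integer(c_int64_t), intent(in) :: shape(*)
    end subroutine consume_blob
  end interface
contains
  subroutine send_array(x)
    ! target + contiguous so c_loc may take the address of the dummy argument
    real(c_float), intent(in), target, contiguous :: x(:,:)
    integer(c_int64_t) :: shape_c(2)
    shape_c = int(shape(x), c_int64_t)
    call consume_blob(c_loc(x), 2_c_int, shape_c)   ! address only; no copy made
  end subroutine send_array
end module no_copy_example

Because only the address crosses the interface, the library can operate on the Fortran data in place, which is the idea behind the no-copy CPU access above.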
CMake
Tested on:
CMake is a trademark of Kitware.
The pt2ts.py script facilitates saving models to TorchScript.
The libraries are licensed under MIT and available as FOSS.
import torch
import torchvision
# Load pre-trained model and put in eval mode
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.eval()
# Create dummy input
dummy_input = torch.ones(1, 3, 224, 224)
# Trace model and save
traced_model = torch.jit.trace(model, dummy_input)
frozen_model = torch.jit.freeze(traced_model)
frozen_model.save("saved_model.pt")
use, intrinsic :: iso_fortran_env, only : sp => real32
! Use the FTorch Library
use :: ftorch
implicit none
! Fortran variables
real(sp), dimension(1,3,224,224), target :: in_data
real(sp), dimension(1, 1000), target :: out_data
integer, parameter :: n_inputs = 1
integer :: in_layout(4) = [1,2,3,4]
integer :: out_layout(2) = [1,2]
! Torch Tensors
type(torch_tensor), dimension(1) :: in_tensor
type(torch_tensor) :: out_tensor
! Populate Fortran data
call random_number(in_data)
! Cast Fortran data to Tensors
! Create input/output tensors from the above arrays
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kCPU)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)
use, intrinsic :: iso_fortran_env, only : sp => real32
! Use the FTorch Library
use :: ftorch
implicit none
! Define a Torch module
type(torch_module) :: model
! Fortran variables
real(sp), dimension(1,3,224,224), target :: in_data
real(sp), dimension(1, 1000), target :: out_data
integer, parameter :: n_inputs = 1
integer :: in_layout(4) = [1,2,3,4]
integer :: out_layout(2) = [1,2]
! Torch Tensors
type(torch_tensor), dimension(1) :: in_tensor
type(torch_tensor) :: out_tensor
! Load in from TorchScript
model = torch_module_load('/path/to/saved/model.pt')
! Populate Fortran data
call random_number(in_data)
! Cast Fortran data to Tensors
! Create input/output tensors from the above arrays
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kCPU)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)
! Infer
call torch_module_forward(model, in_tensor, n_inputs, out_tensor)
! Cleanup
call torch_module_delete(model)
call torch_tensor_delete(in_tensor(1))
call torch_tensor_delete(out_tensor)
! Use Fortran array `out_data` elsewhere in code
Save to TorchScript for GPU from Python:
# Set device as cuda
device = torch.device('cuda')
# Move model and dummy input to device before saving to TorchScript
model = model.to(device)
model.eval()
dummy_input = dummy_input.to(device)
# Trace model and save
traced_model = torch.jit.trace(model, dummy_input)
frozen_model = torch.jit.freeze(traced_model)
frozen_model.save("saved_gpu_model.pt")
Cast Tensors to GPU in Fortran:
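The Fortran snippet for this step did not survive extraction, so the following is a reconstruction sketch rather than the original slide code. It reuses the declarations from the CPU example and assumes FTorch's torch_kCUDA device enumerator; check the FTorch documentation for the exact GPU API (e.g. device selection).

! Sketch (assumes torch_kCUDA; declarations as in the CPU example above)
! Load the GPU-traced TorchScript model
model = torch_module_load('/path/to/saved_gpu_model.pt')
! Place the input tensor on the GPU; keep the output on the CPU so the
! result lands back in the Fortran array out_data
in_tensor(1) = torch_tensor_from_array(in_data, in_layout, torch_kCUDA)
out_tensor = torch_tensor_from_array(out_data, out_layout, torch_kCPU)
! Infer
call torch_module_forward(model, in_tensor, n_inputs, out_tensor)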
Pressure (height) vs. time Hovmöller diagram of zonal-mean zonal winds (m/s), averaged over ±5° latitude.
FTorch exactly reproduces the results of a direct Python call
NN is stable and reproduces the QBO
Tends to slightly over/under-predict
Replace the forpy-connected net with our directly coupled approach.
Test both PyTorch and TensorFlow.
Given a Fortran program with model inputs in arrays,
the original coupling using forpy requires
lines of boilerplate code,
whilst our library takes
A fork of MiMA with these implementations of the interfaces is at:
https://github.com/DataWaveProject/MiMA-machine-learning
Wilkes3 (CSD3)
Observations:
MPI_gather to reduce overheads.
Following the comparisons and MiMA experiments, we performed detailed benchmarking to examine the library performance.
The libraries can be found at:
Torch: https://github.com/Cambridge-ICCS/FTorch
TensorFlow: https://github.com/Cambridge-ICCS/fortran-tf-lib
Slides available at: https://jackatkinson.net/slides/IPSL_FTorch/IPSL_FTorch.html
Get in touch:
The ICCS is funded by
process_model provided to extract required opaque parameters and use the API.
ie = forpy_initialize()
type(module_py) :: run_emulator
type(list) :: paths
type(object) :: model
type(tuple) :: args
type(str) :: py_model_dir
ie = str_create(py_model_dir, trim('/path/to/saved/model'))
ie = get_sys_path(paths)
ie = paths%append(py_model_dir)
! import python modules to `run_emulator`
ie = import_py(run_emulator, trim(model_name))
if (ie .ne. 0) then
call err_print
call error_mesg(__FILE__, __LINE__, "forpy model not loaded")
end if
! use python module `run_emulator` to load a trained model
ie = call_py(model, run_emulator, "name_of_init_function")
if (ie .ne. 0) then
call err_print
call error_mesg(__FILE__, __LINE__, "call to `initialize` failed")
end if