Climate Machine Learning Applications

ICCS Summer school 2023

Jack Atkinson

ICCS/Cambridge

Jim Denholm

Cambridge

2024-06-09

Teaching Material Recap

Over the ML sessions at the summer school we have learnt about:

  • Classification - categorising items based on information
  • Regression - using information to predict another value

using:

  • ANNs - using input features to make predictions
  • CNNs - using image-like data as an input

Considerations

Image-like data

Gravity waves image from Sheridan, Vosper, and Brown (2017).
MNIST Images from colah

Potential Applications

Applications in geosciences:

See review of Kashinath et al. (2021)

Climate Modelling

Climate models are large, complex, many-part systems.

Parameterisations

  • Parameterisations are typically expensive
    • Microphysics and Radiation are top offenders
  • Replace parameterisations with NNs
    • emulation of existing parameterisation (see the sketch after this list)
      e.g. Espinosa et al. (2022)
    • data-driven parameterisations
      • capture missing physics?
    • train with a high-resolution model
      access the benefits of a subgrid model without the cost(?)
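
As a minimal sketch of the emulation idea (not any specific scheme; the layer sizes, variable names, and shapes below are all assumptions), a small fully-connected network can be trained to reproduce the input-output mapping of an existing parameterisation:

import torch
from torch import nn

# Hypothetical emulator: maps column profiles (e.g. T and q on n_lev levels)
# to a parameterised tendency profile. All sizes are illustrative only.
n_lev = 60
n_in, n_out = 2 * n_lev, n_lev

emulator = nn.Sequential(
    nn.Linear(n_in, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, n_out),
)

optimiser = torch.optim.Adam(emulator.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def training_step(inputs, targets):
    # fit the emulator to (input, existing-scheme output) pairs
    optimiser.zero_grad()
    loss = loss_fn(emulator(inputs), targets)
    loss.backward()
    optimiser.step()
    return loss.item()

x = torch.randn(32, n_in)    # stand-in batch of 32 columns
y = torch.randn(32, n_out)   # stand-in tendencies from the existing scheme
print(training_step(x, y))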

Machine Learning

We typically think of Deep Learning as an end-to-end process;
a black box with an input and an output.

Who’s that Pokémon?

\[\begin{bmatrix}\vdots\\a_{23}\\a_{24}\\a_{25}\\a_{26}\\a_{27}\\\vdots\\\end{bmatrix}=\begin{bmatrix}\vdots\\0\\0\\1\\0\\0\\\vdots\\\end{bmatrix}\] It’s Pikachu!

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.

Machine Learning in Science

Neural Net by 3Blue1Brown under fair dealing.
Pikachu © The Pokemon Company, used under fair dealing.

Replacing physics-based components

2 approaches:

  • emulation, or
  • data-driven.

Additional challenges:

  • Physical compatibility
    • Physics-based models have conservation laws
      Required for accuracy and stability
  • Language interoperation

Downscaling

  • Can we get information for ‘free’?
  • Train to predict ‘image’ from coarsened version (see the sketch below).
    • Topography?

Image by Earth Lab
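
A minimal sketch of the “predict the image from its coarsened version” idea, assuming a simple average-pooling coarsening and a small convolutional correction network (all names and sizes are illustrative, not any published downscaling model):

import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical downscaler: coarsen a high-resolution field by average
# pooling, then train a CNN to recover the original fine-grid field.
class Upscaler(nn.Module):
    def __init__(self, factor=4):
        super().__init__()
        self.factor = factor
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, coarse):
        # bilinear upsampling back to the fine grid, plus a learned correction
        fine = F.interpolate(coarse, scale_factor=self.factor, mode="bilinear")
        return fine + self.net(fine)

model = Upscaler()
high_res = torch.randn(8, 1, 128, 128)           # stand-in "truth" fields
low_res = F.avg_pool2d(high_res, kernel_size=4)  # coarsened inputs
loss = F.mse_loss(model(low_res), high_res)
loss.backward()

Additional static inputs such as topography could be concatenated as extra channels.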

Forecasting

  • Time-series
    • popular use
    • Recurrent Neural Nets (see the sketch below)
  • Complete weather
    • FourCastNet, Pangu-Weather, GraphCast

Line plot image from Bi et al. (2023)
Global image from NVIDIA FourCastNet
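
For the time-series case, a minimal sketch (assuming a single scalar variable and an LSTM; both are illustrative choices, not a specific published model) might look like:

import torch
from torch import nn

# Hypothetical forecaster: given the last `window` values of a variable,
# predict the value at the next time step.
class Forecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, window, 1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])   # prediction for the next step

model = Forecaster()
series = torch.randn(16, 48, 1)        # stand-in batch of 48-step sequences
next_value = model(series)             # shape (16, 1)

Full-globe systems such as FourCastNet, Pangu-Weather, and GraphCast use much larger transformer- or graph-based architectures trained on reanalysis data.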

Challenges

Training data - considerations

How should we prepare our training data?

  • Cyclic data?
    • e.g. diurnal, annual, other
    • use time as an input (see the encoding sketch below)
    • use a [daily] average
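
One common way to “use time as an input” for cyclic data is a sine/cosine encoding, sketched below (the diurnal and annual periods chosen here are illustrative assumptions):

import numpy as np

# Encode hour-of-day and day-of-year as points on a circle, so that
# 23:59 and 00:01 (or 31 Dec and 1 Jan) map to nearby feature values.
def cyclic_features(hour_of_day, day_of_year):
    return np.array([
        np.sin(2 * np.pi * hour_of_day / 24),
        np.cos(2 * np.pi * hour_of_day / 24),
        np.sin(2 * np.pi * day_of_year / 365.25),
        np.cos(2 * np.pi * day_of_year / 365.25),
    ])

print(cyclic_features(hour_of_day=23.9, day_of_year=364.9))
print(cyclic_features(hour_of_day=0.1, day_of_year=0.1))  # nearly identical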

Training data - implications

  • A NN only knows as much as its training data.
  • How do you predict the 1/100 event? 1/1000 event?
  • How do you train for a changing climate?
    • And tipping points?

Image by NASA

Structure/Physics-informed approach

There is a wide variety of ways to structure a Neural Net.

What is the most appropriate for our application?

What are the potential pitfalls? Don’t go in blind with an ML hammer!

Case study of Ukkonen (2022) for emulating radiative transfer:

  • a Recurrent Neural Network reflects physical propagation,
  • and prevents spurious correlations (see the sketch after this list).
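
A minimal sketch of that general idea (not the architecture of the paper; feature counts and layer sizes are assumptions) treats the atmospheric column as a sequence of levels and sweeps a recurrent network up and down it:

import torch
from torch import nn

# Sketch: a bidirectional RNN passes information level by level through the
# column, loosely mirroring upward/downward propagation through the atmosphere.
class ColumnRNN(nn.Module):
    def __init__(self, n_features=8, hidden=64, n_outputs=2):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_outputs)

    def forward(self, column):          # column: (batch, n_levels, n_features)
        states, _ = self.rnn(column)    # upward and downward sweeps
        return self.head(states)        # per-level outputs, e.g. fluxes

model = ColumnRNN()
columns = torch.randn(4, 60, 8)         # stand-in batch of 60-level columns
per_level = model(columns)              # shape (4, 60, 2)

The intent is that information flows only along the column, making it harder for the network to pick up spurious correlations between physically unconnected levels.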

Physical Compatibility

Many ML applications in climate science are more complex than other classical applications.

  • our ML usage is often not end-to-end
  • A stable/accurate offline model will not necessarily be stable online (Furner et al. 2023).

Your NN is perfectly happy to have ‘negative rain’.

  • Even with heavy penalties
  • This is not a new problem in numerical parameterisations.
  • How is it best to enforce physical constraints in NNs? (One option is sketched below.)
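
One simple option is to build the constraint into the network’s output layer rather than relying on loss penalties. A minimal sketch (names and sizes are illustrative assumptions):

import torch
from torch import nn

# Make 'negative rain' impossible by construction: the final activation
# maps any real value to a non-negative one.
class PrecipHead(nn.Module):
    def __init__(self, n_in=32):
        super().__init__()
        self.linear = nn.Linear(n_in, 1)

    def forward(self, features):
        # softplus keeps the output >= 0 and remains differentiable,
        # unlike clipping after the fact or a soft penalty in the loss
        return nn.functional.softplus(self.linear(features))

head = PrecipHead()
print((head(torch.randn(5, 32)) >= 0).all())  # True by construction

Harder constraints, such as exact conservation of energy or moisture, generally need bespoke output layers or post-processing.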

Redeployability

How easy is it to redeploy an ML model? Exactly what has it learned?

  • Locked to a geographical location?
  • Locked to numerical model?
    • Locked to a specific grid!?
  • How do we handle inputs from different models?

Interfacing

Replacing physics-based components of larger models (emulation or data-driven) requires care.

  • Language interoperation
  • Physical compatibility

Interfacing - Possible solutions

Ideally need to:

  • Not generate excess additional work for user
    • Not require excess knowledge of computing
    • Minimal learning curve
  • Not add excess dependencies
  • Be easy to maintain
  • Maximise performance

Interfacing - Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge

Interfacing - Our Solution

Figure: Python env / Python runtime (xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5)

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
    • avoids the Python runtime
    • no-copy transfer (shared memory)
  • Ease of use
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
    • pleasant API (see next slides)
    • utilities for generating TorchScript/TF module provided
    • examples provided
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
  • Use frameworks’ implementations directly
    • feature support
    • future support
    • direct translation of Python models

Code Example - PyTorch

  • Take model file
  • Save as torchscript
import torch
import my_ml_model

# load the trained model (my_ml_model is the user's own module)
trained_model = my_ml_model.initialize()
# convert to TorchScript and save to file for loading from Fortran
scripted_model = torch.jit.script(trained_model)
scripted_model.save("my_torchscript_model.pt")

Code Example - PyTorch

Necessary imports:

use, intrinsic :: iso_c_binding, only: c_int64_t, c_float, c_char, &
                                       c_null_char, c_ptr, c_loc
use ftorch

Loading a pytorch model:

model = torch_module_load('/path/to/saved/model.pt'//c_null_char)

Code Example - PyTorch

Tensor creation from Fortran arrays:

! Fortran variables
real, dimension(:,:), target :: SST, model_output
! C/Torch variables
integer(c_int), parameter :: dims_T = 2
integer(c_int64_t) :: shape_T(dims_T)
integer(c_int), parameter :: n_inputs = 1
type(torch_tensor), dimension(n_inputs), target :: model_inputs
type(torch_tensor) :: model_output_T

shape_T = shape(SST)

model_inputs(1) = torch_tensor_from_blob(c_loc(SST), dims_T, shape_T, &
                                         torch_kFloat32, torch_kCPU)

model_output_T = torch_tensor_from_blob(c_loc(model_output), dims_T, shape_T, &
                                        torch_kFloat32, torch_kCPU)

Code Example - PyTorch

Running the model:

call torch_module_forward(model, model_inputs, n_inputs, model_output_T)

Cleaning up:

call torch_tensor_delete(model_inputs(1))
call torch_module_delete(model)

Further information

References

Bi, Kaifeng, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. 2022. “Pangu-Weather: A 3d High-Resolution Model for Fast and Accurate Global Weather Forecast.” arXiv Preprint arXiv:2211.02556.
———. 2023. “Accurate Medium-Range Global Weather Forecasting with 3D Neural Networks.” Nature, 1–6.
Espinosa, Zachary I, Aditi Sheshadri, Gerald R Cain, Edwin P Gerber, and Kevin J DallaSanta. 2022. “Machine Learning Gravity Wave Parameterization Generalizes to Capture the QBO and Response to Increased CO2.” Geophysical Research Letters 49 (8): e2022GL098174.
Furner, Rachel, Peter Haynes, Dave Munday, Brooks Paige, Emily Shuckburgh, et al. 2023. “An Iterative Data-Driven Emulator of an Ocean General Circulation Model.”
Giglio, Donata, Vyacheslav Lyubchich, and Matthew R Mazloff. 2018. “Estimating Oxygen in the Southern Ocean Using Argo Temperature and Salinity.” Journal of Geophysical Research: Oceans 123 (6): 4280–97.
Harris, Lucy, Andrew TT McRae, Matthew Chantry, Peter D Dueben, and Tim N Palmer. 2022. “A Generative Deep Learning Approach to Stochastic Downscaling of Precipitation Forecasts.” Journal of Advances in Modeling Earth Systems 14 (10): e2022MS003120.
Kashinath, Karthik, M Mustafa, Adrian Albert, JL Wu, C Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, et al. 2021. “Physics-Informed Machine Learning: Case Studies for Weather and Climate Modelling.” Philosophical Transactions of the Royal Society A 379 (2194): 20200093.
Ma, Donglai, Jacob Bortnik, Edurado Alves, Enrico Camporeale, Xiangning Chu, and Adam Kellerman. 2021. “Data-Driven Discovery of the Governing Equations Describing Radiation Belt Dynamics.” In AGU Fall Meeting Abstracts, 2021:SA15B–1928.
Pathak, Jaideep, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, et al. 2022. “Fourcastnet: A Global Data-Driven High-Resolution Weather Model Using Adaptive Fourier Neural Operators.” arXiv Preprint arXiv:2202.11214.
Rasp, Stephan, Peter D Dueben, Sebastian Scher, Jonathan A Weyn, Soukayna Mouatadid, and Nils Thuerey. 2020. “WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting.” Journal of Advances in Modeling Earth Systems 12 (11): e2020MS002203.
Shao, Qi, Wei Li, Guijun Han, Guangchao Hou, Siyuan Liu, Yantian Gong, and Ping Qu. 2021. “A Deep Learning Model for Forecasting Sea Surface Height Anomalies and Temperatures in the South China Sea.” Journal of Geophysical Research: Oceans 126 (7): e2021JC017515.
Sheridan, Peter, Simon Vosper, and Philip Brown. 2017. “Mountain Waves in High Resolution Forecast Models: Automated Diagnostics of Wave Severity and Impact on Surface Winds.” Atmosphere 8 (1): 24.
Ukkonen, Peter. 2022. “Exploring Pathways to More Accurate Machine Learning Emulation of Atmospheric Radiative Transfer.” Journal of Advances in Modeling Earth Systems 14 (4): e2021MS002875.
Yuval, Janni, and Paul A O’Gorman. 2020. “Stable Machine-Learning Parameterization of Subgrid Processes for Climate Modeling at a Range of Resolutions.” Nature Communications 11 (1): 3295.
Zanna, Laure, and Thomas Bolton. 2020. “Data-Driven Equation Discovery of Ocean Mesoscale Closures.” Geophysical Research Letters 47 (17): e2020GL088376.

Slides

These slides can be viewed at:
https://cambridge-iccs.github.io/slides/ml-training/applications.html

The html and source can be found on GitHub

Contact

For more information we can be reached at:

You can also contact the ICCS, make a resource allocation request, or visit us at the Summer School RSE Helpdesk.