Experiences using cf-python in cyclone tracking

CF Conventions Workshop 2025

Jack Atkinson

Senior Research Software Engineer
ICCS - University of Cambridge

2025-09-22

Precursors

Slides and Materials

To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo under CC0 1.0 or FontAwesome under SIL OFL 1.1.

Introduction

  • Senior Research Software Engineer at ICCS, University of Cambridge.
  • Experience with climate models and metadata standards.
  • Contributed SGRID conventions to the xgcm package:

XGCM Logo by XGCM under fair use
ICCS and University Logos with permission

FAIR

FAIR data and software are increasingly important:

  • Findable
  • Accessible
  • Interoperable
  • Reusable

BSBR by the SSI under fair use

Cyclone Tracking

  • Several cyclone tracking software tools exist:
    • Tempest Extremes,
    • TRACK,
    • TSTORMS, etc.
  • Often compiled codes/libraries
  • Generic approach: detect and stitch
  • Varying degrees of documentation
  • Each has its own data representation

Tracks generated using Tempest Extremes from JFM 1950 HadGEM3 data

TCTrack

  • TCTrack aims to provide:
    • Unified interface for different algorithms,
    • Standardised CF-compliant outputs,
    • Data for repeated use:
      • different time,
      • different place.

Tracks generated using Tempest Extremes from JFM 1950 HadGEM3 data

/Cambridge-ICCS/TCTrack

tctrack.readthedocs.io

Raw Output Data: TE GFDL

start   16      1950    1       1       6
        572     320     201.269531      -14.882813      1.003118e+05    0.000000e+00    1.389075e+01    1950    1       1       6
        569     318     200.214844      -15.351562      1.001075e+05    0.000000e+00    1.546506e+01    1950    1       1       12
        567     317     199.511719      -15.585937      1.001445e+05    0.000000e+00    1.590697e+01    1950    1       1       18
        565     320     198.808594      -14.882813      1.000154e+05    0.000000e+00    1.531368e+01    1950    1       2       0
        563     321     198.105469      -14.648438      1.001127e+05    0.000000e+00    1.522812e+01    1950    1       2       6
        555     321     195.292969      -14.648438      1.001399e+05    0.000000e+00    1.413657e+01    1950    1       3       0

...

start   22      1950    1       1       6
        163     332     57.480469       -12.070312      1.005820e+05    0.000000e+00    1.153856e+01    1950    1       1       6
        161     327     56.777344       -13.242188      1.003331e+05    0.000000e+00    9.956528e+00    1950    1       1       12
        158     324     55.722656       -13.945312      1.005362e+05    0.000000e+00    1.123346e+01    1950    1       1       18
        157     322     55.371094       -14.414062      1.001472e+05    0.000000e+00    1.184854e+01    1950    1       2       0
        156     317     55.019531       -15.585937      1.003412e+05    0.000000e+00    1.290700e+01    1950    1       2       6
        155     311     54.667969       -16.992188      1.001394e+05    0.000000e+00    1.204769e+01    1950    1       2       12

...
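A minimal sketch of reading this layout into per-track lists, using only the standard library. It assumes that each `start` line opens a new track and that the data columns follow the order of the CSV header shown in this deck (i, j, lon, lat, psl, orog, sfcWind, then year, month, day, hour). The function name `parse_gfdl` is hypothetical and is not TCTrack's API.

```python
def parse_gfdl(text):
    """Split GFDL-style text into a list of tracks, each a list of point dicts."""
    tracks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "...":
            continue  # skip blank lines and elisions
        if fields[0] == "start":
            tracks.append([])  # a header line opens a new track
            continue
        if not tracks:
            raise ValueError("data line found before any 'start' header")
        tracks[-1].append({
            "i": int(fields[0]),
            "j": int(fields[1]),
            "lon": float(fields[2]),
            "lat": float(fields[3]),
            "psl": float(fields[4]),
            "orog": float(fields[5]),
            "sfcWind": float(fields[6]),
            "time": tuple(int(v) for v in fields[7:11]),  # (year, month, day, hour)
        })
    return tracks


# Two short tracks copied from the listing above.
sample = """\
start   16      1950    1       1       6
        572     320     201.269531      -14.882813      1.003118e+05    0.000000e+00    1.389075e+01    1950    1       1       6
        569     318     200.214844      -15.351562      1.001075e+05    0.000000e+00    1.546506e+01    1950    1       1       12
start   22      1950    1       1       6
        163     332     57.480469       -12.070312      1.005820e+05    0.000000e+00    1.153856e+01    1950    1       1       6
"""

tracks = parse_gfdl(sample)
print(len(tracks), tracks[0][0]["lon"], tracks[1][0]["time"])
```

The second field of each `start` line is not interpreted here; the parser relies only on the header lines as track separators.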

Raw Output Data: TE CSV

track_id, year, month, day, hour, i, j, lon, lat, psl, orog, sfcWind
0, 1950, 1, 1, 6, 572, 320, 201.269531, -14.882813, 1.003118e+05, 0.000000e+00, 1.389075e+01
0, 1950, 1, 1, 12, 569, 318, 200.214844, -15.351562, 1.001075e+05, 0.000000e+00, 1.546506e+01
0, 1950, 1, 1, 18, 567, 317, 199.511719, -15.585937, 1.001445e+05, 0.000000e+00, 1.590697e+01
0, 1950, 1, 2, 0, 565, 320, 198.808594, -14.882813, 1.000154e+05, 0.000000e+00, 1.531368e+01
0, 1950, 1, 2, 6, 563, 321, 198.105469, -14.648438, 1.001127e+05, 0.000000e+00, 1.522812e+01
0, 1950, 1, 3, 0, 555, 321, 195.292969, -14.648438, 1.001399e+05, 0.000000e+00, 1.413657e+01

...

1, 1950, 1, 1, 6, 163, 332, 57.480469, -12.070312, 1.005820e+05, 0.000000e+00, 1.153856e+01
1, 1950, 1, 1, 12, 161, 327, 56.777344, -13.242188, 1.003331e+05, 0.000000e+00, 9.956528e+00
1, 1950, 1, 1, 18, 158, 324, 55.722656, -13.945312, 1.005362e+05, 0.000000e+00, 1.123346e+01
1, 1950, 1, 2, 0, 157, 322, 55.371094, -14.414062, 1.001472e+05, 0.000000e+00, 1.184854e+01
1, 1950, 1, 2, 6, 156, 317, 55.019531, -15.585937, 1.003412e+05, 0.000000e+00, 1.290700e+01
1, 1950, 1, 2, 12, 155, 311, 54.667969, -16.992188, 1.001394e+05, 0.000000e+00, 1.204769e+01

...

Raw Output Data: TRACK

netcdf ff_trs.test_out {
dimensions:
        tracks = 703 ;
        record = UNLIMITED ; // (78 currently)
variables:
        int TRACK_ID(tracks) ;
                TRACK_ID:add_fld_num = 0 ;
                TRACK_ID:tot_add_fld_num = 0 ;
                TRACK_ID:loc_flags = "" ;
        int FIRST_PT(tracks) ;
        int NUM_PTS(tracks) ;
        int index(record) ;
        int time(record) ;
        float longitude(record) ;
        float latitude(record) ;
        float intensity(record) ;
}
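The header above suggests a flat layout: per-point values live on the `record` dimension, while `FIRST_PT` and `NUM_PTS` locate each track within it. A minimal sketch of recovering per-track values, assuming `FIRST_PT` is a zero-based offset and `NUM_PTS` a point count (neither is confirmed by the header alone). The helper `split_tracks` and the numbers are illustrative only.

```python
def split_tracks(first_pt, num_pts, values):
    """Slice a flat per-record sequence into one list per track."""
    return [values[start:start + count] for start, count in zip(first_pt, num_pts)]


# Illustrative values only, not taken from a real TRACK file.
first_pt = [0, 3]
num_pts = [3, 2]
longitude = [10.0, 10.5, 11.0, 200.0, 199.5]

per_track = split_tracks(first_pt, num_pts, longitude)
print(per_track)
```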

Process and Challenges

  • Read generated data into a generic Track class
  • Write CF-compliant NetCDF using cf-python.
  • Challenges:
    • Back-filling missing metadata:
      • extract from input files,
      • based on algorithm outputs,
      • stored in Python object.
    • Understanding CF data structures:
      • Fields = variables, not trajectories.

cf-python Field structure flowchart from the cf-python documentation
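Because each Field is a variable rather than a trajectory, tracks of different lengths have to be laid out on a shared (trajectory, observation) grid before writing. A minimal sketch of that step, assuming a simplified `Track` container (TCTrack's real class may differ) and NaN as the padding value for missing observations:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical, simplified stand-in for a generic track container."""
    track_id: int
    lat: list = field(default_factory=list)
    lon: list = field(default_factory=list)


def pad_variable(tracks, name, fill=math.nan):
    """Return one row per track for attribute `name`, padded to equal length."""
    width = max(len(getattr(t, name)) for t in tracks)
    rows = []
    for t in tracks:
        values = list(getattr(t, name))
        rows.append(values + [fill] * (width - len(values)))
    return rows


# Points taken from the raw output shown earlier in this deck.
tracks = [
    Track(0, lat=[-14.882813, -15.351562, -15.585937],
          lon=[201.269531, 200.214844, 199.511719]),
    Track(1, lat=[-12.070312, -13.242188],
          lon=[57.480469, 56.777344]),
]

lat2d = pad_variable(tracks, "lat")
lon2d = pad_variable(tracks, "lon")
print(lat2d)
```

Each padded 2D list then corresponds to one variable, with latitude, longitude and time acting as coordinates shared by all the data variables.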

CF-Compliant Output Data

netcdf my_cf_tracks {
dimensions:
        trajectory = 36 ;
        observation = 60 ;
variables:
        int64 trajectory(trajectory) ;
                trajectory:standard_name = "trajectory" ;
                trajectory:cf_role = "trajectory_id" ;
                trajectory:long_name = "trajectory index" ;
        int64 observation(observation) ;
                observation:standard_name = "observation" ;
                observation:long_name = "observation index" ;
        double time(trajectory, observation) ;
                time:standard_name = "time" ;
                time:long_name = "time" ;
                time:units = "days since 1970-01-01" ;
                time:calendar = "360_day" ;
        double lat(trajectory, observation) ;
                lat:standard_name = "lat" ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lon(trajectory, observation) ;
                lon:standard_name = "lon" ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
        double data(trajectory, observation) ;
                data:coordinates = "time lat lon" ;
        double data_1(trajectory, observation) ;
                data_1:coordinates = "time lat lon" ;
        double air_pressure_at_sea_level(trajectory, observation) ;
                air_pressure_at_sea_level:standard_name = "air_pressure_at_sea_level" ;
                air_pressure_at_sea_level:long_name = "Sea Level Pressure" ;
                air_pressure_at_sea_level:units = "Pa" ;
                air_pressure_at_sea_level:coordinates = "time lat lon" ;
                air_pressure_at_sea_level:cell_methods = "area: point" ;
        double surface_altitude(trajectory, observation) ;
                surface_altitude:standard_name = "surface_altitude" ;
                surface_altitude:long_name = "Surface Altitude" ;
                surface_altitude:units = "m" ;
                surface_altitude:coordinates = "time lat lon" ;
                surface_altitude:cell_methods = "area: point" ;
        double wind_speed(trajectory, observation) ;
                wind_speed:standard_name = "wind_speed" ;
                wind_speed:long_name = "Near-Surface Wind Speed" ;
                wind_speed:units = "m s-1" ;
                wind_speed:coordinates = "time lat lon" ;
                wind_speed:cell_methods = "area: maximum (great circle of radius 2.0 degrees)" ;

// global attributes:
                :Conventions = "CF-1.12" ;
                :featureType = "trajectory" ;
}
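The `time` variable above uses units of "days since 1970-01-01" with a `360_day` calendar, in which every month has exactly 30 days. A minimal sketch of converting the (year, month, day, hour) columns of the raw output to that encoding; the function name is hypothetical, and in practice a calendar-aware library would normally perform this conversion.

```python
def days_since_epoch_360(year, month, day, hour=0, epoch_year=1970):
    """Days since 1 January of epoch_year in a 360-day calendar (12 x 30 days)."""
    whole_days = (year - epoch_year) * 360 + (month - 1) * 30 + (day - 1)
    return whole_days + hour / 24


# First point of the first track in the raw output: 1950-01-01 06:00.
t0 = days_since_epoch_360(1950, 1, 1, 6)
print(t0)  # -7199.75
```

Dates before the epoch give negative values, which is valid for a CF time coordinate.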

Challenges

  • UDUNITS:
    • Dependency of cf-python.
    • Caused issues for:
      • Less experienced users.
      • Continuous Integration (CI) workflows and documentation.
  • Loading and using cf-python is slow:
    • Significantly increases the runtime of the test suite.
    • Looking to ‘quarantine’ its usage within TCTrack.
  • Documentation
    • There is a lot!
    • Hard to know where to start, e.g. cell methods were a little confusing at first.
    • Lots of hopping back and forth: cf.Field has a very large number of attributes and methods.
    • “Canonical use” in the conventions was very helpful
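One possible way to ‘quarantine’ a slow import is to defer it until the code path that needs it is reached, so tests that never write output do not pay the cost. This is a minimal sketch of that general pattern, not a description of how TCTrack does or will do it. The helper `load_backend` is hypothetical, and `json` stands in for `cf` so that the example runs without cf-python installed.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def load_backend(name="cf"):
    """Import a heavy module on first use and cache the module object."""
    return importlib.import_module(name)


def serialise(values, backend_name="json"):
    """Only this function touches the backend, so the import happens here."""
    backend = load_backend(backend_name)  # stand-in backend for illustration
    return backend.dumps(values)


print(serialise([1, 2, 3]))
```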

Future Work

  • Wrap further storm tracking software
  • Implement new tracking approaches in development by researchers
  • Test in MPhil projects to process CMIP data
  • Handle input data to be more faIR
  • Basic visualisation using huracanpy
  • Development of storm tracks web-dashboard by project partners

Summary

  • TCTrack:
    • Open Source Python tool.
    • Aiming to provide a uniform, documented interface to cyclone tracking tools.
    • Outputs CF-compliant data for interoperability and reuse.
  • Key Takeaways:
    • CF Conventions are useful as ‘truth’ guidance.
    • cf-python is useful:
      • a trusted method of generating CF-compliant files,
      • but quite a learning curve.

Thanks for Listening

Get in touch:

Find it on GitHub:

/Cambridge-ICCS/TCTrack

Thanks to Sam Avis and Alison Ming.

The ICCS received support from the Inigo InSPIRe project

References

Barker, Michelle, Neil P Chue Hong, Daniel S Katz, Anna-Lena Lamprecht, Carlos Martinez-Ortiz, Fotis Psomopoulos, Jennifer Harrow, et al. 2022. “Introducing the FAIR Principles for Research Software.” Scientific Data 9 (1): 622. https://doi.org/10.1038/s41597-022-01710-x.
Wilkinson, Mark D, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9. https://doi.org/10.1038/sdata.2016.18.