or, Maybe RSEs just really like a spec
2026-04-23
To access links or follow along on your own device, these slides can be found at:
jackatkinson.net/slides
Code used in demonstrations is available at:
/jatkinson1000/NetCDF-examples
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Vectors and icons by SVG Repo under CC0(1.0) or FontAwesome under SIL OFL 1.1
FAIR data and software are increasingly important:

BSBR by the SSI under fair use
Other notes:
A file containing:

UML Block diagram from the NetCDF Documentation
A file containing:
netcdf basic_dataset {
dimensions:
lon = 2 ;
lat = 5 ;
variables:
float temperature(lon, lat) ;
temperature:standard_name = "air_temperature" ;
temperature:units = "K" ;
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:units = "degree_east" ;
float lon:valid_range = 0, 360 ;
// global attributes:
:title = "A simple example NetCDF dataset" ;
data:
temperature =
1.1, 2.2, 3.3, 4.4, 5.5,
6.6, 7.7, 8.8, 9.9, 10 ;
lon = 0, 5 ;
}
A file containing:
UNLIMITED
netcdf basic_dataset {
dimensions:
time = UNLIMITED ;
lon = 2 ;
lat = 5 ;
variables:
float temperature(time, lon, lat) ;
temperature:standard_name = "air_temperature" ;
temperature:units = "K" ;
float lon(lon) ;
lon:standard_name = "longitude" ;
lon:units = "degree_east" ;
float lon:valid_range = 0, 360 ;
// global attributes:
:title = "A simple example NetCDF dataset" ;
data:
temperature =
1.1, 2.2, 3.3, 4.4, 5.5,
6.6, 7.7, 8.8, 9.9, 10 ;
lon = 0, 5 ;
}
Data Types:
double – IEEE 64-bit float
float – IEEE 32-bit float, also real
int – 32-bit signed integer, also long
short – 16-bit signed integer
byte – 8-bit signed integer
char – 8-bit character
ncview
A useful and reasonably powerful utility to quickly visualise netcdf datasets.
Installable from source or various package managers (apt, brew, spack).
Usable from remote machines with X-forwarding.
On CSD3 (Thank you to Kacper):
**demo**
We can take a look at the binary file data as an octal dump using od to get some idea of how it is packaged.
Hints can be gleaned by adding flags:
-b for bytes
-s for shorts
-c for chars
0000000 103 104 106 001 000 000 000 000 000 000 000 012 000 000 000 001
C D F 001 \0 \0 \0 \0 \0 \0 \0 \n \0 \0 \0 001
17475 326 0 0 0 2560 0 256
0000020 000 000 000 003 144 151 155 000 000 000 000 005 000 000 000 000
\0 \0 \0 003 d i m \0 \0 \0 \0 005 \0 \0 \0 \0
0 768 26980 109 0 1280 0 0
0000040 000 000 000 000 000 000 000 013 000 000 000 001 000 000 000 003
\0 \0 \0 \0 \0 \0 \0 \v \0 \0 \0 001 \0 \0 \0 003
0 0 0 2816 0 256 0 768
0000060 166 141 162 000 000 000 000 001 000 000 000 000 000 000 000 000
v a r \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0 \0
24950 114 0 256 0 0 0 0
0000100 000 000 000 000 000 000 000 003 000 000 000 014 000 000 000 120
\0 \0 \0 \0 \0 \0 \0 003 \0 \0 \0 \f \0 \0 \0 P
0 0 0 768 0 3072 0 20480
0000120 000 003 000 001 000 004 000 001 000 005 200 001
\0 003 \0 001 \0 004 \0 001 \0 005 200 001
768 256 1024 256 1280 384
0000134
Perhaps the most useful tool when working with netcdf files is ncdump, which ships with netcdf.
It allows us to “dump” out a representation of the data in the file for inspection.
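The od bytes above can be connected to what ncdump reports. Here is a minimal sketch of a toy "dump" written in Python using only the standard library. It assumes the classic (CDF version 1) layout: big-endian 32-bit integers, names zero-padded to 4 bytes, and tagged lists for dimensions, attributes and variables. `Reader` and `parse_classic` are hypothetical names. Attributes and record variables are deliberately not handled.

```python
import struct

# Octal bytes from the od -b dump above, with the offset column removed.
OD_BYTES = """
103 104 106 001 000 000 000 000 000 000 000 012 000 000 000 001
000 000 000 003 144 151 155 000 000 000 000 005 000 000 000 000
000 000 000 000 000 000 000 013 000 000 000 001 000 000 000 003
166 141 162 000 000 000 000 001 000 000 000 000 000 000 000 000
000 000 000 000 000 000 000 003 000 000 000 014 000 000 000 120
000 003 000 001 000 004 000 001 000 005 200 001
"""

ABSENT, NC_DIMENSION, NC_VARIABLE, NC_ATTRIBUTE = 0x00, 0x0A, 0x0B, 0x0C
# nc_type code -> (CDL type name, struct format character)
NC_TYPES = {1: ("byte", "b"), 2: ("char", "c"), 3: ("short", "h"),
            4: ("int", "i"), 5: ("float", "f"), 6: ("double", "d")}


class Reader:
    """Walks a classic-format header; every integer is big-endian."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def int32(self):
        (value,) = struct.unpack_from(">i", self.data, self.pos)
        self.pos += 4
        return value

    def name(self):
        length = self.int32()
        text = self.data[self.pos:self.pos + length].decode("utf-8")
        self.pos += (length + 3) // 4 * 4  # names are zero-padded to 4 bytes
        return text

    def list_length(self, expected_tag):
        tag, count = self.int32(), self.int32()
        if tag == ABSENT and count == 0:  # an empty list is written as ZERO ZERO
            return 0
        if tag != expected_tag:
            raise ValueError(f"expected tag {expected_tag:#x}, found {tag:#x}")
        return count


def parse_classic(data):
    if data[:3] != b"CDF" or data[3] != 1:
        raise ValueError("this sketch only reads the classic (version 1) format")
    r = Reader(data)
    r.pos = 4  # skip magic number and version byte
    numrecs = r.int32()
    dims = [(r.name(), r.int32()) for _ in range(r.list_length(NC_DIMENSION))]
    if r.list_length(NC_ATTRIBUTE):
        raise NotImplementedError("global attributes are not handled here")
    variables = {}
    for _ in range(r.list_length(NC_VARIABLE)):
        name = r.name()
        ndims = r.int32()
        dimids = [r.int32() for _ in range(ndims)]
        if r.list_length(NC_ATTRIBUTE):
            raise NotImplementedError("variable attributes are not handled here")
        nc_type, vsize, begin = r.int32(), r.int32(), r.int32()
        count = 1
        for d in dimids:
            if dims[d][1] == 0:
                raise NotImplementedError("record variables are not handled here")
            count *= dims[d][1]
        type_name, fmt = NC_TYPES[nc_type]
        values = list(struct.unpack_from(f">{count}{fmt}", data, begin))
        variables[name] = (type_name, [dims[d][0] for d in dimids], values)
    return {"numrecs": numrecs, "dims": dims, "variables": variables}


data = bytes(int(tok, 8) for tok in OD_BYTES.split())
print(parse_classic(data))
```

Read this way, the file holds one dimension, `dim = 5`, and one short variable, `var(dim)`, containing 3, 1, 4, 1, 5. The trailing 200 001 (0x8001, i.e. -32767, NetCDF's default short fill value) only pads the 10 data bytes up to the 12-byte `vsize`. Note also that `od -s` interprets byte pairs in the host's byte order, which is little-endian on most machines. That is why 000 003 shows up as 768 rather than 3.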
The natural question to ask is: is this reversible?
Yes (mostly)
The “dump” representation of NetCDF data is itself a language:
CDL - The NetCDF Common Data Language
Comes complete with its own language specification
The basis of much knowledge, and a constant reference in preparing this talk!
The “mostly” is because CDL comments (//) are not preserved.
So how do we actually go in the other direction?
ncgen
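ncgen does this properly, starting from CDL. As a toy illustration of the same direction, the sketch below packs the classic file from the od dump by hand. Reading the dump, that file appears to correspond to `dimensions: dim = 5 ; variables: short var(dim) ; data: var = 3, 1, 4, 1, 5 ;`. `write_short_1d` is a hypothetical helper, not part of any NetCDF API. It assumes one dimension, one short variable, no attributes and no record dimension. It pads the data with the default short fill value, which matches what the dump shows.

```python
import struct

NC_DIMENSION, NC_VARIABLE = 0x0A, 0x0B
NC_SHORT = 3
NC_FILL_SHORT = -32767                 # NetCDF's default fill value for short
ABSENT = struct.pack(">ii", 0, 0)      # an empty (absent) attribute list


def pad4(n):
    """Round n up to a multiple of 4, as the classic format requires."""
    return (n + 3) // 4 * 4


def encode_name(name):
    raw = name.encode("utf-8")
    return struct.pack(">i", len(raw)) + raw.ljust(pad4(len(raw)), b"\0")


def write_short_1d(dim_name, var_name, values):
    """Hypothetical helper: one dimension, one short variable, no attributes."""
    header = bytearray(b"CDF\x01")                   # magic number + version 1 (classic)
    header += struct.pack(">i", 0)                   # numrecs: no record dimension
    header += struct.pack(">ii", NC_DIMENSION, 1)    # dimension list with one entry
    header += encode_name(dim_name) + struct.pack(">i", len(values))
    header += ABSENT                                 # no global attributes
    header += struct.pack(">ii", NC_VARIABLE, 1)     # variable list with one entry
    header += encode_name(var_name)
    header += struct.pack(">ii", 1, 0)               # ndims = 1, dimension id 0
    header += ABSENT                                 # no variable attributes
    vsize = pad4(2 * len(values))                    # 2 bytes per short, padded to 4
    begin = len(header) + 12                         # data follows the last three ints
    header += struct.pack(">iii", NC_SHORT, vsize, begin)

    body = struct.pack(f">{len(values)}h", *values)
    while len(body) < vsize:                         # pad with the fill value
        body += struct.pack(">h", NC_FILL_SHORT)
    return bytes(header) + body


print(write_short_1d("dim", "var", [3, 1, 4, 1, 5]).hex(" "))
```

Hard-coding a single layout, rather than writing a general encoder, keeps every header field visible in file order, so each line can be compared against the od output.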
Everything so far has been what is called the NetCDF Classic model.
In this model, packing data into a binary representation is handled entirely by NetCDF itself. It runs up to (and including) NetCDF 3.
NetCDF 4 introduced a new underlying format building on HDF5.
NetCDF4 is a subset of HDF5. We can verify this with:
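One way to make a similar check without any NetCDF tools is to look at the first bytes of the file. This is a minimal sketch, not the command used in the talk. NetCDF-4 files start with the 8-byte HDF5 signature, whereas the older formats start with CDF followed by a version byte. `guess_kind` is a hypothetical helper. It assumes the HDF5 signature sits at offset 0 (HDF5 also allows it at later offsets after a user block), and it cannot tell netCDF-4 apart from netCDF-4 classic model files. `ncdump -k` remains the proper tool.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
CDF_VERSIONS = {1: "classic", 2: "64-bit offset", 5: "64-bit data (CDF-5)"}


def guess_kind(header):
    """Guess the NetCDF flavour from the first 8 bytes of a file."""
    if header[:3] == b"CDF" and len(header) >= 4:
        return CDF_VERSIONS.get(header[3], "unknown CDF version")
    if header[:8] == HDF5_SIGNATURE:
        # netCDF-4 and netCDF-4 classic model look identical at this level
        return "HDF5 (netCDF-4)"
    return "not recognised"


def guess_kind_of_file(path):
    with open(path, "rb") as f:
        return guess_kind(f.read(8))


print(guess_kind(b"CDF\x01\x00\x00\x00\x00"))
```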
This allows:
UML Block diagram from the NetCDF Documentation
Strings:
Variable-length arrays of UTF-8 encoded Unicode.
Does what Unicode did for everyone, everywhere.
netcdf languages {
dimensions:
n_lang = 5 ;
variables:
string languages(n_lang) ;
string phrase(n_lang) ;
data:
languages = "English", "Ogham", "Welsh", "Anglo-Saxon", "Braille" ;
phrase =
"Hello World!",
"/ ᚛᚛ᚉᚑᚅᚔᚉᚉᚔᚋ ᚔᚈᚔ ᚍᚂᚐᚅᚑ ᚅᚔᚋᚌᚓᚅᚐ᚜",
"Dw i'n gallu bwyta gwydr, 'dyw e ddim yn gwneud dolur i mi.",
"/ ᛁᚳ᛫ᛗᚨᚷ᛫ᚷᛚᚨᛋ᛫ᛖᚩᛏᚪᚾ᛫ᚩᚾᛞ᛫ᚻᛁᛏ᛫ᚾᛖ᛫ᚻᛖᚪᚱᛗᛁᚪᚧ᛫ᛗᛖ᛬",
"/ ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑" ;
}
Enums:
Store integer values that can be converted to a string:
netcdf clouds {
types:
ubyte enum cloud_class_t {Clear = 0, Cumulonimbus = 1, Stratus = 2,
Stratocumulus = 3, Cumulus = 4, Altostratus = 5, Nimbostratus = 6,
Altocumulus = 7, Cirrostratus = 8, Cirrocumulus = 9, Cirrus = 10,
Missing = 255} ;
dimensions:
station = 5 ;
variables:
cloud_class_t primary_cloud(station) ;
data:
primary_cloud = Clear, Stratus, Clear, Cumulonimbus, Cirrus ;
}
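In the clouds example, the variable holds only the integer codes, and the names come from the enum type definition. Below is a sketch of how a reader maps one to the other, using a Python `IntEnum` that mirrors `cloud_class_t`. `CloudClass` is a hypothetical mirror written by hand, not something generated by a NetCDF library.

```python
from enum import IntEnum


class CloudClass(IntEnum):
    """Hand-written mirror of the cloud_class_t enum from the clouds CDL."""
    Clear = 0
    Cumulonimbus = 1
    Stratus = 2
    Stratocumulus = 3
    Cumulus = 4
    Altostratus = 5
    Nimbostratus = 6
    Altocumulus = 7
    Cirrostratus = 8
    Cirrocumulus = 9
    Cirrus = 10
    Missing = 255


# The data for primary_cloud is just ubyte codes...
stored = bytes([0, 2, 0, 1, 10])
# ...and the names are recovered from the type definition when reading.
decoded = [CloudClass(code).name for code in stored]
print(decoded)
```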
netcdf irish_rover {
types:
uint64 enum cargo {bags\ of\ the\ best\ Sligo\ rags = 1000000,
barrels\ of\ bones = 2000000,
bails\ of\ old\ nanny\ goats\'\ tails = 3000000,
barrels\ of\ stones = 4000000, dogs = 5000000, hogs = 6000000,
barrels\ of\ porter = 7000000,
sides\ of\ old\ blind\ horses\ hides = 8000000} ;
dimensions:
dim = 4 ;
variables:
cargo in_the_hold_of_the_Irish_Rover(dim) ;
data:
in_the_hold_of_the_Irish_Rover = bags\ of\ the\ best\ Sligo\ rags ,
barrels\ of\ bones, hogs, dogs ;
}
Variable length (VLEN) types:
Arrays of a given type, with variable length allowed.
Denoted by enclosing {}
Compound types:
Like C structs.
A combination of other types.
Opaque types:
Raw data represented as a hex string (0x....)
An integer length gives the number of bytes per blob. Recall our experiments with od…
Let’s not go to opaque, it’s a silly place.
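For anyone who goes to opaque anyway, here is a sketch of how a hex literal relates to the declared byte length: two hex digits per byte. It assumes a type declared along the lines of `opaque(4) blob_t ;`, where `blob_t` is a hypothetical name. `parse_opaque` and `format_opaque` are also hypothetical helpers. The sketch insists that the literal matches the declared length exactly, rather than guessing how ncgen treats literals that are too short or too long.

```python
def parse_opaque(literal, size):
    """Turn a hex literal such as 0xCAFEBABE into exactly `size` raw bytes."""
    if literal[:2].lower() != "0x":
        raise ValueError("opaque literals are written as 0x followed by hex digits")
    digits = literal[2:]
    if len(digits) != 2 * size:  # two hex digits per byte
        raise ValueError(f"opaque({size}) needs {2 * size} hex digits, got {len(digits)}")
    return bytes.fromhex(digits)


def format_opaque(blob):
    """Render raw bytes back into a 0x... hex literal."""
    return "0x" + blob.hex().upper()


blob = parse_opaque("0xCAFEBABE", 4)
print(blob, format_opaque(blob))
```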
Groups:
Organise your data like a Unix filesystem with the power of recursion!
Refer to the contents of other groups!
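The groups example that follows refers to dimensions by path, such as /grp1/dim. Below is a minimal sketch of how such references can be resolved. It assumes the NetCDF-4 rule that a dimension is visible in the group that defines it and in all descendant groups, so an unqualified name is searched for in the current group and then in its ancestors. `TREE` mirrors the dimensions of the groups example, plus a hypothetical empty subgroup `grp1/sub` (not in the CDL) to show the upward search. `lookup_dim` is a hypothetical helper.

```python
# Mirror of the dimensions in the groups CDL, plus a hypothetical empty
# subgroup grp1/sub (not in the CDL) to exercise the upward search.
TREE = {
    "dims": {"dim": 4},
    "groups": {
        "grp1": {"dims": {"dim": 2}, "groups": {"sub": {"dims": {}, "groups": {}}}},
        "grp2": {"dims": {"dim": 2}, "groups": {}},
    },
}


def walk(tree, parts):
    """Follow a list of group names down from the root group."""
    for part in parts:
        tree = tree["groups"][part]
    return tree


def lookup_dim(tree, current, ref):
    """Return the length of dimension `ref` as seen from group path `current`."""
    if ref.startswith("/"):  # absolute reference, e.g. /grp1/dim or /dim
        *groups, name = ref.strip("/").split("/")
        return walk(tree, groups)["dims"][name]
    # Unqualified name: try the current group, then each ancestor up to the root.
    parts = [p for p in current.split("/") if p]
    while True:
        dims = walk(tree, parts)["dims"]
        if ref in dims:
            return dims[ref]
        if not parts:
            raise KeyError(f"dimension {ref!r} not visible from {current!r}")
        parts.pop()


# Shape of grp2's var(/grp1/dim, /dim)
print((lookup_dim(TREE, "/grp2", "/grp1/dim"), lookup_dim(TREE, "/grp2", "/dim")))
```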
try:
netcdf groups {
dimensions:
dim = 4 ;
variables:
float var(dim) ;
data:
var = 1, 2, 3, 4 ;
group: grp1 {
dimensions:
dim = 2 ;
variables:
float var(dim) ;
data:
var = -1, -2 ;
} // group grp1
group: grp2 {
dimensions:
dim = 2 ;
variables:
float var(/grp1/dim, /dim) ;
data:
var = 5, 6, 7, 8,
-1, -2, -3, -4 ;
} // group grp2
}
ncdump:
# What kind of NetCDF file are we dealing with?
ncdump -k myfile.nc
# File information and metadata (for NetCDF4)
ncdump -s myfile.nc
ncgen:
And what I love about RSE.
4us or 4su?
ncgen only parses the u first, e.g. 4us (unsigned short).
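Below is a minimal sketch of that suffix rule as a tiny lexer for CDL integer literals. The suffix table is an assumption based on the CDL integer suffixes: b, s, l and ll, with u marking unsigned, as in ub, us, u and ull. For simplicity the table is matched case-insensitively, and hex or octal literals are not handled. Following the behaviour described above, only the u-first form is accepted, so 4su is rejected. `classify` is a hypothetical name.

```python
import re

# Assumed suffix -> CDL type table (simplified; matched case-insensitively).
SUFFIX_TYPES = {
    "": "int", "l": "int", "u": "uint",
    "b": "byte", "ub": "ubyte",
    "s": "short", "us": "ushort",
    "ll": "int64", "ull": "uint64",
}
INTEGER = re.compile(r"([+-]?\d+)([A-Za-z]*)")


def classify(literal):
    """Split a CDL integer literal such as '4us' into (value, type name)."""
    match = INTEGER.fullmatch(literal)
    if match is None or match.group(2).lower() not in SUFFIX_TYPES:
        raise ValueError(f"not accepted as a CDL integer literal: {literal!r}")
    return int(match.group(1)), SUFFIX_TYPES[match.group(2).lower()]


print(classify("4us"))
```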
Figure I.1 from the CF-Conventions
If you use nothing else, use these!
standard_name
units of the quantity
standard_name, not free to choose!
long_name is optional and can be descriptive
“Timezones are hard” - C. Edsall (2023)
units conforming to UDUNITS
calendar
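Below is a minimal sketch of decoding CF time values with only the Python standard library. It assumes units of the form "<unit> since <ISO 8601 reference time>", with a UTC reference and a unit of seconds, minutes, hours or days. Python's `datetime` uses the proleptic Gregorian calendar. The sketch therefore refuses other calendars (noleap, 360_day, ...), and it refuses dates before 1582-10-15 under the standard calendar, where CF switches to Julian. `decode_cf_time` is a hypothetical helper; the cftime library exists for the general case.

```python
from datetime import datetime, timedelta

# Only plural unit names are handled here; UDUNITS accepts many more spellings.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}
SUPPORTED_CALENDARS = {"proleptic_gregorian", "standard", "gregorian"}
GREGORIAN_START = datetime(1582, 10, 15)  # before this, "standard" is Julian


def decode_cf_time(values, units, calendar="standard"):
    """Convert numeric CF time values to naive datetimes (assumed UTC)."""
    if calendar not in SUPPORTED_CALENDARS:
        raise NotImplementedError(f"calendar {calendar!r} needs a dedicated library")
    unit, since, reference = units.split(maxsplit=2)
    if since != "since" or unit not in SECONDS_PER_UNIT:
        raise ValueError(f"unsupported units string: {units!r}")
    origin = datetime.fromisoformat(reference)  # e.g. "2000-01-01 00:00:00"
    times = [origin + timedelta(seconds=v * SECONDS_PER_UNIT[unit]) for v in values]
    if calendar != "proleptic_gregorian" and min([origin, *times]) < GREGORIAN_START:
        raise NotImplementedError("mixed Julian/Gregorian dates are not handled")
    return times


print(decode_cf_time([0, 1.5, 31], "days since 2000-01-01"))
```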
netcdf ancillary {
...
variables:
float u(time, z);
u:standard_name = "wind_speed";
u:units = "m s-1";
u:long_name = "Windspeed measured during radiosonde ascent";
u:ancillary_variables = "u_qc";
int u_qc(time, z);
u_qc:standard_name = "quality_flag";
u_qc:long_name = "Windspeed observation quality flag";
data:
u = 12.0, 12.3, 12.2, 1000.0, 12.4 ... ;
u_qc = 1, 1, 1, 0, 1 ... ;
}
coordinates attribute
netcdf auxiliary {
dimensions:
plev = 2 , time = UNLIMITED ;
...
variables:
float u(plev, time) ;
u:coordinates = "sigma z" ;
...
int plev(plev) ;
plev:standard_name = "model_level_number" ;
plev:units = "1" ;
plev:long_name = "model level from top of atmosphere" ;
plev:positive = "down" ;
float sigma(plev) ;
sigma:standard_name = "atmosphere_sigma_coordinate" ;
sigma:units = "1" ;
sigma:positive = "down" ;
float z(plev) ;
z:standard_name = "geopotential_height" ;
z:units = "m" ;
z:positive = "up" ;
...
}
But Jack, how did you end up this deep in the NetCDF spec in the first place?
I present to you: Tree Sitter CDL
The result of a past learning and development week:
“how hard could it be to write a tree-sitter grammar?”.
It was motivated by the difficulty of reading these files, as seen in this talk, and means we can now run ncoldump!
Also usable in text editors.
Get in touch:
