Software skills to enhance research code.
2024-04-05
To access links or follow on your own device these slides can be found at:
https://jackatkinson.net/slides
All materials are available at:
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Research background in fluid mechanics and atmosphere:
Now a Research Software Engineer (RSE) at the Institute of Computing for Climate Science (ICCS) working with various groups and projects.
I have a particular interest in climate model design and parameterisation.
This talk can be summarised as “things I wish I’d known sooner.”
Major Computational Programs
Data processing
Experiment support
Bathymetry by NOAA under public domain
CTD Bottles by WHOI under public domain
Keeling Curve by Scripps under public domain
Climate simulation by Los Alamos National Laboratory under CC BY-NC-ND
Dawn HPC by Joe Bishop with permission
More widely than publishing papers, code is used in control and decision making:
Your code (or its derivatives) may well move from research to operational one day.
Margaret Hamilton and the Apollo XI by NASA under public domain
def calc_p(n,t):
return n*1.380649e-23*t
data = np.genfromtxt("mydata.csv")
p = calc_p(data[0,:],data[1,:]+273.15)
print(np.sum(p)/len(p))
What does this code do?
# Boltzmann Constant and 0 Kelvin
Kb = 1.380649e-23
T0 = 273.15
def calc_pres(n, t):
"""
Calculate pressure using ideal gas law p = nkT
Parameters:
n : array of number densities of molecules [N m-3]
t : array of temperatures in [K]
Returns:
array of pressures [Pa]
"""
return n * Kb * t
# Read in data from file and convert T from [oC] to [K]
data = np.genfromtxt("mydata.csv")
n = data[0, :]
temp = data[1, :] + T0
# Calculate pressure, average, and print
pres = calc_pres(n, temp)
pres_av = np.sum(pres) / len(pres)
print(pres_av)
For more information see the Real Python article on environments.
For those using conda it also has environments, set up in a slightly different way.
Also consider uv.
Scenario: you have just finished some simulations with a climate model that should improve precipitation modelling and have the output data as a netCDF file.
You know that your colleague has produced relevant figures and analysis before, so you ask them for a copy of their code (yay, reuse :+1:).
Go to exercise 1 (exercises/01_base_code/
) and:
precipitation_climatology.py
requirements.txt
file in the root of the repo.Relevant to us today are:
By ensuring code aligns with PEP8 we:
“Readability counts”
- Tim Peters in the Zen of Python
“But I don’t have time to read and memorise all of this…”
Black (Langa 2020) - black.readthedocs.io
def long_func(x, param_one, param_two=[], param_three=24, param_four=None,
param_five="Empty Report", param_six=123456):
val = 12*16 +(24) -10*param_one + param_six
if x > 5:
print("x is greater than 5")
else:
print("x is less than or equal to 5")
if param_four:
print(param_five)
print('You have called long_func.')
print("This function has several params.")
param_2.append(x*val)
return param_2
def long_func(
x,
param_one,
param_two=[],
param_three=24,
param_four=None,
param_five="Empty Report",
param_six=123456,
):
val = 12 * 16 + (24) - 10 * param_one + param_six
if x > 5:
print("x is greater than 5")
else:
print("x is less than or equal to 5")
if param_four:
print(param_five)
print("You have called long_func.")
print("This function has several params.")
param_2.append(x * val)
return param_2
Other notes:
pip install "black[jupyter]"
Go to exercise 2 (exercises/02_formatting/
) and:
precipitation_climatology.py
Static Analysis
There are various tools available:
def long_func(
x,
param_one,
param_two=[],
param_three=24,
param_four=None,
param_five="Empty Report",
param_six=123456,
):
val = 12 * 16 + (24) - 10 * param_one + param_six
if x > 5:
print("x is greater than 5")
else:
print("x is less than or equal to 5")
if param_four:
print(param_five)
print("You have called long_func.")
print("This function has several params.")
param_2.append(x * val)
return param_2
(myvenv) $ pylint long_func.py
************* Module long_func
long_func.py:1:0: C0116: Missing function or method docstring (missing-function-docstring)
long_func.py:1:0: W0102: Dangerous default value [] as argument (dangerous-default-value)
long_func.py:1:0: R0913: Too many arguments (7/5) (too-many-arguments)
long_func.py:24:4: E0602: Undefined variable 'param_2' (undefined-variable)
long_func.py:25:11: E0602: Undefined variable 'param_2' (undefined-variable)
long_func.py:4:4: W0613: Unused argument 'param_two' (unused-argument)
long_func.py:5:4: W0613: Unused argument 'param_three' (unused-argument)
------------------------------------------------------------------
Your code has been rated at 0.00/10
(myvenv) $
def long_func(
x,
param_one,
param_two=[],
param_four=None,
param_five="Empty Report",
param_six=123456,
):
val = 12 * 16 + (24) - 10 * param_one + param_six
if x > 5:
print("x is greater than 5")
else:
print("x is less than or equal to 5")
if param_four:
print(param_five)
print("You have called long_func.")
print("This function has several params.")
param_two.append(x * val)
return param_two
(myvenv) $ pylint long_func.py
************* Module long_func
long_func.py:1:0: C0114: Missing module docstring (missing-module-docstring)
long_func.py:1:0: C0116: Missing function or method docstring (missing-function-docstring)
long_func.py:1:0: W0102: Dangerous default value [] as argument (dangerous-default-value)
long_func.py:1:0: R0913: Too many arguments (6/5) (too-many-arguments)
------------------------------------------------------------------
Your code has been rated at 6.36/10 (previous run: 0.00/10, +6.36)
(myvenv) $
Search the error code to understand the issue:
Other notes:
#pylint: disable=rule-name
Go to exercise 3 (exercises/03_linting/
) and:
precipitation_climatology.py
W0611
Unused imports, C0412
Ungrouped imports, W0102
Dangerous defaultW0621
Redefining name, W1514
Unexplicit openR0913
Too many arguments, C0103
Unconforming naming style.Extensions:
Comments are tricky, and very much to taste.
Some thoughts:1
“Programs must be written for people to read and […] machines to execute.”
- Hal Abelson
“A bad comment is worse than no comment at all.”
“A comment is a lie waiting to happen.”
=> Comments have to be maintained, just like the code, and there is no way to check them!
Cat code comment image by 35_equal_W
Dead code e.g.
Variable definitions e.g.
Redundant comments e.g. i += 1 # Increment i
These are what make your code reusable (by you and others).
"""..."""
.Various formatting options exist: numpy, Google, reST, etc.
We will use numpydoc it is readable and widely used in scientific code.
Full guidance for numpydoc is available.
Key components:
Parameters
).Returns
).Consider also:
Key components:
Parameters
).Returns
).def calculate_gyroradius(mass, v_perp, charge, B, gamma=None):
"""
Calculates the gyroradius of a charged particle in a magnetic field
Parameters
----------
mass : float
The mass of the particle [kg]
v_perp : float
velocity perpendicular to magnetic field [m/s]
charge : float
particle charge [coulombs]
B : float
Magnetic field strength [teslas]
gamma : float, optional
Lorentz factor for relativistic case. default=None for non-relativistic case.
Returns
-------
r_g : float
Gyroradius of particle [m]
Notes
-----
.. [1] Walt, M, "Introduction to Geomagnetically Trapped Radiation,"
Cambridge Atmospheric and Space Science Series, equation (2.4), 2005.
"""
r_g = mass * v_perp / (abs(charge) * B)
if gamma:
r_g = r_g * gamma
return r_g
pydocstyle is a tool we can use to help ensure the quality of our docstrings.1
(myvenv) $ pip install pydocstyle
(myvenv) $ pydocstyle myfile.py
(myvenv) $ pydocstyle mydirectory/
(myvenv) $
(myvenv) $
(myvenv) $ pydocstyle gyroradius.py
gyroradius.py:2 in public function `calculate_gyroradius`:
D202: No blank lines allowed after function docstring (found 1)
gyroradius.py:2 in public function `calculate_gyroradius`:
D400: First line should end with a period (not 'd')
gyroradius.py:2 in public function `calculate_gyroradius`:
D401: First line should be in imperative mood (perhaps 'Calculate', not 'Calculates')
(myvenv) $
Note: pydocstyle does not catch missing variables in docstrings. This can be done with Pylint’s docparams and docstyle extensions but is left as an exercise to the reader.
def calculate_gyroradius(mass, v_perp, charge, B, gamma=None):
"""
Calculates the gyroradius of a charged particle in a magnetic field
Parameters
----------
mass : float
The mass of the particle [kg]
v_perp : float
velocity perpendicular to magnetic field [m/s]
charge : float
particle charge [coulombs]
B : float
Magnetic field strength [teslas]
gamma : float, optional
Lorentz factor for relativistic case. default=None for non-relativistic case.
Returns
-------
r_g : float
Gyroradius of particle [m]
Notes
-----
.. [1] Walt, M, "Introduction to Geomagnetically Trapped Radiation,"
Cambridge Atmospheric and Space Science Series, equation (2.4), 2005.
"""
r_g = mass * v_perp / (abs(charge) * B)
if gamma:
r_g = r_g * gamma
return r_g
Go to exercise 4 (exercises/04_docstrings_and_comments/
) and examine the comments:
Docstrings:
Extensions:
A better way to format strings since Python 3.6
Not catching on because of self-teaching from old code.
Strings are prepended with an f
allowing variables to be used in-place:
name = "electron"
mass = 9.1093837015E-31
# modulo
print("The mass of an %s is %.3e kg." % (name, mass))
# format
print("The mass of an {} is {:.3e} kg.".format(name, mass))
# f-string
print(f"The mass of an {name} is {mass:.3e} kg.")
f-strings can take expressions:
See Real Python for more information.
Numbers in code that are not immediately obvious.
Instead:
numberwang by Mitchell and Webb under fair use
"""Module implementing pendulum equations."""
import numpy as np
def get_period(l):
"""..."""
return 2.0 * np.pi * np.sqrt(l / 9.81)
def max_height(l, theta):
"""..."""
return l * np.cos(theta)
def max_speed(l, theta):
"""..."""
return np.sqrt(2.0 * 9.81 * max_height(l, theta))
def energy(m, l, theta):
"""..."""
return m * 9.81 * max_height(l, theta)
def check_small_angle(theta):
"""..."""
if theta <= np.pi / 1800.0:
return True
return False
def bpm(l):
"""..."""
return 60.0 / get_period(l)
"""Module implementing pendulum equations."""
import numpy as np
GRAV = 9.81
def get_period(l):
"""..."""
return 2.0 * np.pi * np.sqrt(l / GRAV)
def max_height(l, theta):
"""..."""
return l * np.cos(theta)
def max_speed(l, theta):
"""..."""
return np.sqrt(2.0 * GRAV * max_height(l, theta))
def energy(m, l, theta):
"""..."""
return m * GRAV * max_height(l, theta)
def check_small_angle(theta, small_ang=np.pi/1800.0):
"""..."""
if theta <= small_ang:
return True
return False
def bpm(l):
"""..."""
# Divide 60 seconds by period [s] for beats per minute
return 60.0 / get_period(l)
Instead:
{
"config_name": "June 2022 m01 n19 run",
"start_date": "2022-05-28 00:00:00",
"end_date": "2022-06-12 23:59:59",
"satellites": ["m01", "n19"],
"noise_floor": [3.0, 3.0, 3.0],
"check_SNR": true,
"L_lim": [1.5, 8.0],
"telescopes": [90],
"n_bins": 27
}
{'config_name': 'June 2022 m01 n19 run', 'start_date': '2022-05-28 00:00:00', 'end_date': '2022-06-12 23:59:59', 'satellites': ['m01', 'n19'], 'noise_floor': [3.0, 3.0, 3.0], 'check_SNR': True, 'L_lim': [1.5, 8.0], 'telescopes': [90], 'n_bins': 27}
Magic Numbers
f-strings
Configuration settings
"__main__"
.Beyond the scope of today are a few other honourable mentions:
__init__.py
pyproject.toml
These lessons will be added to the course content in future but are beyond the scope of today.
ICCS runs Climate Code Clinics that can be booked by any researcher in climate science or related fields at any time.
Apply online for a 1hr slot where 2 ICCS RSEs will sit down to take a look at your code, answer your questions, and help you improve it.
Recent topics have included:
Get in touch:
The code in this workshop is based on a script from (Irving 2019).
Comments and Docstrings