2024-02-27
To access links or follow on your own device these slides can be found at:
https://jackatkinson.net/slides
All materials are available at:
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Research background in fluid mechanics and atmosphere:
Now a Research Software Engineer (RSE) at the Institute of Computing for Climate Science (ICCS) working with various groups and projects.
I have a particular interest in climate model design and parameterisation.
This talk can be summarised as “things I wish I’d known sooner.”
Major Computational Programs
Data processing
Experiment support
Bathymetry by NOAA under public domain
CTD Bottles by WHOI under public domain
Keeling Curve by Scripps under public domain
Climate simulation by Los Alamos National Laboratory under CC BY-NC-ND
Dawn HPC by Joe Bishop with permission
Beyond publishing papers, code is used in control and decision making:
Your code (or its derivatives) may well move from research to operational use one day.
Margaret Hamilton and the Apollo XI by NASA under public domain
def calc_p(n,t):
    return n*1.380649e-23*t
data = np.genfromtxt("mydata.csv")
p = calc_p(data[0,:],data[1,:]+273.15)
print(np.sum(p)/len(p))
What does this code do?
import numpy as np

# Boltzmann constant [J K-1] and 0 degrees Celsius in Kelvin
Kb = 1.380649e-23
T0 = 273.15


def calc_pres(n, t):
    """
    Calculate pressure using ideal gas law p = nkT

    Parameters:
        n : array of number densities of molecules [N m-3]
        t : array of temperatures in [K]
    Returns:
        array of pressures [Pa]
    """
    return n * Kb * t


# Read in data from file and convert T from [oC] to [K]
data = np.genfromtxt("mydata.csv")
n = data[0, :]
temp = data[1, :] + T0

# Calculate pressure, average, and print
pres = calc_pres(n, temp)
pres_av = np.sum(pres) / len(pres)
print(pres_av)
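Readable code is also easier to check. As a minimal sketch, the snippet below repeats calc_pres and evaluates it for a single hand-picked state. The number density and temperature are illustrative assumptions, not values from mydata.csv.

```python
import math

# Boltzmann constant [J K-1], as in the example above
Kb = 1.380649e-23


def calc_pres(n, t):
    """Calculate pressure [Pa] from number density n [m-3] and temperature t [K]."""
    return n * Kb * t


# Illustrative (assumed) near-surface values: 2.5e25 molecules per m3 at 288.15 K
pres = calc_pres(2.5e25, 288.15)

# Expect a pressure within a few percent of one standard atmosphere (101325 Pa)
print(f"{pres:.0f} Pa")
assert math.isclose(pres, 101325.0, rel_tol=0.05)
```

A check like this is much harder to write against the first version, where the constant and the unit conversion are buried in the call.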
git is a version control system developed by Linus Torvalds.
It tracks changes made to files over time.
Git comes preinstalled on most Linux distributions and macOS.
You can check it is on your system by running:
which git
If you are on Windows, or do not have git, check the git docs or the GitHub guide to installing git: https://github.com/git-guides/install-git
Setting up a new git repository is beyond the scope of this talk but involves using the
git init
command.
We will assume that you have created a repository using an online hosting service (GitLab, GitHub etc.) that provides a nice UI wrapper around the process.
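A mental model:
Each time you commit work git stores it as a diff.
A diff shows which lines of a file were added or removed (+/-).
This is what is shown by the git diff command.
diffs are stored in a tree.
By applying each diff one at a time we can reconstruct files.
diff --git a/mycode/functions.py b/mycode/functions.py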
index b784b07..d08024a 100644
--- a/mycode/functions.py
+++ b/mycode/functions.py
@@ -340,11 +341,10 @@ def rootfind_score(
         fpre = fcur
         if abs(scur) > delta:
             xcur += scur
+        elif sbis > 0:
+            xcur += delta
         else:
-            if sbis > 0:
-                xcur += delta
-            else:
-                xcur -= delta
+            xcur -= delta
         fcur = f_root(xcur, score, rnd)
     val = xcur
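To see where such output comes from, the sketch below builds a unified diff for two versions of a small file using Python's standard difflib module. The file name and contents are made up for illustration, and difflib is a stand-in for git's own diff machinery rather than what git runs internally.

```python
import difflib

# Two versions of a hypothetical file, as lists of lines
old = [
    "if abs(scur) > delta:\n",
    "    xcur += scur\n",
    "else:\n",
    "    xcur -= delta\n",
]
new = [
    "if abs(scur) > delta:\n",
    "    xcur += scur\n",
    "elif sbis > 0:\n",
    "    xcur += delta\n",
    "else:\n",
    "    xcur -= delta\n",
]

# Unified diff: '+' marks added lines, '-' removed lines, ' ' unchanged context
diff = list(
    difflib.unified_diff(old, new, fromfile="a/functions.py", tofile="b/functions.py")
)
print("".join(diff), end="")

added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line for line in diff if line.startswith("-") and not line.startswith("---")]

# An ndiff delta holds enough information to rebuild either version
delta = list(difflib.ndiff(old, new))
rebuilt = list(difflib.restore(delta, 2))  # 2 selects the "new" side
```

The last two lines mirror the idea of the mental model: given the changes, the newer version of the file can be reconstructed.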
Actually:
Each time you commit work git creates a snapshot tree.
A tree is a list of files in the repo at this commit.
It is really a tree of trees for efficiency!
The tree points to packed files at time of commit.
Packed files are efficiently compressed.
They are stored as deltas which are a bit like diffs.
By unpacking we can reconstruct the repo at a state in time given by the commit hash.
Evans (2024)
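The snapshot model relies on content addressing: each stored object is named by a hash of its contents. As a rough sketch, assuming a repository that uses git's default SHA-1 object format, the ID of a file's contents (a blob) can be reproduced with hashlib. The function name blob_id is hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the ID git (SHA-1 object format) would give a blob with this content."""
    # A blob is hashed as: the header "blob <size>", a NUL byte, then the raw content
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content always gives the same ID, so an unchanged file
# is stored once however many commits include it.
first = blob_id(b"This is a new file.\n")
second = blob_id(b"This is a new file.\n")
edited = blob_id(b"This is an edited file.\n")
print(first == second, first == edited)
```

The result can be compared with the output of git hash-object <file> for a file with the same contents.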
git clone <repo> [<dir>]
$ git clone git@github.com:jatkinson1000/git-for-science.git git4sci
Cloning into 'git4sci'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 26), reused 31 (delta 15), pack-reused 0
Receiving objects: 100% (42/42), 69.62 MiB | 5.64 MiB/s, done.
Resolving deltas: 100% (26/26), done.
$
$ cd git4sci/
$
$ echo "This is a new file." > newfile.txt
$
git status
git add <filepath>
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
newfile.txt
nothing added to commit but untracked files present (use "git add" to track)
$
$ git add newfile.txt
$
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: newfile.txt
$
git commit
git commit -m <message>
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: newfile.txt
$
$ git commit -m "Add newfile with placeholder text."
1 file changed, 1 insertion(+)
create mode 100644 newfile.txt
$
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
$
git push <remote> <branch>
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
$
$ git push origin main
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 8 threads
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.89 KiB | 1.89 MiB/s, done.
Total 8 (delta 7), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (7/7), completed with 7 local objects.
remote:
To github.com:jatkinson1000/git-for-science.git
7647d3a..7ab12ff main -> main
$
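The same add and commit cycle can be scripted. The following is a sketch only: it assumes git is on the PATH and uses a throwaway local repository created with git init instead of a clone, so there is no remote and no push step. The helper names git and run_demo and the author identity are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command in repo and return its standard output as text."""
    cmd = [
        "git",
        "-c", "user.name=Example User",        # placeholder identity (assumption)
        "-c", "user.email=user@example.com",
        "-c", "commit.gpgsign=false",          # avoid needing a signing key
        *args,
    ]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout


def run_demo():
    """Create a file, stage it and commit it; return the short status at each step."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        (repo / "newfile.txt").write_text("This is a new file.\n")
        untracked = git(repo, "status", "--porcelain")
        git(repo, "add", "newfile.txt")
        staged = git(repo, "status", "--porcelain")
        git(repo, "commit", "-m", "Add newfile with placeholder text.")
        clean = git(repo, "status", "--porcelain")
    return untracked, staged, clean


statuses = run_demo()
print(statuses)
```

In git's short status format, ?? marks an untracked file and A a newly staged one. An empty status after the commit means the working tree is clean.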
Locations
These (and more) can be explored in Andrew Peterson’s Interactive git cheat sheet
You are doing some work on pendula and your colleague says they have written some code that solves the equations and they can share with you.
This is made easy by the fact that it is on git!
Let’s see how we get on…
Go to the workshop repository:
If you have an account then fork the repository and clone your fork.
If you do not have an account, clone my repository.
Take a look around. How useful is this?
README.md
makeareadme.com and readme.so are great tools to help.
Add one as soon as you can in a project and update it as you go along.
How can we improve the README in the pyndulum code?
All public codes should have a license attached!
Add a LICENSE file in the main directory.
The right selection may depend on your organisation and/or funder.
See choosealicense.com for more information.
GitHub and GitLab contain helpers to create popular licenses.
Add a license to our pyndulum code.
If you use a helper feature to do this online, don’t forget to
git pull <repo> <branch>
to get this locally before you make further changes.
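It is a good idea to add a .gitignore file to your projects.
It lists files that git should not track, for example:
System files such as .DS_Store
Build files such as mycode.o or mymodule.mod etc.
Large files such as 50_year_run.nc or my_thesis.pdf etc.
Again, GitHub and GitLab contain helpers and templates to create .gitignore.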
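As an illustration of how such patterns pick out files, the sketch below matches a few file names against glob patterns using Python's fnmatch module. This only approximates git's behaviour (real .gitignore rules also handle directories, negation with ! and more), and the pattern list is an assumed example, not a recommended template.

```python
from fnmatch import fnmatchcase

# Assumed example patterns, one per line as they would appear in a .gitignore
patterns = [".DS_Store", "*.o", "*.mod", "*.nc", "*.pdf"]

files = ["pendulum.py", "mycode.o", "mymodule.mod", "50_year_run.nc", "README.md"]


def is_ignored(filename, patterns):
    """Return True if filename matches any of the glob patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


tracked = [f for f in files if not is_ignored(f, patterns)]
ignored = [f for f in files if is_ignored(f, patterns)]
print("tracked:", tracked)
print("ignored:", ignored)
```

Within a real repository, git check-ignore -v <path> reports which rule, if any, applies to a path.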
Add a .gitignore to the pyndulum code.
Again, if you use a helper feature to do this online, don’t forget to git pull.
Both GitHub and GitLab have methods for tracking issues.
These are useful for keeping track of work.
Issues should be opened on the main project, not individual forks.
Example issue log on GitHub: jatkinson1000/archeryutils
NOTE: Issues are part of GitHub/GitLab, NOT the git repository.
They will not be kept if you move the project elsewhere and do not appear on your local system.
It would be nice if we could add functions to calculate pendulum length from desired period and energy.
If you are taking part in the workshop, open an issue for these on the repository.
So far we have been using origin main in everything we do.
origin is the location of our online repository.
main is the branch we have been working on.
So far our commits look something like this:
But what if:
Branches help with all of these situations, and are a sensible way to organise your work even if you are the only contributor.
My advice is that all development is done in branches that are merged into main when completed:
git branch <branchname>
git checkout <branchname>
git merge <branchname>
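One way to picture these commands: a branch is a named pointer to a commit, and each commit records its parent(s). The toy model below is a deliberately simplified sketch of that idea, not how git stores data, and all class and method names are made up.

```python
class ToyRepo:
    """Minimal model: commits know their parents, branches point at commits."""

    def __init__(self):
        self.parents = {"c0": []}          # commit id -> list of parent ids
        self.branches = {"main": "c0"}     # branch name -> commit id
        self.current = "main"

    def branch(self, name):
        """Like `git branch <name>`: new pointer at the current commit."""
        self.branches[name] = self.branches[self.current]

    def checkout(self, name):
        """Like `git checkout <name>`: switch the current branch."""
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.current = name

    def commit(self):
        """Add a commit on the current branch and move its pointer forward."""
        new_id = f"c{len(self.parents)}"
        self.parents[new_id] = [self.branches[self.current]]
        self.branches[self.current] = new_id
        return new_id

    def merge(self, name):
        """Like `git merge <name>`: a commit with two parents (no fast-forward)."""
        new_id = f"c{len(self.parents)}"
        self.parents[new_id] = [self.branches[self.current], self.branches[name]]
        self.branches[self.current] = new_id
        return new_id


repo = ToyRepo()
repo.branch("feature")
repo.checkout("feature")
repo.commit()              # new commit on feature
repo.checkout("main")
repo.commit()              # new commit on main
merged = repo.merge("feature")
print(merged, repo.parents[merged])
```

Work on feature never moves the main pointer until the merge, which is why development in branches keeps main in a known state.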
This comes into its own when working concurrently on different features.
git is not just about backups – it is about project organisation.
This way lie danger and obscurity:
This is manageable and understandable:
The examples so far have been quite simple, but this gives a good audiovisual example of the power of branches:
We want to add functions to calculate pendulum length from desired period and energy.
Create a branch and add the new length equation to pyndulum/pendulum_equations.py.
Return to main and create another new branch to add the energy calculation.
Commit your work but don’t merge it.
Instead push it up to a remote feature branch.
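For reference, one possible starting point for the length function is sketched below. It assumes the small-angle approximation, where the period is T = 2π√(L/g), so that L = g(T/2π)². The function names and the default value of g are assumptions for illustration and are not taken from pyndulum. The energy calculation is left out because its definition depends on the existing code.

```python
import math


def get_period(length, g=9.81):
    """Small-angle period [s] of a simple pendulum of given length [m]."""
    return 2.0 * math.pi * math.sqrt(length / g)


def get_length(period, g=9.81):
    """Length [m] of a simple pendulum with the given small-angle period [s]."""
    if period <= 0.0:
        raise ValueError("period must be positive")
    return g * (period / (2.0 * math.pi)) ** 2


# A pendulum with a 2 s period is just under 1 m long
length = get_length(2.0)
print(f"{length:.3f} m")
```

Keeping the inverse function alongside makes a round-trip check (length to period and back) an easy first test to include in the pull request.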
When opening a request you should include:
From the branches you pushed up in the previous exercise open a pull request either:
Code review is not:
Code review is:
Again, GitHub and GitLab have nice infrastructure to make this an effective and visual process.
Anyone can conduct a code review on a public repository.
If working alone ask colleagues for help and return the favour.
Do:
Do not:
We will work through the two pull requests we opened.
If anyone here has opened a public one we will look at that, otherwise we will review my own code.
ICCS runs Climate Code Clinics that can be booked by any researcher in climate science or related fields at any time.
Apply online for a 1hr slot where 2 ICCS RSEs will sit down to take a look at your code, answer your questions, and help you improve it.
Recent topics have included:
Get in touch: