Someone else is probably wondering the same thing.
I will make mistakes.
Not all of them will be intentional.
whoami
Research background in fluid mechanics and the atmosphere:
Numerics and fluid mechanics in Engineering,
Cloud microphysics & volcanic plumes in Geography,
Radiation belts and satellite data at BAS.
Now a Research Software Engineer (RSE) at the Institute of Computing for Climate Science (ICCS) working with various groups and projects.
I have a particular interest in climate model design and parameterisation.
This talk can be summarised as “things I wish I’d known sooner.”
What is Research Software?
Major Computational Programs
Data processing
Experiment support
Bathymetry by NOAA under public domain
CTD Bottles by WHOI under public domain
Keeling Curve by Scripps under public domain
Climate simulation by Los Alamos National Laboratory under CC BY-NC-ND
Dawn HPC by Joe Bishop with permission
Why does this matter?
Beyond publishing papers, research code is used in control and decision-making:
Weather forecasting
Climate policy
Disease modelling (e.g. Covid)
Satellites and spacecraft¹
Medical equipment
Your code (or its derivatives) may well move from research into operational use one day.
Margaret Hamilton and the Apollo XI by NASA under public domain
```python
import numpy as np

# Boltzmann constant [J K-1] and 0 degC expressed in Kelvin
Kb = 1.380649e-23
T0 = 273.15


def calc_pres(n, t):
    """
    Calculate pressure using ideal gas law p = nkT

    Parameters:
        n : array of number densities of molecules [m-3]
        t : array of temperatures in [K]

    Returns:
        array of pressures [Pa]
    """
    return n * Kb * t


# Read in data from CSV file (row 0: n, row 1: T in [oC]) and convert T to [K]
data = np.genfromtxt("mydata.csv", delimiter=",")
n = data[0, :]
temp = data[1, :] + T0

# Calculate pressure, average, and print
pres = calc_pres(n, temp)
pres_av = np.sum(pres) / len(pres)
print(pres_av)
```
git 101
What is git?
git is a version control system originally developed by Linus Torvalds¹.
It tracks changes made to files over time.
Installation and setup
Git is often already available on Linux distributions and on macOS (where it is provided by the Xcode Command Line Tools).
You can check whether it is on your system by running the command which git (or git --version).
If you are on Windows, or do not have git, see the git documentation¹ or the GitHub guide to installing git (https://github.com/git-guides/install-git).
Setting up a new git repository is beyond the scope of this talk, but it involves using the git init command.
We will assume that you have created a repository using an online hosting service (GitLab, GitHub, etc.) that provides a nice UI wrapper around the process.
How does it work?
A mental model:
Each time you commit work, git stores it as a diff.
This shows specific lines of a file and how they changed (+/-).
This is what you see with the git diff command.
diffs are stored in a tree.
By applying each diff one at a time we can reconstruct files.
We do not need to do this in order
see cherry-picking and merge conflicts…
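The mental model above can be made concrete with a toy store that keeps only the diff between consecutive versions of a file and rebuilds any version by applying those diffs one at a time. This is a minimal sketch using Python's standard-library difflib; make_diff, apply_diff and the file contents are hypothetical names chosen for illustration, not part of git. It is only the mental model: git itself records a full snapshot of each file version (a blob) and uses deltas internally only to compress its storage.

```python
import difflib

# Toy model only: real git stores snapshots of files, not a chain of diffs.


def make_diff(old, new):
    """Record how to turn `old` into `new` (both lists of lines).

    The diff is a list of (start, stop, replacement) edits meaning
    "replace old[start:stop] with replacement". Unchanged lines are not stored.
    """
    matcher = difflib.SequenceMatcher(a=old, b=new)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_diff(old, diff):
    """Rebuild the newer version of a file from the older one and a diff."""
    result = []
    pos = 0
    for start, stop, replacement in diff:
        result.extend(old[pos:start])  # copy the unchanged lines before this edit
        result.extend(replacement)     # add the new or changed lines
        pos = stop                     # skip the old lines that were removed/replaced
    result.extend(old[pos:])           # copy any unchanged lines at the end
    return result


# Hypothetical history of newfile.txt, one entry per "commit".
versions = [
    [],
    ["This is a new file.\n"],
    ["This is a new file.\n", "It now has a second line.\n"],
    ["This is an edited file.\n", "It now has a second line.\n"],
]

# Store only the diffs between consecutive commits...
history = [make_diff(a, b) for a, b in zip(versions, versions[1:])]

# ...and rebuild every version by applying them in turn.
state = []
for commit, diff in enumerate(history, start=1):
    state = apply_diff(state, diff)
    assert state == versions[commit]

# Show the last change in the +/- form that git diff uses.
print("".join(difflib.unified_diff(versions[2], versions[3],
                                   "a/newfile.txt", "b/newfile.txt")))
```

Each stored edit refers to line positions in one particular parent version. This hints at why applying changes out of order, as in cherry-picking, can lead to merge conflicts: a diff recorded against one version may not fit a version whose lines have since changed.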
```console
$ git clone git@github.com:jatkinson1000/git-for-science.git git4sci
Cloning into 'git4sci'...
remote: Enumerating objects: 42, done.
remote: Counting objects: 100% (42/42), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 42 (delta 26), reused 31 (delta 15), pack-reused 0
Receiving objects: 100% (42/42), 69.62 MiB | 5.64 MiB/s, done.
Resolving deltas: 100% (26/26), done.
$
$ cd git4sci/
$
$ echo "This is a new file." > newfile.txt
$
```
The basic commands
git clone <repo> [<dir>]
Clone a repository into a new directory
git status
Show the state of the working directory and the staging area (index)
git add <filepath>
Stage the changes to <filepath> in the index
```console
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        newfile.txt
no changes added to commit (use "git add" and/or "git commit -a")
$
$ git add newfile.txt
$
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   newfile.txt
$
```
git commit
git commit -m <message>
Record the changes staged in the index as a new commit
```console
$ git status
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        new file:   newfile.txt
$
$ git commit -m "Add newfile with placeholder text."
 1 file changed, 1 insertion(+)
 create mode 100644 newfile.txt
$
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
no changes added to commit (use "git add" and/or "git commit -a")
$
```
git push <remote> <branch>
Send your locally committed changes to the remote repo
```console
$ git status
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)
no changes added to commit (use "git add" and/or "git commit -a")
$
$ git push origin main
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 8 threads
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.89 KiB | 1.89 MiB/s, done.
Total 8 (delta 7), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (7/7), completed with 7 local objects.
remote:
To github.com:jatkinson1000/git-for-science.git
   7647d3a..7ab12ff  main -> main
$
```
You are doing some work on pendula, and a colleague says they have written some code that solves the equations of motion and can share it with you.
This is made easy by the fact that it is on git!
Let’s see how we get on…
NOTE: Issues are a feature of GitHub/GitLab, NOT of the git repository itself.
They will not automatically be kept if you move the project elsewhere, and they do not appear on your local system.
Exercise - Issues
It would be nice if we could add functions to calculate pendulum length from desired period and energy.
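An issue is easier to act on if it states the relation to be implemented. For the period, the small-angle result T = 2π√(L/g) rearranges to L = gT²/(4π²). A minimal sketch of that case follows; the function names and the value of g are assumptions for illustration, not taken from the shared repository. The energy case is not shown, because the energy of a pendulum, E = mgL(1 − cos θ₀), also depends on the mass m and amplitude θ₀ that the issue would need to pin down.

```python
import math

# Assumed value of g; the shared code may use a different constant or names.
G = 9.81  # gravitational acceleration [m s-2]


def pendulum_length(period, g=G):
    """Length [m] of a simple pendulum with the given small-angle period [s].

    Rearranges T = 2*pi*sqrt(L/g) to L = g * (T / (2*pi))**2.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return g * (period / (2.0 * math.pi)) ** 2


def pendulum_period(length, g=G):
    """Small-angle period [s] of a simple pendulum of the given length [m]."""
    return 2.0 * math.pi * math.sqrt(length / g)


# A "seconds pendulum" has a period of 2 s (one second per swing).
print(f"{pendulum_length(2.0):.3f} m")  # prints 0.994 m
```

Providing the inverse function as well makes it easy to check the rearrangement with a round trip.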
If you are taking part in the workshop, open an issue for these on the repository.
Branches
So far we have been working on the main branch (pushing to origin main) in everything we do.