A small cookiecutter template for projects that depend on
openghg and openghg_inversions.
The directory layout includes dedicated places for notebooks and data, which makes notebook-based experiments easier to manage.
It creates a repo with:
- src/ for importable Python code
- notebooks/ for experiments
- data/ for local datasets and generated artefacts
- pyproject.toml configured for uv
- optional Jupyter, Dask dashboard, nbdime, and jupytext extras
- a helper module for stable paths from notebooks
- a kernel installation script for the project .venv
When you run the template, it will prompt you for some metadata:
ab12345$ uvx cookiecutter openghg-cookiecutter-template/
Installed 21 packages in 111ms
[1/5] repo_name (my-openghg-project): my-test-project
[2/5] package_name (my_openghg_project): my_test_project
[3/5] project_description (Notebook-first OpenGHG/OpenGHG Inversions project): A test of the template.
[4/5] python_requires (>=3.11):
[5/5] author_name (Your Name): Brendan Murphy
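In the prompts above, the default package_name looks like the repo_name with hyphens replaced by underscores. The sketch below reproduces that mapping so you can check a name before running the template. The function name default_package_name is hypothetical, and the normalisation rule is an assumption inferred from the prompt defaults, not taken from the template's source.

```python
import re


def default_package_name(repo_name: str) -> str:
    """Turn a repository name into a candidate Python package name.

    Assumption: runs of characters that are not letters, digits, or
    underscores become a single underscore (e.g. "my-test-project" ->
    "my_test_project"). The template itself may use a different rule.
    """
    name = re.sub(r"[^0-9a-zA-Z_]+", "_", repo_name.strip().lower()).strip("_")
    if not name.isidentifier():
        # Covers empty names and names that start with a digit.
        raise ValueError(f"{repo_name!r} does not map to a valid package name")
    return name


print(default_package_name("my-test-project"))
```

A name that fails this check would also fail as an import target, so it is worth choosing a different package_name at the prompt.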
To run the template you have a few options.
With uv:
uvx cookiecutter gh:openghg/openghg-project-cookiecutter
Or you can clone this repo and use:
uvx cookiecutter /path/to/openghg-cookiecutter-template
With pipx:
pipx install cookiecutter
cookiecutter gh:openghg/openghg-project-cookiecutter
With pip:
python -m pip install cookiecutter
cookiecutter /path/to/openghg-cookiecutter-template
From inside your project:
uv sync
uv run python -m ipykernel install \
--user \
--name my-project \
--display-name "Python (my-project)"
uv sync installs dependencies and makes your package importable in the
project environment.
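If you want to confirm that the package is importable before opening a notebook, a quick check along these lines can help. It is a minimal sketch using the standard library only. The helper name is_importable is hypothetical, and my_project stands in for your own package name.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if `package_name` can be found on the current import path."""
    try:
        return importlib.util.find_spec(package_name) is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec raises when the parent of a dotted name is missing.
        return False


# "my_project" is a placeholder; replace it with your package_name.
print(is_importable("my_project"))
```

Run it with uv run python so that the check uses the project environment rather than a system interpreter.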
Then use your code (in the src directory) in a notebook with:
%load_ext autoreload
%autoreload 2
from my_project.flux import compute_flux
To put the project under version control and publish it on GitHub with gh:
git init
git add .
git commit -m "Initial commit"
git branch -M main
gh repo create openghg/<repo_name> \
--source=. \
--remote=origin \
--push
Use a different org (or omit the org prefix) if you want a non-OpenGHG target.
If you do not have gh installed:
- Go to https://github.com/new
- Create a repository with the same name
- Then run:
git remote add origin git@github.com:<USER_OR_ORG>/<repo_name>.git
git branch -M main
git push -u origin main
- Put reusable code in src/<package_name>/
- Import it in notebooks
- Avoid copying code between notebooks
from my_project.flux import compute_flux
Commit:
- src/
- notebooks/
- pyproject.toml
- README.md
Do NOT commit:
- data/
- *.nc
- *.zarr
- .ipynb_checkpoints/
.gitignore already handles this.
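As an illustration of the rule above, the following sketch flags paths that fall under the "Do NOT commit" list. The patterns are copied from that list. The function name should_not_commit and the matching logic are assumptions made for this example and do not reproduce git's full .gitignore semantics.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns taken from the "Do NOT commit" list; matching is simplified.
IGNORED_TOP_LEVEL = ("data",)
IGNORED_PATTERNS = ("*.nc", "*.zarr", ".ipynb_checkpoints")


def should_not_commit(path: str) -> bool:
    """Return True if any component of `path` matches an ignored pattern."""
    parts = PurePosixPath(path).parts
    if parts and parts[0] in IGNORED_TOP_LEVEL:
        return True
    return any(
        fnmatch(part, pattern) for part in parts for pattern in IGNORED_PATTERNS
    )


for candidate in ("src/my_project/flux.py", "results/flux.nc", "data/obs.csv"):
    print(candidate, should_not_commit(candidate))
```

For an authoritative answer on a real repository, git check-ignore applies the actual .gitignore rules.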
Use this pattern when you expect code to outgrow a single notebook and want reusable, testable modules.
How to work
- Put reusable code in src/<package_name>/
- Keep notebooks focused on orchestration and plotting
- Import from the package rather than copying code
Example structure
src/
my_project/
inversion/
model.py
likelihood.py
io/
loaders.py
diagnostics/
plots.py
notebooks/
00_explore_inputs.ipynb
10_run_inversion.ipynb
Example usage
from my_project.inversion.model import run_inversion
from my_project.io.loaders import load_inputs
inputs = load_inputs(...)
result = run_inversion(inputs)
This keeps logic reusable and makes it easier to move stable code into openghg or openghg_inversions.
This is the typical workflow:
- Prototype in a notebook
- Move stable functions into src/<package_name>/
- Import them back into the notebook
Typical flow
# Prototype in notebook
def compute_flux(...):
...
# Move to src/my_project/flux.py
def compute_flux(...):
...
# Import in notebook
from my_project.flux import compute_flux
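Once a function lives in src/, it can be covered by a plain unit test, which is one of the main benefits of moving it out of the notebook. The sketch below shows the idea. The body of compute_flux is a placeholder (concentration times velocity) chosen only for illustration and is not an OpenGHG formula. The file locations in the comments are assumptions as well.

```python
# Hypothetical contents of src/my_project/flux.py
def compute_flux(concentration: float, velocity: float) -> float:
    """Placeholder flux: concentration multiplied by velocity."""
    return concentration * velocity


# Hypothetical contents of tests/test_flux.py (pytest collects test_* functions)
def test_compute_flux() -> None:
    assert compute_flux(2.0, 3.0) == 6.0
    assert compute_flux(0.0, 5.0) == 0.0


# Calling the test directly also works outside pytest.
test_compute_flux()
```

The notebook then only needs the import line, and the test keeps guarding the function as it evolves.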
There are three common approaches to making the project code importable from a notebook.
Option A: editable install (preferred)
With uv, use:
uv sync
or:
pip install -e .
Use pip install -e . as a fallback for non-uv environments.
Then import normally:
from my_project.flux import compute_flux
Pros:
- Clean imports
- Closer to real package usage
Option B: bootstrap (fallback for HPC / existing kernels)
The preferred approach is to use uv sync (editable install) with autoreload.
Add this near the top of the notebook:
import sys
from pathlib import Path
# Assumes the notebook's working directory is notebooks/, one level below the project root
ROOT = Path().resolve().parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
sys.path.insert(0, str(SRC))
Then:
from my_project.flux import compute_flux
Pros:
- No environment changes
- Works well on HPC
- No kernel restart required
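The bootstrap above depends on how deep the notebook sits below the project root. A variant that searches upwards for pyproject.toml avoids hard-coding that depth. This is a minimal sketch. The function name add_src_to_path is hypothetical, and using pyproject.toml as the root marker is an assumption based on the layout described in this README.

```python
from __future__ import annotations

import sys
from pathlib import Path


def add_src_to_path(start: Path | None = None, marker: str = "pyproject.toml") -> Path:
    """Find the project root above `start` and put its src/ on sys.path.

    Walks from `start` (default: current working directory) towards the
    filesystem root until a directory containing `marker` is found.
    """
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            src = candidate / "src"
            if str(src) not in sys.path:
                # Insert once only, so re-running the cell is harmless.
                sys.path.insert(0, str(src))
            return src
    raise FileNotFoundError(f"No {marker} found at or above {current}")
```

Calling add_src_to_path() in the first notebook cell then works from notebooks/ or from any subdirectory of it.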
Option C: dedicated kernel
Create a kernel from the project environment:
uv run python -m ipykernel install \
--user \
--name my-project \
--display-name "Python (my-project)"
Then select it in Jupyter.
Pros:
- Can be used from any Jupyter Lab instance
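After selecting the kernel, it can be useful to confirm that the notebook really runs in the project's environment. The sketch below compares the interpreter prefix with the project .venv directory. The function name runs_in_project_env is hypothetical, and it assumes the environment lives in .venv at the project root, as uv creates it by default.

```python
from __future__ import annotations

import sys
from pathlib import Path


def runs_in_project_env(project_root: Path, prefix: str | None = None) -> bool:
    """Return True if the interpreter prefix is `<project_root>/.venv`.

    `prefix` defaults to sys.prefix, which points at the virtual
    environment directory when Python runs inside one.
    """
    active = Path(prefix if prefix is not None else sys.prefix).resolve()
    return active == (project_root / ".venv").resolve()


# From a notebook in notebooks/, the project root is one level up.
print(runs_in_project_env(Path.cwd().parent))
```

If this prints False, the notebook is probably attached to a different kernel than the one installed for the project.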
Autoreload is recommended when using editable installs or a dedicated kernel.
To automatically reload code changes:
%load_ext autoreload
%autoreload 2
This reloads imported modules before each execution.
Pitfalls
- Existing objects may still use old definitions → re-run cells
- Class changes can behave inconsistently → recreate objects
- Large changes → restart kernel
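The first two pitfalls come from how module reloading works in Python: reloading rebinds names in the module, but objects created earlier keep a reference to the old class. The sketch below demonstrates this with the standard library's importlib.reload on a throwaway module. It does not use the IPython autoreload extension, which tries to patch existing objects but cannot always do so. The module name reload_demo_mod and the function name are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_instance() -> tuple[str, str, bool]:
    """Reload a temporary module and compare an old instance with the new class."""
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "reload_demo_mod.py"
        module_path.write_text(
            "class Model:\n    def name(self):\n        return 'old'\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("reload_demo_mod")
            old_obj = mod.Model()  # created before the edit

            # Simulate editing the source file, then reload the module.
            module_path.write_text(
                "class Model:\n    def name(self):\n        return 'new version'\n"
            )
            importlib.invalidate_caches()
            mod = importlib.reload(mod)
            new_obj = mod.Model()  # created after the reload

            # The old instance still belongs to the old class object.
            return old_obj.name(), new_obj.name(), isinstance(old_obj, mod.Model)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)


print(demo_stale_instance())
```

Recreating the object after the reload, as the pitfalls list suggests, is what brings it in line with the new definition.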
%run ../src/my_project/flux.py
%run is useful for quick iteration, but:
- pollutes the namespace
- not representative of real usage
- harder to refactor later
- Start in a notebook
- Move stable code into src/
- Import it back into notebooks
- Use %autoreload 2 while iterating
- Restart kernel when behaviour becomes unclear
This keeps notebooks flexible while building a reusable codebase.
mkdir -p ~/bin
cd ~/bin
curl -L https://github.com/cli/cli/releases/download/vX.Y.Z/gh_X.Y.Z_linux_amd64.tar.gz | tar xz
cp gh_X.Y.Z_linux_amd64/bin/gh ~/bin/
export PATH="$HOME/bin:$PATH"
gh auth login
Replace X.Y.Z with the version you want to install. Add the PATH export to
your shell config (for example ~/.bashrc) to make it persistent.
For shared setup instructions (uv, gh, HPC workflows), consider maintaining a separate repository such as:
openghg-tools
TODO (@brendan-m-murphy): decide whether to create and maintain openghg-tools
as the shared setup repository.