
Python Project

Roman Ludwig edited this page Jun 30, 2025 · 4 revisions

Here we outline how our Python projects are structured, how packages are tested, built, and released.

Project Structure

We follow a modern Python project structure. That means we store metadata and configuration in a pyproject.toml file. The code itself lives in a src/<package-name> directory, while tests are located in tests and documentation in docs. Here is a minimal example of what this looks like:

my_project/
├── docs/
│   ├── conf.py
│   └── index.rst
├── src/
│   └── my_package/
│       ├── __init__.py
│       └── my_module.py
├── tests/
│   ├── __init__.py
│   └── test_my_module.py
├── pyproject.toml
├── README.md
├── LICENSE
│   ...
└── .gitignore

Important

Some repositories might still use a so-called "flat" layout, where the code lives in the root directory of the repository instead of a src directory. This seems simpler, but is not recommended. See this article for more information on why we prefer the src layout.

Build System

In the pyproject.toml file, one also defines exactly how the package should be built. This is done using the build-system table. Here is what this looks like in our case:

[project]
name = "my_package"
...
dynamic = ["version"]   # This tells setuptools to dynamically determine the version

[build-system]
requires = [
    "setuptools >= 64",
    "setuptools_scm",   # Necessary to determine version from git tags
]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]   # Anything in here defines *how* to get version from git tags
write_to = "src/my_package/_version.py"
local_scheme = "no-local-version"

This configuration specifies that we use setuptools as the build system, and that we want to dynamically determine the version of the package using setuptools_scm, which reads the latest git tag and uses it as the version number. The write_to option specifies where to write the version information, which is useful for importing it in the code.

There are many other build systems available for Python, but on the surface, the user will very likely not notice any difference.

For the actual source control and versioning, refer to the source control page in this wiki.

Dependencies

Every pyproject.toml file also contains a list of dependencies that are necessary to run/use the package. This list should be minimal in the sense that it only contains the dependencies that are absolutely necessary to run the package. Anything needed to test the code, build the documentation, or support development should be listed under the respective optional dependencies.

Here's an example of what this may look like inside a pyproject.toml file:

[project]
name = "my_package"
...
dependencies = [
    "pydantic != 2.2.0",  # Exclude a specific version known to cause problems.
    "numpy >= 2.0.0",     # Lower bounds are OK, if necessary, but no upper bounds!
]

[project.optional-dependencies]
dev = [
    "pre-commit",
    "git-cliff",
]
test = [
    "pytest",
    "pytest-cov",
]
docs = [
    "sphinx",
    "sphinx-book-theme",
]

If it is set up like this, any user can simply use pip install my_package to install the minimal set of dependencies necessary to run the package. However, if you want to work on the code, then you can still do something like uv sync --all-extras and get a complete set of dependencies for development, testing, and documentation.

Warning

Do NOT specify upper bounds for your dependencies! Here's an article why that is bad practice and causes a lot of pain down the line, mainly for users of your package: https://iscinumpy.dev/post/bound-version-constraints/

If anything, you can exclude specific versions of a dependency that are known to cause problems, or set a lower bound.

Testing

We use pytest for testing our code, although some older tests may still be written using unittest. The nice thing about pytest is that it can run unittest-style tests as well, so we can write new tests with pytest and keep the old unittest ones around.
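To illustrate with two hypothetical tests, pytest collects both plain test functions and unittest.TestCase classes, so the two styles can coexist in the same tests directory:

```python
import unittest

# pytest discovers both styles below when run over the tests directory,
# so legacy unittest tests keep working next to new pytest-style tests.

def test_addition() -> None:
    """A new, pytest-style test: a plain function with a bare assert."""
    assert 1 + 1 == 2

class TestLegacy(unittest.TestCase):
    """An older, unittest-style test that pytest can still collect and run."""

    def test_subtraction(self) -> None:
        self.assertEqual(3 - 1, 2)
```

Running `pytest` in the project root would pick up both tests without any extra configuration.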

Also, we are big fans of doctests! Especially for small and simple functions that need no elaborate setup, these serve a double purpose: they are both documentation and tests at the same time, which fits nicely with the DRY (Don't Repeat Yourself) principle.

The default way to run the doctests is simply the following command:

pytest --doctest-modules src/my_package

Warning

This will NOT test the installed version of the package, but rather the source code. There is no straightforward way to run doctests against the installed version of the package. This is why we typically first run pytest against the installed version and only then run the doctests against the source code.
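As a sketch of the kind of small function that doctests suit well (a hypothetical helper, not from one of our packages):

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict ``value`` to the closed interval [``low``, ``high``].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(high, value))

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

The examples in the docstring double as tests: `pytest --doctest-modules` (or `python -m doctest`) executes them and compares the actual output against the lines shown.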

Documentation

Our published Python packages all use sphinx to automatically compile documentation from the docstrings in the code and publish it on Read the Docs. sphinx is extremely powerful and flexible, allowing us to write docstrings in different formats. Most commonly, we use reStructuredText or Markdown (the latter requires the MyST extension). It also allows us to reference symbols across the package and even across other packages, which is very useful for our documentation.

Here's a typical example of a good docstring, according to our standards:

def update_and_expand(
    left: pd.DataFrame,
    right: pd.DataFrame,
    **update_kwargs: Any,
) -> pd.DataFrame:
    """Update ``left`` with values from ``right``, also adding columns from ``right``.

    The added feature of this function over pandas' :py:meth:`~pandas.DataFrame.update`
    is that it also adds columns that are present in ``right`` but not in ``left``.

    Any keyword arguments are also passed directly to
    :py:meth:`~pandas.DataFrame.update`.

    >>> left = pd.DataFrame({"a": [1, 2, None], "b": [3, 4, 5]})
    >>> right = pd.DataFrame({"a": [None, 3, 4], "c": [6, 7, 8]})
    >>> update_and_expand(left, right)
         a  b  c
    0  1.0  3  6
    1  3.0  4  7
    2  4.0  5  8
    """
    result = left.copy()
    result.update(right, **update_kwargs)

    for column in right.columns:
        if column not in result.columns:
            result[column] = right[column]

    return result

Let's break down what's nice about this function and its docstring:

  • The function is well-named and describes what it does.
  • All arguments have pretty clear names and are annotated with types (this helps you, too: Good type hints allow for better auto-completion and suggestions in your IDE).
  • The docstring starts with a short summary of what the function does.
  • A longer explanation details the added functionality compared to the standard pandas.DataFrame.update.
  • It explicitly links to the pandas method using sphinx's cross-referencing syntax.
  • It contains a doctest that shows how the function can be used and what the expected output is.

Note

We do not use the docstring notation where a list of parameters is given, followed by a description of each parameter. When descriptive names and type hints are used, a lot of this is redundant and makes the docstring unnecessarily long. Instead, we try to mention every parameter in the main description of the function. This is, by the way, also how the main Python docs do it.

Below is how the rendered documentation of the above function looks:

screenshot of the lyDATA package docs

Sphinx Configuration

sphinx is mostly configured using a conf.py file in the docs/source directory. This file may import your package, set some metadata to be displayed, and configure options and extensions of sphinx.

For example, the theme of the rendered documentation can be set using the html_theme variable. For that, we use the nice-looking "sphinx_book_theme", which comes with the sphinx-book-theme package (see above in the optional dependencies).
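A minimal conf.py might then start like this (a hypothetical sketch; all names are placeholders to adapt to your package):

```python
# docs/source/conf.py — a minimal, hypothetical sketch.
project = "my_package"
author = "The my_package authors"

# The theme mentioned above, provided by the sphinx-book-theme package:
html_theme = "sphinx_book_theme"
```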

For the cross-referencing to work, you need to configure sphinx correctly in the docs/source/conf.py file. Two steps are required:

  1. Add the intersphinx extension, and
  2. set up the intersphinx_mapping dictionary to point to the documentation of the packages you want to reference.

Below is an example of what this can look like in the docs/source/conf.py file:

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
    ...
]

intersphinx_mapping = {
    "python": ("https://docs.python.org/3.10", None),
    "lymph": ("https://lymph-model.readthedocs.io/latest/", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "pandera": ("https://pandera.readthedocs.io/en/stable/", None),
    "pydantic": ("https://docs.pydantic.dev/latest/", None),
}
