TABLE OF CONTENTS
- Introduction
- Styles
- Coding Practices
There are already many style guides out there for Python, so instead of reinventing the wheel, this document borrows (i.e. copies and pastes) heavily from them, with a few tweaks here and there. The styles captured here are emphasized because of their usefulness in creating consistent, maintainable, and readable code.
There are also a lot of styles not captured here that are still worth knowing about, and reading the following guides is highly encouraged. The guides listed below agree on many coding conventions, but offer different perspectives on the reasoning behind certain styles.
- PEP 8 - The official style guide of the Python community
- Google Python Style Guide - If it’s good for Google, it should be good enough for us
- The Hitchhiker's Guide to Python
Don't. Some of the above guides (specifically Google's) give leeway for breaking conventions. The most valid reason for this has to do with backwards compatibility. Backwards compatibility is not something we need to worry about at our organization.
There is a lot of code that existed at KIPP NorCal before this style guide, and so there will be code that doesn't conform to this guide. Whenever refactoring this code, we should also work to clean up areas that don't follow convention.
Some conventions lay out multiple options for a style (although usually no more than two). Whichever option you choose when building new code, use it consistently. Do not switch between them. When working on existing code, stick to the convention that the original author chose.
We will follow what PEP 8 has laid out for naming conventions. Below is a copy and paste of some key points.
Never use the characters ‘l’ (lowercase letter el), ‘O’ (uppercase letter oh), or ‘I’ (uppercase letter eye) as single character variable names.
In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use ‘l’, use ‘L’ instead.
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Class names should normally use the CapWords (aka CamelCase) convention.
There is a note in PEP 8 where "The naming convention for functions may be used instead". Disregard this. We will always use CapWords.
Function names should be lowercase, with words separated by underscores.
There is a note in PEP 8 where underscores are used optionally to improve readability. Disregard this. Underscores always improve readability.
Use single leading underscore names to denote a method or a function as private. Additionally from PEP 8 - "weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore".
def _single_leading_underscore() -> None:
    """This will not get imported."""
    return None

Use single trailing underscore to avoid conflicts with Python keywords:
def single_trailing_underscore_(x: int) -> bool:
    class_ = 5  # Can use with variables, too!
    if x == class_:
        return True
    else:
        return False

Always use self for the first argument to instance methods.
Always use cls for the first argument to class methods.
If a function argument’s name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss. (Perhaps better is to avoid such clashes by using a synonym.)
Variables should follow the naming conventions of methods/functions.
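Putting the naming conventions above together, a hypothetical module might look like this (all names are illustrative):

```python
# employee_report.py -- module names are short and all-lowercase

MAX_RETRIES = 3  # module-level constants are all-caps with underscores


class EmployeeReport:  # class names use CapWords
    def build_report(self) -> dict:  # functions/methods are lowercase_with_underscores
        report_data = {"retries": MAX_RETRIES}  # variables follow function naming
        return report_data

    def _format_row(self) -> str:  # leading underscore marks this method as private
        return "row"
```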
Follow the guide for imports in PEP 8. In general, here are the key points:
Each module or package import needs its own line.
BAD
import os, sys
GOOD
import os
import sys

Importing multiple items that are contained within a module or package on one line is okay.
# Both acceptable
from os import chdir, getcwd
from os import chdir
from os import getcwd

Imports should always be at the top of a file after any module docstrings and before any module globals or constants are defined.
Imports should be grouped into three groups with a blank line between them. Within the groups, it is recommended to alphabetize the imports by module or package. The three groupings are:
- standard library imports
- related third party imports
- local application/library specific imports
Example:
"""
Module docstring
"""
import datetime
import os

import pandas
import requests

from some_local_module import foo

SOME_CONSTANT = None

Avoid using wildcard imports (*). This method of import loads everything from that module or package directly into your module's namespace. In most cases, this is unnecessary. If you only need one object from a package or module, then explicitly import that object instead of everything with a wildcard. If you do need everything from a package or module, then import the package or module itself to avoid namespace issues.
EXAMPLE
# instead of this
from os import *
# do this if you need all of the os library
import os
# or this if you need one function
from os import getcwd

The exception to this rule is when using Django (Galaxy). It is common to import models into views.py using a wildcard, or to import views into urls.py using a wildcard import. This is because of Django's structure, and it is idiomatic for the Django framework.
TODO: Add content here
Type hinting is a little controversial in the Python community. Many people feel that it goes against dynamic typing, which is one of the features of Python that makes it unique.
For KIPP NorCal, many of our repos are designed for our own business purposes with a very specific set of requirements in mind, and type hinting can be a useful way to document code. Type hints can also speed up code development with auto-completion in IDEs. Type hints can also help catch bugs when used with a linter (such as Pylint or Flake8).
The syntax for function annotations was defined in PEP 3107, and type hints themselves were standardized in PEP 484. Below are some examples:
def simple_example(a: str) -> None:
    """Takes one string param (a) and returns None."""
    print(a)
def foo(a: str, b: bool = False) -> bool:
    """
    Takes a string parameter with a boolean parameter defaulting to False.
    Returns a boolean.
    """
    if b:
        return True
    else:
        return False
from typing import Union

def bar(a: Union[int, float]) -> Union[None, int]:
    """
    One parameter (a) which can be an integer or a float.
    Returns either an integer or None.
    """
    if isinstance(a, int):
        return a
    else:
        return None

The general rule of thumb for good commenting is that your comments should add context that might not be apparent in the code. Comments should not restate exactly what your code is doing.
BAD
# print results
print(results)

TODO: Add content
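A GOOD counterpart explains the "why" the code cannot show; the scenario below is a hypothetical illustration:

```python
# The vendor API returns amounts in cents, so convert to dollars
# before reporting (finance expects dollars in the export).
balance_cents = 12345
balance_dollars = balance_cents / 100
```

The comment adds context (the vendor's unit convention) that a reader could not recover from the division alone.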
For docstrings, follow PEP 257.
Here is an example for a single line docstring:
def foo():
    """A single line docstring."""

Here is an example of a multi-line docstring. These two are equivalent:
def foo():
    """A multi-line
    docstring.
    """

def bar():
    """
    A multi-line
    docstring.
    """

Whenever handling exceptions with a try/except block, do not use a bare except as this can hide bugs in your code. Always capture specific exceptions.
BAD
def raises_an_exception(some_list):
    try:
        return some_list[1000]
    except:  # or except Exception:
        return None
GOOD
def raises_an_exception(some_list):
    try:
        return some_list[1000]
    except IndexError:
        return None

The one exception to this rule is using a bare try/except block within the if __name__ == '__main__' block. The bare try/except blocks there capture the error, log it, and send a Slack notification before terminating the code.
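That idiom might look like the following sketch. The notify_slack helper is a hypothetical stand-in for whatever Slack utility a repo actually uses:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def notify_slack(message: str) -> None:
    # Hypothetical helper; a real repo would post to a Slack webhook here.
    logger.info("Would send to Slack: %s", message)


def main(fail: bool = False) -> None:
    # Stand-in for the real job; raises to demonstrate the error path.
    if fail:
        raise RuntimeError("something broke")


if __name__ == "__main__":
    try:
        main()
    except:  # bare except is acceptable only at this top level
        logger.exception("Job failed")
        notify_slack("Job failed -- check the logs")
        raise  # re-raise so the job terminates with a non-zero exit code
```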
Whenever opening files, always use the with statement. This creates a context manager, which reads more cleanly and reduces the risk of corrupting your file.
BAD
f = open("file.txt", "w")
f.write("Hello World!")
f.close()
GOOD
with open("file.txt", "w") as f:
    f.write("Hello World!")

Python doesn't have a stance on whether single or double quotes are better for strings. Since the vast majority of our existing code uses double quotes, let's stick to double quotes.
When the signature of a function or a call to a function exceeds the set line length, then separate the signature or call over multiple lines with each parameter getting its own line.
EXAMPLE
# Pretend these are really long
# Long function signature
def some_long_func(
    a: str,
    b: str,
    c: str,
    d: str,
) -> None:
    # Do some stuff
    return None

# Long function call
some_long_func(
    "my",
    "dog",
    "eats",
    "rocks",
)

You don't have to only do this when exceeding the line limit. If at any point you feel that breaking these up over multiple lines makes your code more readable, then have at it!
TODO: Add content here
Development should always be done on your local machine to avoid breaking production or losing data. The pipelines on our servers are for production and should always be on the main branch whenever possible.
A high level development workflow example:
- Write a tech spec for the product/feature you're building
- Prep for development
- If new work, create a new repo and create a development branch
- Add branch protections to require reviews on PRs and to block commits to main
- If refactoring/creating a feature, checkout whichever branch you are planning to develop off of (this should almost always be main), and run git pull to get the most up-to-date code. Then create your development branch.
- A helpful hint is to use a Jira issue ID or a SemVer version number in the branch name. This can help point to documentation for your work in case you forget what the branch was for or in case someone else needs to look at the code.
- Write your code. Commit and push often. You'll be happy you did if anything happens to your computer.
- Test your code. This can look different from project to project. Whatever you choose to do, make sure you are covering your edge cases and the code is working as expected.
- Once done with dev and testing, open a PR in GitHub
- Once the PR is approved and merged to main:
- If new repo:
- ssh into server
- run git clone <git repo address> in the jobs directory
- build docker image
- schedule job to run in crontab
- If existing repo:
- ssh into server and navigate to the repo's directory (should be in /home/data_admin/jobs)
- Make sure the repo is on the main branch and run git pull
- Rebuild docker image
- Check that the new image name matches the image name in the command in crontab, so it will still run as expected. If not, then rename the image or update the command
New repos or enhancements to code need to be accompanied by a tech spec. Annual code rollover and bug fixes do not require tech specs; however, one is highly encouraged for large bug fixes.
There is a Python/Dev folder on our Data Team drive where tech specs (and other docs) live. There is a template tech spec that can be found here. All of our specs are stored in their respective repo's folder stored here.
The spec template is meant to be flexible. Use what you need and delete what you don't. Feel free to add sections if needed. The purpose of these documents is to capture the why, the how, the expected outcomes and decisions made around new work or an enhancement.
One thing that is required with tech specs is the naming convention. Names need to follow: [SemVer number] - [Repo name] - [Title of work]. An example would be 4.0 - Google Accounts - Internal Refactor.
SemVer stands for semantic versioning. It might seem like overkill, but using semantic versioning when naming our specs helps give a timeline to the specs.
We use Pipenv to manage our environments in dev and production. We'll give a brief overview below, but Pipenv documentation can be found here if you want to know more.
Pipenv generates two files: Pipfile and Pipfile.lock. Both of these files need to be included in our git repos.
The Pipfile is a file that tracks all of our dependencies for a repo with broad versioning. Most of our dependencies will have an * which indicates that we are using the latest version of a package. Some packages may also indicate that we are only using a version before/after some specific version.
The Pipfile.lock file is a file that tracks the specific packages that are installed in the environment (based on what is in the Pipfile). The Pipfile.lock file is not meant to be edited.
It is recommended by Pipenv and others to always set up your environment by installing from the Pipfile.lock file. We don't do it this way, so forget what you just read.
We always build our docker images by installing Pipenv and then running pipenv install --skip-lock. This command will create a virtual environment with the most up-to-date versions of a repo's dependencies allowed by the Pipfile. The benefit to this is that it ensures our repos are operating on the most up-to-date code possible. The downside is that sometimes a new release of a package might not be compatible with your code or other dependencies in your repo. This is where the Pipfile.lock comes in handy.
The Pipfile.lock file is our plan for handling any dependency issues. While developing, keep your Pipfile.lock up to date by running pipenv update regularly. If you hit an issue, you can fall back to the Pipfile.lock on main in GitHub. When you finish developing, make sure your most up-to-date Pipfile.lock is included in your PR so we are able to recreate the last known stable environment for the repo.
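A sketch of the day-to-day commands this workflow implies (assuming Pipenv is installed and you are in the repo's root directory):

```shell
# Install dependencies at the newest versions the Pipfile allows,
# without pinning to Pipfile.lock (this is how our Docker images build)
pipenv install --skip-lock

# While developing, refresh the lock file regularly so it records
# the last known-good set of versions
pipenv update

# If a new release breaks something, fall back to the pinned versions
# recorded in the Pipfile.lock from main
pipenv sync
```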
Below is a guide to the general practices we follow for the structure and naming conventions of our repos. This section is not meant to be mandatory, as the needs of each repo's structure differ depending on the complexity of the code. It is strongly encouraged to follow the below conventions if your code begins to get complex. The benefit of following them is that others will be able to understand your code and its intended purpose just from the structure and names of packages.
A flat layout is where all of the files of the repo are in the root directory. This is recommended for smaller projects with few files and little code where there isn't any real benefit to structuring the code.
This is a specific layout where none of your code is in the root directory. Instead, the code is inside a src directory which contains a package where your code lives. The name of the package in the src directory should be the same as the name for your repo.
The biggest benefit to this layout is that it allows you to install your code as an editable package. This type of install is beneficial for testing, and you can create a tests package in your root directory alongside the src directory. More information can be found here. # TODO: Add link
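Assuming a hypothetical repo named my_repo, a src layout might look like:

```
my_repo/
├── Pipfile
├── Pipfile.lock
├── src/
│   └── my_repo/
│       ├── __init__.py
│       └── main.py
└── tests/
    └── test_main.py
```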
A hybrid layout is the middle ground between the flat layout and the src layout. This is for projects that are complex enough that separating the code into packages helps with organization, but where a full src layout might be too much. With this layout, usually a main.py file sits in the root directory and the rest of the code lives in packages that are also in the root directory alongside main.py.
Here are common names that you may find among our repos and an explanation of what they are and what their purpose is. As mentioned in the intro of the section, every repository does not need to have these packages unless there is a need for it. If you find yourself needing to build separate workflows, then create a workflows package to store them in.
These packages have classes that are abstract representations of a concept and manage a state (ex. a class that represents an employee) or a value (ex. a class that manages a queue).
Not to be confused with git repos, these packages contain code that implements the repository design pattern. The classes that implement this pattern usually wrap a data resource (ex. an external API or a data warehouse connection) with common CRUD operations.
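As an illustrative sketch (the connection object and table name are hypothetical), a repository class might wrap a warehouse connection like this:

```python
class StudentRepository:
    """Wraps a data warehouse connection with common CRUD operations."""

    def __init__(self, connection) -> None:
        # connection is any DB-API-style object with an execute() method
        self.connection = connection

    def get_by_id(self, student_id: int):
        # Read operation: fetch a single row by primary key
        return self.connection.execute(
            "SELECT * FROM students WHERE id = ?", (student_id,)
        )

    def create(self, name: str):
        # Create operation: insert a new row
        return self.connection.execute(
            "INSERT INTO students (name) VALUES (?)", (name,)
        )
```

Callers (services, typically) only see get_by_id and create; the SQL and the connection details stay hidden behind the repository.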
These packages have two uses.
One use is that they sit between a repository object and the business logic and perform some transformation to data. Sometimes they get data from a repository object and prepare it for use by a workflow, or take data from a workflow and shape it for insertion into a repository.
Another use is where they perform a common operation that is shared across multiple workflows.
Sessions are objects that manage metadata around a running job. They might keep track of runtime arguments or any other data that the job relies on.
This package is for ancillary parts of the code. Run time args, exceptions, data maps, or helper functions that might be used across packages.
Workflows are the business logic. Some code bases may only have one job to perform and a workflow package might not be needed; others might have multiple workflows where each one handles a different edge case. Workflows are typically built using services and don't usually work directly with repositories.
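Putting the layers together, a hypothetical workflow might use a service, which in turn shapes data coming from a repository (all names below are illustrative):

```python
class EnrollmentService:
    """Sits between a repository and the business logic."""

    def __init__(self, repository) -> None:
        self.repository = repository

    def get_active_students(self) -> list:
        # Shape raw repository rows for use by a workflow:
        # keep only active students and normalize the name casing
        rows = self.repository.fetch_all()
        return [name.title() for name, active in rows if active]


class EnrollmentWorkflow:
    """Business logic; built on services, not directly on repositories."""

    def __init__(self, service: EnrollmentService) -> None:
        self.service = service

    def run(self) -> list:
        # In a real job this would carry out the full business process
        return self.service.get_active_students()
```

Note that the workflow never touches the repository directly; swapping the data source only requires changing what is passed into the service.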