A template for building skids that use palletjack to update AGOL data from a tabular data source and that are run as Google Cloud Run Jobs
For an example of a working skid, see nfhl-skid or uocc-skid.
The first step in creating a skid is to create a new repository in GitHub that uses this repo as a template. You'll then clone the repo to your computer and start developing.
See this answer on our Stack Overflow for GitHub repo creation via Terraform (the preferred way).
You'll need to do a few steps to set up your environment to develop a skid. You may also want to make sure you've got the skid working locally before adding the complexity of the cloud function.
This all presumes you're working in Visual Studio Code.
- Create new environment for the project and install Python
conda create --name PROJECT_NAME python=3.13conda activate PROJECT_NAME
- Open the repo folder in VS Code
- Rename
src/skidnamefolder to your desired skid name - Edit the
setup.py:name, url, description, keywords, and entry_pointsto reflect your new skid name - Edit the
test_skidname.pyto match your skid name.- You will have one
test_filename.pyfile for each program file in yoursrcdirectory and you will write tests for the specific file in thetest_filename.pyfile
- You will have one
- Reset the version to 1.0.0 in
version.py - Install the skid in your conda environment as an editable package for development
- This will install all the normal and development dependencies (palletjack, supervisor, etc)
cd c:\path\to\repopip install -e .[tests]- add any additional project requirements to the
setup.py:install_requireslist
- Set config variables and secrets
secrets.jsonholds passwords, secret keys, etc, and will not (and should not) be tracked in gitconfig.pyholds all the other configuration variables that can be publicly exposed in git- Copy
secrets_template.jsontosecrets.jsonand change/add whatever values are needed for your skid - Change/add variables in
config.pyas needed
- Write your skid-specific code inside
process()inmain.py- If it makes your code cleaner, you can write other methods and call them within
process() - Any
print()statements should instead usemodule_logger.info/debug. The loggers set up in the_initialize()method will write to both standard out (the terminal) and to a log file. - Add any captured statistics (number of rows updated, etc) to the
summary_rowslist near the end ofprocess()to add them to the email message summary (the log file is already included as an attachment)
- If it makes your code cleaner, you can write other methods and call them within
- Run the tests in VS Code
- Testing -> Run Tests
Skids run as Cloud Run Jobs triggered by Cloud Scheduler on a regular schedule.
Cloud Run jobs run in a Docker container built using the project's Dockerfile. The GitHub deploy action in .github/actions/deploy/action.yml handles creating the container, setting the container's resource limits, and setting up Cloud Scheduler. It uses the following GitHub secrets to do this, which should all be set as part of the Terraform repo creation:
- Identity provider
- GCP service account email
- Project ID
- Storage Bucket ID
Work with the GCP maestros to set up a Google project via terraform. They can use the uocc configuration as a starting point. Skids use some or all of the following GCP resources:
- Cloud Functions (executes the python)
- Cloud Storage (writing the data files and log files for mid-term retention)
- Set a data retention policy on the storage bucket for file rotation (90 days is good for a weekly process)
- Cloud Scheduler (sends a notification to a pub/sub topic)
- Cloud Pub/Sub (creates a topic that links Scheduler and the cloud function)
- Secret Manager
- A
secrets.jsonwith the requisite login info - A
known_hostsfile (for loading from sftp) or a service account private key file (for loading from Google Sheets)
- A
Because the Docker container is just pip installing your module and running the entry point defined in setup.py, you can generally run your code locally by doing the same (it should already be installed in your conda environment in the development steps listed above). You can run it via VS Code's debugger as well running it as a module. A .main entry point is predefined in .vscode/launch.json (be sure to update skidname to match the folder name under /src).
To test it in the Docker container's environment, you can run use the Dockerfile to create a container and run it locally using a tool like Podman.
Skids use GCP Secrets Manager to make secrets available to the function. They are mounted as local files with a specified mounting directory (/secrets). In this mounting scheme, a folder can only hold a single secret, so multiple secrets are handled via nesting folders (ie, /secrets/app and secrets/ftp). These mount points are specified in the GitHub CI action workflow.
The secrets.json folder holds all the login info, etc. A template is available in the repo's root directory. This is read into a dictionary with the json package via the _get_secrets() function. Other files (known_hosts, service account keys) can be handled in a similar manner or just have their path available for direct access.
A separate config.py module holds non-secret configuration values. These are accessed by importing the module and accessing them directly.
This project was developed with the assistance of GitHub Copilot.