Move to GitHub Actions for deployments; other deployment updates/cleanup#714
- Remove old deployment files and dependencies. Some of these were from older setups no longer in use.
- Begin cleaning up the configuration approach to make things easier to configure via environment variables. The previous approach had several layers of legacy techniques that had built up over time, along with a lot of duplicated config with subtle drifts between copies; this attempts to de-duplicate and unify it. None of this is actually hooked up yet; this is just the beginnings of work on new deployments.
Fixes various syntax errors and missing imports. Also renames `keys.py` to `keys_env.py` so we don't conflict with people's existing `keys.py`, which was gitignored previously. Also sets up Docker to use optional `.env` files, which should simplify the setup instructions and the migration for existing users with custom `keys.py` files.
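As an illustration of the optional `.env` setup (the service and variable names here are hypothetical, not necessarily this project's actual compose config), Docker Compose reads a sibling `.env` file for `${VAR}` substitution when one exists, and the `:-` defaults keep everything working when it doesn't:

```yaml
# docker-compose.yml excerpt (sketch) -- Compose automatically reads a
# sibling .env file, if present, for ${VAR} substitution; the defaults
# after ":-" apply when a variable is unset, so .env stays optional.
services:
  django:
    build: .
    environment:
      API_KEY: ${API_KEY:-}             # hypothetical variable names
      DB_HOST: ${DB_HOST:-postgres}
      DB_PORT: ${DB_PORT:-5432}
```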
This leverages GitHub Actions, along with other changes we've adopted, like Vault for secrets, Pkl for Kubernetes configuration management, and Carvel for deployments. This also removes some areas that I believe are now unused, like the nginx proxy layer, since it doesn't seem like that was previously being used in production any longer.
Because these values appeared in the metadata step's own output, GitHub Actions would fail: it blocked the metadata output from being emitted because it thought the output contained masked secrets.
This appears to be a pre-existing staging issue, but it looks like Django keeps bumping into this memory limit and restarting workers inside the container.
This brings back the restart job that used to be implemented in Jenkins.
Overview of changes
This revamps the deployment process for the REopt API to use GitHub Actions instead of Jenkins. Tied in with this change are various other improvements we've found that help make the deployment process easier to run and maintain. This also includes behind-the-scenes updates to ensure this API will continue to function after the nlr.gov domain switch occurs.

The main caveat is that this relies on various GitHub Actions that are only available on our internal GitHub Enterprise server. So while ideally this would run out here on the public GitHub repo, for now it will only run on an internal mirror of this repo that exists on the GitHub Enterprise server. This may mean that in some cases you'll have two PRs for a piece of work (one on the public repo for review, and the other internally for deployment purposes), but that's hopefully no worse than the previous split, with manual steps happening on the internal Jenkins server.
Longer term, we could maybe revisit some of this to get the full deployment process running on the public GitHub.com repo, but for now, hopefully this is at least a stepping stone that will still make this easier to maintain and use.
Impact for admins/deployers
Staging branch deploys: For the REopt team members who perform deployments, the main difference is that to deploy a branch to staging, you'll need to go to the GitHub Enterprise mirror of this repo and open a PR on that mirror with the `deploy` label. Essentially, PRs with this `deploy` label become the mechanism for deploying branches: as long as the PR is open on the Enterprise mirror and the `deploy` label is assigned, subsequent commits to the branch will automatically be deployed. When the deploy finishes, you'll find a "View deployment" button in the mirrored PR that links to the staging URL of the API.
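The label-driven mechanism could look roughly like this (a sketch with hypothetical job wiring, not necessarily the actual workflow in this PR):

```yaml
# .github/workflows/deploy-staging.yml (sketch)
name: Deploy branch to staging
on:
  pull_request:
    types: [opened, synchronize, labeled]
jobs:
  deploy:
    # Only run when the PR carries the "deploy" label.
    if: contains(github.event.pull_request.labels.*.name, 'deploy')
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      # ... internal deploy steps (Vault, Pkl, Carvel) would go here ...
```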
`master` deploys: For `master` deploys, those should mostly work the same as before: changes that land on `master` will automatically get deployed to the primary staging and then production locations.

Potential breaking change for "development" deploys: I have not configured deployments to the separate "development" environment and URL. Are these still used or needed? Since they were being deployed to the same Kubernetes cluster as the staging deploys, I wasn't really sure how they differed from staging deployments, but I'm happy to discuss more if you all still rely on these for your workflow.
Mirroring delay: The mirrored repo on GitHub Enterprise should hopefully pick up any changes pushed to the public GitHub repo within 15-30 seconds. That's likely quicker than the old Jenkins setup picked up changes automatically, but just be aware of that small delay.
Impact for developers
`keys.py` changed to a `.env` replacement: The gitignored `keys.py` file has been replaced by a local `.env` file. So for people doing local development, if they had a custom API key in their local `keys.py`, it will need to be translated into a `.env` file instead. Similarly, if developers had other custom settings in their local `keys.py` (like custom database connections, etc.), those will also need to be migrated to `.env`. I got the sense that most developers would have just customized the API key, but if that's wrong and people have more extensive local customizations to their settings, I'm happy to chat more about migration paths.

Alternatively, if this change is too disruptive for local development purposes, we can roll it back. The reason I made this switch was to try to clean up the several layers of configuration and secrets management that had built up over the years (all the way back to pre-Kubernetes days). This hews closer to the more Docker/Kubernetes/12-factor-app native approach of using environment variables for configuration.
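For illustration, the environment-variable style looks something like this (the variable names are hypothetical; the real `keys_env.py` defines the project's own settings):

```python
import os

# Sketch of the keys_env.py pattern: read configuration from environment
# variables with safe defaults for local development, so the module can be
# checked into git without containing any secrets.
API_KEY = os.environ.get("API_KEY", "")           # set via .env locally
DB_HOST = os.environ.get("DB_HOST", "localhost")  # hypothetical names
DB_PORT = int(os.environ.get("DB_PORT", "5432"))
```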
I tried to minimize overall code changes with this switch, while also preventing git merge issues for existing developers. I moved the logic from `keys.py` (and its template) into a `keys_env.py` that reads these values from environment variables so that it can be checked in (I opted to rename the file to prevent conflicts with people's already-gitignored `keys.py` files). I then configured the `docker-compose.yml` to read environment variables that can be defined in a gitignored `.env` file, and supplied a `.env.example` file for reference purposes. But otherwise, I tried to leave all of the variable names and other things defined in `keys.py` mostly alone.

The current `.env` files will only work for Docker Compose users, so if folks are doing local development outside of Docker Compose, there are other ways we could integrate these types of `.env` files (like through python-dotenv). However, I wasn't sure whether anyone was doing development outside of Docker Compose, so I didn't hook that up.

Miscellaneous questions/details
Consolidated to a single `reopt_api/settings.py` config file: I got rid of the largely duplicative `dev_settings.py`, `production_settings.py`, and `staging_settings.py` files and consolidated everything into a single `settings.py` file. I believe each environment should behave the same as before; this just merges the duplicative settings into the single file and tweaks settings as needed for different environments. This helped align with the shift to environment variables for configuring more things and hopefully makes things easier to maintain.

Removal of nginx vestiges: As far as I could tell, the local nginx container and configuration were no longer being used, so I removed those bits.
Removal of c110p/VM deployments: I removed various defunct deployment pieces related to deploying to old VMs that predate the current Kubernetes deployment being used. I'm pretty sure those servers had been retired and were no longer being used.
Fix staging Django memory limits: This was an existing issue, but it looked like in the staging environment, the Django server was constantly bumping into the Kubernetes memory limits and restarting processes. I've increased the staging memory limits to prevent this churn. This was not an issue in production, which already had higher memory limits.
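For reference, the kind of change involved is bumping the container's resource limits in the Kubernetes spec (the numbers below are illustrative, not the values actually used here):

```yaml
# Deployment container spec excerpt (sketch) -- raising the memory limit
# stops the kubelet from OOM-killing Django workers that hover near the
# old ceiling.
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"   # previously lower, causing restart churn
```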
Re-enabled some health checks on Kubernetes deployments: Our new deployment setup assumes all Kubernetes resources have health checks defined so Kubernetes can better detect when apps are ready or healthy. The old Kubernetes config files had some of these health checks commented out, so they weren't being used. Since our new deployments assume there will always be health checks, I've re-enabled and tweaked some of these to get them working again. Hopefully this change shouldn't really matter (and if anything, should help with stability), but I'm mentioning it in case the health checks prove problematic. In that case, we can certainly tweak the health checks to allow for longer delays, etc.
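The re-enabled health checks follow the standard Kubernetes probe shape, roughly like this (the path, port, and timings are hypothetical, not this deployment's actual values):

```yaml
# Container spec excerpt (sketch)
readinessProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 30  # give the app time to boot before liveness kicks in
  periodSeconds: 30
  failureThreshold: 3
```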
GitHub.com versus GitHub Enterprise action workflows: The new deployment workflows are only intended to run on GitHub Enterprise, while the existing test workflows are only intended to run on GitHub.com. I've explicitly disabled the public workflows from running on the Enterprise server and vice versa. But just be aware that if you introduce new GitHub Action workflows in the future, they may need to be disabled in one environment or the other, or else the actions will probably just spin and never run.
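One way to gate a workflow to a single environment (a sketch of the general technique, not necessarily how these particular workflows do it) is a job-level condition on the server URL:

```yaml
# Workflow excerpt (sketch) -- github.server_url is "https://github.com"
# on public GitHub and the Enterprise hostname on a GHES instance, so
# this job silently skips on the Enterprise mirror.
jobs:
  test:
    if: github.server_url == 'https://github.com'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```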
Optimized Dockerfile for build caching: I tweaked the order in which files are copied into the Django docker image and adjusted the .dockerignore file to help prevent unnecessary builds or work when things haven't actually changed.
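The usual shape of this optimization (file names and base image here are generic, not necessarily this repo's layout) is to copy the dependency manifest and install it before copying the rest of the source, so code-only changes don't invalidate the dependency layer:

```dockerfile
# Dockerfile excerpt (sketch)
FROM python:3.8-slim
WORKDIR /app
# Copy only the dependency manifest first so this layer stays cached
# until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Now copy the application source; edits here don't re-run pip install.
COPY . .
```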
Testing
I deployed this branch to staging, pointed a local copy of the web app at it, and got the web app's example site to come back with results. So I think at least the basics are working, but I'm not sure whether there are additional things that would be good to test.