Move to GitHub Actions for deployments; other deployment updates/cleanup#714

Open
GUI wants to merge 18 commits into master from deploy-updates

Conversation


@GUI GUI commented Apr 23, 2026

Overview of changes

This revamps the deployment process for the REopt API to use GitHub Actions instead of Jenkins. Tied in with this change are various other improvements we've found that help make the deployment process easier to run and maintain. This also includes behind-the-scenes updates to ensure this API will continue to function after the nlr.gov domain switch occurs.

The main caveat is that this relies on various GitHub Actions that are only available on our internal GitHub Enterprise server. So while ideally, this would run out here on the public GitHub repo, for now, this will only run on an internal mirror of this repo that exists on the GitHub Enterprise server. This may mean that in some cases you'll have 2 PRs for work (one on the public repo for review, and the other internally for deployment purposes), but it's hopefully no worse than the previous split with manual things happening in the internal Jenkins server.

Longer term, we could maybe revisit some of this to get the full deployment process running on the public GitHub.com repo, but for now, hopefully this is at least a stepping stone that will still make this easier to maintain and use.

Impact for admins/deployers

  • Staging branch deploys: For the REopt team that performs deployments, the main difference is that to deploy a branch to staging, you'll need to go to the GitHub Enterprise mirror of this repo and open a PR on that mirror with the deploy label. Essentially, PRs with this deploy label become the mechanism for deploying branches. As long as that PR is open on the Enterprise mirror and has the deploy label assigned, subsequent commits to the branch will automatically be deployed.

    When the deploy finishes, you'll find a "View deployment" button in the mirrored PR that provides a link to the staging URL of the API.

  • master deploys: For master deploys, those should mostly work the same as before, where changes that land on master will automatically get deployed to the primary staging and then production locations.

  • Potential breaking change for "development" deploys: I have not configured deployments to the separate "development" environment and URL. Are these still used or needed? Since they were being deployed to the same Kubernetes cluster as the staging deploys, I wasn't really sure how they differed from staging deployments, but happy to discuss more if you all still relied on these for your workflow.

  • Mirroring delay: For the mirrored repo on GitHub Enterprise, it should hopefully pick up any changes pushed to the public GitHub repo within 15-30 seconds. That's likely quicker than the old Jenkins setup automatically picked up changes, but just be aware of that small delay.
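As a rough sketch of the deploy-label mechanism described above (the actual workflow file on the Enterprise mirror may be structured differently; the names here are illustrative), a label-gated GitHub Actions workflow can be expressed as:

```yaml
# Hypothetical sketch of a label-gated deploy workflow; job names,
# runner labels, and deploy steps are assumptions, not the real config.
name: deploy-branch
on:
  pull_request:
    types: [opened, reopened, labeled, synchronize]
jobs:
  deploy:
    # Only run while the PR carries the "deploy" label.
    if: contains(github.event.pull_request.labels.*.name, 'deploy')
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      # ... deploy steps (e.g. Vault, Pkl, Carvel) would go here ...
```

With a trigger like this, adding the label (or pushing new commits while it is set) re-runs the deploy, matching the behavior described above.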

Impact for developers

  • keys.py replaced by .env: The gitignored keys.py file has been replaced by a local .env file. So for people doing local development, if they had a custom API key in their local keys.py, it will need to be moved into a .env file instead.

    Similarly, if developers had other custom settings in their local keys.py (like for custom database connections, etc), those will also need to be migrated to .env. I got the sense that most developers would have just customized the API key, but if that's wrong and people have more extensive local customizations to their settings, happy to chat more about migration paths.

    Alternatively, if this change is too disruptive for local development purposes, we can roll it back. The reason I made this switch was to try to clean up several layers of different configuration and secrets management that had built up over the years (all the way back to pre-Kubernetes days). This hews closer to the Docker/Kubernetes/12-factor-app native approach of using environment variables for configuration.

    I tried to minimize overall code changes with this switch, while also preventing git merge issues for existing developers. I moved the logic from keys.py (and its template) into a keys_env.py that reads these values from environment variables so that it can be checked in (I renamed the file to avoid conflicts with people's already-gitignored keys.py files). I then configured docker-compose.yml to read environment variables that can be defined in a gitignored .env file, and supplied a .env.example file for reference. Otherwise, I tried to leave the variable names and other things defined in keys.py mostly alone.

    The current .env files will only work for Docker Compose users, so if folks are doing local development outside of Docker Compose, there are other ways we could integrate these types of .env files (like through python-dotenv). However, I wasn't sure if anyone was doing development outside of Docker Compose, so I didn't hook that up.
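As a rough illustration of the keys_env.py approach described above (the helper and variable names here are hypothetical, not the file's actual contents), the checked-in module can read formerly hard-coded values from the environment:

```python
import os


def env_setting(name, default=""):
    """Read a configuration value from the environment, falling back to a default.

    In the setup described above, Docker Compose populates the environment
    from a gitignored .env file; the names below are illustrative only.
    """
    return os.environ.get(name, default)


# Hypothetical examples of settings formerly defined in keys.py:
API_KEY = env_setting("API_KEY")
SQL_PASSWORD = env_setting("SQL_PASSWORD")
```

Because the module only reads environment variables, it contains no secrets and can be committed, while each developer's actual values stay in their local .env file.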

Miscellaneous questions/details

  • Consolidated to single reopt_api/settings.py config file: I got rid of the largely duplicative dev_settings.py, production_settings.py, and staging_settings.py files and tried to consolidate all of this to a single settings.py file. I believe each environment should behave the same as before; this just consolidates the duplicative settings into the single file and tweaks settings as needed for different environments. This helped align with the shift to environment variables for configuring more things and hopefully makes things easier to maintain.
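A minimal sketch of how a single settings.py can branch per environment (the APP_ENV variable and the specific toggles are assumptions for illustration, not the actual contents of the file):

```python
import os

# Hypothetical environment selector; the real settings.py may use a
# different variable name or set of environments.
APP_ENV = os.environ.get("APP_ENV", "development")

# Per-environment tweaks live inline instead of in separate files.
DEBUG = APP_ENV == "development"
ALLOWED_HOSTS = ["*"] if DEBUG else [os.environ.get("API_HOST", "localhost")]
```

The advantage over three near-duplicate files is that any setting shared by all environments is stated exactly once, so the files can't silently drift apart.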

  • Removal of nginx vestiges: As far as I could tell, the local nginx container and configuration were no longer being used, so I removed those bits.

  • Removal of c110p/VM deployments: I removed various defunct deployment pieces related to deploying to old VMs that predate the current Kubernetes deployment being used. I'm pretty sure those servers had been retired and were no longer being used.

  • Fix staging django memory limits: This was an existing issue, but it looked like in the staging environment, the Django server was constantly bumping into the Kubernetes memory limits and restarting processes. I've increased the staging memory limits to prevent this churn. This was not an issue in production which already had higher memory limits.

  • Re-enabled some health checks on Kubernetes deployments: Our new deployment setup assumes all Kubernetes resources have health checks defined so Kubernetes can better detect when apps are ready or healthy. The old Kubernetes config files had some of these health checks commented out, so they weren't being used. Since our new deployments assume there will always be health checks, I've re-enabled and tweaked some of these to get them working again. Hopefully this change shouldn't really matter (and if anything, it should help with stability), but I'm mentioning it in case the health checks prove problematic; in that case, we can certainly tweak them to allow for longer delays, etc.
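For reference, the re-enabled probes generally take this shape in a Kubernetes container spec (the paths, port, and timings below are illustrative assumptions, not the repo's actual values):

```yaml
# Hypothetical probe config; endpoint, port, and delays are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
```

If the checks prove too aggressive, raising `initialDelaySeconds` or `periodSeconds` is the usual first tweak.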

  • GitHub.com versus GitHub Enterprise action workflows: The new deployment workflows are only intended to run on GitHub Enterprise, while the existing test workflows are only intended to run on GitHub.com. I've explicitly disabled the public workflows from running on the Enterprise server and vice versa. But just be aware that if you introduce new GitHub Action workflows in the future, they may need to be disabled in one environment or the other, or else the actions will probably just spin and never run.
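One possible way to express that gating (the repo may do it differently; this is just a sketch) is a job-level condition on the server URL from the `github` context:

```yaml
# Hypothetical sketch: restrict a test workflow to GitHub.com only.
# An Enterprise-only workflow would invert the comparison.
jobs:
  test:
    if: github.server_url == 'https://github.com'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```

A skipped `if` condition shows up as a skipped job rather than a queued one, which avoids the "spinning forever" symptom mentioned above.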

  • Optimized Dockerfile for build caching: I tweaked the order files were copied into the Django docker image and adjusted the .dockerignore file to help prevent unnecessary builds or work when things haven't actually changed.
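The general caching pattern behind this kind of reordering (the base image and file names here are illustrative, not the actual Dockerfile) is to copy the dependency manifest before the application code:

```dockerfile
# Illustrative ordering only; the real Dockerfile differs.
FROM python:3.11-slim
WORKDIR /app

# Copy only the dependency manifest first, so the expensive install
# layer is cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes frequently, so copy it last.
COPY . .
```

Combined with a `.dockerignore` that excludes files irrelevant to the build, this keeps code-only changes from invalidating the dependency layer.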

Testing

I pointed my local setup at this branch deployed to staging and got the web app's example site to come back with results. So I think at least the basics are working, but I'm not sure if there are additional things that would be good to test.

GUI added 18 commits April 20, 2026 20:08
- Remove old deployment files and dependencies. Some of these were even
  from older setups no longer used.

- Try to begin work on cleaning up configuration approach to make things
  easier to configure via environment variables. The previous approach
  had several layers of legacy approaches that had built up over time.
  There was also a lot of duplicate config where there were subtle
  drifts in configuration that this attempts to de-duplicate and unify.

None of this is actually hooked up yet; this is just the beginning of
work on the new deployments.
Fixes various syntax errors and missing imports.

Also renames `keys.py` to `keys_env.py` just so we don't end up
conflicting with people's existing `keys.py` that was previously
gitignored.

Also sets up docker to use optional `.env` files which should simplify
the setup instructions and migration for existing users with custom
keys.py files.
This leverages GitHub Actions, along with other changes we've adopted,
like Vault for secrets, Pkl for Kubernetes configuration management, and
Carvel for deployments.

This also removes some areas that I believe are now unused, like the
nginx proxy layer, since it doesn't seem like that was previously being
used in production any longer.
Because these values appeared in the metadata's own output, GitHub
Actions would fail: it blocked the metadata output because it thought
the output contained masked secrets.
This appears to be an existing staging issue, but it looks like Django
keeps bumping into this limit and restarting workers inside the
container.
This brings back the restart job that used to be implemented in Jenkins.
@GUI GUI requested a review from Bill-Becker April 23, 2026 20:41