Move to GitHub Actions for deployments; other deployment updates/cleanup#714
- Remove old deployment files and dependencies. Some of these were from older setups no longer in use.
- Begin cleaning up the configuration approach to make things easier to configure via environment variables. The previous approach had several layers of legacy techniques that had built up over time, along with a lot of duplicated config with subtle drifts between copies; this attempts to de-duplicate and unify it. None of this is actually hooked up yet; this is just the beginnings of work on new deployments.
Fixes various syntax errors and missing imports. Also renames `keys.py` to `keys_env.py` so we don't conflict with people's existing `keys.py`, which was gitignored previously. Also sets up Docker to use optional `.env` files, which should simplify the setup instructions and the migration for existing users with custom `keys.py` files.
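As an illustration of the optional `.env` setup (the service and variable names here are hypothetical, not necessarily this project's actual compose config), Docker Compose reads a sibling `.env` file for `${VAR}` substitution when one exists, and the `:-` defaults keep everything working when it doesn't:

```yaml
# docker-compose.yml excerpt (sketch) -- Compose automatically reads a
# sibling .env file, if present, for ${VAR} substitution; the defaults
# after ":-" apply when a variable is unset, so .env stays optional.
services:
  django:
    build: .
    environment:
      API_KEY: ${API_KEY:-}             # hypothetical variable names
      DB_HOST: ${DB_HOST:-postgres}
      DB_PORT: ${DB_PORT:-5432}
```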
This leverages GitHub Actions, along with other changes we've adopted, like Vault for secrets, Pkl for Kubernetes configuration management, and Carvel for deployments. This also removes some areas that I believe are now unused, like the nginx proxy layer, since it doesn't seem like that was previously being used in production any longer.
Because these values appeared in the metadata step's own output, GitHub Actions would fail: it blocked the metadata output from being emitted because it thought the output contained masked secrets.
This appears to be a pre-existing staging issue, but it looks like Django keeps bumping into this memory limit and restarting workers inside the container.
This brings back the restart job that used to be implemented in Jenkins.
Overview of changes
This revamps the deployment process for the REopt API to use GitHub Actions instead of Jenkins. Tied in with this change are various other improvements we've found that help make the deployment process easier to run and maintain. This also includes behind-the-scenes updates to ensure this API will continue to function after the nlr.gov domain switch occurs.

The main caveat is that this relies on various GitHub Actions that are only available on our internal GitHub Enterprise server. So while ideally this would run out here on the public GitHub repo, for now it will only run on an internal mirror of this repo that exists on the GitHub Enterprise server. This may mean that in some cases you'll have two PRs for a piece of work (one on the public repo for review, and the other internally for deployment purposes), but that's hopefully no worse than the previous split, with manual steps happening on the internal Jenkins server.
Longer term, we could maybe revisit some of this to get the full deployment process running on the public GitHub.com repo, but for now, hopefully this is at least a stepping stone that will still make this easier to maintain and use.
Impact for admins/deployers
Staging branch deploys: For the REopt team members who perform deployments, the main difference is that to deploy a branch to staging, you'll need to go to the GitHub Enterprise mirror of this repo and open a PR on that mirror with the `deploy` label. Essentially, PRs with this `deploy` label become the mechanism for deploying branches: as long as the PR is open on the Enterprise mirror and the `deploy` label is assigned, subsequent commits to the branch will automatically be deployed. When the deploy finishes, you'll find a "View deployment" button in the mirrored PR that links to the staging URL of the API.
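The label-driven mechanism could look roughly like this (a sketch with hypothetical job wiring, not necessarily the actual workflow in this PR):

```yaml
# .github/workflows/deploy-staging.yml (sketch)
name: Deploy branch to staging
on:
  pull_request:
    types: [opened, synchronize, labeled]
jobs:
  deploy:
    # Only run when the PR carries the "deploy" label.
    if: contains(github.event.pull_request.labels.*.name, 'deploy')
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      # ... internal deploy steps (Vault, Pkl, Carvel) would go here ...
```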
`master` deploys: For `master` deploys, those should mostly work the same as before: changes that land on `master` will automatically get deployed to the primary staging and then production locations.

Potential breaking change for "development" deploys: I have not configured deployments to the separate "development" environment and URL. Are these still used or needed? Since they were being deployed to the same Kubernetes cluster as the staging deploys, I wasn't really sure how they differed from staging deployments, but I'm happy to discuss more if you all still rely on these for your workflow.
Mirroring delay: The mirrored repo on GitHub Enterprise should hopefully pick up any changes pushed to the public GitHub repo within 15-30 seconds. That's likely quicker than the old Jenkins setup picked up changes automatically, but just be aware of that small delay.
Impact for developers
`keys.py` changed to a `.env` replacement: The gitignored `keys.py` file has been replaced by a local `.env` file. So for people doing local development, if they had a custom API key in their local `keys.py`, it will need to be translated into a `.env` file instead. Similarly, if developers had other custom settings in their local `keys.py` (like custom database connections, etc.), those will also need to be migrated to `.env`. I got the sense that most developers would have just customized the API key, but if that's wrong and people have more extensive local customizations to their settings, I'm happy to chat more about migration paths.

Alternatively, if this change is too disruptive for local development purposes, we can roll it back. The reason I made this switch was to try to clean up the several layers of configuration and secrets management that had built up over the years (all the way back to pre-Kubernetes days). This hews closer to the more Docker/Kubernetes/12-factor-app native approach of using environment variables for configuration.
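For illustration, the environment-variable style looks something like this (the variable names are hypothetical; the real `keys_env.py` defines the project's own settings):

```python
import os

# Sketch of the keys_env.py pattern: read configuration from environment
# variables with safe defaults for local development, so the module can be
# checked into git without containing any secrets.
API_KEY = os.environ.get("API_KEY", "")           # set via .env locally
DB_HOST = os.environ.get("DB_HOST", "localhost")  # hypothetical names
DB_PORT = int(os.environ.get("DB_PORT", "5432"))
```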
I tried to minimize overall code changes with this switch, while also preventing git merge issues for existing developers. I moved the logic from `keys.py` (and its template) into a `keys_env.py` that reads these values from environment variables so that it can be checked in (I opted to rename the file to prevent conflicts with people's already-gitignored `keys.py` files). I then configured the `docker-compose.yml` to read environment variables that can be defined in a gitignored `.env` file, and supplied a `.env.example` file for reference purposes. But otherwise, I tried to leave all of the variable names and other things defined in `keys.py` mostly alone.

The current `.env` files will only work for Docker Compose users, so if folks are doing local development outside of Docker Compose, there are other ways we could integrate these types of `.env` files (like through python-dotenv). However, I wasn't sure whether anyone was doing development outside of Docker Compose, so I didn't hook that up.

Miscellaneous questions/details
Consolidated to a single `reopt_api/settings.py` config file: I got rid of the largely duplicative `dev_settings.py`, `production_settings.py`, and `staging_settings.py` files and consolidated everything into a single `settings.py` file. I believe each environment should behave the same as before; this just merges the duplicative settings into the single file and tweaks settings as needed for different environments. This helped align with the shift to environment variables for configuring more things and hopefully makes things easier to maintain.

Removal of nginx vestiges: As far as I could tell, the local nginx container and configuration were no longer being used, so I removed those bits.
Removal of c110p/VM deployments: I removed various defunct deployment pieces related to deploying to old VMs that predate the current Kubernetes deployment being used. I'm pretty sure those servers had been retired and were no longer being used.
Fix staging Django memory limits: This was an existing issue, but it looked like in the staging environment, the Django server was constantly bumping into the Kubernetes memory limits and restarting processes. I've increased the staging memory limits to prevent this churn. This was not an issue in production, which already had higher memory limits.
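For reference, the kind of change involved is bumping the container's resource limits in the Kubernetes spec (the numbers below are illustrative, not the values actually used here):

```yaml
# Deployment container spec excerpt (sketch) -- raising the memory limit
# stops the kubelet from OOM-killing Django workers that hover near the
# old ceiling.
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"   # previously lower, causing restart churn
```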
Re-enabled some health checks on Kubernetes deployments: Our new deployment setup assumes all Kubernetes resources have health checks defined so Kubernetes can better detect when apps are ready or healthy. The old Kubernetes config files had some of these health checks commented out, so they weren't being used. Since our new deployments assume there will always be health checks, I've re-enabled and tweaked some of these to get them working again. Hopefully this change shouldn't really matter (and if anything, should help with stability), but I'm mentioning it in case the health checks prove problematic. In that case, we can certainly tweak the health checks to allow for longer delays, etc.
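The re-enabled health checks follow the standard Kubernetes probe shape, roughly like this (the path, port, and timings are hypothetical, not this deployment's actual values):

```yaml
# Container spec excerpt (sketch)
readinessProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
livenessProbe:
  httpGet:
    path: /healthz
    port: 8000
  initialDelaySeconds: 30  # give the app time to boot before liveness kicks in
  periodSeconds: 30
  failureThreshold: 3
```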
GitHub.com versus GitHub Enterprise action workflows: The new deployment workflows are only intended to run on GitHub Enterprise, while the existing test workflows are only intended to run on GitHub.com. I've explicitly disabled the public workflows from running on the Enterprise server and vice versa. But just be aware that if you introduce new GitHub Action workflows in the future, they may need to be disabled in one environment or the other, or else the actions will probably just spin and never run.
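One way to gate a workflow to a single environment (a sketch of the general technique, not necessarily how these particular workflows do it) is a job-level condition on the server URL:

```yaml
# Workflow excerpt (sketch) -- github.server_url is "https://github.com"
# on public GitHub and the Enterprise hostname on a GHES instance, so
# this job silently skips on the Enterprise mirror.
jobs:
  test:
    if: github.server_url == 'https://github.com'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```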
Optimized Dockerfile for build caching: I tweaked the order in which files are copied into the Django docker image and adjusted the .dockerignore file to help prevent unnecessary builds or work when things haven't actually changed.
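The usual shape of this optimization (file names and base image here are generic, not necessarily this repo's layout) is to copy the dependency manifest and install it before copying the rest of the source, so code-only changes don't invalidate the dependency layer:

```dockerfile
# Dockerfile excerpt (sketch)
FROM python:3.8-slim
WORKDIR /app
# Copy only the dependency manifest first so this layer stays cached
# until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Now copy the application source; edits here don't re-run pip install.
COPY . .
```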
Testing
I deployed this branch to staging, pointed a local copy of the web app at it, and got the web app's example site to come back with results. So I think at least the basics are working, but I'm not sure whether there are additional things that would be good to test.