Merged
20 changes: 19 additions & 1 deletion CHANGELOG.md
@@ -17,7 +17,25 @@ pins that tag, so this file is the human-readable answer to "what's in v0.2.0?".

## [Unreleased]

_Nothing yet._
### Changed
- **`compute-engine` now manages many VMs, and no longer hard-codes environment identity.**
- **Multi-VM:** the single hard-coded `google_compute_instance` is replaced by an `instances`
map fanned out with `for_each` (mirrors `github`'s `repositories` pattern) — add a map key to
add a VM. VM name is `<environment_name>-<key>` (the map key is the name, env-prefixed —
e.g. `postiz` → `dev-postiz`). Per-VM spec (`machine_type`, `boot_image`, `boot_disk_size_gb`, `zone`,
`assign_public_ip`, `startup_script`, `network_tags`) moved from top-level vars into each map
entry, all `optional(...)` with cost-safe defaults. IAM grants fan out **member × VM** via
`setproduct` on stable keys (no reindex churn). Outputs collapse to a single `instances` map
keyed by VM key (the map key, not the full `<env>-<key>` VM name) — each value carries `name`,
`instance_id`, `internal_ip`, `zone`, `ssh_command`.
- **Env identity out of the module:** `project_id` lost its dev-project default and is now
**required** (a forgotten value fails loudly instead of silently provisioning into the wrong
project); `access_members` now defaults to `[]` (no SSH) instead of two named engineers. A
reusable module should know *how* to build a VM, not *where* or *who* — that belongs at the
call site. Cost-safe `how` defaults (`e2-micro`, `debian-12`, 20 GB) are unchanged.
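- ⚠️ **Breaking:** existing state is re-keyed from `google_compute_instance.this` to
`google_compute_instance.this["<key>"]`, and the IAM member resources are re-keyed from
`["<member>"]` to `["<key>:<member>"]`. Consumers must either run `terragrunt state mv` for each
address or let the resources be recreated. The VM name also changes from
`<environment_name>-compute-engine` to `<environment_name>-<key>`, so only the key
`compute-engine` keeps the existing name. Consumers must now set `project_id` explicitly and
list `access_members` (the empty default grants no access).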
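
As an alternative to running `terragrunt state mv` by hand, the re-keying could be expressed with Terraform `moved` blocks (Terraform 1.1 or later). This is a sketch under stated assumptions, not part of this change: the key `compute-engine` and the member `user:alice@example.com` are hypothetical, and the blocks would have to sit alongside the module's configuration (for example through a Terragrunt `generate` block). The key `compute-engine` is used here because the old VM name was `<environment_name>-compute-engine`, so the new `<environment_name>-<key>` name stays the same.

```hcl
# Hypothetical migration sketch: key and member values are examples only.

# Single VM -> for_each instance keyed "compute-engine" (keeps the old VM name).
moved {
  from = google_compute_instance.this
  to   = google_compute_instance.this["compute-engine"]
}

# IAM grants: the old key was the member string, the new key is "<key>:<member>".
moved {
  from = google_compute_instance_iam_member.os_login["user:alice@example.com"]
  to   = google_compute_instance_iam_member.os_login["compute-engine:user:alice@example.com"]
}

moved {
  from = google_iap_tunnel_instance_iam_member.tunnel["user:alice@example.com"]
  to   = google_iap_tunnel_instance_iam_member.tunnel["compute-engine:user:alice@example.com"]
}
```

One pair of IAM `moved` blocks is needed per existing member. Running a plan afterwards should show moves rather than destroy/create pairs if the addresses line up.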

## [0.4.0] - 2026-06-15

2 changes: 1 addition & 1 deletion README.md
@@ -28,7 +28,7 @@ and **[CHANGELOG.md](./CHANGELOG.md)** for what changed in each tagged version.
| Component | Cloud | Purpose | Key outputs |
| ---------------- | ------ | ------------------------------------------------ | ----------------------------------------------- |
| `network` | GCP | Network foundation — wraps CFT network + cloud-router modules | `network_self_link`, `subnetwork_self_link`, `ssh_tag` |
| `compute-engine` | GCP | VM (bootstrap-agnostic); OS Login + IAP access, no public IP | `instance_name`, `internal_ip`, `ssh_command` |
| `compute-engine` | GCP | One or more VMs (`instances` map, bootstrap-agnostic); OS Login + IAP access, no public IP by default | `instances` (map keyed by VM key) |
| `github` | GitHub | GitHub repositories as code (repo factory) | `repository_names`, `repository_urls` |

`network` and `compute-engine` form a dependency chain:
108 changes: 66 additions & 42 deletions compute-engine/README.md
@@ -1,21 +1,28 @@
# compute-engine

A **GCP Compute Engine VM with Docker** installed on first boot. The VM has **no external IP** —
access follows the same model as AWS SSM Session Manager: **OS Login + IAP TCP forwarding**, where
you connect through Google's infrastructure with your own identity and short-lived,
IAM-governed keys (no public SSH port, no key files to manage).
**One or more GCP Compute Engine VMs**, defined as a map (`instances`) and fanned out with
`for_each` — add a key to add a VM. Each VM has **no external IP** by default; access follows the
same model as AWS SSM Session Manager: **OS Login + IAP TCP forwarding**, where you connect through
Google's infrastructure with your own identity and short-lived, IAM-governed keys (no public SSH
port, no key files to manage). The module is **bootstrap-agnostic** — it runs whatever
`startup_script` each VM passes (e.g. a Docker install), but owns no userdata itself.

## What it creates

For each entry in `instances`, the module creates the resources below. Entries are keyed by a short name, and the same key indexes the `instances` **output**. The VM's GCP name is `<environment_name>-<key>`, so with `environment_name = dev` a `postiz` key yields `instances["postiz"]`, whose `.name` is `dev-postiz`:

- `google_compute_instance` — the VM. No external IP by default; `enable-oslogin = TRUE` so SSH
access is governed by IAM, not project metadata keys. Runs the caller-supplied `startup_script`
on first boot (empty string = no bootstrap). **This module is bootstrap-agnostic** — the actual
first-boot script (e.g. installing Docker) is **userdata owned by the consuming environment**,
not baked into the module. See "Bootstrap / userdata" below.
- `google_compute_instance_iam_member` (per member) — grants **`roles/compute.osLogin`** to each
`access_members` principal, letting them log in over SSH.
- `google_iap_tunnel_instance_iam_member` (per member) — grants **`roles/iap.tunnelResourceAccessor`**
to each `access_members` principal, letting them open an IAP tunnel to the VM.
access is governed by IAM, not project metadata keys. Runs that entry's `startup_script` on first
boot (empty string = no bootstrap). **This module is bootstrap-agnostic** — the actual first-boot
script (e.g. installing Docker) is **userdata owned by the consuming environment**, not baked into
the module. See "Bootstrap / userdata" below.
- `google_compute_instance_iam_member` — grants **`roles/compute.osLogin`** to each `access_members`
principal on every VM, letting them log in over SSH.
- `google_iap_tunnel_instance_iam_member` — grants **`roles/iap.tunnelResourceAccessor`** to each
`access_members` principal on every VM, letting them open an IAP tunnel to it.

IAM grants fan out as **member × VM** (via `setproduct`) on stable keys, so adding a VM or a member
never reindexes existing grants.
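
The member × VM fan-out described above can be tried in isolation. The sketch below mirrors the module's `vm_access` local but uses hypothetical sample values (`postiz`, `scratch`, `user:alice@example.com`) in place of `var.instances` and `var.access_members`.

```hcl
locals {
  # Hypothetical sample data, standing in for keys(var.instances) and var.access_members.
  instance_keys  = ["postiz", "scratch"]
  access_members = ["user:alice@example.com"]

  # One entry per (instance, member) pair, keyed by "<instance>:<member>".
  vm_access = {
    for pair in setproduct(local.instance_keys, local.access_members) :
    "${pair[0]}:${pair[1]}" => { instance = pair[0], member = pair[1] }
  }
}

# Exposes the generated for_each keys so they can be inspected after an apply.
output "vm_access_keys" {
  value = keys(local.vm_access)
}
```

Because each key is built from the instance key and the member string rather than a list position, adding a third instance only adds new keys; the existing ones keep their addresses.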

Inbound SSH from the IAP range is allowed by the **`network` component's** `allow-iap-ssh` firewall rule,
and outbound internet (for the Docker install) comes from the `network`'s Cloud NAT — so this module
@@ -37,7 +44,7 @@ gcloud compute ssh <instance-name> \
--zone <zone> --tunnel-through-iap
```

The `ssh_command` output prints this for you. To grant someone access, add their principal to
Each `instances[<key>].ssh_command` output prints this for you. To grant someone access, add their principal to
`access_members` (e.g. `user:alice@officialdad.com`, `serviceAccount:ci@<project>.iam.gserviceaccount.com`)
and re-apply — no keys to distribute, and access is revoked the moment IAM is removed.

@@ -48,21 +55,28 @@ caller passes, on first boot, as the instance's `metadata_startup_script`. Pass
for a plain VM.

The **consuming environment owns the bootstrap**. In `infra-environments-dev` the Docker install
lives as a script file (e.g. `compute-engine/userdata/docker-bootstrap.sh`) and a switch in the
unit's `terragrunt.hcl` decides whether to pass it:
lives as a script file (`compute-engine/userdata/docker-bootstrap.sh`), read with `file()` and
passed as that VM's `startup_script` inside the `instances` map:

```hcl
locals {
install_docker = true
}
inputs = {
startup_script = local.install_docker ? file("${get_terragrunt_dir()}/userdata/docker-bootstrap.sh") : ""
network = dependency.network.outputs.network_self_link
subnetwork = dependency.network.outputs.subnetwork_self_link

instances = {
dev = {
machine_type = "e2-micro"
network_tags = [dependency.network.outputs.ssh_tag]
startup_script = file("${get_terragrunt_dir()}/userdata/docker-bootstrap.sh")
}
}
}
```

This keeps the cookbook module generic (any VM, any bootstrap) and puts the "what runs on boot"
decision where environment-specific choices belong. A Docker bootstrap needs outbound internet
(apt + docker.com), which the `network`'s Cloud NAT provides to this otherwise-private VM.
Leave `startup_script` unset (or `""`) for a plain VM. This keeps the cookbook module generic (any
VM, any bootstrap) and puts the "what runs on boot" decision where environment-specific choices
belong. A Docker bootstrap needs outbound internet (apt + docker.com), which the `network`'s Cloud
NAT provides to these otherwise-private VMs.
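
To illustrate "add a key to add a VM", here is a sketch of a Terragrunt unit that defines two VMs. The project ID, the `scratch` key, the member, and the `../network` path are hypothetical, and the `global` input is assumed to be supplied elsewhere (for example by a parent configuration).

```hcl
# Hypothetical Terragrunt unit; names and paths are examples only.
dependency "network" {
  config_path = "../network"
}

inputs = {
  project_id = "my-dev-project" # hypothetical project ID
  network    = dependency.network.outputs.network_self_link
  subnetwork = dependency.network.outputs.subnetwork_self_link

  # Granted OS Login + IAP access on every VM below.
  access_members = ["user:alice@example.com"]

  instances = {
    # Uses every default except tags and bootstrap script.
    postiz = {
      network_tags   = [dependency.network.outputs.ssh_tag]
      startup_script = file("${get_terragrunt_dir()}/userdata/docker-bootstrap.sh")
    }

    # Plain VM (no startup_script) with a larger machine type and disk.
    scratch = {
      machine_type      = "e2-small"
      boot_disk_size_gb = 30
      network_tags      = [dependency.network.outputs.ssh_tag]
    }
  }
}
```

With `environment_name = dev`, this would produce VMs named `dev-postiz` and `dev-scratch`; removing the `scratch` entry removes only that VM and its IAM grants.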

## Auth

@@ -73,29 +87,39 @@ must be enabled on the project.

## Inputs

| Name | Type | Default | Description |
| ------------------- | ------------ | ------------------------ | ------------------------------------------------------------------------ |
| `global` | object | — | Env-wide context (`environment_name`, `deploy_region`, `tags`). |
| `project_id` | string | — | GCP project the VM is created in. |
| `network` | string | — | Network self link / name (from `network.network_self_link`). |
| `subnetwork` | string | — | Subnetwork self link / name (from `network.subnetwork_self_link`). |
| `zone` | string | `""` | Zone for the VM. Empty → `"<deploy_region>-a"`. |
| `machine_type` | string | `e2-micro` | Machine type. |
| `boot_image` | string | `debian-cloud/debian-12` | Boot image (`project/family` or full self link). |
| `boot_disk_size_gb` | number | `20` | Boot disk size in GB. |
| `startup_script` | string | `""` | First-boot script (userdata). Empty = no bootstrap. Supplied by the env. |
| `assign_public_ip` | bool | `false` | Attach an ephemeral external IP. Leave `false` for the IAP-only model. |
| `access_members` | list(string) | `[]` | IAM principals granted OS Login + IAP tunnel access (see access model). |
| Name | Type | Default | Description |
| ---------------- | ------------ | ------- | ------------------------------------------------------------------------------------------------ |
| `global` | object | — | Env-wide context (`environment_name`, `deploy_region`, `tags`). |
| `project_id` | string | — | **Required.** GCP project the VMs are created in. Set per environment (see note below). |
| `network` | string | — | Network self link / name (from `network.network_self_link`). Shared by all VMs. |
| `subnetwork` | string | — | Subnetwork self link / name (from `network.subnetwork_self_link`). Shared by all VMs. |
| `access_members` | list(string) | `[]` | IAM principals granted OS Login + IAP access on **every** VM. Empty = no SSH access (see note). |
| `instances` | map(object) | `{}` | VMs to create, keyed by short name. Per-VM fields below; each entry overrides only what it needs. |

Per-VM fields inside each `instances` entry (all optional):

| Field | Default | Description |
| ------------------- | ------------------------ | ----------------------------------------------------------------------- |
| `machine_type` | `e2-micro` | Machine type. |
| `boot_image` | `debian-cloud/debian-12` | Boot image (`project/family` or full self link). |
| `boot_disk_size_gb` | `20` | Boot disk size in GB. |
| `zone` | `""` | Zone. Empty → `"<deploy_region>-a"`. |
| `assign_public_ip` | `false` | Attach an ephemeral external IP. Leave `false` for the IAP-only model. |
| `startup_script` | `""` | First-boot script (userdata). Empty = no bootstrap. Supplied by the env. |
| `network_tags` | `[]` | Firewall tags (e.g. `[network.ssh_tag]`). Empty = no tag-scoped inbound. |

> **Why `project_id` is required and `access_members` defaults to `[]`:** a reusable module should
> know *how* to build a VM, never *where* or *who* — that's environment identity, owned by the call
> site. `project_id` has no safe default (defaulting to one env's project risks another env silently
> provisioning into it), so it's required and fails loudly when unset. `access_members` defaults to
> "no access" (safe when forgotten) rather than a hard-coded person. The cost-safe *how* knobs
> (`e2-micro`, `debian-12`, 20 GB) keep defaults, since a forgotten value there is harmless.
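
The variable declarations are not part of the hunks shown here, so the following is only a sketch of how the inputs described above could be declared. The field types are assumptions inferred from the tables, and `optional(...)` with a default requires Terraform 1.3 or later.

```hcl
# Sketch only: the module's real variables.tf may differ.

variable "project_id" {
  type        = string
  description = "GCP project the VMs are created in."
  # No default on purpose: an unset value fails at plan time.
}

variable "access_members" {
  type        = list(string)
  default     = [] # safe when forgotten: nobody gets SSH access
  description = "IAM principals granted OS Login + IAP access on every VM."
}

variable "instances" {
  type = map(object({
    machine_type      = optional(string, "e2-micro")
    boot_image        = optional(string, "debian-cloud/debian-12")
    boot_disk_size_gb = optional(number, 20)
    zone              = optional(string, "")
    assign_public_ip  = optional(bool, false)
    startup_script    = optional(string, "")
    network_tags      = optional(list(string), [])
  }))
  default     = {}
  description = "VMs to create, keyed by short name."
}
```

An entry such as `postiz = {}` would then resolve to a fully populated object with the cost-safe defaults.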

## Outputs

| Name | Description |
| --------------- | ---------------------------------------------------------------- |
| `instance_name` | The VM name (`<environment_name>-compute-engine`). |
| `instance_id` | The instance ID. |
| `internal_ip` | The VM's internal IP. |
| `zone` | The zone the VM runs in. |
| `ssh_command` | Ready-to-run `gcloud compute ssh ... --tunnel-through-iap` line. |
| Name | Type | Description |
| ----------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `instances` | map(object) | Per-VM details keyed by instance key. Each value has `name`, `instance_id`, `internal_ip`, `zone`, and a ready-to-run `ssh_command` (`gcloud compute ssh … --tunnel-through-iap`). |
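
A downstream Terragrunt unit could read the `instances` output roughly as follows. This is a sketch: the `../compute-engine` path, the `postiz` key, and the input names `postiz_internal_ip` and `all_vm_names` are hypothetical.

```hcl
# Hypothetical consumer unit; path, key and input names are examples only.
dependency "compute_engine" {
  config_path = "../compute-engine"
}

inputs = {
  # Look up one VM by its map key (not by its full "<env>-<key>" name).
  postiz_internal_ip = dependency.compute_engine.outputs.instances["postiz"].internal_ip

  # Collect the GCP names of all VMs the module created.
  all_vm_names = [for key, vm in dependency.compute_engine.outputs.instances : vm.name]
}
```

Consumers that previously read the flat `instance_name` or `internal_ip` outputs need to switch to this keyed lookup.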

## Dependencies

50 changes: 27 additions & 23 deletions compute-engine/terraform/main.tf
@@ -4,24 +4,28 @@ provider "google" {
}

locals {
name_prefix = "${var.global.environment_name}-compute-engine"

# Default to zone "a" in the deploy region unless the caller pins one.
zone = var.zone != "" ? var.zone : "${var.global.deploy_region}-a"

# GCP labels must be lowercase; only this subset of the tags convention maps cleanly.
labels = {
environment = lower(var.global.environment_name)
managed_by = "terraform"
}

# member x instance -> one stable key per pair, so for_each never reindexes
# when a VM or a member is added/removed.
vm_access = {
for pair in setproduct(keys(var.instances), var.access_members) :
"${pair[0]}:${pair[1]}" => { instance = pair[0], member = pair[1] }
}
}

resource "google_compute_instance" "this" {
name = local.name_prefix
machine_type = var.machine_type
zone = local.zone
for_each = var.instances

name = "${var.global.environment_name}-${each.key}"
machine_type = each.value.machine_type
zone = each.value.zone != "" ? each.value.zone : "${var.global.deploy_region}-a"
labels = local.labels
tags = var.network_tags
tags = each.value.network_tags

# Destroy-friendly: no accidental lock, and changing machine_type stops the VM
# instead of forcing a full recreate.
@@ -31,8 +35,8 @@ resource "google_compute_instance" "this" {
boot_disk {
auto_delete = true # disk is deleted with the VM -> no orphaned disk cost
initialize_params {
image = var.boot_image
size = var.boot_disk_size_gb
image = each.value.boot_image
size = each.value.boot_disk_size_gb
}
}

@@ -42,7 +46,7 @@ resource "google_compute_instance" "this" {

# Emitting access_config = an external IP. Omit it (default) = no public IP.
dynamic "access_config" {
for_each = var.assign_public_ip ? [1] : []
for_each = each.value.assign_public_ip ? [1] : []
content {}
}
}
@@ -52,25 +56,25 @@ resource "google_compute_instance" "this" {
}

# Caller-supplied userdata; null (unset) when empty.
metadata_startup_script = var.startup_script != "" ? var.startup_script : null
metadata_startup_script = each.value.startup_script != "" ? each.value.startup_script : null
}

# OS Login: who may SSH in (identity-based).
# OS Login: who may SSH in (identity-based). Every member on every VM.
resource "google_compute_instance_iam_member" "os_login" {
for_each = toset(var.access_members)
for_each = local.vm_access
project = var.project_id
zone = local.zone
instance_name = google_compute_instance.this.name
zone = google_compute_instance.this[each.value.instance].zone
instance_name = google_compute_instance.this[each.value.instance].name
role = "roles/compute.osLogin"
member = each.value
member = each.value.member
}

# IAP: who may open the tunnel that reaches the (private) VM's SSH port.
# IAP: who may open the tunnel that reaches each (private) VM's SSH port.
resource "google_iap_tunnel_instance_iam_member" "tunnel" {
for_each = toset(var.access_members)
for_each = local.vm_access
project = var.project_id
zone = local.zone
instance = google_compute_instance.this.name
zone = google_compute_instance.this[each.value.instance].zone
instance = google_compute_instance.this[each.value.instance].name
role = "roles/iap.tunnelResourceAccessor"
member = each.value
member = each.value.member
}
34 changes: 11 additions & 23 deletions compute-engine/terraform/outputs.tf
@@ -1,24 +1,12 @@
output "instance_name" {
value = google_compute_instance.this.name
description = "The VM name."
}

output "instance_id" {
value = google_compute_instance.this.instance_id
description = "The instance ID."
}

output "internal_ip" {
value = google_compute_instance.this.network_interface[0].network_ip
description = "The VM's internal IP."
}

output "zone" {
value = google_compute_instance.this.zone
description = "The zone the VM runs in."
}

output "ssh_command" {
value = "gcloud compute ssh ${google_compute_instance.this.name} --zone ${google_compute_instance.this.zone} --project ${var.project_id} --tunnel-through-iap"
description = "Ready-to-run IAP SSH command."
output "instances" {
value = {
for k, vm in google_compute_instance.this : k => {
name = vm.name
instance_id = vm.instance_id
internal_ip = vm.network_interface[0].network_ip
zone = vm.zone
ssh_command = "gcloud compute ssh ${vm.name} --zone ${vm.zone} --project ${var.project_id} --tunnel-through-iap"
}
}
description = "Per-instance details keyed by instance key: name, instance_id, internal_ip, zone, and a ready-to-run IAP ssh_command."
}