Security

Tim Krebs edited this page Apr 3, 2026 · 1 revision

The Netlix Platform implements defense-in-depth security across infrastructure, secrets, network, and workload layers. This page documents all security controls and their rationale.


Credential Management

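No static credentials are stored in code. Authentication uses OIDC workload identity, dynamic secrets, or encrypted variable sets wherever possible; the few credentials that are still rotated manually are marked as such in the table.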

| Credential | Source | Delivery | Rotation |
|---|---|---|---|
| AWS access | OIDC identity token | assume_role_with_web_identity | Per-run (short-lived JWT) |
| HCP service principal | Variable set netlix-hcp | Ephemeral store reference | Manual |
| Vault authentication | OIDC identity token | auth_login_jwt (mount: jwt-tfc) | Per-run |
| Database credentials | Vault database engine | Dynamic secrets via VSO | Automatic (67% TTL) |
| TLS certificates | Vault PKI engine | VaultPKISecret CRD via VSO | Automatic (before expiry) |
| GitHub PAT | Deployment input | Sensitive variable | Manual |
| Datadog API key | Kubernetes secret | Manual kubectl create secret | Manual |

OIDC Workload Identity Flow

HCP Terraform Run
    │
    ├── identity_token "aws"   ─── JWT (aud: aws.workload.identity)
    │       │
    │       └── AWS STS AssumeRoleWithWebIdentity
    │               │
    │               └── Temporary AWS credentials (scoped IAM role)
    │
    └── identity_token "vault" ─── JWT (aud: vault.workload.identity)
            │
            └── Vault JWT Auth (mount: jwt-tfc, role: tfc-stacks)
                    │
                    └── Vault token (TTL: 20min, max: 1hr)

IAM Security

Bootstrap IAM Roles

The bootstrap creates per-environment IAM roles with scoped inline policies (not AdministratorAccess).

Allowed services:

| SID | Actions | Purpose |
|---|---|---|
| Networking | ec2:*, elasticloadbalancing:* | VPC, subnets, NAT, security groups, ALBs |
| EKS | eks:* | Cluster, node groups, addons, OIDC |
| RDS | rds:* | Database instances, subnet groups |
| DNS | route53:*, acm:* | Hosted zones, records, certificates |
| IAM | iam:* | Roles, policies, OIDC providers (IRSA) |
| KMS | kms:* | Encryption keys for EKS/RDS |
| Observability | logs:*, cloudwatch:*, sns:* | Log groups, alarms, notifications |
| STS | sts:GetCallerIdentity, sts:AssumeRole | Identity verification |

Not allowed: Lambda, DynamoDB, SQS, Redshift, S3 (except via EKS addons), and all other AWS services.
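
The allow-list above can be modeled in a few lines, for example to sanity-check which actions a plan may call. This is a minimal sketch, not the bootstrap's actual policy document: the `ALLOWED_ACTIONS` mapping and `is_action_allowed` helper are hypothetical names, and the S3 exception for EKS addons is deliberately not modeled.

```python
# Allow-list transcribed from the "Allowed services" table.
# "*" means every action in that service; names are lower-cased because
# IAM action names are matched case-insensitively.
ALLOWED_ACTIONS = {
    "ec2": {"*"},
    "elasticloadbalancing": {"*"},
    "eks": {"*"},
    "rds": {"*"},
    "route53": {"*"},
    "acm": {"*"},
    "iam": {"*"},
    "kms": {"*"},
    "logs": {"*"},
    "cloudwatch": {"*"},
    "sns": {"*"},
    "sts": {"getcalleridentity", "assumerole"},
}


def is_action_allowed(action: str) -> bool:
    """Return True if a `service:Action` string is covered by the allow-list."""
    service, _, name = action.lower().partition(":")
    if not name:
        # Malformed input such as "ec2" with no action part is rejected.
        return False
    allowed = ALLOWED_ACTIONS.get(service, set())
    return "*" in allowed or name in allowed


if __name__ == "__main__":
    for candidate in ("ec2:DescribeVpcs", "sts:AssumeRole", "lambda:InvokeFunction"):
        print(candidate, is_action_allowed(candidate))
```

Any service missing from the mapping is denied by default, which mirrors the "all other AWS services" rule.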

Role Trust Policy

Each role is locked to a specific TFC organization, project, and stack:

{
  "Condition": {
    "StringEquals": { "app.terraform.io:aud": "aws.workload.identity" },
    "StringLike": { "app.terraform.io:sub": "organization:tim-krebs-org:project:netlix-platform:stack:netlix-platform-dev:*" }
  }
}
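
To illustrate how the two condition operators restrict which runs can assume the role, here is a minimal sketch that evaluates them against token claims. It is an approximation of IAM's behavior, not AWS's implementation: `string_like` and `trust_condition_matches` are hypothetical helpers, and any `sub` value you test it with is an assumption about what follows the stack name.

```python
import re

# Condition block copied from the trust policy above.
CONDITION = {
    "StringEquals": {"app.terraform.io:aud": "aws.workload.identity"},
    "StringLike": {
        "app.terraform.io:sub": (
            "organization:tim-krebs-org:project:netlix-platform:"
            "stack:netlix-platform-dev:*"
        )
    },
}


def string_like(pattern: str, value: str) -> bool:
    """IAM-style StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_condition_matches(claims: dict) -> bool:
    """Check JWT claims (keyed 'aud' and 'sub') against CONDITION."""
    for key, expected in CONDITION["StringEquals"].items():
        claim = key.split(":", 1)[1]  # "app.terraform.io:aud" -> "aud"
        if claims.get(claim) != expected:
            return False
    for key, pattern in CONDITION["StringLike"].items():
        claim = key.split(":", 1)[1]
        if not string_like(pattern, claims.get(claim, "")):
            return False
    return True
```

Because the pattern ends in `stack:netlix-platform-dev:*`, a token issued for a different stack fails the `StringLike` check even when the organization and project match.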

IRSA (IAM Roles for Service Accounts)

Kubernetes workloads that need AWS API access use IRSA:

| Service Account | IAM Role | Permissions |
|---|---|---|
| ALB Controller | netlix-{env}-lb-controller | ELB, EC2 (target groups, listeners) |
| ExternalDNS | netlix-{env}-external-dns | Route53 (record sets) |
| EBS CSI Driver | netlix-{env}-ebs-csi | EC2 (volumes, snapshots) |

Vault Security

Namespace Isolation

HCP Vault uses hierarchical namespaces to isolate environments:

admin/                          # Root admin namespace
├── dev/                        # Dev environment
│   ├── kubernetes auth         # K8s auth for dev cluster
│   ├── pki + pki_int          # PKI CAs for dev
│   ├── database               # Dynamic DB creds for dev
│   └── secret                 # KV secrets for dev
├── staging/                    # Staging environment
│   ├── kubernetes auth
│   ├── pki + pki_int
│   ├── database
│   └── secret
├── jwt-tfc auth               # Shared: TFC OIDC auth
└── userpass auth               # Shared: bootstrap admin

Vault Policies

| Policy | Scope | Capabilities | Purpose |
|---|---|---|---|
| vso-policy | Per-environment | Read/list on secret/*, database/*, pki_int/* | VSO secret injection |
| app-policy | Per-environment | Read on secret/data/netlix/*, database/creds/netlix-app | Application pods |
| tfc-policy | Shared (admin) | Full access (path "*") | TFC Stacks provisioning |
| admin-policy | Shared (admin) | Full access (path "*") | Bootstrap admin (dev only) |

Note on TFC policy: HCP Vault requires path "*" at the parent namespace for cross-namespace operations. Scoped paths (sys/mounts/*, auth/*, etc.) do not propagate to child namespaces (admin/dev, admin/staging). This is a known HCP Vault limitation, not a misconfiguration. See ADR-005.

Kubernetes Auth

HCP Vault authenticates Kubernetes service accounts via the TokenReview API:

  1. A dedicated vault-token-reviewer ServiceAccount in kube-system is bound to the system:auth-delegator ClusterRole
  2. A long-lived token is created for that service account (required because Vault runs outside the cluster)
  3. Vault uses this token to call the K8s TokenReview API and validate pod identities
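
The validation in step 3 can be sketched as follows. This is a hedged illustration of the TokenReview exchange, not Vault's code: `build_token_review` and `is_bound_identity` are hypothetical helpers, and the actual HTTP call to the cluster (a POST to `/apis/authentication.k8s.io/v1/tokenreviews` using the reviewer token) is left out.

```python
def build_token_review(pod_jwt: str) -> dict:
    """Build the TokenReview body submitted to the Kubernetes API server."""
    return {
        "apiVersion": "authentication.k8s.io/v1",
        "kind": "TokenReview",
        "spec": {"token": pod_jwt},
    }


def is_bound_identity(review_status: dict, namespace: str, service_account: str) -> bool:
    """Accept only an authenticated token that belongs to the expected service account.

    `review_status` is the `status` object of the TokenReview response.
    """
    expected = f"system:serviceaccount:{namespace}:{service_account}"
    return (
        review_status.get("authenticated") is True
        and review_status.get("user", {}).get("username") == expected
    )


if __name__ == "__main__":
    # Example status for the netlix-app service account in the netlix namespace.
    status = {
        "authenticated": True,
        "user": {"username": "system:serviceaccount:netlix:netlix-app"},
    }
    print(is_bound_identity(status, "netlix", "netlix-app"))
```

The namespace and service account checked here correspond to what a Vault auth role binds to, so a valid token from any other service account is still rejected.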

Auth roles:

| Role | Bound SAs | Bound Namespaces | Policy |
|---|---|---|---|
| netlix-vso | vault-secrets-operator, vault-secrets-operator-controller-manager | vault-secrets-operator-system, consul | vso-policy |
| netlix-app | netlix-app | netlix | app-policy |

Dynamic Database Credentials

Vault's database secrets engine generates short-lived PostgreSQL credentials:

App Pod → VSO → Vault (database/creds/netlix-app) → Temporary PG credentials
  • Creation statements create a role with specific grants
  • Default TTL: 1 hour
  • Max TTL: 24 hours
  • VSO renews at 67% TTL automatically
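
The timing these values imply can be worked out directly. The sketch below only does the arithmetic; it assumes a renewal grants a fresh default TTL capped so that the lease never outlives the max TTL, and it does not model what VSO does once a lease can no longer be renewed. `renewal_delay` and `ttl_after_renewal` are hypothetical names.

```python
from datetime import timedelta

DEFAULT_TTL = timedelta(hours=1)   # default TTL from the list above
MAX_TTL = timedelta(hours=24)      # max TTL from the list above
RENEWAL_PERCENT = 67               # renewal point from the list above


def renewal_delay(ttl: timedelta, percent: int = RENEWAL_PERCENT) -> timedelta:
    """How long after issue (or the last renewal) the next renewal is attempted."""
    return ttl * percent / 100


def ttl_after_renewal(lease_age: timedelta) -> timedelta:
    """TTL granted by a renewal, assuming the lease may never exceed MAX_TTL in total."""
    return max(timedelta(0), min(DEFAULT_TTL, MAX_TTL - lease_age))


if __name__ == "__main__":
    print(renewal_delay(DEFAULT_TTL))                          # 0:40:12
    print(ttl_after_renewal(timedelta(hours=23, minutes=30)))  # 0:30:00
```

With a one-hour TTL the first renewal happens about 40 minutes after issue, leaving roughly 20 minutes of slack if Vault is briefly unreachable.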

Network Security

VPC Flow Logs

All VPC traffic is logged to CloudWatch:

  • Traffic captured: both ACCEPT and REJECT actions
  • Retention: configurable
  • CloudWatch metric filter on REJECT actions
  • Alarm: > 100 rejected connections in 5 minutes triggers SNS notification
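
The alarm rule can be reproduced offline against exported log lines. This is a minimal sketch that assumes records use the default VPC Flow Logs format (14 space-separated fields, with `start` at index 10 and `action` at index 12); in the platform itself the counting is done by the CloudWatch metric filter, not by code like this. The function names are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 300     # 5-minute evaluation period from the list above
REJECT_THRESHOLD = 100   # the alarm fires when the count is strictly greater


def rejects_per_window(records):
    """Count REJECT records per 5-minute window, keyed by window start (epoch seconds)."""
    counts = Counter()
    for line in records:
        fields = line.split()
        # Skip short lines and anything that is not a REJECT (ACCEPT, NODATA "-", ...).
        if len(fields) < 14 or fields[12] != "REJECT":
            continue
        start = int(fields[10])
        counts[start - start % WINDOW_SECONDS] += 1
    return counts


def windows_in_alarm(records):
    """Return the sorted window start times whose REJECT count exceeds the threshold."""
    return sorted(
        window
        for window, count in rejects_per_window(records).items()
        if count > REJECT_THRESHOLD
    )
```

Note the strict comparison: exactly 100 rejected connections in a window does not trigger the notification.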

Security Groups

| Resource | Inbound | Outbound |
|---|---|---|
| EKS Control Plane | Worker nodes (443) | Worker nodes (all) |
| EKS Worker Nodes | Control plane (all), self (all) | All |
| RDS | EKS SG (5432), HVN CIDR (5432) | None |
| ALB | Internet (443, 80) | Worker nodes (target ports) |

Kubernetes NetworkPolicies

The consul namespace has a default-deny-all policy with explicit allow rules:

Default Deny

spec:
  podSelector: {}         # Applies to ALL pods
  policyTypes:
    - Ingress
    - Egress

Web Pod Policy

| Direction | Source/Destination | Ports | Purpose |
|---|---|---|---|
| Ingress | Any (ALB) | TCP/9090 | HTTP traffic from ALB |
| Ingress | Any namespace | TCP/20200 | Prometheus metrics scraping |
| Egress | app=api pods | TCP/8080 | Upstream API calls |
| Egress | kube-system namespace | UDP+TCP/53 | DNS resolution |
| Egress | consul namespace | TCP/8301,8502,20000 | Consul control plane |
| Egress | vault-secrets-operator-system | TCP/8200 | Vault access |
| Egress | datadog namespace | TCP/8126, UDP/8125 | APM traces + DogStatsD |
| Egress | Same namespace pods | All | Envoy mesh traffic |
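
The web pod's egress rules combine with the default-deny policy as a plain allow-list: traffic passes only if some rule matches. The sketch below models that evaluation for quick what-if checks. It is an illustration only; the destination labels (`ns/...`, `same-namespace`) are hypothetical shorthand, not Kubernetes selectors.

```python
from typing import NamedTuple


class EgressRule(NamedTuple):
    destination: str   # hypothetical shorthand for the peer selector
    protocol: str      # "TCP", "UDP", or "ANY"
    ports: frozenset   # empty set means all ports


# Egress rows of the Web Pod Policy table.
WEB_EGRESS = [
    EgressRule("app=api", "TCP", frozenset({8080})),
    EgressRule("ns/kube-system", "UDP", frozenset({53})),
    EgressRule("ns/kube-system", "TCP", frozenset({53})),
    EgressRule("ns/consul", "TCP", frozenset({8301, 8502, 20000})),
    EgressRule("ns/vault-secrets-operator-system", "TCP", frozenset({8200})),
    EgressRule("ns/datadog", "TCP", frozenset({8126})),
    EgressRule("ns/datadog", "UDP", frozenset({8125})),
    EgressRule("same-namespace", "ANY", frozenset()),
]


def egress_allowed(destination: str, protocol: str, port: int) -> bool:
    """Default deny: return True only if at least one rule matches the connection."""
    for rule in WEB_EGRESS:
        if rule.destination != destination:
            continue
        if rule.protocol not in ("ANY", protocol):
            continue
        if not rule.ports or port in rule.ports:
            return True
    return False
```

For example, DogStatsD over UDP/8125 to the datadog namespace is allowed, while TCP/8125 to the same namespace is not, because protocol and port are matched together.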

API Pod Policy

| Direction | Source/Destination | Ports | Purpose |
|---|---|---|---|
| Ingress | app=web pods | TCP/8080 | Traffic from web |
| Ingress | Any namespace | TCP/20200 | Prometheus metrics |
| Egress | kube-system | UDP+TCP/53 | DNS resolution |
| Egress | consul namespace | TCP/8301,8502,20000 | Consul control plane |
| Egress | vault-secrets-operator-system | TCP/8200 | Vault access |
| Egress | datadog namespace | TCP/8126, UDP/8125 | APM + DogStatsD |
| Egress | Same namespace pods | All | Envoy mesh traffic |

Pod Security Standards

The consul namespace applies Kubernetes Pod Security Standards:

labels:
  pod-security.kubernetes.io/enforce: baseline     # Blocks privileged containers
  pod-security.kubernetes.io/warn: restricted       # Warns on restricted violations
  pod-security.kubernetes.io/audit: restricted      # Audits restricted violations

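Why baseline (not restricted) for enforce? Consul Connect sidecars and init containers may need settings that the restricted profile forbids, such as added Linux capabilities. (NET_BIND_SERVICE is the one capability that restricted still permits adding, so it is not by itself a reason.) The warn and audit labels for restricted surface violations without blocking deployments. Once the workloads have been validated against restricted, enforce can be raised to restricted.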

Container Security Contexts

All application containers enforce:

  • runAsNonRoot: true
  • runAsUser: 65532 (nonroot user)
  • allowPrivilegeEscalation: false
  • readOnlyRootFilesystem: true
  • capabilities.drop: ["ALL"]
  • seccompProfile.type: RuntimeDefault
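
A small check like the following could verify these settings on a rendered manifest before deployment. It is a sketch under assumptions: `security_context_violations` is a hypothetical helper that takes one container's `securityContext` as a plain dict, and nothing in this page states that such a check runs in the pipeline.

```python
# Scalar settings every application container must carry (from the list above).
REQUIRED = {
    "runAsNonRoot": True,
    "runAsUser": 65532,
    "allowPrivilegeEscalation": False,
    "readOnlyRootFilesystem": True,
}


def security_context_violations(ctx: dict) -> list:
    """Return human-readable violations; an empty list means the context complies."""
    violations = [
        f"{key} must be {value!r}"
        for key, value in REQUIRED.items()
        if ctx.get(key) != value
    ]
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        violations.append('capabilities.drop must include "ALL"')
    if ctx.get("seccompProfile", {}).get("type") != "RuntimeDefault":
        violations.append("seccompProfile.type must be 'RuntimeDefault'")
    return violations


if __name__ == "__main__":
    compliant = {
        "runAsNonRoot": True,
        "runAsUser": 65532,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    }
    print(security_context_violations(compliant))
```

Missing keys count as violations, so a container with no `securityContext` at all fails every check rather than passing silently.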

Container Images

  • Base image: gcr.io/distroless/static-debian12:nonroot
  • No shell, no package manager, no writable filesystem
  • Trivy scans in CI and CD pipelines (CRITICAL/HIGH severity fails the build)

EKS Security

Encryption

| Feature | Details |
|---|---|
| Secrets encryption | KMS envelope encryption for etcd secrets |
| EBS encryption | KMS-encrypted persistent volumes via EBS CSI driver |
| In-transit | All K8s API communication over TLS |

Endpoint Access

| Environment | Public Access | CIDR Restriction |
|---|---|---|
| Dev | Enabled | 0.0.0.0/0 (open for debugging) |
| Staging | Disabled | Private only (VPC internal) |

Addons

Managed EKS addons with automatic updates:

  • CoreDNS
  • kube-proxy
  • VPC CNI (for pod networking)
  • EBS CSI driver (for persistent volumes)

RDS Security

| Feature | Dev | Staging |
|---|---|---|
| Encryption at rest | KMS | KMS |
| Multi-AZ | No | Yes |
| Deletion protection | No | Yes |
| Final snapshot on delete | Skip | Required |
| Performance insights | Enabled (KMS encrypted) | Enabled (KMS encrypted) |
| Network access | EKS SG + HVN CIDR only | EKS SG + HVN CIDR only |
| Admin password | Random (Terraform-generated) | Random (Terraform-generated) |
| App credentials | Vault dynamic secrets | Vault dynamic secrets |

Supply Chain Security

| Control | Implementation |
|---|---|
| Container scanning | Trivy (CRITICAL/HIGH) in CI and CD |
| Dependency scanning | Go vet in CI |
| Image provenance | GHCR with SHA-pinned tags |
| Pre-commit hooks | detect-private-key, no-commit-to-branch main |
| Code ownership | CODEOWNERS file with team rules |
| Branch protection | Required CI checks before merge |
| Secrets detection | Pre-commit hook + .gitignore patterns |
