29 changes: 29 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,29 @@
# Kuperator - Claude Code Project Guide

## Development Requirements

- Go 1.24+
- Kubernetes 1.22+ (envtest uses 1.22.1)
- Docker (for container builds)
- Kind (for e2e testing)

## Pre-commit Checklist

**IMPORTANT**: Before committing and pushing code, always run:

```bash
make build lint fmt vet
```

## Key Dependencies

- `kusionstack.io/kube-api`: CRD API definitions
- `kusionstack.io/kube-utils`: Utility packages (cert, certmanager, controller helpers)
- `kusionstack.io/kube-xset`: XSet workload framework
- `sigs.k8s.io/controller-runtime`: Kubernetes controller framework

## Related Documentation

- [Contribution Guide](docs/contributing.md)
- [Design Plans](docs/plans/)
- [Official Documentation](https://kusionstack.io/kuperator/introduction/)
218 changes: 218 additions & 0 deletions docs/plans/webhook-certificate-auto-rotation.md
@@ -0,0 +1,218 @@
# Webhook Certificate Auto-Rotation Design

## Background

### Problem Description

When users upgrade kuperator via Helm, webhook calls fail with a `TLS handshake timeout` error even though the controller starts normally.

### Root Cause Analysis

1. **Missing volumeMounts in Helm Chart**

The StatefulSet defines a volume `webhook-certs` but never mounts it:
```yaml
volumes:
  - name: webhook-certs
    secret:
      secretName: webhook-certs
# Missing: volumeMounts in container spec!
```

2. **Secret Name Mismatch**

- Code creates Secret: `kusionstack-webhook-certs`
- Helm chart references: `webhook-certs`

3. **No Certificate Expiration Detection**

Certificates are valid for 1 year, but nothing detects expiration or rotates them automatically.

4. **Timing Issue**

Certificate generation happens after the webhook server tries to start, causing a TLS handshake timeout.

5. **Temporary Directory Issue**

Without a volumeMount, certificates are written to a temporary directory that disappears on pod restart.

## Solution

Use the `kusionstack.io/kube-utils/webhook/certmanager` package, which provides:

- Automatic certificate expiration detection and rotation
- Unified Secret name configuration
- Continuous CABundle synchronization via Watch mechanism
- Non-leader-election controller (all replicas run)
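
The expiration and DNS name detection listed above can be illustrated with the standard library alone. The following is a minimal sketch, not the certmanager implementation: `needsRotation`, `selfSigned`, `demo`, and the 30-day renewal window are hypothetical names and values chosen for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// needsRotation (hypothetical helper) reports whether a PEM-encoded certificate
// should be replaced: it does not cover host, is not yet valid, or expires
// within renewBefore of now.
func needsRotation(certPEM []byte, host string, now time.Time, renewBefore time.Duration) (bool, error) {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return true, errors.New("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return true, err
	}
	if cert.VerifyHostname(host) != nil {
		return true, nil // DNS name mismatch
	}
	if now.Before(cert.NotBefore) || now.Add(renewBefore).After(cert.NotAfter) {
		return true, nil // outside the validity window, or too close to expiry
	}
	return false, nil
}

// selfSigned builds a throwaway self-signed certificate for the demo.
func selfSigned(host string, notBefore, notAfter time.Time) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host},
		NotBefore:    notBefore,
		NotAfter:     notAfter,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// demo checks one 1-year certificate in three situations:
// freshly issued, close to expiry, and requested for a different host.
func demo() (fresh, nearExpiry, wrongHost bool, err error) {
	const host = "controller-manager.kusionstack-system.svc"
	now := time.Now()
	certPEM, err := selfSigned(host, now.Add(-time.Hour), now.AddDate(1, 0, 0))
	if err != nil {
		return false, false, false, err
	}
	renewBefore := 30 * 24 * time.Hour // assumed renewal window
	fresh, _ = needsRotation(certPEM, host, now, renewBefore)
	nearExpiry, _ = needsRotation(certPEM, host, now.AddDate(0, 11, 15), renewBefore)
	wrongHost, _ = needsRotation(certPEM, "other.example.svc", now, renewBefore)
	return fresh, nearExpiry, wrongHost, nil
}

func main() {
	fmt.Println(demo())
}
```

Renewing some time before `NotAfter`, rather than exactly at expiry, leaves room for the new CA bundle to propagate before the old certificate stops being accepted.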

## Implementation

### Files Changed

| File | Change |
|------|--------|
| `pkg/webhook/webhook.go` | Refactored to use certmanager |
| `main.go` | Updated Initialize call signature |
| `charts/templates/statefulset.yaml` | Added volumeMounts |
| `pkg/utils/pki_helpers.go` | Deleted (no longer needed) |
| `go.mod` | Added afero, golib dependencies |

### Code Changes

#### 1. `pkg/webhook/webhook.go`

- Before: ~260 lines with manual certificate generation
- After: ~75 lines using certmanager

```go
func Initialize(ctx context.Context, mgr manager.Manager, dnsName string) error {
	cfg := certmanager.CertConfig{
		Host:                   dnsName,
		Namespace:              getNamespace(),
		SecretName:             "webhook-certs", // Unified name
		MutatingWebhookNames:   []string{"kusionstack-controller-manager-mutating"},
		ValidatingWebhookNames: []string{"kusionstack-controller-manager-validating"},
	}

	certMgr := certmanager.New(mgr, cfg)
	return certMgr.SetupWithManager(mgr)
}
```
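
The snippet above calls `getNamespace()` without showing it. One plausible shape for such a helper is sketched below; this is an assumption, not the project's code. The `POD_NAMESPACE` variable name and the `kusionstack-system` fallback are hypothetical choices, while the service account namespace file is the standard in-pod path.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const (
	namespaceEnv                = "POD_NAMESPACE" // hypothetical variable name
	serviceAccountNamespaceFile = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
	defaultNamespace            = "kusionstack-system" // assumed fallback
)

// resolveNamespace tries the environment first, then the service account
// namespace file mounted into pods, then a fixed default. The lookups are
// passed in as functions so the logic can be exercised without a cluster.
func resolveNamespace(getenv func(string) string, readFile func(string) ([]byte, error)) string {
	if ns := strings.TrimSpace(getenv(namespaceEnv)); ns != "" {
		return ns
	}
	if data, err := readFile(serviceAccountNamespaceFile); err == nil {
		if ns := strings.TrimSpace(string(data)); ns != "" {
			return ns
		}
	}
	return defaultNamespace
}

// missingFile simulates running outside a pod, where the file does not exist.
func missingFile(string) ([]byte, error) {
	return nil, os.ErrNotExist
}

func main() {
	fmt.Println(resolveNamespace(os.Getenv, os.ReadFile))
}
```

Whatever the real helper does, the Secret and the pod must end up in the same namespace, since a Secret volume can only reference a Secret from the pod's own namespace.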

#### 2. `main.go`

```go
// Before
if err := webhook.Initialize(ctx, config, dnsName, certDir); err != nil { ... }

// After
if err := webhook.Initialize(ctx, mgr, dnsName); err != nil { ... }
```

The default `cert-dir` changed from a temporary directory to `/webhook-certs`.

#### 3. `charts/templates/statefulset.yaml`

```yaml
containers:
  - name: manager
    volumeMounts:
      - name: webhook-certs
        mountPath: /webhook-certs
        readOnly: false # Need write permission for cert rotation
```

## Certificate Workflow

### Startup Sequence

```
1. ctrl.NewManager()
   └── webhookServer created with CertDir="/webhook-certs"

2. webhook.Initialize(ctx, mgr, dnsName)
   └── certmanager.SetupWithManager
       ├── FSCertProvider initialized (path="/webhook-certs")
       ├── SecretCertProvider initialized (secret="webhook-certs")
       └── Manual Reconcile called
           ├── Load/Generate certs from Secret
           ├── Write certs to filesystem
           │   ├── tls.key (server private key)
           │   ├── tls.crt (server certificate)
           │   ├── ca.key (CA private key)
           │   └── ca.crt (CA certificate)
           └── Update CABundle in WebhookConfigurations

3. mgr.Start(ctx)
   └── webhook.Server.Start()
       ├── certwatcher reads tls.key + tls.crt
       ├── Starts fsnotify watcher for file changes
       └── TLS server ready
```

### Certificate Auto-Rotation

```
Secret/WebhookConfiguration change triggers Reconcile:

1. SecretCertProvider.Ensure()
   ├── Load existing certs from Secret
   └── Validate(host) - check expiration and DNSName
       └── If invalid: GenerateSelfSignedCerts()

2. FSCertProvider.Overwrite()
   └── Write new certs to filesystem
       └── fsnotify detects file change

3. certwatcher.handleEvent()
   └── ReadCertificate() - reload certs
       └── Update currentCert

4. New TLS connections automatically use new certificate
   (no server restart needed)
```
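
Steps 3 and 4 rely on the TLS server asking for its certificate on every handshake instead of loading it once. The sketch below shows that pattern with the standard `tls.Config.GetCertificate` callback. It is a simplified stand-in for controller-runtime's certwatcher, not its code: `reloadingCert`, `newConfig`, `placeholder`, and `served` are hypothetical names, and the file watching itself is omitted.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync"
)

// reloadingCert holds the certificate that new TLS connections should receive.
type reloadingCert struct {
	mu   sync.RWMutex
	cert *tls.Certificate
}

// Set swaps in a new certificate; a file watcher would call this after it
// re-reads tls.crt and tls.key.
func (r *reloadingCert) Set(cert *tls.Certificate) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cert = cert
}

// GetCertificate has the signature of the tls.Config.GetCertificate callback,
// which crypto/tls invokes for each incoming handshake.
func (r *reloadingCert) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.cert == nil {
		return nil, errors.New("no certificate loaded yet")
	}
	return r.cert, nil
}

// newConfig wires the callback into a server-side TLS configuration.
func newConfig(r *reloadingCert) *tls.Config {
	return &tls.Config{GetCertificate: r.GetCertificate}
}

// placeholder builds a fake certificate whose chain holds only a tag;
// the bytes stand in for real DER data and are not usable for a handshake.
func placeholder(tag string) *tls.Certificate {
	return &tls.Certificate{Certificate: [][]byte{[]byte(tag)}}
}

// served returns the tag of the certificate the callback would hand out now.
func served(cfg *tls.Config) string {
	cert, err := cfg.GetCertificate(nil)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(cert.Certificate[0])
}

func main() {
	r := &reloadingCert{}
	cfg := newConfig(r)

	r.Set(placeholder("old-cert"))
	fmt.Println(served(cfg))

	// Simulate a rotation: the same tls.Config now serves the new certificate.
	r.Set(placeholder("new-cert"))
	fmt.Println(served(cfg))
}
```

Connections that are already established keep the certificate they negotiated; only handshakes that begin after the swap see the new one.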

## Certificate Files Explanation

| File | Purpose | Location |
|------|---------|----------|
| `ca.key` | CA private key for signing certificates | Secret, filesystem (not used directly) |
| `ca.crt` | CA certificate = CABundle for verification | Secret, WebhookConfiguration.CABundle |
| `tls.key` | Server private key for TLS encryption | Secret, filesystem (webhook server) |
| `tls.crt` | Server certificate for TLS identity | Secret, filesystem (webhook server) |

### TLS Handshake Flow

```
API Server (Client)                                Webhook Server
───────────────────                                ──────────────

1. Connect to port 9443 ──────────────────────────▶

2.                      ◀─── Send tls.crt (server certificate)

3. Verify tls.crt using CABundle (ca.crt)
   ✓ Signature valid
   ✓ Not expired
   ✓ DNSName matches

4. Key exchange (ephemeral keys in TLS 1.3) ──────▶

5.                      ◀─── Prove possession of tls.key by signing the
                             handshake, establish encrypted channel

6. Send AdmissionReview (encrypted) ──────────────▶

7.                      ◀─── Response (encrypted)
```
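
The three checks in step 3 correspond to a single `x509.Certificate.Verify` call in Go's standard library. The sketch below builds a throwaway CA and server certificate to exercise each failure mode. `sign`, `newCA`, `newServerCert`, `verify`, and `demo` are hypothetical helpers, and the API server's actual verification code is not shown in this document.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// sign creates a certificate from tmpl signed by parent/parentKey and parses it back.
func sign(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, parentKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, parentKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// newCA returns a self-signed CA certificate valid for one year, plus its key.
func newCA(name string, now time.Time) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: name},
		NotBefore: now.Add(-time.Hour), NotAfter: now.AddDate(1, 0, 0),
		IsCA: true, BasicConstraintsValid: true, KeyUsage: x509.KeyUsageCertSign,
	}
	cert, err := sign(tmpl, tmpl, &key.PublicKey, key)
	return cert, key, err
}

// newServerCert returns a one-year certificate for host signed by the given CA.
func newServerCert(host string, now time.Time, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2), Subject: pkix.Name{CommonName: host},
		DNSNames:  []string{host},
		NotBefore: now.Add(-time.Hour), NotAfter: now.AddDate(1, 0, 0),
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	return sign(tmpl, ca, &key.PublicKey, caKey)
}

// verify performs the checks from step 3: signature chain, validity period, DNS name.
func verify(server, caBundle *x509.Certificate, host string, at time.Time) error {
	roots := x509.NewCertPool()
	roots.AddCert(caBundle)
	_, err := server.Verify(x509.VerifyOptions{Roots: roots, DNSName: host, CurrentTime: at})
	return err
}

// demo returns one verification result per scenario.
func demo() ([]error, error) {
	const host = "controller-manager.kusionstack-system.svc"
	now := time.Now()
	ca, caKey, err := newCA("webhook-ca", now)
	if err != nil {
		return nil, err
	}
	otherCA, _, err := newCA("other-ca", now)
	if err != nil {
		return nil, err
	}
	server, err := newServerCert(host, now, ca, caKey)
	if err != nil {
		return nil, err
	}
	return []error{
		verify(server, ca, host, now),                  // matching CABundle
		verify(server, otherCA, host, now),             // stale or wrong CABundle
		verify(server, ca, host, now.AddDate(2, 0, 0)), // certificate expired
		verify(server, ca, "other.example.svc", now),   // DNS name mismatch
	}, nil
}

func main() {
	results, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	for i, res := range results {
		fmt.Printf("scenario %d: %v\n", i+1, res)
	}
}
```

The second scenario is why the CABundle has to be updated whenever the CA is regenerated: a server certificate signed by a new CA fails verification against the old bundle.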

## Comparison

| Feature | Old Approach | New Approach (certmanager) |
|---------|-------------|---------------------------|
| Certificate expiration detection | ❌ None | ✅ Validate() auto-detects |
| Auto rotation | ❌ Manual delete Secret | ✅ Expired → auto regenerate |
| Secret name consistency | ❌ Mismatch | ✅ Unified to `webhook-certs` |
| volumeMounts | ❌ Missing | ✅ Added |
| CABundle sync | Only at startup | ✅ Watch + continuous sync |
| TLS handshake | May timeout | ✅ Certs ready before server starts |
| Multi-replica support | Leader election exclusive | ✅ All replicas run (NeedLeaderElection=false) |
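
The last row depends on how controller-runtime schedules runnables: a runnable that implements `NeedLeaderElection() bool` (the `manager.LeaderElectionRunnable` interface) and returns `false` is started on every replica, while runnables without that method wait for leadership. The sketch below mirrors that check using only the standard library. `certRotator` and `runsOnEveryReplica` are hypothetical names, and the local interface stands in for the controller-runtime one.

```go
package main

import (
	"context"
	"fmt"
)

// leaderElectionRunnable mirrors the single method of controller-runtime's
// manager.LeaderElectionRunnable interface.
type leaderElectionRunnable interface {
	NeedLeaderElection() bool
}

// certRotator is a hypothetical runnable that must run on every replica,
// because each pod serves webhooks and needs certificates on its own disk.
type certRotator struct{}

// Start blocks until the context is cancelled, like a long-running controller.
func (certRotator) Start(ctx context.Context) error {
	<-ctx.Done()
	return nil
}

// NeedLeaderElection opts out of leader election.
func (certRotator) NeedLeaderElection() bool { return false }

// runsOnEveryReplica reproduces the decision described above: runnables
// without the method default to requiring leader election.
func runsOnEveryReplica(r any) bool {
	if le, ok := r.(leaderElectionRunnable); ok {
		return !le.NeedLeaderElection()
	}
	return false
}

func main() {
	fmt.Println("certRotator runs on every replica:", runsOnEveryReplica(certRotator{}))
	fmt.Println("plain runnable runs on every replica:", runsOnEveryReplica(struct{}{}))
}
```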

## Dependencies Added

- `github.com/spf13/afero v1.11.0` - filesystem abstraction
- `github.com/zoumo/golib v0.2.0` - certificate utilities
- `kusionstack.io/kube-utils/cert` - cert provider package
- `kusionstack.io/kube-utils/webhook/certmanager` - cert manager controller

## Testing

All webhook tests pass:
```
ok kusionstack.io/kuperator/pkg/webhook/server/generic/collaset
ok kusionstack.io/kuperator/pkg/webhook/server/generic/pod/gracedelete
ok kusionstack.io/kuperator/pkg/webhook/server/generic/pod/opslifecycle
ok kusionstack.io/kuperator/pkg/webhook/server/generic/poddecoration
ok kusionstack.io/kuperator/pkg/webhook/server/generic/podtransitionrule
```

Build successful: `go build ./...`
4 changes: 3 additions & 1 deletion go.mod
@@ -23,7 +23,7 @@ require (
k8s.io/kubernetes v0.0.0-00010101000000-000000000000
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738
kusionstack.io/kube-api v0.7.5-0.20260127130112-9424ce325e09
-kusionstack.io/kube-utils v0.2.1-0.20251120063041-6043805ee00d
+kusionstack.io/kube-utils v0.2.1-0.20251219073659-c81662b5b6a3
kusionstack.io/kube-xset v0.0.2-0.20260127130229-a7a010eba7e0
kusionstack.io/resourceconsist v0.0.1
sigs.k8s.io/controller-runtime v0.17.3
@@ -34,6 +34,8 @@ require (
github.com/golang/protobuf v1.5.3 // indirect
github.com/matttproud/golang_protobuf_extensions/v2 v2.0.0 // indirect
github.com/samber/lo v1.47.0 // indirect
+github.com/spf13/afero v1.11.0 // indirect
+github.com/zoumo/golib v0.2.0 // indirect
)

require (
8 changes: 6 additions & 2 deletions go.sum
@@ -546,6 +546,8 @@ github.com/soheilhy/cmux v0.1.5/go.mod h1:T7TcVDs9LWfQgPlPsdngu6I6QIoyIFZDDC6sNE
github.com/spaolacci/murmur3 v0.0.0-20180118202830-f09979ecbc72/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA=
github.com/spf13/afero v1.1.2/go.mod h1:j4pytiNVoe2o6bmDsKpLACNPDBIoEAkihy7loJ1B0CQ=
github.com/spf13/afero v1.2.2/go.mod h1:9ZxEEn6pIJ8Rxe320qSDBk6AsU0r9pR7Q4OcevTdifk=
+github.com/spf13/afero v1.11.0 h1:WJQKhtpdm3v2IzqG8VMqrr6Rf3UYpEF239Jy9wNepM8=
+github.com/spf13/afero v1.11.0/go.mod h1:GH9Y3pIexgf1MTIWtNGyogA5MwRIDXGUr+hbWNoBjkY=
github.com/spf13/cast v1.3.0/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE=
github.com/spf13/cobra v1.0.0/go.mod h1:/6GTrnGXV9HjY+aR4k0oJ5tcvakLuG6EuKReYlHNrgE=
github.com/spf13/cobra v1.1.3/go.mod h1:pGADOWyqRD/YMrPZigI/zbliZ2wVD/23d+is3pSWzOo=
@@ -594,6 +596,8 @@ github.com/yuin/goldmark v1.1.30/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9de
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1Zlc8k=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
+github.com/zoumo/golib v0.2.0 h1:K6W8WWrgnl2bXRvUaiXjAaiFKsCTHwnrBkBHZoFr8lE=
+github.com/zoumo/golib v0.2.0/go.mod h1:gOMPRvDgn9m49tfHoKUb2RO0NqplNoe/qj5/ZrczjgQ=
go.etcd.io/bbolt v1.3.2/go.mod h1:IbVyRI1SCnLcuJnV2u8VeU0CEYM7e686BmAb1XKL+uU=
go.etcd.io/bbolt v1.3.6/go.mod h1:qXsaaIqmgQH0T+OPdb99Bf+PKfBBQVAdyD6TY9G8XM4=
go.etcd.io/etcd/api/v3 v3.5.0/go.mod h1:cbVKeC6lCfl7j/8jBhAK6aIYO9XOjdptoxU/nLQcPvs=
@@ -1079,8 +1083,8 @@ k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738 h1:M3sRQVHv7vB20Xc2ybTt7ODCeFj6J
k8s.io/utils v0.0.0-20241104100929-3ea5e8cea738/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0=
kusionstack.io/kube-api v0.7.5-0.20260127130112-9424ce325e09 h1:Kgc1N61F9KoBi1sHCrwoN8ax0j+0f1n6dQDQe2Luy9M=
kusionstack.io/kube-api v0.7.5-0.20260127130112-9424ce325e09/go.mod h1:e1jtrQH2LK5fD2nTyfIXG6nYrYbU8VXShRxTRwVPaLk=
-kusionstack.io/kube-utils v0.2.1-0.20251120063041-6043805ee00d h1:iQtnK03ia/MN4K/6O75EMI91ep7jpcIG0pWyeREBqtg=
-kusionstack.io/kube-utils v0.2.1-0.20251120063041-6043805ee00d/go.mod h1:KEHTfo1Y8SWMODnckF6daO2cSIW0FJ8fkk8PBA5O2GU=
+kusionstack.io/kube-utils v0.2.1-0.20251219073659-c81662b5b6a3 h1:0cXP1HAHG06Rf2Zztcep1LXkksd/mwppnozHy+mco6I=
+kusionstack.io/kube-utils v0.2.1-0.20251219073659-c81662b5b6a3/go.mod h1:Lz5SBYWg9+jw+kP0CAyf/b62D5DeUPf6+jE1d0WC4cI=
kusionstack.io/kube-xset v0.0.2-0.20260127130229-a7a010eba7e0 h1:mU1Jjdfgihju0xiYMmW/jSTGhovca/WEID7QzJrwkmw=
kusionstack.io/kube-xset v0.0.2-0.20260127130229-a7a010eba7e0/go.mod h1:FceKgqapMHhwiyIqCziTQRW27fsSXpPS611AApnyiYI=
kusionstack.io/resourceconsist v0.0.1 h1:+k/jriq5Ld7fQUYfWSMGynz/FesHtl3Rk2fmQPjBe0g=
15 changes: 5 additions & 10 deletions main.go
@@ -20,7 +20,6 @@ import (
"context"
"flag"
"os"
-"path/filepath"

"github.com/spf13/pflag"
utilruntime "k8s.io/apimachinery/pkg/util/runtime"
Expand Down Expand Up @@ -62,8 +61,8 @@ func main() {
flag.BoolVar(&enableLeaderElection, "leader-elect", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
-flag.StringVar(&certDir, "cert-dir", webhookTempCertDir(), "The directory that contains the server key and certificate. If not set, webhook server would look up the server key and certificate in {TempDir}/k8s-webhook-server/serving-certs")
-flag.StringVar(&dnsName, "dns-name", "kusionstack-controller-manager.kusionstack-system.svc", "The DNS name of the webhook server.")
+flag.StringVar(&certDir, "cert-dir", "/webhook-certs", "The directory that contains the server key and certificate for webhook.")
+flag.StringVar(&dnsName, "dns-name", "controller-manager.kusionstack-system.svc", "The DNS name of the webhook server.")

klog.InitFlags(nil)
defer klog.Flush()
@@ -122,9 +121,9 @@ func main() {
}

// +kubebuilder:scaffold:builder
-setupLog.Info("initialize webhook")
-if err := webhook.Initialize(context.Background(), config, dnsName, certDir); err != nil {
-setupLog.Error(err, "unable to initialize webhook")
+setupLog.Info("initialize webhook cert manager")
+if err := webhook.Initialize(context.Background(), mgr, dnsName); err != nil {
+setupLog.Error(err, "unable to initialize webhook cert manager")
os.Exit(1)
}

@@ -143,7 +142,3 @@ func main() {
os.Exit(1)
}
}
-
-func webhookTempCertDir() string {
-return filepath.Join(os.TempDir(), "k8s-webhook-server", "serving-certs")
-}