Skip to content

Update fleetautoscalers.go#4556

Open
MusGaas wants to merge 2 commits intoagones-dev:mainfrom
MusGaas:main
Open

Update fleetautoscalers.go#4556
MusGaas wants to merge 2 commits intoagones-dev:mainfrom
MusGaas:main

Conversation

@MusGaas
Copy link
Copy Markdown

@MusGaas MusGaas commented May 7, 2026

/kind bug

What this PR does / Why we need it:
applyWebhookPolicy and applyWasmPolicy both dereference Response.Scale
without checking if Response is nil. When a webhook or Wasm policy
returns a JSON body that omits the response field (e.g., {}),
json.Unmarshal succeeds with Response as a nil pointer. The subsequent
dereference panics, crashing both HA controller replicas via the shared
workqueue and halting cluster-wide FleetAutoscaler reconciliation until
an admin removes the resource.

Both sites now return an explicit error when Response is nil, allowing
the reconcile loop to handle the error gracefully rather than panicking.

Tested against helm install agones release 1.57.0 in a kind cluster.
Both controller pods entered CrashLoopBackOff before the fix. After
the fix, the error is returned cleanly with no panic.

Which issue(s) this PR fixes:
Closes #4555

Special notes for your reviewer:
This fix was identified and reported via Google's OSS VRP (Agones
documented security intake at g.co/vulnz). Two sites fixed in the
same file — applyWebhookPolicy and applyWasmPolicy — same root cause,
same one-line nil check pattern.

MusGaas added 2 commits May 7, 2026 10:25
Signed-off-by: Muss <muss@imgsrcxonerroralertdocument.domain>
Signed-off-by: MusGaas <musgaas@gmail.com>
fix: nil-check Response field in webhook and Wasm autoscaler handlers

Signed-off-by:  MusGaas - Musgaas@gmail.com
Signed-off-by: Muss <muss@imgsrcxonerroralertdocument.domain>
Signed-off-by: MusGaas <musgaas@gmail.com>
@MusGaas
Copy link
Copy Markdown
Author

MusGaas commented May 7, 2026

Hi, could a maintainer please run /gcbrun to trigger the CI build? Thank you.

@markmandel
Copy link
Copy Markdown
Member

We should add a unit test for this, to make sure it continues to not be a problem.

@markmandel
Copy link
Copy Markdown
Member

This fix was identified and reported via Google's OSS VRP (Agones
documented security intake at g.co/vulnz). Two sites fixed in the
same file — applyWebhookPolicy and applyWasmPolicy — same root cause,
same one-line nil check pattern.

Two things:

  1. That's not our security policy anymore, and hasn't been for several weeks.
  2. I don't think this is a security issue, since it's not something that is accessible from the outside world (unless someone configures their cluster badly - which we can't control). Unless I'm missing something here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug These are bugs. size/XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FleetAutoscaler webhook and Wasm handlers panic when response body omits 'response' field

2 participants