Implement gameserver + cron support #4528

Open
Sivasankaran25 wants to merge 6 commits into agones-dev:main from Sivasankaran25:feature/gamservercronjob

Conversation

@Sivasankaran25
Collaborator

What type of PR is this?

/kind feature

What this PR does / Why we need it:

Which issue(s) this PR fixes:

Closes #3907

Special notes for your reviewer:

@Sivasankaran25 Sivasankaran25 requested a review from igooch April 23, 2026 06:41
@Sivasankaran25 Sivasankaran25 self-assigned this Apr 23, 2026
@github-actions github-actions Bot added kind/feature New features for Agones size/XL labels Apr 23, 2026
@github-actions

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@agones-bot
Collaborator

Build Failed 😭

Build Id: 84565abc-4aa0-4e9e-a0c5-f6e956f7d794

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😭

Build Id: d1ee6e01-8c72-4b6f-b347-49a9b14fc58e

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😭

Build Id: 07ef8820-ce5a-47f4-9b3f-f2198dc63f54

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 36f52ae5-e103-41ec-97e7-ce2b43506ba6

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4528/head:pr_4528 && git checkout pr_4528
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.58.0-dev-2070ff2

Collaborator

@igooch igooch left a comment

It looks like this PR combines two distinct functional areas that should be discussed separately:

  1. In-place restart mechanism: The fundamental logic within the SDK sidecar to trigger container restarts via os.Exit(0) when a specific annotation is detected. This is similar to the discussion in #2781 which looks like it will be unblocked as of Kubernetes 1.35.

  2. Cron-based scheduling: The automation layer provided by the RestartController and the new RestartPolicy specification to manage periodic restarts and deadlines. If the foundation for #2781 is implemented, can this cron automation be simplified or addressed as a separate follow-up feature?

CC: @markmandel @stevefan1999-personal
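
As a rough illustration of the in-place restart mechanism in item 1, the sketch below checks a set of annotations for a restart request and, if one is present and not yet handled, calls an exit function with code 0. The annotation key `example.dev/restart-requested` and the names `shouldRestart` and `maybeRestart` are hypothetical; the key and code structure used by this PR are not shown in this thread. Whether the container is actually restarted after exiting depends on the Pod's `restartPolicy`.

```go
package main

import "fmt"

// restartAnnotation is a HYPOTHETICAL key used only for this sketch;
// it is not the annotation name used by the PR.
const restartAnnotation = "example.dev/restart-requested"

// shouldRestart reports whether the annotations carry a restart request
// whose value differs from the last request that was already handled.
func shouldRestart(annotations map[string]string, lastHandled string) bool {
	v, ok := annotations[restartAnnotation]
	return ok && v != "" && v != lastHandled
}

// maybeRestart calls exit(0) when a new restart request is present.
// A real sidecar would pass os.Exit; taking the function as a parameter
// keeps the sketch runnable and testable without ending the process.
func maybeRestart(annotations map[string]string, lastHandled string, exit func(int)) bool {
	if !shouldRestart(annotations, lastHandled) {
		return false
	}
	exit(0)
	return true
}

func main() {
	// Stub exit function so the example runs to completion.
	exit := func(code int) { fmt.Printf("would exit with code %d\n", code) }

	annotations := map[string]string{restartAnnotation: "2026-04-23T04:00:00Z"}
	fmt.Println(maybeRestart(annotations, "", exit))
	fmt.Println(maybeRestart(annotations, "2026-04-23T04:00:00Z", exit))
}
```

Comparing against the last handled value is one assumed way to avoid exiting again for a request that has already been served after the container comes back up.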

@markmandel
Member

Thanks for the ping @igooch - I've been meaning to jump into this.

I haven't checked the code yet (and I should), but I was going to push back on this whole PR pretty hard for a variety of reasons:

  1. No design in the ticket -- what is the right approach for this? Why this approach? Our contributing guide says to do design first in an issue before developing as a general principle.
  2. No broad consensus on the ticket. I know @stevefan1999-personal is super passionate about it (and we love you for that) - it would be good to get some general consensus from approvers that this is a feature and maintenance burden we want to take on (at first pass, it feels like something that people could do with a script inside the container?). It would also be ideal if more than one person said "yes, we want this" or if we (maintainers) think this will apply broadly. Just because one person wants something doesn't mean it's something that we all agree on.
  3. It's a huge PR -- which makes it hard to review, and also easy to miss issues. Assuming (a) we're happy with the design and (b) we agree it is something we want to do, then it should be broken up into smaller PRs to make it easy to review (this should also be part of our contributing guide -- I would like it to be when we review it as part of CNCF onboarding).

So I kinda want to see this closed, and have the conversation come back to the issue, see if we can reach consensus, and then we can talk plan and implementation.

@markmandel
Member

I'll also make a semi side comment - you can always post on an issue, but there is always #development on the Slack server! I recently made it less noisy, so if anyone is ever wondering about best practices or wants to talk about issues, that's a great place to have the conversations.

COME ON DOWN! 😄

@stevefan1999-personal

> It looks like this PR combines two distinct functional areas that should be discussed separately:
>
>   1. In-place restart mechanism: The fundamental logic within the SDK sidecar to trigger container restarts via os.Exit(0) when a specific annotation is detected. This is similar to the discussion in "Allow graceful restarts of game server containers instead of shutdown" (#2781) which looks like it will be unblocked as of Kubernetes 1.35.
>   2. Cron-based scheduling: The automation layer provided by the RestartController and the new RestartPolicy specification to manage periodic restarts and deadlines. If the foundation for "Allow graceful restarts of game server containers instead of shutdown" (#2781) is implemented, can this cron automation be simplified or addressed as a separate follow-up feature?
>
> CC: @markmandel @stevefan1999-personal

The surface area of this PR is large enough to split into two, so I'm in favor of two separate PRs.

I need the container to restart so it can download new dedicated server updates. Right now the game server itself detects in-game whether there is a new update, but I have to code SourceMod or AMXX plugins to detect whether the servers need to restart.

However, sometimes you can't do that with self-hosted game servers, which are usually UE4 or UE5 based, so a simpler solution is to periodically restart the server at, say, 4 AM, since there are usually no people playing around that time. Then the hours-long update-downloading init container kicks in and runs until around 11 AM, when some people start coming in. That's the reason I needed cron support to restart the servers on a schedule.

Another benefit is restarting tactically to combat the memory leak problem over a long session of serving games: as the memory gets more fragmented, the performance of the game server degrades. I've seen it go from 66 ticks per second down to 30-40 ticks per second, just because the game makes too many memory allocations over 24 hours and the memory is scattered around different places, so it can't keep up. But that's more of a bonus reason than a necessity.

Basically it would make so much more sense if a plugin detected whether there is a new CS2 dedicated server update and told Agones, "hey, don't mind me if I can serve more people, just kill me after the game is finished".

An even better solution for myself: code another program to run this on schedule, build a new docker image when there is a new dedicated server, push the image, and suspend the old game servers, only run new game servers with new image, Agones will phase them out one by one. Not sure if that's a better

@markmandel
Member

> An even better solution for myself: code another program to run this on schedule, build a new docker image when there is a new dedicated server, push the image, and suspend the old game servers, only run new game servers with new image, Agones will phase them out one by one. Not sure if that's a better

This sounds like a bash/python/go script you could write on your own. I don't know if it needs to be an Agones feature.

@stevefan1999-personal

> > An even better solution for myself: code another program to run this on schedule, build a new docker image when there is a new dedicated server, push the image, and suspend the old game servers, only run new game servers with new image, Agones will phase them out one by one. Not sure if that's a better
>
> This sounds like bash/python/go script you could write on your own. I don't know if it needs to be an Agones feature.

That's why it is "an even better solution for myself": it is not a general pattern.

Labels

kind/feature New features for Agones size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cron Job to restart the game server in-place

5 participants