@anagno anagno commented Dec 7, 2025

According to the Kubernetes docs, probes should only be executed after initialDelaySeconds. To stay consistent with the Kubernetes spec, skip executing the probes until initialDelaySeconds has elapsed.

Closes #27678

Running the same test described in the issue, we get:

podman exec -it healthcheck-test-tester tail -f /tmp/healthcheck.log
Starting container at 2025-12-07 12:33:23
Healthcheck at 2025-12-07 12:35:27
Healthcheck at 2025-12-07 12:35:33
Healthcheck at 2025-12-07 12:35:39
Healthcheck at 2025-12-07 12:35:45
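
To read the delay off the log above, the timestamps can be subtracted directly. This is a minimal sketch that assumes GNU `date` (for the `-d` option); the helper name `seconds_between` is made up for illustration and is not part of Podman or its test suite.

```shell
# Print the number of seconds from timestamp $1 to timestamp $2.
# Assumes GNU date; timestamps are interpreted as UTC to avoid DST effects.
seconds_between() {
    local start end
    start=$(TZ=UTC date -d "$1" +%s) || return 1
    end=$(TZ=UTC date -d "$2" +%s) || return 1
    echo $(( end - start ))
}

# Container start to first healthcheck, from the log above: prints 124
seconds_between "2025-12-07 12:33:23" "2025-12-07 12:35:27"
# First to second healthcheck: prints 6
seconds_between "2025-12-07 12:35:27" "2025-12-07 12:35:33"
```

So in this run the first probe fired 124 seconds after the container started, and later probes followed at 6-second intervals.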

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed) -- To be honest I do not know how to add a bats script to check that the healthcheck cmd is executed afterwards.
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

The first liveness probe is executed only after `initialDelaySeconds` has elapsed

@anagno anagno marked this pull request as draft December 7, 2025 12:23
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2025
@anagno anagno marked this pull request as ready for review December 7, 2025 12:42
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2025
According to the [Kubernetes docs](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes)
the probes should be executed after the `initialDelaySeconds`. So to be
consistent with the kubernetes specs, skip the execution of the probes until
the `initialDelaySeconds` is elapsed.

Closes containers#27678

Signed-off-by: Vasileios Anagnostopoulos <anagnwstopoulos@hotmail.com>
Member

baude commented Dec 8, 2025

Is there any possible way you can add a test to enforce the behavior?

Author

anagno commented Dec 8, 2025

> Is there any possible way you can add a test to enforce the behavior?

I am not certain that there is a "visible effect" using only the podman API. The podman API already did not report the results of the health check until initialDelaySeconds had elapsed. Is there a way of retrieving the results of the healthcheck that I have missed?

If not, then the only alternative is to use some "special" container as described in the issue. Is that something that can be implemented using bats?

Member

@Honny1 Honny1 left a comment

Code LGTM. I suggest you try implementing a test using bats and an issue reproducer.

The test requires a simple container that, on startup, writes a start timestamp to a log file. The health check should also add a health check timestamp to this file. These two timestamps should be different by at least a few seconds.

I think you can reuse what is described in the issue. It will be challenging to check this without creating a flaky test.
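
The timestamp comparison suggested here could look roughly like the sketch below. It assumes GNU `date` and the two log line formats shown in the PR description (`Starting container at ...` and `Healthcheck at ...`); the function names are hypothetical. Because it depends on wall-clock timing, it only checks a lower bound, which is the part least likely to be flaky.

```shell
# Sketch only. Assumes GNU date (-d) and the log format from the PR description.

# Print the seconds between container start and the first healthcheck entry.
first_probe_delay() {
    local log="$1" start first
    start=$(sed -n 's/^Starting container at //p' "$log" | head -n 1)
    first=$(sed -n 's/^Healthcheck at //p' "$log" | head -n 1)
    # Fail if either line is missing, e.g. the probe has not run yet.
    if [ -z "$start" ] || [ -z "$first" ]; then
        return 1
    fi
    echo $(( $(TZ=UTC date -d "$first" +%s) - $(TZ=UTC date -d "$start" +%s) ))
}

# Succeed only if the first probe ran at least $2 seconds after start.
probe_delay_at_least() {
    local delay
    delay=$(first_probe_delay "$1") || return 2
    [ "$delay" -ge "$2" ]
}
```

A test would copy the log out of the container and call, for example, `probe_delay_at_least healthcheck.log 5`.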

Member

mheon commented Dec 9, 2025

I think it'd be easier to write a healthcheck command that created a sentinel file in the container filesystem (something like touch /myfile) and an extremely long initialDelaySeconds to ensure that it never ran. To guarantee it's not racing, you can run podman healthcheck run manually on the container to ensure that the HC ran at least once, then do a podman exec to make sure the file wasn't created?
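
A rough sketch of this approach as a plain shell function, outside the bats harness. The pod name `hc-delay`, the container name `ctr`, the default image, and the one-hour delay are illustrative assumptions rather than values from the PR. The container is addressed as `hc-delay-ctr`, following the `<pod>-<container>` naming visible in the PR description. The exit status of `podman healthcheck run` during the delay window is deliberately not relied on.

```shell
# Sketch only: names, image and paths below are assumptions for illustration.
# Returns 0 if the sentinel file is absent (probe skipped), 1 if it exists
# (probe ran too early), 2 if podman itself failed.
probe_skipped_during_initial_delay() {
    local image="${1:-docker.io/library/alpine:latest}"
    local yaml status
    yaml=$(mktemp) || return 2

    cat > "$yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hc-delay
spec:
  containers:
  - name: ctr
    image: $image
    command: ["sleep", "600"]
    livenessProbe:
      exec:
        command: ["touch", "/tmp/healthcheck.log"]
      initialDelaySeconds: 3600
      periodSeconds: 1
EOF

    podman kube play "$yaml" || { rm -f "$yaml"; return 2; }

    # Trigger the healthcheck by hand so the check below does not race a timer.
    podman healthcheck run hc-delay-ctr || true

    # "test -e" exits 1 when the file is absent; other codes mean podman failed.
    status=0
    podman exec hc-delay-ctr test -e /tmp/healthcheck.log || status=$?

    podman kube down "$yaml" >/dev/null 2>&1 || true
    rm -f "$yaml"

    case "$status" in
        1) return 0 ;;
        0) return 1 ;;
        *) return 2 ;;
    esac
}
```

Matching on an exit status of exactly 1 keeps a podman failure, such as a container that never started, from being mistaken for a skipped probe.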

Member

mheon commented Dec 9, 2025

/approve

Contributor

openshift-ci bot commented Dec 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anagno, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 9, 2025
@anagno anagno force-pushed the fix/probe branch 2 times, most recently from 87cb558 to 88e2c12 Compare December 10, 2025 08:59
@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

Author

anagno commented Dec 10, 2025

> I think it'd be easier to write a healthcheck command that created a sentinel file in the container filesystem (something like touch /myfile) and an extremely long initialDelaySeconds to ensure that it never ran. To guarantee it's not racing, you can run podman healthcheck run manually on the container to ensure that the HC ran at least once, then do a podman exec to make sure the file wasn't created?

I went with this approach :)

@anagno anagno marked this pull request as draft December 10, 2025 09:25
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
Author

anagno commented Dec 10, 2025

I discovered a small mistake. Give me a few minutes and I will fix it, then mark the PR as ready again :)

update: fixed

@anagno anagno marked this pull request as ready for review December 10, 2025 09:40
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2025
Author

anagno commented Dec 10, 2025

I do not think that my changes are causing the failure. Is there a way of re-triggering the test?

Add a test to ensure the liveness probe is not executed until the
initialDelaySeconds period has elapsed

Signed-off-by: Vasileios Anagnostopoulos <anagnwstopoulos@hotmail.com>
Member

mheon commented Dec 10, 2025

LGTM
@Luap99 PTAL

Member

@Luap99 Luap99 left a comment

One blocker: the test variable is wrong.

Also, please squash the commits into one; we want the fix and the test in a single commit.

args:
- /bin/sh
- -c
- touch /tmp/healthy && sleep 100
Member

We never do anything with `touch /tmp/healthy`, so this could just be dropped.

ts=$(date +"%Y-%m-%d %H:%M:%S");
echo "Healthcheck at $ts" | tee -a /tmp/healthcheck.log;
# Real check: just always succeed
true
Member

nit: any reason to overcomplicate this compared to `touch /tmp/healthcheck.log`? We never do anything with the date.

# THEN the execution of the liveness probe should be skipped because initialDelaySeconds has not yet elapsed
run_podman 1 exec $ctrName test -e /tmp/healthcheck.log

run_podman '?' stop -t0 $ctrNam
Member

That seems wrong. We should not ignore the exit code of stop, and the variable name is also broken: it needs to be $ctrName.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note

Development

Successfully merging this pull request may close these issues.

initialDelaySeconds is not respected and probes are executed

5 participants