OCPBUGS-78767 [release-4.21]: fix: add timeout to sysfs writes to prevent daemon hang #1182

Open
zeeke wants to merge 2 commits into openshift:release-4.21 from zeeke:worktree-backport-sysfs-timeout-4.21

@zeeke zeeke commented Mar 24, 2026

Contributor

Kernel drivers (e.g. i40e) can block indefinitely when writing to `sriov_numvfs` if the
device is in a bad state. For example, the following errors were logged on an Intel XXV710 NIC:

```
Feb 16 13:53:01 worker0 kernel: 06c73374b594186: left promiscuous mode
Feb 16 13:53:01 worker0 kernel: i40e 0000:3b:00.0: Setting MAC 5e:28:32:f0:80:20 on VF 1
Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Bring down and up the VF interface to make this change effective.
Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Unable to configure VFs, other operation is pending.
Feb 16 13:53:02 worker0 kernel: i40e 0000:3b:00.0: Unable to configure VFs, other operation is pending.
Feb 16 13:53:02 worker0 kernel: 152a5b6a3b44739: left promiscuous mode
```

Replace direct `os.WriteFile` calls in `SetSriovNumVfs` with a new `WriteFileWithTimeout` utility that
runs the write in a goroutine and returns a timeout error if the write has not completed within 2 minutes.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

zeeke commented Mar 24, 2026

Contributor Author

/jira cherrypick OCPBUGS-78767

@openshift-ci openshift-ci Bot requested review from Billy99 and MrSanketkumar March 24, 2026 09:04

openshift-ci Bot commented Mar 24, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zeeke

@openshift-ci openshift-ci Bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 24, 2026
Signed-off-by: Andrea Panattoni <apanatto@redhat.com>

openshift-ci Bot commented Mar 24, 2026

Contributor

@zeeke: all tests passed!

SchSeba commented Mar 26, 2026

Contributor

/title OCPBUGS-78767 [release-4.21]: fix: add timeout to sysfs writes to prevent daemon hang

@cgoncalves

Contributor

/retitle OCPBUGS-78767 [release-4.21]: fix: add timeout to sysfs writes to prevent daemon hang

@openshift-ci openshift-ci Bot changed the title from "[release-4.21]: fix: add timeout to sysfs writes to prevent daemon hang" to "OCPBUGS-78767 [release-4.21]: fix: add timeout to sysfs writes to prevent daemon hang" on May 26, 2026
@openshift-ci openshift-ci Bot added the "needs-rebase" label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on May 26, 2026

openshift-ci Bot commented May 26, 2026

Contributor

PR needs rebase.
