fix: remove hardcoded CUDA assumptions in ERFNet to support CPU-only execution#554

Open
Fahmid-Arman wants to merge 1 commit into
kubeedge:mainfrom
Fahmid-Arman:fix-issue-471-erfnet-cuda-hardware
Conversation

@Fahmid-Arman

What type of PR is this?
/kind bug

What this PR does / why we need it:
The ERFNet training and evaluation utilities contained hardcoded CUDA availability assumptions, causing the pipeline to crash immediately on CPU-only machines and Apple Silicon devices.

  • Dynamically assigned self.cuda = torch.cuda.is_available() (and self.no_cuda) within TrainArgs and ValArgs in utils/args.py.
  • Refactored checkpoint loading in train.py to use a dynamic map_location supporting CPU fallback.
  • Added a missing if self.args.cuda: guard around a target.cuda() call within the training() loop in train.py to match the surrounding tensor handling logic.

This unblocks ERFNet execution for contributors running the benchmark on non-NVIDIA hardware.
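
The guarded tensor transfer described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: `detect_cuda` and `move_batch` are hypothetical helper names, and the optional import is an assumption made so the snippet also runs where PyTorch is not installed.

```python
def detect_cuda() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def move_batch(image, target, use_cuda: bool):
    """Move both tensors to the GPU, or return them unchanged on CPU-only hosts.

    `image` and `target` are expected to expose a `.cuda()` method,
    as torch.Tensor does.
    """
    if use_cuda:
        # Guard the target as well as the image: an unguarded target.cuda()
        # fails on machines without a CUDA device.
        image, target = image.cuda(), target.cuda()
    return image, target
```

In a training loop this would be called as `image, target = move_batch(image, target, self.args.cuda)`, assuming `self.args.cuda` was initialized from `detect_cuda()` or directly from `torch.cuda.is_available()`.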

Which issue(s) this PR fixes:
Fixes #471

@kubeedge-bot kubeedge-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 15, 2026
@kubeedge-bot

Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Fahmid-Arman
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request dynamically configures CUDA usage based on GPU availability instead of hardcoding it. Key changes include conditional checkpoint loading and tensor GPU transfer in train.py, and initializing CUDA flags dynamically using torch.cuda.is_available() in utils/args.py. The review feedback suggests improving this by using the configured args.gpu_ids instead of hardcoding 'cuda:0' when loading checkpoints, and allowing cuda and no_cuda parameters to be overridden via kwargs in both TrainArgs and ValArgs to prevent conflicting states and support explicit CPU execution.
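
One way to follow the first suggestion, deriving the checkpoint device from `args.gpu_ids` instead of a fixed `'cuda:0'`, is sketched below. `resolve_map_location` is a hypothetical helper, and treating `gpu_ids` as a sequence of integer device indices is an assumption about the repository's argument format.

```python
from typing import Sequence


def resolve_map_location(cuda_enabled: bool, gpu_ids: Sequence[int]) -> str:
    """Pick a device string suitable for torch.load(..., map_location=...).

    Falls back to "cpu" when CUDA is disabled or no GPU id is configured.
    """
    if cuda_enabled and len(gpu_ids) > 0:
        # Use the first configured GPU rather than a hardcoded index 0.
        return f"cuda:{gpu_ids[0]}"
    return "cpu"
```

The result can be passed straight to the loader, for example `torch.load(path, map_location=resolve_map_location(args.cuda, args.gpu_ids))`, where `path` stands for whatever checkpoint path the script already uses.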

…execution

Signed-off-by: Fahmid Arman <fahmid.brac@gmail.com>
@Fahmid-Arman Fahmid-Arman force-pushed the fix-issue-471-erfnet-cuda-hardware branch from ba1d607 to 4596a15 Compare June 15, 2026 22:03
@Fahmid-Arman

Author

Feedback addressed:

  • Parameterized GPU ID indexing in map_location and added kwargs overrides for cuda and no_cuda flags to prevent state conflicts.
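
A minimal sketch of how such kwargs overrides might keep the two flags consistent is shown here. `DeviceArgs` is a hypothetical stand-in for the device-related part of `TrainArgs` and `ValArgs`; the precedence rules (an explicit `no_cuda` wins, and `cuda` is never enabled without hardware support) are assumptions, not necessarily what the PR implements.

```python
class DeviceArgs:
    """Hypothetical stand-in for the device flags in TrainArgs / ValArgs."""

    def __init__(self, cuda_available: bool, **kwargs):
        # Default to the detected hardware unless the caller overrides it.
        requested = kwargs.get("cuda", cuda_available)
        no_cuda = kwargs.get("no_cuda", not requested)
        # CUDA is used only if requested, supported, and not explicitly disabled.
        self.cuda = bool(requested and cuda_available and not no_cuda)
        # Derive no_cuda from cuda so the two flags can never contradict.
        self.no_cuda = not self.cuda
```

With this shape, `DeviceArgs(torch.cuda.is_available(), no_cuda=True)` would force CPU execution on a GPU machine, and requesting `cuda=True` on a CPU-only host would fall back to the CPU instead of crashing later.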

/assign @jaypume

Development

Successfully merging this pull request may close these issues.

[Bug] ERFNet training crashes on non-CUDA hardware due to hardcoded CUDA assumptions throughout args.py and train.py