Skip to content

Guard nil asset.contentURL in ETCoreMLModel initWithAsset:#20336

Closed
metascroy wants to merge 1 commit into
pytorch:mainfrom
metascroy:export-D108334793
Closed

Guard nil asset.contentURL in ETCoreMLModel initWithAsset:#20336
metascroy wants to merge 1 commit into
pytorch:mainfrom
metascroy:export-D108334793

Conversation

@metascroy

@metascroy metascroy commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary:
Fixes EXC_BAD_ACCESS / KERN_INVALID_ADDRESS in -[ETCoreMLModel initWithAsset:configuration:orderedInputNames:orderedOutputNames:error:] at xplat/executorch/backends/apple/coreml/runtime/delegate/ETCoreMLModel.mm:197.

Root cause: the initializer calls [asset keepAliveAndReturnError:] which validates the asset's file lock but does not validate that asset.contentURL is non-nil. When the compiled .mlmodelc was reaped from disk between the lock and the load (or the asset was never fully materialized), asset.contentURL returns nil. +[MLModel modelWithContentsOfURL:configuration:error:] does not tolerate a nil URL — it deref's into CoreML internals and faults.

Fix: capture asset.contentURL into a local, return nil with ETCoreMLErrorCorruptedModel populated in the out-error if the URL is nil. No behavior change for valid assets — callers already handle the documented nil + error contract.

Differential Revision: D108334793

Summary:
Fixes EXC_BAD_ACCESS / KERN_INVALID_ADDRESS in `-[ETCoreMLModel initWithAsset:configuration:orderedInputNames:orderedOutputNames:error:]` at `xplat/executorch/backends/apple/coreml/runtime/delegate/ETCoreMLModel.mm:197`.

MID: `2bcd0a9cec05c99079249d043c273b07` — https://www.internalfb.com/logview/facebook_ios_fads/2bcd0a9cec05c99079249d043c273b07

Context from MID: 3,094 occurrences in the past 7 days, first seen 2026-04-01, last seen 2026-06-11. Surfaced in the `FBVideoHomeUnifiedPlayerViewController:warion_vdd_feed` Scuba slice (https://fburl.com/scuba/errorreporting_facebook_ios_fads/0x31y7si) but the actual fault is in the ExecuTorch CoreML delegate, not the player VC.

Root cause: the initializer calls `[asset keepAliveAndReturnError:]` which validates the asset's file lock but does not validate that `asset.contentURL` is non-nil. When the compiled `.mlmodelc` was reaped from disk between the lock and the load (or the asset was never fully materialized), `asset.contentURL` returns nil. `+[MLModel modelWithContentsOfURL:configuration:error:]` does not tolerate a nil URL — it deref's into CoreML internals and faults.

Fix: capture `asset.contentURL` into a local, return nil with `ETCoreMLErrorCorruptedModel` populated in the out-error if the URL is nil. No behavior change for valid assets — callers already handle the documented `nil` + `error` contract.

Differential Revision: D108334793
@metascroy metascroy requested a review from shoumikhin as a code owner June 17, 2026 16:32
@pytorch-bot

pytorch-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20336

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 23 Pending, 1 Unrelated Failure

As of commit 76d7024 with merge base ef5c8a7 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 17, 2026
@meta-codesync

meta-codesync Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

@metascroy has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108334793.

@linux-foundation-easycla

Copy link
Copy Markdown

CLA Missing ID

  • ❌ The email address for the commit (76d7024) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

@github-actions

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@metascroy

Copy link
Copy Markdown
Contributor Author

@claude review this code

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown

Claude finished @metascroy's task in 4m 10s —— View job


Code Review: Guard nil asset.contentURL

I reviewed the change in backends/apple/coreml/runtime/delegate/ETCoreMLModel.mm. Overall this is a clean, well-scoped defensive fix that matches the existing style in the initializer (same ETCoreMLLogErrorAndSetNSError + ETCoreMLErrorCorruptedModel + return nil pattern used for orderedInputNames/orderedOutputNames a few lines below). The contract is preserved — nil return with the out-error populated — so callers that already handle the documented nil + error path need no changes.

Verdict: LGTM. The fix is correct and the root-cause analysis is sound. A few minor observations below, none blocking.

Observations

  1. NS_ASSUME_NONNULL annotation mismatch (worth noting). In ETCoreMLAsset.h:30 the property is declared inside an NS_ASSUME_NONNULL_BEGIN/END block:

    @property (copy, readonly, nonatomic) NSURL* contentURL;   // implicitly nonnull

    So the type system claims contentURL can never be nil, yet this PR (correctly) guards against exactly that. The runtime reality wins here, but the annotation is now misleading. Consider marking it nullable to make the contract honest and let the compiler flag other unguarded call sites:

    @property (copy, readonly, nonatomic, nullable) NSURL* contentURL;

    This is the more complete fix, since other consumers (e.g. ETCoreMLAsset.mm:103, and the various sdk/ files) dereference contentURL without a check. Out of scope for this hotfix, but flagging since the same crash class could surface elsewhere.

  2. keepAliveAndReturnError: already opens the files (ETCoreMLAsset.mm:103), building each fileURL relativeToURL:self.contentURL. If contentURL were nil, those fileURLWithPath:relativeToURL: calls would produce paths relative to nothing and fopen would likely fail first — so in the lock-then-reap race described, keepAlive may sometimes catch it. The new guard is still valuable as the authoritative check right before +[MLModel modelWithContentsOfURL:], which is where the actual fault occurs. No change needed; just confirming the guard is in the right place.

  3. Error message: "asset.contentURL is nil" is clear and consistent with the neighboring messages. 👍

The change does what it says and is safe to land. The only follow-up I'd suggest is auditing the other contentURL dereferences and/or making the property nullable so this failure mode is caught at compile time rather than discovered via EXC_BAD_ACCESS.
• branch export-D108334793

@metascroy metascroy left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Purely defensive.

@metascroy metascroy closed this Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant