feat: add ability to abort running OTA #1759

Merged
Koenkk merged 2 commits into master from abort-ota on Jun 3, 2026

@Nerivec
Collaborator

@Nerivec Nerivec commented May 16, 2026

On paper (i.e. per the spec), this should allow aborting an in-progress OTA.
However, since it is largely untested and there are many stacks out there, it needs testing with various devices to ensure that no undesired behavior can result from this.
The device has to properly process the response and abort the OTA process on its side. While that should be fairly simple, a device lacking this handling could end up repeating its last block request without ever getting an answer (until it eventually times out on its end, if it does at all). I guess the worst-case scenario, albeit unlikely, would be bricking the device if for some reason the abort isn't properly handled (it could try to flash a partial image, or keep a partial image in a slot that could mess with device state detection...).
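
For illustration only, here is a minimal sketch of what the abort looks like on the wire, assuming the Image Block Response layout of the ZCL OTA Upgrade cluster (status byte first, remaining fields only present on SUCCESS). The names `ImageBlock` and `encodeImageBlockResponse` are hypothetical and are not the code from this PR.

```typescript
// ZCL status codes (values from the ZCL specification).
const ZCL_STATUS_SUCCESS = 0x00;
const ZCL_STATUS_ABORT = 0x95;

// Hypothetical shape of one block of image data; not an actual zigbee-herdsman type.
interface ImageBlock {
    manufacturerCode: number;
    imageType: number;
    fileVersion: number;
    fileOffset: number;
    data: Uint8Array;
}

// Builds the payload of an Image Block Response.
// When no block is given (abort requested), the payload is the single ABORT status byte.
function encodeImageBlockResponse(block: ImageBlock | undefined): Uint8Array {
    if (block === undefined) {
        return new Uint8Array([ZCL_STATUS_ABORT]);
    }

    // status(1) + manufacturerCode(2) + imageType(2) + fileVersion(4) + fileOffset(4) + dataSize(1)
    const headerSize = 14;
    const payload = new Uint8Array(headerSize + block.data.length);
    const view = new DataView(payload.buffer);
    view.setUint8(0, ZCL_STATUS_SUCCESS);
    view.setUint16(1, block.manufacturerCode, true); // ZCL fields are little-endian
    view.setUint16(3, block.imageType, true);
    view.setUint32(5, block.fileVersion, true);
    view.setUint32(9, block.fileOffset, true);
    view.setUint8(13, block.data.length);
    payload.set(block.data, headerSize);
    return payload;
}
```

The point of the sketch is that the abort carries no extra fields, so the device has nothing to acknowledge beyond ceasing its block requests.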

@andrei-lazarov @burmistrzak, it would be great if you could find some time to test this with actual devices.

@burmistrzak
Contributor

The question is: what do we do when a device replies with Malformed Command and then requests another block to continue the OTAU?
IIRC, that's especially an issue with Hue accessories: Koenkk/zigbee2mqtt#31590 (comment)

@Nerivec
Collaborator Author

Nerivec commented May 17, 2026

@burmistrzak this should already be fixed in #1758. We can just ignore the Malformed Command response, since the device is expected to continue requesting blocks at the proper offset. That ignore was missing, which caused the undesired state after the recent PR that added default response matching.
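
As a rough sketch of the idea (not the actual change in #1758), the check could look like the following. The `DefaultResponse` shape and the function name are hypothetical; the command ID and status value are taken from the ZCL OTA Upgrade cluster definitions.

```typescript
// Values from the ZCL specification.
const ZCL_STATUS_MALFORMED_COMMAND = 0x80;
const OTA_CMD_IMAGE_BLOCK_RESPONSE = 0x05;

// Hypothetical decoded Default Response: the command it refers to and its status.
interface DefaultResponse {
    cmdId: number;
    statusCode: number;
}

// Returns true when a Default Response received during an OTA can be ignored:
// the device complained about our Image Block Response but is expected to keep
// requesting blocks at the proper offset anyway.
function isIgnorableDefaultResponse(rsp: DefaultResponse): boolean {
    return rsp.cmdId === OTA_CMD_IMAGE_BLOCK_RESPONSE && rsp.statusCode === ZCL_STATUS_MALFORMED_COMMAND;
}
```

What to do with any other Default Response is deliberately left to the caller in this sketch.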

@burmistrzak
Contributor

@Nerivec That's good to know! I've got a bunch of outdated Hue remotes still around. So I should be able to verify the fix. 🤞🤞🤞

@andrei-lazarov
Contributor

andrei-lazarov commented May 23, 2026

Seems to work as expected on Telink!
I tested with the original Tuya firmware and the romasku firmware, on both a router and an end device.

The device just stops requesting blocks after receiving "ota abort". I don't see a default response or some confirmation.
One time it apparently didn't receive the "ota abort" signal (I don't see it in Wireshark), and it kept requesting the same offset. But it stopped after 10 tries, with an "upgrade end request".
So if we could get feedback and resend the abort when needed, it would be perfect!
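
To make the "feedback + resend" idea concrete, here is a small sketch under the assumption that the abort is always delivered as the reply to a block request, so a repeated request at the same offset naturally gets another ABORT. `AbortTracker` and its method names are hypothetical and not part of zigbee-herdsman.

```typescript
type AbortState = "idle" | "pending" | "sent" | "ended";

// Tracks an abort request for a single OTA session (hypothetical helper).
class AbortTracker {
    private state: AbortState = "idle";
    private abortsSent = 0;

    // The user asked to abort; takes effect on the next block request.
    requestAbort(): void {
        if (this.state === "idle") {
            this.state = "pending";
        }
    }

    // Call for every Image Block Request. Returns true when the reply must be ABORT.
    // A device that missed the first ABORT and asks again simply gets another one.
    onBlockRequest(): boolean {
        if (this.state === "idle" || this.state === "ended") {
            return false;
        }
        this.state = "sent";
        this.abortsSent += 1;
        return true;
    }

    // Call when the device sends an Upgrade End Request: the session is over.
    onUpgradeEndRequest(): void {
        this.state = "ended";
    }

    // Feedback for the caller: current state and how many ABORT replies were sent.
    snapshot(): {state: AbortState; abortsSent: number} {
        return {state: this.state, abortsSent: this.abortsSent};
    }
}
```

A caller could poll `snapshot()` to report whether the abort was only requested, actually sent, or followed by the device ending the session.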

If I start the update again, the device resumes its progress, requesting the offset where it stopped. Even if I give it "no image available" or reboot it, it doesn't delete the stored progress.
This seemed a little dangerous, so I tested with different images. Thankfully, it behaves correctly: it deletes the progress and starts from 0 if the file version or even just the file size is different!
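
The resume behavior observed above can be summarized as a toy model. This is only a sketch of what the device appears to do from the outside, not the Telink firmware logic; all names are hypothetical.

```typescript
// Progress the device appears to keep across aborts and reboots (hypothetical model).
interface StoredProgress {
    fileVersion: number;
    imageSize: number;
    offset: number;
}

// The image offered on the next update attempt.
interface OfferedImage {
    fileVersion: number;
    imageSize: number;
}

// Returns the offset of the first block request for a new update attempt:
// resume only when both the file version and the file size match, else restart at 0.
function firstRequestOffset(stored: StoredProgress | undefined, offered: OfferedImage): number {
    if (stored !== undefined && stored.fileVersion === offered.fileVersion && stored.imageSize === offered.imageSize) {
        return stored.offset;
    }
    return 0;
}
```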

Also, the device is still working properly after all the OTAs and aborts.

Side note: I'm not sure I get the full picture in Wireshark with ember-zli and the SLZB-06M (MG21). I sometimes get this error during OTA (with fast settings 🙂).

error: zh:ember:ezsp: The adapter has run out of buffers, causing general malfunction. Remediate network congestion, if present. Last Frame: [FRAME: ID=138:"MFGLIB_INTERNAL_SET_CHANNEL" Seq=5 Len=9].

@Nerivec
Collaborator Author

Nerivec commented May 24, 2026

Nice details, fantastic!

> I don't see a default response or some confirmation.

I wouldn't expect one, since what we send is itself a response (the response to the block request, with status ABORT).

> it kept requesting the same offset. But it stopped after 10 tries, with an "upgrade end request".

I assume it was an end request with status ABORT?
We should already be sending a default response to that one, with SUCCESS (per spec).
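
A minimal sketch of that rule, assuming the usual OTA Upgrade cluster behavior where a successful end request is answered with an Upgrade End Response and any other status with a Default Response carrying SUCCESS. The `EndReply` type and function name are hypothetical.

```typescript
// ZCL status codes (values from the ZCL specification).
const STATUS_SUCCESS = 0x00;
const STATUS_ABORT = 0x95;

// Hypothetical description of what the server sends back.
type EndReply = {kind: "upgradeEndResponse"} | {kind: "defaultResponse"; statusCode: number};

// Chooses the reply to an Upgrade End Request based on the status the device reported.
function replyToUpgradeEndRequest(requestStatus: number): EndReply {
    if (requestStatus === STATUS_SUCCESS) {
        // Download finished normally: tell the device when to upgrade.
        return {kind: "upgradeEndResponse"};
    }
    // ABORT (or any other failure status): just acknowledge with SUCCESS.
    return {kind: "defaultResponse", statusCode: STATUS_SUCCESS};
}
```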

Interesting about the resume behavior. Looks like it's handling edge cases properly though.

Any chance you have a Silabs device to test with? That stack covers a lot of devices nowadays 😁

@andrei-lazarov
Contributor

andrei-lazarov commented May 24, 2026

> end request with status ABORT?

Will check again

> Silabs device to test with?

Yes, I have plenty. I aborted a Silabs IKEA button a few times. It worked well, but I didn't have time to finish the OTA. This one never saved progress on abort. Will test some more.

@Nerivec Nerivec marked this pull request as ready for review June 3, 2026 14:54
@Koenkk
Owner

Koenkk commented Jun 3, 2026

Thanks!

@Koenkk Koenkk merged commit a70e57b into master Jun 3, 2026
4 checks passed
@Koenkk Koenkk deleted the abort-ota branch June 3, 2026 18:49