Add nxp backend profiling support by irtrukhina · Pull Request #19225 · pytorch/executorch

irtrukhina · 2026-04-30T13:01:26Z

Summary

Add profiling support for the NXP backend.

Test plan

All CI tests passed including new test for the profiling feature.

cc @robert-kalmar @JakeStevens @digantdesai

pytorch-bot · 2026-04-30T13:01:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19225

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Cancelled Job, 2 Unrelated Failures, 2 Unclassified Failures

As of commit 54cf6d4 with merge base e88fd04 ():

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / hifi-build / hifi4 (gh)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh)
Input required and not supplied: aws-region
pull / test-arm-backend-no-driver (test_pytest_models_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 6f9fee96ea8b2d527e98074a67a2bcb0d17ba6acc0e69d067efdae527ce61c04 /exec failed with exit code 1
pull / test-lora-linux / linux-job (gh)
RuntimeError: Command docker exec -t 835cf67329f87ec2bb648427bb7e6931492cf1ae6515d8d0906aa7ca66b7dd73 /exec failed with exit code 1
pull / test-phi-3-mini-runner-linux / linux-job (gh)
RuntimeError: Command docker exec -t 2b7d5bdf50f881fb9e7cf480359b83012b2641b90fbd66ab0df6550ae4cc0a1a /exec failed with exit code 1
pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh)
RuntimeError: Command docker exec -t 4f33ce3c7ca04d5f819ce886f9efebf02419b974301d1a68bab10a65f00879b3 /exec failed with exit code 1

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/__w/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:722:48: error: ‘inmemoryfs::InMemoryFileSystem::InMemoryNode::Kind’ has not been declared
Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_aarch64

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest / linux / linux-job (gh)
##[error]The operation was canceled.

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-moshi-linux / linux-job (gh) (matched linux rule in flaky-rules.json)
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/mesa-vdpau-drivers_23.2.1-1ubuntu3.1%7e22.04.3_amd64.deb 404 Not Found [IP: 104.20.28.246 80]
pull / unittest / macos / macos-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

MartinPavella · 2026-05-04T09:43:18Z

+PATTERN_VERBOSE_KERNELS = (r"\"subgraph_(?P<subgraph>\d+)\"\:\s*"
+                           r"Operators:[\s\S]*?"
+                           r"Kernels:\s*(?P<kernels>[\s\S]*?)"
+                           r"\s*(NeutronOperator|^$|=)")


Why is the r"\s*(NeutronOperator|^$|=)") at the end of the pattern, when in the example above the NeutronOperator is in the beginning?

I will add one more NeutronOperator entry to the example above. Each subgraph block starts with a line containing NeutronOperator "subgraph_xxx" and includes all following lines (such as Operators and Kernels) until the next NeutronOperator line. This way, the entire subgraph information is captured.

Added one more NeutronOperator entry to the example above the pattern

MartinPavella · 2026-05-04T09:46:16Z

+
+    [name, inputs[], outputs[]]
+
+    """


Nit: This doc-string does not seem very useful. Is it intentionally like this?

MartinPavella · 2026-05-04T10:15:36Z

+                        if name_matches == len(tf_node.outputs):
+                            subgraph_idxs.append(subgraph.location)
+            if subgraph_idxs:
+                tflite_to_neutron_dict[tf_idx] = tuple(subgraph_idxs)


So if a TFLite operator has the same inputs as a Neutron operator, they are matched in the dictionary?

I think the doc-string for this method should mention the approach.

Is it reliable? What about models with residual connections (or any parallel nodes). You can have many different parallel operators which have the exact same inputs.

It is a slightly more complicated than just the same. When a single TFLite layer is mapped to multiple Neutron layers, the inputs are named using the format "tflite_input/additional_name". For this reason, I use "in" for comparison.

I will add the approach to the doc-string.

Reliability: some times the algorithm cannot find corresponding layers according inputs/outputs, but I can see them with additional effort. It concerns mainly PAD operators. In this case they just not mapped.
Residual connections and parallel nodes: the algorithm compares all inputs. For example, a convolution has three inputs: input tensor, filter tensor and bias tensor. For parallel nodes input tensor can be the same, but filter and bias differ.

Thank you for the explanation regarding the tensor names. It makes more sense now.

Regarding point 2:
You could still have a MaxPool/AvgPool (or something else that only has 1 input tensor) in parallel consuming the same input. Wouldn't the algorithm fail in this case?

Fixed the issue by adding a post-check to ensure that a TFLite node is not mapped to more than one single-input node. If it is mapped to multiple single-input nodes, only the node with the corresponding name is retained.

MartinPavella · 2026-05-04T12:08:21Z

+for additional details.
+
+The following command creates a profilable `cifar10_nxp_delegate.pte` model and the corresponding `ETRecord` file, 
+compatible with the **i.MX RT700** board using **MCUXpresso SDK 25.06**:


Is 25.06 correct?
I have no insights here, it just seems a bit old.

The latest version is 26.3. But the overview section references minimum SDK version 25.12. I tested everything using version 25.06, and the dim-order section also refers to 25.06. I prefer to keep the version I tested with.

... or remove the sentence.

And what eIQ Neutron SDK you used?
What is the dependency on MCUX 25.06?

Removed the MCUX version from documentation. I used 25.06.

MartinPavella · 2026-05-04T12:18:47Z

+        # Check if ETRecord was created and saved.
+        assert f"The ETRecord for the model was saved to" in process_output
+        assert etrecord_file.exists(), f"ETRecord file not created at {etrecord_file}"
+


How about also calling the analyzing_with_inspector.py here to make sure everything works as expected?

To run Inspector, an etdump file is required. It contains profiling information from the board. The generated etrecord is only an optional input for Inspector and cannot be opened without the corresponding etdump file.
I added profiling tests in generic_tests/tests_profiling.py. These tests verify the correctness of the generated mapping for simpler models. But don't store any etrecords.
This test should only confirm that the record is created and stored successfully during model convertion.

Makes sense, thanks for the explanation. If this test focuses on checking that the ETRecord is generated, can't we add another test somewhere which also runs the profiling and checks the results? I didn't find such a test in test_profiling.py, sorry if I missed something.

test_aot_example.py checks that an ETRecord is generated and stored when profiling is enabled.
test_profiling.py checks that the debug handle map is generated correctly.
It may look like ETRecord is not generated in test_profiling.py, but this is not true. It is generated, just not saved to a file.
How it works:

I enable automatic ETRecord generation in to_edge_transform_and_lower (in executorch_pipeline.py):
edge_program_manager = to_edge_transform_and_lower( ..., **generate_etrecord=use_profiling** )

Then I compute the neutron-to-edge map. This part is covered by the test_profiling.py tests.

I return the map as part of PreprocessResult in the preprocess method (in nxp_backend.py):
return PreprocessResult( processed_bytes=binary, debug_handle_map=neutron_to_edge_map )

ETRecord generation and attaching the map are handled by devtools, which already have their own tests.
My task is only to check that the ETRecord contains the correct map. I print the map in the logs, so it is enough to verify it there. And also verify that profiling does not affect the model output.
The reason why I take map from a log is that I use lower_run_compare() function and it doesn't return exec_prog, which contain ETRecord.
I can move the CIFAR test from test_profiling.py into test_aot_example.py and check the map correctness directly from exec_prog and not from log. However, I think that test_aot_example.py and test_profiling.py together already provide sufficient coverage of profiling.

I'm sorry, I guess my comment wasn't clear. What I meant is that I don't see any tests that actually run the model. You have made changes to the NeutronBackend.cpp file and I don't see any test verifying those changes.

Understood now.
The CModel doesn’t support the profiling feature (only FPGA and the board do) because it doesn’t calculate ticks. Therefore, I didn’t consider testing the NeutronBackend.cpp changes with NSYS and instead tested them on the board.
Another issue is that etdump has to be saved manually at a breakpoint in MCUX.
One possible approach is to print the etdump binary to the console and parse it from the test. However, even in that case, there would be no neutron kernel information in the etdump — only CPU kernels and delegate call.

Ok, thank you for the explanation :)

robert-kalmar

Review (Part 1/x)

robert-kalmar · 2026-05-14T15:59:38Z

+for additional details.
+
+The following command creates a profilable `cifar10_nxp_delegate.pte` model and the corresponding `ETRecord` file, 
+compatible with the **i.MX RT700** board using **MCUXpresso SDK 25.06**:


And what eIQ Neutron SDK you used?
What is the dependency on MCUX 25.06?

robert-kalmar

Please fix the linting errors and look at our comments. For information how to run the lintrunner see https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md .

In short:

$ lintrunner init
$ lintrunner --merge-base-with <reference to last commit outside of your contribution>

robert-kalmar · 2026-05-14T18:53:14Z

Internal CI build: https://bamboo3.sw.nxp.com/browse/MLTECE-EXIGH146

linux-foundation-easycla · 2026-06-02T08:08:13Z

❌ - login: @irtrukhina / name: Irina Korchakova. The commit (54cf6d4, 77309b7, a0a145b, de53846) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

MartinPavella · 2026-06-11T12:40:19Z

+)
+# Two graphs are expected in the input log: original and converted.
+EXPECTED_GRAPHS = 2
+# List of single-input nodes that shouldn't be mapped on the same TFLite node.


I don't really understand which nodes is this list supposed to contain. What does the comment mean? Are there any single input nodes which shouldn't be in this list?
Also, is this list complete? Or will we ever need to add operators to it?

I created this list from the output of ./neutron-converter --show-kernel-kinds --target imxrt700. I selected all single-input kernels that can be executed in parallel with the same input.
As I understand, no additional kernels are expected for Neutron-C. If that is correct, this should be a complete list for Neutron-C.
The comment means that if two or more kernels from the list are mapped to the same TFLite node, this is incorrect. In such cases, we need to further analyze and filter the results to keep only one Neutron node that corresponds to the TFLite node name.

MartinPavella · 2026-06-12T07:56:07Z

+        assert neutron_map == {
+            0: (6,),
+            1: (),
+            2: (7,),
+            3: (),
+            4: (),
+            5: (),
+            6: (),
+            7: (),
+            8: (),
+            9: (),
+            10: (),
+        }


Is this really the expected result? Only 2 operators are mapped?
I would expect 5 mapped operators (based on the model used).

It is very "unhappy" example :) There are many mapping issues:

AvgPool → Conv2DDepthwiseV2 conversion
The mapping works correctly when there are no other single-input nodes in parallel. However, in this case there is also a MaxPool with the same input. As a result, the single-input node filter is triggered and removes all mapping entities if the Neutron and TFLite names do not match.
This is a simple way to eliminate incorrectly mapped nodes. While it has some side effects, it at least prevents incorrect mappings. I believe this is a relatively rare scenario.

Concat + Conv2d transformation
These are transformed into multiple layers that are not represented by either the Neutron Concat kernel or the Neutron Conv2DStandardV2 kernel. Consequently, the inputs and outputs do not match.
In a “better” example, the concatenation node does not split into multiple nodes and is correctly converted into a Concat kernel.

I could modify the model so that Concat and Conv2d are converted into the corresponding kernels. However, I actually like this test case, as it can serve as useful input for further profiling improvements.

For better understanding (I will add these comments to tests as well):
0: (6,), # Conv2DStandardV2
1: (), # Conv2DDepthwiseV2 (AvgPool)
2: (7,), # MaxPool
3: (), # TransposeCHW
4: (), # TransposeCHW
5: (), # TransposeCHW
6: (), # Slice
7: (), # Pad
8: (), # Conv2DPointwise
9: (), # Slice
10: () # Neutron Dump

MartinPavella · 2026-06-12T08:06:51Z

+        # Check if ETRecord was created and saved.
+        assert f"The ETRecord for the model was saved to" in process_output
+        assert etrecord_file.exists(), f"ETRecord file not created at {etrecord_file}"
+


Makes sense, thanks for the explanation. If this test focuses on checking that the ETRecord is generated, can't we add another test somewhere which also runs the profiling and checks the results? I didn't find such a test in test_profiling.py, sorry if I missed something.

MartinPavella · 2026-06-12T08:12:49Z

@@ -0,0 +1,205 @@
+# NXP eIQ Profiling Support


Very nicely documented 👍🏻

Signed-off-by: Irina Korchakova <irina.trukhina@nxp.com>

irtrukhina requested review from mergennachin and robert-kalmar as code owners April 30, 2026 13:01

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 30, 2026

robert-kalmar added module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate labels May 4, 2026

robert-kalmar requested review from MartinPavella and jirioc May 4, 2026 09:26

MartinPavella reviewed May 4, 2026

View reviewed changes

robert-kalmar reviewed May 14, 2026

View reviewed changes

Comment thread examples/nxp/analyzing_with_inspector.py Outdated

Comment thread examples/nxp/aot_neutron_compile.py Outdated

Comment thread examples/nxp/aot_neutron_compile.py Outdated

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from 7a2d30c to f7c7896 Compare June 2, 2026 08:08

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch 2 times, most recently from 9d924bb to 9a8a99b Compare June 2, 2026 19:28

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from 9a8a99b to d45d32c Compare June 9, 2026 14:26

MartinPavella reviewed Jun 12, 2026

View reviewed changes

robert-kalmar reviewed Jun 12, 2026

View reviewed changes

irtrukhina added 2 commits June 15, 2026 15:48

Add nxp backend profiling support

a0a145b

Signed-off-by: Irina Korchakova <irina.trukhina@nxp.com>

Mark newly added link with lint-ignore

77309b7

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from c056d88 to 8ef1427 Compare June 16, 2026 06:25

Fix after review

de53846

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch 2 times, most recently from 620af28 to ec062cc Compare June 17, 2026 21:35

Fix after rebase

54cf6d4

irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from ec062cc to 54cf6d4 Compare June 17, 2026 21:41

irtrukhina requested review from MartinPavella and robert-kalmar June 18, 2026 06:29

irtrukhina had a problem deploying to cadence June 18, 2026 09:03 — with GitHub Actions Failure

robert-kalmar approved these changes Jun 18, 2026

View reviewed changes

Conversation

irtrukhina commented Apr 30, 2026 • edited by pytorch-bot Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

pytorch-bot Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19225

❌ 6 New Failures, 1 Cancelled Job, 2 Unrelated Failures, 2 Unclassified Failures

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

irtrukhina Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

irtrukhina Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robert-kalmar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review (Part 1/x)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

robert-kalmar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

irtrukhina commented Apr 30, 2026 •

edited by pytorch-bot Bot

Loading

pytorch-bot Bot commented Apr 30, 2026 •

edited

Loading

irtrukhina Jun 12, 2026 •

edited

Loading

irtrukhina Jun 12, 2026 •

edited

Loading

robert-kalmar left a comment •

edited

Loading

robert-kalmar left a comment •

edited

Loading

linux-foundation-easycla Bot commented Jun 2, 2026 •

edited

Loading