Skip to content

Add nxp backend profiling support#19225

Open
irtrukhina wants to merge 4 commits into
pytorch:mainfrom
nxp-upstream:feature/EIEX-752-executorch-profiling
Open

Add nxp backend profiling support#19225
irtrukhina wants to merge 4 commits into
pytorch:mainfrom
nxp-upstream:feature/EIEX-752-executorch-profiling

Conversation

@irtrukhina

@irtrukhina irtrukhina commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Summary

Add profiling support for the NXP backend.

Test plan

All CI tests passed including new test for the profiling feature.

cc @robert-kalmar @JakeStevens @digantdesai

@pytorch-bot

pytorch-bot Bot commented Apr 30, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Cancelled Job, 2 Unrelated Failures, 2 Unclassified Failures

As of commit 54cf6d4 with merge base e88fd04 (image):

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 30, 2026
@robert-kalmar robert-kalmar added module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate labels May 4, 2026
Comment thread backends/nxp/backend/edge_program_converter.py Outdated
Comment thread backends/nxp/backend/neutron_converter_manager.py Outdated
Comment thread backends/nxp/backend/neutron_map.py Outdated
PATTERN_VERBOSE_KERNELS = (r"\"subgraph_(?P<subgraph>\d+)\"\:\s*"
r"Operators:[\s\S]*?"
r"Kernels:\s*(?P<kernels>[\s\S]*?)"
r"\s*(NeutronOperator|^$|=)")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the r"\s*(NeutronOperator|^$|=)") at the end of the pattern, when in the example above the NeutronOperator is in the beginning?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add one more NeutronOperator entry to the example above. Each subgraph block starts with a line containing NeutronOperator "subgraph_xxx" and includes all following lines (such as Operators and Kernels) until the next NeutronOperator line. This way, the entire subgraph information is captured.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one more NeutronOperator entry to the example above the pattern

Comment thread backends/nxp/backend/neutron_map.py Outdated

[name, inputs[], outputs[]]

"""

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This doc-string does not seem very useful. Is it intentionally like this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote

Comment thread backends/nxp/backend/neutron_map.py Outdated
if name_matches == len(tf_node.outputs):
subgraph_idxs.append(subgraph.location)
if subgraph_idxs:
tflite_to_neutron_dict[tf_idx] = tuple(subgraph_idxs)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if a TFLite operator has the same inputs as a Neutron operator, they are matched in the dictionary?

  1. I think the doc-string for this method should mention the approach.
  2. Is it reliable? What about models with residual connections (or any parallel nodes). You can have many different parallel operators which have the exact same inputs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a slightly more complicated than just the same. When a single TFLite layer is mapped to multiple Neutron layers, the inputs are named using the format "tflite_input/additional_name". For this reason, I use "in" for comparison.

  1. I will add the approach to the doc-string.
  2. Reliability: some times the algorithm cannot find corresponding layers according inputs/outputs, but I can see them with additional effort. It concerns mainly PAD operators. In this case they just not mapped.
    Residual connections and parallel nodes: the algorithm compares all inputs. For example, a convolution has three inputs: input tensor, filter tensor and bias tensor. For parallel nodes input tensor can be the same, but filter and bias differ.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation regarding the tensor names. It makes more sense now.

Regarding point 2:
You could still have a MaxPool/AvgPool (or something else that only has 1 input tensor) in parallel consuming the same input. Wouldn't the algorithm fail in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issue by adding a post-check to ensure that a TFLite node is not mapped to more than one single-input node. If it is mapped to multiple single-input nodes, only the node with the corresponding name is retained.

Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
for additional details.

The following command creates a profilable `cifar10_nxp_delegate.pte` model and the corresponding `ETRecord` file,
compatible with the **i.MX RT700** board using **MCUXpresso SDK 25.06**:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 25.06 correct?
I have no insights here, it just seems a bit old.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest version is 26.3. But the overview section references minimum SDK version 25.12. I tested everything using version 25.06, and the dim-order section also refers to 25.06. I prefer to keep the version I tested with.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or remove the sentence.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what eIQ Neutron SDK you used?
What is the dependency on MCUX 25.06?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the MCUX version from documentation. I used 25.06.

Comment thread docs/source/backends/nxp/nxp-profiling.md Outdated
# Check if ETRecord was created and saved.
assert f"The ETRecord for the model was saved to" in process_output
assert etrecord_file.exists(), f"ETRecord file not created at {etrecord_file}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about also calling the analyzing_with_inspector.py here to make sure everything works as expected?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To run Inspector, an etdump file is required. It contains profiling information from the board. The generated etrecord is only an optional input for Inspector and cannot be opened without the corresponding etdump file.
I added profiling tests in generic_tests/tests_profiling.py. These tests verify the correctness of the generated mapping for simpler models. But don't store any etrecords.
This test should only confirm that the record is created and stored successfully during model convertion.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the explanation. If this test focuses on checking that the ETRecord is generated, can't we add another test somewhere which also runs the profiling and checks the results? I didn't find such a test in test_profiling.py, sorry if I missed something.

@irtrukhina irtrukhina Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_aot_example.py checks that an ETRecord is generated and stored when profiling is enabled.
test_profiling.py checks that the debug handle map is generated correctly.
It may look like ETRecord is not generated in test_profiling.py, but this is not true. It is generated, just not saved to a file.
How it works:

  1. I enable automatic ETRecord generation in to_edge_transform_and_lower (in executorch_pipeline.py):
    edge_program_manager = to_edge_transform_and_lower( ..., **generate_etrecord=use_profiling** )
  2. Then I compute the neutron-to-edge map. This part is covered by the test_profiling.py tests.
  3. I return the map as part of PreprocessResult in the preprocess method (in nxp_backend.py):
    return PreprocessResult( processed_bytes=binary, debug_handle_map=neutron_to_edge_map )

ETRecord generation and attaching the map are handled by devtools, which already have their own tests.
My task is only to check that the ETRecord contains the correct map. I print the map in the logs, so it is enough to verify it there. And also verify that profiling does not affect the model output.
The reason why I take map from a log is that I use lower_run_compare() function and it doesn't return exec_prog, which contain ETRecord.
I can move the CIFAR test from test_profiling.py into test_aot_example.py and check the map correctness directly from exec_prog and not from log. However, I think that test_aot_example.py and test_profiling.py together already provide sufficient coverage of profiling.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I guess my comment wasn't clear. What I meant is that I don't see any tests that actually run the model. You have made changes to the NeutronBackend.cpp file and I don't see any test verifying those changes.

@irtrukhina irtrukhina Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood now.
The CModel doesn’t support the profiling feature (only FPGA and the board do) because it doesn’t calculate ticks. Therefore, I didn’t consider testing the NeutronBackend.cpp changes with NSYS and instead tested them on the board.
Another issue is that etdump has to be saved manually at a breakpoint in MCUX.
One possible approach is to print the etdump binary to the console and parse it from the test. However, even in that case, there would be no neutron kernel information in the etdump — only CPU kernels and delegate call.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thank you for the explanation :)

@robert-kalmar robert-kalmar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review (Part 1/x)

Comment thread backends/nxp/backend/edge_program_converter.py Outdated
Comment thread backends/nxp/backend/neutron_converter_manager.py Outdated
Comment thread backends/nxp/backend/neutron_map.py Outdated
Comment thread backends/nxp/backend/neutron_map.py Outdated
Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
Comment thread backends/nxp/nxp_backend.py Outdated
Comment thread docs/source/backends/nxp/nxp-profiling.md Outdated
Comment thread docs/source/backends/nxp/nxp-profiling.md Outdated
for additional details.

The following command creates a profilable `cifar10_nxp_delegate.pte` model and the corresponding `ETRecord` file,
compatible with the **i.MX RT700** board using **MCUXpresso SDK 25.06**:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what eIQ Neutron SDK you used?
What is the dependency on MCUX 25.06?

@robert-kalmar robert-kalmar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the linting errors and look at our comments. For information how to run the lintrunner see https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md .

In short:

$ lintrunner init
$ lintrunner --merge-base-with <reference to last commit outside of your contribution>

Comment thread examples/nxp/analyzing_with_inspector.py Outdated
Comment thread examples/nxp/aot_neutron_compile.py Outdated
Comment thread examples/nxp/aot_neutron_compile.py Outdated
@robert-kalmar

Copy link
Copy Markdown
Collaborator

Internal CI build: https://bamboo3.sw.nxp.com/browse/MLTECE-EXIGH146

@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from 7a2d30c to f7c7896 Compare June 2, 2026 08:08
@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 2, 2026

Copy link
Copy Markdown

CLA Not Signed

@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch 2 times, most recently from 9d924bb to 9a8a99b Compare June 2, 2026 19:28
@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from 9a8a99b to d45d32c Compare June 9, 2026 14:26
)
# Two graphs are expected in the input log: original and converted.
EXPECTED_GRAPHS = 2
# List of single-input nodes that shouldn't be mapped on the same TFLite node.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand which nodes is this list supposed to contain. What does the comment mean? Are there any single input nodes which shouldn't be in this list?
Also, is this list complete? Or will we ever need to add operators to it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created this list from the output of ./neutron-converter --show-kernel-kinds --target imxrt700. I selected all single-input kernels that can be executed in parallel with the same input.
As I understand, no additional kernels are expected for Neutron-C. If that is correct, this should be a complete list for Neutron-C.
The comment means that if two or more kernels from the list are mapped to the same TFLite node, this is incorrect. In such cases, we need to further analyze and filter the results to keep only one Neutron node that corresponds to the TFLite node name.

Comment thread backends/nxp/tests/generic_tests/test_profiling.py Outdated
Comment on lines +87 to +99
assert neutron_map == {
0: (6,),
1: (),
2: (7,),
3: (),
4: (),
5: (),
6: (),
7: (),
8: (),
9: (),
10: (),
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the expected result? Only 2 operators are mapped?
I would expect 5 mapped operators (based on the model used).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very "unhappy" example :) There are many mapping issues:

  1. AvgPool → Conv2DDepthwiseV2 conversion
    The mapping works correctly when there are no other single-input nodes in parallel. However, in this case there is also a MaxPool with the same input. As a result, the single-input node filter is triggered and removes all mapping entities if the Neutron and TFLite names do not match.
    This is a simple way to eliminate incorrectly mapped nodes. While it has some side effects, it at least prevents incorrect mappings. I believe this is a relatively rare scenario.

  2. Concat + Conv2d transformation
    These are transformed into multiple layers that are not represented by either the Neutron Concat kernel or the Neutron Conv2DStandardV2 kernel. Consequently, the inputs and outputs do not match.
    In a “better” example, the concatenation node does not split into multiple nodes and is correctly converted into a Concat kernel.

I could modify the model so that Concat and Conv2d are converted into the corresponding kernels. However, I actually like this test case, as it can serve as useful input for further profiling improvements.

For better understanding (I will add these comments to tests as well):
0: (6,), # Conv2DStandardV2
1: (), # Conv2DDepthwiseV2 (AvgPool)
2: (7,), # MaxPool
3: (), # TransposeCHW
4: (), # TransposeCHW
5: (), # TransposeCHW
6: (), # Slice
7: (), # Pad
8: (), # Conv2DPointwise
9: (), # Slice
10: () # Neutron Dump

Comment thread backends/nxp/tests/generic_tests/test_profiling.py Outdated
# Check if ETRecord was created and saved.
assert f"The ETRecord for the model was saved to" in process_output
assert etrecord_file.exists(), f"ETRecord file not created at {etrecord_file}"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the explanation. If this test focuses on checking that the ETRecord is generated, can't we add another test somewhere which also runs the profiling and checks the results? I didn't find such a test in test_profiling.py, sorry if I missed something.

@@ -0,0 +1,205 @@
# NXP eIQ Profiling Support

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely documented 👍🏻

Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
Comment thread backends/nxp/tests/generic_tests/test_aot_example.py Outdated
Comment thread examples/nxp/aot_neutron_compile.py Outdated
Comment thread backends/nxp/runtime/NeutronBackend.cpp Outdated
Signed-off-by: Irina Korchakova <irina.trukhina@nxp.com>
@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from c056d88 to 8ef1427 Compare June 16, 2026 06:25
@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch 2 times, most recently from 620af28 to ec062cc Compare June 17, 2026 21:35
@irtrukhina irtrukhina force-pushed the feature/EIEX-752-executorch-profiling branch from ec062cc to 54cf6d4 Compare June 17, 2026 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: nxp Issues related to NXP Neutron NPU delegation and code under backends/nxp/ release notes: nxp Changes to the NXP Neutron backend delegate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants