Qualcomm AI Engine Direct - Test Framework Refactor #19660

Merged
psiddh merged 1 commit into pytorch:main from CodeLinaro:dev_ci on Jun 16, 2026

@haowhsu-quic haowhsu-quic commented May 19, 2026

Collaborator

Summary

Co-authored-by: @winskuo-quic, @chenweng-quic

  • introduce pytest and reorganize the file architecture for finer-grained testing
  • widen operator test coverage with combinations of different precisions; the codebase was changed accordingly
  • add feature tests for HTP

Followup PRs

  • op test for GPU / LPAI / HTP v69-v81
  • feature test for GPU / LPAI / HTP v69-v81
  • utils test
  • passes test
  • GitHub action workflow

Test plan

  • op test with a short summary: pytest backends/qualcomm/tests/rework/htp/op/v68/test.py --tb=no
  • test a specific op and write a report: pytest backends/qualcomm/tests/rework/htp/op/v68/test.py -k "test_conv2d" --test_report test_report.xml
  • test a specific op with 8-bit activation only: pytest backends/qualcomm/tests/rework/htp/op/v68/test.py -k "test_mean and 8a"
  • feature test (a device connection is required): pytest backends/qualcomm/tests/rework/htp/feature/v68/test.py --soc_model SM8850 --build_folder build-android --device $DEVICE_SERIAL
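
The commands above pass long-form options such as --soc_model, --build_folder and --device straight to pytest. One way such options can be registered is pytest's pytest_addoption hook in a conftest.py. The sketch below is only an illustration: the defaults, help strings and the read_options helper are assumptions, and the PR itself may wire these options up differently (the review mentions a shared setup_common_args_and_variables helper).

```python
# conftest.py (illustrative sketch, not the PR's actual code)


def pytest_addoption(parser):
    """Register the long-form options used in the test plan."""
    # Defaults and help strings below are assumptions for illustration.
    parser.addoption("--soc_model", default=None, help="Target SoC, e.g. SM8850")
    parser.addoption("--build_folder", default="build-x86", help="Build output folder")
    parser.addoption("--device", default=None, help="Device serial number")
    parser.addoption("--test_report", default=None, help="Path of the XML report")


def read_options(config):
    """Collect the option values from a pytest config object (hypothetical helper)."""
    names = ("soc_model", "build_folder", "device", "test_report")
    return {name: config.getoption(name) for name in names}
```

Keeping only long option names avoids clashing with pytest's own short flags, which is the motivation the review gives for removing -m, -b, -s and similar.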

cc @cccclai @winskuo-quic @shewu-quic @DannyYuyang-quic @cbilgin @abhinaykukkadapu

pytorch-bot Bot commented May 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19660

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 4 Unrelated Failures, 3 Unclassified Failures

As of commit 7f46b92 with merge base e28ef13:

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label May 19, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@haowhsu-quic haowhsu-quic force-pushed the dev_ci branch 2 times, most recently from 29b0b6b to bdc8e4c May 25, 2026 03:52
linux-foundation-easycla Bot commented May 25, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: haowhsu-quic / name: haowhsu (6a44c2a)

@haowhsu-quic haowhsu-quic marked this pull request as ready for review May 25, 2026 03:56
psiddh commented May 26, 2026

Contributor

@haowhsu-quic

  • Do you happen to know the estimated CI bump (% increase in the number of Qualcomm tests) once these PRs land?
  • Also, IIRC, the interim proposal from the last meeting's presentation was to add PR labels that trigger some test runs on PRs. Will that happen as a follow-up PR?

haowhsu-quic commented May 26, 2026

Collaborator Author

Hi @psiddh, thanks for the comments.

  • We do not have a coverage number yet; we can provide detailed information once all the refactoring work is done. As far as I can tell, the test categories are now finer-grained, and almost all parameter combinations for each PyTorch operator are covered.
  • Yes, we do have one workflow to be contributed, something like:
name: Qualcomm

on:
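  pull_request:
    paths:
      - backends/qualcomm/**
      - examples/qualcomm/**
      # `paths` and `paths-ignore` cannot be combined for one event,
      # so Markdown files are excluded with a negated pattern instead.
      - '!**.md'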

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      run_backend_e2e_oss: ${{ steps.parse.outputs.run_backend_e2e_oss }}
      run_backend_e2e_genai: ${{ steps.parse.outputs.run_backend_e2e_genai }}
      run_backend_feature: ${{ steps.parse.outputs.run_backend_feature }}
      run_backend_op: ${{ steps.parse.outputs.run_backend_op }}
      run_common_pass: ${{ steps.parse.outputs.run_common_pass }}
      run_common_utils: ${{ steps.parse.outputs.run_common_utils }}
    steps:
      - name: checkout commit
        uses: actions/checkout@v3
      - name: build commit
        run: |
          echo "build finished"
      - name: parse labels
        id: parse
        shell: bash
        run: |
          set -eux
          #files=$(git diff --name-only ${{ github.event.pull_request.base.ref }} HEAD | xargs)
          files="utils"
          if [[ $files =~ examples && ! $files =~ llama ]]; then
            echo "run_backend_e2e_oss=1" >> $GITHUB_OUTPUT
          fi
          if [[ $files =~ llama ]]; then
            echo "run_backend_e2e_genai=1" >> $GITHUB_OUTPUT
          fi
          if [[ $files =~ aot|runtime|serialization ]]; then
            echo "run_backend_feature=1" >> $GITHUB_OUTPUT
          fi
          if [[ $files =~ aot|builders|quantizer|runtime|serialization ]]; then
            echo "run_backend_op=1" >> $GITHUB_OUTPUT
          fi
          if [[ $files =~ _passes ]]; then
            echo "run_common_pass=1" >> $GITHUB_OUTPUT
          fi
          if [[ $files =~ utils ]]; then
            echo "run_common_utils=1" >> $GITHUB_OUTPUT
          fi

  # test-backend-e2e-oss
  test-backend-e2e-oss-gpu:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_backend_e2e_oss
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-e2e-oss-htp:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 6
      matrix:
        arch: [v68, v69, v73, v75, v79, v81]
    if: needs.prepare.outputs.run_backend_e2e_oss
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-e2e-oss-lpai:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 1
      matrix:
        arch: [v6]
    if: needs.prepare.outputs.run_backend_e2e_oss
    steps:
      - name: run test
        run: |
          echo "test finished"

  # test-backend-e2e-genai
  test-backend-e2e-genai-gpu:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_backend_e2e_genai
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-e2e-genai-htp:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 6
      matrix:
        arch: [v68, v69, v73, v75, v79, v81]
    if: needs.prepare.outputs.run_backend_e2e_genai
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-e2e-genai-lpai:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 1
      matrix:
        arch: [v6]
    if: needs.prepare.outputs.run_backend_e2e_genai
    steps:
      - name: run test
        run: |
          echo "test finished"

  # test-backend-feature
  test-backend-feature-gpu:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_backend_feature
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-feature-htp:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 6
      matrix:
        arch: [v68, v69, v73, v75, v79, v81]
    if: needs.prepare.outputs.run_backend_feature
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-feature-lpai:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 1
      matrix:
        arch: [v6]
    if: needs.prepare.outputs.run_backend_feature
    steps:
      - name: run test
        run: |
          echo "test finished"

  # test-backend-operator
  test-backend-operator-gpu:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_backend_op
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-operator-htp:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 6
      matrix:
        arch: [v68, v69, v73, v75, v79, v81]
    if: needs.prepare.outputs.run_backend_op
    steps:
      - name: run test
        run: |
          echo "test finished"

  test-backend-operator-lpai:
    runs-on: ubuntu-latest
    needs: prepare
    continue-on-error: true
    strategy:
      max-parallel: 1
      matrix:
        arch: [v6]
    if: needs.prepare.outputs.run_backend_op
    steps:
      - name: run test
        run: |
          echo "test finished"

  # test-backend-common-pass
  test-backend-common-pass:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_common_pass
    steps:
      - name: run test
        run: |
          echo "test finished"

  # test-backend-common-utils
  test-backend-common-utils:
    runs-on: ubuntu-latest
    needs: prepare
    if: needs.prepare.outputs.run_common_utils
    steps:
      - name: run test
        run: |
          echo "test finished"

psiddh commented May 26, 2026

Contributor

Requesting @mergennachin @digantdesai @JacobSzwejbka to also take a look at this PR

@digantdesai digantdesai added the partner: qualcomm and module: qnn labels May 27, 2026
@digantdesai

Contributor

@claude review this PR.

claude Bot commented Jun 1, 2026

Claude finished @digantdesai's task in 2m 58s


Review of PR #19660: Qualcomm AI Engine Direct - Test Framework Refactor

  • Read PR diff and understand the overall changes
  • Review new test framework architecture (conftest.py, test structure)
  • Review operator test definitions (src/op.py)
  • Review feature test definitions (src/feature.py)
  • Review changes to existing code (passes, builders, quantizer)
  • Post comprehensive review feedback

Overall Assessment

This is a well-structured refactoring that introduces a pytest-based test framework for Qualcomm backend testing, replacing/augmenting the existing unittest-based approach. The architecture is clean, with good separation of concerns between test infrastructure (conftest.py), test implementations (src/op.py, src/feature.py), and per-version test declarations (htp/op/v68/test.py, etc.).

The PR also includes meaningful functional improvements: merging ReduceDynamicRange and ReplaceInfValues passes into the quantizer (where quant range information is available), fixing several builder bugs, and expanding op coverage.


Architecture & Design (Positive)

  1. Layered conftest.py design - The root conftest.py provides shared infrastructure (metrics, export/verify functions, session fixtures), while htp/conftest.py adds HTP-specific quantizer/compile_spec fixtures. This scales well for GPU/LPAI backends in follow-up PRs.

  2. enumerate_activation_dtype decorator - Parametrizing each test across 8a, 16a, and fp modes with per-mode expected outcomes is a good pattern for catching precision-specific regressions.

  3. with_htp_context wrapper - Clean separation of fixture resolution from test body logic.

  4. Metrics classes (Tolerance, Sqnr, CosineSimilarity, SkipOutputCheck) - Well-designed abstraction with detailed mismatch reporting.
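
To make the metrics idea concrete, here is a minimal sketch of what SQNR and cosine-similarity checks compute. It is not the PR's implementation: the class shapes, the check method and the threshold fields are assumptions, and plain Python lists stand in for tensors.

```python
import math
from dataclasses import dataclass


@dataclass
class Sqnr:
    """Pass when the signal-to-quantization-noise ratio reaches threshold_db."""
    threshold_db: float

    def check(self, reference, actual):
        signal = sum(r * r for r in reference)
        noise = sum((r - a) ** 2 for r, a in zip(reference, actual))
        if noise == 0:
            return True  # identical outputs, SQNR is unbounded
        if signal == 0:
            return False  # all-zero reference but non-zero error
        return 10 * math.log10(signal / noise) >= self.threshold_db


@dataclass
class CosineSimilarity:
    """Pass when the cosine of the angle between outputs reaches threshold."""
    threshold: float

    def check(self, reference, actual):
        dot = sum(r * a for r, a in zip(reference, actual))
        norm = math.sqrt(sum(r * r for r in reference)) * math.sqrt(
            sum(a * a for a in actual)
        )
        if norm == 0:
            return False  # similarity is undefined for a zero vector
        return dot / norm >= self.threshold


print(Sqnr(30.0).check([1.0, 2.0, 3.0], [1.1, 2.0, 3.0]))
```

SQNR is sensitive to the magnitude of the error, while cosine similarity only looks at direction, so a uniformly scaled output can pass the latter and fail the former.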


Issues & Suggestions

High Priority

1. Bare except in quantizer (quantizer.py:377)

try:
    dtype_info = torch.iinfo(
        reduce(getattr, ["output_qspec", "dtype"], quant_info)
    )
except:
    return

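This silently swallows every exception, including KeyboardInterrupt and SystemExit. Narrow it to the errors actually expected here, e.g. except (AttributeError, TypeError): (getattr on a missing attribute, or torch.iinfo on a non-integer dtype), or at minimum except Exception:.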
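
A small torch-free sketch of the narrowed handler follows. The helper name output_dtype and the SimpleNamespace stand-ins are hypothetical; only the reduce(getattr, ...) lookup is taken from the snippet above.

```python
from functools import reduce
from types import SimpleNamespace


def output_dtype(quant_info):
    """Return quant_info.output_qspec.dtype, or None if the chain is broken."""
    try:
        return reduce(getattr, ["output_qspec", "dtype"], quant_info)
    except AttributeError:
        # Only the expected failure is handled. KeyboardInterrupt and
        # SystemExit derive from BaseException and still propagate.
        return None


annotated = SimpleNamespace(output_qspec=SimpleNamespace(dtype="int8"))
print(output_dtype(annotated))          # attribute chain resolves
print(output_dtype(SimpleNamespace()))  # missing output_qspec is tolerated
```

With a bare except:, a Ctrl-C during this lookup would be treated the same as a missing annotation and silently ignored.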

2. Use of all() with side-effecting walrus operator (quantizer.py:419-424)

if all(
    [
        node.op == "call_function",
        node.target in candidates,
        quant_range := self._get_quant_range(node),
    ]
):

Using := inside all([...]) is clever but fragile. Because the argument is a list literal, every element is evaluated before all() runs, so _get_quant_range(node) is called for every node, even when the first two checks already fail; all() can only short-circuit over the finished list. quant_range is None (falsy) when _get_quant_range returns early, so the condition still gives the right answer, but the binding is easy to miss and would break silently under refactoring. A plain and chain short-circuits and is clearer:

if (node.op == "call_function" 
    and node.target in candidates
    and (quant_range := self._get_quant_range(node))):
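
The difference in evaluation order can be checked with a short, self-contained experiment. The names below (get_quant_range, matches_with_all, matches_with_and) are stand-ins invented for this sketch, not functions from the PR.

```python
calls = []


def get_quant_range(node):
    """Stand-in for an expensive lookup; records every invocation."""
    calls.append(node)
    return (0, 255) if node == "quantized" else None


def matches_with_all(op, node):
    # The list literal is built first, so get_quant_range always runs.
    return all([op == "call_function", get_quant_range(node)])


def matches_with_and(op, node):
    # `and` stops at the first falsy operand, skipping the lookup.
    return bool(op == "call_function" and get_quant_range(node))


matches_with_all("placeholder", "quantized")
print(len(calls))  # the lookup ran although the op check failed

calls.clear()
matches_with_and("placeholder", "quantized")
print(len(calls))  # the lookup was skipped
```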

3. globals()[f"verify_output_{mode}"] dynamic dispatch (conftest.py:506)

mode = "emulator" if qnn_config.build_folder == "build-x86" else "remote"
globals()[f"verify_output_{mode}"](...)

This is fragile and hard to grep for callers. A simple dict or if/else would be safer and more readable:

verify_fn = verify_output_emulator if qnn_config.build_folder == "build-x86" else verify_output_remote
verify_fn(...)
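
If more modes are expected later, an explicit lookup table keeps the dispatch greppable while staying extensible. This is a sketch under assumptions: the two verify functions are placeholders with made-up bodies and signatures, and only the build-x86 check comes from the snippet above.

```python
def verify_output_emulator(outputs):
    """Placeholder for the emulator verification path."""
    return f"emulator:{len(outputs)}"


def verify_output_remote(outputs):
    """Placeholder for the on-device verification path."""
    return f"remote:{len(outputs)}"


# Every callable is named explicitly, so a text search finds all callers.
VERIFIERS = {
    "emulator": verify_output_emulator,
    "remote": verify_output_remote,
}


def verify_output(build_folder, outputs):
    mode = "emulator" if build_folder == "build-x86" else "remote"
    return VERIFIERS[mode](outputs)


print(verify_output("build-android", [0.1, 0.2]))
```

An unknown mode then fails with a KeyError on the table rather than on globals(), which points directly at the missing entry.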

Medium Priority

4. with_htp_context loses function metadata (htp/conftest.py:23-39)

The wrapper function returned by with_htp_context doesn't use @functools.wraps(func), so the wrapper does not carry the original function's name or docstring. This could affect test discovery or reporting. Since with_htp_context is a decorator that returns wrapper, consider decorating wrapper with @functools.wraps(func).
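
The effect of functools.wraps on metadata is easy to see in isolation. The decorators below are generic stand-ins, not the PR's with_htp_context, whose fixture handling is not shown here.

```python
import functools


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    @functools.wraps(func)  # copies __name__, __doc__, and sets __wrapped__
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@without_wraps
def test_conv2d_plain():
    """Check conv2d lowering."""
    return "ok"


@with_wraps
def test_conv2d_wrapped():
    """Check conv2d lowering."""
    return "ok"


print(test_conv2d_plain.__name__)    # name of the inner function
print(test_conv2d_wrapped.__name__)  # name of the decorated test
```

One caveat worth checking before adopting it: functools.wraps also sets __wrapped__, which tools that inspect signatures may follow, so the wrapper's visible parameters can change.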

5. Test file uses star import (htp/op/v68/test.py:23)

from executorch.backends.qualcomm.tests.rework.src.op import *  # noqa: F403

While pragmatic (avoids listing 133 classes), this makes it impossible to determine which names are used without runtime introspection. Since each test function explicitly calls a specific class, consider whether a grouped import or from ... import Abs, ACos, ... with a generated import list would be more maintainable as the op count grows.

6. Commented-out test cases in existing test file (test_qnn_delegate.py:367-374)

# AvgPoolModule((1280, 1280), (1280, 1280), (0, 0), True),  # noqa: F405
# AvgPoolModule((1280, 1280), (1280, 1280), (320, 320), True),  # noqa: F405

These should either be removed entirely (if the behavior is now handled by the new framework) or replaced with pytest.mark.skip with a reason. Commented-out code rots.

7. init_remote_env closure uses mutable state (conftest.py:311-333)

The initialized flag in init_remote_env is effectively a module-level singleton hidden in a closure. If two test sessions were ever run in the same process, this wouldn't reset. Since you're using scope="session" fixtures, consider making this a session-scoped fixture instead for clearer lifecycle management.

8. Hardcoded device workspace path (conftest.py:321)

device_workspace = f"{getattr(qnn_config, 'device_workspace', '')}_{qnn_config.backend}"

getattr(..., '') produces paths like _kHtpBackend when no workspace is set. This probably works but could be confusing when debugging device issues.

Low Priority

9. _session_start.time attribute access (conftest.py:549)

elapsed_time = time.time() - terminalreporter._session_start.time

Relying on private _session_start attribute of the terminal reporter is fragile across pytest versions.

10. requirements.txt adds pytest-subtests (requirements.txt)

Good addition. However, conftest.py imports from pytest_subtests.plugin import SubTestReport — if pytest-subtests isn't installed, this will fail at import time during pytest_terminal_summary. Consider guarding the import.
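
A guarded import could look like the sketch below. The import path is the one quoted above; the is_subtest_report helper is hypothetical and only illustrates how the rest of the summary code could tolerate a missing plugin.

```python
try:
    from pytest_subtests.plugin import SubTestReport
except ImportError:
    # pytest-subtests is optional here; fall back to "no subtest reports".
    SubTestReport = None


def is_subtest_report(report):
    """Return True only when the plugin is installed and report is a subtest."""
    return SubTestReport is not None and isinstance(report, SubTestReport)


print(is_subtest_report(object()))
```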

11. Missing __init__.py files

The rework/ test directory tree has no __init__.py files. This is fine for pytest (which uses rootdir-based discovery), but means these modules can't be imported as regular packages. The conftest import at htp/op/v68/test.py:13 uses the full dotted path — verify this works with rootdir outside the package tree.


Functional Changes (Non-test)

The following functional changes look correct:

  • _replace_inf moved to quantizer — Good call; the quantizer has the actual quant range info needed to choose replacement values rather than hardcoding 255/-255.
  • op_avg_pool2d.py ceil_mode simplification — The removed padding logic for ceil_mode looked like a workaround; delegating ceil behavior to QNN directly is cleaner.
  • op_flip.py, op_gather.py, op_index_select.py negative dim normalization — Using dim % len(shape) is the correct fix for negative dimension indices.
  • op_group_norm.py / op_layer_norm.py optional weight/bias — Making weight and bias optional (guarded by if weight_node:) is the right approach for models using LayerNorm/GroupNorm without affine parameters.
  • decompose_linalg_vector_norm.py inf handling — Short-circuiting to amax/amin for inf/-inf exponents is mathematically correct.
  • setup_common_args_and_variables short option removal — Removing -m, -b, -s, etc. avoids namespace collisions when shared with pytest's argument parser. The .ci/scripts/test_model.sh change to use --build_folder / --soc_model is consistent.
  • op_batch_norm.py zero_point fix — Using (quant_range + 1) // 2 for 16-bit quant instead of always 0 is correct for symmetric quantization with int16.
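
Two of the fixes listed above rest on small pieces of arithmetic that can be illustrated without torch. The helpers below are a sketch written for this note (normalize_dim and vector_norm are hypothetical names), not the builder or pass code from the PR.

```python
import math


def normalize_dim(dim, rank):
    """Map a possibly negative dimension index into the range [0, rank)."""
    return dim % rank


def vector_norm(values, ord):
    """p-norm of a flat list, with the infinite orders handled directly."""
    magnitudes = [abs(v) for v in values]
    if ord == math.inf:
        return max(magnitudes)   # limit of the p-norm as p grows
    if ord == -math.inf:
        return min(magnitudes)   # limit as p goes to negative infinity
    return sum(m ** ord for m in magnitudes) ** (1 / ord)


print(normalize_dim(-1, 4))
print(vector_norm([3, -4], math.inf), vector_norm([3, -4], -math.inf))
```

Python's % always returns a result with the sign of the divisor, so dim % rank is already non-negative for any valid negative index; languages where % can return a negative value would need an extra adjustment.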

Summary

This is a solid refactoring with good test infrastructure design. The main concerns are:

  1. The bare except: in the quantizer should be narrowed
  2. The globals() dispatch should be replaced with explicit lookup
  3. Consider adding @wraps to decorator wrappers for better pytest reporting
  4. Remove commented-out test cases rather than leaving them

The architecture scales well for the follow-up PRs (GPU, LPAI, v69-v81 tests).


@digantdesai digantdesai left a comment

Contributor

Took a quick look. LGTM. Asked Claude to review for nits. Thanks.

@digantdesai

Contributor

Followup PRs

  • op test for GPU / LPAI / HTP v69-v81
  • feature test for GPU / LPAI / HTP v69-v81
  • utils test
  • passes test
  • GitHub action workflow

Please create issues, and a tracker issue for this

@digantdesai

Contributor

@psiddh - @abhinaykukkadapu was trying to run these in OSS using BUCK for internal CI parity, are you planning to continue that?

psiddh commented Jun 1, 2026

Contributor

@psiddh - @abhinaykukkadapu was trying to run these in OSS using BUCK for internal CI parity, are you planning to continue that?

Yes @digantdesai, I had a chat about this with @abhinaykukkadapu. AFAIU, BUCK-ifying OSS is mostly done (80-90%). There are apparently a few edge cases that may still break. The plan is to closely monitor any breakages due to edge cases and fix them as they surface.

psiddh commented Jun 4, 2026

Contributor

Lint / link-check / lint-urls (pull_request), failing after 28s, needs to be fixed before this can be merged.

psiddh commented Jun 15, 2026

Contributor

@haowhsu-quic Can you rebase / resolve conflicts, please? Also fix the lint checks, and we can land this PR.

psiddh commented Jun 16, 2026

Contributor

@claude Review the build failures on PR and check if they are related to the ongoing PR

claude Bot commented Jun 16, 2026

Claude finished @psiddh's task in 1m 58s


Build failure analysis

  • Fetch CI status for this PR
  • Inspect failing job logs
  • Determine which failures are related to this PR's changes
  • Post analysis

Conclusion: none of the failures are related to this PR. Its changes are concentrated in backends/qualcomm/** and examples/qualcomm/**, and every failing job is in an unrelated code path or in infrastructure. All Qualcomm-relevant jobs (test-qnn-wheel-packages-linux for py3.10–3.13, test-models-linux*) passed.

Breakdown of the 4 failing workflow runs:

| Workflow / Job | Failure | Related? |
| --- | --- | --- |
| pull / unittest / linux | test_efficient_sam_export_to_executorch: urllib.error.URLError: <urlopen error [SSL: UNEXPECTED_EOF_WHILE_READING]> | ❌ Network flake downloading model weights |
| Build Aarch64 Linux Wheels / build-wheel-py3_10-cpu-aarch64 | CoreML compile error in backends/apple/coreml/runtime/inmemoryfs/memory_buffer.hpp ('Kind' has not been declared, 'uint8_t' was not declared) | ❌ Pre-existing CoreML build break (missing <cstdint>); unrelated backend |
| Build Aarch64 Linux Wheels / upload-wheel-py3_10-cpu-aarch64 | Artifact not found | ❌ Cascade from the failed build above |
| Cadence Build & Test / vision-build, hifi-build | Input required and not supplied: aws-region (step "Assume Cadence artifacts role (host OIDC)") | ❌ CI infra/OIDC config issue |
| trunk / test-models-macos-coreml (vit) | The operation was canceled. | ❌ Cancelled job (infra) |

Details on the two that look the scariest:

1. unittest — efficient_sam (pull):

examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch
  - urllib.error.URLError: <urlopen error [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol>
=========== 1 failed, 1653 passed, 15 skipped in 2968.96s ===========

The traceback originates in urllib/request.py:do_open while fetching the EfficientSAM weights inside EagerModelFactory.create_model. This is a transient network/SSL failure, not a code defect, and EfficientSAM is unrelated to the Qualcomm backend.

2. Aarch64 wheel — CoreML:

backends/apple/coreml/runtime/inmemoryfs/memory_buffer.hpp:23:22: error: 'uint8_t' was not declared in this scope
backends/apple/coreml/runtime/inmemoryfs/memory_buffer.hpp:36:25: error: 'Kind' has not been declared

A missing standard-integer-header include in the CoreML in-memory filesystem. Dr. CI itself notes this job did not run on the merge base, so it cannot attribute it — but it is in backends/apple/coreml, which this PR does not touch, so it is pre-existing on trunk rather than introduced here.

The jobs Dr. CI listed under FLAKY (test-llama-runner-qnn-linux, test-mediatek-models-linux) were classified as infra flakes with no failing log, not real test failures.

Recommendation: none of these failures should block the PR on its merits. A rebase onto current main should clear the CoreML aarch64 error (once that fix lands on trunk), and re-running will likely clear the network-flaky efficient_sam and Cadence OIDC jobs. This is consistent with @psiddh's earlier request to rebase before landing.

@psiddh psiddh merged commit 370c576 into pytorch:main Jun 16, 2026
340 of 349 checks passed