Skip to content

feat(cli): apm experimental - feature-flag registry with list/enable/disable/reset#849

Open
sergio-sisternes-epam wants to merge 4 commits intomainfrom
feat/experimental-feature-flags-clean
Open

feat(cli): apm experimental - feature-flag registry with list/enable/disable/reset#849
sergio-sisternes-epam wants to merge 4 commits intomainfrom
feat/experimental-feature-flags-clean

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Summary

Introduces apm experimental — a first-class feature-flag subsystem that lets APM ship opt-in changes behind named flags so early adopters can try new behaviour without branching releases.

Ships with one worked example: verbose-version, which extends apm --version to also print Python version, platform, and install path.

Why

Today we have no safe way to iterate on potentially disruptive changes in main. Contributors either ship full switches (risk breaking users) or carry long-lived branches (rot risk, no real-user signal). A lightweight registry + O(1) lookup gives us a third path: merge the code, gate it behind a flag, collect signal from opt-in users, and graduate to default when proven.

What's in this PR

New command group

apm experimental                       # default: list
apm experimental list [--enabled|--disabled]
apm experimental enable <name>
apm experimental disable <name>
apm experimental reset [<name>] [--yes]
  • Accepts both kebab-case and snake_case on input; displays kebab-case.
  • difflib-powered "Did you mean?" on unknown flag names.
  • reset with no argument clears all overrides (interactive confirmation; --yes skips it for CI).
  • Stale keys from graduated/removed flags are surfaced in list and cleaned up by reset.

Core registry

  • src/apm_cli/core/experimental.py — frozen ExperimentalFlag dataclass, static FLAGS registry, module-level is_enabled(name).
  • Lookup cost: two dict reads on top of the already-cached apm_cli.config._config_cache. No I/O after first load, no extra caches, no singleton. Safe for use in hot paths.
  • is_enabled("typo") raises ValueError — fail loud on developer typos in shipped code.
  • Loader rejects non-bool override values (fails closed to registry default) to prevent truthy-string injection from a hand-edited ~/.apm/config.json.

Storage

Overrides persist under a new experimental key in ~/.apm/config.json:

{
  "default_client": "vscode",
  "experimental": {
    "verbose_version": true
  }
}

Only overrides are written. Unset flags fall through to the registry default (which is always false — asserted by an invariant test).

Worked example: verbose-version

When enabled, apm --version appends Python version, platform, and install path. The gated block is wrapped in try/except Exception: pass so a flag-subsystem bug can never break --version.

Security invariant

Experimental flags MUST NOT gate security-critical behaviour (content scanning, path validation, lockfile integrity, token handling, MCP trust checks). Documented in docs/src/content/docs/reference/experimental.md and CONTRIBUTING.md. The silent-ignore policy for stale keys is safe only because of this invariant.

How to add a flag (for contributors)

See the new "How to add an experimental feature flag" section in CONTRIBUTING.md. TL;DR:

  1. Add an ExperimentalFlag(...) entry to FLAGS in src/apm_cli/core/experimental.py.
  2. At the call site, import is_enabled at function scope (not module top-level — avoids config I/O at import time) and branch on it.
  3. Document the flag in docs/src/content/docs/reference/experimental.md.
  4. Add a test that exercises both flag states.

Testing

  • 71+ new targeted tests across tests/unit/core/test_experimental.py, tests/unit/commands/test_experimental_command.py, and extended tests/unit/commands/test_helpers_version.py.
  • Full unit suite green: uv run pytest tests/unit tests/test_console.py -> 4,842 passed.
  • Covers: registry invariants (name==key, default is False, printable-ASCII descriptions), round-trip enable/disable, stale-key diagnostic, unknown-flag error + difflib suggestions, reset confirmation (accept/decline/--yes), kebab/snake input parity, plain-text fallback when Rich is unavailable, and that apm --version default output is unchanged when the flag is off.

Documentation

  • New: docs/src/content/docs/reference/experimental.md — user-facing reference (commands, storage schema, graduation lifecycle, security boundary).
  • Updated: CONTRIBUTING.md — "How to add an experimental feature flag" recipe.
  • Updated: CHANGELOG.mdUnreleased > Added entry.
  • Updated: docs/src/content/docs/reference/cli-commands.md — new command entry.
  • Skill sync: packages/apm-guide/.apm/skills/apm-usage/commands.md — new command entry.
  • README: deliberately NOT updated at launch — we surface this in README only once we have >=3 meaningful flags shipping under it.

Review panel

Pre-open review via the apm-review-panel skill: python-architect, cli-logging-expert, devx-ux-expert, supply-chain-security-expert, oss-growth-hacker, and apm-ceo. All 10 ratified pre-PR fixes have been applied. CEO verdict: SHIP-WITH-FIXES — all fixes in; deferred follow-ups tracked separately.

Out of scope (deliberate)

  • Project-level flags in apm.yml.
  • Environment-variable overrides (APM_EXPERIMENTAL_*).
  • Remote / rollout-percentage flags.
  • Telemetry on flag usage.

Backward compatibility

Zero impact for existing users. Users with no experimental key in config.json see identical behaviour to today. apm --version output is unchanged unless a user opts in.


Re-opened from #845 with the branch moved to microsoft/apm so maintainers can push directly. All prior review history (Copilot + Panel) lives on #845.

Sergio Sisternes and others added 2 commits April 22, 2026 15:17
…disable/reset

Introduces the `apm experimental` command group and a lightweight feature-flag
registry so APM can ship opt-in behaviour behind named flags, gather signal
from early adopters, and graduate to default when proven.

Ships with one worked example: `verbose-version`, which extends
`apm --version` to also print Python version, platform, and install path.

Highlights:
- O(1) flag lookup on top of the cached config (no I/O after first load)
- `is_enabled("typo")` raises ValueError (fail loud in code paths)
- `apm experimental enable <typo>` surfaces difflib "Did you mean?" suggestion
- Loader rejects non-bool overrides; falls back to registry default
- Security invariant: experimental flags MUST NOT gate security-critical paths
- 71+ targeted tests; full unit suite green (4,842 passed)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix CHANGELOG placeholder (#TBD -> #845)
- Use typing.Any in test_helpers_version.py
- Align CONTRIBUTING.md recipe code with its function-scope-import prose
- Correct singular/plural "default(s)" in reset doc example
- Pass symbol="check" to logger.success on enable/disable/reset so they render [+]
- Switch already-enabled/disabled no-op messages to logger.warning ([!]) per traffic-light rule
- Route reset confirmation preamble through CommandLogger (was raw click.echo)
- Verified: removing ignore_unknown_options breaks `reset --yes`; keeping both flags

Addresses Copilot review + PR Review Panel CLI Logging UX findings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 22, 2026 17:09
@sergio-sisternes-epam sergio-sisternes-epam added documentation Improvements or additions to documentation enhancement New feature or request cli panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a first-class experimental feature-flag subsystem to APM, including a new apm experimental command group for managing user-scoped opt-in flags stored in ~/.apm/config.json, plus one initial flag (verbose-version) that extends apm --version output when enabled.

Changes:

  • Added a static experimental flag registry (FLAGS) with config-backed enable/disable/reset helpers and strict lookup semantics (is_enabled fails loud on unknown flags).
  • Added apm experimental CLI group with list/enable/disable/reset and Rich UI + plain-text fallbacks.
  • Added documentation + contributor guidance and comprehensive unit tests; added a gated verbose-version block to apm --version.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/apm_cli/core/experimental.py New feature-flag registry, config integration, and mutation/query APIs.
src/apm_cli/commands/experimental.py New apm experimental command group (list/enable/disable/reset) with Rich/table output and confirmations.
src/apm_cli/commands/_helpers.py Adds a verbose-version experimental branch to print_version.
src/apm_cli/cli.py Registers the new experimental command group with the main CLI.
tests/unit/core/test_experimental.py Unit tests for registry invariants, config parsing/validation, and mutator behavior.
tests/unit/commands/test_experimental_command.py CLI tests for list/filters, enable/disable flows, reset prompts, and -v behavior.
tests/unit/commands/test_helpers_version.py Tests the verbose-version behavior and the fail-open --version guard.
docs/src/content/docs/reference/experimental.md New user-facing reference page for experimental flags and workflows.
docs/src/content/docs/reference/cli-commands.md Adds apm experimental to the CLI reference.
packages/apm-guide/.apm/skills/apm-usage/commands.md Syncs the command list for the apm-guide skill resources.
CONTRIBUTING.md Adds contributor recipe and rules for adding experimental flags.
CHANGELOG.md Adds an Unreleased entry documenting apm experimental and the initial flag.

Comment thread src/apm_cli/commands/experimental.py Outdated
@github-actions
Copy link
Copy Markdown

APM Review Panel Verdict

Disposition: APPROVE (with minor follow-up items)


Per-persona findings


Python Architect

Overall structure is clean and idiomatic. Highlights and concerns:

(+) ExperimentalFlag as a frozen dataclass is the right call -- immutable registry, no accidental mutation.
(+) Function-scope imports for is_enabled are correctly enforced per the mandatory caller convention; the docstring makes this explicit.
(+) _get_experimental_section fails closed (returns {}) when the config key is not a dict -- defensive and correct.
(+) The is_enabled path rejects non-bool override values and falls back to the registry default -- fails closed.

(-) [Minor] Private API leakage across module boundaries. commands/experimental.py imports _display_name, _normalise_flag_name, and _validate_flag_name from core/experimental.py. These underscore-prefixed names signal implementation detail. Either promote them to a narrow public API (drop the underscore) or consolidate the validation+display logic into a single public helper the command layer can call. Follow-up PR is acceptable.

(-) [Minor] Double normalisation in _handle_unknown_flag. The callers (e.g. enable_flag) already call _normalise_flag_name(name) before checking if normalised not in FLAGS, then pass the original name to _handle_unknown_flag which calls _validate_flag_name(name) which normalises again. Not a bug but an unnecessary second pass. Can be cleaned up when the API is tightened.

(-) [Minor] reset return type vs call-site usage. core/experimental.reset(name) is typed -> list[str]. The command layer assigns it to reset_result and checks if reset_result: -- relying on empty-list falsiness. This works today but is fragile if the return type ever changes. Returning bool for the single-flag path or checking len(reset_result) > 0 explicitly would be cleaner.


CLI Logging Expert

(+) All four subcommands instantiate CommandLogger rather than calling _rich_* directly in the command functions. Correct architecture.
(+) logger.verbose_detail(f"Config file: {CONFIG_FILE}") gating is correct -- verbose-only detail.
(+) Success messages use symbol="check" consistently. Error path uses logger.error(...). Warning path uses logger.warning(...) for "already enabled/disabled" -- appropriate traffic-light mapping.
(+) The footer hint is context-aware (tip differs when flags are enabled vs all disabled) -- passes "So What?" test.
(+) _print_list_footer promotes the stale-key note -- surfaces actionable info without noise on the happy path.

(-) [Low] Preamble noise on list. logger.progress("Experimental features let you try new behaviour before it becomes default.") appears on every apm experimental list invocation, including when used in a script loop. This is borderline on the "So What?" test -- experienced users will skip it mentally, but it's a line of output that doesn't change per call. Consider demoting to verbose-only or printing only on first invocation. Not blocking.


DevX UX Expert

(+) apm experimental defaulting to list with no subcommand mirrors npm ls / gh extension list -- ergonomically correct.
(+) Kebab-case / snake_case parity on all inputs is the right decision; users should not have to remember the internal form.
(+) --yes/-y on reset and the explicit change-listing prompt before bulk reset match brew upgrade / apt upgrade conventions -- protective of users who run it from muscle memory.
(+) "Did you mean?" fuzzy suggestion on unknown flag names with difflib is high-signal UX for a low-cost implementation.
(+) The single-flag reset does not prompt -- correct (the action is inherently scoped and reversible).

(-) [Medium -- should fix before or shortly after merge] --verbose/-v must precede the subcommand. apm experimental --verbose list works; apm experimental list --verbose fails with "Error: No such option: --verbose" because --verbose is only declared on the parent group, not on individual subcommands. Every other APM command accepts flags after the subcommand name; this is the only command group where the verbose flag must precede it. The divergence will generate user confusion. Fix: add @click.option("--verbose", "-v", ...) to each subcommand and merge it with ctx.obj inheritance, or use click's pass_context + max_content_width approach to propagate parent options. Issue can be filed and addressed in a follow-up PR given the low risk of the rest of the change.

(-) [Low] No --json output mode for list. Scripting and CI use cases (e.g. apm experimental list | jq ...) have no structured output path. npm, pip, and cargo all offer --json / -q on list-style commands. Not blocking for v1, but worth a follow-up issue.

(-) [Low] ignore_unknown_options: True on the group. This setting is unusual and can silently swallow unrecognised options at the group level before subcommand dispatch. No dangerous downstream effect exists today, but it removes a guard that would otherwise catch user typos like apm experimental --enabeld verbose-version. Worth revisiting; allow_interspersed_args alone may be sufficient.


Supply Chain Security Expert

(+) Config path is ~/.apm/config.json -- no path construction from flag names, no path traversal surface. Flag names are validated against the static registry before any I/O.
(+) _get_experimental_section type-checks the config value before using it -- if a user hand-edits the file to a non-dict, every consumer fails closed to defaults.
(+) Non-bool override values in is_enabled fall back to registry default -- correct behaviour.
(+) Security invariant is documented in three places: core/experimental.py module docstring, CONTRIBUTING.md, and docs/reference/experimental.md. This is the right documentation density for a constraint that contributors must not violate.
(+) The verbose-version flag reveals Path(__file__).resolve().parent.parent (install path) only when the user explicitly opts in -- acceptable disclosure.
(+) The try/except Exception: pass guard in _helpers.py ensures that any failure in the experimental subsystem (including a corrupted config file) can never break apm --version. Correct.

No blocking security findings. No new path traversal, no token surfaces, no integrity weakening.


Auth Expert

No authentication surfaces are introduced or modified. update_config / get_config write to ~/.apm/config.json (user-local config, no token storage). No findings.


OSS Growth Hacker

(+) The experimental flag system is a mature OSS signal -- projects like Rust/Cargo and bun use feature flags to de-risk rollouts. Surfaces this narrative angle in release notes: "APM ships features opt-in before graduation to default."
(+) The contributor recipe in CONTRIBUTING.md is an unusually clear, bounded first-contribution task. It lowers the activation energy for drive-by contributors: "add a flag = one entry in FLAGS dict + tests + docs" is a concise on-ramp.
(+) verbose-version as the first flag produces debug output useful for support tickets and bug reports -- indirect growth benefit (better support experience).
(+) No first-run conversion path (apm init / apm install / apm run) is affected by this change. Existing user experience is unchanged.

Side-channel note to CEO: The experimental flag system, if communicated at the next release, gives APM a compelling "we ship responsibly" story that enterprise evaluators will value. Recommend a brief callout in the CHANGELOG / release post.


CEO arbitration

This PR delivers a well-scoped infrastructure capability with good test coverage, complete documentation, and explicit security invariant enforcement. The specialists reached consensus: no blocking issues, two medium items, and several low/minor follow-ups.

Strategic rationale for approving as-is: The --verbose flag position issue (DevX finding, medium) is real but affects only users who have already discovered apm experimental -- an advanced surface. Blocking the PR to fix it risks stalling a change that has already gone through one review cycle (#845). The right call is to merge now and file a focused follow-up issue against apm experimental list --verbose ergonomics. Similarly, the private API leakage and reset return-type ambiguity (Architect findings, minor) are best addressed in a hardening PR once the feature proves adopted.

The security invariant is correctly documented and enforced; no strategic risk. The feature flag system aligns with APM's "ship fast, communicate clearly" principle and gives the maintainer team a controlled rollout lever for future changes.

Required actions before merge: None blocking.


Required actions before merge

None. The implementation is correct, tested, and documented.


Optional follow-ups

  1. [Medium -- file issue] Fix apm experimental list --verbose (and all subcommands) so -v/--verbose is accepted after the subcommand name, not only before it. Affects discoverability for users who follow the standard <command> <subcommand> --flag mental model.
  2. [Minor] Promote _display_name, _normalise_flag_name, _validate_flag_name to public API (drop underscore prefix) since they are imported across module boundaries.
  3. [Minor] Change reset(name) single-flag return to bool, or document the list-truthiness contract explicitly at the call site.
  4. [Low] Add --json output to apm experimental list for CI / scripting composability.
  5. [Low] Evaluate whether ignore_unknown_options: True on the group is needed; allow_interspersed_args alone may be sufficient and safer.
  6. [Low] Demote the list preamble ("Experimental features let you try new behaviour...") to verbose-only output to reduce noise in scripted invocations.

Generated by PR Review Panel for issue #849 · ● 932.6K ·

- Reset now cleans malformed overrides of registered flags (#849, Copilot)
- --verbose works after subcommand name (#849, Panel P-UX-1)
- Promote name-handling helpers to public API (#849, Panel P-ARCH-1)
- Deduplicate flag-name normalisation (#849, Panel P-ARCH-2)
- reset() returns removal count instead of list (#849, Panel P-ARCH-3)
- Demote list preamble to --verbose only (#849, Panel P-LOG-1)
- Add --json output to list (#849, Panel P-UX-2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

Starting doc sync for the final apm experimental CLI surface on feat/experimental-feature-flags-clean. I’m updating the changelog and existing command/reference docs, then I’ll push the doc-only commit back to this branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

Doc sync is complete and pushed to feat/experimental-feature-flags-clean in commit 67b8f91.

Updated:

  • CHANGELOG.md
  • docs/src/content/docs/reference/experimental.md
  • docs/src/content/docs/reference/cli-commands.md
  • packages/apm-guide/.apm/skills/apm-usage/commands.md

Coverage includes apm experimental list --json, per-subcommand --verbose / -v, the cleaner default list output, and the bulk reset fix for malformed overrides. This branch is ready for review from the documentation side.

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough review. Rather than defer any of this to follow-ups, I've applied every actionable finding in this PR to keep the first release of apm experimental debt-free.

Commits on this branch:

  • 9a92f1d — implementation + tests
  • 67b8f91 — docs + CHANGELOG

Full unit suite: 5275 passed.


Findings addressed

ID Finding Resolution
Copilot C1 Bulk reset can't clean up non-bool overrides of registered flags get_malformed_flag_keys() added; bulk reset now handles overridden + stale + malformed uniformly. 6 new tests.
P-UX-1 --verbose only works on parent group Added --verbose/-v to list, enable, disable, reset. New _resolve_verbose() merges subcommand-local with group-level so both positions work. New TestVerboseAfterSubcommand (3 cases).
P-ARCH-1 Private helpers imported across modules Renamed _display_name, _normalise_flag_name, _validate_flag_name → public (dropped underscore). All call sites + tests updated.
P-ARCH-2 _handle_unknown_flag re-normalises Callers now pass normalised names; internal normalise call removed with docstring note on caller contract.
P-ARCH-3 reset returns list[str] — fragile reset() now returns int (count of removed keys). Call sites use > 0. New TestResetReturnType (3 cases).
P-LOG-1 List preamble prints on every invocation Demoted to verbose_detail. Default list is table-only; list --verbose prints the preamble. Regression test added.
P-UX-2 No --json output Added --json to list. Schema: [{name, enabled, default, description, source}] where source is "default" or "config". Honours --enabled/--disabled filters. Stdout-clean (no preamble/colour). New TestJsonOutput (4 cases).

P-UX-3 — keeping ignore_unknown_options=True

This is the same question Copilot raised on #845, and we already have empirical evidence: removing ignore_unknown_options=True from the group's context_settings breaks apm experimental reset --yes with Error: No such option: --yes. allow_interspersed_args=True alone isn't sufficient — it handles -v after positional args, but not pass-through of subcommand-only options. Both flags are required.

Keeping the flag as-is. Happy to revisit if the Panel sees a different configuration that preserves reset --yes behaviour.


Docs updated (commit 67b8f91)

  • CHANGELOG.md[Unreleased] Added/Changed/Fixed entries
  • docs/src/content/docs/reference/experimental.md--json sample, --verbose on subcommands, preamble behaviour, malformed-reset
  • docs/src/content/docs/reference/cli-commands.md — command index sync
  • packages/apm-guide/.apm/skills/apm-usage/commands.mdapm experimental options sync

Ready for merge.

@danielmeppiel
Copy link
Copy Markdown
Collaborator

@sergio-sisternes-epam , did you think of using a config key set to activate experimental functionality and just go with it? I see a lot of commands related to "experimental" which may be overcomplicating things? In Copilot CLI for example, one simply activated experimental mode and all experimental features were enabled. I think per-feature management is overkill at our stage.

Copy link
Copy Markdown
Collaborator

@danielmeppiel danielmeppiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify to single "experimental mode" activation with a config setting

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

sergio-sisternes-epam commented Apr 23, 2026

@danielmeppiel the rationale behind the design and implementation is the following:

  • We want to surface what "cool" new elements we are exploring. Creating a dedicated "experimental" command puts it in the spotlight. Internally, we are writing to config.
  • The initial UX design had 5 (sub)commands. We trimmed it down to the current three to reduce the amount of code. One of the commands was "--all", which enabled and disabled all experimental features.

Why not a "single experimental mode"? This comes from my personal frustration when using Copilot CLI, which follows this approach. Taking that as an example:

In Copilot CLI experimental, RUBBER_DUCK_AGENT takes extra tokens and time to perform a review, slowing me down while increasing my bill. On the other hand, SESSION_CLEANUP AND SESSION_STORAGE are amazing features I have fully embraced and I want to use. This is all-or-nothing. And this is a design pitfall from my user perspective. I tried to avoid this experience when designing and implementing our "experimental" capabilities.

Why not enable all? We don't expect to have as many experimental features. Hence, it made sense to drop the "--all" and let users enable the features that are relevant and work for them, and disable what is not working for them or are not willing to test/explore.

Let me know if the design rationale makes sense and we should keep it, or you prefer to fall back to the "single experimental mode"

NOTE: Internally, experimental relies on the "config" capabilities to store the flags. It only creates this UX I was mentioning earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli documentation Improvements or additions to documentation enhancement New feature or request panel-review Trigger the apm-review-panel gh-aw workflow

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants