APMODE Technical Architecture

Version: 0.3
Date: 2026-04-25
Status: Current (tracks APMODE 0.6.1-rc1; Phase 3 in progress)
Derived from: PRD v0.3 (§3–§8)
Supersedes: v0.2 (2026-04-13)

Change summary: Phase 2/3 framing removed where shipped (Bayesian backend, NODE, agentic LLM, Gate 2.5, FREM, Gate 3 ranking all active); observability stack re-grounded in the real dependency set; bundle layout regenerated from bundle/emitter.py; contracts regenerated from bundle/models.py; speculative migrations (Flyte/Temporal, Docker-Compose, Rust-parser, Langfuse/Aim/Flowcept) moved to §10 Non-goals.

1. Design Principles

Custom-first, framework-ready. A thin purpose-built Python orchestrator (asyncio + subprocess) is the main loop. The BackendRunner protocol is designed so Flyte or Temporal could be swapped in for GPU scheduling without rewriting adapters, but no such migration is planned (see §10).
Process isolation is non-negotiable. Every backend boundary (R/nlmixr2, Python/JAX, LLM SDK calls, cmdstanpy/CmdStan) is isolated from the orchestrator where practical. A segfault in R must not crash the orchestrator.
The DSL is the moat. All model specifications flow through the typed PK DSL (Formular). No backend emits or consumes model code outside of a deterministic function of the compiled DSLSpec.
Governance is a funnel, not a score. Gates are sequential disqualifiers (Gate 1 → Gate 2 → Gate 2.5 → Gate 3). Survivors are ranked; failures are logged with per-check reasons in the bundle.
Reproducibility is structural. Every run emits a self-contained bundle that is atomically sealed with a _COMPLETE sentinel carrying a SHA-256 digest of every artifact. The bundle schema is a Pydantic contract (bundle/models.py).
Determinism is auditable. All RNG seeds are escrowed in seed_registry.json. Non-deterministic boundaries (GPU, LLM) are documented and their outputs cached.
Data privacy by design. LLM inputs never include raw row-level patient data — the allow-list gate in diagnostic_summarizer.redact_for_llm() is the single enforcement point; unknown fields fail closed.
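
The "funnel, not a score" principle can be sketched in a few lines. This is an illustration only: the gate functions, the candidate dictionaries, the 0.30 shrinkage ceiling, and the BIC sort are hypothetical stand-ins, not the evaluators in governance/gates.py or the Gate 3 ranker.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    gate: str
    passed: bool
    reasons: list[str] = field(default_factory=list)


def gate1_technical(c: dict) -> Verdict:
    # Hypothetical check: the fit must have converged.
    ok = bool(c["converged"])
    return Verdict("gate1", ok, [] if ok else ["convergence: fit did not converge"])


def gate2_lane(c: dict) -> Verdict:
    # Hypothetical check with an illustrative 0.30 shrinkage ceiling.
    ok = c["shrinkage"] <= 0.30
    return Verdict("gate2", ok, [] if ok else [f"shrinkage: {c['shrinkage']:.2f} > 0.30"])


def run_funnel(candidates: list[dict], gates: list[Callable[[dict], Verdict]]):
    """Apply gates in order; the first failing gate disqualifies a candidate."""
    survivors, failures = [], []
    for c in candidates:
        for gate in gates:
            verdict = gate(c)
            if not verdict.passed:
                failures.append({"candidate_id": c["id"], "gate": verdict.gate,
                                 "reasons": verdict.reasons})
                break  # later gates never see a disqualified candidate
        else:
            survivors.append(c)
    # Only survivors are ranked; lower BIC first is an assumption of this sketch.
    survivors.sort(key=lambda c: c["bic"])
    return survivors, failures
```

A candidate with an excellent BIC that fails an early gate never reaches ranking, which is the difference between a funnel and a weighted score.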

2. Technology Stack

2.1 Core stack — what is actually installed

Source of truth: pyproject.toml. Core dependencies install with uv sync; extras (node, bayesian, llm, test, dev) are opt-in.

Role	Package	Extra	License
Orchestrator language	Python 3.12–3.14	core	—
Primary PK engine	R 4.4+ / nlmixr2 / rxode2	— (subprocess)	GPL-2
Parser	`lark` ≥ 1.2	core	MIT
AST / schemas	`pydantic` ≥ 2.7	core	MIT
Data schemas	`pandera` ≥ 0.18	core	MIT
IDs	`sparkid` ≥ 0.2	core	MIT
CLI	`typer` ≥ 0.12	core	MIT
Structured logging	`structlog` ≥ 24.1	core	Apache-2 / MIT
Numerics	`numpy` ≥ 1.25, `pandas` ≥ 2.0, `scipy` ≥ 1.11	core	BSD-3
Terminal UI	`rich` ≥ 13	core	MIT
Bayesian backend	`cmdstanpy` ≥ 1.2, `arviz` ≥ 0.17, `pyarrow` ≥ 15	`bayesian`	BSD-3 / Apache-2
NODE backend	`jax[cpu]`, `diffrax`, `equinox`, `optax`, `jaxtyping`	`node`	Apache-2 / MIT
LLM providers	`anthropic`, `openai`, `google-genai`, `ollama`, `litellm`	`llm`	MIT / Apache-2
Tests	`pytest`, `pytest-xdist`, `pytest-asyncio`, `hypothesis`, `syrupy`	`test`	MIT / MPL-2 / Apache-2
Dev tooling	`ruff`, `mypy`, `pre-commit`, `bandit`, `pip-audit`	`dev`	MIT

Project license: GPL-2-or-later (compatible with nlmixr2 GPL-2 and Apache-2 dependencies).

2.2 DSL and compiler

See docs/FORMULAR.md for the language reference.

Component	Technology
Parser	Lark (Earley mode) via `src/apmode/dsl/grammar.py` (10 KB input guard)
AST	Pydantic v2 models in `src/apmode/dsl/ast_models.py` (6 fields: 5 grammar blocks + `priors: list[PriorSpec]`)
Parse-tree → AST	`src/apmode/dsl/transformer.py`
AST canonicalization	`src/apmode/dsl/normalize.py`
Semantic validator	`src/apmode/dsl/validator.py::validate_dsl(spec, lane=...)` — lane-aware, non-fail-fast
Priors + admissibility	`src/apmode/dsl/priors.py` (families, target taxonomy, `_VALID_FAMILIES` matrix)
Transforms	`src/apmode/dsl/transforms.py` (9) + `prior_transforms.py::SetPrior` (1) — 10 total
Emitters	`nlmixr2_emitter.py`, `stan_emitter.py`, `frem_emitter.py`
Testing	Hypothesis property tests + syrupy golden masters for emitter output

2.3 Orchestration

Component	Technology
Orchestrator	Custom Python (asyncio) in `src/apmode/orchestrator/` — sequential gate control flow
Process isolation	`asyncio.create_subprocess_exec`; on timeout, kill the full process group (`os.killpg`)
Backend interface	`BackendRunner` Protocol (`src/apmode/backends/protocol.py`)
R invocation	`Rscript src/apmode/r/harness.R` via subprocess; file-based I/O (§4.4)
Stan invocation	`cmdstanpy.CmdStanModel` via `src/apmode/bayes/harness.py` wrapped by `bayesian_runner.py`
Retry/timeout	Bespoke logic in each runner; timeout from policy file, killed attempts write to new `attempt_id/` subdir
CLI	Typer (`src/apmode/cli.py`) — 16 direct commands plus registered `bundle` / `completion` groups

Note on deployment posture. The codebase runs natively on the user's machine via uv; there is no Docker-Compose stack, no R container, no K8s. Users who want containerization manage it themselves.

2.4 Data ingestion and validation

Component	Path
Ingestion	`src/apmode/data/ingest.py` + `adapters.py` (NONMEM CSV primary; others extend via adapter)
Canonical schema	Pandera schemas in `src/apmode/data/schema.py`
Dose expansion (ADDL/II, infusions)	`src/apmode/data/dosing.py`
Profiler (Evidence Manifest)	`src/apmode/data/profiler.py` — structured `nonlinear_clearance_signals`
Profiler policy	`src/apmode/data/policy.py` loads `policies/profiler.json`; `policy_sha256` embedded in every `EvidenceManifest`
NCA + initial estimates	`src/apmode/data/initial_estimates.py` — Huang 2025 λz selector, linear-up/log-down AUC, unit-aware CL heuristic
Data splitter (k-fold, LORO-CV)	`src/apmode/data/splitter.py`
Missing-data directive resolver	`src/apmode/data/missing_data.py::resolve_directive`
Multiple-imputation providers (R-backed)	`src/apmode/data/imputers.py` (`R_MiceImputer`, `R_MissRangerImputer`) + `src/apmode/r/impute.R`
Categorical-encoding auto-remap	`src/apmode/data/categorical_encoding.py::auto_remap_binary_columns`

2.5 Initial-estimate strategy

Source	Method	Applies to
NCA-derived	Per-subject PKNCA-style (λz by curve-stripping + adj-R² tie-break, linear-up/log-down AUC) with QC gates	Root candidates
Warm-start	Parent's best-fit parameters → child	Phase 3 warm-start children
Fallback	Population-median NCA when ≥50 % of per-subject fits fail QC; `RunConfig.fallback_estimates` (dataset-card `published_model.key_estimates`) or conservative defaults	Degraded data
NODE init	Pre-trained weight library + transfer from classical fits	NODE backend

Per-subject diagnostics are emitted as nca_diagnostics.jsonl; the unit-aware scale factor (when applied) is recorded as _unit_scale_applied in initial_estimates.json.
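
For readers unfamiliar with the linear-up/log-down AUC rule named in the table, here is a minimal sketch of the standard convention: linear trapezoids on rising or flat segments, logarithmic trapezoids on declining segments. The function name and error handling are assumptions for illustration; this is not the code in data/initial_estimates.py, and it omits the QC gates and λz extrapolation.

```python
import math


def auc_linear_up_log_down(times: list[float], conc: list[float]) -> float:
    """AUC from the first to the last sample using linear-up/log-down trapezoids."""
    if len(times) != len(conc) or len(times) < 2:
        raise ValueError("need at least two matched (time, concentration) points")
    auc = 0.0
    for i in range(len(times) - 1):
        t1, t2 = times[i], times[i + 1]
        c1, c2 = conc[i], conc[i + 1]
        dt = t2 - t1
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        if 0 < c2 < c1:
            # Declining segment: log trapezoid, exact for mono-exponential decay.
            auc += (c1 - c2) * dt / math.log(c1 / c2)
        else:
            # Rising, flat, or zero-concentration segment: linear trapezoid.
            auc += 0.5 * (c1 + c2) * dt
    return auc
```

For samples at t = 0, 1, 2 with concentrations 0, 10, 5, the first segment contributes 5 (linear) and the second 5 / ln 2 (log).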

2.6 Reproducibility and artifacts

Component	Technology
Bundle schema	Pydantic v2 models in `src/apmode/bundle/models.py` → JSON/JSONL
Bundle emitter	`src/apmode/bundle/emitter.py` (per-artifact `write_*` methods)
Run / candidate IDs	`sparkid.generate_id()` (`src/apmode/ids.py`)
Data hashing	`hashlib.sha256` over content
Atomic seal	`BundleEmitter.seal()` writes `_COMPLETE` sentinel with SHA-256 digest of every file; `apmode validate` refuses unsealed bundles
Schema version	`_COMPLETE_SCHEMA_VERSION = 2` — adds per-candidate `ScoringContract` on `DiagnosticBundle`

2.7 Determinism and seed management

Scope	Mechanism
Python / NumPy	Root seed from CLI `--seed`; one-call seeding
JAX	`jax.random.PRNGKey(root_seed)`; CPU-only is deterministic
R	`set.seed(seed)` + `RNGkind("L'Ecuyer-CMRG")` in `harness.R`; `.Random.seed` captured
R threads	`OMP_NUM_THREADS=1`, `MKL_NUM_THREADS=1` set by the harness to eliminate BLAS non-determinism
Bayesian (Stan)	`cmdstanpy` seeds per chain; warmup + sampling counts in `SamplerConfig`
LLM	`temperature=0` enforced (non-zero raises `ValueError`); SHA-256 payload hashes + `ReplayClient` for offline replay
Seed registry	`seed_registry.json` in every bundle

GPU boundary. JAX on GPU is not guaranteed deterministic even with fixed seeds. The current codebase is CPU-oriented; GPU execution is not wired into the CLI. If GPU is introduced later, bundle artifacts will gain an execution_mode field and a hardware_descriptor in backend_versions.json to key the replay cache.
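
The LLM row in the table above combines three ideas: reject non-zero temperature, hash the request payload, and replay recorded responses offline. A rough sketch follows; `payload_hash`, `require_zero_temperature`, and `InMemoryReplay` are hypothetical names for illustration and do not describe the actual `ReplayClient` interface.

```python
import hashlib
import json


def payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def require_zero_temperature(temperature: float) -> None:
    if temperature != 0:
        raise ValueError(f"temperature must be 0, got {temperature}")


class InMemoryReplay:
    """Hypothetical stand-in for a replay client: responses keyed by payload hash."""

    def __init__(self) -> None:
        self._responses: dict[str, str] = {}

    def record(self, payload: dict, response: str) -> str:
        require_zero_temperature(payload.get("temperature", 0))
        key = payload_hash(payload)
        self._responses[key] = response
        return key

    def replay(self, payload: dict) -> str:
        # KeyError on an unseen payload: offline replay never calls a provider.
        return self._responses[payload_hash(payload)]
```

Keying on a canonical hash means a replayed run either reproduces the recorded output exactly or fails loudly on a payload that was never recorded.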

2.8 Observability — what the project actually uses

This project does not ship with SaaS observability integrations. Prior revisions of this doc mentioned Langfuse, Aim, Flowcept, and a full OpenTelemetry stack — none are in pyproject.toml. The operational picture:

Layer	Mechanism
Structured logging	`structlog` JSON logs, context-bound (run_id, candidate_id, gate). Configuration in `src/apmode/logging.py`.
Terminal UI	`rich` tables / progress; the CLI's `inspect`, `log`, `trace`, `graph`, `policies`, `ls` commands render to the terminal
Run ledger	The bundle is the source of truth. `search_trajectory.jsonl`, `failed_candidates.jsonl`, `gate_decisions/`, and `ranking.json` carry the full audit trail.
Gate audit	Per-gate `GateResult` under `gate_decisions/gate{1,2,2_5,3}_{candidate_id}.json` with per-check `GateCheckResult`
LLM audit	`agentic_trace/` — per-iteration `{input.json, output.json, meta.json}` including prompt hash, token counts, cost; `classical_checkpoint.json` enables `--resume-agentic`
Report provenance	`report_provenance.json` captures generator identity, component versions, timestamps
Bundle completeness	`_COMPLETE` sentinel with SHA-256 digest
R-side logging	R harness writes `logs.jsonl` to the request tempdir; orchestrator reads after completion

External observability (Langfuse, Aim, Flowcept, OTel spans/exporters) is a non-goal for 0.x — see §10.

2.9 Data security and privacy

LLM inputs. agentic_runner summarizes fit diagnostics via diagnostic_summarizer.redact_for_llm(), which uses an allow-list of field names; unknown fields fail closed. Raw row-level patient data is never sent.
Imputation-stability cherry-picking guard. When MI is active, the LLM sees only pooled + stability diagnostics (not per-imputation draws), so it cannot exploit a single lucky imputation — see diagnostic_summarizer.summarize_stability_for_llm.
Bundle access. Bundles may contain pseudonymized subject-level references. Filesystem permissions and encryption-at-rest are deployment-specific.
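
A fail-closed allow-list, as described for LLM inputs above, can be sketched as follows. The field names, `RedactionError`, and the choice to raise (rather than silently drop) on unknown fields are assumptions of this example; the source only states that unknown fields fail closed in diagnostic_summarizer.redact_for_llm().

```python
# Hypothetical allow-list; the real field set lives in diagnostic_summarizer.
ALLOWED_FIELDS = frozenset({"ofv", "bic", "converged", "eta_shrinkage"})


class RedactionError(ValueError):
    """Raised when a summary carries a field that is not on the allow-list."""


def redact_summary(summary: dict) -> dict:
    unknown = sorted(set(summary) - ALLOWED_FIELDS)
    if unknown:
        # Fail closed: refuse the whole payload rather than guess what is safe.
        raise RedactionError(f"fields not on allow-list: {unknown}")
    return dict(summary)  # shallow copy so callers cannot mutate the original
```

The design point is that a newly added diagnostic field is blocked by default until someone adds it to the allow-list deliberately.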

2.10 License compatibility

GPL-2-or-later. Apache-2 dependencies (JAX, Diffrax, structlog, tenacity, OTel SDK if added later) are compatible. License audit is a CI item.

2.11 Testing

Layer	Location
Unit	`tests/unit/` — DSL, data, backends, search, governance, routing, bundle, report
Integration	`tests/integration/` — mock R pipeline, Discovery lane, LLM providers, Suite A/B/C E2E, BLQ flows
Property-based	`tests/property/` — Hypothesis on DSL round-trip, transforms, LORO-split invariants
Golden-master	`tests/golden/` — syrupy snapshots of emitter output
Fixtures	`tests/fixtures/` — Suite A CSVs + stored policies
Live-gated	`-m live` marker — LLM providers, R subprocess, CmdStan
Policy-file validation	CI hook `governance/validate_policies.py`

Current collected count is auto-synced into README/CLAUDE.md by scripts/sync_readme.py via  markers — do not hard-code here.

2.12 CI / CD

GitHub Actions: uv sync --all-extras → pytest → mypy strict → ruff check + format. Pre-commit runs ruff + mypy + the policy validator. Matrix is Python 3.12 / 3.13 / 3.14. Suite C has a dedicated workflow in .github/workflows/suite_c_phase1.yml; other benchmark cadences are operator-driven unless a workflow exists in .github/workflows/.

3. System Architecture

              ┌──────────────────────────────────────────────┐
              │             apmode CLI (Typer)               │
              │    16 direct commands + registered groups    │
              │    datasets | explore | diff | log | trace   │
              │    lineage | graph | report | doctor | ls    │
              │    policies | bundle                         │
              └──────────────────────┬───────────────────────┘
                                     │
                        ┌────────────▼────────────┐
                        │      Orchestrator       │
                        │    (asyncio pipeline)   │
                        └────────────┬────────────┘
                                     │
         Ingest ──► Profiler ──► Initial Estimator ──► Splitter
              │                                             │
              ▼                                             ▼
       Missing-data                                   Lane Router
       directive + MI / FREM                    ┌──────────┼──────────┐
                                                ▼          ▼          ▼
                                         Classical    NODE        Bayesian
                                         (nlmixr2)   (JAX/Diffrax) (Stan/Torsten)
                                                │          │          │
                                                └───────┐  │  ┌───────┘
                                                        ▼  ▼  ▼
                                                    Agentic LLM
                                                    (transforms only,
                                                     ≤25 iters)
                                                        │
                                                        ▼
              ┌──────────────┬──────────────┬───────────────────┐
              │    Gate 1    │    Gate 2    │    Gate 2.5       │
              │  Technical   │     Lane     │   Credibility     │
              │  Validity    │ Admissibility│   (ICH M15)       │
              │  (PIT/NPDE-  │  (shrinkage, │                   │
              │   lite,      │   identif.,  │                   │
              │   CWRES,     │   NODE excl, │                   │
              │   R̂/ESS for  │   LORO-CV,   │                   │
              │   Bayesian)  │   priors)    │                   │
              └──────────────┴──────────────┴───────────────────┘
                                     │
                                     ▼
                              ┌─────────────┐
                              │   Gate 3    │
                              │  Ranking    │
                              │ (Borda or   │
                              │  weighted;  │
                              │  VPC + NPE  │
                              │  + AUC/Cmax)│
                              └──────┬──────┘
                                     │
                                     ▼
                     ┌──────────────────────────────┐
                     │  Reproducibility Bundle       │
                     │  (Pydantic → JSON/JSONL       │
                     │   + _COMPLETE sentinel with   │
                     │   SHA-256 digest)             │
                     └──────────────────────────────┘

3.1 Component inventory (verified against HEAD)

All paths rooted at src/apmode/.

DSL

File	Role
`dsl/pk_grammar.lark`	Lark EBNF (5 blocks)
`dsl/grammar.py`	`compile_dsl` entry (parse + AST build, 10 KB guard)
`dsl/transformer.py`	Parse tree → AST
`dsl/ast_models.py`	`DSLSpec` + all module Pydantic nodes
`dsl/normalize.py`	AST canonicalization
`dsl/validator.py`	Lane-aware `validate_dsl`
`dsl/transforms.py`	9 structural transforms + `FormularTransform` union
`dsl/prior_transforms.py`	`SetPrior` transform (10th)
`dsl/priors.py`	Prior families + `_VALID_FAMILIES` admissibility matrix
`dsl/nlmixr2_emitter.py`	AST → nlmixr2 R code
`dsl/stan_emitter.py`	AST → Stan program (IOV, NODE, maturation covariates, and v0.7 absorption preview forms ⇒ `NotImplementedError`)
`dsl/frem_emitter.py`	AST → FREM-augmented nlmixr2
`dsl/_emitter_utils.py`	Shared emitter helpers

Data

File	Role
`data/schema.py`	Pandera canonical-PK schema
`data/ingest.py` + `data/adapters.py`	Format-specific ingestion
`data/dosing.py`	ADDL/II expansion, infusion events, event-table builder
`data/profiler.py`	Evidence Manifest with structured `nonlinear_clearance_signals`
`data/policy.py`	`ProfilerPolicy` loader
`data/initial_estimates.py`	NCA + unit-aware CL heuristic
`data/splitter.py`	k-fold + LORO-CV
`data/missing_data.py`	Lane-tiered directive resolver
`data/imputers.py`	R-backed MI providers
`data/categorical_encoding.py`	Binary auto-remap
`data/datasets.py`	Public dataset registry (nlmixr2data auto-fetch)
`data/types.py`	Shared typed records

Backends

File	Role
`backends/protocol.py`	`BackendRunner` Protocol + `Lane` enum
`backends/nlmixr2_runner.py`	Classical NLME (SAEM/FOCEi) via R subprocess
`backends/bayesian_runner.py`	Bayesian backend (Stan/Torsten via `cmdstanpy`)
`bayes/harness.py`	`cmdstanpy` driver + Vehtari 2021 R̂/ESS + E-BFMI + Pareto-k
`backends/frem_runner.py`	FOCE-I FREM driver for missing-data workflow
`backends/node_runner.py`	JAX/Diffrax NODE backend
`backends/node_model.py`	Bräm-style MLP with RE on input-layer weights
`backends/node_ode.py`	Mechanistic skeleton + NODE sub-function (Diffrax Tsit5)
`backends/node_trainer.py`	Optax Adam with log-space params + early stopping
`backends/node_constraints.py`	5 enumerated constraint templates
`backends/node_init.py`	Pre-trained weights + transfer learning
`backends/node_distillation.py`	Sub-function viz, surrogate fitting, fidelity
`backends/agentic_runner.py`	Closed-loop LLM improvement (≤25 iters, transforms only)
`backends/diagnostic_summarizer.py`	LLM-facing redaction + stability summarization
`backends/llm_client.py` + `llm_providers.py`	Anthropic, OpenAI, Gemini, Ollama, OpenRouter, litellm
`backends/transform_parser.py`	JSON (LLM) → `list[FormularTransform]`
`backends/predictive_summary.py`	Canonical VPC / NPE / AUC-Cmax-BE builder (single path)
`backends/r_schemas.py`	R ↔ Python Pydantic wire schemas
`backends/prompts/`	LLM prompt templates
`r/harness.R`	nlmixr2 SAEM/FOCEi harness
`r/impute.R`	mice / missRanger dispatch
`r/install_deps.R`	R-side install helper

Search, governance, orchestration, bundle, reporting

File	Role
`search/candidates.py`	Phase 1/3 candidate generation
`search/engine.py`	Multi-backend dispatch, BIC scoring, Pareto frontier
`search/stability.py`	Rubin pooling + rank-stability metrics
`governance/gates.py`	Gates 1, 2, 2.5, 3 evaluators
`governance/policy.py`	Pydantic schema for lane policies
`governance/ranking.py`	Cross-paradigm ranking (Borda / weighted sum, uniform-drop rule)
`governance/validate_policies.py`	CI policy-file validator
`orchestrator/__init__.py`	Full pipeline: ingest → profile → NCA → search → FREM/MI → gates → bundle → report
`bundle/models.py`	All Pydantic schemas (≈60 classes)
`bundle/emitter.py`	Per-artifact `write_*`; `seal()` with `_COMPLETE` sentinel
`bundle/scoring_contract.py`	Cross-paradigm scoring-contract helper
`report/renderer.py`	HTML + Markdown regulatory report
`report/credibility.py`	ICH-M15-aligned credibility assessment
`evaluation/`	(benchmark scoring utilities)
`benchmarks/`	Suite A/B/C fixtures + runners

CLI + framework

File	Role
`cli.py`	Typer app — 16 direct commands plus `bundle` / `completion` groups
`paths.py`	`APMODE_POLICIES_DIR` env override + pyproject-walk fallback
`routing.py`	Lane Router — evidence-manifest-driven dispatch
`logging.py`	`structlog` configuration
`errors.py`	`BackendError` + subtypes
`ids.py`	`sparkid`-backed run / candidate ID generation
`_version.py`	Generated by `hatch-vcs`

4. Key interface contracts

4.1 BackendRunner Protocol

# src/apmode/backends/protocol.py

from enum import StrEnum
from pathlib import Path
from typing import Protocol, runtime_checkable

class Lane(StrEnum):
    SUBMISSION   = "submission"
    DISCOVERY    = "discovery"
    OPTIMIZATION = "optimization"

@runtime_checkable
class BackendRunner(Protocol):
    async def run(
        self,
        spec: DSLSpec,
        data_manifest: DataManifest,
        initial_estimates: dict[str, float],
        seed: int,
        timeout_seconds: int | None = None,
        *,
        data_path: Path | None = None,
        split_manifest: dict[str, object] | None = None,
        gate3_policy: Gate3Config | None = None,
        nca_diagnostics: list[NCASubjectDiagnostic] | None = None,
    ) -> BackendResult: ...

Keyword arguments carry optional context needed by predictive-diagnostics helpers and the cross-paradigm ranker (Gate 3).

4.2 BackendResult (current schema)

class BackendResult(BaseModel):
    model_id: str
    backend: Literal["nlmixr2", "jax_node", "agentic_llm", "bayesian_stan"]
    converged: bool
    ofv: float | None
    aic: float | None
    bic: float | None
    parameter_estimates: dict[str, ParameterEstimate]
    eta_shrinkage: dict[str, float]
    convergence_metadata: ConvergenceMetadata
    diagnostics: DiagnosticBundle
    wall_time_seconds: float
    backend_versions: dict[str, str]
    initial_estimate_source: Literal["nca", "warm_start", "fallback"]

    # Bayesian-only (populated by BayesianRunner, None otherwise):
    posterior_diagnostics: PosteriorDiagnostics | None
    sampler_config:        SamplerConfig | None
    posterior_draws_path:  str | None  # bundle-relative
    prior_manifest_path:   str | None  # bundle-relative
    simulation_protocol_path: str | None  # bundle-relative

4.3 DiagnosticBundle (canonical, 0.5)

class DiagnosticBundle(BaseModel):
    gof: GOFMetrics
    split_gof: SplitGOFMetrics | None
    vpc: VPCSummary | None
    pit_calibration: PITCalibrationSummary | None   # 0.4.2+: the Gate 1 gated metric
    identifiability: IdentifiabilityFlags
    blq: BLQHandling  # method: "none" | "m1" | "m3" | "m4" | "m6_plus" | "m7_plus"
    npe_score: float | None
    auc_cmax_be_score: float | None
    auc_cmax_source: Literal["observed_trapezoid"] | None
    diagnostic_plots: dict[str, str]
    scoring_contract: ScoringContract  # per-candidate record of NLPD kind,
                                       #   RE treatment, integrator, BLQ, obs model,
                                       #   float precision

Canonical predictive-diagnostics path. backends/predictive_summary.py::build_predictive_diagnostics is the only function that maps per-subject simulation matrices to DiagnosticBundle.{vpc, npe_score, auc_cmax_be_score, auc_cmax_source}. Backends must call it atomically: partial population (e.g. VPC without NPE) is not allowed.

4.4 R subprocess contract

Unchanged in substance from v0.2. The R harness reads /tmp/{request_id}/request.json and writes response.json + logs.jsonl. Non-zero exit codes are reserved for process failures; nlmixr2 convergence failures are reported as status="error" with error_type populated, and the runner raises a typed ConvergenceError. Timeouts kill the whole process group (os.killpg). If response.json is absent after the subprocess exits, the failure is classified as CrashError.
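
The classification rules in this contract can be sketched as a small reader on the Python side. The exception classes below are local stand-ins for those in errors.py, and the `error_type == "convergence"` value is a hypothetical example: the source does not list the actual `error_type` strings.

```python
import json
from pathlib import Path


class BackendError(RuntimeError):
    """Local stand-in for the base error type."""


class CrashError(BackendError):
    """The subprocess exited without writing response.json."""


class ConvergenceError(BackendError):
    """The harness ran but the fit did not converge."""


def read_response(request_dir: Path) -> dict:
    response_path = request_dir / "response.json"
    if not response_path.exists():
        # No response file after exit: treat as a crash (e.g. segfault, OOM kill).
        raise CrashError(f"no response.json in {request_dir}")
    response = json.loads(response_path.read_text(encoding="utf-8"))
    if response.get("status") == "error":
        error_type = response.get("error_type", "unknown")
        if error_type == "convergence":  # hypothetical error_type value
            raise ConvergenceError(error_type)
        raise BackendError(error_type)
    return response
```

Because results travel through files rather than stdout, stray R console output cannot corrupt the payload.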

4.5 Gate interface

class GateCheckResult(BaseModel):
    check_id: str                   # e.g. "convergence", "pit_median", "shrinkage"
    passed: bool
    observed: float | bool | str
    threshold: float | str | None
    units: str | None
    evidence_ref: str | None        # bundle-relative path

class GateResult(BaseModel):
    gate_id: str
    gate_name: str
    candidate_id: str
    passed: bool                    # True iff all checks pass
    checks: list[GateCheckResult]
    summary_reason: str
    policy_version: str
    timestamp: str                  # ISO 8601

Gate 3 contract. Gate 3 is a ranking gate (every survivor passes, but Ranking.ranked_entries orders them). Configuration on Gate3Config:

Field	Role
`composite_method`	`"weighted_sum"` (Submission) or `"borda"` (Discovery / Optimization)
`vpc_weight`, `npe_weight`, `bic_weight`, `auc_cmax_weight`	Component weights
`auc_cmax_nca_min_eligible` + `auc_cmax_nca_min_eligible_fraction`	AND-combined NCA eligibility floor; below either, `auc_cmax_be_score` is set to `None` and the uniform-drop rule removes that component from the composite for every candidate
`n_posterior_predictive_sims`	Backend-emitted draws per candidate
`vpc_n_bins`	Post-hoc time bins for VPC coverage
`vpc_concordance_target`	Target coverage for the concordance score

Uniform-drop rule: if any survivor lacks a score component, that component is removed for every survivor so ranking stays apples-to-apples.

Gate 1 PIT / NPDE-lite (0.4.2). The gated calibration metric is PIT / NPDE-lite on the posterior-predictive matrix (Brendel 2006, Comets 2008), subject-robust-aggregated across p ∈ {0.05, 0.50, 0.95}. Tolerance is tol(p, n) = max(floor, z_α · sqrt(p(1−p)/n_subjects)). Lane-specific z_α and floor values are in each lane policy. VPCSummary is retained for reporting and within-paradigm concordance but is no longer a Gate 1 gate.
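
The tolerance formula above translates directly into code. The function name is hypothetical, and the z_α and floor values in the usage note are made-up numbers for illustration, not values from any lane policy.

```python
import math


def pit_tolerance(p: float, n_subjects: int, z_alpha: float, floor: float) -> float:
    """tol(p, n) = max(floor, z_alpha * sqrt(p * (1 - p) / n_subjects))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    return max(floor, z_alpha * math.sqrt(p * (1.0 - p) / n_subjects))
```

With 100 subjects, z_α = 1.96, and a floor of 0.02, the median (p = 0.50) gets a tolerance of about 0.098, since sqrt(0.25 / 100) = 0.05. The floor takes over in the tails once n is large enough that the binomial term falls below it.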

4.6 Credibility report schema

class CredibilityContext(BaseModel):
    candidate_id: str
    lane: Lane
    context_of_use: str
    data_adequacy_statement: str
    ai_ml_role: str | None           # present for NODE / agentic
    limitations: list[str]
    sensitivity_refs: list[str]      # bundle-relative
    prior_justification_refs: list[str]  # Bayesian — points at prior_manifest.json

class CredibilityReport(BaseModel):
    candidate_id: str
    context: CredibilityContext
    risk_map: dict[str, str]         # limitation → risk class
    evidence_refs: dict[str, str]

Consumed by report/credibility.py and rendered into the HTML/Markdown report.

5. Reproducibility bundle structure

Layout matches PRD §4.3.2 as extended through 0.5. Names are canonical; drift breaks apmode validate.

runs/
└── {run_id}/                              # sparkid
    ├── _COMPLETE                          # JSON: {schema_version, run_id,
    │                                      #   file_digests: {path → sha256}}
    ├── data_manifest.json
    ├── split_manifest.json
    ├── evidence_manifest.json             # profiler policy SHA embedded
    ├── missing_data_directive.json
    ├── imputation_stability.json          # present when MI is active
    ├── categorical_encoding_provenance.json
    ├── initial_estimates.json
    ├── nca_diagnostics.jsonl
    ├── seed_registry.json
    ├── policy_file.json                   # versioned gate thresholds (copy of lane policy)
    ├── backend_versions.json              # Python / R / nlmixr2 / CmdStan / hardware
    ├── search_trajectory.jsonl            # per-candidate BIC/OFV/convergence
    ├── failed_candidates.jsonl            # per-check gate failures
    ├── candidate_lineage.json             # DAG edges (parent → child + label)
    ├── search_graph.json                  # full DAG for `apmode graph` (when --agentic)
    ├── classical_checkpoint.json          # enables `--resume-agentic`
    ├── ranking.json                       # Gate 3 output
    ├── report_provenance.json
    ├── gate_decisions/
    │   ├── gate1_{candidate_id}.json
    │   ├── gate2_{candidate_id}.json
    │   ├── gate25_{candidate_id}.json
    │   └── gate3_{candidate_id}.json
    ├── compiled_specs/
    │   ├── {candidate_id}.json            # DSLSpec (Pydantic)
    │   └── {candidate_id}.R               # lowered R code
    ├── results/
    │   ├── {candidate_id}_result.json          # BackendResult
    │   └── {candidate_id}_seed_{i}_result.json # multi-seed runs
    ├── bayesian/                          # when a Bayesian candidate was fit
    │   ├── prior_manifest.json            # prior_manifest_path points here
    │   ├── simulation_protocol.json
    │   ├── mcmc_diagnostics.json          # R̂/ESS/E-BFMI/Pareto-k
    │   └── posterior_draws/{candidate_id}.parquet
    ├── loro_cv/                           # Optimization lane only
    │   └── {candidate_id}_folds.json
    ├── credibility/
    │   └── {candidate_id}_credibility.json
    ├── agentic_trace/                     # when --agentic
    │   ├── {iteration_id}_input.json
    │   ├── {iteration_id}_output.json
    │   └── {iteration_id}_meta.json
    ├── run_lineage.json                   # multi-run provenance
    ├── report.html                        # regulatory report (HTML)
    └── report.md                          # regulatory report (Markdown)

JSON/JSONL artifacts are Pydantic-validated before writing. Binary outputs (PNGs, parquet draws, model weights) are checksummed and referenced via manifest entries. _COMPLETE is written atomically as the last step of a successful run; its absence signals an incomplete bundle and its SHA-256 manifest catches post-hoc tampering.
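
The seal-then-verify idea can be sketched as follows. This is an illustration, not `BundleEmitter.seal()`: the function names are hypothetical, the sentinel fields follow the layout comment above, and the write-to-temp-then-rename step is one common way to get an atomic write on a single filesystem.

```python
import hashlib
import json
import os
from pathlib import Path

SENTINEL = "_COMPLETE"


def file_digests(bundle_dir: Path) -> dict[str, str]:
    """SHA-256 of every file in the bundle except the sentinel itself."""
    return {
        p.relative_to(bundle_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file() and p.name != SENTINEL
    }


def seal(bundle_dir: Path, run_id: str, schema_version: int = 2) -> None:
    payload = {
        "schema_version": schema_version,
        "run_id": run_id,
        "file_digests": file_digests(bundle_dir),  # computed before the temp file exists
    }
    tmp = bundle_dir / (SENTINEL + ".tmp")
    tmp.write_text(json.dumps(payload, sort_keys=True), encoding="utf-8")
    os.replace(tmp, bundle_dir / SENTINEL)  # atomic rename: sentinel is whole or absent


def verify(bundle_dir: Path) -> bool:
    sentinel = bundle_dir / SENTINEL
    if not sentinel.exists():
        return False  # unsealed bundle
    recorded = json.loads(sentinel.read_text(encoding="utf-8"))["file_digests"]
    return recorded == file_digests(bundle_dir)
```

Comparing the full digest map catches modified, added, and deleted files alike.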

6. Phasing — historical

Phases 1 and 2 are complete. Phase 3 is in progress per CLAUDE.md. The per-month task list from v0.2 has been removed from this doc; it is preserved in the git history at docs/ARCHITECTURE.md@v0.2 and summarized in CHANGELOG.md.

What is active today (0.6.1-rc1):

DSL grammar + compiler + validator + 10 typed transforms.
Classical NLME backend (nlmixr2, SAEM/FOCEi) with warm-start children.
Bayesian backend (Stan / Torsten via cmdstanpy) with R̂ / ESS / E-BFMI / Pareto-k Gate 1 integration.
NODE backend (Bräm-style hybrid MLP, Diffrax Tsit5, Optax Adam).
Agentic LLM backend (6 providers: Anthropic / OpenAI / Gemini / Ollama / OpenRouter / litellm; ≤ 25 iterations, transforms only).
FREM + MI-PMM + MI-missRanger missing-data pipelines.
Gate 1 (PIT / NPDE-lite) + Gate 2 (lane-specific) + Gate 2.5 (ICH M15) + Gate 3 (Borda / weighted ranking with uniform-drop rule).
Reproducibility bundle with _COMPLETE sentinel and RO-Crate projection.
Typer CLI (run, bundle inspection/reporting, HTTP serve, RO-Crate/SBOM subcommands) + HTML / Markdown regulatory report.
Benchmark Suite A (8 scenarios), Suite B perturbation anchors, and Suite C literature fixtures.
FastAPI HTTP surface (apmode serve) with loopback default, static API-key floor for non-loopback binds, SQLite run store, and cancellation lifecycle.

What remains for Phase 3 completion: NODE posterior-predictive simulation (currently an inert stub), Stan-side lowering of IOV and maturation covariates, full Stan/Torsten support for the v0.7 absorption preview forms, and broader production hardening around public API deployments.

7. Risk mitigations

Risk	Mitigation
R segfault	Process isolation; kill process group on timeout; typed error classification; retry to new `attempt_id/` subdir
R stdout contamination	File-based I/O contract (§4.4)
Custom orchestrator becomes tech debt	`BackendRunner` Protocol preserves a clean swap-in boundary — no migration required today
DSL grammar too rigid	Two-track extensibility (new module ADR vs. enum extension); see FORMULAR §"Extensibility"
JAX GPU non-determinism	CPU-first posture; GPU not wired into CLI; future GPU execution will key cache on `(root_seed, data_hash, code_version, hardware_descriptor)`
LLM provider versioning	Model version escrowed in `agentic_trace/*_meta.json`; verbatim output caching via `ReplayClient`
License incompatibility	GPL-2-or-later + CI license scanner
PHI/PII leakage in LLM traces	Redaction via `diagnostic_summarizer.redact_for_llm()` allow-list; imputation-stability cherry-picking guard
Bundle drift	`_COMPLETE` sentinel with SHA-256 digest; `apmode validate` refuses unsealed bundles
Cross-paradigm NLPD incomparability	`ScoringContract` is per-candidate; cross-paradigm ranking uses simulation-based metrics (VPC + NPE + AUC/Cmax BE) not raw NLPD
Agentic LLM cherry-picking across imputations	`summarize_stability_for_llm` substitutes pooled + stability scores for raw per-imputation diagnostics

8. Dependency summary

See pyproject.toml and §2.1. Binding version floors: python >= 3.12,<3.15, pydantic >= 2.7, lark >= 1.2, pandera >= 0.18, cmdstanpy >= 1.2 (bayesian extra), jax >= 0.4.30 (node + test extras), anthropic >= 0.39 (llm extra).

9. Open decisions (current)

Decided items from v0.2 are closed (licensing: GPL-2-or-later; covariate missingness: lane-tiered FREM / MI-PMM / MI-missRanger per data/missing_data.py; initial-estimate strategy: NCA-seeded + warm-start per §2.5). Items still open:

DSL extensibility process — Track 1 (new module types) needs an ADR template and a pharmacometric-review workflow. PRD §10 Q1.
Regulatory engagement timing — when to seek informal FDA / EMA feedback on the credibility framework (PRD §10 Q4).
NODE posterior-predictive simulation — NodeBackendRunner.sample_posterior_predictive is a discoverable inert stub; Phase 3 completion item. Concrete integration point: backends/node_trainer.py.

10. Non-goals

Items intentionally not in scope for 0.x. If and when these are revisited, this section is the canonical record of why they were deferred.

Docker / Docker-Compose / K8s. Users run natively via uv; containerization is out of scope.
Flyte / Temporal migration. BackendRunner is the escape hatch if it becomes necessary, but the current scale does not justify the orchestrator rewrite.
Langfuse / Aim / Flowcept integrations. The bundle is the run ledger; SaaS observability is not planned. Users who need it can shim on top of structlog JSON output.
Rust parser migration. Lark in Python is adequate; the Phase-2 LALRPOP+PyO3 branch was considered and rejected.
Web UI. A minimal browser UI was prototyped and removed; the CLI + HTML report is the current UX.

Cross-references

CLAUDE.md — operational guidance for Claude Code sessions in this repo.
docs/PRD_APMODE_v0.3.md — product requirements (source of truth for scope).
docs/FORMULAR.md — language reference for the DSL.
docs/PROFILER_REFINEMENT_PLAN.md — derivation + citations for profiler policy defaults.
docs/adr/ — Architecture Decision Records; review 0001-review-deferrals.md before re-filing a finding on from __future__ import annotations, Pyright, God-module decomposition, FREM goldens, type: ignore audits, or module-level Rich Consoles.
policies/*.json — versioned gate policies per lane.
CHANGELOG.md — per-release deltas; version history for this document's factual claims.
