fdidonato · May 6, 2026
diff --git a/.env.minimal b/.env.minimal
@@ -57,7 +57,7 @@ MORALSTACK_UI_PASSWORD=admin
 MORALSTACK_RISK_MODEL=gpt-4o-mini
 # if MORALSTACK_RISK_PARALLEL_ESTIMATORS = true then the following models are used for parallel estimation
 MORALSTACK_RISK_INTENT_MODEL=gpt-4o
-MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o-mini
+MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o
 MORALSTACK_RISK_OPERATIONAL_MODEL=gpt-4o-mini
 MORALSTACK_RISK_LOW_THRESHOLD=0.25
 MORALSTACK_RISK_MEDIUM_THRESHOLD=0.65
@@ -180,6 +180,12 @@ MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_WEIGHTED_APPROVAL=0.78
 MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MAX_SEMANTIC_HARM=0.35
 MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_PER_PERSPECTIVE_APPROVAL=0.70
 
+# -----------------------------------------------------------------------------
+# OpenAI-compatible bridge server (scripts/openai_compatible_server.py)
+# -----------------------------------------------------------------------------
+MORALSTACK_OPENAI_COMPATIBLE_API_HOST=localhost
+MORALSTACK_OPENAI_COMPATIBLE_API_PORT=8787
+
 # -----------------------------------------------------------------------------
 # Tracing & Debug
 # -----------------------------------------------------------------------------

diff --git a/.env.template b/.env.template
@@ -73,9 +73,10 @@ MORALSTACK_UI_PASSWORD=
 # See docs/modules/risk_estimator.md for full documentation of each variable.
 # Model for the semantic judge (if set, overrides OPENAI_MODEL for risk only)
 # MORALSTACK_RISK_MODEL=gpt-4o-mini
-# if MORALSTACK_RISK_PARALLEL_ESTIMATORS = true then the following models are used for parallel estimation
+# When parallel estimation is enabled, optional per-slot overrides below apply.
+# If a slot is unset or empty, it inherits MORALSTACK_RISK_MODEL when set, else OPENAI_MODEL, else gpt-4o.
 # MORALSTACK_RISK_INTENT_MODEL=gpt-4o
-# MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o-mini
+# MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o
 # MORALSTACK_RISK_OPERATIONAL_MODEL=gpt-4o-mini
 # MORALSTACK_RISK_LOW_THRESHOLD=0.25
 # MORALSTACK_RISK_MEDIUM_THRESHOLD=0.65
@@ -204,6 +205,18 @@ MORALSTACK_UI_PASSWORD=
 # MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MAX_SEMANTIC_HARM=0.35
 # MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_PER_PERSPECTIVE_APPROVAL=0.70
 
+# -----------------------------------------------------------------------------
+# OpenAI-compatible bridge server (scripts/openai_compatible_server.py)
+# -----------------------------------------------------------------------------
+# Host and port for the standalone OpenAI-compatible FastAPI bridge.
+# Used to expose MoralStack as an OpenAI-compatible endpoint (e.g. for COMPL-AI).
+# MORALSTACK_OPENAI_COMPATIBLE_API_HOST=localhost
+# MORALSTACK_OPENAI_COMPATIBLE_API_PORT=8787
+# Max concurrent in-flight requests accepted by the bridge before returning overload (HTTP 503).
+# MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT=8
+# Retry-After seconds for temporary overload responses (HTTP 503).
+# MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS=10
+
 # -----------------------------------------------------------------------------
 # Tracing & Debug
 # -----------------------------------------------------------------------------

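Note on the bridge settings above: the two overload variables describe an admission cap plus a `Retry-After` hint. The sketch below is only an illustration of that behaviour, not the code in `scripts/openai_compatible_server.py`; `InflightGate`, `gate_from_env`, and their methods are hypothetical names, and the defaults (8 and 10) are taken from the commented template values.

```python
import os
import threading


class InflightGate:
    """Hypothetical admission gate; not the bridge's actual implementation."""

    def __init__(self, max_inflight, retry_after_seconds):
        self.max_inflight = max_inflight
        self.retry_after_seconds = retry_after_seconds
        self._inflight = 0
        self._lock = threading.Lock()

    def try_enter(self):
        """Reserve a slot; return False when the cap is already reached."""
        with self._lock:
            if self._inflight >= self.max_inflight:
                return False
            self._inflight += 1
            return True

    def leave(self):
        """Release a slot reserved by a successful try_enter()."""
        with self._lock:
            self._inflight = max(0, self._inflight - 1)

    def overload_response(self):
        """Status code and headers for a request rejected by the gate."""
        return 503, {"Retry-After": str(self.retry_after_seconds)}


def gate_from_env(env=None):
    """Build a gate from the two variables documented in .env.template."""
    env = os.environ if env is None else env
    return InflightGate(
        max_inflight=int(env.get("MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT", "8")),
        retry_after_seconds=int(
            env.get("MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS", "10")
        ),
    )
```

A request handler would call `try_enter()` first, answer with `overload_response()` when it returns `False`, and call `leave()` in a `finally` block otherwise.
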
diff --git a/INSTALL.md b/INSTALL.md
@@ -158,12 +158,17 @@ See [docs/modules/openai_params.md](docs/modules/openai_params.md) for details a
 | MORALSTACK_UI_USERNAME         | -                         | Basic Auth for UI (required when running moralstack-ui)        |
 | MORALSTACK_UI_PASSWORD         | -                         | Basic Auth for UI                                              |
 | MORALSTACK_CONSTITUTION_MAX_PARALLEL_AGENTS | 2          | Parallel domain agents for constitution retrieval              |
+| MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT | 8           | OpenAI-compatible bridge max in-flight requests before HTTP 503 |
+| MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS | 10  | Retry-After seconds returned by OpenAI-compatible bridge overload responses |
 | MORALSTACK_VERBOSE             | -                         | Set to 1 for verbose output                                    |
 
-**Risk Estimator**: Optional overrides (e.g. `MORALSTACK_RISK_MODEL`, `MORALSTACK_RISK_LOW_THRESHOLD`,
-`MORALSTACK_RISK_MEDIUM_THRESHOLD`, `MORALSTACK_RISK_MAX_RETRIES`, …) are listed in `.env.template` and fully documented
-in [docs/modules/risk_estimator.md](docs/modules/risk_estimator.md#environment-variables). Leave them commented to use
-built-in defaults (risk estimator uses the same model as `OPENAI_MODEL` when `MORALSTACK_RISK_MODEL` is not set). **In
+**Risk Estimator**: Optional overrides (e.g. `MORALSTACK_RISK_MODEL`, `MORALSTACK_RISK_PARALLEL_ESTIMATORS`,
+`MORALSTACK_RISK_INTENT_MODEL`, `MORALSTACK_RISK_SIGNALS_MODEL`, `MORALSTACK_RISK_OPERATIONAL_MODEL`,
+`MORALSTACK_RISK_LOW_THRESHOLD`, `MORALSTACK_RISK_MEDIUM_THRESHOLD`, `MORALSTACK_RISK_MAX_RETRIES`, …) are listed in
+`.env.template` and fully documented in [docs/modules/risk_estimator.md](docs/modules/risk_estimator.md#environment-variables). Leave them commented to use
+built-in defaults (risk estimator uses the same model as `OPENAI_MODEL` when `MORALSTACK_RISK_MODEL` is not set). With
+parallel estimators enabled, each optional `MORALSTACK_RISK_*_MODEL` slot falls back to `MORALSTACK_RISK_MODEL` if set,
+otherwise `OPENAI_MODEL`, otherwise `gpt-4o`. **In
 both CLI run and benchmark, risk configuration is read only from the environment (`.env`); there is no CLI override —
 env is the single source of configuration.**
 

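Note on the fallback chain described in the INSTALL.md change: each per-slot variable falls back to `MORALSTACK_RISK_MODEL`, then `OPENAI_MODEL`, then `gpt-4o`. A minimal sketch of that lookup order follows; `resolve_slot_model` and `SLOT_VARS` are hypothetical names, and treating whitespace-only values as empty is an assumption rather than documented behaviour.

```python
import os

# Slot variable names as they appear in .env.template.
SLOT_VARS = {
    "intent": "MORALSTACK_RISK_INTENT_MODEL",
    "signals": "MORALSTACK_RISK_SIGNALS_MODEL",
    "operational": "MORALSTACK_RISK_OPERATIONAL_MODEL",
}
FINAL_DEFAULT = "gpt-4o"


def resolve_slot_model(slot, env=None):
    """Hypothetical helper: pick the model for one parallel-estimator slot."""
    env = os.environ if env is None else env
    for name in (SLOT_VARS[slot], "MORALSTACK_RISK_MODEL", "OPENAI_MODEL"):
        # Assumption: whitespace-only values count as empty.
        value = (env.get(name) or "").strip()
        if value:
            return value
    return FINAL_DEFAULT
```

With only `OPENAI_MODEL` set, all three slots resolve to that model; setting `MORALSTACK_RISK_SIGNALS_MODEL` changes the signals slot alone.
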
diff --git a/README.md b/README.md
@@ -88,7 +88,7 @@ Request
   │
   ▼
 [Risk Estimator] ─────────── parallel mini-estimators:
-  │                          intent · operational risk · signal detection
+  │                          intent · signal detection (q1–q17) · operational risk
   ▼
 [Policy Router] ──────────── applies domain overlay, computes action bounds
   │

diff --git a/docs/architecture_spec.md b/docs/architecture_spec.md
@@ -295,18 +295,22 @@ class RiskEstimation:
     score: float                       # [0, 1]
     confidence: float                  # [0, 1]
     risk_category: RiskCategory
-    semantic_signals: list[str]        # *[impl]* alias triggered_signals
+    semantic_signals: list[str]        # *[impl]* alias triggered_signals; calibrated strings (e.g. Qn:..., request_type:...)
     domain_sensitivity: str = "LOW"   # LOW | MEDIUM | HIGH
     operational_risk: str = "NONE"     # NONE | LOW | HIGH
     risk_policy_action: RiskPolicyAction = RiskPolicyAction.DELIBERATE
     rationale: str = ""
     intent_clarity: str = "HIGH"       # For SAFE_COMPLETE routing
     misuse_plausibility: str = "LOW"
     actionability_risk: str = "LOW"
+    stated_personal_bias: bool = False            # *[impl]* intent framing / falsification (prompts.py)
+    seeks_norm_circumvention: bool = False       # *[impl]* intent framing / falsification
+    q13_protected_class_targeting: bool = False    # *[impl]* harm-topic signal q13 (protected-class differential treatment)
+    estimation_mode: str = ""                    # *[impl]* "" | "monolithic" | "parallel"
 
 class RiskCategory(Enum):
     BENIGN = "benign"
-    MORALLY_NUANCED = "morally_nuanced"  # Dilemmi etici
+    MORALLY_NUANCED = "morally_nuanced"  # Ethical dilemmas
     SENSITIVE = "sensitive"
     POTENTIALLY_HARMFUL = "potentially_harmful"
     CLEARLY_HARMFUL = "clearly_harmful"
@@ -330,8 +334,12 @@ class RiskEstimator(Protocol):
 from environment variables (`MORALSTACK_RISK_*`);
 see [modules/risk_estimator.md](modules/risk_estimator.md#environment-variables).
 
-*[impl]* In `moralstack` il protocollo usa `estimate(prompt: str)`. L'implementazione è LLM-based (Policy con prompt
-strutturato), non un classificatore leggero; i segnali sono semantici (es. `ethical_dilemma`, `harm_potential`).
+*[impl]* In `moralstack` the protocol uses `estimate(prompt: str)`. The implementation is LLM-based (structured prompts
+in `models/risk/prompts.py`): either a **monolithic** judge JSON or **three parallel mini-estimators** (intent, harm signals
+q1–q17 + `domain_sensitivity`, operational risk) when `use_parallel_estimators` is enabled; merge and calibration live in
+`calibration.py`. **Q17** (`minor_exploitation`) extends the harm scanner for grooming or contact targeting minors;
+auxiliary fields such as `stated_personal_bias`, `seeks_norm_circumvention`, and **q13** support coherence and
+falsification rules on the intent classifier.
 
 ---
 

diff --git a/docs/constitution.md b/docs/constitution.md
@@ -140,6 +140,7 @@ Overlays are in `moralstack/constitution/data/overlays/`. Supported domains:
 | Gaming           | `overlays/gaming.yaml`           | Gaming context                         |
 | Coding           | `overlays/coding.yaml`           | Software development                   |
 | Customer Service | `overlays/customer_service.yaml` | Customer service                       |
+| Violent Crime    | `overlays/violent_crime.yaml`    | Operational instructions for real-world violent or acquisitive crime — refusal-only attractor |
 
 ### 4.2 Overlay structure
 
@@ -198,10 +199,15 @@ is optional (default `false`) and backward-compatible.
 | Financial     | `overlays/financial.yaml`     |
 | Journalism    | `overlays/journalism.yaml`    |
 | Political     | `overlays/political.yaml`     |
+| Violent Crime | `overlays/violent_crime.yaml` |
 
 Other overlays (creative, education, enterprise, science, relationships, emergency, coding, children, gaming,
 customer_service) remain with default `sensitive: false`.
 
+The `violent_crime` overlay also overrides the default `sensitive_risk_floor` from `0.35` to `0.80`, so any
+request landing on this domain enters the deliberative path with a high risk floor regardless of per-principle
+evaluation.
+
 ### 4.4 Property `excluded` (domain exclusion)
 
 Overlays can declare `excluded: true` to **disable** the domain for this deployment. The field is optional (default

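Note on `sensitive_risk_floor`: the constitution.md change states the default (`0.35`) and the `violent_crime` override (`0.80`) but not how the floor is applied. The sketch below assumes the floor acts as a lower bound on the risk score for sensitive domains; `floored_risk` and `OVERLAY_FLOORS` are hypothetical names used only for illustration.

```python
DEFAULT_SENSITIVE_RISK_FLOOR = 0.35  # default stated in docs/constitution.md
OVERLAY_FLOORS = {"violent_crime": 0.80}  # override stated in the same section


def floored_risk(score, domain, sensitive=True):
    """Assumed semantics: sensitive domains raise the score to at least the floor."""
    if not sensitive:
        return score
    floor = OVERLAY_FLOORS.get(domain, DEFAULT_SENSITIVE_RISK_FLOOR)
    return max(score, floor)
```

Under this assumption, a request scored `0.10` that lands on `violent_crime` is treated as `0.80`, while the same score on another sensitive overlay is raised only to `0.35`.
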
diff --git a/docs/modules/constitution_store.md b/docs/modules/constitution_store.md
@@ -110,7 +110,7 @@ class Overlay(BaseModel):
 
 Overlays in `moralstack/constitution/data/overlays/` include: `medical`, `legal`, `financial`, `education`, `mental_health`,
 `healthcare`, `children`, `research`, `creative`, `cybersecurity`, `emergency`, `enterprise`, `journalism`, `science`,
-`political`, `relationships`, `gaming`, `coding`, `customer_service`.
+`political`, `relationships`, `gaming`, `coding`, `customer_service`, `violent_crime`.
 
 ---
 
@@ -216,9 +216,19 @@ Public API (`get_relevant_principles`, `detect_relevant_domains`, `get_debug_inf
 
 ## Domain Selection (DomainPrefilter)
 
-Domains are represented via **compact keyword maps** to minimize token consumption in LLM classification. Instead of
-long textual descriptions, the DomainPrefilter uses only keywords extracted from overlays (`keywords`) or from
-`description` (deterministic extraction). This reduces token consumption by 50–80% during domain selection.
+Domains are narrowed using **compact keyword maps** backed by YAML overlay metadata to keep token budgets small. When
+each overlay declares a human-authored `description` and the provider exposes `get_domain_descriptions()`, DomainPrefilter
+prints **`- {domain}: {description}` plus a Keywords line** in the classifier prompt (`moralstack/constitution/retriever.py`).
+A domain without a description falls back to the historical keywords-only line. When the YAML lacks an explicit
+`keywords` list, keywords are derived by deterministic extraction from `description`.
+
+Descriptions may end with **`NOT for: …`** sentences (used by many overlays under `constitution/data/overlays/*.yaml`)
+that scope a domain negatively, for example signalling that explosives requests should not collapse into a narrowly topical label.
+
+The classifier prompt instructs the model to inspect **verbatim embedded segments that use arbitrary encoding or
+obfuscation** whenever their substantive meaning can be recovered **without fabricating absent material**, and to select
+domains from that recovered meaning. When a segment is opaque or cannot be decoded, the model should return no domain
+rather than guess at intent.
 
 **Prefilter cache:** `DomainPrefilter.set_domain_keywords` is idempotent: the in-memory prefilter cache is cleared
 only when the effective keyword map changes (canonical fingerprint over sorted domains and sorted de-duplicated
@@ -240,9 +250,9 @@ Retrieval is delegated to `ConstitutionRetriever` and uses a two-stage LLM flow:
 
 ### 1. Domain Selection (DomainPrefilter)
 
-Domain keywords (from overlay `keywords` or extracted from `description`) are passed to the LLM as compact
-descriptors. The LLM selects which domains are relevant to the query. This reduces token consumption vs. long
-textual descriptions.
+The LLM sees each candidate domain through its description + Keywords entry (or keywords alone when no description
+exists) and selects at most the configured number of domains (see `max_prefilter_domains`). This stage stays cheaper in
+tokens than injecting full principle payloads.
 
 ### 2. Domain Agent Evaluation
 
@@ -350,7 +360,8 @@ moralstack/constitution/data/
     ├── relationships.yaml
     ├── gaming.yaml
     ├── coding.yaml
-    └── customer_service.yaml
+    ├── customer_service.yaml
+    └── violent_crime.yaml
 ```
 
 ---
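
Note on the DomainPrefilter prompt format: the constitution_store.md change describes a `- {domain}: {description}` line plus a Keywords line, with a keywords-only fallback. The sketch below shows one way such entries could be rendered. `format_domain_entry` is a hypothetical helper, and the exact layout of the Keywords line and of the keywords-only fallback is an assumption, not the output of `moralstack/constitution/retriever.py`.

```python
def format_domain_entry(domain, keywords, description=None):
    """Render one candidate domain for the classifier prompt (assumed layout)."""
    keyword_text = ", ".join(keywords)
    if description:
        # Description present: description line followed by a Keywords line.
        return f"- {domain}: {description}\n  Keywords: {keyword_text}"
    # No description: fall back to a keywords-only line.
    return f"- {domain}: {keyword_text}"


def format_domain_block(keyword_map, descriptions=None):
    """Join entries in sorted domain order so the prompt is deterministic."""
    descriptions = descriptions or {}
    return "\n".join(
        format_domain_entry(domain, keyword_map[domain], descriptions.get(domain))
        for domain in sorted(keyword_map)
    )
```

Sorting the domains keeps the prompt stable across runs, which also fits the idea of a canonical fingerprint over the keyword map.
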