Merged
8 changes: 7 additions & 1 deletion .env.minimal
@@ -57,7 +57,7 @@ MORALSTACK_UI_PASSWORD=admin
MORALSTACK_RISK_MODEL=gpt-4o-mini
# if MORALSTACK_RISK_PARALLEL_ESTIMATORS = true then the following models are used for parallel estimation
MORALSTACK_RISK_INTENT_MODEL=gpt-4o
MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o-mini
MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o
MORALSTACK_RISK_OPERATIONAL_MODEL=gpt-4o-mini
MORALSTACK_RISK_LOW_THRESHOLD=0.25
MORALSTACK_RISK_MEDIUM_THRESHOLD=0.65
@@ -180,6 +180,12 @@ MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_WEIGHTED_APPROVAL=0.78
MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MAX_SEMANTIC_HARM=0.35
MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_PER_PERSPECTIVE_APPROVAL=0.70

# -----------------------------------------------------------------------------
# OpenAI-compatible bridge server (scripts/openai_compatible_server.py)
# -----------------------------------------------------------------------------
MORALSTACK_OPENAI_COMPATIBLE_API_HOST=localhost
MORALSTACK_OPENAI_COMPATIBLE_API_PORT=8787

# -----------------------------------------------------------------------------
# Tracing & Debug
# -----------------------------------------------------------------------------
17 changes: 15 additions & 2 deletions .env.template
@@ -73,9 +73,10 @@ MORALSTACK_UI_PASSWORD=
# See docs/modules/risk_estimator.md for full documentation of each variable.
# Model for the semantic judge (if set, overrides OPENAI_MODEL for risk only)
# MORALSTACK_RISK_MODEL=gpt-4o-mini
# if MORALSTACK_RISK_PARALLEL_ESTIMATORS = true then the following models are used for parallel estimation
# When parallel estimation is enabled, optional per-slot overrides below apply.
# If a slot is unset or empty, it inherits MORALSTACK_RISK_MODEL when set, else OPENAI_MODEL, else gpt-4o.
# MORALSTACK_RISK_INTENT_MODEL=gpt-4o
# MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o-mini
# MORALSTACK_RISK_SIGNALS_MODEL=gpt-4o
# MORALSTACK_RISK_OPERATIONAL_MODEL=gpt-4o-mini
# MORALSTACK_RISK_LOW_THRESHOLD=0.25
# MORALSTACK_RISK_MEDIUM_THRESHOLD=0.65
@@ -204,6 +205,18 @@ MORALSTACK_UI_PASSWORD=
# MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MAX_SEMANTIC_HARM=0.35
# MORALSTACK_ORCHESTRATOR_CYCLE1_EARLY_CONVERGENCE_MIN_PER_PERSPECTIVE_APPROVAL=0.70

# -----------------------------------------------------------------------------
# OpenAI-compatible bridge server (scripts/openai_compatible_server.py)
# -----------------------------------------------------------------------------
# Host and port for the standalone OpenAI-compatible FastAPI bridge.
# Used to expose MoralStack as an OpenAI-compatible endpoint (e.g. for COMPL-AI).
# MORALSTACK_OPENAI_COMPATIBLE_API_HOST=localhost
# MORALSTACK_OPENAI_COMPATIBLE_API_PORT=8787
# Max concurrent in-flight requests accepted by the bridge before returning overload (HTTP 503).
# MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT=8
# Retry-After seconds for temporary overload responses (HTTP 503).
# MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS=10
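
The two overload settings above can be read as a simple admission gate. The following is a minimal sketch under stated assumptions, not the bridge's actual code: `InflightGate`, `gate_from_env`, and their methods are hypothetical names. Only the variable names, the defaults (8 and 10), and the HTTP 503 with `Retry-After` behaviour come from the template.

```python
from __future__ import annotations

import os
import threading
from typing import Mapping


class InflightGate:
    """Hypothetical admission gate: refuses new work once max_inflight requests are active."""

    def __init__(self, max_inflight: int, retry_after_seconds: int) -> None:
        self.max_inflight = max_inflight
        self.retry_after_seconds = retry_after_seconds
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; return False when the gate is already full."""
        with self._lock:
            if self._active >= self.max_inflight:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Free a slot after a request finishes."""
        with self._lock:
            self._active = max(0, self._active - 1)

    def overload_response(self) -> tuple[int, dict[str, str]]:
        """Status code and headers for a temporary-overload reply."""
        return 503, {"Retry-After": str(self.retry_after_seconds)}


def gate_from_env(env: Mapping[str, str] | None = None) -> InflightGate:
    """Build a gate from the documented variables, using the template defaults."""
    env = os.environ if env is None else env
    return InflightGate(
        max_inflight=int(env.get("MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT", "8")),
        retry_after_seconds=int(env.get("MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS", "10")),
    )
```

A request handler would call `try_acquire()` before doing any work, return `overload_response()` when it fails, and call `release()` in a `finally` block otherwise.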

# -----------------------------------------------------------------------------
# Tracing & Debug
# -----------------------------------------------------------------------------
13 changes: 9 additions & 4 deletions INSTALL.md
@@ -158,12 +158,17 @@ See [docs/modules/openai_params.md](docs/modules/openai_params.md) for details a
| MORALSTACK_UI_USERNAME | - | Basic Auth for UI (required when running moralstack-ui) |
| MORALSTACK_UI_PASSWORD | - | Basic Auth for UI |
| MORALSTACK_CONSTITUTION_MAX_PARALLEL_AGENTS | 2 | Parallel domain agents for constitution retrieval |
| MORALSTACK_OPENAI_COMPATIBLE_MAX_INFLIGHT | 8 | OpenAI-compatible bridge max in-flight requests before HTTP 503 |
| MORALSTACK_OPENAI_COMPATIBLE_RETRY_AFTER_SECONDS | 10 | Retry-After seconds returned by OpenAI-compatible bridge overload responses |
| MORALSTACK_VERBOSE | - | Set to 1 for verbose output |

**Risk Estimator**: Optional overrides (e.g. `MORALSTACK_RISK_MODEL`, `MORALSTACK_RISK_LOW_THRESHOLD`,
`MORALSTACK_RISK_MEDIUM_THRESHOLD`, `MORALSTACK_RISK_MAX_RETRIES`, …) are listed in `.env.template` and fully documented
in [docs/modules/risk_estimator.md](docs/modules/risk_estimator.md#environment-variables). Leave them commented to use
built-in defaults (risk estimator uses the same model as `OPENAI_MODEL` when `MORALSTACK_RISK_MODEL` is not set). **In
**Risk Estimator**: Optional overrides (e.g. `MORALSTACK_RISK_MODEL`, `MORALSTACK_RISK_PARALLEL_ESTIMATORS`,
`MORALSTACK_RISK_INTENT_MODEL`, `MORALSTACK_RISK_SIGNALS_MODEL`, `MORALSTACK_RISK_OPERATIONAL_MODEL`,
`MORALSTACK_RISK_LOW_THRESHOLD`, `MORALSTACK_RISK_MEDIUM_THRESHOLD`, `MORALSTACK_RISK_MAX_RETRIES`, …) are listed in
`.env.template` and fully documented in [docs/modules/risk_estimator.md](docs/modules/risk_estimator.md#environment-variables). Leave them commented to use
built-in defaults (risk estimator uses the same model as `OPENAI_MODEL` when `MORALSTACK_RISK_MODEL` is not set). With
parallel estimators enabled, each optional `MORALSTACK_RISK_*_MODEL` slot falls back to `MORALSTACK_RISK_MODEL` if set,
otherwise `OPENAI_MODEL`, otherwise `gpt-4o`. **In
both CLI run and benchmark, risk configuration is read only from the environment (`.env`); there is no CLI override —
env is the single source of configuration.**
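
The fallback order described above (per-slot variable, then `MORALSTACK_RISK_MODEL`, then `OPENAI_MODEL`, then `gpt-4o`) can be sketched as follows. This is an illustration only: `resolve_slot_model` and `SLOT_VARS` are hypothetical names, and treating whitespace-only values as empty is an assumption rather than documented behaviour.

```python
from __future__ import annotations

import os
from typing import Mapping

# Per-slot override variables named in INSTALL.md and .env.template.
SLOT_VARS = {
    "intent": "MORALSTACK_RISK_INTENT_MODEL",
    "signals": "MORALSTACK_RISK_SIGNALS_MODEL",
    "operational": "MORALSTACK_RISK_OPERATIONAL_MODEL",
}


def resolve_slot_model(slot: str, env: Mapping[str, str] | None = None) -> str:
    """Return the model for one parallel-estimator slot (hypothetical helper)."""
    env = os.environ if env is None else env
    # Walk the documented chain and take the first non-empty value.
    for var in (SLOT_VARS[slot], "MORALSTACK_RISK_MODEL", "OPENAI_MODEL"):
        value = (env.get(var) or "").strip()  # assumption: blank counts as unset
        if value:
            return value
    return "gpt-4o"  # documented last-resort default
```

For example, with only `OPENAI_MODEL` set, all three slots resolve to that model; setting `MORALSTACK_RISK_SIGNALS_MODEL` changes the signals slot alone.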

2 changes: 1 addition & 1 deletion README.md
@@ -88,7 +88,7 @@ Request
[Risk Estimator] ─────────── parallel mini-estimators:
│ intent · operational risk · signal detection
│ intent · signal detection (q1–q17) · operational risk
[Policy Router] ──────────── applies domain overlay, computes action bounds
16 changes: 12 additions & 4 deletions docs/architecture_spec.md
@@ -295,18 +295,22 @@ class RiskEstimation:
score: float # [0, 1]
confidence: float # [0, 1]
risk_category: RiskCategory
semantic_signals: list[str] # *[impl]* alias triggered_signals
semantic_signals: list[str] # *[impl]* alias triggered_signals; calibrated strings (e.g. Qn:..., request_type:...)
domain_sensitivity: str = "LOW" # LOW | MEDIUM | HIGH
operational_risk: str = "NONE" # NONE | LOW | HIGH
risk_policy_action: RiskPolicyAction = RiskPolicyAction.DELIBERATE
rationale: str = ""
intent_clarity: str = "HIGH" # For SAFE_COMPLETE routing
misuse_plausibility: str = "LOW"
actionability_risk: str = "LOW"
stated_personal_bias: bool = False # *[impl]* intent framing / falsification (prompts.py)
seeks_norm_circumvention: bool = False # *[impl]* intent framing / falsification
q13_protected_class_targeting: bool = False # *[impl]* harm-topic signal q13 (protected-class differential treatment)
estimation_mode: str = "" # *[impl]* "" | "monolithic" | "parallel"

class RiskCategory(Enum):
BENIGN = "benign"
MORALLY_NUANCED = "morally_nuanced" # Dilemmi etici
MORALLY_NUANCED = "morally_nuanced" # Ethical dilemmas
SENSITIVE = "sensitive"
POTENTIALLY_HARMFUL = "potentially_harmful"
CLEARLY_HARMFUL = "clearly_harmful"
@@ -330,8 +334,12 @@ class RiskEstimator(Protocol):
from environment variables (`MORALSTACK_RISK_*`);
see [modules/risk_estimator.md](modules/risk_estimator.md#environment-variables).

*[impl]* In `moralstack` the protocol uses `estimate(prompt: str)`. The implementation is LLM-based (Policy with a
structured prompt), not a lightweight classifier; the signals are semantic (e.g. `ethical_dilemma`, `harm_potential`).
*[impl]* In `moralstack` the protocol uses `estimate(prompt: str)`. The implementation is LLM-based (structured prompts
in `models/risk/prompts.py`): either a **monolithic** judge JSON or **three parallel mini-estimators** (intent, harm signals
q1–q17 + `domain_sensitivity`, operational risk) when `use_parallel_estimators` is enabled; merge and calibration live in
`calibration.py`. **Q17** (`minor_exploitation`) extends the harm scanner for grooming or contact targeting minors;
auxiliary fields such as `stated_personal_bias`, `seeks_norm_circumvention`, and **q13** support coherence and
falsification rules on the intent classifier.
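
The parallel mode described above can be pictured with a small fan-out sketch. It assumes nothing about the real prompts, merge, or calibration logic: the three stub functions and `run_parallel_estimators` are hypothetical, and the partial results are simply collected rather than merged the way `calibration.py` would.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

MiniEstimator = Callable[[str], dict]


# Hypothetical stand-ins for the three LLM-backed mini-estimators.
def intent_stub(prompt: str) -> dict:
    return {"intent_clarity": "HIGH"}


def signals_stub(prompt: str) -> dict:
    return {"semantic_signals": [], "domain_sensitivity": "LOW"}


def operational_stub(prompt: str) -> dict:
    return {"operational_risk": "NONE"}


def run_parallel_estimators(prompt: str, estimators: Dict[str, MiniEstimator]) -> dict:
    """Run every mini-estimator concurrently and collect the partial results by slot."""
    with ThreadPoolExecutor(max_workers=len(estimators)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in estimators.items()}
        partials = {name: future.result() for name, future in futures.items()}
    # Merge and calibration are out of scope here; only the mode is recorded.
    return {"estimation_mode": "parallel", "partials": partials}


result = run_parallel_estimators(
    "example prompt",
    {"intent": intent_stub, "signals": signals_stub, "operational": operational_stub},
)
```

Threads suit this shape because each real estimator would spend its time waiting on a network call, so the three requests overlap instead of running back to back.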

---

6 changes: 6 additions & 0 deletions docs/constitution.md
@@ -140,6 +140,7 @@ Overlays are in `moralstack/constitution/data/overlays/`. Supported domains:
| Gaming | `overlays/gaming.yaml` | Gaming context |
| Coding | `overlays/coding.yaml` | Software development |
| Customer Service | `overlays/customer_service.yaml` | Customer service |
| Violent Crime | `overlays/violent_crime.yaml` | Operational instructions for real-world violent or acquisitive crime — refusal-only attractor |

### 4.2 Overlay structure

@@ -198,10 +199,15 @@ is optional (default `false`) and backward-compatible.
| Financial | `overlays/financial.yaml` |
| Journalism | `overlays/journalism.yaml` |
| Political | `overlays/political.yaml` |
| Violent Crime | `overlays/violent_crime.yaml` |

Other overlays (creative, education, enterprise, science, relationships, emergency, coding, children, gaming,
customer_service) remain with default `sensitive: false`.

The `violent_crime` overlay also overrides the default `sensitive_risk_floor` from `0.35` to `0.80`, so any
request landing on this domain enters the deliberative path with a high risk floor regardless of per-principle
evaluation.
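
One way to read the floor override is as a lower bound applied to the risk score for sensitive domains. The sketch below assumes that the floor is combined with the score via `max`, which the source does not state; `apply_risk_floor` and the dictionary shape of the overlay are hypothetical. The values `0.35` and `0.80` are taken from the text.

```python
DEFAULT_SENSITIVE_RISK_FLOOR = 0.35  # default floor stated in the docs


def apply_risk_floor(score: float, overlay: dict) -> float:
    """Raise the score to the overlay's floor when the overlay is marked sensitive."""
    if not overlay.get("sensitive", False):
        return score  # non-sensitive overlays leave the score untouched
    floor = overlay.get("sensitive_risk_floor", DEFAULT_SENSITIVE_RISK_FLOOR)
    return max(score, floor)  # assumption: the floor acts as a simple lower bound


violent_crime = {"sensitive": True, "sensitive_risk_floor": 0.80}
medical = {"sensitive": True}  # inherits the default floor
gaming = {}  # default sensitive: false
```

Under this reading, a low raw score of `0.10` becomes `0.80` on `violent_crime`, `0.35` on `medical`, and stays `0.10` on `gaming`.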

### 4.4 Property `excluded` (domain exclusion)

Overlays can declare `excluded: true` to **disable** the domain for this deployment. The field is optional (default
27 changes: 19 additions & 8 deletions docs/modules/constitution_store.md
@@ -110,7 +110,7 @@ class Overlay(BaseModel):

Overlays in `moralstack/constitution/data/overlays/` include: `medical`, `legal`, `financial`, `education`, `mental_health`,
`healthcare`, `children`, `research`, `creative`, `cybersecurity`, `emergency`, `enterprise`, `journalism`, `science`,
`political`, `relationships`, `gaming`, `coding`, `customer_service`.
`political`, `relationships`, `gaming`, `coding`, `customer_service`, `violent_crime`.

---

@@ -216,9 +216,19 @@ Public API (`get_relevant_principles`, `detect_relevant_domains`, `get_debug_inf

## Domain Selection (DomainPrefilter)

Domains are represented via **compact keyword maps** to minimize token consumption in LLM classification. Instead of
long textual descriptions, the DomainPrefilter uses only keywords extracted from overlays (`keywords`) or from
`description` (deterministic extraction). This reduces token consumption by 50–80% during domain selection.
Domains are narrowed using **compact keyword maps** backed by YAML overlay metadata to keep token budgets small. When
each overlay declares a human-authored `description` and the provider exposes `get_domain_descriptions()`, DomainPrefilter
prints **`- {domain}: {description}` plus a Keywords line** in the classifier prompt (`moralstack/constitution/retriever.py`).
Missing descriptions cleanly fall back to the historical keywords-only line. Keywords may likewise be derived purely from
deterministic extraction when YAML lacks explicit lists.

Descriptions may embed trailing **`NOT for: …`** sentences (see many overlays beneath `constitution/data/overlays/*.yaml`)
to steer negative scoping—for example signalling that explosives requests should not collapse into a narrowly topical label.

The classifier prompt instructs the model to treat **verbatim embedded segments that use arbitrary encoding or
obfuscation** as inspectable when substantive meaning can be recovered **without fabricating absent material**; domains
should follow that recovered substance. Opaque or non-recoverable material should favour empty-domain conservatism rather
than guessed intent.
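
The prompt layout described above can be sketched as a small rendering helper. This is not the code in `moralstack/constitution/retriever.py`: `render_domain_lines` is a hypothetical name, and both the indentation of the Keywords line and the exact shape of the keywords-only fallback are assumptions.

```python
from typing import Dict, List, Optional


def render_domain_lines(
    keywords: Dict[str, List[str]],
    descriptions: Optional[Dict[str, str]] = None,
) -> str:
    """Render one prompt entry per domain, preferring description plus keywords."""
    descriptions = descriptions or {}
    lines: List[str] = []
    for domain in sorted(keywords):
        joined = ", ".join(keywords[domain])
        description = (descriptions.get(domain) or "").strip()
        if description:
            # Description available: "- {domain}: {description}" plus a Keywords line.
            lines.append(f"- {domain}: {description}")
            lines.append(f"  Keywords: {joined}")
        else:
            # No description: fall back to a keywords-only entry (assumed format).
            lines.append(f"- {domain}: {joined}")
    return "\n".join(lines)


prompt_block = render_domain_lines(
    {"coding": ["python", "bug"], "violent_crime": ["robbery"]},
    {"violent_crime": "Operational instructions for real-world violent or acquisitive crime"},
)
```

Sorting the domains keeps the rendered prompt stable across runs, which matters if the prompt text is used as part of a cache key.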

**Prefilter cache:** `DomainPrefilter.set_domain_keywords` is idempotent: the in-memory prefilter cache is cleared
only when the effective keyword map changes (canonical fingerprint over sorted domains and sorted de-duplicated
@@ -240,9 +250,9 @@ Retrieval is delegated to `ConstitutionRetriever` and uses a two-stage LLM flow:

### 1. Domain Selection (DomainPrefilter)

Domain keywords (from overlay `keywords` or extracted from `description`) are passed to the LLM as compact
descriptors. The LLM selects which domains are relevant to the query. This reduces token consumption vs. long
textual descriptions.
The LLM sees each candidate domain primarily through the description + Keywords bundle when present, selecting up to the
configured cap (see `max_prefilter_domains`). Token usage remains constrained relative to injecting full principle payloads
at this stage.

### 2. Domain Agent Evaluation

@@ -350,7 +360,8 @@ moralstack/constitution/data/
├── relationships.yaml
├── gaming.yaml
├── coding.yaml
└── customer_service.yaml
├── customer_service.yaml
└── violent_crime.yaml
```

---