design: updated design/router-decision-matrix.md#9
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds formal machine-readable specifications for the router decision matrix design document, introducing a normative YAML-based decision table and a JSON schema for structural validation. The changes aim to codify routing behavior explicitly and enable CI-based contract enforcement to prevent silent drift in routing logic.
Changes:
- Added Section 8 with a machine-readable YAML decision table specifying routing rules, failure handling, and invariants
- Added Section 9 with a JSON schema for validating the decision table structure and CI enforcement guidelines
- Removed the generic statement about versioned modifications (line 200) in favor of more specific ADR requirements in the new sections
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| { | ||
| "$schema": "https://json-schema.org/draft/2020-12/schema", | ||
| "title": "RouterDecisionMatrix", | ||
| "type": "object", | ||
| "required": ["router_version", "defaults", "rules", "invariants"], | ||
| "properties": { | ||
| "router_version": { "type": "string" }, | ||
| "defaults": { | ||
| "type": "object", | ||
| "required": ["token_threshold", "fallback_allowed"], | ||
| "properties": { | ||
| "token_threshold": { "type": "integer", "minimum": 1 }, | ||
| "fallback_allowed": { "type": "boolean" } | ||
| } | ||
| }, | ||
| "rules": { | ||
| "type": "array", | ||
| "minItems": 1, | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["id", "condition", "action"], | ||
| "properties": { | ||
| "id": { "type": "string" }, | ||
| "condition": { "type": "object" }, | ||
| "action": { "type": "object" } | ||
| } | ||
| } | ||
| }, | ||
| "failure_rules": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["id", "condition", "action"], | ||
| "properties": { | ||
| "id": { "type": "string" }, | ||
| "condition": { "type": "object" }, | ||
| "action": { "type": "object" } | ||
| } | ||
| } | ||
| }, | ||
| "invariants": { | ||
| "type": "array", | ||
| "minItems": 1, | ||
| "items": { "type": "string" } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The JSON schema does not specify "additionalProperties": false for object definitions, which means unknown fields will be silently accepted during validation. This could allow "silent contract drift" (mentioned on line 353) if extra fields are inadvertently added to the YAML.
Consider adding "additionalProperties": false to the schema objects at lines 294, 298, 310, 322 to enforce strict validation and prevent unintended fields from passing validation.
| - id: LOCAL_EXECUTION_FAILURE | ||
| condition: | ||
| route: "local" | ||
| execution_failed: true | ||
| fallback_allowed: true |
There was a problem hiding this comment.
The LOCAL_EXECUTION_FAILURE rule checks "fallback_allowed: true" as a condition (line 256), but this creates a circular dependency with the routing decision. The fallback_allowed flag should be an input from the initial routing decision, not a condition to check during failure handling.
According to Section 3 (Rule 3), fallback should occur "If route == 'local' AND execution fails" when "fallback_allowed == true". The YAML should clarify whether fallback_allowed is checked as a stored routing decision property or as a runtime condition.
| - id: LOCAL_EXECUTION_FAILURE | |
| condition: | |
| route: "local" | |
| execution_failed: true | |
| fallback_allowed: true | |
| # Failure handling is based on the stored outcome of the initial routing decision. | |
| # `initial_routing.fallback_allowed` MUST be read from the original decision, | |
| # not recomputed at failure time. | |
| - id: LOCAL_EXECUTION_FAILURE | |
| condition: | |
| route: "local" | |
| execution_failed: true | |
| initial_routing: | |
| fallback_allowed: true |
| - id: AUTO_CLOUD | ||
| condition: | ||
| privacy_level: "auto" | ||
| otherwise: true |
There was a problem hiding this comment.
The "otherwise: true" construct in the AUTO_CLOUD rule (line 245) lacks clear semantic definition. In a deterministic routing system, catch-all conditions should be explicitly defined rather than using an implicit "otherwise" keyword.
Consider either:
- Explicitly defining the negation of AUTO_LOCAL conditions
- Documenting that "otherwise: true" means "evaluate this rule if no previous rules matched"
- Adding this construct to the JSON schema validation with specific semantics
This is important for the stated goal of deterministic, machine-validatable routing behavior.
| "condition": { "type": "object" }, | ||
| "action": { "type": "object" } | ||
| } | ||
| } | ||
| }, | ||
| "failure_rules": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "object", | ||
| "required": ["id", "condition", "action"], | ||
| "properties": { | ||
| "id": { "type": "string" }, | ||
| "condition": { "type": "object" }, | ||
| "action": { "type": "object" } |
There was a problem hiding this comment.
The JSON schema defines "condition" and "action" as generic objects with no structural constraints (lines 314-315, 326-327). This provides no validation of the actual routing logic and defeats the purpose of preventing "malformed rule definitions" (line 352).
To enable meaningful CI validation, the schema should define the expected properties for conditions (e.g., privacy_level, token_count_lte_threshold, local_model_available) and actions (e.g., route, fallback_allowed, return_error, log_escalation), potentially using oneOf or anyOf for different rule patterns.
| - id: AUTO_CLOUD | ||
| condition: | ||
| privacy_level: "auto" | ||
| otherwise: true | ||
| action: | ||
| route: "cloud" | ||
| fallback_allowed: false |
There was a problem hiding this comment.
The AUTO_LOCAL rule (lines 232-240) does not include a network_state condition, but Section 2 (lines 72-79) states "Network state MUST be checked before cloud route selection." Additionally, Section 3 Rule 2 explicitly states for auto mode: "network_state is irrelevant" (line 117).
However, if the system routes to cloud in AUTO_CLOUD mode without checking network availability, cloud execution could fail unnecessarily. Consider adding network_state validation either as a condition in AUTO_CLOUD or as a separate validation step before cloud execution.
| CI SHOULD: | ||
|
|
||
| 1. Extract the YAML block from Section 8. | ||
| 2. Convert YAML to JSON deterministically. | ||
| 3. Validate against this schema. | ||
| 4. Fail build if validation fails. |
There was a problem hiding this comment.
The CI enforcement policy uses "SHOULD" (line 342) which is a weak recommendation according to RFC 2119, but then states that changes "MUST pass schema validation before merge" (line 355). This creates ambiguity about whether CI validation is optional or mandatory.
For a formal contract enforcement mechanism, consider changing "CI SHOULD:" to "CI MUST:" to match the mandatory nature of validation stated in line 355.
| - deterministic: true | ||
| - no_probabilistic_routing: true | ||
| - no_hidden_fallback: true | ||
| - no_recursive_routing: true |
There was a problem hiding this comment.
The YAML invariants structure is inconsistent with the JSON schema definition. The YAML defines invariants as key-value pairs (objects with boolean/string values like "deterministic: true"), but the JSON schema on line 334 expects an array of strings. This inconsistency will cause schema validation to fail.
Either change the YAML invariants to be an array of strings (e.g., ["deterministic", "no_probabilistic_routing", ...]) or update the JSON schema to expect an array of objects with appropriate properties.
| - deterministic: true | |
| - no_probabilistic_routing: true | |
| - no_hidden_fallback: true | |
| - no_recursive_routing: true | |
| - deterministic | |
| - no_probabilistic_routing | |
| - no_hidden_fallback | |
| - no_recursive_routing |
| - id: AUTO_LOCAL | ||
| condition: | ||
| privacy_level: "auto" | ||
| token_count_lte_threshold: true |
There was a problem hiding this comment.
The condition "token_count_lte_threshold: true" is ambiguous in a machine-readable context. The YAML uses a boolean value to represent a comparison operation, which may not be suitable for automated validation. Consider using a more explicit structure like "token_count: { lte: "threshold" }" or documenting that boolean conditions represent state checks rather than comparisons.
This affects rules AUTO_LOCAL (lines 232-240) where token_count_lte_threshold, local_model_available, and intent_supported_by_local are all boolean flags representing computed conditions.
| token_count_lte_threshold: true | |
| token_count: | |
| lte: "token_threshold" |
This pull request adds a formal, machine-readable specification for the router decision matrix, along with a JSON schema for structural validation and CI enforcement guidelines. These changes ensure that routing behavior is explicitly defined, versioned, and validated to prevent contract drift or unintended changes.
Router Decision Matrix Formalization:
Continuous Integration (CI) Enforcement:
Documentation and Process: