Missing concrete implementation for text-based scoring (Axis 3 & 4) / Over-reliance on synthetic numerical streams #1

@DevolRaman750

Description

Currently, the Adaptive Multi-Dimensional Monitoring (AMDM) reference implementation abstracts away the data extraction and scoring logic needed to turn text fields (such as input_text and output_text) into metrics.

In simulate.py (lines 25-27), these metrics are simply simulated as purely numerical streams drawn from random distributions (Gaussian noise):

    base = {
        "latency": (100.0, 5.0),
        "throughput": (50.0, 2.0),
        "error_rate": (0.02, 0.005), # Simulating Axis 3 Robustness
        "toxicity": (0.01, 0.003),   # Simulating Axis 4 Safety
    }

While this works for testing the mathematical models (EWMA, Mahalanobis distance), it leaves a significant gap for real-world deployment. The algorithm requires continuous numerical data (e.g., scores between 0.0 and 1.0), but there is no guidance or reference implementation showing how to bridge the gap between raw input_text and output_text and those numerical scores.
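To make the gap concrete, here is a minimal sketch of the adapter this issue is asking about: it turns one logged event into the four numeric metrics used in the `base` dict. The function `event_to_metrics`, the `TextScorer` type, and the event field names `latency` and `throughput` are hypothetical and not part of the repository; the two scorers are left pluggable because the repository does not define them.

```python
from typing import Callable, Dict

# Hypothetical type: any function that maps raw text to a numeric score.
TextScorer = Callable[[str], float]


def clamp01(x: float) -> float:
    """Clamp a score into the [0.0, 1.0] range."""
    return max(0.0, min(1.0, float(x)))


def event_to_metrics(
    event: dict,
    robustness_scorer: TextScorer,
    safety_scorer: TextScorer,
) -> Dict[str, float]:
    """Build one numeric sample from a logged event (assumed field names)."""
    return {
        "latency": float(event["latency"]),
        "throughput": float(event["throughput"]),
        # Axis 3 (Robustness): scored from the raw input text.
        "error_rate": clamp01(robustness_scorer(event["input_text"])),
        # Axis 4 (Safety): scored from the raw output text.
        "toxicity": clamp01(safety_scorer(event["output_text"])),
    }


# Usage with placeholder scorers; real scorers would replace the lambdas.
sample = event_to_metrics(
    {"latency": 101, "throughput": 49.5, "input_text": "hi", "output_text": "ok"},
    robustness_scorer=lambda text: 0.02,
    safety_scorer=lambda text: 0.01,
)
```

Clamping keeps an out-of-range scorer from feeding values outside 0.0 to 1.0 into the monitor; whether AMDM itself needs that guarantee is an assumption here.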

Suggested Solution

The repository should be updated to include documentation, examples, or helper functions that demonstrate how to actually compute these scores for Axis 3 (Robustness) and Axis 4 (Safety).

Some suggested additions:

  • LLM-as-a-judge stub: An example function demonstrating how to use a lightweight LLM call to score an input_text for prompt injection (Axis 3).
  • Text classification example: A small snippet illustrating how to pass output_text through a toxicity classifier (e.g., detoxify or a safety framework like HELM) to generate the float required for the safety metric (Axis 4).
  • Goal-drift tracking example: Documentation on how to track tool_name variance as a proxy for robustness degradation.
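As a starting point for discussion, the three suggestions above could look roughly like the sketch below. All function names (`judge_injection_score`, `toxicity_score`, `tool_drift`) and the judge prompt are hypothetical. The LLM call and the classifier are passed in as plain callables so the sketch does not depend on any particular provider. For the goal-drift item, this uses total variation distance between tool_name frequency distributions rather than a literal variance, which is a substitution on my part.

```python
import re
from collections import Counter
from typing import Callable, Dict, Sequence


def clamp01(x: float) -> float:
    """Clamp a score into the [0.0, 1.0] range."""
    return max(0.0, min(1.0, float(x)))


def judge_injection_score(input_text: str, call_llm: Callable[[str], str]) -> float:
    """Axis 3: ask a judge model for a 0-1 prompt-injection score.

    call_llm is any function that sends a prompt and returns the reply text.
    """
    prompt = (
        "Rate the following user input from 0.0 (benign) to 1.0 "
        "(clear prompt injection). Reply with a single number.\n\n" + input_text
    )
    reply = call_llm(prompt)
    match = re.search(r"\d*\.?\d+", reply)
    # Unparseable reply: fall back to 0.5, an arbitrary "unknown" value.
    return clamp01(float(match.group())) if match else 0.5


def toxicity_score(
    output_text: str, classify: Callable[[str], Dict[str, float]]
) -> float:
    """Axis 4: read the 'toxicity' probability from a classifier's output.

    classify returns a label -> probability dict. To my understanding,
    Detoxify("original").predict has this shape, but that is an assumption.
    """
    return clamp01(classify(output_text).get("toxicity", 0.0))


def tool_drift(baseline: Sequence[str], recent: Sequence[str]) -> float:
    """Goal drift: total variation distance between tool_name distributions.

    0.0 means identical tool usage; 1.0 means completely disjoint usage.
    """
    if not baseline or not recent:
        return 0.0
    b, r = Counter(baseline), Counter(recent)
    tools = set(b) | set(r)
    return 0.5 * sum(
        abs(b[t] / len(baseline) - r[t] / len(recent)) for t in tools
    )
```

Each helper returns a float in the 0.0 to 1.0 range, so the results could be substituted for the simulated error_rate and toxicity streams, or added as an extra metric in the case of `tool_drift`.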
