Description
Currently, the Adaptive Multi-Dimensional Monitoring (AMDM) reference implementation abstracts away the actual data extraction and scoring logic required for text-based metrics (such as input_text and output_text).
If you look at simulate.py (lines 25-27), the code simply simulates these metrics as pure numerical streams using randomly generated distributions (Gaussian noise):
base = {
"latency": (100.0, 5.0),
"throughput": (50.0, 2.0),
"error_rate": (0.02, 0.005), # Simulating Axis 3 Robustness
"toxicity": (0.01, 0.003), # Simulating Axis 4 Safety
}
While this works for testing the mathematical models (EWMA, Mahalanobis distance), it leaves a significant gap for real-world deployment. The algorithm requires continuous numerical data (e.g., between 0.0 and 1.0), but there is no guidance or reference implementation on how to bridge the gap between raw input_text
Suggested Solution
The repository should be updated to include documentation, examples, or helper functions that demonstrate how to actually compute these scores for Axis 3 (Robustness) and Axis 4 (Safety).
Some suggested additions:
- Provide an LLM-as-a-judge stub: An example function demonstrating how to use a lightweight LLM call to score an [input_text] for prompt injection (Axis 3).
- Text classification example: A small snippet illustrating how to pass [output_text] through a toxicity classifier (e.g., detoxify or a safety framework like HELM) to generate the float required for the safety metric (Axis 4).
- Goal-drift tracking example: Documentation on how to track [tool_name] variance as a proxy for robustness degradation.
Description
Currently, the Adaptive Multi-Dimensional Monitoring (AMDM) reference implementation abstracts away the actual data extraction and scoring logic required for text-based metrics (such as
input_textandoutput_text).If you look at
simulate.py(lines 25-27), the code simply simulates these metrics as pure numerical streams using randomly generated distributions (Gaussian noise):While this works for testing the mathematical models (EWMA, Mahalanobis distance), it leaves a significant gap for real-world deployment. The algorithm requires continuous numerical data (e.g., between 0.0 and 1.0), but there is no guidance or reference implementation on how to bridge the gap between raw input_text
Suggested Solution
The repository should be updated to include documentation, examples, or helper functions that demonstrate how to actually compute these scores for Axis 3 (Robustness) and Axis 4 (Safety).
Some suggested additions: