feat: add context-aware adaptive risk scoring #157
📝 Walkthrough
Introduces a YAML-configurable
Changes
Adaptive Risk Scoring Engine
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_risk_scoring.py (1)
65-174: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add regression tests for policy validation and negative-value clamping.
Current tests miss two critical contracts: rejecting invalid policy weights/thresholds and flooring negative continuous inputs to 0.0. Adding these will prevent silent scoring regressions.
Suggested test cases
+def test_yaml_invalid_weight_sum_raises(tmp_path):
+    bad = tmp_path / "risk_policy.yaml"
+    bad.write_text(
+        _VALID_POLICY.replace("weight: 0.10", "weight: 0.50"),
+        encoding="utf-8",
+    )
+    with pytest.raises(ValueError):
+        AdaptiveRiskScorer(policy_path=bad)
+
+
+def test_negative_continuous_signals_floor_to_zero(scorer):
+    normalized = scorer._normalize_signals({
+        "repeated_approach": -3,
+        "loitering": -15.0,
+    })
+    assert normalized["repeated_approach"] == 0.0
+    assert normalized["loitering"] == 0.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_risk_scoring.py` around lines 65 - 174, Add two new test functions to cover missing regression scenarios for the AdaptiveRiskScorer class. First, create a test that verifies the scorer rejects or handles invalid policy weights and thresholds that do not sum to 1.0 or fall outside acceptable ranges. Second, add a test that passes negative values for continuous signal inputs (like negative loitering or repeated_approach values) to the _normalize_signals or score methods and verifies they are clamped to 0.0 rather than producing incorrect results. These tests will ensure the implementation properly validates configuration and handles edge cases in signal normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/reasoning/pipeline.py`:
- Around line 239-241: The is_after_hours calculation at line 239-241 uses
datetime.datetime.now().hour which depends on the server's current time, causing
the same event to be scored differently based on when the pipeline runs. Instead
of using datetime.datetime.now().hour, extract the event timestamp, apply the
configured timezone to it, and then check if the event's hour falls within the
configured after-hours window (currently hardcoded as 20 or less than 6).
Replace the server time dependency with event-based time logic that uses both
the event timestamp and timezone configuration.
- Line 236: The approach_count variable is currently counting the string
"repeated_approach" in seq.action_summary, which is fragile and format-dependent
and can lead to incorrect counts. Instead, iterate through seq.events directly
and count the number of events that contain ActionHint.REPEATED_APPROACH to
extract this signal in a stable, structured way that is independent of text
formatting.
In `@services/reasoning/risk_scoring.py`:
- Around line 166-167: The condition at lines 166-167 that checks if value > 0.0
is too permissive for the reasoning_confidence factor, causing low confidence
values like 0.1 to incorrectly receive the "High reasoning confidence" label.
Replace the simple value > 0.0 check with a more appropriate threshold (such as
checking if the value exceeds a meaningful confidence cutoff like 0.5 or 0.7) to
ensure only actually high confidence values produce that factor label.
Alternatively, implement conditional logic to apply different factor labels
based on confidence tiers if low and medium confidence levels should be
distinguished.
- Around line 139-140: The normalization expressions for "repeated_approach" and
"loitering" only clamp to the upper bound of 1.0 using min(), but do not clamp
to the lower bound of 0.0. This allows negative normalized values to be
returned, which can suppress total risk calculation. Wrap both the
"repeated_approach" and "loitering" normalization expressions with an additional
max() function call to ensure the result is clamped between 0.0 and 1.0,
preventing negative values from affecting the risk scoring.
- Around line 195-201: The _load_policy method currently only validates that
required sections exist in the policy data but does not enforce constraints on
the actual values within those sections (weights, thresholds, normalization
values). Add comprehensive schema validation after the section presence checks
to verify that weights and thresholds are valid numeric values within acceptable
ranges, and that normalization parameters meet expected constraints. This
ensures invalid configurations are caught at load-time rather than causing
silent failures during risk score calculations.
- Line 29: The risk_scoring.py module imports the yaml module, but PyYAML is not
declared as a dependency in services/reasoning/requirements.txt. This will cause
a ModuleNotFoundError when the service is deployed independently. Add the line
pyyaml>=6.0 to the services/reasoning/requirements.txt file to ensure the
required dependency is installed.
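The two value-handling findings for `services/reasoning/risk_scoring.py` above (two-sided clamping and a stricter confidence cutoff) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `normalize_ratio` and `confidence_label` and the default cutoff of 0.7 are assumptions chosen for the example.

```python
from typing import Optional


def normalize_ratio(value: float, max_value: float) -> float:
    """Scale value by max_value and clamp the result to [0.0, 1.0].

    Hypothetical helper: the max() call adds the lower bound that the
    review says is missing, so negative inputs floor to 0.0.
    """
    if max_value <= 0:
        raise ValueError("max_value must be > 0")
    return max(0.0, min(1.0, value / max_value))


def confidence_label(confidence: float, high_cutoff: float = 0.7) -> Optional[str]:
    """Return the factor label only when confidence meets the cutoff.

    The 0.7 default is an assumption; the review mentions 0.5 or 0.7
    as possible thresholds.
    """
    if confidence >= high_cutoff:
        return "High reasoning confidence"
    return None


print(normalize_ratio(-3, 5))    # negative count floors to 0.0
print(normalize_ratio(12, 5))    # oversized count caps at 1.0
print(confidence_label(0.1))     # low confidence gets no label
```

Keeping the cutoff as a parameter would let it move into the YAML policy later if the team prefers configuration over a hardcoded value.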
---
Outside diff comments:
In `@tests/test_risk_scoring.py`:
- Around line 65-174: Add two new test functions to cover missing regression
scenarios for the AdaptiveRiskScorer class. First, create a test that verifies
the scorer rejects or handles invalid policy weights and thresholds that do not
sum to 1.0 or fall outside acceptable ranges. Second, add a test that passes
negative values for continuous signal inputs (like negative loitering or
repeated_approach values) to the _normalize_signals or score methods and
verifies they are clamped to 0.0 rather than producing incorrect results. These
tests will ensure the implementation properly validates configuration and
handles edge cases in signal normalization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 23b60538-b348-4b80-a680-838d2c067487
📒 Files selected for processing (4)
configs/risk_policy.yaml
services/reasoning/pipeline.py
services/reasoning/risk_scoring.py
tests/test_risk_scoring.py
import datetime
current_hour = datetime.datetime.now().hour
is_after_hours = current_hour >= 20 or current_hour < 6
After-hours currently depends on server clock, not event time.
Lines 239–241 make severity depend on when the pipeline runs, so replaying the same track at a different hour can produce a different score. Compute after-hours from the event timestamp plus a configured timezone and window instead.
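One way to derive the flag from event time is sketched below. It is only an illustration under stated assumptions: the function name `is_after_hours`, its parameters, and the rule that naive timestamps are treated as UTC are hypothetical, and the 20:00 to 06:00 window simply mirrors the values hardcoded in the snippet above.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def is_after_hours(
    event_ts: datetime,
    local_tz: tzinfo = timezone.utc,
    start_hour: int = 20,
    end_hour: int = 6,
) -> bool:
    """Return True if the event's local hour falls in the after-hours window."""
    if event_ts.tzinfo is None:
        # Assumption: naive event timestamps are recorded in UTC.
        event_ts = event_ts.replace(tzinfo=timezone.utc)
    local_hour = event_ts.astimezone(local_tz).hour
    if start_hour <= end_hour:
        # Window within a single day, e.g. 01:00 to 05:00.
        return start_hour <= local_hour < end_hour
    # Window wraps past midnight, e.g. 20:00 to 06:00.
    return local_hour >= start_hour or local_hour < end_hour


# The same event is scored identically no matter when the pipeline runs.
site_tz = timezone(timedelta(hours=5, minutes=30))  # example fixed offset
event = datetime(2024, 1, 1, 16, 0, tzinfo=timezone.utc)
print(is_after_hours(event, site_tz))
```

Any `tzinfo` works here, so a `zoneinfo.ZoneInfo` instance built from a configured zone name could be passed in place of the fixed offset.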
from pathlib import Path
from typing import Any, TypedDict

import yaml
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Candidate dependency manifests =="
fd -HI 'pyproject.toml|requirements.*|Pipfile|setup.py|poetry.lock'
echo
echo "== Entries mentioning yaml/pyyaml =="
rg -n --hidden \
  --iglob 'pyproject.toml' \
  --iglob 'requirements*.txt' \
  --iglob 'Pipfile' \
  --iglob 'setup.py' \
  --iglob 'poetry.lock' \
  '(?i)\bpyyaml\b|\byaml\b'
Repository: Devnil434/Eagle
Length of output: 14533
🏁 Script executed:
cat services/reasoning/requirements.txt
Repository: Devnil434/Eagle
Length of output: 148
Add PyYAML to services/reasoning/requirements.txt.
The module imports yaml but PyYAML is not listed in the service's dependency manifest. If services/reasoning/ is deployed independently, startup will fail with ModuleNotFoundError. Add pyyaml>=6.0 to services/reasoning/requirements.txt.
        for section in ("risk_scoring", "risk_levels", "normalization"):
            if section not in data:
                raise ValueError(
                    f"Risk policy missing required section: '{section}'"
                )

        return data
Enforce full policy schema constraints at load-time.
_load_policy currently checks section presence only. Invalid weights/thresholds/normalization values pass startup and produce silently wrong risk scores.
Proposed hard validation in loader
         for section in ("risk_scoring", "risk_levels", "normalization"):
             if section not in data:
                 raise ValueError(
                     f"Risk policy missing required section: '{section}'"
                 )
+
+        weights = []
+        for signal, cfg in data["risk_scoring"].items():
+            weight = cfg.get("weight")
+            if not isinstance(weight, (int, float)) or weight < 0:
+                raise ValueError(f"Invalid weight for '{signal}': {weight}")
+            weights.append(float(weight))
+        if abs(sum(weights) - 1.0) > 1e-6:
+            raise ValueError("Risk policy weights must sum to 1.0")
+
+        low_max = data["risk_levels"].get("low_max")
+        medium_max = data["risk_levels"].get("medium_max")
+        if not (isinstance(low_max, (int, float)) and isinstance(medium_max, (int, float))):
+            raise ValueError("risk_levels.low_max and medium_max must be numeric")
+        if not (0 <= low_max <= medium_max <= 100):
+            raise ValueError("Risk thresholds must satisfy 0 <= low_max <= medium_max <= 100")
+
+        loitering_max = data["normalization"].get("loitering_max_seconds")
+        approach_max = data["normalization"].get("repeated_approach_max_count")
+        if not (isinstance(loitering_max, (int, float)) and loitering_max > 0):
+            raise ValueError("normalization.loitering_max_seconds must be > 0")
+        if not (isinstance(approach_max, (int, float)) and approach_max > 0):
+            raise ValueError("normalization.repeated_approach_max_count must be > 0")
         return data
Related Issue
Closes #152
Overview
This PR introduces a Context-Aware Adaptive Risk Scoring Engine that replaces the existing static severity calculation approach with a configurable, policy-driven scoring mechanism.
The implementation enhances alert prioritization by considering contextual signals such as restricted-zone presence, repeated approaches, loitering behavior, after-hours activity, and reasoning confidence while preserving all existing workflows, APIs, and frontend functionality.
What Changed
1. Added YAML-Based Risk Policy Configuration
New File
Introduced configurable risk weights for contextual signals:
This allows risk-scoring behavior to be adjusted without modifying application code.
2. Added Adaptive Risk Scoring Engine
New File
Implemented a reusable AdaptiveRiskScorer that:
The scorer is designed to be modular and extensible for future risk signals.
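As a rough illustration of how a policy-driven scorer of this kind could combine signals, the sketch below computes a weighted sum of normalized signals and maps it to a level. It is not the PR's implementation: the function names, the example weights, and the 30/70 thresholds are all assumptions made for the example. Only the signal names and the `low_max` / `medium_max` keys come from this PR's description and review.

```python
def weighted_risk_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine normalized signals (each expected in [0, 1]) into a 0-100 score."""
    total = 0.0
    for name, weight in weights.items():
        # Missing signals count as 0.0; values are clamped defensively.
        value = max(0.0, min(1.0, signals.get(name, 0.0)))
        total += weight * value
    return round(100.0 * total, 2)


def risk_level(score: float, low_max: float = 30, medium_max: float = 70) -> str:
    """Map a 0-100 score to a level; the default thresholds are hypothetical."""
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"


# Hypothetical weights that sum to 1.0, one per contextual signal.
EXAMPLE_WEIGHTS = {
    "restricted_zone": 0.35,
    "repeated_approach": 0.20,
    "loitering": 0.20,
    "after_hours": 0.15,
    "reasoning_confidence": 0.10,
}

score = weighted_risk_score({"restricted_zone": 1.0, "loitering": 0.5}, EXAMPLE_WEIGHTS)
print(score, risk_level(score))
```

Because the weights live in data rather than code, moving them into a YAML policy changes scoring behavior without touching the scorer itself, which is the design goal the description states.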
3. Integrated Scoring into Existing Reasoning Pipeline
Modified File
Updated the existing severity calculation workflow to use the new adaptive scoring engine internally.
Key points:
The adaptive scorer now powers severity calculation while preserving the current system behavior.
4. Added Unit Tests
New File
Added tests covering:
🔄 Updated Workflow
Previous Workflow
Updated Workflow
The adaptive scorer now evaluates contextual signals before producing the final severity score used by the existing alerting pipeline.
🧪 Testing
All existing tests continue to pass.
Additional tests were added to validate:
📌 Summary
This PR introduces a configurable, context-aware risk scoring system that improves alert prioritization while maintaining full backward compatibility with Eagle's existing architecture and workflows.