feat: add context-aware adaptive risk scoring #157

Open
varshini-nandula wants to merge 2 commits into Devnil434:main from varshini-nandula:feat/context-aware-risk-scoring

Conversation

varshini-nandula commented Jun 17, 2026


Related Issue

Closes #152

Overview

This PR introduces a Context-Aware Adaptive Risk Scoring Engine that replaces the existing static severity calculation approach with a configurable, policy-driven scoring mechanism.

The implementation enhances alert prioritization by considering contextual signals such as restricted-zone presence, repeated approaches, loitering behavior, after-hours activity, and reasoning confidence while preserving all existing workflows, APIs, and frontend functionality.


What Changed

1. Added YAML-Based Risk Policy Configuration

New File

configs/risk_policy.yaml

Introduced configurable risk weights for contextual signals:

  • Restricted zone presence
  • Repeated approach behavior
  • Loitering / dwell time
  • After-hours activity
  • Reasoning confidence

This allows risk-scoring behavior to be adjusted without modifying application code.


2. Added Adaptive Risk Scoring Engine

New File

services/reasoning/risk_scoring.py

Implemented a reusable AdaptiveRiskScorer that:

  • Loads risk weights from YAML
  • Normalizes contextual signals
  • Calculates weighted risk scores
  • Produces risk levels (Low / Medium / High)
  • Returns explainable contributing factors

The scorer is designed to be modular and extensible for future risk signals.
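The scoring steps listed above (weighted aggregation, 0–100 scaling, classification, factor extraction) can be illustrated with a minimal sketch. This is not the PR's `AdaptiveRiskScorer`; the weights, the thresholds, the `restricted_zone` and `after_hours` key names, and the `score_signals` helper are all assumptions chosen for illustration, and the real values live in `configs/risk_policy.yaml`.

```python
from typing import TypedDict

# Hypothetical weights (they sum to 1.0); the real ones come from the YAML policy.
WEIGHTS = {
    "restricted_zone": 0.30,
    "repeated_approach": 0.20,
    "loitering": 0.20,
    "after_hours": 0.10,
    "reasoning_confidence": 0.20,
}
LOW_MAX = 33.0     # assumed: scores up to this value classify as "Low"
MEDIUM_MAX = 66.0  # assumed: scores up to this value classify as "Medium"


class RiskResult(TypedDict):
    score: float
    level: str
    factors: list[str]


def score_signals(normalized: dict[str, float]) -> RiskResult:
    """Combine signals already normalized to [0, 1] into a 0-100 score."""
    # Weighted sum; signals missing from the input count as 0.0.
    total = sum(WEIGHTS[name] * normalized.get(name, 0.0) for name in WEIGHTS)
    score = round(total * 100.0, 2)

    if score <= LOW_MAX:
        level = "Low"
    elif score <= MEDIUM_MAX:
        level = "Medium"
    else:
        level = "High"

    # Explainability: list every signal that contributed to the score.
    factors = [name for name in WEIGHTS if normalized.get(name, 0.0) > 0.0]
    return {"score": score, "level": level, "factors": factors}


result = score_signals(
    {"restricted_zone": 1.0, "loitering": 0.5, "reasoning_confidence": 0.9}
)
```

Because the weights sum to 1.0 and every input lies in [0, 1], the score stays within 0–100 without extra clamping; that property depends on the policy being validated when it is loaded.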


3. Integrated Scoring into Existing Reasoning Pipeline

Modified File

services/reasoning/pipeline.py

Updated the existing severity calculation workflow to use the new adaptive scoring engine internally.

Key points:

  • No existing API contracts were changed
  • No frontend changes were required
  • Existing response structures remain intact
  • Existing severity-based alert prioritization continues to function

The adaptive scorer now powers severity calculation while preserving the current system behavior.


4. Added Unit Tests

New File

tests/test_risk_scoring.py

Added tests covering:

  • YAML policy loading
  • Signal normalization
  • Weighted score calculation
  • Risk level classification
  • Output structure validation
  • Deterministic scoring behavior

🔄 Updated Workflow

Previous Workflow

Detection
    ↓
Tracking
    ↓
Temporal Memory
    ↓
VLM
    ↓
LLM Reasoning
    ↓
Static Severity Calculation
    ↓
Alert Generation

Updated Workflow

Detection
    ↓
Tracking
    ↓
Temporal Memory
    ↓
VLM
    ↓
LLM Reasoning
    ↓
Adaptive Risk Scoring Engine
    ↓
Severity Score
    ↓
Alert Generation

The adaptive scorer now evaluates contextual signals before producing the final severity score used by the existing alerting pipeline.


🧪 Testing

All existing tests continue to pass.

Additional tests were added to validate:

  • Policy loading
  • Scoring calculations
  • Risk classification
  • Pipeline integration behavior

📌 Summary

This PR introduces a configurable, context-aware risk scoring system that improves alert prioritization while maintaining full backward compatibility with Eagle's existing architecture and workflows.

Summary by CodeRabbit

Release Notes

  • New Features

    • Implemented an adaptive risk-scoring system that evaluates multiple contextual signals including restricted zone proximity, repeated approaches, loitering duration, and after-hours activity.
    • Risk scores are now normalized on a 0–100 scale and classified into Low, Medium, and High severity levels.
    • Added configurable risk-assessment policies to fine-tune scoring weights and classification thresholds.
  • Tests

    • Added comprehensive test coverage for the risk-scoring system, including signal normalization, weighted scoring, and severity classification validation.

coderabbitai Bot commented Jun 17, 2026


Review Change Stack

Warning

Review limit reached

@varshini-nandula, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 43 minutes and 14 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fdf5c580-3bc1-4985-bf98-63a031ba1627

📥 Commits

Reviewing files that changed from the base of the PR and between f0dd265 and b7370c6.

📒 Files selected for processing (3)
  • services/reasoning/pipeline.py
  • services/reasoning/risk_scoring.py
  • tests/test_risk_scoring.py
📝 Walkthrough

Introduces a YAML-configurable AdaptiveRiskScorer in services/reasoning/risk_scoring.py that computes a normalized 0–100 risk score from contextual signals. ReasoningPipeline is updated to inject this scorer, replacing the removed static _W weights. A YAML policy file and a deterministic pytest suite are added alongside.

Changes

Adaptive Risk Scoring Engine

Layer (file(s)): summary

  • YAML policy config and RiskScoringResult contract (configs/risk_policy.yaml, services/reasoning/risk_scoring.py): Defines signal weights, classification thresholds (Low/Medium/High), and normalization limits in YAML; declares the RiskScoringResult TypedDict and module-level signal-label mapping constants.
  • AdaptiveRiskScorer implementation (services/reasoning/risk_scoring.py): Implements __init__ (YAML loading with FileNotFoundError/ValueError validation and section checks), score (normalization → weighted aggregation → 0–100 scaling → classification → factor extraction), and all helper methods.
  • ReasoningPipeline wiring and _attach_severity rewrite (services/reasoning/pipeline.py): Adds AdaptiveRiskScorer import, removes static _W weight dict, injects optional risk_scorer into the constructor, and rewrites _attach_severity to derive contextual signals (restricted zone, repeated approach, dwell/loitering, after-hours, confidence) and delegate to self._risk_scorer.score.
  • Test suite (tests/test_risk_scoring.py): Deterministic pytest suite with fixtures for temporary YAML policy; covers YAML loading, normalization ratios, clamping, weighted scores for inactive/active signal scenarios, classification boundary verification, and result structure invariants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Devnil434/Eagle#52: Introduces a RiskAnalyzer that computes a capped 0–100 risk score from zone/object/person factors — directly overlaps with the weighted multi-signal risk scoring logic introduced by AdaptiveRiskScorer.

Poem

🐇 A rabbit once weighed every sign,
From loitering dwell to the after-hours shrine,
With YAML in paw and weights summed to one,
The score climbs to High when the signals have won,
Low, Medium, High — now the risk engine's fine! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name: status. Explanation

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'feat: add context-aware adaptive risk scoring' clearly and concisely summarizes the main change: introducing an adaptive risk scoring system that is context-aware.
  • Linked Issues check: ✅ Passed. All primary objectives from issue #152 are met: YAML-based configurable policy [152], contextual signal evaluation (restricted zone, loitering, repeated approach, after-hours, reasoning confidence) [152], normalized risk scoring with Low/Medium/High classification [152], integration into reasoning pipeline [152], and comprehensive test coverage [152].
  • Out of Scope Changes check: ✅ Passed. All changes are directly aligned with issue #152 objectives: risk policy configuration, adaptive scorer implementation, pipeline integration, and tests. No unrelated modifications or scope creep detected.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai Bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_risk_scoring.py (1)

65-174: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add regression tests for policy validation and negative-value clamping.

Current tests miss two critical contracts: rejecting invalid policy weights/thresholds and flooring negative continuous inputs to 0.0. Adding these will prevent silent scoring regressions.

Suggested test cases
+def test_yaml_invalid_weight_sum_raises(tmp_path):
+    bad = tmp_path / "risk_policy.yaml"
+    bad.write_text(
+        _VALID_POLICY.replace("weight: 0.10", "weight: 0.50"),
+        encoding="utf-8",
+    )
+    with pytest.raises(ValueError):
+        AdaptiveRiskScorer(policy_path=bad)
+
+
+def test_negative_continuous_signals_floor_to_zero(scorer):
+    normalized = scorer._normalize_signals({
+        "repeated_approach": -3,
+        "loitering": -15.0,
+    })
+    assert normalized["repeated_approach"] == 0.0
+    assert normalized["loitering"] == 0.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_risk_scoring.py` around lines 65 - 174, Add two new test functions
to cover missing regression scenarios for the AdaptiveRiskScorer class. First,
create a test that verifies the scorer rejects or handles invalid policies: weights
that do not sum to 1.0, or thresholds that fall outside acceptable ranges. Second,
add a test that passes negative values for continuous signal inputs (like
negative loitering or repeated_approach values) to the _normalize_signals or
score methods and verifies they are clamped to 0.0 rather than producing
incorrect results. These tests will ensure the implementation properly validates
configuration and handles edge cases in signal normalization.
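The clamping contract described in this comment can be sketched as a small standalone helper. The name `normalize_ratio` is hypothetical and is not taken from the PR; the sketch only shows the lower and upper bounds being applied together to a continuous signal such as a dwell time or an approach count.

```python
def normalize_ratio(value: float, maximum: float) -> float:
    """Scale value by maximum and clamp the result to [0.0, 1.0]."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    # min() caps the upper bound; max() floors negative inputs at 0.0.
    return max(0.0, min(value / maximum, 1.0))
```

With this shape, a negative dwell time or approach count contributes nothing to the weighted sum instead of subtracting from it.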
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/reasoning/pipeline.py`:
- Around line 239-241: The is_after_hours calculation at line 239-241 uses
datetime.datetime.now().hour which depends on the server's current time, causing
the same event to be scored differently based on when the pipeline runs. Instead
of using datetime.datetime.now().hour, extract the event timestamp, apply the
configured timezone to it, and then check if the event's hour falls within the
configured after-hours window (currently hardcoded as hour >= 20 or hour < 6).
Replace the server time dependency with event-based time logic that uses both
the event timestamp and timezone configuration.
- Line 236: The approach_count variable is currently counting the string
"repeated_approach" in seq.action_summary, which is fragile and format-dependent
and can lead to incorrect counts. Instead, iterate through seq.events directly
and count the number of events that contain ActionHint.REPEATED_APPROACH to
extract this signal in a stable, structured way that is independent of text
formatting.

In `@services/reasoning/risk_scoring.py`:
- Around line 166-167: The condition at lines 166-167 that checks if value > 0.0
is too permissive for the reasoning_confidence factor, causing low confidence
values like 0.1 to incorrectly receive the "High reasoning confidence" label.
Replace the simple value > 0.0 check with a more appropriate threshold (such as
checking if the value exceeds a meaningful confidence cutoff like 0.5 or 0.7) to
ensure only actually high confidence values produce that factor label.
Alternatively, implement conditional logic to apply different factor labels
based on confidence tiers if low and medium confidence levels should be
distinguished.
- Around line 139-140: The normalization expressions for "repeated_approach" and
"loitering" only clamp to the upper bound of 1.0 using min(), but do not clamp
to the lower bound of 0.0. This allows negative normalized values to be
returned, which can suppress total risk calculation. Wrap both the
"repeated_approach" and "loitering" normalization expressions with an additional
max() function call to ensure the result is clamped between 0.0 and 1.0,
preventing negative values from affecting the risk scoring.
- Around line 195-201: The _load_policy method currently only validates that
required sections exist in the policy data but does not enforce constraints on
the actual values within those sections (weights, thresholds, normalization
values). Add comprehensive schema validation after the section presence checks
to verify that weights and thresholds are valid numeric values within acceptable
ranges, and that normalization parameters meet expected constraints. This
ensures invalid configurations are caught at load-time rather than causing
silent failures during risk score calculations.
- Line 29: The risk_scoring.py module imports the yaml module, but PyYAML is not
declared as a dependency in services/reasoning/requirements.txt. This will cause
a ModuleNotFoundError when the service is deployed independently. Add the line
pyyaml>=6.0 to the services/reasoning/requirements.txt file to ensure the
required dependency is installed.
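As an illustration of the structured counting suggested for `approach_count`, the sketch below counts events by enum membership instead of searching a summary string. The `ActionHint` enum and `Event` dataclass here are stand-ins written for this example; the project's real event type and the attribute that carries the hints may be shaped differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionHint(Enum):
    """Stand-in for the project's enum; only the member name comes from the review."""
    REPEATED_APPROACH = "repeated_approach"
    LOITERING = "loitering"


@dataclass
class Event:
    """Hypothetical event shape: each event carries a set of action hints."""
    hints: set[ActionHint] = field(default_factory=set)


def count_repeated_approaches(events: list[Event]) -> int:
    """Count events flagged with REPEATED_APPROACH, independent of text formatting."""
    return sum(1 for event in events if ActionHint.REPEATED_APPROACH in event.hints)


events = [
    Event({ActionHint.REPEATED_APPROACH}),
    Event({ActionHint.LOITERING}),
    Event({ActionHint.REPEATED_APPROACH, ActionHint.LOITERING}),
]
approach_count = count_repeated_approaches(events)
```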


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 23b60538-b348-4b80-a680-838d2c067487

📥 Commits

Reviewing files that changed from the base of the PR and between 3429ec9 and f0dd265.

📒 Files selected for processing (4)
  • configs/risk_policy.yaml
  • services/reasoning/pipeline.py
  • services/reasoning/risk_scoring.py
  • tests/test_risk_scoring.py

Comment thread services/reasoning/pipeline.py Outdated
Comment thread services/reasoning/pipeline.py Outdated
Comment on lines +239 to +241
import datetime
current_hour = datetime.datetime.now().hour
is_after_hours = current_hour >= 20 or current_hour < 6


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

After-hours currently depends on server clock, not event time.

Line 239–241 makes severity depend on when the pipeline runs, so replaying the same track at a different hour can produce a different score. Compute after-hours from event timestamp + configured timezone/window instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/reasoning/pipeline.py` around lines 239 - 241, The is_after_hours
calculation at line 239-241 uses datetime.datetime.now().hour which depends on
the server's current time, causing the same event to be scored differently based
on when the pipeline runs. Instead of using datetime.datetime.now().hour,
extract the event timestamp, apply the configured timezone to it, and then check
if the event's hour falls within the configured after-hours window (currently
hardcoded as hour >= 20 or hour < 6). Replace the server time dependency with
event-based time logic that uses both the event timestamp and timezone
configuration.
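One possible shape for the event-based check is sketched below. It assumes the event timestamp is available as Unix epoch seconds and that the site timezone and window come from configuration; the function name, parameter names, and the 20:00 to 06:00 defaults (mirroring the currently hardcoded hours) are illustrative and are not taken from the PR.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def is_after_hours(
    event_ts: float,
    site_tz: tzinfo,
    start: time = time(20, 0),
    end: time = time(6, 0),
) -> bool:
    """Return True if the event's local time falls inside the after-hours window."""
    # Convert the event's own timestamp, not the server clock, to site-local time.
    local = datetime.fromtimestamp(event_ts, tz=site_tz).time()
    if start <= end:
        # Window contained in one day, e.g. 01:00-05:00.
        return start <= local < end
    # Window wraps past midnight, e.g. 20:00-06:00.
    return local >= start or local < end


# The same instant is daytime in UTC but 21:00 at a UTC+9 site.
noon_utc = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc).timestamp()
in_utc = is_after_hours(noon_utc, timezone.utc)
in_utc_plus_9 = is_after_hours(noon_utc, timezone(timedelta(hours=9)))
```

Any `tzinfo` works here, so a named zone from `zoneinfo.ZoneInfo` could be passed where the timezone database is available. Replaying the same track then yields the same after-hours flag regardless of when the pipeline runs.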

from pathlib import Path
from typing import Any, TypedDict

import yaml


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Candidate dependency manifests =="
fd -HI 'pyproject.toml|requirements.*|Pipfile|setup.py|poetry.lock'

echo
echo "== Entries mentioning yaml/pyyaml =="
rg -n --hidden \
  --iglob 'pyproject.toml' \
  --iglob 'requirements*.txt' \
  --iglob 'Pipfile' \
  --iglob 'setup.py' \
  --iglob 'poetry.lock' \
  '(?i)\bpyyaml\b|\byaml\b'

Repository: Devnil434/Eagle

Length of output: 14533


🏁 Script executed:

cat services/reasoning/requirements.txt

Repository: Devnil434/Eagle

Length of output: 148


Add PyYAML to services/reasoning/requirements.txt.

The module imports yaml but PyYAML is not listed in the service's dependency manifest. If services/reasoning/ is deployed independently, startup will fail with ModuleNotFoundError. Add pyyaml>=6.0 to services/reasoning/requirements.txt.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/reasoning/risk_scoring.py` at line 29, The risk_scoring.py module
imports the yaml module, but PyYAML is not declared as a dependency in
services/reasoning/requirements.txt. This will cause a ModuleNotFoundError when
the service is deployed independently. Add the line pyyaml>=6.0 to the
services/reasoning/requirements.txt file to ensure the required dependency is
installed.

Comment thread services/reasoning/risk_scoring.py Outdated
Comment thread services/reasoning/risk_scoring.py
Comment on lines +195 to +201
        for section in ("risk_scoring", "risk_levels", "normalization"):
            if section not in data:
                raise ValueError(
                    f"Risk policy missing required section: '{section}'"
                )

        return data


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce full policy schema constraints at load-time.

_load_policy currently checks section presence only. Invalid weights/thresholds/normalization values pass startup and produce silently wrong risk scores.

Proposed hard validation in loader
         for section in ("risk_scoring", "risk_levels", "normalization"):
             if section not in data:
                 raise ValueError(
                     f"Risk policy missing required section: '{section}'"
                 )
+
+        weights = []
+        for signal, cfg in data["risk_scoring"].items():
+            weight = cfg.get("weight")
+            if not isinstance(weight, (int, float)) or weight < 0:
+                raise ValueError(f"Invalid weight for '{signal}': {weight}")
+            weights.append(float(weight))
+        if abs(sum(weights) - 1.0) > 1e-6:
+            raise ValueError("Risk policy weights must sum to 1.0")
+
+        low_max = data["risk_levels"].get("low_max")
+        medium_max = data["risk_levels"].get("medium_max")
+        if not (isinstance(low_max, (int, float)) and isinstance(medium_max, (int, float))):
+            raise ValueError("risk_levels.low_max and medium_max must be numeric")
+        if not (0 <= low_max <= medium_max <= 100):
+            raise ValueError("Risk thresholds must satisfy 0 <= low_max <= medium_max <= 100")
+
+        loitering_max = data["normalization"].get("loitering_max_seconds")
+        approach_max = data["normalization"].get("repeated_approach_max_count")
+        if not (isinstance(loitering_max, (int, float)) and loitering_max > 0):
+            raise ValueError("normalization.loitering_max_seconds must be > 0")
+        if not (isinstance(approach_max, (int, float)) and approach_max > 0):
+            raise ValueError("normalization.repeated_approach_max_count must be > 0")
 
         return data
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/reasoning/risk_scoring.py` around lines 195 - 201, The _load_policy
method currently only validates that required sections exist in the policy data
but does not enforce constraints on the actual values within those sections
(weights, thresholds, normalization values). Add comprehensive schema validation
after the section presence checks to verify that weights and thresholds are
valid numeric values within acceptable ranges, and that normalization parameters
meet expected constraints. This ensures invalid configurations are caught at
load-time rather than causing silent failures during risk score calculations.



Development

Successfully merging this pull request may close these issues.

[Feature]: Context-Aware Adaptive Risk Scoring with Configurable YAML Policies

1 participant