
feat: implement caching for knowledgebase and task results to prevent duplicate calls #701 (#938)

Open
dinesh9997 wants to merge 5 commits into utksh1:main from dinesh9997:bug/issue-701-scan-cache



@dinesh9997


Description

Resolves #701.

This PR adds caching in two backend areas to avoid redundant work from duplicate scan queries and API calls:

  1. Vulnerability Knowledge-Base Caching (knowledgebase.py):

    • Problem: KnowledgeBase._load_entries previously read and deserialized all JSON feed files from disk on every vulnerability lookup (once per discovered service/port during a network scan).
    • Solution: Added a module-level in-memory cache (_cached_entries and _cached_mtime). On each call, _load_entries computes the maximum modification time (st_mtime) across the JSON feeds on disk. If that value matches _cached_mtime, it returns the cached entries, which avoids repeated disk reads and JSON parsing. If a feed is added or modified, the cache is invalidated and reloaded.
  2. Task Result Endpoint Caching (routes.py):

    • Problem: Requests to /api/v1/task/{task_id}/result were rebuilding findings, asset summaries, and scan diff structures from scratch on every call.
    • Solution: Added caching for tasks in a final state (completed, failed, or cancelled) under keys of the form tasks:result:{task_id}:{owner}. In-progress tasks bypass the cache so their results stay live. The existing invalidate_view_cache() hook clears these entries when a new task starts or an existing task is deleted.
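
The mtime check described in item 1 can be sketched roughly as follows. This is an illustration, not the code in this PR: the function name load_entries, the *.json glob, and the assumption that each feed file holds a JSON list are hypothetical; only _cached_entries and _cached_mtime come from the description.

```python
import json
from pathlib import Path

# Module-level cache, using the names from the description above.
_cached_entries = None
_cached_mtime = None


def load_entries(data_dir):
    """Return all feed entries, re-reading the files only when a feed has changed."""
    global _cached_entries, _cached_mtime
    feeds = sorted(Path(data_dir).glob("*.json"))
    # Newest modification time across all feeds; 0 when there are no feeds.
    newest = max((f.stat().st_mtime_ns for f in feeds), default=0)
    if _cached_entries is not None and newest == _cached_mtime:
        return _cached_entries  # cache hit: no file reads, no JSON parsing
    entries = []
    for feed in feeds:
        # Assumption: every feed file contains a JSON list of entries.
        entries.extend(json.loads(feed.read_text(encoding="utf-8")))
    _cached_entries, _cached_mtime = entries, newest
    return entries
```

In this sketch a second call with unchanged feeds returns the same list object. A check based only on the newest mtime would not notice a feed that is added with an older timestamp or removed, so a real implementation may want a richer signature.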

Verification & Testing

Unit and integration tests have been updated and verified:

  • testing/backend/unit/test_knowledgebase.py: Added a test verifying that _load_entries serves cached content and invalidates correctly when feeds on disk are updated.
  • testing/backend/integration/test_task_result_cache.py: Added integration tests confirming successful cache hits for finished task results, cache bypassing for running tasks, and proper cache invalidation on view cache clear.

Test Execution Results:

testing\backend\unit\test_knowledgebase.py ...                           [ 13%]
testing\backend\unit\test_cache_helpers.py .............                 [ 69%]
testing\backend\integration\test_dashboard_cache.py ...                  [ 82%]
testing\backend\integration\test_task_result_cache.py ..                 [ 91%]
testing\backend\test_cache_invalidation.py ..                            [100%]

======================= 23 passed, 2 warnings in 8.54s ========================

@utksh1 added the level:advanced, type:feature, and area:backend labels on Jun 15, 2026

@utksh1 left a comment

Owner


Thanks for the caching work. This cannot merge while required CI is failing.

The frontend-checks job is failing on the current head. Because this touches knowledgebase/task-result caching, please get CI green and keep the patch focused on cache behavior with direct invalidation/freshness tests.

@utksh1 left a comment

Owner


Thanks for rebasing this. I still cannot merge the cache change as-is: the knowledgebase cache is process-global but only keyed by newest mtime, not by data_dir, so two KnowledgeBase instances pointing at different directories with the same newest mtime can return the wrong dataset. The task-result cache also needs clearer invalidation coverage for result/status changes outside invalidate_view_cache. Please tighten the cache keys/invalidation and keep the tests focused on those guarantees.
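
One way to address the data_dir concern is to keep one cache slot per resolved directory instead of a single global slot. The sketch below is only an illustration under assumptions: load_entries, _cache, and _signature are hypothetical names, and each feed is assumed to hold a JSON list.

```python
import json
from pathlib import Path

# Hypothetical per-directory cache: resolved data_dir -> (signature, entries).
_cache = {}


def _signature(feeds):
    # Name, mtime, and size of every feed, so added or removed files also
    # change the signature, not only a newer timestamp.
    return tuple((f.name, f.stat().st_mtime_ns, f.stat().st_size) for f in feeds)


def load_entries(data_dir):
    """Return entries for one data_dir, cached separately from other directories."""
    key = Path(data_dir).resolve()
    feeds = sorted(key.glob("*.json"))
    sig = _signature(feeds)
    cached = _cache.get(key)
    if cached is not None and cached[0] == sig:
        return cached[1]
    entries = []
    for feed in feeds:
        # Assumption: every feed file contains a JSON list of entries.
        entries.extend(json.loads(feed.read_text(encoding="utf-8")))
    _cache[key] = (sig, entries)
    return entries
```

With this shape, two directories whose feeds share the same name, size, and mtime still resolve to different cache keys, so neither can be served the other's dataset.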

@utksh1 left a comment

Owner


Thanks for the update. This still is not ready: the knowledgebase cache remains process-global and keyed only by newest mtime, so different data_dir values with the same newest mtime can return the wrong dataset. The task-result cache also still needs tighter invalidation guarantees for status/result changes outside the view-cache invalidation path. Please fix those cache-key/invalidation issues and keep unrelated audit-policy changes out of this PR.
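
For the task-result side, one possible shape is to route every status or result change through a single write path that drops the cached entries for that task. The sketch below uses in-process dicts as stand-ins for the real task store and cache backend. Every name in it (update_task, get_task_result, build_payload, _tasks, _result_cache) is hypothetical; only the key pattern tasks:result:{task_id}:{owner} and the three final states come from the PR description.

```python
TERMINAL_STATES = {"completed", "failed", "cancelled"}

# Hypothetical in-process stand-ins for the task store and the cache backend.
_tasks = {}         # task_id -> {"status": str, "result": object}
_result_cache = {}  # "tasks:result:{task_id}:{owner}" -> payload


def _prefix(task_id):
    # The trailing colon keeps task "1" from matching keys of task "10".
    return f"tasks:result:{task_id}:"


def build_payload(task):
    # Placeholder for the expensive findings / asset summary / diff assembly.
    return {"status": task["status"], "result": task["result"]}


def update_task(task_id, status, result=None):
    """Single write path: any status/result change drops that task's cached results."""
    _tasks[task_id] = {"status": status, "result": result}
    for key in [k for k in _result_cache if k.startswith(_prefix(task_id))]:
        del _result_cache[key]


def get_task_result(task_id, owner):
    task = _tasks[task_id]
    if task["status"] not in TERMINAL_STATES:
        return build_payload(task)  # in-progress tasks are never cached
    key = _prefix(task_id) + owner
    if key not in _result_cache:
        _result_cache[key] = build_payload(task)
    return _result_cache[key]
```

A test built on this shape can assert three guarantees directly: running tasks are rebuilt on every call, finished tasks are served from the cache, and a later status or result write evicts the cached payload without going through invalidate_view_cache().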



Development

Successfully merging this pull request may close these issues.

Security: Vulnerability scan results not cached, duplicate API calls waste resources

2 participants