feat: implement caching for knowledgebase and task results to prevent duplicate calls #701 #938
dinesh9997 wants to merge 5 commits into
Conversation
utksh1
left a comment
Thanks for the caching work. This cannot merge while required CI is failing.
The frontend-checks job is failing on the current head. Because this touches knowledgebase/task-result caching, please get CI green and keep the patch focused on cache behavior with direct invalidation/freshness tests.
utksh1
left a comment
Thanks for rebasing this. I still cannot merge the cache change as-is: the knowledgebase cache is process-global but only keyed by newest mtime, not by data_dir, so two KnowledgeBase instances pointing at different directories with the same newest mtime can return the wrong dataset. The task-result cache also needs clearer invalidation coverage for result/status changes outside invalidate_view_cache. Please tighten the cache keys/invalidation and keep the tests focused on those guarantees.
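The keying concern above can be illustrated with a minimal sketch. It is not the PR's code: `load_entries`, `_CACHE`, and the assumption that each feed file holds a JSON list are hypothetical, chosen only to show a process-global cache keyed by the resolved data directory in addition to the newest mtime.

```python
import json
from pathlib import Path

# Hypothetical process-global cache: resolved data_dir -> (newest mtime_ns, entries).
# Keying by directory keeps two instances with different data_dir values apart
# even when their newest feed files share the same modification time.
_CACHE: dict[str, tuple[int, list]] = {}


def load_entries(data_dir) -> list:
    """Return all feed entries for data_dir, re-reading only when feeds change."""
    data_dir = Path(data_dir).resolve()
    feeds = sorted(data_dir.glob("*.json"))
    # Newest modification time across the feeds; 0 when the directory is empty.
    newest = max((p.stat().st_mtime_ns for p in feeds), default=0)

    key = str(data_dir)
    cached = _CACHE.get(key)
    if cached is not None and cached[0] == newest:
        return cached[1]  # cache hit for this directory

    entries: list = []
    for path in feeds:
        with path.open(encoding="utf-8") as fh:
            # Assumption: each feed file contains a JSON list of entries.
            entries.extend(json.load(fh))
    _CACHE[key] = (newest, entries)
    return entries
```

A freshness check based only on the newest mtime still misses a deleted feed or a newly added file with an older timestamp, so a stricter variant might also compare the set of file names or sizes.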
utksh1
left a comment
Thanks for the update. This still is not ready: the knowledgebase cache remains process-global and keyed only by newest mtime, so different data_dir values with the same newest mtime can return the wrong dataset. The task-result cache also still needs tighter invalidation guarantees for status/result changes outside the view-cache invalidation path. Please fix those cache-key/invalidation issues and keep unrelated audit-policy changes out of this PR.
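One possible shape for the tighter task-result invalidation is sketched below. Only the key format tasks:result:{task_id}:{owner} and the terminal statuses come from the PR description; `TaskResultCache`, `update_task`, and the dict-based task are hypothetical stand-ins, not the project's API. The idea is that every status or result write goes through one helper that also drops the cached entries for that task.

```python
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


class TaskResultCache:
    """Hypothetical in-memory cache for finished task results."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    @staticmethod
    def key(task_id: str, owner: str) -> str:
        return f"tasks:result:{task_id}:{owner}"

    def get_or_build(self, task: dict, owner: str,
                     build: Callable[[dict], dict]) -> dict:
        # In-progress tasks bypass the cache so callers always see live data.
        if task["status"] not in TERMINAL_STATUSES:
            return build(task)
        k = self.key(task["id"], owner)
        if k not in self._store:
            self._store[k] = build(task)
        return self._store[k]

    def invalidate_task(self, task_id: str) -> None:
        # Drop the entries of every owner for this task; the trailing colon
        # keeps task "1" from matching task "12".
        prefix = f"tasks:result:{task_id}:"
        for k in [k for k in self._store if k.startswith(prefix)]:
            del self._store[k]


def update_task(task: dict, cache: TaskResultCache, **changes) -> None:
    """Apply status/result changes and invalidate in the same step."""
    task.update(changes)
    cache.invalidate_task(task["id"])
```

Routing writes through a single helper means a test can assert the guarantee directly: change a status or result, then check that the next read rebuilds.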
Description
Resolves #701.
This PR implements caching layers in two key backend areas to prevent redundant operations and resource waste from duplicate scan queries and API calls:
Vulnerability Knowledge-Base Caching (knowledgebase.py):
KnowledgeBase._load_entries previously read and deserialized all JSON feed files from disk on every vulnerability lookup (once per discovered service/port during a network scan). The change adds a cache (_cached_entries and _cached_mtime). It computes the maximum modification time (st_mtime) of the JSON feeds on disk. If the files have not changed, it returns the cached entries, eliminating repeated disk I/O and JSON parsing. If a file is added or modified, the cache is invalidated and refreshed automatically.
Task Result Endpoint Caching (routes.py):
Requests to /api/v1/task/{task_id}/result were rebuilding findings, asset summaries, and scan diff structures from scratch on every call. The change caches results for tasks in a terminal state (completed, failed, or cancelled) under the key prefix tasks:result:{task_id}:{owner}. In-progress tasks bypass the cache so they remain live. The cache is cleared when new tasks start or existing tasks are deleted, via the existing invalidate_view_cache() hook.
Verification & Testing
Unit and integration tests have been updated and verified:
testing/backend/unit/test_knowledgebase.py: Added a test verifying that _load_entries serves cached content and invalidates correctly when feeds on disk are updated.
testing/backend/integration/test_task_result_cache.py: Added integration tests confirming cache hits for finished task results, cache bypass for running tasks, and cache invalidation when the view cache is cleared.
Test Execution Results: