Skip to content

Strengthening the Engineering #4222

Open
MuhammadKhalilzadeh wants to merge 14 commits into
developfrom
mo-368-jul-2-quality-enhancement
Open

Strengthening the Engineering #4222
MuhammadKhalilzadeh wants to merge 14 commits into
developfrom
mo-368-jul-2-quality-enhancement

Conversation

@MuhammadKhalilzadeh

Copy link
Copy Markdown
Collaborator

Strengthening the Engineering

Please ensure all items are checked off before requesting a review:

  • I deployed the code locally.
  • I have performed a self-review of my code.
  • I have included the issue # in the PR.
  • I have labelled the PR correctly.
  • The issue I am working on is assigned to me.
  • I have avoided using hardcoded values to ensure scalability and maintain consistency across the application.
  • I have ensured that font sizes, color choices, and other UI elements are referenced from the theme.
  • My pull request is focused and addresses a single, specific feature.
  • If there are UI changes, I have attached a screenshot or video to this PR.
  • If I added or modified an API endpoint, the change is reflected in the generated OpenAPI spec (npm run generate:swagger).
  • If the endpoint requires authentication, it uses authenticateJWT and the generated spec declares bearerAuth security.
  • I ran npm run check:api-drift and committed the regenerated swagger.yaml and endpoints.ts.
  • If this PR adds or modifies an organization-scoped table, the tenant isolation registry and test matrix are updated. See the tenant isolation runbook for details.

1. What Was Implemented

This report covers only the P0 code changes that were committed and pushed. The full audit findings (68 issues) are documented separately in C:\Workspace\verifywise\agent.md.

1.1 Frontend (Clients)

1.1.1 Raised TanStack Query staleTime

File: Clients/src/application/config/queryClient.ts

// Before
staleTime: 2 * 1000

// After
staleTime: 5 * 60 * 1000

Why: The 2-second default treated cached data as stale almost immediately, causing refetches on every component mount. Five minutes is a safe default for read-heavy governance data; volatile endpoints can still override locally.


1.1.2 Removed Full Translation Dictionary from customAxios.ts

File: Clients/src/infrastructure/api/customAxios.ts

What changed:

  • Removed import { translations, type Lang } from "../../i18n/translations"
  • Removed import { getLanguage } from "../../i18n/domTranslator"
  • Added a local getLanguage() helper
  • Added a minimal ERROR_TRANSLATIONS map containing only the two alert strings used by Axios:
    • "Error"
    • "An error occurred. Please try again later"

Why: The full 1.9 MB translation dictionary was imported into infrastructure code and pulled into the customAxios chunk on the critical path. Replacing it with a tiny map eliminates the bundle bloat while keeping non-English error-toast support for those specific messages.

Measured impact:

Metric Before After
customAxios chunk 1,748 kB 7.38 kB
Gzipped ~556 kB 2.59 kB

1.1.3 Removed Duplicate Redux Providers

File: Clients/src/App.tsx

What changed: Removed the inner <Provider store={store}> and <PersistGate persistor={persistor}> wrappers. The providers in main.tsx now handle store provision and rehydration exactly once.

Why: Both main.tsx and App.tsx wrapped the app in Redux providers. This caused PersistGate rehydration to run twice per boot and every Redux dispatch to propagate through two nested provider subtrees — wasted work and a source of subtle state-reconciliation bugs.


1.2 Backend (Servers)

1.2.1 Tuned Sequelize Connection Pool

File: Servers/database/db.ts

What changed: Added explicit pool and timeout configuration with environment-variable overrides:

pool: {
  min: Number(process.env.DB_POOL_MIN ?? 2),
  max: Number(process.env.DB_POOL_MAX ?? 20),
  acquire: Number(process.env.DB_POOL_ACQUIRE_MS ?? 30000),
  idle: Number(process.env.DB_POOL_IDLE_MS ?? 10000),
  evict: Number(process.env.DB_POOL_EVICT_MS ?? 10000),
},
dialectOptions: {
  connectTimeout: Number(process.env.DB_CONNECT_TIMEOUT_MS ?? 10000),
  statement_timeout: Number(process.env.DB_STATEMENT_TIMEOUT_MS ?? 30000),
}

Why: Sequelize defaults (max 5 connections, no statement timeout) are unsafe under production load. Explicit pool sizing prevents pool saturation, and timeouts prevent runaway queries from hanging requests indefinitely.


1.2.2 Added trust proxy and Global API Rate Limiter

File: Servers/app.ts

What changed:

  • Added app.set("trust proxy", Number(process.env.TRUST_PROXY_HOPS ?? 1));
  • Imported generalApiLimiter
  • Mounted it before route handlers: app.use("/api", generalApiLimiter);

Why:

  • Without trust proxy, Express ignores X-Forwarded-For, so req.ip becomes the load balancer IP. Rate limiting then treats all traffic as one client and can globally block the app once any limit is hit.
  • Most endpoints had no rate limiting at all. The global limiter provides a lenient baseline; stricter route-specific limiters continue to apply on top.

1.2.3 Backed Rate Limiters with Redis

File: Servers/middleware/rateLimit.middleware.ts

What changed:

  • Added a RedisRateLimitStore class implementing the express-rate-limit Store interface.
  • The store uses the existing IORedis client.
  • Each exported limiter received a distinct Redis key prefix:
    • rl:fileops
    • rl:api
    • rl:auth
    • rl:refresh
    • rl:aidet
  • The store fails open (totalHits: 0) if Redis is unreachable, so a Redis outage does not hard-block the API.

Why: The default in-memory MemoryStore does not share counters across server instances. In a horizontally scaled deployment, a client could exceed limits by rotating across pods. Redis-backed counters fix this without adding a new dependency.


1.2.4 Added Performance Indexes Migration

File: Servers/database/migrations/20260702143000-add-performance-indexes.js

What changed: Created a reversible migration adding six indexes:

CREATE INDEX IF NOT EXISTS idx_files_org_file_group
  ON verifywise.files (organization_id, file_group_id);

CREATE INDEX IF NOT EXISTS idx_files_content_search
  ON verifywise.files USING GIN (content_search);

CREATE INDEX IF NOT EXISTS idx_policy_manager_org
  ON verifywise.policy_manager (organization_id);

CREATE INDEX IF NOT EXISTS idx_policy_manager_org_next_review
  ON verifywise.policy_manager (organization_id, next_review_date)
  WHERE next_review_date IS NOT NULL;

CREATE INDEX IF NOT EXISTS idx_ai_detection_findings_scan
  ON verifywise.ai_detection_findings (organization_id, scan_id);

CREATE INDEX IF NOT EXISTS idx_audit_ledger_org_entity_entry
  ON verifywise.audit_ledger (organization_id, entity_type, entry_type, id DESC);

Why: These indexes target the highest-cardinality query paths found in the audit: file version history, full-text search, policy due-soon lists, AI detection findings by scan, and filtered audit-ledger queries.


1.2.5 Removed Queue obliterate and Added Stable Job IDs

Files:

  • Servers/services/slack/slackProducer.ts
  • Servers/services/automations/automationProducer.ts
  • Servers/jobs/producer.ts

What changed:

  • Removed await automationQueue.obliterate({ force: true }) from vendor-review and report schedulers.
  • Removed await notificationQueue.obliterate({ force: true }) from the Slack scheduler.
  • Added stable, unique jobId values to every repeatable queue.add() call.
  • Removed the obsolete "MUST be last — earlier schedulers obliterate the queue" comment.

Why: obliterate wiped the entire queue on every server start. Multiple schedulers calling it caused later schedulers to delete jobs added by earlier ones, making scheduled-job state non-deterministic and causing jobs to disappear after restarts. Stable jobIds make queue.add() idempotent — BullMQ updates the existing repeatable job instead of creating duplicates.


2. Validation

Check Command Result
Clients type check npx tsc --noEmit -p tsconfig.app.json ✅ Passed
Servers type check npx tsc --noEmit ✅ Passed
Clients production build npm run build ✅ Passed

2.1 Bundle Impact

The most significant measurable improvement:

Chunk Before After
customAxios.js 1,748 kB 7.38 kB
customAxios.js gzipped ~556 kB 2.59 kB

3. Commit Log

All commits pushed to origin/mo-368-jul-2-quality-enhancement.

# Hash Message File
1 51c39d228 perf(clients): raise TanStack Query staleTime from 2s to 5min Clients/src/application/config/queryClient.ts
2 36813665c perf(clients): remove full translation dictionary import from customAxios Clients/src/infrastructure/api/customAxios.ts
3 74a069b66 perf(clients): remove duplicate Redux Provider and PersistGate from App Clients/src/App.tsx
4 cab7f4ff8 perf(servers): add explicit Sequelize connection pool and timeout config Servers/database/db.ts
5 496b9d49c perf(servers): add trust proxy and global API rate limiter Servers/app.ts
6 268d084d0 perf(servers): back rate limiters with Redis for horizontal scaling Servers/middleware/rateLimit.middleware.ts
7 49cfe649a perf(servers): add performance indexes for files, policies, AI findings, audit ledger Servers/database/migrations/20260702143000-add-performance-indexes.js
8 e4593759b perf(servers): remove queue obliterate and add stable jobId to slack scheduler Servers/services/slack/slackProducer.ts
9 ca8fb6c9e perf(servers): remove queue obliterate and add stable jobIds to automation scheduler Servers/services/automations/automationProducer.ts
10 30af852ae docs(servers): remove obsolete 'MUST be last' queue obliterate comment Servers/jobs/producer.ts
11 0d50dc534 fix(servers): satisfy Store interface in Redis rate limit store Servers/middleware/rateLimit.middleware.ts

4. What Was Not Implemented (and Why)

Item Reason
Move files from PostgreSQL BYTEA to object storage Large migration requiring storage abstraction, data migration, and rollback planning. Next candidate after quick wins.
Split translations.ts by language Touches domTranslator, useTranslation, and all consumers. CustomAxios fix already removed the critical-path bloat.
Split VerifyWiseContext Structural refactor affecting ~30 consumers. Higher risk than quick wins.
Rewrite N+1 queries Requires careful SQL verification against schema and tests.

@MuhammadKhalilzadeh MuhammadKhalilzadeh self-assigned this Jul 2, 2026
@MuhammadKhalilzadeh MuhammadKhalilzadeh added enhancement New feature or request frontend Frontend related tasks/issues backend Backend related tasks/issues labels Jul 2, 2026
@MuhammadKhalilzadeh MuhammadKhalilzadeh added this to the 2.5 milestone Jul 2, 2026
- Update queryClient test to expect 5-minute staleTime.
- Fix customAxios i18n test to use localStorage language override.
- Remove obliterate assertions from automation/slack producer tests
  and assert stable jobId values instead.
Replace import * as LucideIcons from lucide-react with named imports
in the four remaining source files. Add a central iconMap utility that
only imports the ~58 icons actually referenced by TabBar/DashboardTabs,
so the 615 kB lucide namespace chunk is eliminated. The StyleGuide icon
catalog now also imports its common icons explicitly.

- src/presentation/utils/iconMap.ts: new tree-shakeable icon registry
- TabBar/DashboardTabs/NewControlPane: resolve icons via iconMap
- IconsSection: explicit named imports for COMMON_ICONS
@MuhammadKhalilzadeh

Copy link
Copy Markdown
Collaborator Author

1. Test Fixes for P0 Changes

After the P0 performance commits, the following tests needed to be updated to match the new behavior. All were fixed and committed.

1.1 queryClient.test.ts

File: Clients/src/application/config/__tests__/queryClient.test.ts

  • Updated expected staleTime from 2000 to 300000 (5 minutes).

1.2 customAxios.test.ts

File: Clients/src/infrastructure/api/__tests__/customAxios.test.ts

  • Removed stale mocks for domTranslator and translations.
  • Switched the non-English translation test to set localStorage.setItem("vw_lang_prototype", "de") before the assertion, because customAxios now reads the active language directly from local storage instead of importing the full i18n translator.

1.3 automationProducer.spec.ts & slackProducer.spec.ts

Files:

  • Servers/services/automations/tests/automationProducer.spec.ts

  • Servers/services/slack/tests/slackProducer.spec.ts

  • Removed obliterate assertions (the implementation no longer wipes queues on startup).

  • Added assertions for stable jobId values on repeatable jobs.


2. P1: Tree-Shook lucide-react

2.1 What Changed

Replaced all remaining import * as LucideIcons from "lucide-react" namespace imports with named imports.

New file:

  • Clients/src/presentation/utils/iconMap.ts — a registry that imports only the ~58 icons actually referenced by TabBar and DashboardTabs.

Updated files:

  • Clients/src/presentation/components/TabBar/index.tsx
  • Clients/src/presentation/components/DashboardTabs/index.tsx
  • Clients/src/presentation/components/Modals/Controlpane/NewControlPane.tsx
  • Clients/src/presentation/pages/StyleGuide/sections/IconsSection.tsx

2.2 Why

The namespace import forced the bundler to include every lucide-react export, producing a ~615 kB monolithic icon chunk that could not be tree-shaken. Named imports and a bounded registry let Rollup drop every unused icon. The StyleGuide icon catalog now also imports its common icons explicitly instead of pulling in the whole library.

2.3 Measured Impact

Asset Before After
lucide-react namespace chunk ~615 kB eliminated
Individual icon chunks tiny per-icon chunks (< 1 kB each) or inlined
StyleGuide chunk larger only explicit catalog icons

3. Validation

Check Command Result
Clients type check npx tsc --noEmit ✅ Passed
Servers type check npx tsc --noEmit ✅ Passed
Clients production build npm run build-dev ✅ Passed
Client unit tests npx vitest run ✅ 5,126 passed / 5,126
Server unit tests npm run test:unit ✅ 3,331 passed / 3,331

4. Known Lint Gap

npm run lint in Clients currently fails because the eslint package itself is not installed — only @eslint/js, eslint-plugin-react-*, and typescript-eslint are present. I did not add or remove dependencies without explicit approval. Fixing this is a one-line dependency change if desired.


5. Commit Log

All commits pushed to origin/mo-368-jul-2-quality-enhancement.

# Hash Message Files
1 0a0949a38 test: update tests for P0 performance changes queryClient.test.ts, customAxios.test.ts, automationProducer.spec.ts, slackProducer.spec.ts
2 5dabcef11 perf(client): tree-shake lucide-react by replacing namespace imports iconMap.ts, TabBar/index.tsx, DashboardTabs/index.tsx, NewControlPane.tsx, IconsSection.tsx

6. Next Steps

  1. Fix client lint by installing the correct eslint version (requires dependency change approval).
  2. Refactor useDashboardMetrics to React Query / useQueries to eliminate the ~18 parallel-request dashboard storm.
  3. Split VerifyWiseContext into focused contexts.
  4. Lazy-load the dashboard body.
  5. Extend manualChunks for further vendor splitting.

@MuhammadKhalilzadeh

Copy link
Copy Markdown
Collaborator Author

1. What was completed

1.1 Refactored useDashboardMetrics to React Query

The hook was rewritten to use @tanstack/react-query's useQueries instead of manual useState + useCallback + Promise.allSettled orchestration.

Key changes:

  • 12 parallel queries now manage the dashboard's ~18 network requests:
    • risk, evidence, vendorRisk, vendor, policy, incident, modelRisk, training, model (evidenceHub + lifecycle), project (useCases + organizationalFrameworks), governanceScore, task.
  • Pure query functions handle API calls and data transformation; they no longer touch React state or localStorage.
  • localStorage cache is preserved as initialData + initialDataUpdatedAt for each query, so fresh cached metrics still skip the network round-trip.
  • Loading/revalidating/progress states are derived from query states, preserving the existing 5-stage progress dialog semantics.
  • Public API is unchanged: all metric objects, loading, isRevalidating, error, progressStep, progressSteps, and every fetchXxx/fetchAllMetrics function are still returned.
  • Imperative refetch functions now call queryClient.refetchQueries against the dashboard query-key namespace.
  • Successful query data is written back to localStorage so hasDashboardCache() and other legacy consumers continue to work.

1.2 Updated tests

  • src/application/hooks/__tests__/useDashboardMetrics.test.ts was already wrapping hooks in QueryClientProvider; it was updated to:
    • Seed all dashboard cache keys in the "fresh cache" test (React Query now skips only the queries that are actually fresh).
    • Use waitFor after calling an individual refetch function to account for React Query's async state update.

1.3 Removed dead code

  • Deleted the manual cacheBuffer / beginCacheBuffering / flushCacheBuffer machinery; React Query now handles deduplication and cache lifecycle.
  • Removed the now-unused useState import.

2. Files changed

  • Clients/src/application/hooks/useDashboardMetrics.ts – full React Query refactor
  • Clients/src/application/hooks/__tests__/useDashboardMetrics.test.ts – test updates for fresh-cache and individual-refetch behavior

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend Backend related tasks/issues enhancement New feature or request frontend Frontend related tasks/issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants