Add comprehensive test suite by Quiterion · Pull Request #69 · anima-research/animachat

Quiterion · 2026-02-12T10:51:49Z

Summary

Adds a comprehensive automated test suite across the entire backend, achieving 87.66% statement and 78.92% branch coverage. This provides a regression safety net for future changes.

2,469 tests across 56 test files, all passing
No source code modifications — all tests are characterization tests capturing existing behavior
CI workflow added to run tests with coverage reporting on every PR

What's covered

Area	Stmts	Branches	Highlights
Config	98.6%	93.8%	loader, model-loader, site-config-loader
Database	89.3%	75.5%	index.ts (5400 lines), all sub-stores, compaction, migration
Middleware	100%	100%	auth
Routes	82.4%	76.1%	All 18 route files tested
Services	93.1%	85.1%	All 5 AI providers (93-99%), inference, enhanced-inference, context manager
WebSocket	94.5%	80.0%	handler, room-manager

Test plan

All 2,469 backend tests pass
Frontend store tests pass (129 tests)
Coverage targets met (75%+ stmts and branches)

Tests for generateToken, verifyToken, and authenticateToken middleware.
Coverage: 100% statements, 100% branches, 100% functions, 100% lines.
Tests cover:
- generateToken: valid JWT creation, 7-day expiry, unique per userId
- verifyToken: valid tokens, wrong secret, expired, malformed, tampered payload
- authenticateToken: valid Bearer token, missing header, empty token, invalid token, expired token, wrong secret, complex user IDs

…erage)

…0% coverage)

- encryption.test.ts: 34 tests covering roundtrip encrypt/decrypt for all data types, tampered ciphertext/IV/auth tag detection, wrong key rejection, malformed input handling, cross-instance key derivation, env var fallback
- error-messages.test.ts: 46 tests verifying all message constants contain expected key phrases, template functions interpolate correctly, and all categories have appropriate structure
Coverage: encryption.ts 91% stmt/100% branch, error-messages.ts 100%/100%

- pricing-cache.test.ts: 20 tests covering cache update/lookup, pricing computation (string/numeric/NaN/null/zero), staleness detection, cache replacement, refresh callback invocation and error handling
- cache-strategies.test.ts: 37 tests covering DefaultCacheStrategy (Opus refresh vs rebuild, first-request token threshold at 500, context rotation, performance analysis with hit/expire/saved metrics), AggressiveCacheStrategy (always-refresh behavior), CostOptimizedCacheStrategy (Opus 2000-token vs non-Opus 5000-token thresholds)
Coverage: pricing-cache.ts 97%/96%, cache-strategies.ts 100%/97%

41 tests covering:
- User connection registration/unregistration (including multi-tab)
- Room join/leave lifecycle and automatic cleanup of empty rooms
- Multi-user room membership and user deduplication
- Broadcast messaging with sender exclusion and closed connection skipping
- AI request tracking: start, conflict detection, end, state queries
- Heartbeat: pinging alive connections, terminating unresponsive ones
- Stats reporting with room counts and AI request state
- Edge cases: no userId, non-existent rooms, send errors, ping errors
Documents hasActiveAiRequest quirk: returns true for non-existent rooms (undefined !== null) — getActiveAiRequest is the reliable alternative.
Coverage: room-manager.ts 99% stmt, 93% branch
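The hasActiveAiRequest quirk can be illustrated with a minimal sketch. The `Room` shape and the map-based storage are assumptions made for illustration; only the two function names and the `undefined !== null` behavior come from the commit message.

```typescript
// Hypothetical, simplified room state; the real RoomManager tracks more than this.
interface Room {
  activeAiRequest: string | null; // null means the room exists and nothing is in flight
}

const rooms = new Map<string, Room>();
rooms.set("idle-room", { activeAiRequest: null });

// Quirky check: for an unknown room the optional chain yields undefined,
// and `undefined !== null` evaluates to true.
function hasActiveAiRequest(roomId: string): boolean {
  return rooms.get(roomId)?.activeAiRequest !== null;
}

// More reliable lookup: "no such room" is normalised to null before callers compare.
function getActiveAiRequest(roomId: string): string | null {
  return rooms.get(roomId)?.activeAiRequest ?? null;
}
```

A characterization test pins the surprising `true` for a missing room rather than fixing it, which is consistent with the PR's no-source-changes rule.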

94 tests covering all parser functions: parseBasicJson, parseAnthropic, parseChromeExtension, parseArcChat, parseOpenAI, parseCursor, parseCursorJson, parseColonFormat. Tests include format detection, participant dedup, title extraction, edge cases (empty, invalid, branching), and MIME type guessing. Coverage: 98.2% stmt, 85.3% branch, 100% functions.

67 tests covering estimateTokens, getMessageTokens, and all 5 strategy implementations: AppendContextStrategy, RollingContextStrategy, LegacyRollingContextStrategy, StaticContextStrategy, AdaptiveContextStrategy. Tests include token estimation for text/images/thinking blocks, cache marker placement with arithmetic positioning, rolling window rotation, grace period behavior, branch change detection, and edge cases. Coverage: 96.3% stmt, 87.6% branch, 100% functions.

ConfigLoader (21 tests):
- Load/cache/reload config from CONFIG_PATH env var
- Default config fallback when file missing or invalid JSON
- getBestProfile filtering: allowedModels, allowedUserGroups, modelCosts
- Load balancing strategies: first, round-robin, least-used, random
- getDefaultModel with/without config, getProviderProfiles, singleton
ModelLoader (19 tests):
- Load/cache/reload models from MODELS_CONFIG_PATH
- getAllModels: system-only vs merged with user-defined models
- User model settings conversion (topP/topK optional handling)
- getModelsByProvider filtering, getModelById with user lookup
- getModelProvider, missing file/invalid JSON fallbacks, singleton
Coverage: loader.ts 98.79% stmt / 92.59% branch, model-loader.ts 100% stmt / 85.71% branch

authenticity.ts (43 tests):
- computeAuthenticity: empty/null input, single unaltered message, legacy messages, human-written AI, user messages, edit/split/posthoc propagation, name collisions (case-insensitive), multi-message conversations
- getAuthenticityLevel: all 8 levels with priority ordering
- getAuthenticityColor: all 8 level-to-color mappings verified
- getAuthenticityTooltip: content validation for all levels
modelColors.ts (46 tests):
- Direct model ID matches for major model families
- Pattern matching for Opus, Sonnet, Haiku, GPT, Llama, Gemini, Mistral, Command, DeepSeek, O1 variants with provider prefixes
- Default fallback for unknown models, undefined/empty input
- getLighterColor hex-to-rgba conversion with various opacities
latex.ts (22 tests):
- Display math ($$...$$, \[...\]), inline math ($...$, \(...\))
- Skip optimization (no delimiters = no processing)
- Error recovery: all 4 catch blocks tested via katex mock throw
- Mixed content, delimiter edge cases
avatars.ts (35 tests):
- loadAvatarPacks: API loading, caching, error handling
- getAvatarUrl/getAvatarColor: pack lookup, null/missing cases
- getModelAvatarUrl: canonicalId direct + derived
- getParticipantAvatarUrl: override priority chain (participant > persona > model)
- getParticipantColor: override priority chain, user-type returns null
Coverage: authenticity 100%/98.61%, avatars 96.2%/92.98%, latex 100%/100%, modelColors 96.2%/97.54%

…h testing docs

- Add vitest.config.ts with projects config so `npx vitest run` works from monorepo root
- Change avatars.test.ts to use relative imports instead of @/ alias (works in all contexts)

Anthropic (60 tests):
- formatMessagesForAnthropic: user/assistant/system messages, multi-turn ordering, active branch selection, image/PDF/text attachments, mixed attachments
- Cache control: simple messages, attachment messages, cache breakpoints
- Thinking blocks: signed (structured), unsigned (XML text), redacted, mixed
- Prefill-format thinking: thinking/redacted_thinking tag prepending
- Image resize: under limit, over limit, no dimensions, sharp error
- splitAtCacheBreakpoints: multi-section, empty sections, no breakpoints
- calculateCacheSavings: known models, unknown models, zero tokens
- parseThinkingTags: single/multiple blocks, no tags, empty tags
- Helper methods: isImage, isPdf, isAudio, isVideo, getMediaType, getImageMediaType
Bedrock (49 tests):
- formatMessagesForClaude: user/assistant/system, multi-turn, active branch
- Attachments: image, PDF, text inline, mixed, resize edge cases
- buildRequestBody: Claude 3 Messages API vs Claude 2 legacy prompt format, system prompt, stop_sequences, temperature vs top_p/top_k exclusion, content block text extraction for Claude 2
- extractContentFromChunk: Claude 3 deltas, Claude 2 completions
- isStreamComplete: Claude 3 message_stop, Claude 2 stop_reason
- Helper methods: isImage, isPdf, getMediaType, validateApiKey
Mutation tests passed (7 mutations, all caught):
- Anthropic: formatMessagesForAnthropic system filter, splitAtCacheBreakpoints cache_control, calculateCacheSavings multiplier, parseThinkingTags regex
- Bedrock: buildRequestBody Claude 3 detection, extractContentFromChunk field, isStreamComplete event type

88 tests across 3 provider service files:
- openrouter.test.ts (41): formatMessagesForOpenRouter, detectProviderFromModelId, calculateCacheSavings, getMediaType, attachments, cache_control, thinking blocks
- gemini.test.ts (26): formatMessagesForGemini, getMimeType, isSupportedMediaType, role mapping, thought_signature, blob store, attachments
- openai-compatible.test.ts (21): formatMessagesForOpenAI, parseThinkingTags, think tags, redacted_thinking, attachments
9 mutations tested and caught (3 per file).

51 tests covering:
- checkContentSync regex fallback patterns
- checkContent tiered moderation (always-blocked, age-restricted, researcher-exempt)
- Admin bypass
- Threshold boundary precision (critical=0.5, blocking=0.7)
- Tier priority ordering (tier 1 > tier 2 > tier 3)
- API error handling (fail open on 5xx, network error, empty results)
- checkMessages combining and filtering logic
- No API key scenario
Coverage: 100% stmts, 100% branch, 100% funcs, 100% lines
Mutation tested: checkContentSync (blocked→false), CRITICAL_THRESHOLD (0.5→0.9), checkMessages filter removal, admin bypass removal — all caught

31 tests covering:
- Priority ordering: user key > config profile > env var fallback
- User API key behavior (allowed/disallowed, provider matching, DB errors)
- Environment variable fallback for all providers (Anthropic, Bedrock, OpenRouter, OpenAI-compatible)
- Bedrock default region, missing secret key handling
- Config profile lookup parameter passing
- Rate limit checks (disabled, no limits, no features)
- Usage tracking with provider/billed cost calculations and margin
- getCostForModel matching and edge cases
Coverage: 98.63% stmts, 92.15% branch, 100% funcs, 98.57% lines
Mutation tested: getEnvApiKey (Anthropic→null), source user→config, getCostForModel (find→first) — all caught
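The priority ordering above (user key > config profile > env var fallback) could be sketched as follows. The `KeyCandidates` shape, the `resolveApiKey` name, and the `"env"` source label are hypothetical; the commit only confirms the ordering and that a `source` of user or config is reported.

```typescript
// Hypothetical types for illustration; the real API key manager also consults the DB and config loader.
type KeySource = "user" | "config" | "env";

interface ResolvedKey {
  source: KeySource;
  apiKey: string;
}

interface KeyCandidates {
  userKey?: string;          // key the user supplied, if user keys are allowed
  configProfileKey?: string; // key from the selected config profile
  envKey?: string;           // environment variable fallback
}

// First non-empty candidate wins, checked in priority order.
function resolveApiKey(candidates: KeyCandidates): ResolvedKey | null {
  if (candidates.userKey) return { source: "user", apiKey: candidates.userKey };
  if (candidates.configProfileKey) return { source: "config", apiKey: candidates.configProfileKey };
  if (candidates.envKey) return { source: "env", apiKey: candidates.envKey };
  return null;
}
```

A mutation that swaps the reported source from user to config, as listed above, would be caught by asserting on `source` and not only on `apiKey`.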

Email (16 tests):
- sendVerificationEmail: subject, URL, 24h expiry, HTML template structure
- sendPasswordResetEmail: subject, URL, 1h expiry, HTML/plaintext versions
- No API key: verification returns true (dev mode), reset returns false
- Error handling: API errors and exceptions return false
- Template: DOCTYPE, button links, fallback text
Coverage: 100% stmts, 93.75% branch
Persona context builder (19 tests):
- buildPersonaContextById: persona not found, delegation to buildPersonaContext
- History assembly: combine historical + backscroll, skip current conversation
- Participation ordering: chronological by logicalStart, filter incomplete
- Context strategies: rolling (most recent), anchored (prefix + suffix), unknown
- Branch inheritance: recursive parent collection
- Token estimation: 1 token per 4 chars, active branch content
- Pre-computed canonicalHistory path with missing message handling
- Error: throws on missing conversation
Coverage: 99.08% stmts, 90% branch
Mutation tested:
- Email: verification no-key true→false, password reset subject swap — caught
- Persona: leftAt filter removal, sort order reversal — caught
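As a rough illustration of the "1 token per 4 chars" estimate combined with a rolling (most recent) strategy, here is a minimal sketch. The function names, the round-up behavior, and the budget handling are assumptions; the persona context builder itself is not shown in this PR.

```typescript
// Hypothetical message shape for illustration only.
interface HistoryMessage {
  content: string;
}

// 1 token per 4 characters; rounding up is an assumption.
function estimateTokensByLength(text: string): number {
  return Math.ceil(text.length / 4);
}

// Rolling strategy sketch: walk from newest to oldest and keep messages
// until the next one would exceed the token budget.
function rollingWindow(messages: HistoryMessage[], maxTokens: number): HistoryMessage[] {
  const kept: HistoryMessage[] = [];
  let used = 0;
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokensByLength(msg.content);
    if (used + cost > maxTokens) break;
    kept.unshift(msg); // restore chronological order
    used += cost;
  }
  return kept;
}
```

With a budget of 3 tokens and messages costing 1, 2, and 1 tokens, only the two most recent messages are kept.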

Add tests for persistence, blob-store, collaboration, and shares stores.
- persistence.ts (13 tests): JSON serialization roundtrip, JSONL line parsing, empty file handling, malformed line handling, large event append+load, close idempotency. 91% stmts / 90% branch.
- blob-store.ts (24 tests): Save/retrieve by hash, deduplication, metadata-only retrieval, sharded directory structure, deletion with dedup cleanup, MIME type extension mapping, JSON blob roundtrip, error-rethrow branches. 91% stmts / 80% branch.
- collaboration.ts (45 tests): Share CRUD, permission updates, revocation with index cleanup, invite creation with expiration/max-uses/labels, invite token lookup with expiration/max-uses enforcement, invite usage tracking, creator-only deletion, full event replay for all 6 event types. 98% stmts / 89% branch.
- shares.ts (24 tests): Share creation with settings, token lookup with view count increment, expiration enforcement, owner-only deletion, bulk conversation deletion, event replay for created/deleted/viewed events. 98% stmts / 93% branch.
Mutation testing (3+ methods per file):
- persistence: appendEvent timestamp serialization, loadEvents ENOENT return, init guard
- blob-store: computeHash algorithm, dedup logic, default extension
- collaboration: getUserPermission return, deleteInvite creator check, expiration check
- shares: deleteShare owner check, viewCount increment, expiration check

- persona.ts (75 tests): Persona CRUD, custom options, archiving blocks new participations, deletion cleans up shares. History branch creation, head switching, cross-persona branch rejection. Participation tracking with sequential logical times, interleaving constraint, canonical branch history, fork-point filtering in collectBranchParticipations. Share CRUD with duplicate prevention, permission updates, revocation. Event replay for all 13 event types. 93% stmts / 81% branch.
- conversation-ui-state.ts (27 tests): Shared state save/load with caching, active branch set/get, branch count increment/decrement with floor at zero. Per-user state save/load/update, speakingAs, selectedResponder, detached mode with branch clearing on re-attach. Read tracking with deduplication and lastReadAt timestamps. Cache management (clearCache, clearUserCache). deleteConversation removes files and clears caches. 89% stmts / 75% branch.
Mutation testing (3+ methods per file):
- persona: interleaving constraint, logicalEnd<=logicalStart, owner permission
- conversation-ui-state: Math.max(0) floor, detached branch clearing, read dedup

33 tests covering event handling, message queuing, exponential backoff reconnection, intentional disconnect, room management, connection state, visibility handler, keep-alive/staleness detection, connection timeout, and message parsing. Coverage: 84%/81% (stmt/branch).
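Exponential backoff reconnection is typically a capped doubling delay. The sketch below is generic: the base delay, the cap, and the function name are assumptions, not values taken from the WebSocket client under test.

```typescript
// Capped exponential backoff: delay doubles with each failed attempt.
// baseMs and maxMs are illustrative defaults, not the project's real settings.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

Tests of this kind of logic usually run under fake timers, so they can assert the delay for each attempt without waiting in real time.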

14 tests covering initial state, ensureLoaded caching, concurrent load deduplication, isLoading lifecycle, error fallback with/without message, reload after error, reloadConfig force-fetch, getConfig sync access, and convenience getters. Coverage: 100%/100% (stmt/branch).
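Concurrent load deduplication is commonly done by sharing one in-flight promise. The following is a minimal sketch of that pattern; the `SiteConfig` shape and `createConfigStore` factory are hypothetical, and only the `ensureLoaded` name comes from the commit message.

```typescript
// Hypothetical config shape for illustration.
interface SiteConfig {
  title: string;
}

function createConfigStore(fetchConfig: () => Promise<SiteConfig>) {
  let cached: SiteConfig | null = null;
  let inFlight: Promise<SiteConfig> | null = null;

  // Returns the cached value, joins a load already in progress, or starts one.
  function ensureLoaded(): Promise<SiteConfig> {
    if (cached) return Promise.resolve(cached);
    if (!inFlight) {
      inFlight = fetchConfig().then(
        (config) => {
          cached = config;
          inFlight = null;
          return config;
        },
        (error) => {
          inFlight = null; // allow a later call to retry after a failure
          throw error;
        }
      );
    }
    return inFlight;
  }

  return { ensureLoaded };
}
```

Two callers that arrive before the first fetch resolves receive the same promise, so the fetch function runs once.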

Remove 13 tests that only verified constructors exist (no behavioral assertions) and 1 incomplete test in shares.test.ts with no assertions. Flagged by quality review as specification-gaming patterns.

40 tests covering the user-related public API of Database:
- createUser: fields, flags (emailVerified, ageVerified, tosAccepted), duplicate rejection
- getUserById / getUserByEmail: lookup, missing, case-sensitivity (exact match only)
- validatePassword: correct/wrong/missing
- Email verification: token create, verify, expired token, consumed token
- Manual verification: verified/already-verified/nonexistent
- Age verification: set/check/nonexistent
- ToS acceptance: set/nonexistent
- Password reset: full flow, expired/consumed tokens, getPasswordResetTokenData
- getAllUsers: returns all users
- Event replay: user survives, email verification survives, password reset does NOT survive (password_reset event doesn't log new hash), age/ToS do NOT survive (no replay handlers for user_age_verified/user_tos_accepted events)
- Init auto-creates test users on fresh DB
Characterization quirks captured:
- getUserByEmail is case-SENSITIVE (no lowercasing)
- Password reset lost on DB reload (event doesn't persist new hash)
- Age verification and ToS acceptance lost on DB reload (no replay handlers)
Mutation tested: createUser duplicate check, verifyEmail expiry, validatePassword hash comparison, resetPassword hash update — all caught.
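The "lost on DB reload" quirks follow from how event replay works: state that has no replay handler is simply not rebuilt. A minimal sketch, assuming hypothetical `user_created` and `email_verified` event names (only `user_age_verified` is named in the commit message):

```typescript
// Simplified user state and event log; shapes are assumptions for illustration.
interface UserState {
  id: string;
  emailVerified: boolean;
  ageVerified: boolean;
}

type UserEvent =
  | { type: "user_created"; userId: string }
  | { type: "email_verified"; userId: string }
  | { type: "user_age_verified"; userId: string };

// Rebuilds in-memory state from the persisted event log.
function replayEvents(events: UserEvent[]): Map<string, UserState> {
  const users = new Map<string, UserState>();
  for (const event of events) {
    switch (event.type) {
      case "user_created":
        users.set(event.userId, { id: event.userId, emailVerified: false, ageVerified: false });
        break;
      case "email_verified": {
        const user = users.get(event.userId);
        if (user) user.emailVerified = true;
        break;
      }
      // No case for "user_age_verified": the event is in the log,
      // but nothing applies it on reload, so the flag reverts to false.
    }
  }
  return users;
}
```

A characterization test for this asserts that `ageVerified` is false after reload, which documents the gap without changing the source.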

32 tests covering the grant-related public API of Database:
- recordGrantInfo: mint increases balance, burn decreases, send transfers, tally adds; multiple mints aggregate, different currencies tracked independently
- Balance goes negative on excessive burn (no enforcement)
- Zero-amount mint is a no-op
- Currency migration: opus→claude3opus, sonnets→old_sonnets
- Undefined currency defaults to 'credit'
- Grant details normalized (string→number coercion)
- recordGrantCapability: grant/revoke, latest-wins, expiry enforcement
- userHasActiveGrantCapability: active/revoked/expired/no-expiry/nonexistent
- getUserGrantSummary: returns totals + infos + capabilities; empty for fresh user
- Invite system: create, validate, claim (mints credits), maxUses enforcement, expired rejection, duplicate code rejection
- Event replay: minted grants, capabilities, and burn balance all survive reload
Mutation tested: updateGrantTotals mint delta sign flip, migrateCurrencyName skip, capabilityIsActive always-true — all caught.
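The mint/burn and currency-migration behaviors listed above can be sketched in a few lines. The types and function names here (`GrantRecord`, `normalizeCurrency`, `applyGrant`) are hypothetical and cover only mint and burn; the real Database methods are recordGrantInfo, updateGrantTotals, and migrateCurrencyName.

```typescript
// Hypothetical grant record; amount may arrive as a string and is coerced.
interface GrantRecord {
  type: "mint" | "burn";
  amount: number | string;
  currency?: string;
}

// Legacy currency names are migrated; a missing currency defaults to 'credit'.
function normalizeCurrency(currency: string | undefined): string {
  if (currency === undefined) return "credit";
  if (currency === "opus") return "claude3opus";
  if (currency === "sonnets") return "old_sonnets";
  return currency;
}

// Mint adds to the balance, burn subtracts; no floor is enforced,
// so an excessive burn leaves a negative balance.
function applyGrant(totals: Map<string, number>, grant: GrantRecord): void {
  const currency = normalizeCurrency(grant.currency);
  const amount = Number(grant.amount);
  const delta = grant.type === "mint" ? amount : -amount;
  totals.set(currency, (totals.get(currency) ?? 0) + delta);
}
```

Flipping the sign of the mint delta, one of the mutations listed above, would turn every balance assertion negative, which is why it is caught.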

41 tests covering the message and branching public API (MOST CRITICAL):
- createMessage: creates with correct fields, UUID, activeBranchId, parentBranchId
- getConversationMessages: returns messages sorted by tree order
- Linear conversation (A→B→C): correct ordering and parent chain
- Single branch (A→B1, A→B2): two branches on same message, active branch defaults to newest, setActiveBranch switches between them
- Nested branches: multi-level tree (A→B1→C1, A→B1→C2, A→B2), switching between branches at different levels
- addMessageBranch: edit-creates-new-branch semantics, preserveActiveBranch flag
- setActiveBranch: switches active, returns false for nonexistent branch/message
- deleteMessage: removes from conversation, doesn't affect siblings
- deleteMessageBranch: preserves sibling branches, deletes entire message if only branch, cascade-deletes descendants, switches active when deleting active
- Post-hoc operations: hide and edit with operation metadata
- getMessage / updateMessage: CRUD operations
- hiddenFromAi flag stored correctly
- Tree ordering: parents always before children
- Event replay: messages, branches, and deletions all survive DB reload
- Edge cases: nonexistent conversation throws, attachments, auto-parent linking, root parentBranchId for first message, creationSource stored on branches
Mutation tested: addMessageBranch activeBranchId update, setActiveBranch nonexistent-branch return, createMessage auto-parent linking — all caught.
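The branching model described above can be pictured with a small sketch: each message holds several branches, one of which is active, and each branch points at its parent branch. The interfaces, the literal `"root"` parent marker, and the `visiblePath` helper are assumptions for illustration; only the field names activeBranchId and parentBranchId and the setActiveBranch return behavior come from the commit message.

```typescript
interface Branch {
  id: string;
  content: string;
  parentBranchId: string; // "root" for the first message (assumed marker value)
}

interface TreeMessage {
  id: string;
  branches: Branch[];
  activeBranchId: string;
}

// Switches the active branch; returns false if the branch does not exist.
function setActiveBranch(message: TreeMessage, branchId: string): boolean {
  if (!message.branches.some((b) => b.id === branchId)) return false;
  message.activeBranchId = branchId;
  return true;
}

// Follows active branches from the root. Messages are assumed to be in tree
// order (parents before children), so one forward pass is enough.
function visiblePath(messages: TreeMessage[]): string[] {
  const path: string[] = [];
  let head = "root";
  for (const message of messages) {
    const active = message.branches.find((b) => b.id === message.activeBranchId);
    if (active && active.parentBranchId === head) {
      path.push(active.content);
      head = active.id;
    }
  }
  return path;
}
```

For the A→B1→C1, A→B2 tree, switching the second message from B2 back to B1 brings C1 back into the visible path.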

26 tests covering collaboration shares, permission levels (viewer/collaborator/editor), revocation, public shares (SharesStore), and event replay persistence. Mutation tested canUserAccessConversation owner check, canUserChatInConversation permission bypass, and revokeCollaborationShare no-op — all caught.

…t/78% branch)
Mock outermost layer (DB, providers, API key manager, model loader) and let real InferenceService logic run. Covers:
- determineActualFormat: standard/prefill/messages/completion routing
- modelSupportsPrefill / providerSupportsPrefill
- applyPostHocOperations: hide, hide_before, edit, hide_attachment, unhide
- formatMessagesForConversation: standard, prefill, messages modes
- consolidateConsecutiveMessages: bedrock alternating turns
- truncateMessagesToFit: head truncation, oversized messages, multimodal
- createMessagesModeChunkHandler: name prefix stripping
- parseThinkingTags: think block extraction
- streamCompletion: provider routing, stop sequences, thinking mode, rate limits, API key management, custom endpoints, usage tracking
- buildPrompt: full pipeline integration
- Mutation tested: 4 mutations on 4 methods, all caught
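Providers that require strictly alternating user/assistant turns need consecutive same-role messages merged first. The sketch below shows one way to do that; the `ChatTurn` type, the function name, and joining with a blank line are assumptions, not the behavior of the real consolidateConsecutiveMessages.

```typescript
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Merges runs of same-role messages so the output strictly alternates roles.
// The "\n\n" separator is an illustrative choice.
function mergeConsecutiveTurns(messages: ChatTurn[]): ChatTurn[] {
  const merged: ChatTurn[] = [];
  for (const message of messages) {
    const last = merged[merged.length - 1];
    if (last && last.role === message.role) {
      last.content += "\n\n" + message.content;
    } else {
      // Copy so the caller's objects are never mutated.
      merged.push({ role: message.role, content: message.content });
    }
  }
  return merged;
}
```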

Install supertest, create shared test helper (createTestApp with real Database in temp dir), write 28 auth tests (register, login, profile, api-keys, grants, user lookup, forgot/reset password) and 29 conversation tests (CRUD, archive, messages, metrics, export, duplicate, UI state, mark-read, permission checks).

Auth: 71% stmts / 65% branch (58 tests)
- Add user-not-found profile test, mixed API key listing masking
- Add grant send with default currency/reason, invite code claim path
- Add password reset flow exercise
Conversations: 75% stmts / 65% branch (112 tests)
- Add successful post-hoc delete, hide_attachment operation type
- Add fork truncated mode, delete post-hoc non-owner check
- Add UI state clearing (empty/null values), branch privacy not-found
- Add duplicate with options, create validation, post-hoc with reason

Add integration tests for participants, bookmarks, models, site-config, and system routes. All files exceed 70% statement / 65% branch coverage:
- participants.ts: 76.54% stmts / 71.05% branch (20 tests)
- bookmarks.ts: 78.26% stmts / 75% branch (13 tests)
- models.ts: 79.68% stmts / 83.33% branch (12 tests)
- site-config.ts: 78.94% stmts / 100% branch (6 tests)
- system.ts: 80% stmts / 100% branch (3 tests)
Key testing techniques:
- ConfigLoader injection for admin provider detection branches
- Pre-populated OpenRouter pricing cache for cache-hit path
- Admin user (cassandra) grants minting for currency coverage
- Custom middleware injection for site-config admin check branches
- Demo user login for user-defined model by ID tests

…tmt coverage)
Add comprehensive characterization tests for AnthropicService.streamCompletion covering:
- Request parameter building (model, temperature, top_p/top_k exclusivity, stop sequences)
- Thinking configuration and max_tokens adjustment for budget
- System prompt caching when _cacheControl is present
- Streaming event handling (text deltas, thinking blocks, redacted thinking, signatures)
- Cache metrics extraction from message_start events
- Error handling with failure metrics recording
- Demo mode simulation
- llmLogger integration (request/response/cache metrics logging)
- Edge cases: error chunks, stop sequences, thinking-only responses

…tmt coverage)
Add comprehensive characterization tests for BedrockService.streamCompletion covering:
- Claude 3 Messages API streaming (content_block_delta events, message_stop)
- Claude 2 legacy streaming (completion field, stop_reason)
- Request body construction for both API formats
- InvokeModelWithResponseStreamCommand parameter verification
- Error handling (empty response body, API errors, non-Error throws)
- llmLogger integration (request/response logging, error logging)
- Demo mode simulation (word-by-word streaming, completion signaling)
- rawRequest return value structure

…mt coverage)
Add comprehensive characterization tests for GeminiService covering:
- streamCompletion: SSE stream parsing, text/thinking/image content handling
- Request building: generationConfig (temp, topP, topK, maxOutputTokens, stopSequences)
- System instruction, thinking config, Google Search tool, response modalities
- Image generation: inlineData handling, preview-to-final replacement, blob storage
- Thought signature capture and propagation to content blocks
- Error handling: HTTP errors, malformed JSON, no response body, failure metrics
- generateContent (non-streaming): text, thinking, image generation, tool config
- Usage metadata extraction with defensive defaults

… coverage) Coverage: 22.3% → 96.15% stmts, 90.47% branches. Tests cover streamCompletion (request building, SSE parsing, thinking blocks, thought signatures, image generation with blob storage, error handling, usage metrics) and generateContent (non-streaming path, thinking, images, system instructions, tool configs).

…s, 98% stmt coverage) Coverage: 32.1% → 98.21% stmts, 95.77% branches. Tests cover streamCompletion (request building, SSE parsing, thinking tag extraction, usage/token tracking, error handling, llmLogger integration), listModels (success, error, missing data), and validateApiKey (models fallback, auth status codes, network errors).

… stmt coverage)
Add comprehensive characterization tests for OpenRouterService covering:
- streamCompletion: request building, headers, Anthropic provider forcing, thinking/reasoning support (max_tokens adjustment), SSE streaming and content assembly, usage/token tracking with cache metrics, all 3 reasoning field formats (reasoning_content, reasoning, reasoning_details) with priority ordering, image generation (delta.images, message.images, inlineData), blob replacement with old blob cleanup, error handling (HTTP errors, null body, network failures, failure metrics estimation)
- streamCompletionExactTest: non-streaming request, Anthropic provider config, content delivery, cache token calculation, error paths
- listModels: API fetching, error handling, missing data field
- validateApiKey: success/failure paths, key passthrough
- constructor: API key fallback chain, missing key warning
Coverage: 30.0% → 96.47% stmts, 87.8% branches

…mt/70% branch) Covers connection/auth, chat flows (standard + prefill), regenerate, edit, continue, delete, abort, room management, credit checks, content filtering, error handling, hiddenFromAi, and parallel sampling. Mutation tests validate userHasSufficientCredits, handleAbort, handleDelete, and filterHiddenFromAi.

…77% branch)
Comprehensive characterization tests for the Vue reactive store covering:
- Authentication (login, logout, register, loadUser)
- Message visibility (getVisibleMessages with caching, branch following)
- Branch switching (single, batch, cascade, detached mode)
- WebSocket event handlers (message_created, stream, message_edited, message_deleted, message_restored, message_split, branch_visibility)
- loadConversation (with detached mode, read state flush, retry)
- loadMessages, sendMessage, continueGeneration deeper paths
- Conversation CRUD (create, update, archive, duplicate, compact)
- Model management (load, custom CRUD, OpenRouter)
- Read tracking (mark as read, debounced persist, unread counts)
- Mutation tests on 4 methods (getVisibleMessages, switchBranch, setDetachedMode snapshot/restore)

Update completed work section with Tiers 1-3+ results (~2300 tests). Add Tier 4: remaining 11 routes, DB utilities, context manager, site-config-loader (8 new tasks, 35-42).

Admin tests (58 tests, 80.6% stmt / 68% branch):
- requireAdmin middleware (401/403 for unauth/non-admin)
- GET /admin/users, GET /admin/users/:id
- POST /admin/users/:id/capabilities (grant/revoke, validation)
- POST /admin/users/:id/credits (amount/currency validation)
- POST /admin/users/:id/reload
- GET /admin/stats, usage endpoints (user/system/model)
- Config management (GET/PATCH /admin/config, reload, models visibility)
- Bulk admin ops (verify-legacy-users, set-all-age-verified, set-all-tos-accepted)
- GET /admin/conversation-size/:id
Personas tests (66 tests, 76.4% stmt / 75% branch):
- CRUD (create, list, get, update, delete with permission checks)
- Archive persona
- History branches (list, fork, set head)
- Join/leave conversation (with roomManager mock)
- Participations listing with branchId filter
- Canonical branch and logical time updates
- Sharing (create, update permission, revoke, access verification)
- Permission hierarchy (owner > editor > user > viewer)
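A permission hierarchy such as owner > editor > user > viewer is often checked by ranking the levels. This is a minimal sketch under that assumption; the `PersonaPermission` type, the numeric ranks, and `hasAtLeast` are hypothetical names, not the route's actual implementation.

```typescript
type PersonaPermission = "viewer" | "user" | "editor" | "owner";

// Higher rank means more privilege; the numbers only encode the ordering.
const PERMISSION_RANK: Record<PersonaPermission, number> = {
  viewer: 0,
  user: 1,
  editor: 2,
  owner: 3,
};

// True when the caller's permission is at or above the level an action requires.
function hasAtLeast(actual: PersonaPermission, required: PersonaPermission): boolean {
  return PERMISSION_RANK[actual] >= PERMISSION_RANK[required];
}
```

Route tests for such a hierarchy usually check both sides of each boundary, for example that an editor passes a "user" check while a viewer does not.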

…oute tests
- collaboration.test.ts (42 tests): public invite lookup, share CRUD, permission checks, invite creation/claiming/deletion, shared-with-me, my-permission endpoint, access control for viewers vs editors
- invites.test.ts (18 tests): create invite with/without mint capability, auto-generated and custom codes, duplicate rejection, expiration, max-uses enforcement, public code validation, claim flow
- import.test.ts (35 tests): preview and execute for basic_json, anthropic, arc_chat, and chrome_extension formats; branch import, orphan filtering, participant mapping, system message handling, messages-raw endpoint validation and import
- shares.test.ts (19 tests): create tree and branch shares, public token retrieval with sanitized data, settings (model info, timestamps, download), user share listing, deletion with auth checks
- custom-models.test.ts (35 tests): full CRUD with Zod validation, user isolation, localhost HTTP vs external HTTPS enforcement, private IP rejection (10.x, 172.16-31.x, 192.168.x, 169.254.x), test endpoint error paths (unsupported provider, missing API key, missing endpoint, unreachable server)
- Extended test-helpers.ts to mount collaboration, invites, import, shares, and custom-models routes
Coverage: collaboration 84%/95%, invites 81%/74%, import 82%/67%, shares 88%/89%, custom-models 65%/61% (test endpoint requires external services for full coverage)
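The private IP ranges rejected by the custom-models route (10.x, 172.16-31.x, 192.168.x, 169.254.x) can be checked with a small predicate. The function below is a hypothetical sketch of such a check for dotted-quad IPv4 literals only; it is not the route's actual validator.

```typescript
// Returns true for IPv4 literals in the private or link-local ranges listed above.
// Hostnames and malformed addresses return false; a real validator would also
// need to consider DNS resolution and IPv6, which this sketch ignores.
function isPrivateIPv4(host: string): boolean {
  const parts = host.split(".");
  if (parts.length !== 4) return false;
  if (!parts.every((part) => /^\d{1,3}$/.test(part))) return false;
  const octets = parts.map((part) => Number(part));
  if (octets.some((octet) => octet > 255)) return false;
  const a = octets[0];
  const b = octets[1];
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}
```

Boundary cases such as 172.15.x and 172.32.x are worth asserting explicitly, since an off-by-one in the 172.16-31 range is an easy mutation to miss.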

Route tests: avatars (25), blobs (10), prompt (9), public-models (10)
DB utility tests: compaction (17), migration (17), fix-branches (5)
Service tests: context-manager (36)
Config tests: site-config-loader additions to loader.test.ts (13)

…d conversations route
- Add index.config.test.ts (91 tests): custom model CRUD, API key CRUD, admin stats, conversation CRUD, bookmarks, metrics, usage stats, collaboration invites, participants
- Add index.search.test.ts (85 tests): branch operations, post-hoc operations, restore, delete cascade, archive, events, duplicate, import, update, UI state, event replay, usage aggregation, collaboration access, grant summary
- Extend conversations.test.ts (+29 tests): restore message/branch success paths, split message, delete non-posthoc, fork with prefill/bookmarks/contentBlocks/multi-branch/private-branches, backfill with shared conversations, compact by admin, Zod validation errors, detachedBranch UI state, subtree with children
Coverage improvements:
- database/index.ts: 48.96% → 72.70% branches (+23.74%)
- routes/conversations.ts: 65.49% → 75.31% branches (+9.82%)
- Overall backend: 76.79% → 77.58% branches

Task anima-research#3: Add enhanced-inference.test.ts (98.56% stmts / 91.27% branches)
Task anima-research#4: Improve handler.test.ts branch coverage (70.08% → 79.02%)
Task anima-research#5: Improve remaining gap files coverage:
- context-manager.ts: 68.8% → 87.20% stmts, 85.05% branches
- avatars.ts: 58.7% → 84.8% branches (upload tests, GIF handling, auth checks)
- custom-models.ts: 61.5% → 70.8% branches (endpoint validation, auth checks)
- import.ts: 66.5% → 76.1% branches (branching, arc_chat edges, auth, errors)
All 2264 tests pass across 54 test files.

- Add GitHub Actions workflow to run tests with coverage on PRs
- Remove test plan (all 47 tasks complete, 75% coverage target achieved)

greptile-apps · 2026-02-12T10:55:22Z

Greptile Overview

Greptile Summary

This PR adds a comprehensive test suite achieving 87.66% statement and 78.92% branch coverage across 2,469 tests in 56 test files. The tests are characterization tests that capture existing behavior without modifying any source code.

Major additions:

CI workflow (.github/workflows/test.yml) runs tests with coverage on every PR
Complete backend test coverage across all routes, services, database operations, and WebSocket functionality
Frontend store tests with proper mocking of external dependencies
Shared type validation tests ensuring Zod schema correctness
Test infrastructure using Vitest with isolated database setup via temporary directories and process.chdir
Documentation in CLAUDE.md for testing commands and workflows

Test quality highlights:

Proper test isolation using temp directories and cleanup
Comprehensive mocking of external services (AI providers, sharp, AWS SDK)
Integration tests using real database instances with supertest for HTTP testing
Edge case coverage including security scenarios (tampered JWT tokens, invalid inputs)
Large test files indicate thorough coverage (e.g., 5453 lines for WebSocket handler, 2158 for Anthropic service)

Coverage areas:

Config loaders: 98.6% statements
Database operations: 89.3% statements including the massive 5400-line index.ts
All 18 route files tested with realistic scenarios
All 5 AI provider services at 93-99% coverage
WebSocket streaming and room management: 94.5% statements

Confidence Score: 5/5

This PR is safe to merge with very high confidence - it only adds tests and CI configuration without modifying any production code
Score of 5/5 because: (1) Zero production code changes - all additions are test files, test configuration, and documentation, (2) Tests follow best practices with proper isolation, mocking, and cleanup, (3) CI workflow is well-configured with appropriate Node version pinning and coverage reporting, (4) Test infrastructure uses established patterns (Vitest, supertest, temp directories for isolation), (5) Comprehensive coverage across all critical systems reduces regression risk for future changes
No files require special attention - all test infrastructure and configuration files are well-implemented

Important Files Changed

Filename	Overview
.github/workflows/test.yml	Adds CI workflow to run tests with coverage reporting on PRs - well-configured with proper workspace builds and coverage thresholds
deprecated-claude-app/backend/src/routes/test-helpers.ts	Test infrastructure with isolated database setup using temp directories and process.chdir for complete isolation
deprecated-claude-app/backend/src/services/anthropic.test.ts	Extensive Anthropic service tests (2158 lines) with proper mocking and comprehensive coverage of message formatting and attachments
deprecated-claude-app/backend/src/services/inference.test.ts	Large inference orchestration test suite (2038 lines) testing multi-provider routing and context management
deprecated-claude-app/backend/src/websocket/handler.test.ts	Massive WebSocket handler test suite (5453 lines) covering streaming, room management, and message routing
deprecated-claude-app/frontend/src/store/index.test.ts	Frontend store tests (1882 lines) with proper mocking of localStorage, API, and WebSocket services
CLAUDE.md	New documentation file providing comprehensive guidance for working with the codebase including testing commands

Meganeuridae · 2026-05-18T06:55:35Z

Read through this carefully and wanted to share findings + a recommendation. Short version: the test suite is genuinely good and worth merging, with one realistic caveat — it was authored against a Feb 2026 snapshot of main and a chunk of the assertions now capture previous behavior that intentional changes have since superseded.

What's solid

Vitest + @vitest/coverage-v8 — right tool for an ESM TS monorepo. Per-workspace config plus a root project config that aggregates is the clean shape.
test-helpers.ts design — createTestApp() spins up a real Database backed by a per-test temp dir, mounts the actual routers, exposes a supertest agent. That means tests catch real persistence + wiring bugs, not just mock-shape regressions. cleanupTestApp properly restores cwd and removes the temp dir.
CI workflow runs on PRs with coverage targets (75% stmts + branches) and a markdown summary. The targets are visible in the GitHub Actions summary tab on every PR — much better than coverage-as-vibes.
Coverage breadth — middleware 100%, services 93–99% per-provider, routes 82.4%. Test types are characterization-style (lock down observed behavior) which is the right call for a codebase without prior tests: it gives a regression net first, leaves spec-style "tests as documented intent" as a separate later effort.
No source modifications — the PR description's claim holds; the test suite is purely additive against src/.
CLAUDE.md is also a nice add — an accurate architectural overview that helps any Claude instance working on the repo orient quickly.

Merge state

I rebased onto current main in a local sandbox to assess. Conflicts:

backend/package.json — Quiterion's branch removes express-rate-limit and import:claude-archive (those landed later in main). The resolution is mechanical: keep main's deps + script, add Quiterion's test deps + scripts.
package-lock.json — derived; resolved by npm install after fixing package.json.

The other 69 files apply clean — no source-code conflicts at all.

What the test suite catches: 45 failures, 2402 passes

After merging onto current main and running JWT_SECRET=… NODE_ENV=test npx vitest run:

Test Files  11 failed | 45 passed (56)
Tests       45 failed | 2402 passed | 22 skipped (2469)

97.3% of the suite still captures valid behavior. Every failure I recognized traces to a PR that landed after this branch was cut:

Failing test cluster	Caused by	Resolution
`services/anthropic.test.ts`, `bedrock.test.ts`, `context-strategies.test.ts` — "does NOT recognize gif as image"	PR #90 (uniform GIF support across providers)	Update assertions: gif IS now recognized
`database/collaboration.test.ts`, `shares.test.ts` — token shape/length asserts	PR #92 (token entropy bumped 48-bit → 128-bit)	Update length expectations
`services/enhanced-inference.test.ts`, `gemini.test.ts` — usage shape, NaN-defaults	PR #104 (four-channel cost tracking)	Update usage-shape fixtures
`utils/encryption.test.ts` — "uses a default key when JWT_SECRET is unset"	PR #91 (JWT_SECRET strict)	Test the new throw-on-unset behavior
`routes/auth.test.ts` — Grant Mint/Send, Forgot/Resend/Verify, Registration	Multiple auth-flow PRs	Re-snapshot against current responses
`routes/conversations.test.ts` — non-owner permission rejections, admin compact	Permission changes since	Re-snapshot rejection paths
`websocket/handler.test.ts` — `handles join_room`, `broadcasts typing event`	handler.ts churn	Investigate (could be intentional, could be a regression)

The pattern is what characterization tests are supposed to do: surface behavior changes between snapshots. None of the failures I traced look like regressions — they look like "main legitimately moved and the snapshot needs refreshing," with the possible exception of the websocket handler ones, which I'd want to look at individually before assuming intentional.
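
To make the GIF row in the table above concrete, here is a minimal sketch of how a characterization assertion flips from passing to failing when behavior changes on purpose. The helper `makeIsImage`, the extension lists, and the file name are hypothetical and not taken from the repository; only the "gif is not recognized as an image" expectation comes from the failing tests listed above.

```typescript
// Hypothetical extension lists: "before" models the Feb snapshot,
// "after" models main once GIF support landed.
const IMAGE_EXTENSIONS_BEFORE: string[] = ["png", "jpg", "jpeg", "webp"];
const IMAGE_EXTENSIONS_AFTER: string[] = [...IMAGE_EXTENSIONS_BEFORE, "gif"];

// Hypothetical helper: builds a predicate that checks a file name's
// extension against the given list.
function makeIsImage(extensions: string[]): (fileName: string) => boolean {
  const known = new Set(extensions);
  return (fileName: string): boolean => {
    const ext = fileName.split(".").pop()?.toLowerCase() ?? "";
    return known.has(ext);
  };
}

// A characterization assertion pins whatever the code did when the test
// was written. Here the pinned expectation is "gif is NOT an image".
function characterizeGif(isImage: (fileName: string) => boolean): "pass" | "fail" {
  return isImage("clip.gif") === false ? "pass" : "fail";
}

console.log("old snapshot:", characterizeGif(makeIsImage(IMAGE_EXTENSIONS_BEFORE)));
console.log("current main:", characterizeGif(makeIsImage(IMAGE_EXTENSIONS_AFTER)));
```

In a case like this the production code is behaving as intended, so the refresh consists of updating the pinned expectation once someone has confirmed the change was deliberate.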

One small concern in `test-helpers.ts`

createTestApp() uses process.chdir() to position Database init against the per-test temp dir. That's a process-global side effect — if two tests' createTestApp() calls interleave (vitest can parallelize), they'll race on cwd and one Database may end up rooted in the wrong temp dir. Easy fix would be passing a base path into Database directly (one-line API change), or restricting test-file concurrency with a vitest config option. Not blocking; flagging because it's the kind of thing that would produce flaky failures rather than consistent ones.
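
As a rough illustration of the "pass a base path" option, the sketch below shows a helper that gives each test its own temp directory without touching the process-wide cwd. `FileStore` and `createTestContext` are hypothetical stand-ins, not the repository's `Database` or `createTestApp()`, and the real constructor signature may look different.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the real Database class. The only point is that
// it receives its data directory explicitly instead of reading process.cwd().
class FileStore {
  readonly baseDir: string;

  constructor(baseDir: string) {
    this.baseDir = baseDir;
    fs.mkdirSync(baseDir, { recursive: true });
  }

  write(name: string, value: unknown): void {
    fs.writeFileSync(path.join(this.baseDir, name), JSON.stringify(value));
  }

  read(name: string): unknown {
    return JSON.parse(fs.readFileSync(path.join(this.baseDir, name), "utf8"));
  }
}

interface TestContext {
  store: FileStore;
  dir: string;
  cleanup: () => void;
}

// Hypothetical helper in the spirit of createTestApp(): every call gets its
// own temp dir, and nothing changes the process-wide working directory.
function createTestContext(): TestContext {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "store-test-"));
  return {
    store: new FileStore(dir),
    dir,
    cleanup: () => fs.rmSync(dir, { recursive: true, force: true }),
  };
}

// Two contexts alive at the same time write to separate directories.
const first = createTestContext();
const second = createTestContext();
first.store.write("state.json", { owner: "first" });
second.store.write("state.json", { owner: "second" });
console.log(first.store.read("state.json"), second.store.read("state.json"));
first.cleanup();
second.cleanup();
```

Because no shared global is mutated, interleaved setup calls cannot redirect each other's storage. The other option mentioned above, limiting concurrency, would be a configuration change instead (Vitest has a `fileParallelism` setting), at the cost of slower runs.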

Suggested path forward

Three options ranging from most-Quiterion-effort to most-already-done:

1. Rebase + Quiterion refreshes the 45 failing tests. Cleanest, preserves their authorship.
2. Rebase + I do the test refresh as a follow-up commit on this branch. Faster to merge; I have direct context on most of the changes that broke each test (several were my PRs). I'd push and Quiterion + Antra would review.
3. Merge as-is, fix failures in a follow-up PR. Worst: it would let CI ship in a state where the test job is red on every PR until the follow-up PR lands, defeating the point.

My instinct is (2) — I introduced or shipped many of the post-Feb changes that broke the tests, so I can both rebase and update the assertions with high confidence about whether each new behavior is intentional vs a regression worth flagging. I'd then post a per-test summary on the rebase commit so Quiterion + Antra can sanity-check the intent calls. Happy to defer to either of you if you'd rather a different shape.

Either way: thanks for building this. It's a lot of careful work and the design is the right shape for the codebase.

Meganeuridae · 2026-05-18T07:46:57Z

Followed through on Option 2 from the earlier review: rebased onto current main and refreshed the 45 assertions broken by intentional behavior changes since the Feb 2026 snapshot. All 3057 tests pass (backend 2468 + frontend 322 + shared 267, 100% green).

New PR: #112

Each refresh commit there carries per-test intent-tagging — which PR changed the assertion, whether the new behavior is intentional, and why the test update is the right move. Full Co-Authored-By attribution to @Quiterion preserved on every commit (the framework, design, and 2,469 tests are all yours).

Once #112 lands, #69 can close.

Quiterion added 30 commits February 12, 2026 21:41

chore: add vitest testing infrastructure to all workspaces

95a1c0d

test: add shared package schema validation tests (207 tests, 100% coverage)

c307b5d

refactor: extract WebSocket prompt utilities with tests (34 tests, 100% coverage)

a488e45

chore: add CLAUDE.md, test coverage plan, and gitignore coverage dirs

cd73841

chore: add test:coverage and test:watch scripts, update CLAUDE.md with testing docs

8bb5054

chore: suppress noisy test:coverage output with --reporter=dot --silent

289de73

chore: edit coverage flags

e94f76c

chore: revert modification of existing code

da57e39

fix: add root vitest workspace config, fix avatars test @/ alias

00241cd

- Add vitest.config.ts with projects config so `npx vitest run` works from monorepo root
- Change avatars.test.ts to use relative imports instead of @/ alias (works in all contexts)

chore: remove existence-only constructor tests and incomplete share test

effa544

Remove 13 tests that only verified constructors exist (no behavioral assertions) and 1 incomplete test in shares.test.ts with no assertions. Flagged by quality review as specification-gaming patterns.

Quiterion added 20 commits February 12, 2026 21:41

docs: update test plan with completed tiers and add Tier 4

3c7267a

Update completed work section with Tiers 1-3+ results (~2300 tests). Add Tier 4: remaining 11 routes, DB utilities, context manager, site-config-loader (8 new tasks, 35-42).

chore: add CI test workflow and remove completed test plan

520c59f

- Add GitHub Actions workflow to run tests with coverage on PRs
- Remove test plan (all 47 tasks complete, 75% coverage target achieved)

fix: change working directory for tests

69236aa

Meganeuridae mentioned this pull request May 18, 2026

Comprehensive test suite (rebase + test-refresh of #69) #112

Open

5 tasks

Quiterion wants to merge 51 commits into anima-research:main from Quiterion:feature/testing-infrastructure


2 participants
