Add comprehensive test suite #69
Tests for generateToken, verifyToken, and authenticateToken middleware.
Coverage: 100% statements, 100% branches, 100% functions, 100% lines.
Tests cover:
- generateToken: valid JWT creation, 7-day expiry, unique per userId
- verifyToken: valid tokens, wrong secret, expired, malformed, tampered payload
- authenticateToken: valid Bearer token, missing header, empty token, invalid token, expired token, wrong secret, complex user IDs
- encryption.test.ts: 34 tests covering roundtrip encrypt/decrypt for all data types, tampered ciphertext/IV/auth tag detection, wrong key rejection, malformed input handling, cross-instance key derivation, env var fallback
- error-messages.test.ts: 46 tests verifying all message constants contain expected key phrases, template functions interpolate correctly, and all categories have appropriate structure
Coverage: encryption.ts 91% stmt/100% branch, error-messages.ts 100%/100%
- pricing-cache.test.ts: 20 tests covering cache update/lookup, pricing computation (string/numeric/NaN/null/zero), staleness detection, cache replacement, refresh callback invocation and error handling
- cache-strategies.test.ts: 37 tests covering DefaultCacheStrategy (Opus refresh vs rebuild, first-request token threshold at 500, context rotation, performance analysis with hit/expire/saved metrics), AggressiveCacheStrategy (always-refresh behavior), CostOptimizedCacheStrategy (Opus 2000-token vs non-Opus 5000-token thresholds)
Coverage: pricing-cache.ts 97%/96%, cache-strategies.ts 100%/97%
41 tests covering:
- User connection registration/unregistration (including multi-tab)
- Room join/leave lifecycle and automatic cleanup of empty rooms
- Multi-user room membership and user deduplication
- Broadcast messaging with sender exclusion and closed connection skipping
- AI request tracking: start, conflict detection, end, state queries
- Heartbeat: pinging alive connections, terminating unresponsive ones
- Stats reporting with room counts and AI request state
- Edge cases: no userId, non-existent rooms, send errors, ping errors
Documents hasActiveAiRequest quirk: returns true for non-existent rooms (undefined !== null); getActiveAiRequest is the reliable alternative.
Coverage: room-manager.ts 99% stmt, 93% branch
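The hasActiveAiRequest quirk can be reproduced in isolation. The following is a minimal sketch, assuming a Map-backed room store and a hypothetical `RoomState` shape; only the two function names come from the commit message, and the real room-manager.ts internals may differ.

```typescript
// Hypothetical room shape; not the actual room-manager.ts data structure.
interface RoomState {
  activeAiRequest: string | null; // null means "no request in flight"
}

const rooms = new Map<string, RoomState>();

// Quirky check: for a missing room the optional chain yields undefined,
// and undefined !== null evaluates to true.
function hasActiveAiRequest(roomId: string): boolean {
  return rooms.get(roomId)?.activeAiRequest !== null;
}

// Reliable alternative: "no room" and "no request" both normalize to null.
function getActiveAiRequest(roomId: string): string | null {
  return rooms.get(roomId)?.activeAiRequest ?? null;
}

rooms.set("room-1", { activeAiRequest: null });
```

A characterization test pins the quirky result for a missing room rather than fixing it, which is why callers are pointed at getActiveAiRequest.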
94 tests covering all parser functions: parseBasicJson, parseAnthropic, parseChromeExtension, parseArcChat, parseOpenAI, parseCursor, parseCursorJson, parseColonFormat. Tests include format detection, participant dedup, title extraction, edge cases (empty, invalid, branching), and MIME type guessing. Coverage: 98.2% stmt, 85.3% branch, 100% functions.
67 tests covering estimateTokens, getMessageTokens, and all 5 strategy implementations: AppendContextStrategy, RollingContextStrategy, LegacyRollingContextStrategy, StaticContextStrategy, AdaptiveContextStrategy. Tests include token estimation for text/images/thinking blocks, cache marker placement with arithmetic positioning, rolling window rotation, grace period behavior, branch change detection, and edge cases. Coverage: 96.3% stmt, 87.6% branch, 100% functions.
ConfigLoader (21 tests):
- Load/cache/reload config from CONFIG_PATH env var
- Default config fallback when file missing or invalid JSON
- getBestProfile filtering: allowedModels, allowedUserGroups, modelCosts
- Load balancing strategies: first, round-robin, least-used, random
- getDefaultModel with/without config, getProviderProfiles, singleton
ModelLoader (19 tests):
- Load/cache/reload models from MODELS_CONFIG_PATH
- getAllModels: system-only vs merged with user-defined models
- User model settings conversion (topP/topK optional handling)
- getModelsByProvider filtering, getModelById with user lookup
- getModelProvider, missing file/invalid JSON fallbacks, singleton
Coverage: loader.ts 98.79% stmt / 92.59% branch
model-loader.ts 100% stmt / 85.71% branch
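Of the load balancing strategies listed above, round-robin is the one with state worth illustrating. This is a rough sketch, assuming a hypothetical `createRoundRobin` helper over an in-memory list; it is not the ConfigLoader implementation, which may track usage differently.

```typescript
// Hypothetical helper; the name and closure-based state are assumptions.
function createRoundRobin<T>(profiles: T[]): () => T {
  if (profiles.length === 0) {
    throw new Error("no profiles to choose from");
  }
  let next = 0;
  return () => {
    const chosen = profiles[next];
    next = (next + 1) % profiles.length; // wrap around after the last profile
    return chosen;
  };
}

const pick = createRoundRobin(["profile-a", "profile-b", "profile-c"]);
const picks = [pick(), pick(), pick(), pick()];
// picks is ["profile-a", "profile-b", "profile-c", "profile-a"]
```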
authenticity.ts (43 tests):
- computeAuthenticity: empty/null input, single unaltered message, legacy
messages, human-written AI, user messages, edit/split/posthoc propagation,
name collisions (case-insensitive), multi-message conversations
- getAuthenticityLevel: all 8 levels with priority ordering
- getAuthenticityColor: all 8 level-to-color mappings verified
- getAuthenticityTooltip: content validation for all levels
modelColors.ts (46 tests):
- Direct model ID matches for major model families
- Pattern matching for Opus, Sonnet, Haiku, GPT, Llama, Gemini, Mistral,
Command, DeepSeek, O1 variants with provider prefixes
- Default fallback for unknown models, undefined/empty input
- getLighterColor hex-to-rgba conversion with various opacities
latex.ts (22 tests):
- Display math ($$...$$, \[...\]), inline math ($...$, \(...\))
- Skip optimization (no delimiters = no processing)
- Error recovery: all 4 catch blocks tested via katex mock throw
- Mixed content, delimiter edge cases
avatars.ts (35 tests):
- loadAvatarPacks: API loading, caching, error handling
- getAvatarUrl/getAvatarColor: pack lookup, null/missing cases
- getModelAvatarUrl: canonicalId direct + derived
- getParticipantAvatarUrl: override priority chain (participant > persona > model)
- getParticipantColor: override priority chain, user-type returns null
Coverage: authenticity 100%/98.61%, avatars 96.2%/92.98%,
latex 100%/100%, modelColors 96.2%/97.54%
- Add vitest.config.ts with projects config so `npx vitest run` works from monorepo root
- Change avatars.test.ts to use relative imports instead of @/ alias (works in all contexts)
Anthropic (60 tests):
- formatMessagesForAnthropic: user/assistant/system messages, multi-turn ordering, active branch selection, image/PDF/text attachments, mixed attachments
- Cache control: simple messages, attachment messages, cache breakpoints
- Thinking blocks: signed (structured), unsigned (XML text), redacted, mixed
- Prefill-format thinking: thinking/redacted_thinking tag prepending
- Image resize: under limit, over limit, no dimensions, sharp error
- splitAtCacheBreakpoints: multi-section, empty sections, no breakpoints
- calculateCacheSavings: known models, unknown models, zero tokens
- parseThinkingTags: single/multiple blocks, no tags, empty tags
- Helper methods: isImage, isPdf, isAudio, isVideo, getMediaType, getImageMediaType
Bedrock (49 tests):
- formatMessagesForClaude: user/assistant/system, multi-turn, active branch
- Attachments: image, PDF, text inline, mixed, resize edge cases
- buildRequestBody: Claude 3 Messages API vs Claude 2 legacy prompt format, system prompt, stop_sequences, temperature vs top_p/top_k exclusion, content block text extraction for Claude 2
- extractContentFromChunk: Claude 3 deltas, Claude 2 completions
- isStreamComplete: Claude 3 message_stop, Claude 2 stop_reason
- Helper methods: isImage, isPdf, getMediaType, validateApiKey
Mutation tests passed (7 mutations, all caught):
- Anthropic: formatMessagesForAnthropic system filter, splitAtCacheBreakpoints cache_control, calculateCacheSavings multiplier, parseThinkingTags regex
- Bedrock: buildRequestBody Claude 3 detection, extractContentFromChunk field, isStreamComplete event type
88 tests across 3 provider service files:
- openrouter.test.ts (41): formatMessagesForOpenRouter, detectProviderFromModelId, calculateCacheSavings, getMediaType, attachments, cache_control, thinking blocks
- gemini.test.ts (26): formatMessagesForGemini, getMimeType, isSupportedMediaType, role mapping, thought_signature, blob store, attachments
- openai-compatible.test.ts (21): formatMessagesForOpenAI, parseThinkingTags, think tags, redacted_thinking, attachments
9 mutations tested and caught (3 per file).
51 tests covering:
- checkContentSync regex fallback patterns
- checkContent tiered moderation (always-blocked, age-restricted, researcher-exempt)
- Admin bypass
- Threshold boundary precision (critical=0.5, blocking=0.7)
- Tier priority ordering (tier 1 > tier 2 > tier 3)
- API error handling (fail open on 5xx, network error, empty results)
- checkMessages combining and filtering logic
- No API key scenario
Coverage: 100% stmts, 100% branch, 100% funcs, 100% lines
Mutation tested: checkContentSync (blocked→false), CRITICAL_THRESHOLD (0.5→0.9), checkMessages filter removal, admin bypass removal; all caught
31 tests covering:
- Priority ordering: user key > config profile > env var fallback
- User API key behavior (allowed/disallowed, provider matching, DB errors)
- Environment variable fallback for all providers (Anthropic, Bedrock, OpenRouter, OpenAI-compatible)
- Bedrock default region, missing secret key handling
- Config profile lookup parameter passing
- Rate limit checks (disabled, no limits, no features)
- Usage tracking with provider/billed cost calculations and margin
- getCostForModel matching and edge cases
Coverage: 98.63% stmts, 92.15% branch, 100% funcs, 98.57% lines
Mutation tested: getEnvApiKey (Anthropic→null), source user→config, getCostForModel (find→first); all caught
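The priority ordering (user key > config profile > env var fallback) amounts to a first-match lookup. A minimal sketch follows, assuming hypothetical `KeyCandidates` and `resolveApiKey` names; the real API key manager also handles provider matching, DB errors, and rate limits, none of which is modeled here.

```typescript
type KeySource = "user" | "config" | "env";

// Hypothetical input shape: one optional key per source.
interface KeyCandidates {
  user?: string;
  config?: string;
  env?: string;
}

interface ResolvedKey {
  source: KeySource;
  apiKey: string;
}

// Walk the sources in priority order and return the first non-empty key.
function resolveApiKey(candidates: KeyCandidates): ResolvedKey | null {
  const order: KeySource[] = ["user", "config", "env"];
  for (const source of order) {
    const apiKey = candidates[source];
    if (apiKey) {
      return { source, apiKey };
    }
  }
  return null;
}
```

The "source user→config" mutation mentioned above corresponds to returning the wrong `source` label for a user-supplied key, which an assertion on `source` would catch.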
Email (16 tests):
- sendVerificationEmail: subject, URL, 24h expiry, HTML template structure
- sendPasswordResetEmail: subject, URL, 1h expiry, HTML/plaintext versions
- No API key: verification returns true (dev mode), reset returns false
- Error handling: API errors and exceptions return false
- Template: DOCTYPE, button links, fallback text
Coverage: 100% stmts, 93.75% branch
Persona context builder (19 tests):
- buildPersonaContextById: persona not found, delegation to buildPersonaContext
- History assembly: combine historical + backscroll, skip current conversation
- Participation ordering: chronological by logicalStart, filter incomplete
- Context strategies: rolling (most recent), anchored (prefix + suffix), unknown
- Branch inheritance: recursive parent collection
- Token estimation: 1 token per 4 chars, active branch content
- Pre-computed canonicalHistory path with missing message handling
- Error: throws on missing conversation
Coverage: 99.08% stmts, 90% branch
Mutation tested:
- Email: verification no-key true→false, password reset subject swap (caught)
- Persona: leftAt filter removal, sort order reversal (caught)
Add tests for persistence, blob-store, collaboration, and shares stores.
persistence.ts (13 tests): JSON serialization roundtrip, JSONL line parsing, empty file handling, malformed line handling, large event append+load, close idempotency. 91% stmts / 90% branch.
blob-store.ts (24 tests): Save/retrieve by hash, deduplication, metadata-only retrieval, sharded directory structure, deletion with dedup cleanup, MIME type extension mapping, JSON blob roundtrip, error-rethrow branches. 91% stmts / 80% branch.
collaboration.ts (45 tests): Share CRUD, permission updates, revocation with index cleanup, invite creation with expiration/max-uses/labels, invite token lookup with expiration/max-uses enforcement, invite usage tracking, creator-only deletion, full event replay for all 6 event types. 98% stmts / 89% branch.
shares.ts (24 tests): Share creation with settings, token lookup with view count increment, expiration enforcement, owner-only deletion, bulk conversation deletion, event replay for created/deleted/viewed events. 98% stmts / 93% branch.
Mutation testing (3+ methods per file):
- persistence: appendEvent timestamp serialization, loadEvents ENOENT return, init guard
- blob-store: computeHash algorithm, dedup logic, default extension
- collaboration: getUserPermission return, deleteInvite creator check, expiration check
- shares: deleteShare owner check, viewCount increment, expiration check
persona.ts (75 tests): Persona CRUD, custom options, archiving blocks new participations, deletion cleans up shares. History branch creation, head switching, cross-persona branch rejection. Participation tracking with sequential logical times, interleaving constraint, canonical branch history, fork-point filtering in collectBranchParticipations. Share CRUD with duplicate prevention, permission updates, revocation. Event replay for all 13 event types. 93% stmts / 81% branch.
conversation-ui-state.ts (27 tests): Shared state save/load with caching, active branch set/get, branch count increment/decrement with floor at zero. Per-user state save/load/update, speakingAs, selectedResponder, detached mode with branch clearing on re-attach. Read tracking with deduplication and lastReadAt timestamps. Cache management (clearCache, clearUserCache). deleteConversation removes files and clears caches. 89% stmts / 75% branch.
Mutation testing (3+ methods per file):
- persona: interleaving constraint, logicalEnd<=logicalStart, owner permission
- conversation-ui-state: Math.max(0) floor, detached branch clearing, read dedup
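To make the mutation-testing idea concrete, here is a small sketch of the "Math.max(0) floor" case. The function names are hypothetical and the mutant is written out by hand for illustration; the actual conversation-ui-state.ts code is not shown in this PR description.

```typescript
// Hypothetical stand-in for the branch count decrement with a floor at zero.
function decrementBranchCount(count: number): number {
  return Math.max(0, count - 1);
}

// Hand-written mutant: the Math.max(0, ...) floor has been removed.
function decrementBranchCountMutant(count: number): number {
  return count - 1;
}
```

An assertion that decrementing from zero stays at zero passes against the original and fails against the mutant, which is what "caught" means in the lists above.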
33 tests covering event handling, message queuing, exponential backoff reconnection, intentional disconnect, room management, connection state, visibility handler, keep-alive/staleness detection, connection timeout, and message parsing. Coverage: 84%/81% (stmt/branch).
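Exponential backoff reconnection usually means doubling the wait between attempts up to a cap. The sketch below assumes a 1 second base delay and a 30 second cap; both constants and the `reconnectDelay` name are illustrative assumptions, and the real client may use different values or add jitter.

```typescript
// Assumed constants, not taken from the WebSocket client under test.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// attempt is zero-based: 0 is the first reconnection attempt.
function reconnectDelay(attempt: number): number {
  const uncapped = BASE_DELAY_MS * Math.pow(2, attempt);
  return Math.min(uncapped, MAX_DELAY_MS);
}
```

Tests of this kind of logic typically use fake timers so the doubling schedule can be asserted without real waiting.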
14 tests covering initial state, ensureLoaded caching, concurrent load deduplication, isLoading lifecycle, error fallback with/without message, reload after error, reloadConfig force-fetch, getConfig sync access, and convenience getters. Coverage: 100%/100% (stmt/branch).
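Concurrent load deduplication is commonly done by sharing one in-flight promise. The following is a simplified sketch under that assumption; `fetchConfig`, `fetchCount`, and the module-level variables are hypothetical stand-ins, and only the `ensureLoaded` name comes from the commit message.

```typescript
let fetchCount = 0;
let cachedConfig: string | null = null;
let inFlight: Promise<string> | null = null;

// Hypothetical loader standing in for the real network call.
async function fetchConfig(): Promise<string> {
  fetchCount += 1;
  return "config";
}

function ensureLoaded(): Promise<string> {
  if (cachedConfig !== null) {
    return Promise.resolve(cachedConfig); // already loaded: serve from cache
  }
  if (inFlight === null) {
    inFlight = fetchConfig().then(
      (value) => {
        cachedConfig = value;
        inFlight = null;
        return value;
      },
      (err) => {
        inFlight = null; // clear so a later call can retry after an error
        throw err;
      }
    );
  }
  return inFlight; // concurrent callers share the same promise
}
```

Two calls made before the first one resolves receive the same promise, so the loader runs once.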
Remove 13 tests that only verified constructors exist (no behavioral assertions) and 1 incomplete test in shares.test.ts with no assertions. Flagged by quality review as specification-gaming patterns.
40 tests covering the user-related public API of Database:
- createUser: fields, flags (emailVerified, ageVerified, tosAccepted), duplicate rejection
- getUserById / getUserByEmail: lookup, missing, case-sensitivity (exact match only)
- validatePassword: correct/wrong/missing
- Email verification: token create, verify, expired token, consumed token
- Manual verification: verified/already-verified/nonexistent
- Age verification: set/check/nonexistent
- ToS acceptance: set/nonexistent
- Password reset: full flow, expired/consumed tokens, getPasswordResetTokenData
- getAllUsers: returns all users
- Event replay: user survives, email verification survives, password reset does NOT survive (password_reset event doesn't log new hash), age/ToS do NOT survive (no replay handlers for user_age_verified/user_tos_accepted events)
- Init auto-creates test users on fresh DB
Characterization quirks captured:
- getUserByEmail is case-SENSITIVE (no lowercasing)
- Password reset lost on DB reload (event doesn't persist new hash)
- Age verification and ToS acceptance lost on DB reload (no replay handlers)
Mutation tested: createUser duplicate check, verifyEmail expiry, validatePassword hash comparison, resetPassword hash update; all caught.
32 tests covering the grant-related public API of Database:
- recordGrantInfo: mint increases balance, burn decreases, send transfers, tally adds; multiple mints aggregate, different currencies tracked independently
- Balance goes negative on excessive burn (no enforcement)
- Zero-amount mint is a no-op
- Currency migration: opus→claude3opus, sonnets→old_sonnets
- Undefined currency defaults to 'credit'
- Grant details normalized (string→number coercion)
- recordGrantCapability: grant/revoke, latest-wins, expiry enforcement
- userHasActiveGrantCapability: active/revoked/expired/no-expiry/nonexistent
- getUserGrantSummary: returns totals + infos + capabilities; empty for fresh user
- Invite system: create, validate, claim (mints credits), maxUses enforcement, expired rejection, duplicate code rejection
- Event replay: minted grants, capabilities, and burn balance all survive reload
Mutation tested: updateGrantTotals mint delta sign flip, migrateCurrencyName skip, capabilityIsActive always-true; all caught.
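Several of the grant behaviors above (currency migration, the 'credit' default, and unenforced negative balances) fit in a few lines. This is a sketch only: `applyGrant` and the Map-backed totals are hypothetical, and where the default currency is applied is an assumption; only the rename pairs, the default, and the migrateCurrencyName name come from the commit message.

```typescript
// Rename pairs taken from the commit message.
const CURRENCY_RENAMES = new Map<string, string>([
  ["opus", "claude3opus"],
  ["sonnets", "old_sonnets"],
]);

function migrateCurrencyName(currency?: string): string {
  if (currency === undefined) {
    return "credit"; // undefined currency defaults to 'credit'
  }
  return CURRENCY_RENAMES.get(currency) ?? currency;
}

// Hypothetical running totals keyed by migrated currency name.
const totals = new Map<string, number>();

function applyGrant(kind: "mint" | "burn", amount: number, currency?: string): number {
  const key = migrateCurrencyName(currency);
  const delta = kind === "mint" ? amount : -amount;
  const updated = (totals.get(key) ?? 0) + delta; // no floor: may go negative
  totals.set(key, updated);
  return updated;
}
```

The "mint delta sign flip" mutation would turn `amount` into `-amount` for mints, which any assertion on a post-mint balance catches.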
41 tests covering the message and branching public API (MOST CRITICAL):
- createMessage: creates with correct fields, UUID, activeBranchId, parentBranchId
- getConversationMessages: returns messages sorted by tree order
- Linear conversation (A→B→C): correct ordering and parent chain
- Single branch (A→B1, A→B2): two branches on same message, active branch defaults to newest, setActiveBranch switches between them
- Nested branches: multi-level tree (A→B1→C1, A→B1→C2, A→B2), switching between branches at different levels
- addMessageBranch: edit-creates-new-branch semantics, preserveActiveBranch flag
- setActiveBranch: switches active, returns false for nonexistent branch/message
- deleteMessage: removes from conversation, doesn't affect siblings
- deleteMessageBranch: preserves sibling branches, deletes entire message if only branch, cascade-deletes descendants, switches active when deleting active
- Post-hoc operations: hide and edit with operation metadata
- getMessage / updateMessage: CRUD operations
- hiddenFromAi flag stored correctly
- Tree ordering: parents always before children
- Event replay: messages, branches, and deletions all survive DB reload
- Edge cases: nonexistent conversation throws, attachments, auto-parent linking, root parentBranchId for first message, creationSource stored on branches
Mutation tested: addMessageBranch activeBranchId update, setActiveBranch nonexistent-branch return, createMessage auto-parent linking; all caught.
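The "parents always before children" invariant can be illustrated with a pre-order walk over a parent-pointer list. This sketch uses a hypothetical flat `MsgNode` shape with `id` and `parentId`; the real Database stores messages with multiple branches and parentBranchId links, so treat this as a simplified model of the ordering property only.

```typescript
// Hypothetical simplified node; real messages carry branches, not a single parent.
interface MsgNode {
  id: string;
  parentId: string | null; // null marks a root
}

// Pre-order traversal: each node is emitted before any of its descendants.
// Nodes whose parent is missing from the input are not reachable and are dropped.
function treeOrder(nodes: MsgNode[]): MsgNode[] {
  const childrenOf = new Map<string | null, MsgNode[]>();
  for (const node of nodes) {
    const siblings = childrenOf.get(node.parentId) ?? [];
    siblings.push(node);
    childrenOf.set(node.parentId, siblings);
  }
  const ordered: MsgNode[] = [];
  const visit = (parentId: string | null): void => {
    for (const child of childrenOf.get(parentId) ?? []) {
      ordered.push(child);
      visit(child.id);
    }
  };
  visit(null);
  return ordered;
}
```

Feeding it the nested-branch shape from the list above (A→B1→C1, A→B1→C2, A→B2) in shuffled order yields a sequence in which A precedes both B nodes and B1 precedes both C nodes.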
26 tests covering collaboration shares, permission levels (viewer/collaborator/editor), revocation, public shares (SharesStore), and event replay persistence. Mutation tested canUserAccessConversation owner check, canUserChatInConversation permission bypass, and revokeCollaborationShare no-op; all caught.
…t/78% branch)
Mock outermost layer (DB, providers, API key manager, model loader) and let real InferenceService logic run. Covers:
- determineActualFormat: standard/prefill/messages/completion routing
- modelSupportsPrefill / providerSupportsPrefill
- applyPostHocOperations: hide, hide_before, edit, hide_attachment, unhide
- formatMessagesForConversation: standard, prefill, messages modes
- consolidateConsecutiveMessages: bedrock alternating turns
- truncateMessagesToFit: head truncation, oversized messages, multimodal
- createMessagesModeChunkHandler: name prefix stripping
- parseThinkingTags: think block extraction
- streamCompletion: provider routing, stop sequences, thinking mode, rate limits, API key management, custom endpoints, usage tracking
- buildPrompt: full pipeline integration
- Mutation tested: 4 mutations on 4 methods, all caught
Install supertest, create shared test helper (createTestApp with real Database in temp dir), write 28 auth tests (register, login, profile, api-keys, grants, user lookup, forgot/reset password) and 29 conversation tests (CRUD, archive, messages, metrics, export, duplicate, UI state, mark-read, permission checks).
Auth: 71% stmts / 65% branch (58 tests)
- Add user-not-found profile test, mixed API key listing masking
- Add grant send with default currency/reason, invite code claim path
- Add password reset flow exercise
Conversations: 75% stmts / 65% branch (112 tests)
- Add successful post-hoc delete, hide_attachment operation type
- Add fork truncated mode, delete post-hoc non-owner check
- Add UI state clearing (empty/null values), branch privacy not-found
- Add duplicate with options, create validation, post-hoc with reason
Add integration tests for participants, bookmarks, models, site-config, and system routes. All files exceed 70% statement / 65% branch coverage:
- participants.ts: 76.54% stmts / 71.05% branch (20 tests)
- bookmarks.ts: 78.26% stmts / 75% branch (13 tests)
- models.ts: 79.68% stmts / 83.33% branch (12 tests)
- site-config.ts: 78.94% stmts / 100% branch (6 tests)
- system.ts: 80% stmts / 100% branch (3 tests)
Key testing techniques:
- ConfigLoader injection for admin provider detection branches
- Pre-populated OpenRouter pricing cache for cache-hit path
- Admin user (cassandra) grants minting for currency coverage
- Custom middleware injection for site-config admin check branches
- Demo user login for user-defined model by ID tests
…tmt coverage)
Add comprehensive characterization tests for AnthropicService.streamCompletion covering:
- Request parameter building (model, temperature, top_p/top_k exclusivity, stop sequences)
- Thinking configuration and max_tokens adjustment for budget
- System prompt caching when _cacheControl is present
- Streaming event handling (text deltas, thinking blocks, redacted thinking, signatures)
- Cache metrics extraction from message_start events
- Error handling with failure metrics recording
- Demo mode simulation
- llmLogger integration (request/response/cache metrics logging)
- Edge cases: error chunks, stop sequences, thinking-only responses
…tmt coverage)
Add comprehensive characterization tests for BedrockService.streamCompletion covering:
- Claude 3 Messages API streaming (content_block_delta events, message_stop)
- Claude 2 legacy streaming (completion field, stop_reason)
- Request body construction for both API formats
- InvokeModelWithResponseStreamCommand parameter verification
- Error handling (empty response body, API errors, non-Error throws)
- llmLogger integration (request/response logging, error logging)
- Demo mode simulation (word-by-word streaming, completion signaling)
- rawRequest return value structure
…mt coverage)
Add comprehensive characterization tests for GeminiService covering:
- streamCompletion: SSE stream parsing, text/thinking/image content handling
- Request building: generationConfig (temp, topP, topK, maxOutputTokens, stopSequences)
- System instruction, thinking config, Google Search tool, response modalities
- Image generation: inlineData handling, preview-to-final replacement, blob storage
- Thought signature capture and propagation to content blocks
- Error handling: HTTP errors, malformed JSON, no response body, failure metrics
- generateContent (non-streaming): text, thinking, image generation, tool config
- Usage metadata extraction with defensive defaults
… coverage) Coverage: 22.3% → 96.15% stmts, 90.47% branches. Tests cover streamCompletion (request building, SSE parsing, thinking blocks, thought signatures, image generation with blob storage, error handling, usage metrics) and generateContent (non-streaming path, thinking, images, system instructions, tool configs).
…s, 98% stmt coverage) Coverage: 32.1% → 98.21% stmts, 95.77% branches. Tests cover streamCompletion (request building, SSE parsing, thinking tag extraction, usage/token tracking, error handling, llmLogger integration), listModels (success, error, missing data), and validateApiKey (models fallback, auth status codes, network errors).
… stmt coverage)
Add comprehensive characterization tests for OpenRouterService covering:
- streamCompletion: request building, headers, Anthropic provider forcing, thinking/reasoning support (max_tokens adjustment), SSE streaming and content assembly, usage/token tracking with cache metrics, all 3 reasoning field formats (reasoning_content, reasoning, reasoning_details) with priority ordering, image generation (delta.images, message.images, inlineData), blob replacement with old blob cleanup, error handling (HTTP errors, null body, network failures, failure metrics estimation)
- streamCompletionExactTest: non-streaming request, Anthropic provider config, content delivery, cache token calculation, error paths
- listModels: API fetching, error handling, missing data field
- validateApiKey: success/failure paths, key passthrough
- constructor: API key fallback chain, missing key warning
Coverage: 30.0% → 96.47% stmts, 87.8% branches
…mt/70% branch) Covers connection/auth, chat flows (standard + prefill), regenerate, edit, continue, delete, abort, room management, credit checks, content filtering, error handling, hiddenFromAi, and parallel sampling. Mutation tests validate userHasSufficientCredits, handleAbort, handleDelete, and filterHiddenFromAi.
…77% branch)
Comprehensive characterization tests for the Vue reactive store covering:
- Authentication (login, logout, register, loadUser)
- Message visibility (getVisibleMessages with caching, branch following)
- Branch switching (single, batch, cascade, detached mode)
- WebSocket event handlers (message_created, stream, message_edited, message_deleted, message_restored, message_split, branch_visibility)
- loadConversation (with detached mode, read state flush, retry)
- loadMessages, sendMessage, continueGeneration deeper paths
- Conversation CRUD (create, update, archive, duplicate, compact)
- Model management (load, custom CRUD, OpenRouter)
- Read tracking (mark as read, debounced persist, unread counts)
- Mutation tests on 4 methods (getVisibleMessages, switchBranch, setDetachedMode snapshot/restore)
Update completed work section with Tiers 1-3+ results (~2300 tests). Add Tier 4: remaining 11 routes, DB utilities, context manager, site-config-loader (8 new tasks, 35-42).
Admin tests (58 tests, 80.6% stmt / 68% branch):
- requireAdmin middleware (401/403 for unauth/non-admin)
- GET /admin/users, GET /admin/users/:id
- POST /admin/users/:id/capabilities (grant/revoke, validation)
- POST /admin/users/:id/credits (amount/currency validation)
- POST /admin/users/:id/reload
- GET /admin/stats, usage endpoints (user/system/model)
- Config management (GET/PATCH /admin/config, reload, models visibility)
- Bulk admin ops (verify-legacy-users, set-all-age-verified, set-all-tos-accepted)
- GET /admin/conversation-size/:id
Personas tests (66 tests, 76.4% stmt / 75% branch):
- CRUD (create, list, get, update, delete with permission checks)
- Archive persona
- History branches (list, fork, set head)
- Join/leave conversation (with roomManager mock)
- Participations listing with branchId filter
- Canonical branch and logical time updates
- Sharing (create, update permission, revoke, access verification)
- Permission hierarchy (owner > editor > user > viewer)
…oute tests
- collaboration.test.ts (42 tests): public invite lookup, share CRUD, permission checks, invite creation/claiming/deletion, shared-with-me, my-permission endpoint, access control for viewers vs editors
- invites.test.ts (18 tests): create invite with/without mint capability, auto-generated and custom codes, duplicate rejection, expiration, max-uses enforcement, public code validation, claim flow
- import.test.ts (35 tests): preview and execute for basic_json, anthropic, arc_chat, and chrome_extension formats; branch import, orphan filtering, participant mapping, system message handling, messages-raw endpoint validation and import
- shares.test.ts (19 tests): create tree and branch shares, public token retrieval with sanitized data, settings (model info, timestamps, download), user share listing, deletion with auth checks
- custom-models.test.ts (35 tests): full CRUD with Zod validation, user isolation, localhost HTTP vs external HTTPS enforcement, private IP rejection (10.x, 172.16-31.x, 192.168.x, 169.254.x), test endpoint error paths (unsupported provider, missing API key, missing endpoint, unreachable server)
- Extended test-helpers.ts to mount collaboration, invites, import, shares, and custom-models routes
Coverage: collaboration 84%/95%, invites 81%/74%, import 82%/67%, shares 88%/89%, custom-models 65%/61% (test endpoint requires external services for full coverage)
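The private IP rejection ranges listed for custom-models (10.x, 172.16-31.x, 192.168.x, 169.254.x) translate into a short octet check. The sketch below is an illustration only: `isBlockedIPv4` is a hypothetical name, it handles dotted-quad IPv4 literals and nothing else, and the actual route validation (which also enforces localhost HTTP vs external HTTPS) may be structured differently.

```typescript
// Hypothetical validator covering only the four ranges named in the test list.
function isBlockedIPv4(host: string): boolean {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) {
    return false; // not an IPv4 literal; hostnames are out of scope here
  }
  const octets = host.split(".").map(Number);
  if (octets.some((octet) => octet > 255)) {
    return false; // malformed address, leave it to other validation
  }
  const [a, b] = octets;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16 (link-local)
  );
}
```

Boundary addresses such as 172.15.x and 172.32.x are the useful test cases, since an off-by-one in the second octet range is the most likely mistake.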
Route tests: avatars (25), blobs (10), prompt (9), public-models (10)
DB utility tests: compaction (17), migration (17), fix-branches (5)
Service tests: context-manager (36)
Config tests: site-config-loader additions to loader.test.ts (13)
…d conversations route
- Add index.config.test.ts (91 tests): custom model CRUD, API key CRUD, admin stats, conversation CRUD, bookmarks, metrics, usage stats, collaboration invites, participants
- Add index.search.test.ts (85 tests): branch operations, post-hoc operations, restore, delete cascade, archive, events, duplicate, import, update, UI state, event replay, usage aggregation, collaboration access, grant summary
- Extend conversations.test.ts (+29 tests): restore message/branch success paths, split message, delete non-posthoc, fork with prefill/bookmarks/contentBlocks/multi-branch/private-branches, backfill with shared conversations, compact by admin, Zod validation errors, detachedBranch UI state, subtree with children
Coverage improvements:
- database/index.ts: 48.96% → 72.70% branches (+23.74%)
- routes/conversations.ts: 65.49% → 75.31% branches (+9.82%)
- Overall backend: 76.79% → 77.58% branches
Task anima-research#3: Add enhanced-inference.test.ts (98.56% stmts / 91.27% branches)
Task anima-research#4: Improve handler.test.ts branch coverage (70.08% → 79.02%)
Task anima-research#5: Improve remaining gap files coverage:
- context-manager.ts: 68.8% → 87.20% stmts, 85.05% branches
- avatars.ts: 58.7% → 84.8% branches (upload tests, GIF handling, auth checks)
- custom-models.ts: 61.5% → 70.8% branches (endpoint validation, auth checks)
- import.ts: 66.5% → 76.1% branches (branching, arc_chat edges, auth, errors)
All 2264 tests pass across 54 test files.
- Add GitHub Actions workflow to run tests with coverage on PRs
- Remove test plan (all 47 tasks complete, 75% coverage target achieved)
Greptile Overview
Greptile Summary
This PR adds a comprehensive test suite achieving 87.66% statement and 78.92% branch coverage across 2,469 tests in 56 test files. The tests are characterization tests that capture existing behavior without modifying any source code. Major additions:
Test quality highlights:
Coverage areas:
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| .github/workflows/test.yml | Adds CI workflow to run tests with coverage reporting on PRs - well-configured with proper workspace builds and coverage thresholds |
| deprecated-claude-app/backend/src/routes/test-helpers.ts | Test infrastructure with isolated database setup using temp directories and process.chdir for complete isolation |
| deprecated-claude-app/backend/src/services/anthropic.test.ts | Extensive Anthropic service tests (2158 lines) with proper mocking and comprehensive coverage of message formatting and attachments |
| deprecated-claude-app/backend/src/services/inference.test.ts | Large inference orchestration test suite (2038 lines) testing multi-provider routing and context management |
| deprecated-claude-app/backend/src/websocket/handler.test.ts | Massive WebSocket handler test suite (5453 lines) covering streaming, room management, and message routing |
| deprecated-claude-app/frontend/src/store/index.test.ts | Frontend store tests (1882 lines) with proper mocking of localStorage, API, and WebSocket services |
| CLAUDE.md | New documentation file providing comprehensive guidance for working with the codebase including testing commands |
Read through this carefully and wanted to share findings + a recommendation. Short version: the test suite is genuinely good and worth merging, with one realistic caveat: it was authored against a Feb 2026 snapshot of main, and a chunk of the assertions now capture previous behavior that intentional changes have since superseded.
What's solid
Merge state
I rebased onto current
The other 69 files apply cleanly, with no source-code conflicts at all.
What the test suite catches: 45 failures, 2402 passes
After merging onto current main and running the tests, 97.3% of the suite still captures valid behavior. Every failure I recognized traces to a PR that landed after this branch was cut:
The pattern is what characterization tests are supposed to do: surface behavior changes between snapshots. None of the failures I traced look like regressions; they look like "main legitimately moved and the snapshot needs refreshing," with the possible exception of the websocket handler ones, which I'd want to look at individually before assuming intentional.
One small concern in
Followed through on Option 2 from the earlier review: rebased onto current main and refreshed the 45 assertions broken by intentional behavior changes since the Feb 2026 snapshot. All 3057 tests pass (backend 2468 + frontend 322 + shared 267, 100% green).
New PR: #112
Each refresh commit there carries per-test intent-tagging: which PR changed the assertion, whether the new behavior is intentional, and why the test update is the right move. Full Co-Authored-By attribution to @Quiterion is preserved on every commit (the framework, design, and 2,469 tests are all yours).
Summary
Adds a comprehensive automated test suite across the entire backend, achieving 87.66% statement and 78.92% branch coverage. This provides a regression safety net for future changes.
What's covered
Test plan