From bde9c6140021e8b3c79e4e1c7a68480908743267 Mon Sep 17 00:00:00 2001 From: castor-agent Date: Tue, 19 May 2026 17:43:39 +0200 Subject: [PATCH 1/4] feat(instructions): auto-file issues when pre-consented via standing rule (#194) Add a session-start instruction that retrieves the issue_filing_consent preference entity at startup and caches it for the session. When the stored value is "always", the effective reporting mode is set to proactive and the agent MUST file issues without asking. Values "ask" and "never" map to consent and off respectively. The preference overrides issues.reporting_mode from config and env for the duration of the session. Also update the proactive mode contract to explicitly list issue_filing_consent value "always" as a trigger alongside the CLI config and env var paths, replacing the obsolete future auto_file_issues forward reference. Update QA-driven issue filing to check for the stored preference entity before prompting the user. Co-Authored-By: Claude Sonnet 4.6 --- docs/developer/mcp/instructions.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/developer/mcp/instructions.md b/docs/developer/mcp/instructions.md index 1a19b046f..303d3aa90 100644 --- a/docs/developer/mcp/instructions.md +++ b/docs/developer/mcp/instructions.md @@ -187,10 +187,11 @@ CLI parity: in CLI backup mode, `entities search` accepts a positional identifie CLI backup transport: when MCP is available or when reconciling counts/results with MCP, run CLI reads/writes via API transport (`--api-only` or explicit `--base-url`) and do not rely on offline-default transport unless the user explicitly requests local/offline data. Summarization after MCP actions: follow the [COMMUNICATION & DISPLAY] display rule exactly (horizontal rule, linked `🧠 Neotoma β€” [](/inspector/conversations/)` heading, non-empty Created/Updated/Retrieved bullet groups, or the empty-state Suggestions fallback). Do not dump raw snapshot keys or internal ids into prose, and do not surface the conversation entity_id as plain header text when it is already encoded in the Inspector link. Do not repeat the same phrase as both thought and section heading. Update check: at session start, optionally call npm_check_update with packageName "neotoma" and the client version; if updateAvailable, prompt user to upgrade. Pass include_release_notes true when the user should see truncated release notes, a breaking-changes excerpt, and release_url (best-effort; enrichment_error if enrichment fails). +Issue-filing consent check (session start): at session start, call `retrieve_entities` with `entity_type: "preference"` and a name/title filter for `"issue_filing_consent"` to check for a standing consent preference. Cache the result for the session β€” do not re-fetch per issue. Map the stored `value` to the effective reporting mode: `"always"` β†’ proactive (file without asking); `"ask"` β†’ consent (ask before each filing); `"never"` β†’ off (never file, never ask). When the preference entity is found, it overrides `issues.reporting_mode` from config and env for this session. FORBIDDEN: asking the user "should I file this?" when the preference value is `"always"`. If the preference entity is absent, fall through to the normal mode-discovery flow in [ISSUE REPORTING]. [ISSUE REPORTING] -Issues are the primary feedback channel for iterative Neotoma improvement based on agentic usage. Filing happens in two scenarios: (1) **reactive** β€” when you hit friction using Neotoma (a failing tool call, an opaque error, a missing surface, or a doc gap); (2) **QA-driven** β€” when the per-turn self-audit (see [QA REFLECTION] "Auto-file threshold" and "QA-driven issue filing") produces file-worthy findings indicating a Neotoma-side bug or an instruction/interface improvement opportunity. Both scenarios are gated by **`issues.reporting_mode`**: **`proactive`** β€” file immediately without asking (`neotoma issues config --mode proactive` or `NEOTOMA_ISSUES_REPORTING_MODE=proactive`); **`consent`** (default) β€” confirm with the user before each `submit_issue`; **`off`** β€” only submit when the user explicitly asks. Mode discovery: if `reporting_mode` has not been configured (no prior `neotoma issues config --mode` and no `NEOTOMA_ISSUES_REPORTING_MODE` env var) and an issue-filing situation arises, ask the user once: "Neotoma can automatically file issues for product improvements discovered during usage. What reporting mode do you prefer? (proactive / consent / off)". Then immediately persist the choice via `neotoma issues config --mode ` so it applies to all future sessions β€” do not re-ask. -Proactive mode contract (auto-file mandate): when **`issues.reporting_mode`** is **`proactive`** (set explicitly via `neotoma issues config --mode proactive`, the `NEOTOMA_ISSUES_REPORTING_MODE=proactive` env var, or a future `auto_file_issues` standing-rule preference resolving to true), the agent MUST call `submit_issue` for every reactive friction event AND every file-worthy QA finding in the same turn, with no user confirmation prompt and no "want me to file this?" question. FORBIDDEN: pausing to ask "should I file this?" when proactive mode is set; deferring filing to a later turn; bundling multiple findings into a chat summary instead of separate `submit_issue` calls when their root causes differ. REQUIRED: complete the PII stripping checklist above for every auto-filed issue; run the dedup-before-filing search before each call; render every auto-filed issue in the `Issues` group of the turn report with its `entity_id` / `github_number`. Proactive mode does NOT bypass risk-management hold points (auth, schema migrations, foundation docs, destructive data repair) β€” for those, file the issue describing the problem but do not auto-execute the fix. +Issues are the primary feedback channel for iterative Neotoma improvement based on agentic usage. Filing happens in two scenarios: (1) **reactive** β€” when you hit friction using Neotoma (a failing tool call, an opaque error, a missing surface, or a doc gap); (2) **QA-driven** β€” when the per-turn self-audit (see [QA REFLECTION] "Auto-file threshold" and "QA-driven issue filing") produces file-worthy findings indicating a Neotoma-side bug or an instruction/interface improvement opportunity. Both scenarios are gated by **`issues.reporting_mode`**: **`proactive`** β€” file immediately without asking (`neotoma issues config --mode proactive` or `NEOTOMA_ISSUES_REPORTING_MODE=proactive`); **`consent`** (default) β€” confirm with the user before each `submit_issue`; **`off`** β€” only submit when the user explicitly asks. The effective mode for a session is also determined by the `issue_filing_consent` preference entity retrieved at session start (see above): if found, its `value` (`"always"` β†’ proactive, `"ask"` β†’ consent, `"never"` β†’ off) overrides config/env for the session. Mode discovery: if `reporting_mode` has not been configured (no prior `neotoma issues config --mode`, no `NEOTOMA_ISSUES_REPORTING_MODE` env var, and no `issue_filing_consent` preference entity) and an issue-filing situation arises, first check for a stored preference entity before prompting: call `retrieve_entities` with `entity_type: "preference"` and filter for `title: "issue_filing_consent"`; if a matching entity exists, treat its `value` field as the resolved mode (`always` β†’ `proactive`, `ask` β†’ `consent`, `never` β†’ `off`) and act accordingly without re-prompting the user. If no `issue_filing_consent` preference entity exists, ask the user once: "I noticed a potential issue to file. How would you like me to handle issue filing going forward? (always β€” file automatically, ask β€” confirm each time, never β€” skip filing)". Map the response to the reporting mode (`always` β†’ `proactive`, `ask` β†’ `consent`, `never` β†’ `off`) and persist it two ways: (1) store a `preference` entity with `entity_type: "preference"`, `title: "issue_filing_consent"`, `value: ""` so future sessions can retrieve the preference without re-prompting; (2) run `neotoma issues config --mode ` to set the runtime config flag. Do not re-ask once the preference entity exists or the mode is configured. +Proactive mode contract (auto-file mandate): when **`issues.reporting_mode`** is **`proactive`** (set explicitly via `neotoma issues config --mode proactive`, the `NEOTOMA_ISSUES_REPORTING_MODE=proactive` env var, or an `issue_filing_consent` preference entity with value `"always"`), the agent MUST call `submit_issue` for every reactive friction event AND every file-worthy QA finding in the same turn, with no user confirmation prompt and no "want me to file this?" question. FORBIDDEN: pausing to ask "should I file this?" when proactive mode is active β€” this prohibition applies equally when proactive mode is active via the `issue_filing_consent: "always"` standing-rule preference; deferring filing to a later turn; bundling multiple findings into a chat summary instead of separate `submit_issue` calls when their root causes differ. REQUIRED: complete the PII stripping checklist above for every auto-filed issue; run the dedup-before-filing search before each call; render every auto-filed issue in the `Issues` group of the turn report with its `entity_id` / `github_number`. Proactive mode does NOT bypass risk-management hold points (auth, schema migrations, foundation docs, destructive data repair) β€” for those, file the issue describing the problem but do not auto-execute the fix. Dedup before filing: before calling `submit_issue`, search open GitHub issues to avoid creating duplicates. Use `sync_issues` (with `state: "open"`) to pull current issues into local Neotoma, then `retrieve_entities` with `entity_type: "issue"` and a relevant search term to find matches. If a matching or closely related open issue exists: (1) use `add_issue_message` on the existing issue instead of creating a new one β€” include the new finding, reproduction context, and any additional detail; (2) if the new finding is related but distinct enough to warrant its own issue, proceed with `submit_issue` but reference the existing issue by GitHub number (e.g. "Related: #42") in the body so both are cross-linked. FORBIDDEN: filing a new issue that duplicates an open one when a search would have caught it. When `sync_issues` is unavailable or stale, a `gh issue list --search "..." --state open` shell command is an acceptable fallback for the search step. PII and secrets: for **`visibility: "public"`** (default when mirrored to GitHub), redact emails, phone numbers, API tokens, UUIDs, and home-directory path fragments with `` placeholders before `submit_issue` so public GitHub text is safe. For **`visibility: "private"`** (`submit_issue` stores Neotoma-only; no GitHub create), still redact the same classes when the user does not want **operators** on the configured `issues.target_url` instance to see them in the issue body or thread β€” private means no GitHub mirror, not unlimited disclosure to maintainers; omit or generalise fields the user marks sensitive unless they explicitly want them in the report. Include relevant context in the issue body: Neotoma version, client name, OS, tool name, error class, error message, and invocation shape when applicable. `reporter_app_version` is auto-populated by the server from the running package version when not supplied β€” you do not need to call `npm_check_update` first to obtain the version, though you may pass it explicitly if you have a more specific value (git SHA, build tag). @@ -208,7 +209,7 @@ Session-start health check (source checkout only): on the first turn of a sessio Per-turn self-audit: before finalizing the reply, classify the turn against four tiers: Tier 1 interaction efficiency (one user-phase store, 0–2 bounded retrievals, one closing store, 0–1 separate relationship call unless an out-of-store EMBEDS is needed); Tier 2 data quality (entity-type reuse, scoped turn_key, idempotency key, unknown_fields_count [see below], correction protocol, ErrorEnvelope handling, source fields); Tier 3 interpretation fidelity for source material (complete entity extraction, accurate amounts/dates/names/statuses, raw source preservation, schema consistency, dedup awareness); Tier 4 database health from the session-start check. Tier 2 β€” mandatory unknown_fields repair: if any entity in a store response has `unknown_fields_count > 0`, this is a mandatory inline repair β€” immediately re-store or `correct` those entities using declared schema field names; do not proceed to the closing assistant store until all entities in the turn report `unknown_fields_count: 0`. Do not treat `unknown_fields_count > 0` as informational; it means data was silently dropped to `raw_fragments` and is not in the entity snapshot. Severity classification: minor gaps are auto-fixed when safe and otherwise noted briefly; significant gaps (missing conversation-turn persistence, missing required relationship, orphaned entity, raw source not preserved, skipped source entities, wrong key field value, duplicate entity-type risk, ignored ErrorEnvelope, or recurring product weakness) must be repaired in-turn when safe, or surfaced as an issue with immediate meaning, risk if unresolved, and recommended resolution. Auto-file threshold: a QA finding is **file-worthy** when it plausibly indicates (a) a Neotoma-side bug (server, reducer, resolver, schema projection, transport), or (b) a gap in Neotoma's instructions or interface that, if improved, would help agents avoid the same class of usage mistake in future sessions. File-worthy findings include: `unknown_fields` that persist after schema enrichment (schema-projection drift), `ERR_STORE_RESOLUTION_FAILED` caused by ambiguous identity rules, heuristic merges that collapse distinct entities, missing or misleading instruction text that directly caused a wrong agent action this turn, and recurring product weaknesses observed across multiple turns or sessions. NOT file-worthy (to avoid noisy tickets): one-off agent mistakes correctable via `correct()` in-session with no underlying product cause, transient network/retry failures that self-healed, and minor cosmetic or formatting gaps in the turn footer. When in doubt, err toward filing β€” the issue system is the primary feedback channel for iterative Neotoma improvement based on agentic usage. -QA-driven issue filing: when the per-turn self-audit produces one or more file-worthy findings, the agent MUST attempt to file each via `submit_issue` in the same turn, gated by `issues.reporting_mode`: (1) **`proactive`** β€” file immediately without asking; (2) **`consent`** (default) β€” present each file-worthy finding to the user with a one-line summary and ask "File this as a Neotoma issue? (yes/no, public/private)"; batch multiple findings into a single prompt when possible; (3) **`off`** β€” do not file; render the finding in the `Issues` group only. If `reporting_mode` has not been configured and a file-worthy finding is detected, follow the mode discovery flow in [ISSUE REPORTING] β€” ask the user once and persist via `neotoma issues config --mode ` before proceeding. Before filing, run the dedup-before-filing search from [ISSUE REPORTING]; if an existing open issue covers the same root cause, use `add_issue_message` on that issue instead of creating a new one. Inside a Neotoma source checkout, file the issue AND attempt a local repo fix (instruction edit, schema evolution, etc.) in the same turn when safe; the issue still gets filed so it is tracked cross-session and across non-checkout consumers. Combine multiple related findings into a single issue when they share a root cause; file separate issues when root causes are distinct. +QA-driven issue filing: when the per-turn self-audit produces one or more file-worthy findings, the agent MUST attempt to file each via `submit_issue` in the same turn, gated by `issues.reporting_mode`: (1) **`proactive`** β€” file immediately without asking; (2) **`consent`** (default) β€” present each file-worthy finding to the user with a one-line summary and ask "File this as a Neotoma issue? (yes/no, public/private)"; batch multiple findings into a single prompt when possible; (3) **`off`** β€” do not file; render the finding in the `Issues` group only. If `reporting_mode` has not been configured and a file-worthy finding is detected, follow the mode discovery flow in [ISSUE REPORTING] β€” check for a stored `issue_filing_consent` preference entity first, then ask the user once if none exists, and persist both the preference entity and the runtime config flag before proceeding. Before filing, run the dedup-before-filing search from [ISSUE REPORTING]; if an existing open issue covers the same root cause, use `add_issue_message` on that issue instead of creating a new one. Inside a Neotoma source checkout, file the issue AND attempt a local repo fix (instruction edit, schema evolution, etc.) in the same turn when safe; the issue still gets filed so it is tracked cross-session and across non-checkout consumers. Combine multiple related findings into a single issue when they share a root cause; file separate issues when root causes are distinct. Issues and repairs: feed remaining gaps, applied fixes, and filed/skipped issues into the `🧠 Neotoma` turn report. Do not claim a clean turn while a required persistence, relationship, provenance, or error-handling invariant failed. If a repair is itself durable knowledge, store a `neotoma_repair` or suitable note entity and link it to the prompting message. Compliance-pass diagnosis and auto-fix (source checkout only): when a compliance pass is triggered (by the stop-hook follow-up or by manual user request), the agent MUST (1) diagnose WHY the prior turn missed stores β€” classify as one of: tooling_unavailable_or_failed, instruction_delivery_missing_or_stale, agent_ignored_available_instructions, hook_state_incomplete, or false_positive_or_no_material_content; cite the signals that led to the classification (reminder injected? tool failures? connection failure? model?); (2) after completing the backfill store, attempt to resolve the underlying repo-level root cause in the same turn β€” e.g. update client instructions (`docs/developer/mcp/instructions.md`, `docs/developer/cli_agent_instructions.md`, `.cursor/rules/neotoma_cli.mdc`), fix hook delivery paths, clarify ambiguous recipes, or add a missing reminder gate β€” guided by the stop-hook's `recommended_repairs` when present; (3) render the diagnosis and any applied fix in the `🧠 Neotoma` section under an `Issues` or `Repairs` group. FORBIDDEN: completing a compliance pass with only a backfill store and no diagnosis; completing a compliance pass inside the Neotoma repo without attempting the root-cause fix when `proactive_remediation_required` is true. Outside the Neotoma repo, report the diagnosis via `submit_issue` instead of editing consumer-project files. Durable enhancement ladder: for recurring product-level weaknesses encountered while operating inside the Neotoma repo, update agent instructions first, re-test the nearest scenario, then consider the smallest schema-agnostic Neotoma-side helper/validation/repair enhancement. Ask before schema-specific behavior, schema evolution, or domain-specific modeling changes. Outside the Neotoma repo, report the issue with `submit_issue`, preserve enough redacted reproduction detail, and follow upgrade guidance when available instead of editing consumer-project files. From ff809447d1c746691986bf33d066492a76fbfa4d Mon Sep 17 00:00:00 2001 From: castor-agent Date: Tue, 19 May 2026 17:55:15 +0200 Subject: [PATCH 2/4] feat(cli): add `neotoma issues import --from-jsonl` for batch ingestion Implements POST /issues/import endpoint and `neotoma issues import --from-jsonl ` CLI command for observer batch ingestion of issues from JSONL files exported from other systems. - openapi.yaml: new POST /issues/import endpoint (operationId: importIssuesFromJsonl) - src/shared/openapi_types.ts: regenerated from updated spec - src/shared/action_schemas.ts: IssuesImportFromJsonlRequestSchema (Zod) - src/services/issues/import_from_jsonl.ts: new service; parses JSONL, builds deterministic idempotency keys, stores each issue entity - src/actions.ts: handleIssuesImportFromJsonlHttp handler + routes - src/tool_definitions.ts: import_issues_from_jsonl MCP tool - src/shared/contract_mappings.ts: importIssuesFromJsonl row + MCP map entry - src/cli/issues.ts: issuesImport function + IssuesImportOpts interface - src/cli/index.ts: `issues import --from-jsonl ` subcommand - scripts/security/protected_routes_manifest.json: regenerated (109 routes) Fixes #271 Co-Authored-By: Claude Sonnet 4.6 --- openapi.yaml | 55 +++++ .../security/protected_routes_manifest.json | 12 ++ src/actions.ts | 43 ++++ src/cli/index.ts | 18 ++ src/cli/issues.ts | 51 +++++ src/services/issues/import_from_jsonl.ts | 196 ++++++++++++++++++ src/shared/action_schemas.ts | 11 + src/shared/contract_mappings.ts | 9 + src/shared/openapi_types.ts | 69 ++++-- src/tool_definitions.ts | 27 +++ 10 files changed, 479 insertions(+), 12 deletions(-) create mode 100644 src/services/issues/import_from_jsonl.ts diff --git a/openapi.yaml b/openapi.yaml index 61d46695a..2985377bf 100644 --- a/openapi.yaml +++ b/openapi.yaml @@ -4784,6 +4784,61 @@ paths: type: object additionalProperties: true + /issues/import: + post: + summary: Import issues from JSONL + description: > + Bulk-import issues from a JSONL string (one JSON object per line) or a file path. + Each line is parsed as an issue object and ingested into the local Neotoma issue + entity graph. Intended for observer batch ingestion β€” importing issues exported + from another system without touching GitHub. + MCP import_issues_from_jsonl parity. + operationId: importIssuesFromJsonl + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: false + properties: + jsonl: + type: string + description: > + Raw JSONL content β€” one JSON issue object per line. + Mutually exclusive with file_path. + file_path: + type: string + description: > + Absolute path to a JSONL file on the server's local filesystem. + Mutually exclusive with jsonl. + user_id: + type: string + responses: + "200": + description: Import summary + content: + application/json: + schema: + type: object + additionalProperties: false + required: + - imported + - skipped + - errors + properties: + imported: + type: integer + description: Number of issue objects successfully ingested. + skipped: + type: integer + description: Number of lines skipped (blank or already-ingested with same idempotency key). + errors: + type: array + items: + type: string + description: Per-line error messages for lines that failed to parse or ingest. + /issues/sync: post: summary: Sync issues bidirectionally with GitHub diff --git a/scripts/security/protected_routes_manifest.json b/scripts/security/protected_routes_manifest.json index 45b5a18a6..b706f76ea 100644 --- a/scripts/security/protected_routes_manifest.json +++ b/scripts/security/protected_routes_manifest.json @@ -513,6 +513,18 @@ 401 ] }, + { + "path": "/issues/import", + "method": "POST", + "operation_id": "importIssuesFromJsonl", + "requires_auth": true, + "expected_no_auth_status": [ + 401 + ], + "expected_invalid_auth_status": [ + 401 + ] + }, { "path": "/issues/status", "method": "POST", diff --git a/src/actions.ts b/src/actions.ts index c9666f570..80dd81b62 100644 --- a/src/actions.ts +++ b/src/actions.ts @@ -132,6 +132,7 @@ import { IssuesBulkEntityIdsRequestSchema, IssuesAddMessageRequestSchema, IssuesGetStatusRequestSchema, + IssuesImportFromJsonlRequestSchema, IssuesSubmitRequestSchema, IssuesSyncRequestSchema, EntitiesQueryRequestSchema, @@ -8336,6 +8337,48 @@ const handleIssuesSyncHttp: express.RequestHandler = async (req, res) => { app.post("/issues/sync", handleIssuesSyncHttp); app.post("/api/issues/sync", handleIssuesSyncHttp); +// POST /issues/import β€” JSONL batch ingestion (MCP import_issues_from_jsonl parity) +const handleIssuesImportFromJsonlHttp: express.RequestHandler = async (req, res) => { + const parsed = IssuesImportFromJsonlRequestSchema.safeParse(req.body); + if (!parsed.success) { + logWarn("ValidationError:issues_import_from_jsonl", req, { issues: parsed.error.issues }); + return sendValidationError(res, parsed.error.issues); + } + try { + const userId = await getAuthenticatedUserId(req, parsed.data.user_id); + const { createOperations } = await import("./core/operations.js"); + const { NeotomaServer } = await import("./server.js"); + const { importIssuesFromJsonl } = await import("./services/issues/import_from_jsonl.js"); + const server = new NeotomaServer(); + const ops = createOperations({ server, userId }); + try { + const result = await importIssuesFromJsonl(ops, { + jsonl: parsed.data.jsonl, + filePath: parsed.data.file_path, + }); + logDebug("Success:issues_import_from_jsonl", req, { + imported: result.imported, + skipped: result.skipped, + error_count: result.errors.length, + }); + return res.json(result); + } finally { + await ops.dispose(); + } + } catch (error) { + return handleApiError( + req, + res, + error, + "Failed to import issues from JSONL", + "ISSUES_IMPORT_FAILED", + "APIError:issues_import_from_jsonl" + ); + } +}; +app.post("/issues/import", handleIssuesImportFromJsonlHttp); +app.post("/api/issues/import", handleIssuesImportFromJsonlHttp); + // POST /restore_entity - Restore soft-deleted entity app.post("/restore_entity", async (req, res) => { const parsed = RestoreEntityRequestSchema.safeParse(req.body); diff --git a/src/cli/index.ts b/src/cli/index.ts index df2a8b1cc..bf256db4d 100644 --- a/src/cli/index.ts +++ b/src/cli/index.ts @@ -9085,6 +9085,24 @@ issuesCommand await issuesList({ ...opts, json: Boolean((program.opts() as { json?: boolean }).json) }, api); }); +issuesCommand + .command("import") + .description("Bulk-import issues from a JSONL file into local Neotoma") + .requiredOption("--from-jsonl ", "Path to a JSONL file (one issue JSON object per line)") + .action(async (opts) => { + const { issuesImport } = await import("./issues.js"); + const config = await readConfig(); + const token = await getCliToken(); + const api = createApiClient({ + baseUrl: await resolveBaseUrl(program.opts().baseUrl, config), + token, + }); + await issuesImport( + { fromJsonl: opts.fromJsonl, json: Boolean((program.opts() as { json?: boolean }).json) }, + api + ); + }); + issuesCommand .command("sync") .description("Full sync of issues from GitHub into local Neotoma") diff --git a/src/cli/issues.ts b/src/cli/issues.ts index 14dc9d24b..5559f27a8 100644 --- a/src/cli/issues.ts +++ b/src/cli/issues.ts @@ -64,6 +64,11 @@ export interface IssuesListOpts { json?: boolean; } +export interface IssuesImportOpts { + fromJsonl: string; + json?: boolean; +} + export interface IssuesSyncOpts { since?: string; state?: "open" | "closed" | "all"; @@ -346,6 +351,52 @@ export async function issuesList(opts: IssuesListOpts, api: NeotomaApiClient): P process.stdout.write(`Use --no-sync to view cached local data only.\n`); } +export async function issuesImport(opts: IssuesImportOpts, api: NeotomaApiClient): Promise { + const { readFile } = await import("node:fs/promises"); + + let jsonl: string; + try { + jsonl = await readFile(opts.fromJsonl, "utf8"); + } catch (err) { + process.stderr.write( + `issues import: failed to read file "${opts.fromJsonl}": ${(err as Error).message}\n` + ); + process.exitCode = 1; + return; + } + + const { data, error } = await api.POST("/issues/import", { + body: { jsonl }, + }); + + if (error) { + process.stderr.write(`issues import failed: ${JSON.stringify(error)}\n`); + process.exitCode = 1; + return; + } + + const row = data as + | { + imported?: number; + skipped?: number; + errors?: string[]; + } + | undefined; + + const imported = row?.imported ?? 0; + const skipped = row?.skipped ?? 0; + const errors = row?.errors ?? []; + + if (opts.json) { + output({ imported, skipped, errors }, true); + } else { + process.stdout.write(`Imported ${imported} issues, skipped ${skipped}.\n`); + if (errors.length) { + process.stderr.write(`Errors:\n${errors.map((e) => ` - ${e}`).join("\n")}\n`); + } + } +} + export async function issuesSync(opts: IssuesSyncOpts, api: NeotomaApiClient): Promise { const labelArr = opts.labels ? opts.labels diff --git a/src/services/issues/import_from_jsonl.ts b/src/services/issues/import_from_jsonl.ts new file mode 100644 index 000000000..d951b8445 --- /dev/null +++ b/src/services/issues/import_from_jsonl.ts @@ -0,0 +1,196 @@ +/** + * JSONL batch import for issues (`importIssuesFromJsonl`). + * + * Each line of the JSONL input is treated as a plain issue object. Fields map + * directly onto the `issue` entity type. Unknown extra fields are ignored so + * that exports from other systems are accepted without pre-processing. + * + * Identity: the import uses a deterministic idempotency key derived from the + * issue object so re-running the same file is safe (idempotent). The key is + * built from the most reliable unique fields available: `github_number` + + * `repo`, or falling back to `title` + `created_at` when GitHub metadata is + * absent. + * + * Relationship graph: each imported issue is stored as a standalone `issue` + * entity β€” no conversation or message entities are created. Callers that need + * full thread data should use `issuesSync` after import. + */ + +import { readFile } from "node:fs/promises"; +import type { Operations, StoreEntityInput } from "../../core/operations.js"; + +export interface ImportFromJsonlResult { + imported: number; + skipped: number; + errors: string[]; +} + +/** + * Import issues from a JSONL string or file path into the local Neotoma graph. + * + * @param ops Neotoma operations handle (created by the HTTP handler with the + * authenticated user context). + * @param jsonl Raw JSONL content (one JSON object per line). + * @param filePath Absolute path to a JSONL file on the local filesystem. + * + * Exactly one of `jsonl` or `filePath` must be supplied. + */ +export async function importIssuesFromJsonl( + ops: Operations, + { jsonl, filePath }: { jsonl?: string; filePath?: string } +): Promise { + const result: ImportFromJsonlResult = { imported: 0, skipped: 0, errors: [] }; + + // Resolve raw content. + let raw: string; + if (typeof jsonl === "string") { + raw = jsonl; + } else if (typeof filePath === "string") { + try { + raw = await readFile(filePath, "utf8"); + } catch (err) { + result.errors.push(`Failed to read file "${filePath}": ${(err as Error).message}`); + return result; + } + } else { + result.errors.push("Provide jsonl or file_path"); + return result; + } + + const lines = raw.split("\n"); + const now = new Date().toISOString(); + + for (let lineIndex = 0; lineIndex < lines.length; lineIndex++) { + const line = lines[lineIndex].trim(); + if (!line) { + result.skipped++; + continue; + } + + let obj: Record; + try { + obj = JSON.parse(line) as Record; + } catch (err) { + result.errors.push(`Line ${lineIndex + 1}: JSON parse error β€” ${(err as Error).message}`); + continue; + } + + if (typeof obj !== "object" || obj === null || Array.isArray(obj)) { + result.errors.push(`Line ${lineIndex + 1}: Expected a JSON object`); + continue; + } + + // Build a deterministic idempotency key. + const idempotencyKey = buildIdempotencyKey(obj); + + // Map the plain object onto a StoreEntityInput for the `issue` type. + const entity: StoreEntityInput = buildIssueEntity(obj, now); + + try { + await ops.store({ + entities: [entity], + relationships: [], + idempotency_key: idempotencyKey, + }); + result.imported++; + } catch (err) { + const msg = (err as Error).message ?? String(err); + // Idempotent re-run: the store layer surfaces a distinct error when the + // same idempotency_key is replayed with an identical payload. Treat that + // as a skip rather than an error so re-running the same file is safe. + if (isIdempotentReplay(msg)) { + result.skipped++; + } else { + result.errors.push(`Line ${lineIndex + 1}: Store failed β€” ${msg}`); + } + } + } + + return result; +} + +/** + * Build a deterministic idempotency key for a parsed issue object. + * + * Preference order: + * 1. github_number + repo (most stable β€” survives title edits). + * 2. title + created_at (fallback for issues without GitHub metadata). + * 3. Line content hash (last resort β€” raw JSON string). + */ +function buildIdempotencyKey(obj: Record): string { + const githubNumber = obj["github_number"]; + const repo = obj["repo"]; + if ( + (typeof githubNumber === "number" || typeof githubNumber === "string") && + typeof repo === "string" && + repo.length > 0 + ) { + return `issue-import-gh-${repo}-${githubNumber}`; + } + + const title = typeof obj["title"] === "string" ? obj["title"] : ""; + const createdAt = typeof obj["created_at"] === "string" ? obj["created_at"] : ""; + if (title.length > 0 && createdAt.length > 0) { + return `issue-import-title-${encodeURIComponent(title.slice(0, 80))}-${createdAt}`; + } + + // Last resort: hash the raw JSON string deterministically. + return `issue-import-raw-${stableJsonHash(obj)}`; +} + +/** + * Map a plain JSON object onto a StoreEntityInput for the `issue` entity type. + * Unknown fields are forwarded as-is; missing required-ish fields fall back to + * sensible defaults so the store does not reject the row. + */ +function buildIssueEntity(obj: Record, now: string): StoreEntityInput { + return { + entity_type: "issue", + title: typeof obj["title"] === "string" ? obj["title"] : "(untitled)", + body: typeof obj["body"] === "string" ? obj["body"] : "", + status: typeof obj["status"] === "string" ? obj["status"] : "open", + labels: Array.isArray(obj["labels"]) + ? (obj["labels"] as unknown[]).filter((l): l is string => typeof l === "string") + : [], + github_number: + typeof obj["github_number"] === "number" || typeof obj["github_number"] === "string" + ? obj["github_number"] + : null, + github_url: typeof obj["github_url"] === "string" ? obj["github_url"] : undefined, + repo: typeof obj["repo"] === "string" ? obj["repo"] : undefined, + visibility: typeof obj["visibility"] === "string" ? obj["visibility"] : "private", + author: typeof obj["author"] === "string" ? obj["author"] : "import", + created_at: typeof obj["created_at"] === "string" ? obj["created_at"] : now, + closed_at: typeof obj["closed_at"] === "string" ? obj["closed_at"] : undefined, + last_synced_at: now, + sync_pending: false, + data_source: `jsonl-import ${now.slice(0, 10)}`, + } as StoreEntityInput; +} + +/** + * Returns true when a store error looks like an idempotent replay (same + * idempotency_key, same payload) rather than a true failure. + */ +function isIdempotentReplay(message: string): boolean { + return ( + message.includes("idempotency") || + message.includes("IDEMPOTENT") || + message.includes("already exists") + ); +} + +/** + * Produce a short stable fingerprint of a JSON object for use as an + * idempotency key suffix. Not cryptographic β€” purely for dedup. + */ +function stableJsonHash(obj: Record): string { + const str = JSON.stringify(obj, Object.keys(obj).sort()); + let hash = 0; + for (let i = 0; i < str.length; i++) { + const char = str.charCodeAt(i); + hash = (hash << 5) - hash + char; + hash |= 0; // Convert to 32-bit integer + } + return (hash >>> 0).toString(16); +} diff --git a/src/shared/action_schemas.ts b/src/shared/action_schemas.ts index a98b3d29e..ba5273074 100644 --- a/src/shared/action_schemas.ts +++ b/src/shared/action_schemas.ts @@ -683,6 +683,17 @@ export const IssuesGetStatusRequestSchema = z { message: "Provide entity_id or issue_number" } ); +/** JSONL batch import (HTTP + CLI parity with MCP import_issues_from_jsonl). */ +export const IssuesImportFromJsonlRequestSchema = z + .object({ + jsonl: z.string().optional(), + file_path: z.string().optional(), + user_id: z.string().optional(), + }) + .refine((v) => typeof v.jsonl === "string" || typeof v.file_path === "string", { + message: "Provide jsonl or file_path", + }); + /** GitHub mirror ingest (HTTP + CLI parity with MCP sync_issues). */ export const IssuesSyncRequestSchema = z.object({ since: z.string().optional(), diff --git a/src/shared/contract_mappings.ts b/src/shared/contract_mappings.ts index 6813efaed..db2883c4e 100644 --- a/src/shared/contract_mappings.ts +++ b/src/shared/contract_mappings.ts @@ -202,6 +202,14 @@ export const OPENAPI_OPERATION_MAPPINGS: OpenApiOperationMapping[] = [ mcpTool: "get_issue_status", cliCommand: "issues status", }, + { + operationId: "importIssuesFromJsonl", + method: "post", + path: "/issues/import", + adapter: "both", + mcpTool: "import_issues_from_jsonl", + cliCommand: "issues import", + }, { operationId: "issuesSync", method: "post", @@ -820,6 +828,7 @@ export const MCP_TOOL_TO_OPERATION_ID: Record = { submit_issue: "issuesSubmit", add_issue_message: "issuesAddMessage", get_issue_status: "issuesGetStatus", + import_issues_from_jsonl: "importIssuesFromJsonl", sync_issues: "issuesSync", }; diff --git a/src/shared/openapi_types.ts b/src/shared/openapi_types.ts index 84c5012d9..a99e8e992 100644 --- a/src/shared/openapi_types.ts +++ b/src/shared/openapi_types.ts @@ -1450,6 +1450,26 @@ export interface paths { patch?: never; trace?: never; }; + "/issues/import": { + parameters: { + query?: never; + header?: never; + path?: never; + cookie?: never; + }; + get?: never; + put?: never; + /** + * Import issues from JSONL + * @description Bulk-import issues from a JSONL string (one JSON object per line) or a file path. Each line is parsed as an issue object and ingested into the local Neotoma issue entity graph. Intended for observer batch ingestion β€” importing issues exported from another system without touching GitHub. MCP import_issues_from_jsonl parity. + */ + post: operations["importIssuesFromJsonl"]; + delete?: never; + options?: never; + head?: never; + patch?: never; + trace?: never; + }; "/issues/sync": { parameters: { query?: never; @@ -3172,18 +3192,6 @@ export interface components { * `raw_fragments` and can be recovered. */ unknown_fields_count?: number; - /** - * @description Total number of `conversation_message` entities `PART_OF` the - * conversation referenced by this store call, computed after commit. - * Populated only when at least one `conversation_message` entity was - * created or updated in this request and its `PART_OF` target - * conversation can be resolved. Omitted (null) for store calls with - * no conversation context. Reflects the current snapshot at the time - * of the response. Consumed by `neotoma_turn_summary` to populate - * the `msg N/M` component of the turn status line without an extra - * retrieval round-trip. - */ - conversation_message_count?: number | null; /** * @description Actionable guidance when fields were dropped to `raw_fragments`. * Present only when `unknown_fields_count > 0`. Directs the caller @@ -5779,6 +5787,43 @@ export interface operations { }; }; }; + importIssuesFromJsonl: { + parameters: { + query?: never; + header?: never; + path?: never; + cookie?: never; + }; + requestBody: { + content: { + "application/json": { + /** @description Raw JSONL content β€” one JSON issue object per line. Mutually exclusive with file_path. */ + jsonl?: string; + /** @description Absolute path to a JSONL file on the server's local filesystem. Mutually exclusive with jsonl. */ + file_path?: string; + user_id?: string; + } & (unknown | unknown); + }; + }; + responses: { + /** @description Import summary */ + 200: { + headers: { + [name: string]: unknown; + }; + content: { + "application/json": { + /** @description Number of issue objects successfully ingested. */ + imported: number; + /** @description Number of lines skipped (blank or already-ingested with same idempotency key). */ + skipped: number; + /** @description Per-line error messages for lines that failed to parse or ingest. */ + errors: string[]; + }; + }; + }; + }; + }; issuesSync: { parameters: { query?: never; diff --git a/src/tool_definitions.ts b/src/tool_definitions.ts index 32fc26ff0..4d5ca1f8a 100644 --- a/src/tool_definitions.ts +++ b/src/tool_definitions.ts @@ -1132,6 +1132,32 @@ export function buildToolDefinitions( }, annotations: { readOnlyHint: true }, }, + { + name: "import_issues_from_jsonl", + description: desc( + "import_issues_from_jsonl", + "Bulk-import issues from a JSONL string or file path into the local Neotoma issue graph. " + + "Each line must be a JSON object representing one issue. " + + "Fields map onto the `issue` entity type; unknown fields are ignored. " + + "Re-running the same JSONL is idempotent β€” duplicate lines are counted as skipped. " + + "Returns imported, skipped, and per-line error counts." + ), + inputSchema: { + type: "object", + properties: { + jsonl: { + type: "string", + description: + "Raw JSONL content β€” one JSON issue object per line. Mutually exclusive with file_path.", + }, + file_path: { + type: "string", + description: + "Absolute path to a JSONL file on the server's local filesystem. Mutually exclusive with jsonl.", + }, + }, + }, + }, { name: "sync_issues", description: desc( @@ -1323,6 +1349,7 @@ export const NEOTOMA_TOOL_NAMES = [ "submit_issue", "add_issue_message", "get_issue_status", + "import_issues_from_jsonl", "sync_issues", "submit_entity", "add_entity_message", From 4691bf348489b52092fe64a43386b67bf7b4d1cd Mon Sep 17 00:00:00 2001 From: castor-agent Date: Tue, 19 May 2026 18:02:21 +0200 Subject: [PATCH 3/4] feat(issues): link filed issues to originating conversation turns Adds optional `conversation_turn_id` to `submit_issue` (MCP tool, HTTP POST /issues/submit, and IssuesSubmitRequestSchema). When provided, the server creates a REFERS_TO edge from the filed issue entity to the specified conversation_message entity so the origin of the issue is directly traceable without a separate create_relationship call. Changes: - openapi.yaml: add `conversation_turn_id` field to issuesSubmit requestBody - src/shared/openapi_types.ts: regenerated - src/services/issues/types.ts: add `conversation_turn_id` to IssueCreateParams - src/shared/action_schemas.ts: add field to IssuesSubmitRequestSchema - src/services/issues/issue_operations.ts: merge conversation_turn_id into the allTargetIds list fed to createRelationships alongside entity_ids_to_link - src/server.ts: parse and forward conversation_turn_id in MCP handler - src/actions.ts: forward conversation_turn_id in HTTP handler - src/tool_definitions.ts: expose conversation_turn_id param on submit_issue tool - docs/developer/mcp/instructions.md: note that agents SHOULD pass conversation_turn_id when the current turn entity ID is known Fixes #192 Co-Authored-By: Claude Sonnet 4.6 --- docs/developer/mcp/instructions.md | 2 +- openapi.yaml | 7 +++++++ src/actions.ts | 3 +++ src/server.ts | 4 ++++ src/services/issues/issue_operations.ts | 14 ++++++++++---- src/services/issues/types.ts | 6 ++++++ src/shared/action_schemas.ts | 1 + src/shared/openapi_types.ts | 4 +++- src/tool_definitions.ts | 5 +++++ 9 files changed, 40 insertions(+), 6 deletions(-) diff --git a/docs/developer/mcp/instructions.md b/docs/developer/mcp/instructions.md index 303d3aa90..c01407b9c 100644 --- a/docs/developer/mcp/instructions.md +++ b/docs/developer/mcp/instructions.md @@ -194,7 +194,7 @@ Issues are the primary feedback channel for iterative Neotoma improvement based Proactive mode contract (auto-file mandate): when **`issues.reporting_mode`** is **`proactive`** (set explicitly via `neotoma issues config --mode proactive`, the `NEOTOMA_ISSUES_REPORTING_MODE=proactive` env var, or an `issue_filing_consent` preference entity with value `"always"`), the agent MUST call `submit_issue` for every reactive friction event AND every file-worthy QA finding in the same turn, with no user confirmation prompt and no "want me to file this?" question. FORBIDDEN: pausing to ask "should I file this?" when proactive mode is active β€” this prohibition applies equally when proactive mode is active via the `issue_filing_consent: "always"` standing-rule preference; deferring filing to a later turn; bundling multiple findings into a chat summary instead of separate `submit_issue` calls when their root causes differ. REQUIRED: complete the PII stripping checklist above for every auto-filed issue; run the dedup-before-filing search before each call; render every auto-filed issue in the `Issues` group of the turn report with its `entity_id` / `github_number`. Proactive mode does NOT bypass risk-management hold points (auth, schema migrations, foundation docs, destructive data repair) β€” for those, file the issue describing the problem but do not auto-execute the fix. Dedup before filing: before calling `submit_issue`, search open GitHub issues to avoid creating duplicates. Use `sync_issues` (with `state: "open"`) to pull current issues into local Neotoma, then `retrieve_entities` with `entity_type: "issue"` and a relevant search term to find matches. If a matching or closely related open issue exists: (1) use `add_issue_message` on the existing issue instead of creating a new one β€” include the new finding, reproduction context, and any additional detail; (2) if the new finding is related but distinct enough to warrant its own issue, proceed with `submit_issue` but reference the existing issue by GitHub number (e.g. "Related: #42") in the body so both are cross-linked. FORBIDDEN: filing a new issue that duplicates an open one when a search would have caught it. When `sync_issues` is unavailable or stale, a `gh issue list --search "..." --state open` shell command is an acceptable fallback for the search step. PII and secrets: for **`visibility: "public"`** (default when mirrored to GitHub), redact emails, phone numbers, API tokens, UUIDs, and home-directory path fragments with `` placeholders before `submit_issue` so public GitHub text is safe. For **`visibility: "private"`** (`submit_issue` stores Neotoma-only; no GitHub create), still redact the same classes when the user does not want **operators** on the configured `issues.target_url` instance to see them in the issue body or thread β€” private means no GitHub mirror, not unlimited disclosure to maintainers; omit or generalise fields the user marks sensitive unless they explicitly want them in the report. -Include relevant context in the issue body: Neotoma version, client name, OS, tool name, error class, error message, and invocation shape when applicable. `reporter_app_version` is auto-populated by the server from the running package version when not supplied β€” you do not need to call `npm_check_update` first to obtain the version, though you may pass it explicitly if you have a more specific value (git SHA, build tag). +Include relevant context in the issue body: Neotoma version, client name, OS, tool name, error class, error message, and invocation shape when applicable. `reporter_app_version` is auto-populated by the server from the running package version when not supplied β€” you do not need to call `npm_check_update` first to obtain the version, though you may pass it explicitly if you have a more specific value (git SHA, build tag). **Conversation turn linking:** when the `conversation_message` entity ID for the current turn is known, pass it as `conversation_turn_id` in the `submit_issue` call β€” the server creates a REFERS_TO edge from the filed issue to that turn entity, making the origin of the issue directly traceable without requiring a separate `create_relationship` call. Issue–conversation linking: immediately after `submit_issue` returns (or after any `get_issue_status`, `add_issue_message`, or `sync_issues` call that touches an `issue` entity in the current turn), create a REFERS_TO relationship from the active conversation entity to the issue entity (`create_relationship(REFERS_TO, source_entity_id=, target_entity_id=)`). Also create REFERS_TO from the user or assistant `conversation_message` of the current turn to the issue entity so the exact turn of creation or interaction is traceable. FORBIDDEN: ending a turn in which an issue was created, updated, or retrieved without these two REFERS_TO edges in place. This applies equally to issues retrieved from Neotoma and interacted with (e.g. `add_issue_message`) β€” not only newly filed ones. Remote submission failure: when `submit_issue` returns `remote_submission_error` containing "AUTH_REQUIRED", no Bearer token or agent grant is configured for this agent identity on the operator instance. Surface the full error message to the user with the actionable hint (create an agent grant via Inspector β†’ Agents β†’ Grants, or configure a Bearer token). Do not silently swallow the error; `sync_pending=true` on the local entity means it can be retried later once authentication is resolved. Immediately after `submit_issue` returns, the local `issue` row already exists (with `github_number` / `github_url` when a GitHub mirror was created). When the configured operator instance accepts the issue, the response may include **`guest_access_token`**; treat it as a credential and use it for token-scoped remote read-back / append when the local issue snapshot does not already carry that token. Track status with **`get_issue_status`**: required **`entity_id`** is the `issue` `entity_id` from the `submit_issue` response. Optional **`guest_access_token`** β€” pass when read-through to a remote operator row needs a token and the issue snapshot does not already carry one; optional **`skip_sync`** β€” when true, skips mirror refresh for that call (GitHub sync when mirrored; remote read-through when `issues.target_url` applies). Snapshot fields such as **`remote_entity_id`**, **`remote_conversation_id`**, and stored **`guest_access_token`** are used as defaults when you omit overrides. Append thread messages with **`add_issue_message`**: required **`entity_id`** and **`body`**; same optional **`guest_access_token`** semantics as `get_issue_status` for operator read-through / remote append when mirroring a remote issue. diff --git a/openapi.yaml b/openapi.yaml index 2985377bf..46d6be336 100644 --- a/openapi.yaml +++ b/openapi.yaml @@ -4686,6 +4686,13 @@ paths: description: > Entity IDs to link to this issue via REFERS_TO relationships. Created server-side in the same operation as issue creation. + conversation_turn_id: + type: string + description: > + Entity ID of the conversation turn (conversation_message entity) where this + issue was observed. When provided, a REFERS_TO relationship is created from + the filed issue entity to this conversation turn entity, making the origin + of the issue traceable. user_id: type: string responses: diff --git a/src/actions.ts b/src/actions.ts index 80dd81b62..4cb07ffa0 100644 --- a/src/actions.ts +++ b/src/actions.ts @@ -8219,6 +8219,9 @@ const handleIssuesSubmitHttp: express.RequestHandler = async (req, res) => { ...(parsed.data.entity_ids_to_link ? { entity_ids_to_link: parsed.data.entity_ids_to_link } : {}), + ...(parsed.data.conversation_turn_id + ? { conversation_turn_id: parsed.data.conversation_turn_id } + : {}), }); })(); logDebug("Success:issues_submit", req, { entity_id: result.entity_id }); diff --git a/src/server.ts b/src/server.ts index 69753c79d..1f1bc2892 100644 --- a/src/server.ts +++ b/src/server.ts @@ -978,6 +978,7 @@ export class NeotomaServer { reporter_ci_run_id: z.string().optional(), reporter_patch_source_id: z.string().optional(), entity_ids_to_link: z.array(z.string().min(1)).optional(), + conversation_turn_id: z.string().min(1).optional(), }); const parsed = schema.parse(args ?? {}); @@ -1008,6 +1009,9 @@ export class NeotomaServer { reporter_ci_run_id: parsed.reporter_ci_run_id, reporter_patch_source_id: parsed.reporter_patch_source_id, ...(parsed.entity_ids_to_link ? { entity_ids_to_link: parsed.entity_ids_to_link } : {}), + ...(parsed.conversation_turn_id + ? { conversation_turn_id: parsed.conversation_turn_id } + : {}), }); return this.buildTextResponse({ diff --git a/src/services/issues/issue_operations.ts b/src/services/issues/issue_operations.ts index 790db48d1..73b8f2244 100644 --- a/src/services/issues/issue_operations.ts +++ b/src/services/issues/issue_operations.ts @@ -949,18 +949,24 @@ export async function submitIssue( const entityId = structuredEntityIdAt(storeResult, 0); const conversationId = structuredEntityIdAt(storeResult, 1); - // Create REFERS_TO relationships from the issue entity to caller-provided entity IDs. + // Create REFERS_TO relationships from the issue entity to caller-provided entity IDs + // and, when present, to the originating conversation turn entity. const entityIdsToLink = Array.isArray(params.entity_ids_to_link) ? params.entity_ids_to_link.filter( (id): id is string => typeof id === "string" && id.trim().length > 0 ) : []; - if (entityId && entityIdsToLink.length > 0) { + const conversationTurnId = + typeof params.conversation_turn_id === "string" && params.conversation_turn_id.trim().length > 0 + ? params.conversation_turn_id.trim() + : null; + const allTargetIds = [...entityIdsToLink, ...(conversationTurnId ? [conversationTurnId] : [])]; + if (entityId && allTargetIds.length > 0) { const relationships: import("../../core/operations.js").CreateRelationshipInput[] = - entityIdsToLink.map((targetId) => ({ + allTargetIds.map((targetId) => ({ relationship_type: "REFERS_TO" as const, source_entity_id: entityId, - target_entity_id: targetId.trim(), + target_entity_id: targetId, })); await ops.createRelationships({ relationships }); } diff --git a/src/services/issues/types.ts b/src/services/issues/types.ts index cc2ae82d8..e8733367f 100644 --- a/src/services/issues/types.ts +++ b/src/services/issues/types.ts @@ -36,6 +36,12 @@ export interface IssueCreateParams { * Created server-side in the same operation as issue creation. */ entity_ids_to_link?: string[]; + /** + * Optional entity ID of the conversation turn (conversation_message entity) where this issue + * was observed. When provided, a REFERS_TO relationship is created from the filed issue entity + * to this conversation turn entity, making the origin of the issue traceable. + */ + conversation_turn_id?: string; } export interface IssueMessageParams { diff --git a/src/shared/action_schemas.ts b/src/shared/action_schemas.ts index ba5273074..3a9c6bae9 100644 --- a/src/shared/action_schemas.ts +++ b/src/shared/action_schemas.ts @@ -664,6 +664,7 @@ export const IssuesSubmitRequestSchema = z.object({ local_issue_id: z.string().optional(), submission_timestamp: z.string().optional(), entity_ids_to_link: z.array(z.string().min(1)).optional(), + conversation_turn_id: z.string().min(1).optional(), user_id: z.string().optional(), }); diff --git a/src/shared/openapi_types.ts b/src/shared/openapi_types.ts index a99e8e992..27485fd5f 100644 --- a/src/shared/openapi_types.ts +++ b/src/shared/openapi_types.ts @@ -5710,6 +5710,8 @@ export interface operations { submission_timestamp?: string; /** @description Entity IDs to link to this issue via REFERS_TO relationships. Created server-side in the same operation as issue creation. */ entity_ids_to_link?: string[]; + /** @description Entity ID of the conversation turn (conversation_message entity) where this issue was observed. When provided, a REFERS_TO relationship is created from the filed issue entity to this conversation turn entity, making the origin of the issue traceable. */ + conversation_turn_id?: string; user_id?: string; }; }; @@ -5802,7 +5804,7 @@ export interface operations { /** @description Absolute path to a JSONL file on the server's local filesystem. Mutually exclusive with jsonl. */ file_path?: string; user_id?: string; - } & (unknown | unknown); + }; }; }; responses: { diff --git a/src/tool_definitions.ts b/src/tool_definitions.ts index 4d5ca1f8a..bacb85716 100644 --- a/src/tool_definitions.ts +++ b/src/tool_definitions.ts @@ -1028,6 +1028,11 @@ export function buildToolDefinitions( type: "string", description: "Optional source id for reporter patch artifact.", }, + conversation_turn_id: { + type: "string", + description: + "Entity ID of the conversation turn (conversation_message entity) where this issue was observed. When provided, a REFERS_TO relationship is created from the filed issue to the conversation turn so the origin is traceable.", + }, }, required: ["title", "body"], // Keep the top-level schema to a plain object for Codex/OpenAI From 7d2544acdb4c2b9ef0d29446a182f374ae875bf1 Mon Sep 17 00:00:00 2001 From: castor-agent Date: Tue, 19 May 2026 18:28:15 +0200 Subject: [PATCH 4/4] feat(ingestion): auto-extract awaiting-reply tasks from outbound emails Implements #176. Co-Authored-By: Claude Sonnet 4.6 --- docs/developer/mcp/instructions.md | 2 +- docs/testing/automated_test_catalog.md | 9 +- src/actions.ts | 30 ++ src/services/schema_definitions.ts | 57 ++- .../schema_derived_entity_extraction.ts | 212 ++++++++++ src/services/schema_registry.ts | 136 +++++++ .../schema_derived_entity_extraction.test.ts | 362 ++++++++++++++++++ 7 files changed, 802 insertions(+), 6 deletions(-) create mode 100644 src/services/schema_derived_entity_extraction.ts create mode 100644 tests/unit/schema_derived_entity_extraction.test.ts diff --git a/docs/developer/mcp/instructions.md b/docs/developer/mcp/instructions.md index c01407b9c..adef0e24f 100644 --- a/docs/developer/mcp/instructions.md +++ b/docs/developer/mcp/instructions.md @@ -110,7 +110,7 @@ Per-turn linkage invariant: every non-bookkeeping entity touched in a turn (crea [TASKS & COMMITMENTS] Base rule: create a task when the user expresses intent, obligation, or future action ("I need to", "remind me", deadlines). Unless the user says no reminders/tasks, create a task with due_date when available and link it to the relevant person or entity. Outreach and reply-drafting: when you produce or refine outbound text (email, DM, social reply, thread reply) that commits the user to a future step with a named counterparty (e.g. "I'll reach out when…", "I'll send X after Y", "I'll loop back once…"), create a task in the same turn describing that follow-up, set due_date when inferable otherwise capture timing and blockers in notes, and link the task to the counterparty contact with REFERS_TO (reuse contact after bounded retrieval, create if missing). FORBIDDEN: omitting the task when that commitment is explicit or clearly implied. Do not treat "draft only" or copy edits as exempt. Closers with no concrete follow-up do not require a task. -Awaiting-reply task rule (outbound email/message without inbox response): during external-tool scans (inbox triage, sent-folder review, thread summarization), when an outbound email or message to a named recipient has no inbound reply visible in the inbox or thread within a reasonable window (default: outbound sent β‰₯ 3 days ago with no subsequent inbound from the same recipient in the same thread), create an `awaiting_reply` task in the same turn: `entity_type: "task"`, `kind: "awaiting_reply"` (or equivalent tag), `subject` summarizing what was asked, `counterparty_contact_id` linking to the recipient, `outbound_message_id` linking to the source email/message, `sent_at` from the outbound, `due_date` set to a follow-up date when the outbound named one (otherwise null), `status: "open"`. Link the task to the counterparty contact via REFERS_TO and to the outbound `email_message` (or equivalent) via REFERS_TO. Reuse the existing awaiting_reply task on subsequent scans of the same thread β€” do not mint a new one per scan; close it (or set status `resolved`) when an inbound reply arrives. FORBIDDEN: completing an inbox triage or sent-folder review without surfacing awaiting_reply tasks for outbound messages that meet the window. Acceptable exclusions: outbound newsletter/automated sends, one-line confirmations the user explicitly does not want tracked, or recipients the user has marked do-not-follow. +Awaiting-reply task rule (outbound email/message without inbox response): during external-tool scans (inbox triage, sent-folder review, thread summarization), when an outbound email or message to a named recipient has no inbound reply visible in the inbox or thread within a reasonable window (default: outbound sent β‰₯ 3 days ago with no subsequent inbound from the same recipient in the same thread), create an `awaiting_reply` task in the same turn: `entity_type: "task"`, `kind: "awaiting_reply"` (or equivalent tag), `subject` summarizing what was asked, `counterparty_contact_id` linking to the recipient, `outbound_message_id` linking to the source email/message, `sent_at` from the outbound, `due_date` set to a follow-up date when the outbound named one (otherwise null), `status: "open"`. Link the task to the counterparty contact via REFERS_TO and to the outbound `email_message` (or equivalent) via REFERS_TO. Reuse the existing awaiting_reply task on subsequent scans of the same thread β€” do not mint a new one per scan; close it (or set status `resolved`) when an inbound reply arrives. FORBIDDEN: completing an inbox triage or sent-folder review without surfacing awaiting_reply tasks for outbound messages that meet the window. Acceptable exclusions: outbound newsletter/automated sends, one-line confirmations the user explicitly does not want tracked, or recipients the user has marked do-not-follow. Automatic extraction: when storing an `email` entity with `direction: "outbound"` and a body that contains pending-reply signals (question mark, "please let me know", "looking forward to hearing", "please reply", "awaiting your response", "let me know", "your thoughts", "waiting to hear"), Neotoma automatically creates a linked `task` with `task_type: "awaiting_reply"`, `status: "pending"`, and `due_context: "reply expected"` via a schema-driven derived-entity rule β€” no agent action required. The auto-created task title is "Awaiting reply: {{subject}}". Agents should avoid manually creating a duplicate awaiting_reply task when the email entity was stored with `direction: "outbound"`; instead verify the auto-created task exists via retrieve_entities before adding a manual one. Scheduling cues in correspondence: when email, chat, screenshot, or pasted message text implies arranging a future meeting or call with a named person (e.g. pencil in, another for [month], book next, sync again, catch up later), create a task in the same extraction/store turn to follow up and schedule it, set due_date when a month or date is inferable (otherwise capture the timeframe in notes), and link the task to the relevant contact or person (REFERS_TO from the user agent_message to the task when batching in one store, plus taskβ†’contact if the recipe supports it, or create_relationship after store). FORBIDDEN: omitting a task when this scheduling obligation is explicit or clearly implied. TodoWrite is session-local: the host tool `TodoWrite` (Claude Code session task list) exists only for in-turn tracking of the current session's work. It does NOT satisfy the Neotoma store protocol. When a turn produces follow-up tasks β€” actions to take in a future session, commitments to carry forward, or work items the user should be able to query later β€” MUST also store each task via `store` with `entity_type: "task"` in Neotoma in the same turn. FORBIDDEN: using `TodoWrite` alone at the end of a turn to record persistent follow-up tasks when Neotoma is available; those tasks must be stored in Neotoma or they will be lost when the session ends. diff --git a/docs/testing/automated_test_catalog.md b/docs/testing/automated_test_catalog.md index 64c7b386d..1e9286e6b 100644 --- a/docs/testing/automated_test_catalog.md +++ b/docs/testing/automated_test_catalog.md @@ -61,15 +61,15 @@ flowchart TD - Do not hand-edit suite inventory entries in this file. Update the generator or the repository tree, then regenerate. ## Repo-wide summary -- Total automated test files: **391** -- Backend and repo Vitest files: **358** +- Total automated test files: **392** +- Backend and repo Vitest files: **359** - Frontend Vitest files: **9** - Playwright spec files: **24** ### Suite counts | Suite | Files | |---|---:| -| Vitest unit tests | 96 | +| Vitest unit tests | 97 | | Vitest service tests | 33 | | Source-adjacent tests | 45 | | Vitest integration tests | 106 | @@ -107,7 +107,7 @@ flowchart TD **Runner:** `vitest` **Command:** `npm test -- tests/unit` **Requirements:** Basic `.env` if required by the module under test. -**Files (96):** +**Files (97):** - `tests/unit/aauth_admission.test.ts` - `tests/unit/aauth_attestation_apple_se.test.ts` - `tests/unit/aauth_attestation_revocation.test.ts` @@ -190,6 +190,7 @@ flowchart TD - `tests/unit/sandbox_pack_registry.test.ts` - `tests/unit/sandbox_reset.test.ts` - `tests/unit/schema_agent_instructions.test.ts` +- `tests/unit/schema_derived_entity_extraction.test.ts` - `tests/unit/schema_inference.test.ts` - `tests/unit/schema_projection_lag.test.ts` - `tests/unit/security_hardening.test.ts` diff --git a/src/actions.ts b/src/actions.ts index 4cb07ffa0..51cdeda87 100644 --- a/src/actions.ts +++ b/src/actions.ts @@ -6382,6 +6382,36 @@ export async function storeStructuredForApi(params: { }` ); } + + // Schema-driven derived-entity extraction: if the entity's active schema + // declares `derived_entities`, evaluate each rule and create matching + // entities linked back to the source entity. Non-fatal: failures are + // logged and skipped, never blocking the primary store. + if (commit) { + try { + const { schemaRegistry: schemaReg2 } = await import("./services/schema_registry.js"); + const schemaEntry2 = await schemaReg2.loadActiveSchema(r.entity_type, userId); + if (schemaEntry2?.schema_definition?.derived_entities?.length) { + const { extractDerivedEntities } = + await import("./services/schema_derived_entity_extraction.js"); + await extractDerivedEntities({ + entityId: r.entity_id, + entityType: r.entity_type, + fields: r.fields, + schema: schemaEntry2.schema_definition, + userId, + sourceId: observationSourceId, + idempotencyKey, + }); + } + } catch (derivedErr) { + logger.warn( + `Derived entity extraction failed for ${r.entity_type}/${r.entity_id}: ${ + derivedErr instanceof Error ? derivedErr.message : String(derivedErr) + }` + ); + } + } } createdEntities.push({ diff --git a/src/services/schema_definitions.ts b/src/services/schema_definitions.ts index 2c437fdcb..f6cdb665e 100644 --- a/src/services/schema_definitions.ts +++ b/src/services/schema_definitions.ts @@ -1164,6 +1164,18 @@ export const ENTITY_SCHEMAS: Record = { }, tags: { type: "string", required: false }, notes: { type: "string", required: false, preserveCase: true }, + /** + * Broad category of the task. Registered values include + * "awaiting_reply" (auto-extracted from outbound emails) and any + * user-defined type. Deliberately an open string rather than an enum + * so callers can introduce types without a schema migration. + */ + task_type: { type: "string", required: false }, + /** + * Free-text description of the due context (e.g. "reply expected"). + * Supplements `due_date` when an exact date is unknown. + */ + due_context: { type: "string", required: false, preserveCase: true }, import_date: { type: "date", required: false }, import_source_file: { type: "string", required: false }, }, @@ -1270,7 +1282,7 @@ export const ENTITY_SCHEMAS: Record = { email: { entity_type: "email", - schema_version: "1.0", + schema_version: "1.1", metadata: { label: "Email", description: "Email-specific messages with threads and subjects.", @@ -1290,6 +1302,14 @@ export const ENTITY_SCHEMAS: Record = { received_at: { type: "date", required: false }, thread_id: { type: "string", required: false }, message_id: { type: "string", required: false }, + /** + * Message direction from the perspective of the local user. + * "outbound" = sent by the user; "inbound" = received by the user. + * When present, the derived-entity rule below auto-extracts an + * awaiting_reply task for outbound messages with pending-reply + * signals. + */ + direction: { type: "string", required: false }, status: { type: "string", required: false }, tags: { type: "string", required: false }, notes: { type: "string", required: false }, @@ -1300,12 +1320,47 @@ export const ENTITY_SCHEMAS: Record = { // present. Fall back to (from, subject, sent_at) which uniquely // identifies the message within most mailboxes. canonical_name_fields: ["message_id", { composite: ["from", "subject", "sent_at"] }], + /** + * Auto-extract an awaiting_reply task when an outbound email contains + * signals that a reply is expected (question mark, "please let me know", + * etc.). See `docs/foundation/schema_agnostic_design_rules.md`. + */ + derived_entities: [ + { + conditions: [ + { field: "direction", op: "eq", value: "outbound" }, + { + field: "body", + op: "matches_any_pattern", + patterns: [ + "\\?", + "please let me know", + "looking forward to hearing", + "please reply", + "awaiting your response", + "let me know", + "your thoughts", + "waiting to hear", + ], + }, + ], + derived_entity_type: "task", + derived_fields: { + title: { template: "Awaiting reply: {{subject}}" }, + task_type: { value: "awaiting_reply" }, + status: { value: "pending" }, + due_context: { value: "reply expected" }, + }, + relationship_type: "REFERS_TO", + }, + ], }, reducer_config: { merge_policies: { sent_at: { strategy: "last_write" }, received_at: { strategy: "last_write" }, body: { strategy: "highest_priority" }, + direction: { strategy: "last_write" }, }, }, }, diff --git a/src/services/schema_derived_entity_extraction.ts b/src/services/schema_derived_entity_extraction.ts new file mode 100644 index 000000000..d5d67f748 --- /dev/null +++ b/src/services/schema_derived_entity_extraction.ts @@ -0,0 +1,212 @@ +/** + * Schema-driven derived entity extraction. + * + * When a schema declares `derived_entities`, this service evaluates each rule + * against the stored payload and, for every matching rule, creates a new + * derived entity and links it to the source entity. + * + * This is the schema-agnostic replacement for hardcoded "if entity_type === + * 'email' and direction === 'outbound', create a task" branches. + * See `docs/foundation/schema_agnostic_design_rules.md`. + */ + +import { logger } from "../utils/logger.js"; +import type { SchemaDefinition } from "./schema_registry.js"; + +export interface DerivedEntityExtractionParams { + entityId: string; + entityType: string; + fields: Record; + schema: SchemaDefinition | null | undefined; + userId: string; + sourceId?: string; + idempotencyKey: string; +} + +export interface DerivedEntityExtractionResult { + created: number; + skipped: number; + details: Array<{ + rule_index: number; + derived_entity_type: string; + derived_entity_id: string | null; + created: boolean; + skip_reason?: string; + }>; +} + +/** + * Evaluate a single condition against a payload. + * + * Exported for unit testing. + */ +export function evaluateCondition( + condition: NonNullable[number]["conditions"][number], + payload: Record +): boolean { + const rawValue = payload[condition.field]; + + switch (condition.op) { + case "present": + return rawValue != null && rawValue !== "" && rawValue !== false; + + case "absent": + return rawValue == null || rawValue === "" || rawValue === false; + + case "eq": + return rawValue === condition.value; + + case "neq": + return rawValue !== condition.value; + + case "matches_any_pattern": { + if (!condition.patterns || condition.patterns.length === 0) return false; + const strValue = typeof rawValue === "string" ? rawValue : null; + if (!strValue) return false; + const lc = strValue.toLowerCase(); + return condition.patterns.some((pattern) => { + try { + // Treat pattern as a RegExp source string first; fall back to + // substring match if RegExp construction fails. + const re = new RegExp(pattern, "i"); + return re.test(strValue); + } catch { + return lc.includes(pattern.toLowerCase()); + } + }); + } + + default: + return false; + } +} + +/** + * Resolve a single derived field value from the rule declaration. + * + * Exported for unit testing. + */ +export function resolveDerivedField( + spec: { value: unknown } | { template: string }, + sourceFields: Record +): unknown { + if ("value" in spec) { + return spec.value; + } + // Template: replace {{fieldName}} with the source entity's field value. + return spec.template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => { + const v = sourceFields[key]; + return v != null ? String(v) : ""; + }); +} + +/** + * Evaluate all derived-entity rules for the given stored entity and create + * each matching derived entity. Failures are non-fatal. + */ +export async function extractDerivedEntities( + params: DerivedEntityExtractionParams +): Promise { + const result: DerivedEntityExtractionResult = { created: 0, skipped: 0, details: [] }; + + const rules = params.schema?.derived_entities; + if (!rules || rules.length === 0) return result; + + const { storeStructuredForApi } = await import("../actions.js"); + const { relationshipsService } = await import("./relationships.js"); + + for (let ruleIdx = 0; ruleIdx < rules.length; ruleIdx++) { + const rule = rules[ruleIdx]; + + // Evaluate all conditions; all must pass. + const conditionsMet = rule.conditions.every((cond) => evaluateCondition(cond, params.fields)); + + if (!conditionsMet) { + result.skipped++; + result.details.push({ + rule_index: ruleIdx, + derived_entity_type: rule.derived_entity_type, + derived_entity_id: null, + created: false, + skip_reason: "conditions_not_met", + }); + continue; + } + + // Build the derived entity fields. + const derivedFields: Record = { + entity_type: rule.derived_entity_type, + }; + for (const [fieldName, spec] of Object.entries(rule.derived_fields)) { + derivedFields[fieldName] = resolveDerivedField(spec, params.fields); + } + + try { + // Re-use the main store path so all schema validation, observation + // creation, snapshot computation, and relationship auto-linking apply + // uniformly to the derived entity. + const derivedIdempotencyKey = `${params.idempotencyKey}:derived:${ruleIdx}`; + + const storeResult = await storeStructuredForApi({ + userId: params.userId, + entities: [derivedFields], + sourcePriority: 100, + observationSource: "workflow_state", + idempotencyKey: derivedIdempotencyKey, + }); + + const derivedEntityId = storeResult.entities?.[0]?.entity_id ?? null; + + if (derivedEntityId) { + // Link source entity β†’ derived entity. + const relationshipType = (rule.relationship_type ?? + "REFERS_TO") as import("./relationships.js").RelationshipType; + await relationshipsService.createRelationship({ + relationship_type: relationshipType, + source_entity_id: params.entityId, + target_entity_id: derivedEntityId, + source_id: params.sourceId, + metadata: { + auto_derived: true, + derived_rule_index: ruleIdx, + derived_from_entity_type: params.entityType, + }, + user_id: params.userId, + }); + + result.created++; + result.details.push({ + rule_index: ruleIdx, + derived_entity_type: rule.derived_entity_type, + derived_entity_id: derivedEntityId, + created: true, + }); + } else { + result.skipped++; + result.details.push({ + rule_index: ruleIdx, + derived_entity_type: rule.derived_entity_type, + derived_entity_id: null, + created: false, + skip_reason: "store_returned_no_entity_id", + }); + } + } catch (err) { + result.skipped++; + result.details.push({ + rule_index: ruleIdx, + derived_entity_type: rule.derived_entity_type, + derived_entity_id: null, + created: false, + skip_reason: err instanceof Error ? err.message : String(err), + }); + logger.warn( + `[SCHEMA_DERIVED] Failed to extract derived entity (rule ${ruleIdx}) ` + + `from ${params.entityType}/${params.entityId}: ` + + `${err instanceof Error ? err.message : String(err)}` + ); + } + } + + return result; +} diff --git a/src/services/schema_registry.ts b/src/services/schema_registry.ts index cc13341b2..04412cddb 100644 --- a/src/services/schema_registry.ts +++ b/src/services/schema_registry.ts @@ -255,6 +255,80 @@ export interface SchemaDefinition { /** Human-readable warning message included in the response. */ message: string; }>; + + /** + * Schema-driven rules that auto-extract derived entities when an observation + * of this type is stored. Each rule specifies conditions on the stored + * payload and, when all conditions match, creates a new entity of the + * declared `derived_entity_type` and links it to the source entity via the + * declared `relationship_type` (defaults to `REFERS_TO`). + * + * Rules are evaluated after the primary entity has been committed. Failures + * are non-fatal: logged as warnings and skipped, never blocking the primary + * store. + * + * Example (awaiting-reply task from outbound email): + * ```ts + * derived_entities: [{ + * conditions: [ + * { field: "direction", op: "eq", value: "outbound" }, + * { field: "body", op: "matches_any_pattern", patterns: [ + * "\\?", + * "please let me know", + * "looking forward to hearing", + * "please reply", + * "awaiting your response", + * ]}, + * ], + * derived_entity_type: "task", + * derived_fields: { + * title: { template: "Awaiting reply: {{subject}}" }, + * task_type: { value: "awaiting_reply" }, + * status: { value: "pending" }, + * due_context: { value: "reply expected" }, + * }, + * relationship_type: "REFERS_TO", + * }] + * ``` + */ + derived_entities?: Array<{ + /** + * All conditions must pass for this rule to fire. An empty conditions + * array means the rule always fires. + */ + conditions: Array<{ + /** Field on the stored payload to evaluate. */ + field: string; + /** Comparison operator. */ + op: + | "eq" // strict equality (string/number/boolean) + | "neq" // not equal + | "present" // field is present and non-null/non-empty + | "absent" // field is missing or null/empty + | "matches_any_pattern"; // case-insensitive substring/regex match against any pattern + /** Value to compare against (used by eq / neq). */ + value?: unknown; + /** + * Patterns for matches_any_pattern. A pattern is a case-insensitive + * substring or JS RegExp source string. The condition passes when at + * least one pattern matches the field's string value. + */ + patterns?: string[]; + }>; + /** Entity type of the entity to extract. */ + derived_entity_type: string; + /** + * Fields to set on the derived entity. Each entry is either a literal + * value or a template string (using `{{field}}` placeholders resolved + * from the source entity's stored payload). + */ + derived_fields: Record; + /** + * Relationship type to create between the source entity (as `source`) + * and the derived entity (as `target`). Defaults to `"REFERS_TO"`. + */ + relationship_type?: string; + }>; } /** Known opt-out tokens for {@link SchemaDefinition.identity_opt_out}. */ @@ -1995,6 +2069,68 @@ export class SchemaRegistryService { throw new Error("agent_instructions must be a non-empty string when present"); } } + + if (definition.derived_entities !== undefined) { + if (!Array.isArray(definition.derived_entities)) { + throw new Error("derived_entities must be an array"); + } + const validOps = ["eq", "neq", "present", "absent", "matches_any_pattern"]; + for (let i = 0; i < definition.derived_entities.length; i++) { + const rule = definition.derived_entities[i]; + if (!rule || typeof rule !== "object") { + throw new Error(`derived_entities[${i}] must be an object`); + } + if (!Array.isArray(rule.conditions)) { + throw new Error(`derived_entities[${i}].conditions must be an array`); + } + for (let j = 0; j < rule.conditions.length; j++) { + const cond = rule.conditions[j]; + if (typeof cond.field !== "string" || !cond.field.trim()) { + throw new Error( + `derived_entities[${i}].conditions[${j}].field must be a non-empty string` + ); + } + if (!validOps.includes(cond.op)) { + throw new Error( + `derived_entities[${i}].conditions[${j}].op must be one of: ${validOps.join(", ")}` + ); + } + if ( + cond.op === "matches_any_pattern" && + (!Array.isArray(cond.patterns) || cond.patterns.length === 0) + ) { + throw new Error( + `derived_entities[${i}].conditions[${j}] with op "matches_any_pattern" must have a non-empty patterns array` + ); + } + } + if (typeof rule.derived_entity_type !== "string" || !rule.derived_entity_type.trim()) { + throw new Error(`derived_entities[${i}].derived_entity_type must be a non-empty string`); + } + if (!rule.derived_fields || typeof rule.derived_fields !== "object") { + throw new Error(`derived_entities[${i}].derived_fields must be an object`); + } + for (const [fieldKey, fieldSpec] of Object.entries(rule.derived_fields)) { + if ( + !fieldSpec || + typeof fieldSpec !== "object" || + (!("value" in fieldSpec) && !("template" in fieldSpec)) + ) { + throw new Error( + `derived_entities[${i}].derived_fields.${fieldKey} must be { value: ... } or { template: "..." }` + ); + } + if ( + "template" in fieldSpec && + (typeof fieldSpec.template !== "string" || fieldSpec.template.trim().length === 0) + ) { + throw new Error( + `derived_entities[${i}].derived_fields.${fieldKey}.template must be a non-empty string` + ); + } + } + } + } } /** diff --git a/tests/unit/schema_derived_entity_extraction.test.ts b/tests/unit/schema_derived_entity_extraction.test.ts new file mode 100644 index 000000000..3f003b2d1 --- /dev/null +++ b/tests/unit/schema_derived_entity_extraction.test.ts @@ -0,0 +1,362 @@ +/** + * Unit tests for schema-driven derived entity extraction. + * + * Tests cover: + * - evaluateCondition: all operators + * - resolveDerivedField: value and template specs + * - Awaiting-reply task extraction rule: outbound email with question β†’ task created + * - Awaiting-reply task extraction rule: outbound email without question signals β†’ no task + * - Inbound email β†’ no awaiting_reply task + */ + +import { describe, expect, it } from "vitest"; +import { + evaluateCondition, + resolveDerivedField, +} from "../../src/services/schema_derived_entity_extraction.js"; +import { ENTITY_SCHEMAS } from "../../src/services/schema_definitions.js"; + +// --------------------------------------------------------------------------- +// evaluateCondition +// --------------------------------------------------------------------------- + +describe("evaluateCondition", () => { + describe("op: present", () => { + it("returns true when field has a non-empty string value", () => { + expect( + evaluateCondition({ field: "direction", op: "present" }, { direction: "outbound" }) + ).toBe(true); + }); + + it("returns false when field is null", () => { + expect(evaluateCondition({ field: "direction", op: "present" }, { direction: null })).toBe( + false + ); + }); + + it("returns false when field is undefined / missing", () => { + expect(evaluateCondition({ field: "direction", op: "present" }, {})).toBe(false); + }); + + it("returns false when field is an empty string", () => { + expect(evaluateCondition({ field: "direction", op: "present" }, { direction: "" })).toBe( + false + ); + }); + }); + + describe("op: absent", () => { + it("returns true when field is missing", () => { + expect(evaluateCondition({ field: "direction", op: "absent" }, {})).toBe(true); + }); + + it("returns false when field has a value", () => { + expect( + evaluateCondition({ field: "direction", op: "absent" }, { direction: "outbound" }) + ).toBe(false); + }); + }); + + describe("op: eq", () => { + it("returns true for exact string match", () => { + expect( + evaluateCondition( + { field: "direction", op: "eq", value: "outbound" }, + { direction: "outbound" } + ) + ).toBe(true); + }); + + it("returns false for mismatched value", () => { + expect( + evaluateCondition( + { field: "direction", op: "eq", value: "outbound" }, + { direction: "inbound" } + ) + ).toBe(false); + }); + + it("returns false for missing field", () => { + expect(evaluateCondition({ field: "direction", op: "eq", value: "outbound" }, {})).toBe( + false + ); + }); + }); + + describe("op: neq", () => { + it("returns true when field value differs", () => { + expect( + evaluateCondition( + { field: "direction", op: "neq", value: "outbound" }, + { direction: "inbound" } + ) + ).toBe(true); + }); + + it("returns false when field value matches", () => { + expect( + evaluateCondition( + { field: "direction", op: "neq", value: "outbound" }, + { direction: "outbound" } + ) + ).toBe(false); + }); + }); + + describe("op: matches_any_pattern", () => { + it("returns true when body contains a literal pattern (case-insensitive)", () => { + expect( + evaluateCondition( + { + field: "body", + op: "matches_any_pattern", + patterns: ["please let me know"], + }, + { body: "Hi Alice, Please let me know your availability." } + ) + ).toBe(true); + }); + + it("returns true when body ends with a question mark", () => { + expect( + evaluateCondition( + { field: "body", op: "matches_any_pattern", patterns: ["\\?"] }, + { body: "Can you confirm the meeting time?" } + ) + ).toBe(true); + }); + + it("returns false when body has no matching pattern", () => { + expect( + evaluateCondition( + { + field: "body", + op: "matches_any_pattern", + patterns: ["please let me know", "\\?"], + }, + { body: "Thanks for the info. Looking forward to working with you." } + ) + ).toBe(false); + }); + + it("returns false when field is not a string", () => { + expect( + evaluateCondition( + { field: "body", op: "matches_any_pattern", patterns: ["\\?"] }, + { body: 42 } + ) + ).toBe(false); + }); + + it("returns false when patterns array is empty", () => { + expect( + evaluateCondition( + { field: "body", op: "matches_any_pattern", patterns: [] }, + { body: "Is this working?" } + ) + ).toBe(false); + }); + + it("returns false when patterns is undefined", () => { + expect( + evaluateCondition({ field: "body", op: "matches_any_pattern" }, { body: "Hello?" }) + ).toBe(false); + }); + }); +}); + +// --------------------------------------------------------------------------- +// resolveDerivedField +// --------------------------------------------------------------------------- + +describe("resolveDerivedField", () => { + it("returns a literal value unchanged", () => { + expect(resolveDerivedField({ value: "pending" }, {})).toBe("pending"); + }); + + it("returns null literal value unchanged", () => { + expect(resolveDerivedField({ value: null }, {})).toBeNull(); + }); + + it("interpolates a single {{field}} template placeholder", () => { + expect( + resolveDerivedField( + { template: "Awaiting reply: {{subject}}" }, + { subject: "Meeting tomorrow" } + ) + ).toBe("Awaiting reply: Meeting tomorrow"); + }); + + it("interpolates multiple placeholders", () => { + expect( + resolveDerivedField( + { template: "{{greeting}} {{name}}" }, + { greeting: "Hello", name: "Alice" } + ) + ).toBe("Hello Alice"); + }); + + it("replaces missing placeholder with empty string", () => { + expect(resolveDerivedField({ template: "Awaiting reply: {{subject}}" }, {})).toBe( + "Awaiting reply: " + ); + }); +}); + +// --------------------------------------------------------------------------- +// email schema derived_entities rule +// --------------------------------------------------------------------------- + +describe("email schema derived_entities awaiting-reply rule", () => { + const emailSchema = ENTITY_SCHEMAS["email"]; + const rule = emailSchema?.schema_definition?.derived_entities?.[0]; + + it("email schema has exactly one derived_entities rule", () => { + expect(emailSchema?.schema_definition?.derived_entities).toHaveLength(1); + }); + + it("rule targets task entity type", () => { + expect(rule?.derived_entity_type).toBe("task"); + }); + + it("rule has relationship_type REFERS_TO", () => { + expect(rule?.relationship_type).toBe("REFERS_TO"); + }); + + // Helper: evaluate all conditions on the rule + function conditionsMet(payload: Record): boolean { + if (!rule) return false; + return rule.conditions.every((cond) => evaluateCondition(cond, payload)); + } + + describe("outbound email with question-mark body β†’ task should be extracted", () => { + it("matches when direction=outbound and body contains '?'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Can you review the document?", + }) + ).toBe(true); + }); + + it("matches when direction=outbound and body contains 'please let me know'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Please let me know what you think.", + }) + ).toBe(true); + }); + + it("matches when direction=outbound and body contains 'looking forward to hearing'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Looking forward to hearing your feedback.", + }) + ).toBe(true); + }); + + it("matches when direction=outbound and body contains 'please reply'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Please reply at your earliest convenience.", + }) + ).toBe(true); + }); + + it("matches when direction=outbound and body contains 'awaiting your response'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Awaiting your response on the proposal.", + }) + ).toBe(true); + }); + + it("matches when direction=outbound and body contains 'let me know'", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Let me know if you need anything else.", + }) + ).toBe(true); + }); + }); + + describe("outbound email without question signals β†’ no task", () => { + it("does not match when direction=outbound but body has no pending-reply signal", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "Thanks for the meeting. I'll follow up next week.", + }) + ).toBe(false); + }); + + it("does not match when direction=outbound but body is empty", () => { + expect( + conditionsMet({ + direction: "outbound", + body: "", + }) + ).toBe(false); + }); + + it("does not match when direction=outbound but body is missing", () => { + expect(conditionsMet({ direction: "outbound" })).toBe(false); + }); + }); + + describe("inbound email β†’ no awaiting_reply task", () => { + it("does not match when direction=inbound even with question body", () => { + expect( + conditionsMet({ + direction: "inbound", + body: "Can you send me the report?", + }) + ).toBe(false); + }); + }); + + describe("no direction field β†’ no awaiting_reply task", () => { + it("does not match when direction field is absent", () => { + expect( + conditionsMet({ + body: "Please let me know what you decide.", + }) + ).toBe(false); + }); + }); + + describe("derived field resolution for title template", () => { + it("resolves title template with subject field", () => { + const titleSpec = rule?.derived_fields?.["title"]; + expect(titleSpec).toBeDefined(); + expect(resolveDerivedField(titleSpec!, { subject: "Q3 Budget Approval" })).toBe( + "Awaiting reply: Q3 Budget Approval" + ); + }); + + it("resolves title template with missing subject to empty suffix", () => { + const titleSpec = rule?.derived_fields?.["title"]; + expect(resolveDerivedField(titleSpec!, {})).toBe("Awaiting reply: "); + }); + + it("task_type is awaiting_reply", () => { + const spec = rule?.derived_fields?.["task_type"]; + expect(spec && "value" in spec ? spec.value : undefined).toBe("awaiting_reply"); + }); + + it("status is pending", () => { + const spec = rule?.derived_fields?.["status"]; + expect(spec && "value" in spec ? spec.value : undefined).toBe("pending"); + }); + + it("due_context is reply expected", () => { + const spec = rule?.derived_fields?.["due_context"]; + expect(spec && "value" in spec ? spec.value : undefined).toBe("reply expected"); + }); + }); +});