30 changes: 29 additions & 1 deletion docs/developer/cli_reference.md
@@ -328,15 +328,29 @@ neotoma session --servers
- `--force`: Overwrite existing configuration.
- `--skip-db`: Skip database initialization.
- `--skip-env`: Skip interactive `.env` creation and variable prompts (e.g. for CI or non-interactive use).
- `--project-local`: Store the Neotoma config in `.neotoma/config.json` in the current directory (project-scoped) instead of the user-level `~/.config/neotoma/config.json`. The project-local config takes precedence over the user-level config when `readEffectiveConfig` is used. Use this when you want per-project Neotoma configuration that is independent of the user-level setup.
- `--safe`: Dry-run mode. Reports what `init` would do (create directories, write config, run migrations) without writing any files or making any changes. Output lists each planned action with a check mark. Exit code is 0 if everything would succeed. Combine with `--json` to get machine-readable output.

**Examples:**

```bash
# Basic initialization
neotoma init

# Initialize with custom data directory
neotoma init --data-dir /path/to/data

# Store config in current project directory instead of user home
neotoma init --project-local

# Preview what init would do without making any changes
neotoma init --safe

# Dry-run with machine-readable output
neotoma init --safe --json

# Combine: dry-run scoped to current project
neotoma init --safe --project-local
```

**What it creates:**
@@ -345,6 +359,20 @@ neotoma init --data-dir /path/to/data
- SQLite database: `<data-dir>/neotoma.db` (with WAL mode enabled)
- Encryption key (if user chooses key-derived auth when prompted): `~/.config/neotoma/keys/neotoma.key` (mode 0600).
- Environment file target: project `<checkout>/.env` when checkout is detected, otherwise `~/.config/neotoma/.env`
- Config file: `~/.config/neotoma/config.json` (default) or `.neotoma/config.json` in the current directory when `--project-local` is given.

**Runtime overrides** for `neotoma init`:

Data directory resolution:

| Precedence | Source | Description |
|------------|--------|-------------|
| 1 (highest) | `--data-dir` flag | Explicit data directory path |
| 2 | `NEOTOMA_DATA_DIR` env var | Environment variable override |
| 3 (default) | Auto-detected or `~/neotoma/data` | Resolved at startup |

Config file location:

| Precedence | Source | Description |
|------------|--------|-------------|
| 1 (highest) | `--project-local` flag | Write to `.neotoma/config.json` in cwd |
| 2 (default) | (no flag) | Write to `~/.config/neotoma/config.json` |
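
The precedence tables above can be sketched as two small resolver functions. This is a minimal illustration, not Neotoma's implementation: `InitOptions`, `InitContext`, `resolveDataDir`, and `resolveConfigPath` are hypothetical names, auto-detection of the data directory is left out, and the fallback is assumed to be `~/neotoma/data`.

```typescript
// Hypothetical shapes for illustration; not Neotoma's actual types.
interface InitOptions {
  dataDir?: string;       // value of --data-dir, if given
  projectLocal?: boolean; // true when --project-local is given
}

interface InitContext {
  env: { [name: string]: string | undefined };
  homeDir: string;
  cwd: string;
}

// Data directory: flag first, then NEOTOMA_DATA_DIR, then the default.
// Auto-detection is omitted in this sketch.
function resolveDataDir(opts: InitOptions, ctx: InitContext): string {
  if (opts.dataDir) return opts.dataDir;
  const fromEnv = ctx.env["NEOTOMA_DATA_DIR"];
  if (fromEnv) return fromEnv;
  return ctx.homeDir + "/neotoma/data";
}

// Config file: project-local when requested, otherwise user-level.
function resolveConfigPath(opts: InitOptions, ctx: InitContext): string {
  return opts.projectLocal
    ? ctx.cwd + "/.neotoma/config.json"
    : ctx.homeDir + "/.config/neotoma/config.json";
}
```

Passing the environment, home directory, and working directory in as arguments keeps both functions pure, which makes the precedence order easy to test without touching the real filesystem.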

### Harness setup

40 changes: 34 additions & 6 deletions docs/developer/mcp/instructions.md

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions docs/foundation/what_to_store.md
@@ -93,6 +93,22 @@ These before/after examples show what storage looks like in practice. "Before" i
- Before: You have an ongoing billing dispute with a vendor. Details are scattered across emails, chat messages, and phone call notes. Reconstructing the timeline requires manual archaeology.
- After: Agent stores each interaction as an observation on the dispute entity. `{ entity_type: "dispute", vendor: "Acme Billing", status: "open", amount_disputed: 250.00 }` with observations for each touchpoint. The full timeline is queryable: "What did we know about this dispute on March 15?"

## Store-first protocol for external tool actions

When an agent is about to execute a write action in an external tool — sending an email, creating a GitHub issue, posting a Slack message, booking a calendar event, triggering a webhook, or any equivalent operation that mutates state outside Neotoma — it MUST store the intent as a Neotoma entity BEFORE executing the action.

This is not optional. The store step is non-negotiable regardless of how simple or low-risk the external action appears.

The three-step sequence is:

1. **Store intent.** Store an entity capturing what will be done, to whom, and with what content. Use the most specific entity_type (e.g. `email_draft`, `issue`, `event`, `message`). Set `status: "pending"`.
2. **Execute the external action.** Call the external tool only after the store returns successfully.
3. **Update entity status.** After the external action completes or fails, update the stored entity with the outcome (`status: "sent"`, `status: "created"`, `status: "failed"`, plus any external IDs returned).

If the external action fails, the stored entity remains as durable evidence of the intent. If the action succeeds, the entity records what was done, when, and to whom — traceable and auditable.

This protocol extends the general store-first rule (which applies to reading external data) to cover writes. The agent instructions in `docs/developer/mcp/instructions.md` define the full binding rule and entity-type mapping under `[STORE-FIRST PROTOCOL]`.
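
The three-step sequence above could be wrapped in a helper along these lines. This is a sketch under stated assumptions: `IntentStore`, `MemoryIntentStore`, and `runStoreFirst` are hypothetical names rather than Neotoma APIs, the `error` field is illustrative, and the external action is modeled as a synchronous callback for brevity (a real tool call would normally be asynchronous).

```typescript
type Fields = { [field: string]: unknown };

// Hypothetical minimal store interface; not Neotoma's actual API.
interface IntentStore {
  store(entity: Fields): string;            // returns the new entity id
  update(id: string, patch: Fields): void;  // merges a patch into the entity
}

// In-memory stand-in used only to demonstrate the sequence.
class MemoryIntentStore implements IntentStore {
  rows: { [id: string]: Fields } = {};
  private count = 0;

  store(entity: Fields): string {
    this.count += 1;
    const id = "ent_" + this.count;
    this.rows[id] = { ...entity };
    return id;
  }

  update(id: string, patch: Fields): void {
    this.rows[id] = { ...this.rows[id], ...patch };
  }
}

function runStoreFirst(
  store: IntentStore,
  intent: Fields,
  successStatus: string,
  action: () => Fields,
): Fields {
  // Step 1: store the intent. If this throws, the action never runs.
  const id = store.store({ ...intent, status: "pending" });
  try {
    // Step 2: execute the external action only after the store succeeded.
    const result = action();
    // Step 3a: record the outcome plus any external IDs returned.
    store.update(id, { ...result, status: successStatus });
    return result;
  } catch (err) {
    // Step 3b: keep the entity as durable evidence of the failed intent.
    store.update(id, { status: "failed", error: String(err) });
    throw err;
  }
}
```

The failure branch rethrows after updating the entity, so callers still see the original error while the stored intent records what was attempted.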

## What NOT to store

| Condition | Reason |
216 changes: 216 additions & 0 deletions docs/subsystems/github_entities.md
@@ -0,0 +1,216 @@
# GitHub Entities

When email records, calendar invites, or other external sources reference GitHub resources (issues, pull requests, organizations, or projects), agents extract and store those resources as first-class Neotoma entities and link them to the originating record via REFERS_TO.

## Scope

This document covers:

- Canonical entity types and field names for GitHub resources extracted from external records.
- Extraction rules for each resource class.
- Linking conventions (email entity as source, REFERS_TO edges, observation `data_source`).

It does NOT cover:

- The `issue` subsystem's Neotoma-native issue tracking and GitHub mirror pipeline. See [`issues.md`](issues.md).
- Generic external-entity submission. See [`entity_submission.md`](entity_submission.md).

## Entity Types

### GitHub Issue (`issue`)

Use `entity_type: "issue"` for GitHub issues referenced in email or other external records. This is the same type used by the Neotoma issue subsystem; identity is `(github_number, repo)`.

**Required fields:**

| Field | Type | Description |
|-------|------|-------------|
| `github_number` | number | Issue number (e.g. `42`) |
| `repo` | string | `owner/name` (e.g. `markmhendrickson/neotoma`) |

**Optional fields:**

| Field | Type | Description |
|-------|------|-------------|
| `github_url` | string | Full issue URL |
| `title` | string | Issue title when parseable |
| `status` | string | `open` or `closed` when known |
| `data_source` | string | Provenance string (tool + id + date) |
| `source_quote` | string | Verbatim snippet from the email body supporting extraction |

**Identity rule:** `[{ composite: ["github_number", "repo"] }]` — re-stores update the existing row rather than creating a duplicate.
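
To illustrate what the composite identity implies, here is a small sketch of an upsert keyed on `(github_number, repo)`. `IssueEntity`, `issueIdentityKey`, and `upsertIssue` are hypothetical names, and the plain-object table stands in for whatever storage Neotoma actually uses; field merging on re-store is an assumption about the update semantics.

```typescript
interface IssueEntity {
  github_number: number;
  repo: string;
  [field: string]: unknown;
}

// Builds a lookup key from the two composite identity fields.
function issueIdentityKey(issue: IssueEntity): string {
  return JSON.stringify([issue.github_number, issue.repo]);
}

// Re-storing an issue with the same (github_number, repo) updates the
// existing row instead of adding a second one.
function upsertIssue(
  rows: { [key: string]: IssueEntity },
  incoming: IssueEntity,
): void {
  const key = issueIdentityKey(incoming);
  const existing = rows[key];
  rows[key] = existing ? { ...existing, ...incoming } : incoming;
}
```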

**Do NOT use** a generic `note`, and do not invent ad hoc fields (`github_issue_number`, `repository`, `url`), when the canonical fields are recoverable. If only partial context is available and `github_number` and `repo` cannot both be populated, store a `note` or `technical_research` entity instead until the canonical fields are known. See the GitHub issue URL extraction rule under `[ISSUE REPORTING]` in the MCP instructions.

### Pull Request (`pull_request`)

Use `entity_type: "pull_request"` for GitHub pull requests referenced in email or other external records.

**Required fields:**

| Field | Type | Description |
|-------|------|-------------|
| `number` | number | PR number (e.g. `57`) |
| `repo` | string | `owner/name` (e.g. `markmhendrickson/neotoma`) |

**Optional fields:**

| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Full PR URL |
| `title` | string | PR title when parseable |
| `status` | string | `open`, `merged`, or `closed` when known |
| `author` | string | GitHub login of PR author |
| `base_branch` | string | Target branch |
| `head_branch` | string | Source branch |
| `created_at` | date | PR creation timestamp |
| `merged_at` | date | Merge timestamp |
| `closed_at` | date | Close timestamp |
| `data_source` | string | Provenance string |
| `source_quote` | string | Verbatim snippet from the email body |

**Identity rule:** `[{ composite: ["number", "repo"] }]` with `url` as fallback.

**URL pattern for recognition:** `github.com/<owner>/<repo>/pull/<number>`.

**Aliases accepted by resolver:** `pr`, `github_pr`, `merge_request`.
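
As a rough illustration of alias handling, the mapping could look like the following. `normalizeEntityType` is a hypothetical helper, and exact, case-sensitive matching is an assumption; the real resolver may behave differently.

```typescript
// Aliases listed above, all of which map to the canonical type.
const PULL_REQUEST_ALIASES: string[] = ["pr", "github_pr", "merge_request"];

// Returns the canonical entity type for a known alias, otherwise the
// input unchanged. Matching is exact in this sketch.
function normalizeEntityType(entityType: string): string {
  return PULL_REQUEST_ALIASES.indexOf(entityType) !== -1
    ? "pull_request"
    : entityType;
}
```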

### GitHub Organization (`organization` / `company`)

Use `entity_type: "organization"` (or reuse `company` per entity-type reuse check) for GitHub organizations mentioned as email senders, vendors, sponsors, or named collaborators.

**Fields (from `company` schema):**

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Organization display name (required) |
| `website` | string | `https://github.com/<login>` |
| `external_id` | string | GitHub login (e.g. `octocat`) — use as the stable identifier |
| `description` | string | Organization description when available |
| `data_source` | string | Provenance string |

**Identity rule:** `["external_id", "website", "email", "legal_name", "name"]` in priority order. Use `external_id` = GitHub login for the most stable deduplication key.
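
One way to read the priority-ordered identity rule is "the first populated field wins". The sketch below encodes that reading; it is an assumption about how the resolver applies the list, and `pickIdentityField` is a hypothetical name.

```typescript
// Identity fields in the priority order given above.
const ORG_IDENTITY_PRIORITY: string[] = [
  "external_id",
  "website",
  "email",
  "legal_name",
  "name",
];

// Returns the highest-priority field that holds a non-empty string,
// or null when none of the identity fields is populated.
function pickIdentityField(
  entity: { [field: string]: unknown },
): { field: string; value: string } | null {
  for (let i = 0; i < ORG_IDENTITY_PRIORITY.length; i++) {
    const field = ORG_IDENTITY_PRIORITY[i];
    const value = entity[field];
    if (typeof value === "string" && value.trim() !== "") {
      return { field: field, value: value };
    }
  }
  return null;
}
```

Under this reading, setting `external_id` to the GitHub login means the login is used for deduplication even when the display name changes.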

**Do NOT** create a new `github_org` type. Use the established `organization` / `company` type with `external_id` set to the GitHub login and `website` set to the GitHub URL.

### GitHub Project (`project`)

Use `entity_type: "project"` for GitHub Projects referenced in email (e.g. project board links or project-context subject lines). This is a general-purpose project type shared with non-GitHub projects; disambiguate using `data_source`.

**Fields (from `project` schema):**

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Project name (required) |
| `status` | string | `active`, `closed`, etc. (required by schema) |
| `description` | string | Project description |
| `notes` | string | GitHub project URL or other notes |
| `data_source` | string | Provenance string (e.g. `GitHub Projects email reference 2026-05-19`) |

**Identity rule:** `["name"]`.

**Do NOT** create a new `github_project` type. Use `project` with `data_source` to distinguish GitHub Projects from other project types.

## Extraction Rules

### When to Extract

Run the GitHub entity extraction pass as part of the per-record scan (see `[COMMUNICATION & DISPLAY]` per-record extraction checklist) whenever an email, calendar invite, chat message, or web page body contains:

- A GitHub issue URL: `github.com/<owner>/<repo>/issues/<number>`
- A GitHub PR URL: `github.com/<owner>/<repo>/pull/<number>`
- An issue or PR reference: `#<number>` (when repo context is available from subject or sender)
- A GitHub organization name or `github.com/<login>` URL
- A GitHub Projects board link: `github.com/orgs/<org>/projects/<id>` or `github.com/users/<login>/projects/<id>`
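
The URL patterns in this list can be recognized with a few regular expressions. The sketch below is illustrative only: `GitHubRef` and `classifyGitHubUrl` are hypothetical names, only the first match in the text is returned, and bare `#<number>` references are not handled because they need repo context from the subject or sender.

```typescript
type GitHubRef =
  | { kind: "issue" | "pull_request"; repo: string; number: number }
  | { kind: "project"; owner: string; projectId: string }
  | { kind: "organization"; login: string };

function classifyGitHubUrl(text: string): GitHubRef | null {
  // github.com/<owner>/<repo>/issues/<number> or .../pull/<number>
  let m: RegExpExecArray | null =
    /github\.com\/([\w.-]+)\/([\w.-]+)\/(issues|pull)\/(\d+)/.exec(text);
  if (m) {
    const kind = m[3] === "issues" ? "issue" : "pull_request";
    return { kind: kind, repo: m[1] + "/" + m[2], number: Number(m[4]) };
  }

  // github.com/orgs/<org>/projects/<id> or github.com/users/<login>/projects/<id>
  m = /github\.com\/(orgs|users)\/([\w.-]+)\/projects\/(\d+)/.exec(text);
  if (m) {
    return { kind: "project", owner: m[2], projectId: m[3] };
  }

  // github.com/<login> with no further path segment
  m = /github\.com\/([A-Za-z0-9-]+)\/?(?![\w\/-])/.exec(text);
  if (m) {
    return { kind: "organization", login: m[1] };
  }

  return null;
}
```

The issue and PR pattern is checked first so that a repository URL is never mistaken for a bare organization link.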

### Extraction per Entity Class

**GitHub issue from URL:**

```
entity_type: "issue"
github_number: <number from URL>
repo: "<owner>/<name>"
github_url: "<full URL>"
title: "<title if in subject or body>"
data_source: "email message_id=<id> <ISO-date>"
source_quote: "<verbatim URL or surrounding sentence>"
```

**GitHub PR from URL:**

```
entity_type: "pull_request"
number: <number from URL>
repo: "<owner>/<name>"
url: "<full URL>"
title: "<title if parseable>"
data_source: "email message_id=<id> <ISO-date>"
source_quote: "<verbatim URL or surrounding sentence>"
```

**GitHub organization from sender or body:**

```
entity_type: "organization"
name: "<org display name>"
external_id: "<github login>"
website: "https://github.com/<login>"
data_source: "email message_id=<id> <ISO-date>"
```

**GitHub project from URL or subject:**

```
entity_type: "project"
name: "<project name>"
status: "active"
notes: "<full GitHub Projects URL>"
data_source: "GitHub Projects email reference <ISO-date>"
```

### Linking

Link each extracted entity to the originating email record in the **same `store` call** that stores it, using the `relationships` array:

```
{ relationship_type: "REFERS_TO", source_entity_id: "<email_entity_id>", target_entity_id: "<github_entity_id>" }
```

Or, when batching in one store call, use index-based references:

```
{ relationship_type: "REFERS_TO", source_index: <email_index>, target_index: <github_entity_index> }
```

Use the email entity (e.g. `email_message`) as the **source** on the REFERS_TO edge and the GitHub entity as the target. This matches the `[STORE RECIPES]` user-phase relationship convention (message → extracted entity).
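
A batched payload with index-based references might be assembled as follows. This is a sketch, not the actual `store` request shape: the `entities` key and the `buildEmailStorePayload` helper are assumptions, while `relationships`, `relationship_type`, `source_index`, and `target_index` follow the snippets above.

```typescript
type EntityFields = { [field: string]: unknown };

interface IndexedRelationship {
  relationship_type: string;
  source_index: number;
  target_index: number;
}

// Hypothetical payload shape; the `entities` key is an assumption.
interface StorePayload {
  entities: EntityFields[];
  relationships: IndexedRelationship[];
}

// Places the email at index 0 and each GitHub entity after it, then adds
// one REFERS_TO edge per GitHub entity with the email as the source.
function buildEmailStorePayload(
  email: EntityFields,
  githubEntities: EntityFields[],
): StorePayload {
  const entities: EntityFields[] = [email].concat(githubEntities);
  const relationships: IndexedRelationship[] = [];
  for (let i = 0; i < githubEntities.length; i++) {
    relationships.push({
      relationship_type: "REFERS_TO",
      source_index: 0,
      target_index: i + 1,
    });
  }
  return { entities: entities, relationships: relationships };
}
```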

### Observation `data_source`

Every GitHub entity stored from email MUST carry a per-entity `data_source` field identifying the originating email:

```
"email message_id=<gmail_message_id> <ISO-date>"
```

When the `message_id` is unavailable, use the sender address and date:

```
"email from=<sender> <ISO-date>"
```

This satisfies the multi-row `data_source` identity requirement in `[PROVENANCE]` and prevents distinct email records from collapsing into the same GitHub entity row when the same issue is mentioned in multiple emails.
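
The two provenance formats above can be produced by one small helper. `buildEmailDataSource` and its option names are hypothetical; the output strings follow the formats shown in this section.

```typescript
interface EmailProvenance {
  messageId?: string; // e.g. the Gmail message id, when available
  sender?: string;    // fallback when no message id is available
  isoDate: string;    // ISO date of the email, e.g. "2026-05-19"
}

// Prefers the message id; falls back to the sender address.
function buildEmailDataSource(p: EmailProvenance): string {
  if (p.messageId) {
    return "email message_id=" + p.messageId + " " + p.isoDate;
  }
  if (p.sender) {
    return "email from=" + p.sender + " " + p.isoDate;
  }
  throw new Error("data_source needs a message id or a sender address");
}
```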

## Schema Registration

- `issue` — defined in `src/services/issues/seed_schema.ts` (global, seeded at startup).
- `pull_request` — defined in `src/services/schema_definitions.ts` (static bootstrap, `ENTITY_SCHEMAS`).
- `organization` / `company` — defined in `src/services/schema_definitions.ts`.
- `project` — defined in `src/services/schema_definitions.ts`.

## Related Documents

- [`issues.md`](issues.md) — Neotoma-native issue tracking and GitHub mirror pipeline
- [`docs/developer/mcp/instructions.md`](../developer/mcp/instructions.md) — `[GITHUB ENTITY EXTRACTION]` section with inline extraction rules for agents
- [`record_types.md`](record_types.md) — Full catalog of application-level entity types
- [`relationships.md`](relationships.md) — Relationship types (REFERS_TO, EMBEDS, PART_OF)
34 changes: 34 additions & 0 deletions docs/subsystems/record_types.md
@@ -58,6 +58,7 @@ Application types for notes, documents, messages, tasks, projects, and events.
| `task` | Action items with status | Founders & Small Teams |
| `project` | Multi-step initiatives | Founders & Small Teams |
| `event` | Meetings, appointments, calendar events | All Tier 1 ICPs |
| `pull_request` | GitHub pull requests extracted from email or chat | Founders & Small Teams |

**Rationale:** These types support core Tier 1 workflows:
- **AI-Native Operators:** Research synthesis (document, note), communication tracking (message)
- **Knowledge Workers:** Due diligence (document), legal research (document, note), client work (message)
@@ -302,6 +303,39 @@ const EVENT_PATTERNS = {
location: /(?:location|where)[\s:]*([A-Za-z0-9\s,.-]+)/i,
};
```
#### Pull Request
A GitHub pull request extracted from email, chat messages, or other external records. Identity is `repo + number` (e.g. `markmhendrickson/neotoma#42`). See [`github_entities.md`](./github_entities.md) for extraction rules.

**Required Fields:**
- `number`: number — PR number (e.g. `42`)
- `repo`: string — `owner/name` (e.g. `markmhendrickson/neotoma`)

**Optional Fields:**
- `url`: string — Full PR URL (e.g. `https://github.com/owner/repo/pull/42`)
- `title`: string — PR title when parseable
- `body`: string — PR description when available
- `status`: string — `open`, `merged`, or `closed`
- `author`: string — GitHub login of PR author (auto-links to `contact` via reference_fields)
- `base_branch`: string — Target branch
- `head_branch`: string — Source branch
- `linked_issues`: string — Issue reference(s) linked to this PR (auto-links to `issue` via reference_fields)
- `created_at`: ISO 8601 date — PR creation timestamp (emits `pull_request_created` event)
- `merged_at`: ISO 8601 date — Merge timestamp (emits `pull_request_merged` event)
- `closed_at`: ISO 8601 date — Close timestamp (emits `pull_request_closed` event)
- `data_source`: string — Provenance string (e.g. `email message_id=<id> <ISO-date>`)
- `source_quote`: string — Verbatim snippet from the originating record

**Identity rule:** `[{ composite: ["number", "repo"] }]` with `url` as fallback.

**Aliases accepted by resolver:** `pr`, `github_pr`, `merge_request`.

**Reference fields (auto-linked at store time):**
- `author` → `contact` (REFERS_TO)
- `repo` → `github_repo` (REFERS_TO)
- `linked_issues` → `issue` (REFERS_TO)

**Schema registration:** `src/services/schema_definitions.ts` (`ENTITY_SCHEMAS["pull_request"]`).
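
The timestamp fields above are described as emitting timeline events. A minimal sketch of that mapping, assuming one event per populated timestamp, is shown below; `TimelineEvent` and `derivePullRequestEvents` are hypothetical names and do not reflect the actual event pipeline.

```typescript
interface PullRequestTimestamps {
  created_at?: string;
  merged_at?: string;
  closed_at?: string;
}

interface TimelineEvent {
  event_type: string;
  at: string;
}

// Emits one event for each timestamp field that is present.
function derivePullRequestEvents(pr: PullRequestTimestamps): TimelineEvent[] {
  const events: TimelineEvent[] = [];
  if (pr.created_at) {
    events.push({ event_type: "pull_request_created", at: pr.created_at });
  }
  if (pr.merged_at) {
    events.push({ event_type: "pull_request_merged", at: pr.merged_at });
  }
  if (pr.closed_at) {
    events.push({ event_type: "pull_request_closed", at: pr.closed_at });
  }
  return events;
}
```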

### 4.3 Knowledge Types
#### Contact
**Required Fields:**
14 changes: 8 additions & 6 deletions docs/testing/automated_test_catalog.md
@@ -61,19 +61,19 @@ flowchart TD
- Do not hand-edit suite inventory entries in this file. Update the generator or the repository tree, then regenerate.

## Repo-wide summary
- Total automated test files: **393**
- Backend and repo Vitest files: **360**
- Frontend Vitest files: **9**
- Playwright spec files: **24**

### Suite counts
| Suite | Files |
|---|---:|
| Vitest unit tests | 97 |
| Vitest service tests | 33 |
| Source-adjacent tests | 45 |
| Vitest integration tests | 106 |
| Vitest CLI tests | 60 |
| Vitest contract tests | 10 |
| Vitest security tests | 1 |
| Vitest subscription tests | 3 |
@@ -107,7 +107,7 @@ flowchart TD
**Runner:** `vitest`
**Command:** `npm test -- tests/unit`
**Requirements:** Basic `.env` if required by the module under test.
**Files (97):**
- `tests/unit/aauth_admission.test.ts`
- `tests/unit/aauth_attestation_apple_se.test.ts`
- `tests/unit/aauth_attestation_revocation.test.ts`
@@ -181,6 +181,7 @@ flowchart TD
- `tests/unit/parquet_reader.test.ts`
- `tests/unit/product_feedback_schema.test.ts`
- `tests/unit/protected_entity_types.test.ts`
- `tests/unit/pull_request_schema.test.ts`
- `tests/unit/relationship_batch_schemas.test.ts`
- `tests/unit/relationship_reducer.test.ts`
- `tests/unit/request_context.test.ts`
@@ -415,7 +416,7 @@ flowchart TD
**Runner:** `vitest`
**Command:** `npm test -- tests/cli`
**Requirements:** Basic `.env`; some tests provision temp config homes automatically.
**Files (60):**
- `tests/cli/api_client_offline_fallback.test.ts`
- `tests/cli/backup_verify.test.ts`
- `tests/cli/cli_access_commands.test.ts`
@@ -435,6 +436,7 @@ flowchart TD
- `tests/cli/cli_ingest_remote_upload.test.ts`
- `tests/cli/cli_init_commands.test.ts`
- `tests/cli/cli_init_env_targeting.test.ts`
- `tests/cli/cli_init_flags.test.ts`
- `tests/cli/cli_init_interactive.test.ts`
- `tests/cli/cli_issues_commands.test.ts`
- `tests/cli/cli_mcp_commands.test.ts`