perf(filtering): split METADATA into thin index + fat content table

## Problem

The `filtering::delete` function re-sequences `_subset_` primary keys after removing rows to maintain the dense 0-based ID invariant required by the PLAID vector index. On large indices (~90K rows, ~5GB database), this takes ~1s even for single-file updates.

The root cause is that `_subset_` is the `INTEGER PRIMARY KEY` (i.e. the rowid). Changing it relocates the row in the B-tree, which requires reading and rewriting all ~27KB per row (including overflow pages holding large TEXT columns like `code`, `signature`, etc).

Benchmarks on a 20K-row table with 27KB rows, shifting ~19K rows:

| Approach | Time |
|----------|------|
| `_subset_` as PK (current) | 0.94s |
| Autoincrement PK + indexed `_subset_` | 0.21s |
| Thin table (no fat columns) | 0.008s |

The thin table is ~115x faster than the current approach because SQLite never touches overflow pages.

## Proposal

Split METADATA into two tables:

**METADATA** (thin, ~5-10MB for 90K rows):
```sql
_subset_ INTEGER PRIMARY KEY,      -- dense 0-based, re-sequenced on delete
_content_id_ INTEGER NOT NULL,     -- FK to METADATA_CONTENT (stable, monotonic)
file TEXT,
name TEXT,
qualified_name TEXT,
line INTEGER,
end_line INTEGER,
language TEXT,
unit_type TEXT,
complexity INTEGER,
has_loops INTEGER,
has_branches INTEGER,
has_error_handling INTEGER
```

**METADATA_CONTENT** (fat, ~2.5GB for 90K rows):
```sql
_content_id_ INTEGER PRIMARY KEY,  -- stable monotonic, never re-sequenced
code TEXT,
signature TEXT,
docstring TEXT,
parameters TEXT,
calls TEXT,
called_by TEXT,
variables TEXT,
imports TEXT,
return_type TEXT,
extends TEXT,
parent_class TEXT
```

## Impact on public API

The public interface (`create`, `update`, `delete`, `get`, `where_condition`) stays the same. Internal changes only:

| Function | Change |
|----------|--------|
| `create` / `update` | INSERT into both tables |
| `delete` | Re-sequence thin table only (~8ms); cascade-delete orphaned content rows |
| `get` | JOIN on `_content_id_` |
| `where_condition` | No change (filters on thin columns only) |
| `where_condition_regexp` | JOIN when pattern targets a content column |
| `count` / `exists` | No change |

## Migration

Use `PRAGMA user_version` to track schema version:
- v0: current single-table layout
- v1: split layout

On first load with new code, detect v0 and run a one-time migration:
1. CREATE METADATA_CONTENT, populate from existing fat columns
2. Rebuild METADATA as thin table (CREATE new, INSERT, DROP old, RENAME)
3. Set `PRAGMA user_version = 1`
4. VACUUM

Pure row shuffling, no re-encoding needed. ~30-60s on a 5GB database. Transaction ensures no partial state on interruption.

## Context

Complements PR #135 which addresses the other half of incremental delete performance (IVF patching). Together these would bring a small incremental update from ~60s to well under 1s on large indices.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(filtering): split METADATA into thin index + fat content table #136

Problem

Proposal

Impact on public API

Migration

Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Approach	Time
`_subset_` as PK (current)	0.94s
Autoincrement PK + indexed `_subset_`	0.21s
Thin table (no fat columns)	0.008s

Function	Change
`create` / `update`	INSERT into both tables
`delete`	Re-sequence thin table only (~8ms); cascade-delete orphaned content rows
`get`	JOIN on `_content_id_`
`where_condition`	No change (filters on thin columns only)
`where_condition_regexp`	JOIN when pattern targets a content column
`count` / `exists`	No change

perf(filtering): split METADATA into thin index + fat content table #136

Description

Problem

Proposal

Impact on public API

Migration

Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions