feat(capture): add S1 app parser registry with golden fixture tests by heming-gmh · Pull Request #15 · Einsia/OpenChronicle

heming-gmh · 2026-04-26T17:00:48Z

Summary

Introduce a lightweight, baseline-plus-patches parser registry for the S1 capture enrichment layer, so the project can grow from one monolithic module (s1_parser.py) into a family of small, independently-testable app parsers — without changing a single byte of downstream behaviour.

Why

The current s1_parser.enrich(capture) merges three responsibilities in one module: generic AX→S1 extraction, browser URL extraction, and (on another branch) editor/terminal metadata. This works for a small surface area, but it does not scale as the project adds parsers for Slack, Linear, Office, Figma, Notion, terminals, and other high-value contexts.

This PR splits infrastructure from implementation.

The Baseline + Patches Model

enrich() always computes a generic baseline (focused element, visible text, url=None).
Registered AppParser implementations then run in priority order.
Each parser may return an S1Patch that selectively overrides fields.
Later parsers can compose on top of earlier ones — e.g. a Linear parser can match linear.app in the URL that the browser parser already extracted.

This is intentionally not a first-match-wins dispatch. Composition lets an app that is both a browser AND a special-purpose tool (e.g. a browser-based IDE) get both URL extraction AND app-specific metadata.
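The composition described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the field layout of `S1Fields` and `S1Patch`, the `parse()` signature, the `address_bar` capture key, and the convention that a lower `priority` runs first are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional, Protocol


@dataclass(frozen=True)
class S1Fields:
    # Hypothetical layout; the real type lives in app_parsers/base.py.
    focused_element: Optional[str]
    visible_text: str
    url: Optional[str] = None
    app_context: Optional[dict] = None


@dataclass(frozen=True)
class S1Patch:
    # None means "leave this field untouched" (an assumption of this sketch).
    url: Optional[str] = None
    app_context: Optional[dict] = None


class AppParser(Protocol):
    # Hypothetical members; the PR only states that the protocol has four.
    name: str
    priority: int

    def parse(self, capture: dict, current: S1Fields) -> Optional[S1Patch]: ...


def apply_parsers(capture: dict, baseline: S1Fields, parsers: list) -> S1Fields:
    """Run every parser in priority order; each one sees the fields so far."""
    fields = baseline
    for parser in sorted(parsers, key=lambda p: p.priority):
        patch = parser.parse(capture, fields)
        if patch is None:
            continue
        if patch.url is not None:
            fields = replace(fields, url=patch.url)
        if patch.app_context is not None:
            fields = replace(fields, app_context=patch.app_context)
    return fields


class BrowserParser:
    name = "browser"
    priority = 10

    def parse(self, capture, current):
        url = capture.get("address_bar")  # hypothetical capture key
        return S1Patch(url=url) if url else None


class LinearParser:
    name = "linear"
    priority = 20

    def parse(self, capture, current):
        # Composes on top of the URL that the browser parser already set.
        if current.url and "linear.app" in current.url:
            return S1Patch(app_context={"app": "linear"})
        return None


capture = {"visible_text": "Issue ABC-1", "address_bar": "https://linear.app/team/issue/ABC-1"}
baseline = S1Fields(focused_element=None, visible_text=capture["visible_text"])
# Registration order does not matter here; priority decides the run order.
result = apply_parsers(capture, baseline, [LinearParser(), BrowserParser()])
```

With first-match-wins dispatch the Linear parser would never see the URL; here both patches land on the same `S1Fields` value.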

Benefits

Reviewable parsers: golden fixture tests (input.json → expected.json) let maintainers inspect one raw AX input and one expected S1 output for each parser PR.
Fault isolation: a broken parser never takes down the whole capture pipeline — exceptions are logged and the baseline fields still ship.
Zero downstream churn: focused_element, visible_text, and url are unchanged; timeline, FTS indexing, and dedup hashing all work without modification.
Low ceremony for contributors: a future parser author only needs to implement AppParser (4 members), register it, add a fixture directory, and open a PR.
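The fault-isolation point can be illustrated with a minimal sketch. It assumes parsers are plain callables that return a dict of overrides, which is a simplification of the `AppParser` protocol; the function and logger names are hypothetical.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("s1_sketch")  # hypothetical logger name


def apply_safely(baseline: dict, parsers: list[Callable[[dict], Optional[dict]]]) -> dict:
    """Apply each parser, logging and skipping any that raises."""
    result = dict(baseline)
    for parser in parsers:
        try:
            patch = parser(result)
        except Exception:
            # A broken parser must not take down the capture pipeline.
            logger.exception("parser %s failed; keeping current fields",
                             getattr(parser, "__name__", repr(parser)))
            continue
        if patch:
            result.update(patch)
    return result


def broken_parser(fields: dict) -> Optional[dict]:
    raise RuntimeError("unexpected AX tree shape")


def url_parser(fields: dict) -> Optional[dict]:
    return {"url": "https://example.com"}


baseline = {"focused_element": None, "visible_text": "hello", "url": None}
out = apply_safely(baseline, [broken_parser, url_parser])
```

The failing parser is logged and skipped, the later parser still runs, and the baseline fields are returned either way.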

What Changed

New files

| File | Purpose |
| --- | --- |
| `app_parsers/__init__.py` | Parser registry — `register()`, `apply_parsers()`, builtin auto-registration |
| `app_parsers/base.py` | `FocusedElement`, `ParseContext`, `S1Fields`, `S1Patch`, `AppParser` Protocol |
| `app_parsers/browser.py` | `BrowserParser` — migrated from `s1_parser._extract_url`, behaviour identical |
| `tests/test_s1_registry.py` | 5 unit tests: priority ordering, patch composition, exception resilience, `app_context` write semantics |
| `tests/test_s1_golden.py` | Parametrised golden fixture runner (4 fixtures) |
| `tests/test_s1_integration.py` | 4 integration tests: scheduler+enrich+FTS pipeline, timeline `_format_events` rendering |
| `tests/fixtures/s1/*/` | 4 fixture directories (`chrome_url`, `safari_bare_domain`, `generic_cursor_textarea`, `non_browser_url_textfield`) |
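A golden fixture runner of the kind listed above might look like the sketch below. It is written against the standard library only so it runs on its own; `discover_fixtures`, `check_fixture`, and `fake_enrich` are hypothetical names, and the real runner in `tests/test_s1_golden.py` may be structured differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def discover_fixtures(root: Path) -> list[Path]:
    """Return every directory under root holding input.json and expected.json."""
    return sorted(
        p for p in root.iterdir()
        if p.is_dir()
        and (p / "input.json").is_file()
        and (p / "expected.json").is_file()
    )


def check_fixture(fixture_dir: Path, enrich: Callable[[dict], dict]) -> None:
    """Feed input.json to enrich() and compare against expected.json."""
    capture = json.loads((fixture_dir / "input.json").read_bytes())
    expected = json.loads((fixture_dir / "expected.json").read_bytes())
    actual = enrich(capture)
    assert actual == expected, f"{fixture_dir.name}: {actual!r} != {expected!r}"


def fake_enrich(capture: dict) -> dict:
    # Stand-in for s1_parser.enrich(); the real function works on AX captures.
    return {"visible_text": capture["text"], "url": None}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    case = root / "generic_textarea"
    case.mkdir()
    (case / "input.json").write_text(json.dumps({"text": "hello"}), encoding="utf-8")
    (case / "expected.json").write_text(
        json.dumps({"visible_text": "hello", "url": None}), encoding="utf-8"
    )
    checked = []
    for fixture_dir in discover_fixtures(root):
        check_fixture(fixture_dir, fake_enrich)
        checked.append(fixture_dir.name)
```

Under pytest, the loop would typically become `@pytest.mark.parametrize("fixture_dir", discover_fixtures(ROOT), ids=lambda p: p.name)`, so each fixture directory is reported as its own test case.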

Modified files

| File | Change |
| --- | --- |
| `s1_parser.py` | Remove `_BROWSER_BUNDLES`, `_extract_url`; rewrite `enrich()` as baseline + `apply_parsers()`; re-export `FocusedElement` from `base.py` |
| `tests/conftest.py` | Add `autouse` fixture to reset parser registry before each test |
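The reason for resetting the registry is that it is module-level state: a parser registered by one test would otherwise leak into the next. A minimal sketch, assuming a list-backed registry and a reset that restores only the builtin parsers (the names `_REGISTRY`, `reset_registry`, and `registered` are hypothetical):

```python
_REGISTRY: list = []  # module-level state shared by every test in the process


class BrowserParser:
    priority = 10


class FakeTestParser:
    priority = 5


def register(parser) -> None:
    """Add a parser and keep the registry sorted by priority."""
    _REGISTRY.append(parser)
    _REGISTRY.sort(key=lambda p: p.priority)


def reset_registry() -> None:
    """Drop everything, then restore the builtin parsers (assumed behaviour)."""
    _REGISTRY.clear()
    register(BrowserParser())


def registered() -> tuple:
    return tuple(_REGISTRY)


# In tests/conftest.py this could be wired up as (requires pytest):
#
#     @pytest.fixture(autouse=True)
#     def _reset_parser_registry():
#         reset_registry()
#         yield

reset_registry()
register(FakeTestParser())  # what a single test might do
during_test = [type(p).__name__ for p in registered()]
reset_registry()            # what the autouse fixture does before the next test
after_reset = [type(p).__name__ for p in registered()]
```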

Not in this PR

No Slack, Linear, Office, editor, or terminal parser behaviour.
No dynamic plugin loading or entry-point discovery.
No schema_version change.
No AppleScript, browser automation, OCR, or new permissions.

Test Plan

9 existing test_s1_parser.py tests pass unchanged
5 registry unit tests cover ordering, composition, error isolation, app_context semantics
4 golden fixture tests cover browser + non-browser URL extraction scenarios
4 integration tests cover scheduler→enrich→FTS and timeline _format_events contracts
Full test suite: 82/82 pass

Backwards Compatibility

enrich(capture) produces exactly the same focused_element, visible_text, and url values as before. Downstream consumers (timeline aggregator, FTS indexer, content-dedup hasher) require zero changes.

Introduce a lightweight, baseline-plus-patches parser registry for the S1 capture enrichment layer. App-parser logic is now isolated behind a simple protocol (AppParser) and composed in priority order, so future app-specific parsers can be contributed one at a time without touching the core enrichment code.

The generic baseline always produces focused_element, visible_text, and url=None. Registered parsers then run in priority order and may selectively override fields via S1Patch. This lets parsers compose: for example, a future Linear parser can match linear.app in the URL that the browser parser already extracted.

Key changes:
- Move FocusedElement and parser base types to app_parsers/base.py
- Move browser URL extraction into BrowserParser (app_parsers/browser.py)
- Wire enrich() to run baseline + apply_parsers() from registry
- Add 4 golden fixture tests (chrome_url, safari_bare_domain, generic_cursor_textarea, non_browser_url_textfield)
- Add 5 registry unit tests (priority, composition, exception resilience, app_context write semantics)
- Add 4 integration tests (scheduler + FTS, timeline _format_events)
- Add autouse conftest fixture to reset registry between tests

Zero behaviour change to enrich() output. All 82 tests (existing and new) pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a modular app parser registry for the S1 enrichment pipeline, refactoring the existing logic into a more extensible architecture. It migrates browser-specific URL extraction to a dedicated parser and adds a comprehensive suite of unit, integration, and golden fixture tests. Feedback suggests implementing a deep merge for "app_context" to support better parser composition and switching to "read_bytes()" for JSON loading in tests to ensure robust encoding handling.
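The `read_bytes()` suggestion rests on the fact that `json.loads` accepts `bytes` and detects UTF-8, UTF-16, or UTF-32 itself, whereas `Path.read_text()` without an explicit `encoding` falls back to the platform default. A minimal sketch (the helper name `load_json_fixture` is hypothetical):

```python
import json
import tempfile
from pathlib import Path


def load_json_fixture(path: Path) -> dict:
    # Passing bytes lets json.loads pick the encoding, so the result does not
    # depend on the locale of the machine running the tests.
    return json.loads(path.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "input.json"
    fixture.write_text(
        json.dumps({"visible_text": "café"}, ensure_ascii=False), encoding="utf-8"
    )
    loaded = load_json_fixture(fixture)
```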

gemini-code-assist Bot reviewed Apr 26, 2026

Comment thread src/openchronicle/capture/app_parsers/__init__.py

Comment thread tests/test_s1_golden.py Outdated

Comment thread tests/test_s1_integration.py Outdated

Comment thread tests/test_s1_integration.py Outdated

fix: use read_bytes() instead of read_text() for JSON loading in tests

11a4b58

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

heming-gmh mentioned this pull request Apr 26, 2026

capture: add editor and terminal S1 field extraction #13

Open

7 tasks

heming-gmh wants to merge 2 commits into Einsia:main from heming-gmh:feat/s1-parser-registry-golden-tests
