
Implement Synix v0.1 - declarative pipeline for AI conversation exports#1

Open
marklubin wants to merge 2 commits into main from v0.1-implementation

Conversation

@marklubin (Owner)

Summary

This PR implements the complete Synix v0.1 pipeline system, including both Phase 1a (core pipeline) and Phase 1b (extended capabilities).

Phase 1a - Core Pipeline

  • Two-layer storage architecture (control.db + artifacts.db)
  • TransformStep (1:1 processing) and AggregateStep (N:1 with time-based grouping)
  • Claude and ChatGPT export source parsers
  • FTS5 full-text search with automatic triggers
  • Incremental processing via materialization keys (skip already-processed records)
  • CLI commands: init, run, status, search, plan, runs
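
The FTS5 setup described above can be sketched with plain SQLite. The `records` table and trigger names here are illustrative assumptions, not the actual schemas in src/synix/db/; the point is the external-content FTS5 table kept in sync by triggers:

```python
import sqlite3

# Minimal sketch of FTS5 full-text search with automatic sync triggers.
# Table/trigger names are hypothetical; Synix's real models are SQLAlchemy-based.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, content TEXT);
CREATE VIRTUAL TABLE records_fts USING fts5(
    content, content='records', content_rowid='id'
);
CREATE TRIGGER records_ai AFTER INSERT ON records BEGIN
  INSERT INTO records_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER records_ad AFTER DELETE ON records BEGIN
  INSERT INTO records_fts(records_fts, rowid, content)
  VALUES ('delete', old.id, old.content);
END;
""")
conn.execute("INSERT INTO records (content) VALUES ('talked about pipeline design')")
hits = conn.execute(
    "SELECT rowid FROM records_fts WHERE records_fts MATCH 'pipeline'"
).fetchall()
```

Because the FTS index is maintained by triggers, callers never write to `records_fts` directly, so the index cannot drift from the source table.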

Phase 1b - Extended Capabilities

  • FoldStep: Sequential processing with state accumulation (like reduce/fold)
  • MergeStep: Fan-in operation combining records from multiple upstream sources
  • Artifact publishing: File surface for exporting results to filesystem
  • CLI export command: synix export <step> [--format markdown|json|text]
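
The FoldStep semantics (sequential processing with state accumulation) can be sketched without any LLM involvement. Here `step_fn` is a stand-in for the `prompt=evolve` call; this is a toy reduce, not the actual FoldStep implementation:

```python
from typing import Callable, Iterable

# Toy sketch of FoldStep: thread an accumulated state through each record
# in order, like functools.reduce. `step_fn` stands in for the LLM prompt.
def fold(
    records: Iterable[str],
    step_fn: Callable[[str, str], str],
    initial_state: str = "",
) -> str:
    state = initial_state
    for record in records:
        state = step_fn(state, record)  # each call sees all prior context
    return state

# Hypothetical step function: append a one-line note per record.
narrative = fold(["met Alice", "shipped v0.1"], lambda s, r: s + f"- {r}\n")
```

Unlike TransformStep (1:1) or AggregateStep (N:1 per group), a fold is inherently sequential: each record's output depends on everything processed before it.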

Example Usage

from synix import Pipeline

pipeline = Pipeline("personal-memory", agent="mark")

# Multiple sources
pipeline.source("claude", file="exports/claude.json", format="claude-export")
pipeline.source("chatgpt", file="exports/chatgpt.json", format="chatgpt-export")

# Parallel transforms
pipeline.transform("claude-sum", from_="claude", prompt=summarize)
pipeline.transform("gpt-sum", from_="chatgpt", prompt=summarize)

# Merge multiple sources
pipeline.merge("all-summaries", sources=["claude-sum", "gpt-sum"], prompt=combine)

# Fold for evolving narrative
pipeline.fold("narrative", from_="all-summaries", prompt=evolve, initial_state="")

# Publish artifact
pipeline.artifact("report", from_="narrative", surface="file://output/report.md")

pipeline.run()
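
Incremental processing via materialization keys could work roughly as follows. This is an assumption about Synix internals, with hypothetical names: a key is derived from the step, input record, and config, and work is skipped when that key was already materialized:

```python
import hashlib

# Sketch only: derive a deterministic key from step name, input record id,
# and a hash of the step's config, then skip work if the key already exists.
def materialization_key(step_name: str, record_id: str, config_hash: str) -> str:
    raw = f"{step_name}:{record_id}:{config_hash}"
    return hashlib.sha256(raw.encode()).hexdigest()

seen: set[str] = set()  # stands in for a lookup against control.db

def maybe_process(step_name: str, record_id: str, config_hash: str, process) -> str:
    key = materialization_key(step_name, record_id, config_hash)
    if key in seen:
        return "skipped"
    seen.add(key)
    process()
    return "processed"

first = maybe_process("claude-sum", "rec-1", "cfg-a", lambda: None)
again = maybe_process("claude-sum", "rec-1", "cfg-a", lambda: None)
```

Hashing the config into the key means editing a prompt naturally invalidates prior outputs, while an unchanged pipeline re-run is nearly free.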

Architecture

src/synix/
├── cli.py              # Click CLI commands
├── config.py           # Pydantic settings
├── db/                 # SQLAlchemy models (control + artifacts)
├── llm/                # OpenAI-compatible LLM client
├── pipeline.py         # Core Pipeline class
├── services/           # Business logic (records, search, runs)
├── sources/            # Claude/ChatGPT parsers
├── steps/              # Transform, Aggregate, Fold, Merge
└── surfaces/           # Artifact publishing (file://)
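
The two-layer storage split can be illustrated with two separate SQLite databases: control.db for run bookkeeping, artifacts.db for generated records. Table names here are illustrative, not the actual SQLAlchemy models in src/synix/db/:

```python
import sqlite3

# Hedged sketch of the two-layer architecture: run metadata and artifact
# content live in separate database files, so artifacts can be rebuilt or
# archived without touching orchestration state.
control = sqlite3.connect(":memory:")    # stands in for control.db
artifacts = sqlite3.connect(":memory:")  # stands in for artifacts.db

control.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, step TEXT, status TEXT)")
artifacts.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, step TEXT, content TEXT)")

control.execute("INSERT INTO runs (step, status) VALUES ('claude-sum', 'completed')")
artifacts.execute("INSERT INTO records (step, content) VALUES ('claude-sum', 'summary text')")
```

Keeping the layers in separate files also lets tooling like search and export open only artifacts.db, while run/status commands read only control.db.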

Test Plan

  • 115 tests passing (72 unit + 43 E2E)
  • Linting passes (ruff check)
  • Format check passes (ruff format --check)

Commit 1 of 2:

Synix is a text processing pipeline system for converting AI conversation
exports into searchable, structured memory artifacts.

Phase 1a - Core Pipeline:
- Two-layer storage (control.db + artifacts.db)
- TransformStep (1:1) and AggregateStep (N:1 with grouping)
- Claude and ChatGPT source parsers
- FTS5 search with triggers
- Incremental processing via materialization keys
- CLI: init, run, status, search, plan, runs

Phase 1b - Extended Capabilities:
- FoldStep for sequential state accumulation
- MergeStep for combining multiple upstream sources
- Artifact publishing to filesystem (file:// surface)
- CLI export command
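
The MergeStep fan-in listed above can be sketched as a flattening of several upstream record streams into one, with a plain string join standing in for the combining LLM prompt (names and shapes here are illustrative assumptions):

```python
from itertools import chain

# Toy sketch of MergeStep fan-in: combine records from multiple upstream
# steps into a single stream before the combining prompt runs.
def merge(upstreams: dict[str, list[str]]) -> str:
    combined = list(chain.from_iterable(upstreams.values()))
    return "\n".join(combined)  # stands in for the `prompt=combine` call

result = merge({"claude-sum": ["a"], "gpt-sum": ["b", "c"]})
```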

Example usage:
    pipeline = Pipeline("personal-memory", agent="mark")
    pipeline.source("claude", file="exports/claude.json", format="claude-export")
    pipeline.transform("summaries", from_="claude", prompt=summarize)
    pipeline.merge("combined", sources=["a", "b"], prompt=combine)
    pipeline.fold("narrative", from_="combined", prompt=evolve)
    pipeline.artifact("report", from_="narrative", surface="file://output/report.md")
    pipeline.run()

Tests: 115 passing (72 unit + 43 E2E)
Commit 2 of 2:

Establishes project guide for agent continuity across sessions:
- Development commands and architecture reference
- Prioritized backlog tracking (drill-down API, semantic search, branching)
- Session journaling protocol with first entry documenting Phase 1b
- Execution tracking checklist mapped to DESIGN.md sections
marklubin added a commit that referenced this pull request Mar 10, 2026
Closes #62

P0 trust/correctness:
- Resolve relative source_dir/build_dir against pipeline file, not cwd (#3)
- Clear synix_dir on --build-dir override to prevent stale routing (#4)
- Propagate source load failures instead of silently succeeding (#5)
- Add Layer.level read-only property to fix info crash (#8)
- Rewrite info/status to read .synix/ snapshot store, not legacy build/ (#9)
- Diff uses RefStore run history instead of legacy versions/ dir (#11)

P1 operator consistency:
- Planner uses estimated-count placeholders for downstream cardinality (#1)
- Standardize invalid ref handling to sys.exit(1) across all inspectors (#10)
- Clean also removes refs/releases/ ref files (#12)

P2 docs/discoverability:
- Mesh commands honor SYNIX_MESH_ROOT env var via resolve_mesh_root() (#2)
- Batch planner tracks DAG cardinality instead of estimate_output_count(1) (#6)
- Fix llms.txt diff syntax to match actual CLI (#7)
- Add refs/plans to refs list prefix scan (#13)
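
The #3 fix (resolving relative source_dir/build_dir against the pipeline file rather than the cwd) comes down to a small path computation. This is a sketch of the idea with hypothetical names, not the actual Synix code:

```python
from pathlib import Path

# Sketch of the fix: anchor a relative source_dir at the directory that
# contains the pipeline file, so behavior does not depend on where the
# CLI happens to be invoked from.
def resolve_source_dir(pipeline_file: str, source_dir: str) -> Path:
    base = Path(pipeline_file).resolve().parent
    return (base / source_dir).resolve()

resolved = resolve_source_dir("/proj/pipeline.py", "exports")
```

Absolute source_dir values pass through unchanged, since joining a path with an absolute path yields the absolute path.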
