Atlas

Advanced Technical Library and Archival System

Historical design documentation for The OASIS Project. These documents capture architectural decisions, completed feature designs, and implementation records that shaped the project. They are preserved here for reference after being retired from active repositories.

DAWN Archive

Design documents from the DAWN voice assistant, organized by subsystem.

Core Architecture

Document	Description
MULTI_THREADED_CORE_DESIGN	Multi-threaded core: session manager, worker pool, per-session history, metrics
UNIFIED_COMMAND_PLAN	Unified command registry replacing fragmented callback system
UNIFIED_LOGGING_DESIGN	Canonical logging.h/logging.c shared byte-identically across DAWN, ECHO, MIRAGE, and STAT (OLOG_* namespace, syslog + console suppression + ms timestamps, copy-based sync)
CONFIG_FILE_DESIGN	Full TOML configuration schema (dawn.toml, secrets.toml)
CONFIG_SYSTEM_PLAN	Config system core infrastructure (Phase 1)
PERFORMANCE_ANALYSIS	Benchmark data vs industry (ASR latency, LLM throughput, end-to-end)
SECURITY_AUDIT	Static code audit: 15 findings across ~58K LOC (Dec 2025)
USER_AUTH_DESIGN	User authentication: deployment modes, dawn-admin CLI, setup wizard, multi-user, RBAC, session mgmt, audit log, IP rate limit, DAP2 registration key (Phases 0–4)
TUI_IMPLEMENTATION_PLAN	Console TUI for real-time monitoring and statistics

Speech and Audio

Document	Description
DAWN_ASR_UPGRADE_PLAN	Vosk-to-Whisper ASR migration plan
VAD_IMPLEMENTATION_NOTES	Silero VAD model selection rationale and Week 1 implementation
PHASE_2_3_IMPLEMENTATION_PLAN	Streaming ASR with Silero VAD + Whisper chunking (v1)
PHASE_2_3_REVISED_PLAN	Revised plan after whisper.cpp investigation (v2)
PHASE_2_3_FINAL_DECISIONS	Final implementation decisions — architecture review 9.0/10
AEC_DELAY_CALIBRATION	Auto-calibrate AEC delay using TTS boot greeting
AEC_IMPLEMENTATION_STATUS	Native 48kHz AEC with WebRTC AEC3 — working state
AEC_IMPLEMENTATION_GUIDE	WebRTC AEC3 setup, resampling strategy, tuning parameters

LLM Integration

Document	Description
STREAMING	SSE streaming for OpenAI, Claude, and llama.cpp
STREAMING_ARCHITECTURE	Streaming response architecture diagram and flow
LLM_INTERRUPT_IMPLEMENTATION	Non-blocking LLM interrupt with threading architecture
LLM_RATE_LIMIT_DESIGN	Client-side rate limiter (sliding window, RPM-based)
LLAMA_SERVER_OPTIMIZATION	llama.cpp server tuning for Jetson
MODEL_CONFIG_SYSTEM	Model-specific parameter optimization system
NATIVE_TOOLS_PLAN	Native tool/function calling implementation
COMMAND_TAGS_DYNAMIC_GENERATION_PLAN	Dynamic command tag generation from tool registry
SUMMARIZER_TFIDF_PLAN	Summarizer fallback fix + TF-IDF extractive summarization

WebUI

Document	Description
WEBUI_DESIGN	WebUI architecture and feature documentation
WEBUI_AESTHETIC_PLAN	"Stark-Grade" visual overhaul plan
WEBUI_SETTINGS_PLAN	Settings panel implementation
WEBUI_VISION_DESIGN	Vision/image upload support design
WEBUI_VISION_NEXT_STEPS	Vision feature implementation status
WEBUI_IMAGE_STORAGE_DESIGN	Image storage strategy (replacing inline base64)
CONVERSATION_HISTORY_DESIGN	Per-user conversation history UI
CONVERSATION_EXPORT_DESIGN	JSON conversation export
WEBUI_ALWAYS_ON_PLAN	Always-on continuous voice listening (server-side VAD, wake word, unified action button)

Satellite and Protocol

Document	Description
DAP2_DESIGN	Dawn Audio Protocol 2.0 — full design (Phases 0–4)
PLEX_INTEGRATION_DESIGN	Plex Media Server music streaming integration
SMARTTHINGS	SmartThings OAuth integration (blocked at AWS WAF)

Memory Subsystem

Memory has its own subdirectory at dawn/memory/ — see the memory README for an annotated index. Entries below are the canonical pointers from the top-level Atlas index.

Document	Description
STATE	Living snapshot. Current state of the memory subsystem — recently shipped, benchmark position, short/medium/long-term workstreams in priority order. Read first when starting memory-focused work.
SYSTEM_DESIGN	Persistent memory: entity graph, relations, facts, semantic embeddings, hybrid search, contacts, entity merge, retrieval benchmarking (Phases 1–6.7 + S4 + 13)
INJECTION_FILTER	Memory injection filter: shared blocklist module with Unicode normalization (homoglyphs, accents, fullwidth, invisible chars), ~118 patterns across 17 categories, data-marking framing, all-path coverage (tool/extraction/import), 137 unit tests
EMBEDDING_UPGRADE	bge-small-en-v1.5-int8 model swap + tech-debt cache-invalidation fix + ID-based extraction filter + per-user embedding recompute worker (schema v41). Lifted LoCoMo overall by +7.9pp and LongMemEval R@5 by +1.4pp. The cross-encoder reranker, Feature 2 of the original plan, was investigated and reverted (see RERANKER_INVESTIGATION).
CAT2_TEMPORAL	Cat-2 temporal extraction collapse (LoCoMo): failure-mode taxonomy (A1/A2/B/C/D/E). Phase 1 is conversation anchor injection (schema v42, `conversations.anchor_date`), with a live `recall_generation` result of cat-2 0.022 → 0.321 (+29.9pp) and +7.1pp overall. Phase 2 (`event_when` field) was re-scoped after Phase 1 exceeded the original L1+L2 mid-projection. Companion: CAT2_TEMPORAL_INVESTIGATION_PLAN.
RERANKER_INVESTIGATION	Cross-encoder reranker investigation: implemented (ms-marco-MiniLM-L-6-v2 int8 ONNX with CUDA EP, integration across memory + RAG paths, 5 config keys) then reverted after empirical results and literature review showed no net benefit on conversational data and only marginal lift on LongMemEval at 10× latency. Kept artifacts: shared WordPiece tokenizer (`memory_embed_tokenizer`), `rerank_shootout.py` test harness
LOCOMO_CAT3_PROFILING	LoCoMo cat-3 failure-mode profiling, session-neighbor boost (Tier 2 quick win: +3.0pp dialog overall / +20.0pp cat-3), and memory-pipeline bench mode (Tier 1, Phase 0/1/1.5): end-to-end LoCoMo evaluation against extracted memory at production parity, the `recall_reach` metric, and a Haiku 4.5 result of 0.742 overall / 0.646 cat-3 (+9.3pp / +20pp over the dialog baseline). Identifies the retrieval vs. answer-support framing for closing the gap to the leading systems.
COMPETITOR_LANDSCAPE	Frozen reference of published numbers and methodology observations from competing memory-retrieval systems on LongMemEval, LoCoMo, and ConvoMem. No DAWN-side numbers — durable competitor research that doesn't rot with shipments. Includes the MemPalace methodology audit and the retrieval-vs-end-to-end-QA distinction.

RAG (Document Search)

Document	Description
RAG_DESIGN	Document search / RAG: chunking, embeddings, hybrid semantic+keyword search, WebUI Document Library, admin management, `document_index` URL tool. Sibling to memory — shares `embedding_engine.c` but operates on `document_chunks`, not `memory_facts`.

Scheduler and Tools

Document	Description
SCHEDULER_DESIGN	Timers, alarms, reminders, scheduled tasks — fully implemented
TOOL_PLAN_EXECUTOR_DESIGN	Plan executor DSL: multi-step tool orchestration, conditionals, loops, variable binding, safety controls, sleep step
WEB_IMAGE_SEARCH	Image search tool: SearXNG, curl_multi concurrent fetch, SSRF DNS pinning, magic byte validation, LRU cache, WebUI lightbox
CALDAV_DESIGN	CalDAV calendar integration (multi-account, RFC 4791, Google OAuth, RRULE)
EMAIL_DESIGN	Email integration (IMAP/SMTP, Gmail REST API, multi-account, 10 LLM actions)
TWO_STEP_TOOL_PATTERN	Two-step tool pattern: load guidelines then execute (used by render_visual)
VISUAL_RENDERING_TOOL	Visual rendering tool: inline SVG/HTML diagrams via LLM tool calling, progress indicator investigation

Coding Harness (Code Projects + MCP Bridge)

Document	Description
CODING_HARNESS_DESIGN	Code-projects subsystem: HTTP+SSE MCP bridge to an operator-launched cbm (codebase-memory-mcp) code-graph server, in-process libgit2 clone/fetch/checkout, per-project name-translation boundary, WebUI Coding popover + dawn-admin CLI. Phase 1 (clone-and-index) + Phase 2 (branch tracking, link-local repos, refresh-vs-rebuild, multi-project namemap, cbm sharing with Claude Code). Default-OFF (DAWN_ENABLE_CODE_PROJECTS); no-subprocess invariant CI-enforced. Consolidates the Phase 1 plan, Phase 2 plan, and cbm-sharing note.
