Atlas

Advanced Technical Library and Archival System

Historical design documentation for The OASIS Project. These documents capture architectural decisions, completed feature designs, and implementation records that shaped the project. They are preserved here for reference after being retired from active repositories.

DAWN Archive

Design documents from the DAWN voice assistant, organized by subsystem.

Core Architecture

| Document | Description |
| --- | --- |
| MULTI_THREADED_CORE_DESIGN | Multi-threaded core: session manager, worker pool, per-session history, metrics |
| UNIFIED_COMMAND_PLAN | Unified command registry replacing fragmented callback system |
| UNIFIED_LOGGING_DESIGN | Canonical logging.h/logging.c shared byte-identically across DAWN, ECHO, MIRAGE, and STAT (OLOG_* namespace, syslog + console suppression + ms timestamps, copy-based sync) |
| CONFIG_FILE_DESIGN | Full TOML configuration schema (dawn.toml, secrets.toml) |
| CONFIG_SYSTEM_PLAN | Config system core infrastructure (Phase 1) |
| PERFORMANCE_ANALYSIS | Benchmark data vs industry (ASR latency, LLM throughput, end-to-end) |
| SECURITY_AUDIT | Static code audit: 15 findings across ~58K LOC (Dec 2025) |
| USER_AUTH_DESIGN | User authentication: deployment modes, dawn-admin CLI, setup wizard, multi-user, RBAC, session mgmt, audit log, IP rate limit, DAP2 registration key (Phases 0–4) |
| TUI_IMPLEMENTATION_PLAN | Console TUI for real-time monitoring and statistics |

Speech and Audio

| Document | Description |
| --- | --- |
| DAWN_ASR_UPGRADE_PLAN | Vosk-to-Whisper ASR migration plan |
| VAD_IMPLEMENTATION_NOTES | Silero VAD model selection rationale and Week 1 implementation |
| PHASE_2_3_IMPLEMENTATION_PLAN | Streaming ASR with Silero VAD + Whisper chunking (v1) |
| PHASE_2_3_REVISED_PLAN | Revised plan after whisper.cpp investigation (v2) |
| PHASE_2_3_FINAL_DECISIONS | Final implementation decisions; architecture review 9.0/10 |
| AEC_DELAY_CALIBRATION | Auto-calibrate AEC delay using TTS boot greeting |
| AEC_IMPLEMENTATION_STATUS | Native 48 kHz AEC with WebRTC AEC3 (working state) |
| AEC_IMPLEMENTATION_GUIDE | WebRTC AEC3 setup, resampling strategy, tuning parameters |

LLM Integration

| Document | Description |
| --- | --- |
| STREAMING | SSE streaming for OpenAI, Claude, and llama.cpp |
| STREAMING_ARCHITECTURE | Streaming response architecture diagram and flow |
| LLM_INTERRUPT_IMPLEMENTATION | Non-blocking LLM interrupt with threading architecture |
| LLM_RATE_LIMIT_DESIGN | Client-side rate limiter (sliding window, RPM-based) |
| LLAMA_SERVER_OPTIMIZATION | llama.cpp server tuning for Jetson |
| MODEL_CONFIG_SYSTEM | Model-specific parameter optimization system |
| NATIVE_TOOLS_PLAN | Native tool/function calling implementation |
| COMMAND_TAGS_DYNAMIC_GENERATION_PLAN | Dynamic command tag generation from tool registry |
| SUMMARIZER_TFIDF_PLAN | Summarizer fallback fix + TF-IDF extractive summarization |
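
For readers unfamiliar with the technique named in LLM_RATE_LIMIT_DESIGN, the following is a minimal sketch of a sliding-window, requests-per-minute limiter. It is illustrative only and is not taken from the DAWN sources: the class name, method names, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Allow at most `rpm` requests in any rolling window (hypothetical sketch)."""

    def __init__(self, rpm, window_seconds=60.0, clock=time.monotonic):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.rpm = rpm
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested without sleeping
        self._sent = deque()     # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if the window has room, else False."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.rpm:
            self._sent.append(now)
            return True
        return False

    def seconds_until_available(self):
        """Return how long a caller would need to wait before the next slot frees up."""
        now = self._clock()
        self._evict(now)
        if len(self._sent) < self.rpm:
            return 0.0
        return self.window_seconds - (now - self._sent[0])
```

A caller would typically check `try_acquire()` before each LLM request and, on `False`, sleep for `seconds_until_available()` or surface a "rate limited" message to the user.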

WebUI

| Document | Description |
| --- | --- |
| WEBUI_DESIGN | WebUI architecture and feature documentation |
| WEBUI_AESTHETIC_PLAN | "Stark-Grade" visual overhaul plan |
| WEBUI_SETTINGS_PLAN | Settings panel implementation |
| WEBUI_VISION_DESIGN | Vision/image upload support design |
| WEBUI_VISION_NEXT_STEPS | Vision feature implementation status |
| WEBUI_IMAGE_STORAGE_DESIGN | Image storage strategy (replacing inline base64) |
| CONVERSATION_HISTORY_DESIGN | Per-user conversation history UI |
| CONVERSATION_EXPORT_DESIGN | JSON conversation export |
| WEBUI_ALWAYS_ON_PLAN | Always-on continuous voice listening (server-side VAD, wake word, unified action button) |

Satellite and Protocol

| Document | Description |
| --- | --- |
| DAP2_DESIGN | Dawn Audio Protocol 2.0: full design (Phases 0–4) |
| PLEX_INTEGRATION_DESIGN | Plex Media Server music streaming integration |
| SMARTTHINGS | SmartThings OAuth integration (blocked at AWS WAF) |

Memory Subsystem

Memory has its own subdirectory at dawn/memory/; see the memory README for an annotated index. The entries below are the canonical pointers from the top-level Atlas index.

| Document | Description |
| --- | --- |
| STATE | Living snapshot. Current state of the memory subsystem: recently shipped, benchmark position, short/medium/long-term workstreams in priority order. Read first when starting memory-focused work. |
| SYSTEM_DESIGN | Persistent memory: entity graph, relations, facts, semantic embeddings, hybrid search, contacts, entity merge, retrieval benchmarking (Phases 1–6.7 + S4 + 13) |
| INJECTION_FILTER | Memory injection filter: shared blocklist module with Unicode normalization (homoglyphs, accents, fullwidth, invisible chars), ~118 patterns across 17 categories, data-marking framing, all-path coverage (tool/extraction/import), 137 unit tests |
| EMBEDDING_UPGRADE | bge-small-en-v1.5-int8 model swap + tech-debt cache-invalidation fix + ID-based extraction filter + per-user embedding recompute worker (schema v41). Lifted LoCoMo overall +7.9pp, LongMemEval R@5 +1.4pp. Cross-encoder reranker was Feature 2 of the original plan; it was investigated and reverted (see RERANKER_INVESTIGATION). |
| CAT2_TEMPORAL | Cat-2 temporal extraction collapse (LoCoMo): failure-mode taxonomy (A1/A2/B/C/D/E), Phase 1 = conversation anchor injection (schema v42 conversations.anchor_date), live result recall_generation cat-2 0.022 → 0.321 (+29.9pp), overall +7.1pp. Phase 2 (event_when field) re-scoped after Phase 1 exceeded the original L1+L2 mid-projection. Companion: CAT2_TEMPORAL_INVESTIGATION_PLAN |
| RERANKER_INVESTIGATION | Cross-encoder reranker investigation: implemented (ms-marco-MiniLM-L-6-v2 int8 ONNX with CUDA EP, integration across memory + RAG paths, 5 config keys) then reverted after empirical results and literature review showed no net benefit on conversational data and only marginal lift on LongMemEval at 10× latency. Kept artifacts: shared WordPiece tokenizer (memory_embed_tokenizer), rerank_shootout.py test harness |
| LOCOMO_CAT3_PROFILING | LoCoMo cat-3 failure-mode profiling, session-neighbor boost (Tier 2 quick win, +3.0pp dialog overall / +20.0pp cat-3), and memory-pipeline bench mode (Tier 1, Phase 0/1/1.5): end-to-end LoCoMo evaluation against extracted memory at production parity, recall_reach metric, Haiku 4.5 result of 0.742 / 0.646 cat-3 (+9.3pp / +20pp over dialog baseline). Identifies retrieval vs answer-support framing for closing the gap to leaders |
| COMPETITOR_LANDSCAPE | Frozen reference of published numbers and methodology observations from competing memory-retrieval systems on LongMemEval, LoCoMo, and ConvoMem. No DAWN-side numbers: durable competitor research that doesn't rot with shipments. Includes the MemPalace methodology audit and the retrieval-vs-end-to-end-QA distinction. |
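
As a rough illustration of the Unicode normalization step that INJECTION_FILTER describes (fullwidth forms, accents, invisible characters, homoglyphs), here is a small sketch using only Python's standard `unicodedata` module. It is not the DAWN implementation: the function names, the three-entry homoglyph map, and the single blocklist phrase are hypothetical placeholders, and a real filter would need a far larger confusables table and pattern set.

```python
import unicodedata

# Hypothetical, deliberately tiny examples; not the project's actual tables.
HOMOGLYPHS = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})  # Cyrillic a, e, o
BLOCKLIST = ("ignore previous instructions",)


def normalize_for_matching(text):
    """Fold visually equivalent spellings into one canonical form for matching."""
    # NFKC maps compatibility characters (e.g. fullwidth letters) to plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Remove format characters (category Cf), which covers zero-width spaces and joiners.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Case-fold, map known look-alike letters, and collapse whitespace runs.
    text = text.casefold().translate(HOMOGLYPHS)
    return " ".join(text.split())


def is_blocked(text):
    """Return True if any blocklist phrase appears in the normalized text."""
    normalized = normalize_for_matching(text)
    return any(phrase in normalized for phrase in BLOCKLIST)
```

Normalizing before matching means the pattern list only has to be written once in plain lowercase ASCII, rather than enumerating every disguised spelling of each phrase.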

RAG (Document Search)

| Document | Description |
| --- | --- |
| RAG_DESIGN | Document search / RAG: chunking, embeddings, hybrid semantic+keyword search, WebUI Document Library, admin management, document_index URL tool. Sibling to memory: shares embedding_engine.c but operates on document_chunks, not memory_facts. |

Scheduler and Tools

| Document | Description |
| --- | --- |
| SCHEDULER_DESIGN | Timers, alarms, reminders, scheduled tasks (fully implemented) |
| TOOL_PLAN_EXECUTOR_DESIGN | Plan executor DSL: multi-step tool orchestration, conditionals, loops, variable binding, safety controls, sleep step |
| WEB_IMAGE_SEARCH | Image search tool: SearXNG, curl_multi concurrent fetch, SSRF DNS pinning, magic byte validation, LRU cache, WebUI lightbox |
| CALDAV_DESIGN | CalDAV calendar integration (multi-account, RFC 4791, Google OAuth, RRULE) |
| EMAIL_DESIGN | Email integration (IMAP/SMTP, Gmail REST API, multi-account, 10 LLM actions) |
| TWO_STEP_TOOL_PATTERN | Two-step tool pattern: load guidelines then execute (used by render_visual) |
| VISUAL_RENDERING_TOOL | Visual rendering tool: inline SVG/HTML diagrams via LLM tool calling, progress indicator investigation |

Coding Harness (Code Projects + MCP Bridge)

| Document | Description |
| --- | --- |
| CODING_HARNESS_DESIGN | Code-projects subsystem: HTTP+SSE MCP bridge to an operator-launched cbm (codebase-memory-mcp) code-graph server, in-process libgit2 clone/fetch/checkout, per-project name-translation boundary, WebUI Coding popover + dawn-admin CLI. Phase 1 (clone-and-index) + Phase 2 (branch tracking, link-local repos, refresh-vs-rebuild, multi-project namemap, cbm sharing with Claude Code). Default-OFF (DAWN_ENABLE_CODE_PROJECTS); no-subprocess invariant CI-enforced. Consolidates the Phase 1 plan, Phase 2 plan, and cbm-sharing note. |
