Accessibility-first spatial companion for blind and low-vision (BLV) users. Runs fully on-device on Apple Vision Pro.
v0.1 in planning → Sprint 1.1 starting. See docs/plans/manifest.md for the spec map and docs/plans/progress.md for execution state.
Four modes, zero voice input, all local inference:
| Mode | Trigger | What it does |
|---|---|---|
| Scene | Digital Crown → scene + button | Describes what's in front of you via on-device detector + TTS |
| Recall | Crown → recall + button | 1st press: Gemma-generated room gestalt. 2nd press within 5 s: top-N nearest annotations |
| Read | Crown → read + button | Captures passthrough, runs VNRecognizeTextRequest, speaks the text |
| Silent | Crown → silent | Mutes normal speech; ambient cold-anchor speech and chimes still fire |
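
As a rough illustration of the Recall row, the two-press behavior could be tracked with a small value type. This is a sketch only: `RecallPressTracker`, `RecallAction`, and the choice that a third press starts over with the gestalt are assumptions, not names or behavior taken from the spec.

```swift
import Foundation

/// What a Recall button press should produce (hypothetical type).
enum RecallAction: Equatable {
    case roomGestalt         // 1st press
    case nearestAnnotations  // 2nd press inside the window
}

/// Hypothetical helper that remembers when the last first-press happened.
struct RecallPressTracker {
    /// "Within 5 s" from the mode table; treated here as inclusive (assumption).
    let window: TimeInterval = 5
    private var lastGestaltPress: Date?

    init() {}

    mutating func registerPress(at now: Date = Date()) -> RecallAction {
        if let last = lastGestaltPress, now.timeIntervalSince(last) <= window {
            // Assumption: after the second press, the next press starts over.
            lastGestaltPress = nil
            return .nearestAnnotations
        }
        lastGestaltPress = now
        return .roomGestalt
    }
}
```

Passing the timestamp in keeps the logic testable without waiting on a real clock.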
Plus ambient announce: spatial chime on world-anchor novelty, full speech only on cold (>30 min unseen) anchor AND idle user (no gesture/head-motion in last 5 min).
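
The ambient rule above can be read as a small decision function. The following is a minimal sketch under stated assumptions: `AmbientPolicy` and `AmbientCue` are hypothetical names, the caller is assumed to decide what counts as "novel", and full speech is assumed to replace the chime rather than accompany it.

```swift
import Foundation

/// Possible ambient outputs (hypothetical type).
enum AmbientCue: Equatable {
    case nothing
    case chime
    case speech
}

/// Hypothetical policy object holding the two thresholds from the README.
struct AmbientPolicy {
    var coldAfter: TimeInterval = 30 * 60  // anchor unseen for more than 30 min
    var idleAfter: TimeInterval = 5 * 60   // no gesture or head motion for 5 min

    func cue(isNovel: Bool,
             anchorLastSeen: Date,
             lastUserActivity: Date,
             now: Date) -> AmbientCue {
        // No novelty event, no ambient output.
        guard isNovel else { return .nothing }
        let isCold = now.timeIntervalSince(anchorLastSeen) > coldAfter
        let isIdle = now.timeIntervalSince(lastUserActivity) >= idleAfter
        // Full speech needs BOTH a cold anchor and an idle user.
        return (isCold && isIdle) ? .speech : .chime
    }
}
```

Keeping the thresholds as stored properties makes it easy to shorten them in unit tests.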
Two Swift products in Package.swift:
| Module | Role |
|---|---|
| EchoSightCore | Engine: ARKit, perception, speech, storage. No SwiftUI. Fully unit-testable. |
| EchoSightApp | UI shell: SwiftUI views, WindowGroup, ImmersiveSpace, glue code. Depends on Core. |
The split keeps EchoSightCore importable in test targets without pulling in the app lifecycle.
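
For orientation, a manifest with this two-product split might look roughly like the sketch below. Only the module names `EchoSightCore` and `EchoSightApp` come from this README; the package name, tools version, library product type, and the `EchoSightCoreTests` target are assumptions for illustration, and the real Package.swift may differ.

```swift
// swift-tools-version: 6.0
// Sketch only: everything except the two module names is assumed.
import PackageDescription

let package = Package(
    name: "EchoSight",
    platforms: [.visionOS(.v2)],
    products: [
        .library(name: "EchoSightCore", targets: ["EchoSightCore"]),
        .library(name: "EchoSightApp", targets: ["EchoSightApp"]),
    ],
    targets: [
        // Engine target: no dependency on the app shell.
        .target(name: "EchoSightCore"),
        // UI shell depends on Core, never the other way around.
        .target(name: "EchoSightApp", dependencies: ["EchoSightCore"]),
        // Hypothetical test target that imports Core alone.
        .testTarget(name: "EchoSightCoreTests", dependencies: ["EchoSightCore"]),
    ]
)
```

Because the dependency arrow only points from App to Core, the test target can build without any SwiftUI or app lifecycle code.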
- Zero external network at inference time. All models ship or download once at first launch (~2.6 GB).
- Layer 1 always-on. Scene reconstruction + world tracking run continuously with negligible power cost.
- Annotations anchored to world. Persistent visionOS world anchors back cross-session recall.
- Single `SpeechOut` actor owns `AVSpeechSynthesizer` with priority slots (`.high` preempts `.normal`, ducks via `.duckOthers`).
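
The priority-slot idea in the last bullet can be sketched as pure bookkeeping, separate from any audio calls. This is not the project's `SpeechOut` implementation: `SpeechSlots`, `SpeechDecision`, the FIFO queue for equal or lower priority, and dropping the preempted utterance are all assumptions. In a real actor, a `.preemptCurrent` decision would presumably be followed by `AVSpeechSynthesizer.stopSpeaking(at:)` before speaking the new text.

```swift
import Foundation

enum SpeechPriority: Int, Comparable {
    case normal = 0
    case high = 1
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

/// What the caller should do with a newly submitted utterance (hypothetical type).
enum SpeechDecision: Equatable {
    case speakNow        // nothing was playing
    case preemptCurrent  // stop the current utterance, speak this one
    case enqueue         // wait for the current utterance to finish
}

/// Hypothetical slot bookkeeping an actor could own as private state.
struct SpeechSlots {
    private struct Utterance {
        let text: String
        let priority: SpeechPriority
    }
    private var current: Utterance?
    private var queue: [Utterance] = []

    init() {}

    mutating func submit(_ text: String, priority: SpeechPriority) -> SpeechDecision {
        let incoming = Utterance(text: text, priority: priority)
        guard let active = current else {
            current = incoming
            return .speakNow
        }
        if incoming.priority > active.priority {
            // Assumption: the preempted utterance is dropped, not re-queued.
            current = incoming
            return .preemptCurrent
        }
        queue.append(incoming)
        return .enqueue
    }

    /// Call when the synthesizer reports completion; returns the next text to speak, if any.
    mutating func finishCurrent() -> String? {
        current = queue.isEmpty ? nil : queue.removeFirst()
        return current?.text
    }
}
```

Wrapping this struct in a single actor would serialize all submissions, which matches the "single owner" constraint stated in the bullet.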
xcodebuild -scheme EchoSight -destination 'generic/platform=visionOS' build | xcbeautify
xcodebuild test -scheme EchoSight -destination 'platform=visionOS Simulator,name=Apple Vision Pro' | xcbeautify

Requires Xcode 16+, the visionOS 2.0 SDK, and an Apple Vision Pro (or the simulator) for runtime verification.
- Zero network inference. All models run on-device (local inference only).
- Photos: full access required (passthrough capture reads camera frames via visionOS 2 API).
- Microphone: reserved in Info.plist; not used in v0.1.
- World sensing: required for ARKit mesh reconstruction and anchor persistence.
- Log exports hash anchor UUIDs (SHA-256). No frames, OCR text, or prompts leave the device.
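
The anchor-hashing step in the last bullet could look something like the following sketch, which uses CryptoKit's `SHA256`. The function name `hashedAnchorID` is hypothetical, and hashing the UUID's string form (rather than its raw bytes, or with a salt) is an assumption; the README only states that SHA-256 is used.

```swift
import Foundation
import CryptoKit

/// Hypothetical helper: returns a lowercase hex SHA-256 digest of an anchor UUID.
/// Assumption: the canonical uuidString is the hashed input.
func hashedAnchorID(_ id: UUID) -> String {
    let digest = SHA256.hash(data: Data(id.uuidString.utf8))
    // A SHA-256 digest is 32 bytes, so the hex string is 64 characters long.
    return digest.map { String(format: "%02x", $0) }.joined()
}
```

A log exporter would then write `hashedAnchorID(anchor.id)` in place of the raw identifier, so exported logs can still correlate events for the same anchor without exposing the UUID itself.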
- `docs/plans/`: manifest, 16 spec files, findings, workflow, progress log
- `EchoSight-AVP-Brainstorm.md`: original brainstorm (planning input, 2026-04-11)
- `archive/legacy-web/`: previous web-app version of the project (Django backend + React frontend)
- `.claude/`: project-local Claude Code config and research cache
This repository previously hosted a Django + React web-app prototype of EchoSight (2023). Earlier work is preserved under archive/legacy-web/ with full git history via git log --follow. The visionOS rewrite begins 2026-04-21 as a greenfield native app.
Wesley Shih, Pranathi Mokshagundam, Chaitra Vishweshwaraiah (original web app). Wesley Shih (visionOS rewrite).
EchoSight's screenshot-based capture mechanism is inspired by PeekABoo by Om Chachad -- specifically the pattern of observing the Photos library for new screenshots to work around the lack of a direct camera API on visionOS. We reimplemented the mechanism internally to keep our dependency surface minimal, but the architectural idea is theirs.