Phase 4: Transcription + App-Aware Shortcuts #13

@sonnes

Description

Goal

Enable two-way communication: others speak, September transcribes. The keyboard becomes context-aware, showing shortcuts specific to the focused application.

Design Reference

Settings — Transcription

[Figure: Transcription Settings]

App Shortcuts (rightmost keyboard section)

The keyboard assembly shows the App Shortcuts panel on the far right, displaying the focused app name (e.g., "VS Code") and context-aware shortcuts.

[Figure: Keyboard with App Shortcuts]

Deliverables

Transcription Engines

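  • Apple Speech: SFSpeechRecognizer for on-device transcription. Zero configuration and private, provided requiresOnDeviceRecognition is set; otherwise audio may be sent to Apple's servers
  • Whisper API: OpenAI Whisper via URLSession. Requires an API key. Higher accuracy, but audio is uploaded to OpenAI
  • whisper.cpp: local inference via an SPM wrapper (e.g., swift-whisper; WhisperKit is an alternative that runs Whisper through Core ML rather than wrapping whisper.cpp). Offline, fast. Bundle a small model (base or small)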
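A minimal sketch of the Apple Speech path, assuming microphone audio is tapped from AVAudioEngine and streamed into an SFSpeechAudioBufferRecognitionRequest. `AppleSpeechTranscriber`, `SpeechOptions`, and the `onText` callback are hypothetical names, and the option defaults are assumptions. The sketch omits the authorization flow (`SFSpeechRecognizer.requestAuthorization`) and the NSMicrophoneUsageDescription / NSSpeechRecognitionUsageDescription Info.plist entries the app would need. Continuous listening is approximated by opening a new request after each final result.

```swift
import Foundation
#if canImport(Speech) && canImport(AVFoundation)
import AVFoundation
import Speech
#endif

/// Hypothetical options derived from the Transcription settings (defaults assumed).
struct SpeechOptions {
    var localeIdentifier = "en-US"
    var autoPunctuation = true
    var continuousListening = true
}

#if canImport(Speech) && canImport(AVFoundation)
/// Hypothetical wrapper that streams microphone audio into SFSpeechRecognizer.
@available(macOS 10.15, *)
final class AppleSpeechTranscriber {
    private let audioEngine = AVAudioEngine()
    private let recognizer: SFSpeechRecognizer?
    private let options: SpeechOptions
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    init(options: SpeechOptions) {
        self.options = options
        recognizer = SFSpeechRecognizer(locale: Locale(identifier: options.localeIdentifier))
    }

    /// `onText` receives the current transcript and whether it is final.
    func start(onText: @escaping (String, Bool) -> Void) throws {
        guard let recognizer, recognizer.isAvailable else { return }

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        // Keep audio on the device when this locale supports it.
        if recognizer.supportsOnDeviceRecognition {
            request.requiresOnDeviceRecognition = true
        }
        // Auto-punctuation toggle maps to addsPunctuation (macOS 13+).
        if #available(macOS 13, *) {
            request.addsPunctuation = options.autoPunctuation
        }
        self.request = request

        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            guard let self else { return }
            if let result {
                onText(result.bestTranscription.formattedString, result.isFinal)
            }
            guard error != nil || result?.isFinal == true else { return }
            self.stop()
            // Continuous listening: reopen the microphone after a final result.
            if error == nil && self.options.continuousListening {
                try? self.start(onText: onText)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()
        task?.cancel()
        request = nil
        task = nil
    }
}
#endif
```

Partial results arrive with `isFinal == false` and are revised as the speaker continues, so the live display can show them while only final strings are forwarded as conversation context.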

Talk Mode

  • Transcription display — live text appearing as others speak, shown in the predictions area or a dedicated transcript view
  • Continuous listening — microphone stays active between pauses (togglable)
  • Auto-punctuation — automatically adds periods, commas, question marks (togglable)
  • Context pipeline — transcribed text feeds into the AI prediction engine as conversation context, improving sentence predictions
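One way to build the context pipeline, sketched under the assumption that the Phase 2 prediction engine accepts a plain-text context string (its actual interface is not specified in this issue). `ConversationContext`, `Speaker`, and the turn and character budgets are hypothetical. Only finalized transcripts are appended, and the oldest turns are dropped once either budget is exceeded so the context stays short.

```swift
import Foundation

/// Who said a turn; `partner` is the person September is transcribing.
enum Speaker: String {
    case partner = "Partner"
    case user = "User"
}

/// Hypothetical rolling window of finalized turns used as prediction context.
struct ConversationContext {
    struct Turn {
        let speaker: Speaker
        let text: String
    }

    private(set) var turns: [Turn] = []
    let maxTurns: Int
    let maxCharacters: Int

    init(maxTurns: Int = 8, maxCharacters: Int = 1_000) {
        self.maxTurns = maxTurns
        self.maxCharacters = maxCharacters
    }

    /// Call with final transcripts only; partial results are still being revised.
    mutating func append(_ text: String, from speaker: Speaker) {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }
        turns.append(Turn(speaker: speaker, text: trimmed))
        // Drop the oldest turns until both budgets hold (the newest turn is always kept).
        while turns.count > 1, turns.count > maxTurns || render().count > maxCharacters {
            turns.removeFirst()
        }
    }

    /// Context string for the prediction engine, oldest turn first.
    func render() -> String {
        turns.map { "\($0.speaker.rawValue): \($0.text)" }.joined(separator: "\n")
    }
}
```

Tagging turns by speaker lets the prediction engine tell what the partner asked apart from what the user already said.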

Settings — Transcription

  • Engine selection — 3 cards:
    • Apple Speech — icon (microphone), subtitle "On-device, private"
    • Whisper — icon (sparkles), subtitle "OpenAI, accurate"
    • Whisper.cpp — icon (chip), subtitle "Local, fast"
  • Language dropdown — "English (US)" default, populated from engine capabilities
  • Auto-Punctuation toggle — green when ON, gray when OFF, with subtitle "Automatically add periods, commas, and question marks"
  • Continuous Listening toggle — with subtitle "Keep microphone active between pauses"
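A sketch of the settings model, assuming the engine choice is stored as a raw string in a SwiftData `@Model` and the Language dropdown for Apple Speech is filled from `SFSpeechRecognizer.supportedLocales()`. The type names, the SF Symbol names (`mic`, `sparkles`, and `cpu` for the chip), and the toggle defaults are assumptions; the issue only fixes "English (US)" as the default language.

```swift
import Foundation
#if canImport(SwiftData)
import SwiftData
#endif
#if canImport(Speech)
import Speech
#endif

/// The three engine cards in Settings; SF Symbol names are assumptions.
enum TranscriptionEngine: String, CaseIterable, Codable {
    case appleSpeech, whisperAPI, whisperCpp

    var title: String {
        switch self {
        case .appleSpeech: return "Apple Speech"
        case .whisperAPI: return "Whisper"
        case .whisperCpp: return "Whisper.cpp"
        }
    }

    var subtitle: String {
        switch self {
        case .appleSpeech: return "On-device, private"
        case .whisperAPI: return "OpenAI, accurate"
        case .whisperCpp: return "Local, fast"
        }
    }

    var symbolName: String {
        switch self {
        case .appleSpeech: return "mic"
        case .whisperAPI: return "sparkles"
        case .whisperCpp: return "cpu"
        }
    }
}

#if canImport(SwiftData)
/// Hypothetical persisted settings; toggle defaults are assumed.
@available(macOS 14, iOS 17, *)
@Model
final class TranscriptionSettings {
    var engineRawValue: String = "appleSpeech"
    var languageIdentifier: String = "en-US"
    var autoPunctuation: Bool = true
    var continuousListening: Bool = true

    init() {}

    /// Typed access to the stored raw value (computed, so not persisted itself).
    var engine: TranscriptionEngine {
        get { TranscriptionEngine(rawValue: engineRawValue) ?? .appleSpeech }
        set { engineRawValue = newValue.rawValue }
    }
}
#endif

#if canImport(Speech)
/// Language dropdown entries for Apple Speech, sorted by localized name.
@available(macOS 10.15, *)
func appleSpeechLanguages() -> [(id: String, name: String)] {
    SFSpeechRecognizer.supportedLocales()
        .map { (id: $0.identifier,
                name: Locale.current.localizedString(forIdentifier: $0.identifier) ?? $0.identifier) }
        .sorted { $0.name < $1.name }
}
#endif
```

Storing the raw string keeps the persisted schema independent of how the enum evolves; an unknown value falls back to Apple Speech.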

App Observer

  • NSWorkspace observer: observe NSWorkspace.didActivateApplicationNotification on NSWorkspace.shared.notificationCenter (it is not posted to NotificationCenter.default) to detect frontmost app changes
  • Bundle ID mapping — map app bundle identifiers to shortcut sets
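A sketch of the observer and bundle ID mapping, assuming the keyboard runs as its own window: `NSWorkspace.didActivateApplicationNotification` is delivered on `NSWorkspace.shared.notificationCenter`, and the activated app arrives under `NSWorkspace.applicationUserInfoKey`. The bundle identifiers listed are assumed (they can be confirmed with `osascript -e 'id of app "Safari"'`), and `FocusedAppTracker`, `ShortcutSet`, `FrontmostAppObserver`, and the September bundle ID used in the checks are hypothetical. The tracker ignores activations of September itself so tapping the keyboard does not blank the panel.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Built-in shortcut sets from this issue.
enum ShortcutSet: String {
    case vsCode, safari, notes, mail, finder
}

/// Assumed bundle identifiers for the built-in sets.
let shortcutSetsByBundleID: [String: ShortcutSet] = [
    "com.microsoft.VSCode": .vsCode,
    "com.apple.Safari": .safari,
    "com.apple.Notes": .notes,
    "com.apple.mail": .mail,
    "com.apple.finder": .finder,
]

/// Hypothetical tracker: remembers the last app other than September.
struct FocusedAppTracker {
    let ownBundleID: String?
    private(set) var focusedBundleID: String?

    init(ownBundleID: String?) {
        self.ownBundleID = ownBundleID
    }

    /// Returns true when the App Shortcuts panel should refresh.
    mutating func activate(_ bundleID: String?) -> Bool {
        guard let bundleID, bundleID != ownBundleID, bundleID != focusedBundleID else { return false }
        focusedBundleID = bundleID
        return true
    }

    /// nil means the focused app has no built-in set.
    var shortcutSet: ShortcutSet? {
        focusedBundleID.flatMap { shortcutSetsByBundleID[$0] }
    }
}

#if canImport(AppKit)
/// Hypothetical observer that reports (app name, shortcut set) on focus changes.
final class FrontmostAppObserver {
    private var token: NSObjectProtocol?
    private var tracker = FocusedAppTracker(ownBundleID: Bundle.main.bundleIdentifier)

    init(onChange: @escaping (String?, ShortcutSet?) -> Void) {
        token = NSWorkspace.shared.notificationCenter.addObserver(
            forName: NSWorkspace.didActivateApplicationNotification,
            object: nil,
            queue: .main
        ) { [weak self] note in
            guard let self else { return }
            let app = note.userInfo?[NSWorkspace.applicationUserInfoKey] as? NSRunningApplication
            if self.tracker.activate(app?.bundleIdentifier) {
                onChange(app?.localizedName, self.tracker.shortcutSet)
            }
        }
    }

    deinit {
        if let token { NSWorkspace.shared.notificationCenter.removeObserver(token) }
    }
}
#endif
```

At launch, the panel can be seeded once from `NSWorkspace.shared.frontmostApplication`, since no activation notification fires for the app that is already frontmost.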

App Shortcuts Section

  • Dynamic shortcut loading — when the focused app changes, update the App Shortcuts section (rightmost 200pt panel)
  • Focused app header — show app icon + name (e.g., "VS Code" with green dot)
  • Built-in shortcut sets for common apps:
    • VS Code — Toggle Terminal, Command Palette, Go to File, Split Editor, Toggle Sidebar, Quick Fix
    • Safari — New Tab, Close Tab, Reload, Back, Forward, Show Bookmarks
    • Notes — New Note, Bold, Italic, Checklist, Table, Find
    • Mail — New Message, Reply, Forward, Archive, Flag
    • Finder — New Window, New Folder, Get Info, Quick Look
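  • Shortcut execution: tapping a shortcut injects the key combination into the focused app via CGEvent. This needs Accessibility permission, and tapping the keyboard must not activate September itself, or the events reach September instead of the focused app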
  • Uses ShortcutButton and ShortcutFull components from Phase 1
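A sketch of shortcut execution via CGEvent, assuming shortcuts are stored as macOS virtual key codes (the `kVK_ANSI_*` values from Carbon's Events.h) plus modifiers. `AppShortcut`, `Modifiers`, and `post(_:)` are hypothetical names, and the VS Code bindings are its macOS defaults as assumed here; users may have remapped them. Posting events to another app requires the Accessibility permission (`AXIsProcessTrusted()`), and because the key codes are positional, the resulting character can differ on non-US layouts.

```swift
import Foundation
#if canImport(CoreGraphics)
import CoreGraphics
#endif

/// Modifier keys for a shortcut, independent of CoreGraphics.
struct Modifiers: OptionSet, Hashable {
    let rawValue: Int
    static let command = Modifiers(rawValue: 1 << 0)
    static let shift = Modifiers(rawValue: 1 << 1)
    static let option = Modifiers(rawValue: 1 << 2)
    static let control = Modifiers(rawValue: 1 << 3)
}

/// A labeled key combination; keyCode is a macOS virtual key code.
struct AppShortcut {
    let title: String
    let keyCode: UInt16
    let modifiers: Modifiers
}

/// Assumed default VS Code bindings on macOS.
let vsCodeShortcuts: [AppShortcut] = [
    AppShortcut(title: "Toggle Terminal", keyCode: 0x32, modifiers: [.control]),         // ⌃`
    AppShortcut(title: "Command Palette", keyCode: 0x23, modifiers: [.command, .shift]),  // ⇧⌘P
    AppShortcut(title: "Go to File", keyCode: 0x23, modifiers: [.command]),               // ⌘P
    AppShortcut(title: "Split Editor", keyCode: 0x2A, modifiers: [.command]),             // ⌘\
    AppShortcut(title: "Toggle Sidebar", keyCode: 0x0B, modifiers: [.command]),           // ⌘B
    AppShortcut(title: "Quick Fix", keyCode: 0x2F, modifiers: [.command]),                // ⌘.
]

#if canImport(CoreGraphics)
extension Modifiers {
    /// Maps to the flags CGEvent expects.
    var cgFlags: CGEventFlags {
        var flags: CGEventFlags = []
        if contains(.command) { flags.insert(.maskCommand) }
        if contains(.shift) { flags.insert(.maskShift) }
        if contains(.option) { flags.insert(.maskAlternate) }
        if contains(.control) { flags.insert(.maskControl) }
        return flags
    }
}

/// Posts key-down then key-up to whichever app is frontmost.
func post(_ shortcut: AppShortcut) {
    let source = CGEventSource(stateID: .hidSystemState)
    for keyDown in [true, false] {
        guard let event = CGEvent(keyboardEventSource: source,
                                  virtualKey: shortcut.keyCode,
                                  keyDown: keyDown) else { continue }
        event.flags = shortcut.modifiers.cgFlags
        event.post(tap: .cghidEventTap)
    }
}
#endif
```

Keeping `Modifiers` separate from `CGEventFlags` lets the shortcut tables be defined and tested without CoreGraphics, with the conversion happening only at the point of posting.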

Acceptance Criteria

  • Apple Speech transcribes spoken audio in real-time with visible text output
  • Transcribed text is passed to the AI prediction engine as conversation context, and predictions visibly reflect the conversation topic
  • Auto-punctuation adds periods at sentence boundaries
  • Continuous listening keeps microphone active across pauses
  • Switching to VS Code updates the App Shortcuts panel with VS Code shortcuts
  • Switching to Safari shows Safari shortcuts
  • Tapping an app shortcut executes it in the focused app
  • Settings toggles persist via SwiftData

Dependencies

  • Phase 1 (keyboard layout, App Shortcuts section placeholder, CGEvent injection)
  • Phase 2 (AI prediction engine — transcriptions feed into prediction context)

Metadata


Assignees

No one assigned

Labels

phase-4 (Transcription + Shortcuts), swift (macOS Swift app)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
