Sophia

Status: Voice Fingerprint & Clone Plan Approved (2026-05-22)

Scott approved the voice fingerprint plan. We are waiting on voice samples recorded on his phone, which will supplement the audio extracted from dashcam footage.

The Current Hermes-Facing Service

The voice-agent service lives in voice-agent/. See the deployment guide in that directory for containerized voice sidecar endpoints, Hermes dashboard integration, intent routing, and voice fingerprint enrollment.

cd voice-agent
pip install -e .
voice-agent serve --host 0.0.0.0 --port 8765 --config configs/dev.yaml

Architecture

Mobile/Web Client → Caddy (HTTPS) → Voice Agent (FastAPI, port 8765)
    ├─ STT: faster-whisper (tiny/int8, 2 CPU threads)
    ├─ VAD: WebRTC VAD
    ├─ Speaker Verification: SpeechBrain ECAPA-TDNN (spkrec-ecapa-voxceleb)
    ├─ TTS: Piper TTS / pyttsx3 fallback
    ├─ LLM Intent: OpenAI-compatible provider
    └─ Neo4j: Speaker nodes, voiceprints, captures
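
The WebRTC VAD in the diagram above only accepts 16-bit mono PCM at 8, 16, 32, or 48 kHz, in frames of exactly 10, 20, or 30 ms. The sketch below shows how a raw PCM buffer could be sliced into frames of a valid size before being handed to the VAD. The function names and the 16 kHz / 30 ms defaults are illustrative assumptions, not taken from the voice-agent code.

```python
from typing import Iterator

VALID_SAMPLE_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def frame_size_bytes(sample_rate: int, frame_ms: int) -> int:
    """Return the byte length of one VAD frame (hypothetical helper)."""
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm: bytes, sample_rate: int = 16000,
                 frame_ms: int = 30) -> Iterator[bytes]:
    """Yield complete frames; a trailing partial frame is dropped."""
    size = frame_size_bytes(sample_rate, frame_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

With the py-webrtcvad bindings, each yielded frame could then be passed to `Vad.is_speech(frame, sample_rate)`; whether the service uses those bindings is an assumption.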

Voice Fingerprint & Clone Plan

See PLAN.md for the comprehensive plan.

Phases

1. Install SpeechBrain in the container (enables real ECAPA-TDNN speaker embeddings)
2. Extract clean voice samples from dashcam/bodycam/audio archives
3. Enroll Scott's voiceprint from clean samples
4. Verify and tune threshold (0.1 -> 0.75)
5. Voice cloning for TTS -- OpenVoice -> RVC -> ElevenLabs
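
To make the threshold-tuning phase concrete, here is a minimal sketch of a verification decision. It assumes the score is the cosine similarity between the enrolled voiceprint and a probe embedding, which is common for ECAPA-TDNN embeddings but is not stated in this README. The function names are hypothetical, and the embeddings are plain Python lists for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "no match"
    return dot / (norm_a * norm_b)


def is_same_speaker(voiceprint: Sequence[float], probe: Sequence[float],
                    threshold: float = 0.75) -> bool:
    """Accept the probe only if its score reaches the threshold (assumed metric)."""
    return cosine_similarity(voiceprint, probe) >= threshold
```

Under this assumption, a probe scoring about 0.71 would pass at a threshold of 0.1 but be rejected at 0.75, which illustrates why raising the threshold reduces false accepts at the cost of more false rejects.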

Technology Comparison (Voice Cloning)

Technology	Min Samples	Quality	Speed	GPU	Status
OpenVoice	3-10 sec ref	High	Real-time	No	Start here
RVC	1-10 min	Very High	Medium	Yes	Upgrade path
Coqui XTTS	5-30 sec ref	High	Medium	Yes	Alternative
ElevenLabs	1 min	Highest	Fastest	No	Commercial
Bark/Suno	10-30 sec	Medium	Slow	No	Free option

Data Inventory

Existing in Neo4j (memory database)

Node	Count	Notes
AudioFile	36,931	Indexed from /mnt/S/sophia-ingest/audio/
SophiaCapture	11	Voice captures with transcripts
VoiceTrainingSample	7	All from one 2000/11/27 recording (~42 sec)
VoiceIdentity	1	scott
VoiceSpeakerCluster	1	legacy:2000_1127_220512:SPEAKER_00
Speaker	1	scott
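
The counts above could be re-checked with one Cypher query per node type. The sketch below only builds the query strings; it assumes the names in the Node column are Neo4j node labels, and `count_query` is a hypothetical helper. Cypher does not allow labels to be passed as query parameters, so the label is validated before being interpolated.

```python
import re

# Labels copied from the inventory table; treating them as Neo4j labels is an assumption.
NODE_LABELS = (
    "AudioFile",
    "SophiaCapture",
    "VoiceTrainingSample",
    "VoiceIdentity",
    "VoiceSpeakerCluster",
    "Speaker",
)

_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def count_query(label: str) -> str:
    """Build a Cypher query that counts nodes carrying the given label."""
    if not _LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    return f"MATCH (n:`{label}`) RETURN count(n) AS count"


if __name__ == "__main__":
    for label in NODE_LABELS:
        print(count_query(label))
```

With the official neo4j Python driver, each query could be run as `session.run(query).single()["count"]`; the connection URI and credentials are not given in this README.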

NAS Audio Sources

Source	Files	Size	Notes
audio/2024/	~134K	22GB	Year/month/day WAVs + transcriptions
audio/2025/	~116K	~20GB	Same structure
audio/2026/	~27K	~5GB	Same structure
audio/2000-2013/	~200	~2GB	Legacy recordings
dashcam/2024/	133K	6.3TB	MP4 video + metadata
dashcam/2025/	116K	5.9TB	MP4 video + metadata
dashcam/2026/	27K	1.3TB	MP4 video + metadata
bodycam MOVI0000.avi	1	98MB	Has RTTM (4 speakers)
bodycam MOVI0002.avi	1	484MB	Has RTTM data
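
Extracting speech from the video sources above means dropping the video stream and resampling the audio. The sketch below builds an ffmpeg command that writes 16 kHz mono 16-bit WAV, a common input format for speech models. The helper name, the output directory, and the choice of ffmpeg are assumptions; this README does not say how extraction is actually done.

```python
from pathlib import Path
from typing import List


def ffmpeg_extract_cmd(video: Path, out_dir: Path,
                       sample_rate: int = 16000) -> List[str]:
    """Build (but do not run) an ffmpeg command for audio-only extraction."""
    wav = out_dir / (video.stem + ".wav")
    return [
        "ffmpeg", "-hide_banner",
        "-n",                    # never overwrite an existing output file
        "-i", str(video),
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate), # resample
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        str(wav),
    ]


if __name__ == "__main__":
    # "extracted" is a hypothetical output directory.
    print(" ".join(ffmpeg_extract_cmd(Path("MOVI0000.avi"), Path("extracted"))))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on the PATH.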

Setup

cd voice-agent
pip install -e .

Install SpeechBrain (required for speaker verification)

docker exec voice-agent-sophia-voice-1 pip install speechbrain torchaudio

Container deployment

cd voice-agent
docker compose up --build
curl http://127.0.0.1:8765/healthz
curl http://127.0.0.1:8765/status
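
The same two checks can be scripted, for example in a smoke test after `docker compose up`. The sketch below records only the HTTP status code of each endpoint, because the response bodies are not documented here. The `probe` function and its `opener` parameter are hypothetical names; `opener` exists so the function can be exercised without a running sidecar.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def probe(base_url: str,
          paths: Iterable[str] = ("/healthz", "/status"),
          opener: Callable = urllib.request.urlopen,
          timeout: float = 5.0) -> Dict[str, int]:
    """Return {path: HTTP status}, using 0 when the sidecar is unreachable."""
    results: Dict[str, int] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with opener(url, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # server answered with an error status
        except OSError:
            results[path] = 0         # connection refused, timeout, DNS failure
    return results


if __name__ == "__main__":
    print(probe("http://127.0.0.1:8765"))
```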

Voiceprint enrollment

docker exec voice-agent-sophia-voice-1 python /app/scripts/voice_insight.py \
  --config /app/configs/voice_insight.yaml init-schema

docker exec voice-agent-sophia-voice-1 python /app/scripts/voice_insight.py \
  --config /app/configs/voice_insight.yaml export-training-clips --identity scott --limit 100

docker exec voice-agent-sophia-voice-1 python /app/scripts/voice_insight.py \
  --config /app/configs/voice_insight.yaml build-voiceprint --identity scott
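
The three commands above run in a fixed order: schema first, then clip export, then voiceprint build. A small wrapper such as the following sketch can run them in sequence and stop at the first failure. The container name, script path, config path, and subcommands are copied from the commands above; the Python function names and the `runner` parameter are hypothetical.

```python
import subprocess
from typing import Callable, List

CONTAINER = "voice-agent-sophia-voice-1"
SCRIPT = "/app/scripts/voice_insight.py"
CONFIG = "/app/configs/voice_insight.yaml"


def enrollment_commands(identity: str, limit: int = 100) -> List[List[str]]:
    """Return the three docker exec commands in the order they must run."""
    base = ["docker", "exec", CONTAINER, "python", SCRIPT, "--config", CONFIG]
    return [
        base + ["init-schema"],
        base + ["export-training-clips", "--identity", identity,
                "--limit", str(limit)],
        base + ["build-voiceprint", "--identity", identity],
    ]


def run_enrollment(identity: str, limit: int = 100,
                   runner: Callable = subprocess.run) -> None:
    """Run each step; check=True aborts on the first non-zero exit code."""
    for cmd in enrollment_commands(identity, limit):
        runner(cmd, check=True)


if __name__ == "__main__":
    run_enrollment("scott")
```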

Hermes Integration

Hermes has a bundled sophia_voice plugin. Set the sidecar URL before launching:

export SOPHIA_VOICE_URL=http://127.0.0.1:8765
hermes dashboard --tui

The plugin exposes:

sophia_voice_status -- Sidecar status, model profile, protocol
sophia_voice_intent -- Classify transcript, return Hermes prompt
sophia_voice_chat -- Voice chat through Sophia's Hermes-aware path
sophia_voice_events -- Recent voice sidecar events for debugging
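
Because the plugin depends on SOPHIA_VOICE_URL being set, a pre-launch check can catch a missing or malformed value early. The sketch below is one way to do that; `sidecar_url` is a hypothetical helper, and how the plugin itself reads or validates the variable is not documented here.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit


def sidecar_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the sidecar base URL without a trailing slash, or raise ValueError."""
    url = env.get("SOPHIA_VOICE_URL", "").strip()
    if not url:
        raise ValueError("SOPHIA_VOICE_URL is not set")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"SOPHIA_VOICE_URL is not an http(s) URL: {url!r}")
    return url.rstrip("/")


if __name__ == "__main__":
    print(sidecar_url())
```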

Legacy

The older scripts below are preserved as historical experiments.

Sophia V0.1 (original dictation tool)

pip install -r requirements.txt
cd sophia_V.01
python sophia.py

Sophia V0.2 (DeepSpeech)

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
python sophia_V.02/Sophia_io.py -m deepspeech-0.7.0-models.pbmm -s deepspeech-0.7.0-models.scorer
