Feature Description
Add a visual agent avatar to Nod.ie that displays an animated character whose mouth movements are synchronized with the AI's speech output using MuseTalk technology. The avatar will be displayed in a circular frame, maintaining Nod.ie's distinctive design.
Key Requirements
Avatar Display
- Show the agent image (Nod.ie character) in a circular frame
- Increase window size from 120x120 to 250x250
- Maintain transparent background and frameless design
- Keep circular aesthetic with avatar masked to circle
Lip-Sync Animation
- Integrate MuseTalk for real-time mouth animation
- Sync mouth movements with TTS audio output from Unmute
- Gracefully fall back to a static image if resources are unavailable
- Support 30+ fps animation when available
UI/UX Considerations
- Preserve circular design language
- Maintain drag-to-move functionality
- Add avatar toggle in existing Settings menu
- Smooth transitions between animated/static states
- Show audio activity ring around avatar edge
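The audio activity ring above can be driven by a small pure mapping from audio level to visual style. A minimal sketch, assuming the renderer already has an RMS level in the 0–1 range; the function name `ringStyleForLevel` and the scale/opacity ranges are illustrative, not from the existing codebase:

```javascript
// Sketch: map an RMS audio level (0..1) from TTS playback to the activity
// ring's scale and opacity. Values outside [0, 1] are clamped.
function ringStyleForLevel(rms) {
  const level = Math.min(1, Math.max(0, rms)); // clamp to [0, 1]
  return {
    scale: 1 + 0.08 * level,      // ring grows slightly with loudness
    opacity: 0.25 + 0.75 * level, // faint when silent, solid when loud
  };
}
```

Keeping this mapping pure makes it trivial to unit-test and to apply to either a CSS transform or a canvas draw call.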
Technical Implementation Plan
Phase 1: MuseTalk Backend Service
Create Docker Container

```yaml
musetalk-service:
  image: nodie/musetalk-api:latest
  ports:
    - "8765:8765"
  environment:
    - MODEL_PATH=/models
    - FACE_SIZE=256
```
API Endpoints

- `POST /initialize` - Load model and prepare for streaming
- `POST /process` - Send audio chunk, receive video frame
- `GET /health` - Check service availability
- `POST /cleanup` - Release resources
Backend Architecture
- FastAPI or Flask for REST API
- WebSocket support for real-time streaming
- Queue system for frame buffering
- Automatic GPU detection and fallback
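The "queue system for frame buffering" can be sketched as a bounded FIFO that drops the oldest frame under backpressure, so a slow consumer never accumulates unbounded lag. The class name `FrameQueue` and the default capacity are assumptions for illustration:

```javascript
// Bounded frame buffer: when full, the oldest frame is discarded so
// playback stays close to real time instead of drifting behind.
class FrameQueue {
  constructor(capacity = 8) {
    this.capacity = capacity;
    this.frames = [];
    this.dropped = 0; // frames discarded under backpressure
  }
  push(frame) {
    if (this.frames.length >= this.capacity) {
      this.frames.shift(); // drop oldest
      this.dropped++;
    }
    this.frames.push(frame);
  }
  pop() {
    return this.frames.shift(); // undefined when empty
  }
  get length() { return this.frames.length; }
}
```

Tracking `dropped` also gives the latency monitor in Phase 5 a cheap congestion signal.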
Phase 2: Frontend Integration
Modify Electron Window (main.js)
- Update size: 120x120 → 250x250
- Keep circular transparent design
Update UI (index.html)
- Add avatar container with circular mask
- Layer structure:
- Background: Avatar (static or animated)
- Overlay: Audio activity ring
- Foreground: State indicators
Renderer Updates (renderer.js)
- Add MuseTalk WebSocket client
- Stream audio to MuseTalk backend
- Receive and display video frames
- Handle fallback to static image
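The renderer's connect/fallback behaviour is easiest to get right as a pure state machine, with the WebSocket wiring kept separate. A minimal sketch; the state and event names are assumptions, not identifiers from the existing renderer.js:

```javascript
// Avatar display states: "static" (still image), "connecting", "animated"
// (frames streaming from the MuseTalk backend). The real client feeds
// WebSocket and health-check events into this transition function.
function nextAvatarState(state, event) {
  switch (event) {
    case "ws-open":     return "animated";  // frames are streaming
    case "ws-error":
    case "ws-close":    return "static";    // fall back to still image
    case "health-fail": return "static";
    case "health-ok":   return state === "static" ? "connecting" : state;
    default:            return state;
  }
}
```

Because every failure path lands on `"static"`, the avatar degrades gracefully no matter which component fails first.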
Phase 3: Audio-Video Synchronization
Audio Pipeline

```
Unmute TTS → Audio Buffer → MuseTalk API
                  ↓
            Audio Playback
```
Frame Synchronization
- Tag audio chunks with timestamps
- Buffer video frames with matching timestamps
- Synchronize playback using requestAnimationFrame
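The three synchronization steps above reduce to a small selection function: given frames tagged with the audio timestamp they belong to, return the newest frame at or before the current playback position and discard anything older. A sketch, assuming timestamps in seconds; in the real renderer this would be called from the `requestAnimationFrame` loop:

```javascript
// buffer: array of { t, image } ordered by ascending timestamp t (seconds).
// audioTime: current playback position of the TTS audio.
// Returns the newest due frame (or null) and drops frames it has passed.
function takeFrameFor(buffer, audioTime) {
  let chosen = null;
  while (buffer.length && buffer[0].t <= audioTime) {
    chosen = buffer.shift(); // consume frames up to the playback position
  }
  return chosen; // null if no frame is due yet
}
```

Skipping past stale frames here is what keeps lips aligned with audio even when the backend briefly falls behind.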
Phase 4: Settings Integration
Add to existing Settings dialog:
- "Show Avatar" toggle (default: on)
- "Avatar Quality" dropdown (auto/high/low)
- "Use GPU Acceleration" checkbox
Store preferences using existing electron-store
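The settings above can be layered over electron-store as defaults plus a shallow merge, so missing keys fall back cleanly. A sketch; the key name `"avatar"` and the helper `loadAvatarSettings` are assumptions, and the store is passed in so the logic stays testable without Electron:

```javascript
// Defaults mirror the Settings dialog: toggle, quality dropdown, GPU checkbox.
const AVATAR_DEFAULTS = {
  showAvatar: true,      // "Show Avatar" (default: on)
  avatarQuality: "auto", // "Avatar Quality": auto | high | low
  useGpu: true,          // "Use GPU Acceleration"
};

// store is any object with a .get(key) method (e.g. an electron-store
// instance). Stored values override defaults key by key.
function loadAvatarSettings(store) {
  return { ...AVATAR_DEFAULTS, ...(store.get("avatar") || {}) };
}
```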
Phase 5: Resource Management
Fallback Logic
- Check MuseTalk service health on startup
- Monitor frame processing latency
- Auto-disable if latency > 500ms
- Graceful degradation to static image
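The latency-based auto-disable can be sketched as an exponentially weighted moving average of per-frame processing time checked against the 500 ms budget. The threshold comes from the spec; the smoothing factor and class name are assumptions:

```javascript
// Tracks smoothed per-frame latency; recommends falling back to the
// static image once the average exceeds the threshold.
class LatencyMonitor {
  constructor(thresholdMs = 500, alpha = 0.2) {
    this.thresholdMs = thresholdMs;
    this.alpha = alpha; // EWMA smoothing factor (assumed value)
    this.avg = null;
  }
  record(ms) {
    this.avg = this.avg === null
      ? ms
      : this.alpha * ms + (1 - this.alpha) * this.avg;
    return this.avg;
  }
  shouldFallback() {
    return this.avg !== null && this.avg > this.thresholdMs;
  }
}
```

Smoothing avoids flapping between animated and static states on a single slow frame.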
Performance Optimization
- Implement frame skipping for low-end systems
- Cache processed frames when possible
- Limit concurrent processing requests
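Frame skipping on low-end systems reduces to choosing a stride: given the backend's measured per-frame processing cost and the display's target frame rate, process every N-th frame so the pipeline keeps up. A sketch under those assumptions; `frameStride` is an illustrative name:

```javascript
// processMs: measured time to produce one frame.
// targetFps: desired display rate (30+ fps when resources allow).
// Returns how many source frames to advance per processed frame;
// 1 means "process every frame", 2 means "process every other frame", etc.
function frameStride(processMs, targetFps = 30) {
  const budgetMs = 1000 / targetFps; // ~33 ms per frame at 30 fps
  return Math.max(1, Math.ceil(processMs / budgetMs));
}
```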
Docker Container Solution
MuseTalk API Container
```dockerfile
# Note: no official python:3.9-cuda image exists on Docker Hub;
# use an NVIDIA CUDA base and add Python on top.
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*

# Install MuseTalk dependencies
RUN pip3 install torch torchvision torchaudio
RUN git clone https://github.com/TMElyralab/MuseTalk.git /app/MuseTalk

# Install API framework
RUN pip3 install fastapi uvicorn websockets

# Copy API wrapper code
WORKDIR /app
COPY musetalk_api.py /app/

# Download models at build time
RUN python3 -m musetalk.download_models

EXPOSE 8765
CMD ["uvicorn", "musetalk_api:app", "--host", "0.0.0.0", "--port", "8765"]
```
API Wrapper (musetalk_api.py)
- Handles model initialization
- Processes audio → video frame conversion
- Manages GPU/CPU fallback
- Implements frame caching
- WebSocket streaming support
Benefits of Docker Approach
- Isolation - Python/ML dependencies contained
- Portability - Works across platforms
- Scalability - Can run on separate machine
- Consistency - Same as Unmute architecture
- Optional - Users can disable if not needed
Technical Challenges & Solutions
Challenge 1: MuseTalk requires Python/ML environment
- Solution: Docker container with REST/WebSocket API, similar to Unmute backend
Challenge 2: Maintaining circular design with video
- Solution: CSS clip-path or canvas masking to create circular video viewport
Challenge 3: Real-time performance
- Solution: Adaptive quality with automatic fallback to static image
Success Criteria
- Avatar displays in 250x250 circular window
- Lip movements sync with speech when resources available
- Automatic fallback to static image when needed
- Settings toggle for avatar on/off
- Maintains <200ms latency target
- Docker container easy to deploy
Future Enhancements
- Multiple avatar options in Settings
- Facial expressions beyond lip-sync
- Custom avatar upload support
- Avatar marketplace integration