50 changes: 50 additions & 0 deletions .github/workflows/pr-checks.yml
@@ -0,0 +1,50 @@
name: PR Checks

on:
  pull_request:
    branches: [main, master]
  push:
    branches: [main, master]

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
        with:
          bun-version: latest

      - name: Install dependencies
        run: bun install

      - name: Run tests with coverage
        run: npm run test:coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        if: always()
        with:
          files: ./coverage/coverage-final.json
          fail_ci_if_error: false
        continue-on-error: true

  docker:
    name: Docker Build
    runs-on: ubuntu-latest
    needs: test # Only run if tests pass

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build Docker image
        run: docker build -t dewey:test .
12 changes: 3 additions & 9 deletions .github/workflows/publish.yml
@@ -24,14 +24,8 @@ jobs:
      - name: Install dependencies
        run: bun install --frozen-lockfile

      - name: Run unit tests with coverage
        run: bun test --env-file=/dev/null --coverage --coverage-reporter=lcov __tests__/config.test.js __tests__/errors.test.js __tests__/job.test.js __tests__/jobQueue.test.js __tests__/migrate.test.js __tests__/utils.test.js
        env:
          # Set API key from GitHub secrets if available (optional - tests will skip if not set)
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Run integration tests
        run: bun test --env-file=/dev/null __tests__/integration.test.js
      - name: Run tests with coverage
        run: npm run test:coverage
        env:
          # Set API key from GitHub secrets if available (optional - tests will skip if not set)
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
@@ -40,7 +34,7 @@ jobs:
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage/lcov.info
          files: ./coverage/coverage-final.json
          fail_ci_if_error: true

      - name: Log in to GHCR
14 changes: 10 additions & 4 deletions CLAUDE.md
@@ -23,13 +23,19 @@ node src/index.js
### Testing
```bash
# Run all tests
bun test
npm test

# Run tests with Jest directly
NODE_OPTIONS='--experimental-vm-modules' jest
# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm run test:coverage

# Run specific test file
NODE_OPTIONS='--experimental-vm-modules' jest __tests__/jobQueue.test.js
npm test __tests__/jobQueue.test.js

# Run tests matching a pattern
npm test -t "should migrate file job"
```

### Docker
5 changes: 5 additions & 0 deletions Dockerfile
@@ -2,6 +2,11 @@

FROM oven/bun:1.3.0-slim AS base

# Install ffmpeg for metadata extraction
RUN apt-get update && \
    apt-get install -y --no-install-recommends ffmpeg && \
    rm -rf /var/lib/apt/lists/*

ENV SOURCE_DIR=/data/incoming \
    DEST_DIR=/data/library \
    LOG_FILE=/data/logs/migrations.log \
71 changes: 27 additions & 44 deletions README.md
@@ -3,12 +3,12 @@
[![GHCR Package](https://img.shields.io/badge/ghcr-dewey-blue?logo=docker)](https://github.com/masonfox/dewey/pkgs/container/dewey)
[![codecov](https://codecov.io/gh/masonfox/dewey/graph/badge.svg?token=5CR300GF1F)](https://codecov.io/gh/masonfox/dewey)

Node-based containerized watcher that organizes incoming audiobook files into a canonical library structure using Claude AI for intelligent author/title normalization.
Node-based containerized watcher that organizes incoming audiobook files into a canonical library structure using ffprobe and Claude AI for intelligent author/title normalization.

### Features
- **Smart Directory Watching**: Monitors source directory with configurable stability timeout to ensure complete uploads
- **AI-Powered Normalization**: Uses Claude AI to intelligently parse and normalize author/title from filenames
- **Multiple File Formats**: Supports `.mp3` and `.m4b` audiobook files
- **ffprobe & AI-Powered Normalization**: Uses ffprobe and Claude AI to intelligently parse and normalize author/title from filenames
- **Multiple File Formats**: Supports `.mp3` and `.m4b` audiobook files
- **Flexible Structure**: Handles both single files and multi-file directories
- **Robust Processing**: Prevents race conditions with directory stability checks and processing locks
- **Automatic Organization**: Creates clean `[Author]/[Book Title]` library structure
@@ -18,6 +18,22 @@ Node-based containerized watcher that organizes incoming audiobook files into a
- **Rate Limiting**: Built-in Claude API rate limiting with exponential backoff retry logic

### Quick Start
**Docker Compose** (Recommended)
```
services:
  dewey:
    image: ghcr.io/masonfox/dewey:latest
    container_name: dewey
    environment:
      ANTHROPIC_API_KEY: sk-ant-xxxx
    volumes:
      - your/path/to/incoming:/data/incoming
      - your/path/to/library:/data/library
      - your/path/to/logs:/data/logs
    restart: unless-stopped
```

**Docker Run Script**
```bash
docker run -d --name dewey \
-e ANTHROPIC_API_KEY=sk-ant-xxxx \
@@ -36,6 +52,10 @@ bun install
# Set environment variables
cp .env.example .env

# change these .env values to local paths:
SOURCE_DIR=./data/incoming
DEST_DIR=./data/library

# Run the application
bun start

@@ -50,17 +70,6 @@ ANTHROPIC_API_KEY=sk-ant-xxx bun test

Drop `.mp3`/`.m4b` files or directories into the `incoming/` directory. Dewey will automatically detect and migrate them to your organized library.

### GitHub Container Registry (GHCR)
This repository automatically publishes Docker images to GitHub Container Registry on pushes to `main`/`master` branches.

**Published Image**: `ghcr.io/masonfox/dewey:latest`

The workflow builds and publishes when changes are made to:
- `Dockerfile`
- `src/**` (source code)
- `package.json` (dependencies)
- `.github/workflows/publish.yml` (CI configuration)

Ensure your repository has Actions permissions set to `Read and write packages` in Settings → Actions → General.

### How It Works
@@ -71,37 +80,11 @@ Ensure your repository has Actions permissions set to `Read and write packages`
- Multi-file directories are processed as single units
- Single files are handled individually or grouped with their parent directory
- Processing locks prevent race conditions
4. **Metadata Extraction**:
- Claude AI analyzes filenames to extract normalized author and title
4. **Metadata Extraction**:
- Uses ffprobe to pull metadata from files/directories
- Uses Claude to normalize this data for correctness and consistency
- Falls back to heuristic parsing if Claude is unavailable
- Rate limiting prevents API quota exhaustion
5. **Library Organization**: Files are moved to `DEST_DIR/[Author]/[Title]/` structure
6. **Cleanup**: Source files/directories are removed after successful migration
7. **Logging**: All operations logged to console and persistent log file

### Technical Details

#### Directory Stability
- Configurable timeout (default 5s) ensures complete uploads before processing
- Prevents partial file processing during slow network transfers
- Multiple stability checks on both directory and file modification times
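
The stability rule described above can be sketched roughly as follows. This is an illustration rather than Dewey's actual implementation: `isStable` is a hypothetical helper, and it assumes the caller has already collected modification times in milliseconds (for example from `fs.stat(...).mtimeMs`) for the directory and its files.

```javascript
// Hypothetical helper, not taken from the Dewey source.
// A path set counts as stable when nothing in it was modified
// within the last `timeoutMs` milliseconds (5s default, as in the README).
function isStable(mtimesMs, nowMs, timeoutMs = 5000) {
  // Note: an empty list is treated as stable, since there is nothing
  // that could still be changing.
  return mtimesMs.every((mtime) => nowMs - mtime >= timeoutMs);
}

// Example: one file was touched 2s ago, so the directory is not ready yet.
const now = 100000;
console.log(isStable([now - 9000, now - 2000], now)); // false
console.log(isStable([now - 9000, now - 6000], now)); // true
```

Passing the current time in as a parameter, instead of calling `Date.now()` inside the helper, keeps the check easy to unit test.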

#### AI Integration
- Claude API validation on startup with graceful fallback
- Built-in rate limiting (45 requests/minute with buffer)
- Exponential backoff retry logic for transient failures
- Structured JSON parsing with validation
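
As a rough illustration of exponential backoff for transient API failures, here is a minimal sketch. The names `backoffDelay` and `withRetry`, the 1s base delay, the 30s cap, and the retry count are all assumptions made for the example, not values taken from Dewey.

```javascript
// Hypothetical helpers, not taken from the Dewey source.

// Delay doubles with each attempt (attempt 0 -> baseMs) and is capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `fn` until it succeeds or `retries` additional attempts are used up.
// `sleep` is injectable so tests can skip real waiting.
async function withRetry(fn, options = {}) {
  const {
    retries = 3,
    baseMs = 1000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}
```

A real client would likely retry only on errors known to be transient (timeouts, HTTP 429 or 5xx) and might add jitter to the delay; both are left out here to keep the sketch short.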

#### Error Handling
- Comprehensive error logging with context
- Graceful degradation when Claude API is unavailable
- Automatic cleanup of partial migrations on failure
- Non-fatal validation errors with detailed reporting

### Notes
- **Fallback Behavior**: If Claude API fails, uses filename-based heuristics for author/title extraction
- **Idempotent Operations**: Existing library structure is respected; duplicates are handled intelligently
- **File Preservation**: Original file extensions and quality are maintained during migration
- **Resource Efficiency**: Intelligent batching and deduplication minimize unnecessary processing


7. **Logging**: All operations logged to console and persistent log file
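
The metadata step above (ffprobe first, filename heuristics as a fallback) could look roughly like the sketch below. Both function names are hypothetical and not taken from the Dewey source. The tag mapping (author from `artist`/`album_artist`, title from `album`/`title`) and the `Author - Title` filename pattern are assumptions about typical audiobook files. The input is assumed to be the JSON printed by `ffprobe -v quiet -print_format json -show_format <file>`.

```javascript
// Hypothetical helpers, not taken from the Dewey source.

// Extract author/title candidates from ffprobe's JSON output.
// Returns null when either value is missing, so the caller can fall back.
function pickAuthorTitle(ffprobeJson) {
  const tags = JSON.parse(ffprobeJson)?.format?.tags ?? {};
  // Tag key casing differs between containers, so normalize keys first.
  const lower = Object.fromEntries(
    Object.entries(tags).map(([key, value]) => [key.toLowerCase(), String(value).trim()])
  );
  const author = lower.artist || lower.album_artist || null;
  const title = lower.album || lower.title || null;
  return author && title ? { author, title } : null;
}

// Fallback heuristic: assume an "Author - Title.ext" filename.
function parseFromFilename(filename) {
  const stem = filename.replace(/\.(mp3|m4b)$/i, "");
  const idx = stem.indexOf(" - ");
  if (idx === -1) return { author: "Unknown", title: stem.trim() };
  return { author: stem.slice(0, idx).trim(), title: stem.slice(idx + 3).trim() };
}
```

In this sketch the caller would try `pickAuthorTitle` first and use `parseFromFilename` only when it returns `null`; whichever result is chosen could then be handed to Claude for normalization.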