josevelaz/mp3-parser

MP3 Parser API

An API service that analyzes MP3 files and returns the number of audio frames. Specifically designed for MPEG Version 1 Audio Layer 3 files.

Prerequisites

  • Bun v1.0.0 or higher

Setup

  1. Clone the repository:
git clone <repository-url>
cd mp3-parser
  2. Install dependencies:
bun install
  3. (Optional) Configure the port:
export PORT=3000  # Default is 3000

Running the Server

# Development with hot reload
bun run dev

# Production
bun run start

The server runs on http://localhost:3000 by default.

API Usage

POST /file-upload

Upload an MP3 file to get the frame count.

Request:

  • Content-Type: multipart/form-data
  • Body: form field named file containing the MP3 file

Response:

{
  "frameCount": 1234
}

Example with curl

curl -X POST http://localhost:3000/file-upload \
  -F "file=@/path/to/your/file.mp3"

Error Responses

| Status | Description |
|--------|-------------|
| 400 | Missing file, empty file, or invalid MP3 |
| 415 | Unsupported format (not MPEG1 Layer 3) |
| 500 | Internal server error |

Error format:

{
  "error": "Error message here"
}
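
The mapping from failures to status codes can be sketched as a single function. This is only an illustration: the class names `InvalidMp3Error` and `UnsupportedFormatError` and the helpers `toErrorPayload` and `toErrorResponse` are hypothetical, and the project's real error classes in src/parser/errors.ts may be named and structured differently.

```typescript
// Hypothetical error classes, used here only to illustrate the mapping.
class InvalidMp3Error extends Error {}
class UnsupportedFormatError extends Error {}

interface ErrorPayload {
  status: number;
  body: { error: string };
}

// Map a thrown value to one of the documented status codes.
function toErrorPayload(err: unknown): ErrorPayload {
  if (err instanceof UnsupportedFormatError) {
    return { status: 415, body: { error: err.message } };
  }
  if (err instanceof InvalidMp3Error) {
    return { status: 400, body: { error: err.message } };
  }
  // Anything unexpected becomes a generic 500 so internals are not leaked.
  return { status: 500, body: { error: "Internal server error" } };
}

// Wrap the payload in a standard Fetch API Response.
function toErrorResponse(err: unknown): Response {
  const { status, body } = toErrorPayload(err);
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}
```

Keeping the mapping in one place means controllers can simply throw, and the response shape stays consistent across endpoints.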

Testing

Run the test suite:

bun test

Validation

To validate that the API returns accurate frame counts, use the included sample file.

Quick Validation

  1. Start the server:
bun run start
  2. In another terminal, upload the sample file:
curl -X POST http://localhost:3000/file-upload \
  -F "file=@sample_file.mp3"

Expected output:

{"frameCount":6089}

Verification with mediainfo

Use mediainfo to verify the frame count matches exactly:

mediainfo --fullscan sample_file.mp3 | grep "Frame count"
# Output: Frame count : 6089

The API returns 6089 frames, which matches the mediainfo count exactly.

Sample File Details

| Property | Value |
|----------|-------|
| Format | MPEG Version 1 Layer III |
| Bitrate | VBR (~64 kbps average) |
| Sample Rate | 44.1 kHz |
| Channels | Stereo |
| Duration | ~159 seconds |
| Expected Frames | 6089 |
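
As a quick sanity check, the frame count and the duration should agree, because every MPEG1 Layer III frame carries 1152 samples per channel. The helper name `durationSeconds` below is illustrative and not part of the project.

```typescript
// MPEG1 Layer III frames always hold 1152 PCM samples per channel.
const SAMPLES_PER_FRAME = 1152;

// Duration implied by a frame count at a given sample rate.
function durationSeconds(frameCount: number, sampleRateHz: number): number {
  return (frameCount * SAMPLES_PER_FRAME) / sampleRateHz;
}

const seconds = durationSeconds(6089, 44100);
console.log(seconds.toFixed(2)); // prints "159.06"
```

The result is consistent with the ~159 second duration listed for the sample file.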

Design Decisions

Why No HTTP Framework?

This project uses Bun's native Bun.serve() API directly rather than a framework like Express or Hono. Here's why:

  1. Bun.serve() is production-ready: Bun's HTTP server is built on top of a highly optimized C++ implementation. It handles routing, request/response lifecycle, and even WebSockets out of the box.

  2. Minimal overhead: For a focused API with a single endpoint, adding a framework introduces unnecessary abstraction. The native API gives us exactly what we need:

    • Route registration with method-specific handlers
    • Request/Response objects that follow web standards (Fetch API)
    • Built-in support for multipart/form-data parsing
  3. Type safety without wrappers: The Request and Response types are standard web APIs, well-documented and familiar to anyone who's worked with modern JavaScript.

  4. Educational clarity: This project demonstrates how HTTP routing works at its core—a Map of paths to handlers with method discrimination. Frameworks often obscure these fundamentals.

The HTTPServer class in this project is a thin wrapper (~100 lines) that provides:

  • Chainable route registration (server.post("/upload", handler))
  • Centralized error handling for domain-specific errors
  • Clean separation between HTTP infrastructure and business logic
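
To make the "Map of paths to handlers with method discrimination" idea concrete, here is a minimal sketch. It is not the project's HTTPServer class: `MiniRouter`, `json`, the 404/405 behavior, and the placeholder handler are all assumptions made for illustration.

```typescript
type Handler = (req: Request) => Response | Promise<Response>;

// Small JSON helper (hypothetical; the project has its own jsonResponse).
function json(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Hypothetical stand-in for a routing table keyed by method and path.
class MiniRouter {
  // Keys look like "POST /file-upload".
  private readonly routes = new Map<string, Handler>();

  post(path: string, handler: Handler): this {
    this.routes.set(`POST ${path}`, handler);
    return this; // returning `this` makes registration chainable
  }

  get(path: string, handler: Handler): this {
    this.routes.set(`GET ${path}`, handler);
    return this;
  }

  handle(req: Request): Response | Promise<Response> {
    const { pathname } = new URL(req.url);
    const handler = this.routes.get(`${req.method} ${pathname}`);
    if (handler) return handler(req);

    // Assumption: a known path with the wrong method is 405, anything else 404.
    const pathKnown = Array.from(this.routes.keys()).some((key) =>
      key.endsWith(` ${pathname}`),
    );
    return pathKnown
      ? json({ error: "Method not allowed" }, 405)
      : json({ error: "Not found" }, 404);
  }
}

// Placeholder handler; a real one would parse the upload and count frames.
const router = new MiniRouter().post("/file-upload", () => json({ frameCount: 0 }));

// With Bun, the router could back the server's fetch handler:
// Bun.serve({ port: 3000, fetch: (req) => router.handle(req) });
```

Because `handle` only uses the standard Request and Response types, the routing logic can be exercised in tests without starting a server.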

How the MP3 Parser Works

The parser implements frame counting for MPEG Version 1 Layer 3 audio files according to the ISO/IEC 11172-3 MPEG Audio specification. For a practical reference, see the MP3 Frame Header documentation.

MP3 File Structure Overview

An MP3 file is a sequence of frames, where each frame contains:

  1. A 4-byte header with metadata (bitrate, sample rate, etc.)
  2. Optional CRC (2 bytes, present only when the protection bit is 0)
  3. Side information (17 bytes for mono, 32 bytes for all other channel modes)
  4. The actual audio data
┌─────────────────────────────────────────────────────────────────┐
│ [ID3v2 Tag]  │  Frame 1  │  Frame 2  │  Frame 3  │  ...  │ EOF │
│  (optional)  │           │           │           │       │     │
└─────────────────────────────────────────────────────────────────┘
                     │
                     ▼
         ┌───────────────────────────┐
         │  Header (4 bytes)         │
         │  Side Info (17/32 bytes)  │
         │  Audio Data               │
         └───────────────────────────┘

Frame Header Structure

The 4-byte frame header is the key to parsing. Here's its bit layout:

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

A (11 bits) - Frame sync (all bits set = 0x7FF)
B (2 bits)  - MPEG Audio version (11 = MPEG1, 10 = MPEG2, etc.)
C (2 bits)  - Layer (01 = Layer III, 10 = Layer II, 11 = Layer I)
D (1 bit)   - Protection bit (0 = CRC follows the header, 1 = no CRC)
E (4 bits)  - Bitrate index (lookup table)
F (2 bits)  - Sample rate index (lookup table)
G (1 bit)   - Padding bit
H (1 bit)   - Private bit
I (2 bits)  - Channel mode (00 = Stereo, 11 = Mono, etc.)
J (2 bits)  - Mode extension
K (1 bit)   - Copyright
L (1 bit)   - Original
M (2 bits)  - Emphasis
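
The bit layout maps directly onto shifts and masks once the four bytes are read as one big-endian 32-bit integer. The following is a sketch only; `RawFrameHeader` and `decodeFrameHeader` are illustrative names, not the project's FrameHeader type or parser API.

```typescript
interface RawFrameHeader {
  versionBits: number;     // B: 0b11 = MPEG1
  layerBits: number;       // C: 0b01 = Layer III
  hasCrc: boolean;         // D: protection bit 0 means a CRC follows
  bitrateIndex: number;    // E
  sampleRateIndex: number; // F
  padding: number;         // G: 0 or 1
  channelModeBits: number; // I: 0b00 = stereo, 0b11 = mono
}

// Decode the 4-byte header at `offset`, or return null if there is no sync.
function decodeFrameHeader(bytes: Uint8Array, offset = 0): RawFrameHeader | null {
  if (offset < 0 || offset + 4 > bytes.length) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const h = view.getUint32(offset); // big-endian by default

  // A: the top 11 bits must all be set.
  if ((h >>> 21) !== 0x7ff) return null;

  return {
    versionBits: (h >>> 19) & 0x3,
    layerBits: (h >>> 17) & 0x3,
    hasCrc: ((h >>> 16) & 0x1) === 0,
    bitrateIndex: (h >>> 12) & 0xf,
    sampleRateIndex: (h >>> 10) & 0x3,
    padding: (h >>> 9) & 0x1,
    channelModeBits: (h >>> 6) & 0x3,
  };
}

// 0xFF 0xFB 0x90 0x00: MPEG1, Layer III, no CRC, bitrate index 9, 44.1 kHz, stereo.
const header = decodeFrameHeader(new Uint8Array([0xff, 0xfb, 0x90, 0x00]));
console.log(header);
```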

Parsing Algorithm

  1. Skip ID3v2 tags: If the file starts with "ID3", read the tag size (stored as a syncsafe integer) and skip past it.

  2. Find frame sync: Scan for the sync pattern—first 11 bits all set to 1 (0xFF followed by 0xE0 mask on the second byte).

  3. Validate header: Extract version, layer, bitrate index, and sample rate index. Reject frames that aren't MPEG1 Layer 3 or have invalid indices.

  4. Skip Xing/Info frames: VBR files (and some CBR files) begin with a metadata frame containing a "Xing" or "Info" identifier. This frame has a valid header but no audio, so the parser skips it.

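  5. Calculate frame size: Use the formula below, where bitrate is in bits per second (the table value in kbps × 1000), sampleRate is in Hz, and padding is the padding bit (0 or 1):

    frameSize = floor(144 × bitrate / sampleRate) + padding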
    
  6. Count and advance: Increment the frame count and jump ahead by frameSize bytes to the next frame.
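
The steps above, minus Xing/Info detection (step 4), can be sketched as a single loop. This is not the project's MP3Parser class: `id3v2Length` and `countFrames` are hypothetical names, and two behaviors are assumptions of this sketch, namely resynchronizing one byte at a time after an invalid header and not counting a truncated final frame.

```typescript
// Index 0 (free format) and 15 (reserved) use 0 as an "unsupported" marker.
const BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
// Index 3 is reserved.
const SAMPLE_RATES_HZ = [44100, 48000, 32000, 0];

// Step 1: total length of a leading ID3v2 tag, or 0 if there is none.
function id3v2Length(bytes: Uint8Array): number {
  const hasTag =
    bytes.length >= 10 && bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33; // "ID3"
  if (!hasTag) return 0;
  // Syncsafe integer: four bytes with 7 significant bits each.
  const size =
    ((bytes[6] & 0x7f) << 21) | ((bytes[7] & 0x7f) << 14) |
    ((bytes[8] & 0x7f) << 7) | (bytes[9] & 0x7f);
  const footer = (bytes[5] & 0x10) !== 0 ? 10 : 0; // optional 10-byte footer
  return 10 + size + footer; // 10-byte tag header + payload (+ footer)
}

function countFrames(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let pos = id3v2Length(bytes);
  let count = 0;

  while (pos + 4 <= bytes.length) {
    const h = view.getUint32(pos); // header as a big-endian 32-bit value
    const kbps = BITRATES_KBPS[(h >>> 12) & 0xf];
    const hz = SAMPLE_RATES_HZ[(h >>> 10) & 0x3];

    // Steps 2 and 3: sync bits, MPEG1 (0b11), Layer III (0b01), valid indices.
    const isMpeg1Layer3 =
      (h >>> 21) === 0x7ff && ((h >>> 19) & 0x3) === 0x3 && ((h >>> 17) & 0x3) === 0x1;
    if (!isMpeg1Layer3 || !kbps || !hz) {
      pos += 1; // assumption: resync one byte at a time
      continue;
    }

    // Step 5: bitrate in bits per second, padding bit is 0 or 1.
    const frameSize = Math.floor((144 * kbps * 1000) / hz) + ((h >>> 9) & 0x1);
    if (pos + frameSize > bytes.length) break; // assumption: ignore a truncated frame

    // Step 6: count and jump to the next header.
    count += 1;
    pos += frameSize;
  }
  return count;
}
```

Jumping by `frameSize` rather than scanning every byte is what keeps the parser from mistaking sync-like patterns inside audio data for frame headers.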

Bitrate and Sample Rate Tables

For MPEG1 Layer 3, these are the valid values:

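| Bitrate Index | Bitrate (kbps) |
|---------------|----------------|
| 0 | free format |
| 1 | 32 |
| 2 | 40 |
| 3 | 48 |
| 4 | 56 |
| 5 | 64 |
| 6 | 80 |
| 7 | 96 |
| 8 | 112 |
| 9 | 128 |
| 10 | 160 |
| 11 | 192 |
| 12 | 224 |
| 13 | 256 |
| 14 | 320 |
| 15 | reserved |

| Sample Rate Index | Sample Rate (Hz) |
|-------------------|------------------|
| 0 | 44100 |
| 1 | 48000 |
| 2 | 32000 |
| 3 | reserved |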
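
As a worked example, the tables combine with the frame size formula as follows. The function name `frameSizeBytes` and the use of 0 as a marker for unsupported entries are choices made for this sketch, not taken from the project's constants.ts.

```typescript
// 0 marks entries this sketch does not support (free format, reserved).
const BITRATE_TABLE_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
const SAMPLE_RATE_TABLE_HZ = [44100, 48000, 32000, 0];

// Frame size in bytes, or null when either index is unsupported.
function frameSizeBytes(
  bitrateIndex: number,
  sampleRateIndex: number,
  padding: number,
): number | null {
  const kbps = BITRATE_TABLE_KBPS[bitrateIndex];
  const hz = SAMPLE_RATE_TABLE_HZ[sampleRateIndex];
  if (kbps === undefined || hz === undefined || kbps === 0 || hz === 0) return null;
  // The formula expects bits per second, so convert from kbps first.
  return Math.floor((144 * kbps * 1000) / hz) + padding;
}

console.log(frameSizeBytes(9, 0, 0));  // 128 kbps at 44.1 kHz: 417
console.log(frameSizeBytes(9, 0, 1));  // same frame with padding: 418
console.log(frameSizeBytes(14, 1, 0)); // 320 kbps at 48 kHz: 960
console.log(frameSizeBytes(15, 0, 0)); // reserved bitrate index: null
```

At 44.1 kHz the division is never exact, which is why encoders set the padding bit on some frames to keep the average bitrate on target.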

Why Skip Xing/Info Frames?

VBR (Variable Bit Rate) files include a special first frame that contains:

  • Total frame count
  • Total file size
  • TOC (table of contents) for seeking
  • Quality indicator

This frame uses a valid MP3 header but contains no audio data; it is purely metadata. The identifier ("Xing" for VBR or "Info" for CBR) appears after the header and side information. The parser detects and skips this frame so that the count covers audio frames only, matching tools like mediainfo.
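
A detection check along these lines might look like the sketch below. It assumes an MPEG1 Layer III frame and that the identifier sits immediately after the 4-byte header and the side information, as described above; `isXingOrInfoFrame` is a hypothetical name, not the project's API.

```typescript
const XING_TAG = 0x58696e67; // ASCII "Xing"
const INFO_TAG = 0x496e666f; // ASCII "Info"

// `frame` must start at the first byte of an MPEG1 Layer III frame header.
function isXingOrInfoFrame(frame: Uint8Array): boolean {
  if (frame.length < 4) return false;
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const h = view.getUint32(0);

  // Channel mode 0b11 is mono: 17 bytes of side info, otherwise 32.
  const isMono = ((h >>> 6) & 0x3) === 0x3;
  const tagOffset = 4 + (isMono ? 17 : 32);
  if (tagOffset + 4 > frame.length) return false;

  const tag = view.getUint32(tagOffset);
  return tag === XING_TAG || tag === INFO_TAG;
}

// Stereo frame (channel mode 00): the identifier sits at byte 36.
const stereoMeta = new Uint8Array(417);
stereoMeta.set([0xff, 0xfb, 0x90, 0x00]);
stereoMeta.set([0x58, 0x69, 0x6e, 0x67], 36); // "Xing"
console.log(isXingOrInfoFrame(stereoMeta)); // true
```

A parser would run this check on the first valid frame only, advance past that frame if it matches, and start counting from the next one.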

Project Structure

src/
├── index.ts           # Root barrel export
├── http/              # HTTP infrastructure
│   ├── server.ts      # HTTPServer class with Bun.serve()
│   ├── routes.ts      # Route definitions
│   ├── types.ts       # HTTP types
│   └── helpers.ts     # Utilities (jsonResponse)
├── parser/            # MP3 parsing domain
│   ├── mp3-parser.ts  # MP3Parser class
│   ├── constants.ts   # Bitrate/sample rate tables, identifiers
│   ├── types.ts       # FrameHeader type
│   └── errors.ts      # Custom error classes
└── controllers/       # Request handlers
    └── file-upload.controller.ts

See AGENTS.md for detailed architecture documentation.
