Merged
19 commits
a576c39
chore: adjust CHANGELOG/CONTRIBUTING headings for mdBook inclusion
Arkptz Apr 22, 2026
2da59f9
docs(book): add mdBook scaffold with book.toml and all chapter content
Arkptz Apr 22, 2026
ffaf835
ci: add mdbook build and Pages deploy workflow
Arkptz Apr 22, 2026
801fe41
docs(readme): trim content migrated to book, add docs badge
Arkptz Apr 22, 2026
6acd23c
fix(book): restore include directive for benchmarks chapter
Arkptz Apr 22, 2026
f50f097
fix(docs): watch docs/ path for benchmark-driven rebuilds
Arkptz Apr 22, 2026
11d2cd2
fix(book): restore strict linkcheck error policy with targeted excludes
Arkptz Apr 22, 2026
c73cc24
docs(strict): document that only parse_error counter is populated
Arkptz Apr 24, 2026
a947484
fix(reader): reject symlinked directory inputs and entries
Arkptz Apr 24, 2026
40f7042
test(security): cover symlink directory and entry rejection
Arkptz Apr 24, 2026
36bfb3f
fix(har): apply header size caps consistent with mitmproxy reader
Arkptz Apr 24, 2026
586eff8
docs(book): correct templates YAML shape and parameter placeholder names
Arkptz Apr 24, 2026
7bb82b4
docs(install): fix download URL to include release version
Arkptz Apr 24, 2026
f297294
docs(security): rewrite TOCTOU section to match actual implementation
Arkptz Apr 24, 2026
72d2c18
docs(diagnostics): rename event category from flow_rejected to rejected
Arkptz Apr 24, 2026
7e96434
docs(diagnostics): clarify tnetstring parse error halts file processing
Arkptz Apr 24, 2026
73023b6
docs(intro): replace 'milliseconds' claim with actual benchmark ratio
Arkptz Apr 24, 2026
019efe1
docs(cli): clarify format auto-detect uses extension and content
Arkptz Apr 24, 2026
3445b77
docs(bench): strip duplicate H1 from generated benchmarks file
Arkptz Apr 24, 2026
84 changes: 84 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,84 @@
name: Docs

on:
  push:
    branches: [main]
    paths:
      - 'book/**'
      - 'docs/**'
      - '.github/workflows/docs.yml'
      - 'CHANGELOG.md'
      - 'CONTRIBUTING.md'
      - 'README.md'
  pull_request:
    paths:
      - 'book/**'
      - 'docs/**'
      - '.github/workflows/docs.yml'
      - 'CHANGELOG.md'
      - 'CONTRIBUTING.md'
      - 'README.md'
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: docs-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: dtolnay/rust-toolchain@stable

      # Bump cache key when any tool version in the install step changes
      - name: Cache mdbook binaries
        id: cache-mdbook
        uses: actions/cache@v5
        with:
          path: ~/.cargo/bin/mdbook*
          key: mdbook-v2-${{ hashFiles('.github/workflows/docs.yml') }}

      - name: Install mdbook and plugins
        if: steps.cache-mdbook.outputs.cache-hit != 'true'
        run: |
          cargo install \
            mdbook@0.4.40 \
            mdbook-linkcheck@0.7.7 \
            mdbook-toc@0.14.2 \
            mdbook-admonish@1.18.0 \
            mdbook-mermaid@0.14.1

      - name: Build book
        run: mdbook build book

      - name: Upload Pages artifact
        if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
        uses: actions/upload-pages-artifact@v4
        with:
          path: target/book/html

      - name: Verify build (PR)
        if: github.event_name == 'pull_request'
        run: |
          test -f target/book/html/index.html
          test -s target/book/html/index.html
          echo "Build OK"

  deploy:
    needs: build
    if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
3 changes: 3 additions & 0 deletions .gitignore
@@ -27,6 +27,9 @@ result-*
# Ruff cache (leftover from Python tooling)
.ruff_cache/

# mdBook build output
/target/book/

# integration test artifacts
tests/integration/level1/fixtures/*.flow
tests/integration/level1/out/
2 changes: 0 additions & 2 deletions CHANGELOG.md
@@ -1,5 +1,3 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
2 changes: 0 additions & 2 deletions CONTRIBUTING.md
@@ -1,5 +1,3 @@
# Contributing — Local Testing Guide

This document covers how to run the three test tracks locally.

## Prerequisites
1 change: 1 addition & 0 deletions Cargo.toml
@@ -17,6 +17,7 @@ exclude = [
"docs/demo.mp4", ".github/**", "scripts/**",
"flake.nix", "flake.lock", ".envrc", ".direnv/**",
".sisyphus/**", ".ruff_cache/**",
"book/**", "docs/**",
]

[[bin]]
212 changes: 8 additions & 204 deletions README.md
@@ -11,6 +11,7 @@ A Rust rewrite of [mitmproxy2swagger](https://github.com/alufers/mitmproxy2swagg
[![Crates.io](https://img.shields.io/crates/v/mitm2openapi.svg)](https://crates.io/crates/mitm2openapi)
[![Downloads](https://img.shields.io/crates/d/mitm2openapi.svg)](https://crates.io/crates/mitm2openapi)
[![docs.rs](https://img.shields.io/docsrs/mitm2openapi)](https://docs.rs/mitm2openapi)
[![docs](https://img.shields.io/badge/docs-arkptz.github.io-blue)](https://arkptz.github.io/mitm2openapi/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

<img src="docs/demo.gif" alt="Demo: capture → discover → generate → browse Swagger UI" width="720">
@@ -39,17 +40,13 @@ Credit to [@alufers](https://github.com/alufers) for the original tool that pion

## Installation

### From binary releases

Download a pre-built binary from [GitHub Releases](https://github.com/Arkptz/mitm2openapi/releases).

### From source

```bash
cargo install --git https://github.com/Arkptz/mitm2openapi
cargo install mitm2openapi
```

## Quick Start
Or download a pre-built binary from [GitHub Releases](https://github.com/Arkptz/mitm2openapi/releases).

## Quick start

```bash
# 1. Capture traffic with mitmproxy
@@ -64,206 +61,13 @@ mitm2openapi discover -i capture.flow -o templates.yaml -p "https://api.example.
mitm2openapi generate -i capture.flow -t templates.yaml -o openapi.yaml -p "https://api.example.com"
```

### Skip the manual edit

If you know which paths you care about up front, use `--exclude-patterns`
and `--include-patterns` to let `discover` do the curation:

```bash
mitm2openapi discover \
-i capture.flow -o templates.yaml -p "https://api.example.com" \
--exclude-patterns '/static/**,/images/**,*.css,*.js,*.svg' \
--include-patterns '/api/**,/v2/**'

mitm2openapi generate \
-i capture.flow -t templates.yaml -o openapi.yaml -p "https://api.example.com"
```

Paths matching `--include-patterns` are auto-activated (emitted without
the `ignore:` prefix). Paths matching `--exclude-patterns` are dropped
entirely. Everything else still gets `ignore:` for manual review.
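
The three outcomes described above can be sketched in Rust roughly as follows. This is an illustration only, not the tool's actual matcher: the names `Disposition`, `glob_matches`, and `classify` are hypothetical, matching is done on whole `/`-separated segments (so suffix patterns such as `*.css` are not covered), and checking exclude patterns before include patterns is an assumption.

```rust
// Hypothetical sketch of include/exclude curation; not mitm2openapi's real code.

#[derive(Debug, PartialEq)]
enum Disposition {
    Dropped,     // matched an exclude pattern: omitted from the templates file
    Activated,   // matched an include pattern: emitted without `ignore:`
    NeedsReview, // matched neither: emitted with the `ignore:` prefix
}

/// Segment-wise match: `*` consumes exactly one segment, `**` any number.
fn segments_match(pattern: &[&str], path: &[&str]) -> bool {
    let Some((head, rest)) = pattern.split_first() else {
        return path.is_empty();
    };
    if *head == "**" {
        // Let `**` swallow 0, 1, 2, ... leading path segments.
        return (0..=path.len()).any(|skip| segments_match(rest, &path[skip..]));
    }
    match path.split_first() {
        Some((seg, path_rest)) => {
            (*head == "*" || head == seg) && segments_match(rest, path_rest)
        }
        None => false,
    }
}

fn split(s: &str) -> Vec<&str> {
    s.split('/').filter(|part| !part.is_empty()).collect()
}

fn glob_matches(pattern: &str, path: &str) -> bool {
    segments_match(&split(pattern), &split(path))
}

fn classify(path: &str, include: &[&str], exclude: &[&str]) -> Disposition {
    // Assumption: exclusion wins when both lists match the same path.
    if exclude.iter().any(|p| glob_matches(p, path)) {
        Disposition::Dropped
    } else if include.iter().any(|p| glob_matches(p, path)) {
        Disposition::Activated
    } else {
        Disposition::NeedsReview
    }
}

fn main() {
    let include = ["/api/**", "/v2/**"];
    let exclude = ["/static/**", "/images/**"];
    for path in ["/api/users/42", "/static/app.js", "/health"] {
        println!("{path} -> {:?}", classify(path, &include, &exclude));
    }
}
```

Splitting on `/` before matching is what keeps `*` from crossing a segment boundary, while `**` is handled by trying every possible number of skipped segments.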

<details>
<summary><strong>CLI Reference</strong> (click to expand)</summary>

### `discover`

Scan captured traffic and produce a templates file listing all observed endpoints.

```
mitm2openapi discover [OPTIONS] -i <INPUT> -o <OUTPUT> -p <PREFIX>
```

| Option | Description |
|--------|-------------|
| `-i, --input <PATH>` | Input file (flow dump or HAR) |
| `-o, --output <PATH>` | Output YAML templates file |
| `-p, --prefix <URL>` | API prefix URL to filter requests |
| `--format <FORMAT>` | Input format: `auto`, `har`, `mitmproxy` (default: `auto`) |
| `--exclude-patterns <GLOBS>` | Comma-separated globs; matching paths are dropped entirely. `*` = single segment, `**` = any subtree. E.g. `/static/**,*.css` |
| `--include-patterns <GLOBS>` | Comma-separated globs; matching paths are emitted without `ignore:` (auto-activated for `generate`) |
| `--max-input-size <BYTES>` | Maximum input file size (default: `2GiB`). Accepts suffixes: `KiB`, `MiB`, `GiB` |
| `--allow-symlinks` | Allow symlinked input files (default: rejected for safety) |
| `--strict` | Treat warnings as errors; exit code 2 if any cap fires, flow is rejected, or parse error occurs |
| `--report <PATH>` | Write a structured JSON processing report to the given path |

### `generate`

Generate an OpenAPI 3.0 spec from captured traffic using a curated templates file.

```
mitm2openapi generate [OPTIONS] -i <INPUT> -t <TEMPLATES> -o <OUTPUT> -p <PREFIX>
```

| Option | Description |
|--------|-------------|
| `-i, --input <PATH>` | Input file (flow dump or HAR) |
| `-t, --templates <PATH>` | Templates YAML file (from `discover`) |
| `-o, --output <PATH>` | Output OpenAPI YAML file |
| `-p, --prefix <URL>` | API prefix URL |
| `--format <FORMAT>` | Input format: `auto`, `har`, `mitmproxy` (default: `auto`) |
| `--openapi-title <TITLE>` | Custom title for the spec |
| `--openapi-version <VER>` | Custom spec version (default: `1.0.0`) |
| `--exclude-headers <LIST>` | Comma-separated headers to exclude |
| `--exclude-cookies <LIST>` | Comma-separated cookies to exclude |
| `--include-headers` | Include headers in the spec |
| `--ignore-images` | Ignore image content types |
| `--suppress-params` | Suppress parameter suggestions |
| `--tags-overrides <JSON>` | JSON string for tag overrides |
| `--max-input-size <BYTES>` | Maximum input file size (default: `2GiB`). Accepts suffixes: `KiB`, `MiB`, `GiB` |
| `--max-payload-size <BYTES>` | Maximum tnetstring payload size (default: `256MiB`) |
| `--max-depth <N>` | Maximum tnetstring nesting depth (default: `256`) |
| `--max-body-size <BYTES>` | Maximum request/response body size (default: `64MiB`) |
| `--allow-symlinks` | Allow symlinked input files (default: rejected for safety) |
| `--strict` | Treat warnings as errors; exit code 2 if any cap fires, flow is rejected, or parse error occurs |
| `--report <PATH>` | Write a structured JSON processing report to the given path |

</details>

## Resource Limits

To prevent denial-of-service when processing untrusted captures, `mitm2openapi`
enforces several configurable limits:

| Flag | Default | Purpose |
|------|---------|---------|
| `--max-input-size` | 2 GiB | Reject files larger than this before reading |
| `--max-payload-size` | 256 MiB | Cap on individual tnetstring payload allocation |
| `--max-depth` | 256 | Recursion depth limit for nested tnetstring structures |
| `--max-body-size` | 64 MiB | Maximum request/response body considered during schema inference |
| `--allow-symlinks` | off | By default, symlinked inputs are rejected to prevent path-traversal on shared CI runners |

In addition to the configurable limits above, the following per-field caps are
applied unconditionally to prevent data corruption:

| Field | Cap | Behaviour |
|-------|-----|-----------|
| Header name | 8 KiB | Dropped (other headers still processed) |
| Header value | 64 KiB | Truncated to cap |
| Form fields per request | 1 000 | Excess fields ignored |
| URL scheme | `http` / `https` only | Non-HTTP flows silently skipped |
| Port number | 1–65 535 | Out-of-range port drops the request |
| HTTP status code | 100–599 | Invalid codes treated as no response |
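
As a rough illustration of the two header rows in the table above, the following Rust sketch drops headers whose name exceeds 8 KiB and truncates values to 64 KiB. The function and constant names are hypothetical, and treating the cap as inclusive (a name of exactly 8 KiB is kept) is an assumption rather than documented behaviour.

```rust
// Hypothetical sketch of the per-header caps; not mitm2openapi's real code.

const MAX_HEADER_NAME_BYTES: usize = 8 * 1024;
const MAX_HEADER_VALUE_BYTES: usize = 64 * 1024;

/// A raw header as (name bytes, value bytes).
type Header = (Vec<u8>, Vec<u8>);

fn apply_header_caps(headers: Vec<Header>) -> Vec<Header> {
    headers
        .into_iter()
        // Oversized name: drop this header, keep processing the others.
        .filter(|(name, _)| name.len() <= MAX_HEADER_NAME_BYTES)
        // Oversized value: keep the header but truncate the value to the cap.
        .map(|(name, mut value)| {
            value.truncate(MAX_HEADER_VALUE_BYTES);
            (name, value)
        })
        .collect()
}

fn main() {
    let headers = vec![
        (b"content-type".to_vec(), b"application/json".to_vec()),
        (vec![b'x'; MAX_HEADER_NAME_BYTES + 1], b"dropped".to_vec()),
        (b"cookie".to_vec(), vec![b'a'; MAX_HEADER_VALUE_BYTES + 10]),
    ];
    let capped = apply_header_caps(headers);
    println!("{} headers kept", capped.len());
}
```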

Identity fields (scheme, host, path, method, header names) require valid UTF-8.
Flows with non-UTF-8 identity bytes are skipped to prevent data aliasing through
replacement-character collisions. Control characters in paths are stripped
automatically.
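
The aliasing risk mentioned above can be shown with the Rust standard library alone. In this sketch, the helper names `identity_field` and `strip_control` are hypothetical; it only demonstrates why strict decoding is safer than lossy decoding, and makes no claim about how the tool implements the check.

```rust
// Hypothetical sketch: strict vs lossy UTF-8 decoding of identity fields.

/// Strict decoding: any invalid byte rejects the whole field.
fn identity_field(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Remove control characters from an already-decoded path.
fn strip_control(path: &str) -> String {
    path.chars().filter(|c| !c.is_control()).collect()
}

fn main() {
    let a: &[u8] = b"/users/\xff";
    let b: &[u8] = b"/users/\xfe";

    // The raw paths differ, but lossy decoding maps both invalid bytes to
    // U+FFFD, so the two paths would collapse into one spec entry.
    assert_ne!(a, b);
    assert_eq!(String::from_utf8_lossy(a), String::from_utf8_lossy(b));

    // Strict decoding refuses both instead of aliasing them.
    assert!(identity_field(a).is_none());
    assert!(identity_field(b).is_none());

    // Valid UTF-8 passes through; control characters are then stripped.
    assert_eq!(identity_field(b"/users/42"), Some("/users/42"));
    assert_eq!(strip_control("/users/\u{0007}42"), "/users/42");
}
```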

Increase `--max-input-size` if you work with captures larger than 2 GiB (e.g.
`--max-input-size 8GiB`). The other limits rarely need tuning.
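
For reference, values such as `8GiB` could be turned into a byte count along these lines. This is a minimal sketch, assuming binary (power-of-two) multipliers and that a bare number means bytes; `parse_size` is a hypothetical helper, not the tool's actual parser.

```rust
// Hypothetical sketch of size-suffix parsing; not mitm2openapi's real code.

/// Parse "2GiB", "256MiB", "512KiB", or a bare byte count into bytes.
/// Returns None for malformed input or when the result overflows u64.
fn parse_size(input: &str) -> Option<u64> {
    let input = input.trim();
    let (digits, multiplier) = if let Some(n) = input.strip_suffix("KiB") {
        (n, 1u64 << 10)
    } else if let Some(n) = input.strip_suffix("MiB") {
        (n, 1u64 << 20)
    } else if let Some(n) = input.strip_suffix("GiB") {
        (n, 1u64 << 30)
    } else {
        // Assumption: no suffix means the value is already in bytes.
        (input, 1)
    };
    digits.trim().parse::<u64>().ok()?.checked_mul(multiplier)
}

fn main() {
    for value in ["2GiB", "256MiB", "64MiB", "8GiB", "lots"] {
        println!("{value} -> {:?}", parse_size(value));
    }
}
```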

Both mitmproxy flow files and HAR files are processed incrementally — memory usage
stays bounded regardless of input size.

## Diagnostics

When the tnetstring parser encounters corruption in a mitmproxy flow file, it
halts and emits a warn-level log with the byte offset, number of successfully
parsed entries, and an error classification. No resync is attempted — binary
payloads can contain bytes that mimic valid tnetstring length prefixes, so
scanning forward would produce phantom flows.
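
The halt-without-resync behaviour can be sketched as a scanner over concatenated top-level tnetstrings (`<length>:<payload><type tag>`). The Rust below is a simplified illustration under stated assumptions: `scan` and `ScanResult` are hypothetical names, nested payloads are not validated, and the accepted type tags follow the mitmproxy tnetstring variant.

```rust
// Hypothetical sketch of "stop at the first corrupt entry"; not the real parser.

#[derive(Debug, PartialEq)]
struct ScanResult {
    entries: usize,              // top-level entries parsed successfully
    error_offset: Option<usize>, // byte offset where parsing halted, if any
}

/// Return the end offset of the entry starting at `start`, or None if malformed.
fn next_entry(buf: &[u8], start: usize) -> Option<usize> {
    let colon = buf[start..].iter().position(|&b| b == b':')? + start;
    let len_str = std::str::from_utf8(&buf[start..colon]).ok()?;
    if len_str.is_empty() || !len_str.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let len: usize = len_str.parse().ok()?;
    let tag_pos = (colon + 1).checked_add(len)?;
    let tag = *buf.get(tag_pos)?; // None when the payload is truncated
    if matches!(tag, b',' | b';' | b'#' | b'^' | b'!' | b'~' | b']' | b'}') {
        Some(tag_pos + 1)
    } else {
        None
    }
}

fn scan(buf: &[u8]) -> ScanResult {
    let mut offset = 0;
    let mut entries = 0;
    while offset < buf.len() {
        match next_entry(buf, offset) {
            Some(end) => {
                entries += 1;
                offset = end;
            }
            // Halt here: scanning forward for the next plausible length
            // prefix could land inside a binary payload.
            None => return ScanResult { entries, error_offset: Some(offset) },
        }
    }
    ScanResult { entries, error_offset: None }
}

fn main() {
    println!("{:?}", scan(b"5:hello,3:abc,"));
    println!("{:?}", scan(b"5:hello,9:trunc"));
}
```

The returned entry count and offset correspond to the two pieces of information the warn-level log is described as carrying.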

### Structured report (`--report`)

Pass `--report <PATH>` to either `discover` or `generate` to write a JSON
processing summary. This is useful for CI pipelines that need structured data
instead of log scraping.

```json
{
"report_version": 1,
"tool_version": "0.2.3",
"input": {
"path": "capture.flow",
"format": "Auto",
"size_bytes": 102400
},
"result": {
"flows_read": 150,
"flows_emitted": 148,
"paths_in_spec": 12
},
"events": {
"parse_error": {
"TNetString parse error at byte 98304: unexpected end of input": 1
}
}
}
```

### Strict mode
## Documentation

Pass `--strict` to either `discover` or `generate` to treat any warning-level
event as a hard failure. The process exits with code 2 if any resource cap
fired, a flow was rejected, or a parse error was encountered.

This is designed for CI gates where silent degradation is unacceptable:

```bash
mitm2openapi discover -i capture.flow -o templates.yaml -p https://api.example.com --strict \
|| echo "FAIL: corrupt or over-limit flows detected"
```

Without `--strict`, the same conditions are logged at warn level and processing
continues (exit code 0).
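
The exit-code policy described in this section amounts to a small decision function. The Rust sketch below is illustrative only: `EventCounts` and `exit_code` are hypothetical names, and the three counters simply mirror the three conditions listed above.

```rust
// Hypothetical sketch of the --strict exit-code policy; not the real code.

#[derive(Default)]
struct EventCounts {
    caps_fired: u64,
    flows_rejected: u64,
    parse_errors: u64,
}

/// Exit code 2 only when strict mode is on and at least one event occurred.
fn exit_code(strict: bool, events: &EventCounts) -> i32 {
    let degraded = events.caps_fired > 0
        || events.flows_rejected > 0
        || events.parse_errors > 0;
    if strict && degraded {
        2
    } else {
        0
    }
}

fn main() {
    let events = EventCounts { parse_errors: 1, ..Default::default() };
    println!("with --strict: {}", exit_code(true, &events));
    println!("without --strict: {}", exit_code(false, &events));
}
```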

## Supported Formats

| Format | Versions | Extension |
|--------|----------|-----------|
| mitmproxy flow dumps | v19, v20, v21 | `.flow` |
| HAR (HTTP Archive) | 1.2 (incrementally parsed) | `.har` |

Format is auto-detected from file content. Use `--format` to override.

## Migration from Python mitmproxy2swagger

| Python (`mitmproxy2swagger`) | Rust (`mitm2openapi`) |
|-----|-----|
| `pip install mitmproxy2swagger` | Single binary, no runtime |
| `mitmproxy2swagger -i <file> -o <spec> -p <prefix>` | Two-step: `discover` then `generate` |
| Edits spec file in-place | Separate templates file for curation |
| Requires Python 3.x + mitmproxy | Standalone binary |
| Supports mitmproxy only | Supports mitmproxy flow dumps + HAR |

### Key differences

- **Two-step workflow**: `discover` produces a templates file; you curate it; `generate` produces the final spec. This separates endpoint selection from spec generation.
- **Templates file**: Discovered endpoints are prefixed with `ignore:`. Remove the prefix to include an endpoint. This replaces editing the output spec directly.
- **No Python dependency**: Ships as a single static binary for Linux, macOS, and Windows.
- **HAR support**: Process HAR exports from browser DevTools or other HTTP tools.
Full documentation at **[arkptz.github.io/mitm2openapi](https://arkptz.github.io/mitm2openapi/)** — covers installation, traffic capture setup, the full discover → curate → generate pipeline, CLI reference, resource limits, filtering, strict mode, format details, benchmarks, and security model.

## Benchmarks

Automated CI benchmark runs weekly against the Python original
([`mitmproxy2swagger`](https://github.com/alufers/mitmproxy2swagger)). See
[docs/benchmarks.md](docs/benchmarks.md) for the latest timing and memory
comparison on a ~80 MB synthetic capture, or
trigger a fresh run via
[Actions → Benchmark](../../actions/workflows/bench.yml).

Reproduce locally with the commands documented in the workflow file.
Automated CI benchmarks run weekly against the Python original. See [docs/benchmarks.md](docs/benchmarks.md) for the latest comparison on a ~80 MB synthetic capture.

## Contributing
