stylefp

Extract stylistic fingerprints from text corpora.

stylefp analyzes a body of writing and produces a detailed stylistic profile — quantitative metrics across 8 dimensions plus optional LLM-powered qualitative analysis. Use it to understand a writer's style, generate actionable style guides, or rewrite documents to match a target voice.

pip install stylefp

Quick Start

# Analyze a collection of documents
stylefp analyze ./my-writing/ -o ./output

# Analyze without LLM (no API key needed)
stylefp analyze ./my-writing/ --no-qualitative

# Rewrite a document in a target style
stylefp rewrite draft.md -s ./output/stylefp_profile.json

Output:

stylefp_profile.json — full quantitative + qualitative fingerprint
style_guide.md — human-readable style guide (LLM-generated or template-based)

How It Works

                          ┌─────────────────┐
                          │  Input Corpus   │
                          │  .txt .md .html │
                          │    .rst .htm    │
                          └────────┬────────┘
                                   │
                          ┌────────▼────────┐
                          │   spaCy NLP     │
                          │   Processing    │
                          └────────┬────────┘
                                   │
            ┌──────────────────────┼──────────────────────┐
            │                      │                      │
   ┌────────▼────────┐   ┌────────▼────────┐   ┌─────────▼────────┐
   │ Sentence        │   │ Vocabulary      │   │ Punctuation      │
   │ Structure       │   │ Readability     │   │ Rhetorical       │
   │ Image           │   │ Writing Style   │   │                  │
   └────────┬────────┘   └────────┬────────┘   └─────────┬────────┘
            │                      │                      │
            └──────────────────────┼──────────────────────┘
                                   │
                     ┌─────────────▼─────────────┐
                     │  Qualitative Analysis     │
                     │  (Claude LLM, optional)   │
                     │  Fed by quantitative data │
                     └─────────────┬─────────────┘
                                   │
                     ┌─────────────▼─────────────┐
                     │         Outputs           │
                     │ JSON Profile + Style Guide│
                     └───────────────────────────┘

The pipeline is feed-forward: quantitative metrics are computed first, then passed to the LLM as context for richer qualitative analysis. Each analyzer can be independently enabled or disabled.

Rewrite Verification

When rewriting a document, stylefp extracts all numeric data points and structured data containers (tables, lists, key-value pairs) from the original text and treats them as a whitelist. The rewritten output is then verified against this whitelist in two ways:

Diagram validation — every number inside a generated Mermaid diagram must trace back to the source document. Zero-tolerance: a single fabricated value causes the entire diagram to be stripped.
Prose validation — numbers in the rewritten prose are cross-referenced against the original. Fabricated numbers are flagged with their surrounding context so the user can review them.

This prevents the LLM from hallucinating statistics, percentages, or chart data that don't exist in the source material.
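
The whitelist idea can be illustrated with a short Python sketch. This is not stylefp's actual code: the regular expression, the function names (`extract_numbers`, `find_fabricated`, `diagram_passes`), and the 30-character context window are all assumptions made for illustration.

```python
import re

# Assumed number pattern: integers or decimals, with optional thousands separators.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def normalize(token: str) -> str:
    """Drop thousands separators so '45,230' and '45230' compare equal."""
    return token.replace(",", "")


def extract_numbers(text: str) -> set[str]:
    """Build the whitelist of numbers that appear in the source text."""
    return {normalize(m.group()) for m in NUMBER_RE.finditer(text)}


def find_fabricated(source: str, rewritten: str, context: int = 30) -> list[tuple[str, str]]:
    """Return (number, surrounding text) for each number absent from the source."""
    allowed = extract_numbers(source)
    flagged = []
    for m in NUMBER_RE.finditer(rewritten):
        if normalize(m.group()) not in allowed:
            start = max(0, m.start() - context)
            end = min(len(rewritten), m.end() + context)
            flagged.append((m.group(), rewritten[start:end]))
    return flagged


def diagram_passes(source: str, diagram: str) -> bool:
    """Zero tolerance: a single unknown number rejects the whole diagram."""
    return not find_fabricated(source, diagram)


source = "Revenue grew 12% to 45,230 units in 2023."
rewritten = "In 2023, revenue rose 12% to 45230 units, a 15% jump."
for number, snippet in find_fabricated(source, rewritten):
    print(f"unverified number {number!r} in: ...{snippet}...")
```

In this sketch prose is only flagged for review, while a diagram is accepted or rejected as a whole, mirroring the two validation modes described above.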

Style Metrics

Analyzer	What it measures
Sentence	Length distributions, type ratios (declarative/interrogative/exclamatory/imperative), grammatical complexity, opener patterns
Vocabulary	TTR, MATTR, hapax ratio, Yule's K, formality score, POS distribution, TF-IDF characteristic words, jargon ratio
Punctuation	Per-sentence punctuation frequencies, quotation usage, emphasis markers (italics, bold, ALL-CAPS)
Structure	Paragraph and document length distributions, structural elements (headers, lists, blockquotes)
Readability	Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, Coleman-Liau, ARI, SMOG, Dale-Chall
Rhetorical	Passive voice, hedging, intensifiers, contractions, pronoun ratios, dialogue ratio
Image	Image features and diagram detection
Writing Style	8 independent style dimensions (0.0–1.0): descriptive, persuasive, narrative, expository, review, technical, objective, subjective
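
To make a few of the vocabulary metrics concrete, here is a minimal Python sketch of TTR, hapax ratio, and MATTR. It assumes a naive regex tokenizer rather than spaCy, and it assumes the hapax ratio is divided by the total token count; stylefp's exact definitions may differ.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase word tokenizer (an assumption; stylefp uses spaCy)."""
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens: list[str]) -> float:
    """Share of tokens whose word occurs exactly once in the text."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(tokens)


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every sliding window of fixed size."""
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


tokens = tokenize("The cat sat on the mat and the dog sat too")
print(f"TTR={ttr(tokens):.3f} hapax={hapax_ratio(tokens):.3f} MATTR(5)={mattr(tokens, 5):.3f}")
```

MATTR is less sensitive to text length than plain TTR because every window has the same size, which is why both are commonly reported together.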

Qualitative Analysis (LLM-powered, optional)

When an Anthropic API key is available, Claude analyzes a representative sample and produces tone, mood, narrative voice, rhetorical devices, thematic patterns, distinctive quirks, audience assessment, and style register.

Installation

With uv (recommended)

git clone https://github.com/CarolinaRiascos/stylometry.git
cd stylometry
uv venv
uv pip install -e ".[dev]"
uv run python -m spacy download en_core_web_sm

With pip

pip install stylefp
python -m spacy download en_core_web_sm

Environment Variables

# Required for qualitative analysis and the rewrite command
export ANTHROPIC_API_KEY=sk-ant-...

CLI Reference

`stylefp analyze`

Analyze a corpus and extract a stylistic fingerprint.

stylefp analyze <paths>... [OPTIONS]

Option	Description	Default
`-o, --output`	Output directory	Current directory
`--no-qualitative`	Skip LLM analysis	`False`
`--spacy-model`	spaCy model name	`en_core_web_sm`
`--json-only`	JSON output only, skip style guide	`False`
`-q, --quiet`	Suppress progress output	`False`

Examples:

# Analyze a directory of Markdown files
stylefp analyze ./blog-posts/ -o ./analysis

# Analyze specific files
stylefp analyze essay1.txt essay2.txt essay3.md

# Fast analysis (no API key needed)
stylefp analyze ./docs/ --no-qualitative --json-only

`stylefp rewrite`

Rewrite a document to match a target writing style.

stylefp rewrite <input_file> -s <style_profile.json> [OPTIONS]

Option	Description	Default
`-s, --style`	Path to a `stylefp_profile.json`	Required
`--sample`	Sample text in the target style	`None`
`-o, --output`	Output directory	Current directory
`-q, --quiet`	Suppress progress output	`False`

The rewrite command automatically detects a style_guide.md in the same directory as the profile JSON and uses it for additional context.

Examples:

# Rewrite a draft to match an analyzed style
stylefp rewrite my-draft.md -s ./hemingway-analysis/stylefp_profile.json

# Include a sample of the target style for better matching
stylefp rewrite report.txt -s ./style/stylefp_profile.json --sample ./style/example.txt
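
The automatic style guide detection described above amounts to a sibling-file lookup. A minimal Python sketch, assuming a hypothetical helper named `find_style_guide` (not part of stylefp's documented API):

```python
from pathlib import Path
from typing import Optional


def find_style_guide(profile_path: Path) -> Optional[Path]:
    """Return the style_guide.md next to the profile JSON, if one exists."""
    candidate = profile_path.parent / "style_guide.md"
    return candidate if candidate.is_file() else None


guide = find_style_guide(Path("./output/stylefp_profile.json"))
print(guide if guide else "no style guide found next to the profile")
```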

Web App

A FastAPI web interface is also available, featuring a demo tab with a precomputed analysis and two style-transferred rewrites: an Eiffel Tower article and an AI in Logistics research paper.

uvicorn stylefp.web.app:app

`stylefp schema`

Print the JSON schema for the StyleFingerprint model.

stylefp schema

Supported Input Formats

Format	Extensions
Plain text	`.txt`
Markdown	`.md`, `.markdown`
reStructuredText	`.rst`
HTML	`.html`, `.htm`

Files are read as UTF-8 (with latin-1 fallback). Markdown and HTML formatting is stripped before analysis.
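
The encoding fallback can be sketched in a few lines of Python. The function names and the case-insensitive extension check are assumptions for illustration; only the extension list and the UTF-8 then latin-1 order come from this README.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".markdown", ".rst", ".html", ".htm"}


def is_supported(path: Path) -> bool:
    """Check the file extension (case-insensitive match is an assumption)."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def read_text_with_fallback(path: Path) -> str:
    """Decode as UTF-8 first, then fall back to latin-1."""
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this decode cannot fail.
        return raw.decode("latin-1")
```

Because latin-1 accepts any byte sequence, the fallback never raises, but text in other legacy encodings may decode to the wrong characters.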

Output Format

`stylefp_profile.json`

A structured JSON file containing all computed metrics. Top-level fields:

{
  "corpus_name": "my-writing",
  "document_count": 12,
  "total_words": 45230,
  "sentence": { ... },
  "vocabulary": { ... },
  "punctuation": { ... },
  "structure": { ... },
  "readability": { ... },
  "rhetorical": { ... },
  "writing_style": { ... },
  "qualitative": { ... },
  "metadata": { "version": "0.1.0", "timestamp": "...", "spacy_model": "en_core_web_sm" }
}

Use stylefp schema to see the full JSON schema.
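
Because the profile is plain JSON, it can be inspected with the Python standard library alone. The sketch below relies only on the top-level fields shown above; the helper names `load_profile` and `summarize_profile` are hypothetical.

```python
import json
from pathlib import Path


def load_profile(path: str) -> dict:
    """Read a stylefp_profile.json file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_profile(profile: dict) -> str:
    """One-line summary built from the top-level fields."""
    docs = profile["document_count"]
    words = profile["total_words"]
    average = words / docs if docs else 0
    return (
        f"{profile['corpus_name']}: {docs} documents, "
        f"{words} words (~{average:.0f} per document)"
    )


# Values copied from the example profile above.
example = {"corpus_name": "my-writing", "document_count": 12, "total_words": 45230}
print(summarize_profile(example))
```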

`style_guide.md`

A human-readable style guide covering voice, sentence structure, vocabulary, punctuation, and rhetorical patterns. When qualitative analysis is enabled, this is generated by Claude as an actionable writing guide. Otherwise, a template-based guide is produced from quantitative data alone.

Development

# Clone and install
git clone https://github.com/CarolinaRiascos/stylometry.git
cd stylometry
uv venv
uv pip install -e ".[dev]"
uv run python -m spacy download en_core_web_sm

# Run tests
uv run pytest tests/ -v

# Lint
uv run ruff check src/

# Type check
uv run mypy src/stylefp/

Architecture

src/stylefp/
├── __init__.py
├── __main__.py             # Entry point
├── cli.py                  # Typer CLI (analyze, rewrite, schema)
├── config.py               # StylefpConfig dataclass
├── corpus.py               # Document loading & text extraction
├── data_search.py          # Data point & container extraction
├── formula.py              # Formula handling
├── models.py               # Pydantic models for all features
├── nlp.py                  # spaCy NLP utilities
├── pipeline.py             # Analyzer orchestration
├── validation.py           # Fabricated diagram & number validation
├── analyzers/
│   ├── base.py             # BaseAnalyzer abstract class
│   ├── image.py            # Image feature analysis
│   ├── sentence.py         # Sentence types, complexity, openers
│   ├── vocabulary.py       # Lexical diversity, formality, TF-IDF
│   ├── punctuation.py      # Punctuation habits & emphasis
│   ├── structure.py        # Paragraph & document organization
│   ├── readability.py      # Standard readability indices
│   ├── rhetorical.py       # Voice, hedging, pronouns, dialogue
│   ├── writing_style.py    # 8-dimension style classification
│   └── qualitative.py      # LLM-powered analysis (Claude)
├── output/
│   ├── json_writer.py      # JSON profile output
│   └── markdown_writer.py  # Style guide output
├── prompts/
│   └── templates.py        # LLM prompt templates
└── web/
    ├── app.py              # FastAPI web interface
    ├── preloaded_examples.py # Bundled demo data loader
    ├── schemas.py          # Request/response models
    ├── data/               # Precomputed example files
    └── static/             # HTML, CSS, JS assets

All analyzers implement BaseAnalyzer.analyze(corpus, docs) and return strongly-typed Pydantic models. The pipeline registers analyzers as (field_name, label, instance) tuples, runs them sequentially, and assembles the results into a StyleFingerprint.
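
The registration pattern can be sketched as follows. This is an illustrative stand-in, not the code in pipeline.py: it uses dataclasses instead of Pydantic models, whitespace-split token lists instead of spaCy documents, and two toy analyzers (`WordCountAnalyzer`, `DocLengthAnalyzer`) that do not exist in stylefp. Only `BaseAnalyzer.analyze(corpus, docs)` and the `(field_name, label, instance)` tuple shape come from the description above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class WordCountFeatures:
    total_words: int


@dataclass
class DocLengthFeatures:
    mean_words: float


class BaseAnalyzer(ABC):
    @abstractmethod
    def analyze(self, corpus: list[str], docs: list[list[str]]):
        """Return a typed result for this analyzer."""


class WordCountAnalyzer(BaseAnalyzer):
    def analyze(self, corpus, docs):
        return WordCountFeatures(total_words=sum(len(d) for d in docs))


class DocLengthAnalyzer(BaseAnalyzer):
    def analyze(self, corpus, docs):
        lengths = [len(d) for d in docs]
        mean = sum(lengths) / len(lengths) if lengths else 0.0
        return DocLengthFeatures(mean_words=mean)


def run_pipeline(analyzers, corpus):
    """Run each registered analyzer in order and collect results by field name."""
    docs = [text.split() for text in corpus]  # stand-in for spaCy processing
    results = {}
    for field_name, label, analyzer in analyzers:
        print(f"Running {label}...")
        results[field_name] = analyzer.analyze(corpus, docs)
    return results


analyzers = [
    ("word_count", "Word count", WordCountAnalyzer()),
    ("doc_length", "Document length", DocLengthAnalyzer()),
]
fingerprint = run_pipeline(analyzers, ["One two three. Four five.", "Six seven."])
print(fingerprint)
```

With this shape, disabling an analyzer is just a matter of leaving its tuple out of the list, and the field name decides where the result lands in the assembled fingerprint.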

License

MIT
