Extract stylistic fingerprints from text corpora.
stylefp analyzes a body of writing and produces a detailed stylistic profile: quantitative metrics across 8 dimensions plus optional LLM-powered qualitative analysis. Use it to understand a writer's style, generate actionable style guides, or rewrite documents to match a target voice.
pip install stylefp
# Analyze a collection of documents
stylefp analyze ./my-writing/ -o ./output
# Analyze without LLM (no API key needed)
stylefp analyze ./my-writing/ --no-qualitative
# Rewrite a document in a target style
stylefp rewrite draft.md -s ./output/stylefp_profile.json
Output:
- stylefp_profile.json: full quantitative + qualitative fingerprint
- style_guide.md: human-readable style guide (LLM-generated or template-based)
                       ┌─────────────────┐
                       │  Input Corpus   │
                       │ .txt .md .html  │
                       │    .rst .htm    │
                       └────────┬────────┘
                                │
                       ┌────────▼────────┐
                       │    spaCy NLP    │
                       │   Processing    │
                       └────────┬────────┘
                                │
         ┌──────────────────────┼──────────────────────┐
         │                      │                      │
┌────────▼────────┐    ┌────────▼────────┐   ┌─────────▼────────┐
│ Sentence        │    │ Vocabulary      │   │ Punctuation      │
│ Structure       │    │ Readability     │   │ Rhetorical       │
│ Image           │    │ Writing Style   │   │                  │
└────────┬────────┘    └────────┬────────┘   └─────────┬────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                  ┌─────────────▼─────────────┐
                  │   Qualitative Analysis    │
                  │  (Claude LLM, optional)   │
                  │ Fed by quantitative data  │
                  └─────────────┬─────────────┘
                                │
                  ┌─────────────▼─────────────┐
                  │          Outputs          │
                  │ JSON Profile + Style Guide│
                  └───────────────────────────┘
The pipeline is feed-forward: quantitative metrics are computed first, then passed to the LLM as context for richer qualitative analysis. Each analyzer can be independently enabled or disabled.
When rewriting a document, stylefp extracts all numeric data points and structured data containers (tables, lists, key-value pairs) from the original text. The rewritten output is then verified against this whitelist:
- Diagram validation: every number inside a generated Mermaid diagram must trace back to the source document. Zero-tolerance: a single fabricated value causes the entire diagram to be stripped.
- Prose validation: numbers in the rewritten prose are cross-referenced against the original. Fabricated numbers are flagged with their surrounding context so the user can review them.
This prevents the LLM from hallucinating statistics, percentages, or chart data that don't exist in the source material.
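The whitelist check described above can be sketched as follows. This is a minimal illustration, not stylefp's actual code: the function names and the number regex are assumptions, and the real logic in data_search.py and validation.py may extract and compare values differently.

```python
import re

# Hypothetical pattern: integers, decimals, and comma-grouped numbers.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> set[str]:
    """Collect numeric tokens, dropping commas so 1,200 and 1200 compare equal."""
    return {m.group().replace(",", "") for m in NUMBER_RE.finditer(text)}


def find_fabricated(original: str, rewritten: str, context: int = 30) -> list[tuple[str, str]]:
    """Return (number, surrounding text) for each number missing from the original."""
    whitelist = extract_numbers(original)
    flagged = []
    for m in NUMBER_RE.finditer(rewritten):
        if m.group().replace(",", "") not in whitelist:
            start = max(0, m.start() - context)
            flagged.append((m.group(), rewritten[start:m.end() + context]))
    return flagged


def diagram_is_clean(original: str, diagram_source: str) -> bool:
    """Zero-tolerance rule: one unknown number invalidates the whole diagram."""
    return not find_fabricated(original, diagram_source)
```

Prose findings are returned with context for review, while a diagram gets a single pass/fail answer, mirroring the two validation modes. Treating every comma as a thousands separator is a simplification of this sketch.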
| Analyzer | What it measures |
|---|---|
| Sentence | Length distributions, type ratios (declarative/interrogative/exclamatory/imperative), grammatical complexity, opener patterns |
| Vocabulary | TTR, MATTR, hapax ratio, Yule's K, formality score, POS distribution, TF-IDF characteristic words, jargon ratio |
| Punctuation | Per-sentence punctuation frequencies, quotation usage, emphasis markers (italics, bold, ALL-CAPS) |
| Structure | Paragraph and document length distributions, structural elements (headers, lists, blockquotes) |
| Readability | Flesch-Kincaid, Flesch Reading Ease, Gunning Fog, Coleman-Liau, ARI, SMOG, Dale-Chall |
| Rhetorical | Passive voice, hedging, intensifiers, contractions, pronoun ratios, dialogue ratio |
| Image | Image features and diagram detection |
| Writing Style | 8 independent style dimensions (0.0–1.0): descriptive, persuasive, narrative, expository, review, technical, objective, subjective |
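As an illustration of the lexical diversity metrics in the Vocabulary row, the sketch below computes TTR, MATTR, and a hapax ratio using their common textbook definitions. The function names, the window size of 50, and the choice of total tokens as the hapax denominator are assumptions; stylefp's own tokenization and definitions may differ.

```python
from collections import Counter


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def hapax_ratio(tokens: list[str]) -> float:
    """Share of tokens whose word occurs exactly once in the text."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(tokens)


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every sliding window of fixed size."""
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)
```

MATTR is usually preferred over plain TTR for comparing corpora of different sizes, because TTR falls as a text gets longer while the windowed average does not.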
When an Anthropic API key is available, Claude analyzes a representative sample and produces tone, mood, narrative voice, rhetorical devices, thematic patterns, distinctive quirks, audience assessment, and style register.
git clone https://github.com/CarolinaRiascos/stylometry.git
cd stylometry
uv venv
uv pip install -e ".[dev]"
uv run python -m spacy download en_core_web_sm
pip install stylefp
python -m spacy download en_core_web_sm
# Required for qualitative analysis and rewrite commands
export ANTHROPIC_API_KEY=sk-ant-...
Analyze a corpus and extract a stylistic fingerprint.
stylefp analyze <paths>... [OPTIONS]
| Option | Description | Default |
|---|---|---|
| -o, --output | Output directory | Current directory |
| --no-qualitative | Skip LLM analysis | False |
| --spacy-model | spaCy model name | en_core_web_sm |
| --json-only | JSON output only, skip style guide | False |
| -q, --quiet | Suppress progress output | False |
Examples:
# Analyze a directory of Markdown files
stylefp analyze ./blog-posts/ -o ./analysis
# Analyze specific files
stylefp analyze essay1.txt essay2.txt essay3.md
# Fast analysis (no API key needed)
stylefp analyze ./docs/ --no-qualitative --json-only
Rewrite a document to match a target writing style.
stylefp rewrite <input_file> -s <style_profile.json> [OPTIONS]
| Option | Description | Default |
|---|---|---|
| -s, --style | Path to a stylefp_profile.json | Required |
| --sample | Sample text in the target style | None |
| -o, --output | Output directory | Current directory |
| -q, --quiet | Suppress progress output | False |
The rewrite command automatically detects a style_guide.md in the same directory as the profile JSON and uses it for additional context.
Examples:
# Rewrite a draft to match an analyzed style
stylefp rewrite my-draft.md -s ./hemingway-analysis/stylefp_profile.json
# Include a sample of the target style for better matching
stylefp rewrite report.txt -s ./style/stylefp_profile.json --sample ./style/example.txt
A FastAPI web interface is also available, featuring a demo tab with a precomputed analysis and two style-transferred rewrites: an Eiffel Tower article and an AI in Logistics research paper.
uvicorn stylefp.web.app:app
Print the JSON schema for the StyleFingerprint model.
stylefp schema
| Format | Extensions |
|---|---|
| Plain text | .txt |
| Markdown | .md, .markdown |
| reStructuredText | .rst |
| HTML | .html, .htm |
Files are read as UTF-8 (with latin-1 fallback). Markdown and HTML formatting is stripped before analysis.
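The decoding fallback can be sketched like this. The helper name is hypothetical and the loader in corpus.py may be organized differently; the sketch only shows the UTF-8-first, latin-1-second order described above.

```python
from pathlib import Path


def read_text_with_fallback(path: Path) -> str:
    """Decode a file as UTF-8, falling back to latin-1 if the bytes are not valid UTF-8."""
    raw = path.read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this decode cannot fail.
        return raw.decode("latin-1")
```

Because latin-1 accepts any byte sequence, the fallback always yields text, though a file in some other legacy encoding would decode without error and still contain wrong characters.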
A structured JSON file containing all computed metrics. Top-level fields:
{
"corpus_name": "my-writing",
"document_count": 12,
"total_words": 45230,
"sentence": { ... },
"vocabulary": { ... },
"punctuation": { ... },
"structure": { ... },
"readability": { ... },
"rhetorical": { ... },
"writing_style": { ... },
"qualitative": { ... },
"metadata": { "version": "0.1.0", "timestamp": "...", "spacy_model": "en_core_web_sm" }
}
Use stylefp schema to see the full JSON schema.
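A profile can be consumed with nothing more than the standard library. The sketch below reads only the top-level fields listed above; the helper name is hypothetical, and treating a missing or empty "qualitative" field as "no LLM analysis" is an assumption about how --no-qualitative runs are serialized.

```python
import json
from pathlib import Path


def summarize_profile(path: str) -> dict:
    """Load a stylefp_profile.json and report a few top-level facts."""
    profile = json.loads(Path(path).read_text(encoding="utf-8"))
    docs = profile["document_count"]
    return {
        "corpus": profile["corpus_name"],
        "documents": docs,
        # Guard against an empty corpus to avoid dividing by zero.
        "avg_words_per_document": profile["total_words"] / docs if docs else 0.0,
        # Assumption: the field is absent, null, or empty when the LLM step was skipped.
        "has_qualitative": bool(profile.get("qualitative")),
    }
```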
A human-readable style guide covering voice, sentence structure, vocabulary, punctuation, and rhetorical patterns. When qualitative analysis is enabled, this is generated by Claude as an actionable writing guide. Otherwise, a template-based guide is produced from quantitative data alone.
# Clone and install
git clone https://github.com/CarolinaRiascos/stylometry.git
cd stylometry
uv venv
uv pip install -e ".[dev]"
uv run python -m spacy download en_core_web_sm
# Run tests
uv run pytest tests/ -v
# Lint
uv run ruff check src/
# Type check
uv run mypy src/stylefp/
src/stylefp/
├── __init__.py
├── __main__.py               # Entry point
├── cli.py                    # Typer CLI (analyze, rewrite, schema)
├── config.py                 # StylefpConfig dataclass
├── corpus.py                 # Document loading & text extraction
├── data_search.py            # Data point & container extraction
├── formula.py                # Formula handling
├── models.py                 # Pydantic models for all features
├── nlp.py                    # spaCy NLP utilities
├── pipeline.py               # Analyzer orchestration
├── validation.py             # Fabricated diagram & number validation
├── analyzers/
│   ├── base.py               # BaseAnalyzer abstract class
│   ├── image.py              # Image feature analysis
│   ├── sentence.py           # Sentence types, complexity, openers
│   ├── vocabulary.py         # Lexical diversity, formality, TF-IDF
│   ├── punctuation.py        # Punctuation habits & emphasis
│   ├── structure.py          # Paragraph & document organization
│   ├── readability.py        # Standard readability indices
│   ├── rhetorical.py         # Voice, hedging, pronouns, dialogue
│   ├── writing_style.py      # 8-dimension style classification
│   └── qualitative.py        # LLM-powered analysis (Claude)
├── output/
│   ├── json_writer.py        # JSON profile output
│   └── markdown_writer.py    # Style guide output
├── prompts/
│   └── templates.py          # LLM prompt templates
└── web/
    ├── app.py                # FastAPI web interface
    ├── preloaded_examples.py # Bundled demo data loader
    ├── schemas.py            # Request/response models
    ├── data/                 # Precomputed example files
    └── static/               # HTML, CSS, JS assets
All analyzers implement BaseAnalyzer.analyze(corpus, docs) and return strongly-typed Pydantic models. The pipeline registers analyzers as (field_name, label, instance) tuples, runs them sequentially, and assembles the results into a StyleFingerprint.
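The registration pattern can be sketched roughly as below. Only BaseAnalyzer.analyze(corpus, docs) and the (field_name, label, instance) tuple shape come from the description above; everything else is assumed for illustration. In particular, a dataclass stands in for the Pydantic result models so the example runs on the standard library alone, docs is a list of sentence lists rather than spaCy documents, and SentenceFeatures, SentenceAnalyzer, and run_pipeline are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    """Stand-in for a typed result model (the real project uses Pydantic)."""
    mean_length: float


class BaseAnalyzer(ABC):
    @abstractmethod
    def analyze(self, corpus, docs):
        """Return a typed feature object for the whole corpus."""


class SentenceAnalyzer(BaseAnalyzer):
    def analyze(self, corpus, docs) -> SentenceFeatures:
        # Here each doc is a list of sentence strings; words are split on whitespace.
        lengths = [len(sentence.split()) for doc in docs for sentence in doc]
        mean = sum(lengths) / len(lengths) if lengths else 0.0
        return SentenceFeatures(mean_length=mean)


def run_pipeline(analyzers, corpus, docs) -> dict:
    """Run each registered analyzer in order and collect results by field name."""
    results = {}
    for field_name, label, analyzer in analyzers:
        results[field_name] = analyzer.analyze(corpus, docs)
    return results


# Each entry is a (field_name, label, instance) tuple.
registry = [("sentence", "Sentence structure", SentenceAnalyzer())]
fingerprint = run_pipeline(registry, corpus=["demo"], docs=[["One two three.", "Four five."]])
```

Keying results by field name means a new analyzer only needs a registry entry and a matching field on the fingerprint model; the orchestration loop stays unchanged.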