Frederick Signal Atlas

Frederick Signal Atlas builds a local SQLite database of Frederick, Maryland public-interest news coverage and the people mentioned in that coverage.

It is designed for a daily workflow:

Fetch Frederick-related articles from configured public sources.
Extract article text and identify people mentioned.
Store person-level metadata such as roles, organizations, addresses, quotes, and article context.
Generate an end-of-day report describing notable connections between documented individuals.

Current Behavior

The pipeline now prefers precision over recall.

If a source article cannot be resolved cleanly, it is skipped rather than indexing a wrapper page.
If OPENAI_API_KEY is not configured, fallback extraction runs in a conservative mode and may store no people for a given day rather than polluting the database with titles, organizations, or event names mistaken for people.
If an article body changes, or the content extractor changes, the article is automatically re-queued for extraction.
Person identity is keyed with a disambiguation key (person_key) built from the person's name plus the best available organization, address, or home-location context.

Quick Start

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e .
python -m frederick_pipeline daily-run

Optional:

$env:OPENAI_API_KEY="your-key"
$env:OPENAI_MODEL="gpt-4.1-mini"

Without OPENAI_API_KEY, the system is still runnable for fetch and storage testing, but the extracted people dataset will usually be sparse. For production-quality person and relationship data, set the API key.

Output Model

The local SQLite database stores:

articles: fetched article metadata, normalized URL, body text, and extraction state
people: canonical people records with person_key disambiguation
person_aliases: alternate names seen in source articles
article_people: article-scoped person mentions with confidence scores and article-local metadata
inferred_connections: same-day inferred links such as co-mentions, shared organizations, and shared addresses

Daily markdown discovery files are written to data/discoveries/YYYY-MM-DD.md.

Only higher-confidence article-person records are promoted into reports and connection analysis.

Source Notes

The default source configuration now focuses on official Frederick city and county feeds.

Official feeds generally work well for fetching and archiving.
Aggregated Google News RSS was removed from the default source list because it frequently resolved to wrapper pages or publisher homepages rather than stable article URLs, which polluted early-stage extraction.
If you want to add broader media coverage later, prefer source-specific publisher feeds over aggregator wrappers.

GitHub Actions

The repository includes a scheduled workflow at .github/workflows/daily-pipeline.yml.

It runs every day at 13:05 UTC.
In America/New_York, that is 9:05 AM EDT during daylight saving time and 8:05 AM EST during standard time.
It can also be triggered manually from the GitHub Actions tab with workflow_dispatch.
The workflow installs dependencies, runs python -m frederick_pipeline daily-run, and commits updated data/ outputs back to main.

Required Repository Configuration

Add this repository secret in GitHub if you want high-quality person extraction:

OPENAI_API_KEY: optional for basic fetch-only runs, recommended for production-quality people and connection data

Optional repository variable:

OPENAI_MODEL: defaults to gpt-4.1-mini

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.github/workflows		.github/workflows
data		data
docs		docs
sources		sources
src/frederick_pipeline		src/frederick_pipeline
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Frederick Signal Atlas

Current Behavior

Quick Start

Output Model

Source Notes

GitHub Actions

Required Repository Configuration

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Frederick Signal Atlas

Current Behavior

Quick Start

Output Model

Source Notes

GitHub Actions

Required Repository Configuration

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages