Autonomous pipeline that extracts investment signals from SEC EDGAR filings. Searches every 10-K for monopoly language, scores companies on 11 quantitative dimensions, tracks insider buying in real-time, and validates claims against federal contract records.
Built to find undiscovered small-cap companies with structural competitive advantages — before the market does.
graph TB
subgraph Discovery ["Discovery Layer"]
EFTS["EFTS Search<br/><i>120 moat keywords across<br/>all SEC 10-K filings</i>"]
FM["Forward Moat<br/><i>Backlog, partnerships,<br/>capex inflection</i>"]
end
subgraph Scoring ["Scoring Layer"]
AMPX["AMPX Screener<br/><i>11-dimension threshold<br/>scoring engine</i>"]
DD["Deep Dive<br/><i>12-module analysis with<br/>PASS / WATCHLIST / KILL</i>"]
end
subgraph Data ["Data Layer"]
XBRL["XBRL Extractor<br/><i>Machine-readable financials<br/>from SEC filings</i>"]
F4["Form 4 Poller<br/><i>Real-time insider buying<br/>from SEC atom feed</i>"]
YF["yfinance<br/><i>Price metrics, float,<br/>short interest</i>"]
end
subgraph Validation ["Validation Layer"]
GOV["USAspending.gov<br/><i>Federal contract<br/>verification</i>"]
XVAL["10-K Cross-Validation<br/><i>Customer filing<br/>mentions</i>"]
end
DB[("SQLite<br/>35 tables · WAL mode<br/>Auto-migration")]
EFTS --> DD
FM --> DD
AMPX --> DD
GOV --> EFTS
XVAL --> EFTS
XBRL --> DB
F4 --> DB
YF --> DB
AMPX --> DB
DD --> DB
EFTS --> DB
FM --> DB
-
EFTS over newsflow. Instead of scraping financial news (survivorship bias, delayed), the system searches SEC's full-text search index directly. Every public company's 10-K is searchable. This finds companies that tell the SEC they're monopolies — before analysts notice.
-
Threshold scoring over ML. The AMPX screener uses explicit, auditable thresholds (crash depth, revenue growth, debt ratio, insider buying) rather than a black-box model. Every score is decomposable — you can see exactly why a company scored 9.5/12.5.
-
Multi-source validation. Discovery hits aren't trusted at face value. Claims like "sole source" are validated against USAspending.gov federal contract records and cross-referenced in other companies' 10-K filings. A company mentioned as a supplier by 3+ other companies gets a higher score than one that only claims it themselves.
-
Going-concern hard kill. Four regex patterns scan the latest 10-K for going-concern language. Any match is an automatic disqualification — no score can override fundamental business risk.
| Module | What it does | Entry point |
|---|---|---|
| EFTS Discovery | Searches all SEC 10-K filings for 120 moat keywords across 10 moat types, validates against federal contracts and customer filings | python -m sec_filing_intelligence.discovery --run |
| AMPX Screener | 11-dimension threshold scorer: crash depth, revenue growth, debt, runway, float, institutional ownership, analyst coverage, sector, short interest, options, insider buying | python -m sec_filing_intelligence.ampx_rules --run |
| Forward Moat | Detects companies building moats via backlog acceleration, partnership mismatches, capex inflection, technology milestones | python -m sec_filing_intelligence.forward_moat --run |
| Deep Dive | 12-module automated analysis: moat scoring (0-25), analog DNA matching, balance sheet, insider forensics, expected value estimation | python -m sec_filing_intelligence.deep_dive --run |
| XBRL Extractor | Replaces unreliable yfinance data with machine-readable SEC XBRL financials (revenue, debt, shares outstanding, cash flow) | python -m sec_filing_intelligence.xbrl_fundamentals --refresh |
| Form 4 Poller | Real-time SEC atom feed parser for insider buying. Detects clusters (2+ insiders buying within 30 days) | python -m sec_filing_intelligence.form4_rss_poller --poll |
git clone https://github.com/Pdong19/edgar-scanner.git
cd edgar-scanner
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env with your contact email (SEC fair-access policy requires it)All APIs used are free and public — SEC EDGAR, USAspending.gov, yfinance. No paid API keys required.
# Search SEC filings for moat language
python examples/search_sec_filings.py "sole source"
# Score a single ticker across 11 dimensions
python examples/score_single_ticker.py ASTS
# Run the full EFTS discovery scan (searches all SEC 10-Ks)
python -m sec_filing_intelligence.discovery --run
# Score the entire universe with the AMPX screener
python -m sec_filing_intelligence.ampx_rules --run
# Scan for forward moat signals
python -m sec_filing_intelligence.forward_moat --run
# Poll SEC for latest insider transactions
python -m sec_filing_intelligence.form4_rss_poller --poll
# Refresh XBRL fundamentals from SEC filings
python -m sec_filing_intelligence.xbrl_fundamentals --refreshEach module produces output in three places:
| Output | Location | Description |
|---|---|---|
| Terminal | stdout/stderr | Scan progress and summary report print directly to your terminal |
| CSV files | output/ directory |
Full scored results with all dimensions (machine-readable) |
| Database | data/sec_filings.db |
All results persisted to SQLite for querying and historical tracking |
After a scan, you can query results directly:
# View top scorers from the database
sqlite3 data/sec_filings.db "SELECT ticker, score FROM scr_ampx_rules_scores ORDER BY score DESC LIMIT 10"
# View latest discovery flags
sqlite3 data/sec_filings.db "SELECT ticker, composite_score, moat_types FROM scr_discovery_flags ORDER BY composite_score DESC LIMIT 10"AMPX Screener — scores 2,600+ tickers through a funnel, then ranks survivors on 11 dimensions:
AMPX Rules Screener — 2026-04-17
═════════════════════════════════
Funnel: 2,651 universe → 1,847 w/fundamentals → 187 survivors
Top 15 by score:
#1 RCAT 9.0/12.5 CRASH:2.0 REV:2.0 DEBT:1.0 RUN:1.5 FLT:1.0 INST:1.0 ANLY:0.5
#2 AEHR 8.5/12.5 CRASH:2.0 REV:1.0 DEBT:1.0 RUN:1.5 FLT:1.0 INST:0.5 ANLY:0.5 PRI:1.0
#3 BKSY 8.0/12.5 CRASH:2.0 REV:2.0 DEBT:1.0 RUN:0.75 FLT:1.0 PRI:1.0
Going-concern kills: 12 tickers
Full results: output/ampx_rules/2026-04-17.csv
Discovery — searches SEC 10-K filings for companies claiming monopoly positions:
Discovery Scan — 2026-04-17 (Phases 1-6)
═════════════════════════════════════════
EFTS hits: 18,660 raw → 5,066 unique tickers
USAspending: 49 of top 100 have federal contracts
Customer validation: 63 of top 75 mentioned in other 10-Ks
Deep-dive verdicts: 4 PASS, 29 WATCHLIST, 67 KILL
All thresholds, keywords, and scoring weights live in config.py (800 lines). No code changes needed to customize the scanner for your own use case.
Add moat keywords — append to the DISCOVERY_HARD_KEYWORDS or DISCOVERY_SOFT_KEYWORDS lists:
DISCOVERY_HARD_KEYWORDS = [
"sole source",
"only provider",
"your new keyword here", # just add a string
]Map keywords to moat types — add to DISCOVERY_KEYWORD_MOAT_MAP:
DISCOVERY_KEYWORD_MOAT_MAP = {
"your new keyword here": ["technology", "supply_chain"],
}Change scoring thresholds — adjust the AMPX screener gates:
AMPX_MIN_MARKET_CAP_M = 30 # raise/lower the market cap floor
AMPX_MAX_MARKET_CAP_M = 500 # expand to mid-caps
AMPX_MIN_CRASH_52W_PCT = -60 # require deeper crashTarget different sectors — edit AMPX_PRIORITY_INDUSTRIES:
AMPX_PRIORITY_INDUSTRIES = (
"semiconductor", "nuclear", "quantum",
"your sector here",
)- Keyword Search — 120 keywords across hard ("sole source", "only provider") and soft ("proprietary technology", "ITAR") layers, searched via SEC EDGAR EFTS API
- Sector + Market Cap Scoring — SIC code classification with sector multipliers (defense +5, pharma -3)
- Fundamentals Enrichment — Debt, revenue growth, analyst coverage adjustments
- Government Validation — USAspending.gov API confirms sole-source contracts
- 10-K Context Analysis — Downloads actual filing text, classifies keyword location (Business Description vs Risk Factors)
- Deep Dive — 12-module automated analysis with PASS/WATCHLIST/KILL verdicts
| Dim | Max | Signal |
|---|---|---|
| CRASH | 2.0 | Stock crashed 60-80%+ from highs |
| REVGROWTH | 2.0 | Revenue growing 30-100%+ YoY |
| DEBT | 1.0 | Near-zero debt (D/E < 0.1) |
| RUNWAY | 1.5 | 6-8+ quarters of cash runway |
| FLOAT | 1.0 | Small float (< 100M shares) |
| INSTOWN | 1.0 | Low institutional ownership (< 20%) |
| ANALYST | 1.0 | Under-followed (0-3 analysts) |
| PRIORITY | 1.0 | Target sector match (34 keywords) |
| SHORT | 0.5 | High short interest (> 15%) |
| LEAPS | 0.5 | LEAPS options available |
| INSIDER | 1.0 | Insider buying cluster detected |
Regulatory, Technology/Patent, Infrastructure, Network/Data, Supply Chain, Government Contract, Platform, Switching Cost, Data/IP, Qualified Supplier
# Run all 216 tests
pytest tests/ -v
# Lint
ruff check src/ tests/sec-filing-intelligence/
├── src/sec_filing_intelligence/
│ ├── config.py # 800 lines of thresholds, keywords, scoring weights
│ ├── db.py # SQLite WAL wrapper + 35-table DDL + auto-migration
│ ├── utils.py # Rate limiting, logging, chunking
│ ├── discovery.py # EFTS full-text search engine (6 phases)
│ ├── deep_dive.py # 12-module automated analysis framework
│ ├── filing_scanner.py # SEC EDGAR EFTS query builder + XML parser
│ ├── ampx_rules.py # 11-dimension threshold scorer
│ ├── forward_moat.py # Trajectory signal detection
│ ├── xbrl_fundamentals.py # SEC XBRL financial data extraction
│ ├── fundamentals.py # yfinance data refresh
│ ├── price_analyzer.py # ATH, 52w range, float, short interest
│ ├── form4_parser.py # SEC Form 4 XML transaction parser
│ ├── form4_rss_poller.py # Real-time SEC atom feed poller
│ ├── insider_tracker.py # Insider transaction aggregation
│ └── moat_scorer.py # CPC patent classification scoring
├── tests/ # 216 tests across 8 test modules
├── examples/ # Runnable example scripts
├── .github/workflows/ # CI: tests on Python 3.10/3.11/3.12
├── pyproject.toml
├── .env.example
└── LICENSE (MIT)
MIT