Releases: discretewater/orga
ORGA v0.1.3
ORGA v0.1.3 is the second public release of the project.
ORGA is a fast, explainable, non-LLM extraction engine for profiling institutional websites. It discovers key pages, extracts structured contact information, and assigns primary organization categories using deterministic rules, semantic heuristics, JSON-LD parsing, and lightweight Bayesian classification.
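To make the JSON-LD step concrete, here is a minimal sketch of how contact fields might be pulled from `application/ld+json` script tags using only the Python standard library. It is not ORGA's code: the names `JsonLdCollector` and `extract_contacts` are hypothetical, and the sketch assumes the site publishes flat schema.org-style objects with `name`, `telephone`, and `email` keys.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed objects from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The attribute value can be None for a bare attribute, hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD is skipped rather than raising
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_contacts(html):
    """Return one dict of contact fields per JSON-LD object found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return [
        {
            "name": item.get("name"),
            "telephone": item.get("telephone"),
            "email": item.get("email"),
        }
        for item in collector.items
        if isinstance(item, dict)
    ]
```

Malformed blocks are dropped silently in this sketch, so a single bad script tag does not abort extraction for the rest of the page.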
Highlights in this release:
- The second public open-source release of ORGA
- Dockerized FastAPI microservices for single-site extraction and batch jobs
- Deterministic extraction pipeline for names, locations, phones, emails, and social links
- Layered organization classification with rule-based scoring and Bayesian fallback
- Hardened parsers and more defensive handling of noisy real-world websites
- Improved address/location deduplication and sanitization
- Public-facing documentation, examples, and design notes
- CI workflow, CHANGELOG, and CONTRIBUTING guide included
- PyPI package published for the core library
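The layered classification listed above can be illustrated with a small sketch: keyword rules are scored first, and a naive Bayes model is consulted only when no rule score clears a threshold. Everything here is an assumption made for illustration, including the keyword weights in `RULES`, the toy `TRAINING` snippets, the `min_rule_score` threshold, and the function names. None of it is taken from ORGA's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical keyword weights, not ORGA's actual rules.
RULES = {
    "hospital": {"hospital": 3.0, "clinic": 2.0, "patients": 1.0},
    "university": {"university": 3.0, "faculty": 1.5, "admissions": 1.5},
}

# Toy labelled snippets for the fallback model, purely illustrative.
TRAINING = [
    ("hospital", "emergency care surgery patients ward"),
    ("hospital", "medical care nurses patients"),
    ("university", "students campus research degrees"),
    ("university", "lectures students courses research"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def rule_scores(tokens):
    """Sum the weight of every rule keyword occurrence, per category."""
    counts = Counter(tokens)
    return {
        cat: sum(weight * counts[kw] for kw, weight in keywords.items())
        for cat, keywords in RULES.items()
    }


def bayes_scores(tokens):
    """Multinomial naive Bayes log-scores with add-one smoothing."""
    vocab = {t for _, doc in TRAINING for t in tokenize(doc)}
    scores = {}
    for cat in RULES:
        docs = [tokenize(doc) for label, doc in TRAINING if label == cat]
        counts = Counter(t for doc in docs for t in doc)
        total = sum(counts.values())
        score = math.log(len(docs) / len(TRAINING))
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[t] + 1) / (total + len(vocab)))
        scores[cat] = score
    return scores


def classify(text, min_rule_score=2.0):
    """Use the rule layer when it is confident enough, else fall back to Bayes."""
    tokens = tokenize(text)
    rules = rule_scores(tokens)
    best = max(rules, key=rules.get)
    if rules[best] >= min_rule_score:
        return {"category": best, "layer": "rules", "evidence": rules}
    bayes = bayes_scores(tokens)
    return {"category": max(bayes, key=bayes.get), "layer": "bayes", "evidence": bayes}
```

Returning the deciding layer and its per-category scores alongside the label is one way to keep a result traceable: a caller can see whether a rule or the fallback model produced the category.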
What ORGA is good at:
- Profiling institutional websites such as hospitals, universities, government agencies, nonprofits, and international organizations
- Producing structured JSON output quickly and predictably
- Serving as an explainable, low-cost baseline without LLM dependencies
Known boundaries:
- Not designed for open-world semantic understanding
- Not intended for deep PDF reading or nuanced corporate hierarchy interpretation
- Address parsing from noisy footer text may still yield partially parsed raw strings in difficult cases
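When an address survives only as a raw string, it can still be cleaned and deduplicated before it reaches the output. The sketch below shows one simple way to do that, assuming addresses arrive as a list of scraped strings. The helpers `sanitize` and `dedupe_addresses` are hypothetical and do not reflect how ORGA handles this internally.

```python
import re
import unicodedata


def sanitize(raw):
    """Normalize Unicode, collapse whitespace, and trim stray separators."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip(" ,;|-")


def dedupe_addresses(candidates):
    """Drop empty and repeated addresses, keeping first-seen order.

    Two strings count as the same address when they match after removing
    punctuation and whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for raw in candidates:
        cleaned = sanitize(raw)
        key = re.sub(r"[^\w]+", "", cleaned).casefold()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

Comparing on a stripped, case-folded key catches footer and contact-page copies of the same address that differ only in spacing or capitalization. It does not merge abbreviations such as "Rd" and "Road".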
This release establishes the frozen baseline for the current deterministic architecture. Future work may explore lightweight supervised calibration and optional LLM-assisted features without compromising the core system’s speed, traceability, and deterministic behavior.