Releases: discretewater/orga

ORGA v0.1.3

22 Mar 17:23

ORGA v0.1.3 is the second public release of the project.

ORGA is a fast, explainable, non-LLM extraction engine for profiling institutional websites. It discovers key pages, extracts structured contact information, and assigns primary organization categories using deterministic rules, semantic heuristics, JSON-LD parsing, and lightweight Bayesian classification.
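As an illustration of the JSON-LD parsing step mentioned above, here is a minimal sketch using only the Python standard library. It is not ORGA's actual implementation: the class `JsonLdCollector`, the function `extract_org_contacts`, and the output keys are hypothetical names chosen for this example. It assumes the page embeds schema.org-style objects with `name`, `telephone`, and `email` fields.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_org_contacts(html):
    """Return one dict per JSON-LD object found (hypothetical output shape)."""
    collector = JsonLdCollector()
    collector.feed(html)
    results = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # real-world pages often ship malformed JSON-LD; skip it
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            results.append({
                "name": item.get("name"),
                "telephone": item.get("telephone"),
                "email": item.get("email"),
                "type": item.get("@type"),
            })
    return results
```

Skipping malformed blocks instead of raising keeps the sketch in line with the defensive handling the release describes for noisy websites.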

Highlights in this release:

  • The second public open-source release of ORGA
  • Dockerized FastAPI microservices for single-site extraction and batch jobs
  • Deterministic extraction pipeline for names, locations, phones, emails, and social links
  • Layered organization classification with rule-based scoring and Bayesian fallback
  • Stronger parser hardening and defensive handling for noisy real-world websites
  • Improved address/location deduplication and sanitization
  • Public-facing documentation, examples, and design notes
  • CI workflow, CHANGELOG, and CONTRIBUTING guide included
  • PyPI package published for the core library
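The layered classification highlight (rule-based scoring with a Bayesian fallback) can be sketched roughly as follows. This is an assumption-laden toy, not ORGA's code: the `RULES` table, the `TRAINING` snippets, the `min_score` threshold, and all function names are hypothetical. The fallback here is a plain multinomial naive Bayes with add-one smoothing, which may differ from what the project uses.

```python
import math
import re
from collections import Counter

# Hypothetical keyword rules: category -> {keyword: weight}
RULES = {
    "hospital": {"hospital": 3, "clinic": 2, "patients": 1},
    "university": {"university": 3, "faculty": 2, "admissions": 1},
}

# Hypothetical labelled snippets, used only by the Bayesian fallback.
TRAINING = [
    ("hospital", "emergency care patients surgery ward"),
    ("hospital", "medical care doctors nurses patients"),
    ("university", "students research campus degree courses"),
    ("university", "undergraduate students lectures campus library"),
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def rule_scores(tokens):
    """Sum keyword weights per category; the scores double as evidence."""
    counts = Counter(tokens)
    return {
        cat: sum(weight * counts[kw] for kw, weight in keywords.items())
        for cat, keywords in RULES.items()
    }


def naive_bayes(tokens):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for label, text in TRAINING:
        toks = tokenize(text)
        word_counts.setdefault(label, Counter()).update(toks)
        doc_counts[label] += 1
        vocab.update(toks)
    best_label, best_logp = None, -math.inf
    for label in sorted(word_counts):  # sorted for deterministic tie-breaking
        total = sum(word_counts[label].values())
        logp = math.log(doc_counts[label] / len(TRAINING))
        for tok in tokens:
            logp += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def classify(text, min_score=3):
    """Use the rules when they are decisive; otherwise fall back to Bayes."""
    tokens = tokenize(text)
    scores = rule_scores(tokens)
    top = max(sorted(scores), key=scores.get)
    if scores[top] >= min_score:
        return {"category": top, "method": "rules", "evidence": scores}
    return {"category": naive_bayes(tokens), "method": "bayes", "evidence": scores}
```

Returning the method and the per-category scores alongside the label is one simple way to keep each decision traceable.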

What ORGA is good at:

  • Profiling institutional websites such as hospitals, universities, government agencies, nonprofits, and international organizations
  • Producing structured JSON output quickly and predictably
  • Serving as an explainable, low-cost baseline without LLM dependencies

Known boundaries:

  • Not designed for open-world semantic understanding
  • Not intended for deep PDF reading or nuanced corporate hierarchy interpretation
  • Address parsing from noisy footer text may still yield partially parsed raw strings in difficult cases
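The last boundary, together with the deduplication and sanitization highlight, can be illustrated with a small sketch. It assumes a deliberately narrow, hypothetical address pattern (`ADDRESS_RE`) and hypothetical helper names. ORGA's real parser and output schema are not shown in these notes and are likely more involved.

```python
import re

# Hypothetical pattern: "<street>, <city>, <2-letter region> <5-digit postal code>"
ADDRESS_RE = re.compile(
    r"^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<region>[A-Z]{2})\s+(?P<postal>\d{5})$"
)


def sanitize(raw):
    """Collapse whitespace and trim stray separators left by footer markup."""
    text = re.sub(r"\s+", " ", raw)
    return text.strip(" |,;-")


def parse_location(raw):
    """Return structured fields when the pattern matches, else the raw string."""
    cleaned = sanitize(raw)
    match = ADDRESS_RE.match(cleaned)
    if match:
        return {"parsed": True, **match.groupdict()}
    return {"parsed": False, "raw": cleaned}


def dedupe_locations(raw_strings):
    """Keep the first occurrence of each location, compared on a normalized key."""
    seen = set()
    unique = []
    for raw in raw_strings:
        # The key ignores case, spacing, and punctuation differences.
        key = re.sub(r"[^a-z0-9]", "", sanitize(raw).lower())
        if key and key not in seen:
            seen.add(key)
            unique.append(parse_location(raw))
    return unique
```

In this sketch, a footer string that does not fit the pattern is kept as a sanitized raw string rather than dropped, which mirrors the "partially parsed raw strings" behavior noted above.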

This release establishes the frozen baseline for the current deterministic architecture. Future work may explore lightweight supervised calibration and optional LLM-assisted features without compromising the core system’s speed, traceability, and deterministic behavior.