fix(converters): emit schema-conformant OSI documents (drop non-spec root dialects/vendors) #148

Open
andreybavt wants to merge 1 commit into open-semantic-interchange:main from Kaelio:fix/schema-conformant-osi-documents

@andreybavt andreybavt commented Jun 8, 2026

Problem

Output from the dbt→OSI converter fails OSI's own validator (validation/validate.py)
against the published core schema:

[Schema] (root): Additional properties are not allowed ('dialects' was unexpected)

Because the CLI writes via to_osi_yaml(), every dbt→OSI conversion produces a
non-conformant document.

Root cause

OSIDocument (python/src/osi/models.py) declared optional root-level
dialects/vendors fields that aren't in core-spec/osi-schema.json, whose root is
additionalProperties: false (only version + semantic_model). The dbt converter
set dialects=[self._dialect], so dialects was emitted at the document root.
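
To make the failure mode concrete, here is a minimal sketch of how a closed root (`additionalProperties: false`) rejects the extra key. It uses only the standard library and a reduced stand-in for the schema root. `ROOT_SCHEMA`, `unexpected_root_keys`, and the sample values (`"1.0"`, the empty `semantic_model`) are illustrative assumptions, not code from the repository or from `validation/validate.py`.

```python
# Reduced stand-in for the root of core-spec/osi-schema.json. Assumption: only
# the root shape described in this PR (version + semantic_model, closed) is
# reproduced; the real schema constrains much more.
ROOT_SCHEMA = {
    "type": "object",
    "properties": {"version": {}, "semantic_model": {}},
    "additionalProperties": False,
}


def unexpected_root_keys(document: dict, schema: dict = ROOT_SCHEMA) -> list[str]:
    """Return the root keys that a closed schema root would reject."""
    if schema.get("additionalProperties", True) is not False:
        return []  # an open root accepts unknown keys
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in document if key not in allowed)


# Hypothetical documents: the shape emitted before and after this change.
before = {"version": "1.0", "dialects": ["ANSI_SQL"], "semantic_model": []}
after = {"version": "1.0", "semantic_model": []}

print(unexpected_root_keys(before))  # ['dialects']
print(unexpected_root_keys(after))   # []
```

The real validator reports the same condition as `Additional properties are not allowed ('dialects' was unexpected)`.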

Fix

  • Remove the non-spec dialects/vendors fields from OSIDocument.
  • Stop the dbt converter from populating a document-root dialect.

Per-expression dialect tagging (OSIExpression.dialects) is unchanged and remains the
schema-valid home for dialects, so no information is lost. Dialect selection still
flows end-to-end; the two affected converter tests now assert it on the per-expression path.
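
The shape of the change can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `python/src/osi/models.py`: it uses plain dataclasses, `DialectExpression` and `build_document` are hypothetical helpers, and the field layout inside `OSIExpression.dialects` is a simplified guess. Only the names `OSIDocument`, `OSIExpression`, `dialects`, `version`, and `semantic_model` come from this PR.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DialectExpression:  # hypothetical helper, not named in the PR
    dialect: str
    expression: str


@dataclass
class OSIExpression:
    # Per-expression dialect tagging: the schema-valid home for dialects.
    dialects: list[DialectExpression] = field(default_factory=list)


@dataclass
class OSIDocument:
    # Only the two root keys the schema allows; no root dialects/vendors.
    version: str
    semantic_model: list[dict] = field(default_factory=list)


def build_document(metric_sql: dict[str, str], dialect: str) -> OSIDocument:
    """Tag each expression with the selected dialect instead of the root."""
    metrics = [
        {
            "name": name,
            "expression": asdict(OSIExpression([DialectExpression(dialect, sql)])),
        }
        for name, sql in metric_sql.items()
    ]
    # "1.0" and the "orders" model name are placeholder values.
    return OSIDocument(version="1.0", semantic_model=[{"name": "orders", "metrics": metrics}])


doc = asdict(build_document({"revenue": "SUM(amount)"}, "SNOWFLAKE"))
print(sorted(doc))  # ['semantic_model', 'version']
```

The selected dialect is still recoverable from each expression, which is why dropping the root field loses no information.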

Regression guard

Adds converters/dbt/tests/test_schema_conformance.py, which converts a representative
manifest and validates the emitted document (YAML and JSON, for ANSI_SQL and
SNOWFLAKE) against core-spec/osi-schema.json, reusing validation/validate.py.
CI now fails if a converter emits a non-conformant document root again.
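
A rough sketch of the test matrix described above, kept to the standard library. It is not the content of `test_schema_conformance.py`: `convert_manifest` is a hypothetical stand-in for the dbt converter, the manifest is invented, and the check covers only the document root rather than the full schema. Only the JSON path is shown; the YAML path would need a YAML library and would be added as another entry in `FORMATS`.

```python
import itertools
import json

ALLOWED_ROOT_KEYS = {"version", "semantic_model"}  # schema root per this PR
DIALECTS = ("ANSI_SQL", "SNOWFLAKE")
# Each format maps to a (serialize, parse) pair; YAML is omitted here.
FORMATS = {"json": (json.dumps, json.loads)}

# Invented manifest, far smaller than a real dbt manifest.
MANIFEST = {"name": "orders", "metrics": {"revenue": "SUM(amount)"}}


def convert_manifest(manifest: dict, dialect: str) -> dict:
    """Hypothetical converter stand-in: dialect is tagged per expression."""
    metrics = [
        {"name": name, "expression": {"dialects": [{"dialect": dialect, "expression": sql}]}}
        for name, sql in manifest["metrics"].items()
    ]
    return {"version": "1.0", "semantic_model": [{"name": manifest["name"], "metrics": metrics}]}


def root_violations(document: dict) -> set[str]:
    """Root keys that fall outside the closed schema root."""
    return set(document) - ALLOWED_ROOT_KEYS


def run_matrix(manifest: dict) -> dict:
    """Convert, serialize, re-parse, and check every format/dialect pair."""
    results = {}
    for (fmt, (dump, load)), dialect in itertools.product(FORMATS.items(), DIALECTS):
        emitted = dump(convert_manifest(manifest, dialect))
        results[(fmt, dialect)] = root_violations(load(emitted))
    return results


print(run_matrix(MANIFEST))
```

Checking the serialized and re-parsed text, rather than the in-memory model, exercises the same path the CLI uses when it writes a document.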

Testing

  • validate.py on converter output: fails before ('dialects' was unexpected),
    passes after.
  • Full dbt converter test suite green.

Out of scope

Whether OSI should support a document-level (default) dialect is the open discussion in
#52 (one dialect per document) and #16 (default dialect at dataset level). This PR takes
no position and makes no schema change - it only aligns the reference model and
converter with the schema as published today.

Drop the non-spec root `dialects`/`vendors` fields from OSIDocument and stop the
dbt converter emitting a root dialect, so dbt->OSI output validates against
core-spec/osi-schema.json. Dialects remain per-expression (no information lost).
Add a regression test that schema-validates converter output.
@andreybavt
Author

@khush-bhatia, let me know if this first PR follows the community guidelines. Happy to take on a bigger scope after this one lands.

@khush-bhatia khush-bhatia requested a review from QMalcolm June 8, 2026 18:23
Member

@khush-bhatia khush-bhatia left a comment

Thanks for the fix.
