fix(converters): emit schema-conformant OSI documents (drop non-spec root dialects/vendors) #148
Open
andreybavt wants to merge 1 commit into
Drop the non-spec root-level `dialects`/`vendors` fields from `OSIDocument` and stop the dbt converter from emitting a root dialect, so dbt→OSI output validates against `core-spec/osi-schema.json`. Dialects remain per-expression, so no information is lost. Add a regression test that schema-validates converter output.
andreybavt (author): @khush-bhatia, let me know whether this first PR follows the community guidelines. Happy to take on a bigger scope once this one lands.
Problem
Output from the dbt→OSI converter fails OSI's own validator (`validation/validate.py`) against the published core schema: the root-level `dialects` property is rejected as unexpected.
Because the CLI writes its output via `to_osi_yaml()`, every dbt→OSI conversion produces a non-conformant document.
Root cause
`OSIDocument` (`python/src/osi/models.py`) declared optional root-level `dialects`/`vendors` fields that aren't in `core-spec/osi-schema.json`, whose root is `additionalProperties: false` (only `version` + `semantic_model`). The dbt converter set `dialects=[self._dialect]`, so `dialects` was emitted at the document root.
Fix
Drop the root-level `dialects`/`vendors` fields from `OSIDocument`, and stop the dbt converter from setting a root dialect. Per-expression dialect tagging (`OSIExpression.dialects`) is unchanged and remains the schema-valid home for dialects, so no information is lost. Dialect selection still flows end-to-end; the two affected converter tests now assert it on the per-expression path.
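A simplified sketch of where the dialect lives after this change, assuming plain dataclass stand-ins: the real classes in `python/src/osi/models.py` likely carry more fields and may be defined differently. Only `OSIDocument`, `OSIExpression.dialects`, and the two root keys come from this PR; the `expression` field, the `build_document` helper, the `"expressions"` nesting, and the placeholder version string are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class OSIExpression:
    # Hypothetical field holding the expression text.
    expression: str
    # Per-expression dialect tagging: the schema-valid home for dialects.
    dialects: List[str] = field(default_factory=list)


@dataclass
class OSIDocument:
    # Only the root keys the core schema allows; no root-level dialects/vendors.
    version: str
    semantic_model: list


def build_document(sql: str, dialect: str) -> dict:
    """Hypothetical converter step: tag the dialect on the expression, not the root."""
    expr = OSIExpression(expression=sql, dialects=[dialect])
    # Illustrative nesting only; not the schema's actual semantic_model shape.
    doc = OSIDocument(version="placeholder", semantic_model=[{"expressions": [expr]}])
    return asdict(doc)  # asdict also converts the nested OSIExpression to a dict


doc = build_document("SUM(amount)", "SNOWFLAKE")
print(sorted(doc))  # ['semantic_model', 'version']
print(doc["semantic_model"][0]["expressions"][0]["dialects"])  # ['SNOWFLAKE']
```

The dialect selection still reaches the output; it is simply attached one level down, where the schema expects it.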
Regression guard
Adds `converters/dbt/tests/test_schema_conformance.py`, which converts a representative manifest and validates the emitted document (as both YAML and JSON, for `ANSI_SQL` and `SNOWFLAKE`) against `core-spec/osi-schema.json`, reusing `validation/validate.py`. CI now fails if a converter emits a non-conformant document root again.
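The shape of such a regression test can be sketched with the standard library alone. This is not the PR's `test_schema_conformance.py`: `convert_manifest` and `load` are hypothetical stand-ins for the dbt converter and the output loaders, the root-key check stands in for a full run of `validation/validate.py`, and only JSON is exercised because YAML parsing needs a third-party library.

```python
import itertools
import json
import unittest

# Root keys allowed by core-spec/osi-schema.json (additionalProperties: false).
ALLOWED_ROOT_KEYS = {"version", "semantic_model"}
DIALECTS = ("ANSI_SQL", "SNOWFLAKE")
FORMATS = ("json",)  # the real test also covers YAML, which needs a YAML parser


def convert_manifest(dialect: str, fmt: str) -> str:
    """Hypothetical stand-in for converting a representative dbt manifest."""
    if fmt != "json":
        raise ValueError(f"unsupported format: {fmt}")
    doc = {
        "version": "placeholder",
        "semantic_model": [{"expression": "SUM(amount)", "dialects": [dialect]}],
    }
    return json.dumps(doc)


def load(text: str, fmt: str) -> dict:
    """Parse serialized output; only JSON is handled in this stdlib-only sketch."""
    if fmt != "json":
        raise ValueError(f"unsupported format: {fmt}")
    return json.loads(text)


class SchemaConformanceTest(unittest.TestCase):
    def test_root_is_schema_conformant(self):
        # One subtest per (format, dialect) pair, so a failure names the combination.
        for fmt, dialect in itertools.product(FORMATS, DIALECTS):
            with self.subTest(fmt=fmt, dialect=dialect):
                doc = load(convert_manifest(dialect, fmt), fmt)
                self.assertEqual(set(doc) - ALLOWED_ROOT_KEYS, set())
```

Run it with `python -m unittest`. Using `subTest` keeps every format/dialect combination in a single test method while still reporting each failing pair separately.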
Testing
`validate.py` on converter output: fails before the change (`'dialects' was unexpected`), passes after.
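The before/after result can be reproduced in miniature. The sketch below is a stdlib-only stand-in, not `validate.py` itself: it models only the root's `additionalProperties: false` rule from a hand-written schema fragment, and the `root_errors` helper, the sample documents, and the placeholder version string are hypothetical. Its message text mirrors the format the Python `jsonschema` library uses for this rule.

```python
# Minimal fragment of the root schema; the real core-spec/osi-schema.json
# constrains much more (this only models the root's additionalProperties rule).
ROOT_SCHEMA = {
    "type": "object",
    "properties": {"version": {}, "semantic_model": {}},
    "additionalProperties": False,
}


def root_errors(instance: dict, schema: dict = ROOT_SCHEMA) -> list:
    """Return jsonschema-style messages for root keys not declared in `properties`."""
    if schema.get("additionalProperties", True) is not False:
        return []
    extras = sorted(key for key in instance if key not in schema["properties"])
    if not extras:
        return []
    verb = "was" if len(extras) == 1 else "were"
    listed = ", ".join(repr(key) for key in extras)
    return [f"Additional properties are not allowed ({listed} {verb} unexpected)"]


# Before the fix: the dbt converter also emitted a root-level dialect list.
before = {"version": "placeholder", "semantic_model": [], "dialects": ["ANSI_SQL"]}
# After the fix: dialects live on expressions inside semantic_model instead.
after = {"version": "placeholder", "semantic_model": []}

print(root_errors(before))  # ["Additional properties are not allowed ('dialects' was unexpected)"]
print(root_errors(after))   # []
```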
Out of scope
Whether OSI should support a document-level (default) dialect is the open discussion in #52 (one dialect per document) and #16 (default dialect at dataset level). This PR takes no position and makes no schema change; it only aligns the reference model and converter with the schema as published today.