From 60b4abb8bbc64c10d2b180ba59833d17274ce27b Mon Sep 17 00:00:00 2001
From: Jon Mease
Date: Fri, 24 Apr 2026 14:25:15 -0400
Subject: [PATCH] Add datatype field to Field and Metric; reframe is_time as role marker

Introduces a top-level `datatype` on Field and Metric with a closed
logical enum: string, integer, number, boolean, date, time, timestamp,
timestamp_tz, other. Addresses issue #84.

`datatype` and `dimension.is_time` are independent and orthogonal:

- `datatype` declares the field's logical data type
  (casting/serialization).
- `is_time` is a temporal-role marker (time-series analysis, temporal
  filtering).

A field with `is_time: true` may carry any `datatype` (e.g. integer for
a year grain, string for a month name, date for a calendar date).

When `is_time` is unset, it defaults to `true` for temporal datatypes
(`date`, `time`, `timestamp`, `timestamp_tz`) and `false` otherwise.
Explicit `is_time` always wins, so authors can set `is_time: false` on
an audit `created_at` to keep it off the time axis.

Taxonomy and type/role split were chosen after benchmarking 14 peer
semantic layers and 5 portable type standards. Notable precedent:
Snowflake Semantic Views' YAML authoring form has a `time_dimensions:`
collection whose entries can carry any `data_type` (the published
example annotates `order_year` with `data_type: NUMBER`); LookML's
`dimension_group` accepts `date`, `datetime`, `timestamp`, `epoch`, and
`yyyymmdd`.

Snowflake converter updated: `_classify_field` honors explicit `is_time`
first, then falls back to the temporal-datatype default. 9 new tests
cover the datatype paths and the mixed-metadata cases (`d_year` with
`datatype: integer` and `is_time: true`, audit timestamp opt-out, etc.).

tpcds_semantic_model.yaml demonstrates three coexistence patterns:
datatype-only, datatype + is_time, and is_time-only.

Also added .gitignore for __pycache__ directories.
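
For illustration only (not part of the patch): the defaulting rule
described above can be sketched as a small standalone function. The
names `resolve_is_time` and `TEMPORAL_DATATYPES` are hypothetical and
exist only in this sketch; the converter's own logic lives in
`_classify_field`, which additionally treats a field with no
`dimension` block as a fact.

```python
from typing import Optional

# Temporal members of the datatype enum, as listed in this commit message.
TEMPORAL_DATATYPES = frozenset({"date", "time", "timestamp", "timestamp_tz"})


def resolve_is_time(datatype: Optional[str], is_time: Optional[bool]) -> bool:
    """Hypothetical helper: resolve the effective is_time of a dimension.

    An explicit is_time always wins; when it is unset (None), the value
    defaults to True for temporal datatypes and False otherwise.
    """
    if is_time is not None:
        return is_time
    return datatype in TEMPORAL_DATATYPES


# The mixed-metadata cases mentioned above.
print(resolve_is_time("date", None))        # defaulted from a temporal datatype
print(resolve_is_time("integer", True))     # d_year: explicit role on an integer
print(resolve_is_time("timestamp", False))  # audit created_at: explicit opt-out
print(resolve_is_time("string", None))      # non-temporal, unset
```

Keeping the rule in one function that takes both inputs makes the
precedence (explicit value first, datatype default second) easy to test
in isolation.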
Co-Authored-By: Claude Opus 4.7 (1M context) --- .gitignore | 2 + converters/index.md | 9 ++- .../src/osi_to_snowflake_yaml_converter.py | 26 +++++++- .../test_osi_to_snowflake_yaml_converter.py | 52 ++++++++++++++++ core-spec/osi-schema.json | 23 ++++++- core-spec/spec.md | 61 +++++++++++++++++-- core-spec/spec.yaml | 23 ++++++- docs/index.md | 2 +- examples/tpcds_semantic_model.yaml | 41 ++++++++++++- 9 files changed, 225 insertions(+), 14 deletions(-) create mode 100644 .gitignore diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..43ae0e2 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +__pycache__/ +*.py[cod] diff --git a/converters/index.md b/converters/index.md index 6477291..ac37c2f 100644 --- a/converters/index.md +++ b/converters/index.md @@ -93,11 +93,17 @@ Datasets represent logical tables (fact or dimension tables). They contain field Fields represent row-level attributes. They can be simple column references or computed expressions. +> **Note:** `datatype` (on `Field` and `Metric`) declares a field's logical data type; `dimension.is_time` is an independent temporal-role marker. +> A field may carry both, either, or neither. Use `datatype` for data-type questions (casting, serialization); use `is_time` for role questions +> (classifying time dimensions). When `is_time` is unset it defaults to `true` if `datatype` is one of `date`, `time`, `timestamp`, `timestamp_tz`, +> and `false` otherwise. Explicit `is_time` always wins. 
+ | OSI Field | Description | Converter Consideration | |-----------|-------------|------------------------| | `name` | Field identifier | Map to column/attribute name | | `expression.dialects` | Multi-dialect SQL expressions | Select the dialect matching the target vendor; fall back to `ANSI_SQL` | -| `dimension.is_time` | Whether the field is a time dimension | Map to vendor-specific time dimension markers | +| `datatype` | Logical data type of the field (one of `string`, `integer`, `number`, `boolean`, `date`, `time`, `timestamp`, `timestamp_tz`, `other`). | Converters SHOULD consult `datatype` for the field's data type; the temporal members (`date`, `time`, `timestamp`, `timestamp_tz`) classify a dimension as a time dimension only when `is_time` is unset. Use `other` + `custom_extensions` for types not covered by the enum. | +| `dimension.is_time` | Temporal-role marker. When `true`, the field should be treated as a time dimension regardless of its `datatype` (e.g. an integer year grain, a string month name, or a date column). When unset, defaults to `true` for temporal `datatype`s (`date`, `time`, `timestamp`, `timestamp_tz`) and `false` otherwise. | Map to vendor-specific time dimension markers. Converters SHOULD classify as a time dimension when `is_time` resolves to `true` (either explicit or defaulted from a temporal `datatype`). An explicit `is_time: false` suppresses the time-dimension classification even on temporal-typed columns. | | `label` | Categorization label | Map if vendor supports field labels/tags | | `description` | Human-readable description | Most vendors support field descriptions | | `ai_context` | Synonyms and business context | Map if vendor supports semantic annotations | @@ -144,6 +150,7 @@ Metrics are aggregate measures defined at the semantic model level. 
They can spa |-----------|-------------|------------------------| | `name` | Metric identifier | Map to vendor's measure/KPI name | | `expression.dialects` | Multi-dialect aggregate expressions | Select the appropriate dialect; fall back to `ANSI_SQL` | +| `datatype` | Logical data type of the metric result (one of `string`, `integer`, `number`, `boolean`, `date`, `time`, `timestamp`, `timestamp_tz`, `other`). | Converters SHOULD consult `datatype` to declare the result type of the aggregation. Most numeric measures will be `number` or `integer`; use `other` + `custom_extensions` for types not covered by the enum. | | `description` | What the metric measures | Most vendors support descriptions | | `ai_context` | Synonyms and business context | Map if vendor supports semantic annotations | diff --git a/converters/snowflake/src/osi_to_snowflake_yaml_converter.py b/converters/snowflake/src/osi_to_snowflake_yaml_converter.py index a296173..040e27d 100644 --- a/converters/snowflake/src/osi_to_snowflake_yaml_converter.py +++ b/converters/snowflake/src/osi_to_snowflake_yaml_converter.py @@ -16,6 +16,8 @@ SUPPORTED_VERSION = "0.1.1" +_TIME_DATATYPES = frozenset({"date", "time", "timestamp", "timestamp_tz"}) + class OsiConversionError(Exception): """Raised when an OSI YAML cannot be converted to Snowflake format.""" @@ -199,11 +201,31 @@ def _convert_dataset(dataset): def _classify_field(field): - """Returns 'dimension', 'time_dimension', or 'fact' based on field structure.""" + """Classify a field as 'fact', 'dimension', or 'time_dimension'. + + ``datatype`` declares the field's data type; ``dimension.is_time`` is + an independent temporal-role marker. Classification rules: + + - A field with no ``dimension`` block is a ``fact`` regardless of + ``datatype`` (data type does not imply role). 
+ - Explicit ``dimension.is_time`` always wins: ``True`` classifies as + ``time_dimension``; ``False`` classifies as ``dimension`` even when + ``datatype`` is temporal (author opt-out for e.g. audit timestamps). + - When ``dimension.is_time`` is unset, it defaults to ``True`` for + temporal ``datatype`` values (``date``, ``time``, ``timestamp``, + ``timestamp_tz``) and ``False`` otherwise. + """ dimension = field.get("dimension") if dimension is None: return "fact" - if isinstance(dimension, dict) and dimension.get("is_time") is True: + is_time = dimension.get("is_time") if isinstance(dimension, dict) else None + if is_time is True: + return "time_dimension" + if is_time is False: + return "dimension" + # is_time is unset; default from datatype + datatype = field.get("datatype") + if datatype in _TIME_DATATYPES: return "time_dimension" return "dimension" diff --git a/converters/snowflake/tests/test_osi_to_snowflake_yaml_converter.py b/converters/snowflake/tests/test_osi_to_snowflake_yaml_converter.py index 70a1e52..e93063d 100644 --- a/converters/snowflake/tests/test_osi_to_snowflake_yaml_converter.py +++ b/converters/snowflake/tests/test_osi_to_snowflake_yaml_converter.py @@ -178,6 +178,58 @@ def test_dimension_bare_true(self): def test_dimension_none_is_fact(self): assert _classify_field({"dimension": None}) == "fact" + def test_datatype_timestamp_is_time_dimension(self): + assert _classify_field( + {"dimension": {}, "datatype": "timestamp"} + ) == "time_dimension" + + def test_datatype_date_is_time_dimension(self): + assert _classify_field( + {"dimension": {}, "datatype": "date"} + ) == "time_dimension" + + def test_datatype_time_is_time_dimension(self): + assert _classify_field( + {"dimension": {}, "datatype": "time"} + ) == "time_dimension" + + def test_datatype_timestamp_tz_is_time_dimension(self): + assert _classify_field( + {"dimension": {}, "datatype": "timestamp_tz"} + ) == "time_dimension" + + def test_datatype_string_is_dimension(self): + assert 
_classify_field( + {"dimension": {}, "datatype": "string"} + ) == "dimension" + + def test_datatype_other_is_dimension(self): + assert _classify_field( + {"dimension": {}, "datatype": "other"} + ) == "dimension" + + def test_is_time_preserved_when_datatype_non_temporal(self): + """A dimension with is_time=True is classified as a time_dimension + even when datatype is non-temporal, because is_time is an + independent role marker (e.g., d_year with datatype: integer and + is_time: true is a time-role integer grain).""" + assert _classify_field( + {"dimension": {"is_time": True}, "datatype": "integer"} + ) == "time_dimension" + + def test_is_time_false_suppresses_time_dimension_on_temporal_datatype(self): + """Explicit is_time: false is an author opt-out for temporal + columns (e.g., an audit created_at that should not appear on + the time axis). Explicit is_time always wins over the default.""" + assert _classify_field( + {"dimension": {"is_time": False}, "datatype": "timestamp"} + ) == "dimension" + + def test_no_dimension_with_temporal_datatype_is_still_fact(self): + """A temporal datatype on a field with no dimension block is still + a fact; type does not imply role.""" + assert _classify_field({"datatype": "timestamp"}) == "fact" + # --------------------------------------------------------------------------- # _extract_expression diff --git a/core-spec/osi-schema.json b/core-spec/osi-schema.json index 30210d1..a0ae73e 100644 --- a/core-spec/osi-schema.json +++ b/core-spec/osi-schema.json @@ -122,13 +122,28 @@ "required": ["dialects"], "additionalProperties": false }, + "Datatype": { + "type": "string", + "enum": [ + "string", + "integer", + "number", + "boolean", + "date", + "time", + "timestamp", + "timestamp_tz", + "other" + ], + "description": "Logical data type for fields and metrics. Describes what kind of value is stored, independent of role (e.g. dimension vs fact). Use `other` plus `custom_extensions` for vendor-specific types not covered by the enum." 
+ }, "Dimension": { "type": "object", "description": "Dimension metadata", "properties": { "is_time": { "type": "boolean", - "description": "Indicates if this is a time-based dimension for temporal filtering" + "description": "Temporal-role marker. When true, consumers that distinguish time dimensions (e.g. for time-series analysis or temporal filtering) should treat this field as a time dimension. This is a *role* flag, independent of the field's data type: a field with `is_time: true` may carry any `datatype` (e.g. `integer` for a year grain, `string` for a month name, as well as temporal datatypes). When `is_time` is unset, it defaults to `true` if `datatype` is one of `date`, `time`, `timestamp`, or `timestamp_tz`, and `false` otherwise. Set `is_time: false` explicitly to opt a temporal-typed column (such as an audit timestamp) out of time-dimension treatment." } }, "additionalProperties": false @@ -155,6 +170,9 @@ "type": "string", "description": "Human-readable description" }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, "ai_context": { "$ref": "#/$defs/AIContext" }, @@ -280,6 +298,9 @@ "type": "string", "description": "Human-readable description of what the metric measures" }, + "datatype": { + "$ref": "#/$defs/Datatype" + }, "ai_context": { "$ref": "#/$defs/AIContext" }, diff --git a/core-spec/spec.md b/core-spec/spec.md index 33da9bd..6e92832 100644 --- a/core-spec/spec.md +++ b/core-spec/spec.md @@ -36,6 +36,22 @@ Supported SQL and expression language dialects for metrics and field definitions | `TABLEAU` | Tableau calculations | | `DATABRICKS` | Databricks SQL | +### Datatypes + +Logical data types for fields and metrics. + +| Datatype | Description | +|----------|-------------| +| `string` | Variable-length Unicode character data. | +| `integer` | Signed integer with no scale. | +| `number` | Real number (floating-point or decimal) with unspecified precision. | +| `boolean` | Logical two-valued truth type. 
| +| `date` | Calendar date with no time-of-day component. | +| `time` | Time-of-day with no date component. | +| `timestamp` | Instant-in-time without timezone offset (naive / local). | +| `timestamp_tz` | Instant-in-time with timezone offset (zoned). | +| `other` | Any data type not covered above; use `custom_extensions` for vendor-specific refinement. | + ### Vendors Supported vendors for custom extensions and integrations. @@ -202,6 +218,7 @@ Fields represent row-level attributes that can be used for grouping, filtering, | `dimension` | object | No | Dimension metadata (e.g., `is_time` flag) | | `label` | string | No | Label for categorization | | `description` | string | No | Human-readable description | +| `datatype` | string (enum) | No | Logical data type for this field. See [Datatypes](#datatypes). | | `ai_context` | string/object | No | Additional context for AI tools (e.g., synonyms) | | `custom_extensions` | array | No | Vendor-specific attributes | @@ -228,7 +245,7 @@ expression: | Field | Type | Description | |-------|------|-------------| -| `is_time` | boolean | Indicates if this is a time-based dimension for temporal filtering | +| `is_time` | boolean | Temporal-role marker. When `true`, consumers that distinguish time dimensions (e.g. for time-series analysis or temporal filtering) should treat this field as a time dimension. This is a *role* flag, independent of the field's data type. See [Datatype and `is_time`: type vs. role](#datatype-and-is_time-type-vs-role). | ### Examples @@ -268,6 +285,7 @@ expression: dialects: - dialect: ANSI_SQL expression: order_date + datatype: date dimension: is_time: true description: Date when order was placed @@ -290,6 +308,33 @@ expression: description: Normalized email address ``` +### Datatype and `is_time`: type vs. role + +`datatype` and `dimension.is_time` are independent properties that answer different questions: + +- **`datatype`** describes the *data type* of the field (e.g. 
`date`, `integer`, `string`, `timestamp_tz`): what kind of values the field holds. +- **`dimension.is_time`** is a *temporal-role marker*: whether the field should be treated as a time dimension for time-series analysis or temporal filtering, regardless of its data type. + +**Default for `is_time`.** When `is_time` is not set explicitly, it defaults to `true` if `datatype` is one of `date`, `time`, `timestamp`, `timestamp_tz`, and `false` otherwise. Explicit `is_time` always wins. Set `is_time: false` on a temporal-typed column (e.g. an audit `created_at` you don't want on the time axis) to opt out of the default. + +Common combinations: + +| Column example | `datatype` | `is_time` | Effective role | Why | +|---|---|---|---|---| +| `d_date` (calendar date) | `date` | omitted | time dimension | Temporal `datatype`; `is_time` defaults to `true`. | +| `order_timestamp` | `timestamp_tz` | omitted | time dimension | Same. | +| `created_at` (audit timestamp) | `timestamp` | `false` | regular dimension | Explicit opt-out of the temporal default. | +| `d_year` (integer year grain) | `integer` | `true` | time dimension | Non-temporal `datatype`; `is_time: true` makes the role explicit. | +| `d_quarter_name` (e.g. `"Q1"`) | `string` | `true` | time dimension | String-valued temporal grain. | +| `customer_id` | `integer` | omitted | regular dimension | Non-temporal `datatype`; `is_time` defaults to `false`. | + +> **Precedent.** This type/role separation mirrors [Snowflake Semantic Views' YAML authoring form](https://docs.snowflake.com/en/user-guide/views-semantic/semantic-view-yaml-spec), which has a structural `time_dimensions:` collection whose entries can carry any `data_type`. The published example annotates `order_year` with `data_type: NUMBER`. 
LookML supports a similar split via its [`dimension_group`](https://cloud.google.com/looker/docs/reference/param-field-dimension-group), whose `datatype` enum covers `date`, `datetime`, `timestamp`, plus the integer-encoded forms `epoch` and `yyyymmdd`. + +**Consumer guidance.** + +- For *data-type* questions (casting, serialization, downstream type inference): prefer `datatype` when present. If only `is_time: true` is set, do not infer a specific scalar type from it. +- For *role* questions (classifying time dimensions in a query UI, generating time-series output sections, choosing time-aware aggregations): treat the field as a time dimension when `is_time` resolves to `true`, whether explicitly set or defaulted from a temporal `datatype`. + --- ## Metrics @@ -303,6 +348,7 @@ Quantitative measures defined on business data, representing key calculations li | `name` | string | Yes | Unique identifier for the metric | | `expression` | object | Yes | Expression definition with dialect support | | `description` | string | No | Human-readable description of what the metric measures | +| `datatype` | string (enum) | No | Logical data type for this metric. See [Datatypes](#datatypes). 
| | `ai_context` | string/object | No | Additional context for AI tools (e.g., synonyms) | | `custom_extensions` | array | No | Vendor-specific attributes | @@ -324,9 +370,11 @@ expression: ```yaml - name: total_revenue expression: - - dialect: ANSI_SQL - expression: SUM(orders.amount) + dialects: + - dialect: ANSI_SQL + expression: SUM(orders.amount) description: Total revenue across all orders + datatype: number ai_context: synonyms: - "total sales" @@ -338,9 +386,11 @@ expression: ```yaml - name: avg_orders expression: - - dialect: ANSI_SQL - expression: SUM(orders.amount) / COUNT(DISTINCT customers.id) + dialects: + - dialect: ANSI_SQL + expression: SUM(orders.amount) / COUNT(DISTINCT customers.id) description: Average orders + datatype: number ai_context: synonyms: - "Order Average by customer" @@ -446,6 +496,7 @@ semantic_model: dialects: - dialect: ANSI_SQL expression: order_date + datatype: date dimension: is_time: true description: Order date diff --git a/core-spec/spec.yaml b/core-spec/spec.yaml index d27a9ba..1ead02c 100644 --- a/core-spec/spec.yaml +++ b/core-spec/spec.yaml @@ -168,8 +168,17 @@ fields: # Optional: Dimension metadata # Indicates this field can be used as a dimension for grouping/filtering dimension: - # Optional: Indicates if this is a time-based dimension - # Used for time-series analysis and temporal filtering + # Optional: Temporal-role marker + # When true, consumers should treat this field as a time dimension + # for time-series analysis and temporal filtering. This is a *role* + # flag, independent of the field's data type. A field with + # is_time: true may carry any datatype (e.g. integer for a year + # grain, string for a month name, date/timestamp for a date column). + # + # Default: when unset, is_time defaults to true if datatype is one + # of date, time, timestamp, timestamp_tz, and false otherwise. Set + # is_time: false explicitly to opt a temporal-typed column out of + # time-dimension treatment. 
is_time: boolean # Optional: Label for categorization (e.g., "filter") @@ -178,6 +187,11 @@ fields: # Optional: Human-readable description of the field description: string + # Optional: Logical data type for this field + # One of: string, integer, number, boolean, date, time, timestamp, timestamp_tz, other + # Use "other" + custom_extensions for vendor-specific types + datatype: string + # Optional: Additional context for AI tools (e.g., synonyms, business terms) # Helps LLMs understand the field meaning and generate better queries ai_context: string @@ -207,6 +221,11 @@ metrics: # Should explain what the metric measures and how it's used description: string + # Optional: Logical data type for this metric + # One of: string, integer, number, boolean, date, time, timestamp, timestamp_tz, other + # Use "other" + custom_extensions for vendor-specific types + datatype: string + # Optional: Additional context for AI tools (e.g., synonyms, business context) # Helps LLMs understand the metric meaning and suggest it appropriately ai_context: string diff --git a/docs/index.md b/docs/index.md index 6c9f5f7..634c03e 100644 --- a/docs/index.md +++ b/docs/index.md @@ -365,7 +365,7 @@ A practical guide for organizations looking to adopt OSI. |------|------------| | **Semantic Model** | A structured description of business data that defines datasets, fields, relationships, and metrics. It provides a shared vocabulary for interpreting data across tools and teams. | | **Dataset** | A logical representation of a business entity, typically corresponding to a fact table or dimension table in a data warehouse. | -| **Field** | A row-level attribute within a dataset, used for grouping, filtering, or as part of metric expressions. Fields can be simple column references or computed expressions. | +| **Field** | A row-level attribute within a dataset, used for grouping, filtering, or as part of metric expressions. Fields can be simple column references or computed expressions. 
A field's logical data type is declared by the optional top-level `datatype` field (one of `string`, `integer`, `number`, `boolean`, `date`, `time`, `timestamp`, `timestamp_tz`, or `other`). | | **Dimension** | A categorical attribute used to slice and filter data (e.g., region, product category, date). In OSI, dimensions are represented as fields with optional metadata such as `is_time`. | | **Metric** | A quantitative measure computed by aggregating data across one or more datasets (e.g., total revenue, average order value). Metrics are defined at the semantic model level. | | **Relationship** | A foreign key connection between two datasets, defining how they can be joined. Relationships are always many-to-one (from the referencing dataset to the referenced dataset). | diff --git a/examples/tpcds_semantic_model.yaml b/examples/tpcds_semantic_model.yaml index 98e956b..de0ee5d 100644 --- a/examples/tpcds_semantic_model.yaml +++ b/examples/tpcds_semantic_model.yaml @@ -34,6 +34,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_sold_date_sk description: Foreign key to date dimension + datatype: integer dimension: is_time: false ai_context: @@ -47,6 +48,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_item_sk description: Foreign key to item dimension + datatype: integer dimension: is_time: false ai_context: @@ -60,6 +62,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_customer_sk description: Foreign key to customer dimension + datatype: integer dimension: is_time: false ai_context: @@ -73,6 +76,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_store_sk description: Foreign key to store dimension + datatype: integer dimension: is_time: false ai_context: @@ -86,6 +90,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_quantity description: Quantity of items sold + datatype: integer ai_context: synonyms: - "units sold" @@ -97,6 +102,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_sales_price description: Sales price per unit + 
datatype: number ai_context: synonyms: - "unit price" @@ -108,6 +114,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_ext_sales_price description: Extended sales price (quantity * price) + datatype: number ai_context: synonyms: - "total price" @@ -119,6 +126,7 @@ semantic_model: - dialect: ANSI_SQL expression: ss_net_profit description: Net profit from the sale + datatype: number ai_context: synonyms: - "profit" @@ -144,6 +152,7 @@ semantic_model: - dialect: ANSI_SQL expression: d_date_sk description: Surrogate key for date + datatype: integer dimension: is_time: false @@ -153,8 +162,8 @@ semantic_model: - dialect: ANSI_SQL expression: d_date description: Actual date value - dimension: - is_time: true + datatype: date + dimension: {} ai_context: synonyms: - "date" @@ -166,12 +175,15 @@ semantic_model: - dialect: ANSI_SQL expression: d_year description: Year + datatype: integer dimension: is_time: true ai_context: synonyms: - "year" + # Declares temporal role via is_time without a datatype annotation. + # Both datatype and is_time are independently optional. - name: d_quarter_name expression: dialects: @@ -185,6 +197,8 @@ semantic_model: - "quarter" - "fiscal quarter" + # Declares temporal role via is_time without a datatype annotation. + # Both datatype and is_time are independently optional. 
- name: d_month_name expression: dialects: @@ -217,6 +231,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_customer_sk description: Surrogate key for customer + datatype: integer dimension: is_time: false @@ -226,6 +241,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_customer_id description: Business key for customer + datatype: string dimension: is_time: false ai_context: @@ -239,6 +255,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_first_name description: Customer first name + datatype: string dimension: is_time: false @@ -248,6 +265,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_last_name description: Customer last name + datatype: string dimension: is_time: false @@ -257,6 +275,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_first_name || ' ' || c_last_name description: Customer full name (computed field) + datatype: string dimension: is_time: false ai_context: @@ -270,6 +289,7 @@ semantic_model: - dialect: ANSI_SQL expression: c_email_address description: Customer email address + datatype: string dimension: is_time: false ai_context: @@ -297,6 +317,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_item_sk description: Surrogate key for item + datatype: integer dimension: is_time: false @@ -306,6 +327,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_item_id description: Business key for item + datatype: string dimension: is_time: false ai_context: @@ -320,6 +342,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_item_desc description: Item description + datatype: string dimension: is_time: false ai_context: @@ -333,6 +356,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_brand description: Brand name + datatype: string dimension: is_time: false ai_context: @@ -346,6 +370,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_category description: Item category + datatype: string dimension: is_time: false ai_context: @@ -359,6 +384,7 @@ semantic_model: - dialect: ANSI_SQL expression: i_current_price 
description: Current price of the item + datatype: number dimension: is_time: false ai_context: @@ -386,6 +412,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_store_sk description: Surrogate key for store + datatype: integer dimension: is_time: false @@ -395,6 +422,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_store_id description: Business key for store + datatype: string dimension: is_time: false ai_context: @@ -408,6 +436,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_store_name description: Store name + datatype: string dimension: is_time: false ai_context: @@ -421,6 +450,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_city description: City where store is located + datatype: string dimension: is_time: false ai_context: @@ -434,6 +464,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_state description: State where store is located + datatype: string dimension: is_time: false ai_context: @@ -447,6 +478,7 @@ semantic_model: - dialect: ANSI_SQL expression: s_number_employees description: Number of employees at the store + datatype: integer ai_context: synonyms: - "employee count" @@ -502,6 +534,7 @@ semantic_model: - dialect: ANSI_SQL expression: SUM(store_sales.ss_ext_sales_price) description: Total sales revenue across all transactions + datatype: number ai_context: synonyms: - "total revenue" @@ -514,6 +547,7 @@ semantic_model: - dialect: ANSI_SQL expression: SUM(store_sales.ss_net_profit) description: Total net profit from store sales + datatype: number ai_context: synonyms: - "net profit" @@ -526,6 +560,7 @@ semantic_model: - dialect: ANSI_SQL expression: SUM(store_sales.ss_ext_sales_price) / COUNT(DISTINCT customer.c_customer_sk) description: Average lifetime sales value per customer + datatype: number ai_context: synonyms: - "CLV" @@ -539,6 +574,7 @@ semantic_model: - dialect: ANSI_SQL expression: SUM(store_sales.ss_ext_sales_price) description: Total sales by brand (requires grouping by item.i_brand) + datatype: number 
ai_context: synonyms: - "brand sales" @@ -551,6 +587,7 @@ semantic_model: - dialect: ANSI_SQL expression: SUM(store_sales.ss_ext_sales_price) / NULLIF(SUM(store.s_number_employees), 0) description: Sales per employee across stores + datatype: number ai_context: synonyms: - "sales per employee"