object_fields: characterize array and nested-object values #58

Description

@abegong

Summary

The object_fields measurement primitive (see product/specs/inspector-layers-spec.md)
builds a per-field data dictionary over a set of frontmatter objects: presence,
type histogram, cardinality, and the most common values.

Its first cut characterizes scalar values only (keeping string and numeric
scalars distinct). Array and nested-object values are counted as present and
typed, but not characterized further. This issue tracks extending the primitive
to characterize them.
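
For orientation, here is a minimal sketch of what the scalar-only first cut might look like. It is not the actual implementation: the function signature, the type labels, the `top_n` parameter, and the report keys are all assumptions made for illustration. Only the four reported facets (presence, type histogram, cardinality, common values) come from the description above.

```python
from collections import Counter


def type_name(value):
    """Coarse type label (hypothetical names); bool is tested before int
    because bool is a subclass of int in Python."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def object_fields(objects, top_n=3):
    """Build a per-field dictionary over a list of frontmatter dicts."""
    present = Counter()
    types = {}
    values = {}
    for obj in objects:
        for key, value in obj.items():
            present[key] += 1
            label = type_name(value)
            types.setdefault(key, Counter())[label] += 1
            if label in ("string", "number", "boolean"):
                # Key on (type, value) so the string "1" and the number 1
                # are counted as distinct values.
                values.setdefault(key, Counter())[(label, value)] += 1
            # Arrays and objects are counted and typed above, nothing more.
    report = {}
    for key in present:
        counts = values.get(key, Counter())
        report[key] = {
            "presence": present[key],
            "types": dict(types[key]),
            "cardinality": len(counts),
            "common": [(v, n) for (_, v), n in counts.most_common(top_n)],
        }
    return report
```

In this sketch a field such as tags shows up with a type histogram of `{"array": 1}` and a cardinality of 0, which is the gap this issue describes.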

Motivation

Array fields are common in wikis (tags: [a, b, c] is the obvious case), and
today they report only "present, array." That omits the element-value
distribution an agent would need to recognize that a tags field behaves like an
enum. Nested objects (meta: {…}) are similarly opaque.

Proposal

Two independent extensions, in rough priority order:

  1. Array elements. Treat an array of scalars as a multiset of its elements
    and run the dictionary over the flattened elements (cardinality, common
    values). Decide handling for arrays of objects.
  2. Nested objects. Flatten nested keys such as meta.x into dotted keys and
    build dictionary entries for those too. Watch for key explosion and output
    growth on deeply nested data.
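
The two extensions could be sketched roughly as follows. Everything here is hypothetical: the names `flatten` and `array_element_counts`, the `max_depth` cap as a guard against key explosion, and the choice to skip non-scalar array elements are assumptions for discussion, not decisions. In particular, skipping object elements only sidesteps the open question about arrays of objects.

```python
from collections import Counter


def flatten(obj, prefix="", max_depth=3):
    """Yield (dotted_key, value) pairs for a nested dict.

    Recursion stops once max_depth levels of keys have been consumed;
    anything deeper is yielded as an opaque object value.
    """
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value and max_depth > 1:
            yield from flatten(value, path, max_depth - 1)
        else:
            yield path, value


def array_element_counts(objects, max_depth=3):
    """Treat each array of scalars as a multiset and count its elements
    per dotted key, across all objects."""
    counts = {}
    for obj in objects:
        for path, value in flatten(obj, max_depth=max_depth):
            if not isinstance(value, list):
                continue
            # Assumption: nested lists and objects inside arrays are skipped.
            scalars = [e for e in value if not isinstance(e, (list, dict))]
            counts.setdefault(path, Counter()).update(scalars)
    return counts
```

Cardinality for an array field is then `len(counts[path])` and the common values are `counts[path].most_common(n)`, the same facets as for scalars but aggregated over elements rather than over objects.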

Tradeoffs

Captured in the inspector-layers spec, Open Question 4 (resolved: scalars only
for the first cut, with array and nested-object characterization deferred to a
follow-up):

  • Scalars only — simple, bounded, but misses the common tag case.
  • Array-element characterization — makes tag/label fields legible; adds a second
    aggregation mode; ambiguous for arrays of objects.
  • Nested-object recursion — full coverage, but risks key explosion and thin
    signal for deep nesting.

Notes
