Context
The track-level filter: "<value>" shortcut covers every track in the shipped default config by narrowing to a single type value (SIGNAL, DOMAIN, CHAIN, …). Consumers bringing their own data will hit the "almost right, just need to tweak" cases the shortcut cannot express: score >= 0.8 AND type IN ('binding', 'catalytic'), renaming a column from desc to description, projecting 40 fields down to 4, capping at N rows, or deriving length = end - start. That need is the motivation for a declarative transform: pipeline on DataSourceDescriptor, modelled on Vega-Lite's transform vocabulary (filter | calculate | rename | pick | limit).
The design is fully specified in specs/transform-engine.md: TypeScript types, JSON Schema additions, a ~250-line hand-rolled SQL-WHERE-flavored expression parser in src/schema/expressions.ts (zero runtime dependency, ~2–4 kB gzipped), the engine proper in src/schema/transforms.ts, validator + normalizer + loader wire-in, the three test suites, and an implementation plan with acceptance criteria.
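To make the five operators concrete, here is a minimal sketch of a pipeline covering the cases listed above. Only the operator names and the `{ filter: { field, equal } }` form come from this issue; the other step shapes, `Step`, `applyStep`, and `runPipeline` are hypothetical stand-ins, and `calculate` takes a plain callback where the real engine would take a compiled expression string. specs/transform-engine.md remains the authority on the actual types.

```typescript
type Row = Record<string, unknown>;

// Hypothetical step shapes, not the spec's Transform type.
type Step =
  | { filter: { field: string; equal: unknown } }
  | { calculate: (row: Row) => unknown; as: string } // stand-in for an expression string
  | { rename: Record<string, string> }
  | { pick: string[] }
  | { limit: number };

function applyStep(rows: Row[], step: Step): Row[] {
  if ("filter" in step) {
    const { field, equal } = step.filter;
    return rows.filter((row) => row[field] === equal);
  }
  if ("calculate" in step) {
    const { calculate, as: target } = step;
    return rows.map((row) => ({ ...row, [target]: calculate(row) }));
  }
  if ("rename" in step) {
    const mapping = step.rename;
    return rows.map((row) => {
      const out: Row = {};
      for (const key of Object.keys(row)) {
        const renamed = Object.prototype.hasOwnProperty.call(mapping, key) ? mapping[key] : key;
        out[renamed] = row[key];
      }
      return out;
    });
  }
  if ("pick" in step) {
    const fields = step.pick;
    return rows.map((row) => {
      const out: Row = {};
      for (const key of fields) if (key in row) out[key] = row[key];
      return out;
    });
  }
  // Only the limit step remains at this point.
  return rows.slice(0, step.limit);
}

// Steps run in order; each one receives the previous step's output.
function runPipeline(rows: Row[], steps: Step[]): Row[] {
  return steps.reduce<Row[]>(applyStep, rows);
}

const features: Row[] = [
  { type: "DOMAIN", desc: "Kinase", start: 10, end: 60, source: "x" },
  { type: "SIGNAL", desc: "Signal peptide", start: 1, end: 20, source: "x" },
  { type: "DOMAIN", desc: "SH2", start: 80, end: 120, source: "x" },
];

const result = runPipeline(features, [
  { filter: { field: "type", equal: "DOMAIN" } },
  { calculate: (row: Row) => Number(row.end) - Number(row.start), as: "length" },
  { rename: { desc: "description" } },
  { pick: ["description", "length"] },
  { limit: 1 },
]);
// result is [{ description: "Kinase", length: 50 }]
```

Order matters in this sketch: `pick` runs after `rename`, so it refers to the new column name.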
Task
Implement the design in specs/transform-engine.md as a single PR.
Scope (brief — see the design doc for the full spec):
- Extend src/schema/types.ts with Transform, FieldPredicate, TransformFunction, DataSourceDescriptor.transform?:, and ProtvistaRuntimeAPI.registerTransform().
- Extend src/schema/schema.json with the Transform and FieldPredicate $defs and the transform: property on DataSourceDescriptor.
- Extend src/schema/registry.ts with a transforms bucket and BUILTIN_TRANSFORM_OPERATORS.
- Re-export the new types + BUILTIN_TRANSFORM_OPERATORS from src/schema/index.ts.
- Create src/schema/expressions.ts: hand-rolled lexer, recursive-descent parser by precedence level, and AST-to-closure compiler for the SQL-WHERE-flavored grammar (AND / OR / NOT, = / != / <> / < / <= / > / >=, IN / NOT IN, BETWEEN / NOT BETWEEN, LIKE / NOT LIKE, IS [NOT] NULL, + / - / * / /, dotted-path identifiers, single-quoted strings, number literals, parens).
- Create src/schema/transforms.ts: engine, applyTransforms(), fieldPredicateToFn(), registerBuiltinTransforms().
- Extend src/schema/validate.ts with checkTransform(), the FieldPredicate anyOf → missing-predicate-operator special-case, and (optional but recommended) parse-time expression validation so authors see typos at config-load instead of track-load.
- Preserve transform in src/schema/normalize.ts (NormalizedDataSource.transform?: + expandDescriptor passthrough).
- Replace the two inline .filter(...) call sites in src/load-data.ts with applyTransforms(...). Thread Registry through the loader signature and call registerBuiltinTransforms(registry) once at loader init.
- Write the three test suites: expressions.spec.ts (lexer + parser precedence + parser errors + evaluator + operator coverage + null safety + dotted paths), transforms.spec.ts (engine contract), and restore the transform-vocabulary cases in schema.spec.ts / validate.spec.ts / types.spec.ts / normalize.spec.ts / registry.spec.ts.
- Update specs/config-approach.md: add the Non-Goals bullet, the Data Model blocks, the Edge Cases rows, the acceptance criteria, and the Example 4 pipeline. See §7 of the design doc for the exact passages.
- Remove the two planning comments in src/schema/types.ts (near line 21) and src/schema/registry.ts (near line 38) that point at this issue's design doc.
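For orientation, the lexer → recursive-descent parser → closure pipeline described for src/schema/expressions.ts could look roughly like the sketch below. It covers only a subset of the grammar (AND / OR / NOT, the comparison operators, IN, parens, dotted identifiers, numbers, single-quoted strings) and compiles straight to closures without an intermediate AST. All names (`tokenize`, `compileExpression`, `Evaluator`) are hypothetical, comparisons use plain JavaScript semantics rather than the design's null rules, and the real implementation should follow the design doc.

```typescript
type Row = Record<string, unknown>;
type Evaluator = (row: Row) => unknown;

// Numbers, single-quoted strings, dotted identifiers/keywords, operators, parens, commas.
const TOKEN = /\s*(\d+(?:\.\d+)?|'[^']*'|[A-Za-z_][\w.]*|<=|>=|<>|!=|[=<>(),])/g;

const COMPARE: Record<string, (a: any, b: any) => boolean> = {
  "=": (a, b) => a === b,
  "!=": (a, b) => a !== b,
  "<>": (a, b) => a !== b,
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
};

function tokenize(input: string): string[] {
  const text = input.trim();
  const tokens: string[] = [];
  let pos = 0;
  while (pos < text.length) {
    TOKEN.lastIndex = pos;
    const match = TOKEN.exec(text);
    // A match that starts later than `pos` means we skipped over an illegal character.
    if (!match || match.index !== pos) throw new SyntaxError("Unexpected character near offset " + pos);
    tokens.push(match[1]);
    pos = TOKEN.lastIndex;
  }
  return tokens;
}

function compileExpression(expression: string): Evaluator {
  const tokens = tokenize(expression);
  let i = 0;
  const peek = (): string | undefined => tokens[i];
  const next = (): string | undefined => tokens[i++];
  const keyword = (): string | undefined => peek()?.toUpperCase(); // keywords are case-insensitive
  const expect = (tok: string): void => {
    if (next() !== tok) throw new SyntaxError("Expected " + tok);
  };

  // One function per precedence level, loosest first: OR < AND < NOT < comparison.
  function parseOr(): Evaluator {
    let left = parseAnd();
    while (keyword() === "OR") {
      next();
      const l = left, r = parseAnd();
      left = (row) => Boolean(l(row)) || Boolean(r(row));
    }
    return left;
  }
  function parseAnd(): Evaluator {
    let left = parseNot();
    while (keyword() === "AND") {
      next();
      const l = left, r = parseNot();
      left = (row) => Boolean(l(row)) && Boolean(r(row));
    }
    return left;
  }
  function parseNot(): Evaluator {
    if (keyword() !== "NOT") return parseComparison();
    next();
    const operand = parseNot();
    return (row) => !operand(row);
  }
  function parseComparison(): Evaluator {
    const left = parsePrimary();
    const op = peek();
    if (op !== undefined && Object.prototype.hasOwnProperty.call(COMPARE, op)) {
      next();
      const right = parsePrimary();
      const test = COMPARE[op];
      return (row) => test(left(row), right(row));
    }
    if (keyword() === "IN") {
      next();
      expect("(");
      const options: Evaluator[] = [parsePrimary()];
      while (peek() === ",") { next(); options.push(parsePrimary()); }
      expect(")");
      return (row) => { const v = left(row); return options.some((o) => o(row) === v); };
    }
    return left;
  }
  function parsePrimary(): Evaluator {
    const tok = next();
    if (tok === undefined) throw new SyntaxError("Unexpected end of expression");
    if (tok === "(") { const inner = parseOr(); expect(")"); return inner; }
    if (/^\d/.test(tok)) { const n = Number(tok); return () => n; }
    if (tok.charAt(0) === "'") { const s = tok.slice(1, -1); return () => s; }
    if (/^[A-Za-z_]/.test(tok)) {
      const path = tok.split(".");
      return (row) => path.reduce<unknown>(
        (cur, key) => (cur !== null && typeof cur === "object" ? (cur as Row)[key] : undefined), row);
    }
    throw new SyntaxError("Unexpected token " + tok);
  }

  const root = parseOr();
  if (i < tokens.length) throw new SyntaxError("Unexpected token " + tokens[i]);
  return root;
}

const keep = compileExpression("score >= 0.8 AND type IN ('binding', 'catalytic')");
```

Because the expression is parsed once and returned as a closure, a syntax error surfaces when `compileExpression` is called, which is the property the parse-time validation in validate.ts relies on.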
Notes:
The design deliberately borrows SQL-WHERE syntax but not SQL NULL semantics: missing / undefined / null / NaN are all falsy, IS NULL matches any of them, and x = NULL is false (two-valued logic). Identifier fields are case-sensitive; keywords are case-insensitive. Strings use single quotes only; LIKE uses SQL wildcards (% / _) compiled to an anchored regex. Dotted-path identifiers (association.disease, locations.0.start) walk the item via the same readDottedPath helper the structured FieldPredicate already uses, with a 32-segment depth cap.
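The semantics in the note above can be sketched as a few small helpers. This is an illustration under assumptions, not the in-tree code: `isNullish`, `equalsTwoValued`, and `likeToRegExp` are hypothetical names, the `readDottedPath` shown here is a guess at the existing helper's behaviour, and treating an over-cap path as missing and LIKE as case-sensitive are both assumptions.

```typescript
// Assumption: the cap counts dot-separated segments.
const MAX_PATH_SEGMENTS = 32;

// Missing, undefined, null, and NaN all count as "null" for IS NULL.
function isNullish(value: unknown): boolean {
  return value === undefined || value === null || (typeof value === "number" && isNaN(value));
}

// Walk "a.b.0.c" through nested objects and arrays.
function readDottedPath(item: unknown, path: string): unknown {
  const segments = path.split(".");
  if (segments.length > MAX_PATH_SEGMENTS) return undefined; // assumption: over-cap reads as missing
  let current: unknown = item;
  for (const segment of segments) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Two-valued equality: a nullish operand on either side makes the result false.
function equalsTwoValued(a: unknown, b: unknown): boolean {
  return !isNullish(a) && !isNullish(b) && a === b;
}

// Compile a LIKE pattern to an anchored RegExp: % matches any run, _ matches one character.
function likeToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "%") source += "[\\s\\S]*";
    else if (ch === "_") source += "[\\s\\S]";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp("^" + source + "$");
}

const item = { association: { disease: null }, locations: [{ start: 5 }] };
const start = readDottedPath(item, "locations.0.start"); // 5
const diseaseIsNull = isNullish(readDottedPath(item, "association.disease")); // true
```

Anchoring the compiled regex is what makes `'bind%'` match "binding" but not "unbinding"; without the `^` and `$`, LIKE would behave like a substring search.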
Acceptance hinges on four measurable things: the default UniProt config still loads and renders identically (no one needs to add a transform: block); the Example 4 YAML in the design doc validates / loads / renders end-to-end; the track-level filter: "DOMAIN" shortcut is output-equal to transform: [{ filter: { field: "type", equal: "DOMAIN" } }] (parity test); and the bundle-size delta for the engine + parser stays under 5 kB gzipped.
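The parity criterion can be pictured as the check below. This is only a sketch of the idea: `applyShortcutFilter` and `applyFilterTransform` are hypothetical stand-ins for the existing shortcut path and the new applyTransforms() path, the fixture rows are invented, and the real test belongs in the spec files with the project's test runner.

```typescript
type Feature = { type: string; [key: string]: unknown };
// Assumption: only the `equal` operator is modelled here.
type FieldPredicate = { field: string; equal: unknown };

// Stand-in for the track-level filter: "<value>" shortcut.
function applyShortcutFilter(features: Feature[], value: string): Feature[] {
  return features.filter((f) => f.type === value);
}

// Stand-in for a single { filter: { field, equal } } transform step.
function applyFilterTransform(features: Feature[], predicate: FieldPredicate): Feature[] {
  return features.filter((f) => f[predicate.field] === predicate.equal);
}

const fixtures: Feature[] = [
  { type: "DOMAIN", start: 10, end: 60 },
  { type: "SIGNAL", start: 1, end: 20 },
  { type: "CHAIN", start: 21, end: 300 },
  { type: "DOMAIN", start: 80, end: 120 },
];

const viaShortcut = applyShortcutFilter(fixtures, "DOMAIN");
const viaTransform = applyFilterTransform(fixtures, { field: "type", equal: "DOMAIN" });

// Output-equal means same rows, same order, same fields.
const parityHolds = JSON.stringify(viaShortcut) === JSON.stringify(viaTransform);
```

Comparing serialized output rather than lengths catches ordering and field-dropping regressions as well as wrong row counts.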
No new runtime dependency. No vega-expression, no filtrex, no full SQL parser — the in-tree parser is the whole thing.