
docs(sdk): clarify hook value nullability in tailordb examples#1167

Draft
dqn wants to merge 1 commit into main from feat/sdk-tailordb-hook-null-docs


dqn commented May 13, 2026

Docs-only fix surfaced by the llm-challenge benchmark: TailorDB hook examples were silently bypassing the value argument, leading agents that read docs to omit null fallbacks even though the type defines value as T | null.

Why

packages/sdk/src/configure/services/tailordb/types.ts:9-20 defines hook inputs as HookFn<TReturn | null, TData, TReturn>. The runtime injects null for value when the user omits the field on create, or on update when the field is absent from the mutation input.

Docs prose mentioned this nullability on one bullet line, but every code example sidestepped value:

  • create: ({ user }) => user.id (ignored value)
  • update: ({ value }) => value (passthrough, doesn't transform)
  • create/update: ({ data }) => ... (used data, not value)

No example demonstrated value ?? "" or any explicit null handling. Agents that read docs internalize the bypass pattern and write value.toLowerCase(), which fails at runtime (and at typecheck if strictNullChecks is on).

Evidence

A/B experiment via pnpm challenge:experiment on m05-db-type-hooks-create (a problem that asks for a lowercase-normalizing create hook):

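| Metric             | Baseline (main) | Candidate (this branch) | Delta                                            |
|--------------------|-----------------|-------------------------|--------------------------------------------------|
| iteration passRate | 1/3 (33.3%)     | 3/3 (100%)              | +66.7 pp                                         |
| cost per pass      | $0.1275         | $0.0714                 | -40%                                             |
| turns median       | 15              | 11                      | -4                                               |
| readDocs median    | 1               | 1                       | 0 (still reads, but now learns correct pattern)  |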

All three iterations on the candidate converged on the reference solution shape:

create: ({ value }) => (value ?? "").toLowerCase();

Of the three baseline iterations, two wrote value.toLowerCase() (typecheck failure) and one wrote value!.toLowerCase() (a non-null assertion workaround).
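For reviewers who want to see why the non-null assertion is only a workaround, here is a minimal sketch. `CreateArgs` is a hypothetical stand-in for the hook argument, not the SDK's real type; the only part taken from this PR is that `value` is `string | null`.

```typescript
// Hypothetical stand-in for a create hook's argument (illustration only).
type CreateArgs = { value: string | null };

// Non-null assertion: silences the compiler but is erased at runtime.
const asserted = ({ value }: CreateArgs): string => value!.toLowerCase();

// Nullish fallback: handles the omitted-field case explicitly.
const guarded = ({ value }: CreateArgs): string => (value ?? "").toLowerCase();

// Simulate the runtime passing null because the field was omitted on create.
let assertedThrew = false;
try {
  asserted({ value: null });
} catch (err) {
  // Calling a method on null raises a TypeError.
  assertedThrew = err instanceof TypeError;
}

const guardedResult = guarded({ value: null });
```

The assertion only moves the failure from typecheck time to runtime, which is why the docs now steer toward the `??` fallback.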

What changed

  • Sharpened the value bullet in the hook input contract to explicitly note T | null and the value ?? "" idiom.
  • Updated the first hook example to demonstrate transforming value with the null fallback ((value ?? "").toLowerCase()).
  • Split the second example to keep the "derive from user" pattern visible while clarifying that it explicitly ignores value.
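The two example forms described above might look roughly like the sketch below. The `Hooks` and `HookArgs` types are simplified, hypothetical stand-ins and the field names are invented for illustration; the actual docs use the SDK's own field builders, which are not reproduced here.

```typescript
// Hypothetical stand-in types; not the SDK's real definitions.
type User = { id: string };
type HookArgs<T> = { value: T | null; user: User };
type Hooks<T> = {
  create?: (args: HookArgs<T>) => T;
  update?: (args: HookArgs<T>) => T;
};

// Form 1: transform `value`, with an explicit null fallback.
const emailHooks: Hooks<string> = {
  create: ({ value }) => (value ?? "").toLowerCase(),
};

// Form 2: derive the result from `user`; `value` is intentionally ignored.
const createdByHooks: Hooks<string> = {
  create: ({ user }) => user.id,
};
```

Keeping both forms side by side makes it visible that ignoring `value` is a deliberate choice in the second case rather than the default way to write a hook.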

No source code or type changes. The types already encoded the contract; the examples now reinforce what the types say.

Notes

Validated only against m05 because that was the problem whose full-package iteration passRate diverged from types-only in the benchmark. A follow-up sweep should grep for the same anti-pattern in other SDK docs (resolver / executor argument handling) that transform nullable inputs.
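One way the follow-up sweep could start is with a rough text heuristic like the sketch below. The function name and regex are assumptions made for illustration, not part of the benchmark harness. It is not type-aware: it only flags lines where an arrow function destructures `value` and then calls a member on it directly, so it can produce false positives (for example `data.value.x` when `value` is also destructured).

```typescript
// Heuristic: `({ ... value ... }) => ... value.member` on a single line.
// `value?.x`, `(value ?? "").x`, and `value!.x` are not matched.
const UNGUARDED_VALUE = /\(\{[^})]*\bvalue\b[^})]*\}\)\s*=>[^\n]*\bvalue\.\w+/;

// Returns the 1-based line numbers that look like unguarded uses of `value`.
function findUnguardedValueUse(source: string): number[] {
  return source
    .split("\n")
    .map((line, i) => (UNGUARDED_VALUE.test(line) ? i + 1 : 0))
    .filter((n) => n > 0);
}

const sample = [
  "create: ({ value }) => value.toLowerCase(),",
  'create: ({ value }) => (value ?? "").toLowerCase(),',
  "create: ({ user }) => user.id,",
  'update: ({ value }) => value?.trim() ?? "",',
].join("\n");

const hits = findUnguardedValueUse(sample);
```

Here only the first sample line is flagged; any hits in real docs would still need a manual check against the actual parameter types.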

llm-challenge benchmark harness lives in PR #1148.

The Hook type signature on the configure layer types value as T | null for
both create and update hooks, but every example in the docs sidesteps value
or simply passes it through. AI codegen tools (and humans) regularly miss
this nullability and write value.toLowerCase() / value.length on a possibly
null input, producing TypeErrors at runtime.

- Tighten the value bullet to state the T | null type explicitly and call
  out that update hooks do NOT auto-inject the existing value.
- Replace the trivial passthrough example with a transform that uses the
  documented value ?? "" fallback pattern.
- Add a sibling example that derives from user, so the prior intent of
  showing both forms is preserved.

changeset-bot Bot commented May 13, 2026

⚠️ No Changeset found

Latest commit: 16a73b1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



github-actions Bot commented May 13, 2026

⚡ pkg.pr.new

@tailor-platform/sdk

pnpm add https://pkg.pr.new/@tailor-platform/sdk@16a73b1
pnpm dlx https://pkg.pr.new/@tailor-platform/sdk@16a73b1 --help

@tailor-platform/create-sdk

pnpm add https://pkg.pr.new/@tailor-platform/create-sdk@16a73b1
pnpm dlx https://pkg.pr.new/@tailor-platform/create-sdk@16a73b1 my-app

commit: 16a73b1

@github-actions

Code Metrics Report (packages/sdk)

|                    | main (6c3a2cb) | #1167 (903307e) | +/-  |
|--------------------|----------------|-----------------|------|
| Coverage           |          61.9% |           61.9% | 0.0% |
| Code to Test Ratio |          1:0.4 |           1:0.4 |  0.0 |
Details
  |                    | main (6c3a2cb) | #1167 (903307e) | +/-  |
  |--------------------|----------------|-----------------|------|
  | Coverage           |          61.9% |           61.9% | 0.0% |
  |   Files            |            363 |             363 |    0 |
  |   Lines            |          12636 |           12636 |    0 |
  |   Covered          |           7827 |            7827 |    0 |
  | Code to Test Ratio |          1:0.4 |           1:0.4 |  0.0 |
  |   Code             |          82628 |           82628 |    0 |
  |   Test             |          34422 |           34422 |    0 |

SDK Configure Bundle Size

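|                        | main (6c3a2cb) | #1167 (903307e) | +/- |
|------------------------|----------------|-----------------|-----|
| configure-index-size   |        17.84KB |         17.84KB | 0KB |
| dependency-chunks-size |        33.56KB |         33.56KB | 0KB |
| total-bundle-size      |        51.39KB |         51.39KB | 0KB |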

Runtime Performance

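|                    | main (6c3a2cb) | #1167 (903307e) | +/-   |
|--------------------|----------------|-----------------|-------|
| Generate Median    |        2,406ms |         2,565ms | 159ms |
| Generate Max       |        2,483ms |         2,731ms | 248ms |
| Apply Build Median |        2,434ms |         2,604ms | 170ms |
| Apply Build Max    |        2,469ms |         2,630ms | 161ms |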

Type Performance (instantiations)

|                             | main (6c3a2cb) | #1167 (903307e) | +/- |
|-----------------------------|----------------|-----------------|-----|
| tailordb-basic              |         35,130 |          35,130 |   0 |
| tailordb-optional           |          3,841 |           3,841 |   0 |
| tailordb-relation           |          7,428 |           7,428 |   0 |
| tailordb-validate           |          2,566 |           2,566 |   0 |
| tailordb-hooks              |          5,767 |           5,767 |   0 |
| tailordb-object             |         12,136 |          12,136 |   0 |
| tailordb-enum               |          2,462 |           2,462 |   0 |
| resolver-basic              |          9,424 |           9,424 |   0 |
| resolver-nested             |         26,111 |          26,111 |   0 |
| resolver-array              |         18,187 |          18,187 |   0 |
| executor-schedule           |          4,234 |           4,234 |   0 |
| executor-webhook            |            873 |             873 |   0 |
| executor-record             |          8,166 |           8,166 |   0 |
| executor-resolver           |          4,369 |           4,369 |   0 |
| executor-operation-function |            869 |             869 |   0 |
| executor-operation-gql      |            869 |             869 |   0 |
| executor-operation-webhook  |            888 |             888 |   0 |
| executor-operation-workflow |          1,714 |           1,714 |   0 |

Reported by octocov

dqn added a commit that referenced this pull request May 13, 2026
…ate cascade)

First problem of Phase 2.5 harder tier. Tests whether agents internalize the
"update hook value is T | null" precondition (PR #1167 documented it for
create; this verifies the same friction angle for update).

- `shared/helpers.ts` listProblems regex extended to `^(\d{3}|m\d+|h\d+)-` so
  the harness discovers h-tier problems alongside legacy and Phase 2.
- Reference solution validates with `pnpm challenge --problem h01 --use-solution`
  (3/3 stages pass, 4/4 tests).
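
As a quick illustration of what the extended pattern accepts, the sketch below applies the regex quoted above to a few directory names. `isProblemDir` and the sample names other than `m05-db-type-hooks-create` are hypothetical; the real `listProblems` implementation in `shared/helpers.ts` is not shown here.

```typescript
// Pattern quoted in the commit message: three-digit legacy ids, m-tier, or h-tier,
// each followed by a hyphen.
const PROBLEM_DIR = /^(\d{3}|m\d+|h\d+)-/;

// Hypothetical helper for illustration only.
function isProblemDir(name: string): boolean {
  return PROBLEM_DIR.test(name);
}

// Sample names; all but the m05 entry are made up for this example.
const names = [
  "001-hello",
  "m05-db-type-hooks-create",
  "h01-example",
  "x01-other",
  "README.md",
];

const discovered = names.filter(isProblemDir);
```

With these inputs, `discovered` keeps the first three names and drops `x01-other` and `README.md`.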
dqn added a commit that referenced this pull request May 14, 2026
…(Phase 5a)

First real AI solve cycle (Phase 4) falsified the LLM-as-judge ambition:
judge fired on 1/25 problems, output was a false signal, both true SDK
improvements (PRs #1167, #1168) came from analyze --diff + iter variance
instead. Phase 5a retires the unused machinery; profile-diff + iter
variance become the primary signal in subsequent phases.

# What's removed

- core/judge.ts (486 lines) + core/judge.test.ts (446 lines)
- runJudgePostProcessing / computeJudgeDiff / ImprovementCandidate in cli.ts
- ProblemResult.judge, Analytics.affordanceDistribution / improvementCandidatesPath
- DiffReport.affordanceDelta + showAffordanceDelta in analyze.ts
- "Top affordances" line in formatReportTable
- LLM-as-judge / Affordance Taxonomy 12-label sections in README + SKILL.md
- judge / improvement-candidates references in skill docs

# Renamed

- meta.json: hypothesizedAffordance -> designNote (37 problems)
- ProblemMeta type: same

# Verification

- 291 tests pass (-31 from judge.test.ts retirement, as expected)
- challenge:verify-solution: 37/37 PASS
- typecheck / lint / format clean
- core/ production: 7,370 -> 6,306 lines (-1,064)
dqn added a commit that referenced this pull request May 14, 2026
…Phase 5b)

Replace the retired judge module with two operator-facing signals:

1. 5-bucket Read-target classifier (sdk-dts / sdk-package-src / sdk-docs /
   problem-files / other) populated per iteration into TraceMetrics. Legacy
   readSdkDts and readDocs become derived views of this map.
2. emitIterDiff post-processor that runs `git diff --no-index` between the
   first failing and first passing iteration's work tree for any problem
   with iterations.passRate in (0, 1), writing to
   <runArtifactRoot>/iter-diff/<problemId>.diff. Direct human-readable
   answer to "what flipped the outcome", no LLM needed.
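
The iteration selection described in item 2 can be sketched as follows. `pickIterPair` and `diffArgs` are hypothetical names written for this illustration and are not the actual `emitIterDiff` code; the work tree paths are placeholders supplied by the caller.

```typescript
type IterPair = { failing: number; passing: number };

// Returns the indices of the first failing and first passing iteration,
// or null when every iteration passed or every iteration failed
// (that is, when the pass rate is not strictly between 0 and 1).
function pickIterPair(passedByIteration: boolean[]): IterPair | null {
  const failing = passedByIteration.indexOf(false);
  const passing = passedByIteration.indexOf(true);
  if (failing === -1 || passing === -1) return null;
  return { failing, passing };
}

// Builds the argument list for `git diff --no-index` between two work trees.
// Note: `--no-index` implies `--exit-code`, so git exits with status 1
// when the trees differ; a caller should not treat that as an error.
function diffArgs(failDir: string, passDir: string): string[] {
  return ["diff", "--no-index", "--", failDir, passDir];
}

const m05 = pickIterPair([false, false, true]);
const m18 = pickIterPair([false, true, true]);
const allPass = pickIterPair([true, true, true]);
```

For the two traces cited below, this picks iterations 0 and 2 for m05 and iterations 0 and 1 for m18 (zero-based), and skips problems where all iterations agree.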

Validated against existing m05 (passedByIteration [F,F,T]) and m18 ([F,T,T])
Phase 4 trace data:
- m05: passing iter consults sdk-docs 1 less time than failing iters
  (delta=-1.0). Consistent with the docs-misleading SDK PR #1167.
- m18: passing iters consult sdk-dts +2 and problem-files +2 more than the
  failing iter. The discriminating bucket is not docs-vs-not-docs but
  type/problem-file engagement depth. Phase 5c will render this as a
  multi-bucket delta vector rather than gating on a single bucket.

# Verification

- 305 tests pass (+14 from 291)
- challenge:verify-solution: 37/37 PASS
- typecheck / lint / format clean
- +735 / -102 lines across 7 core files