diff --git a/.gitignore b/.gitignore index f20a034..01963eb 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,11 @@ clients/ts/dist/ # Bench artifacts crates/compare/results.csv +# agent-eval seeded golden data + per-run scoring output (recreated by setup.sh) +scripts/agent-eval/.golden-data/ +scripts/agent-eval/results.json +scripts/agent-eval/examples/results.json + # Smoke-audit screenshots (regenerated on demand) /landing-desktop.png /landing-mobile.png diff --git a/AGENTS.md b/AGENTS.md index f6f6169..5881e38 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -66,6 +66,8 @@ Compare SQL: `SELECT name, age FROM User WHERE age > 25 ORDER BY age DESC LIMIT | Add a column | `alter User add column status: str` | `ALTER TABLE User ADD COLUMN status TEXT` | | Drop a column | `alter User drop column status` | `ALTER TABLE User DROP COLUMN status` | | Create an index | `alter User add index .email` | `CREATE INDEX ON User (email)` | +| Unique column | `type User { unique email: str }` | `CREATE TABLE User (email TEXT UNIQUE)` | +| Add unique constraint | `alter User add unique .email` | `CREATE UNIQUE INDEX ON User (email)` | | Insert | `insert User { name := "Alice", age := 30 }` | `INSERT INTO User (name, age) VALUES ('Alice', 30)` | | Scan a table | `User` | `SELECT * FROM User` | | Filter | `User filter .age > 30` | `SELECT * FROM User WHERE age > 30` | @@ -88,7 +90,7 @@ Compare SQL: `SELECT name, age FROM User WHERE age > 25 ORDER BY age DESC LIMIT | Update | `User filter .id = 1 update { age := 31 }` | `UPDATE User SET age = 31 WHERE id = 1` | | Update with expr | `User update { age := .age + 1 }` | `UPDATE User SET age = age + 1` | | Delete | `User filter .age < 18 delete` | `DELETE FROM User WHERE age < 18` | -| Upsert | `upsert User on .id { id := 1, name := "Alice" }` | `INSERT ... ON CONFLICT (id) DO UPDATE ...` | +| Upsert (key must be `unique`) | `upsert User on .id { id := 1, name := "Alice" }` | `INSERT ... ON CONFLICT (id) DO UPDATE ...` | | CASE | `case when .age > 30 then "old" else "young" end` | `CASE WHEN age > 30 THEN 'old' ELSE 'young' END` | | Materialized view | `materialize OldUsers as User filter .age > 28` | `CREATE MATERIALIZED VIEW OldUsers AS ...` | @@ -119,7 +121,9 @@ Canonical type names: `str`, `int`, `float`, `bool`, `datetime`, `uuid`, `bytes` **Footgun:** the executor's type resolver falls back to `TypeId::Str` for any unknown name (`crates/query/src/executor/`), so `string`, `varchar`, or a typo silently produces a Str column with no error. Always use the canonical names above. -`required` is a prefix keyword on the field, not a `!` suffix: `required name: str`, never `name: str!`. +`required` is a prefix keyword on the field, not a `!` suffix: `required name: str`, never `name: str!`. `unique` is a sibling prefix keyword (`required unique email: str`, either order) that auto-creates a unique B+tree index and enforces no duplicate non-null values on insert/update/upsert. + +**Footgun (since 0.4.7):** `upsert on .` requires `.col` to be **unique** — declare it `unique` in the `type`, or run `alter add unique .` first. Upserting on a non-unique column is now a hard error (this closed a bug where upsert could silently create duplicate keys). `alter add unique` first scans for existing duplicates and fails if any are present; it also rejects a column that already has a non-unique index (no in-place upgrade). Null values are exempt from `unique`. --- @@ -127,7 +131,7 @@ Canonical type names: `str`, `int`, `float`, `bool`, `datetime`, `uuid`, `bytes` These are the design moves that buy the speedup. Understanding them keeps you from accidentally undoing them: -1. **Planner is a pure function.** It does not touch the catalog — it emits `RangeScan` speculatively, and the executor lowers to `Filter(SeqScan)` at runtime if no index exists. This keeps the parser → plan pipeline allocation-free for cache hits. +1. **Planner is a pure function.** It does not touch the catalog — it emits `RangeScan`/`IndexScan` speculatively. The executor lowers them to `Filter(SeqScan)` at runtime only when no index exists on the column; otherwise it walks the B+tree directly (unique indexes: raw column-value keys; non-unique indexes: composite `(value, rid)` keys via `BTree::range_rids`, heap-fetching matched rows and rechecking exclusive bounds). This keeps the parser → plan pipeline allocation-free for cache hits. 2. **Plan cache hashes canonical PowQL.** Literals are substituted at lookup time (FNV-1a hash, `crates/query/src/plan_cache.rs`). A repeated `User filter .id = ` reuses the same plan for all N. 3. **Compiled integer predicates.** `Filter(SeqScan)` on simple numeric predicates compiles into a branch-free byte-level check that skips full row decoding. See `execute_plan` fast paths in `crates/query/src/executor/` (module dir). 4. **mmap-based scans.** The storage layer exposes `try_for_each_row_raw` over memory-mapped heap files. Early termination is a `return ControlFlow::Break`. @@ -158,7 +162,7 @@ cargo run --release -p powdb-cli # embedded REPL cargo run --release -p powdb-cli -- --remote host:5433 --password ``` -**The REPL is line-oriented.** A statement split across lines fails to parse — write each statement on one line. +**The REPL buffers lines until braces/parens balance** — multi-line `type`/`insert` paste works; a statement still cannot span two separately-submitted balanced lines. ### TCP server @@ -183,7 +187,17 @@ if (r.kind === "rows") console.table(r.rows); await client.close(); ``` -**No parameter binding yet.** If your input is untrusted, escape it yourself; we don't have prepared-statement placeholders over the wire. +**Parameter binding (`$1`..`$N`).** Pass untrusted values as positional parameters instead of interpolating them into the query string. Placeholders are 1-based `$N` (not `?` — `??` is the COALESCE operator). Binding happens at the token level on the server: each `$N` is replaced with the literal token for the matching value before parsing, so an injection-shaped string is inert data and can never change the query's shape. + +```ts +// Values pass as the second argument, in $1, $2, … order. +await client.query("insert User { name := $1, email := $2, age := $3 }", [name, email, age]); +const r = await client.query("User filter .email = $1 { .name }", [email]); +// null binds PowQL null; numbers bind int when integral, float otherwise. +await client.query("insert User { name := $1, age := $2 }", ["Dana", null]); +``` + +The params form uses the `QueryWithParams` (0x04) wire message and requires `powdb-server >= 0.4.7`. The plain no-params `query(q)` form is unchanged. Return shapes: - `{ kind: "rows", columns: string[], rows: string[][] }` — SELECT-like queries diff --git a/CHANGELOG.md b/CHANGELOG.md index 42a9c43..1358bcb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,57 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **Parameter binding over the wire (`$1`..`$N`).** Clients can send a query + template plus positional values instead of interpolating untrusted input + into the query string. Placeholders are 1-based `$N` (not `?` — `??` is the + COALESCE operator). Binding happens at the **token level** on the server: + each `$N` is replaced with the literal token for its value before parsing, + so an injection-shaped string is inert data and can never change the query's + shape. New wire message `QueryWithParams` (`0x04`) — a pure protocol + addition; existing messages and pre-0.4.7 clients are unaffected. The + TypeScript client gains `client.query(powql, params?)`. Engine API: + `Engine::execute_powql_with_params` / `execute_powql_readonly_with_params`. +- **Unique constraints.** Declare a column unique with the `unique` field + modifier (`type User { required unique email: str }`) or add one to an + existing table with `alter User add unique .email` (which scans for existing + duplicates first and fails if any exist). Declaring `unique` auto-creates a + unique B+tree index; enforcement is a storage-layer pre-check shared by the + plain, prepared, and upsert write paths, so duplicates are rejected with + `unique constraint violation on .` before anything is written + or WAL-logged. The constraint survives restart (persisted in the catalog + + rebuilt on WAL replay). +- **Range scans use B+tree indexes.** `>`, `>=`, `<`, `<=`, and `between` on an + indexed column now traverse the index (unique: raw keys; non-unique: + composite `(value, rid)` keys) instead of always falling back to a full + scan — roughly 7× faster on a selective range over 100K rows. NULLs are + correctly excluded and exclusive bounds are honored. +- **`EXPLAIN` shows the executed plan.** Because the planner is pure (no + catalog access), it emits speculative `IndexScan`/`RangeScan` nodes; the + executor lowers them at runtime when no index exists. `EXPLAIN` now applies + the same lowering before printing, so it shows `Filter(SeqScan)` for an + unindexed column instead of a misleading `IndexScan`. +- **Multi-line REPL input.** The `powdb-cli` REPL buffers lines until braces + and parentheses balance (outside string literals), so multi-line `type` and + `insert` statements can be pasted or typed across lines. +- **Agent-DX evaluation harness** (`scripts/agent-eval/`). A model-agnostic, + offline harness that scores how well an LLM writes PowQL given only + `AGENTS.md` and a 10-table schema, with a parallel SQLite baseline for + comparison. Not wired into CI. + +### Changed +- **BREAKING:** `upsert on .col` now requires `.col` to be a `unique` + column. Declare it with the `unique` modifier or `alter add unique .col`. + This fixes a bug where `upsert ... on .id` followed by a plain + `insert` of the same id silently produced duplicate rows. + +### Fixed +- Lowering an unindexed equality update to `Filter(SeqScan)` exposed a fused + scan-update path that swallowed `update_hinted` errors and still counted the + row as modified — which bypassed the v0.4.6 oversized-row guard for that + path. Errors now propagate as `StorageError`; all three `oversized_rows` + tests pass. + ## [0.4.6] - 2026-06-09 ### Fixed diff --git a/clients/ts/README.md b/clients/ts/README.md index ea93ddf..46c8c57 100644 --- a/clients/ts/README.md +++ b/clients/ts/README.md @@ -60,6 +60,27 @@ escapeIdent("User"); // → "User" (throws on invalid) `escapeLiteral` accepts `string | number | bigint | boolean | null`. It rejects `NaN`/`Infinity`, `undefined`, objects, arrays, symbols, and `Date` — convert those yourself before passing them in. +### Parameter binding (`$N`) + +For the strongest separation between code and data, pass values as positional `$N` parameters instead of interpolating them. The server binds each placeholder at the **token level** — a string becomes a literal token, never interpolated text — so an injection-shaped value is inert and can never change the query's shape. Placeholders are 1-based (`?` is not a placeholder; `??` is the COALESCE operator). + +```typescript +// Values are passed as the second argument, in $1, $2, … order. +await client.query("insert User { name := $1, email := $2, age := $3 }", [ + name, + email, + age, +]); + +const r = await client.query("User filter .email = $1 { .name }", [email]); + +// null binds PowQL null; numbers bind as int when integral, float otherwise; +// bigint always binds as int. +await client.query("insert User { name := $1, age := $2 }", ["Dana", null]); +``` + +`QueryParam` is `string | number | bigint | boolean | null`. The params form sends the `QueryWithParams` (0x04) wire message and **requires powdb-server >= 0.4.7**. The plain `query(q)` and `query(q, { signal })` forms are unchanged. + ## Authentication For servers using the legacy shared password (`POWDB_PASSWORD`), pass @@ -277,7 +298,7 @@ Returns a `Promise`. Options: > **Multi-user servers:** requires client ≥0.4.0 (`user` option) and server > ≥0.4.6 (enforced roles). See the version matrix under Authentication. -### `client.query(query, opts?)` +### `client.query(query, params?, opts?)` Sends a PowQL query and returns a `Promise`: @@ -285,6 +306,8 @@ Sends a PowQL query and returns a `Promise`: - `{ kind: "scalar", value: string }` — for aggregates (`count`, `sum`, `avg`, etc.) - `{ kind: "ok", affected: bigint }` — for mutations (`insert`, `update`, `delete`) +`params?: QueryParam[]` — positional values bound to `$1`, `$2`, … placeholders (see Parameter binding above; requires server ≥0.4.7). When omitted, the plain query path is used. The legacy two-argument `query(q, { signal })` form is still accepted — an array second argument is treated as params, an object as options. + `opts.signal?: AbortSignal` — aborts the returned promise (see Cancellation above). Throws a `PowDBError` (see Structured errors above) on any failure. diff --git a/clients/ts/src/index.ts b/clients/ts/src/index.ts index 470ffff..31e2567 100644 --- a/clients/ts/src/index.ts +++ b/clients/ts/src/index.ts @@ -18,7 +18,12 @@ import * as net from "node:net"; import * as tls from "node:tls"; import { EventEmitter } from "node:events"; -import { encode, tryDecode, type Message } from "./protocol.js"; +import { + encode, + tryDecode, + type Message, + type WireParam, +} from "./protocol.js"; import { PowDBError } from "./errors.js"; import { coerceRows, @@ -35,6 +40,38 @@ export type QueryResult = | { kind: "ok"; affected: bigint } | { kind: "message"; message: string }; +/** + * A value bound to a positional `$N` placeholder in {@link Client.query}. + * + * The server binds these at the token level — a string is substituted as a + * literal token, never interpolated — so injection-shaped input is inert. + * Numbers bind as ints when integral and floats otherwise; `bigint` always + * binds as an int; `null` binds PowQL `null`. + */ +export type QueryParam = string | number | bigint | boolean | null; + +/** Map a JS {@link QueryParam} to its wire encoding. */ +function toWireParam(p: QueryParam): WireParam { + if (p === null) return { tag: "null" }; + switch (typeof p) { + case "string": + return { tag: "str", value: p }; + case "boolean": + return { tag: "bool", value: p }; + case "bigint": + return { tag: "int", value: p }; + case "number": + return Number.isInteger(p) + ? { tag: "int", value: BigInt(p) } + : { tag: "float", value: p }; + default: + throw new PowDBError( + `unsupported query parameter type: ${typeof p}`, + "protocol_error", + ); + } +} + export interface ClientOptions { host: string; port: number; @@ -235,11 +272,27 @@ export class Client extends EventEmitter { */ async query( query: string, - opts?: { signal?: AbortSignal }, + paramsOrOpts?: QueryParam[] | { signal?: AbortSignal }, + maybeOpts?: { signal?: AbortSignal }, ): Promise { + // Disambiguate the two overloads: + // query(q) — no params, no opts + // query(q, opts) — legacy 2-arg opts form (back-compat) + // query(q, params) — positional $N parameters + // query(q, params, opts) — params + opts + const hasParams = Array.isArray(paramsOrOpts); + const params = hasParams ? (paramsOrOpts as QueryParam[]) : undefined; + const opts = hasParams + ? maybeOpts + : (paramsOrOpts as { signal?: AbortSignal } | undefined); + const start = Date.now(); try { - const reply = await this.send({ type: "Query", query }, opts); + const request: Message = + params === undefined + ? { type: "Query", query } + : { type: "QueryWithParams", query, params: params.map(toWireParam) }; + const reply = await this.send(request, opts); let result: QueryResult; switch (reply.type) { case "ResultRows": @@ -638,11 +691,12 @@ function openSocket( } export { encode, tryDecode } from "./protocol.js"; -export type { Message } from "./protocol.js"; +export type { Message, WireParam } from "./protocol.js"; export { MAX_PAYLOAD_SIZE, MAX_ROWS, MAX_COLUMNS, + MAX_PARAMS, } from "./protocol.js"; export { diff --git a/clients/ts/src/protocol.ts b/clients/ts/src/protocol.ts index a63a70d..c57fdf3 100644 --- a/clients/ts/src/protocol.ts +++ b/clients/ts/src/protocol.ts @@ -10,6 +10,12 @@ export const MSG_CONNECT = 0x01; export const MSG_CONNECT_OK = 0x02; export const MSG_QUERY = 0x03; +/** + * Query carrying positional `$N` parameters. Pure protocol addition: old + * servers reject it with the existing "unknown message type" error, and the + * plain `MSG_QUERY` frame is unchanged. Requires powdb-server ≥ 0.4.7. + */ +export const MSG_QUERY_PARAMS = 0x04; export const MSG_RESULT_ROWS = 0x07; export const MSG_RESULT_SCALAR = 0x08; export const MSG_RESULT_OK = 0x09; @@ -30,6 +36,25 @@ export const MAX_ROWS = 10_000_000; /** Maximum number of columns allowed in a result set. */ export const MAX_COLUMNS = 4096; +/** Maximum number of bound parameters in a QueryWithParams message. */ +export const MAX_PARAMS = 4096; + +/** + * A positional parameter value for {@link MSG_QUERY_PARAMS}. Wire encoding + * per param: a 1-byte tag followed by the body — + * `0` null (no body), `1` int (8B LE i64), `2` float (8B LE f64), + * `3` bool (1B), `4` str (length-prefixed UTF-8). + * + * Integral numbers are sent as ints, non-integral numbers as floats; `null` + * binds PowQL `null`. `bigint` is always an int. + */ +export type WireParam = + | { tag: "null" } + | { tag: "int"; value: bigint } + | { tag: "float"; value: number } + | { tag: "bool"; value: boolean } + | { tag: "str"; value: string }; + export type Message = | { type: "Connect"; @@ -46,6 +71,7 @@ export type Message = } | { type: "ConnectOk"; version: string } | { type: "Query"; query: string } + | { type: "QueryWithParams"; query: string; params: WireParam[] } | { type: "ResultRows"; columns: string[]; rows: string[][] } | { type: "ResultScalar"; value: string } | { type: "ResultOk"; affected: bigint } @@ -86,6 +112,42 @@ export function encode(msg: Message): Buffer { payload = encodeString(msg.query); msgType = MSG_QUERY; break; + case "QueryWithParams": { + const parts: Buffer[] = [encodeString(msg.query)]; + const count = Buffer.alloc(2); + count.writeUInt16LE(msg.params.length, 0); + parts.push(count); + for (const p of msg.params) { + switch (p.tag) { + case "null": + parts.push(Buffer.from([0])); + break; + case "int": { + const b = Buffer.alloc(9); + b.writeUInt8(1, 0); + b.writeBigInt64LE(p.value, 1); + parts.push(b); + break; + } + case "float": { + const b = Buffer.alloc(9); + b.writeUInt8(2, 0); + b.writeDoubleLE(p.value, 1); + parts.push(b); + break; + } + case "bool": + parts.push(Buffer.from([3, p.value ? 1 : 0])); + break; + case "str": + parts.push(Buffer.from([4]), encodeString(p.value)); + break; + } + } + payload = Buffer.concat(parts); + msgType = MSG_QUERY_PARAMS; + break; + } case "ResultRows": { const parts: Buffer[] = []; const colCount = Buffer.alloc(2); @@ -188,6 +250,37 @@ function decodePayload(msgType: number, payload: Buffer): Message { return { type: "ConnectOk", version: decodeString(payload, cursor) }; case MSG_QUERY: return { type: "Query", query: decodeString(payload, cursor) }; + case MSG_QUERY_PARAMS: { + const query = decodeString(payload, cursor); + const count = readU16(payload, cursor, "param count"); + if (count > MAX_PARAMS) { + throw new Error(`too many parameters: ${count} (max ${MAX_PARAMS})`); + } + const params: WireParam[] = []; + for (let i = 0; i < count; i++) { + const tag = readU8(payload, cursor, "param tag"); + switch (tag) { + case 0: + params.push({ tag: "null" }); + break; + case 1: + params.push({ tag: "int", value: readI64(payload, cursor, "int param") }); + break; + case 2: + params.push({ tag: "float", value: readF64(payload, cursor, "float param") }); + break; + case 3: + params.push({ tag: "bool", value: readU8(payload, cursor, "bool param") !== 0 }); + break; + case 4: + params.push({ tag: "str", value: decodeString(payload, cursor) }); + break; + default: + throw new Error(`unknown param tag: ${tag}`); + } + } + return { type: "QueryWithParams", query, params }; + } case MSG_RESULT_ROWS: { const colCount = readU16(payload, cursor, "column count"); if (colCount > MAX_COLUMNS) { @@ -259,6 +352,33 @@ function decodeString(buf: Buffer, cursor: { pos: number }): string { return s; } +function readU8(buf: Buffer, cursor: { pos: number }, label: string): number { + if (cursor.pos + 1 > buf.length) { + throw new Error(`truncated ${label}`); + } + const value = buf.readUInt8(cursor.pos); + cursor.pos += 1; + return value; +} + +function readI64(buf: Buffer, cursor: { pos: number }, label: string): bigint { + if (cursor.pos + 8 > buf.length) { + throw new Error(`truncated ${label}`); + } + const value = buf.readBigInt64LE(cursor.pos); + cursor.pos += 8; + return value; +} + +function readF64(buf: Buffer, cursor: { pos: number }, label: string): number { + if (cursor.pos + 8 > buf.length) { + throw new Error(`truncated ${label}`); + } + const value = buf.readDoubleLE(cursor.pos); + cursor.pos += 8; + return value; +} + function readU16(buf: Buffer, cursor: { pos: number }, label: string): number { if (cursor.pos + 2 > buf.length) { throw new Error(`truncated ${label}`); diff --git a/clients/ts/test/client.test.ts b/clients/ts/test/client.test.ts index 0599ac6..3ac256e 100644 --- a/clients/ts/test/client.test.ts +++ b/clients/ts/test/client.test.ts @@ -695,6 +695,49 @@ async function main() { } }); + // ────────────────────────────────────────────────────────── + console.log("\nPARAMETER BINDING ($N)"); + // ────────────────────────────────────────────────────────── + + await test("query with params stores injection-shaped strings byte-faithfully", async () => { + await client.query(`type ${tbl("P")} { required name: str, age: int }`); + const evil = `x"; drop ${tbl("P")}; filter .age > "0`; + const ins = await client.query( + `insert ${tbl("P")} { name := $1, age := $2 }`, + [evil, 9], + ); + assert.equal(ins.kind, "ok"); + const r = await client.query(`${tbl("P")} filter .age = $1 { .name }`, [9]); + assert.equal(r.kind, "rows"); + if (r.kind === "rows") assert.deepEqual(r.rows, [[evil]]); + // The table was not dropped by the injection-shaped string. + const c = await client.query(`count(${tbl("P")})`); + assertScalar(c, "1"); + }); + + await test("params bind null, bool, int, and float", async () => { + await client.query( + `type ${tbl("PT")} { required name: str, n: int, f: float, ok: bool }`, + ); + await client.query( + `insert ${tbl("PT")} { name := $1, n := $2, f := $3, ok := $4 }`, + ["row", -7, 2.5, true], + ); + // null param round-trips as PowQL null. + await client.query( + `insert ${tbl("PT")} { name := $1, n := $2 }`, + ["nullish", null], + ); + const r = await client.query(`${tbl("PT")} filter .n = null { .name }`); + assert.equal(r.kind, "rows"); + if (r.kind === "rows") assert.deepEqual(r.rows, [["nullish"]]); + }); + + await test("old no-params query path still works", async () => { + const r = await client.query(`${tbl("P")} { .name }`); + assert.equal(r.kind, "rows"); + }); + // ────────────────────────────────────────────────────────── console.log("\nCONNECTION LIFECYCLE"); // ────────────────────────────────────────────────────────── diff --git a/clients/ts/test/protocol.test.ts b/clients/ts/test/protocol.test.ts index 93787ed..0f79d49 100644 --- a/clients/ts/test/protocol.test.ts +++ b/clients/ts/test/protocol.test.ts @@ -201,6 +201,57 @@ async function main() { } }); + console.log("\nQueryWithParams — positional $N binding round-trip"); + + await test("encode/decode QueryWithParams preserves query and all param types", () => { + const buf = encode({ + type: "QueryWithParams", + query: "insert User { name := $1, age := $2, ok := $3, note := $4, f := $5 }", + params: [ + { tag: "str", value: `a"b\\c; drop User` }, + { tag: "int", value: -7n }, + { tag: "bool", value: true }, + { tag: "null" }, + { tag: "float", value: 2.5 }, + ], + }); + // New frame must use the dedicated 0x04 tag. + assert.equal(buf.readUInt8(0), 0x04); + const decoded = tryDecode(buf); + assert.ok(decoded, "frame should decode"); + assert.equal(decoded.msg.type, "QueryWithParams"); + if (decoded.msg.type === "QueryWithParams") { + assert.ok(decoded.msg.query.includes("$1")); + assert.equal(decoded.msg.params.length, 5); + assert.deepStrictEqual(decoded.msg.params[0], { + tag: "str", + value: `a"b\\c; drop User`, + }); + assert.deepStrictEqual(decoded.msg.params[1], { tag: "int", value: -7n }); + assert.deepStrictEqual(decoded.msg.params[2], { tag: "bool", value: true }); + assert.deepStrictEqual(decoded.msg.params[3], { tag: "null" }); + assert.deepStrictEqual(decoded.msg.params[4], { + tag: "float", + value: 2.5, + }); + } + }); + + await test("decode rejects an unknown param tag", () => { + // header + empty query + count=1 + bogus tag 0x63 + const payload = Buffer.concat([ + lpString(""), + Buffer.from([0x01, 0x00]), // count = 1 (u16 LE) + Buffer.from([0x63]), // bogus tag + ]); + const frame = Buffer.alloc(6 + payload.length); + frame.writeUInt8(0x04, 0); + frame.writeUInt8(0, 1); + frame.writeUInt32LE(payload.length, 2); + payload.copy(frame, 6); + assert.throws(() => tryDecode(frame), /unknown param tag/); + }); + console.log("\nCancellation — abort during in-flight query"); await test("AbortSignal rejects the pending query without destroying the socket", async () => { diff --git a/crates/bench/benches/powql.rs b/crates/bench/benches/powql.rs index a017812..ed6991b 100644 --- a/crates/bench/benches/powql.rs +++ b/crates/bench/benches/powql.rs @@ -231,6 +231,37 @@ fn bench_powql_filter_only(c: &mut Criterion) { }); } +// ───── Workload: range_scan_indexed ──────────────────────────────────────── +// +// Selective range over a NON-unique B+tree index on `age`. The executor walks +// composite (value, rid) keys and heap-fetches only the matching rows, so a +// narrow range should beat the full SeqScan of `filter_only`. Mirrors the +// `filter_only` setup but adds a non-unique index and uses a tight bound. + +fn bench_range_scan_indexed(c: &mut Criterion) { + let (mut engine, _tmp) = setup_user_fixture(); + // `alter ... add index` creates a NON-unique B+tree index. + engine + .execute_powql("alter User add index .age") + .expect("build age index"); + + // Narrow range (3 of 60 distinct ages ≈ 5% selectivity). + let queries = gen_queries(|i| { + let lo = 20 + (i % 40); + format!("User filter .age >= {lo} and .age < {}", lo + 3) + }); + warm_plan_cache(&mut engine, &queries); + + let mut idx: usize = 0; + c.bench_function("range_scan_indexed", |b| { + b.iter(|| { + let q = &queries[idx % queries.len()]; + idx = idx.wrapping_add(1); + black_box(engine.execute_powql(q).expect("query failed")) + }); + }); +} + // ───── Legacy 5b. powql_filter_projection ────────────────────────────────── // // Kept as-is for gate continuity. Non-index filter with projection. @@ -659,6 +690,7 @@ criterion_group! { // Legacy + workload 1/3 (thesis guards, gate continuity). bench_powql_point, bench_powql_filter_only, + bench_range_scan_indexed, bench_powql_filter_projection, bench_powql_aggregation, // Workload 2. diff --git a/crates/cli/src/main.rs b/crates/cli/src/main.rs index f99072b..4979c88 100644 --- a/crates/cli/src/main.rs +++ b/crates/cli/src/main.rs @@ -899,6 +899,33 @@ async fn exec_remote( code } +// ─── Multi-line input ─────────────────────────────────────────────────────── + +/// True when `buffer` has unbalanced `{`/`(` outside string literals, i.e. the +/// REPL should read another line before executing. String literals follow the +/// lexer's rules (`crates/query/src/lexer.rs`): a backslash escapes the next +/// character, so `\"` inside a string does not terminate it. +fn needs_continuation(buffer: &str) -> bool { + let mut depth: i64 = 0; + let mut in_str = false; + let mut chars = buffer.chars(); + while let Some(c) = chars.next() { + match c { + '"' if in_str => in_str = false, + '"' => in_str = true, + // Lexer treats backslash as an escape inside strings; skip the + // escaped char so `\"` doesn't toggle the string state. + '\\' if in_str => { + chars.next(); + } + '{' | '(' if !in_str => depth += 1, + '}' | ')' if !in_str => depth -= 1, + _ => {} + } + } + depth > 0 || in_str +} + // ─── Embedded mode ────────────────────────────────────────────────────────── fn run_embedded(data_dir: &str) { @@ -915,96 +942,121 @@ fn run_embedded(data_dir: &str) { rl.load_history(&hist).ok(); let mut timing_enabled = false; + let mut buffer = String::new(); loop { - let line = match rl.readline("powql> ") { + let prompt = if buffer.is_empty() { + "powql> " + } else { + " ...> " + }; + let line = match rl.readline(prompt) { Ok(line) => line, Err(rustyline::error::ReadlineError::Eof) => break, - Err(rustyline::error::ReadlineError::Interrupted) => continue, + Err(rustyline::error::ReadlineError::Interrupted) => { + // Abandon any partial multi-line statement. + buffer.clear(); + continue; + } Err(e) => { eprintln!("Error: {e}"); break; } }; - let trimmed = line.trim(); - if trimmed.is_empty() { - continue; - } - - rl.add_history_entry(trimmed).ok(); - - // ── Meta-commands ────────────────────────────────────────────── - if trimmed.starts_with('.') { - match trimmed { - ".quit" | ".exit" => break, - ".help" => { - println!("Meta-commands:"); - println!(" .tables List all tables"); - println!(" .schema
Show columns and types for a table"); - println!(" .timing Toggle query timing on/off"); - println!(" .help Show this help"); - println!(" .quit / .exit Exit the REPL"); - } - ".tables" => { - let tables = engine.catalog().list_tables(); - if tables.is_empty() { - println!("(no tables)"); - } else { - for t in &tables { - println!(" {t}"); - } - println!( - "({} table{})", - tables.len(), - if tables.len() == 1 { "" } else { "s" } - ); + // Meta-commands are only recognized at the start of a statement. + if buffer.is_empty() { + let trimmed = line.trim(); + if trimmed.is_empty() { + continue; + } + // ── Meta-commands ────────────────────────────────────────── + if trimmed.starts_with('.') { + rl.add_history_entry(trimmed).ok(); + match trimmed { + ".quit" | ".exit" => break, + ".help" => { + println!("Meta-commands:"); + println!(" .tables List all tables"); + println!(" .schema
Show columns and types for a table"); + println!(" .timing Toggle query timing on/off"); + println!(" .help Show this help"); + println!(" .quit / .exit Exit the REPL"); } - } - ".timing" => { - timing_enabled = !timing_enabled; - println!("Timing is {}.", if timing_enabled { "on" } else { "off" }); - } - cmd if cmd.starts_with(".schema") => { - let table_name = cmd.strip_prefix(".schema").unwrap().trim(); - if table_name.is_empty() { - eprintln!("Usage: .schema "); - } else if let Some(schema) = engine.catalog().schema(table_name) { - println!("Table: {}", schema.table_name); - println!(" {:<20} {:<12} Required", "Column", "Type"); - println!(" {:-<20} {:-<12} {:-<8}", "", "", ""); - for col in &schema.columns { + ".tables" => { + let tables = engine.catalog().list_tables(); + if tables.is_empty() { + println!("(no tables)"); + } else { + for t in &tables { + println!(" {t}"); + } println!( - " {:<20} {:<12} {}", - col.name, - match col.type_id { - powdb_storage::types::TypeId::Int => "int", - powdb_storage::types::TypeId::Float => "float", - powdb_storage::types::TypeId::Bool => "bool", - powdb_storage::types::TypeId::Str => "str", - powdb_storage::types::TypeId::DateTime => "datetime", - powdb_storage::types::TypeId::Uuid => "uuid", - powdb_storage::types::TypeId::Bytes => "bytes", - powdb_storage::types::TypeId::Empty => "empty", - }, - if col.required { "yes" } else { "no" } + "({} table{})", + tables.len(), + if tables.len() == 1 { "" } else { "s" } ); } - } else { - eprintln!("Error: table '{table_name}' not found"); + } + ".timing" => { + timing_enabled = !timing_enabled; + println!("Timing is {}.", if timing_enabled { "on" } else { "off" }); + } + cmd if cmd.starts_with(".schema") => { + let table_name = cmd.strip_prefix(".schema").unwrap().trim(); + if table_name.is_empty() { + eprintln!("Usage: .schema "); + } else if let Some(schema) = engine.catalog().schema(table_name) { + println!("Table: {}", schema.table_name); + println!(" {:<20} {:<12} Required", "Column", "Type"); + println!(" {:-<20} {:-<12} {:-<8}", "", "", ""); + for col in &schema.columns { + println!( + " {:<20} {:<12} {}", + col.name, + match col.type_id { + powdb_storage::types::TypeId::Int => "int", + powdb_storage::types::TypeId::Float => "float", + powdb_storage::types::TypeId::Bool => "bool", + powdb_storage::types::TypeId::Str => "str", + powdb_storage::types::TypeId::DateTime => "datetime", + powdb_storage::types::TypeId::Uuid => "uuid", + powdb_storage::types::TypeId::Bytes => "bytes", + powdb_storage::types::TypeId::Empty => "empty", + }, + if col.required { "yes" } else { "no" } + ); + } + } else { + eprintln!("Error: table '{table_name}' not found"); + } + } + other => { + eprintln!("Unknown meta-command: {other}"); + eprintln!("Type .help for available commands."); } } - other => { - eprintln!("Unknown meta-command: {other}"); - eprintln!("Type .help for available commands."); - } + continue; } + } + + // Accumulate input until braces/parens balance outside strings. + buffer.push_str(&line); + buffer.push('\n'); + if needs_continuation(&buffer) { continue; } + let statement = buffer.trim().to_string(); + buffer.clear(); + if statement.is_empty() { + continue; + } + rl.add_history_entry(&statement).ok(); + // ── Execute PowQL query ──────────────────────────────────────── let start = Instant::now(); - match engine.execute_powql(trimmed) { + match engine.execute_powql(&statement) { Ok(result) => { print_local_result(&result); if timing_enabled { @@ -1088,52 +1140,76 @@ async fn run_remote(addr: String, db: String, password: Option, username rl.load_history(&hist).ok(); let mut timing_enabled = false; + let mut buffer = String::new(); loop { - let line = match rl.readline("powql> ") { + let prompt = if buffer.is_empty() { + "powql> " + } else { + " ...> " + }; + let line = match rl.readline(prompt) { Ok(line) => line, Err(rustyline::error::ReadlineError::Eof) => break, - Err(rustyline::error::ReadlineError::Interrupted) => continue, + Err(rustyline::error::ReadlineError::Interrupted) => { + buffer.clear(); + continue; + } Err(e) => { eprintln!("Error: {e}"); break; } }; - let trimmed = line.trim(); - if trimmed.is_empty() { - continue; + // Meta-commands are only recognized at the start of a statement. + if buffer.is_empty() { + let trimmed = line.trim(); + if trimmed.is_empty() { + continue; + } + // Handle local-only meta-commands in remote mode + if trimmed.starts_with('.') { + rl.add_history_entry(trimmed).ok(); + match trimmed { + ".quit" | ".exit" => break, + ".timing" => { + timing_enabled = !timing_enabled; + println!("Timing is {}.", if timing_enabled { "on" } else { "off" }); + } + ".help" => { + println!("Meta-commands (remote mode):"); + println!(" .timing Toggle query timing on/off"); + println!(" .help Show this help"); + println!(" .quit / .exit Exit the REPL"); + println!(); + println!("Note: .tables and .schema are only available in embedded mode."); + } + _ => { + eprintln!( + "Meta-commands (.tables, .schema) are not supported in remote mode." + ); + eprintln!("Type .help for available commands."); + } + } + continue; + } } - rl.add_history_entry(trimmed).ok(); + // Accumulate input until braces/parens balance outside strings. + buffer.push_str(&line); + buffer.push('\n'); + if needs_continuation(&buffer) { + continue; + } - // Handle local-only meta-commands in remote mode - if trimmed.starts_with('.') { - match trimmed { - ".quit" | ".exit" => break, - ".timing" => { - timing_enabled = !timing_enabled; - println!("Timing is {}.", if timing_enabled { "on" } else { "off" }); - } - ".help" => { - println!("Meta-commands (remote mode):"); - println!(" .timing Toggle query timing on/off"); - println!(" .help Show this help"); - println!(" .quit / .exit Exit the REPL"); - println!(); - println!("Note: .tables and .schema are only available in embedded mode."); - } - _ => { - eprintln!("Meta-commands (.tables, .schema) are not supported in remote mode."); - eprintln!("Type .help for available commands."); - } - } + let statement = buffer.trim().to_string(); + buffer.clear(); + if statement.is_empty() { continue; } + rl.add_history_entry(&statement).ok(); - let q = Message::Query { - query: trimmed.to_string(), - }; + let q = Message::Query { query: statement }; if q.write_to(&mut writer).await.is_err() { eprintln!("write error — disconnected"); break; @@ -1319,4 +1395,24 @@ mod tests { assert_eq!(render_remote_cell(""), ""); assert_eq!(render_remote_cell("NULL"), "NULL"); } + + #[test] + fn continuation_tracking() { + assert!(needs_continuation("type User {")); + assert!(needs_continuation("type User {\n required name: str,")); + assert!(!needs_continuation("type User { required name: str }")); + // Brace inside a string literal must not count. + assert!(!needs_continuation(r#"insert U { s := "}" }"#)); + assert!(needs_continuation(r#"insert U { s := "}" "#)); + // Parens. + assert!(needs_continuation("count(User filter (")); + assert!(!needs_continuation("count(User)")); + // Nested. + assert!(needs_continuation("insert U { a := (1 + ")); + // Over-closed input is NOT a continuation — let the parser error. + assert!(!needs_continuation("User }")); + // Escaped quote inside a string must not end the string. + assert!(needs_continuation(r#"insert U { s := "a\" "#)); + assert!(!needs_continuation(r#"insert U { s := "a\"b" }"#)); + } } diff --git a/crates/query/src/ast.rs b/crates/query/src/ast.rs index eb61d80..85aa016 100644 --- a/crates/query/src/ast.rs +++ b/crates/query/src/ast.rs @@ -42,6 +42,13 @@ pub enum AlterAction { AddIndex { column: String, }, + /// `alter
add unique .` — creates a UNIQUE B+Tree + /// index on `column`. Scans existing data first and fails if any + /// duplicate (non-null) value is present. Errors if the column is + /// already indexed (no in-place upgrade). + AddUnique { + column: String, + }, } /// `drop User` @@ -196,6 +203,9 @@ pub struct FieldDef { pub name: String, pub type_name: String, pub required: bool, + /// `true` when declared with the `unique` modifier — auto-creates a + /// unique B+Tree index on this column at table-create time. + pub unique: bool, } #[derive(Debug, Clone, PartialEq)] @@ -326,6 +336,22 @@ pub enum Literal { Bool(bool), } +/// A bound value supplied for a `$N` placeholder in +/// [`crate::parser::parse_with_params`]. +/// +/// Unlike [`Literal`], this carries a `Null` variant so a parameter can +/// bind PowQL `null` (substituted as `Token::Null`, not a string). Values +/// are turned into literal *tokens* before parsing, so an injection-shaped +/// string is inert data — it can never change the query's shape. +#[derive(Debug, Clone, PartialEq)] +pub enum ParamValue { + Null, + Int(i64), + Float(f64), + Bool(bool), + Str(String), +} + #[derive(Debug, Clone, Copy, PartialEq)] pub enum BinOp { Eq, diff --git a/crates/query/src/canonicalize.rs b/crates/query/src/canonicalize.rs index e723d92..dd13ea1 100644 --- a/crates/query/src/canonicalize.rs +++ b/crates/query/src/canonicalize.rs @@ -205,6 +205,7 @@ fn hash_token(h: u64, tok: &Token, literals: &mut Vec) -> u64 { Token::Multi => hash_byte(h, 0x1B), Token::Link => hash_byte(h, 0x1C), Token::Index => hash_byte(h, 0x1D), + Token::Unique => hash_byte(h, 0x7E), Token::On => hash_byte(h, 0x1E), Token::Asc => hash_byte(h, 0x1F), Token::Desc => hash_byte(h, 0x20), diff --git a/crates/query/src/executor/mod.rs b/crates/query/src/executor/mod.rs index 5c765f3..d0354e2 100644 --- a/crates/query/src/executor/mod.rs +++ b/crates/query/src/executor/mod.rs @@ -220,9 +220,8 @@ mod tests; pub use self::prepared::PreparedQuery; use self::plan_exec::{ - compute_group_aggregate, execute_window, format_plan_tree, hash_join, - lower_unindexed_range_scans, range_matches, synthesize_range_predicate, - try_extract_equi_join_keys, + compute_group_aggregate, execute_window, format_plan_tree, hash_join, lower_unindexed_scans, + range_matches, synthesize_range_predicate, try_extract_equi_join_keys, }; /// Mission infra-1: classify a parsed statement as read-only vs. mutating. @@ -435,7 +434,7 @@ impl Engine { .map_err(|e| QueryError::Execution(format!("plan cache lock poisoned: {e}")))? .get_with_substitution(hash, &literals); if let Some(plan) = cached { - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); let result = self.execute_plan(&plan); // Mission B (post-review): statement-boundary WAL // group commit. Catalog::wal_log now only appends; @@ -458,7 +457,7 @@ impl Engine { QueryError::Execution(format!("plan cache lock poisoned: {e}")) })? .insert(hash, plan.clone()); - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); let result = self.execute_plan(&plan); if !self.in_transaction { self.catalog @@ -474,7 +473,7 @@ impl Engine { // consistent error shape. return match planner::plan(input) { Ok(plan) => { - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); let result = self.execute_plan(&plan); if !self.in_transaction { self.catalog @@ -498,7 +497,7 @@ impl Engine { let plan_us = plan_start.elapsed().as_micros(); let exec_start = Instant::now(); - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); let result = self.execute_plan(&plan); if !self.in_transaction { self.catalog @@ -532,6 +531,57 @@ impl Engine { result } + /// Execute PowQL with `$N` placeholders bound to positional `params`. + /// + /// Task 4: parameters are substituted as literal *tokens* before + /// parsing (see [`crate::parser::parse_with_params`]), so untrusted + /// input can never change the query's shape. This path deliberately + /// **bypasses the plan cache** — template caching is a follow-up — and + /// otherwise mirrors the non-cached tail of [`Engine::execute_powql`]. + pub fn execute_powql_with_params( + &mut self, + input: &str, + params: &[crate::ast::ParamValue], + ) -> Result { + let _budget = self.enter_memory_budget(); + let stmt = crate::parser::parse_with_params(input, params) + .map_err(|e| QueryError::Parse(e.to_string()))?; + let plan = + crate::planner::plan_statement(stmt).map_err(|e| QueryError::Parse(e.to_string()))?; + let plan = lower_unindexed_scans(&self.catalog, &plan); + let result = self.execute_plan(&plan); + if !self.in_transaction { + self.catalog + .sync_wal() + .map_err(|e| QueryError::StorageError(e.to_string()))?; + } + result + } + + /// Read-only variant of [`Engine::execute_powql_with_params`]. + /// + /// Mirrors [`Engine::execute_powql_readonly`]: parses with bound + /// params, rejects any write statement with + /// [`QueryError::ReadonlyNeedsWrite`] so the caller can escalate to the + /// write lock, then executes under a shared borrow. No plan-cache + /// interaction. + pub fn execute_powql_readonly_with_params( + &self, + input: &str, + params: &[crate::ast::ParamValue], + ) -> Result { + let _budget = self.enter_memory_budget(); + let stmt = crate::parser::parse_with_params(input, params) + .map_err(|e| QueryError::Parse(e.to_string()))?; + if !is_read_only_statement(&stmt) { + return Err(QueryError::ReadonlyNeedsWrite); + } + let plan = + crate::planner::plan_statement(stmt).map_err(|e| QueryError::Parse(e.to_string()))?; + let plan = lower_unindexed_scans(&self.catalog, &plan); + self.execute_plan_readonly(&plan) + } + /// Plan cache stats — useful for benches and debugging. pub fn plan_cache_stats(&self) -> (u64, u64, usize) { let cache = self.plan_cache.lock().unwrap_or_else(|e| e.into_inner()); @@ -576,7 +626,7 @@ impl Engine { .map_err(|e| QueryError::Execution(format!("plan cache lock poisoned: {e}")))? .get_with_substitution(hash, &literals); if let Some(plan) = cached { - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); return self.execute_plan_readonly(&plan); } // Miss: plan + insert + execute. The planner is pure, so this @@ -587,14 +637,14 @@ impl Engine { .lock() .map_err(|e| QueryError::Execution(format!("plan cache lock poisoned: {e}")))? .insert(hash, plan.clone()); - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); return self.execute_plan_readonly(&plan); } // Lex error — fall through to the planner for a consistent error // shape (though `parse` above would usually have caught it). let plan = crate::planner::plan_statement(stmt).map_err(|e| QueryError::Parse(e.to_string()))?; - let plan = lower_unindexed_range_scans(&self.catalog, &plan); + let plan = lower_unindexed_scans(&self.catalog, &plan); self.execute_plan_readonly(&plan) } diff --git a/crates/query/src/executor/plan_exec.rs b/crates/query/src/executor/plan_exec.rs index c0ebea6..0fd8916 100644 --- a/crates/query/src/executor/plan_exec.rs +++ b/crates/query/src/executor/plan_exec.rs @@ -728,36 +728,32 @@ impl Engine { (values, key_idx) }; + // Upsert requires the `on` column to be unique — otherwise + // there is no well-defined row to overwrite and a plain + // insert could silently create duplicate keys. + if self.catalog.is_index_unique(table, key_column) != Some(true) { + return Err(QueryError::Execution(format!( + "upsert on .{key_column} requires a unique column (declare it with \ + `unique {key_column}: ` or `alter {table} add unique .{key_column}`)" + ))); + } + let key_value = values[key_idx].clone(); - // Probe the index for a conflict. + // Probe the unique index for a conflict. let existing = { let tbl = self .catalog .get_table(table) .ok_or_else(|| QueryError::TableNotFound(table.to_string()))?; - if tbl.has_index(key_column) { - // Upsert key lookup: return the first match. - // For unique indexes this is the only match. - // For non-unique indexes on a key column, also - // just the first (upsert semantics). - let rids = tbl.index_lookup_all(key_column, &key_value); - rids.into_iter().next().and_then(|rid| { - tbl.heap - .get(rid) - .map(|data| (rid, decode_row(&tbl.schema, &data))) - }) - } else { - // No index — linear scan for the key. - let mut found = None; - for (rid, row) in tbl.scan() { - if row[key_idx] == key_value { - found = Some((rid, row)); - break; - } - } - found - } + // The key column is guaranteed unique above, so this + // returns at most one matching row. + let rids = tbl.index_lookup_all(key_column, &key_value); + rids.into_iter().next().and_then(|rid| { + tbl.heap + .get(rid) + .map(|data| (rid, decode_row(&tbl.schema, &data))) + }) }; if let Some((rid, mut existing_row)) = existing { @@ -1394,16 +1390,15 @@ impl Engine { let columns: Vec = fields .iter() .enumerate() - .map( - |(i, (fname, tname, req))| -> Result { - Ok(ColumnDef { - name: fname.clone(), - type_id: type_name_to_id(tname).map_err(QueryError::TypeError)?, - required: *req, - position: i as u16, - }) - }, - ) + .map(|(i, f)| -> Result { + Ok(ColumnDef { + name: f.name.clone(), + type_id: type_name_to_id(&f.type_name) + .map_err(QueryError::TypeError)?, + required: f.required, + position: i as u16, + }) + }) .collect::, _>>()?; let schema = Schema { table_name: name.clone(), @@ -1412,6 +1407,13 @@ impl Engine { self.catalog .create_table(schema) .map_err(|e| QueryError::StorageError(e.to_string()))?; + // Declaring a field `unique` auto-creates a unique B+tree + // index, which is where uniqueness is enforced on writes. + for f in fields.iter().filter(|f| f.unique) { + self.catalog + .create_index_unique(name, &f.name, true) + .map_err(|e| QueryError::StorageError(e.to_string()))?; + } Ok(QueryResult::Created(name.clone())) } @@ -1456,6 +1458,48 @@ impl Engine { message: format!("index on '{table}.{column}' created"), }) } + AlterAction::AddUnique { column } => { + // No DropIndex exists, so we cannot upgrade an existing + // non-unique index in place — reject it cleanly. + if self.catalog.has_index(table, column) { + return Err(QueryError::Execution(format!( + "cannot add unique on {table}.{column}: column already indexed" + ))); + } + // Scan existing rows for duplicate (non-null) values + // before creating the unique index. + { + let tbl = self + .catalog + .get_table(table) + .ok_or_else(|| QueryError::TableNotFound(table.to_string()))?; + let col_idx = tbl.schema.column_index(column).ok_or_else(|| { + QueryError::ColumnNotFound { + table: table.to_string(), + column: column.clone(), + } + })?; + let mut seen = std::collections::HashSet::new(); + for (_, row) in tbl.scan() { + let v = &row[col_idx]; + if v.is_empty() { + continue; + } + if !seen.insert(v.clone()) { + return Err(QueryError::Execution(format!( + "cannot add unique on {table}.{column}: \ + duplicate value {v:?} exists" + ))); + } + } + } + self.catalog + .create_index_unique(table, column, true) + .map_err(|e| QueryError::StorageError(e.to_string()))?; + Ok(QueryResult::Executed { + message: format!("unique index on '{table}.{column}' created"), + }) + } }, PlanNode::DropTable { name } => { @@ -1681,9 +1725,45 @@ impl Engine { let start_inclusive = start.as_ref().map(|(_, inc)| *inc).unwrap_or(true); let end_inclusive = end.as_ref().map(|(_, inc)| *inc).unwrap_or(true); - // Range scans only use the btree fast path for unique indexes, - // because non-unique indexes store composite keys (column_value - // + RowId) that don't directly compare against raw column values. + // Non-unique index: walk the composite (value, rid) leaf + // chain between prefix bounds, fetch each row from the heap, + // and recheck. The recheck enforces exclusive bounds + // (range_rids is inclusive) and defensively skips any decoded + // null (nulls are never indexed, so they must not match). + if tbl.is_index_unique(column) == Some(false) { + if let Some(btree) = tbl.index(column) { + if start_val.is_some() || end_val.is_some() { + let col_idx = schema.column_index(column).ok_or_else(|| { + QueryError::ColumnNotFound { + table: String::new(), + column: column.clone(), + } + })?; + let rids = btree.range_rids(start_val.as_ref(), end_val.as_ref()); + let mut rows: Vec> = Vec::with_capacity(rids.len()); + for rid in rids { + if let Some(data) = tbl.heap.get(rid) { + let row = decode_row(schema, &data); + if !row[col_idx].is_empty() + && range_matches( + &row[col_idx], + &start_val, + start_inclusive, + &end_val, + end_inclusive, + ) + { + rows.push(row); + } + } + } + return Ok(QueryResult::Rows { columns, rows }); + } + } + } + + // Range scans use the btree fast path for unique indexes, + // walking raw column-value keys directly. if tbl.is_index_unique(column) == Some(true) { if let Some(btree) = tbl.index(column) { let hits: Vec<(Value, RowId)> = match (&start_val, &end_val) { @@ -2649,10 +2729,12 @@ impl Engine { for (idx, val) in resolved.iter() { row[*idx] = val.clone(); } - self.catalog - .update_hinted(table, rid, &row, Some(changed_cols)) - .map_err(|e| e.to_string()) - .ok(); + if let Err(e) = + self.catalog + .update_hinted(table, rid, &row, Some(changed_cols)) + { + return Some(Err(QueryError::StorageError(e.to_string()))); + } count += 1; } self.view_registry.mark_dependents_dirty(table); @@ -3357,19 +3439,20 @@ pub(super) fn hash_join( QueryResult::Rows { columns, rows } } -/// Lower unindexed `RangeScan` nodes to `Filter(SeqScan)` so that all -/// downstream fast paths (count, project+limit, sort+limit, agg, update, -/// delete) continue to fire. +/// Lower unindexed `RangeScan` and `IndexScan` nodes to `Filter(SeqScan)` +/// so that all downstream fast paths (count, project+limit, sort+limit, +/// agg, update, delete) continue to fire. /// -/// The planner emits `RangeScan` speculatively for every range inequality -/// (`.age > 30`) because it has no catalog access. When the column has a -/// B-tree index, `RangeScan` is the correct plan. When it doesn't, the -/// executor's `RangeScan` fallback materialises every matching row with +/// The planner emits `RangeScan` (for `.age > 30`) and `IndexScan` (for +/// `.email = lit`) speculatively because it has no catalog access. When +/// the column has a B-tree index, those plans are correct. When it +/// doesn't, the executor's fallbacks materialise every matching row with /// full `decode_row` — bypassing the compiled-predicate fast paths that -/// `Filter(SeqScan)` would trigger. +/// `Filter(SeqScan)` would trigger. Lowering both speculative leaf kinds +/// also keeps EXPLAIN honest: it prints the plan that actually runs. /// /// This pass runs once per query, before execution. -pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> PlanNode { +pub(super) fn lower_unindexed_scans(catalog: &Catalog, plan: &PlanNode) -> PlanNode { match plan { PlanNode::RangeScan { table, @@ -3378,11 +3461,12 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> end, } => { if let Some(tbl) = catalog.get_table(table) { - // Keep RangeScan only for unique indexes — their btree - // stores raw column values. Non-unique indexes store - // composite keys that don't directly compare against - // column values, so lower them to Filter(SeqScan). - if tbl.is_index_unique(column) == Some(true) { + // Keep RangeScan whenever ANY index exists on the column: + // unique indexes store raw column values, non-unique indexes + // store composite (value, rid) keys that the executor walks + // natively via BTree::range_rids. Only lower to Filter(SeqScan) + // when the column is unindexed. + if tbl.has_index(column) { return plan.clone(); } } @@ -3395,23 +3479,23 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> } } PlanNode::Filter { input, predicate } => PlanNode::Filter { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), predicate: predicate.clone(), }, PlanNode::Project { input, fields } => PlanNode::Project { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), fields: fields.clone(), }, PlanNode::Sort { input, keys } => PlanNode::Sort { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), keys: keys.clone(), }, PlanNode::Limit { input, count } => PlanNode::Limit { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), count: count.clone(), }, PlanNode::Offset { input, count } => PlanNode::Offset { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), count: count.clone(), }, PlanNode::Aggregate { @@ -3419,12 +3503,12 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> function, field, } => PlanNode::Aggregate { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), function: *function, field: field.clone(), }, PlanNode::Distinct { input } => PlanNode::Distinct { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), }, PlanNode::GroupBy { input, @@ -3432,7 +3516,7 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> aggregates, having, } => PlanNode::GroupBy { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), keys: keys.clone(), aggregates: aggregates.clone(), having: having.clone(), @@ -3442,25 +3526,25 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> table, assignments, } => PlanNode::Update { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), table: table.clone(), assignments: assignments.clone(), }, PlanNode::Delete { input, table } => PlanNode::Delete { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), table: table.clone(), }, PlanNode::Window { input, windows } => PlanNode::Window { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), windows: windows.clone(), }, PlanNode::Union { left, right, all } => PlanNode::Union { - left: Box::new(lower_unindexed_range_scans(catalog, left)), - right: Box::new(lower_unindexed_range_scans(catalog, right)), + left: Box::new(lower_unindexed_scans(catalog, left)), + right: Box::new(lower_unindexed_scans(catalog, right)), all: *all, }, PlanNode::Explain { input } => PlanNode::Explain { - input: Box::new(lower_unindexed_range_scans(catalog, input)), + input: Box::new(lower_unindexed_scans(catalog, input)), }, PlanNode::NestedLoopJoin { left, @@ -3468,11 +3552,28 @@ pub(super) fn lower_unindexed_range_scans(catalog: &Catalog, plan: &PlanNode) -> on, kind, } => PlanNode::NestedLoopJoin { - left: Box::new(lower_unindexed_range_scans(catalog, left)), - right: Box::new(lower_unindexed_range_scans(catalog, right)), + left: Box::new(lower_unindexed_scans(catalog, left)), + right: Box::new(lower_unindexed_scans(catalog, right)), on: on.clone(), kind: *kind, }, + PlanNode::IndexScan { table, column, key } => { + if let Some(tbl) = catalog.get_table(table) { + if tbl.has_index(column) { + return plan.clone(); + } + } + PlanNode::Filter { + input: Box::new(PlanNode::SeqScan { + table: table.clone(), + }), + predicate: Expr::BinaryOp( + Box::new(Expr::Field(column.clone())), + BinOp::Eq, + Box::new(key.clone()), + ), + } + } // Leaf nodes: no children to recurse into. _ => plan.clone(), } @@ -3707,12 +3808,15 @@ pub(super) fn format_plan_tree(plan: &PlanNode, depth: usize) -> String { PlanNode::CreateTable { name, fields } => { let fs: Vec = fields .iter() - .map(|(n, t, r)| { - if *r { - format!("{n}: {t} required") - } else { - format!("{n}: {t}") + .map(|f| { + let mut mods = String::new(); + if f.required { + mods.push_str(" required"); + } + if f.unique { + mods.push_str(" unique"); } + format!("{}: {}{mods}", f.name, f.type_name) }) .collect(); format!("{indent}CreateTable name={name} fields=[{}]", fs.join(", ")) diff --git a/crates/query/src/executor/tests.rs b/crates/query/src/executor/tests.rs index bfc1509..d0ce94e 100644 --- a/crates/query/src/executor/tests.rs +++ b/crates/query/src/executor/tests.rs @@ -3292,6 +3292,120 @@ fn test_explain_filter() { } } +fn explain_text(engine: &mut Engine, q: &str) -> String { + match engine.execute_powql(q).unwrap() { + QueryResult::Rows { rows, .. } => rows + .iter() + .map(|r| match &r[0] { + Value::Str(s) => s.as_str(), + _ => "", + }) + .collect::>() + .join("\n"), + _ => panic!("expected rows"), + } +} + +#[test] +fn test_explain_eq_filter_unindexed_shows_seqscan_not_indexscan() { + let mut engine = test_engine(); + // `email` has NO index in test_engine; the planner folds + // `.email = lit` to IndexScan speculatively. EXPLAIN must show + // what actually runs: Filter over SeqScan. + let text = explain_text( + &mut engine, + r#"explain User filter .email = "alice@ex.com""#, + ); + assert!(!text.contains("IndexScan"), "got: {text}"); + assert!(text.contains("Filter"), "got: {text}"); + assert!(text.contains("SeqScan"), "got: {text}"); +} + +#[test] +fn test_explain_eq_filter_indexed_shows_indexscan() { + let mut engine = test_engine(); + engine.execute_powql("alter User add index .email").unwrap(); + let text = explain_text( + &mut engine, + r#"explain User filter .email = "alice@ex.com""#, + ); + assert!(text.contains("IndexScan"), "got: {text}"); +} + +fn sorted_names(r: QueryResult) -> Vec { + match r { + QueryResult::Rows { rows, .. } => { + let mut v: Vec = rows.iter().map(|r| format!("{:?}", r[0])).collect(); + v.sort(); + v + } + other => panic!("expected rows, got {other:?}"), + } +} + +#[test] +fn test_range_scan_uses_nonunique_index_same_results() { + let mut engine = test_engine(); // Alice 30, Bob 25, Charlie 35 + let unindexed = engine + .execute_powql("User filter .age > 26 and .age <= 35 { .name }") + .unwrap(); + engine.execute_powql("alter User add index .age").unwrap(); + let indexed = engine + .execute_powql("User filter .age > 26 and .age <= 35 { .name }") + .unwrap(); + assert_eq!(sorted_names(unindexed), sorted_names(indexed)); // Alice, Charlie +} + +#[test] +fn test_range_scan_between_uses_nonunique_index() { + let mut engine = test_engine(); + let unindexed = engine + .execute_powql("User filter .age between 25 and 30 { .name }") + .unwrap(); + engine.execute_powql("alter User add index .age").unwrap(); + let indexed = engine + .execute_powql("User filter .age between 25 and 30 { .name }") + .unwrap(); + assert_eq!(sorted_names(unindexed), sorted_names(indexed)); // Alice, Bob +} + +#[test] +fn test_range_scan_indexed_exclusive_bound_excludes_boundary() { + let mut engine = test_engine(); + engine.execute_powql("alter User add index .age").unwrap(); + // Bob is exactly 25; `.age > 25` must exclude him. + let names = sorted_names( + engine + .execute_powql("User filter .age > 25 { .name }") + .unwrap(), + ); + assert_eq!(names, vec!["Str(\"Alice\")", "Str(\"Charlie\")"]); +} + +#[test] +fn test_range_scan_indexed_excludes_nulls() { + let mut engine = test_engine(); + engine + .execute_powql(r#"insert User { name := "Dana", email := "d@ex.com" }"#) + .unwrap(); // age null + engine.execute_powql("alter User add index .age").unwrap(); + match engine + .execute_powql("User filter .age < 100 { .name }") + .unwrap() + { + QueryResult::Rows { rows, .. } => assert_eq!(rows.len(), 3, "null age must not match"), + other => panic!("expected rows, got {other:?}"), + } +} + +#[test] +fn test_explain_range_indexed_shows_rangescan() { + let mut engine = test_engine(); + engine.execute_powql("alter User add index .age").unwrap(); + let text = explain_text(&mut engine, "explain User filter .age > 26"); + assert!(text.contains("RangeScan"), "got: {text}"); +} + #[test] fn test_explain_does_not_execute() { let mut engine = test_engine(); @@ -4300,3 +4414,305 @@ fn test_window_agg_with_order_keeps_running_frame() { assert_eq!(by_name["b"], Value::Float(15.0)); assert_eq!(by_name["c"], Value::Float(20.0)); } + +// --------------------------------------------------------------------------- +// UNIQUE constraint enforcement (Task 3) +// --------------------------------------------------------------------------- + +fn unique_engine() -> Engine { + let id = TEST_COUNTER.fetch_add(1, Ordering::SeqCst); + let dir = std::env::temp_dir().join(format!("powdb_uniq_{}_{}", std::process::id(), id)); + let mut engine = Engine::new(&dir).unwrap(); + engine + .execute_powql("type Acct { required unique email: str, id: int }") + .unwrap(); + engine + .execute_powql(r#"insert Acct { email := "a@x.com", id := 1 }"#) + .unwrap(); + engine +} + +#[test] +fn test_unique_dup_insert_rejected() { + let mut engine = unique_engine(); + let err = engine + .execute_powql(r#"insert Acct { email := "a@x.com", id := 2 }"#) + .unwrap_err(); + assert!( + err.to_string() + .contains("unique constraint violation on Acct.email"), + "{err}" + ); + match engine.execute_powql("count(Acct)").unwrap() { + QueryResult::Scalar(Value::Int(n)) => assert_eq!(n, 1), + other => panic!("expected scalar, got {other:?}"), + } +} + +#[test] +fn test_unique_update_into_dup_rejected() { + let mut engine = unique_engine(); + engine + .execute_powql(r#"insert Acct { email := "b@x.com", id := 2 }"#) + .unwrap(); + let err = engine + .execute_powql(r#"Acct filter .id = 2 update { email := "a@x.com" }"#) + .unwrap_err(); + assert!( + err.to_string().contains("unique constraint violation"), + "{err}" + ); + // The losing row keeps its own value (rolled back / never applied). + match engine + .execute_powql(r#"Acct filter .email = "b@x.com" { .id }"#) + .unwrap() + { + QueryResult::Rows { rows, .. } => assert_eq!(rows.len(), 1), + other => panic!("expected rows, got {other:?}"), + } +} + +#[test] +fn test_unique_update_to_same_value_allowed() { + let mut engine = unique_engine(); + // Updating a unique column to its own current value must NOT trip the + // constraint (existing rid == self). + engine + .execute_powql(r#"Acct filter .id = 1 update { email := "a@x.com", id := 9 }"#) + .unwrap(); + match engine.execute_powql("count(Acct)").unwrap() { + QueryResult::Scalar(Value::Int(n)) => assert_eq!(n, 1), + other => panic!("expected scalar, got {other:?}"), + } +} + +#[test] +fn test_upsert_requires_unique_and_no_dup_ids() { + let id = TEST_COUNTER.fetch_add(1, Ordering::SeqCst); + let dir = std::env::temp_dir().join(format!("powdb_ups_{}_{}", std::process::id(), id)); + let mut engine = Engine::new(&dir).unwrap(); + engine + .execute_powql("type W { unique id: int, v: str }") + .unwrap(); + engine + .execute_powql(r#"upsert W on .id { id := 1, v := "first" }"#) + .unwrap(); + // Known bug regression: a plain insert of the same id must now fail + // instead of silently creating a second id=1 row. + assert!(engine + .execute_powql(r#"insert W { id := 1, v := "second" }"#) + .is_err()); + engine + .execute_powql(r#"upsert W on .id { id := 1, v := "third" }"#) + .unwrap(); + match engine.execute_powql("count(W)").unwrap() { + QueryResult::Scalar(Value::Int(n)) => assert_eq!(n, 1), + other => panic!("expected scalar, got {other:?}"), + } + // upsert on a NON-unique column is a clean error. + engine.execute_powql("type W2 { id: int }").unwrap(); + let err = engine + .execute_powql("upsert W2 on .id { id := 1 }") + .unwrap_err(); + assert!( + err.to_string().contains("requires a unique column"), + "{err}" + ); +} + +#[test] +fn test_alter_add_unique_fails_on_existing_dups() { + let id = TEST_COUNTER.fetch_add(1, Ordering::SeqCst); + let dir = std::env::temp_dir().join(format!("powdb_audup_{}_{}", std::process::id(), id)); + let mut engine = Engine::new(&dir).unwrap(); + engine.execute_powql("type L { e: str }").unwrap(); + engine.execute_powql(r#"insert L { e := "x" }"#).unwrap(); + engine.execute_powql(r#"insert L { e := "x" }"#).unwrap(); + assert!(engine.execute_powql("alter L add unique .e").is_err()); +} + +#[test] +fn test_alter_add_unique_succeeds_then_enforces() { + let id = TEST_COUNTER.fetch_add(1, Ordering::SeqCst); + let dir = std::env::temp_dir().join(format!("powdb_au_{}_{}", std::process::id(), id)); + let mut engine = Engine::new(&dir).unwrap(); + engine.execute_powql("type L { e: str }").unwrap(); + engine.execute_powql(r#"insert L { e := "x" }"#).unwrap(); + engine.execute_powql(r#"insert L { e := "y" }"#).unwrap(); + engine.execute_powql("alter L add unique .e").unwrap(); + // Now enforced on subsequent inserts. + assert!(engine.execute_powql(r#"insert L { e := "x" }"#).is_err()); + // Adding unique on an already-indexed column is a clean error. + let err = engine.execute_powql("alter L add unique .e").unwrap_err(); + assert!(err.to_string().contains("already indexed"), "{err}"); +} + +#[test] +fn test_unique_constraint_survives_reopen() { + let id = TEST_COUNTER.fetch_add(1, Ordering::SeqCst); + let dir = std::env::temp_dir().join(format!("powdb_uniq_re_{}_{}", std::process::id(), id)); + { + let mut engine = Engine::new(&dir).unwrap(); + engine + .execute_powql("type Acct { required unique email: str }") + .unwrap(); + engine + .execute_powql(r#"insert Acct { email := "a@x.com" }"#) + .unwrap(); + // Dropped here without explicit checkpoint — recovery path must + // restore the unique flag from catalog.bin + WAL replay. + } + let mut engine = Engine::new(&dir).unwrap(); + assert!(engine + .execute_powql(r#"insert Acct { email := "a@x.com" }"#) + .is_err()); +} + +// --------------------------------------------------------------------------- +// Task 4: wire parameter binding ($1..$N), token-level substitution. +// --------------------------------------------------------------------------- + +#[test] +fn test_params_bind_injection_shaped_strings_byte_faithfully() { + use crate::ast::ParamValue; + let mut engine = test_engine(); + let evil = r#"x"; drop User; filter .age > "0"#; + engine + .execute_powql_with_params( + "insert User { name := $1, email := $2, age := $3 }", + &[ + ParamValue::Str(evil.to_string()), + ParamValue::Str("e@x.com".into()), + ParamValue::Int(40), + ], + ) + .unwrap(); + let r = engine + .execute_powql_with_params( + "User filter .email = $1 { .name }", + &[ParamValue::Str("e@x.com".into())], + ) + .unwrap(); + match r { + QueryResult::Rows { rows, .. } => { + assert_eq!(rows.len(), 1); + assert_eq!(rows[0][0], Value::Str(evil.to_string())); + } + other => panic!("expected rows, got {other:?}"), + } + // Table survived; 4 rows total. + match engine.execute_powql("count(User)").unwrap() { + QueryResult::Scalar(Value::Int(n)) => assert_eq!(n, 4), + other => panic!("{other:?}"), + } +} + +#[test] +fn test_params_errors() { + use crate::ast::ParamValue; + let mut engine = test_engine(); + // Out-of-range placeholder is a clean error. + assert!(engine + .execute_powql_with_params("User filter .age > $2", &[ParamValue::Int(1)]) + .is_err()); + // The no-params API rejects an unbound placeholder. + assert!(engine.execute_powql("User filter .age > $1").is_err()); + // Null param round-trips as PowQL null. + engine + .execute_powql_with_params( + "insert User { name := $1, email := $2, age := $3 }", + &[ + ParamValue::Str("N".into()), + ParamValue::Str("n@x.com".into()), + ParamValue::Null, + ], + ) + .unwrap(); + match engine + .execute_powql("User filter .age = null { .name }") + .unwrap() + { + QueryResult::Rows { rows, .. } => assert_eq!(rows.len(), 1), + other => panic!("{other:?}"), + } +} + +#[test] +fn test_params_all_types_round_trip() { + use crate::ast::ParamValue; + let mut engine = test_engine(); + engine + .execute_powql("type Mix { required name: str, n: int, f: float, ok: bool }") + .unwrap(); + engine + .execute_powql_with_params( + "insert Mix { name := $1, n := $2, f := $3, ok := $4 }", + &[ + ParamValue::Str("row".into()), + ParamValue::Int(-7), + ParamValue::Float(2.5), + ParamValue::Bool(true), + ], + ) + .unwrap(); + match engine + .execute_powql("Mix filter .n = -7 { .name }") + .unwrap() + { + QueryResult::Rows { rows, .. } => assert_eq!(rows.len(), 1), + other => panic!("{other:?}"), + } +} + +#[test] +fn test_params_readonly_path() { + use crate::ast::ParamValue; + let engine = { + let mut e = test_engine(); + // mutate up front, then exercise the readonly param path on &self. + e.execute_powql_with_params( + "insert User { name := $1, email := $2, age := $3 }", + &[ + ParamValue::Str("Zed".into()), + ParamValue::Str("z@x.com".into()), + ParamValue::Int(99), + ], + ) + .unwrap(); + e + }; + let r = engine + .execute_powql_readonly_with_params( + "User filter .name = $1 { .age }", + &[ParamValue::Str("Zed".into())], + ) + .unwrap(); + match r { + QueryResult::Rows { rows, .. } => { + assert_eq!(rows.len(), 1); + assert_eq!(rows[0][0], Value::Int(99)); + } + other => panic!("{other:?}"), + } + // A write statement through the readonly param path escalates. + assert!(matches!( + engine.execute_powql_readonly_with_params( + "insert User { name := $1, email := $2 }", + &[ParamValue::Str("a".into()), ParamValue::Str("b".into())], + ), + Err(crate::result::QueryError::ReadonlyNeedsWrite) + )); +} + +#[test] +fn test_no_params_regression_path_unchanged() { + let mut engine = test_engine(); + // Plain queries with no placeholders still work identically. + match engine + .execute_powql("User filter .age > 26 { .name }") + .unwrap() + { + QueryResult::Rows { rows, .. } => assert_eq!(rows.len(), 2), + other => panic!("{other:?}"), + } +} diff --git a/crates/query/src/lexer.rs b/crates/query/src/lexer.rs index 567d523..d22124a 100644 --- a/crates/query/src/lexer.rs +++ b/crates/query/src/lexer.rs @@ -191,6 +191,7 @@ pub fn lex(input: &str) -> Result, LexError> { "multi" => Token::Multi, "link" => Token::Link, "index" => Token::Index, + "unique" => Token::Unique, "on" => Token::On, "asc" => Token::Asc, "desc" => Token::Desc, diff --git a/crates/query/src/parser.rs b/crates/query/src/parser.rs index 5b5ced3..45fceec 100644 --- a/crates/query/src/parser.rs +++ b/crates/query/src/parser.rs @@ -101,6 +101,62 @@ pub fn parse(input: &str) -> Result { message: e.message, position: e.position, })?; + parse_tokens(tokens) +} + +/// Parse PowQL with `$N` placeholders bound to positional `params`. +/// +/// Binding happens at the **token level**: the input is lexed, each +/// `$N` placeholder token is replaced in place with the literal token +/// for `params[N-1]` (a string param becomes a `StringLit` byte-for-byte, +/// `null` becomes `Token::Null`), and the resulting token stream is parsed +/// normally. Values are never re-lexed or string-interpolated, so an +/// injection-shaped string is inert data — it can never change the query's +/// shape. +/// +/// Placeholders are 1-based (`$1`, `$2`, …). A reference to a placeholder +/// with no corresponding parameter is a clean [`ParseError::Syntax`], as is +/// a non-numeric `$name` (the named-parameter form belongs to the in-process +/// prepared API, not the positional wire-binding path). +pub fn parse_with_params(input: &str, params: &[ParamValue]) -> Result { + let mut tokens = lex(input).map_err(|e| ParseError::Lex { + message: e.message, + position: e.position, + })?; + for tok in tokens.iter_mut() { + if let Token::Param(name) = tok { + let n: usize = name.parse().map_err(|_| ParseError::Syntax { + message: format!( + "positional parameters must be numeric (`$1`, `$2`, …); got `${name}`" + ), + })?; + if n == 0 { + return Err(ParseError::Syntax { + message: "parameter placeholders are 1-based; `$0` is invalid".into(), + }); + } + let p = params.get(n - 1).ok_or_else(|| ParseError::Syntax { + message: format!( + "query references ${n} but only {} parameter(s) were supplied", + params.len() + ), + })?; + *tok = match p { + ParamValue::Null => Token::Null, + ParamValue::Int(v) => Token::IntLit(*v), + ParamValue::Float(v) => Token::FloatLit(*v), + ParamValue::Bool(v) => Token::BoolLit(*v), + ParamValue::Str(s) => Token::StringLit(s.clone()), + }; + } + } + parse_tokens(tokens) +} + +/// Shared tail of [`parse`] / [`parse_with_params`]: run the recursive +/// descent over an already-lexed (and possibly param-substituted) token +/// stream and reject any trailing tokens. +fn parse_tokens(tokens: Vec) -> Result { let mut parser = Parser { tokens, pos: 0, @@ -1314,10 +1370,12 @@ impl Parser { self.advance(); Ok(Expr::Literal(Literal::Bool(v))) } - Token::Param(name) => { - self.advance(); - Ok(Expr::Param(name)) - } + // `$N` placeholders are only valid through + // `parse_with_params`, which substitutes them for literal + // tokens before this expression parser ever runs. Reaching a + // raw `Token::Param` here means the caller used the plain + // (no-params) path with a placeholder — surface the standard + // unexpected-token error so the message names the parameter. Token::Null => { self.advance(); Ok(Expr::Null) @@ -1561,6 +1619,23 @@ impl Parser { action: AlterAction::AddIndex { column }, })); } + // `alter
add unique .` + if *self.peek() == Token::Unique { + self.advance(); + let column = match self.advance() { + Token::DotIdent(n) => n, + t => { + return Err(ParseError::UnexpectedToken { + expected: ". after add unique".into(), + got: t.display_name(), + }) + } + }; + return Ok(Statement::AlterTable(AlterTableExpr { + table, + action: AlterAction::AddUnique { column }, + })); + } // optional `column` keyword if *self.peek() == Token::Column { self.advance(); @@ -1769,12 +1844,21 @@ impl Parser { self.expect(&Token::LBrace)?; let mut fields = Vec::new(); while !matches!(self.peek(), Token::RBrace | Token::Eof) { - let required = if *self.peek() == Token::Required { - self.advance(); - true - } else { - false - }; + // Accept `required` and `unique` modifiers in either order. + let (mut required, mut unique) = (false, false); + loop { + match self.peek() { + Token::Required => { + self.advance(); + required = true; + } + Token::Unique => { + self.advance(); + unique = true; + } + _ => break, + } + } let field_name = match self.advance() { Token::Ident(n) => n, t => { @@ -1798,6 +1882,7 @@ impl Parser { name: field_name, type_name, required, + unique, }); if *self.peek() == Token::Comma { self.advance(); @@ -1850,6 +1935,7 @@ fn tokens_to_text(tokens: &[Token]) -> String { Token::Multi => out.push_str("multi"), Token::Link => out.push_str("link"), Token::Index => out.push_str("index"), + Token::Unique => out.push_str("unique"), Token::On => out.push_str("on"), Token::Asc => out.push_str("asc"), Token::Desc => out.push_str("desc"), @@ -2878,6 +2964,42 @@ mod tests { } } + #[test] + fn test_parse_type_with_unique_modifier() { + let stmt = parse("type User { required unique email: str, age: int }").unwrap(); + match stmt { + Statement::CreateType(ct) => { + assert!(ct.fields[0].required && ct.fields[0].unique); + assert!(!ct.fields[1].unique); + } + other => panic!("expected CreateType, got {other:?}"), + } + } + + #[test] + fn test_parse_type_unique_before_required() { + // Modifiers accepted in either order. + let stmt = parse("type User { unique required email: str }").unwrap(); + match stmt { + Statement::CreateType(ct) => { + assert!(ct.fields[0].required && ct.fields[0].unique); + } + other => panic!("expected CreateType, got {other:?}"), + } + } + + #[test] + fn test_parse_alter_add_unique() { + let stmt = parse("alter User add unique .email").unwrap(); + match stmt { + Statement::AlterTable(at) => assert!(matches!( + at.action, + AlterAction::AddUnique { ref column } if column == "email" + )), + other => panic!("expected AlterTable, got {other:?}"), + } + } + #[test] fn test_parse_alter_drop_column() { let stmt = parse("alter User drop column status").unwrap(); diff --git a/crates/query/src/plan.rs b/crates/query/src/plan.rs index 26f4878..bba8d22 100644 --- a/crates/query/src/plan.rs +++ b/crates/query/src/plan.rs @@ -1,5 +1,16 @@ use crate::ast::{AggFunc, AlterAction, Assignment, Expr, JoinKind, WindowFunc}; +/// A column definition carried by `PlanNode::CreateTable`. Replaces the +/// old `(name, type_name, required)` tuple so the `unique` modifier can +/// flow from the parser through to the executor's DDL arm. +#[derive(Debug, Clone)] +pub struct CreateField { + pub name: String, + pub type_name: String, + pub required: bool, + pub unique: bool, +} + /// Physical plan nodes — what the executor actually runs. #[derive(Debug, Clone)] pub enum PlanNode { @@ -115,7 +126,7 @@ pub enum PlanNode { }, CreateTable { name: String, - fields: Vec<(String, String, bool)>, + fields: Vec, }, /// Create a materialized view: execute query, store results, register. CreateView { diff --git a/crates/query/src/planner.rs b/crates/query/src/planner.rs index cd35759..5354ec8 100644 --- a/crates/query/src/planner.rs +++ b/crates/query/src/planner.rs @@ -509,7 +509,12 @@ fn plan_create_type(ct: CreateTypeExpr) -> Result { let fields = ct .fields .into_iter() - .map(|f| (f.name, f.type_name, f.required)) + .map(|f| crate::plan::CreateField { + name: f.name, + type_name: f.type_name, + required: f.required, + unique: f.unique, + }) .collect(); Ok(PlanNode::CreateTable { name: ct.name, diff --git a/crates/query/src/token.rs b/crates/query/src/token.rs index 4cdea1b..cb2f503 100644 --- a/crates/query/src/token.rs +++ b/crates/query/src/token.rs @@ -24,6 +24,7 @@ pub enum Token { Multi, // multi Link, // link Index, // index + Unique, // unique On, // on Conflict, // conflict Asc, // asc @@ -175,6 +176,7 @@ impl Token { Token::Multi => "'multi'".into(), Token::Link => "'link'".into(), Token::Index => "'index'".into(), + Token::Unique => "'unique'".into(), Token::On => "'on'".into(), Token::Conflict => "'conflict'".into(), Token::Asc => "'asc'".into(), diff --git a/crates/query/tests/durability.rs b/crates/query/tests/durability.rs index c0e8f06..117c096 100644 --- a/crates/query/tests/durability.rs +++ b/crates/query/tests/durability.rs @@ -190,7 +190,9 @@ fn test_mixed_mutations_survive_crash() { let mut engine = Engine::new(&dir).unwrap(); exec( &mut engine, - "type P { required id: int, price: int, tag: str }", + // `id` is unique so `upsert P on .id` is valid (breaking change + // since 0.4.7: the upsert key column must be unique). + "type P { required unique id: int, price: int, tag: str }", ); for i in 1..=300i64 { exec( diff --git a/crates/server/src/handler.rs b/crates/server/src/handler.rs index 46ccd8d..884826e 100644 --- a/crates/server/src/handler.rs +++ b/crates/server/src/handler.rs @@ -1,4 +1,4 @@ -use crate::protocol::Message; +use crate::protocol::{Message, WireParam}; use powdb_auth::{Permission, Role, UserStore}; use powdb_query::executor::{is_read_only_statement, Engine}; use powdb_query::parser; @@ -296,6 +296,63 @@ fn dispatch_query( eng.execute_powql(query) } +/// Convert a wire parameter into the query-crate [`ParamValue`] used for +/// token-level binding. +fn wire_param_to_value(p: &WireParam) -> powdb_query::ast::ParamValue { + use powdb_query::ast::ParamValue; + match p { + WireParam::Null => ParamValue::Null, + WireParam::Int(v) => ParamValue::Int(*v), + WireParam::Float(v) => ParamValue::Float(*v), + WireParam::Bool(v) => ParamValue::Bool(*v), + WireParam::Str(s) => ParamValue::Str(s.clone()), + } +} + +/// Parameterized counterpart of [`dispatch_query`]. Routes through the exact +/// same role-enforcement and read/write escalation logic, but binds the +/// `$N` placeholders at the token level via the query crate's +/// `parse_with_params` path. A string parameter can never change the query's +/// shape — it is substituted as a literal token, not interpolated text. +fn dispatch_query_with_params( + engine: &Arc>, + query: &str, + params: &[WireParam], + principal: Option<&Principal>, +) -> Result { + let bound: Vec = params.iter().map(wire_param_to_value).collect(); + + // Parse once (with params bound) so role enforcement and read/write + // classification see exactly the statement that will execute. + let stmt_result = parser::parse_with_params(query, &bound).map_err(|e| e.to_string()); + + if let Ok(stmt) = &stmt_result { + check_statement_permitted(principal, stmt)?; + } + + let can_try_read = matches!(&stmt_result, Ok(s) if is_read_only_statement(s)); + if can_try_read { + let res = { + let eng = engine + .read() + .map_err(|e| QueryError::Execution(format!("lock poisoned: {e}")))?; + eng.execute_powql_readonly_with_params(query, &bound) + }; + match res { + Ok(r) => return Ok(r), + Err(QueryError::ReadonlyNeedsWrite) => { + // Escalate to the write path below. + } + Err(e) => return Err(e), + } + } + + let mut eng = engine + .write() + .map_err(|e| QueryError::Execution(format!("lock poisoned: {e}")))?; + eng.execute_powql_with_params(query, &bound) +} + pub async fn handle_connection(stream: S, opts: ConnOpts<'_>) where S: AsyncRead + AsyncWrite + Unpin, @@ -499,6 +556,45 @@ where } } } + Message::QueryWithParams { query, params } => { + if query.len() > MAX_QUERY_LENGTH { + Message::Error { + message: format!( + "query too large: {} bytes (max {})", + query.len(), + MAX_QUERY_LENGTH + ), + } + } else { + debug!(peer = %peer, query = %query, n_params = params.len(), "received parameterized query"); + let handle = tokio::task::spawn_blocking({ + let engine = engine.clone(); + let query = query.clone(); + let params = params.clone(); + let principal = principal.clone(); + move || { + dispatch_query_with_params(&engine, &query, ¶ms, principal.as_ref()) + } + }); + let abort_handle = handle.abort_handle(); + match tokio::time::timeout(query_timeout, handle).await { + Ok(Ok(Ok(result))) => query_result_to_message(result), + Ok(Ok(Err(e))) => Message::Error { + message: sanitize_error(&e.to_string()), + }, + Ok(Err(e)) => Message::Error { + message: format!("internal error: {e}"), + }, + Err(_) => { + abort_handle.abort(); + warn!(peer = %peer, query = %query, "query timeout exceeded"); + Message::Error { + message: "query timeout exceeded".into(), + } + } + } + } + } Message::Disconnect => { debug!(peer = %peer, "received DISCONNECT"); break; diff --git a/crates/server/src/protocol.rs b/crates/server/src/protocol.rs index a1e2d3c..e12e24b 100644 --- a/crates/server/src/protocol.rs +++ b/crates/server/src/protocol.rs @@ -4,6 +4,10 @@ use zeroize::Zeroizing; const MSG_CONNECT: u8 = 0x01; const MSG_CONNECT_OK: u8 = 0x02; const MSG_QUERY: u8 = 0x03; +/// Query carrying positional `$N` parameters (Task 4). Pure protocol +/// addition: old clients never send it, and old servers reject it with the +/// existing "unknown message type" error — no existing frame changes shape. +const MSG_QUERY_PARAMS: u8 = 0x04; const MSG_RESULT_ROWS: u8 = 0x07; const MSG_RESULT_SCALAR: u8 = 0x08; const MSG_RESULT_OK: u8 = 0x09; @@ -26,6 +30,23 @@ const MAX_COLUMNS: usize = 4096; /// Maximum number of rows allowed in a single result message. const MAX_ROWS: usize = 10_000_000; +/// Maximum number of bound parameters in a single QueryWithParams message. +const MAX_PARAMS: usize = 4096; + +/// A positional parameter value carried by [`Message::QueryWithParams`]. +/// +/// Wire encoding per param: a 1-byte tag followed by the body — +/// `0` null (no body), `1` int (8B LE i64), `2` float (8B LE f64), +/// `3` bool (1B), `4` str (length-prefixed UTF-8). +#[derive(Debug, Clone, PartialEq)] +pub enum WireParam { + Null, + Int(i64), + Float(f64), + Bool(bool), + Str(String), +} + #[derive(Debug, Clone)] pub enum Message { Connect { @@ -46,6 +67,11 @@ pub enum Message { Query { query: String, }, + /// A query string with positional `$N` parameters bound at the server. + QueryWithParams { + query: String, + params: Vec, + }, ResultRows { columns: Vec, rows: Vec>, @@ -93,6 +119,32 @@ impl Message { } Message::ConnectOk { version } => (MSG_CONNECT_OK, encode_string(version)), Message::Query { query } => (MSG_QUERY, encode_string(query)), + Message::QueryWithParams { query, params } => { + let mut buf = encode_string(query); + buf.extend_from_slice(&(params.len() as u16).to_le_bytes()); + for p in params { + match p { + WireParam::Null => buf.push(0), + WireParam::Int(v) => { + buf.push(1); + buf.extend_from_slice(&v.to_le_bytes()); + } + WireParam::Float(v) => { + buf.push(2); + buf.extend_from_slice(&v.to_le_bytes()); + } + WireParam::Bool(v) => { + buf.push(3); + buf.push(if *v { 1 } else { 0 }); + } + WireParam::Str(s) => { + buf.push(4); + buf.extend_from_slice(&encode_string(s)); + } + } + } + (MSG_QUERY_PARAMS, buf) + } Message::ResultRows { columns, rows } => { let mut buf = Vec::new(); buf.extend_from_slice(&(columns.len() as u16).to_le_bytes()); @@ -184,6 +236,64 @@ impl Message { let query = decode_string(payload, &mut 0)?; Ok(Message::Query { query }) } + MSG_QUERY_PARAMS => { + let mut pos = 0; + let query = decode_string(payload, &mut pos)?; + if pos + 2 > payload.len() { + return Err("truncated param count".into()); + } + let count_bytes: [u8; 2] = payload[pos..pos + 2] + .try_into() + .map_err(|_| "invalid param count bytes".to_string())?; + let count = u16::from_le_bytes(count_bytes) as usize; + pos += 2; + if count > MAX_PARAMS { + return Err("too many parameters".into()); + } + let mut params = Vec::with_capacity(count); + for _ in 0..count { + if pos >= payload.len() { + return Err("truncated param tag".into()); + } + let tag = payload[pos]; + pos += 1; + let p = match tag { + 0 => WireParam::Null, + 1 => { + if pos + 8 > payload.len() { + return Err("truncated int param".into()); + } + let b: [u8; 8] = payload[pos..pos + 8] + .try_into() + .map_err(|_| "invalid int param bytes".to_string())?; + pos += 8; + WireParam::Int(i64::from_le_bytes(b)) + } + 2 => { + if pos + 8 > payload.len() { + return Err("truncated float param".into()); + } + let b: [u8; 8] = payload[pos..pos + 8] + .try_into() + .map_err(|_| "invalid float param bytes".to_string())?; + pos += 8; + WireParam::Float(f64::from_le_bytes(b)) + } + 3 => { + if pos + 1 > payload.len() { + return Err("truncated bool param".into()); + } + let v = payload[pos] != 0; + pos += 1; + WireParam::Bool(v) + } + 4 => WireParam::Str(decode_string(payload, &mut pos)?), + other => return Err(format!("unknown param tag: {other}")), + }; + params.push(p); + } + Ok(Message::QueryWithParams { query, params }) + } MSG_RESULT_ROWS => { let mut pos = 0; if pos + 2 > payload.len() { @@ -488,6 +598,47 @@ mod tests { } } + #[test] + fn test_encode_decode_query_with_params() { + let msg = Message::QueryWithParams { + query: "insert User { name := $1, age := $2, ok := $3, note := $4 }".into(), + params: vec![ + WireParam::Str(r#"a"b\c; drop User"#.into()), + WireParam::Int(-7), + WireParam::Bool(true), + WireParam::Null, + ], + }; + let bytes = msg.encode(); + // The new frame must use the dedicated 0x04 tag. + assert_eq!(bytes[0], 0x04); + match Message::decode(&bytes).unwrap() { + Message::QueryWithParams { query, params } => { + assert!(query.contains("$1")); + assert_eq!(params.len(), 4); + assert!(matches!(¶ms[0], WireParam::Str(s) if s == r#"a"b\c; drop User"#)); + assert!(matches!(¶ms[1], WireParam::Int(-7))); + assert!(matches!(¶ms[2], WireParam::Bool(true))); + assert!(matches!(¶ms[3], WireParam::Null)); + } + other => panic!("expected QueryWithParams, got {other:?}"), + } + } + + #[test] + fn test_query_with_params_float_round_trip() { + let msg = Message::QueryWithParams { + query: "T filter .f = $1".into(), + params: vec![WireParam::Float(2.5)], + }; + match Message::decode(&msg.encode()).unwrap() { + Message::QueryWithParams { params, .. } => { + assert!(matches!(¶ms[0], WireParam::Float(f) if (*f - 2.5).abs() < 1e-12)); + } + other => panic!("expected QueryWithParams, got {other:?}"), + } + } + #[test] fn test_decode_garbage_never_panics() { // Feed a wide range of malformed/truncated byte sequences to the @@ -506,6 +657,37 @@ mod tests { vec![0x07, 0x00, 0x02, 0x00, 0x00, 0x00, 0xFF, 0xFF], // RESULT_OK with a truncated 8-byte affected field. vec![0x09, 0x00, 0x03, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03], + // QUERY_PARAMS (0x04) claiming a query string len with no data. + vec![0x04, 0x00, 0x04, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF], + // QUERY_PARAMS: empty query string, claims 1 param, no param bytes. + vec![ + 0x04, 0x00, 0x06, 0x00, 0x00, 0x00, // header, payload_len=6 + 0x00, 0x00, 0x00, 0x00, // query string len = 0 + 0x01, 0x00, // param count = 1, then nothing + ], + // QUERY_PARAMS: 1 int param with a truncated i64 body. + vec![ + 0x04, 0x00, 0x0B, 0x00, 0x00, 0x00, // header, payload_len=11 + 0x00, 0x00, 0x00, 0x00, // query len = 0 + 0x01, 0x00, // param count = 1 + 0x01, // tag = int, then only 3 of 8 bytes + 0x01, 0x02, 0x03, + ], + // QUERY_PARAMS: 1 str param with a truncated string body. + vec![ + 0x04, 0x00, 0x0F, 0x00, 0x00, 0x00, // header, payload_len=15 + 0x00, 0x00, 0x00, 0x00, // query len = 0 + 0x01, 0x00, // param count = 1 + 0x04, // tag = str + 0xFF, 0xFF, 0xFF, 0xFF, // str len huge, no data + ], + // QUERY_PARAMS: unknown param tag byte. + vec![ + 0x04, 0x00, 0x0B, 0x00, 0x00, 0x00, // header, payload_len=11 + 0x00, 0x00, 0x00, 0x00, // query len = 0 + 0x01, 0x00, // param count = 1 + 0x63, // bogus tag + ], ]; for bytes in cases { let result = Message::decode(&bytes); diff --git a/crates/storage/src/btree.rs b/crates/storage/src/btree.rs index 8446b57..8d5300a 100644 --- a/crates/storage/src/btree.rs +++ b/crates/storage/src/btree.rs @@ -926,6 +926,32 @@ impl BTree { self.lookup_prefix(&Value::Int(col_val)) } + /// Range scan over a NON-unique index: return RowIds for all entries + /// whose column value lies in [start, end] (inclusive; pass None for + /// an unbounded side). Composite-key bounds reuse the prefix encoding: + /// (start, RowId::MIN) .. (end, RowId::MAX). The caller is expected to + /// recheck exclusive bounds against the decoded row. + pub fn range_rids(&self, start: Option<&Value>, end: Option<&Value>) -> Vec { + let collect = |pairs: Vec<(Value, RowId)>| { + pairs + .into_iter() + .filter_map(|(k, _)| Self::rid_from_composite(&k)) + .collect() + }; + match (start, end) { + (Some(s), Some(e)) => { + let lo = Self::make_prefix_start(s); + let hi = Self::make_prefix_end(e); + self.range(&lo, &hi) + .filter_map(|(k, _)| Self::rid_from_composite(&k)) + .collect() + } + (Some(s), None) => collect(self.range_from(&Self::make_prefix_start(s))), + (None, Some(e)) => collect(self.range_to(&Self::make_prefix_end(e))), + (None, None) => collect(self.range_from(&Self::make_prefix_start(&Value::Empty))), + } + } + /// Delete a specific (col_val, rid) entry from a non-unique /// secondary index. pub fn delete_non_unique(&mut self, col_val: &Value, rid: RowId) -> bool { @@ -1996,6 +2022,39 @@ mod tests { assert!(hits.is_empty()); } + #[test] + fn test_non_unique_range_rids() { + let mut bt = temp_btree("nonunique_range"); + let rids: Vec = (0..6u32) + .map(|i| RowId { + page_id: i, + slot_index: 0, + }) + .collect(); + for (i, rid) in rids.iter().enumerate() { + bt.insert_non_unique_int((i as i64) * 10, *rid); // 0,10,20,30,40,50 + } + // 10 <= v <= 30 → rids[1..=3] + let hits = bt.range_rids(Some(&Value::Int(10)), Some(&Value::Int(30))); + assert_eq!(hits, vec![rids[1], rids[2], rids[3]]); + // unbounded below + let hits = bt.range_rids(None, Some(&Value::Int(10))); + assert_eq!(hits, vec![rids[0], rids[1]]); + // unbounded above + let hits = bt.range_rids(Some(&Value::Int(40)), None); + assert_eq!(hits, vec![rids[4], rids[5]]); + // duplicates within the range all come back + bt.insert_non_unique_int( + 20, + RowId { + page_id: 99, + slot_index: 7, + }, + ); + let hits = bt.range_rids(Some(&Value::Int(20)), Some(&Value::Int(20))); + assert_eq!(hits.len(), 2); + } + #[test] fn test_non_unique_delete() { let mut bt = temp_btree("nonunique_delete"); diff --git a/crates/storage/src/catalog.rs b/crates/storage/src/catalog.rs index 16de6ba..75f5411 100644 --- a/crates/storage/src/catalog.rs +++ b/crates/storage/src/catalog.rs @@ -1185,6 +1185,20 @@ impl Catalog { self.persist() } + /// Whether `table.column` has a UNIQUE index. Returns `Some(true)` for + /// a unique index, `Some(false)` for a non-unique index, and `None` + /// when the column is not indexed or the table is unknown. + pub fn is_index_unique(&self, table: &str, column: &str) -> Option { + self.get_table(table)?.is_index_unique(column) + } + + /// Whether `table.column` has any index (unique or non-unique). + pub fn has_index(&self, table: &str, column: &str) -> bool { + self.get_table(table) + .map(|t| t.has_index(column)) + .unwrap_or(false) + } + pub fn index_lookup(&self, table: &str, column: &str, key: &Value) -> io::Result> { Ok(self .by_name(table)? diff --git a/crates/storage/src/table.rs b/crates/storage/src/table.rs index 4b7134b..ff2dcfa 100644 --- a/crates/storage/src/table.rs +++ b/crates/storage/src/table.rs @@ -372,6 +372,27 @@ impl Table { /// straight through `BTree::insert_int` to skip the generic /// `Value::Ord` dispatch on every binary-search comparison. pub fn insert(&mut self, values: &Row) -> io::Result { + // Unique constraint pre-check: reject the insert BEFORE touching + // the heap if any unique column already holds the incoming value. + // Every write path that can change an indexed column (plain, + // prepared, upsert) funnels through here or `update_hinted`; the + // byte-patch fast paths are guarded to never touch indexed columns. + for entry in &self.indexed_cols { + if !entry.unique { + continue; + } + let val = &values[entry.col_idx]; + if !val.is_empty() && entry.btree.lookup(val).is_some() { + return Err(io::Error::new( + io::ErrorKind::InvalidInput, + format!( + "unique constraint violation on {}.{}", + self.schema.table_name, entry.col_name + ), + )); + } + } + encode_row_into_with_layout( &self.schema, &self.row_layout, @@ -829,6 +850,38 @@ impl Table { let old_row = if touches_index { self.get(rid) } else { None }; + // Unique constraint pre-check (before any heap mutation): a unique + // column's new value must not already exist on a DIFFERENT row. + // Updating a row to its own current value is always legal. + if touches_index { + for entry in &self.indexed_cols { + if !entry.unique { + continue; + } + let new_val = &values[entry.col_idx]; + if new_val.is_empty() { + continue; + } + // No change to this column's value → cannot create a dup. + if let Some(old) = old_row.as_ref() { + if &old[entry.col_idx] == new_val { + continue; + } + } + if let Some(existing_rid) = entry.btree.lookup(new_val) { + if existing_rid != rid { + return Err(io::Error::new( + io::ErrorKind::InvalidInput, + format!( + "unique constraint violation on {}.{}", + self.schema.table_name, entry.col_name + ), + )); + } + } + } + } + encode_row_into_with_layout( &self.schema, &self.row_layout, diff --git a/docs/POWQL.md b/docs/POWQL.md index 79377d4..cab627f 100644 --- a/docs/POWQL.md +++ b/docs/POWQL.md @@ -72,14 +72,16 @@ User group .status having count(.name) > 5 { .status, n: count(.name) } ## Schema Definition -Tables are defined using the `type` keyword. Each field has a name and a type, optionally prefixed with `required` to enforce non-null values. +Tables are defined using the `type` keyword. Each field has a name and a type, optionally prefixed with the modifiers `required` (enforce non-null values) and/or `unique` (enforce that no two non-null rows share a value). The modifiers may appear in either order. + +Declaring a field `unique` automatically creates a unique B+tree index on that column; duplicate inserts/updates/upserts are then rejected with a `unique constraint violation` error. ### Syntax ``` type { - [required] : , - [required] : , + [required] [unique] : , + [required] [unique] : , ... } ``` @@ -87,10 +89,10 @@ type { ### Examples ``` --- A simple user table +-- A simple user table; email must be unique across all rows type User { required name: str, - required email: str, + required unique email: str, age: int } @@ -106,7 +108,7 @@ type Record { } ``` -Fields without `required` are nullable -- they can hold empty/null values. +Fields without `required` are nullable -- they can hold empty/null values. Null values are exempt from the `unique` constraint (multiple rows may be null). ### Supported Types @@ -251,15 +253,26 @@ o.total ### Parameters -Query parameters are prefixed with `$` and bound at execution time: +Positional placeholders `$1`, `$2`, … bind untrusted values without string +interpolation. They are 1-based (`?` is not a placeholder — `??` is the +COALESCE operator): ``` -User filter .age > $min_age -User filter .name = $target -insert User { name := $name, email := $email, age := $age } +User filter .name = $1 +User filter .age > $1 and .age <= $2 +insert User { name := $1, email := $2, age := $3 } ``` -Parameters enable safe, reusable queries without string interpolation. See [Prepared Queries](#prepared-queries) for the execution API. +Binding happens at the **token level**: each `$N` is replaced with the +literal token for the supplied value *before* parsing, so an +injection-shaped string is inert data and can never change the query's +shape. A `null` parameter binds PowQL `null`. A placeholder with no +matching argument (or a `$0`) is a clean parse error. + +Over the wire this is the `client.query(powql, params)` form (see +[AGENTS.md](../AGENTS.md) for the client API and the `QueryWithParams` +message). For the in-process Rust execution API, see +[Prepared Queries](#prepared-queries). ### Comparison Operators @@ -1041,6 +1054,16 @@ alter User add index .age Indexes are persistent (BIDX format in the data directory) and survive restart. Re-running `add index` on an existing index is a no-op. +#### Add Unique + +Create a unique B+tree index on a column, enforcing that no two non-null rows share a value: + +``` +alter User add unique .email +``` + +The command first scans the existing data — if any duplicate (non-null) value is already present, it fails and the index is not created. It also fails if the column already has a (non-unique) index, since there is no in-place index upgrade; drop and recreate the table to change an existing index's uniqueness. Once created, the constraint is enforced on every subsequent insert/update/upsert and survives restart. + ### DROP TABLE Remove a table entirely: @@ -1149,6 +1172,8 @@ upsert
on . { } [on conflict { **Breaking change (since 0.4.7):** the `on` column must be **unique** — declare it with the `unique` modifier (`unique email: str`) or `alter
add unique .`. Upserting on a non-unique column is rejected with an error. This closes a prior bug where `upsert` on a non-unique column could silently create duplicate-key rows. + ### Examples Basic upsert (insert or replace all fields on conflict): @@ -1307,6 +1332,8 @@ PowQL has seven data types plus a null representation. | **Alter add** | `alter User add column status: str` | `ALTER TABLE User ADD COLUMN status TEXT` | | **Alter drop** | `alter User drop column status` | `ALTER TABLE User DROP COLUMN status` | | **Create index** | `alter User add index .email` | `CREATE INDEX ON User (email)` | +| **Unique column** | `type User { unique email: str }` | `CREATE TABLE User (email TEXT UNIQUE)` | +| **Add unique** | `alter User add unique .email` | `CREATE UNIQUE INDEX ON User (email)` | | **Create view** | `materialize V as User filter .active = true` | `CREATE MATERIALIZED VIEW V AS SELECT * FROM User WHERE active` | | **Refresh view** | `refresh V` | `REFRESH MATERIALIZED VIEW V` | | **Drop view** | `drop view V` | `DROP MATERIALIZED VIEW V` | @@ -1325,6 +1352,7 @@ PowQL has seven data types plus a null representation. | Assignment | `:=` | `=` or `SET col = val` | | Table definition | `type Name { ... }` | `CREATE TABLE Name (...)` | | Required/NOT NULL | `required field: type` | `field TYPE NOT NULL` | +| Unique constraint | `unique field: type` | `field TYPE UNIQUE` | | String literals | `"double quotes"` | `'single quotes'` | | Query shape | Pipeline: `Table verb verb { proj }` | Clausal: `SELECT proj FROM Table WHERE ... ORDER BY ...` | | Aggregates | Wrapping: `count(Table filter ...)` | Inline: `SELECT COUNT(*) FROM Table WHERE ...` | diff --git a/scripts/README.md b/scripts/README.md index ad1dac6..e522c98 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -53,3 +53,27 @@ in the cycle fails the CI job (`set -euo pipefail` inside the script). Resets the criterion benchmark baselines after intentional perf changes. Documented inside the script itself. + +## `agent-eval/` — agent-DX falsification harness + +A model-agnostic, **offline** harness that scores how well an LLM writes +correct PowQL given only `AGENTS.md` and a schema — and lets you compare that +hit rate against the same model writing SQL for SQLite over identical data. + +```bash +bash scripts/agent-eval/setup.sh # build CLI + seed .golden-data/ +python3 scripts/agent-eval/run.py \ + scripts/agent-eval/examples/golden-candidates.jsonl # smoke: 6/7 (one intentional fail) +``` + +- `setup.sh` builds `powdb-cli` and seeds a pristine `.golden-data/` dir + (gitignored) from `schema.powql` + `seed.powql` — 10 related tables. +- `tasks.json` holds 26 natural-language tasks, each with a deterministic + `check` (`scalar` / `rowcount` / `rows` / `error` / `ok`), covering the + AGENTS.md footgun list. +- `run.py` (Python 3 stdlib only) copies the golden dir per candidate, runs + each candidate statement through `powdb-cli --exec`, scores the output, and + prints a per-category pass rate. Always exits 0 — it's a measurement tool. +- No model calls anywhere, and **not wired into CI**. See + `scripts/agent-eval/README.md` for the full contract and the SQLite + baseline procedure. diff --git a/scripts/agent-eval/README.md b/scripts/agent-eval/README.md new file mode 100644 index 0000000..52e9a37 --- /dev/null +++ b/scripts/agent-eval/README.md @@ -0,0 +1,119 @@ +# agent-eval — PowDB agent-DX falsification harness + +A model-agnostic, **offline** harness that measures how well an LLM can write +correct PowQL given only PowDB's own docs. It exists to falsify the claim +"an agent can pick up PowQL from `AGENTS.md` and get common queries right on +the first try" — and to compare that hit rate against the same model writing +SQL for SQLite over identical data. + +Nothing here calls a model. Nothing here runs in CI. It is scaffolding: you +supply the model's answers as a JSONL file; `run.py` scores them against a +freshly seeded database, entirely locally. + +## What's in here + +| File | Purpose | +|---|---| +| `schema.powql` | 10 related tables (`type` DDL, one per line) | +| `seed.powql` | deterministic seed rows (one `insert` per line) | +| `sqlite-baseline/schema.sql` + `seed.sql` | the same data in SQLite, for the baseline pass | +| `tasks.json` | 26 natural-language tasks, each with a deterministic `check` | +| `setup.sh` | builds `powdb-cli`, seeds the pristine golden data dir | +| `run.py` | offline scorer (Python 3 stdlib only) | +| `examples/golden-candidates.jsonl` | hand-written known-good answers, for a self-smoke | +| `.golden-data/` | the seeded source-of-truth DB (gitignored; recreated by `setup.sh`) | + +## The harness contract + +The unit of evaluation is one task → one PowQL statement. + +1. Give the model **only**: `AGENTS.md` (the 5-minute PowQL guide), + `schema.powql` (the table definitions it may reference), and **one** task + `prompt` from `tasks.json`. Do **not** give it the seed data, the + expected answer, or other tasks. +2. The model returns **exactly one** PowQL statement. +3. Append a line to your candidates file: + ```json + {"task_id": "agg-02", "statement": "sum(orders filter .status = \"paid\" { .total })"} + ``` +4. Score the whole file offline: + ```bash + bash scripts/agent-eval/setup.sh # once: build CLI + seed golden data + python3 scripts/agent-eval/run.py candidates.jsonl + ``` + +`run.py` copies the golden data dir for **each** candidate (so a mutating +statement can never pollute the next one), runs the statement through +`powdb-cli --exec`, and scores stdout/exit-code against the task's `check`. +It prints a per-category pass rate and writes `results.json`. It always +exits 0 — it is a measurement tool, not a gate. + +## Check types + +Each task in `tasks.json` carries one `check`: + +| `type` | passes when | +|---|---| +| `scalar` | the single output value equals `expected` (exact string compare; numbers compared as printed, e.g. `"4.25"`, `"3036"`) | +| `rowcount` | the result has exactly `expected` data rows | +| `rows` | the result rows, sorted, equal `expected` (sorted); use only for small results | +| `error` | the statement is **rejected** (non-zero exit) — used for the gotcha tasks (e.g. `create table`, `count:`-as-alias) | +| `ok` | the statement runs successfully (DDL / upsert that has no row output to assert) | + +The output extractor in `run.py` mirrors the CLI's print format +(`crates/cli/src/main.rs` → `print_local_result` / `print_table`): a scalar +is a lone line; a table is `header` + `---+---` + data rows + `(N rows)`; +empty results print `(empty set)`; mutations print `N row(s) affected`. + +## Tasks cover the AGENTS.md footgun list + +The 26 tasks deliberately probe the documented gotchas: `:=` vs `=`, +`==` vs `=`, `type` (not `create table`), leading-dot field refs, +trailing-brace projection, `n:`-style aliases (plus an `error` task that +asserts `count:` as an alias is rejected), `group`/`having`, inner/left +join (with the "smaller table on the right" note), IN-subquery, null checks +(`= null`), `between`, `distinct` + `count(distinct …)`, `case`, +`order`/`limit`/`offset`, transactions (`begin`/`insert`/`rollback`/count), +`alter add column` / `add index`, upsert, and count-all (`count(Table)`, +since there is no bare `count(*)`). + +## SQLite baseline (side-by-side number) + +To get the comparison figure, run the **same prompts** with the **same +model** against the SQLite mirror, and score with the same check semantics: + +1. Build the baseline DB: + ```bash + sqlite3 /tmp/agent-eval-baseline.db < scripts/agent-eval/sqlite-baseline/schema.sql + sqlite3 /tmp/agent-eval-baseline.db < scripts/agent-eval/sqlite-baseline/seed.sql + ``` +2. For each task, give the model SQLite's docs + the same `tables_hint` + schema + the task `prompt`; collect one SQL statement per task. +3. Score each with `sqlite3 /tmp/agent-eval-baseline.db ""` and the same + `check` (scalar = the single cell; rowcount = number of result rows; + `error` = non-zero `sqlite3` exit). A tiny SQL-side scorer is left as an + exercise — the check semantics are identical; only the runner changes. +4. Report the two pass rates side by side, e.g. + `PowQL 24/26 (92%) vs SQLite 25/26 (96%)`. + +The interesting outcome is not "PowDB wins"; it's whether a model that has +never seen PowQL lands within a few points of its SQL baseline given only +`AGENTS.md`. A large gap is a docs bug, not a model bug — fix `AGENTS.md`. + +## Follow-ups + +- **`unique` constraints.** `schema.powql` does **not** use the `unique` + field modifier because it is not merged on this branch. Several columns + are naturally unique (`users.email`, `products.sku`, `orders.id`, …); once + the UNIQUE-constraints work lands, declare them `unique`, add a + `unique`-violation `error` task, and change `upsert-01` to key on a + genuinely-unique column. Until then `upsert` keys on a plain column + (which currently works on any column on this branch). +- **Batch seeding.** `setup.sh` runs one CLI process per statement (~60 + total). Fine at this scale; if it ever bites, feed all statements through + one REPL stdin once multi-line REPL input lands. + +## Not wired into CI + +By design. No model calls happen anywhere in CI. This harness is run on +demand by a human (or an agent) to measure docs/DX quality. diff --git a/scripts/agent-eval/examples/golden-candidates.jsonl b/scripts/agent-eval/examples/golden-candidates.jsonl new file mode 100644 index 0000000..cb43b70 --- /dev/null +++ b/scripts/agent-eval/examples/golden-candidates.jsonl @@ -0,0 +1,7 @@ +{"task_id": "count-01", "statement": "count(users)"} +{"task_id": "group-01", "statement": "orders group .city { .city, n: count(.id) } having n >= 2"} +{"task_id": "ddl-01", "statement": "create table foo { id: int }"} +{"task_id": "subq-01", "statement": "users filter .id in (orders filter .total > 500 { .user_id }) { .name }"} +{"task_id": "count-02", "statement": "count(orders filter .status = \"paid\")"} +{"task_id": "alias-02", "statement": "orders group .city { .city, count: count(.id) }"} +{"task_id": "count-04", "statement": "count(users filter .age > 100)"} diff --git a/scripts/agent-eval/run.py b/scripts/agent-eval/run.py new file mode 100644 index 0000000..00d38ec --- /dev/null +++ b/scripts/agent-eval/run.py @@ -0,0 +1,311 @@ +#!/usr/bin/env python3 +"""run.py — offline, model-agnostic scorer for the PowDB agent-eval harness. + +Reads a candidates JSONL file (one {"task_id": ..., "statement": ...} per +line), runs each statement against a fresh per-candidate copy of the golden +data dir, and scores the CLI output against the matching task's `check` in +tasks.json. + +Stdlib only. No model calls, no network. Always exits 0 (this is a scoring +tool, not a CI gate). Writes results.json next to the candidates file and +prints a per-category pass-rate summary. + +Usage: + python3 scripts/agent-eval/run.py + python3 scripts/agent-eval/run.py --tasks + +Setup (once): scripts/agent-eval/setup.sh (builds the CLI, seeds .golden-data/) + +CLI output formats this scorer understands (see crates/cli/src/main.rs +print_local_result / print_table): + - Scalar : a single line holding the value, e.g. "5" or "4.25". + - Rows : a header line, a "---+---" separator, N data lines, + then a "(N rows)" / "(1 row)" trailer. Empty results + print "(empty set)". + - Modified : "N row(s) affected". + - Created : "type NAME created". + - Executed : a free-form message (e.g. "index on '...' created", + "transaction rolled back"). + - Error : exit code 1, message on stderr ("Error: ..."). +""" + +import json +import os +import re +import shutil +import subprocess +import sys +import tempfile + +HERE = os.path.dirname(os.path.abspath(__file__)) +REPO_ROOT = os.path.abspath(os.path.join(HERE, "..", "..")) +CLI = os.path.join(REPO_ROOT, "target", "release", "powdb-cli") +GOLDEN = os.path.join(HERE, ".golden-data") +DEFAULT_TASKS = os.path.join(HERE, "tasks.json") + +TIMEOUT_SECS = 30 + + +# ── CLI invocation ─────────────────────────────────────────────────────────── + + +class RunResult: + def __init__(self, ok, stdout, stderr): + self.ok = ok # True when exit code == 0 + self.stdout = stdout + self.stderr = stderr + + +def run_statement(statement): + """Run one statement against a private copy of the golden data dir.""" + tmp = tempfile.mkdtemp(prefix="powdb_eval_") + data_dir = os.path.join(tmp, "data") + try: + shutil.copytree(GOLDEN, data_dir) + proc = subprocess.run( + [CLI, "--data-dir", data_dir, "--exec", statement], + capture_output=True, + text=True, + timeout=TIMEOUT_SECS, + ) + return RunResult(proc.returncode == 0, proc.stdout, proc.stderr) + except subprocess.TimeoutExpired: + return RunResult(False, "", "timeout") + finally: + shutil.rmtree(tmp, ignore_errors=True) + + +# ── Output extraction ──────────────────────────────────────────────────────── + + +def _content_lines(stdout): + """Significant output lines: drop blanks and the table chrome. + + Drops the separator line ("---+---") and the "(N rows)"/"(empty set)" + trailer so what remains is header + data (for tables) or the lone value + (for scalars). + """ + out = [] + for raw in stdout.splitlines(): + line = raw.rstrip() + if line.strip() == "": + continue + if re.fullmatch(r"-[-+]*-?", line.strip()): + continue + if re.fullmatch(r"\(\d+ rows?\)", line.strip()): + continue + if line.strip() == "(empty set)": + continue + out.append(line) + return out + + +def extract_scalar(stdout): + """Last numeric token of the last significant line, as a string. + + Handles the transaction batch case where the scalar count is the last + of several output blocks. + """ + lines = _content_lines(stdout) + if not lines: + return None + nums = re.findall(r"-?\d+(?:\.\d+)?", lines[-1]) + if not nums: + # fall back to the whole last line, trimmed + return lines[-1].strip() + return nums[-1] + + +def is_empty_set(stdout): + return any(l.strip() == "(empty set)" for l in stdout.splitlines()) + + +def extract_rows(stdout): + """Parse table data rows into a sorted list of cell lists. + + A table has a header line then data lines, all pipe-delimited. The + header is the first content line; the rest are data. Returns [] for an + empty set. Cells are stripped. + """ + if is_empty_set(stdout): + return [] + lines = _content_lines(stdout) + if len(lines) <= 1: + # no data rows (only a header, or nothing) + return [] + data = lines[1:] # drop header + rows = [] + for line in data: + cells = [c.strip() for c in line.split("|")] + rows.append(cells) + rows.sort() + return rows + + +def count_data_rows(stdout): + """Number of data rows. Prefer the explicit "(N rows)" trailer.""" + m = re.search(r"\((\d+) rows?\)", stdout) + if m: + return int(m.group(1)) + if is_empty_set(stdout): + return 0 + # scalar / modified / created outputs are not row sets + return 0 + + +# ── Scoring ────────────────────────────────────────────────────────────────── + + +def score(check, res): + """Return (passed: bool, detail: str) for one candidate run.""" + ctype = check.get("type") + + if ctype == "error": + if res.ok: + return False, "expected the statement to be rejected, but it succeeded" + return True, "rejected as expected" + + # All non-error checks require a successful run first. + if not res.ok: + return False, "statement failed: " + (res.stderr.strip() or "non-zero exit") + + if ctype == "ok": + return True, "executed" + + if ctype == "scalar": + got = extract_scalar(res.stdout) + want = str(check["expected"]) + if got == want: + return True, "scalar={}".format(got) + return False, "scalar got={!r} want={!r}".format(got, want) + + if ctype == "rowcount": + got = count_data_rows(res.stdout) + want = int(check["expected"]) + if got == want: + return True, "rowcount={}".format(got) + return False, "rowcount got={} want={}".format(got, want) + + if ctype == "rows": + got = extract_rows(res.stdout) + want = sorted([[str(c) for c in row] for row in check["expected"]]) + if got == want: + return True, "rows matched ({} rows)".format(len(got)) + return False, "rows got={} want={}".format(got, want) + + return False, "unknown check type: {!r}".format(ctype) + + +# ── Driver ─────────────────────────────────────────────────────────────────── + + +def load_tasks(path): + with open(path) as f: + tasks = json.load(f) + return {t["id"]: t for t in tasks} + + +def category_of(task_id): + return task_id.split("-")[0] + + +def main(argv): + if len(argv) < 2: + print("usage: run.py [--tasks tasks.json]", file=sys.stderr) + return 0 # scoring tool: never a hard failure + + candidates_path = argv[1] + tasks_path = DEFAULT_TASKS + if "--tasks" in argv: + tasks_path = argv[argv.index("--tasks") + 1] + + if not os.path.exists(CLI): + print( + "error: powdb-cli not found at {}\n run: bash {}/setup.sh".format( + CLI, HERE + ), + file=sys.stderr, + ) + return 0 + if not os.path.isdir(GOLDEN): + print( + "error: golden data dir not found at {}\n run: bash {}/setup.sh".format( + GOLDEN, HERE + ), + file=sys.stderr, + ) + return 0 + + tasks = load_tasks(tasks_path) + + results = [] + with open(candidates_path) as f: + for lineno, raw in enumerate(f, 1): + raw = raw.strip() + if not raw: + continue + try: + cand = json.loads(raw) + except json.JSONDecodeError as e: + print("line {}: bad JSON: {}".format(lineno, e), file=sys.stderr) + continue + task_id = cand.get("task_id") + statement = cand.get("statement", "") + task = tasks.get(task_id) + if task is None: + results.append( + { + "task_id": task_id, + "passed": False, + "detail": "no such task_id in tasks.json", + } + ) + continue + res = run_statement(statement) + passed, detail = score(task["check"], res) + results.append( + { + "task_id": task_id, + "passed": passed, + "detail": detail, + "statement": statement, + } + ) + + # ── report ──────────────────────────────────────────────────────────── + total = len(results) + passed = sum(1 for r in results if r["passed"]) + + print() + for r in results: + mark = "PASS" if r["passed"] else "FAIL" + print(" [{}] {:<10} {}".format(mark, r["task_id"], r["detail"])) + + # per-category rollup + cats = {} + for r in results: + c = category_of(r["task_id"] or "") + cats.setdefault(c, [0, 0]) + cats[c][1] += 1 + if r["passed"]: + cats[c][0] += 1 + + print("\n by category:") + for c in sorted(cats): + p, n = cats[c] + print(" {:<10} {}/{}".format(c, p, n)) + + print("\n TOTAL: {}/{} passed".format(passed, total)) + + out_path = os.path.join(os.path.dirname(os.path.abspath(candidates_path)), "results.json") + with open(out_path, "w") as f: + json.dump( + {"total": total, "passed": passed, "results": results}, f, indent=2 + ) + print(" wrote {}".format(out_path)) + + return 0 # always succeed + + +if __name__ == "__main__": + sys.exit(main(sys.argv)) diff --git a/scripts/agent-eval/schema.powql b/scripts/agent-eval/schema.powql new file mode 100644 index 0000000..d5e4961 --- /dev/null +++ b/scripts/agent-eval/schema.powql @@ -0,0 +1,23 @@ +-- PowDB agent-eval schema: 10 related tables. +-- One `type` statement per line (setup.sh streams these line-by-line +-- through `powdb-cli --exec`). Blank lines and `--` comment lines are +-- skipped by setup.sh / run.py. +-- +-- NOTE (follow-up): `unique` is NOT used here because the `unique` field +-- modifier is not merged on this branch. Once Task 3 (UNIQUE constraints) +-- lands, the natural uniqueness constraints below should be declared: +-- users.email, users.id, products.sku, categories.id, orders.id, +-- payments.id, sessions.token, addresses.id, inventory.product_id. +-- Until then these are plain columns and uniqueness is only a property of +-- the seed data, not an enforced constraint. + +type categories { required id: int, required name: str, parent_id: int } +type users { required id: int, required name: str, required email: str, city: str, age: int, active: bool } +type addresses { required id: int, required user_id: int, city: str, country: str, zip: str } +type products { required id: int, required sku: str, required name: str, category_id: int, price: float, active: bool } +type inventory { required product_id: int, required quantity: int, warehouse: str } +type orders { required id: int, required user_id: int, total: float, status: str, city: str } +type order_items { required id: int, required order_id: int, required product_id: int, quantity: int, unit_price: float } +type payments { required id: int, required order_id: int, amount: float, method: str, status: str } +type reviews { required id: int, required product_id: int, required user_id: int, rating: int, body: str } +type sessions { required id: int, required user_id: int, token: str, active: bool } diff --git a/scripts/agent-eval/seed.powql b/scripts/agent-eval/seed.powql new file mode 100644 index 0000000..abb8643 --- /dev/null +++ b/scripts/agent-eval/seed.powql @@ -0,0 +1,76 @@ +-- PowDB agent-eval seed data. Deterministic; every expected answer in +-- tasks.json is hand-computed against exactly these rows. One `insert` +-- per line (setup.sh streams line-by-line). Do not reorder or edit +-- without updating tasks.json expected values. + +-- categories: 4 rows (Electronics root, with two children; Books root) +insert categories { id := 1, name := "Electronics", parent_id := null } +insert categories { id := 2, name := "Phones", parent_id := 1 } +insert categories { id := 3, name := "Laptops", parent_id := 1 } +insert categories { id := 4, name := "Books", parent_id := null } + +-- users: 5 rows. cities: NYC x2 (1,2), LA x2 (3,4), SF x1 (5). +-- ages: 30,22,45,28,17. active: t,f,t,t,f. user 5 is a minor. +insert users { id := 1, name := "Alice", email := "alice@ex.com", city := "NYC", age := 30, active := true } +insert users { id := 2, name := "Bob", email := "bob@ex.com", city := "NYC", age := 22, active := false } +insert users { id := 3, name := "Carol", email := "carol@ex.com", city := "LA", age := 45, active := true } +insert users { id := 4, name := "Dave", email := "dave@ex.com", city := "LA", age := 28, active := true } +insert users { id := 5, name := "Erin", email := "erin@ex.com", city := "SF", age := 17, active := false } + +-- addresses: one per user, plus a second for Alice. age column absent. +insert addresses { id := 1, user_id := 1, city := "NYC", country := "US", zip := "10001" } +insert addresses { id := 2, user_id := 2, city := "NYC", country := "US", zip := "10002" } +insert addresses { id := 3, user_id := 3, city := "LA", country := "US", zip := "90001" } +insert addresses { id := 4, user_id := 4, city := "LA", country := "US", zip := "90002" } +insert addresses { id := 5, user_id := 5, city := "SF", country := "US", zip := "94101" } +insert addresses { id := 6, user_id := 1, city := "Boston", country := "US", zip := "02108" } + +-- products: 5 rows across 3 leaf categories. prices vary; one inactive. +insert products { id := 1, sku := "P-PHONE-1", name := "Phone X", category_id := 2, price := 699.0, active := true } +insert products { id := 2, sku := "P-PHONE-2", name := "Phone Y", category_id := 2, price := 499.0, active := true } +insert products { id := 3, sku := "P-LAP-1", name := "Laptop Pro", category_id := 3, price := 1299.0, active := true } +insert products { id := 4, sku := "P-LAP-2", name := "Laptop Air", category_id := 3, price := 999.0, active := false } +insert products { id := 5, sku := "P-BOOK-1", name := "PowQL Book", category_id := 4, price := 39.0, active := true } + +-- inventory: one row per product; product 4 out of stock (qty 0). +insert inventory { product_id := 1, quantity := 50, warehouse := "W1" } +insert inventory { product_id := 2, quantity := 20, warehouse := "W1" } +insert inventory { product_id := 3, quantity := 5, warehouse := "W2" } +insert inventory { product_id := 4, quantity := 0, warehouse := "W2" } +insert inventory { product_id := 5, quantity := 200, warehouse := "W3" } + +-- orders: 6 rows. order.city denormalized from the placing user. +-- city counts: NYC x3 (o1,o2,o3), LA x2 (o4,o5), SF x1 (o6). +-- status: paid x4 (o1,o2,o4,o5), pending x1 (o3), cancelled x1 (o6). +insert orders { id := 1, user_id := 1, total := 699.0, status := "paid", city := "NYC" } +insert orders { id := 2, user_id := 1, total := 39.0, status := "paid", city := "NYC" } +insert orders { id := 3, user_id := 2, total := 499.0, status := "pending", city := "NYC" } +insert orders { id := 4, user_id := 3, total := 1299.0, status := "paid", city := "LA" } +insert orders { id := 5, user_id := 4, total := 999.0, status := "paid", city := "LA" } +insert orders { id := 6, user_id := 5, total := 39.0, status := "cancelled", city := "SF" } + +-- order_items: line items. ratings/body absent. +insert order_items { id := 1, order_id := 1, product_id := 1, quantity := 1, unit_price := 699.0 } +insert order_items { id := 2, order_id := 2, product_id := 5, quantity := 1, unit_price := 39.0 } +insert order_items { id := 3, order_id := 3, product_id := 2, quantity := 1, unit_price := 499.0 } +insert order_items { id := 4, order_id := 4, product_id := 3, quantity := 1, unit_price := 1299.0 } +insert order_items { id := 5, order_id := 5, product_id := 4, quantity := 1, unit_price := 999.0 } +insert order_items { id := 6, order_id := 6, product_id := 5, quantity := 1, unit_price := 39.0 } + +-- payments: one per non-cancelled order (5 rows; order 6 cancelled, no payment). +insert payments { id := 1, order_id := 1, amount := 699.0, method := "card", status := "settled" } +insert payments { id := 2, order_id := 2, amount := 39.0, method := "card", status := "settled" } +insert payments { id := 3, order_id := 3, amount := 499.0, method := "paypal", status := "pending" } +insert payments { id := 4, order_id := 4, amount := 1299.0, method := "card", status := "settled" } +insert payments { id := 5, order_id := 5, amount := 999.0, method := "card", status := "settled" } + +-- reviews: 4 rows. ratings 5,4,3,5. product 1 has two reviews. +insert reviews { id := 1, product_id := 1, user_id := 1, rating := 5, body := "great" } +insert reviews { id := 2, product_id := 1, user_id := 3, rating := 4, body := "good" } +insert reviews { id := 3, product_id := 3, user_id := 4, rating := 3, body := "ok" } +insert reviews { id := 4, product_id := 5, user_id := 1, rating := 5, body := "love it" } + +-- sessions: 3 rows. active: t,f,t. +insert sessions { id := 1, user_id := 1, token := "tok-aaa", active := true } +insert sessions { id := 2, user_id := 2, token := "tok-bbb", active := false } +insert sessions { id := 3, user_id := 3, token := "tok-ccc", active := true } diff --git a/scripts/agent-eval/setup.sh b/scripts/agent-eval/setup.sh new file mode 100755 index 0000000..865586e --- /dev/null +++ b/scripts/agent-eval/setup.sh @@ -0,0 +1,59 @@ +#!/usr/bin/env bash +# setup.sh — build powdb-cli and create a pristine, seeded golden data dir. +# +# The golden dir (.golden-data/) is the read-only source of truth: run.py +# copies it per candidate so every scored statement runs against identical +# state. Re-running this script rebuilds the golden dir from scratch. +# +# No model calls, no network. Pure local scaffolding. +set -euo pipefail + +# Resolve paths relative to this script so it works from any cwd. +HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$HERE/../.." && pwd)" +CLI="$REPO_ROOT/target/release/powdb-cli" +GOLDEN="$HERE/.golden-data" + +echo "==> building powdb-cli (release)" +( cd "$REPO_ROOT" && cargo build --release -p powdb-cli ) + +if [ ! -x "$CLI" ]; then + echo "error: expected CLI at $CLI after build" >&2 + exit 1 +fi + +echo "==> resetting golden data dir: $GOLDEN" +rm -rf "$GOLDEN" +mkdir -p "$GOLDEN" + +# Stream schema then seed, one statement per line, through --exec. +# Skip blank lines and `--` comments. One process per statement keeps the +# error surface obvious at seed scale (~60 statements). +seed_file() { + local file="$1" + local n=0 + while IFS= read -r line || [ -n "$line" ]; do + # strip leading/trailing whitespace + local trimmed="${line#"${line%%[![:space:]]*}"}" + trimmed="${trimmed%"${trimmed##*[![:space:]]}"}" + [ -z "$trimmed" ] && continue + case "$trimmed" in + --*) continue ;; + esac + if ! "$CLI" --data-dir "$GOLDEN" --exec "$trimmed" >/dev/null; then + echo "error: statement failed while seeding $file:" >&2 + echo " $trimmed" >&2 + exit 1 + fi + n=$((n + 1)) + done < "$file" + echo " applied $n statements from $(basename "$file")" +} + +echo "==> applying schema.powql" +seed_file "$HERE/schema.powql" +echo "==> applying seed.powql" +seed_file "$HERE/seed.powql" + +echo "==> golden data dir ready: $GOLDEN" +echo " next: python3 $HERE/run.py $HERE/examples/golden-candidates.jsonl" diff --git a/scripts/agent-eval/sqlite-baseline/schema.sql b/scripts/agent-eval/sqlite-baseline/schema.sql new file mode 100644 index 0000000..9735206 --- /dev/null +++ b/scripts/agent-eval/sqlite-baseline/schema.sql @@ -0,0 +1,14 @@ +-- SQLite mirror of schema.powql, used for the side-by-side baseline pass. +-- Same tables, same columns, same data (see seed.sql). Load with: +-- sqlite3 baseline.db < schema.sql && sqlite3 baseline.db < seed.sql + +CREATE TABLE categories (id INTEGER NOT NULL, name TEXT NOT NULL, parent_id INTEGER); +CREATE TABLE users (id INTEGER NOT NULL, name TEXT NOT NULL, email TEXT NOT NULL, city TEXT, age INTEGER, active INTEGER); +CREATE TABLE addresses (id INTEGER NOT NULL, user_id INTEGER NOT NULL, city TEXT, country TEXT, zip TEXT); +CREATE TABLE products (id INTEGER NOT NULL, sku TEXT NOT NULL, name TEXT NOT NULL, category_id INTEGER, price REAL, active INTEGER); +CREATE TABLE inventory (product_id INTEGER NOT NULL, quantity INTEGER NOT NULL, warehouse TEXT); +CREATE TABLE orders (id INTEGER NOT NULL, user_id INTEGER NOT NULL, total REAL, status TEXT, city TEXT); +CREATE TABLE order_items (id INTEGER NOT NULL, order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, quantity INTEGER, unit_price REAL); +CREATE TABLE payments (id INTEGER NOT NULL, order_id INTEGER NOT NULL, amount REAL, method TEXT, status TEXT); +CREATE TABLE reviews (id INTEGER NOT NULL, product_id INTEGER NOT NULL, user_id INTEGER NOT NULL, rating INTEGER, body TEXT); +CREATE TABLE sessions (id INTEGER NOT NULL, user_id INTEGER NOT NULL, token TEXT, active INTEGER); diff --git a/scripts/agent-eval/sqlite-baseline/seed.sql b/scripts/agent-eval/sqlite-baseline/seed.sql new file mode 100644 index 0000000..d04d5fb --- /dev/null +++ b/scripts/agent-eval/sqlite-baseline/seed.sql @@ -0,0 +1,61 @@ +-- SQLite mirror of seed.powql. Byte-for-byte the same data so the +-- PowDB and SQLite pass rates compare like-for-like. booleans are 0/1. + +INSERT INTO categories VALUES (1, 'Electronics', NULL); +INSERT INTO categories VALUES (2, 'Phones', 1); +INSERT INTO categories VALUES (3, 'Laptops', 1); +INSERT INTO categories VALUES (4, 'Books', NULL); + +INSERT INTO users VALUES (1, 'Alice', 'alice@ex.com', 'NYC', 30, 1); +INSERT INTO users VALUES (2, 'Bob', 'bob@ex.com', 'NYC', 22, 0); +INSERT INTO users VALUES (3, 'Carol', 'carol@ex.com', 'LA', 45, 1); +INSERT INTO users VALUES (4, 'Dave', 'dave@ex.com', 'LA', 28, 1); +INSERT INTO users VALUES (5, 'Erin', 'erin@ex.com', 'SF', 17, 0); + +INSERT INTO addresses VALUES (1, 1, 'NYC', 'US', '10001'); +INSERT INTO addresses VALUES (2, 2, 'NYC', 'US', '10002'); +INSERT INTO addresses VALUES (3, 3, 'LA', 'US', '90001'); +INSERT INTO addresses VALUES (4, 4, 'LA', 'US', '90002'); +INSERT INTO addresses VALUES (5, 5, 'SF', 'US', '94101'); +INSERT INTO addresses VALUES (6, 1, 'Boston', 'US', '02108'); + +INSERT INTO products VALUES (1, 'P-PHONE-1', 'Phone X', 2, 699.0, 1); +INSERT INTO products VALUES (2, 'P-PHONE-2', 'Phone Y', 2, 499.0, 1); +INSERT INTO products VALUES (3, 'P-LAP-1', 'Laptop Pro', 3, 1299.0, 1); +INSERT INTO products VALUES (4, 'P-LAP-2', 'Laptop Air', 3, 999.0, 0); +INSERT INTO products VALUES (5, 'P-BOOK-1', 'PowQL Book', 4, 39.0, 1); + +INSERT INTO inventory VALUES (1, 50, 'W1'); +INSERT INTO inventory VALUES (2, 20, 'W1'); +INSERT INTO inventory VALUES (3, 5, 'W2'); +INSERT INTO inventory VALUES (4, 0, 'W2'); +INSERT INTO inventory VALUES (5, 200, 'W3'); + +INSERT INTO orders VALUES (1, 1, 699.0, 'paid', 'NYC'); +INSERT INTO orders VALUES (2, 1, 39.0, 'paid', 'NYC'); +INSERT INTO orders VALUES (3, 2, 499.0, 'pending', 'NYC'); +INSERT INTO orders VALUES (4, 3, 1299.0, 'paid', 'LA'); +INSERT INTO orders VALUES (5, 4, 999.0, 'paid', 'LA'); +INSERT INTO orders VALUES (6, 5, 39.0, 'cancelled', 'SF'); + +INSERT INTO order_items VALUES (1, 1, 1, 1, 699.0); +INSERT INTO order_items VALUES (2, 2, 5, 1, 39.0); +INSERT INTO order_items VALUES (3, 3, 2, 1, 499.0); +INSERT INTO order_items VALUES (4, 4, 3, 1, 1299.0); +INSERT INTO order_items VALUES (5, 5, 4, 1, 999.0); +INSERT INTO order_items VALUES (6, 6, 5, 1, 39.0); + +INSERT INTO payments VALUES (1, 1, 699.0, 'card', 'settled'); +INSERT INTO payments VALUES (2, 2, 39.0, 'card', 'settled'); +INSERT INTO payments VALUES (3, 3, 499.0, 'paypal', 'pending'); +INSERT INTO payments VALUES (4, 4, 1299.0, 'card', 'settled'); +INSERT INTO payments VALUES (5, 5, 999.0, 'card', 'settled'); + +INSERT INTO reviews VALUES (1, 1, 1, 5, 'great'); +INSERT INTO reviews VALUES (2, 1, 3, 4, 'good'); +INSERT INTO reviews VALUES (3, 3, 4, 3, 'ok'); +INSERT INTO reviews VALUES (4, 5, 1, 5, 'love it'); + +INSERT INTO sessions VALUES (1, 1, 'tok-aaa', 1); +INSERT INTO sessions VALUES (2, 2, 'tok-bbb', 0); +INSERT INTO sessions VALUES (3, 3, 'tok-ccc', 1); diff --git a/scripts/agent-eval/tasks.json b/scripts/agent-eval/tasks.json new file mode 100644 index 0000000..1ac7016 --- /dev/null +++ b/scripts/agent-eval/tasks.json @@ -0,0 +1,183 @@ +[ + { + "id": "ddl-01", + "prompt": "Create a table named foo with a single integer column id. (Reminder: PowDB is not SQL.)", + "tables_hint": [], + "check": { "type": "error" }, + "note": "Gotcha: the CREATE TABLE keyword is `type`, not `create table`. A `create table ...` statement must be rejected." + }, + { + "id": "alias-01", + "prompt": "Count the orders in each city, returning the city and the count aliased as `n`.", + "tables_hint": ["orders"], + "check": { "type": "rowcount", "expected": 3 }, + "note": "Gotcha: alias syntax is `n: count(.id)` (alias before colon), not `count(.id) as n`." + }, + { + "id": "alias-02", + "prompt": "Count the orders in each city, returning the city and the count aliased with the name `count`.", + "tables_hint": ["orders"], + "check": { "type": "error" }, + "note": "Gotcha: an aggregate keyword like `count` cannot be used as an alias name; `count: count(.id)` fails with `expected alias name`." + }, + { + "id": "eq-01", + "prompt": "Return the names of users whose city is exactly NYC.", + "tables_hint": ["users"], + "check": { "type": "rows", "expected": [["Alice"], ["Bob"]] }, + "note": "Gotcha: equality is `=`, not `==`. Field refs need a leading dot: `.city`." + }, + { + "id": "count-01", + "prompt": "How many users are there in total?", + "tables_hint": ["users"], + "check": { "type": "scalar", "expected": "5" }, + "note": "count-all form is `count(users)` (there is no bare `count(*)`)." + }, + { + "id": "count-02", + "prompt": "How many orders have status 'paid'?", + "tables_hint": ["orders"], + "check": { "type": "scalar", "expected": "4" }, + "note": "Filtered aggregate: count(orders filter .status = \"paid\")." + }, + { + "id": "count-03", + "prompt": "How many order_items rows are there?", + "tables_hint": ["order_items"], + "check": { "type": "scalar", "expected": "6" } + }, + { + "id": "count-04", + "prompt": "How many users are active?", + "tables_hint": ["users"], + "check": { "type": "scalar", "expected": "3" }, + "note": "Boolean compare: .active = true." + }, + { + "id": "proj-01", + "prompt": "List the names of all active products.", + "tables_hint": ["products"], + "check": { "type": "rowcount", "expected": 4 }, + "note": "Projection is trailing braces `{ .name }`, not SELECT." + }, + { + "id": "group-01", + "prompt": "Count orders per city; return only cities with at least 2 orders, as city and count `n`.", + "tables_hint": ["orders"], + "check": { "type": "rowcount", "expected": 2 }, + "note": "group + having; having references the alias `n`." + }, + { + "id": "group-02", + "prompt": "For each product that has 2 or more reviews, return the product_id and review count `n`.", + "tables_hint": ["reviews"], + "check": { "type": "rows", "expected": [["1", "2"]] }, + "note": "group + having on reviews; only product 1 has 2 reviews." + }, + { + "id": "join-01", + "prompt": "Inner join users to their orders, returning user name and order total. (Put the smaller table on the right.)", + "tables_hint": ["users", "orders"], + "check": { "type": "rowcount", "expected": 6 }, + "note": "inner join with `as` aliases and `on u.id = o.user_id`." + }, + { + "id": "join-02", + "prompt": "Left join users to sessions, returning every user's name alongside their session token (null where there is no session).", + "tables_hint": ["users", "sessions"], + "check": { "type": "rowcount", "expected": 5 }, + "note": "left join keeps all 5 users; Dave and Erin have null tokens." + }, + { + "id": "subq-01", + "prompt": "Return the names of users who have placed at least one order with a total greater than 500.", + "tables_hint": ["users", "orders"], + "check": { "type": "rows", "expected": [["Alice"], ["Carol"], ["Dave"]] }, + "note": "IN-subquery: users filter .id in (orders filter .total > 500 { .user_id })." + }, + { + "id": "null-01", + "prompt": "Return the names of top-level categories (those with no parent).", + "tables_hint": ["categories"], + "check": { "type": "rows", "expected": [["Books"], ["Electronics"]] }, + "note": "Null check is `.parent_id = null`, not IS NULL." + }, + { + "id": "between-01", + "prompt": "Return the names of users whose age is between 25 and 45 inclusive.", + "tables_hint": ["users"], + "check": { "type": "rows", "expected": [["Alice"], ["Carol"], ["Dave"]] }, + "note": "between is inclusive: .age between 25 and 45." + }, + { + "id": "distinct-01", + "prompt": "Return the distinct category_id values used by products.", + "tables_hint": ["products"], + "check": { "type": "rowcount", "expected": 3 }, + "note": "distinct projection: products distinct { .category_id }." + }, + { + "id": "distinct-02", + "prompt": "How many distinct cities appear in the orders table?", + "tables_hint": ["orders"], + "check": { "type": "scalar", "expected": "3" }, + "note": "count(distinct orders { .city })." + }, + { + "id": "txn-01", + "prompt": "Inside a transaction, insert a throwaway order (id 99, user_id 1, total 1.0, status 'x', city 'Z'), then roll back, then count the orders. The count must be unchanged.", + "tables_hint": ["orders"], + "check": { "type": "scalar", "expected": "6" }, + "note": "begin / insert / rollback / count(orders) as one statement batch separated by `;`." + }, + { + "id": "alter-01", + "prompt": "Add a string column `nickname` to the users table.", + "tables_hint": ["users"], + "check": { "type": "ok" }, + "note": "DDL uses `alter users add column nickname: str` (not ADD COLUMN ... TEXT)." + }, + { + "id": "alter-02", + "prompt": "Create an index on the users.city column.", + "tables_hint": ["users"], + "check": { "type": "ok" }, + "note": "alter users add index .city." + }, + { + "id": "order-01", + "prompt": "Return the name and age of the two oldest users after skipping the single oldest (i.e. order by age descending, offset 1, limit 2).", + "tables_hint": ["users"], + "check": { "type": "rows", "expected": [["Alice", "30"], ["Dave", "28"]] }, + "note": "order .age desc offset 1 limit 2 { .name, .age }." + }, + { + "id": "case-01", + "prompt": "For every user, return a label that is 'adult' when age is 18 or more, otherwise 'minor'.", + "tables_hint": ["users"], + "check": { "type": "rowcount", "expected": 5 }, + "note": "case when .age >= 18 then \"adult\" else \"minor\" end, aliased." + }, + { + "id": "upsert-01", + "prompt": "Upsert an inventory row keyed on product_id: set product_id 1 to quantity 999 in warehouse W1.", + "tables_hint": ["inventory"], + "check": { "type": "ok" }, + "note": "upsert inventory on .product_id { ... }. On this branch upsert works on any column (UNIQUE not yet merged)." + }, + { + "id": "agg-01", + "prompt": "What is the average review rating across all reviews?", + "tables_hint": ["reviews"], + "check": { "type": "scalar", "expected": "4.25" }, + "note": "avg(reviews { .rating })." + }, + { + "id": "agg-02", + "prompt": "What is the total of all order totals for paid orders?", + "tables_hint": ["orders"], + "check": { "type": "scalar", "expected": "3036" }, + "note": "sum(orders filter .status = \"paid\" { .total })." + } +] diff --git a/site/powql.html b/site/powql.html index eb1c758..97a8cde 100644 --- a/site/powql.html +++ b/site/powql.html @@ -71,7 +71,7 @@

Data Types

Null(empty)0 bytesAbsence of a value
-

Fields marked required cannot be null. All other fields are nullable by default. Use is null / is not null to check, and ?? to coalesce.

+

Fields marked required cannot be null. All other fields are nullable by default. Use is null / is not null to check, and ?? to coalesce. A field marked unique (e.g. required unique email: str, modifiers in either order) auto-creates a unique index and rejects duplicate non-null values on insert/update/upsert.

Schema Definition

@@ -304,7 +304,8 @@

DELETE

User delete

UPSERT

-
-- Insert or update on conflict
+    

The on column must be unique (since 0.4.7) — declare it unique in the type or run alter User add unique .email first.

+
-- Insert or update on conflict (.email must be unique)
 upsert User on .email { name := "Alice", email := "alice@example.com", age := 30 }
 
 -- Explicit conflict handling
@@ -342,7 +343,10 @@ 

Alter Table

-- Create an index alter User add index .email -alter User add index .age
+alter User add index .age + +-- Add a unique constraint (scans for existing dups first; fails if any) +alter User add unique .email

Drop Table

drop User
@@ -450,6 +454,16 @@

PowQL vs SQL Cheat Sheet

alter User add index .email CREATE INDEX ON User (email) + + Unique column + type User { unique email: str } + CREATE TABLE User (email TEXT UNIQUE) + + + Add unique + alter User add unique .email + CREATE UNIQUE INDEX ON User (email) + Alias User { full_name: .name } @@ -476,6 +490,7 @@

Key Syntactic Differences

Assignment:== or SET col = val Table definitiontype Name { ... }CREATE TABLE Name (...) Required / NOT NULLrequired field: typefield TYPE NOT NULL + Unique constraintunique field: typefield TYPE UNIQUE String literals"double quotes"'single quotes' Query shapePipeline: Table verb verb { proj }Clausal: SELECT proj FROM Table WHERE ... AggregatesWrapping: count(Table filter ...)Inline: SELECT COUNT(*) FROM ...