Use dolt JSON encoding by zachmu · Pull Request #2639 · dolthub/doltgresql

zachmu · 2026-04-24T22:17:18Z

This gives us the storage, merge, and perf benefits for JSON in doltgres.

This change also results in a behavior change for ORDER BY on JSONB columns. Postgres has a particular b-tree order it uses to store JSON documents in order to efficient perform its lookups. Dolt's method for efficient JSON retrieval is quite different, and has nothing to do with the order in which documents are stored in a primary index.

github-actions · 2026-04-24T22:29:15Z

	Main	PR
covering_index_scan_postgres	1330.66/s	1318.26/s	-1.0%
index_join_postgres	197.65/s	195.56/s	-1.1%
index_join_scan_postgres	208.27/s	209.81/s	+0.7%
index_scan_postgres	12.04/s	12.17/s	+1.0%
oltp_point_select	2323.20/s	2335.44/s	+0.5%
oltp_read_only	1869.89/s	1878.91/s	+0.4%
select_random_points	131.92/s	130.20/s	-1.4%
select_random_ranges	1069.44/s	1071.54/s	+0.1%
table_scan_postgres	11.75/s	11.82/s	+0.5%
types_table_scan_postgres	5.44/s	${\color{red}4.72/s}$	${\color{red}-13.3\%}$

github-actions · 2026-04-24T23:49:58Z

	Main	PR
Total	42090	42090
Successful	17621	17707
Failures	24469	24383
Partial Successes¹	5352	5378

	Main	PR
Successful	41.8651%	42.0694%
Failures	58.1349%	57.9306%

${\color{red}Regressions (2)}$

json_encoding

QUERY:          SELECT '"\u0000"'::json;
RECEIVED ERROR: row sets differ:
    Postgres:
        {""}
    Doltgres:
        {"\"\""}

jsonb

QUERY:          SELECT count(*) FROM testjsonb WHERE j > '{"p":1}';
RECEIVED ERROR: row sets differ:
    Postgres:
        {884}
    Doltgres:
        {174}

${\color{lightgreen}Progressions (90)}$

domain

QUERY: drop domain ddef1 restrict;

QUERY: drop domain ddef2 restrict;

QUERY: drop domain ddef3 restrict;

QUERY: drop domain ddef4 restrict;

QUERY: drop sequence ddef4_seq;

QUERY: create domain vchar4 varchar(4);

QUERY: create domain dinter vchar4 check (substring(VALUE, 1, 1) = 'x');

QUERY: create domain dtop dinter check (substring(VALUE, 2, 1) = '1');

QUERY: create temp table dtest(f1 dtop);

QUERY: drop table dtest;

QUERY: create domain str_domain as text not null;

QUERY: create table domain_test (a int, b int);

QUERY: insert into domain_test values (1, 2);

QUERY: insert into domain_test values (1, 2);

QUERY: create domain pos_int as int4 check (value > 0) not null;

QUERY: create function doubledecrement(p1 pos_int) returns pos_int as $$
declare v pos_int;
begin
    return p1;
end$$ language plpgsql;

QUERY: create or replace function doubledecrement(p1 pos_int) returns pos_int as $$
declare v pos_int := 0;
begin
    return p1;
end$$ language plpgsql;

QUERY: create or replace function doubledecrement(p1 pos_int) returns pos_int as $$
declare v pos_int := 1;
begin
    v := p1 - 1;
    return v - 1;
end$$ language plpgsql;

QUERY: create type ddtest1 as (f1 posint);

QUERY: create table ddtest2(f1 ddtest1);

QUERY: drop table ddtest2;

QUERY: create table ddtest2(f1 ddtest1[]);

QUERY: drop table ddtest2;

QUERY: create domain ddtest1d as ddtest1;

QUERY: create table ddtest2(f1 ddtest1d);

QUERY: drop table ddtest2;

QUERY: drop domain ddtest1d;

QUERY: create domain ddtest1d as ddtest1[];

QUERY: create table ddtest2(f1 ddtest1d);

QUERY: drop table ddtest2;

QUERY: drop domain ddtest1d;

QUERY: create domain posint2 as posint check (value % 2 = 0);

QUERY: create table ddtest2(f1 posint2);

QUERY: drop table ddtest2;

QUERY: drop type ddtest1;

QUERY: create or replace function array_elem_check(numeric) returns numeric as $$
declare
  x numeric(4,2)[1];
begin
  x[1] := $1;
  return x[1];
end$$ language plpgsql;

QUERY: create domain mynums as numeric(4,2)[1];

QUERY: create or replace function array_elem_check(numeric) returns numeric as $$
declare
  x mynums;
begin
  x[1] := $1;
  return x[1];
end$$ language plpgsql;

QUERY: create domain mynums2 as mynums;

QUERY: create or replace function array_elem_check(numeric) returns numeric as $$
declare
  x mynums2;
begin
  x[1] := $1;
  return x[1];
end$$ language plpgsql;

These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct. ↩

nicktobey · 2026-04-28T01:14:21Z

 		Function: func(ctx *sql.Context, val any, targetType *pgtypes.DoltgresType) (any, error) {
-			return targetType.IoInput(ctx, val.(string))
+			switch v := val.(type) {
+			case string:


Instead of being a string, we might also encounter a type that implements sql.Wrapper[string], such as TextStorage.

nicktobey · 2026-04-28T01:25:43Z

+	switch v := val.(type) {
+	case sql.JSONWrapper:
+		return v.ToInterface(ctx)
+	case string:


Do we need to handle sql.Wrapper[string] too?

nicktobey · 2026-04-28T01:31:00Z

-					Value: pgtypes.JsonValueCopy(item.Value),
-				})
-			}
+	wrapper1, ok1 := val1Interface.(sql.JSONWrapper)


Not a blocker, but as written, this doesn't benefit from JSON optimizations and fully deserializes both inputs.

It's likely that for most inputs, there's not much performance to be gained. But concatenating two arrays would optimize very well and I imagine is a common use case. To get that benefit, we would want to add a method to types.MutableJSON and implement it in IndexedJsonDocument. Then we can check if the inputs implement MutableJSON and delegate to that method.

nicktobey · 2026-04-28T01:36:55Z

+		if !ok {
+			return nil, nil
+		}
+		v, err := wrapper.ToInterface(ctx)


So in general, every time we call ToInterface, we defeat all JSON optimizations because we have to fully load the value.

If we want to benefit from JSON optimizations, we need to check if the input implements one of the interfaces that IndexedJsonDocument implements: ComparableJSON, MutableJSON, and SearchableJSON

itoqa · 2026-05-05T01:17:05Z

Ito Test Report ❌

24 test cases ran. 2 failed, 22 passed.

Across 24 executed JSON/JSONB test cases, 22 passed and 2 failed, with successful verification of legacy/new migration interoperability, restart and rewrite parity with no semantic drift, large nested payload stability, deterministic ORDER BY and comparator/operator consistency, extraction edge cases (including negative indexes and controlled type errors), casting behavior (including overflow handling and concurrent workloads), merge/concurrency semantics, and targeted coverage reruns after environmental repair. The key confirmed defects were a High-severity panic path for short invalid JSON in shared json_in/jsonb_in handling (unguarded input[:10] slicing) and a Medium-severity precision-loss issue where very large numeric literals collapse and compare equal, although a later rerun of short invalid inputs length 0–9 in both routes completed without panic and preserved session usability.

❌ Failed (2)

Category	Summary	Screenshot
Migration	⚠️ Short invalid JSON input triggers a panic via unguarded slicing in `json_in_callable`.
Ordering	🟠 Distinct very large JSON numeric literals collapse to one stored value and compare equal.

⚠️ Short invalid JSON input panics

What failed: Instead of returning only a controlled invalid JSON syntax error, the server executes a panic path (slice bounds out of range) for short invalid input.
Impact: Invalid user input can trigger server panic handling on a core SQL type-input path, risking query interruption and noisy failure behavior in shared sessions. This affects both json and jsonb input paths because jsonb_in reuses the same callable.
Steps to reproduce:
1. Connect to the database with psql.
2. Run SELECT ''::json; using a short invalid JSON literal.
3. Observe the panic trace and then run SELECT 1; to verify the process remained up after panic handling.
Stub / mock context: Authentication was temporarily bypassed by setting server/authentication_scram.go EnableAuthentication to false so local SQL migration checks could run deterministically; the failing behavior itself comes from production JSON parsing logic and not from mocked SQL responses.
Code analysis: Reviewed JSON input functions and panic handling in the server. json_in_callable slices input[:10] unconditionally on parse errors, which panics for strings shorter than 10 characters; jsonb_in points directly to the same callable, expanding blast radius.
Why this is likely a bug: The panic is directly caused by deterministic unsafe slicing in production input-validation code, not by test-only mocking or setup behavior.

Relevant code:

server/functions/json.go (lines 58-65)

func json_in_callable(ctx *sql.Context, _ [2]*pgtypes.DoltgresType, val any) (any, error) {
	input := val.(string)
	var jsonVal any
	err := json.Unmarshal(unsafe.Slice(unsafe.StringData(input), len(input)), &jsonVal)
	if err != nil {
		return nil, pgtypes.ErrInvalidSyntaxForType.New("json", input[:10]+"...")
	}

server/functions/jsonb.go (lines 40-47)

// jsonb_in represents the PostgreSQL function of jsonb type IO input.
var jsonb_in = framework.Function1{
	Name:       "jsonb_in",
	Return:     pgtypes.JsonB,
	Parameters: [1]*pgtypes.DoltgresType{pgtypes.Cstring},
	Strict:     true,
	Callable:   json_in_callable,
}

server/doltgres_handler.go (lines 631-636)

pan2err := func(err *error) {
	if HandlePanics {
		if recoveredPanic := recover(); recoveredPanic != nil {
			// debug.Stack() here prints the stack trace of the original panic, not the lexical stack of this defer function
			*err = goerrors.Join(*err, errors.Errorf("DoltgresHandler caught panic: %v: %s", recoveredPanic, debug.Stack()))
		}

🟠 Large JSON numerics lose precision

What failed: Distinct large numeric inputs are normalized to the same stored value and jsonb_cmp returns equality (0) between different literals, but they should remain distinguishable and ordered consistently.
Impact: Queries that rely on JSONB numeric precision for large boundary values can return incorrect ordering and equality results. This can misclassify rows in comparison-dependent workflows until values are represented with precision-safe types.
Steps to reproduce:
1. Insert multiple distinct large JSONB numeric literals beyond float64 precision into a table.
2. Query the stored values and run pairwise jsonb_cmp comparisons across inserted rows.
3. Observe that distinct inputs collapse to the same value and compare equal.
Stub / mock context: Database authentication was temporarily bypassed by setting EnableAuthentication to false in server/authentication_scram.go so local SQL checks could run without SCRAM login. No stubs or mocks were applied to JSONB ordering/comparator logic for this test.
Code analysis: I inspected the JSONB input and comparator code paths and found the production path parses JSON into generic interface values (which uses float-backed numbers), then compares wrappers via CompareJSON; this creates a plausible precision-loss path before comparison.
Why this is likely a bug: The production JSONB input path converts numeric literals through generic JSON unmarshalling before comparison, which can collapse very large numbers and produce false equality/order results for distinct inputs.

Relevant code:

server/functions/jsonb.go (lines 40-47)

// jsonb_in represents the PostgreSQL function of jsonb type IO input.
var jsonb_in = framework.Function1{
	Name:       "jsonb_in",
	Return:     pgtypes.JsonB,
	Parameters: [1]*pgtypes.DoltgresType{pgtypes.Cstring},
	Strict:     true,
	Callable:   json_in_callable,
}

server/functions/json.go (lines 58-65)

func json_in_callable(ctx *sql.Context, _ [2]*pgtypes.DoltgresType, val any) (any, error) {
	input := val.(string)
	var jsonVal any
	err := json.Unmarshal(unsafe.Slice(unsafe.StringData(input), len(input)), &jsonVal)
	if err != nil {
		return nil, pgtypes.ErrInvalidSyntaxForType.New("json", input[:10]+"...")
	}
	return types.JSONDocument{Val: jsonVal}, nil
}

server/types/type.go (lines 318-320)

case sql.JSONWrapper:
	res, err := types.CompareJSON(ctx, ab, v2)
	return res, err

✅ Passed (22)

Category	Summary	Screenshot
Casting	Assignment-style and explicit json::jsonb casts on string and wrapper JSON values preserved logical JSON content across all checked rows.	N/A
Casting	Out-of-range JSON numeric casts to smallint, integer, and bigint returned target-specific overflow errors without truncation.	N/A
Casting	Concurrent cast-heavy reads and writes completed without intermittent unexpected type errors.	N/A
Coverage	Normalization handles JSON/JSONB wrapper elements; prior blocker was environmental, and rerun passed twice.
Coverage	JSONB comparison matrix coverage is present and rerun subtests passed after environment repair.
Extraction	Operators -> and #> returned matching nested object, path element, and array index values for both json and jsonb columns.
Extraction	Unsupported JSON operator operand types produced controlled errors, and a valid extraction query succeeded immediately after in the same session.
Extraction	SQL verification confirmed ->> and #>> formatting behavior: strings are unquoted text, JSON null becomes empty text, and scalar/object values are text-encoded consistently for json and jsonb.
Extraction	Negative indexing behaved correctly for both json and jsonb: -len returned first element (10), -(len+1) returned NULL, and -1 returned last element (30) for both -> and #> paths.
Merge	Concatenating left_obj with right_obj on the overlap fixture produced same=20 (right-hand overwrite) while preserving non-collision keys (k=1, r=2).	N/A
Merge	Under concurrent conflicting concatenation updates on the same row, conflict resolution produced a valid serializable outcome (one committed full object) and no hybrid or partially merged structure.	N/A
Merge	Legacy duplicate-key fixture behavior was stable pre/post rewrite: duplicate-key resolution and concatenation outputs remained consistent (`dup` observed as 2 in payload, concatenation with right object produced 3 both times).	N/A
Migration	Legacy JSONB compatibility path is present and the earlier block was environmental.
Migration	JSON and JSONB rows persisted across restart with unchanged values.
Migration	Large nested JSON insert and repeated conversion queries completed without instability.
Migration	Length 0-9 invalid payloads were rejected in both json/jsonb paths across two sessions, and immediate valid follow-up queries succeeded after each error, indicating no panic or dropped session.
Migration	A consolidated SQL query combining extraction, cast-to-text, comparison predicates, and ORDER BY executed successfully across mixed legacy/new JSONB rows; both subsets were present with no query abort or silent row drop.
Migration	Legacy JSONB rows were snapshotted, rewritten, server-restarted, and revalidated with zero semantic drift (drift_count=0; all rows unchanged pre/post rewrite and restart).
Ordering	Re-run confirmed deterministic ASC/DESC ORDER BY behavior for the canonical JSONB corpus.
Ordering	Comparator sign and relational operators remained aligned with zero mismatches.
Ordering	GROUP BY and equality JOIN checks matched expected JSONB equality semantics.
Ordering	Comparator registration and runtime operator behavior stayed coherent with no contradictions.

Commit: 3d4c9d8

View Full Run

Tell us how we did: Give Ito Feedback

zachmu added 17 commits April 9, 2026 15:42

using indexedjsondocument

5e6ea56

merge main

93cd5f9

fix test harness for json

f7648e1

always use json types in json func output

6409b07

more json conversion

9c0c972

converting more jsonb stuff

c7679a1

more jsonb conversion

fa5f48d

Merge branch 'main' into zachmu/json-enc

e06e03d

fixed jsonb_object_field

2ea3f7a

merge dev branch

ef86f58

turn json encoding back on

d39a2f4

merge dev branch

6324c85

turn json encoding back on

298b82c

Fixed remaining areas of JSON support

7c74053

formatting

c2cf3e1

linting

a67af1d

formatting

6a575ad

zachmu added 4 commits April 24, 2026 15:41

unfocus test

58b76e1

unfocus test

de02e2e

removed JSONB ordering test

94133f2

deleted unused json document type

cf79f4a

zachmu requested a review from nicktobey April 24, 2026 23:30

Merge branch 'main' into zachmu/json-enc

09ca8e5

zachmu and others added 5 commits April 27, 2026 12:48

restore deleted json_document.go

47f98ac

Restore legacy ser/deser paths

ad4e7c7

[ga-format-pr] Run scripts/format_repo.sh

589098a

new dolt

13249bf

merge main

f648270

zachmu added 2 commits April 27, 2026 15:37

new dolt

af13d08

revert latest dolt changes

3d4c9d8

nicktobey reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use dolt JSON encoding#2639

Use dolt JSON encoding#2639
zachmu wants to merge 29 commits into
mainfrom
zachmu/json-enc

zachmu commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

nicktobey Apr 28, 2026

Uh oh!

nicktobey Apr 28, 2026

Uh oh!

nicktobey Apr 28, 2026

Uh oh!

nicktobey Apr 28, 2026

Uh oh!

itoqa Bot commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zachmu commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

${\color{red}Regressions (2)}$

json_encoding

jsonb

${\color{lightgreen}Progressions (90)}$

domain

Footnotes

Uh oh!

nicktobey Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

nicktobey Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

nicktobey Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

nicktobey Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

itoqa Bot commented May 5, 2026

Ito Test Report ❌

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zachmu commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading

github-actions Bot commented Apr 24, 2026 •

edited

Loading