Integration tests by rlmanrique · Pull Request #97 · weaviate/weaviate-benchmarking

rlmanrique · 2026-03-30T13:56:49Z

Summary

Adds integration tests that exercise the full insert→query cycle against a real Weaviate instance, covering paths that unit tests couldn't reach (data loading, gRPC querying, recall computation, JSON serialisation).

While writing the tests, a pre-existing bug was found and fixed: when no ground-truth neighbors are provided (e.g. random-vectors mode), neighborLimit is 0, causing a 0.0/0.0 float division that produces NaN. That NaN then propagates into the JSON output and crashes serialisation with json: unsupported value: NaN.

Changes

cmd/integration_test.go — three tests gated behind //go:build integration:

TestIntegration_QueriesSucceed — inserts 300 vectors, runs 50 random-vector queries, asserts zero failures and positive QPS/latency.
TestIntegration_RecallForExactNeighbors — queries with exact copies of inserted vectors (ground truth = the vector itself), asserts recall > 90%.
TestIntegration_ResultsJSON — runs a real query cycle and verifies the JSON serialiser produces well-formed output with the expected keys.

Tests skip automatically if Weaviate is not reachable, so a plain go test ./cmd/... is unaffected.

cmd/benchmark_run.go — bug fix for NaN recall/NDCG:

processQueueGrpc: skip recall/NDCG computation when query.Neighbors is empty instead of dividing by zero.
analyze: guard mean recall/NDCG against empty slices, returning 0 instead of NaN.
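The second half of the fix, guarding the mean against empty slices, might look roughly like the sketch below. `meanOrZero` is a hypothetical helper name; the actual change in analyze may be structured differently.

```go
package main

import "fmt"

// meanOrZero returns the arithmetic mean of values, or 0 for an empty slice,
// so that an empty recall/NDCG slice never produces 0/0.
func meanOrZero(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

func main() {
	fmt.Println(meanOrZero(nil))                // empty input, guarded
	fmt.Println(meanOrZero([]float64{1, 0.5})) // regular input
}
```

Returning 0 keeps the output serialisable, at the cost of making "no ground truth" indistinguishable from "zero recall" in the JSON.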

benchmarker/CLAUDE.md — new file documenting build commands, architecture, and how to run both unit and integration tests.

How to run the integration tests

docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
go test -tags integration ./cmd/...

Three tests gated behind the `integration` build tag that connect to a real Weaviate instance (localhost:50051 / localhost:8080) and exercise the paths that unit tests couldn't reach:

- TestIntegration_QueriesSucceed: insert→query with random vectors, asserts zero failed queries and positive QPS/latency values.
- TestIntegration_RecallForExactNeighbors: queries with exact copies of inserted vectors so the nearest neighbour is always itself; asserts recall >90%.
- TestIntegration_ResultsJSON: verifies the JSON serialiser produces well-formed output with the expected keys after a real query run.

Tests skip automatically when Weaviate is not reachable, so they are safe to run in a standard `go test ./cmd/...` invocation.

Run with: go test -tags integration ./cmd/...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When QueryWithNeighbors.Neighbors is empty (e.g. random-vectors mode), neighborLimit was 0, causing a 0.0/0.0 float division that produced NaN. NaN then propagated into the JSON output, crashing serialisation.

Two-part fix:

- processQueueGrpc: skip recall/NDCG computation entirely when the query has no ground truth neighbors, leaving the recall/ndcg slices empty.
- analyze: guard the mean recall/NDCG division against empty slices, returning 0 instead of NaN.

Found by TestIntegration_ResultsJSON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

orca-security-eu

Orca Security Scan Summary

Status	Check	Issues by priority
Passed	Infrastructure as Code	0 0 0 0
Passed	SAST	0 0 0 0
Passed	Secrets	0 0 0 0
Passed	Vulnerabilities	0 0 0 0

Copilot

Pull request overview

Adds integration tests for the benchmarker that exercise a full insert → query → metrics/output cycle against a real Weaviate instance, and fixes a NaN-producing recall/NDCG edge case that could break JSON serialization.

Changes:

Added //go:build integration tests covering query success, recall on exact matches, and JSON output shape.
Fixed recall/NDCG computation to avoid NaN when no ground-truth neighbors exist.
Added benchmarker/CLAUDE.md with build/test/run guidance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
benchmarker/cmd/integration_test.go	New integration test suite (insert/query/recall/JSON) against a live Weaviate instance.
benchmarker/cmd/benchmark_run.go	Guards recall/NDCG aggregation to avoid NaN propagation and JSON marshalling failures.
benchmarker/CLAUDE.md	Developer documentation for building/running/tests (unit + integration).


Copilot · 2026-03-30T14:10:50Z

+	writeChunk(&Batch{Vectors: vectors, Offset: 0}, &grpcClient, cfg)
+
+	// Give HNSW time to finish indexing before querying.
+	time.Sleep(3 * time.Second)
+}


The fixed time.Sleep(3 * time.Second) after inserts is likely to make integration tests flaky on slower machines/CI (indexing may take longer) and unnecessarily slow when indexing is faster. Consider polling for readiness (e.g., waitReady or an object-count check) with a timeout instead of sleeping a constant duration.

Copilot · 2026-03-30T14:10:50Z

+			recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
+			ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
+			log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
+			*recall = append(*recall, recallQuery)
+			*ndcg = append(*ndcg, ndcgQuery)


len(query.Neighbors) > 0 is not sufficient to avoid division-by-zero/NaN here: if cfg.Limit is 0 (not currently validated against in Config.Validate()), then neighborLimit := min(cfg.Limit, len(query.Neighbors)) becomes 0 and both recall and NDCG computations still divide by zero. Consider guarding on neighborLimit > 0 (or returning 0 when k<=0 in calculateLinearNDCG).

Suggested change

-			recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
-			ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
-			log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
-			*recall = append(*recall, recallQuery)
-			*ndcg = append(*ndcg, ndcgQuery)
+			if neighborLimit > 0 {
+				recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
+				ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
+				log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
+				*recall = append(*recall, recallQuery)
+				*ndcg = append(*ndcg, ndcgQuery)
+			}
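An alternative to guarding at the call site is to push the check into the metric itself, so that both an empty ground truth and a zero limit are handled in one place. The sketch below assumes `uint64` IDs and uses a hypothetical `recallAtK` helper; it is not the repository's `intersection`-based code.

```go
package main

import "fmt"

// recallAtK returns the fraction of the first k ground-truth IDs present in
// got, where k = min(limit, len(truth)). ok is false when k <= 0, meaning
// there is nothing to score and the caller should skip the sample.
func recallAtK(got, truth []uint64, limit int) (recall float64, ok bool) {
	k := limit
	if len(truth) < k {
		k = len(truth)
	}
	if k <= 0 {
		return 0, false
	}
	want := make(map[uint64]struct{}, k)
	for _, id := range truth[:k] {
		want[id] = struct{}{}
	}
	hits := 0
	for _, id := range got {
		if _, found := want[id]; found {
			hits++
			delete(want, id) // count each ground-truth ID at most once
		}
	}
	return float64(hits) / float64(k), true
}

func main() {
	r, ok := recallAtK([]uint64{1, 2, 3}, []uint64{1, 2, 4}, 3)
	fmt.Println(r, ok)

	// Limit of zero: no division is attempted.
	_, ok = recallAtK([]uint64{1, 2, 3}, []uint64{1, 2, 4}, 0)
	fmt.Println(ok)
}
```

Returning an explicit `ok` flag lets the caller decide whether to skip the sample, rather than silently recording a zero.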

Copilot · 2026-03-30T14:10:51Z

+			ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
+			log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
+			*recall = append(*recall, recallQuery)
+			*ndcg = append(*ndcg, ndcgQuery)


log.Debugf(...) is executed while holding m.Lock(). Logging can be comparatively slow and increases lock contention for the hot path; consider moving the debug log outside the critical section (compute values first, unlock, then log), while still appending to the shared slices under the mutex.
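The restructuring suggested here, compute first, append under the lock, log afterwards, can be sketched as follows. The `metrics` type and `record` method are hypothetical; the real code passes slice pointers and a shared mutex into processQueueGrpc.

```go
package main

import (
	"fmt"
	"sync"
)

// metrics holds per-query samples shared between worker goroutines.
type metrics struct {
	mu     sync.Mutex
	recall []float64
	ndcg   []float64
}

// record appends both samples under the lock and does nothing else there,
// keeping the critical section as short as possible.
func (m *metrics) record(recall, ndcg float64) {
	m.mu.Lock()
	m.recall = append(m.recall, recall)
	m.ndcg = append(m.ndcg, ndcg)
	m.mu.Unlock()
}

func main() {
	var m metrics
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Placeholder values, computed before taking the lock.
			r, n := float64(i)/4, float64(i)/8
			m.record(r, n)
			// Logging happens after the lock has been released.
			fmt.Printf("recall %f, ndcg %f\n", r, n)
		}(i)
	}
	wg.Wait()
	fmt.Println(len(m.recall), len(m.ndcg))
}
```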

Copilot · 2026-03-30T14:10:51Z

+// skipIfWeaviateUnavailable skips the test if Weaviate is not reachable.
+// Start a local instance with:
+//
+//	docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
+func skipIfWeaviateUnavailable(t *testing.T, origin string) {
+	t.Helper()
+	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+	defer cancel()
+	conn, err := grpc.DialContext(ctx, origin, grpc.WithInsecure(), grpc.WithBlock()) //nolint:staticcheck
+	if err != nil {
+		t.Skipf("Weaviate not available at %s: %v", origin, err)
+	}
+	conn.Close()
+}


skipIfWeaviateUnavailable only checks the gRPC endpoint, but setupTestCollection also needs the HTTP endpoint (createClient uses cfg.HttpOrigin) to create/delete the class. If HTTP is down (or exposed on a different port), these tests will fail/hang instead of skipping. Consider checking both gRPC and the configured HTTP readiness endpoint before proceeding.

Copilot · 2026-03-30T14:10:51Z

+	// Delete first (ignore error — collection may not exist yet).
+	_ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())
+
+	classObj := &models.Class{
+		Class:           cfg.ClassName,
+		VectorIndexType: "hnsw",
+		VectorIndexConfig: map[string]interface{}{
+			"distance":               cfg.DistanceMetric,
+			"efConstruction":         float64(cfg.EfConstruction),
+			"maxConnections":         float64(cfg.MaxConnections),
+			"cleanupIntervalSeconds": cfg.CleanupIntervalSeconds,
+			"flatSearchCutoff":       cfg.FlatSearchCutoff,
+		},
+	}
+	err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
+	require.NoError(t, err, "create collection %q", cfg.ClassName)
+
+	t.Cleanup(func() {
+		_ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())
+	})


setupTestCollection uses context.Background() for schema create/delete calls. If Weaviate is partially reachable or misconfigured, these can hang indefinitely and stall the test suite. Consider using a context with timeout (similar to the gRPC dial) for schema operations as well.

rlmanrique and others added 3 commits March 30, 2026 15:50

Add CLAUDE.md and document integration test instructions

a00415c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

orca-security-eu Bot reviewed Mar 30, 2026


rlmanrique requested a review from Copilot March 30, 2026 14:05

Copilot started reviewing on behalf of rlmanrique March 30, 2026 14:05

Copilot AI reviewed Mar 30, 2026


trengrj approved these changes Mar 31, 2026


rlmanrique merged commit 46b0922 into main Mar 31, 2026
9 checks passed

rlmanrique deleted the integration-tests branch March 31, 2026 07:10
