Integration tests #97
Three tests gated behind the `integration` build tag that connect to a real Weaviate instance (localhost:50051 / localhost:8080) and exercise the paths that unit tests couldn't reach:

- TestIntegration_QueriesSucceed: insert→query with random vectors, asserts zero failed queries and positive QPS/latency values.
- TestIntegration_RecallForExactNeighbors: queries with exact copies of inserted vectors so the nearest neighbour is always itself; asserts recall >90%.
- TestIntegration_ResultsJSON: verifies the JSON serialiser produces well-formed output with the expected keys after a real query run.

Tests skip automatically when Weaviate is not reachable, so they are safe to run in a standard `go test ./cmd/...` invocation.

Run with:

    go test -tags integration ./cmd/...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When QueryWithNeighbors.Neighbors is empty (e.g. random-vectors mode), neighborLimit was 0, causing a 0.0/0.0 float division that produced NaN. NaN then propagated into the JSON output, crashing serialisation.

Two-part fix:

- processQueueGrpc: skip recall/NDCG computation entirely when the query has no ground truth neighbors, leaving the recall/ndcg slices empty.
- analyze: guard the mean recall/NDCG division against empty slices, returning 0 instead of NaN.

Found by TestIntegration_ResultsJSON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Adds integration tests for the benchmarker that exercise a full insert → query → metrics/output cycle against a real Weaviate instance, and fixes a NaN-producing recall/NDCG edge case that could break JSON serialization.
Changes:
- Added `//go:build integration` tests covering query success, recall on exact matches, and JSON output shape.
- Fixed recall/NDCG computation to avoid NaN when no ground-truth neighbors exist.
- Added `benchmarker/CLAUDE.md` with build/test/run guidance.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| benchmarker/cmd/integration_test.go | New integration test suite (insert/query/recall/JSON) against a live Weaviate instance. |
| benchmarker/cmd/benchmark_run.go | Guards recall/NDCG aggregation to avoid NaN propagation and JSON marshalling failures. |
| benchmarker/CLAUDE.md | Developer documentation for building/running/tests (unit + integration). |
        writeChunk(&Batch{Vectors: vectors, Offset: 0}, &grpcClient, cfg)

        // Give HNSW time to finish indexing before querying.
        time.Sleep(3 * time.Second)
    }

The fixed time.Sleep(3 * time.Second) after inserts is likely to make integration tests flaky on slower machines/CI (indexing may take longer) and unnecessarily slow when indexing is faster. Consider polling for readiness (e.g., waitReady or an object-count check) with a timeout instead of sleeping a constant duration.
    recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
    ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
    log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
    *recall = append(*recall, recallQuery)
    *ndcg = append(*ndcg, ndcgQuery)

len(query.Neighbors) > 0 is not sufficient to avoid division-by-zero/NaN here: if cfg.Limit is 0 (not currently validated against in Config.Validate()), then neighborLimit := min(cfg.Limit, len(query.Neighbors)) becomes 0 and both recall and NDCG computations still divide by zero. Consider guarding on neighborLimit > 0 (or returning 0 when k<=0 in calculateLinearNDCG).
Suggested change:

    - recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
    - ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
    - log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
    - *recall = append(*recall, recallQuery)
    - *ndcg = append(*ndcg, ndcgQuery)
    + if neighborLimit > 0 {
    +     recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
    +     ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
    +     log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
    +     *recall = append(*recall, recallQuery)
    +     *ndcg = append(*ndcg, ndcgQuery)
    + }

    ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
    log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
    *recall = append(*recall, recallQuery)
    *ndcg = append(*ndcg, ndcgQuery)

log.Debugf(...) is executed while holding m.Lock(). Logging can be comparatively slow and increases lock contention for the hot path; consider moving the debug log outside the critical section (compute values first, unlock, then log), while still appending to the shared slices under the mutex.
    // skipIfWeaviateUnavailable skips the test if Weaviate is not reachable.
    // Start a local instance with:
    //
    //	docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
    func skipIfWeaviateUnavailable(t *testing.T, origin string) {
        t.Helper()
        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
        defer cancel()
        conn, err := grpc.DialContext(ctx, origin, grpc.WithInsecure(), grpc.WithBlock()) //nolint:staticcheck
        if err != nil {
            t.Skipf("Weaviate not available at %s: %v", origin, err)
        }
        conn.Close()
    }

skipIfWeaviateUnavailable only checks the gRPC endpoint, but setupTestCollection also needs the HTTP endpoint (createClient uses cfg.HttpOrigin) to create/delete the class. If HTTP is down (or exposed on a different port), these tests will fail/hang instead of skipping. Consider checking both gRPC and the configured HTTP readiness endpoint before proceeding.
    // Delete first (ignore error — collection may not exist yet).
    _ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())

    classObj := &models.Class{
        Class:           cfg.ClassName,
        VectorIndexType: "hnsw",
        VectorIndexConfig: map[string]interface{}{
            "distance":               cfg.DistanceMetric,
            "efConstruction":         float64(cfg.EfConstruction),
            "maxConnections":         float64(cfg.MaxConnections),
            "cleanupIntervalSeconds": cfg.CleanupIntervalSeconds,
            "flatSearchCutoff":       cfg.FlatSearchCutoff,
        },
    }
    err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
    require.NoError(t, err, "create collection %q", cfg.ClassName)

    t.Cleanup(func() {
        _ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())
    })

setupTestCollection uses context.Background() for schema create/delete calls. If Weaviate is partially reachable or misconfigured, these can hang indefinitely and stall the test suite. Consider using a context with timeout (similar to the gRPC dial) for schema operations as well.
Summary
Adds integration tests that exercise the full insert→query cycle against a real Weaviate instance, covering paths that unit tests couldn't reach (data loading, gRPC querying, recall computation, JSON serialisation).
In writing the tests, a pre-existing bug was found and fixed: when no ground truth neighbors are provided (e.g. random-vectors mode), neighborLimit is 0, causing a 0.0/0.0 float division that produces NaN. That NaN then propagates into the JSON output and crashes serialisation with `json: unsupported value: NaN`.
Changes
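cmd/integration_test.go adds three tests gated behind //go:build integration:
- TestIntegration_QueriesSucceed: insert→query with random vectors; asserts zero failed queries and positive QPS/latency values.
- TestIntegration_RecallForExactNeighbors: queries with exact copies of inserted vectors; asserts recall >90%.
- TestIntegration_ResultsJSON: verifies the JSON serialiser produces well-formed output with the expected keys.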
Tests skip automatically if Weaviate is not reachable, so a plain go test ./cmd/... is unaffected.
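cmd/benchmark_run.go fixes the NaN recall/NDCG bug in two places:
- processQueueGrpc: skips recall/NDCG computation when the query has no ground truth neighbors.
- analyze: guards the mean recall/NDCG division against empty slices, returning 0 instead of NaN.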
benchmarker/CLAUDE.md — new file documenting build commands, architecture, and how to run both unit and integration tests.
How to run the integration tests
    docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
    go test -tags integration ./cmd/...