Integration tests#97

Merged
rlmanrique merged 3 commits into main from integration-tests on Mar 31, 2026

@rlmanrique
Contributor

Summary

Adds integration tests that exercise the full insert→query cycle against a real Weaviate instance, covering paths that unit tests couldn't reach (data loading, gRPC querying, recall computation, JSON serialisation).

While writing the tests, a pre-existing bug was found and fixed: when no ground truth neighbors are provided (e.g. random-vectors mode), neighborLimit is 0, causing a 0.0/0.0 float division that produces NaN. That NaN then propagates into the JSON output and crashes serialisation with json: unsupported value: NaN.
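
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the benchmarker's code: the `result` type and the helper names `naiveRecall` and `marshalRecall` are hypothetical and exist only to show how an unguarded division feeds NaN into `encoding/json`.

```go
package main

import (
	"encoding/json"
	"math"
)

// result stands in for a benchmark summary; the field name is hypothetical.
type result struct {
	Recall float64 `json:"recall"`
}

// naiveRecall divides matches by the neighbour limit without any guard.
// With matches == 0 and neighborLimit == 0 this is 0.0/0.0, which is NaN
// for floating-point operands (no panic is raised).
func naiveRecall(matches, neighborLimit int) float64 {
	return float64(matches) / float64(neighborLimit)
}

// marshalRecall tries to serialise the value and reports whether it was
// NaN, together with the error text from encoding/json ("" on success).
func marshalRecall(r float64) (isNaN bool, errText string) {
	_, err := json.Marshal(result{Recall: r})
	if err != nil {
		errText = err.Error()
	}
	return math.IsNaN(r), errText
}
```

Calling `marshalRecall(naiveRecall(0, 0))` reports a NaN input and the same `json: unsupported value: NaN` message quoted in the summary, while a finite value such as `naiveRecall(1, 2)` marshals without error.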

Changes

cmd/integration_test.go — three tests gated behind //go:build integration:

  • TestIntegration_QueriesSucceed — inserts 300 vectors, runs 50 random-vector queries, asserts zero failures and positive QPS/latency.
  • TestIntegration_RecallForExactNeighbors — queries with exact copies of inserted vectors (ground truth = the vector itself), asserts recall > 90%.
  • TestIntegration_ResultsJSON — runs a real query cycle and verifies the JSON serialiser produces well-formed output with the expected keys.

Tests skip automatically if Weaviate is not reachable, so a plain go test ./cmd/... is unaffected.

cmd/benchmark_run.go — bug fix for NaN recall/NDCG:

  • processQueueGrpc: skip recall/NDCG computation when query.Neighbors is empty instead of dividing by zero.
  • analyze: guard mean recall/NDCG against empty slices, returning 0 instead of NaN.
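
The second part of the fix, guarding the mean against an empty slice, could look roughly like the sketch below. It illustrates the idea only; the helper name `meanOrZero` is an assumption and does not appear in cmd/benchmark_run.go.

```go
package main

// meanOrZero returns the arithmetic mean of xs. When xs is empty (for
// example because no query carried ground truth neighbours) it returns 0
// instead of evaluating 0/0, which would be NaN.
func meanOrZero(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}
```

Returning 0 keeps the output serialisable, at the cost that "no ground truth" and "zero recall" become indistinguishable in the reported number.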

benchmarker/CLAUDE.md — new file documenting build commands, architecture, and how to run both unit and integration tests.

How to run the integration tests

docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
go test -tags integration ./cmd/...

rlmanrique and others added 3 commits March 30, 2026 15:50
Three tests gated behind the `integration` build tag that connect to a
real Weaviate instance (localhost:50051 / localhost:8080) and exercise
the paths that unit tests couldn't reach:

- TestIntegration_QueriesSucceed: insert→query with random vectors,
  asserts zero failed queries and positive QPS/latency values.
- TestIntegration_RecallForExactNeighbors: queries with exact copies of
  inserted vectors so the nearest neighbour is always itself; asserts
  recall >90%.
- TestIntegration_ResultsJSON: verifies the JSON serialiser produces
  well-formed output with the expected keys after a real query run.

Tests skip automatically when Weaviate is not reachable, so they are
safe to run in a standard `go test ./cmd/...` invocation. Run with:

  go test -tags integration ./cmd/...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When QueryWithNeighbors.Neighbors is empty (e.g. random-vectors mode),
neighborLimit was 0, causing a 0.0/0.0 float division that produced NaN.
NaN then propagated into the JSON output, crashing serialisation.

Two-part fix:
- processQueueGrpc: skip recall/NDCG computation entirely when the query
  has no ground truth neighbors, leaving the recall/ndcg slices empty.
- analyze: guard the mean recall/NDCG division against empty slices,
  returning 0 instead of NaN.

Found by TestIntegration_ResultsJSON.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@orca-security-eu orca-security-eu Bot left a comment

Orca Security Scan Summary

| Status | Check | Issues by priority |
|--------|-------|--------------------|
| Passed | Infrastructure as Code | high 0, medium 0, low 0, info 0 |
| Passed | SAST | high 0, medium 0, low 0, info 0 |
| Passed | Secrets | high 0, medium 0, low 0, info 0 |
| Passed | Vulnerabilities | high 0, medium 0, low 0, info 0 |

Copilot AI left a comment

Pull request overview

Adds integration tests for the benchmarker that exercise a full insert → query → metrics/output cycle against a real Weaviate instance, and fixes a NaN-producing recall/NDCG edge case that could break JSON serialization.

Changes:

  • Added //go:build integration tests covering query success, recall on exact matches, and JSON output shape.
  • Fixed recall/NDCG computation to avoid NaN when no ground-truth neighbors exist.
  • Added benchmarker/CLAUDE.md with build/test/run guidance.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
|------|-------------|
| benchmarker/cmd/integration_test.go | New integration test suite (insert/query/recall/JSON) against a live Weaviate instance. |
| benchmarker/cmd/benchmark_run.go | Guards recall/NDCG aggregation to avoid NaN propagation and JSON marshalling failures. |
| benchmarker/CLAUDE.md | Developer documentation for building, running, and testing (unit + integration). |

Comment on lines +112 to +116
	writeChunk(&Batch{Vectors: vectors, Offset: 0}, &grpcClient, cfg)

	// Give HNSW time to finish indexing before querying.
	time.Sleep(3 * time.Second)
}

Copilot AI Mar 30, 2026

The fixed time.Sleep(3 * time.Second) after inserts is likely to make integration tests flaky on slower machines/CI (indexing may take longer) and unnecessarily slow when indexing is faster. Consider polling for readiness (e.g., waitReady or an object-count check) with a timeout instead of sleeping a constant duration.
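
One way to poll with a timeout instead of sleeping is sketched below. The helper `waitFor` is hypothetical and not part of the PR. The readiness condition itself (for example, comparing an object count against the number of inserted vectors) is left to the caller, because how to obtain that count depends on the client API and is not shown in this conversation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFor calls cond immediately and then once per interval until cond
// returns true, cond returns an error, or timeout elapses.
// interval must be positive, because time.NewTicker panics otherwise.
func waitFor(timeout, interval time.Duration, cond func() (bool, error)) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		ok, err := cond()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("condition not met within %s: %w", timeout, ctx.Err())
		case <-ticker.C:
			// Next attempt.
		}
	}
}
```

With a helper like this the test returns as soon as the condition holds on a fast machine, and fails with a clear timeout error on a slow one rather than querying a half-built index.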

Copilot uses AI. Check for mistakes.
Comment on lines +158 to +162
recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
*recall = append(*recall, recallQuery)
*ndcg = append(*ndcg, ndcgQuery)

Copilot AI Mar 30, 2026

len(query.Neighbors) > 0 is not sufficient to avoid division-by-zero/NaN here: if cfg.Limit is 0 (not currently validated against in Config.Validate()), then neighborLimit := min(cfg.Limit, len(query.Neighbors)) becomes 0 and both recall and NDCG computations still divide by zero. Consider guarding on neighborLimit > 0 (or returning 0 when k<=0 in calculateLinearNDCG).

Suggested change
- recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
- ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
- log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
- *recall = append(*recall, recallQuery)
- *ndcg = append(*ndcg, ndcgQuery)
+ if neighborLimit > 0 {
+ 	recallQuery := float64(len(intersection(ids, query.Neighbors[:neighborLimit]))) / float64(neighborLimit)
+ 	ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
+ 	log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
+ 	*recall = append(*recall, recallQuery)
+ 	*ndcg = append(*ndcg, ndcgQuery)
+ }

Comment on lines +159 to +162
ndcgQuery := calculateLinearNDCG(ids, query.Neighbors, neighborLimit)
log.Debugf("Query took %s, recall %f, ndcg %f", took, recallQuery, ndcgQuery)
*recall = append(*recall, recallQuery)
*ndcg = append(*ndcg, ndcgQuery)

Copilot AI Mar 30, 2026

log.Debugf(...) is executed while holding m.Lock(). Logging can be comparatively slow and increases lock contention for the hot path; consider moving the debug log outside the critical section (compute values first, unlock, then log), while still appending to the shared slices under the mutex.
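
The pattern suggested here (append under the mutex, log after releasing it) might be sketched as follows. This is an illustration under assumptions: the `metrics` type and both functions are hypothetical, and the standard library `log` package stands in for the `log.Debugf` call in the PR.

```go
package main

import (
	"log"
	"sync"
)

// metrics holds per-query results shared between worker goroutines.
type metrics struct {
	mu     sync.Mutex
	recall []float64
	ndcg   []float64
}

// record appends both values while holding the lock, then logs only
// after the lock has been released, so slow log output cannot extend
// the critical section.
func (m *metrics) record(recall, ndcg float64) {
	m.mu.Lock()
	m.recall = append(m.recall, recall)
	m.ndcg = append(m.ndcg, ndcg)
	m.mu.Unlock()

	log.Printf("recall %f, ndcg %f", recall, ndcg)
}

// recordAll runs n concurrent workers that each record one result and
// returns how many results were stored.
func recordAll(n int) int {
	var m metrics
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.record(1.0, 1.0)
		}()
	}
	wg.Wait()
	return len(m.recall)
}
```

The values passed to the logger are local copies, so reading them after the unlock does not race with other workers.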

Comment on lines +49 to +62
// skipIfWeaviateUnavailable skips the test if Weaviate is not reachable.
// Start a local instance with:
//
//	docker run -p 8080:8080 -p 50051:50051 semitechnologies/weaviate:latest
func skipIfWeaviateUnavailable(t *testing.T, origin string) {
	t.Helper()
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	conn, err := grpc.DialContext(ctx, origin, grpc.WithInsecure(), grpc.WithBlock()) //nolint:staticcheck
	if err != nil {
		t.Skipf("Weaviate not available at %s: %v", origin, err)
	}
	conn.Close()
}

Copilot AI Mar 30, 2026

skipIfWeaviateUnavailable only checks the gRPC endpoint, but setupTestCollection also needs the HTTP endpoint (createClient uses cfg.HttpOrigin) to create/delete the class. If HTTP is down (or exposed on a different port), these tests will fail/hang instead of skipping. Consider checking both gRPC and the configured HTTP readiness endpoint before proceeding.
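
An HTTP readiness probe to complement the gRPC dial could be sketched like this. It assumes Weaviate's `/v1/.well-known/ready` endpoint and assumes the origin string includes a scheme (for example `http://localhost:8080`); the format of `cfg.HttpOrigin` is not shown in this PR, and the helper name `httpReady` is hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// httpReady returns nil when GET <httpOrigin>/v1/.well-known/ready
// answers with 200 OK within timeout, and an error otherwise.
// httpOrigin is assumed to include the scheme, e.g. "http://localhost:8080".
func httpReady(httpOrigin string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, httpOrigin+"/v1/.well-known/ready", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("readiness endpoint returned %s", resp.Status)
	}
	return nil
}
```

A skip helper could call this after the gRPC dial and invoke `t.Skipf` on a non-nil error, so that an unreachable HTTP port leads to a skip instead of a failure or hang.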

Comment on lines +84 to +103
	// Delete first (ignore error; collection may not exist yet).
	_ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())

	classObj := &models.Class{
		Class:           cfg.ClassName,
		VectorIndexType: "hnsw",
		VectorIndexConfig: map[string]interface{}{
			"distance":               cfg.DistanceMetric,
			"efConstruction":         float64(cfg.EfConstruction),
			"maxConnections":         float64(cfg.MaxConnections),
			"cleanupIntervalSeconds": cfg.CleanupIntervalSeconds,
			"flatSearchCutoff":       cfg.FlatSearchCutoff,
		},
	}
	err := client.Schema().ClassCreator().WithClass(classObj).Do(context.Background())
	require.NoError(t, err, "create collection %q", cfg.ClassName)

	t.Cleanup(func() {
		_ = client.Schema().ClassDeleter().WithClassName(cfg.ClassName).Do(context.Background())
	})

Copilot AI Mar 30, 2026

setupTestCollection uses context.Background() for schema create/delete calls. If Weaviate is partially reachable or misconfigured, these can hang indefinitely and stall the test suite. Consider using a context with timeout (similar to the gRPC dial) for schema operations as well.
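
Bounding each schema call with a deadline can be done with a small wrapper, sketched below. Both `withTimeout` and `blockUntilDone` are hypothetical names; `blockUntilDone` only simulates a call that never returns on its own.

```go
package main

import (
	"context"
	"time"
)

// withTimeout runs op with a context that is cancelled after d and
// returns whatever op returns. op must honour ctx for the deadline to
// take effect.
func withTimeout(d time.Duration, op func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return op(ctx)
}

// blockUntilDone simulates a hanging call: it returns only once ctx is
// cancelled or its deadline passes, and reports the context's error.
func blockUntilDone(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}
```

In setupTestCollection the closure passed to `withTimeout` would hand its `ctx` to the existing `.Do(...)` call in place of `context.Background()`. Whether the client aborts promptly on cancellation depends on the client library, which this sketch does not verify.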

@rlmanrique rlmanrique merged commit 46b0922 into main Mar 31, 2026
9 checks passed
@rlmanrique rlmanrique deleted the integration-tests branch March 31, 2026 07:10
3 participants