test: upgrade integration harnesses for dual-server execution #706
anubhav756 wants to merge 1 commit into
try:
    print("Opening toolbox server process...")
    # Make toolbox executable
    os.chmod("toolbox", 0o700)
Does this work on Windows? Since we're adding Windows support to the get_toolbox_binary_url method, should these commands be fixed everywhere?
For now I am also okay with removing the Windows support, since our CI runs on Linux.
Would that count as just a testing infra change? Should it be moved to another PR?
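On the Windows question above, a minimal sketch of a portable helper might look like the following. The names `binary_name` and `make_executable` are hypothetical and not part of the PR. The sketch relies on the documented behaviour that `os.chmod` on Windows only honours the read-only flag, so the `0o700` call would not make the file executable there; executability on Windows is tied to the `.exe` extension instead.

```python
import os
import sys


def binary_name(base: str = "toolbox") -> str:
    """Return the platform-specific file name of the server binary (hypothetical helper)."""
    return base + ".exe" if sys.platform == "win32" else base


def make_executable(path: str) -> None:
    """Give the owner read/write/execute permission on POSIX; skip on Windows.

    On Windows, os.chmod can only toggle the read-only flag, so the call
    is skipped there instead of pretending it did something.
    """
    if sys.platform == "win32":
        return
    os.chmod(path, 0o700)  # same mode the harness already uses
```

With these in place, the harness call would become `make_executable(binary_name())`, assuming the download step saves the binary under that name.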
    return request.param


@pytest.fixture(autouse=True)
nit: autouse pulls in the parametrized toolbox_server_url, so the ×2 matrix hits every test under the root tests/conftest.py, including mocked unit tests. adk avoids this by scoping the fixture to tests/integration/conftest.py. Can we limit the matrix to the e2e tests (non-autouse, or a marker) instead of session-wide autouse?
This applies to core, langchain, and llamaindex.
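One way to limit the matrix, sketched here as an alternative to the PR's fixture rather than as its actual code, is the `pytest_generate_tests` hook placed in an integration-only conftest. It parametrizes only tests whose fixture closure includes `toolbox_server_url`, so mocked unit tests stay at a single run. The file location and the `http://localhost` URLs are assumptions; the PR description only mentions stable (5000) and draft (5001).

```python
# Hypothetical location: tests/integration/conftest.py (mirrors the adk layout).

# Assumed URLs: treating 5000 and 5001 as local ports is a guess.
TOOLBOX_SERVER_URLS = {
    "stable": "http://localhost:5000",
    "draft": "http://localhost:5001",
}


def pytest_generate_tests(metafunc):
    """Apply the two-server matrix only to tests that request the URL."""
    if "toolbox_server_url" in metafunc.fixturenames:
        metafunc.parametrize(
            "toolbox_server_url",
            list(TOOLBOX_SERVER_URLS.values()),
            ids=list(TOOLBOX_SERVER_URLS.keys()),
        )
```

Note that `metafunc.fixturenames` also includes names pulled in through autouse fixtures, so this only narrows the matrix if no autouse fixture in scope depends on `toolbox_server_url`.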
    "pytest-aioresponses==0.3.0",
    "pytest-asyncio==1.4.0",
    "pytest-cov==7.1.0",
    "numpy<2.2.0",
Could you add a comment explaining why numpy<2.2.0 is needed? Unexplained upper bounds tend to rot: nobody knows when it's safe to lift them. And since caps propagate, the moment another test dep requires numpy>=2.2, the resolver can't satisfy both and silently backtracks to old versions. A one-line note would keep this from becoming a mystery pin later.
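To illustrate the conflict described above, here is a toy check, not a real dependency resolver, showing that no version can satisfy both `numpy<2.2.0` and a hypothetical `numpy>=2.2` requirement from another package. The candidate list is a small illustrative sample, and the parser only handles plain `<` and `>=` specifiers on dotted numeric versions.

```python
import operator

# Only the two operators needed for this illustration.
OPS = {">=": operator.ge, "<": operator.lt}


def parse(version):
    """Turn '2.2' or '2.2.0' into a comparable 3-tuple, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, spec):
    """Check one version against one specifier such as '<2.2.0'."""
    for symbol in (">=", "<"):
        if spec.startswith(symbol):
            return OPS[symbol](parse(version), parse(spec[len(symbol):]))
    raise ValueError(f"unsupported specifier: {spec}")


def compatible(candidates, specs):
    """Return the candidates that satisfy every specifier."""
    return [v for v in candidates if all(satisfies(v, s) for s in specs)]


CANDIDATES = ["2.0.2", "2.1.3", "2.2.0", "2.2.1"]  # illustrative sample only
```

Calling `compatible(CANDIDATES, ["<2.2.0", ">=2.2"])` returns an empty list, which is the situation where a real resolver has to backtrack on the other package instead.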
994fbdf to f81589a
Description
This PR upgrades the integration test harnesses across the SDK (core, adk, langchain, and llamaindex) to support running integration tests against multiple Toolbox server configurations simultaneously.

By parameterizing the toolbox_server_url fixture and updating the Cloud Build configurations, the CI will now spin up and test against two server instances: stable (5000) and draft (5001).

Motivation
This is a pure testing infrastructure change. It lays the groundwork for the upcoming DRAFT-2026-v1 protocol support, ensuring we can test backward compatibility (graceful protocol downgrading) against older servers without breaking any existing test cases.

Note
The presubmits fail in this PR because the test suite is now configured to run against both stable (5000) and draft (5001). They fail because the draft spec is successfully negotiated. This is fixed in the #703 branch (which is stacked on top of this PR).

Note
This PR focuses only on parameterizing the tests. The fallback logic is broken at this stage, which caused crashes against the stable server, so we temporarily added the if "DRAFT" not in v filter there. That workaround is removed in the upcoming PR #703.
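The temporary filter mentioned in the last note can be sketched roughly as follows. Only the `"DRAFT" not in v` condition and the `DRAFT-2026-v1` name come from the description; the function name, the list of versions, and the dated version strings are placeholders assumed for illustration.

```python
DRAFT_MARKER = "DRAFT"


def stable_versions(versions):
    """Drop draft protocol versions, keeping the original order.

    Mirrors the temporary workaround: any version string containing
    "DRAFT" is excluded from what gets offered during negotiation.
    """
    return [v for v in versions if DRAFT_MARKER not in v]


# Placeholder version strings; only DRAFT-2026-v1 appears in the PR text.
advertised = ["DRAFT-2026-v1", "2025-06-18", "2025-03-26"]
print(stable_versions(advertised))
```

Because the check is a substring match, it would also hide any future version whose name happens to contain "DRAFT", which is one reason to treat it as a stopgap.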