Bump inspect-ai from 0.3.220 to 0.3.223 by dependabot[bot] · Pull Request #83 · VectorInstitute/inspect-mlflow

dependabot · 2026-05-19T04:03:57Z

Bumps inspect-ai from 0.3.220 to 0.3.223.
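After merging a bump like this, it can be useful to confirm that the environment actually resolved to the new release. The following is a minimal sketch, not part of this PR: `parse_version` and `meets_minimum` are hypothetical helpers, and the sketch assumes plain dotted release numbers such as `0.3.223` (no pre-release or dev suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '0.3.223' into integers."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = "0.3.223") -> bool:
    """Compare numerically, so that 0.3.1000 sorts after 0.3.223."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = version("inspect-ai")
    except PackageNotFoundError:
        print("inspect-ai is not installed in this environment")
    else:
        status = "ok" if meets_minimum(installed) else "older than 0.3.223"
        print(installed, status)
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap where "0.3.99" would sort after "0.3.223".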

Changelog

0.3.223 (18 May 2026)

Config: Add inspect log export-config command to export a run config from an existing log file.

Anthropic: Skip thinking blocks when placing lookback cache_control.

AsyncFilesystem: Add get_file() and exists() methods.

Inspect View: Fix regression where switching task tabs would reload log, causing latency.

0.3.222 (16 May 2026)

Scanners: Declare Scanner import in a way that's compatible with pyright type checking.

0.3.221 (16 May 2026)

OpenAI: Add GPT 5.5 as computer use model and exclude 'chat' and 'instant' models from computer use.

OpenAI Compatible: Parse OpenRouter-style reasoning_details in OpenAI-compatible responses.

Anthropic: Capture extra_body fields from Message response.

OpenRouter: Enable Anthropic prompt caching by default for openrouter/anthropic/* models.

VLLM: Preserve dotted vLLM server arg keys.

Bedrock: Drop unsupported sampling params for Claude 4.7+.

Bedrock: Route top_k correctly for Nova models.

SageMaker: Add prompt_logprobs support in chat mode via GenerateConfig and parse prompt logprobs from completion mode responses, enabling perplexity() and target_perplexity() scorers end-to-end.

Model API: --adaptive-connections is now enabled by default (defaults to 100 per model connection).

Model API: Cache lookup of openai and anthropic packages at sample initialization.

Model API: Remove semaphore around calls to count_tokens() (they are already retried and gated by max_samples).

Model Info: Cache model info database lookup results so that failed lookups don't repeat fuzzy model name search.

Limits: Added suspend_token_limit() context manager for suspending token tracking and limit enforcement within a scope.

Datasets: hf_dataset retries transient Hugging Face errors (rate limits, timeouts, Hub-unreachable cache misses) up to 3 times (5 in CI) with exponential backoff. Pass retry=False to disable.
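The retry behaviour described for hf_dataset can be pictured with a generic retry loop. This is a sketch of the general technique only, not inspect-ai's implementation: `TransientError`, `retry_with_backoff`, the 1-second base delay, and the doubling factor are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a rate limit, timeout, or unreachable Hub."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError up to max_retries times."""
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay * 2**attempt)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Setting `max_retries=0` gives the same effect as disabling retries altogether.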

Datasets: Reject sample ids that collide under str() coercion.
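To make the collision concrete: the integer id 1 and the string id "1" are distinct Python values, but both become "1" once coerced with str(). A minimal sketch of detecting such pairs follows; `find_str_collisions` is a hypothetical helper, not the check inspect-ai performs.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def find_str_collisions(ids: Iterable[Hashable]) -> dict[str, list]:
    """Return groups of distinct ids that share the same str() form."""
    groups: dict[str, list] = defaultdict(list)
    for sample_id in ids:
        bucket = groups[str(sample_id)]
        # Exact repeats of the same id are plain duplicates, not coercion
        # collisions, so each distinct (type, value) is recorded only once.
        already_seen = any(
            type(seen) is type(sample_id) and seen == sample_id for seen in bucket
        )
        if not already_seen:
            bucket.append(sample_id)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}
```

For the ids `[1, "1", 2, "3"]` the helper reports a single colliding group under the key "1", containing the integer 1 and the string "1".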

Datasets: Treat NaN values from HuggingFace datasets the same way as None (converted to "").

Datasets: Use HuggingFace revision in cache key for downloaded datasets.

Datasets: Propagate hf_dataset(..., shuffle=True) to EvalDataset.shuffled.

Tool Calling: Raise a ToolError if there is a null byte in command input.
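A null-byte guard of this kind can be sketched in a few lines. The `ToolError` class below is a local stand-in defined for illustration rather than the class shipped with inspect-ai, and `check_command_input` is a hypothetical name.

```python
class ToolError(Exception):
    """Local stand-in for illustration; not imported from inspect_ai."""


def check_command_input(command: str) -> str:
    """Return the command unchanged, or raise if it contains a null byte."""
    if "\x00" in command:
        raise ToolError("command input contains a null byte")
    return command
```

Rejecting the input up front yields a clear error for the caller, instead of a lower-level failure when the string reaches an API that cannot accept embedded null bytes.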

Scoring: Store and aggregate results for cancelled eval runs.

Scoring: match(numeric=True) no longer matches digit-substrings (e.g. target 5 against 25); now correctly handles negative, decimal, and scientific-notation targets, and recognises unicode-formatted numbers (unicode minus, vulgar fractions like ½, Chinese numerals, fullwidth digits) in both targets and model output.

Scoring: match(numeric=True, location="exact") is now strict — values like "5 some text" no longer match target "5".
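The difference between digit-substring matching and numeric matching in the two Scoring entries above can be illustrated with a small sketch. This is not the scorer's actual code: the `NUMBER` pattern and the three helper functions are assumptions, and the sketch handles ASCII numbers only (no unicode minus, fractions, or fullwidth digits).

```python
import re

# Signed integers, decimals, and scientific notation (ASCII only in this sketch).
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def substring_match(target: str, output: str) -> bool:
    """Naive check: target '5' is found inside '25'."""
    return target in output


def numeric_match(target: str, output: str) -> bool:
    """True if any whole number in output is numerically equal to target."""
    wanted = float(target)
    return any(float(token) == wanted for token in NUMBER.findall(output))


def exact_numeric_match(target: str, output: str) -> bool:
    """Strict variant: the entire output must be one number equal to target."""
    text = output.strip()
    return NUMBER.fullmatch(text) is not None and float(text) == float(target)
```

With target "5", `substring_match` accepts "The answer is 25" while `numeric_match` rejects it, because the only number in the output is 25. `exact_numeric_match` rejects "5 some text", since the output is not a single number.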

Analysis: Use score reducer in evals_df() column name when there are multiple reducers.

Hooks: Cache list of registered hooks (invalidate cache on registry_add()).

Config: Add --run-config option to inspect eval for single-file run configuration.

Eval Set: Run Inspect Scout scanners over each task's logs as part of eval_set (CLI --scanner / ScannerConfig). Scans incrementally as logs land, reuses prior results across resumes, and renders progress alongside the existing eval view.

Eval Set: Fail fast with "No inspect tasks were found at the specified paths." when a task spec resolves to nothing (e.g. uninstalled package); previously crashed with IndexError inside resolve_tasks after passing an empty task list to eval.

Eval Set: Add score_display argument to eval_set() function.

Eval Log: Preflight ETag check on S3 conditional write (required for S3 backends that don't implement conditional writes).

Eval Log: Make log_file_info() robust to non-standard filenames; added log_file_info_async() / log_files_from_ls_async() so view-server header reads don't block the event loop.

Imports: Delay importing heavier dependencies (e.g. s3fs, boto3, numpy, rich.markdown) for faster imports of inspect_ai module.

Logging: INSPECT_PY_LOGGER_FORMAT env var (rich/plain/json) for non-TTY-friendly single-line console logs.

Docker Compose: accept depends_on / pull_policy / privileged / shm_size / ulimits in ComposeService.

Task Display: Honor terminal COLUMNS and LINES for dumb terminals.

Validation: Reject unknown GenerateConfig fields with an error.

Memory: Log condensing no longer retains unchanged JSON copies in long evals.

Memory: Don't retain message lists in buffer DB (memory leak on long agentic samples).

... (truncated)

Commits

90a7b1c Update CHANGELOG for version 0.3.223
df27f82 Bump to latest (#3970)
60c6326 AsyncFilesystem: add get_file and exists (#3964)
644f46f Merge branch 'kaifronsdal-fix/cache-control-skip-thinking'
61ad3ec changelog / lint
c7a73dc Merge branch 'main' into fix/cache-control-skip-thinking
c4a8f2f Add inspect log export-config command (#3959)
a634db7 Skip thinking blocks when placing lookback cache_control
eea5a68 Mount transcript search API from Scout when Scout is installed (#3947)
7b0f474 docs: document running vLLM solver and judge on separate servers (#3957)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [inspect-ai](https://github.com/UKGovernmentBEIS/inspect_ai) from 0.3.220 to 0.3.223.
- [Changelog](https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/CHANGELOG.md)
- [Commits](UKGovernmentBEIS/inspect_ai@0.3.220...0.3.223)

---
updated-dependencies:
- dependency-name: inspect-ai
  dependency-version: 0.3.223
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot bot added the dependencies (Pull requests that update a dependency file) and python:uv (Pull requests that update python:uv code) labels May 19, 2026

amrit110 merged commit a2ae98e into main May 20, 2026
1 check passed

amrit110 deleted the dependabot/uv/inspect-ai-0.3.223 branch May 20, 2026 01:05
