Summary
On the current preview release (@aws/agentcore@preview, CLI v1.0.0-preview.1), any harness that has a non-null environmentArtifact.containerConfiguration.containerUri fails at invoke time with:
runtimeClientError: Command '['/usr/local/bin/ctr', '-a', '/run/containerd/containerd.sock',
'run', '-d', '--net-host',
'--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
'<containerUri>', 'customer-env', '/bin/sh', '-c', 'sleep infinity']'
returned non-zero exit status 1.
Harnesses that do not set environmentArtifact (i.e. use the default image) work fine in the same project, same region, same execution role template, same session format.
This looks like a service-side bug in the AgentCore Harness runtime's customer-env spawn path, not in the CLI. I'm filing it here per the preview bug-report channel in the README; feel free to transfer it to the right internal repo.
Reproducer (no custom image, no custom CLI changes)
Minimal harness config — uses an image from the same public registry path shown in the official docs at harness-environment.html ("Or reference a pre-built image: public.ecr.aws/docker/library/node:slim"); the repro below uses python:3.12-slim-bookworm, and node:slim reproduces identically:
{
"name": "probe",
"model": {
"provider": "bedrock",
"modelId": "us.anthropic.claude-opus-4-5-20251101-v1:0"
},
"memory": { "name": "someMemory" },
"containerUri": "public.ecr.aws/docker/library/python:3.12-slim-bookworm",
"sessionStoragePath": "/mnt/data",
"maxIterations": 10,
"timeoutSeconds": 300,
"authorizerType": "AWS_IAM"
}
agentcore deploy --yes
agentcore invoke --harness probe --session-id probe-diag-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --user-id me 'PROBE OK'
Result
Error: Command '['/usr/local/bin/ctr', '-a', '/run/containerd/containerd.sock', 'run', '-d',
'--net-host', '--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
'public.ecr.aws/docker/library/python:3.12-slim-bookworm', 'customer-env',
'/bin/sh', '-c', 'sleep infinity']' returned non-zero exit status 1.
Control (works)
Identical harness config with the containerUri field removed — the same invoke call succeeds and the agent replies normally. GetHarness shows environmentArtifact: null on the working harness (the runtime falls back to the managed harness image, public.ecr.aws/i0n3d3i5/harness-us-east-1:latest) and a populated environmentArtifact.containerConfiguration.containerUri on the broken one — i.e., environmentArtifact is the only thing that changes behavior, and nothing below the service layer.
Things I ruled out
- Our image / our Dockerfile — reproduces with the stock public Python image from AWS's own docs, and with public.ecr.aws/docker/library/node:slim as well.
- Architecture mismatch — both images are multi-arch manifests that include linux/arm64; the microVM host is arm64 (confirmed via uname -m on a working default-image harness → aarch64).
- ECR pull permissions — same error with public ECR (no creds needed) and with private ECR after attaching ecr:BatchCheckLayerAvailability / ecr:GetDownloadUrlForLayer / ecr:BatchGetImage to the harness execution role. The AgentCore runtime logs confirm "Pulled customer image: ..." succeeds before the ctr run call fails.
- Missing mount destination — adding RUN mkdir -p /mnt/data to a custom image changes nothing. Stock public images with no /mnt/data baked in also fail, and ctr's rbind option creates the destination if absent.
- CLI — the same harness created directly via bedrock-agentcore-control CreateHarness with the same JSON reproduces; so does one created by @aws/agentcore@preview with either #929 or #930 applied.
- Session id format / length — other harnesses in the same project work with the same session-id generator (33+ chars).
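For anyone triaging with access to a similar arm64 containerd host, the failing invocation can be rebuilt verbatim from the error string and replayed by hand to capture the stderr the runtime currently drops. A sketch (the argument order is copied from the error message above; the helper name and defaults are mine, nothing else is verified against the runtime source):

```python
def build_ctr_run_cmd(image: str,
                      container_name: str = "customer-env",
                      socket: str = "/run/containerd/containerd.sock",
                      mount: str = "/mnt/data") -> list[str]:
    # Reconstructs the exact ctr invocation from the invoke error, so it can be
    # replayed (e.g. via subprocess.run(cmd, capture_output=True)) on a
    # containerd host to read the diagnostic the harness runtime discards.
    return [
        "/usr/local/bin/ctr", "-a", socket,
        "run", "-d", "--net-host",
        f"--mount=type=bind,src={mount},dst={mount},options=rbind:rw",
        image, container_name,
        "/bin/sh", "-c", "sleep infinity",
    ]
```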
What probably needs to happen service-side
ctr run exiting with status 1 is almost always one of: image fails to mount root fs, OCI config/user/capabilities rejected, container name in use, or snapshotter error. Any of them writes a specific message to stderr. That stderr is currently being swallowed by the harness runtime's error wrapper — the caller only ever sees non-zero exit status 1, with no detail. Fixing that alone would unblock customer self-diagnosis of every bug in this area.
Two asks
- Investigate why the customer-env ctr run fails for all non-default containerConfiguration.containerUri values on the current preview.
- In the harness runtime's subprocess.run(...) wrapper around ctr, capture and re-raise (or log to the customer's log stream) ctr's stdout+stderr when it exits non-zero, so future bugs in this area aren't opaque.
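A minimal sketch of what the second ask could look like. I don't have the runtime source, so the function name is hypothetical; the point is only that subprocess.run with capture_output=True makes the child's output available for re-raising:

```python
import subprocess

def run_ctr(args: list[str]) -> str:
    # Hypothetical replacement for the runtime's current wrapper: capture the
    # child's stdout/stderr so a non-zero exit surfaces ctr's real diagnostic
    # instead of the opaque "returned non-zero exit status 1".
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"Command {args!r} returned non-zero exit status {proc.returncode}\n"
            f"--- stdout ---\n{proc.stdout}"
            f"--- stderr ---\n{proc.stderr}"
        )
    return proc.stdout
```

With this in place, the invoke error for the probe harness would carry ctr's actual complaint (snapshotter error, OCI config rejection, name collision, etc.) instead of just the exit status.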
Evidence
Invoke log from the probe harness (full log retained locally):
[16:13:22.533] INVOKE REQUEST (Session: probe-diag-20260422-161500-aryan-qrstuvwx12)
runtimeArn: arn:aws:bedrock-agentcore:us-east-1:216989103356:harness/cic101pptagent_probe-JMU2AFlACj
prompt: "Just reply with the text: PROBE OK"
[16:13:26.182] ERROR CONTEXT: stream error
[16:13:26.182] ERROR: runtimeClientError: Command '['/usr/local/bin/ctr', '-a',
'/run/containerd/containerd.sock', 'run', '-d', '--net-host',
'--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
'public.ecr.aws/docker/library/python:3.12-slim-bookworm', 'customer-env',
'/bin/sh', '-c', 'sleep infinity']' returned non-zero exit status 1.
GetHarness on the broken harness (trimmed):
{
"harnessName": "cic101pptagent_probe",
"status": "READY",
"environmentArtifact": {
"containerConfiguration": {
"containerUri": "public.ecr.aws/docker/library/python:3.12-slim-bookworm"
}
},
"environment": {
"agentCoreRuntimeEnvironment": {
"agentRuntimeArn": "arn:aws:bedrock-agentcore:us-east-1:216989103356:runtime/harness_cic101pptagent_probe-HQxcVa26D8",
"networkConfiguration": { "networkMode": "PUBLIC" },
"filesystemConfigurations": [{ "sessionStorage": { "mountPath": "/mnt/data" } }]
}
}
}
Control (working) harness — identical config minus containerUri:
GetHarness → environmentArtifact: null
- Same invoke prompt → returns PROBE OK
agentcore invoke --exec 'python3 --version && uname -m' → Python 3.10.19, aarch64
Environment
- CLI: @aws/agentcore@preview @ v1.0.0-preview.1 (also repro'd on a local build of main)
- Region: us-east-1
- Account: 216989103356
- AWS CLI v2 / node v20 / macOS arm64 host
Related
- dockerfile field in harness.json is silently ignored on deploy (no image is built or pushed) — #927
- #929 and #930, the two candidate fixes for #927 (one recommended, one via DockerImageAsset)

Neither PR changes the behavior reported here — both successfully build/push an image and create the harness; invoke still hits this service-side failure.