Harness preview: customer-env 'ctr run' fails (exit 1) for any environmentArtifact.containerConfiguration.containerUri, including stock public images from the docs #931

## Summary

On the current preview release (`@aws/agentcore@preview`, CLI `v1.0.0-preview.1`), **any** harness that has a non-null `environmentArtifact.containerConfiguration.containerUri` fails at invoke time with:

```
runtimeClientError: Command '['/usr/local/bin/ctr', '-a', '/run/containerd/containerd.sock',
'run', '-d', '--net-host',
'--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
'<containerUri>', 'customer-env', '/bin/sh', '-c', 'sleep infinity']'
returned non-zero exit status 1.
```

Harnesses that do **not** set `environmentArtifact` (i.e. use the default image) work fine in the same project, same region, same execution role template, same session format.

This looks like a service-side bug in the AgentCore Harness runtime's `customer-env` spawn path, not in the CLI. I'm filing it here per the preview bug-report channel in the README; feel free to transfer it to the right internal repo.

## Reproducer (no custom image, no custom CLI changes)

Minimal harness config. The official docs at [`harness-environment.html`](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/harness-environment.html) suggest referencing a pre-built image ("Or reference a pre-built image: `public.ecr.aws/docker/library/node:slim`"). The repro below uses `python:3.12-slim-bookworm` from the same public registry; `node:slim` reproduces identically:

```json
{
  "name": "probe",
  "model": {
    "provider": "bedrock",
    "modelId": "us.anthropic.claude-opus-4-5-20251101-v1:0"
  },
  "memory": { "name": "someMemory" },
  "containerUri": "public.ecr.aws/docker/library/python:3.12-slim-bookworm",
  "sessionStoragePath": "/mnt/data",
  "maxIterations": 10,
  "timeoutSeconds": 300,
  "authorizerType": "AWS_IAM"
}
```

```bash
agentcore deploy --yes
agentcore invoke --harness probe --session-id probe-diag-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --user-id me 'PROBE OK'
```

### Result

```
Error: Command '['/usr/local/bin/ctr', '-a', '/run/containerd/containerd.sock', 'run', '-d',
'--net-host', '--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
'public.ecr.aws/docker/library/python:3.12-slim-bookworm', 'customer-env',
'/bin/sh', '-c', 'sleep infinity']' returned non-zero exit status 1.
```

### Control (works)

Identical harness config with the `containerUri` field removed: the same `invoke` call succeeds and the agent replies normally. `GetHarness` shows `environmentArtifact: null` on the working one and `environmentArtifact.containerConfiguration.containerUri: "public.ecr.aws/i0n3d3i5/harness-us-east-1:latest"` (the managed harness runtime image) on the failing one. In other words, `environmentArtifact` is the only thing that changes behavior; nothing below the service layer differs.
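To make the working/failing split mechanical, a small helper can read the one field that differs. This is a minimal sketch that assumes only the trimmed `GetHarness` response shapes quoted in this report; the function name `custom_container_uri` and the harness name `control` are hypothetical placeholders, not part of any AWS SDK.

```python
from typing import Optional


def custom_container_uri(harness: dict) -> Optional[str]:
    """Return the customer image URI from a GetHarness response, or None
    when the harness runs on the default image (environmentArtifact is null)."""
    artifact = harness.get("environmentArtifact") or {}
    config = artifact.get("containerConfiguration") or {}
    return config.get("containerUri")


# Trimmed response shapes as quoted in this report; "control" is a placeholder name.
control = {"harnessName": "control", "environmentArtifact": None}
probe = {
    "harnessName": "cic101pptagent_probe",
    "environmentArtifact": {
        "containerConfiguration": {
            "containerUri": "public.ecr.aws/docker/library/python:3.12-slim-bookworm"
        }
    },
}

for harness in (control, probe):
    uri = custom_container_uri(harness)
    print(harness["harnessName"], "->", uri or "default image")
```

Running the same check across every harness in a project would show whether the failures line up exactly with a non-null `containerUri`.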

## Things I ruled out

- **Our image / our Dockerfile**: reproduces with a stock public Python image (`python:3.12-slim-bookworm`) and with `public.ecr.aws/docker/library/node:slim`, the image referenced in AWS's own docs.
- **Architecture mismatch** — both images are multi-arch manifests that include `linux/arm64`; the microVM host is arm64 (confirmed via `uname -m` on a working default-image harness → `aarch64`).
- **ECR pull permissions** — same error with public ECR (no creds needed) and with private ECR after attaching `ecr:BatchCheckLayerAvailability` / `ecr:GetDownloadUrlForLayer` / `ecr:BatchGetImage` to the harness execution role. The AgentCore runtime logs confirm `Pulled customer image: ...` succeeds before the `ctr run` call fails.
- **Missing mount destination**: adding `RUN mkdir -p /mnt/data` to a custom image changes nothing. Stock public images that have no `/mnt/data` baked in fail the same way, and the OCI runtime normally creates a missing bind-mount destination anyway (`rbind` only makes the bind recursive).
- **CLI** — the same harness created directly via `bedrock-agentcore-control` CreateHarness with the same JSON reproduces; so does one created by `@aws/agentcore@preview` with either [#929](https://github.com/aws/agentcore-cli/pull/929) or [#930](https://github.com/aws/agentcore-cli/pull/930) applied.
- **Session id format / length** — other harnesses in the same project work with the same session-id generator (33+ chars).
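The architecture check from the list above can be repeated offline against an image's manifest list. The sketch below assumes the JSON follows the OCI image index / Docker manifest list layout (a `manifests` array whose entries carry a `platform` object) and that it was obtained separately, for example with `docker manifest inspect <image>`. The helper name `supports_platform` and the sample data are illustrative, not taken from the real images.

```python
def supports_platform(index: dict, os_name: str = "linux", arch: str = "arm64") -> bool:
    """Return True if the manifest list contains an entry for os_name/arch."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False


# Illustrative index, trimmed to the fields the check reads (digests omitted).
sample_index = {
    "schemaVersion": 2,
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64", "variant": "v8"}},
    ],
}

print(supports_platform(sample_index))  # True
```

A single-architecture image has no `manifests` array at all, so the helper returns `False` for it and the image config would need to be inspected instead.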

## What probably needs to happen service-side

`ctr run` exiting with status 1 is almost always one of: the image's root filesystem fails to mount, the OCI config (user, capabilities) is rejected, the container name is already in use, or the snapshotter returns an error. Each of these writes a specific message to stderr. That stderr is currently being **swallowed** by the harness runtime's error wrapper: the caller only ever sees `non-zero exit status 1`, with no detail. Fixing that alone would unblock customer self-diagnosis of every bug in this area.

### Two asks

1. Investigate why the `customer-env` `ctr run` fails for all non-default `containerConfiguration.containerUri` values on the current preview.
2. In the harness runtime's `subprocess.run(...)` wrapper around `ctr`, capture and re-raise (or log to the customer's log stream) `ctr`'s stdout+stderr when it exits non-zero, so future bugs in this area aren't opaque.
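For the second ask, here is a minimal sketch of what such a wrapper could look like. The error text in this report matches the format of Python's `subprocess.CalledProcessError`, so the sketch assumes the runtime shells out via `subprocess.run`. The names `run_checked` and `ContainerStartError` are hypothetical, and the failing child process is a stand-in for `ctr`, not the real command.

```python
import subprocess
import sys


class ContainerStartError(RuntimeError):
    """Hypothetical error type whose message carries the child's output."""


def run_checked(cmd: list) -> str:
    """Run cmd and return its stdout; on a non-zero exit, raise an error
    that includes the captured stdout and stderr instead of dropping them."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise ContainerStartError(
            f"{cmd[0]} exited with status {proc.returncode}\n"
            f"stdout: {proc.stdout.strip()}\n"
            f"stderr: {proc.stderr.strip()}"
        )
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for the failing `ctr run`: writes a message to stderr, exits 1.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('simulated failure detail\\n'); sys.exit(1)",
    ]
    try:
        run_checked(failing)
    except ContainerStartError as err:
        print(err)
```

With `check=True` and no output capture, the exception message contains only the command and exit status, which is what callers see today. Capturing the output and putting it in the message (or writing it to the customer's log stream) is the whole change.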

## Evidence

**Invoke log from the probe harness** (full log retained locally):

```
[16:13:22.533] INVOKE REQUEST (Session: probe-diag-20260422-161500-aryan-qrstuvwx12)
  runtimeArn: arn:aws:bedrock-agentcore:us-east-1:216989103356:harness/cic101pptagent_probe-JMU2AFlACj
  prompt: "Just reply with the text: PROBE OK"

[16:13:26.182] ERROR CONTEXT: stream error
[16:13:26.182] ERROR: runtimeClientError: Command '['/usr/local/bin/ctr', '-a',
  '/run/containerd/containerd.sock', 'run', '-d', '--net-host',
  '--mount=type=bind,src=/mnt/data,dst=/mnt/data,options=rbind:rw',
  'public.ecr.aws/docker/library/python:3.12-slim-bookworm', 'customer-env',
  '/bin/sh', '-c', 'sleep infinity']' returned non-zero exit status 1.
```

**GetHarness on the broken harness** (trimmed):

```json
{
  "harnessName": "cic101pptagent_probe",
  "status": "READY",
  "environmentArtifact": {
    "containerConfiguration": {
      "containerUri": "public.ecr.aws/docker/library/python:3.12-slim-bookworm"
    }
  },
  "environment": {
    "agentCoreRuntimeEnvironment": {
      "agentRuntimeArn": "arn:aws:bedrock-agentcore:us-east-1:216989103356:runtime/harness_cic101pptagent_probe-HQxcVa26D8",
      "networkConfiguration": { "networkMode": "PUBLIC" },
      "filesystemConfigurations": [{ "sessionStorage": { "mountPath": "/mnt/data" } }]
    }
  }
}
```

**Control (working) harness** — identical config minus `containerUri`:

- `GetHarness` → `environmentArtifact: null`
- Same invoke prompt → returns `PLAIN OK`
- `agentcore invoke --exec 'python3 --version && uname -m'` → `Python 3.10.19`, `aarch64`

## Environment

- CLI: `@aws/agentcore@preview` @ `v1.0.0-preview.1` (also repro'd on a local build of `main`)
- Region: `us-east-1`
- Account: `216989103356`
- AWS CLI v2 / node v20 / macOS arm64 host

## Related

- [#927](https://github.com/aws/agentcore-cli/issues/927) (root issue: CLI silently ignores `harness.json` `dockerfile` field)
- [#929](https://github.com/aws/agentcore-cli/pull/929) (AWS-authored fix for #927, recommended)
- [#930](https://github.com/aws/agentcore-cli/pull/930) (community fix for #927 via `DockerImageAsset`)

Neither PR changes the behavior reported here — both successfully build/push an image and create the harness; invoke still hits this service-side failure.

