Internal-only ingress LB returns 'Container App is stopped or does not exist' for a healthy replica (North Europe, Workload Profiles Consumption)

SUMMARY

From a Regional VNet-integrated Azure App Service in North Europe, all calls to a Container App configured with internal-only ingress in the same region return HTTP 404 with the body "Container App is stopped or does not exist", despite the Container App being verifiably running and its application returning 200 on direct probes. After switching the Container App to external ingress plus an IP allowlist of the App Service VNet CIDR, the exact same call succeeds. Nothing else in the path changed. This indicates the internal-only ingress load balancer is failing to route to a healthy replica.

ENVIRONMENT

Region: North Europe.
Subscription: c75b0d6c-4736-48a9-87ad-8a20215ff2af, Artfluence main.

Container App
  name: clip-service
  resource group: rg-artfluence-northeurope-dev
  Container App Environment: cae-artfluence-northeurope-dev, Workload Profiles Consumption
  FQDN: clip-service.blackdesert-d3aaefa8.northeurope.azurecontainerapps.io
  Internal IP when internal-only: 10.20.0.114
  Replica state during repro: Running, Ready, all probes green.

App Service, the calling client
  name: app-artfluenceapi-dev-ne-01
  resource group: rg-artfluence-northeurope-dev
  plan: asp-artfluence-dev-ne-01, B1 tier
  VNet integration: enabled, into snet-appservice-dev, 10.20.0.0/24
  VNet: vnet-artfluence-northeurope-dev, 10.20.0.0/16
  Same VNet as the CAE.

HOW WE PROVED THE CAE INFRA IS AT FAULT, NOT THE APPLICATION

1. With clip-service set to internal-only ingress, transport Auto, external false, an HTTPS call from inside the App Service Kudu SSH console targeting the internal FQDN at path /health consistently returns HTTP 404 with body "Container App is stopped or does not exist".

2. From the same Kudu SSH console at the same time, a direct HTTPS call to the replica IP, using curl with --resolve to pin the internal FQDN to 10.20.0.114, returns HTTP 200 with a body reporting status ok, device cpu. The application is healthy, the network path to the replica is open, and private DNS resolves correctly, but the CAE internal ingress load balancer at the address the FQDN normally resolves to is not forwarding to the replica.

3. We ruled out the following as causes:
- NSG rules: no Deny rule touches the path.
- Probes: Startup, Liveness, and Readiness were all green for over 10 minutes before the test.
- Port mismatch: the application listens on port 8000 inside the container, which matches the ingress targetPort.
- DNS: the Private DNS zone privatelink.northeurope.azurecontainerapps.io resolves correctly to the CAE internal LB IP.
- Stale revision or replica: a fresh revision and a fresh replica reproduce the same symptom.

4. As a control we switched clip-service to external ingress with an IP restriction of Allow 10.20.0.0/16, Deny all else. With NO OTHER CHANGE on either side, the same Kudu-side call to the external FQDN returns HTTP 200 immediately. Laptop calls correctly return HTTP 403, proving the allowlist works.
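The probes in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the exact commands we ran: the helper name curl_probe is hypothetical, the FQDN is the one listed under ENVIRONMENT (substitute the internal FQDN in use during the repro), and the script only builds and prints the curl command lines so they can be pasted into the Kudu SSH console.

```python
import shlex

# Values taken from the ENVIRONMENT section of this report.
FQDN = "clip-service.blackdesert-d3aaefa8.northeurope.azurecontainerapps.io"
REPLICA_IP = "10.20.0.114"


def curl_probe(fqdn, path="/health", pin_ip=None):
    """Build a curl argv that prints the response body, then the HTTP status."""
    argv = ["curl", "-sS", "-w", r"\n%{http_code}\n"]
    if pin_ip is not None:
        # --resolve HOST:PORT:ADDR makes curl connect to ADDR while still
        # using HOST for TLS SNI and the Host header.
        argv += ["--resolve", f"{fqdn}:443:{pin_ip}"]
    argv.append(f"https://{fqdn}{path}")
    return argv


via_dns = curl_probe(FQDN)                    # step 1: normal DNS resolution
pinned = curl_probe(FQDN, pin_ip=REPLICA_IP)  # step 2: pinned to 10.20.0.114

print(shlex.join(via_dns))
print(shlex.join(pinned))
```

Running both printed commands back to back from the same console keeps the client, DNS zone, and time window identical, so the only variable is the address curl connects to.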

The only difference between the 404 "Container App is stopped or does not exist" error and the 200 response is whether the Container App ingress is internal-only or external. Traffic origin, replica, application, private DNS, VNet, and workload were identical across both tests.
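For the step 4 control, the expected allowlist outcomes can be sanity-checked with a few lines of CIDR arithmetic. This is a sketch under assumptions: the function expected_status is hypothetical, and the two source addresses are illustrative only (10.20.0.10 stands in for an App Service outbound address in snet-appservice-dev, 203.0.113.7 is a documentation-range address standing in for a laptop).

```python
import ipaddress

# Allow rule from step 4 and the App Service integration subnet from ENVIRONMENT.
ALLOW = ipaddress.ip_network("10.20.0.0/16")
APP_SERVICE_SUBNET = ipaddress.ip_network("10.20.0.0/24")


def expected_status(source_ip):
    """Return 200 if the allowlist admits the source address, otherwise 403."""
    return 200 if ipaddress.ip_address(source_ip) in ALLOW else 403


# The integration subnet sits entirely inside the allowed range.
print(APP_SERVICE_SUBNET.subnet_of(ALLOW))  # True
print(expected_status("10.20.0.10"))        # 200 (hypothetical in-VNet caller)
print(expected_status("203.0.113.7"))       # 403 (hypothetical outside caller)
```

These match the observed results: 200 from the Kudu console and 403 from a laptop.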

IMPACT

We had to ship the external-plus-allowlist workaround to unblock active development. Concerns:
- clip-service is reachable by any source that can present an address inside the /16 CIDR. We mitigate with a shared-secret CLIP_API_KEY header, but defence in depth is reduced.
- It blocks our dev and prod cleanup story. Once prod needs the same access, we have to either widen the allowlist further or VNet-peer prod in. Neither matches the documented Microsoft pattern of internal-only Container Apps inside a VNet, reached via Regional VNet integration from App Service.

WHAT WE NEED FROM SUPPORT

1. Confirm whether this is a known issue with the internal-ingress load balancer in North Europe, or in Workload Profiles Consumption environments more generally, and if so the workaround or target fix ETA.

2. If it is not a known issue, please pull the CAE internal LB telemetry and Envoy logs for the time window from 2026-05-12 around 14:00 UTC through 2026-05-13 around 17:00 UTC. We have repeated reproductions in that window and can re-run on demand.

3. Confirm that, with internal-only ingress enabled, no action on our side (a specific revision spec, a flag, an NSG adjustment, or a Private DNS zone tweak) would have made the request succeed. We want to rule out misconfiguration on our end before accepting this as a platform bug.

REPRO READINESS

We can reproduce on demand by flipping clip-service ingress back to internal-only. We will keep the current external-plus-allowlist workaround in place for normal operation, and coordinate a time window with the support engineer to switch to internal-only for the repro.
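The toggle for a repro window could look roughly like the sketch below. It assumes the Azure CLI containerapp extension and its "az containerapp ingress enable" command with the --type, --target-port, and --transport options; the helper name ingress_cmd is hypothetical, and the script only prints the commands rather than running them. The report does not state whether the IP restriction survives the toggle, so the allowlist should be verified after switching back to external.

```python
import shlex

# Names taken from the ENVIRONMENT section of this report.
APP = "clip-service"
RESOURCE_GROUP = "rg-artfluence-northeurope-dev"


def ingress_cmd(ingress_type, target_port=8000, transport="auto"):
    """Build the az CLI argv that sets the ingress type for clip-service."""
    if ingress_type not in ("internal", "external"):
        raise ValueError(f"unsupported ingress type: {ingress_type}")
    return [
        "az", "containerapp", "ingress", "enable",
        "--name", APP,
        "--resource-group", RESOURCE_GROUP,
        "--type", ingress_type,
        "--target-port", str(target_port),
        "--transport", transport,
    ]


start_repro = ingress_cmd("internal")  # begin the agreed repro window
end_repro = ingress_cmd("external")    # return to the current workaround

print(shlex.join(start_repro))
print(shlex.join(end_repro))
```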

CONTACT

Primary contact: Julien, julien@artfluence.app. Email is preferred for asynchronous back-and-forth. Happy to do a screen share on demand for the repro.

