fix(pipeline): Pre-create ollama-internal network when LiteLLM enabled by stefanko-ch · Pull Request #617 · stefanko-ch/Nexus-Stack

stefanko-ch · 2026-05-24T07:23:00Z

Summary

LiteLLM's compose declares ollama-internal as an external Docker network so it can reach the Ollama stack's container by name when both stacks are enabled. If Ollama is NOT enabled, the network doesn't exist and docker compose up for LiteLLM aborts BEFORE the container is created:

network ollama-internal declared as external, but could not be found

The operator sees a Bad Gateway on https://litellm.<domain> because nothing is listening on the proxied port. This was the live state on production, confirmed over SSH: `docker compose ps` listed no containers, and a manual `docker compose up` reproduced the error above.

This is a hard dependency that defeats LiteLLM's core value proposition: it's meant to be an OpenAI-compatible proxy for any provider (OpenAI, Anthropic, Mistral, ...), so operator-supplied API keys should be sufficient — Ollama is just one of N possible backends.

Fix

Two parts, in two commits.

Part 1 — pre-create the network (initial commit)

Add an idempotent pre-compose block to compose_runner.render_remote_script:

docker network inspect ollama-internal >/dev/null 2>&1 || \
    docker network create --label managed-by=nexus-stack ollama-internal
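A minimal sketch of how a renderer might emit this block ahead of the compose commands. `render_network_prep` and `render_script_sketch` are hypothetical names for illustration, not the actual `compose_runner.render_remote_script` signature. The `set -euo pipefail` header matches the strict mode the commit message says the remote script runs under.

```python
# Hypothetical sketch: the real compose_runner.render_remote_script may differ.
NETWORK_NAME = "ollama-internal"


def render_network_prep(enabled: bool, network: str = NETWORK_NAME) -> str:
    """Return the idempotent pre-compose bash snippet, or "" when the flag is off."""
    if not enabled:
        return ""
    # `inspect` exits 0 when the network already exists, so `||` short-circuits
    # and `create` only runs on the first deploy.
    return (
        f"docker network inspect {network} >/dev/null 2>&1 || \\\n"
        f"    docker network create --label managed-by=nexus-stack {network}\n"
    )


def render_script_sketch(compose_up_cmds: list[str], network_prep: bool) -> str:
    """Assemble a remote script: strict mode, optional network prep, then compose-up."""
    lines = ["set -euo pipefail"]
    prep = render_network_prep(network_prep)
    if prep:
        # The prep must run before any `docker compose up` that references the network.
        lines.append(prep.rstrip("\n"))
    lines.extend(compose_up_cmds)
    return "\n".join(lines) + "\n"


print(render_script_sketch(["docker compose up -d"], network_prep=True))
```

Keeping the prep as its own string makes the "present iff flag set" check trivial to test without parsing bash.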

Part 2 — symmetric ownership (`1923edf`, after Copilot review)

Originally the Ollama stack still declared ollama-internal as a compose-managed network (driver: bridge, name: ollama-internal) while LiteLLM declared it external: true. Joint LiteLLM+Ollama deployments therefore relied on Compose v2's tolerance for pre-existing networks. That behaviour is version-dependent, and Part 1 never exercised it on production (only the LiteLLM-only path was verified there).

Switch Ollama's compose to also declare ollama-internal as external: true + name: ollama-internal. Network ownership now lives entirely in the pre-create block — neither compose project tries to create it, and Compose's tolerance for pre-existing networks no longer matters.
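To make "both sides declare it external" concrete, here is a small check over compose files represented as already-parsed dicts (for example, the output of a YAML loader). `declares_external` is a hypothetical helper, and the dicts contain only the `networks` keys described above, not the full stack definitions.

```python
# Hypothetical sketch: compose files shown as parsed dicts, trimmed to `networks`.
from typing import Any

NETWORK = "ollama-internal"


def declares_external(compose: dict[str, Any], network: str = NETWORK) -> bool:
    """True if the project joins `network` as external instead of creating it."""
    spec = (compose.get("networks") or {}).get(network)
    if not isinstance(spec, dict):
        return False
    return spec.get("external") is True and spec.get("name") == network


# After Part 2: both stacks join the pre-created network; neither owns it.
litellm = {"networks": {NETWORK: {"external": True, "name": NETWORK}}}
ollama = {"networks": {NETWORK: {"external": True, "name": NETWORK}}}

# Before Part 2: Ollama owned the network as a compose-managed bridge.
ollama_before = {"networks": {NETWORK: {"driver": "bridge", "name": NETWORK}}}

print(declares_external(litellm), declares_external(ollama), declares_external(ollama_before))
```

Pinning `name:` alongside `external: true` matters because Compose otherwise prefixes network names with the project name.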

The pre-create flag is renamed and broadened accordingly: the parameter `litellm_network_prep` becomes `ollama_internal_network_prep`, and its default inference changes from `"litellm" in enabled` to `"litellm" in enabled or "ollama" in enabled`. This follows the same pattern as the existing dify_storage_prep / metabase_storage_prep flags.
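The inference rule, including the explicit-override escape hatch the tests pin, fits in a few lines. This is a sketch under the assumption that the override is an optional boolean where `None` means "infer"; `infer_network_prep` is a hypothetical name, not the function in `compose_runner`.

```python
# Hypothetical sketch of the default inference for ollama_internal_network_prep.
from typing import Iterable, Optional


def infer_network_prep(enabled: Iterable[str], override: Optional[bool] = None) -> bool:
    """Explicit override wins; otherwise fire when any consumer of the network is enabled."""
    if override is not None:
        return override
    names = set(enabled)
    return "litellm" in names or "ollama" in names


print(infer_network_prep(["litellm"]), infer_network_prep(["dify"]))
```

Using `None` as the "not set" sentinel keeps `False` available as a deliberate opt-out.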

Behaviour matrix after the fix

Enabled stacks	Pre-create runs?	Network state at compose-up
neither	no	network absent (correct — no consumer)
litellm only	yes	network exists; LiteLLM joins as external; operator wires real-provider keys via `config.yaml`
ollama only	yes	network exists; Ollama joins as external
litellm + ollama (joint)	yes (rendered once)	network exists; both stacks join as external; cross-stack DNS `http://ollama:11434` resolves with no race on creation
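The matrix above can be reproduced with a toy renderer that emits the prep once, before looping over stacks, so the joint case cannot duplicate it. `render_deploy_script` and the `stacks/<name>` directory layout are hypothetical, chosen only to make the sketch runnable.

```python
# Hypothetical sketch, not the real compose_runner API or remote layout.
PREP = (
    "docker network inspect ollama-internal >/dev/null 2>&1 || "
    "docker network create --label managed-by=nexus-stack ollama-internal"
)


def render_deploy_script(enabled: list[str]) -> str:
    lines = ["set -euo pipefail"]
    # Default inference: any consumer of the network turns the prep on.
    if "litellm" in enabled or "ollama" in enabled:
        # Emitted once, ahead of the per-stack loop, never inside it.
        lines.append(PREP)
    for stack in enabled:
        lines.append(f"(cd stacks/{stack} && docker compose up -d)")
    return "\n".join(lines) + "\n"


# One row per line of the behaviour matrix.
for enabled in ([], ["litellm"], ["ollama"], ["litellm", "ollama"]):
    script = render_deploy_script(enabled)
    print(enabled or "neither", script.count("docker network create"))
```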

Updated docs

Replaced the "Requires the ollama stack to be enabled" warning in stacks/litellm/docker-compose.yml with the actual behaviour (pipeline pre-creates the network).

Test plan

test_render_ollama_internal_network_prep_only_when_flagged — block present iff flag set
test_render_ollama_internal_network_prep_is_idempotent — inspect → || → create short-circuit shape locked
test_run_compose_up_network_prep_default_when_litellm_in_enabled — LiteLLM-only default inference
test_run_compose_up_network_prep_default_when_ollama_in_enabled — Ollama-only default inference (new in Part 2)
test_run_compose_up_network_prep_renders_once_in_joint_case — joint case renders exactly one pre-create block, not duplicated per matching service (new in Part 2)
test_run_compose_up_network_prep_omitted_when_neither_in_enabled
test_run_compose_up_network_prep_explicit_override_beats_enabled_inference
All 51 test_compose_runner.py tests pass
pre-commit run clean (ruff format, ruff check, mypy strict)

Live verification: while debugging, I manually created the network on the production server, and LiteLLM now runs (https://litellm.nexus-stack.ch returns 200 instead of 502). The next deploy with this code will create the network itself, so the fix is self-healing.

Out of scope

A more general "external network declarations" mechanism. We have exactly one cross-stack network today (ollama-internal); over-engineering this would be premature abstraction.
Defaulting the LiteLLM config.yaml to omit the Ollama model entry when Ollama is disabled. The operator can edit the template; the existing comment in the template already explains the "Option B" wiring for real-provider keys.

LiteLLM's docker-compose declares `ollama-internal` as an external network so it can reach the Ollama stack's container by service name when both stacks run side-by-side. If no network of that name exists yet, `docker compose up` aborts with `network ollama-internal declared as external, but could not be found` BEFORE the container is even created, so the operator gets a Bad Gateway on https://litellm.<domain>/ because nothing is listening on the proxied port. The hard dependency on Ollama being enabled defeats LiteLLM's core value proposition: it is an OpenAI-compatible proxy for ANY provider, so operator-supplied OPENAI_API_KEY / ANTHROPIC_API_KEY should be sufficient.

Fix: add an idempotent pre-compose block to compose_runner that inspects-then-creates `ollama-internal` whenever LiteLLM is enabled, mirroring the existing dify_storage_prep / metabase_storage_prep flag pattern. When Ollama is ALSO enabled, its own compose joins the same network by name (already pinned via `name: ollama-internal` on both sides), so there is no behaviour change for the joint-enabled case.

Tests pin:
(a) block presence is gated on the flag,
(b) the inspect-||-create shape is idempotent under set -euo pipefail,
(c) run_compose_up defaults the flag to `"litellm" in enabled`,
(d) explicit override beats inference.
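Pin (b) relies on a bash rule worth spelling out: under `set -e`, a command that fails on the left of `||` does not abort the script, whereas the same failing command on its own line does. The sketch below demonstrates this with `true`/`false` standing in for `docker network inspect` and `echo create` standing in for `docker network create`, so no Docker daemon is needed. It assumes `bash` is on the PATH; the helper names are hypothetical.

```python
# Demonstrates why inspect || create is safe under `set -euo pipefail`.
import subprocess


def run_strict(script_body: str) -> subprocess.CompletedProcess:
    """Run a snippet under the same strict mode the remote script uses."""
    return subprocess.run(
        ["bash", "-c", "set -euo pipefail\n" + script_body],
        capture_output=True,
        text=True,
        check=False,
    )


# `probe` stands in for `docker network inspect ollama-internal`:
# `true` = network already exists, `false` = network missing.
def guarded(probe: str) -> str:
    return f"{probe} >/dev/null 2>&1 || echo create\necho compose-up\n"


def unguarded(probe: str) -> str:
    # Without `||`, a failed probe trips `set -e` and compose-up never runs.
    return f"{probe} >/dev/null 2>&1\necho compose-up\n"


for label, body in [("exists", guarded("true")), ("missing", guarded("false")),
                    ("no guard", unguarded("false"))]:
    result = run_strict(body)
    print(label, result.returncode, result.stdout.split())
```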

github-actions · 2026-05-24T07:23:49Z

Coverage report — nexus_deploy

File	Stmts	Miss	Cover	Missing
__init__.py	5	0	100%
_remote.py	15	0	100%
cli.py	4	0	100%
compose_restart.py	40	0	100%
compose_runner.py	80	0	100%
config.py	136	0	100%
firewall.py	206	0	100%
gitea.py	587	55	90%	691–692, 697, 720–721, 733–734, 770–771, 783–784, 802–803, 828–829, 851–852, 863–864, 919–920, 928–929, 934, 940–941, 965–966, 999–1000, 1003, 1034–1035, 1076–1077, 1082–1083, 1123–1124, 1155–1156, 1179–1180, 1185–1186, 1285–1286, 1291–1292, 1766, 1770, 1791, 1819–1820, 1907
hetzner_capacity.py	126	0	100%
infisical.py	203	0	100%
kestra.py	176	3	98%	223, 427, 768
orchestrator.py	620	73	88%	456, 616, 628, 798–799, 804–805, 837–839, 848, 853–855, 866, 903–904, 909–910, 930, 965–966, 971–972, 980, 1005–1006, 1014, 1036–1037, 1042–1043, 1095–1096, 1101–1102, 1335, 1338, 1408, 1414–1415, 1420–1421, 1455, 1562–1563, 1568–1569, 1618–1619, 1624–1625, 1684, 1699, 1756, 1761–1762, 1767–1768, 1775, 1781, 1948, 1955, 1967–1968, 1973–1974, 1980, 1986, 2059–2060, 2081–2082
pipeline.py	207	13	93%	165–166, 350, 388, 468, 485, 580–581, 626–627, 717–718, 765
r2_tokens.py	113	2	98%	87, 150
s3_persistence.py	199	1	99%	315
s3_restore.py	103	0	100%
secret_sync.py	99	0	100%
seeder.py	98	0	100%
service_env.py	379	33	91%	1062, 1064–1066, 1074–1075, 1403–1406, 1411–1417, 1446–1450, 1466–1470, 1492, 1494, 1516–1517, 1524, 1633
services.py	310	1	99%	1939
setup.py	165	13	92%	238, 308–311, 319, 323–328, 344
ssh.py	56	0	100%
stack_sync.py	96	0	100%
tfvars.py	44	0	100%
tofu.py	86	0	100%
workspace_coords.py	101	0	100%
TOTAL	4254	194	95%

codecov · 2026-05-24T07:24:31Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Copilot

Pull request overview

This PR makes the LiteLLM stack resilient when Ollama is not enabled by ensuring the external Docker network ollama-internal exists before docker compose up runs, preventing LiteLLM startup from aborting due to a missing external network.

Changes:

Add an optional litellm_network_prep pre-compose step in compose_runner to inspect || create the ollama-internal Docker network.
Default litellm_network_prep to enabled when "litellm" is in the enabled-services list (with an explicit override escape hatch).
Add unit tests covering script rendering, default inference behavior, and explicit override semantics; update the LiteLLM compose comment to reflect the new pipeline behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
`src/nexus_deploy/compose_runner.py`	Introduces `litellm_network_prep` and injects an idempotent `docker network inspect …
`tests/unit/test_compose_runner.py`	Adds regression tests validating that the network-prep block is rendered only when intended and that default/override behavior is correct.
`stacks/litellm/docker-compose.yml`	Updates comments to document that the deployment pipeline pre-creates the external `ollama-internal` network so LiteLLM can start without Ollama.

- docs/stacks/litellm.md: replace the "Ollama MUST be enabled" paragraph with the actual post-fix behaviour: the deploy pipeline pre-creates ollama-internal idempotently, so LiteLLM starts cleanly whether or not Ollama is also enabled. Remove the two-step "edits required for no-Ollama" instructions, since the compose change is no longer needed.
- tests/unit/test_compose_runner.py::test_render_litellm_network_prep_is_idempotent: tighten the assertion from a loose `"||" in script` (which could pass if any unrelated || appeared in the rendered bash) to a full inspect→||→create chain match. Whitespace and the bash line-continuation backslash are normalised so the test isn't brittle to renderer-side line-wrap tweaks.
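The tightened assertion can be sketched as a normalise-then-substring check. This is an illustration under the assumption that the rendered chain matches the snippet in the PR description; `normalise` and `has_prep_chain` are hypothetical helpers, not the test's actual code.

```python
# Hypothetical sketch of a wrap-insensitive inspect -> || -> create match.
import re

EXPECTED_CHAIN = (
    "docker network inspect ollama-internal >/dev/null 2>&1 || "
    "docker network create --label managed-by=nexus-stack ollama-internal"
)


def normalise(script: str) -> str:
    # Drop bash line continuations (backslash + newline), then collapse whitespace runs.
    script = re.sub(r"\\\n", " ", script)
    return re.sub(r"\s+", " ", script).strip()


def has_prep_chain(script: str) -> bool:
    return EXPECTED_CHAIN in normalise(script)


wrapped = (
    "set -euo pipefail\n"
    "docker network inspect ollama-internal >/dev/null 2>&1 || \\\n"
    "    docker network create --label managed-by=nexus-stack ollama-internal\n"
)
print(has_prep_chain(wrapped))
```

An unrelated `||` elsewhere in the script no longer satisfies the check, which is exactly the false positive the loose assertion allowed.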

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

…c ownership

Previously the Ollama stack declared `ollama-internal` as a compose-managed network (`driver: bridge`, `name: ollama-internal`) while LiteLLM declared it `external: true`. The pipeline's pre-create block was gated on LiteLLM being enabled, which left two ambiguous cases:

* Joint LiteLLM + Ollama: pre-create runs, so the network exists with the `managed-by=nexus-stack` label. Ollama's compose-up then tries to treat it as a project-managed network. Modern Compose v2 tolerates this with a warning, but the behaviour is version-dependent and the joint case was never tested by the previous fix; the merged PR (#617) only verified the LiteLLM-only path on production.
* Ollama-only future: would have been fine before, but breaks if Ollama's compose later moves to `external: true` for any reason.

Switch Ollama's compose to also declare `ollama-internal` as `external: true` + `name: ollama-internal`. Network ownership now lives entirely in the pre-create block, neither compose project tries to create it, and Compose's tolerance for pre-existing networks no longer matters.

Rename the gate parameter `litellm_network_prep` → `ollama_internal_network_prep`, since it's no longer LiteLLM-specific, and broaden the default inference to fire when either stack is enabled.

Added two new tests:

* `test_run_compose_up_network_prep_default_when_ollama_in_enabled` locks the ollama-only firing path (regression guard for the inference change).
* `test_run_compose_up_network_prep_renders_once_in_joint_case` confirms the joint case renders exactly one pre-create block, not a duplicate per matching service.

Addresses Copilot review comment 3308563617 on PR #617.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

🤖 I have created a release *beep* *boop*

---

## [0.68.0](v0.67.0...v0.68.0) (2026-05-28)

### 🚀 Features

* **stacks:** Add Evidence — SQL+markdown BI for analytics engineers ([#616](#616)) ([ac9ef64](ac9ef64))

### 🐛 Bug Fixes

* **hedgedoc:** Seeded admin account + R2 snapshot/restore persistence ([#619](#619)) ([a04d529](a04d529))
* **pipeline:** Pre-create ollama-internal network when LiteLLM enabled ([#617](#617)) ([3ea36be](3ea36be))
* **stacks:** Raise Metabase JVM heap to prevent OOM ([#620](#620)) ([70bb786](70bb786))
* **stacks:** Remove hardcoded credential fallbacks (audit C1-C3) ([#623](#623)) ([3002629](3002629))
* **stacks:** Resolve host-port 8888 collision (adminer ⇔ seaweedfs) ([#624](#624)) ([6831d2c](6831d2c))
* **tofu:** Validate cloudflare_account_id, cloudflare_zone_id, domain shapes ([#622](#622)) ([47f25c6](47f25c6))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 24, 2026 07:23

Copilot started reviewing on behalf of stefanko-ch May 24, 2026 07:23

Copilot AI reviewed May 24, 2026


Comment thread stacks/litellm/docker-compose.yml Outdated

Comment thread tests/unit/test_compose_runner.py Outdated

stefanko-ch requested a review from Copilot May 24, 2026 07:58

Copilot started reviewing on behalf of stefanko-ch May 24, 2026 07:58

Copilot AI reviewed May 24, 2026


stefanko-ch requested a review from Copilot May 27, 2026 05:02

Copilot started reviewing on behalf of stefanko-ch May 27, 2026 05:02

Copilot AI reviewed May 27, 2026


Comment thread src/nexus_deploy/compose_runner.py Outdated

stefanko-ch requested a review from Copilot May 27, 2026 06:38

Copilot started reviewing on behalf of stefanko-ch May 27, 2026 06:38

Copilot AI reviewed May 27, 2026


Comment thread src/nexus_deploy/compose_runner.py

stefanko-ch merged commit 3ea36be into main May 27, 2026
9 checks passed

stefanko-ch deleted the fix/litellm-cross-stack-network branch May 27, 2026 06:55

github-actions Bot mentioned this pull request May 27, 2026

chore(main): release 0.68.0 #621

Merged

stefanko-ch merged 3 commits into main from fix/litellm-cross-stack-network
