fix(pipeline): Pre-create ollama-internal network when LiteLLM enabled #617

Merged
stefanko-ch merged 3 commits into main from fix/litellm-cross-stack-network
May 27, 2026

Conversation

@stefanko-ch stefanko-ch commented May 24, 2026

Owner

Summary

LiteLLM's compose declares ollama-internal as an external Docker network so it can reach the Ollama stack's container by name when both stacks are enabled. If Ollama is NOT enabled, the network doesn't exist and docker compose up for LiteLLM aborts BEFORE the container is created:

network ollama-internal declared as external, but could not be found

The operator sees a Bad Gateway on https://litellm.<domain> because nothing is listening on the proxied port. This was the live state on production — confirmed by SSH'ing in and seeing compose ps empty + the error on manual docker compose up.

This is a hard dependency that defeats LiteLLM's core value proposition: it's meant to be an OpenAI-compatible proxy for any provider (OpenAI, Anthropic, Mistral, ...), so operator-supplied API keys should be sufficient — Ollama is just one of N possible backends.

Fix

Two parts, in two commits.

Part 1 — pre-create the network (initial commit)

Add an idempotent pre-compose block to compose_runner.render_remote_script:

docker network inspect ollama-internal >/dev/null 2>&1 || \
    docker network create --label managed-by=nexus-stack ollama-internal
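
As a rough illustration of how such a block could be gated and placed ahead of compose-up, here is a minimal Python sketch. The function name `render_script`, its `network_prep` parameter, and the `NETWORK_PREP_BLOCK` constant are hypothetical stand-ins, not the actual `compose_runner.render_remote_script` signature. The shell text itself is the snippet quoted above.

```python
# Hypothetical sketch: names and signature are assumptions, not the real
# nexus_deploy.compose_runner API.
NETWORK_PREP_BLOCK = (
    "docker network inspect ollama-internal >/dev/null 2>&1 || \\\n"
    "    docker network create --label managed-by=nexus-stack ollama-internal\n"
)


def render_script(compose_cmd: str, *, network_prep: bool = False) -> str:
    """Render a remote bash script, optionally pre-creating the network."""
    parts = ["set -euo pipefail\n"]
    if network_prep:
        # A failing `inspect` on the left of `||` does not trip `set -e`;
        # only the exit status of the whole list matters, so the block is
        # safe to re-run whether or not the network already exists.
        parts.append(NETWORK_PREP_BLOCK)
    parts.append(compose_cmd + "\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_script("docker compose up -d", network_prep=True))
```

The block is emitted before the compose command so that the network exists by the time Compose resolves its `external` declaration.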

Part 2 — symmetric ownership (1923edf, after Copilot review)

Originally the Ollama stack still declared ollama-internal as a compose-managed network (driver: bridge, name: ollama-internal) while LiteLLM declared it external: true. Joint LiteLLM+Ollama deployments then relied on Compose v2's tolerance for pre-existing networks — version-dependent behaviour that the merged fix never actually exercised on production.

Switch Ollama's compose to also declare ollama-internal as external: true + name: ollama-internal. Network ownership now lives entirely in the pre-create block — neither compose project tries to create it, and Compose's tolerance for pre-existing networks no longer matters.

The pre-create flag is renamed and broadened accordingly: parameter litellm_network_prep → ollama_internal_network_prep, default inference "litellm" in enabled → "litellm" in enabled OR "ollama" in enabled. Same pattern as the existing dify_storage_prep / metabase_storage_prep flags.
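
The broadened default, together with the explicit-override escape hatch covered in the test plan, can be sketched as below. This is an assumption-laden illustration: `resolve_network_prep`, `NETWORK_CONSUMERS`, and the `override` parameter are hypothetical names, and the real `run_compose_up` may structure this differently.

```python
from typing import Iterable, Optional

# Stacks that join ollama-internal as an external network (per the PR text).
NETWORK_CONSUMERS = frozenset({"litellm", "ollama"})


def resolve_network_prep(
    enabled: Iterable[str], override: Optional[bool] = None
) -> bool:
    """Decide whether the pre-create block should be rendered.

    An explicit override (True or False) always wins; otherwise the flag
    is inferred from the enabled-services list.
    """
    if override is not None:
        return override
    return any(name in NETWORK_CONSUMERS for name in enabled)


if __name__ == "__main__":
    for stacks in ([], ["litellm"], ["ollama"], ["litellm", "ollama"]):
        print(stacks, resolve_network_prep(stacks))
```

Using `None` as the "not specified" sentinel keeps `override=False` distinguishable from "no override given", which is what lets an operator force the block off even when LiteLLM is enabled.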

Behaviour matrix after the fix

| Enabled stacks | Pre-create runs? | Network state at compose-up |
|---|---|---|
| neither | no | network absent (correct — no consumer) |
| litellm only | yes | network exists; LiteLLM joins as external; operator wires real-provider keys via config.yaml |
| ollama only | yes | network exists; Ollama joins as external |
| litellm + ollama (joint) | yes (rendered once) | network exists; both stacks join as external; cross-stack DNS http://ollama:11434 resolves with no race on creation |

Updated docs

Replaced the "Requires the ollama stack to be enabled" warning in stacks/litellm/docker-compose.yml with the actual behaviour (pipeline pre-creates the network).

Test plan

  • test_render_ollama_internal_network_prep_only_when_flagged — block present iff flag set
  • test_render_ollama_internal_network_prep_is_idempotent — inspect → || → create short-circuit shape locked
  • test_run_compose_up_network_prep_default_when_litellm_in_enabled — LiteLLM-only default inference
  • test_run_compose_up_network_prep_default_when_ollama_in_enabled — Ollama-only default inference (new in Part 2)
  • test_run_compose_up_network_prep_renders_once_in_joint_case — joint case renders exactly one pre-create block, not duplicated per matching service (new in Part 2)
  • test_run_compose_up_network_prep_omitted_when_neither_in_enabled
  • test_run_compose_up_network_prep_explicit_override_beats_enabled_inference
  • All 51 test_compose_runner.py tests pass
  • pre-commit run clean (ruff format, ruff check, mypy strict)

Live verification: manually applied the network-create on production server while debugging — LiteLLM now runs (https://litellm.nexus-stack.ch returns 200 instead of 502). Next deploy via this code will be self-healing.

Out of scope

  • A more general "external network declarations" mechanism. We have exactly one cross-stack network today (ollama-internal); over-engineering this would be premature abstraction.
  • Defaulting the LiteLLM config.yaml to omit the Ollama model entry when Ollama is disabled. Operator can edit the template; the existing comment in the template already explains the "Option B" wiring for real-provider keys.

LiteLLM's docker-compose declares `ollama-internal` as an external
network so it can reach the Ollama stack's container by service name
when both stacks run side-by-side. Without a network of that name
already existing, `docker compose up` aborts with

    network ollama-internal declared as external, but could not be
    found

BEFORE the container is even created — the operator gets a Bad
Gateway on https://litellm.<domain>/ because nothing is listening
on the proxied port. The hard dependency on Ollama being enabled
defeats LiteLLM's core value proposition (OpenAI-compatible proxy
for ANY provider — operator-supplied OPENAI_API_KEY / ANTHROPIC_API_KEY
should be sufficient).

Fix: add an idempotent pre-compose block to compose_runner that
inspects-then-creates `ollama-internal` whenever LiteLLM is enabled,
mirroring the existing dify_storage_prep / metabase_storage_prep
flag pattern. When Ollama is ALSO enabled, its own compose joins
the same network by name (already pinned via `name: ollama-internal`
on both sides) — no behaviour change for the joint-enabled case.

Tests pin: (a) block presence is gated on the flag, (b) the
inspect-||-create shape is idempotent under set -euo pipefail,
(c) run_compose_up defaults the flag to `"litellm" in enabled`,
(d) explicit override beats inference.
Copilot AI review requested due to automatic review settings May 24, 2026 07:23
github-actions Bot commented May 24, 2026

Contributor

Coverage report — nexus_deploy
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| __init__.py | 5 | 0 | 100% | |
| _remote.py | 15 | 0 | 100% | |
| cli.py | 4 | 0 | 100% | |
| compose_restart.py | 40 | 0 | 100% | |
| compose_runner.py | 80 | 0 | 100% | |
| config.py | 136 | 0 | 100% | |
| firewall.py | 206 | 0 | 100% | |
| gitea.py | 587 | 55 | 90% | 691–692, 697, 720–721, 733–734, 770–771, 783–784, 802–803, 828–829, 851–852, 863–864, 919–920, 928–929, 934, 940–941, 965–966, 999–1000, 1003, 1034–1035, 1076–1077, 1082–1083, 1123–1124, 1155–1156, 1179–1180, 1185–1186, 1285–1286, 1291–1292, 1766, 1770, 1791, 1819–1820, 1907 |
| hetzner_capacity.py | 126 | 0 | 100% | |
| infisical.py | 203 | 0 | 100% | |
| kestra.py | 176 | 3 | 98% | 223, 427, 768 |
| orchestrator.py | 620 | 73 | 88% | 456, 616, 628, 798–799, 804–805, 837–839, 848, 853–855, 866, 903–904, 909–910, 930, 965–966, 971–972, 980, 1005–1006, 1014, 1036–1037, 1042–1043, 1095–1096, 1101–1102, 1335, 1338, 1408, 1414–1415, 1420–1421, 1455, 1562–1563, 1568–1569, 1618–1619, 1624–1625, 1684, 1699, 1756, 1761–1762, 1767–1768, 1775, 1781, 1948, 1955, 1967–1968, 1973–1974, 1980, 1986, 2059–2060, 2081–2082 |
| pipeline.py | 207 | 13 | 93% | 165–166, 350, 388, 468, 485, 580–581, 626–627, 717–718, 765 |
| r2_tokens.py | 113 | 2 | 98% | 87, 150 |
| s3_persistence.py | 199 | 1 | 99% | 315 |
| s3_restore.py | 103 | 0 | 100% | |
| secret_sync.py | 99 | 0 | 100% | |
| seeder.py | 98 | 0 | 100% | |
| service_env.py | 379 | 33 | 91% | 1062, 1064–1066, 1074–1075, 1403–1406, 1411–1417, 1446–1450, 1466–1470, 1492, 1494, 1516–1517, 1524, 1633 |
| services.py | 310 | 1 | 99% | 1939 |
| setup.py | 165 | 13 | 92% | 238, 308–311, 319, 323–328, 344 |
| ssh.py | 56 | 0 | 100% | |
| stack_sync.py | 96 | 0 | 100% | |
| tfvars.py | 44 | 0 | 100% | |
| tofu.py | 86 | 0 | 100% | |
| workspace_coords.py | 101 | 0 | 100% | |
| TOTAL | 4254 | 194 | 95% | |

codecov Bot commented May 24, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

Copilot AI left a comment

Contributor

Pull request overview

This PR makes the LiteLLM stack resilient when Ollama is not enabled by ensuring the external Docker network ollama-internal exists before docker compose up runs, preventing LiteLLM startup from aborting due to a missing external network.

Changes:

  • Add an optional litellm_network_prep pre-compose step in compose_runner to inspect || create the ollama-internal Docker network.
  • Default litellm_network_prep to enabled when "litellm" is in the enabled-services list (with an explicit override escape hatch).
  • Add unit tests covering script rendering, default inference behavior, and explicit override semantics; update the LiteLLM compose comment to reflect the new pipeline behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/nexus_deploy/compose_runner.py Introduces litellm_network_prep and injects an idempotent `docker network inspect …
tests/unit/test_compose_runner.py Adds regression tests validating that the network-prep block is rendered only when intended and that default/override behavior is correct.
stacks/litellm/docker-compose.yml Updates comments to document that the deployment pipeline pre-creates the external ollama-internal network so LiteLLM can start without Ollama.

Comment thread stacks/litellm/docker-compose.yml Outdated
Comment thread tests/unit/test_compose_runner.py Outdated
- docs/stacks/litellm.md: replace the "Ollama MUST be enabled"
  paragraph with the actual post-fix behaviour — the deploy pipeline
  pre-creates ollama-internal idempotently, so LiteLLM starts
  cleanly whether or not Ollama is also enabled. Remove the
  two-step "edits required for no-Ollama" instructions since the
  compose change is no longer needed.

- tests/unit/test_compose_runner.py::test_render_litellm_network_prep_is_idempotent:
  tighten the assertion from a loose `"||" in script` (which could
  pass if any unrelated || appeared in the rendered bash) to a
  full inspect→||→create chain match. Whitespace and the bash
  line-continuation backslash are normalised so the test isn't
  brittle to renderer-side line-wrap tweaks.
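
The tightened assertion described above might look roughly like the following sketch. The helper `normalise_shell` and the sample `rendered` string are illustrative assumptions; the actual test in tests/unit/test_compose_runner.py is not reproduced here.

```python
import re

# The full chain the test wants to see, on one logical line.
EXPECTED_CHAIN = (
    "docker network inspect ollama-internal >/dev/null 2>&1 || "
    "docker network create --label managed-by=nexus-stack ollama-internal"
)


def normalise_shell(script: str) -> str:
    """Join backslash-continued lines, then collapse whitespace runs."""
    joined = re.sub(r"\\\n", " ", script)
    return re.sub(r"\s+", " ", joined).strip()


# Stand-in for renderer output (assumed shape, wrapped as in the PR text).
rendered = (
    "set -euo pipefail\n"
    "docker network inspect ollama-internal >/dev/null 2>&1 || \\\n"
    "    docker network create --label managed-by=nexus-stack ollama-internal\n"
    "docker compose up -d\n"
)

# Matches the whole inspect -> || -> create chain, not just a stray "||".
assert EXPECTED_CHAIN in normalise_shell(rendered)
```

Because the comparison runs on the normalised text, re-wrapping the continuation line or changing its indentation in the renderer would not break the check, while dropping either the `inspect` or the `create` half would.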

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/nexus_deploy/compose_runner.py Outdated
…c ownership

Previously the Ollama stack declared `ollama-internal` as a
compose-managed network (`driver: bridge`, `name: ollama-internal`)
while LiteLLM declared it `external: true`. The pipeline's pre-create
block was gated on LiteLLM-being-enabled, which left two ambiguous
cases:

  * Joint LiteLLM + Ollama: pre-create runs, network exists with the
    `managed-by=nexus-stack` label. Ollama's compose-up then tries to
    treat it as a project-managed network. Modern Compose v2 tolerates
    this with a warning, but the behaviour is version-dependent and
    the joint case was never tested by the previous fix — the merged
    PR (#617) only verified the LiteLLM-only path on production.
  * Ollama-only future: would have been fine before, but breaks if
    Ollama's compose later moves to `external: true` for any reason.

Switch Ollama's compose to also declare `ollama-internal` as
`external: true` + `name: ollama-internal`. Network ownership now lives
entirely in the pre-create block, neither compose project tries to
create it, and Compose's tolerance for pre-existing networks no longer
matters. Rename the gate parameter `litellm_network_prep` →
`ollama_internal_network_prep` since it's no longer LiteLLM-specific,
and broaden the default inference to fire when either stack is enabled.

Added two new tests:
  * `test_run_compose_up_network_prep_default_when_ollama_in_enabled`
    locks the ollama-only firing path (regression guard for the
    inference change).
  * `test_run_compose_up_network_prep_renders_once_in_joint_case`
    confirms the joint case renders exactly one pre-create block, not
    a duplicate per matching service.
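
To show why the joint case is worth pinning, here is a small sketch contrasting a single gate with a naive per-service loop. Both functions (`render_prep_once`, `render_prep_per_service`) are hypothetical and exist only to illustrate the failure mode that the second test guards against.

```python
from typing import Sequence

CONSUMERS = ("litellm", "ollama")
PREP_BLOCK = (
    "docker network inspect ollama-internal >/dev/null 2>&1 || "
    "docker network create --label managed-by=nexus-stack ollama-internal\n"
)


def render_prep_once(enabled: Sequence[str]) -> str:
    """Single boolean gate: at most one block, however many stacks match."""
    return PREP_BLOCK if any(s in CONSUMERS for s in enabled) else ""


def render_prep_per_service(enabled: Sequence[str]) -> str:
    """Naive variant: emits one block per matching service."""
    return "".join(PREP_BLOCK for s in enabled if s in CONSUMERS)


joint = ["litellm", "ollama"]
print(render_prep_once(joint).count("docker network create"))
print(render_prep_per_service(joint).count("docker network create"))
```

A duplicated block would still be harmless at runtime because the inspect-then-create chain is idempotent, so a count-based assertion is the only thing that would catch the regression.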

Addresses Copilot review comment 3308563617 on PR #617.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/nexus_deploy/compose_runner.py
@stefanko-ch stefanko-ch merged commit 3ea36be into main May 27, 2026
9 checks passed
@stefanko-ch stefanko-ch deleted the fix/litellm-cross-stack-network branch May 27, 2026 06:55
stefanko-ch pushed a commit that referenced this pull request May 28, 2026
🤖 I have created a release *beep* *boop*
---


## [0.68.0](v0.67.0...v0.68.0) (2026-05-28)


### 🚀 Features

* **stacks:** Add Evidence — SQL+markdown BI for analytics engineers
([#616](#616))
([ac9ef64](ac9ef64))


### 🐛 Bug Fixes

* **hedgedoc:** Seeded admin account + R2 snapshot/restore persistence
([#619](#619))
([a04d529](a04d529))
* **pipeline:** Pre-create ollama-internal network when LiteLLM enabled
([#617](#617))
([3ea36be](3ea36be))
* **stacks:** Raise Metabase JVM heap to prevent OOM
([#620](#620))
([70bb786](70bb786))
* **stacks:** Remove hardcoded credential fallbacks (audit C1-C3)
([#623](#623))
([3002629](3002629))
* **stacks:** Resolve host-port 8888 collision (adminer ⇔ seaweedfs)
([#624](#624))
([6831d2c](6831d2c))
* **tofu:** Validate cloudflare_account_id, cloudflare_zone_id, domain
shapes ([#622](#622))
([47f25c6](47f25c6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>