Crusoe Solutions Library

Introduction

This repository is a curated collection of solutions designed to deploy and manage infrastructure and other applications on Crusoe Cloud.

Prerequisites

These solutions are built for Crusoe Cloud, and will require you to install some (or all) of the following tools:

Terraform (and the Terraform Provider for Crusoe)
Crusoe CLI

Each solution README will also list its own specific prerequisites.

Solutions

Training

TorchTitan pre-training benchmark as a PyTorchJob for Crusoe Managed Kubernetes

TorchTitan is a widely-used reference Pytorch program for benchmarking the pretraining of Llama 3.1 and other models. This implementation is designed to be run as a PyTorchJob on CMK.

Inference

LangChain × Crusoe AI

The langchain-crusoe package integrates Crusoe's Managed Inference service with the LangChain ecosystem. It provides a ChatCrusoe class that wraps Crusoe's OpenAI-compatible API, giving you access to leading open-source models — including Llama 3.3, DeepSeek V3/R1, Qwen3, Gemma 3, and Kimi-K2 — through a standard LangChain interface.

Key capabilities:

Drop-in LangChain integration via BaseChatOpenAI — streaming, async, tool calling, and structured output work out of the box
LangSmith tracing with ls_provider="crusoe" for built-in observability
Project attribution via CRUSOE_PROJECT_ID header for multi-tenant usage tracking
Flexible configuration — API key, base URL, and project ID all configurable via environment variables

pip install langchain-crusoe

from langchain_crusoe import ChatCrusoe

llm = ChatCrusoe(model="meta-llama/Llama-3.3-70B-Instruct")
response = llm.invoke("Explain MemoryAlloy inference technology in one paragraph.")

See the langchain-crusoe README for full setup instructions and usage examples.

Serving HuggingFace Models on CMK with KServe

Deploy open-source LLMs from HuggingFace on Crusoe Managed Kubernetes (CMK) using KServe and vLLM, from a single-GPU endpoint to disaggregated prefill-decode across heterogeneous GPU pools. Supports both NVIDIA and AMD GPU clusters.

Key capabilities:

NVIDIA GPU serving — single-GPU, multi-node tensor parallelism, and disaggregated prefill-decode across A100/H100 node pools
AMD GPU serving — single-node and multi-node serving on MI300X using ROCm-based vLLM; supports large MoE models like MiniMax-M2
Model deployment — deploy any HuggingFace model with an OpenAI-compatible /v1/chat/completions endpoint; large models (70B+) use persistent storage backed by the Crusoe SSD CSI driver
One-command setup — make setup (NVIDIA) or make setup-amd (AMD) provisions the CMK cluster, installs the GPU operator and KServe, and creates the model namespace end-to-end

See the crusoe-kserve-example README for full setup instructions and usage examples.

Storage

Shared Volumes NFS Setup

This solution will install all the necessary drivers, packages and configurations to enable your Crusoe Cloud VMs to mount Crusoe Shared Volumes via NFS.

OCI Registry Cache for Google Artifact Registry

This is a working solution of an OCI Image registry, on Kubernetes, that acts as a cache for an upstream Google Artifact Registry.

Performance

Multi-VM NCCL Test

Crusoe Cloud GPU VMs are equipped with high-performance NVIDIA Mellanox InfiniBand (IB) networking. This solution will set up your VMs with necessary configurations to use the pre-loaded NCCL all_reduce test on your VMs and test InfiniBand networking performance.

Observability

Crusoe Managed Kubernetes logs to Google Cloud Logging

For your applications running on Crusoe Managed Kubernetes cluster, you can collect, filter and ship logs using Fluent Bit to send to a centralized location. This solution provides a set of Kubernetes manifest files needed to configure those logs to be sent to Google Cloud Logging using Fluent Bit.

Identity & Security

Crusoe to Splunk HEC Log Forwarder

Crusoe Cloud provides a 90-day history of who did what in your cloud, when, where, and with what result - also called Crusoe Audit Logs. This solution provides a sample Python tool to fetch those logs and forward them to a Splunk HTTP Event Collector (HEC).

Networking

/etc/hosts Pin

A daemon that resolves a hostname on a fixed interval and keeps the resulting A/AAAA records in /etc/hosts. Works around undesirable TTL cache values from intermediate DNS resolvers

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
assets		assets
b300-nccltest-cmk-mpijob		b300-nccltest-cmk-mpijob
cmk-as-oidc-provider		cmk-as-oidc-provider
create-vms-and-run-nccl-test		create-vms-and-run-nccl-test
crusoe-kserve-example		crusoe-kserve-example
crusoe-managed-kubernetes-logs-to-gcp		crusoe-managed-kubernetes-logs-to-gcp
crusoe-splunk-hec		crusoe-splunk-hec
crusoe-watch-agent		crusoe-watch-agent
etchosts-pin		etchosts-pin
ipsec-tunnel-cmk		ipsec-tunnel-cmk
jupyterhub-with-crusoe-auth-helmchart		jupyterhub-with-crusoe-auth-helmchart
langchain-crusoe		langchain-crusoe
nccl-allreduce-test-vms		nccl-allreduce-test-vms
registry-cache-gar		registry-cache-gar
shared-volumes-driver-setup		shared-volumes-driver-setup
slurm-custom-image		slurm-custom-image
torchtitan-llama3_1_8B-kubernetes-pytorchjob		torchtitan-llama3_1_8B-kubernetes-pytorchjob
.DS_Store		.DS_Store
CODEOWNERS		CODEOWNERS
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crusoe Solutions Library

Table of contents

Introduction

Prerequisites

Solutions

Training

Inference

Storage

Performance

Observability

Identity & Security

Networking

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Crusoe Solutions Library

Table of contents

Introduction

Prerequisites

Solutions

Training

Inference

Storage

Performance

Observability

Identity & Security

Networking

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages