Closed
Commits
135 commits
5b67cf2
feat(dflash): align server props and thinking controls
easel May 22, 2026
2560086
feat(lucebox): add release CLI and Docker prebuilds
easel May 22, 2026
84ddd04
feat(lucebox): add benchmark and profile evidence suite
easel May 22, 2026
3f600f9
feat(cpp-server): port /props + /v1/messages/count_tokens
easel May 22, 2026
8d6ff04
feat(cpp-server): thinking-budget surface (--think-max-tokens, finish…
easel May 22, 2026
b4b46a4
feat(cpp-server): Level 1 thinking-budget reprompt + codex review fixes
easel May 22, 2026
14f51c3
fix(bench): cap digit-run length in semantic_hint to avoid Python int…
easel May 22, 2026
2a89e62
fix(cpp-server): pre-open <think> when enable_thinking=true (Qwen3.6)
easel May 23, 2026
ed8cbc5
feat(dflash): add bench_long_ctx.py long-context KV sweep
easel Apr 27, 2026
3e8323d
fix(cpp-server): phase-2 gate diagnostics + entrypoint draft-path + m…
easel May 23, 2026
6e66891
fix(lucebox): raise ds4_eval max_tokens 4096 -> 16000 (upstream default)
easel May 23, 2026
5a1d79e
build(bake): expose DFLASH_CUDA_ARCHES as bake variable
easel May 23, 2026
9017c92
feat(lucebox): expose --think-max-tokens via DFLASH_THINK_MAX, defaul…
easel May 23, 2026
53c85c5
fix(entrypoint): priority-order draft resolution to avoid safetensors
easel May 23, 2026
a44a7bc
fix(cpp-server): forward Qwen3.6 <think>/</think> tokens to emitter
easel May 23, 2026
aacb9bb
docs(thinking-budget): document multi-dialect response aliasing
easel May 23, 2026
5059fea
feat(cpp-server): Level 2 in-process force-close on AR path
easel May 23, 2026
43ad46b
chore(dflash): drop Python server.py and tests that import from it
easel May 23, 2026
c7c4bcb
feat(cpp-server,bench): emit reasoning aliases per multi-dialect spec
easel May 23, 2026
3c8a218
Merge origin/main into integration/props-uv-squared-clean
easel May 23, 2026
a1144dd
perf(cpp-server,L2): spec-decode tail-off to AR instead of full bypass
easel May 23, 2026
430c413
bench(scripts): add sweep_ds4_2case.sh — portable ds4-eval 2-case swe…
easel May 23, 2026
8404bce
docs(tuning-snapshots): L2 force-close partial bench — case 1 FAIL→PASS
easel May 23, 2026
6bb6ba1
fix(cpp-server): finish_reason=length + #ifdef out phase2-gate debug
easel May 23, 2026
932fe6d
data(ds4-eval): sindri post-merge sweep snapshot — config A-baseline-…
easel May 23, 2026
05015f5
fix(scripts): sweep_ds4_2case.sh — fetch+rebase before push to avoid …
easel May 23, 2026
408f054
data(ds4-eval): sindri post-merge sweep snapshot — config B-kv-mix
easel May 23, 2026
c39326a
fix(cpp-server,L2): spec-decode→AR tail-off correctness (audit #47)
easel May 24, 2026
34157d2
feat(cpp-server,L2): multi-token close-tag support in BudgetHook
easel May 24, 2026
fea7aae
fix(cpp-server): restore PrefixCache::stats() + full_stats() (regress…
easel May 24, 2026
4685f0e
ci(docker): ship dflash_server binary in cuda12 image
easel May 24, 2026
87bd540
feat(cpp-server,L2): port budget force-close to gemma4 + laguna backends
easel May 24, 2026
5c785f0
fix(cpp-server,L2): force-close budget uses generated-since-entry frame
easel May 24, 2026
6e11681
fix(cpp-server): thinking_tokens / reasoning_tokens use emitter mode-…
easel May 24, 2026
7e5435d
fix(cpp-server): PrefixCache /props snapshot uses atomic hit counters
easel May 24, 2026
b86342d
fix(cpp-server,L2): gemma4 force-close uses generated-since-entry frame
easel May 24, 2026
cfe28e7
fix(cpp-server): extract normalize_anthropic_system helper
easel May 24, 2026
8c37f3c
docs(thinking-budget): rewrite as standalone design spec
easel May 24, 2026
ca09f64
feat(cpp-server): thinking-budget v2 — model-card defaults + 5-tier e…
easel May 24, 2026
7a22c5e
fix(cpp-server): /props + /v1/models advertise all 5 effort tiers + m…
easel May 24, 2026
487de20
fix(cpp-server): effort tiers cap at max_ctx, not default_max_tokens
easel May 24, 2026
7398103
feat(model_cards): add Laguna-XS.2 sidecar (non-reasoning code MoE, s…
easel May 24, 2026
4380e4d
ci(docker): bundle share/model_cards/ sidecars into runtime image
easel May 24, 2026
e183780
data(ds4-eval): cross-backend snapshot — openrouter, vidar, bragi swe…
easel May 24, 2026
e0af2ce
docs(model_cards): add Laguna download_urls to sidecar
easel May 24, 2026
0ed4444
data(ds4-eval): bragi no-think sweep — 5/8 PASS, beats global-budget 2/8
easel May 24, 2026
15ec3fc
docs(tuning-snapshots): update SUMMARY with no-think final results
easel May 24, 2026
ff2784b
docs(specs): add model-cards.md + props-endpoint.md
easel May 24, 2026
a9a16e8
feat(cpp-server): address codex r1/r2 P2 items + add sidecar JSON Schema
easel May 24, 2026
65a1148
fix(bench-http): --no-think sends thinking={type:disabled} explicitly
easel May 24, 2026
ab4d6a8
docs(specs): OpenAPI 3.1 for /props + spec accuracy review
easel May 24, 2026
7e12454
feat(cpp-server,/props): expose model_card wholesale + new budget_env…
easel May 24, 2026
18dd63a
feat(bench): add --area forge for tool-calling eval via forge-guardrails
easel May 24, 2026
8cf241c
merge: origin/main into integration/props-uv-squared-clean
easel May 24, 2026
9b13474
data(forge): OpenRouter Qwen3.6-27b baseline — 28/30 = 93.3% (--no-th…
easel May 24, 2026
449dcfe
docs(tuning-snapshots): add forge --area baseline to SUMMARY (28/30 =…
easel May 24, 2026
79083c5
feat(bench-http): --parallel N runs cases concurrently
easel May 24, 2026
3b80fa8
feat(cpp-server): emit usage.timings.{prefill_ms,decode_ms,decode_tok…
easel May 24, 2026
b838d33
refactor(bench): drop forge-guardrails pypi dep, inline forge.* into …
easel May 24, 2026
1933cd8
feat(model_cards): add Gemma 4 sidecars (26B-A4B-it MoE + 31B-it dense)
easel May 24, 2026
109cebd
feat(cpp-server): thinking-budget v2 + multi-dialect reasoning aliases
easel May 24, 2026
51a208e
data(tuning-snapshots): bragi nothink full 92 + OR 5-model comparison
easel May 24, 2026
ab7c006
feat(bench-http): per-row timed_out flag + prefill tok/s + thinking-t…
easel May 24, 2026
3621d49
docs(run-request): luce-dflash --think 92-case sweep at think_max 4k …
easel May 24, 2026
428ce3e
data(openrouter): Laguna --area forge (28/30 → 5/30 = 16.7%) + ds4-ev…
easel May 24, 2026
56a4355
feat(bench-http): extract reasoning_tokens from OpenAI-style usage de…
easel May 24, 2026
d7d0684
data(openrouter): Laguna ds4-eval fixed-slug — 53/92 = 57.6% (--no-th…
easel May 24, 2026
3d27801
data(openrouter): forge --area cross-model — DS4f 100%, Gemma 4 86.7%…
easel May 24, 2026
27f169d
docs(run-request): sindri RTX 3090 Ti qwen3.6 --no-think full-92
easel May 24, 2026
4ec18f6
data(openrouter): qwen3.6-27b ds4-eval --no-think full 92 — 51/92 = 5…
easel May 24, 2026
3a30ccf
docs(experiments): cache impact + sampling variance design
easel May 24, 2026
cf8eb4b
feat(bench): enrich --area forge rows with iterations[] + ds4-eval pa…
easel May 24, 2026
833ee75
feat(bench-http): capture OR provider + model_version + cost_usd per row
easel May 24, 2026
c35a8a4
feat(bench-http): probe /props at start, record server_info + --host-…
easel May 24, 2026
a8dec65
docs(run-request): forge against vidar native ds4-server (DS4f)
easel May 24, 2026
43b7a46
data(sindri): RTX 3090 Ti qwen3.6-27b ds4-eval --no-think full 92 — 5…
easel May 25, 2026
e242ef3
data(tuning-snapshots): qwen36 v2 + vidar think + OR frontier/fill-ma…
easel May 25, 2026
de6c1ef
perf(qwen35): [ar-decode] summary log + sindri perf sweep iter 1+2
easel May 25, 2026
e64d824
data(perfsweep): validation finds tuned config doesn't beat canonical
easel May 25, 2026
3ed5062
feat(/props): expose chunk + target_device + draft_device under runtime
easel May 25, 2026
1357136
feat(/props): expose chunk + target_device + draft_device + [ar-decod…
easel May 25, 2026
b5e72f8
Merge pr/server (1357136) — /props.runtime expansion lands on PR #269
easel May 25, 2026
e73e18d
docs(run-request): bump hard_limit_reply_budget for Qwen3.6 (schema +…
easel May 25, 2026
b2cb6fe
docs(run-request): engine-side budget signaling overhaul (Qwen3.6)
easel May 25, 2026
eb065d9
feat(bench-http): --area code (HumanEval) integrated into unified har…
easel May 25, 2026
d63d3b0
docs(run-request): bragi gemma4 + laguna config issues block bench ma…
easel May 25, 2026
069c412
feat(server): qwen3.6 budget-signaling overhaul (Phase A+B+C) + phase…
easel May 25, 2026
e245b6b
feat(server): qwen3.6 budget-signaling overhaul (Phase A+B+C) + phase…
easel May 25, 2026
c9ecdf4
Merge pr/server (e245b6b) — budget-signaling overhaul lands on PR #269
easel May 25, 2026
f3116a6
feat(bench-http): --area longctx (long-context frontier) integrated
easel May 25, 2026
3fbeea0
feat(bench-http): --area agent (agent-shape probe) integrated
easel May 25, 2026
c1dac2f
docs(run-request): --area swe (SWE-bench Verified) integration plan
easel May 25, 2026
57fdad0
refactor(bench): centralize HE+autotune prompts in bench_humaneval; l…
easel May 25, 2026
2d226e8
docs(run-request): gemma4 detail — //thought leak + missing thinking
easel May 25, 2026
f1d30f2
fix(cpp-server,gemma4): chat template missing thinking-guard + opener…
easel May 25, 2026
8f3af56
docs(run-request): gemma4 — definitive crash repro at prompt_tokens=169
easel May 25, 2026
734cfac
docs(run-request): reassign gemma4 mul_mat crash investigation to erik
easel May 25, 2026
67c444a
fix(entrypoint): pick draft GGUF that matches target's architecture
easel May 25, 2026
f40aa8f
docs(run-request): gemma4 mul_mat crash resolved — root cause was wro…
easel May 25, 2026
a4411de
docs(experiments): standard thinking-control probe protocol + gemma4-…
easel May 25, 2026
77284cc
docs(experiments): gemma4-26b — sampling + brevity addenda confirm "t…
easel May 25, 2026
0cb2674
fix(server,/props): default_generation_settings reflect model card sa…
easel May 25, 2026
16bb31e
fix(thinking-budget): transition cue + 4096 reply default — gemma4 fo…
easel May 25, 2026
041f491
refactor(server): simplify thinking control — sidecar terminator + dr…
easel May 25, 2026
1569889
feat(bench,gemma4): wire bench sampling overrides + gemma thinking_te…
easel May 25, 2026
c2d725f
feat(server): post-close degenerate-decode watchdog + sibling-field s…
easel May 25, 2026
8538ff9
fix(server,bench): broaden degenerate-decode watchdog + surface flag …
easel May 25, 2026
4e9abda
feat(gemma4): wire prefill/decode timing into GenerateResult (mirrors…
easel May 26, 2026
637dbca
fix(model_cards): laguna-xs.2 — speculator URLs, reasoning toggle, te…
easel May 26, 2026
7786b35
refactor(server): drop Level 1 Phase-2 reprompt fallback
easel May 26, 2026
92f84cd
fix(laguna): correct chat template + eos_chat_id fallback to </assist…
easel May 26, 2026
5f1c7c1
chore: regenerate uv.lock (anthropic dep added; align with pyproject)
easel May 26, 2026
598cc88
data(tuning-snapshots): gemma-4-26b ds4-eval-92 think + nothink + 31b…
easel May 26, 2026
417009d
feat(bench,laguna): add reasoning_effort knob + laguna comparison sweeps
easel May 26, 2026
c501e50
data(tuning-snapshots): laguna think 2x2 matrix complete (bragi + OR)
easel May 26, 2026
3966d98
data(tuning-snapshots): gemma-4-26b thinking-budget visibility sweep
easel May 26, 2026
3f90e7d
fix(bench): omit unset sampling fields so server's card defaults apply
easel May 26, 2026
89b1dfe
refactor(bench): replace luce-hub bench scripts with luce-bench dep
easel May 26, 2026
925d41f
chore: remove benchmark snapshots from lucebox-hub
easel May 26, 2026
2ae6f8d
chore(.gitignore): exclude dflash/docs/tuning-snapshots/
easel May 26, 2026
94adb89
merge: luce-org/main into integration/props-uv-squared-clean
easel May 26, 2026
1df9099
merge: luce-org/main into integration/props-uv-squared-clean (post-re…
easel May 26, 2026
e30abf2
chore(deps): bump luce-bench v0.2.3 → v0.2.4
easel May 27, 2026
490ff95
feat: absorb luce-bench into the monorepo as a uv workspace member
easel May 27, 2026
8d25d27
feat(lucebox): re-wire profile.py to drive benchmarks via luce-bench
easel May 27, 2026
469edf3
fix(ci): docker.yml path filter targets server/ (post-rename)
easel May 27, 2026
a4888f3
fix(server): remove duplicate model-card-resolution block + dup include
easel May 27, 2026
fc78809
chore(luce-bench): drop dormant nested .github + .gitignore
easel May 27, 2026
07bc60b
fix(docker): COPY luce-bench/ workspace member into build context
easel May 27, 2026
8524f58
feat(harness): add run_lucebench.sh — luce-bench as a harness client
easel May 27, 2026
7bbf9af
feat: harness as uv workspace member + lucebox profile delegates to it
easel May 27, 2026
9599f91
chore(deps): merge luce-bench[forge] + harness workspace member
easel May 27, 2026
7501564
feat: port all harness clients (codex, opencode, hermes, pi, openclaw)
easel May 27, 2026
befd00e
fix(harness): inline SPDX license string (build sandbox can't read ..…
easel May 27, 2026
4f900c9
fix(entrypoint): warn loudly when multiple targets in models/
easel May 27, 2026
32 changes: 32 additions & 0 deletions .dockerignore
@@ -0,0 +1,32 @@
# Local venv and Python caches — uv rebuilds inside the image.
.venv/
**/__pycache__/
**/*.pyc

# Build artefacts.
**/build/
**/build-*/
dflash/build/

# Model weights — bind-mount at runtime instead of baking into the image.
dflash/models/
**/*.gguf
**/*.safetensors

# Git metadata. Submodule contents are kept; .git files inside the worktree
# are not needed at build time.
.git/
**/.git
**/.gitignore.local

# Local agent / IDE state.
.claude/
.idea/
.vscode/

# Misc large or volatile.
*.log
*.tmp
*.swp
**/*.bin
**/*.npy
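To sanity-check which paths these patterns keep out of the build context, here is a minimal Python sketch of the matching rules as I understand them from Docker's documentation: `*` and `?` stay within one path segment, `**` spans any number of directories, and a match on a parent directory excludes everything beneath it. It is a simplified model, not Docker's matcher; `!` negation is not handled, and the helper names `pattern_to_regex` and `is_excluded` are hypothetical.

```python
import re

# Patterns copied from the .dockerignore above (comments and blank lines dropped).
PATTERNS = [
    ".venv/", "**/__pycache__/", "**/*.pyc",
    "**/build/", "**/build-*/", "dflash/build/",
    "dflash/models/", "**/*.gguf", "**/*.safetensors",
    ".git/", "**/.git", "**/.gitignore.local",
    ".claude/", ".idea/", ".vscode/",
    "*.log", "*.tmp", "*.swp", "**/*.bin", "**/*.npy",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one ignore pattern into a regex (simplified model, an assumption)."""
    parts = pattern.rstrip("/").split("/")
    out = []
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            # `**` spans any number of directories, including zero.
            out.append(".*" if last else "(?:.*/)?")
            continue
        # `*` and `?` never cross a path separator.
        seg = "".join(
            "[^/]*" if ch == "*" else "[^/]" if ch == "?" else re.escape(ch)
            for ch in part
        )
        out.append(seg if last else seg + "/")
    return re.compile("".join(out))


def is_excluded(path: str, patterns=PATTERNS) -> bool:
    """True if the path, or any parent directory of it, matches a pattern."""
    parts = path.strip("/").split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    regexes = [pattern_to_regex(p) for p in patterns]
    return any(rx.fullmatch(prefix) for rx in regexes for prefix in prefixes)


if __name__ == "__main__":
    for candidate in ("dflash/models/target.gguf", "build.log",
                      "logs/build.log", "server/src/main.cpp"):
        print(candidate, is_excluded(candidate))
```

Under this model the unprefixed `*.log`, `*.tmp`, and `*.swp` entries only match at the context root, while the `**/` entries match at any depth.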
6 changes: 6 additions & 0 deletions .github/workflows/ci.yml
@@ -19,6 +19,12 @@ jobs:
        # full sync and builds megakernel against torch.
        run: bash scripts/check_uv_workspace.sh

      - name: Lint Python surfaces touched by lucebox tooling
        run: uv run --frozen --extra dev ruff check .

      - name: Typecheck lucebox CLI
        run: uv run --frozen --extra dev python -m mypy --package lucebox

  build:
    name: Build (cmake + uv sync --extra megakernel)
    runs-on: ubuntu-latest
147 changes: 147 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,147 @@
name: Docker prebuilds

# Builds the cuda12 lucebox-hub Docker image defined in docker-bake.hcl
# and pushes it to GHCR. The bake file is the source of
# truth for arch matrices and CUDA pinning; this workflow only handles
# fetching submodules, freeing runner disk, signing in to the registry, and
# wiring the cache.

on:
  # Build + push to GHCR when a GitHub Release is published. The release tag
  # becomes one of the image tags via docker/metadata-action's `type=ref,
  # event=tag` + `type=semver` rules below.
  release:
    types: [published]
  # Build-only CI guard on PRs that touch the docker surface. We never push
  # from a PR — even if we wanted to, GITHUB_TOKEN on PRs from forks lacks
  # `packages:write`. The point is to catch Dockerfile / bake-file / arch-
  # list regressions before they land on main.
  pull_request:
    paths:
      - Dockerfile
      - docker-bake.hcl
      - .dockerignore
      - .github/workflows/docker.yml
      - server/CMakeLists.txt
      - server/src/**
      - server/test/**
      - server/include/**
      - server/scripts/**
      - server/deps/**
      - server/pyproject.toml
      - pyproject.toml
      - uv.lock
      - lucebox.sh
      - lucebox/**
  # Manual trigger for one-off rebuilds or pre-release smoke tests. The
  # `push` input controls whether the resulting images land in GHCR or only
  # populate the buildx cache.
  workflow_dispatch:
    inputs:
      push:
        description: "Push images to GHCR after build"
        type: boolean
        default: false

# Single in-flight build per ref. New pushes cancel the previous run so we
# don't queue 30-min compiles.
concurrency:
  group: docker-${{ github.ref }}
  cancel-in-progress: true

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository_owner }}/lucebox-hub

jobs:
  build:
    name: ${{ matrix.variant }}
    # ubuntu-latest = 4 CPU / 16 GB RAM / 14 GB free disk on the GitHub-
    # hosted plan. The disk-free step at the top of the job claws back
    # ~30 GB, which is enough to land a 14 GB image with build cache.
    # CPU is the harder constraint: the fat-binary arch list can take hours
    # on hosted runners. If you outgrow this:
    #   • Larger GitHub-hosted runners (`ubuntu-latest-8-cores`, paid)
    #     halve wall time.
    #   • A self-hosted runner with the host's nvcc avoids the
    #     containerised CUDA toolkit pull entirely.
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    strategy:
      fail-fast: false
      matrix:
        variant: [cuda12]
    steps:
      - name: Free runner disk space
        # The default ubuntu-latest image keeps ~25 GB of preinstalled
        # tooling (Android SDK, .NET, Haskell, ghc, etc.) we don't need.
        # Pinned action; check upstream releases before bumping.
        uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1
        with:
          tool-cache: true
          android: true
          dotnet: true
          haskell: true
          large-packages: false # slow; preinstalled apt packages we don't need
          swap-storage: true

      - uses: actions/checkout@v4
        with:
          # Submodule contents are needed by the cmake build (llama.cpp ggml
          # subtree, mit-han-lab Block-Sparse-Attention). The Dockerfile
          # asserts they're present before running cmake.
          submodules: recursive

      - uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        # Skip on PR runs: we never push from a PR and the token from a fork
        # PR can't `packages:write` anyway.
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Derive image metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # Suffix every tag with the variant so future CUDA stacks can
          # coexist under the same image name. Examples:
          #   ghcr.io/<owner>/lucebox-hub:cuda12
          #   ghcr.io/<owner>/lucebox-hub:v0.2.0-cuda12
          #   ghcr.io/<owner>/lucebox-hub:main-cuda12
          #   ghcr.io/<owner>/lucebox-hub:sha-abc1234-cuda12
          flavor: |
            latest=false
            suffix=-${{ matrix.variant }},onlatest=true
          tags: |
            type=raw,value=${{ matrix.variant }},suffix=,priority=1000,enable=${{ github.event_name == 'release' }}
            type=ref,event=branch
            type=ref,event=tag
            type=ref,event=pr
            type=sha,prefix=sha-
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}

      - name: Build and push
        uses: docker/bake-action@v5
        with:
          files: |
            docker-bake.hcl
            ${{ steps.meta.outputs.bake-file }}
          targets: ${{ matrix.variant }}
          push: ${{ github.event_name == 'release' || (github.event_name == 'workflow_dispatch' && inputs.push) }}
          # gha cache stores layer blobs in the workflow's Actions cache,
          # scoped by variant so future CUDA stacks don't evict each other.
          # mode=max also caches multi-stage intermediate layers (the
          # builder stage with the 30-min nvcc compile), which is the whole
          # point of doing this.
          set: |
            ${{ matrix.variant }}.cache-from=type=gha,scope=${{ matrix.variant }}
            ${{ matrix.variant }}.cache-to=type=gha,scope=${{ matrix.variant }},mode=max
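The gating in this workflow comes down to two expressions: the `if:` on the GHCR login step and the `push:` input on the bake step. The sketch below restates them as plain Python so the behaviour per trigger is easy to read off. The `Run` class and both function names are hypothetical; only the two conditions are taken from the workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for the bits of the GitHub context used below."""
    event_name: str           # github.event_name
    push_input: bool = False  # inputs.push (only meaningful for workflow_dispatch)


def logs_in_to_ghcr(run: Run) -> bool:
    # Mirrors `if: github.event_name != 'pull_request'` on the login step.
    return run.event_name != "pull_request"


def pushes_images(run: Run) -> bool:
    # Mirrors the `push:` expression on the "Build and push" step.
    return run.event_name == "release" or (
        run.event_name == "workflow_dispatch" and run.push_input
    )


if __name__ == "__main__":
    for run in (
        Run("pull_request"),
        Run("release"),
        Run("workflow_dispatch"),
        Run("workflow_dispatch", push_input=True),
    ):
        print(run.event_name, run.push_input,
              "login" if logs_in_to_ghcr(run) else "no-login",
              "push" if pushes_images(run) else "build-only")
```

A pull-request run therefore never logs in and never pushes, and a manual run only pushes when the `push` input is ticked.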
58 changes: 58 additions & 0 deletions .github/workflows/release-luce-bench.yml
@@ -0,0 +1,58 @@
name: Release luce-bench

# Builds and publishes the luce-bench package to PyPI when a tag
# matching `luce-bench-v*` is pushed (e.g. `luce-bench-v0.2.5`). The
# tag's version suffix must match `luce-bench/pyproject.toml`'s
# `[project] version` — the workflow asserts this and fails otherwise.
#
# Uses PyPI trusted publishing (OIDC): set up the publisher in the
# PyPI project settings as `easel/lucebox-hub` repo + this workflow
# file + the `pypi` environment. No long-lived API token needed.

on:
  push:
    tags:
      - 'luce-bench-v*'

permissions:
  contents: read

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/luce-bench
    permissions:
      id-token: write # trusted publishing
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: latest

      - name: Verify tag version matches pyproject.toml
        run: |
          set -euo pipefail
          tag="${GITHUB_REF##*/}" # luce-bench-v0.2.5
          tag_version="${tag#luce-bench-v}" # 0.2.5
          file_version=$(awk -F'"' '/^version[[:space:]]*=/{print $2; exit}' luce-bench/pyproject.toml)
          if [ "$tag_version" != "$file_version" ]; then
            echo "Tag version ($tag_version) does not match luce-bench/pyproject.toml version ($file_version)"
            exit 1
          fi
          echo "Releasing luce-bench v$tag_version"

      - name: Build wheel + sdist
        working-directory: luce-bench
        run: |
          uv build --out-dir dist

      - name: Publish to PyPI (trusted publisher)
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: luce-bench/dist
15 changes: 15 additions & 0 deletions .gitignore
@@ -78,3 +78,18 @@ fix-plan.md
# Harness test artifacts
.harness-work/
health

# lucebox host-side generated config + benchmark output
.lucebox/
models/.lucebox/

# Claude Code session state (worktrees, agent scratchpads)
.claude/

# Benchmark snapshots live in the standalone luce-bench-baselines repo
# (https://github.com/easel/luce-bench-baselines) — not in lucebox-hub.
dflash/docs/tuning-snapshots/

# luce-bench --sweep default output dir (per-host bench runs); reference
# baselines live in github.com/easel/luce-bench-baselines.
luce-bench/snapshots/