diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..01913b7
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,66 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What this is
+
+`onde-cli` is a native Rust terminal UI (binary name `onde`) for managing an Onde Inference account and running a local model pipeline: fine-tune a safetensors small language model with LoRA, merge the adapter, export to GGUF, test it in a local chat, upload it to Hugging Face, and assign it to an Onde app. All packaging (npm, PyPI, NuGet, pub.dev, Homebrew, crates.io) is thin wrappers around this one binary; the Rust crate is the source of truth.
+
+## Build, run, test
+
+```sh
+cargo build # debug build
+cargo run # launches the TUI
+cargo test # unit tests only (the heavy ones are #[ignore])
+cargo clippy --all-targets
+```
+
+The build **bakes credentials in at compile time** via `build.rs`, which reads `.env` (or the environment in CI). The build fails if any of these are missing: `ONDE_APP_ID`, `ONDE_APP_SECRET`, `GRESIQ_API_KEY`, `GRESIQ_API_SECRET`. `HF_TOKEN` is optional (defaults to empty). Changing `.env` triggers a rebuild. These are exposed in code as `env!(...)` constants in `src/app.rs`.
+
+On macOS the build pulls in candle's `metal` and `accelerate` features, so inference and training run on Metal; elsewhere they fall back to CPU.
+
+### Tests that load real models
+
+`src/gguf.rs` has two `#[ignore]` integration tests that export a ~1GB GGUF and run inference through `onde::mistralrs`. They need a Qwen3-0.6B model cached in the Onde App Group container and skip gracefully when it is absent.
+
+```sh
+cargo test gguf::tests::exported_qwen3_gguf_is_runnable -- --ignored --nocapture
+cargo test gguf::tests::finetune_merge_export_run -- --ignored --nocapture
+```
+
+Env knobs for these: `ONDE_TEST_MODEL_DIR`, `ONDE_TEST_DTYPE` (`f16`/`q8_0`), `ONDE_TEST_LR`.
+
+### Debugging the TUI
+
+`main.rs` redirects both stdout and stderr to `~/.cache/onde/debug.log` before ratatui takes the alternate screen, because `mistral.rs` writes to both fds and would otherwise tear up the TUI. To see what the app or the inference engine is doing, tail that log; `println!`/`eprintln!`/`log::*` all land there, not on screen.
+
+## Architecture
+
+### TUI event loop (`app.rs`, `ui.rs`, `main.rs`)
+
+The whole app is one `App` struct plus a `Screen` enum acting as a state machine. `app::run` owns a single `tokio::sync::mpsc` channel of `AuthEvent`s and a `crossterm` `EventStream`, multiplexed with `tokio::select!`. Keystrokes mutate `App` synchronously; anything slow (network calls, downloads, fine-tune, merge, GGUF export, chat inference) is spawned as a background tokio task or OS thread that streams progress back as `AuthEvent`s, which `App::apply` folds into state. `ui.rs` is a pure render of `App` and intentionally does not depend on the SDK directly (it re-exports `OndeApp`/`OndeModel` through `app.rs`). When adding a feature, the pattern is: add a `Screen` variant, a key handler, an `AuthEvent` variant, and a background task that emits progress.
+
+Background work uses two task kinds deliberately: network/IO uses `tokio::spawn`; CPU-heavy tensor work (`finetune`, `merge`, `gguf`) uses `std::thread::spawn` so it does not starve the async runtime.
+
+### The local model pipeline
+
+These modules form the fine-tune-to-deploy chain, each running on a background thread and streaming a `*Progress` enum:
+
+- `finetune.rs` — hand-written LoRA trainer (candle). Builds a Qwen forward pass (RMSNorm, RoPE, GQA, optional Qwen3 QK-norm), trains LoRA A/B on q/v projections in F32, writes `lora_adapter.safetensors`. Gradients are sanitized (non-finite elements zeroed) and globally norm-clipped before each AdamW step; a step is skipped entirely if its global grad norm is non-finite. Skipping this guard corrupts every weight after the first bad step.
+- `merge.rs` — folds the LoRA adapter back into base weights (`W + scale·(B@A)`), writes a merged `model.safetensors` plus copied config/tokenizer.
+- `gguf.rs` — **hand-rolled GGUF writer** (no llama.cpp). Conventions that must hold for mistral.rs/candle to load and run the file correctly: tensor dims are written innermost-first (reverse of safetensors shape, because candle reverses on read); `head_dim` comes from config and is emitted as `attention.key_length`/`value_length` (Qwen3 decouples it from `hidden_size/num_heads`); `token_type` is an INT32 array; `general.architecture` is chosen from `model_type` so Qwen3 routes to candle's `quantized_qwen3` loader.
+- `chat.rs` — loads a local GGUF via `onde::mistralrs::GgufModelBuilder` (the same engine the Onde SDK uses) and streams a multi-turn chat, so a model can be tested before publishing.
+- `hf_upload.rs` / `hf_clone.rs` / `hf_search.rs` / `hf.rs` — Hugging Face Hub: upload the GGUF, check/create a repo, search models, and resolve/merge the local HF cache (incl. the macOS Onde App Group container).
+
+### Account / deploy side (`gresiq.rs`, `token.rs`, `project.rs`)
+
+`gresiq.rs` wraps `smbcloud-gresiq-sdk` for apps and the model catalog. "Deploying" a model means `assign_model(app_id, model_id)` against a catalog entry; the end app (e.g. `rumilearnpersian`) then fetches that assignment through the Onde SDK's `load_assigned_model` and downloads the GGUF. The CLI can only assign models that already exist in the GresIQ catalog. `token.rs` persists the auth token; `project.rs` manages per-project fine-tune workspaces under `~/.onde`.
+
+### The `onde` dependency
+
+The `onde` crate provides the inference engine (`onde::mistralrs`, a vendored mistral.rs) and `onde::inference::models::SUPPORTED_MODEL_INFO` (the supported-model catalog the inference picker mirrors). It is normally the published crate; `Cargo.toml` has commented `path`/`[patch.crates-io]` blocks for developing against local checkouts of `onde`, the `smbcloud-*` crates, and `candle`. Never commit with those uncommented.
+
+## Conventions
+
+- Git: merge feature/release/hotfix branches with `--no-ff` (explicit merge commits). Tag the merge commit that holds the final release state. See `.agents/skills/git/SKILL.md`.
+- Distribution changes: see `.agents/skills/distribution/SKILL.md` before touching any wrapper package; keep all channel versions aligned with the Rust crate version.
diff --git a/src/app.rs b/src/app.rs
index 3a91723..31d137b 100644
--- a/src/app.rs
+++ b/src/app.rs
@@ -97,35 +97,44 @@ pub struct AdapterEntry {
     pub kind: ArtifactKind,
 }
 
-impl AdapterEntry {
-    /// Classify the location of this artifact for display purposes.
-    ///
-    /// - `"Onde Inference"`: inside the shared App Group container
-    /// - `"HF Cache"`: inside `~/.cache/huggingface/hub` or `$HF_HOME`
-    /// - the raw path string: anything else, usually a custom or local export
-    pub fn location_label(&self) -> String {
-        let path_str = self.path.to_string_lossy();
+/// Classify a filesystem path into a short, human-friendly location label.
+///
+/// Long absolute paths (e.g. the App Group container) get truncated to
+/// gibberish in narrow TUI fields, so anywhere we'd otherwise print a raw
+/// path we show this label instead.
+///
+/// - `"Onde Inference"`: inside the shared App Group container
+/// - `"HF Cache"`: inside `~/.cache/huggingface/hub` or `$HF_HOME`
+/// - the parent directory string: anything else, usually a custom or local export
+pub fn location_label_for_path(path: &std::path::Path) -> String {
+    let path_str = path.to_string_lossy();
+
+    // App Group container (macOS)
+    if path_str.contains("group.com.ondeinference.apps") {
+        return "Onde Inference".to_string();
+    }
 
-        // App Group container (macOS)
-        if path_str.contains("group.com.ondeinference.apps") {
-            return "Onde Inference".to_string();
-        }
+    // Standard HF cache locations
+    if path_str.contains(".cache/huggingface/hub") {
+        return "HF Cache".to_string();
+    }
+    if let Ok(hf_home) = std::env::var("HF_HOME")
+        && path_str.starts_with(&hf_home)
+    {
+        return "HF Cache".to_string();
+    }
 
-        // Standard HF cache locations
-        if path_str.contains(".cache/huggingface/hub") {
-            return "HF Cache".to_string();
-        }
-        if let Ok(hf_home) = std::env::var("HF_HOME")
-            && path_str.starts_with(&hf_home)
-        {
-            return "HF Cache".to_string();
-        }
+    // For custom or local paths, just show the directory.
+    path.parent()
+        .map(|p| p.to_string_lossy().to_string())
+        .unwrap_or_else(|| path_str.to_string())
+}
 
-        // For custom or local paths, just show the directory.
-        self.path
-            .parent()
-            .map(|p| p.to_string_lossy().to_string())
-            .unwrap_or_else(|| path_str.to_string())
+impl AdapterEntry {
+    /// Classify the location of this artifact for display purposes.
+    /// See [`location_label_for_path`].
+    pub fn location_label(&self) -> String {
+        location_label_for_path(&self.path)
     }
 
     /// Whether this GGUF should show the upload-to-HuggingFace UI.
diff --git a/src/gguf.rs b/src/gguf.rs
index 18faaa7..393865c 100644
--- a/src/gguf.rs
+++ b/src/gguf.rs
@@ -1078,7 +1078,10 @@ mod tests {
         });
         let adapter_path = adapter_path.expect("fine-tune must produce an adapter");
         eprintln!("[e2e] fine-tune done, last loss {last_loss}");
-        assert!(last_loss.is_finite(), "training loss is NaN/Inf — training diverged");
+        assert!(
+            last_loss.is_finite(),
+            "training loss is NaN/Inf — training diverged"
+        );
 
         // 2. Merge.
         let merged_dir = work.join("merged");
@@ -1158,8 +1161,15 @@ mod tests {
             Ok("f16") => GgufDtype::F16,
             _ => GgufDtype::Q8_0,
         };
-        eprintln!("[test] model_dir={} dtype={}", model_dir.display(),
-            if matches!(dtype, GgufDtype::F16) { "f16" } else { "q8_0" });
+        eprintln!(
+            "[test] model_dir={} dtype={}",
+            model_dir.display(),
+            if matches!(dtype, GgufDtype::F16) {
+                "f16"
+            } else {
+                "q8_0"
+            }
+        );
 
         let out_dir = std::env::temp_dir().join("onde-gguf-test");
         std::fs::create_dir_all(&out_dir).unwrap();
diff --git a/src/ui.rs b/src/ui.rs
index 09e109d..3c4f0b7 100644
--- a/src/ui.rs
+++ b/src/ui.rs
@@ -1860,10 +1860,21 @@ fn render_finetune_done(frame: &mut Frame, app: &App, adapter_path: &std::path::
         .style(Style::new().bg(C_SURFACE_STRONG));
     let path_inner = path_block.inner(rows[3]);
     frame.render_widget(path_block, rows[3]);
+    // Show a friendly location badge plus the file name instead of the raw
+    // absolute path, which truncates to gibberish in this narrow box.
+    let adapter_file = adapter_path
+        .file_name()
+        .map(|n| n.to_string_lossy().to_string())
+        .unwrap_or_default();
     frame.render_widget(
-        Paragraph::new(adapter_path.to_string_lossy().to_string())
-            .style(Style::new().fg(C_NEON))
-            .wrap(Wrap { trim: true }),
+        Paragraph::new(Line::from(vec![
+            Span::styled(
+                format!("[{}]", crate::app::location_label_for_path(adapter_path)),
+                Style::new().fg(C_NEON).bold(),
+            ),
+            Span::styled(format!(" {adapter_file}"), Style::new().fg(C_TEXT)),
+        ]))
+        .wrap(Wrap { trim: true }),
         path_inner,
     );
 
@@ -1938,8 +1949,8 @@ fn render_merge_gguf_section(frame: &mut Frame, app: &App, area: Rect) {
                 Span::styled("✓ ", Style::new().fg(C_NEON)),
                 Span::styled("Merged → ", Style::new().fg(C_NEON).bold()),
                 Span::styled(
-                    output_path.to_string_lossy().to_string(),
-                    Style::new().fg(C_TEXT),
+                    format!("[{}]", crate::app::location_label_for_path(output_path)),
+                    Style::new().fg(C_TEXT).bold(),
                 ),
             ])),
             rows[0],