diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..f65fc3d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,90 @@
+# Contributing to Arkor
+
+Thanks for your interest! Arkor is in **alpha**: we're moving fast, breaking things on purpose, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. Issues, discussion, and PRs are all welcome.
+
+## Ways to help
+
+| Effort                  | What's most useful |
+| ----------------------- | ----------------------------------------------------------------------------------- |
+| **5 min**               | Try the [Quickstart](README.md#quickstart) and [open an issue](https://github.com/arkorlab/arkor/issues/new) about anything that confused you, broke, or felt un-TypeScript. |
+| **An afternoon**        | Pick up a [`good first issue`](https://github.com/arkorlab/arkor/labels/good%20first%20issue) or send a small PR (doc fixes, template tweaks, error-message polish). |
+| **Ongoing**             | Hop into [Discord](https://discord.gg/YujCZYGrEZ) and tell us what model + dataset + workflow you wish worked. We use this to prioritize. |
+
+If you have an idea for a non-trivial change (new SDK factory, CLI command, Studio view), please open an issue first so we can align on the API shape before you write code.
+
+## Repo layout
+
+```
+arkor/
+├── packages/
+│   ├── arkor/          # SDK + CLI + bundled local Studio (published to npm)
+│   ├── create-arkor/   # `pnpm create arkor` scaffolder (published to npm)
+│   ├── cli-internal/   # private helpers shared by arkor + create-arkor
+│   └── studio-app/     # Vite + React SPA bundled into `arkor`
+├── e2e/cli/            # vitest-driven E2E suite for the scaffolder & build
+├── assets/             # README / OG images
+└── turbo.json          # build / test orchestration
+```
+
+`cli-internal`, `studio-app`, and `e2e/cli` are private and never published.
+
+## Development setup
+
+Please use **Node.js 24** (preferably the latest release) and **pnpm 10.21+**.
+
+```bash
+git clone https://github.com/arkorlab/arkor.git
+cd arkor
+pnpm install
+pnpm build      # turbo run build (covers all packages)
+pnpm test       # unit tests across the monorepo
+pnpm typecheck  # tsc across the monorepo
+```
+
+To work on a specific package:
+
+```bash
+pnpm --filter arkor dev              # tsdown --watch on the SDK/CLI
+pnpm --filter @arkor/studio-app dev  # vite dev server for the Studio SPA
+pnpm --filter create-arkor dev       # tsdown --watch on the scaffolder
+```
+
+To run the E2E scaffolder/build suite (slow; spawns real CLIs in temp dirs):
+
+```bash
+pnpm --filter @arkor/e2e-cli test
+# Skip the `install` step inside fixtures:
+SKIP_E2E_INSTALL=1 pnpm --filter @arkor/e2e-cli test
+```
+
+## Trying your local build
+
+The fastest loop is to scaffold a fresh project from the workspace build:
+
+```bash
+pnpm build
+cd /tmp && node /path/to/arkor/packages/create-arkor/dist/bin.mjs my-arkor-app
+cd my-arkor-app && pnpm dev
+```
+
+Studio runs at `http://127.0.0.1:4000` with a CSRF token injected per launch.
+
+## Pull request guidelines
+
+- **One concern per PR.** Smaller diffs land faster.
+- **Tests where the surface is testable.** SDK / CLI / scaffolder logic should have a vitest case. Studio UI changes can come with a screenshot or short clip instead.
+- **Breaking changes are fine** during alpha. We don't ship compatibility shims between `0.0.x` versions; just note them in the PR description so the changelog stays honest.
+- **Don't reintroduce removed verbs.** `arkor train`, `arkor deploy`, `arkor jobs`, and `arkor logs` were removed deliberately. Training and deploying are TS configs that run when the entrypoint executes, not CLI verbs. The CLI surface is `dev` / `build` / `start` plus auth.
+
+## Reporting bugs and security issues
+
+- **Bugs**: [GitHub Issues](https://github.com/arkorlab/arkor/issues/new) with steps to reproduce, expected vs. actual behavior, and your Node + pnpm versions.
+- **Security**: please email security@arkor.ai instead of filing a public issue. We'll acknowledge within 48 hours.
+
+## Code of conduct
+
+Be kind, assume good faith, and keep technical disagreement technical. Anything else (harassment, personal attacks, exclusionary behavior) is grounds for being asked to leave. The maintainers' call is final.
+
+## License
+
+By contributing, you agree your contributions are licensed under the [MIT license](LICENSE.md).
diff --git a/README.md b/README.md
index 35c3b8f..0ad401a 100644
--- a/README.md
+++ b/README.md
@@ -1,78 +1,194 @@
-# Arkor
-
-> Fine-tune and deploy open-weight models with TypeScript.
-
-Arkor is a TypeScript framework for improving and shipping custom open-weight
-models. The audience is product engineers who already build with TypeScript /
-Next.js and want custom model behaviour without standing up an ML
-infrastructure team. Arkor handles GPUs, fine-tuning, and serving underneath
-so the user's job stays "write some TypeScript".
-
-> Status: alpha (`0.0.1-alpha.0`). Public APIs may change without notice.
+

+ + + Arkor + +

+ +

Arkor

+ +

The TypeScript framework for fine-tuning open-weight LLMs

+ +

+ Ship custom open-weight models the same way you ship your TypeScript app. + Type-safe configs, hot reload, a local Studio (web UI) to start and watch runs, and managed GPUs. +

+ +

+ npm + MIT + node ≥22.6 + alpha + Discord +

+ +

+ Docs  ·  + Quickstart  ·  + Why Arkor  ·  +

+
+> [!WARNING]
+> Arkor is **alpha** (`0.0.1-alpha.0`). APIs change without notice. We're shipping in public, and feedback shapes what lands next.
+
+
 ## Quickstart
 
 ```bash
-pnpm create arkor my-app
-cd my-app
-pnpm install
-pnpm arkor login # Auth0 PKCE flow; --anonymous also works
-pnpm arkor dev # opens the local Studio GUI on http://127.0.0.1:4000
+pnpm create arkor my-arkor-app
+cd my-arkor-app
+pnpm dev
 ```
 
-`arkor dev` is the primary surface — it starts a local Studio with hot
-reload over your TypeScript and a GUI for running training, inspecting jobs,
-and trying out checkpoints in a Playground.
+That's the whole setup.
+**No signup required:** `arkor dev` opens **Studio**, a local web UI at `http://127.0.0.1:4000`, and silently bootstraps an anonymous workspace so you can fire off a real training run right away.
 
-CLI-only flow (no GUI):
+Run `arkor login` later if you want to claim your work under an account.
 
-```bash
-pnpm arkor build # bundles src/arkor/ into .arkor/build/index.mjs
-pnpm arkor start # runs the build artifact on the cloud
+### Pick a template
+
+The scaffolder asks which template you want.
+All three start from the same small open-weight base (`unsloth/gemma-4-E4B-it`) so the first run finishes quickly.
+
+| Template  | What it shows                                                       | Dataset                            |
+| --------- | ------------------------------------------------------------------- | ---------------------------------- |
+| `minimal` | The smallest working `createTrainer({ ... })` call.                 | `yahma/alpaca-cleaned` (500 rows)  |
+| `alpaca`  | Instruction-tuning with mid-training `infer()` on every checkpoint. | `yahma/alpaca-cleaned` (1000 rows) |
+| `chatml`  | Multi-turn chat fine-tuning over a real chat dataset.               | `stingning/ultrachat` (500 rows)   |
+
+Skip the prompt with `pnpm create arkor my-arkor-app --template alpaca`.
+
+## Why Arkor
+
+Custom open-weight models are a real option today thanks to years of work in the Python ML ecosystem by the people and companies who built it out.
+Arkor stands on that foundation.
+
+What we wanted, and didn't find, was a path that fits how TypeScript and Node developers already work: a workflow where fine-tuning, evaluation, and serving live in the same codebase as the product, with the same editor, types, and review flow.
+
+Type-safe configs instead of separate config files. Hot reload over your training code. A local Studio for the dev loop.
+
+The phrase we keep coming back to: **ship the model the same way you ship the product.** If that sounds right, you're the audience.
+
+## What works today
+
+- ✅ **Fine-tune an open-weight LLM from one file.** `createTrainer({ model, dataset, lora, ... })` runs LoRA training on the base model you point it at.
+- ✅ **Pull data from Hugging Face, or bring your own URL.** The `dataset` field accepts any Hugging Face dataset name (with optional `split`) or a blob URL to a JSONL file.
+- ✅ **React to training in code, not in a dashboard.** Lifecycle callbacks (`onStarted`, `onLog`, `onCheckpoint`, `onCompleted`, `onFailed`) fire as the run streams from the cloud, fully typed.
+- ✅ **Sanity-check the model before the run finishes.** Inside `onCheckpoint`, call `infer({ messages })` against the model as it's being trained.
+- ✅ **Watch the run in a local Studio.** `arkor dev` opens a UI with a jobs list, live loss chart, log tail, and a Playground for chatting with your fine-tuned models.
+- ✅ **Try it without an account.** Anonymous workspace by default; run `arkor login` (Auth0 PKCE) to claim your work later.
+
+## What's coming next
+
+- ⏳ **Deploy a fine-tuned model as an inference endpoint** with `createDeploy(...)`.
+- ⏳ **Run evaluations on every checkpoint** with `createEval(...)`.
+- ⏳ **Bring your own datasets and base models.** CSV / JSONL uploads and custom Hugging Face base models.
+- ⏳ **Team and multi-org workspaces.**
+- ⏳ **Self-host the training backend.** Today we host it.
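One way to picture the lifecycle callbacks listed under "What works today" is as a small dispatcher that routes each streamed event to the matching handler. The sketch below is illustrative only: the `RunEvent` shapes, the `Callbacks` type, and the `dispatch` / `replay` functions are hypothetical names invented for this example and are not taken from Arkor's SDK.

```ts
// Hypothetical event shapes; Arkor's real streamed events may differ.
type RunEvent =
  | { type: "started"; jobId: string }
  | { type: "log"; step: number; loss: number }
  | { type: "checkpoint"; step: number }
  | { type: "completed" }
  | { type: "failed"; error: string };

// Hypothetical callback bag that mirrors the callback names in the list above.
type Callbacks = {
  onStarted?: (e: { jobId: string }) => void;
  onLog?: (e: { step: number; loss: number }) => void;
  onCheckpoint?: (e: { step: number }) => void;
  onCompleted?: () => void;
  onFailed?: (e: { error: string }) => void;
};

// Route one streamed event to its handler; handlers that were not provided are skipped.
function dispatch(event: RunEvent, callbacks: Callbacks): void {
  switch (event.type) {
    case "started":
      callbacks.onStarted?.({ jobId: event.jobId });
      break;
    case "log":
      callbacks.onLog?.({ step: event.step, loss: event.loss });
      break;
    case "checkpoint":
      callbacks.onCheckpoint?.({ step: event.step });
      break;
    case "completed":
      callbacks.onCompleted?.();
      break;
    case "failed":
      callbacks.onFailed?.({ error: event.error });
      break;
  }
}

// Feed a list of events through the dispatcher in order, as a stream consumer would.
function replay(events: RunEvent[], callbacks: Callbacks): void {
  for (const event of events) dispatch(event, callbacks);
}
```

Because `RunEvent` is a discriminated union, the compiler narrows `event` inside each `case`, which is roughly what "fully typed" callbacks would look like from the caller's side.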
+
+## A taste of the API
+
+```ts
+// src/arkor/trainer.ts
+import { createTrainer } from "arkor";
+
+export const trainer = createTrainer({
+  name: "support-bot-v1",
+  model: "unsloth/gemma-4-E4B-it",
+  dataset: { type: "huggingface", name: "yahma/alpaca-cleaned", split: "train[:1000]" },
+  lora: { r: 16, alpha: 16 },
+  maxSteps: 100,
+  callbacks: {
+    onLog: ({ step, loss }) => console.log(`step=${step} loss=${loss}`),
+    onCheckpoint: async ({ step, infer }) => {
+      const res = await infer({ messages: [{ role: "user", content: "Hello!" }] });
+      console.log(`ckpt @ ${step}:`, await res.text());
+    },
+  },
+});
 ```
 
+```ts
+// src/arkor/index.ts ← discovered by `arkor dev` / `arkor build`
+import { createArkor } from "arkor";
+import { trainer } from "./trainer";
+
+export const arkor = createArkor({ trainer });
+```
+
+`src/arkor/index.ts` is the file the CLI and Studio look for.
+Your `trainer` lives in a sibling file and is registered through `createArkor`. `deploy` and `eval` will work the same way.
+
+To add a new one, drop a file and register it; no scaffolder rerun needed.
+
+
+
 ## What's in a project
 
 ```
-my-app/
+my-arkor-app/
 ├── src/arkor/
-│   ├── index.ts # umbrella — `createArkor({ trainer })`
-│   └── trainer.ts # `createTrainer({ name, model, dataset, ... })`
-├── arkor.config.ts # training defaults
-├── .arkor/ # state + build artifact (gitignored)
-└── package.json
+│   ├── index.ts      # createArkor({ trainer }) ← discovered by the CLI / Studio
+│   └── trainer.ts    # createTrainer({ ... })
+├── arkor.config.ts
+├── .arkor/           # state + build artifacts (gitignored)
+└── package.json      # dev / build / start
 ```
 
-The umbrella is what the CLI and Studio discover. Per-role primitives —
-`trainer` today, `deploy` and `eval` later — live in sibling files and get
-gathered on `createArkor`. Adding a new primitive is "drop a file, register
-it on the umbrella": no scaffold change required.
-
 ## CLI
 
-| Command | Purpose |
-|---|---|
-| `arkor init` | Scaffold a new project in the current directory |
-| `arkor login` / `logout` / `whoami` | Auth0 PKCE / anonymous tokens |
-| `arkor dev` | Launch the local Studio (hot reload + GUI) |
-| `arkor build` | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` |
-| `arkor start` | Run the build artifact (auto-builds when missing) |
+| Command                              | Purpose |
+| ------------------------------------ | ---------------------------------------------------------------------- |
+| `arkor init`                         | Scaffold a new project in the current directory |
+| `arkor login` / `logout` / `whoami`  | Auth0 PKCE / anonymous tokens |
+| `arkor dev`                          | Launch the local Studio web UI (with hot reload) |
+| `arkor build`                        | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` |
+| `arkor start`                        | Run the build artifact (auto-builds when missing) |
+
+`pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows live behind that one command.
+
+## Architecture
+
+`arkor dev` boots a [Hono](https://hono.dev) server on `127.0.0.1:4000` that hot-reloads your code and serves a Vite + React SPA from the same origin.
+
+The SPA talks to your code via per-launch CSRF-token-gated `/api/*` routes (loopback-only, with a `Host` header guard against DNS rebinding); your code talks to the Arkor training backend over authenticated HTTPS.
+
+Training runs on managed GPUs; checkpoints stream back as SSE events that fire your `callbacks.*` in-process.
+
+## Repository
+
+| Package                                        | What it is |
+| ---------------------------------------------- | ------------------------------------------- |
+| [`arkor`](packages/arkor)                      | SDK + CLI + bundled local Studio |
+| [`create-arkor`](packages/create-arkor)        | `pnpm create arkor` scaffolder |
+
+Requires Node.js 22.6+.
+(To contribute to this repository, please use Node.js 24, preferably the latest release.)
+
+Works with pnpm / npm / yarn / bun.
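As a rough illustration of the request guard described under Architecture (loopback-only `Host` check plus a per-launch CSRF token on `/api/*`), here is a dependency-free sketch. Everything in it is an assumption made for the example: the `GuardInput` shape, the `isLoopbackHost` and `guardApiRequest` names, and the idea that the token arrives in a request header. It does not reflect how Arkor's Hono server is actually written.

```ts
// Hypothetical request summary; a real server would read these from the incoming request.
type GuardInput = {
  path: string;
  host: string | undefined; // value of the Host header
  csrfToken: string | undefined; // value of a hypothetical CSRF header
};

type GuardResult = { ok: true } | { ok: false; status: number; reason: string };

// Accept only Host headers that name the loopback interface on the expected port.
// A DNS-rebinding attack arrives with the attacker's hostname in Host, so it fails here.
function isLoopbackHost(host: string | undefined, port: number): boolean {
  if (!host) return false;
  const idx = host.lastIndexOf(":");
  const hostname = idx === -1 ? host : host.slice(0, idx);
  const hostPort = idx === -1 ? "" : host.slice(idx + 1);
  const isLoopbackName = hostname === "127.0.0.1" || hostname === "localhost";
  return isLoopbackName && hostPort === String(port);
}

// Check the Host header on every request, then require the launch token on /api/* routes.
// Production code would normally compare tokens in constant time; plain equality keeps the sketch short.
function guardApiRequest(req: GuardInput, port: number, launchToken: string): GuardResult {
  if (!isLoopbackHost(req.host, port)) {
    return { ok: false, status: 403, reason: "host not allowed" };
  }
  if (req.path.startsWith("/api/") && req.csrfToken !== launchToken) {
    return { ok: false, status: 403, reason: "bad csrf token" };
  }
  return { ok: true };
}
```

The two checks cover different threats: the `Host` guard rejects requests that reached the loopback port through a rebinding hostname, while the per-launch token stops other local pages from calling `/api/*` without having been served by this Studio session.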
-`pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows
-live behind that one command.
+## We're shipping in public
 
-## Packages
+Arkor is alpha, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. If that's you:
 
-| Package | What it is |
-|---|---|
-| [`arkor`](packages/arkor) | The SDK + CLI + bundled local Studio |
-| [`create-arkor`](packages/create-arkor) | `pnpm create arkor` scaffolder |
+- **[File an issue](https://github.com/arkorlab/arkor/issues/new)** with the model + dataset + workflow you wish worked. We read everything.
+- **Star the repo** if you want updates as we move toward `0.1`.
+- **[Join Discord](https://discord.gg/YujCZYGrEZ)** for live discussion and early-access pings.
 
-## Requirements
+We're especially curious about: which open-weight base models you'd reach for first, what you'd want from `createDeploy` / `createEval`, and what breaks when you try the alpha.
 
-- Node.js 22.6+ (the SDK relies on stable APIs from that line)
-- pnpm / npm / yarn / bun all work for installs
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup.
 
 ## License
 
-MIT — see [LICENSE.md](LICENSE.md).
+[MIT](LICENSE.md).
diff --git a/assets/logo-dark.svg b/assets/logo-dark.svg
new file mode 100644
index 0000000..6b55732
--- /dev/null
+++ b/assets/logo-dark.svg
@@ -0,0 +1,10 @@
+
+
+
+
+
diff --git a/assets/logo.svg b/assets/logo.svg
new file mode 100644
index 0000000..5a4e8e9
--- /dev/null
+++ b/assets/logo.svg
@@ -0,0 +1,9 @@
+
+
+
+
+