Add contributing guidelines and enhance README #28
Merged
+277
−52
7 commits
e8f4770 docs: add contributing guidelines to the repository (soleil-colza)
f6d056e feat: update README and add logo assets for improved branding and cla… (soleil-colza)
485b1ee minor wording fix (soleil-colza)
2c873f1 docs: update Node.js version requirement in contributing guidelines (soleil-colza)
0c0a9a5 docs: update pnpm version requirement in contributing guidelines (soleil-colza)
ad86ecf docs: update Node.js version recommendation for contributing guidelines (soleil-colza)
661847a fix: update Discord invite link in contributing guidelines and README (soleil-colza)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| # Contributing to Arkor | ||
|
|
||
| Thanks for your interest! Arkor is in **alpha**: we're moving fast, breaking things on purpose, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. Issues, discussion, and PRs are all welcome. | ||
|
|
||
| ## Ways to help | ||
|
|
||
| | Effort | What's most useful | | ||
| | ----------------------- | ----------------------------------------------------------------------------------- | | ||
| | **5 min** | Try the [Quickstart](README.md#quickstart) and [open an issue](https://github.com/arkorlab/arkor/issues/new) about anything that confused you, broke, or felt un-TypeScript. | | ||
| | **An afternoon** | Pick up a [`good first issue`](https://github.com/arkorlab/arkor/labels/good%20first%20issue) or send a small PR (doc fixes, template tweaks, error-message polish). | | ||
| | **Ongoing** | Hop into [Discord](https://discord.gg/YujCZYGrEZ) and tell us what model + dataset + workflow you wish worked. We use this to prioritize. | | ||
|
|
||
| If you have an idea for a non-trivial change (new SDK factory, CLI command, Studio view), please open an issue first so we can align on the API shape before you write code. | ||
|
|
||
| ## Repo layout | ||
|
|
||
| ``` | ||
| arkor/ | ||
| ├── packages/ | ||
| │ ├── arkor/ # SDK + CLI + bundled local Studio (published to npm) | ||
| │ ├── create-arkor/ # `pnpm create arkor` scaffolder (published to npm) | ||
| │ ├── cli-internal/ # private helpers shared by arkor + create-arkor | ||
| │ └── studio-app/ # Vite + React SPA bundled into `arkor` | ||
| ├── e2e/cli/ # vitest-driven E2E suite for the scaffolder & build | ||
| ├── assets/ # README / OG images | ||
| └── turbo.json # build / test orchestration | ||
| ``` | ||
|
|
||
| `cli-internal`, `studio-app`, and `e2e/cli` are private and never published. | ||
|
|
||
| ## Development setup | ||
|
|
||
| Please use **Node.js 24** (preferably the latest release) and **pnpm 10.21+**. | ||
|
|
||
| ```bash | ||
| git clone https://github.com/arkorlab/arkor.git | ||
| cd arkor | ||
| pnpm install | ||
| pnpm build # turbo run build (covers all packages) | ||
| pnpm test # unit tests across the monorepo | ||
| pnpm typecheck # tsc across the monorepo | ||
| ``` | ||
|
|
||
| To work on a specific package: | ||
|
|
||
| ```bash | ||
| pnpm --filter arkor dev # tsdown --watch on the SDK/CLI | ||
| pnpm --filter @arkor/studio-app dev # vite dev server for the Studio SPA | ||
| pnpm --filter create-arkor dev # tsdown --watch on the scaffolder | ||
| ``` | ||
|
|
||
| To run the E2E scaffolder/build suite (slow; spawns real CLIs in temp dirs): | ||
|
|
||
| ```bash | ||
| pnpm --filter @arkor/e2e-cli test | ||
| # Skip the `<pm> install` step inside fixtures: | ||
| SKIP_E2E_INSTALL=1 pnpm --filter @arkor/e2e-cli test | ||
| ``` | ||
|
|
||
| ## Trying your local build | ||
|
|
||
| The fastest loop is to scaffold a fresh project pointing at the workspace build: | ||
|
|
||
| ```bash | ||
| pnpm build | ||
| cd /tmp && node /path/to/arkor/packages/create-arkor/dist/bin.mjs my-arkor-app | ||
| cd my-arkor-app && pnpm dev | ||
| ``` | ||
|
|
||
| Studio runs at `http://127.0.0.1:4000` with a CSRF token injected per launch. | ||
|
|
||
| ## Pull request guidelines | ||
|
|
||
| - **One concern per PR.** Smaller diffs land faster. | ||
| - **Tests where the surface is testable.** SDK / CLI / scaffolder logic should have a vitest case. Studio UI changes can be PR'd with a screenshot or short clip. | ||
| - **Breaking changes are fine** during alpha. We don't ship compatibility shims between `0.0.x` versions, so just note them in the PR description to keep the changelog honest. | ||
| - **Don't reintroduce removed verbs.** `arkor train`, `arkor deploy`, `arkor jobs`, and `arkor logs` were removed deliberately. Training and deploying are TS configs that run when the entrypoint executes, not CLI verbs. The CLI surface is `dev` / `build` / `start` plus auth. | ||
|
|
||
| ## Reporting bugs and security issues | ||
|
|
||
| - **Bugs**: [GitHub Issues](https://github.com/arkorlab/arkor/issues/new) with steps to reproduce, expected vs actual, and your Node + pnpm versions. | ||
| - **Security**: please email security@arkor.ai instead of filing a public issue. We'll acknowledge within 48 hours. | ||
|
|
||
| ## Code of conduct | ||
|
|
||
| Be kind, assume good faith, and keep technical disagreement technical. Anything else (harassment, personal attacks, exclusionary behavior) is grounds for being asked to leave. The maintainers' call is final. | ||
|
|
||
| ## License | ||
|
|
||
| By contributing, you agree your contributions are licensed under the [MIT license](LICENSE.md). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,78 +1,194 @@ | ||
| # Arkor | ||
|
|
||
| > Fine-tune and deploy open-weight models with TypeScript. | ||
|
|
||
| Arkor is a TypeScript framework for improving and shipping custom open-weight | ||
| models. The audience is product engineers who already build with TypeScript / | ||
| Next.js and want custom model behaviour without standing up an ML | ||
| infrastructure team. Arkor handles GPUs, fine-tuning, and serving underneath | ||
| so the user's job stays "write some TypeScript". | ||
|
|
||
| > Status: alpha (`0.0.1-alpha.0`). Public APIs may change without notice. | ||
| <p align="center"> | ||
| <picture> | ||
| <source media="(prefers-color-scheme: dark)" srcset="assets/logo-dark.svg"> | ||
| <img src="assets/logo.svg" alt="Arkor" width="96"> | ||
| </picture> | ||
| </p> | ||
|
|
||
| <h1 align="center">Arkor</h1> | ||
|
|
||
| <h3 align="center">The TypeScript framework for fine-tuning open-weight LLMs</h3> | ||
|
|
||
| <p align="center"> | ||
| Ship custom open-weight models the same way you ship your TypeScript app. | ||
| Type-safe configs, hot reload, a local Studio (web UI) to start and watch runs, and managed GPUs. | ||
| </p> | ||
|
|
||
| <p align="center"> | ||
| <a href="https://www.npmjs.com/package/arkor"><img src="https://img.shields.io/npm/v/arkor?label=arkor&color=000" alt="npm"></a> | ||
| <a href="LICENSE.md"><img src="https://img.shields.io/badge/license-MIT-000" alt="MIT"></a> | ||
| <img src="https://img.shields.io/badge/node-%E2%89%A522.6-000" alt="node ≥22.6"> | ||
| <img src="https://img.shields.io/badge/status-alpha-orange" alt="alpha"> | ||
| <a href="https://discord.gg/YujCZYGrEZ"><img src="https://img.shields.io/badge/discord-join-5865F2" alt="Discord"></a> | ||
| </p> | ||
|
|
||
| <p align="center"> | ||
| <a href="https://arkor.ai/docs"><strong>Docs</strong></a> · | ||
| <a href="#quickstart"><strong>Quickstart</strong></a> · | ||
| <a href="#why-arkor"><strong>Why Arkor</strong></a> | ||
| </p> | ||
|
|
||
| > [!WARNING] | ||
| > Arkor is **alpha** (`0.0.1-alpha.0`). APIs change without notice. We're shipping in public, and feedback shapes what lands next. | ||
|
|
||
| <!-- | ||
| Demo media goes here once recorded: | ||
| - assets/demo-cli.gif Terminalizer: pnpm create arkor → pnpm dev | ||
| - assets/demo-studio.gif Screen recording: Run Training → loss curve → Playground | ||
| --> | ||
|
|
||
| ## Quickstart | ||
|
|
||
| ```bash | ||
| pnpm create arkor my-app | ||
| cd my-app | ||
| pnpm install | ||
| pnpm arkor login # Auth0 PKCE flow; --anonymous also works | ||
| pnpm arkor dev # opens the local Studio GUI on http://127.0.0.1:4000 | ||
| pnpm create arkor my-arkor-app | ||
| cd my-arkor-app | ||
| pnpm dev | ||
| ``` | ||
|
|
||
| `arkor dev` is the primary surface — it starts a local Studio with hot | ||
| reload over your TypeScript and a GUI for running training, inspecting jobs, | ||
| and trying out checkpoints in a Playground. | ||
| That's the whole setup. | ||
| **No signup required:** `arkor dev` opens **Studio**, a local web UI at `http://127.0.0.1:4000`, and silently bootstraps an anonymous workspace so you can fire off a real training run right away. | ||
|
|
||
| CLI-only flow (no GUI): | ||
| Run `arkor login` later if you want to claim your work under an account. | ||
|
|
||
| ```bash | ||
| pnpm arkor build # bundles src/arkor/ into .arkor/build/index.mjs | ||
| pnpm arkor start # runs the build artifact on the cloud | ||
| ### Pick a template | ||
|
|
||
| The scaffolder asks which template you want. | ||
| All three start from the same small open-weight base (`unsloth/gemma-4-E4B-it`) so the first run finishes quickly. | ||
|
|
||
| | Template | What it shows | Dataset | | ||
| | --------- | ------------------------------------------------------------------- | ---------------------------------- | | ||
| | `minimal` | The smallest working `createTrainer({ ... })` call. | `yahma/alpaca-cleaned` (500 rows) | | ||
| | `alpaca` | Instruction-tuning with mid-training `infer()` on every checkpoint. | `yahma/alpaca-cleaned` (1000 rows) | | ||
| | `chatml` | Multi-turn chat fine-tuning over a real chat dataset. | `stingning/ultrachat` (500 rows) | | ||
|
|
||
| Skip the prompt with `pnpm create arkor my-arkor-app --template alpaca`. | ||
|
|
||
| ## Why Arkor | ||
|
|
||
| Custom open-weight models are a real option today because of years of work in the Python ML ecosystem and the people and companies who built it out. | ||
| Arkor stands on that foundation. | ||
|
|
||
| What we wanted, and didn't find, was a path that fits how TypeScript and Node developers already work: a workflow where fine-tuning, evaluation, and serving live in the same codebase as the product, with the same editor, types, and review flow. | ||
|
|
||
| Type-safe configs instead of separate config files. Hot reload over your training code. A local Studio for the dev loop. | ||
|
|
||
| The phrase we keep coming back to: **ship the model the same way you ship the product.** If that sounds right, you're the audience. | ||
|
|
||
| ## What works today | ||
|
|
||
| - ✅ **Fine-tune an open-weight LLM from one file.** `createTrainer({ model, dataset, lora, ... })` runs LoRA training on the base model you point it at. | ||
| - ✅ **Pull data from HuggingFace, or bring your own URL.** The `dataset` field accepts any HF name (with optional `split`) or a blob URL to a JSONL file. | ||
| - ✅ **React to training in code, not in a dashboard.** Lifecycle callbacks (`onStarted`, `onLog`, `onCheckpoint`, `onCompleted`, `onFailed`) fire as the run streams from the cloud, fully typed. | ||
| - ✅ **Sanity-check the model before the run finishes.** Inside `onCheckpoint`, call `infer({ messages })` against the model as it's being trained. | ||
| - ✅ **Watch the run in a local Studio.** `arkor dev` opens a UI with a jobs list, live loss chart, log tail, and a Playground for chatting with your fine-tuned models. | ||
| - ✅ **Try it without an account.** Anonymous workspace by default; run `arkor login` (Auth0 PKCE) to claim your work later. | ||
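The two dataset forms described in the list above (a HuggingFace name with an optional `split`, or a blob URL to a JSONL file) map naturally onto a discriminated union. The sketch below is illustrative only: the `huggingface` shape mirrors the README example, while the `blob` variant, its `url` field, and the `describeDataset` helper are assumptions and not taken from the Arkor SDK.

```ts
// Hypothetical types. Only the "huggingface" shape appears in the README example.
type HuggingFaceDataset = { type: "huggingface"; name: string; split?: string };
// Assumed shape for "a blob URL to a JSONL file"; the real field names may differ.
type BlobDataset = { type: "blob"; url: string };
type DatasetSource = HuggingFaceDataset | BlobDataset;

// Narrow on the `type` tag so each branch sees only the fields that exist for it.
function describeDataset(source: DatasetSource): string {
  switch (source.type) {
    case "huggingface":
      return source.split ? `${source.name} [${source.split}]` : source.name;
    case "blob":
      return `JSONL at ${source.url}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails here.
      const unreachable: never = source;
      return unreachable;
    }
  }
}
```

Tagging each variant lets the compiler reject a `split` on a blob source or a missing `name` on a HuggingFace source before a run is ever submitted.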
|
|
||
| ## What's coming next | ||
|
|
||
| - ⏳ **Deploy a fine-tuned model as an inference endpoint** with `createDeploy(...)`. | ||
| - ⏳ **Run evaluations on every checkpoint** with `createEval(...)`. | ||
| - ⏳ **Bring your own datasets and base models.** CSV / JSONL uploads and custom HuggingFace base models. | ||
| - ⏳ **Team and multi-org workspaces.** | ||
| - ⏳ **Self-host the training backend.** Today we host it. | ||
|
|
||
| ## A taste of the API | ||
|
|
||
| ```ts | ||
| // src/arkor/trainer.ts | ||
| import { createTrainer } from "arkor"; | ||
|
|
||
| export const trainer = createTrainer({ | ||
| name: "support-bot-v1", | ||
| model: "unsloth/gemma-4-E4B-it", | ||
| dataset: { type: "huggingface", name: "yahma/alpaca-cleaned", split: "train[:1000]" }, | ||
| lora: { r: 16, alpha: 16 }, | ||
| maxSteps: 100, | ||
| callbacks: { | ||
| onLog: ({ step, loss }) => console.log(`step=${step} loss=${loss}`), | ||
| onCheckpoint: async ({ step, infer }) => { | ||
| const res = await infer({ messages: [{ role: "user", content: "Hello!" }] }); | ||
| console.log(`ckpt @ ${step}:`, await res.text()); | ||
| }, | ||
| }, | ||
| }); | ||
| ``` | ||
|
|
||
| ```ts | ||
| // src/arkor/index.ts ← discovered by `arkor dev` / `arkor build` | ||
| import { createArkor } from "arkor"; | ||
| import { trainer } from "./trainer"; | ||
|
|
||
| export const arkor = createArkor({ trainer }); | ||
| ``` | ||
|
|
||
| `src/arkor/index.ts` is the file the CLI and Studio look for. | ||
| Your `trainer` lives in a sibling file and is registered through `createArkor`. `deploy` and `eval` will work the same way. | ||
|
|
||
| To add a new one, drop a file and register it; no scaffolder rerun needed. | ||
|
|
||
| <!-- | ||
| Studio screenshots go here once captured: | ||
| - assets/studio-jobs.png Jobs list | ||
| - assets/studio-chart.png Live loss + log tail | ||
| - assets/studio-playground.png Playground chat | ||
| --> | ||
|
|
||
| ## What's in a project | ||
|
|
||
| ``` | ||
| my-app/ | ||
| my-arkor-app/ | ||
| ├── src/arkor/ | ||
| │ ├── index.ts # umbrella — `createArkor({ trainer })` | ||
| │ └── trainer.ts # `createTrainer({ name, model, dataset, ... })` | ||
| ├── arkor.config.ts # training defaults | ||
| ├── .arkor/ # state + build artifact (gitignored) | ||
| └── package.json | ||
| │ ├── index.ts # createArkor({ trainer }) ← discovered by the CLI / Studio | ||
| │ └── trainer.ts # createTrainer({ ... }) | ||
| ├── arkor.config.ts | ||
| ├── .arkor/ # state + build artifacts (gitignored) | ||
| └── package.json # dev / build / start | ||
| ``` | ||
|
|
||
| The umbrella is what the CLI and Studio discover. Per-role primitives — | ||
| `trainer` today, `deploy` and `eval` later — live in sibling files and get | ||
| gathered on `createArkor`. Adding a new primitive is "drop a file, register | ||
| it on the umbrella": no scaffold change required. | ||
|
|
||
| ## CLI | ||
|
|
||
| | Command | Purpose | | ||
| |---|---| | ||
| | `arkor init` | Scaffold a new project in the current directory | | ||
| | `arkor login` / `logout` / `whoami` | Auth0 PKCE / anonymous tokens | | ||
| | `arkor dev` | Launch the local Studio (hot reload + GUI) | | ||
| | `arkor build` | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` | | ||
| | `arkor start` | Run the build artifact (auto-builds when missing) | | ||
| | Command | Purpose | | ||
| | ------------------------------------ | ---------------------------------------------------------------------- | | ||
| | `arkor init` | Scaffold a new project in the current directory | | ||
| | `arkor login` / `logout` / `whoami` | Auth0 PKCE / anonymous tokens | | ||
| | `arkor dev` | Launch the local Studio web UI (with hot reload) | | ||
| | `arkor build` | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` | | ||
| | `arkor start` | Run the build artifact (auto-builds when missing) | | ||
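The "auto-builds when missing" behavior of `arkor start` can be pictured as a check-then-build step. This is a minimal sketch under stated assumptions, not the CLI's actual code: `ensureArtifact`, `BuildDeps`, and the injected functions are hypothetical names, and a real build step would most likely be asynchronous.

```ts
// Hypothetical dependencies, injected so the logic can be exercised without touching disk.
type BuildDeps = {
  artifactExists: (path: string) => boolean;
  build: () => void;
};

// Artifact path as listed in the CLI table.
const ARTIFACT_PATH = ".arkor/build/index.mjs";

// Reuse the artifact when it is present; otherwise build it first.
function ensureArtifact(deps: BuildDeps, path: string = ARTIFACT_PATH): "built" | "reused" {
  if (deps.artifactExists(path)) {
    return "reused";
  }
  deps.build();
  return "built";
}
```

In a real CLI, `artifactExists` could be backed by `existsSync` from `node:fs` and `build` by the same routine that `arkor build` runs.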
|
|
||
| `pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows live behind that one command. | ||
|
|
||
| ## Architecture | ||
|
|
||
| `arkor dev` boots a [Hono](https://hono.dev) server on `127.0.0.1:4000` that hot-reloads your code and serves a Vite + React SPA from the same origin. | ||
|
|
||
| The SPA talks to your code via per-launch CSRF-token-gated `/api/*` routes (loopback-only, with a `Host` header guard against DNS rebinding); your code talks to the Arkor training backend over authenticated HTTPS. | ||
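To make the two guards on `/api/*` concrete, here is a minimal, framework-free sketch of a `Host` allow-list combined with a per-launch CSRF token check. Everything named here is an assumption for illustration: the request shape, the `authorize` function, and whether `localhost:4000` is accepted alongside `127.0.0.1:4000`. It does not use Hono and is not Arkor's implementation.

```ts
// Hypothetical request shape: only the two values the guards need.
type ApiRequest = { host?: string; csrfToken?: string };
type Decision = { ok: true } | { ok: false; status: 403; reason: string };

// Assumed allow-list. A DNS-rebinding request arrives with the attacker's hostname
// in `Host`, so anything outside this set is rejected.
const ALLOWED_HOSTS = new Set(["127.0.0.1:4000", "localhost:4000"]);

// Compare without returning early on the first mismatching character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// `launchToken` stands for a random value generated once per launch (generation omitted).
function authorize(req: ApiRequest, launchToken: string): Decision {
  if (req.host === undefined || !ALLOWED_HOSTS.has(req.host.toLowerCase())) {
    return { ok: false, status: 403, reason: "unexpected Host header" };
  }
  if (req.csrfToken === undefined || !constantTimeEqual(req.csrfToken, launchToken)) {
    return { ok: false, status: 403, reason: "missing or invalid CSRF token" };
  }
  return { ok: true };
}
```

The `Host` check comes first because a rebinding attack can reach the loopback server from a foreign origin; the token check then ensures the caller is the page that this launch actually served.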
|
|
||
| Training runs on managed GPUs; checkpoints stream back as SSE events that fire your `callbacks.*` in process. | ||
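As a rough picture of how streamed SSE events could end up firing callbacks in process, the sketch below parses a raw SSE payload and dispatches each event by name. The event names (`log`, `checkpoint`) and the JSON payload shapes are assumptions; only the callback names `onLog` and `onCheckpoint` come from the README, and the real SDK may wire this differently.

```ts
// Callback names follow the README; the payload shapes are assumed.
type Callbacks = {
  onLog?: (payload: { step: number; loss: number }) => void;
  onCheckpoint?: (payload: { step: number }) => void;
};

type SseEvent = { event: string; data: string };

// Parse a Server-Sent Events payload: events are separated by a blank line,
// and each event carries `event:` and `data:` fields.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event name per the SSE format
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) {
      events.push({ event, data: data.join("\n") });
    }
  }
  return events;
}

// Route each parsed event to the matching callback; unknown events are ignored.
function dispatch(events: SseEvent[], callbacks: Callbacks): void {
  for (const { event, data } of events) {
    const payload = JSON.parse(data);
    if (event === "log") {
      callbacks.onLog?.(payload);
    } else if (event === "checkpoint") {
      callbacks.onCheckpoint?.(payload);
    }
  }
}
```

A production client would parse incrementally as chunks arrive and validate each payload before invoking user code; this version handles a complete string to keep the idea visible.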
|
|
||
| ## Repository | ||
|
|
||
| | Package | What it is | | ||
| | ---------------------------------------------- | ------------------------------------------- | | ||
| | [`arkor`](packages/arkor) | SDK + CLI + bundled local Studio | | ||
| | [`create-arkor`](packages/create-arkor) | `pnpm create arkor` scaffolder | | ||
|
|
||
| Requires Node.js 22.6+. | ||
| (If you are contributing to this repository, please use Node.js 24, preferably the latest release.) | ||
|
|
||
| Works with pnpm / npm / yarn / bun. | ||
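A project that wants to fail fast on an unsupported runtime could check the Node.js version at startup. The helper names below are hypothetical and the comparison only looks at major and minor components; this is a sketch, not something the Arkor packages are known to ship.

```ts
// Split "v22.6.0" or "22.6" into [major, minor]; missing parts default to 0.
function parseVersion(version: string): [number, number] {
  const [major = "0", minor = "0"] = version.replace(/^v/, "").split(".");
  return [Number(major), Number(minor)];
}

// True when `current` is at least `minimum`, comparing major then minor.
function satisfiesMinimum(current: string, minimum: string): boolean {
  const [curMajor, curMinor] = parseVersion(current);
  const [minMajor, minMinor] = parseVersion(minimum);
  return curMajor > minMajor || (curMajor === minMajor && curMinor >= minMinor);
}

// Example use in a Node.js entrypoint:
//   if (!satisfiesMinimum(process.versions.node, "22.6")) {
//     throw new Error("Node.js 22.6+ is required");
//   }
```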
|
|
||
| `pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows | ||
| live behind that one command. | ||
| ## We're shipping in public | ||
|
|
||
| ## Packages | ||
| Arkor is alpha, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. If that's you: | ||
|
|
||
| | Package | What it is | | ||
| |---|---| | ||
| | [`arkor`](packages/arkor) | The SDK + CLI + bundled local Studio | | ||
| | [`create-arkor`](packages/create-arkor) | `pnpm create arkor` scaffolder | | ||
| - **[File an issue](https://github.com/arkorlab/arkor/issues/new)** with the model + dataset + workflow you wish worked. We read everything. | ||
| - **Star the repo** if you want updates as we move toward `0.1`. | ||
| - **[Join Discord](https://discord.gg/YujCZYGrEZ)** for live discussion and early-access pings. | ||
|
|
||
| ## Requirements | ||
| We're especially curious about: which open-weight base models you'd reach for first, what you'd want from `createDeploy` / `createEval`, and what breaks when you try the alpha. | ||
|
|
||
| - Node.js 22.6+ (the SDK relies on stable APIs from that line) | ||
| - pnpm / npm / yarn / bun all work for installs | ||
| See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup. | ||
|
|
||
| ## License | ||
|
|
||
| MIT — see [LICENSE.md](LICENSE.md). | ||
| [MIT](LICENSE.md). |