Merged
90 changes: 90 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,90 @@
# Contributing to Arkor

Thanks for your interest! Arkor is in **alpha**: we're moving fast, breaking things on purpose, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. Issues, discussion, and PRs are all welcome.

## Ways to help

| Effort | What's most useful |
| ----------------------- | ----------------------------------------------------------------------------------- |
| **5 min** | Try the [Quickstart](README.md#quickstart) and [open an issue](https://github.com/arkorlab/arkor/issues/new) about anything that confused you, broke, or felt un-TypeScript. |
| **An afternoon** | Pick up a [`good first issue`](https://github.com/arkorlab/arkor/labels/good%20first%20issue) or send a small PR (doc fixes, template tweaks, error-message polish). |
| **Ongoing** | Hop into [Discord](https://discord.gg/YujCZYGrEZ) and tell us what model + dataset + workflow you wish worked. We use this to prioritize. |

If you have an idea for a non-trivial change (new SDK factory, CLI command, Studio view), please open an issue first so we can align on the API shape before you write code.

## Repo layout

```
arkor/
├── packages/
│ ├── arkor/ # SDK + CLI + bundled local Studio (published to npm)
│ ├── create-arkor/ # `pnpm create arkor` scaffolder (published to npm)
│ ├── cli-internal/ # private helpers shared by arkor + create-arkor
│ └── studio-app/ # Vite + React SPA bundled into `arkor`
├── e2e/cli/ # vitest-driven E2E suite for the scaffolder & build
├── assets/ # README / OG images
└── turbo.json # build / test orchestration
```

`cli-internal`, `studio-app`, and `e2e/cli` are private and never published.

## Development setup

Please use **Node.js 24** (preferably the latest release) and **pnpm 10.21+**.

```bash
git clone https://github.com/arkorlab/arkor.git
cd arkor
pnpm install
pnpm build # turbo run build (covers all packages)
pnpm test # unit tests across the monorepo
pnpm typecheck # tsc across the monorepo
```

To work on a specific package:

```bash
pnpm --filter arkor dev # tsdown --watch on the SDK/CLI
pnpm --filter @arkor/studio-app dev # vite dev server for the Studio SPA
pnpm --filter create-arkor dev # tsdown --watch on the scaffolder
```

To run the E2E scaffolder/build suite (slow; spawns real CLIs in temp dirs):

```bash
pnpm --filter @arkor/e2e-cli test
# Skip the `<pm> install` step inside fixtures:
SKIP_E2E_INSTALL=1 pnpm --filter @arkor/e2e-cli test
```

## Trying your local build

The fastest loop is to scaffold a fresh project pointing at the workspace build:

```bash
pnpm build
cd /tmp && node /path/to/arkor/packages/create-arkor/dist/bin.mjs my-arkor-app
cd my-arkor-app && pnpm dev
```

Studio runs at `http://127.0.0.1:4000` with a CSRF token injected per launch.

## Pull request guidelines

- **One concern per PR.** Smaller diffs land faster.
- **Tests where the surface is testable.** SDK / CLI / scaffolder logic should have a vitest case. Studio UI changes can be PR'd with a screenshot or short clip.
- **Breaking changes are fine** during alpha. We don't ship compatibility shims between `0.0.x` versions; just note the break in the PR description so the changelog stays honest.
- **Don't reintroduce removed verbs.** `arkor train`, `arkor deploy`, `arkor jobs`, and `arkor logs` were removed deliberately. Training and deploying are TS configs that run when the entrypoint executes, not CLI verbs. The CLI surface is `dev` / `build` / `start` plus auth.

## Reporting bugs and security issues

- **Bugs**: [GitHub Issues](https://github.com/arkorlab/arkor/issues/new) with steps to reproduce, expected vs actual, and your Node + pnpm versions.
- **Security**: please email security@arkor.ai instead of filing a public issue. We'll acknowledge within 48 hours.

## Code of conduct

Be kind, assume good faith, and keep technical disagreement technical. Anything else (harassment, personal attacks, exclusionary behavior) is grounds for being asked to leave. The maintainers' call is final.

## License

By contributing, you agree your contributions are licensed under the [MIT license](LICENSE.md).
220 changes: 168 additions & 52 deletions README.md
@@ -1,78 +1,194 @@
# Arkor

> Fine-tune and deploy open-weight models with TypeScript.

Arkor is a TypeScript framework for improving and shipping custom open-weight
models. The audience is product engineers who already build with TypeScript /
Next.js and want custom model behaviour without standing up an ML
infrastructure team. Arkor handles GPUs, fine-tuning, and serving underneath
so the user's job stays "write some TypeScript".

> Status: alpha (`0.0.1-alpha.0`). Public APIs may change without notice.
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="assets/logo-dark.svg">
<img src="assets/logo.svg" alt="Arkor" width="96">
</picture>
</p>

<h1 align="center">Arkor</h1>

<h3 align="center">The TypeScript framework for fine-tuning open-weight LLMs</h3>

<p align="center">
Ship custom open-weight models the same way you ship your TypeScript app.
Type-safe configs, hot reload, a local Studio (web UI) to start and watch runs, and managed GPUs.
</p>

<p align="center">
<a href="https://www.npmjs.com/package/arkor"><img src="https://img.shields.io/npm/v/arkor?label=arkor&color=000" alt="npm"></a>
<a href="LICENSE.md"><img src="https://img.shields.io/badge/license-MIT-000" alt="MIT"></a>
<img src="https://img.shields.io/badge/node-%E2%89%A522.6-000" alt="node ≥22.6">
<img src="https://img.shields.io/badge/status-alpha-orange" alt="alpha">
<a href="https://discord.gg/YujCZYGrEZ"><img src="https://img.shields.io/badge/discord-join-5865F2" alt="Discord"></a>
</p>

<p align="center">
<a href="https://arkor.ai/docs"><strong>Docs</strong></a> &nbsp;·&nbsp;
<a href="#quickstart"><strong>Quickstart</strong></a> &nbsp;·&nbsp;
<a href="#why-arkor"><strong>Why Arkor</strong></a>
</p>

> [!WARNING]
> Arkor is **alpha** (`0.0.1-alpha.0`). APIs change without notice. We're shipping in public, and feedback shapes what lands next.

<!--
Demo media goes here once recorded:
- assets/demo-cli.gif Terminalizer: pnpm create arkor → pnpm dev
- assets/demo-studio.gif Screen recording: Run Training → loss curve → Playground
-->

## Quickstart

```bash
pnpm create arkor my-app
cd my-app
pnpm install
pnpm arkor login # Auth0 PKCE flow; --anonymous also works
pnpm arkor dev # opens the local Studio GUI on http://127.0.0.1:4000
pnpm create arkor my-arkor-app
cd my-arkor-app
pnpm dev
```

`arkor dev` is the primary surface — it starts a local Studio with hot
reload over your TypeScript and a GUI for running training, inspecting jobs,
and trying out checkpoints in a Playground.
That's the whole setup.
**No signup required:** `arkor dev` opens **Studio**, a local web UI at `http://127.0.0.1:4000`, and silently bootstraps an anonymous workspace so you can fire off a real training run right away.

CLI-only flow (no GUI):
Run `arkor login` later if you want to claim your work under an account.

```bash
pnpm arkor build # bundles src/arkor/ into .arkor/build/index.mjs
pnpm arkor start # runs the build artifact on the cloud
### Pick a template

The scaffolder asks which template you want.
All three start from the same small open-weight base (`unsloth/gemma-4-E4B-it`) so the first run finishes quickly.

| Template | What it shows | Dataset |
| --------- | ------------------------------------------------------------------- | ---------------------------------- |
| `minimal` | The smallest working `createTrainer({ ... })` call. | `yahma/alpaca-cleaned` (500 rows) |
| `alpaca` | Instruction-tuning with mid-training `infer()` on every checkpoint. | `yahma/alpaca-cleaned` (1000 rows) |
| `chatml` | Multi-turn chat fine-tuning over a real chat dataset. | `stingning/ultrachat` (500 rows) |

Skip the prompt with `pnpm create arkor my-arkor-app --template alpaca`.

## Why Arkor

Custom open-weight models are a real option today because of years of work in the Python ML ecosystem and the people and companies who built it out.
Arkor stands on that foundation.

What we wanted, and didn't find, was a path that fits how TypeScript and Node developers already work: a workflow where fine-tuning, evaluation, and serving live in the same codebase as the product, with the same editor, types, and review flow.

Type-safe configs instead of separate config files. Hot reload over your training code. A local Studio for the dev loop.

The phrase we keep coming back to: **ship the model the same way you ship the product.** If that sounds right, you're the audience.

## What works today

- ✅ **Fine-tune an open-weight LLM from one file.** `createTrainer({ model, dataset, lora, ... })` runs LoRA training on the base model you point it at.
- ✅ **Pull data from HuggingFace, or bring your own URL.** The `dataset` field accepts any HF name (with optional `split`) or a blob URL to a JSONL file.
- ✅ **React to training in code, not in a dashboard.** Lifecycle callbacks (`onStarted`, `onLog`, `onCheckpoint`, `onCompleted`, `onFailed`) fire as the run streams from the cloud, fully typed.
- ✅ **Sanity-check the model before the run finishes.** Inside `onCheckpoint`, call `infer({ messages })` against the model as it's being trained.
- ✅ **Watch the run in a local Studio.** `arkor dev` opens a UI with a jobs list, live loss chart, log tail, and a Playground for chatting with your fine-tuned models.
- ✅ **Try it without an account.** Anonymous workspace by default; run `arkor login` (Auth0 PKCE) to claim your work later.

## What's coming next

- ⏳ **Deploy a fine-tuned model as an inference endpoint** with `createDeploy(...)`.
- ⏳ **Run evaluations on every checkpoint** with `createEval(...)`.
- ⏳ **Bring your own datasets and base models.** CSV / JSONL uploads and custom HuggingFace base models.
- ⏳ **Team and multi-org workspaces.**
- ⏳ **Self-host the training backend.** Today we host it.

## A taste of the API

```ts
// src/arkor/trainer.ts
import { createTrainer } from "arkor";

export const trainer = createTrainer({
name: "support-bot-v1",
model: "unsloth/gemma-4-E4B-it",
dataset: { type: "huggingface", name: "yahma/alpaca-cleaned", split: "train[:1000]" },
lora: { r: 16, alpha: 16 },
maxSteps: 100,
callbacks: {
onLog: ({ step, loss }) => console.log(`step=${step} loss=${loss}`),
onCheckpoint: async ({ step, infer }) => {
const res = await infer({ messages: [{ role: "user", content: "Hello!" }] });
console.log(`ckpt @ ${step}:`, await res.text());
},
},
});
```

```ts
// src/arkor/index.ts ← discovered by `arkor dev` / `arkor build`
import { createArkor } from "arkor";
import { trainer } from "./trainer";

export const arkor = createArkor({ trainer });
```

`src/arkor/index.ts` is the file the CLI and Studio look for.
Your `trainer` lives in a sibling file and is registered through `createArkor`. `deploy` and `eval` will work the same way.

To add a new one, drop a file and register it; no scaffolder rerun needed.

<!--
Studio screenshots go here once captured:
- assets/studio-jobs.png Jobs list
- assets/studio-chart.png Live loss + log tail
- assets/studio-playground.png Playground chat
-->

## What's in a project

```
my-app/
my-arkor-app/
├── src/arkor/
│ ├── index.ts # umbrella — `createArkor({ trainer })`
│ └── trainer.ts # `createTrainer({ name, model, dataset, ... })`
├── arkor.config.ts # training defaults
├── .arkor/ # state + build artifact (gitignored)
└── package.json
│ ├── index.ts # createArkor({ trainer }) ← discovered by the CLI / Studio
│ └── trainer.ts # createTrainer({ ... })
├── arkor.config.ts
├── .arkor/ # state + build artifacts (gitignored)
└── package.json # dev / build / start
```

The umbrella is what the CLI and Studio discover. Per-role primitives —
`trainer` today, `deploy` and `eval` later — live in sibling files and get
gathered on `createArkor`. Adding a new primitive is "drop a file, register
it on the umbrella": no scaffold change required.

## CLI

| Command | Purpose |
|---|---|
| `arkor init` | Scaffold a new project in the current directory |
| `arkor login` / `logout` / `whoami` | Auth0 PKCE / anonymous tokens |
| `arkor dev` | Launch the local Studio (hot reload + GUI) |
| `arkor build` | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` |
| `arkor start` | Run the build artifact (auto-builds when missing) |
| Command | Purpose |
| ------------------------------------ | ---------------------------------------------------------------------- |
| `arkor init` | Scaffold a new project in the current directory |
| `arkor login` / `logout` / `whoami` | Auth0 PKCE / anonymous tokens |
| `arkor dev` | Launch the local Studio web UI (with hot reload) |
| `arkor build` | Bundle `src/arkor/index.ts` to `.arkor/build/index.mjs` |
| `arkor start` | Run the build artifact (auto-builds when missing) |

`pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows live behind that one command.

## Architecture

`arkor dev` boots a [Hono](https://hono.dev) server on `127.0.0.1:4000` that hot-reloads your code and serves a Vite + React SPA from the same origin.

The SPA talks to your code via per-launch CSRF-token-gated `/api/*` routes (loopback-only, with a `Host` header guard against DNS rebinding); your code talks to the Arkor training backend over authenticated HTTPS.
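
To make the two checks above concrete, here is a minimal sketch of a `Host` guard combined with a per-launch token comparison. It is not Arkor's implementation: the function names, the `x-csrf-token` header, and the decision to accept `localhost` alongside `127.0.0.1` are all assumptions for illustration.

```ts
// Hypothetical sketch: none of these names come from the Arkor codebase.

/** The two request headers the guard reads. */
interface GuardInput {
  host: string | undefined; // value of the `Host` header
  csrfToken: string | undefined; // value of an assumed `x-csrf-token` header
}

type GuardResult =
  | { ok: true }
  | { ok: false; status: 403; reason: "bad-host" | "bad-token" };

/**
 * Accept only Host values that name the loopback interface on the expected port.
 * A DNS-rebinding page still sends its own hostname in `Host`, so it is rejected
 * even though the name resolves to 127.0.0.1.
 */
export function isLoopbackHost(host: string | undefined, port: number): boolean {
  if (host === undefined) return false;
  const normalized = host.toLowerCase();
  return normalized === `127.0.0.1:${port}` || normalized === `localhost:${port}`;
}

/** Compare tokens without returning early on the first mismatching character. */
export function tokensMatch(expected: string, actual: string | undefined): boolean {
  if (actual === undefined || actual.length !== expected.length) return false;
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= expected.charCodeAt(i) ^ actual.charCodeAt(i);
  }
  return diff === 0;
}

/** Run the Host check first, then the token check; both must pass. */
export function guardApiRequest(
  input: GuardInput,
  launchToken: string,
  port = 4000,
): GuardResult {
  if (!isLoopbackHost(input.host, port)) {
    return { ok: false, status: 403, reason: "bad-host" };
  }
  if (!tokensMatch(launchToken, input.csrfToken)) {
    return { ok: false, status: 403, reason: "bad-token" };
  }
  return { ok: true };
}
```

In a server, a guard like this would run as middleware in front of every `/api/*` route, with `launchToken` generated once at startup and handed to the SPA when the page is served.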

Training runs on managed GPUs; checkpoints stream back as SSE events that fire your `callbacks.*` in process.
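
As a rough illustration of how SSE events can end up firing callbacks in process, the sketch below parses a `text/event-stream` body and routes each event to a matching handler. The event names (`log`, `checkpoint`, `completed`) and the JSON payload shapes are assumptions, not Arkor's wire format; only the callback names come from this README.

```ts
// Hypothetical sketch: event names and payload shapes are assumed, not Arkor's protocol.

interface Callbacks {
  onLog?: (e: { step: number; loss: number }) => void;
  onCheckpoint?: (e: { step: number }) => void;
  onCompleted?: () => void;
}

interface SseEvent {
  event: string;
  data: string;
}

/** Split a complete event-stream body into events; a blank line ends each event. */
export function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no `event:` field is present
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      // Field name, colon, one optional space, then the value.
      const match = /^(event|data): ?(.*)$/.exec(line);
      if (match === null) continue;
      if (match[1] === "event") event = match[2];
      else data.push(match[2]);
    }
    // Events without any data lines are dropped.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

/** Call the callback that matches each event name; unknown events are ignored. */
export function dispatch(events: SseEvent[], callbacks: Callbacks): void {
  for (const { event, data } of events) {
    switch (event) {
      case "log":
        callbacks.onLog?.(JSON.parse(data));
        break;
      case "checkpoint":
        callbacks.onCheckpoint?.(JSON.parse(data));
        break;
      case "completed":
        callbacks.onCompleted?.();
        break;
    }
  }
}
```

A real client would read the stream incrementally and buffer partial events. Parsing a whole body at once keeps the sketch short.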

## Repository

| Package | What it is |
| ---------------------------------------------- | ------------------------------------------- |
| [`arkor`](packages/arkor) | SDK + CLI + bundled local Studio |
| [`create-arkor`](packages/create-arkor) | `pnpm create arkor` scaffolder |

Requires Node.js 22.6+.
(To contribute to this repository, use Node.js 24, preferably the latest release.)

Works with pnpm / npm / yarn / bun.

`pnpm dev` resolves to `arkor dev` in scaffolded projects, so most workflows
live behind that one command.
## We're shipping in public

## Packages
Arkor is alpha, and the core idea (TypeScript-native fine-tuning for product engineers) is something we want to design *with* the people who'd use it. If that's you:

| Package | What it is |
|---|---|
| [`arkor`](packages/arkor) | The SDK + CLI + bundled local Studio |
| [`create-arkor`](packages/create-arkor) | `pnpm create arkor` scaffolder |
- **[File an issue](https://github.com/arkorlab/arkor/issues/new)** with the model + dataset + workflow you wish worked. We read everything.
- **Star the repo** if you want updates as we move toward `0.1`.
- **[Join Discord](https://discord.gg/YujCZYGrEZ)** for live discussion and early-access pings.

## Requirements
We're especially curious about: which open-weight base models you'd reach for first, what you'd want from `createDeploy` / `createEval`, and what breaks when you try the alpha.

- Node.js 22.6+ (the SDK relies on stable APIs from that line)
- pnpm / npm / yarn / bun all work for installs
See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup.

## License

MIT — see [LICENSE.md](LICENSE.md).
[MIT](LICENSE.md).
10 changes: 10 additions & 0 deletions assets/logo-dark.svg
9 changes: 9 additions & 0 deletions assets/logo.svg