feat(media): image/video/audio project kinds via od media generate by pftom · Pull Request #11 · nexu-io/open-design

pftom · 2026-04-28T14:41:47Z

Summary

Adds non-web media surfaces (image, video, audio) as first-class project
kinds. The unifying contract is:

skill workflow + project metadata tell the agent what to make;
one shell command — od media generate — is how bytes are produced.

This keeps the design tool-agnostic: any code-agent CLI with shell access
(Claude Code, Codex, Gemini, OpenCode, Cursor Agent, Qwen, …) can drive
media generation without bespoke tool integrations.

Changes

Frontend

New Project panel gains Image / Video / Audio tabs with model picker,
aspect / length / duration controls, audio-kind and voice selection.
Examples and Design Systems tabs gain layered sections so the
new media skills sit alongside prototype / slides / interactive video.
FileViewer renders generated image/*, video/*, and audio/*
files inline (next to the existing HTML preview / source views).
Icons and i18n strings (en + zh-CN) added for the new surfaces.

Shared registry

`src/media/models.ts` is the single source of truth for image / video /
audio model IDs, aspects, and defaults. Both the picker and the daemon
dispatcher consume it so they cannot drift.
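
As a rough illustration of what a shared registry buys, the sketch below validates a (surface, model) pair against one table. The shape of `MEDIA_MODELS`, the `resolveModel` helper, and the mapping of each model ID to a surface are assumptions made for this example; the real `src/media/models.ts` may be organised differently.

```javascript
// Hypothetical registry shape; only the model IDs come from the PR text.
const MEDIA_MODELS = {
  image: { defaultModel: 'gpt-image-2', models: ['gpt-image-2'] },
  video: { defaultModel: 'seedance-2', models: ['seedance-2'] },
  audio: { defaultModel: 'suno-v5', models: ['suno-v5'] },
};

// Returns the model to use for a surface, falling back to the surface default.
// Throws when the surface is unknown or the model is not registered for it.
function resolveModel(surface, requested) {
  if (!Object.hasOwn(MEDIA_MODELS, surface)) {
    throw new Error(`unknown surface: ${surface}`);
  }
  const entry = MEDIA_MODELS[surface];
  const model = requested ?? entry.defaultModel;
  if (!entry.models.includes(model)) {
    throw new Error(`model ${model} is not registered for surface ${surface}`);
  }
  return model;
}
```

If both the picker and the dispatcher call one helper like this, an audio model ID cannot be accepted on the image path by either side.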

Prompts

`src/prompts/media-contract.ts` is pinned last in the system prompt for
media surfaces. Its hard rules (call `od media generate`, do not embed
binary in `<artifact>`, allowed model IDs per surface) override any
softer wording earlier in the prompt stack.
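
A minimal sketch of the "pinned last" idea, assuming the system prompt is assembled from an ordered list of text layers. `composeSystemPrompt`, the `layers` argument, and the `'web'` surface name are hypothetical and are not taken from `system.ts`.

```javascript
// Surfaces that receive the media contract, per the PR description.
const MEDIA_SURFACES = new Set(['image', 'video', 'audio']);

// Joins the prompt layers in order and appends the media contract as the
// final layer for media surfaces, so nothing later in the prompt can soften it.
function composeSystemPrompt(layers, surface, mediaContract) {
  const parts = [...layers];
  if (MEDIA_SURFACES.has(surface)) {
    parts.push(mediaContract);
  }
  return parts.join('\n\n');
}
```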

Daemon

New `daemon/media.js` dispatcher + `daemon/media-models.js` JSON view of the
registry.
`daemon/cli.js` exposes `od media generate` as a subcommand, wired through
`server.js` / `projects.js` so the daemon writes generated files back into
the project dir and the FileViewer picks them up automatically.
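
One of the commit messages in this PR says the subcommand routes to `POST /api/projects/:id/media/generate`. The sketch below shows how a CLI might build that request from the injected environment. `buildGenerateRequest` and the exact body fields beyond `surface` and `model` are assumptions, not the actual `daemon/cli.js` code.

```javascript
// Builds the HTTP request a CLI could send to the daemon.
// env:   expected to carry OD_DAEMON_URL and OD_PROJECT_ID (names from the PR).
// flags: parsed CLI flags; `output` is optional.
function buildGenerateRequest(env, flags) {
  const { OD_DAEMON_URL, OD_PROJECT_ID } = env;
  if (!OD_DAEMON_URL || !OD_PROJECT_ID) {
    throw new Error('OD_DAEMON_URL and OD_PROJECT_ID must be set');
  }
  const base = OD_DAEMON_URL.replace(/\/+$/, ''); // drop trailing slashes
  const url = `${base}/api/projects/${encodeURIComponent(OD_PROJECT_ID)}/media/generate`;
  const body = { surface: flags.surface, model: flags.model };
  if (flags.output !== undefined) body.output = flags.output;
  return {
    method: 'POST',
    url,
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
  };
}
```

Keeping request construction in a pure function makes it easy to test without a running daemon.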

Skills

Seed skills for the three surfaces: `audio-jingle`, `image-poster`,
`video-shortform`, each with a `SKILL.md` workflow and a representative
`example.html` thumbnail.

Provider note

The provider integrations behind specific model IDs (gpt-image-2,
seedance-2, suno-v5, …) may still be stubs — the dispatcher returns
success and a placeholder file. The contract stays the same; bytes get
sharper as real provider integrations land.
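
To make the stub idea concrete, here is a sketch of a provider that returns a valid but silent WAV file. It is an illustration only: `silentWav`, `stubAudio`, the result shape, and the `'stub-wav'` note are hypothetical, and the stubs in `daemon/media.js` may produce different bytes.

```javascript
// Builds a mono, 16-bit PCM WAV file containing only silence.
function silentWav(seconds, sampleRate = 8000) {
  const numSamples = Math.max(0, Math.round(seconds * sampleRate));
  const dataSize = numSamples * 2;          // 2 bytes per 16-bit sample
  const buf = Buffer.alloc(44 + dataSize);  // zero-filled payload is silence
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);      // RIFF chunk size = file size - 8
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);                // fmt chunk size for PCM
  buf.writeUInt16LE(1, 20);                 // audio format 1 = PCM
  buf.writeUInt16LE(1, 22);                 // channels
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28);    // byte rate
  buf.writeUInt16LE(2, 32);                 // block align
  buf.writeUInt16LE(16, 34);                // bits per sample
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);
  return buf;
}

// Hypothetical stub provider: reports success and flags the result as a stub.
function stubAudio(seconds) {
  return { ok: true, providerNote: 'stub-wav', bytes: silentWav(seconds) };
}
```

Because the header is well formed, a browser `<audio>` element can load the file, which is enough to exercise the round trip without API keys.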

Test plan

`pnpm install` and `pnpm typecheck` pass after the media additions.
`pnpm dev:all` boots; new project panel shows Image / Video / Audio tabs.
Creating an image / video / audio project lands the agent in a
project where the system prompt ends with the media contract.
`od media generate` returns a JSON line and writes a file under
`OD_PROJECT_DIR`; FileViewer renders it.
Examples tab and Design Systems tab still render correctly with
the new layered sections.

Made with Cursor

- Updated project name in package.json, package-lock.json, and README files.
- Changed CLI commands and references from "ocd" to "od".
- Adjusted file structure references in documentation and code to reflect new naming conventions.
- Enhanced .gitignore to include new runtime data files.
- Updated metadata in LICENSE file to match new project name.

- Introduced CONTRIBUTING.md and CONTRIBUTING.zh-CN.md to provide clear instructions for contributors.
- Outlined contribution types, local setup instructions, and merging criteria for skills and design systems.
- Enhanced README files to reference the new contributing guidelines.

- Clarified DECK_FRAMEWORK_DIRECTIVE description in both English and Chinese README files to specify conditions for deck kind without a skill seed.
- Added detailed workflow instructions in deck-framework.ts to emphasize the importance of copying the framework before adding content.
- Enhanced discovery.ts to reinforce the framework-first approach for deck projects.
- Updated system.ts to ensure proper handling of deck projects with and without bound skills, preventing re-authorship of scaling and navigation logic.

… into feat/optimize-naming

- Added a "Star us" section in both English and Chinese README files to encourage users to star the project on GitHub.
- Included a new image asset for the star promotion.
- Introduced a new HTML file for a dedicated star promotion page.
- Updated .gitignore to exclude new cursor-related files.

… generate dispatcher

Extends Open Design from web-only to a multi-modal creation tool. The unifying contract is one code-agent loop driven by skills + project metadata + prompt constraints; for non-web surfaces the agent shells out to a single dispatcher (`od media generate`) that the daemon routes per (surface, model).

- Types: new Surface union, MediaAspect / AudioKind, image/video/audio ProjectKind + ProjectMetadata fields, video/audio ProjectFileKind.
- NewProjectPanel: top-level surface picker + Image / Video / Audio forms with model, aspect, length, duration, voice, audio-kind pickers.
- ExamplesTab + DesignSystemsTab: surface filter row that scopes before mode / scenario / category filters.
- FileViewer / FileWorkspace: native <video> and <audio> previews and matching tab icons.
- Daemon: parses `od.surface` and `> Surface:` blockquotes; recognises mp4 / webm / mov / mp3 / wav / ogg / m4a / flac extensions; spawns agents with OD_BIN / OD_DAEMON_URL / OD_PROJECT_ID / OD_PROJECT_DIR env so any code-agent CLI with shell access can call the dispatcher.
- daemon/media.js + daemon/media-models.js: surface-agnostic dispatcher with stub providers that emit deterministic placeholder bytes (1x1 PNG, valid mp4 ftyp, mp3 frame / silent WAV) so the framework works without API keys; real provider integrations slot in later.
- daemon/cli.js: `od media generate --surface ... --model ...` subcommand routes to POST /api/projects/:id/media/generate and prints one JSON line for the agent to parse.
- prompts/media-contract.ts: hard contract pinned LAST in the system prompt for image/video/audio surfaces: env vars, exact invocation, registered model IDs per surface, six workflow rules. system.ts metadata block updated to point at the contract.
- Seed skills: image-poster, video-shortform, audio-jingle each ship a SKILL.md with `mode/surface: image|video|audio` and a stylized example.html preview, and instruct the agent to dispatch via the contract.

Made-with: Cursor

Introduce non-web media surfaces (image, video, audio) as first-class project kinds. The unifying contract is "skill workflow + project metadata tell the agent WHAT to make; one shell command (od media generate) is HOW bytes are produced", so any code-agent CLI with shell access can drive it without bespoke tools.

- Frontend: New Project panel gains Image/Video/Audio tabs with model picker, aspect/length/duration controls, and audio kind/voice selection. Examples and Design Systems tabs gain layered sections. FileViewer renders the generated image/video/audio files.
- Shared registry: src/media/models.ts is the single source of truth for image/video/audio model IDs, aspects, and defaults, consumed by the picker AND the daemon dispatcher.
- Prompts: media-contract.ts is pinned LAST in the system prompt for media surfaces so its hard rules (call od media generate, don't emit binary in <artifact>, allowed model IDs) win over softer earlier wording.
- Daemon: new media.js dispatcher + media-models.js JSON view of the registry; cli.js gets the `od media generate` subcommand wired up via server.js / projects.js so the daemon writes files back into the project dir.
- Skills: audio-jingle, image-poster, video-shortform seed examples for the three surfaces.

Made-with: Cursor

Bring in the parallel media-surfaces branch from PR #12. Tree is already identical to HEAD (same od media generate work landed independently), so this is a history-only merge to consolidate the two branches.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 976a6eadf2


lefarcen

Review summary (COMMENT — not approving)

Headline: PR #11 and PR #12 are byte-identical duplicates. I ran diff <(gh pr diff 11) <(gh pr diff 12) — same 3747-line diff, same 28 files, same +2902/-78. Both branches (cursor/289994c1, cursor/47ca13ab) are Cursor worktrees from @pftom, opened ~14 seconds apart. The only meaningful difference is metadata: PR #11 has the richer description (composition diagram, architecture explanation, hyperframes-composition note), and its head is more recent — its last commit 8719c082 (Apr 28 14:46Z) merges PR #12's branch into this one to consolidate. Recommendation: land #11 (or close both and rebase a single fresh PR), close #12 as duplicate. No issue is linked; please add one (or note that the work follows the implicit non-web-surfaces direction).

Test plan: every checkbox in this PR's description is [ ] unchecked — including pnpm typecheck / pnpm build / smoke test. PR #12 reports those green ([x]). Worth either copying #12's test results across or actually re-running before merge so reviewers can see green on the chosen PR.

Core architecture is sound. The media/models.ts ↔ daemon/media-models.js registry, the media-contract.ts pinned-last-prompt, the od media generate dispatcher, and the OD_BIN/OD_PROJECT_ID env injection compose cleanly. Stub providers emit valid byte signatures (PNG, mp4 ftyp, mp3 frame, RIFF WAV) so the round-trip is testable without API keys.

Top concerns (inline below): (1) hand-mirrored registry has no enforcement that JS/TS stay in sync; the comment promises 'tests in verify' but I see none in the diff; (2) POST /api/projects/:id/media/generate has no auth/rate-limit and accepts an agent-supplied output filename (sanitized but unbounded re-writes); (3) the env-injection only covers the spawn path, so confirm no other agent-spawn site is missed; (4) the contract's stub-provider disclaimer can mislead users into thinking they got real bytes; (5) prompt-side metadata duplicates the contract's 'no <artifact>' rule three times; (6) i18n is bilingual-complete (good).

lefarcen

Small correction to my earlier review.

I claimed this PR's body is richer than #12's — that's backwards. #12's description is the more detailed one (ASCII architecture diagram, file-by-file breakdown of frontend / daemon / skills sections, explicit composition note about the upcoming hyperframes worktree). This PR's body is the shorter "skill workflow + project metadata tells the agent WHAT, od media generate is HOW" framing plus a provider-stub disclaimer.

The recommendation still stands: keep this PR (#11) as the keeper because its HEAD 8719c082 is a merge of #12's branch into this one — i.e. this branch is the consolidated history, and #12 is the one to close. But consider lifting #12's body onto this PR before closing, since it's a better artifact for future archaeology.

All the technical concerns from my earlier review (registry mirror without a sync test, --output overwrite path, /api/projects/:id/media/generate rate-limit / size-cap / CORS posture, OD_DAEMON_URL hard-coded loopback, stub-provider disclaimer not flowing through to the user, anti-<artifact> rule duplicated four times) all still apply.

Apologies for the body-comparison slip.

…t dedupe)

- Surface-aware model validation in generateMedia: reject mismatched (surface, audioKind, model) tuples up-front so an audio model id can no longer route through the image path.
- Drop hidden designSystemId / inspirations when the New Project panel surface is image / video / audio so a stale web-tab pick can't bleed into media projects (the picker is hidden, so users couldn't see or clear it).
- Single source of truth for the media model registry: src/media/models.data.json, consumed by both src/media/models.ts and the daemon's media-models.js. No more hand-mirrored arrays drifting.
- Collision-safe writes: generateMedia auto-suffixes poster.png -> poster-2.png on filename collision instead of silently clobbering an existing artifact.
- Harden /api/projects/:id/media/generate:
  - 64KB body cap dispatched at the global JSON parser (vs 4MB elsewhere)
  - explicit project-id regex check (with decode round-trip)
  - reject cross-origin POSTs whose Origin header does not match the daemon
  - cap prompt / output / voice string lengths inside generateMedia
  - distinguish 413 from 400 in the route handler
- Derive OD_DAEMON_URL from a single DAEMON_HOST constant shared with the listen() bind, so changing the bind host can't drift the agent's callback URL silently.
- Add a SOLE-spawn-site comment so future agent-launch paths don't forget the OD env injection block.
- New workflow rule #7 in MEDIA_GENERATION_CONTRACT: agent must surface stub providerNote ("stub-png", "stub-mp4", ...) to the user rather than narrating placeholder bytes as a real generation.
- Drop the duplicated "Do NOT emit <artifact>" lines from each per-surface metadata block in renderMetadataBlock; the canonical rule lives only in the contract block now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
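
The cross-origin rejection described in this commit can be sketched as a small predicate. `isAllowedOrigin` is a hypothetical name, and treating a missing `Origin` header as allowed (on the assumption that CLI callers do not send one) is a guess for this example, not something the commit states.

```javascript
// Returns true when a request's Origin header is acceptable for the daemon.
// origin: value of the Origin header, or undefined when the header is absent.
// host, port: the address the daemon is bound to.
function isAllowedOrigin(origin, host, port) {
  // Assumption: non-browser callers such as a CLI send no Origin header.
  if (origin === undefined) return true;
  const allowed = new Set([
    `http://${host}:${port}`,
    `http://localhost:${port}`,
  ]);
  return allowed.has(origin);
}
```

A route handler would answer 403 when this returns false.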

lefarcen · 2026-04-29T08:17:48Z

👋 Thanks @pftom — extending OD to image/video/audio with a single od media generate contract is a really clean architectural bet. The single-source-of-truth registry (src/media/models.ts) consumed by both picker and dispatcher, and the media-contract pinned LAST so it overrides earlier prompt layers, are both nicely thought out. 🎨🙏

This is the keeper of the two media PRs (its HEAD is the merge of #12 into this branch).

Inline concerns are mostly the "before real providers land" hardening:

⚠️ Stub providers ship unconditionally — would gate behind OD_MEDIA_ALLOW_STUBS for prod
🔒 POST /api/projects/:id/media/generate has no rate-limit / size-cap; agent-supplied --output can clobber existing files (no overwrite:false guard)
💡 daemon/media-models.js mirrors src/media/models.ts by hand — codegen or a sync test would prevent drift

Architecture is the right shape. ✅

lefarcen

Approved.

lefarcen

Hey @pftom! 🎉 Wow, this is a substantial feature — adding image/video/audio surfaces with a clean, tool-agnostic contract. The od media generate dispatcher is elegant, and I love how it works across any code-agent CLI without custom tool definitions. Pinning the contract LAST in the system prompt is clever — hard rules win.

Found 8 items worth attention (mix of P2 verification + P3 polish). Most are edge-case hardening + reasoning gaps in the new SKILL.md files. No P1 blockers.

See inline comments below 👇

lefarcen · 2026-05-02T03:23:40Z

+  const ctx = {
+    surface,
+    model,
+    prompt: prompt || '',


P2 TOCTOU race: uniqueFilename checks await pathExists(target), then later await writeFile(target, bytes). If two concurrent od media generate calls pick the same name (same timestamp), both see "file doesn't exist" and write to the same path — second one silently overwrites the first. Rare in practice (requires sub-millisecond collision), but the function comment promises collision safety. Fix: use fs.promises.open(target, 'wx') (exclusive write) and catch EEXIST to retry.
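
A sketch of the exclusive-create approach suggested here, written with the synchronous `fs` API to keep the example short; the `'wx'` flag behaves the same way through `fs.promises`. `writeUnique` and its `-2`, `-3` suffix scheme are hypothetical and only mirror the `poster.png -> poster-2.png` convention mentioned in the commit message.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Writes bytes to dir/filename without ever overwriting an existing file.
// On a name collision it retries with -2, -3, ... before the extension.
// Returns the filename that was actually written.
function writeUnique(dir, filename, bytes, maxAttempts = 100) {
  const { name, ext } = path.parse(filename);
  for (let n = 1; n <= maxAttempts; n += 1) {
    const candidate = n === 1 ? filename : `${name}-${n}${ext}`;
    const target = path.join(dir, candidate);
    try {
      // 'wx' fails with EEXIST if the path already exists, so the existence
      // check and the create are a single atomic step.
      fs.writeFileSync(target, bytes, { flag: 'wx' });
      return candidate;
    } catch (err) {
      if (err.code !== 'EEXIST') throw err; // unrelated failure: surface it
    }
  }
  throw new Error(`no free filename for ${filename} after ${maxAttempts} attempts`);
}
```

Because the check and the write happen in one system call, two concurrent callers cannot both observe "file doesn't exist" for the same path.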

lefarcen · 2026-05-02T03:23:40Z

+
+  const body = {
+    surface,
+    model: flags.model,


P3 Missing validation: the CLI parses --length and --duration as Number(flags.length) but doesn't validate they're positive numbers. A malicious/confused agent could pass --length=-5 or --length=banana, silently getting NaN in the POST body. The dispatcher checks typeof, but typeof Number('banana') is 'number' (the value is NaN), so that check passes. Suggest: const len = Number(flags.length); if (!Number.isFinite(len) || len <= 0) { console.error('--length must be a positive number'); process.exit(2); }
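
The same check, pulled into a reusable helper so it can cover both flags. `parsePositiveNumber` is a hypothetical name; it throws instead of calling `process.exit` so the caller decides how to report the error.

```javascript
// Parses a CLI flag value as a finite number greater than zero.
// Throws with a flag-specific message for NaN, Infinity, zero, negatives,
// empty strings, and missing values.
function parsePositiveNumber(flagName, raw) {
  const value = Number(raw);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`--${flagName} must be a positive number`);
  }
  return value;
}
```

A CLI entry point could catch the error, print its message to stderr, and exit with status 2 as the review suggests.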

lefarcen · 2026-05-02T03:23:40Z

+        const expectedLocal = `http://localhost:${port}`;
+        if (origin !== expected && origin !== expectedLocal) {
+          return res.status(403).json({ error: 'cross-origin denied' });
+        }


P3 Comment stale: says "The 64kb body-size cap for this route is applied by the dispatching JSON middleware in startServer() above" — but the middleware is 100+ lines earlier and not obviously "above" when reading this route. Consider: // Body size cap: see the jsonSmall middleware ~100 lines up, applied per-route before parsing.

lefarcen · 2026-05-02T03:23:40Z

+- \`OD_DAEMON_URL\`  — base URL of the local daemon, e.g. \`http://127.0.0.1:7456\`.
+
+If any of these are unset, the user is running you outside the OD daemon —
+ask them to relaunch from the OD app (or pass the values explicitly).


P3 Reasoning gap (Lens B): the contract says "verify with `echo`" but doesn't show how (new users might type `echo OD_BIN` instead of `echo "$OD_BIN"`). Add one concrete example: (verify with `echo "$OD_PROJECT_ID"`; it should print the project UUID)
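
For callers that prefer a programmatic check over `echo`, a small sketch that lists which of the injected variables are missing. The variable names come from the PR; `missingOdEnv` is a hypothetical helper.

```javascript
// Environment variables the daemon injects when it spawns an agent.
const REQUIRED_OD_ENV = ['OD_BIN', 'OD_DAEMON_URL', 'OD_PROJECT_ID', 'OD_PROJECT_DIR'];

// Returns the names of required variables that are unset or empty in env.
function missingOdEnv(env) {
  return REQUIRED_OD_ENV.filter((name) => !env[name]);
}
```

Calling `missingOdEnv(process.env)` and getting a non-empty array back would indicate the agent is running outside the OD daemon.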

lefarcen · 2026-05-02T03:23:40Z

+
+`audioKind`, `audioModel`, `audioDuration` (seconds), and (for speech)
+`voice`. Branch by `audioKind` and use the values verbatim — no
+clarifying form unless something is marked `(unknown — ask)`.


P2 Reasoning gap (Lens B — unstated assumption): Step 0 says "use the values verbatim" but Step 2 says "Compose the prompt ... Use the format the upstream model prefers." What if the metadata's audioModel is suno-v5 but the user's chat message says "make it Udio style"? The skill doesn't say which wins. Suggest adding a tiebreaker rule: "Metadata is authoritative unless the user's current message explicitly contradicts it (e.g. 'switch to Udio')." (This matches the contract's intent but the skill should state it.)

lefarcen · 2026-05-02T03:23:40Z

+3. **Palette + textures** — hex anchors when the user gave a brand
+   palette; otherwise a 3-word mood tag (e.g. "muted ochre + ink").
+4. **Camera / lens** — only if the user wants photographic realism
+   ("85mm portrait, shallow DOF") or a specific film stock.


P3 Reasoning gap (Lens B — quantification missing): Step 1 prescribes a 5-point prompt structure but doesn't say how long each section should be. A junior user might write 2 sentences per point = 10-sentence prompt = way over the model's comfort zone. Add a rough token budget: e.g. "Aim for 1-2 sentences per point; total ~100-150 words. Longer prompts don't improve quality for most image models."

lefarcen · 2026-05-02T03:23:40Z

+
+`videoModel`, `videoLength` (seconds), `videoAspect`. These are
+hard-locks — clamp the prompt to whatever the chosen model supports
+(Seedance 2 caps at 10s; Kling 4 supports up to 10s + image-to-video;


P2 Reasoning gap (Lens B — failure mode not covered): Step 1 has a shotlist table with "Motion" = "What moves, at what pace? Subject motion vs camera motion." — but doesn't warn that most current text-to-video models struggle with complex multi-object motion. A user planning "character walks left while car drives right while leaves blow" will get disappointing results. Add a constraint note: "Current models (Seedance 2, Kling 3/4, Veo 3) handle 1-2 motion elements well; 3+ often drift or freeze. Prioritize the key motion."

lefarcen · 2026-05-02T03:23:40Z

+        <label className="newproj-label">{t('newproj.videoLengthLabel')}</label>
+        <div className="pill-grid">
+          {lengths.map((s) => (
+            <button


P3 (minor): pickDefaultSkill logic prefers s.surface === surface && s.mode === surface, falls back to s.mode === surface. The comment says "legacy skills authored without `surface` still get picked up" — but this assumes mode was always set correctly. What if a skill has mode: 'prototype', surface: 'image' (authoring error)? It would never match the image surface (first condition fails on mode, second condition fails on mode !== 'image'). Not a real-world issue today (no such skills ship), but the fallback could be more robust: const modeMatch = skills.find(s => s.mode === surface || s.surface === surface); (match either field).
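
A sketch of the more tolerant lookup described above: prefer a skill where both fields match, then accept a match on either field. The skill objects are reduced to `id`, `mode`, and `surface` for illustration; this is not the actual `pickDefaultSkill` from `NewProjectPanel.tsx`.

```javascript
// Picks the default skill for a surface.
// 1. Prefer a skill whose surface and mode both equal the surface.
// 2. Otherwise accept a skill where either field matches, which covers legacy
//    skills without `surface` and skills whose `mode` was mis-authored.
// Returns null when nothing matches.
function pickDefaultSkill(skills, surface) {
  return (
    skills.find((s) => s.surface === surface && s.mode === surface) ??
    skills.find((s) => s.mode === surface || s.surface === surface) ??
    null
  );
}
```

With this ordering, a skill declared as `mode: 'prototype', surface: 'image'` is still found for the image surface, while an exact match continues to win when one exists.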

Dismissed — accidental empty approval; defer to the prior COMMENTED review.

pftom added 12 commits April 28, 2026 14:48

Merge branch 'feat/optimize-naming' of github.com:nexu-io/open-design…

490bbe2

… into feat/optimize-naming

Merge branch 'main' of github.com:nexu-io/open-design

1337907

Merge branch 'main' into feat/optimize-naming

19b5272

Merge remote-tracking branch 'origin/main' into cursor/47ca13ab

bc7c057

Merge remote-tracking branch 'origin/main' into cursor/289994c1

0b61be5

pftom marked this pull request as ready for review April 28, 2026 14:44

Merge PR #12 (cursor/47ca13ab) into cursor/289994c1

8719c08

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread daemon/media.js Outdated

Comment thread src/components/NewProjectPanel.tsx

lefarcen reviewed Apr 29, 2026

Comment thread daemon/media-models.js Outdated

Comment thread daemon/media.js Outdated

Comment thread daemon/server.js

Comment thread daemon/server.js

Comment thread src/prompts/media-contract.ts Outdated

Comment thread src/prompts/system.ts Outdated

lefarcen mentioned this pull request Apr 29, 2026

feat(media): image / video / audio surfaces with unified od media generate dispatcher #12

Merged

8 tasks

lefarcen reviewed Apr 29, 2026

lefarcen added the enhancement New feature or request label Apr 29, 2026

lefarcen previously approved these changes May 2, 2026

lefarcen added the feature New feature or enhancement label May 2, 2026

lefarcen self-requested a review May 2, 2026 03:21

lefarcen reviewed May 2, 2026


2 participants