Skip to content

feat(compile): #5731 — embed static assets/files into standalone executables#5754

Open
proggeramlug wants to merge 5 commits into
mainfrom
worktree-issue-5731-embed-assets
Open

feat(compile): #5731 — embed static assets/files into standalone executables#5754
proggeramlug wants to merge 5 commits into
mainfrom
worktree-issue-5731-embed-assets

Conversation

@proggeramlug

@proggeramlug proggeramlug commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Closes #5731.

Adds a Bun-style embed pipeline so perry compile can bake static files (an SPA dist/, images, JSON, fonts, …) into the binary and serve them from memory — no dist/ folder required at runtime.

vite build
perry compile server.ts --embed "./dist/**" -o myapp
./myapp   # serves dist/ from memory

What's included

Compile-time

  • --embed <pattern> CLI flag — repeatable; files, directories, or */** globs relative to the project root. Merged with perry.embed (package.json) and [compile] embed (perry.toml).
  • compile/embed.rs expands patterns (walkdir + a small **/*/? matcher) and emits a C object whose constructor(101) registers each file's bytes into the runtime registry before main; the .o is appended to the link line (same proven seam the old embedded-JS object used).

Runtime (perry-runtime/src/embedded.rs)

  • Process-global, never-freed registry (mirrors the shared_sab pattern). Bytes are &'static slices into the binary's .rodata — no copy.
  • node:fs VFS: readFileSync/readFile (text + binary) and existsSync resolve $perryfs/<path> virtual paths (and bare embed keys) from the registry before touching disk — one interception point in read_file_bytes_with_options.
  • FFI symbols kept from dead-strip via #[used] typed fn-pointer statics.

perry builtin module

import { embeddedFiles, readEmbedded, isStandaloneExecutable } from "perry";
import { readFileSync } from "fs";

for (const f of embeddedFiles()) {            // { name, size, type }
  app.get("/" + f.name, (_, r) => r.type(f.type).send(readEmbedded(f.name)));
}
const html = readFileSync("$perryfs/dist/index.html", "utf8");

Design note

embeddedFiles is exposed as a function (embeddedFiles()), not a bare value like Bun's embeddedFiles. A native-module value binding lowers x.map(...) as a namespace dispatch (perry.map), which breaks array methods. Returning a real array from a call sidesteps that and is the architecture-aligned shape. readEmbedded(path) and node:fs accept either the $perryfs/<path> virtual path or the embed-relative key.

Testing

  • Unit tests: glob matcher / asset collection (perry), registry + MIME table (perry-runtime).
  • End-to-end test (crates/perry/tests/issue_5731_embedded_assets.rs): compiles a program with --embed "./dist/**", runs it, and asserts read-back via embeddedFiles(), readEmbedded(), and node:fs. Verified locally — byte-identical expected output.
  • API docs regenerated from the manifest (api-docs-drift).

Scope / follow-ups

  • Embedding currently uses host cc (native-host builds), matching the existing embedded-object generator; cross-target embedding is a follow-up.
  • Import-attribute form (import x from "./x.png" with { type: "file" }) and perry/static serveEmbeddedDir helper were deliberately left for a follow-up — the primitives here already support the SPA-server use case from the issue.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added static asset embedding to standalone builds via --embed, perry.embed, and [compile].embed.
    • Introduced the perry runtime APIs: embeddedFiles(), readEmbedded(path), and isStandaloneExecutable (including $perryfs/... access and deterministic embedded listing).
  • Bug Fixes
    • Synchronous file reads and existence checks now resolve embedded assets before falling back to disk (and ignore unresolved virtual $perryfs/... paths).
  • Documentation
    • Documented the embedding workflow, precedence behavior, and host/toolchain constraints; updated API/type references.
  • Tests
    • Added an integration test validating embedded metadata, reads, and missing-asset errors.

…utables

Add a Bun-style embed pipeline so `perry compile` can bake static files
(an SPA `dist/`, images, JSON, fonts, …) into the binary and serve them
from memory with no external files on disk.

Compile-time:
- `--embed <pattern>` CLI flag (repeatable; files, directories, or `*`/`**`
  globs relative to the project root), merged with `perry.embed`
  (package.json) and `[compile] embed` (perry.toml).
- `compile/embed.rs` expands patterns (walkdir + a small `**`/`*`/`?` glob
  matcher), then emits a C object whose `constructor(101)` registers each
  file's bytes (binary-safe octal-escaped literals) into the runtime
  registry before `main`. The `.o` is appended to the link line.

Runtime (`perry-runtime/embedded.rs`):
- Process-global, never-freed registry (mirrors the `shared_sab` pattern)
  populated by `js_register_embedded_asset`. Bytes are `&'static` slices
  into the binary's .rodata — no copy.
- node:fs VFS: `readFileSync`/`readFile` (text + binary) and `existsSync`
  resolve `$perryfs/<path>` virtual paths (and bare embed keys) from the
  registry before touching disk — single interception point in
  `read_file_bytes_with_options`.
- FFI symbols kept from dead-strip via `#[used]` typed fn-pointer statics.

`perry` builtin module:
- `embeddedFiles()` → array of `{ name, size, type }` (MIME inferred).
  Exposed as a function, not a bare value like Bun's `embeddedFiles`, so
  array methods dispatch on the result (a native value binding lowers
  `x.map(...)` as a namespace call `perry.map`).
- `readEmbedded(path)` → `Buffer`; `isStandaloneExecutable` → `true`.
- Registered in NATIVE_MODULES/RUNTIME_ONLY_MODULES + the API manifest;
  dispatch rows + extern decls in perry-codegen; `types/perry/index.d.ts`.

Tests: unit tests for the glob matcher/collection and the registry/MIME
table; an end-to-end test that compiles with `--embed`, runs, and asserts
read-back via `embeddedFiles()`, `readEmbedded()`, and node:fs.

Docs: CLI flags page + regenerated API reference/d.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GbPx8kvGXxTaPNHXwYVhn6
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds standalone-executable asset embedding through perry compile --embed, runtime lookup and introspection APIs, $perryfs/ filesystem support, and generated docs/types for the new perry module surface.

Changes

Embedded Assets Feature

Layer / File(s) Summary
API manifest and public bindings
crates/perry-api-manifest/src/entries.rs, crates/perry-api-manifest/src/entries/part_1.rs, crates/perry-codegen/src/lower_call/native_table/thread_lodash.rs, crates/perry-codegen/src/runtime_decls/stdlib_ffi/third_party.rs, types/perry/index.d.ts, docs/api/perry.d.ts
Registers perry in the API manifest, adds readEmbedded, embeddedFiles, and isStandaloneExecutable entries, defines native signature and FFI declarations, and publishes TypeScript declarations for the new module surface.
Embedded registry and runtime entrypoints
crates/perry-runtime/src/lib.rs, crates/perry-runtime/src/embedded.rs, crates/perry-runtime/src/object/native_module.rs
Adds the embedded asset registry, path normalization, registration and lookup helpers, runtime accessors for embedded files and standalone detection, symbol retention, and the isStandaloneExecutable native-module branch.
Compile-time asset resolution and object generation
crates/perry/src/commands/compile.rs, crates/perry/src/commands/compile/embed.rs
Adds the compile-time embed submodule, collects patterns from CLI and config files, expands files/directories/globs, deduplicates and sorts matches, and emits a C object that registers embedded assets at startup.
Compile pipeline wiring and cache key updates
crates/perry/src/commands/compile/types.rs, crates/perry/src/commands/compile/run_pipeline.rs, crates/perry/src/commands/dev.rs, crates/perry/src/commands/run/mod.rs, crates/perry/src/commands/compile/build_cache.rs
Adds the repeatable --embed option, tracks resolved embedded assets in compilation state, injects embed object generation into the compile pipeline, defaults embed arguments in dev and run call sites, and includes embedded asset contents in build-cache hashing.
Virtual filesystem interception
crates/perry-runtime/src/fs/mod.rs
Updates synchronous file reads and existence checks to resolve embedded assets before disk access and to treat unresolved virtual paths as missing.
Integration test and documentation
crates/perry/tests/issue_5731_embedded_assets.rs, docs/src/cli/flags.md, docs/src/api/reference.md
Adds an integration test for embedded assets, documents --embed and $perryfs/ usage, and refreshes the generated API reference for the new perry module.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CompileCommand as perry compile
  participant EmbedModule as embed.rs
  participant CCompiler as cc
  participant Runtime as perry-runtime
  participant FsModule as fs/mod.rs
  participant JSApp as JS runtime code

  User->>CompileCommand: perry compile --embed "./dist/**"
  CompileCommand->>EmbedModule: resolve_embedded_assets(...)
  EmbedModule-->>CompileCommand: [(name, abs_path), ...]
  CompileCommand->>EmbedModule: generate_embedded_asset_object(...)
  EmbedModule->>CCompiler: compile generated C
  CCompiler-->>CompileCommand: embedded-assets object
  CompileCommand->>CompileCommand: link object into binary

  Runtime->>Runtime: register embedded assets at startup

  JSApp->>Runtime: embeddedFiles()
  Runtime-->>JSApp: [{name, size, type}, ...]
  JSApp->>Runtime: readEmbedded(path)
  Runtime-->>JSApp: Buffer or Error
  JSApp->>FsModule: fs.readFileSync("$perryfs/...")
  FsModule->>Runtime: lookup embedded bytes
  Runtime-->>FsModule: bytes
  FsModule-->>JSApp: file contents
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PerryTS/perry#5416: Also changes crates/perry/src/commands/compile/build_cache.rs, so it is directly related to the build-cache key path touched here.

Poem

🐇 I packed the bytes in binary light,
And hid dist/ safely out of sight.
$perryfs/ hums, the assets bloom,
readEmbedded() opens the room.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: embedding static assets into standalone executables.
Description check ✅ Passed The description covers the summary, concrete changes, related issue, testing, and follow-ups, matching the template well.
Linked Issues check ✅ Passed The PR implements compile-time embedding, runtime file access, node:fs integration, and standalone detection required by #5731.
Out of Scope Changes check ✅ Passed The changes appear focused on the embedding feature, tests, docs, and supporting cache logic with no clear unrelated additions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-issue-5731-embed-assets

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry-runtime/src/embedded.rs`:
- Around line 216-238: `js_perry_read_embedded` currently returns a null pointer
for missing assets, which conflicts with the published `readEmbedded()`
contract. Update the behavior in `js_perry_read_embedded` to match the exposed
`perry` API: either throw an error when `lookup(&path)` fails, or change the
public typings/docs for `readEmbedded` to `Buffer | null` and keep all callers
in sync. Make sure the chosen contract is reflected consistently across the
runtime binding and the TypeScript/docs surface.
- Around line 48-53: Normalize backslash-separated paths in normalize_key()
before looking up embedded registry keys, since the current stripping logic only
handles $perryfs/ and ./ and misses Windows-style inputs like dist\index.html or
$perryfs\dist\index.html. Update the path normalization flow in
crates/perry-runtime/src/embedded.rs so embedded_fs/readEmbedded callers get the
same slash-based key regardless of separator style, while preserving the
existing VIRTUAL_PREFIX and leading ./ handling.

In `@crates/perry-runtime/src/fs/mod.rs`:
- Around line 435-443: The embedded-path handling in fs::mod::lookup should not
use is_embedded_path() as a presence test, because it matches any $perryfs/...
string. Update the read/existence flow around lookup(&path_str) so a successful
embedded registry lookup is the only condition that short-circuits to embedded
bytes, and an unresolved $perryfs/... path is treated as missing instead of
falling through to disk. Also apply the same change in the other matching branch
mentioned in the diff so existsSync/readFileSync behavior stays consistent.

In `@crates/perry/src/commands/compile/embed.rs`:
- Around line 210-221: The relative_name helper currently accepts any path that
lexically strips the project_root, so escaped inputs like ../secret.txt can
still become embed keys. Update relative_name (and its caller path handling in
crates/perry/src/commands/compile/embed.rs) to canonicalize/resolve the
candidate path against project_root and explicitly reject any path that is not
within the project root before generating the embedded name. Use the existing
relative_name function and related embed-path logic to centralize the check and
return None or an error for out-of-root paths.

In `@crates/perry/src/commands/compile/run_pipeline.rs`:
- Around line 4929-4940: The embedded asset set is only resolved in run_pipeline
after the build-cache early return, so changes to embedded files can reuse a
stale cached executable. Update the build-cache key/inputs in run_pipeline by
incorporating the resolved output of embed::resolve_embedded_assets and the
contents of each embedded asset, or otherwise prevent cache hits whenever
embedding is enabled. Make sure the object produced by
embed::generate_embedded_asset_object is represented by a stable fingerprint
instead of None so cache invalidation works correctly.

In `@types/perry/index.d.ts`:
- Around line 20-27: The JSDoc for embeddedFiles currently promises insertion
order, but the actual packaging behavior is de-duplicated and sorted, so the
contract is misleading. Update the description in the embeddedFiles declaration
in types/perry/index.d.ts to reflect the real deterministic ordering used by the
executable, or remove the ordering claim entirely while keeping the notes about
returning a fresh array and the embeddedFiles() API shape.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 00fa1895-6f12-4486-bc7b-9364745ce4a4

📥 Commits

Reviewing files that changed from the base of the PR and between e379811 and 843792a.

📒 Files selected for processing (19)
  • crates/perry-api-manifest/src/entries.rs
  • crates/perry-api-manifest/src/entries/part_1.rs
  • crates/perry-codegen/src/lower_call/native_table/thread_lodash.rs
  • crates/perry-codegen/src/runtime_decls/stdlib_ffi/third_party.rs
  • crates/perry-runtime/src/embedded.rs
  • crates/perry-runtime/src/fs/mod.rs
  • crates/perry-runtime/src/lib.rs
  • crates/perry-runtime/src/object/native_module.rs
  • crates/perry/src/commands/compile.rs
  • crates/perry/src/commands/compile/embed.rs
  • crates/perry/src/commands/compile/run_pipeline.rs
  • crates/perry/src/commands/compile/types.rs
  • crates/perry/src/commands/dev.rs
  • crates/perry/src/commands/run/mod.rs
  • crates/perry/tests/issue_5731_embedded_assets.rs
  • docs/api/perry.d.ts
  • docs/src/api/reference.md
  • docs/src/cli/flags.md
  • types/perry/index.d.ts

Comment thread crates/perry-runtime/src/embedded.rs
Comment thread crates/perry-runtime/src/embedded.rs
Comment thread crates/perry-runtime/src/fs/mod.rs
Comment thread crates/perry/src/commands/compile/embed.rs
Comment on lines +4929 to +4940
// #5731 — embed static assets into the binary. Merge `--embed` with
// `perry.embed` / `[compile] embed`, expand globs/directories relative to
// the project root, and emit a registration object linked alongside the
// user objects. The runtime serves these via `perry` / node:fs at runtime.
let embedded_assets = embed::resolve_embedded_assets(&args.embed, &project_root)?;
if !embedded_assets.is_empty() {
if let Some(obj) =
embed::generate_embedded_asset_object(&embedded_assets, &object_output_dir)?
{
obj_cleanup_paths.push(obj.clone());
obj_paths.push(obj);
obj_fingerprints.push(None);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Embedded assets don't invalidate the build cache.

Line 4933 resolves the asset set only after the build-cache early-return path has already run, and Line 4940 records the generated object with a None fingerprint. A change to dist/index.html can therefore reuse a cached executable that still contains the old bytes.

Hash the resolved asset list + contents into the build-cache inputs, or disable build-cache hits when embedding is enabled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry/src/commands/compile/run_pipeline.rs` around lines 4929 - 4940,
The embedded asset set is only resolved in run_pipeline after the build-cache
early return, so changes to embedded files can reuse a stale cached executable.
Update the build-cache key/inputs in run_pipeline by incorporating the resolved
output of embed::resolve_embedded_assets and the contents of each embedded
asset, or otherwise prevent cache hits whenever embedding is enabled. Make sure
the object produced by embed::generate_embedded_asset_object is represented by a
stable fingerprint instead of None so cache invalidation works correctly.

Comment thread types/perry/index.d.ts
- readEmbedded: throw a catchable `Error` on a missing asset instead of
  returning a null pointer — NR_PTR NaN-boxes the raw pointer, so null
  surfaced as a bogus object rather than `null`. Keeps the `Buffer`
  contract and matches Node fs "not found" semantics.
- fs VFS: use `lookup()` (registry membership), not the prefix test, as the
  existence/short-circuit condition. An unresolved `$perryfs/...` path is now
  treated as missing for readFileSync/existsSync instead of falling through
  to a literal disk read / falsely reporting it exists. `is_embedded_path`
  → `is_virtual_path` (pure prefix test, used only to mark "missing").
- normalize_key: fold `\` to `/` so Windows-style inputs
  (`$perryfs\dist\index.html`, `dist\index.html`) match the `/`-joined keys.
- embed::relative_name: canonicalize candidate + project root before
  stripping so a pattern that escapes the root (`--embed ../secret.txt`,
  symlink) is rejected rather than producing a `../`-laden key.
- build cache: fold the resolved embedded-asset set + file contents into
  `args_key`, so editing an embedded file (no pattern change) invalidates
  the cached executable.
- docs/typings: `embeddedFiles()` documented as sorted (not insertion
  order); `readEmbedded` documented as throwing on miss.
- tests: cover backslash lookup, unresolved-virtual miss, existsSync(false),
  and readEmbedded throw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GbPx8kvGXxTaPNHXwYVhn6

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/perry-runtime/src/embedded.rs (1)

243-250: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Throw on oversized assets or Buffer allocation failure.

bytes.len() as i32 can wrap for large embedded assets, and js_buffer_alloc can still return null; returning that null pointer violates the same NR_PTR/Buffer contract handled above.

Proposed fix
-fn throw_embed_not_found(path: &str) -> ! {
-    let message = format!("No embedded asset found for path: {path}");
+fn throw_embed_error(message: String) -> ! {
     let msg = crate::string::js_string_from_bytes(message.as_ptr(), message.len() as u32);
     let err = crate::error::js_error_new_with_message(msg);
     crate::exception::js_throw(js_nanbox_pointer(err as i64))
 }
+
+fn throw_embed_not_found(path: &str) -> ! {
+    throw_embed_error(format!("No embedded asset found for path: {path}"))
+}
@@
     let Some(bytes) = lookup(&path) else {
         throw_embed_not_found(&path);
     };
+    if bytes.len() > i32::MAX as usize {
+        throw_embed_error(format!("Embedded asset is too large to read as Buffer: {path}"));
+    }
     unsafe {
         let buf = crate::buffer::js_buffer_alloc(bytes.len() as i32, 0);
-        if !buf.is_null() {
-            let buf_data = (buf as *mut u8).add(std::mem::size_of::<crate::buffer::BufferHeader>());
-            std::ptr::copy_nonoverlapping(bytes.as_ptr(), buf_data, bytes.len());
-            (*buf).length = bytes.len() as u32;
+        if buf.is_null() {
+            throw_embed_error(format!("Failed to allocate Buffer for embedded asset: {path}"));
         }
+        let buf_data = (buf as *mut u8).add(std::mem::size_of::<crate::buffer::BufferHeader>());
+        std::ptr::copy_nonoverlapping(bytes.as_ptr(), buf_data, bytes.len());
+        (*buf).length = bytes.len() as u32;
         buf
     }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry-runtime/src/embedded.rs` around lines 243 - 250, The buffer
allocation path in embedded asset handling can silently wrap large sizes and
return a null pointer, violating the existing NR_PTR/Buffer contract. Update the
allocation logic in the embedded asset code around js_buffer_alloc so it
validates bytes.len() before casting to i32 and throws an error for oversized
assets, and also treat a null result from js_buffer_alloc as a failure instead
of returning it. Keep the behavior aligned with the surrounding buffer creation
and error-handling flow in embedded.rs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry/src/commands/compile/build_cache.rs`:
- Around line 463-467: The cache key generation in the build cache path is using
only sentinel bytes around embedded asset bytes, which can collide when the
payload contains the same markers. Update the hashing logic in the asset
embedding branch of build_cache to length-prefix the file contents before
feeding them into the hasher, using the existing bytes read from fs::read(path)
and the surrounding embed-data tag so different asset sets always produce
distinct cache inputs.

---

Outside diff comments:
In `@crates/perry-runtime/src/embedded.rs`:
- Around line 243-250: The buffer allocation path in embedded asset handling can
silently wrap large sizes and return a null pointer, violating the existing
NR_PTR/Buffer contract. Update the allocation logic in the embedded asset code
around js_buffer_alloc so it validates bytes.len() before casting to i32 and
throws an error for oversized assets, and also treat a null result from
js_buffer_alloc as a failure instead of returning it. Keep the behavior aligned
with the surrounding buffer creation and error-handling flow in embedded.rs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2cb4ebad-ab94-4077-921a-40900bb3012d

📥 Commits

Reviewing files that changed from the base of the PR and between 843792a and 04e8349.

📒 Files selected for processing (6)
  • crates/perry-runtime/src/embedded.rs
  • crates/perry-runtime/src/fs/mod.rs
  • crates/perry/src/commands/compile/build_cache.rs
  • crates/perry/src/commands/compile/embed.rs
  • crates/perry/tests/issue_5731_embedded_assets.rs
  • types/perry/index.d.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • types/perry/index.d.ts
  • crates/perry/tests/issue_5731_embedded_assets.rs
  • crates/perry-runtime/src/fs/mod.rs
  • crates/perry/src/commands/compile/embed.rs

Comment thread crates/perry/src/commands/compile/build_cache.rs Outdated
@proggeramlug proggeramlug marked this pull request as draft June 28, 2026 13:31
Ralph Küpper added 2 commits June 28, 2026 17:46
Addresses the cross-platform audit of the embed feature:

- Encode asset bytes with a module-level `.incbin` instead of a single
  octal-escaped C string literal. The file is referenced, not re-encoded,
  so a multi-MB binary asset costs ~6 lines of source and compiles in
  constant time (the old ~4x octal blow-up made a single huge string
  literal slow on clang and unbuildable past MSVC's per-literal cap).
  Length is recovered as a link-time constant (end - start label). Picks
  the Mach-O (`_`-prefixed symbols, `__TEXT,__const`) vs ELF (`.rodata`)
  pair from the host cfg.
- Fail `--embed` loudly on a Windows host: the `cc`-built object can't be
  consumed by MSVC `link.exe`, and `cl.exe` supports neither
  `__attribute__((constructor))` nor `.incbin`. Windows/cross-target embed
  is a tracked follow-up.
- Guard the e2e test with `#![cfg(not(windows))]` to match.
- readEmbedded: reject assets >= 2 GiB rather than wrap the i32 Buffer
  length to a negative/garbage size.
- build cache: key embedded assets on name + size + mtime instead of
  re-reading and hashing full contents on every compile.
- docs: note that node:fs / bare keys resolve embedded bytes before disk
  (use $perryfs/ or an absolute path to disambiguate), and that embedding
  is Unix-host-only for now.
@proggeramlug proggeramlug marked this pull request as ready for review June 28, 2026 16:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/perry/src/commands/compile/build_cache.rs`:
- Around line 465-479: The cache key generation in build_cache should fail
closed when embedded asset resolution or fingerprinting cannot be completed.
Update the logic around resolve_embedded_assets and the per-asset
fs::metadata/mtime handling so errors are propagated or converted into a
cache-invalidation path instead of silently skipping embed inputs; this should
align with run_pipeline and prevent stale manifests from being reused when
perry.embed or [compile] embed is broken. Use the existing build_cache hashing
flow and the embed resolution/fingerprinting block to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 07d93a91-96cc-40f8-b242-84713a808d1f

📥 Commits

Reviewing files that changed from the base of the PR and between 04e8349 and 2a8edb2.

📒 Files selected for processing (5)
  • crates/perry-runtime/src/embedded.rs
  • crates/perry/src/commands/compile/build_cache.rs
  • crates/perry/src/commands/compile/embed.rs
  • crates/perry/tests/issue_5731_embedded_assets.rs
  • docs/src/cli/flags.md
✅ Files skipped from review due to trivial changes (1)
  • docs/src/cli/flags.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/perry/tests/issue_5731_embedded_assets.rs
  • crates/perry/src/commands/compile/embed.rs
  • crates/perry-runtime/src/embedded.rs

Comment thread crates/perry/src/commands/compile/build_cache.rs Outdated
… build-cache key

Address CodeRabbit: args_key() swallowed embed-resolution and per-asset
fs::metadata failures via if-let, silently dropping the embed inputs and
falling back to the non-embed cache key. Since run_pipeline treats
resolve_embedded_assets as fatal (the `?` at its embed step), a broken
perry.embed / [compile] embed config — or a vanished/unstat-able file —
could hit a stale manifest and return an old binary instead of surfacing
the error.

Fold a sentinel field (embed-resolve-error / embed-stat-error /
embed-mtime-unavailable) on every failure path so the key diverges from
any successful build, forcing a cache miss → rebuild → real error.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/perry/src/commands/compile/build_cache.rs (1)

460-490: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Hash embedded bytes instead of only size and mtime.

Line 481 through Line 490 can reuse a stale cached binary when an embedded asset’s contents change without changing size or mtime. Since these bytes are baked into the output, the cache key should include the asset bytes, matching the compile-cache contract.

Proposed fix
-                match fs::metadata(path) {
-                    Ok(meta) => {
-                        hash_field(&mut hasher, "embed-size", &meta.len().to_string());
-                        match meta
-                            .modified()
-                            .ok()
-                            .and_then(|m| m.duration_since(std::time::UNIX_EPOCH).ok())
-                        {
-                            Some(dur) => hash_field(
-                                &mut hasher,
-                                "embed-mtime",
-                                &format!("{}.{:09}", dur.as_secs(), dur.subsec_nanos()),
-                            ),
-                            None => hash_field(&mut hasher, "embed-mtime-unavailable", name),
-                        }
-                    }
-                    Err(e) => hash_field(&mut hasher, "embed-stat-error", &format!("{name}: {e}")),
-                }
+                match fs::read(path) {
+                    Ok(bytes) => hash_bytes_field(&mut hasher, "embed-data", &bytes),
+                    Err(e) => hash_field(&mut hasher, "embed-read-error", &format!("{name}: {e}")),
+                }
+fn hash_bytes_field(hasher: &mut Sha256, name: &str, value: &[u8]) {
+    hasher.update(name.as_bytes());
+    hasher.update([0]);
+    hasher.update((value.len() as u64).to_le_bytes());
+    hasher.update([0]);
+    hasher.update(value);
+    hasher.update([0xff]);
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/perry/src/commands/compile/build_cache.rs` around lines 460 - 490, The
cache key logic in the embedded-assets path of build_cache.rs is too weak
because it hashes only embed-name, embed-size, and embed-mtime, which can miss
content-only changes. Update the asset hashing in the match arm that handles
Ok(assets) from resolve_embedded_assets so the cache key incorporates the actual
embedded bytes, not just metadata. Use the existing embed resolution flow and
hash_field helper around the embed-name/embed-size/embed-mtime handling to
locate and replace the stale-freshness signal with content-based hashing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/perry/src/commands/compile/build_cache.rs`:
- Around line 460-490: The cache key logic in the embedded-assets path of
build_cache.rs is too weak because it hashes only embed-name, embed-size, and
embed-mtime, which can miss content-only changes. Update the asset hashing in
the match arm that handles Ok(assets) from resolve_embedded_assets so the cache
key incorporates the actual embedded bytes, not just metadata. Use the existing
embed resolution flow and hash_field helper around the
embed-name/embed-size/embed-mtime handling to locate and replace the
stale-freshness signal with content-based hashing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: db6a4864-d7c3-43b3-8228-ac2c30e789e3

📥 Commits

Reviewing files that changed from the base of the PR and between 2a8edb2 and 72931cf.

📒 Files selected for processing (1)
  • crates/perry/src/commands/compile/build_cache.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: Embed assets & files in standalone executables (inspired by Bun)

1 participant