From b09222271a9c6b7b1a1e0f7e2ecdfabed9bd5aee Mon Sep 17 00:00:00 2001
From: Linwei Shang
Date: Fri, 29 May 2026 10:59:45 -0400
Subject: [PATCH] refactor(assets-sync): move Content-Type into _headers, drop assets.toml
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

`Content-Type` is now declared inside any `_headers` block: the parser routes it to a dedicated `HeaderRule.content_type` field (Mime-validated) and the sync pipeline feeds it into `CreateAssetArguments.content_type` instead of the appended response headers — so the canister still emits exactly one certified Content-Type per response.

Removes the `assets.toml` config file introduced in 4788ef6. From a user's mental model Content-Type is just a header, and the only remaining justification for a second config file (sketched v2 fields `ignore` / `encodings` / `allow_raw_access`) is either covered by the build pipeline, satisfied by current defaults, or no longer in scope.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 ASSETS-TOML.md                                | 155 ---------
 Cargo.lock                                    |   8 -
 Cargo.toml                                    |   1 -
 README.md                                     |  43 +--
 assets-sync/Cargo.toml                        |   1 -
 assets-sync/src/asset_config.rs               | 318 ------------------
 assets-sync/src/glob.rs                       |   4 +-
 assets-sync/src/headers.rs                    | 243 +++++++++++--
 assets-sync/src/lib.rs                        |   1 -
 assets-sync/src/sync.rs                       |  62 ++--
 assets-sync/tests/bench_sync.rs               |   2 +-
 e2e/tests/fixture/assets-toml/assets.toml     |  14 -
 e2e/tests/fixture/assets-toml/dist/install.sh |   2 -
 e2e/tests/fixture/assets-toml/dist/llms.txt   |   2 -
 .../headers-content-type/dist/_headers        |   4 +
 .../dist/ic.did                               |   0
 .../dist/index.html                           |   0
 .../icp.yaml                                  |   2 -
 ...assets_toml.rs => headers_content_type.rs} |  34 +-
 plugin/README.md                              |  19 +-
 plugin/src/lib.rs                             |   6 -
 21 files changed, 287 insertions(+), 634 deletions(-)
 delete mode 100644 ASSETS-TOML.md
 delete mode 100644 assets-sync/src/asset_config.rs
 delete mode 100644 e2e/tests/fixture/assets-toml/assets.toml
 delete mode 100644
e2e/tests/fixture/assets-toml/dist/install.sh delete mode 100644 e2e/tests/fixture/assets-toml/dist/llms.txt create mode 100644 e2e/tests/fixture/headers-content-type/dist/_headers rename e2e/tests/fixture/{assets-toml => headers-content-type}/dist/ic.did (100%) rename e2e/tests/fixture/{assets-toml => headers-content-type}/dist/index.html (100%) rename e2e/tests/fixture/{assets-toml => headers-content-type}/icp.yaml (87%) rename e2e/tests/{assets_toml.rs => headers_content_type.rs} (64%) diff --git a/ASSETS-TOML.md b/ASSETS-TOML.md deleted file mode 100644 index 8267dd0..0000000 --- a/ASSETS-TOML.md +++ /dev/null @@ -1,155 +0,0 @@ -# `assets.toml` Support — v1: Content-Type Overrides - -## Goal - -Let users declare per-glob `Content-Type` overrides for uploaded assets through an `assets.toml` file. v1 ships **only this one field**; the broader `assets.toml` schema (ignore patterns, encoding overrides, `allow_raw_access`) is sketched in [PROPOSAL-asset-config-refactor.md](PROPOSAL-asset-config-refactor.md) — this plan is the first slice and pins the file format, validation rules, and pipeline integration that subsequent fields will build on. - -## Why content-type first - -The legacy `.ic-assets.json5` workflow allowed users to set `Content-Type` via the same `headers` block as everything else. That worked structurally — the value flowed through `CreateAssetArguments.headers` and got appended to the asset's response — but it produced a **duplicate** `Content-Type` header (the canister always prepends one derived from `CreateAssetArguments.content_type`, then appends user headers without deduping). Behavior across clients is undefined; in practice the user override is often silently ignored. - -`_headers` deliberately rejects `Content-Type` as a header rule for exactly this reason ([assets-sync/src/headers.rs](assets-sync/src/headers.rs)). 
So content-type needs a different home — one that feeds `CreateAssetArguments.content_type` directly, where the canister's certification machinery picks it up and writes it as the **sole** `Content-Type` of the certified response. That home is `assets.toml`. - -This also closes the only blocker preventing the developer-docs `.ic-assets.json5` from being expressible in the new config: all the `**/*.md`, `**/*.did`, `**/*.sh`, and `llms.txt` rules in that file exist solely to plug gaps in `mime_guess::from_path`. - -## Design choice: override the plugin's mime_guess step - -Today the plugin computes content-type via `mime_guess::from_path(path)` ([assets-sync/src/content.rs:40](assets-sync/src/content.rs#L40)) and stuffs the result into `CreateAssetArguments.content_type`. The canister stores it on the asset and emits exactly one `content-type` header per response ([ic-certified-assets/src/asset.rs:309-310](ic-certified-assets/src/asset.rs#L309-L310)). - -`assets.toml` slots in **before** `mime_guess` in the resolution chain: - -``` -1. assets.toml `[[asset]]` blocks (first matching `content_type` wins) -2. mime_guess::from_path(path) -3. application/octet-stream -``` - -The result still ends up in `CreateAssetArguments.content_type` and is certified by the canister exactly as today. **No canister-side change. No new candid types. No new cert-tree shape.** Same plugin-side-lowering pattern as `_headers`. - -## File format - -A TOML file passed to the plugin through the manifest's `files:` field. The plugin contract ([plugin/wit/sync-plugin.wit](plugin/wit/sync-plugin.wit)) already carries a `files: list` on `SyncExecInput` — `icp-cli` reads each declared file on the host side and forwards its contents inline. 
We expect at most one entry, treated as the asset config: - -```yaml -# icp.yaml (excerpt) -canisters: - - name: frontend - sync: - steps: - - type: plugin - path: ./plugins/assets-sync.wasm - dirs: - - dist # build output: read-only WASI preopen - files: - - assets.toml # host reads this and passes its contents to the plugin -``` - -`files: []` (or omitted) → no overrides. One entry → that entry's `content` is parsed as the asset config. **Two or more entries → error** with a message asking the user to consolidate, since the plugin has no merge semantics for multiple config files. - -The plugin does **not** assert the entry's `name` matches `assets.toml`. The manifest's `files:` field is already the authoritative declaration of which file is the asset config — re-validating the filename inside the plugin would just duplicate that check while preventing legitimate alternative names (e.g., a project that organises its config under `config/frontend-assets.toml`). The standard name `assets.toml` is a convention documented for users and tooling; the plugin trusts the bytes it receives. - -```toml -# assets.toml - -# Per-glob content-type overrides. First matching block wins. -[[asset]] -match = "/*.md" -content_type = "text/markdown; charset=utf-8" - -[[asset]] -match = "/llms.txt" -content_type = "text/plain; charset=utf-8" - -[[asset]] -match = "/*.did" -content_type = "text/plain; charset=utf-8" - -[[asset]] -match = "/*.sh" -content_type = "text/plain; charset=utf-8" -``` - -### Fields - -| Field | Required | Validation | -|---|---|---| -| `match` | yes | Glob over the asset key. Same dialect as `_headers` post-globs: leading `/`, `*` matches any sequence (including `/` and empty), no `**`, no `:placeholder`. | -| `content_type` | optional in the schema, but a block with no recognised v1 field is a no-op | Parsed via `mime::Mime::from_str` at config-load time. 
Rejects malformed MIME, empty subtype, CR/LF, and other invalid bytes — same safety guarantee as `_headers` gets from `http::HeaderValue`. | - -`content_type` is marked optional in the schema because future `[[asset]]` fields (`ignore`, `encodings`, `allow_raw_access`) will populate the same block standalone. v1 silently ignores blocks with no recognised fields rather than erroring — forward compatibility over strictness. - -Unknown top-level or per-block keys are **rejected** at parse time (`#[serde(deny_unknown_fields)]`). Typo protection beats forward compatibility for an unreleased plugin: catching `contetn_type` immediately is more useful than allowing a v2 field to pass through silently. When v2 fields land, the schema is updated atomically with the plugin that supports them. - -### Resolution - -For each asset key during sync: - -1. Walk `[[asset]]` blocks in declaration order. -2. The first block whose `match` matches the key contributes its `content_type` (when set). -3. If no block contributes a `content_type`, fall back to `mime_guess::from_path(path).first()`. -4. Final fallback: `application/octet-stream` (unchanged from today). - -**Order is semantic.** A specific block (`match = "/legacy/oldstyle.md"`) must precede a broader one (`match = "/*.md"`) to take effect — Netlify/Cloudflare convention, same as `_redirects` ordering. This is why the schema uses `[[asset]]` (TOML array-of-tables, order-preserving by spec) rather than an inline-table map (where order is parser-dependent). - -### Determinism - -Identical to the `_headers` determinism guarantee: - -- The resolver is a pure function of `(blocks, key)`. Same `assets.toml` + same asset key → same `Option` byte-for-byte. -- The canister stores the resolved string in `Asset.content_type` and emits it as the single `content-type` header of the certified response. 
-- No runtime injection, no hidden defaults beyond `mime_guess` and the `application/octet-stream` fallback (both of which are themselves pure functions of the file path). - -## Validation - -Rejected at config-load with a TOML span pointing at the offending line: - -- TOML syntax errors (from the `toml` crate's error type). -- `[[asset]]` block missing `match`. -- `match` value that fails the `_headers`-style pattern rules: must start with `/`, no `**`, no `:placeholder`. -- `content_type` value that fails `mime::Mime::from_str` — catches CR/LF, missing slash, empty subtype, etc. - -A **missing** `assets.toml` is treated as "no overrides" — the feature is purely opt-in. An **empty** `assets.toml` (zero `[[asset]]` blocks) is valid and equivalent to absent. - -Parsing aborts at the first malformed block — matches the `_headers`/`_redirects` philosophy of "fix issues one at a time rather than wade through cascading errors." - -## Integration in `assets-sync` - -- **Parser** (`assets-sync/src/asset_config.rs`). Uses the `toml` crate (already a candidate workspace dep — to be added). Defines `AssetConfig { asset: Vec }` and `AssetBlock { match, content_type }` with serde-deserialized fields. Returns errors with the file path and TOML span. -- **Pattern matcher**. Reuses `HeaderPattern` from `headers.rs`. As part of step 1 below, the pattern type is renamed to `KeyPattern` and lifted into a shared module (`assets-sync/src/glob.rs`); `headers.rs` re-exports it as `HeaderPattern` (alias) to keep its API stable. Single matcher, both file formats. -- **Resolver**. `AssetConfig::content_type_for(key: &str) -> Option` walks blocks in declaration order, returns the first matching block's `content_type`. -- **Plugin entry point** (`plugin/src/lib.rs`). Threads `input.files` through to `assets_sync::sync::sync(..)`. No filesystem reads for the config — `assets.toml` is **not** in `CONFIG_FILENAMES` and is **not** discovered by `scan.rs`. 
The config arrives inline on every invocation. -- **Sync integration** (`assets-sync/src/sync.rs`): - - `sync()` gains a `files: &[(String, String)]` parameter (name + content pairs). It calls `parse_asset_config(files)` once per run, alongside `load_redirect_rules` and `load_header_rules`. - - `parse_asset_config` validates the entry-count contract (0 → empty config, 1 → parsed, ≥2 → error) and parses the single TOML string. - - `prepare_asset` applies the override after `Content::load`: if `config.content_type_for(&source.key)` is `Some(mime)`, replace `content.media_type` before the `encoders_for(&media_type)` step. This is important — see "downstream effects" below. - - The "nothing to commit" short-circuit considers asset-config drift just as it considers `_headers` and `_redirects` drift today: an `assets.toml`-only edit triggers a sync because it can change `CreateAssetArguments.content_type` and therefore the certified response. - -### Downstream effects of overriding `media_type` - -Applying the override **before** `encoders_for(&media_type)` runs has two intentional consequences: - -1. **Compression policy follows the override.** `encoders_for` selects `gzip` for `text/*`, `*/javascript`, and `*/html` ([assets-sync/src/content.rs:81-84](assets-sync/src/content.rs#L81-L84)). A `.did` file overridden to `text/plain` will now be uploaded with a `gzip` encoding alongside `identity` — same as other text files. Without the override it stayed `application/octet-stream` and was identity-only. -2. **Drift detection sees the new type.** `is_already_in_place` and the delete-then-recreate logic in `build_operations` compare `media_type.to_string()` to the canister-stored `content_type` ([assets-sync/src/sync.rs:174](assets-sync/src/sync.rs#L174), [sync.rs:371](assets-sync/src/sync.rs#L371)). 
Flipping a `.did` file's content-type from `application/octet-stream` to `text/plain` triggers a one-time delete-then-create on the next sync — intentional, because the cert tree's response hash changes when the content-type changes. - -Both are correct and what the user wants. Worth noting in the README so the first sync after adopting `assets.toml` isn't surprising. - -## Step breakdown - -1. **Shared glob module.** Rename `HeaderPattern` to `KeyPattern` and lift it from `headers.rs` to `assets-sync/src/glob.rs`. `headers.rs` re-exports as a type alias for source compatibility. Zero behavior change; pure refactor. -2. **Parser + unit tests** (`asset_config.rs`). Happy path, missing-`match`, bad pattern, bad MIME, multiple blocks, unknown-field forward compat, empty file. Mirrors the test layout in `headers.rs`. -3. **Sync integration.** `load_asset_config` in `sync()`; override application in `prepare_asset`; drift-trigger in the short-circuit. Update mocked-canister tests in `sync.rs` to cover content-type override → recreate, and override-only edit → recreate without content change. -4. **E2E test** (`e2e/tests/assets_toml.rs`). Fixture with `.md` / `.did` / `.sh` / `llms.txt` files and an `assets.toml` matching the developer-docs case. Verifies the canister returns the configured `Content-Type` exactly once, and that flipping a content-type triggers a re-upload (cert tree updates). -5. **Docs.** Short section in top-level `README.md` next to `_headers` / `_redirects`. Brief note in `plugin/README.md` that the first sync after adopting `assets.toml` may recreate assets whose content-type changes — by design. - -Each step is independently shippable; together they ship the v1 surface. - -## What this plan is *not* for - -- **Response headers other than `Content-Type`.** Those go in `_headers`. The split is the whole point of the refactor. 
-- **A general MIME-type extension table.** No `[content_type.extensions] md = "text/markdown"` shorthand — every override is an explicit `[[asset]]` block. "No implicit rules" is a stated principle: defaults come from `mime_guess`, overrides come from `assets.toml`, nothing in between. -- **Built-in security-policy bundles.** Removed entirely per the broader proposal. - -## Parked for follow-up - -- **`[[asset]]` block growth.** When `ignore`, `encodings`, `allow_raw_access` land, each new field picks its own composition rule (per-field first-match-wins, last-wins as a global toggle, etc.) — v1 only commits to the rule for `content_type`. Future fields can revisit composition without breaking v1. -- **Overlap warnings.** v1 doesn't lint that multiple `[[asset]]` blocks could both match some asset — first-match-wins handles it implicitly. If users hit footguns (broad rule shadowing a more specific one declared later), add a sync-time warning then. diff --git a/Cargo.lock b/Cargo.lock index a932b75..9ba8feb 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -137,7 +137,6 @@ dependencies = [ "serde_bytes", "sha2 0.11.0", "tempfile", - "toml", "url", ] @@ -3313,16 +3312,9 @@ dependencies = [ "serde", "serde_spanned", "toml_datetime", - "toml_write", "winnow", ] -[[package]] -name = "toml_write" -version = "0.1.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801" - [[package]] name = "tower" version = "0.5.3" diff --git a/Cargo.toml b/Cargo.toml index da1577b..30a7bc8 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -27,6 +27,5 @@ serde_bytes = "0.11.5" serde_json = "1.0" serde_cbor = "0.11.1" sha2 = "0.11.0" -toml = "0.8" wit-bindgen = "0.57.1" tempfile = "3.27.0" diff --git a/README.md b/README.md index 2935afe..a129f2e 100644 --- a/README.md +++ b/README.md @@ -57,40 +57,21 @@ The canister also honours a Netlify-style `_headers` file at the root of the pro X-Robots-Tag: noindex ``` 
-`` is an absolute path with optional `*` wildcards — `/about` is exact, `/_astro/*` is a subtree, `/*.md` matches any `.md` file at any depth. A single `*` matches any sequence including `/` and empty; `**` is not supported (redundant) and neither is `:placeholder`. All matching rules apply per the Cloudflare Pages / Netlify semantics — same-name values across rules concatenate with `, ` (RFC 7230), with `Set-Cookie` carved out (RFC 6265). The file is parsed by the plugin and lowered to per-asset header lists; `Content-Type` is reserved (set via `assets.toml`, see below). See [`plugin/README.md`](plugin/README.md#headers) for the full reference and reject list.
-
-## Per-glob content-type overrides
-
-`Content-Type` is intentionally not a `_headers` field — the canister derives it from the asset's media type and certifies it as part of the response. To override what `mime_guess::from_path` picks (or to add a `charset` parameter), declare blocks in an `assets.toml` passed to the plugin via the manifest's `files:` field:
-
-```yaml
-# icp.yaml (excerpt)
-canisters:
-  - name: frontend
-    sync:
-      steps:
-        - type: plugin
-          path: ./plugins/assets-sync.wasm
-          dirs:
-            - dist
-          files:
-            - assets.toml
-```
+`` is an absolute path with optional `*` wildcards — `/about` is exact, `/_astro/*` is a subtree, `/*.md` matches any `.md` file at any depth. A single `*` matches any sequence including `/` and empty; `**` is not supported (redundant) and neither is `:placeholder`. All matching rules apply per the Cloudflare Pages / Netlify semantics — same-name values across rules concatenate with `, ` (RFC 7230), with `Set-Cookie` carved out (RFC 6265). `Content-Type` is recognised but routed to the asset's stored media type instead of the appended response headers — see below. See [`plugin/README.md`](plugin/README.md#headers) for the full reference and reject list.
+
+### `Content-Type` overrides
 
-```toml
-# assets.toml
+The canister derives a `Content-Type` for every asset from its media type and certifies it as part of the response. To override what `mime_guess::from_path` picks (or to add a `charset` parameter), set `Content-Type:` inside any `_headers` block:
 
-[[asset]]
-match = "/*.md"
-content_type = "text/markdown; charset=utf-8"
+```text
+/*.md
+  Content-Type: text/markdown; charset=utf-8
 
-[[asset]]
-match = "/*.did"
-content_type = "text/plain; charset=utf-8"
+/*.did
+  Content-Type: text/plain; charset=utf-8
 
-[[asset]]
-match = "/llms.txt"
-content_type = "text/plain; charset=utf-8"
+/llms.txt
+  Content-Type: text/plain; charset=utf-8
 ```
 
-`match` uses the same glob dialect as `_headers`. Blocks are walked in declaration order; the first matching `content_type` wins, with `mime_guess` as the fallback. The override feeds `CreateAssetArguments.content_type`, so the canister emits exactly one `Content-Type` per response — no duplicates from layering with response headers. See [`ASSETS-TOML.md`](ASSETS-TOML.md) for the full design and validation rules.
+The plugin extracts `Content-Type` and feeds it into `CreateAssetArguments.content_type` rather than appending it as a response header — so the canister emits exactly one `Content-Type` per response, no duplicates. Other headers in the same block continue to flow through `headers` as usual. `Content-Type` is single-valued, so when multiple blocks match the same asset the first matching `Content-Type` wins (other matching rules still contribute their non-`Content-Type` headers as normal).
diff --git a/assets-sync/Cargo.toml b/assets-sync/Cargo.toml index 261d79b..3b4f06e 100644 --- a/assets-sync/Cargo.toml +++ b/assets-sync/Cargo.toml @@ -15,7 +15,6 @@ mime_guess.workspace = true serde.workspace = true serde_bytes.workspace = true sha2.workspace = true -toml.workspace = true url = "2" [dev-dependencies] diff --git a/assets-sync/src/asset_config.rs b/assets-sync/src/asset_config.rs deleted file mode 100644 index 5c24e40..0000000 --- a/assets-sync/src/asset_config.rs +++ /dev/null @@ -1,318 +0,0 @@ -//! Parser for `assets.toml`. v1 only recognises per-glob `content_type` -//! overrides — see [ASSETS-TOML.md](../../ASSETS-TOML.md) for the full design. -//! -//! Input arrives as `(name, content)` pairs through the plugin runtime's -//! `files:` channel ([plugin/wit/sync-plugin.wit](../../plugin/wit/sync-plugin.wit)). -//! We expect 0 or 1 entries; 2+ is rejected because the plugin has no merge -//! semantics for multiple config files. -//! -//! The standard filename is `assets.toml`, but the parser doesn't assert -//! that — the manifest's `files:` field is the authoritative declaration of -//! which file is the asset config. -//! -//! Errors include the source filename so users see which file is the -//! problem when the plugin runs under `icp-cli`. - -use crate::glob::{self, KeyPattern}; -use mime::Mime; -use serde::Deserialize; -use std::str::FromStr; - -#[derive(Debug, Clone, PartialEq, Eq)] -pub struct AssetConfig { - blocks: Vec, -} - -#[derive(Debug, Clone, PartialEq, Eq)] -pub struct AssetBlock { - pub pattern: KeyPattern, - pub content_type: Option, -} - -impl AssetConfig { - pub fn empty() -> Self { - Self { blocks: Vec::new() } - } - - pub fn is_empty(&self) -> bool { - self.blocks.is_empty() - } - - /// Returns the first matching block's `content_type`, walking blocks in - /// declaration order. Blocks without a `content_type` are skipped — they - /// exist for future fields (`ignore`, `encodings`) that haven't shipped. 
- pub fn content_type_for(&self, key: &str) -> Option { - for block in &self.blocks { - if block.pattern.matches(key) { - if let Some(mime) = &block.content_type { - return Some(mime.clone()); - } - } - } - None - } - - /// Build from the plugin runtime's `files:` slice. The slice carries - /// `(name, content)` pairs read by the host. - /// - /// * 0 entries → empty config (no overrides). - /// * 1 entry → parsed as the asset config. - /// * 2+ entries → error (the plugin has no merge semantics). - pub fn from_files(files: &[(String, String)]) -> Result { - match files { - [] => Ok(Self::empty()), - [(name, content)] => parse(name, content), - _ => Err(format!( - "expected at most one file in `files:`, got {} — consolidate the asset config into a single file", - files.len() - )), - } - } -} - -#[derive(Debug, Deserialize)] -#[serde(deny_unknown_fields)] -struct RawConfig { - #[serde(default)] - asset: Vec, -} - -#[derive(Debug, Deserialize)] -#[serde(deny_unknown_fields)] -struct RawBlock { - #[serde(rename = "match")] - match_: String, - #[serde(default)] - content_type: Option, -} - -fn parse(name: &str, content: &str) -> Result { - let raw: RawConfig = - toml::from_str(content).map_err(|e| format!("{name}: TOML parse error: {e}"))?; - - let mut blocks = Vec::with_capacity(raw.asset.len()); - for (idx, raw_block) in raw.asset.into_iter().enumerate() { - let block_no = idx + 1; - let pattern = glob::parse(&raw_block.match_) - .map_err(|e| format!("{name}: [[asset]] #{block_no}: invalid `match`: {e}"))?; - let content_type = raw_block - .content_type - .map(|s| { - Mime::from_str(&s).map_err(|e| { - format!("{name}: [[asset]] #{block_no}: invalid `content_type` {s:?}: {e}") - }) - }) - .transpose()?; - blocks.push(AssetBlock { - pattern, - content_type, - }); - } - Ok(AssetConfig { blocks }) -} - -#[cfg(test)] -mod tests { - use super::*; - - fn cfg(content: &str) -> AssetConfig { - parse("assets.toml", content).unwrap() - } - - fn err(content: &str) -> String { - 
parse("assets.toml", content).unwrap_err() - } - - // ── happy paths ─────────────────────────────────────────────────────────── - - #[test] - fn empty_content_yields_empty_config() { - let c = cfg(""); - assert!(c.is_empty()); - } - - #[test] - fn parses_single_content_type_block() { - let c = cfg(r#" -[[asset]] -match = "/*.md" -content_type = "text/markdown; charset=utf-8" -"#); - assert_eq!( - c.content_type_for("/foo.md").unwrap().to_string(), - "text/markdown; charset=utf-8" - ); - assert!(c.content_type_for("/foo.html").is_none()); - } - - #[test] - fn parses_multiple_blocks_in_order() { - let c = cfg(r#" -[[asset]] -match = "/legacy/oldstyle.md" -content_type = "text/plain" - -[[asset]] -match = "/*.md" -content_type = "text/markdown; charset=utf-8" -"#); - // Specific rule declared first wins. - assert_eq!( - c.content_type_for("/legacy/oldstyle.md") - .unwrap() - .to_string(), - "text/plain" - ); - assert_eq!( - c.content_type_for("/other.md").unwrap().to_string(), - "text/markdown; charset=utf-8" - ); - } - - #[test] - fn block_without_content_type_is_no_op_for_v1() { - // A bare `[[asset]] match = "..."` block (no v1 fields) parses - // and contributes nothing to content-type resolution. Future fields - // (ignore, encodings) will hang off the same shape. - let c = cfg(r#" -[[asset]] -match = "/*" -"#); - assert!(c.content_type_for("/anything").is_none()); - } - - #[test] - fn from_files_handles_zero_one_and_many() { - // Zero entries — empty. - let c = AssetConfig::from_files(&[]).unwrap(); - assert!(c.is_empty()); - - // One entry — parsed. - let files = vec![( - "assets.toml".into(), - r#" -[[asset]] -match = "/*.md" -content_type = "text/markdown" -"# - .into(), - )]; - let c = AssetConfig::from_files(&files).unwrap(); - assert_eq!( - c.content_type_for("/x.md").unwrap().to_string(), - "text/markdown" - ); - - // Two entries — error. 
- let files = vec![("a.toml".into(), "".into()), ("b.toml".into(), "".into())]; - let e = AssetConfig::from_files(&files).unwrap_err(); - assert!(e.contains("at most one"), "{e}"); - } - - #[test] - fn filename_is_not_asserted() { - // Manifest may declare any filename; the plugin trusts the host. - let files = vec![( - "config/custom.toml".into(), - r#" -[[asset]] -match = "/x" -content_type = "text/plain" -"# - .into(), - )]; - let c = AssetConfig::from_files(&files).unwrap(); - assert_eq!(c.content_type_for("/x").unwrap().to_string(), "text/plain"); - } - - // ── reject cases ────────────────────────────────────────────────────────── - - #[test] - fn rejects_unknown_top_level_field() { - let e = err(r#" -unknown_key = "x" - -[[asset]] -match = "/*" -"#); - assert!(e.contains("TOML parse error"), "{e}"); - assert!(e.contains("unknown_key") || e.contains("unknown"), "{e}"); - } - - #[test] - fn rejects_unknown_per_block_field() { - let e = err(r#" -[[asset]] -match = "/*" -contetn_type = "text/plain" -"#); - assert!(e.contains("TOML parse error"), "{e}"); - assert!(e.contains("contetn_type") || e.contains("unknown"), "{e}"); - } - - #[test] - fn rejects_missing_match_field() { - let e = err(r#" -[[asset]] -content_type = "text/plain" -"#); - assert!(e.contains("TOML parse error"), "{e}"); - } - - #[test] - fn rejects_invalid_pattern() { - let e = err(r#" -[[asset]] -match = "no-leading-slash" -content_type = "text/plain" -"#); - assert!(e.contains("invalid `match`"), "{e}"); - assert!(e.contains("absolute path"), "{e}"); - } - - #[test] - fn rejects_double_star_pattern() { - let e = err(r#" -[[asset]] -match = "/foo/**/bar" -content_type = "text/plain" -"#); - assert!(e.contains("invalid `match`"), "{e}"); - assert!(e.contains("'**'"), "{e}"); - } - - #[test] - fn rejects_malformed_mime() { - let e = err(r#" -[[asset]] -match = "/*.md" -content_type = "not a mime type" -"#); - assert!(e.contains("invalid `content_type`"), "{e}"); - } - - #[test] - fn 
rejects_empty_mime() { - let e = err(r#" -[[asset]] -match = "/*.md" -content_type = "" -"#); - assert!(e.contains("invalid `content_type`"), "{e}"); - } - - #[test] - fn error_names_the_block_number() { - // Both blocks are syntactically fine; the second has a bad pattern. - let e = err(r#" -[[asset]] -match = "/*.md" -content_type = "text/markdown" - -[[asset]] -match = "/foo/**/bar" -content_type = "text/plain" -"#); - assert!(e.contains("#2"), "{e}"); - } -} diff --git a/assets-sync/src/glob.rs b/assets-sync/src/glob.rs index 392bed3..1c282d1 100644 --- a/assets-sync/src/glob.rs +++ b/assets-sync/src/glob.rs @@ -1,5 +1,5 @@ -//! Glob pattern matched against asset keys, shared by `_headers` and the -//! `assets.toml` content-type overrides. +//! Glob pattern matched against asset keys, used by `_headers` to attach +//! both response headers and Content-Type overrides to assets. //! //! Syntax: leading `/`, then literal characters and a single greedy `*`. //! `*` matches any sequence of characters including `/` and empty; every diff --git a/assets-sync/src/headers.rs b/assets-sync/src/headers.rs index aa5d2f2..0814400 100644 --- a/assets-sync/src/headers.rs +++ b/assets-sync/src/headers.rs @@ -5,19 +5,30 @@ //! blocks. Errors carry a 1-based line number so the plugin can point users //! at the offending entry without a canister round-trip. //! -//! See the "File format" section of HEADERS.md for the full reject list. +//! `Content-Type` is parsed structurally onto [`HeaderRule::content_type`] +//! instead of accumulating in `headers`: the canister stores it as asset +//! metadata that drives encoder selection and certification, not as an +//! appended response header. See HEADERS.md for the full reject list. 
use crate::glob::KeyPattern; use http::{HeaderName, HeaderValue}; +use mime::Mime; +use std::str::FromStr; pub const HEADERS_FILENAME: &str = "_headers"; #[derive(Debug, Clone, PartialEq, Eq)] pub struct HeaderRule { pub pattern: KeyPattern, - /// Headers in declaration order. Multiple entries for the same name are - /// allowed (e.g. `Set-Cookie`); resolver semantics in `sync.rs`. + /// Response headers in declaration order. Multiple entries for the same + /// name are allowed (e.g. `Set-Cookie`); resolver semantics in `sync.rs`. + /// `Content-Type` is never stored here — it routes to [`Self::content_type`]. pub headers: Vec<(String, String)>, + /// `Content-Type` value if the block declared one. The plugin feeds this + /// into `CreateAssetArguments.content_type`, overriding `mime_guess` + /// before encoder selection runs. At most one per block; a duplicate + /// `Content-Type:` line within the same block is a parse error. + pub content_type: Option, } #[derive(Debug, Clone, PartialEq, Eq)] @@ -35,8 +46,15 @@ impl std::fmt::Display for ParseError { impl std::error::Error for ParseError {} /// Open block under construction during `parse`: -/// (line_no of the path line, pattern, headers accumulated so far). -type OpenBlock = (usize, KeyPattern, Vec<(String, String)>); +/// line_no of the path line, pattern, headers accumulated so far, +/// and the block's `Content-Type` value (if a `Content-Type:` line has +/// been seen yet — used to detect duplicates). +struct OpenBlock { + line_no: usize, + pattern: KeyPattern, + headers: Vec<(String, String)>, + content_type: Option, +} /// Resolves the per-asset header map for `key` by walking `rules` in /// declaration order. All matching rules contribute; same-name values across @@ -44,6 +62,9 @@ type OpenBlock = (usize, KeyPattern, Vec<(String, String)>); /// carved out per RFC 6265 §3 (kept as separate entries). 
The returned Vec is /// stable-sorted by lowercased header name so multi-valued headers preserve /// their declaration order — see the determinism guarantee in HEADERS.md. +/// +/// `Content-Type` is never present in the output — it routes through +/// [`content_type_for`] into the asset's stored `content_type` metadata. pub fn resolve(key: &str, rules: &[HeaderRule]) -> Vec<(String, String)> { use std::collections::HashMap; @@ -79,13 +100,24 @@ pub fn resolve(key: &str, rules: &[HeaderRule]) -> Vec<(String, String)> { merged } +/// Returns the `Content-Type` override for `key`, walking `rules` in +/// declaration order — first-match-wins, because `Content-Type` is +/// single-valued and accumulation semantics make no sense for it. Returns +/// `None` if no matching rule declared a `Content-Type:`, in which case the +/// caller falls back to `mime_guess`. +pub fn content_type_for(key: &str, rules: &[HeaderRule]) -> Option { + rules + .iter() + .find(|r| r.pattern.matches(key) && r.content_type.is_some()) + .and_then(|r| r.content_type.clone()) +} + /// Parses an entire `_headers` file into a list of [`HeaderRule`]s. Rules are /// returned in declaration order — the resolver walks them in order, so order /// is semantic. The first malformed line aborts parsing; we want the user to /// fix issues one at a time rather than wade through cascading errors. pub fn parse(content: &str) -> Result, ParseError> { let mut rules = Vec::new(); - // Open block: (line_no of the path line, pattern, headers accumulated so far). let mut current: Option = None; for (i, raw) in content.lines().enumerate() { @@ -120,22 +152,40 @@ pub fn parse(content: &str) -> Result, ParseError> { line: line_no, message, })?; - current = Some((line_no, pattern, Vec::new())); + current = Some(OpenBlock { + line_no, + pattern, + headers: Vec::new(), + content_type: None, + }); continue; } // Indented line: must be a `Header-Name: value` inside an open block. 
-        let Some((_, _, headers)) = current.as_mut() else {
+        let Some(block) = current.as_mut() else {
             return Err(ParseError {
                 line: line_no,
                 message: "indented header line outside a path block".to_string(),
             });
         };
-        let (name, value) = parse_header(stripped).map_err(|message| ParseError {
+        let parsed = parse_header(stripped).map_err(|message| ParseError {
             line: line_no,
             message,
         })?;
-        headers.push((name, value));
+        match parsed {
+            ParsedHeader::ContentType(mime) => {
+                if block.content_type.is_some() {
+                    return Err(ParseError {
+                        line: line_no,
+                        message: "duplicate `Content-Type` in the same block".to_string(),
+                    });
+                }
+                block.content_type = Some(mime);
+            }
+            ParsedHeader::Other(name, value) => {
+                block.headers.push((name, value));
+            }
+        }
     }
 
     if let Some(block) = current.take() {
@@ -151,17 +201,26 @@ fn strip_comment(line: &str) -> &str {
     }
 }
 
-fn finalize_block((line_no, pattern, headers): OpenBlock) -> Result<HeaderRule, ParseError> {
-    if headers.is_empty() {
+fn finalize_block(block: OpenBlock) -> Result<HeaderRule, ParseError> {
+    if block.headers.is_empty() && block.content_type.is_none() {
         return Err(ParseError {
-            line: line_no,
+            line: block.line_no,
             message: "path block has no header lines under it".to_string(),
         });
     }
-    Ok(HeaderRule { pattern, headers })
+    Ok(HeaderRule {
+        pattern: block.pattern,
+        headers: block.headers,
+        content_type: block.content_type,
+    })
 }
 
-fn parse_header(stripped: &str) -> Result<(String, String), String> {
+enum ParsedHeader {
+    ContentType(Mime),
+    Other(String, String),
+}
+
+fn parse_header(stripped: &str) -> Result<ParsedHeader, String> {
     let trimmed = stripped.trim();
     let Some(colon_idx) = trimmed.find(':') else {
         return Err(format!(
@@ -174,10 +233,12 @@ fn parse_header(stripped: &str) -> Result<(String, String), String> {
         return Err("header name is empty".to_string());
     }
     if name.eq_ignore_ascii_case("content-type") {
-        return Err(
-            "'Content-Type' is derived from the asset's media type and cannot be overridden"
-                .to_string(),
-        );
+        if value.is_empty() {
+            return Err("'Content-Type'
value is empty".to_string());
+        }
+        let mime = Mime::from_str(value)
+            .map_err(|e| format!("invalid `Content-Type` value '{value}': {e}"))?;
+        return Ok(ParsedHeader::ContentType(mime));
     }
     if value.contains(":splat") || value.contains(":placeholder") {
         return Err(format!(
@@ -189,7 +250,7 @@ fn parse_header(stripped: &str) -> Result<(String, String), String> {
     HeaderName::from_bytes(name.as_bytes())
         .map_err(|e| format!("invalid header name '{name}': {e}"))?;
     HeaderValue::from_str(value).map_err(|e| format!("invalid header value '{value}': {e}"))?;
-    Ok((name.to_string(), value.to_string()))
+    Ok(ParsedHeader::Other(name.to_string(), value.to_string()))
 }
 
 #[cfg(test)]
@@ -385,17 +446,91 @@ mod tests {
     }
 
     #[test]
-    fn rejects_content_type_header() {
-        let e = err("/about\n Content-Type: text/plain\n");
-        assert!(e.message.contains("Content-Type"), "{}", e.message);
+    fn content_type_routes_to_dedicated_field_not_headers() {
+        // `Content-Type` is asset metadata, not a response header — it must
+        // not appear in `headers` (otherwise the canister would append it
+        // alongside its own derived value, producing duplicates on the wire).
+        let rules = parse("/llms.txt\n Content-Type: text/markdown; charset=utf-8\n").unwrap();
+        assert_eq!(rules.len(), 1);
+        assert!(
+            rules[0].headers.is_empty(),
+            "Content-Type leaked into headers: {:?}",
+            rules[0].headers
+        );
+        assert_eq!(
+            rules[0].content_type.as_ref().unwrap().to_string(),
+            "text/markdown; charset=utf-8"
+        );
+    }
+
+    #[test]
+    fn content_type_is_case_insensitive() {
+        let rules = parse("/llms.txt\n content-type: text/plain\n").unwrap();
+        assert_eq!(
+            rules[0].content_type.as_ref().unwrap().to_string(),
+            "text/plain"
+        );
+    }
+
+    #[test]
+    fn content_type_coexists_with_other_headers() {
+        let input = "\
+/llms.txt
+  Content-Type: text/markdown; charset=utf-8
+  Cache-Control: max-age=3600
+  X-Robots-Tag: noindex
+";
+        let rules = parse(input).unwrap();
+        assert_eq!(
+            rules[0].content_type.as_ref().unwrap().to_string(),
+            "text/markdown; charset=utf-8"
+        );
+        assert_eq!(
+            rules[0].headers,
+            vec![
+                ("Cache-Control".into(), "max-age=3600".into()),
+                ("X-Robots-Tag".into(), "noindex".into()),
+            ]
+        );
+    }
+
+    #[test]
+    fn block_with_only_content_type_is_valid() {
+        // `Content-Type` alone counts as a non-empty block — finalize must
+        // not reject it as "no header lines under it".
+        let rules = parse("/llms.txt\n Content-Type: text/markdown\n").unwrap();
+        assert_eq!(rules.len(), 1);
+        assert!(rules[0].headers.is_empty());
+        assert_eq!(
+            rules[0].content_type.as_ref().unwrap().to_string(),
+            "text/markdown"
+        );
     }
 
     #[test]
-    fn rejects_content_type_case_insensitive() {
-        let e = err("/about\n content-type: text/plain\n");
+    fn rejects_duplicate_content_type_in_same_block() {
+        let input = "\
+/llms.txt
+  Content-Type: text/markdown
+  Content-Type: text/plain
+";
+        let e = err(input);
+        assert!(e.message.contains("duplicate"), "{}", e.message);
+        assert_eq!(e.line, 3);
+    }
+
+    #[test]
+    fn rejects_invalid_content_type_value() {
+        let e = err("/llms.txt\n Content-Type: not a mime\n");
         assert!(e.message.contains("Content-Type"), "{}", e.message);
     }
 
+    #[test]
+    fn rejects_empty_content_type_value() {
+        let e = err("/llms.txt\n Content-Type:\n");
+        assert!(e.message.contains("empty"), "{}", e.message);
+    }
+
     #[test]
     fn rejects_missing_colon() {
         let e = err("/about\n X-Frame-Options DENY\n");
@@ -467,6 +602,15 @@ mod tests {
                 .iter()
                 .map(|(k, v)| (k.to_string(), v.to_string()))
                 .collect(),
+            content_type: None,
+        }
+    }
+
+    fn rule_with_content_type(pattern_src: &str, mime: &str) -> HeaderRule {
+        HeaderRule {
+            pattern: pat(pattern_src),
+            headers: Vec::new(),
+            content_type: Some(mime.parse().unwrap()),
         }
     }
 
@@ -623,4 +767,53 @@ mod tests {
         assert!(resolve("/different", &rules).is_empty());
         assert!(resolve("/specific/subpath", &rules).is_empty());
     }
+
+    // ── content_type_for ──────────────────────────────────────────────────────
+
+    #[test]
+    fn content_type_for_returns_none_when_no_rule_matches() {
+        let rules = vec![rule_with_content_type("/*.md", "text/markdown")];
+        assert!(content_type_for("/index.html", &rules).is_none());
+    }
+
+    #[test]
+    fn content_type_for_returns_none_when_matching_rule_has_no_content_type() {
+        // A pure response-headers block must not produce a Content-Type override.
+        let rules = vec![rule("/*", &[("X-Robots-Tag", "noindex")])];
+        assert!(content_type_for("/anywhere", &rules).is_none());
+    }
+
+    #[test]
+    fn content_type_for_first_matching_rule_wins() {
+        // First-match-wins (Content-Type is single-valued — accumulation
+        // semantics make no sense).
+        let rules = vec![
+            rule_with_content_type("/legacy/oldstyle.md", "text/plain"),
+            rule_with_content_type("/*.md", "text/markdown"),
+        ];
+        assert_eq!(
+            content_type_for("/legacy/oldstyle.md", &rules)
+                .unwrap()
+                .to_string(),
+            "text/plain"
+        );
+        assert_eq!(
+            content_type_for("/other.md", &rules).unwrap().to_string(),
+            "text/markdown"
+        );
+    }
+
+    #[test]
+    fn content_type_for_skips_matching_rules_without_content_type() {
+        // A broader header-only rule before a narrower Content-Type rule
+        // must not shadow the override.
+        let rules = vec![
+            rule("/*", &[("X-Robots-Tag", "noindex")]),
+            rule_with_content_type("/*.md", "text/markdown"),
+        ];
+        assert_eq!(
+            content_type_for("/intro.md", &rules).unwrap().to_string(),
+            "text/markdown"
+        );
+    }
 }
diff --git a/assets-sync/src/lib.rs b/assets-sync/src/lib.rs
index 26eff34..f186b08 100644
--- a/assets-sync/src/lib.rs
+++ b/assets-sync/src/lib.rs
@@ -1,4 +1,3 @@
-pub mod asset_config;
 pub mod canister;
 pub mod content;
 pub mod glob;
diff --git a/assets-sync/src/sync.rs b/assets-sync/src/sync.rs
index ec0960f..5f84f25 100644
--- a/assets-sync/src/sync.rs
+++ b/assets-sync/src/sync.rs
@@ -68,11 +68,9 @@ fn ensure_commit_permission(
 pub fn sync(
     canister: &C,
     dirs: &[String],
-    files: &[(String, String)],
     identity_principal: &str,
     proxy_canister_id: Option<&str>,
 ) -> Result {
-    let asset_config = crate::asset_config::AssetConfig::from_files(files)?;
     // The assets plugin owns the URL space of its canister: every key starts at
     // `/`, `_redirects` lives at the project root, and the canister has no
     // notion of "merge two trees together".
Multiple input directories would
@@ -161,7 +159,7 @@ pub fn sync(
     // Phase 1: compute metadata only — no batch created yet.
     let mut project_assets: HashMap = HashMap::new();
     for source in sources {
-        let asset = prepare_asset(source, &asset_config, &canister_assets)?;
+        let asset = prepare_asset(source, &project_header_rules, &canister_assets)?;
         project_assets.insert(asset.source.key.clone(), asset);
     }
 
@@ -229,16 +227,16 @@ pub fn sync(
 
 fn prepare_asset(
     source: AssetSource,
-    asset_config: &crate::asset_config::AssetConfig,
+    header_rules: &[HeaderRule],
     canister_assets: &HashMap,
 ) -> Result {
     let mut content = Content::load(&source.path)?;
-    // Apply per-glob content-type override from `assets.toml` before deciding
-    // encoders or computing the asset's stored media type. This is what makes
-    // a `.did` file declared as `text/plain` pick up gzip compression and
-    // surface the correct `Content-Type` from the canister's certified
-    // response — see ASSETS-TOML.md "Downstream effects".
-    if let Some(override_mime) = asset_config.content_type_for(&source.key) {
+    // Apply per-glob `Content-Type` override from `_headers` before deciding
+    // encoders or computing the asset's stored media type. Routing through
+    // `content.media_type` is what makes a `.did` file declared as
+    // `text/plain` pick up gzip compression and surface the correct
+    // `Content-Type` from the canister's certified response.
+    if let Some(override_mime) = headers::content_type_for(&source.key, header_rules) {
         content.media_type = override_mime;
     }
     // gzip for text/* and js/html, identity for everything else.
@@ -1207,7 +1205,6 @@ mod tests {
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
@@ -1246,7 +1243,6 @@ mod tests {
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
@@ -1265,12 +1261,7 @@ mod tests {
             path: f.path().to_path_buf(),
             key: "/test.txt".to_string(),
         };
-        let asset = prepare_asset(
-            source,
-            &crate::asset_config::AssetConfig::empty(),
-            &HashMap::new(),
-        )
-        .unwrap();
+        let asset = prepare_asset(source, &[], &HashMap::new()).unwrap();
         assert!(
             asset.encodings.contains_key("identity"),
             "identity must be present"
@@ -1281,14 +1272,13 @@ mod tests {
         );
     }
 
-    // assets.toml content-type override drives both the stored media type
-    // and the encoder selection. Without the override, a `.did` file is
+    // `_headers` Content-Type override drives both the stored media type and
+    // the encoder selection. Without the override, a `.did` file is
     // `application/octet-stream` (mime_guess has no entry) and gets only the
     // identity encoding; with the override to `text/plain`, encoders_for
     // selects gzip too.
     #[test]
-    fn asset_config_content_type_override_applies_to_prepare_asset() {
-        use crate::asset_config::AssetConfig;
+    fn header_content_type_override_applies_to_prepare_asset() {
         use std::io::Write;
 
         // Highly compressible content so gzip is genuinely smaller and gets
@@ -1306,23 +1296,15 @@ mod tests {
         };
 
         // No override: mime_guess returns octet-stream, gzip is not selected.
-        let without = prepare_asset(mk_source(), &AssetConfig::empty(), &HashMap::new()).unwrap();
+        let without = prepare_asset(mk_source(), &[], &HashMap::new()).unwrap();
         assert_eq!(without.media_type.to_string(), "application/octet-stream");
         assert!(!without.encodings.contains_key("gzip"));
 
-        // With override to text/plain, both the media type and the encoder
-        // pick change.
-        let files = vec![(
-            "assets.toml".to_string(),
-            r#"
-[[asset]]
-match = "/*.did"
-content_type = "text/plain; charset=utf-8"
-"#
-            .to_string(),
-        )];
-        let config = AssetConfig::from_files(&files).unwrap();
-        let with = prepare_asset(mk_source(), &config, &HashMap::new()).unwrap();
+        // With override to text/plain via `_headers`, both the media type
+        // and the encoder pick change.
+        let rules =
+            crate::headers::parse("/*.did\n Content-Type: text/plain; charset=utf-8\n").unwrap();
+        let with = prepare_asset(mk_source(), &rules, &HashMap::new()).unwrap();
         assert_eq!(with.media_type.to_string(), "text/plain; charset=utf-8");
         assert!(
             with.encodings.contains_key("gzip"),
@@ -1521,6 +1503,7 @@ content_type = "text/plain; charset=utf-8"
                 .iter()
                 .map(|(k, v)| (k.to_string(), v.to_string()))
                 .collect(),
+            content_type: None,
         }
     }
 
@@ -1896,7 +1879,7 @@ content_type = "text/plain; charset=utf-8"
     #[test]
     fn sync_rejects_zero_input_dirs() {
         let mock = SyncMock::new();
-        let err = sync(&mock, &[], &[], &Principal::anonymous().to_text(), None).unwrap_err();
+        let err = sync(&mock, &[], &Principal::anonymous().to_text(), None).unwrap_err();
         assert!(
             err.contains("expected exactly one input directory"),
             "got: {err}"
@@ -1909,7 +1892,6 @@ content_type = "text/plain; charset=utf-8"
         let err = sync(
             &mock,
             &["dist-a".to_string(), "dist-b".to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         )
@@ -1961,7 +1943,6 @@ content_type = "text/plain; charset=utf-8"
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
@@ -2002,7 +1983,6 @@ content_type = "text/plain; charset=utf-8"
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
@@ -2109,7 +2089,6 @@ content_type = "text/plain; charset=utf-8"
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
@@ -2133,7 +2112,6 @@
content_type = "text/plain; charset=utf-8"
         let result = sync(
             &mock,
             &[dir.path().to_str().unwrap().to_string()],
-            &[],
             &Principal::anonymous().to_text(),
             None,
         );
diff --git a/assets-sync/tests/bench_sync.rs b/assets-sync/tests/bench_sync.rs
index a93b932..3778fe5 100644
--- a/assets-sync/tests/bench_sync.rs
+++ b/assets-sync/tests/bench_sync.rs
@@ -148,7 +148,7 @@ fn run_bench(label: &str, count: usize, size_bytes: usize) {
     let mock = BenchMock::new();
 
     let started = std::time::Instant::now();
-    let result = sync(&mock, &dirs, &[], &Principal::anonymous().to_text(), None);
+    let result = sync(&mock, &dirs, &Principal::anonymous().to_text(), None);
     let elapsed = started.elapsed();
 
     result.expect("sync failed");
diff --git a/e2e/tests/fixture/assets-toml/assets.toml b/e2e/tests/fixture/assets-toml/assets.toml
deleted file mode 100644
index ba94c7e..0000000
--- a/e2e/tests/fixture/assets-toml/assets.toml
+++ /dev/null
@@ -1,14 +0,0 @@
-# Content-type overrides for files mime_guess doesn't classify usefully.
-# Same shape as the developer-docs `.ic-assets.json5` rules being migrated.
-
-[[asset]]
-match = "/*.did"
-content_type = "text/plain; charset=utf-8"
-
-[[asset]]
-match = "/*.sh"
-content_type = "text/plain; charset=utf-8"
-
-[[asset]]
-match = "/llms.txt"
-content_type = "text/plain; charset=utf-8"
diff --git a/e2e/tests/fixture/assets-toml/dist/install.sh b/e2e/tests/fixture/assets-toml/dist/install.sh
deleted file mode 100644
index aa12e04..0000000
--- a/e2e/tests/fixture/assets-toml/dist/install.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-#!/bin/sh
-echo "installing"
diff --git a/e2e/tests/fixture/assets-toml/dist/llms.txt b/e2e/tests/fixture/assets-toml/dist/llms.txt
deleted file mode 100644
index 57ce047..0000000
--- a/e2e/tests/fixture/assets-toml/dist/llms.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-site: example
-purpose: e2e test for assets.toml content_type override
diff --git a/e2e/tests/fixture/headers-content-type/dist/_headers b/e2e/tests/fixture/headers-content-type/dist/_headers
new file mode 100644
index 0000000..6f88e84
--- /dev/null
+++ b/e2e/tests/fixture/headers-content-type/dist/_headers
@@ -0,0 +1,4 @@
+# Content-Type overrides for files mime_guess doesn't classify usefully.
+
+/*.did
+  Content-Type: text/plain; charset=utf-8
diff --git a/e2e/tests/fixture/assets-toml/dist/ic.did b/e2e/tests/fixture/headers-content-type/dist/ic.did
similarity index 100%
rename from e2e/tests/fixture/assets-toml/dist/ic.did
rename to e2e/tests/fixture/headers-content-type/dist/ic.did
diff --git a/e2e/tests/fixture/assets-toml/dist/index.html b/e2e/tests/fixture/headers-content-type/dist/index.html
similarity index 100%
rename from e2e/tests/fixture/assets-toml/dist/index.html
rename to e2e/tests/fixture/headers-content-type/dist/index.html
diff --git a/e2e/tests/fixture/assets-toml/icp.yaml b/e2e/tests/fixture/headers-content-type/icp.yaml
similarity index 87%
rename from e2e/tests/fixture/assets-toml/icp.yaml
rename to e2e/tests/fixture/headers-content-type/icp.yaml
index 4451bbf..9afec8c 100644
--- a/e2e/tests/fixture/assets-toml/icp.yaml
+++ b/e2e/tests/fixture/headers-content-type/icp.yaml
@@ -17,5 +17,3 @@ canisters:
         path: wasms/plugin.wasm
         dirs:
           - dist
-        files:
-          - assets.toml
diff --git a/e2e/tests/assets_toml.rs b/e2e/tests/headers_content_type.rs
similarity index 64%
rename from e2e/tests/assets_toml.rs
rename to e2e/tests/headers_content_type.rs
index 282df1f..32e8346 100644
--- a/e2e/tests/assets_toml.rs
+++ b/e2e/tests/headers_content_type.rs
@@ -1,4 +1,5 @@
-//! Integration tests for `assets.toml` end-to-end via the WASM plugin.
+//! Integration tests for `Content-Type` overrides in `_headers`, end-to-end
+//! via the WASM plugin.
 //!
 //! Each test deploys a fixture to a local replica, then fetches assets via
 //! the HTTP gateway. The gateway validates the response's `IC-Certificate`
@@ -17,11 +18,12 @@ fn content_type(headers: &reqwest::header::HeaderMap) -> &str {
         .unwrap_or("")
 }
 
-/// Deploy the `assets-toml` fixture and verify every per-glob `content_type`
-/// override survives certification and reaches the HTTP gateway.
+/// Deploy the `headers-content-type` fixture and verify every per-glob
+/// `Content-Type` override in `_headers` survives certification and reaches
+/// the HTTP gateway.
 #[test]
 fn content_type_overrides_land_on_canister() {
-    let tmp = setup_project("tests/fixture/assets-toml");
+    let tmp = setup_project("tests/fixture/headers-content-type");
     let project = tmp.path();
 
     let _network = LocalNetwork::start(project);
@@ -33,18 +35,6 @@ fn content_type_overrides_land_on_canister() {
     assert_eq!(r.status(), StatusCode::OK);
     assert_eq!(content_type(r.headers()), "text/plain; charset=utf-8");
 
-    // .sh → text/plain; charset=utf-8 (mime_guess returns application/x-sh
-    // by default; the override replaces it).
-    let r = http_fetch(project, "/install.sh");
-    assert_eq!(r.status(), StatusCode::OK);
-    assert_eq!(content_type(r.headers()), "text/plain; charset=utf-8");
-
-    // Exact-path override: llms.txt — mime_guess returns plain text/plain
-    // without a charset; the override adds the charset parameter.
-    let r = http_fetch(project, "/llms.txt");
-    assert_eq!(r.status(), StatusCode::OK);
-    assert_eq!(content_type(r.headers()), "text/plain; charset=utf-8");
-
     // No override → mime_guess default applies (index.html stays text/html).
     let r = http_fetch(project, "/index.html");
     assert_eq!(r.status(), StatusCode::OK);
@@ -55,12 +45,12 @@ fn content_type_overrides_land_on_canister() {
     );
 }
 
-/// Edit `assets.toml` and redeploy. Expectation: the new content-type lands
+/// Edit `_headers` and redeploy. Expectation: the new content-type lands
 /// on the canister (via delete-then-recreate triggered by content-type
 /// drift), and the gateway serves the updated value.
 #[test]
 fn content_type_edit_propagates_on_redeploy() {
-    let tmp = setup_project("tests/fixture/assets-toml");
+    let tmp = setup_project("tests/fixture/headers-content-type");
     let project = tmp.path();
 
     let _network = LocalNetwork::start(project);
@@ -72,10 +62,10 @@ fn content_type_edit_propagates_on_redeploy() {
 
     // Flip the .did mapping to a different MIME without touching the file.
     fs::write(
-        project.join("assets.toml"),
-        b"[[asset]]\nmatch = \"/*.did\"\ncontent_type = \"application/json\"\n",
+        project.join("dist/_headers"),
+        b"/*.did\n Content-Type: application/json\n",
     )
-    .expect("rewrite assets.toml");
+    .expect("rewrite _headers");
 
     icp_cmd(project).arg("deploy").assert().success();
 
@@ -84,6 +74,6 @@ fn content_type_edit_propagates_on_redeploy() {
     assert_eq!(
         content_type(r.headers()),
         "application/json",
-        "edited assets.toml should propagate the new Content-Type via delete-then-recreate",
+        "edited _headers should propagate the new Content-Type via delete-then-recreate",
     );
 }
diff --git a/plugin/README.md b/plugin/README.md
index df967ce..764f26b 100644
--- a/plugin/README.md
+++ b/plugin/README.md
@@ -169,11 +169,28 @@ The plugin stable-sorts the resolved list by lowercased header name before sendi
 - **Existing assets**: drift is detected byte-for-byte against canister-stored headers; mismatches emit a `SetAssetProperties` op with the new list. A `_headers`-only edit propagates without re-uploading content.
 - **3xx redirects** (rules in `_redirects` with status 301/302/307/308) synthesize their own response, so they don't inherit asset headers. The plugin populates `RedirectRule.headers` for these from any `_headers` rule whose pattern matches the redirect's `from`. 200 / 4xx rules borrow headers from the target asset, so no plumbing is needed there.
 
+### `Content-Type` overrides
+
+`Content-Type` is recognised inside a `_headers` block, but it routes to the asset's stored media type instead of the appended response headers.
This is the only way to override the `mime_guess::from_path` default (or to add a `charset` parameter) — the canister always derives a single, certified `Content-Type` for every asset, and an appended header would produce a duplicate on the wire.
+
+```text
+/*.md
+  Content-Type: text/markdown; charset=utf-8
+  Cache-Control: public, max-age=300
+```
+
+The plugin extracts `Content-Type` and feeds it into `CreateAssetArguments.content_type`; other headers in the block continue to flow through `headers` as usual. `Content-Type` is single-valued, so when multiple blocks match the same asset the first matching `Content-Type` wins (other matching rules still contribute their non-`Content-Type` headers). Editing a `Content-Type` and redeploying triggers delete-then-recreate on the canister to keep the certified type in sync.
+
+Validation:
+
+- The value must parse as a MIME type via the `mime` crate.
+- A duplicate `Content-Type:` line within the same block is rejected.
+- Empty value is rejected.
+
 ### Validation and unsupported syntax
 
 Header names and values are validated via the `http` crate's `HeaderName` / `HeaderValue` (rejects CR/LF, so no header injection). Rejected with a 1-based line number:
 
-- `Content-Type` as a header name — the canister derives it from the asset's media type for certification.
 - Mid-path wildcards in `` (e.g. `/foo/*/bar`) — not supported.
 - `:splat` / `:placeholder` substitution in header values — deferred.
 - Patterns like `/*.html` or `/blog/:slug` — deferred (see the design plan's tier-3 follow-up).
diff --git a/plugin/src/lib.rs b/plugin/src/lib.rs
index ba2acce..b8c01fa 100644
--- a/plugin/src/lib.rs
+++ b/plugin/src/lib.rs
@@ -52,15 +52,9 @@ impl Guest for Plugin {
             "sync plugin: starting for canister {} (environment: {})",
             input.canister_id, input.environment
         );
-        let files: Vec<(String, String)> = input
-            .files
-            .into_iter()
-            .map(|f| (f.name, f.content))
-            .collect();
         let summary = assets_sync::sync::sync(
             &WasiCall,
             &input.dirs,
-            &files,
             &input.identity_principal,
             input.proxy_canister_id.as_deref(),
         )?;
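
Illustrative note on the resolution semantics this patch describes: `Content-Type` is first-match-wins, while every matching block still contributes its other headers. The sketch below is std-only and rests on assumptions. `Rule`, `matches`, `headers_for`, and `demo_rules` are hypothetical stand-ins, the matcher only understands an exact path or a trailing `*`, and headers are simply concatenated rather than merged and sorted as the real `resolve` does. It is not the crate's `HeaderRule`, `KeyPattern`, or `Mime` code.

```rust
// Simplified stand-in for a parsed `_headers` block (hypothetical type).
#[derive(Debug, Clone)]
struct Rule {
    pattern: String,
    headers: Vec<(String, String)>,
    content_type: Option<String>,
}

/// Assumed matcher: exact path, or a trailing `*` meaning "starts with".
fn matches(pattern: &str, key: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => key.starts_with(prefix),
        None => pattern == key,
    }
}

/// The first matching rule that declares a Content-Type wins; matching
/// rules without one are skipped rather than shadowing later rules.
fn content_type_for(key: &str, rules: &[Rule]) -> Option<String> {
    rules
        .iter()
        .filter(|r| matches(&r.pattern, key))
        .find_map(|r| r.content_type.clone())
}

/// Every matching rule contributes its other headers, in declaration order.
fn headers_for(key: &str, rules: &[Rule]) -> Vec<(String, String)> {
    rules
        .iter()
        .filter(|r| matches(&r.pattern, key))
        .flat_map(|r| r.headers.iter().cloned())
        .collect()
}

/// Three example blocks: a header-only catch-all, then two overrides.
fn demo_rules() -> Vec<Rule> {
    vec![
        Rule {
            pattern: "/*".into(),
            headers: vec![("X-Robots-Tag".into(), "noindex".into())],
            content_type: None,
        },
        Rule {
            pattern: "/docs/*".into(),
            headers: vec![("Cache-Control".into(), "max-age=300".into())],
            content_type: Some("text/markdown".into()),
        },
        Rule {
            pattern: "/docs/raw/*".into(),
            headers: vec![],
            content_type: Some("text/plain".into()),
        },
    ]
}
```

Under these assumptions, `/docs/raw/a.md` resolves to `text/markdown` because `/docs/*` is declared before `/docs/raw/*`. Declaration order, not specificity, decides the override, so narrower blocks would need to come first.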