Encrypted Weights

Baurzhan Atinov edited this page May 14, 2026 · 1 revision

Ship ML models to a browser without giving away the IP. AES-256-GCM encryption at the file level, key delivery via your backend, WebCrypto decrypt at session start, inference in onnxruntime-web from the decrypted bytes.

The threat model

  • ✅ A casual visitor sees .enc files in DevTools and gets ciphertext
  • ✅ A scraper without your key gets nothing usable
  • ✅ Different keys per customer / per session for licensing control
  • ⚠️ A determined reverse engineer can extract your key from the running page, or skip the key entirely and dump the decrypted bytes from WASM memory. This is unavoidable for any client-side ML; the goal is to raise the bar, not to achieve absolute secrecy

File format

.enc is just AES-256-GCM ciphertext:

[ 12 bytes IV ][ N bytes ciphertext ][ 16 bytes auth tag ]

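WebCrypto's AES-GCM decrypt expects the ciphertext with the 16-byte tag appended (ciphertext || tag). That is the same layout Python's AESGCM.encrypt() returns, so everything after the IV can be passed to it unchanged. The format is standard, interoperable, and fast: browsers typically use hardware AES where it is available (AES-NI on x86, dedicated instructions on most iOS/Android devices).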
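
The layout above can be checked with a few lines of standard-library Python. This is a minimal sketch for inspecting a file, not part of the toolchain; the helper name split_enc is hypothetical, and the 12-byte and 16-byte sizes are taken from the layout diagram.

```python
IV_LEN = 12   # bytes of IV at the front of the file
TAG_LEN = 16  # bytes of GCM auth tag at the end of the file


def split_enc(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split an .enc blob into (iv, ciphertext, tag) without decrypting it."""
    if len(blob) < IV_LEN + TAG_LEN:
        raise ValueError("blob is too short to be a valid .enc file")
    iv = blob[:IV_LEN]
    ciphertext = blob[IV_LEN:len(blob) - TAG_LEN]
    tag = blob[len(blob) - TAG_LEN:]
    return iv, ciphertext, tag


# Dummy blob: 12-byte IV, 5-byte "ciphertext", 16-byte tag.
iv, ciphertext, tag = split_enc(bytes(12) + b"hello" + b"\xff" * 16)
print(len(iv), len(ciphertext), len(tag))
```

Note that the browser code never needs the three-way split: it only slices off the IV and hands the rest (ciphertext || tag) to WebCrypto.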

Encrypt (build-time, Python)

# wasm/tools/encrypt_models.py
import secrets
from pathlib import Path
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Model files to encrypt ... plus the rest of your models.
MODELS = ['facex_detect.onnx', 'facex_tiny.onnx']

key = secrets.token_bytes(32)             # 256-bit key, regenerated on every run
Path('.model_key').write_text(key.hex())  # gitignored; ship via your backend
aesgcm = AESGCM(key)

for name in MODELS:
    plain = Path(name).read_bytes()
    iv = secrets.token_bytes(12)          # fresh 96-bit IV per file; never reuse one under the same key
    ct = aesgcm.encrypt(iv, plain, None)  # returns ciphertext || 16-byte tag
    Path(name).with_suffix('.enc').write_bytes(iv + ct)

Decrypt (browser, WebCrypto)

async function loadEncryptedModel(url, key) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
  const buf = new Uint8Array(await res.arrayBuffer());
  const iv = buf.subarray(0, 12);   // first 12 bytes: IV
  const ct = buf.subarray(12);      // remainder: ciphertext || 16-byte tag
  const k = await crypto.subtle.importKey(
    'raw', key, { name: 'AES-GCM' }, false, ['decrypt']);
  // decrypt() rejects if the auth tag does not verify (wrong key or tampered file)
  const onnx = new Uint8Array(
    await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, k, ct));
  const sess = await ort.InferenceSession.create(onnx, {
    executionProviders: ['wasm']
  });
  onnx.fill(0);   // wipe plaintext from JS heap
  return sess;
}

After InferenceSession.create() the model lives inside the WASM heap. Zeroing the JS-side buffer limits the window where it's accessible from outside WASM.

Where the key comes from

For a public demo: hardcoded in JS (split + XOR-obfuscated to slow down trivial extraction). Anyone determined gets it in ~10 minutes.
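
The page does not spell out the split + XOR scheme, so the following is only one plausible reading: a build-time sketch that turns the key into two random-looking shares to embed in separate places in the JS bundle, where the browser XORs them back together. The function names xor_split and xor_join are hypothetical.

```python
import secrets


def xor_split(key: bytes) -> tuple[bytes, bytes]:
    """Return two shares whose byte-wise XOR equals the key."""
    mask = secrets.token_bytes(len(key))
    masked = bytes(k ^ m for k, m in zip(key, mask))
    return mask, masked


def xor_join(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine two shares into the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


key = secrets.token_bytes(32)
share_a, share_b = xor_split(key)
print("share A:", share_a.hex())  # embed in one module of the bundle
print("share B:", share_b.hex())  # embed somewhere else
assert xor_join(share_a, share_b) == key
```

Neither share is the key on its own, so a plain string search of the bundle will not find it. Anyone who reads the recombination code gets the key immediately, which is why this only suits a public demo.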

For production: fetch it from your backend on session start.

Express.js example

import express from 'express';
import { authenticate, getCustomerKey } from './your-auth.js';

const app = express();

app.get('/api/model-key', authenticate, async (req, res) => {
  const key = await getCustomerKey(req.user.id);   // hex string: 64 chars = 32 bytes
  res.set({
    'Cache-Control': 'no-store',
    'Content-Type': 'application/octet-stream',
  });
  res.send(Buffer.from(key, 'hex'));
});

app.listen(3000);

Browser:

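const res = await fetch('/api/model-key', { credentials: 'include' });
if (!res.ok) throw new Error(`Key request failed: HTTP ${res.status}`);  // e.g. 401 for a revoked customer
const keyBuf = await res.arrayBuffer();   // 32 raw key bytes
const sess = await loadEncryptedModel('facex_detect.enc', new Uint8Array(keyBuf));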

FastAPI example

from fastapi import FastAPI, Depends, Response
from your_auth import authenticate, get_customer_key

app = FastAPI()

@app.get('/api/model-key')
async def model_key(user = Depends(authenticate)):
    return Response(
        content=bytes.fromhex(await get_customer_key(user.id)),
        media_type='application/octet-stream',
        headers={'Cache-Control': 'no-store'},
    )

Per-customer keys

If you license the engine per company, give each one a unique key. The simple approach encrypts the model bytes once per customer key. A cheaper one encrypts the model once with a master key and wraps that master key under each customer key using AES Key Wrap (RFC 3394, exposed as AES-KW in WebCrypto) with 256-bit keys. Rotating a customer then means re-wrapping 32 bytes instead of re-encrypting every model.
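
The wrapped-master-key variant can be sketched at build time with the same cryptography package used for encryption. This is an illustration under assumptions: the customer IDs and the in-memory customer_keys table are hypothetical stand-ins for whatever your backend stores.

```python
import secrets
from cryptography.hazmat.primitives.keywrap import (
    InvalidUnwrap,
    aes_key_unwrap,
    aes_key_wrap,
)

master_key = secrets.token_bytes(32)  # encrypts the model files once

# Hypothetical customer table: one 256-bit wrapping key per customer.
customer_keys = {
    "acme": secrets.token_bytes(32),
    "globex": secrets.token_bytes(32),
}

# Wrap the master key under each customer key (RFC 3394).
wrapped = {
    customer_id: aes_key_wrap(customer_key, master_key)
    for customer_id, customer_key in customer_keys.items()
}

# The right customer key recovers the master key.
assert aes_key_unwrap(customer_keys["acme"], wrapped["acme"]) == master_key

# The wrong customer key fails the built-in integrity check.
try:
    aes_key_unwrap(customer_keys["globex"], wrapped["acme"])
except InvalidUnwrap:
    print("unwrap rejected: wrong customer key")
```

Each wrapped blob is 40 bytes (the 32-byte key plus an 8-byte integrity block). In the browser, the matching step would presumably be crypto.subtle.unwrapKey with the AES-KW algorithm. One trade-off to keep in mind: every customer ends up holding the same master key after unwrapping, so a copy that was already extracted remains valid until the models are re-encrypted.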

Suspended a customer? Stop serving their key. Sessions that are already running keep working, because the model is already decrypted in memory, but they cannot load it again after the next reload.

Domain binding (cheap extra step)

Sign the response with HMAC over (key || origin) and verify it client-side. Doesn't prevent extraction, but it does prevent simple proxying of your endpoint from a different domain.
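
A minimal sketch of the HMAC step, using only the Python standard library. The page does not say which secret keys the HMAC, so binding_secret here is an assumption: a value shared between your backend and your client bundle. The function names are hypothetical as well.

```python
import hashlib
import hmac


def sign_key_for_origin(binding_secret: bytes, model_key: bytes, origin: str) -> bytes:
    """HMAC-SHA256 tag over (model_key || origin)."""
    message = model_key + origin.encode("utf-8")
    return hmac.new(binding_secret, message, hashlib.sha256).digest()


def verify_key_for_origin(binding_secret: bytes, model_key: bytes,
                          origin: str, tag: bytes) -> bool:
    """Recompute the tag for this origin and compare in constant time."""
    expected = sign_key_for_origin(binding_secret, model_key, origin)
    return hmac.compare_digest(expected, tag)


binding_secret = b"demo-binding-secret"  # assumption: shared with the client bundle
model_key = bytes(32)                     # placeholder 32-byte model key

# Server side: sign for the origin the request came from.
tag = sign_key_for_origin(binding_secret, model_key, "https://your.app")

# Client side: verify against the page's own origin.
print(verify_key_for_origin(binding_secret, model_key, "https://your.app", tag))
print(verify_key_for_origin(binding_secret, model_key, "https://evil.example", tag))
```

The server would take the origin from the request's Origin header and return the tag alongside the key. The client would recompute it with its own location.origin, for instance through WebCrypto's HMAC support. Because the model key has a fixed 32-byte length, the plain concatenation is unambiguous.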

What this stops vs doesn't

| Scenario | Stopped? |
|---|---|
| curl https://your.app/facex_xs.enc | ✅ ciphertext only |
| DevTools → Network tab snoop | ✅ ciphertext only |
| Bot scraping your repo | ✅ no plaintext on disk |
| Key extraction by determined attacker | ❌ the key can be pulled from the page's JS, and the decrypted weights dumped from the WASM heap |
| Per-customer revocation | ✅ stop serving the key |

For higher security: move inference to a server you control. You lose the "private by construction" story but get full IP protection.
