ProductLens

A self-hosted toolkit for product content and catalog teams. Three independent modules:

  1. Image Processing — bulk ingest, transform, and export product images through repeatable pipelines.
  2. Reporting — process Pimcore xlsx exports per country (AU/NZ): produce cleansed versions and compute deltas between uploads.
  3. Content Generator — AI-powered B2B marketing copy and short product descriptions for individual products or bundles, backed by Google Gemini with optional live search grounding.

Features

Image Processing

  • Multi-method ingestion — Upload files directly, fetch from URLs, or scrape images from web pages
  • Custom processing pipelines — Chain operations like crop, resize, scale, convert, and bulk rename in any order
  • Real-time previews — See processed results before committing to a full batch run
  • Bulk export — Download processed images as ZIP or generate XLSX manifests with public image URLs
  • Public image serving — Processed images are available at /images/{workflowId}/{filename} without authentication

Reporting

  • Pimcore export ingestion — Upload xlsx exports per country (AU/NZ); strict header + country validation rejects malformed files at upload time
  • Cleansed downloads — Drop nine junk columns, filter dash-only rows, sort by IMSKU
  • Delta downloads — Identify rows whose IMSKU is new vs the previous upload, with the same cleansing rules applied
  • Shared org-level state — Everyone sees the same "current" report per country; the most recent two uploads are retained for delta comparison

Content Generator

  • Grounded AI copy — Google Gemini with Google Search grounding can verify real SKU specs before writing; refuses with a clear message if the product can't be identified
  • Bundle support — Generate consolidated solution copy for up to 5 products from any manufacturer in a single request
  • Locale-aware spelling — Choose en-GB (British: optimised, colour, organisation) or en-US (American: optimized, color, organization); defaults to en-GB for AU/NZ/SG markets
  • Marketing copy and short descriptions — Generate reseller marketing paragraphs, key selling points, product-grid short descriptions, or both in one workflow
  • Inline or split output — Toggle key selling points inline under the marketing copy (default, paste-ready) or as a separate editable section
  • Dual copy formats — Plain-text and clean HTML copies for each section, ready for Pimcore's rich-text editor
  • Prompt transparency — A "View prompt" button shows users exactly what is sent to Gemini, with current form values substituted live
  • Admin-editable LLM settings — Admins can edit prompts, templates, Gemini model, temperature, token limit, and Google Search grounding from Settings → LLM Instructions, with history, reset, and revert

Platform

  • Role-based access — Admin, pipeline editor, and user roles with per-workflow ownership
  • Soft-deleted users — Deleted users are retained for audit/history display while their usernames can be reused
  • Docker-ready — Single container deployment with persistent storage and health checks
  • Tiered rate limiting — Loose limit on the API surface, strict limit on auth + scrape endpoints; works correctly behind a reverse proxy

Quick Start

Docker (recommended)

Pull and run the latest image from GHCR:

# docker-compose.yml
services:
  productlens:
    image: ghcr.io/regalen/productlens:latest
    container_name: productlens
    ports:
      - "3446:3446"
    volumes:
      - app-data:/data
      - app-workspace:/tmp/workspace
    environment:
      - PORT=3446
      - BASE_URL=https://your-domain.com
      # Read from the shell environment; export JWT_SECRET before running docker compose
      - JWT_SECRET=${JWT_SECRET}
      - CORS_ORIGIN=https://your-domain.com
      # Content Generator: free Gemini API key from https://aistudio.google.com/apikey
      - GEMINI_API_KEY=${GEMINI_API_KEY}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3446/api/health"]
      interval: 30s
      timeout: 10s
      start_period: 30s
      retries: 3

volumes:
  app-data:
  app-workspace:

Then generate a secret and start the stack:

# Generate a secure JWT secret
export JWT_SECRET=$(openssl rand -hex 32)

docker compose up -d

Behind a reverse proxy? The app trusts one upstream hop and reads the real client IP from X-Forwarded-For so rate limiting works per-client. The typical nginx/Traefik/Caddy → container topology works out of the box; if you stack multiple proxies, adjust app.set("trust proxy", N) in server/index.ts.
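
To make the hop count concrete, here is a minimal sketch of how a numeric trust proxy value picks the client address out of X-Forwarded-For. It follows the documented Express behaviour for a numeric setting, but the helper name clientIp is hypothetical and this is not the code in server/index.ts.

```typescript
// Hypothetical helper: choose the client IP the way a numeric "trust proxy"
// setting does. Not taken from the ProductLens source.
function clientIp(
  socketAddr: string,
  xForwardedFor: string | undefined,
  trustedHops: number,
): string {
  const forwarded = (xForwardedFor ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Nearest hop first: the socket peer, then X-Forwarded-For read right to left.
  const chain = [socketAddr, ...forwarded.reverse()];

  // The first `trustedHops` entries are trusted proxies; the next one is the client.
  // If the chain is shorter than that, fall back to the farthest known address.
  const index = Math.min(Math.max(trustedHops, 0), chain.length - 1);
  return chain[index] ?? socketAddr;
}
```

With one proxy, clientIp("10.0.0.2", "203.0.113.7", 1) yields the real client. With two stacked proxies the header reads "203.0.113.7, 10.0.0.1", so a hop count of 1 would return the inner proxy and every client would share one rate-limit bucket; raising it to 2 restores per-client limiting.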

Local Development

Prerequisites: Node.js 20+

npm install
cp .env.example .env.local
# Edit .env.local and set JWT_SECRET
npm run dev

The app will be available at http://localhost:3000.

Default Credentials

On first run, a default admin account is created. It is required to change its password at first login:

  • Username: admin
  • Password: admin

Change this immediately after first login.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 3000 (Docker: 3446) | Server listen port |
| JWT_SECRET | Random (regenerated on restart) | Secret for signing JWT tokens |
| BASE_URL | http://localhost:{PORT} | Public URL for generated image links |
| CORS_ORIGIN | http://localhost:{PORT} in development; required in production | Allowed CORS origin. Use a comma-separated list for multiple origins. |
| DATA_DIR | ./data (Docker: /data) | Persistent storage directory |
| WORKSPACE_DIR | {DATA_DIR}/workspace (Docker: /tmp/workspace) | Temp directory for in-flight processing |
| GEMINI_API_KEY | (unset) | Free Gemini API key for the Content Generator; get one at https://aistudio.google.com/apikey |
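
As an illustration of how the defaults listed above fit together, the sketch below resolves a configuration object from an environment map and builds a public image link in the /images/{workflowId}/{filename} form. The names resolveConfig, AppConfig, and publicImageUrl are hypothetical, and two details are assumptions rather than documented behaviour: that NODE_ENV=production marks a production run, and that a trailing slash on BASE_URL is trimmed.

```typescript
type Env = Record<string, string | undefined>;

interface AppConfig {
  port: number;
  baseUrl: string;
  corsOrigins: string[];
  dataDir: string;
  workspaceDir: string;
}

// Hypothetical loader applying the non-Docker defaults from the table.
function resolveConfig(env: Env): AppConfig {
  const port = Number(env.PORT ?? "3000");
  const local = `http://localhost:${port}`;

  // Assumption: production is detected through NODE_ENV.
  if (env.NODE_ENV === "production" && !env.CORS_ORIGIN) {
    throw new Error("CORS_ORIGIN is required in production");
  }

  const dataDir = env.DATA_DIR ?? "./data";
  return {
    port,
    // Assumption: trailing slashes are trimmed so links do not contain "//".
    baseUrl: (env.BASE_URL ?? local).replace(/\/+$/, ""),
    // CORS_ORIGIN may hold several comma-separated origins.
    corsOrigins: (env.CORS_ORIGIN ?? local)
      .split(",")
      .map((origin) => origin.trim())
      .filter((origin) => origin.length > 0),
    dataDir,
    workspaceDir: env.WORKSPACE_DIR ?? `${dataDir}/workspace`,
  };
}

// Public link for a processed image: {BASE_URL}/images/{workflowId}/{filename}.
function publicImageUrl(config: AppConfig, workflowId: string, filename: string): string {
  return `${config.baseUrl}/images/${encodeURIComponent(workflowId)}/${encodeURIComponent(filename)}`;
}
```

This also shows why BASE_URL matters for XLSX manifests: if it is left at the localhost default, the exported image URLs will not resolve for anyone outside the host.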

How It Works

Image Processing

Workflows move through a five-stage pipeline:

  1. Ingest — Add images via file upload, URL fetch, or web scraping
  2. Configure — Select or build a processing pipeline (crop, resize, scale, convert, rename)
  3. Preview — Generate and review processed previews
  4. Process — Run the full pipeline on selected images
  5. Export — Download as ZIP or XLSX with public image URLs
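
The Configure and Process stages can be pictured as an ordered list of operations folded over each image. The sketch below works on image metadata only so that it stays self-contained; the real pipeline would hand the pixel work to Sharp. All type and function names, and the {n} placeholder in the rename pattern, are hypothetical.

```typescript
interface ImageInfo {
  width: number;
  height: number;
  format: string;
  filename: string;
}

type Operation =
  | { kind: "crop"; width: number; height: number }
  | { kind: "resize"; width: number; height: number }
  | { kind: "scale"; factor: number }
  | { kind: "convert"; format: string }
  | { kind: "rename"; pattern: string };

// Apply one operation to one image's metadata; `index` is its position in the batch.
function applyOperation(img: ImageInfo, op: Operation, index: number): ImageInfo {
  switch (op.kind) {
    case "crop":
      // A crop can only shrink the image, never enlarge it.
      return { ...img, width: Math.min(img.width, op.width), height: Math.min(img.height, op.height) };
    case "resize":
      return { ...img, width: op.width, height: op.height };
    case "scale":
      return { ...img, width: Math.round(img.width * op.factor), height: Math.round(img.height * op.factor) };
    case "convert": {
      const stem = img.filename.replace(/\.[^.]+$/, "");
      return { ...img, format: op.format, filename: `${stem}.${op.format}` };
    }
    case "rename":
      // Hypothetical pattern syntax: "{n}" becomes the 1-based batch position.
      return { ...img, filename: `${op.pattern.replace("{n}", String(index + 1))}.${img.format}` };
  }
}

// Run every operation, in the configured order, over every image.
function runPipeline(images: ImageInfo[], ops: Operation[]): ImageInfo[] {
  return images.map((img, i) => ops.reduce((acc, op) => applyOperation(acc, op, i), img));
}
```

Order matters: cropping a 1000x800 image to 800x800 and then scaling by 0.5 gives 400x400, while scaling first and cropping second gives 500x400. That is the reason to preview a small sample before running the full batch.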

Reporting

  1. Pick a report type from the Reporting tab (currently Data_Missing_Report_Webvisible)
  2. Select the country (AU or NZ)
  3. Upload the latest Pimcore xlsx export. It is strictly validated against the canonical 18-column schema, and the country code is checked in every row
  4. Download:
    • Original — your raw upload, untouched
    • Cleansed — junk columns dropped, dash-only rows filtered, sorted by IMSKU
    • Delta — only the IMSKUs new since the previous upload, with cleansing applied (available once a second upload exists)

Each (report, country) pair keeps its two most recent uploads. This state is shared org-wide: any authenticated user sees the same current and previous uploads and can replace the current one by uploading a new version.
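
The Cleansed and Delta downloads can be sketched as two small functions over parsed rows. The junk column names below are placeholders, since the real list of nine is not given here, and the reading of "dash-only" (every remaining non-key cell is "-") is an assumption. Only the IMSKU column name comes from the description above; the Country column name is hypothetical.

```typescript
type Row = Record<string, string>;

// Placeholder names: the actual nine junk columns are not listed in this README.
const JUNK_COLUMNS = ["Internal Notes", "Legacy ID"];
// Assumption: these identify the row and are ignored by the dash-only check.
const KEY_COLUMNS = ["IMSKU", "Country"];

function dropJunk(row: Row): Row {
  const kept: Row = {};
  for (const [column, value] of Object.entries(row)) {
    if (!JUNK_COLUMNS.includes(column)) kept[column] = value;
  }
  return kept;
}

// Assumed meaning of "dash-only": every non-key cell holds just "-".
function isDashOnly(row: Row): boolean {
  const values = Object.entries(row)
    .filter(([column]) => !KEY_COLUMNS.includes(column))
    .map(([, value]) => value.trim());
  return values.length > 0 && values.every((value) => value === "-");
}

function cleanse(rows: Row[]): Row[] {
  return rows
    .map(dropJunk)
    .filter((row) => !isDashOnly(row))
    .sort((a, b) => {
      const x = a.IMSKU ?? "";
      const y = b.IMSKU ?? "";
      return x < y ? -1 : x > y ? 1 : 0;
    });
}

// Rows whose IMSKU did not appear in the previous upload, cleansed the same way.
function delta(current: Row[], previous: Row[]): Row[] {
  const seen = new Set(previous.map((row) => row.IMSKU ?? ""));
  return cleanse(current.filter((row) => !seen.has(row.IMSKU ?? "")));
}
```

Because the delta compares IMSKU values only, a row that existed before but changed its other fields does not appear in the Delta download.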

Content Generator

  1. Navigate to Content Generator in the header nav
  2. Enter a Manufacturer, Part/Model No., and Description — or click + Add product to build a bundle (up to 5 products, any mix of manufacturers)
  3. Choose whether to generate Marketing copy, Product description, or both
  4. Choose a Length (Short / Medium / Long) for marketing copy and a Locale (en-GB for AU/NZ/SG, en-US for American markets)
  5. Check Show key selling points inline (on by default) to get a single paste-ready block, or uncheck to keep marketing paragraphs and key selling points as separate editable sections
  6. Click Generate — Gemini uses the configured prompt and grounding settings, then returns the requested artifacts in the chosen locale's spelling
  7. Edit generated output directly, then copy as plain text or HTML where available (ready to paste into Pimcore's Source view)
  8. Click Regenerate at any time; if you've edited the output you'll be asked to confirm before it's overwritten

GEMINI_API_KEY is required. The Content Generator endpoints return HTTP 503 if the key is missing. A free key from Google AI Studio is sufficient.

Admin LLM Instructions

Admins can open Settings → LLM Instructions to manage Content Generator behavior without redeploying:

  • edit marketing and product-description system instructions
  • edit single-product and bundle user-message templates
  • choose an allowed Gemini model (gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-flash, gemini-1.5-pro)
  • tune temperature and max output tokens
  • enable or disable Google Search grounding
  • reset fields to factory defaults
  • view history and revert to previous versions

Settings are stored in SQLite and read fresh for every generation. Prompt history stores full prompt text and is visible to admins, so do not put secrets in prompts.

Server-side validation protects the required template placeholders and output delimiters (===MARKETING===, ===BULLETS===, ===DESCRIPTION===, ===END===) that the response parser depends on.
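
To show why those delimiters must survive prompt edits, here is a minimal sketch of a parser that splits a model response on them. It is not the project's actual parser: the function name, the result shape, and the assumption that bullets arrive one per line with an optional leading marker are all hypothetical.

```typescript
interface GeneratedCopy {
  marketing?: string;
  bullets?: string[];
  description?: string;
}

// Hypothetical parser: split the raw response on the documented delimiters.
function parseResponse(raw: string): GeneratedCopy {
  // With a capture group, split() returns [preamble, name, body, name, body, ...].
  const parts = raw.split(/===(MARKETING|BULLETS|DESCRIPTION|END)===/);
  const result: GeneratedCopy = {};

  for (let i = 1; i < parts.length; i += 2) {
    const name = parts[i];
    const body = (parts[i + 1] ?? "").trim();

    if (name === "END") break; // anything after ===END=== is ignored
    if (name === "MARKETING") {
      result.marketing = body;
    } else if (name === "BULLETS") {
      // Assumption: one selling point per line, optionally prefixed with -, * or a bullet.
      result.bullets = body
        .split("\n")
        .map((line) => line.replace(/^[-•*]\s*/, "").trim())
        .filter((line) => line.length > 0);
    } else if (name === "DESCRIPTION") {
      result.description = body;
    }
  }
  return result;
}
```

If an edited prompt stopped asking the model for ===BULLETS===, a parser like this would silently return no selling points, which is the failure mode the server-side validation is there to prevent.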

Data Retention

Image-processing workflows (and all associated data — uploaded source images, generated previews, processed outputs, and database rows) are automatically deleted 7 days after creation, regardless of status. This keeps the SQLite database and /data volume lean.

Export your results before the 7-day window closes. Use the ZIP or XLSX export on the output stage to download everything you need. The purge runs hourly in the background and on server startup.
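
The retention rule reduces to a single age check against the creation time. The sketch below selects the workflows a purge pass would delete; the Workflow shape and function name are hypothetical, and the actual deletion of files and database rows is left out.

```typescript
interface Workflow {
  id: string;
  createdAt: number; // epoch milliseconds
  status: string;
}

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 days
const PURGE_INTERVAL_MS = 60 * 60 * 1000; // the purge runs hourly

// Select workflows past the retention window. Status is deliberately ignored,
// because the purge applies regardless of workflow state.
function expiredWorkflows(workflows: Workflow[], now: number): Workflow[] {
  return workflows.filter((workflow) => now - workflow.createdAt >= RETENTION_MS);
}

// A server would call this once at startup and then on a timer, for example:
//   setInterval(() => purge(expiredWorkflows(loadAll(), Date.now())), PURGE_INTERVAL_MS);
// where purge and loadAll are placeholders for the real storage layer.
```

Since the check runs hourly, a workflow may live up to about an hour past the 7-day mark before it is removed.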

Reports are exempt from this purge. Each (report, country) holds the most recent and previous upload indefinitely; uploading a new version evicts the old previous and rotates the slots.
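
The two-slot rotation for reports can be sketched as a pure function: the new upload becomes current, the old current becomes previous, and the old previous is evicted. The Upload and Slots shapes are hypothetical.

```typescript
interface Upload {
  id: string;
  uploadedAt: string;
}

interface Slots {
  current: Upload | null;
  previous: Upload | null;
}

// Rotate one (report, country) pair when a new upload arrives.
function rotate(slots: Slots, incoming: Upload): { slots: Slots; evicted: Upload | null } {
  return {
    slots: { current: incoming, previous: slots.current },
    evicted: slots.previous, // the caller would delete this upload's stored file
  };
}
```

After the first upload the previous slot is still empty, which is why the Delta download only becomes available once a second upload exists.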

LLM settings and history are also exempt from workflow purge. They remain in SQLite until changed by an admin or manually removed from the database.

Users are soft-deleted. Deleting a user disables login and hides the account from user management, but keeps the row so historical report uploads and LLM setting edits can still show who performed the action. Deleted users appear as Name (Deleted) in historical metadata.
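
A minimal sketch of the soft-delete behaviour follows, assuming a nullable deletedAt field marks removed accounts. The field and function names are hypothetical and not taken from the ProductLens schema.

```typescript
interface User {
  id: number;
  username: string;
  name: string;
  deletedAt: string | null; // assumption: null means the account is active
}

// Name shown next to historical report uploads and LLM setting edits.
function displayName(user: User): string {
  return user.deletedAt !== null ? `${user.name} (Deleted)` : user.name;
}

// Only active accounts reserve a username, so a deleted user's name can be reused.
function isUsernameAvailable(users: User[], username: string): boolean {
  return !users.some((user) => user.deletedAt === null && user.username === username);
}

// Soft-deleted accounts are hidden from user management.
function visibleUsers(users: User[]): User[] {
  return users.filter((user) => user.deletedAt === null);
}
```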

Tech Stack

  • Frontend: React 19, React Router v7, Tailwind CSS v4, shadcn/ui, Lucide icons, Motion
  • Backend: Express, better-sqlite3, Sharp (image processing), ExcelJS (xlsx read/write), JSZip, Multer, Cheerio (web scraping)
  • Auth + safety: JWT cookies (httpOnly, SameSite=lax, secure in prod), bcryptjs, express-rate-limit (tiered), SSRF protection on outbound fetches
  • Build: Vite, TypeScript (strict mode, noUncheckedIndexedAccess), Vitest
  • Deploy: Docker (node:20-slim), GitHub Actions CI/CD, GHCR

Development

npm run dev          # Start dev server (Express + Vite HMR)
npm run lint         # Type-check with tsc --noEmit
npm test             # Run tests
npm run build        # Production build

License

Released under the MIT License.
