
th3ch0s3n1/web2pdf


web2pdf

Convert a website (URL) or a local HTML file into a PDF.

Features

  • URL or local HTML input
  • Config file (TOML) + CLI overrides
  • Uses Chromium via Playwright, so the PDF closely matches how the page renders in a Chromium-based browser
  • Common PDF options: page size, margins, orientation, scale, background graphics
  • Optional wait strategies for dynamic sites
  • GUI version (Windows-friendly)
  • Login/restricted pages support (storage state / cookies / basic auth / interactive login)
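The PDF options listed above map closely onto keyword arguments of Playwright's page.pdf() (format, landscape, margin, scale, print_background). The following is a minimal sketch of such a mapping. The settings keys (page_size, orientation, margin_mm, scale, background) and the helper name build_pdf_kwargs are hypothetical and not necessarily what web2pdf uses internally.

```python
from typing import Any


# Hypothetical settings keys; web2pdf's real option names may differ.
def build_pdf_kwargs(settings: dict[str, Any]) -> dict[str, Any]:
    """Translate a flat settings dict into keyword arguments for Playwright's page.pdf()."""
    scale = float(settings.get("scale", 1.0))
    # Playwright documents the valid scale range as 0.1 to 2.
    if not 0.1 <= scale <= 2.0:
        raise ValueError(f"scale must be between 0.1 and 2, got {scale}")
    margin = f"{settings.get('margin_mm', 10)}mm"  # Playwright accepts unit strings like "10mm"
    return {
        "format": settings.get("page_size", "A4"),  # e.g. "A4", "Letter"
        "landscape": settings.get("orientation", "portrait") == "landscape",
        "margin": {side: margin for side in ("top", "right", "bottom", "left")},
        "scale": scale,
        "print_background": bool(settings.get("background", False)),
    }


# Usage inside a Playwright session (not run here):
# page.pdf(path="out.pdf", **build_pdf_kwargs({"page_size": "Letter", "scale": 0.9}))
```

Applying one margin to all four sides keeps the example short; a real config might set each side separately.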

Install

python -m pip install -U pip
python -m pip install -e .
python -m playwright install chromium

GUI

Run the GUI:

web2pdf-gui

Login / restricted pages

Option A (recommended): Storage state (cookies + localStorage)

  1. Run an interactive login once and save the session:
web2pdf convert "https://site.example.com/protected" --interactive-login --save-storage-state-to .\state.json -o first.pdf
  2. On later runs, reuse the saved session:
web2pdf convert "https://site.example.com/protected" --storage-state .\state.json -o out.pdf
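When a saved session stops working, the usual cause is that its cookies have expired and the interactive login must be repeated. Assuming state.json uses Playwright's standard storage-state layout (a cookies list whose entries carry an expires Unix timestamp, with -1 for session cookies, plus an origins list holding localStorage), a small stdlib-only check could look like this. It is not part of web2pdf, and load_state and expired_cookies are hypothetical names.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def load_state(path: str | Path) -> dict:
    """Read a storage-state JSON file (assumed to be Playwright's standard format)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def expired_cookies(state: dict, now: float | None = None) -> list[str]:
    """Return the names of cookies whose 'expires' timestamp is in the past.

    Playwright stores 'expires' as Unix seconds and uses -1 for session cookies,
    which are skipped here because they carry no expiry date.
    """
    now = time.time() if now is None else now
    expired = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if 0 < expires < now:
            expired.append(cookie["name"])
    return expired
```

For example, print(expired_cookies(load_state("state.json"))) lists cookies that have run out; an empty list does not guarantee the server still accepts the session.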

Option B: Import cookies.txt

Export your cookies with a browser extension that writes the Netscape cookies.txt format, then run:

web2pdf convert "https://site.example.com/protected" --cookies-txt .\cookies.txt -o out.pdf
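Each non-comment line of a Netscape cookies.txt file holds seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix seconds, 0 for session cookies), name, and value; many exporters prefix HttpOnly cookies with #HttpOnly_. The sketch below converts that format into the dicts accepted by Playwright's BrowserContext.add_cookies(). It illustrates the format under those assumptions and is not web2pdf's actual importer; parse_cookies_txt is a hypothetical name.

```python
from __future__ import annotations

HTTPONLY_PREFIX = "#HttpOnly_"


def parse_cookies_txt(text: str) -> list[dict]:
    """Convert Netscape cookies.txt content into cookie dicts for add_cookies()."""
    cookies: list[dict] = []
    for line in text.splitlines():
        http_only = line.startswith(HTTPONLY_PREFIX)
        if http_only:
            line = line[len(HTTPONLY_PREFIX):]  # strip the marker, keep the domain
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed line: skip it instead of aborting the whole import
        domain, _include_subdomains, path, secure, expires, name, value = fields
        try:
            expiry = int(expires)
        except ValueError:
            continue
        cookie = {
            "name": name,
            "value": value,
            "domain": domain,
            "path": path,
            "secure": secure.upper() == "TRUE",
            "httpOnly": http_only,
        }
        if expiry > 0:  # 0 marks a session cookie, so "expires" is omitted
            cookie["expires"] = expiry
        cookies.append(cookie)
    return cookies
```

In a Playwright session this would be used as context.add_cookies(parse_cookies_txt(Path("cookies.txt").read_text(encoding="utf-8"))).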

Option C: HTTP Basic Auth

web2pdf convert "https://site.example.com/protected" --basic-auth-user USER --basic-auth-password PASS -o out.pdf
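Basic Auth sends the credentials as a base64-encoded user:password string, which is an encoding rather than encryption, so it should only be used over HTTPS; a password typed on the command line may also end up in your shell history. Playwright normally receives such credentials through the http_credentials option of browser.new_context() instead of a hand-built header. The sketch below is illustrative, not web2pdf code: it only shows what the Authorization header carries and how easily it is reversed (basic_auth_header and decode_basic_auth are hypothetical names).

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic Auth (RFC 7617)."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header: str) -> tuple[str, str]:
    """Reverse the encoding: anyone who sees the header can read the password."""
    decoded = base64.b64decode(header.removeprefix("Basic ")).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password
```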

Build a Windows .exe (PyInstaller)

This produces dist\web2pdf\web2pdf.exe.

cd path\to\web2pdf  # the folder containing web2pdf.spec
# Ensure browsers are installed BEFORE building, so they can be bundled
python -m playwright install chromium

python -m pip install -e ".[build]"
pyinstaller .\web2pdf.spec

Playwright browser behavior in the packaged exe

This project is set up so the built exe can run without downloading browsers:

  • During the build, your local Playwright cache (usually %LOCALAPPDATA%\ms-playwright) is copied into the dist folder.
  • At runtime (inside the exe), we set PLAYWRIGHT_BROWSERS_PATH=0 so Playwright looks for a bundled .local-browsers folder near its driver.

If you want to override this behavior, set one of these environment variables before running:

  • PLAYWRIGHT_BROWSERS_PATH (custom browser location)
  • PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 (allow downloads)
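The runtime behavior described above can be sketched as follows, assuming the hook runs before Playwright is started and relying on PyInstaller setting sys.frozen in the packaged exe. Using setdefault leaves an existing PLAYWRIGHT_BROWSERS_PATH untouched, which is what makes the override possible. configure_bundled_browsers is a hypothetical name, and the project's actual hook may differ.

```python
from __future__ import annotations

import os
import sys
from collections.abc import MutableMapping


def configure_bundled_browsers(
    env: MutableMapping[str, str] | None = None,
    frozen: bool | None = None,
) -> None:
    """Point Playwright at browsers bundled inside a PyInstaller build.

    Must run before sync_playwright() starts, because the driver reads
    PLAYWRIGHT_BROWSERS_PATH from the environment when it launches.
    """
    env = os.environ if env is None else env
    # PyInstaller sets sys.frozen on the running executable.
    frozen = getattr(sys, "frozen", False) if frozen is None else frozen
    if frozen:
        # "0" tells Playwright to look for browsers next to its own driver;
        # a value the user already set is kept as an override.
        env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")
```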

Quick start

Convert a URL:

web2pdf "https://example.com" -o example.pdf

Convert a local HTML file:

web2pdf ".\page.html" -o page.pdf
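Playwright's page.goto() expects a URL, so a local HTML file has to be converted to a file:// URI before it can be opened; relative links to images and stylesheets then resolve against the file's own folder. A minimal sketch of that input handling follows; to_target_url is a hypothetical helper, and web2pdf's own detection logic may differ.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_target_url(target: str) -> str:
    """Pass http(s)/file URLs through unchanged; turn local paths into file:// URIs."""
    # A Windows path like C:\page.html parses with scheme "c", so it falls through.
    if urlparse(target).scheme in ("http", "https", "file"):
        return target
    # resolve() makes relative paths absolute, which as_uri() requires.
    return Path(target).resolve().as_uri()
```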

Use a config file:

web2pdf "https://example.com" -c .\web2pdf.toml -o out.pdf

Config file

Copy web2pdf.example.toml to web2pdf.toml and edit it to suit your setup.

Precedence: CLI flags > config file > defaults.

Troubleshooting

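  • If you built the exe but it can’t find Chromium, make sure you ran python -m playwright install chromium before running pyinstaller.
  • Behind a corporate proxy or firewall, browser downloads may be blocked; in that case, preinstall the browsers on the build machine.
  • Some sites never reach the "networkidle" state (no network connections for at least 500 ms), so conversion times out or behaves inconsistently. Try a longer timeout with a less strict wait condition: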
web2pdf "https://www.alza.cz/" -o alza.pdf --timeout-ms 90000 --wait-until domcontentloaded --wait-until-fallbacks load --post-wait-ms 1500
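The flags in that example describe a primary wait condition plus fallbacks. Playwright's page.goto() accepts wait_until values of "load", "domcontentloaded", "networkidle", and "commit". The sketch below shows one way such fallbacks could work, assuming each strategy is retried in order after a timeout; goto_with_fallbacks is a hypothetical helper, and web2pdf's real logic may differ.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence


def goto_with_fallbacks(
    navigate: Callable[[str], None],
    strategies: Sequence[str],
    timeout_errors: tuple[type[BaseException], ...] = (TimeoutError,),
    post_wait_ms: int = 0,
) -> str:
    """Try each wait_until strategy in order and return the first that succeeds."""
    last_error: BaseException | None = None
    for strategy in strategies:
        try:
            navigate(strategy)
        except timeout_errors as exc:
            last_error = exc  # this strategy timed out; try the next one
            continue
        if post_wait_ms:
            # Give late scripts, fonts, and images a moment to settle.
            time.sleep(post_wait_ms / 1000)
        return strategy
    raise RuntimeError(f"all wait strategies timed out: {list(strategies)}") from last_error
```

With Playwright it might be called as goto_with_fallbacks(lambda w: page.goto(url, wait_until=w, timeout=90_000), ["domcontentloaded", "load"], timeout_errors=(PlaywrightTimeoutError,), post_wait_ms=1500), after from playwright.sync_api import TimeoutError as PlaywrightTimeoutError.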

Notes

This tool renders pages headlessly. Some sites block headless browsers; try adjusting the user-agent or adding extra headers/cookies.

Security note: be careful with storing credentials (basic auth, cookies, storage state). Treat state.json and cookies files like passwords.
