Perfectly organized worker.py into 8 files total#17
Perfectly organized worker.py into 8 files total#17Shrey1006 wants to merge 3 commits intoalphaonelabs:mainfrom
Conversation
WalkthroughSplit a monolithic worker into focused modules: security, HTTP helpers, DB setup/seeding, static serving, and three API modules (auth, activities, admin). The worker now delegates route handling to these modules while preserving central request dispatch. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant AuthAPI as api_auth.py
participant Sec as security_utils.py
participant DB as db_utils.py
Client->>AuthAPI: POST /register {username,email,password,...}
AuthAPI->>Sec: new_id(), hash_password(password, username), blind_index(username)
AuthAPI->>Sec: encrypt(PII, secret)
AuthAPI->>DB: INSERT user (id, blind_idx, encrypted fields, password_hash)
DB-->>AuthAPI: insert result
AuthAPI->>Sec: create_token(id, username, role, secret)
AuthAPI-->>Client: 200 OK { token, user }
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/api_handlers.py`:
- Around line 195-199: The api_create_activity handler currently only verifies
authentication via verify_token in api_create_activity; enforce that the
authenticated user has the host role before proceeding by checking the user's
role/roles field (e.g., user.get("role") == "host" or "host" in
user.get("roles", [])) and return an authorization error (403) if not a host;
update the guard in api_create_activity (and ensure the same host-only contract
assumed by api_create_session and api_add_activity_tags is enforced here) so
that non-host members cannot create activities that would bypass later owner
checks.
In `@src/platform_utils.py`:
- Around line 224-240: The current try/except around env.DB.prepare(...)
swallows all exceptions and records uid_map entries before the INSERT succeeds,
causing /api/seed to report success even when inserts fail; change the logic in
the seeding function so you only add to uid_map after a successful run(), and
replace the broad except Exception: pass with targeted handling: either use a
DB-side "INSERT OR IGNORE" (or ON CONFLICT DO NOTHING) in the env.DB.prepare SQL
or catch specific duplicate/key-conflict errors and log them, but for any other
exception log the full error and abort/raise so callers get a failure; update
the blocks that call env.DB.prepare and the uid_map updates (the same pattern at
the other ranges referenced) accordingly.
- Around line 410-430: The current code always falls back to serving
"index.html" when a key is missing, which returns the HTML shell for missing
assets; change the logic so you only attempt the SPA fallback for extensionless
routes. After trying env.__STATIC_CONTENT.get(key, "text"), compute ext as in
the snippet (ext = key.rsplit(".",1)[-1] if "." in key else "html") and if the
original key had an extension (i.e., "." in key) and content is None then
immediately return a 404 Response (use a plain text or appropriate MIME from
_MIME if available) instead of loading index.html; keep the existing behavior of
fetching "index.html" only when there was no extension on the requested key and
include _CORS and proper Content-Type in all Responses.
In `@src/security_utils.py`:
- Around line 71-80: The hard-coded module-level secret _PEPPER must be removed
and loaded from the worker environment at runtime: replace the literal _PEPPER
with a module-level value that is initialized once from an environment variable
(e.g. os.environ["PEPPER"]) during import/startup, validate it (non-empty,
bytes) and raise a clear exception (fail closed) if the env var is missing or
invalid; update _user_salt(username: str) to use that runtime-loaded pepper
instead of a literal so functions like _user_salt and any callers relying on
_PEPPER use the environment-derived secret.
- Around line 99-125: create_token currently signs only id/username/role and
verify_token accepts it indefinitely; update create_token to include iat
(current UNIX epoch int) and exp (iat + TTL, make TTL a configurable constant or
optional arg) in the JSON payload before base64-encoding and signing with the
same _hmac logic, and update verify_token to decode and parse iat/exp then
reject the token (return None) if current time (time.time()) is >= exp (allow a
small leeway if desired); ensure the signature is computed over the payload
containing iat/exp (use the same payload variable names in create_token and
verify_token), keep the existing "payload.sig" format, and add tests exercising
expired, valid, and malformed tokens.
- Around line 25-52: The XOR stream cipher in encrypt()/decrypt() must be
replaced with an authenticated AEAD like AES-GCM: change _derive_key(secret) to
return a 32-byte key suitable for AES-256, then in encrypt() generate a fresh
random 12-byte nonce per invocation, use AESGCM(key).encrypt(nonce,
plaintext_bytes, associated_data=None), and base64-encode the concatenation of
nonce||ciphertext for storage; in decrypt() base64-decode, split out the 12-byte
nonce and ciphertext, call AESGCM(key).decrypt(nonce, ciphertext,
associated_data=None) and decode the resulting plaintext, and keep the current
error handling (return "[decryption error]" on failure). Use a well-maintained
library (e.g., cryptography.hazmat.primitives.ciphers.aead.AESGCM) and do not
reuse nonces or the keystream.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 05c47b96-cea2-46ce-9d9d-bd9c0c47eca0
📒 Files selected for processing (4)
src/api_handlers.pysrc/platform_utils.pysrc/security_utils.pysrc/worker.py
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
src/api_activities.py (2)
90-94:⚠️ Potential issue | 🟠 MajorEnforce the host role before activity creation.
src/worker.pyLines 10-16 documents this route as[host], but this handler only checks authentication. That lets any member create an activity and then satisfy the owner checks inapi_create_session()andapi_add_activity_tags()for that record.🧰 Proposed fix
async def api_create_activity(req, env): user = verify_token(req.headers.get("Authorization"), env.JWT_SECRET) if not user: return err("Authentication required", 401) + if user.get("role") != "host": + return err("Host role required", 403) body, bad_resp = await parse_json_object(req)As per coding guidelines,
**/*.py: Check for security vulnerabilities such as SQL injection, XSS, and insecure use of eval().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api_activities.py` around lines 90 - 94, The api_create_activity handler currently only verifies authentication; enforce that the authenticated user is a host before allowing creation by checking the user's role (from the object returned by verify_token) and returning a 403 error if it is not "host" (or the equivalent host role value used in your app). Update api_create_activity to inspect the role (e.g., user["role"] or user.role) immediately after verify_token and reject non-hosts so downstream functions like api_create_session and api_add_activity_tags cannot be exploited by members.
130-154:⚠️ Potential issue | 🟠 MajorValidate
tagsbefore mutating the database, and make the upsert race-safe.Line 131 and Line 405 call
.strip()on unvalidated JSON values. A string becomes one-character tags, and any non-string element raises after earlier writes have already succeeded. The SELECT-then-INSERT flow also loses the link under concurrent requests: the losing insert hits the exception path and never reuses the tag that was just created.🧰 Suggested direction
- for tag_name in (body.get("tags") or []): - tag_name = tag_name.strip() + tags = body.get("tags") or [] + if not isinstance(tags, list) or any(not isinstance(tag, str) for tag in tags): + return err("tags must be an array of strings") + for raw_tag_name in tags: + tag_name = raw_tag_name.strip() if not tag_name: continue - t_row = await env.DB.prepare( - "SELECT id FROM tags WHERE name=?" - ).bind(tag_name).first() - if t_row: - tag_id = t_row.id - else: - tag_id = new_id() - try: - await env.DB.prepare( - "INSERT INTO tags (id,name) VALUES (?,?)" - ).bind(tag_id, tag_name).run() - except Exception as e: - capture_exception(e, req, env, f"...") - continue + candidate_tag_id = new_id() + await env.DB.prepare( + "INSERT OR IGNORE INTO tags (id,name) VALUES (?,?)" + ).bind(candidate_tag_id, tag_name).run() + t_row = await env.DB.prepare( + "SELECT id FROM tags WHERE name=?" + ).bind(tag_name).first() + if not t_row: + return err("Failed to resolve tag", 500) + tag_id = t_row.idApply the same pattern in both
api_create_activity()andapi_add_activity_tags().As per coding guidelines,
**/*.py: Ensure proper error handling and logging.Also applies to: 392-428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/api_activities.py` around lines 130 - 154, The tag loop in api_create_activity (and mirror in api_add_activity_tags) currently calls .strip() on unvalidated JSON values and uses a SELECT-then-INSERT that races under concurrency; fix by first validating each tag is a string (if not, capture_exception/log and continue), then trim and skip empty strings, and make the upsert race-safe by using an INSERT OR IGNORE INTO tags (id,name) with a generated id and immediately SELECT id FROM tags WHERE name=? to retrieve the canonical tag id (so the code reuses a concurrently inserted tag instead of failing), keeping capture_exception around DB errors and continuing on failure; update the loops around env.DB.prepare, new_id(), and activity_tags INSERT to follow this pattern in both api_create_activity and api_add_activity_tags.src/db_utils.py (1)
113-132:⚠️ Potential issue | 🟠 MajorDon't swallow non-duplicate seed failures.
seed_db()still treats every D1 error as “already seeded”. Line 115 also recordsuid_map[uname]before the user insert succeeds, so/api/seedcan report success with partial or inconsistent data.🧰 Suggested direction
- uid_map[uname] = uid - try: + try: await env.DB.prepare( - "INSERT INTO users " + "INSERT OR IGNORE INTO users " "(id,username_hash,email_hash,name,username,email,password_hash,role)" " VALUES (?,?,?,?,?,?,?,?)" ).bind( uid, blind_index(uname, enc_key), @@ hash_password(pw, uname), encrypt(role, enc_key), ).run() - except Exception: - pass # already seeded + uid_map[uname] = uid + except Exception: + raiseApply the same rule to the tag/activity/session/enrollment loops: use
INSERT OR IGNOREonly where duplicates are expected, and let unexpected write failures bubble so/api/seedfails instead of returning a false success.As per coding guidelines,
**/*.py: Ensure proper error handling and logging.Also applies to: 149-155, 202-221, 245-257, 269-275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/db_utils.py` around lines 113 - 132, The seed_db loop currently records uid_map[uname] before attempting the insert and catches all Exceptions (silencing unexpected failures); change this to only mark uid_map after a successful insert and stop swallowing non-duplicate errors — either use a vendor-safe "INSERT OR IGNORE" (or equivalent upsert) if duplicates are expected for the users insert prepared via env.DB.prepare, or catch only the specific duplicate/constraint error and re-raise or log+raise any other Exception so /api/seed fails on real write errors; apply the same pattern to the tag/activity/session/enrollment loops referenced and ensure functions/values like uid_map, env.DB.prepare, blind_index, encrypt, and hash_password are only used after a successful DB write.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/api_activities.py`:
- Around line 240-246: The join handler currently trusts the incoming "role"
(variables act_id and role) and permits "participant", "instructor", or
"organizer", allowing users to self-assign privileged roles; change the logic so
the /api/join path always creates an enrollment with role = "participant"
regardless of body.get("role") input, remove or ignore the whitelist branch that
promotes instructor/organizer, and ensure only host/admin-only mutation paths
(separate functions) can assign elevated roles; keep the existing act_id
validation and use err("activity_id is required") as before.
In `@src/worker.py`:
- Around line 68-75: The admin page at admin_path is protected but a request to
the raw asset "/admin.html" can bypass auth because serve_static handles it;
update the request handling in the request dispatch (use the same variables
path, method and existing is_basic_auth_valid / unauthorized_basic checks) so
that both admin_path and the asset path for admin (e.g., "/admin.html" or
clean_path(getattr(env, "ADMIN_ASSET", "admin.html"))) are checked together
before falling through to serve_static(), or explicitly block/redirect requests
to "/admin.html" by returning unauthorized_basic() when path matches that asset;
ensure you modify the same branch that checks path == admin_path and reuse
is_basic_auth_valid/unauthorized_basic to enforce auth for both identifiers.
---
Duplicate comments:
In `@src/api_activities.py`:
- Around line 90-94: The api_create_activity handler currently only verifies
authentication; enforce that the authenticated user is a host before allowing
creation by checking the user's role (from the object returned by verify_token)
and returning a 403 error if it is not "host" (or the equivalent host role value
used in your app). Update api_create_activity to inspect the role (e.g.,
user["role"] or user.role) immediately after verify_token and reject non-hosts
so downstream functions like api_create_session and api_add_activity_tags cannot
be exploited by members.
- Around line 130-154: The tag loop in api_create_activity (and mirror in
api_add_activity_tags) currently calls .strip() on unvalidated JSON values and
uses a SELECT-then-INSERT that races under concurrency; fix by first validating
each tag is a string (if not, capture_exception/log and continue), then trim and
skip empty strings, and make the upsert race-safe by using an INSERT OR IGNORE
INTO tags (id,name) with a generated id and immediately SELECT id FROM tags
WHERE name=? to retrieve the canonical tag id (so the code reuses a concurrently
inserted tag instead of failing), keeping capture_exception around DB errors and
continuing on failure; update the loops around env.DB.prepare, new_id(), and
activity_tags INSERT to follow this pattern in both api_create_activity and
api_add_activity_tags.
In `@src/db_utils.py`:
- Around line 113-132: The seed_db loop currently records uid_map[uname] before
attempting the insert and catches all Exceptions (silencing unexpected
failures); change this to only mark uid_map after a successful insert and stop
swallowing non-duplicate errors — either use a vendor-safe "INSERT OR IGNORE"
(or equivalent upsert) if duplicates are expected for the users insert prepared
via env.DB.prepare, or catch only the specific duplicate/constraint error and
re-raise or log+raise any other Exception so /api/seed fails on real write
errors; apply the same pattern to the tag/activity/session/enrollment loops
referenced and ensure functions/values like uid_map, env.DB.prepare,
blind_index, encrypt, and hash_password are only used after a successful DB
write.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a166d0d8-5dc9-49ae-9334-8421868b3806
📒 Files selected for processing (7)
src/api_activities.pysrc/api_admin.pysrc/api_auth.pysrc/db_utils.pysrc/http_utils.pysrc/static_utils.pysrc/worker.py
There was a problem hiding this comment.
Pull request overview
Refactors the Cloudflare Worker implementation by splitting the previously monolithic src/worker.py into focused modules for HTTP helpers, security utilities, DB/schema setup, and API route handlers—while keeping the existing request dispatch structure.
Changes:
- Extracted security, HTTP/response, DB init/seed, static serving, and API handler logic into new modules under
src/. - Simplified
worker.pyto mostly imports + routing/dispatch (_dispatch,on_fetch). - Centralized shared helpers (CORS headers, JSON responses, basic auth, exception logging) in
http_utils.py.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/worker.py | Reduced to dispatcher + imports wiring handlers/utilities from new modules. |
| src/http_utils.py | Introduces shared HTTP helpers (CORS, JSON responses, request parsing, basic auth, exception logging). |
| src/security_utils.py | Houses ID generation, encryption/decryption, blind indexing, password hashing, and token helpers. |
| src/db_utils.py | Moves schema DDL and seed logic out of the worker entrypoint. |
| src/api_auth.py | Contains register/login handlers using shared HTTP + security utilities. |
| src/api_activities.py | Contains activities/enrollments/sessions/tags/dashboard handlers. |
| src/api_admin.py | Adds admin-only table-count endpoint protected by basic auth. |
| src/static_utils.py | Moves static asset serving out of worker.py. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
src/worker.py (1)
68-76:⚠️ Potential issue | 🟠 Major
/admin.htmlcan still bypass the Basic Auth gate.The condition at Line 73 only checks
admin_path, but a direct request to/admin.htmlwill skip this block entirely and fall through toserve_static()at Line 132. This allows unauthenticated access to the admin page by simply requesting the raw asset path.Consider protecting both paths together:
🔒 Proposed fix to close the bypass
- if path == admin_path and method == "GET": + if path in {admin_path, "/admin.html"} and method == "GET": if not is_basic_auth_valid(request, env): return unauthorized_basic() return await serve_static("/admin.html", env)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/worker.py` around lines 68 - 76, The admin page can be accessed via the raw asset path "/admin.html" because the auth check only matches admin_path; update the guard to protect both paths by checking if path == admin_path or path == "/admin.html" (use the existing variables and helpers is_basic_auth_valid, unauthorized_basic, and serve_static) so that any request to either admin_path or the literal "/admin.html" enforces basic auth and then calls await serve_static("/admin.html", env) on success.src/static_utils.py (1)
25-45:⚠️ Potential issue | 🟠 MajorSPA fallback for missing assets serves HTML with wrong Content-Type.
When a request for
/app.jsmisses in KV storage, the code falls back toindex.html(Lines 30-34) but then determines the MIME type from the originalkey("app.js"), resulting in HTML content served asapplication/javascript. This breaks the browser and masks real storage failures.A better approach: only use the SPA fallback for extensionless routes (likely client-side navigation), and return 404 or 500 for missing assets with extensions.
🛠️ Proposed fix to differentiate route fallback from asset errors
try: content = await env.__STATIC_CONTENT.get(key, "text") except Exception: - content = None + return Response( + "Static content unavailable", + status=500, + headers={"Content-Type": "text/plain; charset=utf-8", **CORS}, + ) - if content is None: + # Only fall back to index.html for extensionless routes (SPA navigation) + is_route = "." not in key.rsplit("/", 1)[-1] + if content is None and is_route: try: content = await env.__STATIC_CONTENT.get("index.html", "text") except Exception: - content = None + return Response( + "Static content unavailable", + status=500, + headers={"Content-Type": "text/plain; charset=utf-8", **CORS}, + ) + ext = "html" # Override ext for the fallback case + else: + ext = key.rsplit(".", 1)[-1] if "." in key else "html" if content is None: return Response( "<h1>404 - Not Found</h1>", status=404, headers={"Content-Type": "text/html"}, ) - ext = key.rsplit(".", 1)[-1] if "." in key else "html" mime = _MIME.get(ext, "text/plain") return Response(content, headers={"Content-Type": mime, **CORS})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/static_utils.py` around lines 25 - 45, The SPA fallback currently loads index.html via env.__STATIC_CONTENT.get when the original key (variable key) is missing but then infers MIME from the original key (using _MIME and ext), causing HTML to be served with the wrong content-type for asset requests; change the logic in the block around env.__STATIC_CONTENT.get + Response so that you only perform the index.html fallback when key has no file extension (i.e., "." not in key), and for keys that contain an extension return a 404 (or 500) Response immediately; when serving the fallback explicitly set mime to "text/html" (or treat ext="html") and return Response(content, headers={"Content-Type": "text/html", **CORS}), otherwise on missing asset return Response("<h1>404 - Not Found</h1>", status=404, headers={"Content-Type": _MIME.get(ext, "text/plain"), **CORS}) so MIME for errors matches the asset type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/http_utils.py`:
- Around line 10-28: The capture_exception function currently swallows any
errors raised while building or printing the payload (the bare except/pass),
making logger failures invisible; update capture_exception to catch exceptions
from the logging path and emit a minimal fallback message to stderr (including
the original exception type/name and the traceback from traceback.format_exc())
so logging failures are visible, and ensure the fallback runs inside the
existing except block rather than re-raising; reference the capture_exception
function and the payload/print logic when making the change.
- Around line 57-68: Add a precise return type hint to parse_json_object to
improve IDE/type checking: annotate the function signature (parse_json_object)
to return Tuple[Optional[Dict[str, Any]], Optional[Any]] (or use Python 3.10+
union/list syntax) and add the required typing imports (Tuple, Optional, Dict,
Any) at the top of the module; keep the existing behavior and return values
(body or None, error response or None) unchanged.
In `@src/static_utils.py`:
- Around line 5-14: The _MIME mapping is missing common font and modern image
types; update the _MIME dict to include webfont and new image MIME entries
(e.g., "woff" -> "font/woff", "woff2" -> "font/woff2", "ttf" -> "font/ttf",
"otf" -> "font/otf", "eot" -> "application/vnd.ms-fontobject", plus "webp" ->
"image/webp", "avif" -> "image/avif") so static file responses return correct
Content-Type headers; modify the _MIME variable in src/static_utils.py (the
_MIME dict) to add these keys and their MIME strings.
---
Duplicate comments:
In `@src/static_utils.py`:
- Around line 25-45: The SPA fallback currently loads index.html via
env.__STATIC_CONTENT.get when the original key (variable key) is missing but
then infers MIME from the original key (using _MIME and ext), causing HTML to be
served with the wrong content-type for asset requests; change the logic in the
block around env.__STATIC_CONTENT.get + Response so that you only perform the
index.html fallback when key has no file extension (i.e., "." not in key), and
for keys that contain an extension return a 404 (or 500) Response immediately;
when serving the fallback explicitly set mime to "text/html" (or treat
ext="html") and return Response(content, headers={"Content-Type": "text/html",
**CORS}), otherwise on missing asset return Response("<h1>404 - Not Found</h1>",
status=404, headers={"Content-Type": _MIME.get(ext, "text/plain"), **CORS}) so
MIME for errors matches the asset type.
In `@src/worker.py`:
- Around line 68-76: The admin page can be accessed via the raw asset path
"/admin.html" because the auth check only matches admin_path; update the guard
to protect both paths by checking if path == admin_path or path == "/admin.html"
(use the existing variables and helpers is_basic_auth_valid, unauthorized_basic,
and serve_static) so that any request to either admin_path or the literal
"/admin.html" enforces basic auth and then calls await
serve_static("/admin.html", env) on success.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: alphaonelabs/coderabbit/.coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 568a9ce6-c387-4c66-a370-9a238f8d6919
📒 Files selected for processing (3)
src/http_utils.pysrc/static_utils.pysrc/worker.py
|
Hey @A1L13N , This PR’s aim is to split the code into 8 subfiles. The Copilot, codderbhai comments are mostly follow-up improvement and hardening suggestions. I feel this can be ignored for now... |
Organised.workers.py.mp4
Refactor: Modularize worker.py into focused utility and API handler modules
This PR extracts monolithic logic from worker.py into eight focused modules, improving organization, testability, and maintainability while preserving the request-dispatch behavior.
Purpose
Key modifications
Potential impact
Overall, the PR modularizes responsibilities into clear components and preserves existing endpoints while centralizing security, HTTP, DB, and static-file logic for easier review and testing.