feat: add React frontend, REST API, CI/CD pipeline, and project documentation #17
Merged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add models for crawl requests, responses, and progress tracking.
- Create routes for managing crawls, including creation, deletion, and status retrieval.
- Introduce WebSocket support for real-time updates on crawl progress.
- Implement services for handling crawl logic and interactions with Neo4j.
- Add health check endpoints for application readiness and status.
- Enhance error handling in the crawler with detailed error types.
- Update shared library to include new error handling and schema management.
- Modify deployment configurations to use secrets for sensitive data.
- Enable health checks and readiness probes in Kubernetes configurations.
- Implemented Dockerfile for frontend build and nginx serving.
- Created nginx configuration for SPA fallback and API proxying.
- Added .dockerignore to exclude unnecessary files from Docker context.
- Updated GitHub Actions workflow to build and publish frontend Docker image.
- Introduced Helm chart for frontend deployment with necessary templates and values.
- Modified manager service to remove frontend build steps and adjust service type.
- Enhanced API response models to include failed crawl counts.
- Updated frontend components to display failed crawl counts and adjust refetch logic.
- Removed unused job_status field from UrlJob struct in feeder.
- Deleted obsolete dev.sh script for local development.
- Introduced a new `api-reference.md` file detailing all API endpoints for crawls, including request/response formats and error handling.
- Added sections for creating, listing, getting details, canceling crawls, and retrieving crawl statistics and graphs.

docs(architecture): Document system architecture and service responsibilities
- Created `architecture.md` to outline the system architecture, including service interactions and responsibilities of the frontend, manager, feeders, and Neo4j.
- Included diagrams to visualize data flow and service interactions.

docs(deployment): Provide deployment instructions using Helm
- Added `deployment.md` to guide users through deploying the web crawler on Kubernetes using Helm.
- Included prerequisites, installation steps, service access details, and configuration options.

docs(development): Outline development setup and workflow
- Created `development.md` to detail prerequisites, repository structure, local development workflow, and CI/CD pipeline.
- Included instructions for backend and frontend development, testing, and pre-commit hooks.

docs(neo4j): Describe Neo4j graph model used in the crawler
- Introduced `neo4j-graph-model.md` to explain the data model, node labels, relationship types, and status lifecycle in Neo4j.
- Provided example Cypher queries for common operations.

feat(images): Add images for documentation
- Added various images to enhance documentation, including crawl detail progress, list, stats, dashboard, graph visualization, and new crawl UI.
…ookups

Remove root_name_depth index (0 reads in production, no query uses it). Add url_name_http_crawl composite index on (name, http_type, crawl_id) to speed up status updates, cancellation checks, and reset-to-pending operations in the feeder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ChefControl (Collaborator, Author) left a review comment on Feb 15, 2026:
Need to fix comments before merging to master
- Rename health probes to /livez and /readyz (Kubernetes convention)
- Replace two-step cancel with atomic Cypher query to eliminate race condition
- Add CANCELLED status tracking to all query endpoints and models
- Close WebSocket on cancelled status in addition to completed
- Add explanatory comments for per-crate default-features = false
- Change tower_http log level from debug to info
- Add CORS rationale comment
- Fix false robots.txt claim in README
- Update API docs with new health paths and cancelled field
- Replace non-existent ./dev.sh with actual Helm deploy commands
- Align neo4j-graph-model.md indexes with schema.rs
- Add /livez and /readyz nginx locations for frontend health probes
- Update all Helm chart probe paths
- Add vite-env.d.ts explanation comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
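The CANCELLED status tracking and the WebSocket close rule in this commit can be illustrated with a small sketch. This is not the PR's actual code: the `CrawlCounts` struct, the `CrawlStatus` enum, and the function names are hypothetical, and the rule itself (a crawl is running while work remains, and counts as cancelled only when some URLs were cancelled and none completed) is an assumption read from the PR summary's "when all cancelled, none completed" note.

```rust
/// Per-status URL counts for one crawl.
/// Hypothetical shape for illustration; the PR's real models may differ.
#[derive(Debug, Default, Clone, Copy)]
pub struct CrawlCounts {
    pub pending: u64,
    pub in_progress: u64,
    pub completed: u64,
    pub failed: u64,
    pub cancelled: u64,
}

/// Overall crawl status as reported to clients (assumed set of states).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrawlStatus {
    Running,
    Completed,
    Cancelled,
}

/// Derive the overall status from per-URL counts.
/// Assumption: any pending or in-progress URL means the crawl is still running;
/// otherwise it is cancelled only if something was cancelled and nothing completed.
pub fn derive_status(c: &CrawlCounts) -> CrawlStatus {
    if c.pending > 0 || c.in_progress > 0 {
        CrawlStatus::Running
    } else if c.cancelled > 0 && c.completed == 0 {
        CrawlStatus::Cancelled
    } else {
        CrawlStatus::Completed
    }
}

/// A progress stream can stop once the crawl reaches a terminal status,
/// which after this commit includes Cancelled as well as Completed.
pub fn is_terminal(status: CrawlStatus) -> bool {
    matches!(status, CrawlStatus::Completed | CrawlStatus::Cancelled)
}
```

Under these assumptions, a WebSocket handler would push progress updates until `is_terminal(derive_status(&counts))` returns true and then close the connection.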
No meaningful ready vs alive distinction for a static file server. Removes unused /readyz nginx location block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reject depth outside 1-5 with 400 Bad Request
- Scope filter_new_urls query by crawl_id so independent crawls don't skip URLs that exist in other crawls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
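Both changes in this commit can be sketched in a few lines of Rust. The sketch below is illustrative only: the function signatures, the `MIN_DEPTH`/`MAX_DEPTH` constants, and the in-memory `HashSet` standing in for the Neo4j lookup are assumptions, not the PR's implementation. Only the name `filter_new_urls`, the 1-5 range, and the 400 status come from the commit message.

```rust
use std::collections::HashSet;

/// Allowed crawl depth range, taken from the commit message (1-5).
pub const MIN_DEPTH: u32 = 1;
pub const MAX_DEPTH: u32 = 5;

/// Validate a requested crawl depth.
/// Returns the depth when it is in range, otherwise the HTTP status code
/// (400 Bad Request) a handler would answer with. Hypothetical signature.
pub fn validate_depth(depth: u32) -> Result<u32, u16> {
    if (MIN_DEPTH..=MAX_DEPTH).contains(&depth) {
        Ok(depth)
    } else {
        Err(400)
    }
}

/// In-memory stand-in for the crawl-scoped lookup.
/// `known` holds (crawl_id, url) pairs; a candidate is filtered out only if
/// the same URL was already recorded for the SAME crawl, so URLs that exist
/// in other crawls are still treated as new.
pub fn filter_new_urls<'a>(
    known: &HashSet<(String, String)>,
    crawl_id: &str,
    candidates: &[&'a str],
) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|url| !known.contains(&(crawl_id.to_string(), url.to_string())))
        .collect()
}
```

Keying the lookup on the pair rather than on the URL alone is what keeps independent crawls from interfering. In the real service that scoping would presumably live in the Cypher query rather than in application memory.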
Summary
This PR introduces the complete frontend service, restructures the manager into a proper REST API, adds CI/CD automation, and provides comprehensive project documentation.
Frontend (34 new files)
- … `/api/` reverse proxy

Manager API refactoring (16 files, 908 insertions)
- … `routes/`, `services/`, `models/`, `state.rs`, `config.rs`
- … `POST/GET/DELETE /api/v1/crawls`, `/crawls/:id/graph`, `/crawls/:id/stats`, `/crawls/:id/ws`
- … (`limit`, `offset` query params)
- Liveness (`/livez`) and readiness (`/readyz`) probe endpoints (K8s best practices)
- … (`get_crawl_progress`, `list_crawls`, `get_crawl_stats`)
- … `running` → `completed`/`cancelled` (when all cancelled, none completed)
- … `completed` and `cancelled` status
- … `tower_http=debug` to `tower_http=info`

Feeder improvements
- … (`/livez`) for Kubernetes liveness checks
- … `crawl_id` (independent crawls no longer interfere)

Shared library additions
- `error.rs`: Unified `CrawlerError` type with HTTP status code mapping
- `schema.rs`: Idempotent Neo4j index/constraint creation on startup (includes composite index for exact URL lookups)

CI/CD pipeline (new)
- … `master` merges
- … (`ghcr.io/bluedotiya/web-crawler/{service}`)

Helm charts
- … `/livez`, all readiness probes use `/readyz`
- … `values.yaml`

Documentation (11 new files)
- … `README.md` with architecture diagram, screenshots, quick start guide
- … `robots.txt` claim, replaced with honest description
- `docs/architecture.md`: System overview, data flow, concurrency model, WebSocket protocol
- `docs/neo4j-graph-model.md`: Node schemas, status lifecycle, indexes aligned with `schema.rs`
- `docs/api-reference.md`: All endpoints with request/response schemas, `cancelled` status documented, depth constraint (1–5)
- `docs/deployment.md`: Helm install (OCI + local), full config reference, K8s architecture
- `docs/development.md`: Local dev workflow, actual Helm commands (removed non-existent `./dev.sh` reference)

Cargo workspace
- … `reqwest = { workspace = true, default-features = false }` lines
- … `Cargo.toml` comment explaining `default-features = false` is ignored at workspace level

Other changes
- … `.dockerignore` to reduce build context
- … `.pre-commit-config.yaml` (removed deleted `dev.sh` hook, added frontend lint)
- … `dev.sh` integration test script (superseded by CI/CD)
- … `vite-env.d.ts` comment explaining its purpose

Test plan
- `cd frontend && npm run dev`: Vite proxy works for local dev
- `docker build -t frontend-test frontend/`: Frontend image builds
- `docker build -t manager-test -f manager/Dockerfile .`: Manager image builds
- `cargo test --workspace`: All 19 Rust tests pass
- `cd frontend && npm run lint && npm run type-check`: Frontend checks pass
- … `/livez` returns `200 {"status":"ok"}`
- … `/readyz` returns `200 {"status":"ready"}`
- … `/livez` on port 8081 returns `200 {"status":"ok"}`
- … `/livez` returns `200 ok`
- … `/health` and `/ready` paths return 404
- … `"cancelled"`
- … `cancelled` field present in crawl progress, list, and stats responses
- `?status=cancelled` filter works in list crawls endpoint
- `depth=0` → 400, `depth=6` → 400, `depth=1` → 201, `depth=5` → 201

🤖 Generated with Claude Code