feat: add React frontend, REST API, CI/CD pipeline, and project documentation #17

Merged
ChefControl merged 14 commits into master from frontend-ui-ux on Feb 15, 2026

@ChefControl (Collaborator) commented Feb 14, 2026

Summary

This PR introduces the complete frontend service, restructures the manager into a proper REST API, adds CI/CD automation, and provides comprehensive project documentation.

Frontend (34 new files)

  • Full React SPA with TypeScript, Tailwind CSS, and Vite
  • Pages: Dashboard, Crawl List, New Crawl, Crawl Detail (with Progress, Graph, and Statistics tabs)
  • Force-directed graph visualization (D3) with interactive controls and dynamic legend
  • Real-time crawl progress via WebSocket with auto-reconnection
  • Independently deployable nginx container with SPA fallback and /api/ reverse proxy
  • Custom cobweb favicon replacing the default Vite icon
  • Helm subchart with NodePort 30080 for external access

Manager API refactoring (16 files, 908 insertions)

  • Restructured into proper modules: routes/, services/, models/, state.rs, config.rs
  • REST endpoints: POST/GET/DELETE /api/v1/crawls, /crawls/:id/graph, /crawls/:id/stats, /crawls/:id/ws
  • Pagination support for crawl listing (limit, offset query params)
  • Liveness (/livez) and readiness (/readyz) probe endpoints (K8s best practices)
  • Input validation: crawl depth restricted to 1–5 (values outside the range are rejected with 400 Bad Request)
  • CANCELLED status tracking across all query endpoints (get_crawl_progress, list_crawls, get_crawl_stats)
  • Status derivation: running → completed / cancelled (cancelled when all jobs are cancelled and none completed)
  • Atomic cancel operation (single Cypher query, no race window with feeders)
  • WebSocket closes on both completed and cancelled status
  • CORS and gzip compression middleware
  • Default logging changed from tower_http=debug to tower_http=info
  • Removed static file serving (now handled by frontend nginx)
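
The status derivation rule in the list above can be sketched roughly as follows. This is only an illustration: the `JobCounts` struct, its field names, and the `derive_status` function are hypothetical and not taken from the manager's code, and the sketch deliberately ignores failed jobs because the description does not say how they are treated.

```rust
/// Hypothetical per-crawl job counts; the field names are assumptions.
#[derive(Debug, Clone, Copy, Default)]
pub struct JobCounts {
    pub pending: u64,
    pub in_progress: u64,
    pub completed: u64,
    pub cancelled: u64,
}

/// The three statuses named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrawlStatus {
    Running,
    Completed,
    Cancelled,
}

/// Derives a crawl-level status from job counts.
pub fn derive_status(counts: JobCounts) -> CrawlStatus {
    if counts.pending > 0 || counts.in_progress > 0 {
        // Work is still outstanding, so the crawl is running.
        CrawlStatus::Running
    } else if counts.cancelled > 0 && counts.completed == 0 {
        // Nothing finished and at least one job was cancelled.
        CrawlStatus::Cancelled
    } else {
        // Everything else, including an empty crawl, counts as completed here.
        CrawlStatus::Completed
    }
}
```

Under these assumptions, a crawl that finished some jobs before being cancelled still reports completed, which matches the "none completed" condition in the rule.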

Feeder improvements

  • Added HTTP health probe server on port 8081 (/livez) for Kubernetes liveness checks
  • URL deduplication scoped by crawl_id (independent crawls no longer interfere)
  • Configurable environment variables for poll intervals, stale timeout, max depth
  • Graceful shutdown resets claimed jobs to PENDING
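
To illustrate why scoping deduplication by crawl_id keeps independent crawls from interfering, here is a minimal in-memory sketch. The feeder performs this check with a Neo4j query, so the `SeenUrls` type below is a hypothetical stand-in, and only the name `filter_new_urls` is borrowed from the commit messages.

```rust
use std::collections::HashSet;

/// In-memory stand-in for the URLs already stored in the graph.
/// Keys are (crawl_id, url) pairs, so one URL may appear once per crawl.
#[derive(Default)]
pub struct SeenUrls {
    seen: HashSet<(String, String)>,
}

impl SeenUrls {
    /// Returns the URLs not yet recorded for `crawl_id` and records them.
    pub fn filter_new_urls(&mut self, crawl_id: &str, urls: &[&str]) -> Vec<String> {
        urls.iter()
            .copied()
            // `insert` returns false when the pair is already present,
            // which drops duplicates within the same crawl only.
            .filter(|url| self.seen.insert((crawl_id.to_string(), url.to_string())))
            .map(|url| url.to_string())
            .collect()
    }
}
```

With a key of the URL alone, the second crawl of the same site would see every URL as already known and discover nothing; keying on the pair avoids that.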

Shared library additions

  • error.rs: Unified CrawlerError type with HTTP status code mapping
  • schema.rs: Idempotent Neo4j index/constraint creation on startup (includes composite index for exact URL lookups)
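
A unified error type with HTTP status code mapping might look roughly like the sketch below. The variants and the specific status codes are assumptions made for illustration; the actual `CrawlerError` in error.rs may define different variants and mappings.

```rust
use std::fmt;

/// Hypothetical variants; the real `CrawlerError` may differ.
#[derive(Debug)]
pub enum CrawlerError {
    Validation(String),
    NotFound(String),
    Database(String),
}

impl CrawlerError {
    /// Maps each error variant to the HTTP status an API layer could return.
    pub fn status_code(&self) -> u16 {
        match self {
            CrawlerError::Validation(_) => 400,
            CrawlerError::NotFound(_) => 404,
            CrawlerError::Database(_) => 500,
        }
    }
}

impl fmt::Display for CrawlerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CrawlerError::Validation(msg) => write!(f, "validation error: {msg}"),
            CrawlerError::NotFound(msg) => write!(f, "not found: {msg}"),
            CrawlerError::Database(msg) => write!(f, "database error: {msg}"),
        }
    }
}

impl std::error::Error for CrawlerError {}
```

Keeping the mapping on the error type means route handlers can convert any failure into a response without repeating the status logic.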

CI/CD pipeline (new)

  • GitHub Actions workflow for master merges
  • Independent versioning per service via conventional commits
  • Docker image builds pushed to GHCR (ghcr.io/bluedotiya/web-crawler/{service})
  • Helm chart packaged and published as OCI artifact
  • Frontend CI job: lint, type-check, build

Helm charts

  • All liveness probes use /livez, all readiness probes use /readyz
  • Frontend readiness probe disabled (no meaningful ready vs alive distinction for static nginx)
  • Probe paths consistent across subchart and parent values.yaml

Documentation (11 new files)

  • Rewritten README.md with architecture diagram, screenshots, quick start guide
  • Corrected security section: removed false robots.txt claim, replaced with honest description
  • docs/architecture.md: System overview, data flow, concurrency model, WebSocket protocol
  • docs/neo4j-graph-model.md: Node schemas, status lifecycle, indexes aligned with schema.rs
  • docs/api-reference.md: All endpoints with request/response schemas, cancelled status documented, depth constraint (1–5)
  • docs/deployment.md: Helm install (OCI + local), full config reference, K8s architecture
  • docs/development.md: Local dev workflow, actual Helm commands (removed non-existent ./dev.sh reference)
  • 6 screenshots captured from the live frontend

Cargo workspace

  • Explanatory comments on all reqwest = { workspace = true, default-features = false } lines
  • Root Cargo.toml comment explaining default-features = false is ignored at workspace level

Other changes

  • .dockerignore to reduce build context
  • Updated .pre-commit-config.yaml (removed deleted dev.sh hook, added frontend lint)
  • Removed dev.sh integration test script (superseded by CI/CD)
  • vite-env.d.ts comment explaining its purpose

Test plan

  • cd frontend && npm run dev — Vite proxy works for local dev
  • docker build -t frontend-test frontend/ — Frontend image builds
  • docker build -t manager-test -f manager/Dockerfile . — Manager image builds
  • Deploy to minikube: dashboard loads at NodePort 30080
  • Create a crawl via the UI, verify WebSocket progress updates
  • Verify graph visualization renders with interactive controls
  • Verify crawl list shows progress bars with failed (red) segments
  • cargo test --workspace — All 19 Rust tests pass
  • cd frontend && npm run lint && npm run type-check — Frontend checks pass
  • All Mermaid diagrams in docs render correctly on GitHub
  • Manager /livez returns 200 {"status":"ok"}
  • Manager /readyz returns 200 {"status":"ready"}
  • Feeder /livez on port 8081 returns 200 {"status":"ok"}
  • Frontend nginx /livez returns 200 ok
  • Old /health and /ready paths return 404
  • Atomic cancel: create deep crawl → immediate cancel → status is "cancelled"
  • cancelled field present in crawl progress, list, and stats responses
  • ?status=cancelled filter works in list crawls endpoint
  • Depth validation: depth=0 → 400, depth=6 → 400, depth=1 → 201, depth=5 → 201
  • Cross-crawl isolation: two crawls for same URL both complete with independent URL sets
  • All pods stable with 0 restarts after deployment
  • All 29 PR review comments resolved
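
The manager probe expectations in the test plan can be expressed as a small path-to-response mapping. This is a sketch, not the manager's routing code: the `probe_response` function and its `ready` flag are hypothetical, and the 503 body for a not-ready service is an assumption, since the test plan only covers the ready case.

```rust
/// Maps a request path to (HTTP status, body), mirroring the manager
/// expectations in the test plan. `ready` is a hypothetical readiness flag.
pub fn probe_response(path: &str, ready: bool) -> (u16, &'static str) {
    match path {
        "/livez" => (200, r#"{"status":"ok"}"#),
        "/readyz" if ready => (200, r#"{"status":"ready"}"#),
        // Assumption: a not-ready service answers 503.
        "/readyz" => (503, r#"{"status":"not ready"}"#),
        // The old /health and /ready paths fall through to 404.
        _ => (404, ""),
    }
}
```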

🤖 Generated with Claude Code

ChefControl and others added 5 commits February 13, 2026 20:49
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add models for crawl requests, responses, and progress tracking.
- Create routes for managing crawls, including creation, deletion, and status retrieval.
- Introduce WebSocket support for real-time updates on crawl progress.
- Implement services for handling crawl logic and interactions with Neo4j.
- Add health check endpoints for application readiness and status.
- Enhance error handling in the crawler with detailed error types.
- Update shared library to include new error handling and schema management.
- Modify deployment configurations to use secrets for sensitive data.
- Enable health checks and readiness probes in Kubernetes configurations.
- Implemented Dockerfile for frontend build and nginx serving.
- Created nginx configuration for SPA fallback and API proxying.
- Added .dockerignore to exclude unnecessary files from Docker context.
- Updated GitHub Actions workflow to build and publish frontend Docker image.
- Introduced Helm chart for frontend deployment with necessary templates and values.
- Modified manager service to remove frontend build steps and adjust service type.
- Enhanced API response models to include failed crawl counts.
- Updated frontend components to display failed crawl counts and adjust refetch logic.
- Removed unused job_status field from UrlJob struct in feeder.
- Deleted obsolete dev.sh script for local development.
- Introduced a new `api-reference.md` file detailing all API endpoints for crawls, including request/response formats and error handling.
- Added sections for creating, listing, getting details, canceling crawls, and retrieving crawl statistics and graphs.

docs(architecture): Document system architecture and service responsibilities

- Created `architecture.md` to outline the system architecture, including service interactions and responsibilities of the frontend, manager, feeders, and Neo4j.
- Included diagrams to visualize data flow and service interactions.

docs(deployment): Provide deployment instructions using Helm

- Added `deployment.md` to guide users through deploying the web crawler on Kubernetes using Helm.
- Included prerequisites, installation steps, service access details, and configuration options.

docs(development): Outline development setup and workflow

- Created `development.md` to detail prerequisites, repository structure, local development workflow, and CI/CD pipeline.
- Included instructions for backend and frontend development, testing, and pre-commit hooks.

docs(neo4j): Describe Neo4j graph model used in the crawler

- Introduced `neo4j-graph-model.md` to explain the data model, node labels, relationship types, and status lifecycle in Neo4j.
- Provided example Cypher queries for common operations.

feat(images): Add images for documentation

- Added various images to enhance documentation, including crawl detail progress, list, stats, dashboard, graph visualization, and new crawl UI.
@ChefControl changed the title from "feat: decouple frontend from manager, add graph/progress fixes" to "feat: add React frontend, REST API, CI/CD pipeline, and project documentation" on Feb 14, 2026
…ookups

Remove root_name_depth index (0 reads in production, no query uses it).
Add url_name_http_crawl composite index on (name, http_type, crawl_id)
to speed up status updates, cancellation checks, and reset-to-pending
operations in the feeder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
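
For reference, an idempotent composite-index statement of the kind this commit describes could be generated as sketched below. The helper `composite_index_cypher` is hypothetical, and the `URL` node label in the usage note is an assumption; only the index name and property list come from the commit message.

```rust
/// Builds a Neo4j composite-index statement that is safe to run on every
/// startup because of `IF NOT EXISTS`. Inputs are interpolated without
/// escaping, so they must be trusted identifiers.
pub fn composite_index_cypher(index: &str, label: &str, props: &[&str]) -> String {
    let columns: Vec<String> = props.iter().map(|p| format!("n.{}", p)).collect();
    format!(
        "CREATE INDEX {} IF NOT EXISTS FOR (n:{}) ON ({})",
        index,
        label,
        columns.join(", ")
    )
}
```

Calling `composite_index_cypher("url_name_http_crawl", "URL", &["name", "http_type", "crawl_id"])` yields `CREATE INDEX url_name_http_crawl IF NOT EXISTS FOR (n:URL) ON (n.name, n.http_type, n.crawl_id)`.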

@ChefControl (Collaborator, Author) left a comment

Need to fix comments before merging to master

Comment thread feeder/src/health.rs Outdated
Comment thread feeder/Cargo.toml
Comment thread frontend/public/vite.svg Outdated
Comment thread frontend/src/vite-env.d.ts
Comment thread frontend/index.html Outdated
Comment thread README.md
Comment thread README.md Outdated
Comment thread docs/api-reference.md
Comment thread docs/development.md Outdated
Comment thread docs/neo4j-graph-model.md
ChefControl and others added 8 commits February 15, 2026 03:03
- Rename health probes to /livez and /readyz (Kubernetes convention)
- Replace two-step cancel with atomic Cypher query to eliminate race condition
- Add CANCELLED status tracking to all query endpoints and models
- Close WebSocket on cancelled status in addition to completed
- Add explanatory comments for per-crate default-features = false
- Change tower_http log level from debug to info
- Add CORS rationale comment
- Fix false robots.txt claim in README
- Update API docs with new health paths and cancelled field
- Replace non-existent ./dev.sh with actual Helm deploy commands
- Align neo4j-graph-model.md indexes with schema.rs
- Add /livez and /readyz nginx locations for frontend health probes
- Update all Helm chart probe paths
- Add vite-env.d.ts explanation comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No meaningful ready vs alive distinction for a static file server.
Removes unused /readyz nginx location block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reject depth outside 1-5 with 400 Bad Request
- Scope filter_new_urls query by crawl_id so independent crawls
  don't skip URLs that exist in other crawls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
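
The depth rule from this commit, together with the status codes listed in the test plan, can be sketched as follows. The function and constant names are hypothetical, and the error message wording is an assumption.

```rust
pub const MIN_DEPTH: u32 = 1;
pub const MAX_DEPTH: u32 = 5;

/// Accepts a depth in the inclusive range 1..=5 and rejects anything else
/// instead of clamping it.
pub fn validate_depth(depth: u32) -> Result<u32, String> {
    if (MIN_DEPTH..=MAX_DEPTH).contains(&depth) {
        Ok(depth)
    } else {
        Err(format!(
            "depth must be between {} and {}, got {}",
            MIN_DEPTH, MAX_DEPTH, depth
        ))
    }
}

/// Status code a create-crawl handler could return for a given depth:
/// 201 Created when valid, 400 Bad Request otherwise.
pub fn create_crawl_status(depth: u32) -> u16 {
    match validate_depth(depth) {
        Ok(_) => 201,
        Err(_) => 400,
    }
}
```

Rejecting rather than clamping makes an out-of-range request visible to the caller instead of silently running a shallower or deeper crawl than requested.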

@ChefControl merged commit da46ce3 into master on Feb 15, 2026
2 checks passed
@ChefControl deleted the frontend-ui-ux branch on February 15, 2026 01:54