From d4d711adf1d1b7cc9d56e1ce94af60e848a4336a Mon Sep 17 00:00:00 2001
From: bluedotiya <ooorannn@gmail.com>
Date: Sun, 22 Feb 2026 19:12:05 +0200
Subject: [PATCH 1/3] feat: improve graph visualization, crawl validation, URL
 normalization, and docs

- GraphView: radial depth-ring layout with root pinned at center, responsive width, click-to-highlight neighbors with detail panel, status-colored nodes
- Crawl depth validation (1-5), URL dedup scoped by crawl_id
- Feeder: stale job reclamation for stuck IN-PROGRESS jobs
- NewCrawl: add targeted crawl toggle (only follow links within the root URL's registered domain)
- URL normalization module with comprehensive tests
- Add project vision doc, update API reference docs
- Add CLAUDE.md project instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .claude/settings.json                 |   6 +
 CLAUDE.md                             |  85 ++++++++++++++
 Cargo.lock                            |  22 +++-
 docs/api-reference.md                 |   9 +-
 docs/project-vision.md                |  47 ++++++++
 feeder/src/job.rs                     |  34 +++++-
 feeder/src/main.rs                    |   2 +
 frontend/package-lock.json            |  23 ++++
 frontend/package.json                 |   2 +
 frontend/src/components/GraphView.tsx | 154 ++++++++++++++++++++++++--
 frontend/src/lib/api.ts               |   5 +-
 frontend/src/pages/CrawlDetail.tsx    |  14 ++-
 frontend/src/pages/CrawlList.tsx      |   5 +
 frontend/src/pages/NewCrawl.tsx       |  31 +++++-
 frontend/src/types/api.ts             |   2 +
 manager/src/models/crawl.rs           |   4 +
 manager/src/routes/crawl.rs           |  35 +++++-
 manager/src/services/crawl_service.rs |  23 +++-
 shared/Cargo.toml                     |   1 +
 shared/src/url_normalize.rs           |  99 +++++++++++++++++
 20 files changed, 573 insertions(+), 30 deletions(-)
 create mode 100644 .claude/settings.json
 create mode 100644 CLAUDE.md
 create mode 100644 docs/project-vision.md

diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000..6119f7c
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,6 @@
+{
+  "enabledPlugins": {
+    "playwright-skill@playwright-skill": true,
+    "skill-creator@claude-plugins-official": true
+  }
+}
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..c8483a2
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,85 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Build & Test Commands
+
+### Rust
+```bash
+cargo check --workspace          # Fast compilation check
+cargo build --release             # Release build (LTO + stripped)
+cargo test --workspace            # Run all tests
+cargo test -p shared              # Test single crate
+cargo clippy --workspace -- -D warnings  # Lint (CI-strict)
+```
+
+### Frontend (from `frontend/`)
+```bash
+npm install                       # Install deps
+npm run dev                       # Dev server on :3000 (proxies /api to :8080)
+npm run build                     # Type-check + production build
+npm run lint                      # ESLint
+npm run type-check                # TypeScript check only
+```
+
+### Docker (from repo root; for local k8s, run `eval $(minikube docker-env)` first)
+```bash
+docker build -t ghcr.io/bluedotiya/web-crawler/manager:latest -f manager/Dockerfile .
+docker build -t ghcr.io/bluedotiya/web-crawler/feeder:latest -f feeder/Dockerfile .
+docker build -t ghcr.io/bluedotiya/web-crawler/frontend:latest -f frontend/Dockerfile .
+```
+
+## Architecture
+
+Three services communicate through a shared Neo4j database (no direct inter-service HTTP):
+
+- **manager** — Axum HTTP server (port 8080). REST API at `/api/v1/crawls/*` + WebSocket for live progress. Creates ROOT nodes and initial URL children when a crawl is submitted.
+- **feeder** — Background workers (8 replicas). Poll Neo4j for PENDING URLs, fetch HTML, extract links, create child nodes. Atomic job claiming prevents worker conflicts.
+- **frontend** — React SPA (Vite/TypeScript/Tailwind). Served by nginx in production, proxied via Vite in dev. Uses React Query for polling and WebSocket for real-time updates.
+- **shared** — Rust library crate used by both manager and feeder. Contains: crawler (HTTP fetch + URL extraction), dns (resolution with iterative domain shortening), neo4j_client, url_normalize, schema (indexes/constraints), error types.
+
+### Data Flow
+1. User submits URL + depth (1-5) via frontend → POST `/api/v1/crawls`
+2. Manager normalizes URL, resolves DNS, creates ROOT + child URL nodes in Neo4j
+3. Feeder workers atomically claim PENDING URLs, fetch HTML, extract/deduplicate links, create children
+4. Frontend polls progress via REST (5s) or WebSocket (2s) and displays a force-directed graph visualization
+
+### Neo4j Data Model
+- **ROOT** node (one per crawl, unique on `crawl_id`) — the seed URL
+- **URL** nodes — discovered links with `job_status` (PENDING/IN-PROGRESS/COMPLETED/FAILED/CANCELLED)
+- **Lead** edges — parent → child link relationships
+- All nodes scoped by `crawl_id` for isolation between crawls
+
+## Key Conventions
+
+- **Conventional commits** required on PR titles: `feat:`, `fix:`, `chore:`, etc. (enforced by CI). Breaking changes use a `!` suffix (e.g., `feat!:`). These prefixes drive automated semver and per-service tagging.
+- **Pre-commit hooks**: `cargo check`, `cargo clippy -D warnings`, `cargo test`, frontend lint+typecheck. Install: `pip install pre-commit && pre-commit install`
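+- **Workspace dependency gotcha**: `default-features = false` on a member's `{ workspace = true }` dependency is ignored unless the `[workspace.dependencies]` entry also sets it, so set it in the workspace entry and check `cargo tree -e features` for defaults re-enabled elsewhere in the graph.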
+- **TLS in containers**: Use `rustls-tls-webpki-roots` (bundles CAs in binary). Avoid `native-tls` or `native-roots` in slim Docker images.
+- **HTTP clients** in both feeder and manager must set `.user_agent(...)` to avoid 403 responses.
+- **TypeScript**: Strict mode enabled, no unused locals/parameters. Path alias `@/` → `./src/`.
+- **Docker images** must use full GHCR path (`ghcr.io/bluedotiya/web-crawler/{service}:tag`) to match k8s deployment specs.
+
+## API Routes (manager)
+
+| Method | Endpoint | Purpose |
+|--------|----------|---------|
+| POST | `/api/v1/crawls` | Create new crawl |
+| GET | `/api/v1/crawls` | List crawls (filter/pagination) |
+| GET | `/api/v1/crawls/{id}` | Get crawl progress |
+| DELETE | `/api/v1/crawls/{id}` | Cancel crawl |
+| GET | `/api/v1/crawls/{id}/graph` | Graph data (nodes + edges) |
+| GET | `/api/v1/crawls/{id}/stats` | Crawl statistics |
+| GET | `/api/v1/crawls/{id}/ws` | WebSocket for live updates |
+| GET | `/livez`, `/readyz` | Health probes |
+
+## Project Layout
+
+```
+shared/src/          → lib.rs, crawler.rs, dns.rs, neo4j_client.rs, url_normalize.rs, schema.rs, error.rs
+manager/src/         → main.rs, config.rs, routes/{crawl,status,graph,ws}.rs, services/{crawl,graph}_service.rs
+feeder/src/          → main.rs, config.rs, job.rs
+frontend/src/        → App.tsx, pages/{Dashboard,CrawlList,CrawlDetail,NewCrawl}.tsx, components/GraphView.tsx, lib/api.ts, hooks/useWebSocket.ts
+web-crawler/         → Helm parent chart (neo4j, manager, feeder, frontend subcharts)
+docs/                → architecture.md, api-reference.md, neo4j-graph-model.md, deployment.md, development.md
+```
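The job lifecycle described in CLAUDE.md can be sketched in memory. It covers PENDING, IN-PROGRESS, and COMPLETED/FAILED states, atomic claiming, and the stale-job reclamation this patch adds to the feeder. This is a hedged TypeScript model, not the feeder's Cypher. `initialStatus` mirrors the `CASE WHEN $cur_depth = $req_depth THEN 'COMPLETED' ELSE 'PENDING' END` rule in `batch_create_children`. The names `claimedAtMs`, `isClaimable`, and `claim`, and the millisecond unit for the stale timeout, are hypothetical, because the actual claim query is not part of this patch.

```typescript
// Hypothetical in-memory model of a URL job; the real state lives on Neo4j URL nodes.
type JobStatus = "PENDING" | "IN-PROGRESS" | "COMPLETED" | "FAILED" | "CANCELLED";

interface UrlJob {
  status: JobStatus;
  currentDepth: number;
  requestedDepth: number;
  claimedAtMs?: number; // hypothetical: when a worker last moved the job to IN-PROGRESS
}

// Mirrors the feeder's child-creation rule: a child created at the requested
// depth is a leaf, so it starts out COMPLETED and is never handed to a worker.
function initialStatus(currentDepth: number, requestedDepth: number): JobStatus {
  return currentDepth === requestedDepth ? "COMPLETED" : "PENDING";
}

// A job is claimable if it is PENDING, or if it is stuck IN-PROGRESS because
// its claim is older than the stale timeout (e.g. the worker crashed).
function isClaimable(job: UrlJob, nowMs: number, staleTimeoutMs: number): boolean {
  if (job.status === "PENDING") return true;
  return (
    job.status === "IN-PROGRESS" &&
    job.claimedAtMs !== undefined &&
    nowMs - job.claimedAtMs > staleTimeoutMs
  );
}

// Check-and-set in one step. Against Neo4j this must be a single atomic write
// so two workers cannot both win; this in-memory version is single-process only.
function claim(job: UrlJob, nowMs: number, staleTimeoutMs: number): boolean {
  if (!isClaimable(job, nowMs, staleTimeoutMs)) return false;
  job.status = "IN-PROGRESS";
  job.claimedAtMs = nowMs;
  return true;
}
```

Refreshing `claimedAtMs` on every claim matters for reclamation. A reclaimed job gets a fresh timeout instead of being immediately stealable by a third worker.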
diff --git a/Cargo.lock b/Cargo.lock
index c45af5e..25d8cd8 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -859,7 +859,7 @@ dependencies = [
  "libc",
  "percent-encoding",
  "pin-project-lite",
- "socket2 0.5.10",
+ "socket2 0.6.2",
  "tokio",
  "tower-service",
  "tracing",
@@ -1466,6 +1466,21 @@ dependencies = [
  "unicode-ident",
 ]
 
+[[package]]
+name = "psl"
+version = "2.1.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "66fed3dc7578357ff12137c75eac73413b6aba9a7204916c19f2a0e9e1e920e0"
+dependencies = [
+ "psl-types",
+]
+
+[[package]]
+name = "psl-types"
+version = "2.0.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "33cb294fe86a74cbcf50d4445b37da762029549ebeea341421c7c70370f86cac"
+
 [[package]]
 name = "quote"
 version = "1.0.44"
@@ -1654,7 +1669,7 @@ dependencies = [
  "errno",
  "libc",
  "linux-raw-sys",
- "windows-sys 0.52.0",
+ "windows-sys 0.61.2",
 ]
 
 [[package]]
@@ -1869,6 +1884,7 @@ dependencies = [
  "futures",
  "hickory-resolver",
  "neo4rs",
+ "psl",
  "regex",
  "reqwest",
  "thiserror 2.0.18",
@@ -2006,7 +2022,7 @@ dependencies = [
  "getrandom 0.3.4",
  "once_cell",
  "rustix",
- "windows-sys 0.52.0",
+ "windows-sys 0.61.2",
 ]
 
 [[package]]
diff --git a/docs/api-reference.md b/docs/api-reference.md
index 41dbfc5..db82200 100644
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -20,13 +20,14 @@ Start a new crawl from a given URL.
 |-------|------|----------|-------------|
 | `url` | string | Yes | The URL to crawl (must be http or https) |
 | `depth` | integer | Yes | Maximum link depth to follow (1–5, where 1 = root only) |
+| `targeted` | boolean | No | When `true`, only follow links within the same registered domain (eTLD+1) as the root URL. Defaults to `false`. |
 
 **Example:**
 
 ```bash
 curl -X POST http://localhost:8080/api/v1/crawls \
   -H 'Content-Type: application/json' \
-  -d '{"url": "https://example.com", "depth": 2}'
+  -d '{"url": "https://example.com", "depth": 2, "targeted": true}'
 ```
 
 **Response:** `201 Created`
@@ -84,7 +85,8 @@ curl "http://localhost:8080/api/v1/crawls?status=running&limit=10"
       "total": 42,
       "completed": 40,
       "failed": 2,
-      "cancelled": 0
+      "cancelled": 0,
+      "targeted": true
     }
   ],
   "total": 1,
@@ -128,7 +130,8 @@ curl http://localhost:8080/api/v1/crawls/d262a3e7-19de-437f-b0a4-cf1d689b1caf
   "failed": 60,
   "cancelled": 0,
   "root_url": "https://example.com",
-  "requested_depth": 3
+  "requested_depth": 3,
+  "targeted": false
 }
 ```
 
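With `targeted: true`, links are kept only if they share the root URL's registered domain (eTLD+1). The TypeScript sketch below shows that comparison under a simplified assumption: a tiny hard-coded suffix set stands in for the full Public Suffix List, and wildcard (`*.`) and exception (`!`) rules are ignored. The feeder itself calls `url_normalize::is_same_registered_domain`, presumably backed by the `psl` crate this patch adds. `registeredDomain` and `isSameRegisteredDomain` are hypothetical helpers that take bare hostnames.

```typescript
// Returns the registered domain (eTLD+1) of `host`, or null when the host is
// itself a public suffix. `suffixes` is a toy stand-in for the Public Suffix List.
function registeredDomain(host: string, suffixes: ReadonlySet<string>): string | null {
  const labels = host.toLowerCase().replace(/\.$/, "").split(".");
  // PSL default rule "*": if nothing matches, the last label is the suffix.
  let suffixStart = labels.length - 1;
  // Scan from the left so the first hit is the longest matching suffix.
  for (let i = 0; i < labels.length; i++) {
    if (suffixes.has(labels.slice(i).join("."))) {
      suffixStart = i;
      break;
    }
  }
  if (suffixStart === 0) return null; // no label left in front of the suffix
  return labels.slice(suffixStart - 1).join(".");
}

// Two hosts match when both have a registered domain and it is the same one.
function isSameRegisteredDomain(
  host: string,
  targetDomain: string,
  suffixes: ReadonlySet<string>
): boolean {
  const a = registeredDomain(host, suffixes);
  return a !== null && a === registeredDomain(targetDomain, suffixes);
}
```

Under this rule `blog.example.com` and `shop.example.com` match, while `example.co.uk` and `other.co.uk` do not. That is why a suffix list is needed rather than simply comparing the last two labels.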
diff --git a/docs/project-vision.md b/docs/project-vision.md
new file mode 100644
index 0000000..f00127b
--- /dev/null
+++ b/docs/project-vision.md
@@ -0,0 +1,47 @@
+# Web crawler vision
+Create a free, open-source, deployable platform for Red & Blue teams that want to discover the web attack surface of their applications.
+
+## About
+This file provides general guidelines for development: when design decisions are made, this doc defines the "spirit" they should follow.
+
+## My philosophy
+1. Don't reinvent the wheel - There is code written by smarter people than you. Be humble and use well-established code and tools.
+2. Open Source - This platform should be open and transparent for everyone to contribute, share, and use.
+3. Respect others - Use this platform for the betterment of software and products. Make the world better than you found it.
+4. Have fun - The process of creating things should be fun. There will be chores, but enjoy the process.
+
+
+## Design Principles (Derived from above)
+These principles are a collection of coding and design rules I have come across and found to work; many are borrowed from other people's design principles.
+
+---
+
+### Don't reinvent the wheel
+
+#### Adopt mainstream tools
+Use well-established tools from other open-source projects. Only create custom tools when it's absolutely necessary.
+
+#### Keep it simple, stupid
+Keep the project as simple as possible. The more moving parts, the less scalable it becomes, and the more things break.
+
+### Open Source
+
+#### All source code is public
+The project aims to be an open-source platform for blue and red teams that anyone can contribute to.
+
+#### All source code should be free for individuals
+This platform should always be free for individuals, and for the foreseeable future, for anyone. The code license should reflect that.
+
+### Respect others
+
+#### Respectful crawling
+The crawler should apply rate limiting, robots.txt awareness, and a polite user-agent string by default, so that it is hard to misuse for DoS or other abuse.
+
+### Have fun
+
+#### Visualization graph should be fun to use and explore
+The visuals and tools for exploring the graph should be fun for the user, possibly gamified.
+
+#### Project theme should be fun
+The theme of this project should be cartoony, playful, and fun. The main theme is cobweb (as it's a crawler).
+
diff --git a/feeder/src/job.rs b/feeder/src/job.rs
index c13e16b..b5e1f93 100644
--- a/feeder/src/job.rs
+++ b/feeder/src/job.rs
@@ -16,6 +16,8 @@ pub struct UrlJob {
     pub current_depth: i64,
     pub attempts: Option<i64>,
     pub crawl_id: String,
+    pub targeted: bool,
+    pub target_domain: String,
 }
 
 /// Represents a child node to be created in Neo4j.
@@ -28,6 +30,8 @@ struct ChildNode {
     current_depth: i64,
     request_time: String,
     crawl_id: String,
+    targeted: bool,
+    target_domain: String,
 }
 
 /// Atomically fetches and claims a single URL job from Neo4j.
@@ -64,6 +68,8 @@ pub async fn fetch_job(graph: &Graph, stale_timeout: i64) -> Result<Option<UrlJo
                 current_depth: node.get("current_depth")?,
                 attempts: node.get::<i64>("attempts").ok(),
                 crawl_id: node.get("crawl_id").unwrap_or_default(),
+                targeted: node.get::<bool>("targeted").unwrap_or(false),
+                target_domain: node.get::<String>("target_domain").unwrap_or_default(),
             }))
         }
         None => Ok(None),
@@ -181,7 +187,8 @@ async fn batch_create_children(
                  ON CREATE SET c.ip = $ip, c.domain = $domain, \
                      c.job_status = CASE WHEN $cur_depth = $req_depth THEN 'COMPLETED' ELSE 'PENDING' END, \
                      c.requested_depth = $req_depth, \
-                     c.current_depth = $cur_depth, c.request_time = $req_time \
+                     c.current_depth = $cur_depth, c.request_time = $req_time, \
+                     c.targeted = $targeted, c.target_domain = $target_domain \
                  MERGE (p)-[:Lead]->(c)",
             )
             .param("pname", parent.name.as_str())
@@ -194,7 +201,9 @@ async fn batch_create_children(
             .param("http_type", child.http_type.as_str())
             .param("req_depth", child.requested_depth)
             .param("cur_depth", child.current_depth)
-            .param("req_time", child.request_time.as_str()),
+            .param("req_time", child.request_time.as_str())
+            .param("targeted", child.targeted)
+            .param("target_domain", child.target_domain.as_str()),
         )
         .await?;
     }
@@ -282,8 +291,21 @@ pub async fn feeding(
     // Step 2: Extract URLs from HTML
     let extracted_urls = crawler::extract_urls(&page_data.html);
 
+    // Step 2b: Filter by target domain when targeted
+    let filtered_urls: Vec<&String> = if job.targeted && !job.target_domain.is_empty() {
+        extracted_urls
+            .iter()
+            .filter(|u| {
+                let (norm_name, _) = url_normalize::normalize_url(u);
+                url_normalize::is_same_registered_domain(&norm_name, &job.target_domain)
+            })
+            .collect()
+    } else {
+        extracted_urls.iter().collect()
+    };
+
     // Step 3: Deduplicate against existing DB nodes (server-side)
-    let upper_urls: HashSet<String> = extracted_urls.iter().map(|u| u.to_uppercase()).collect();
+    let upper_urls: HashSet<String> = filtered_urls.iter().map(|u| u.to_uppercase()).collect();
     let new_urls = filter_new_urls(graph, &upper_urls, &job.crawl_id).await?;
 
     if new_urls.is_empty() {
@@ -303,6 +325,9 @@ pub async fn feeding(
     let current_depth = job.current_depth;
     let crawl_id = job.crawl_id.clone();
 
+    let targeted = job.targeted;
+    let target_domain = job.target_domain.clone();
+
     let dns_futures: Vec<_> = normalized
         .iter()
         .map(|(name, http_type)| {
@@ -310,6 +335,7 @@ pub async fn feeding(
             let http_type = http_type.clone();
             let req_time = request_time.clone();
             let cid = crawl_id.clone();
+            let td = target_domain.clone();
             async move {
                 match dns::get_network_stats(resolver, &name, config.max_dns_depth).await {
                     Ok(stats) => Some(ChildNode {
@@ -321,6 +347,8 @@ pub async fn feeding(
                         current_depth: current_depth + 1,
                         request_time: req_time,
                         crawl_id: cid,
+                        targeted,
+                        target_domain: td,
                     }),
                     Err(e) => {
                         tracing::error!("URL: {} -- FAILED: {}", name, e);
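Steps 2b and 3 in `feeding` form one selection pipeline:

1. Filter by domain, but only when the job is targeted and has a non-empty `target_domain`.
2. Uppercase each URL for case-insensitive dedup.
3. Drop URLs the crawl already has.

The TypeScript sketch below makes two assumptions. An in-memory set `existingUpper` replaces the server-side `filter_new_urls` query scoped by `crawl_id`. A caller-supplied `inTarget` predicate replaces normalization plus `is_same_registered_domain`. `selectNewUrls` is a hypothetical name.

```typescript
// Hypothetical mirror of feeder Steps 2b + 3 (domain filter, then dedup).
function selectNewUrls(
  extracted: readonly string[],
  existingUpper: ReadonlySet<string>,
  targeted: boolean,
  targetDomain: string,
  inTarget: (url: string, targetDomain: string) => boolean
): Set<string> {
  // Step 2b: an empty target domain falls back to unrestricted crawling,
  // matching the `job.targeted && !job.target_domain.is_empty()` guard.
  const filtered =
    targeted && targetDomain !== ""
      ? extracted.filter((u) => inTarget(u, targetDomain))
      : extracted;
  // Step 3: uppercase keys collapse case variants; then skip URLs already stored.
  const upper = new Set(filtered.map((u) => u.toUpperCase()));
  return new Set(Array.from(upper).filter((u) => !existingUpper.has(u)));
}
```

Filtering before dedup keeps off-domain URLs out of the existence query entirely. This shrinks the `$urls` set sent to Neo4j on link-heavy pages.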
diff --git a/feeder/src/main.rs b/feeder/src/main.rs
index 7217e96..6ce4988 100644
--- a/feeder/src/main.rs
+++ b/feeder/src/main.rs
@@ -123,6 +123,8 @@ async fn main() -> anyhow::Result<()> {
             current_depth: url_job.current_depth,
             attempts: url_job.attempts,
             crawl_id: url_job.crawl_id.clone(),
+            targeted: url_job.targeted,
+            target_domain: url_job.target_domain.clone(),
         });
 
         // Check for shutdown after claiming but before processing.
diff --git a/frontend/package-lock.json b/frontend/package-lock.json
index 213b8a8..6b8917d 100644
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
@@ -18,6 +18,7 @@
         "@tanstack/react-query": "^5.62.0",
         "class-variance-authority": "^0.7.1",
         "clsx": "^2.1.1",
+        "d3-force": "^3.0.0",
         "lucide-react": "^0.460.0",
         "react": "^18.3.1",
         "react-dom": "^18.3.1",
@@ -30,6 +31,7 @@
       },
       "devDependencies": {
         "@eslint/js": "^9.15.0",
+        "@types/d3-force": "^3.0.10",
         "@types/react": "^18.3.12",
         "@types/react-dom": "^18.3.1",
         "@vitejs/plugin-react": "^4.3.4",
@@ -2355,6 +2357,13 @@
       "integrity": "sha512-NcV1JjO5oDzoK26oMzbILE6HW7uVXOHLQvHshBUW4UMdZGfiY6v5BeQwh9a9tCzv+CeefZQHJt5SRgK154RtiA==",
       "license": "MIT"
     },
+    "node_modules/@types/d3-force": {
+      "version": "3.0.10",
+      "resolved": "https://registry.npmjs.org/@types/d3-force/-/d3-force-3.0.10.tgz",
+      "integrity": "sha512-ZYeSaCF3p73RdOKcjj+swRlZfnYpK1EbaDiYICEEp5Q6sUiqFaFQ9qgoshp5CzIyyb/yD09kD9o2zEltCexlgw==",
+      "dev": true,
+      "license": "MIT"
+    },
     "node_modules/@types/d3-interpolate": {
       "version": "3.0.4",
       "resolved": "https://registry.npmjs.org/@types/d3-interpolate/-/d3-interpolate-3.0.4.tgz",
@@ -3257,6 +3266,20 @@
         "node": ">=12"
       }
     },
+    "node_modules/d3-force": {
+      "version": "3.0.0",
+      "resolved": "https://registry.npmjs.org/d3-force/-/d3-force-3.0.0.tgz",
+      "integrity": "sha512-zxV/SsA+U4yte8051P4ECydjD/S+qeYtnaIyAs9tgHCqfguma/aAQDjo85A9Z6EKhBirHRJHXIgJUlffT4wdLg==",
+      "license": "ISC",
+      "dependencies": {
+        "d3-dispatch": "1 - 3",
+        "d3-quadtree": "1 - 3",
+        "d3-timer": "1 - 3"
+      },
+      "engines": {
+        "node": ">=12"
+      }
+    },
     "node_modules/d3-force-3d": {
       "version": "3.0.6",
       "resolved": "https://registry.npmjs.org/d3-force-3d/-/d3-force-3d-3.0.6.tgz",
diff --git a/frontend/package.json b/frontend/package.json
index a78fb94..b5d1251 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -21,6 +21,7 @@
     "@tanstack/react-query": "^5.62.0",
     "class-variance-authority": "^0.7.1",
     "clsx": "^2.1.1",
+    "d3-force": "^3.0.0",
     "lucide-react": "^0.460.0",
     "react": "^18.3.1",
     "react-dom": "^18.3.1",
@@ -33,6 +34,7 @@
   },
   "devDependencies": {
     "@eslint/js": "^9.15.0",
+    "@types/d3-force": "^3.0.10",
     "@types/react": "^18.3.12",
     "@types/react-dom": "^18.3.1",
     "@vitejs/plugin-react": "^4.3.4",
diff --git a/frontend/src/components/GraphView.tsx b/frontend/src/components/GraphView.tsx
index eb7ac59..543e513 100644
--- a/frontend/src/components/GraphView.tsx
+++ b/frontend/src/components/GraphView.tsx
@@ -1,8 +1,10 @@
-import { useRef, useCallback, useMemo } from "react";
+import { useState, useRef, useCallback, useMemo, useEffect } from "react";
 import ForceGraph2D, {
   type ForceGraphMethods,
+  type LinkObject,
   type NodeObject,
 } from "react-force-graph-2d";
+import { forceRadial } from "d3-force";
 import type { GraphData } from "../types/api";
 
 interface GraphViewProps {
@@ -11,6 +13,7 @@ interface GraphViewProps {
 
 interface CrawlNode {
   label: string;
+  domain: string;
   depth: number;
   status: string;
   nodeType: string;
@@ -30,15 +33,33 @@ export function GraphView({ data }: GraphViewProps) {
   const fgRef = useRef<ForceGraphMethods<NodeObject<CrawlNode>> | undefined>(
     undefined
   );
+  const [selectedNode, setSelectedNode] = useState<string | null>(null);
+  const containerRef = useRef<HTMLDivElement>(null);
+
+  const needsRecenter = useRef(true);
+  const [containerWidth, setContainerWidth] = useState(0);
+
+  useEffect(() => {
+    const el = containerRef.current;
+    if (!el) return;
+    const observer = new ResizeObserver((entries) => {
+      setContainerWidth(entries[0].contentRect.width);
+    });
+    observer.observe(el);
+    return () => observer.disconnect();
+  }, []);
 
   const graphData = useMemo(() => {
     const nodes = data.nodes.map((n) => ({
       id: n.id,
       label: n.label,
+      domain: n.domain,
       depth: n.depth,
       status: n.status,
       nodeType: n.node_type,
-      val: n.node_type === "ROOT" ? 3 : 1,
+      val: n.node_type === "ROOT" ? 4 : n.depth === 1 ? 2.5 : n.depth === 2 ? 1.5 : 1,
+      // Pin root node at origin for stable centering
+      ...(n.node_type === "ROOT" ? { fx: 0, fy: 0 } : {}),
     }));
 
     const links = data.edges.map((e) => ({
@@ -49,6 +70,22 @@ export function GraphView({ data }: GraphViewProps) {
     return { nodes, links };
   }, [data]);
 
+  const { neighborIds, connectedLinks } = useMemo(() => {
+    if (!selectedNode) return { neighborIds: new Set<string>(), connectedLinks: new Set<string>() };
+    const nIds = new Set<string>();
+    const cLinks = new Set<string>();
+    graphData.links.forEach((link) => {
+      const src = typeof link.source === "object" ? (link.source as NodeObject<CrawlNode>).id : link.source;
+      const tgt = typeof link.target === "object" ? (link.target as NodeObject<CrawlNode>).id : link.target;
+      if (src === selectedNode || tgt === selectedNode) {
+        nIds.add(src as string);
+        nIds.add(tgt as string);
+        cLinks.add(`${src}->${tgt}`);
+      }
+    });
+    return { neighborIds: nIds, connectedLinks: cLinks };
+  }, [selectedNode, graphData]);
+
   const activeStatuses = useMemo(() => {
     const statuses = new Set<string>();
     data.nodes.forEach((n) => {
@@ -58,17 +95,60 @@ export function GraphView({ data }: GraphViewProps) {
     return Object.entries(STATUS_COLORS).filter(([s]) => statuses.has(s));
   }, [data]);
 
+  useEffect(() => {
+    const fg = fgRef.current;
+    if (!fg) return;
+
+    const ringSpacing = 120;
+
+    // Radial force: push nodes into concentric rings by depth
+    fg.d3Force(
+      "radial",
+      forceRadial(
+        (node: NodeObject<CrawlNode>) => ((node as CrawlNode).depth ?? 0) * ringSpacing,
+        0,
+        0
+      ).strength(0.8)
+    );
+
+    // Link distance based on depth
+    fg.d3Force("link")?.distance(
+      (link: LinkObject<CrawlNode>) => {
+        const src = link.source as NodeObject<CrawlNode>;
+        const tgt = link.target as NodeObject<CrawlNode>;
+        return 30 + Math.abs((tgt.depth ?? 0) - (src.depth ?? 0)) * 60;
+      }
+    );
+
+    // Stronger charge to spread nodes within rings
+    fg.d3Force("charge")?.strength(-80);
+
+    needsRecenter.current = true;
+    fg.d3ReheatSimulation();
+  }, [graphData]);
+
   const handleEngineStop = useCallback(() => {
-    if (fgRef.current) {
-      fgRef.current.zoomToFit(400);
-    }
+    const fg = fgRef.current;
+    if (!fg || !needsRecenter.current) return;
+    needsRecenter.current = false;
+
+    // Root is pinned at (0,0); zoomToFit then re-centers on the bounding box of all nodes.
+    fg.centerAt(0, 0);
+    fg.zoomToFit(400, 40);
   }, []);
 
   const nodeColor = useCallback(
     (node: NodeObject<CrawlNode>) => {
-      return STATUS_COLORS[node.status || ""] || "#9ca3af";
+      const base = STATUS_COLORS[node.status || ""] || "#9ca3af";
+      if (!selectedNode) return base;
+      if (node.id === selectedNode || neighborIds.has(node.id as string)) return base;
+      // Dim unrelated nodes: parse hex to rgba with low opacity
+      const r = parseInt(base.slice(1, 3), 16);
+      const g = parseInt(base.slice(3, 5), 16);
+      const b = parseInt(base.slice(5, 7), 16);
+      return `rgba(${r},${g},${b},0.2)`;
     },
-    []
+    [selectedNode, neighborIds]
   );
 
   const nodeLabel = useCallback(
@@ -88,6 +168,7 @@ export function GraphView({ data }: GraphViewProps) {
 
   return (
     <div
+      ref={containerRef}
       className="relative border rounded-lg overflow-hidden bg-gray-900"
       style={{ height: 600 }}
     >
@@ -97,13 +178,46 @@ export function GraphView({ data }: GraphViewProps) {
         nodeColor={nodeColor}
         nodeLabel={nodeLabel}
         nodeRelSize={6}
-        linkColor={() => "rgba(255,255,255,0.15)"}
+        onNodeClick={(node: NodeObject<CrawlNode>) => {
+          setSelectedNode(node.id === selectedNode ? null : (node.id as string));
+        }}
+        onBackgroundClick={() => setSelectedNode(null)}
+        nodeCanvasObjectMode={() => selectedNode ? ("after" as const) : undefined}
+        nodeCanvasObject={(node: NodeObject<CrawlNode>, ctx, globalScale) => {
+          if (node.id !== selectedNode) return;
+          const r = Math.sqrt(node.val ?? 1) * 6 + 2;
+          ctx.beginPath();
+          ctx.arc(node.x!, node.y!, r, 0, 2 * Math.PI);
+          ctx.strokeStyle = "#ffffff";
+          ctx.lineWidth = 2 / globalScale;
+          ctx.stroke();
+        }}
+        linkColor={(link: LinkObject<CrawlNode>) => {
+          if (selectedNode) {
+            const src = typeof link.source === "object" ? (link.source as NodeObject<CrawlNode>).id : link.source;
+            const tgt = typeof link.target === "object" ? (link.target as NodeObject<CrawlNode>).id : link.target;
+            const key = `${src}->${tgt}`;
+            return connectedLinks.has(key) ? "rgba(255,255,255,0.6)" : "rgba(255,255,255,0.03)";
+          }
+          const depth = Math.max(
+            (link.source as NodeObject<CrawlNode>)?.depth ?? 0,
+            (link.target as NodeObject<CrawlNode>)?.depth ?? 0
+          );
+          const opacity = Math.max(0.05, 0.25 - depth * 0.05);
+          return `rgba(255,255,255,${opacity})`;
+        }}
+        linkWidth={(link: LinkObject<CrawlNode>) => {
+          if (!selectedNode) return 0.5;
+          const src = typeof link.source === "object" ? (link.source as NodeObject<CrawlNode>).id : link.source;
+          const tgt = typeof link.target === "object" ? (link.target as NodeObject<CrawlNode>).id : link.target;
+          return (src === selectedNode || tgt === selectedNode) ? 2 : 0.5;
+        }}
         linkDirectionalArrowLength={3}
         linkDirectionalArrowRelPos={1}
         backgroundColor="#111827"
         onEngineStop={handleEngineStop}
         cooldownTicks={100}
-        width={undefined}
+        width={containerWidth || undefined}
         height={600}
       />
       <div className="absolute bottom-4 left-4 flex gap-3 bg-gray-800/80 rounded-lg p-2">
@@ -117,6 +231,28 @@ export function GraphView({ data }: GraphViewProps) {
           </div>
         ))}
       </div>
+      {selectedNode && (() => {
+        const node = graphData.nodes.find((n) => n.id === selectedNode);
+        if (!node) return null;
+        return (
+          <div className="absolute top-4 right-4 bg-gray-800/90 backdrop-blur rounded-lg p-3 max-w-xs text-sm text-gray-200 space-y-1">
+            <div className="flex justify-between items-start gap-2">
+              <span className="font-semibold text-white text-xs truncate">{node.label}</span>
+              <button
+                onClick={() => setSelectedNode(null)}
+                className="text-gray-400 hover:text-white shrink-0 leading-none"
+              >
+                &times;
+              </button>
+            </div>
+            <div className="text-xs">Domain: <span className="text-gray-400">{node.domain}</span></div>
+            <div className="text-xs">Depth: <span className="text-gray-400">{node.depth}</span></div>
+            <div className="text-xs">Status: <span style={{ color: STATUS_COLORS[node.status] || "#9ca3af" }}>{node.status}</span></div>
+            <div className="text-xs">Type: <span className="text-gray-400">{node.nodeType}</span></div>
+            <div className="text-xs">Connections: <span className="text-gray-400">{neighborIds.size > 0 ? neighborIds.size - 1 : 0}</span></div>
+          </div>
+        );
+      })()}
     </div>
   );
 }
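The highlight logic in `GraphView` must handle both shapes a link endpoint can take. d3-force's link force replaces each link's `source`/`target` id with the corresponding node object once the simulation initializes, which is why the component checks `typeof link.source === "object"`. The TypeScript sketch below shows the same neighborhood computation and hex-to-rgba dimming. It uses simplified hypothetical types (`LinkLike`, `neighborhood`, `dimHex`) instead of the library's `LinkObject`/`NodeObject`.

```typescript
// A link endpoint is an id before the simulation runs and a node object after.
type Endpoint = string | number | { id?: string | number };
interface LinkLike {
  source?: Endpoint;
  target?: Endpoint;
}

function endpointId(e: Endpoint | undefined): string | undefined {
  if (e === undefined) return undefined;
  if (typeof e === "object") return e.id === undefined ? undefined : String(e.id);
  return String(e);
}

// Collects the selected node, its direct neighbors, and "src->tgt" keys of its links.
function neighborhood(links: readonly LinkLike[], selected: string) {
  const neighborIds = new Set<string>();
  const linkKeys = new Set<string>();
  for (const link of links) {
    const src = endpointId(link.source);
    const tgt = endpointId(link.target);
    if (src === undefined || tgt === undefined) continue;
    if (src === selected || tgt === selected) {
      neighborIds.add(src);
      neighborIds.add(tgt);
      linkKeys.add(`${src}->${tgt}`);
    }
  }
  return { neighborIds, linkKeys };
}

// Converts "#rrggbb" to an rgba() string; assumes 6-digit hex such as the "#9ca3af" fallback.
function dimHex(hex: string, alpha: number): string {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return `rgba(${r},${g},${b},${alpha})`;
}
```

`neighborIds` includes the selected node itself whenever it has at least one link. That is why the detail panel subtracts one for its "Connections" count.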
diff --git a/frontend/src/lib/api.ts b/frontend/src/lib/api.ts
index fc26efa..ef20276 100644
--- a/frontend/src/lib/api.ts
+++ b/frontend/src/lib/api.ts
@@ -19,12 +19,13 @@ async function fetchJSON<T>(url: string, init?: RequestInit): Promise<T> {
 
 export async function createCrawl(
   url: string,
-  depth: number
+  depth: number,
+  targeted?: boolean
 ): Promise<CrawlResponse> {
   return fetchJSON(`${BASE}/crawls`, {
     method: "POST",
     headers: { "Content-Type": "application/json" },
-    body: JSON.stringify({ url, depth }),
+    body: JSON.stringify({ url, depth, ...(targeted ? { targeted } : {}) }),
   });
 }
 
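`createCrawl` sends `targeted` only when it is `true`. It relies on the documented server default of `false`, so `false` and `undefined` produce the same request body. The TypeScript sketch below shows that body construction. It adds a hypothetical client-side guard for the documented 1-5 depth range, because the zod schema in `NewCrawl.tsx` checks `min(1).max(5)` but does not require an integer. `buildCreateCrawlBody` is an illustrative name, not part of `api.ts`.

```typescript
interface CreateCrawlBody {
  url: string;
  depth: number;
  targeted?: boolean;
}

// Hypothetical helper: same conditional-spread trick as api.ts, plus an integer depth check.
function buildCreateCrawlBody(url: string, depth: number, targeted?: boolean): string {
  if (!Number.isInteger(depth) || depth < 1 || depth > 5) {
    throw new RangeError(`depth must be an integer from 1 to 5, got ${depth}`);
  }
  // Omit `targeted` unless it is true; the API treats a missing field as false.
  const body: CreateCrawlBody = { url, depth, ...(targeted ? { targeted } : {}) };
  return JSON.stringify(body);
}
```

With this spread in place, the `data.targeted || undefined` in `NewCrawl.tsx` is redundant but harmless.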
diff --git a/frontend/src/pages/CrawlDetail.tsx b/frontend/src/pages/CrawlDetail.tsx
index 3a150b3..f02f7da 100644
--- a/frontend/src/pages/CrawlDetail.tsx
+++ b/frontend/src/pages/CrawlDetail.tsx
@@ -103,7 +103,13 @@ export default function CrawlDetail() {
             {crawl.root_url.toLowerCase()}
           </p>
           <p className="text-gray-400 text-sm mt-1">
-            Depth: {crawl.requested_depth} | ID: {id}
+            Depth: {crawl.requested_depth}
+            {crawl.targeted && (
+              <span className="ml-2 inline-flex items-center rounded bg-purple-100 px-1.5 py-0.5 text-xs font-medium text-purple-700">
+                Targeted
+              </span>
+            )}
+            {" "}| ID: {id}
           </p>
         </div>
         <div className="flex gap-2">
@@ -227,6 +233,12 @@ export default function CrawlDetail() {
                     <dt className="text-gray-600">Requested Depth</dt>
                     <dd className="font-medium">{crawl.requested_depth}</dd>
                   </div>
+                  <div className="flex justify-between">
+                    <dt className="text-gray-600">Scope</dt>
+                    <dd className="font-medium">
+                      {crawl.targeted ? "Targeted" : "Unrestricted"}
+                    </dd>
+                  </div>
                   <div className="flex justify-between">
                     <dt className="text-gray-600">Status</dt>
                     <dd>
diff --git a/frontend/src/pages/CrawlList.tsx b/frontend/src/pages/CrawlList.tsx
index f5938cd..6a91b71 100644
--- a/frontend/src/pages/CrawlList.tsx
+++ b/frontend/src/pages/CrawlList.tsx
@@ -97,6 +97,11 @@ export default function CrawlList() {
                       <span className="text-gray-400 text-sm shrink-0">
                         depth {crawl.requested_depth}
                       </span>
+                      {crawl.targeted && (
+                        <span className="inline-flex items-center rounded bg-purple-100 px-1.5 py-0.5 text-xs font-medium text-purple-700 shrink-0">
+                          Targeted
+                        </span>
+                      )}
                     </div>
                     <StatusBadge status={crawl.status} />
                   </div>
diff --git a/frontend/src/pages/NewCrawl.tsx b/frontend/src/pages/NewCrawl.tsx
index 5098ae5..e49a57f 100644
--- a/frontend/src/pages/NewCrawl.tsx
+++ b/frontend/src/pages/NewCrawl.tsx
@@ -11,6 +11,7 @@ import { Input } from "../components/ui/input";
 const schema = z.object({
   url: z.string().url("Please enter a valid URL"),
   depth: z.number().min(1).max(5),
+  targeted: z.boolean(),
 });
 
 type FormData = z.infer<typeof schema>;
@@ -28,7 +29,7 @@ export default function NewCrawl() {
     formState: { errors },
   } = useForm<FormData>({
     resolver: zodResolver(schema),
-    defaultValues: { url: "", depth: 2 },
+    defaultValues: { url: "", depth: 2, targeted: false },
   });
 
   const depth = watch("depth");
@@ -37,7 +38,7 @@ export default function NewCrawl() {
     setSubmitting(true);
     setError("");
     try {
-      const result = await createCrawl(data.url, data.depth);
+      const result = await createCrawl(data.url, data.depth, data.targeted || undefined);
       navigate(`/crawls/${result.crawl_id}`);
     } catch (err) {
       setError(err instanceof Error ? err.message : "Failed to start crawl");
@@ -101,6 +102,32 @@ export default function NewCrawl() {
               )}
             </div>
 
+            <div className="flex items-start gap-3">
+              <input
+                type="checkbox"
+                id="targeted"
+                {...register("targeted")}
+                className="mt-1 h-4 w-4 rounded border-gray-300 text-blue-600 focus:ring-blue-500"
+              />
+              <label htmlFor="targeted" className="text-sm">
+                <span className="font-medium text-gray-700">
+                  Targeted crawl
+                </span>
+                <span className="block text-gray-500 mt-0.5">
+                  Only follow links within the same registered domain as the
+                  root URL. For example, crawling{" "}
+                  <code className="text-xs bg-gray-100 px-1 rounded">
+                    blog.example.com
+                  </code>{" "}
+                  will also crawl{" "}
+                  <code className="text-xs bg-gray-100 px-1 rounded">
+                    shop.example.com
+                  </code>{" "}
+                  but not external sites.
+                </span>
+              </label>
+            </div>
+
             <div className="bg-blue-50 border border-blue-200 rounded-lg p-4">
               <h4 className="text-sm font-medium text-blue-800 mb-1">
                 What to expect
diff --git a/frontend/src/types/api.ts b/frontend/src/types/api.ts
index 95f04f0..ea1a25b 100644
--- a/frontend/src/types/api.ts
+++ b/frontend/src/types/api.ts
@@ -13,6 +13,7 @@ export interface CrawlProgress {
   failed: number;
   root_url: string;
   requested_depth: number;
+  targeted: boolean;
 }
 
 export interface CrawlListItem {
@@ -23,6 +24,7 @@ export interface CrawlListItem {
   total: number;
   completed: number;
   failed: number;
+  targeted: boolean;
 }
 
 export interface CrawlListResponse {
diff --git a/manager/src/models/crawl.rs b/manager/src/models/crawl.rs
index 8ed1e2f..1dc3f84 100644
--- a/manager/src/models/crawl.rs
+++ b/manager/src/models/crawl.rs
@@ -4,6 +4,8 @@ use serde::{Deserialize, Serialize};
 pub struct CrawlRequest {
     pub url: String,
     pub depth: i64,
+    #[serde(default)]
+    pub targeted: Option<bool>,
 }
 
 #[derive(Serialize)]
@@ -24,6 +26,7 @@ pub struct CrawlProgress {
     pub cancelled: i64,
     pub root_url: String,
     pub requested_depth: i64,
+    pub targeted: bool,
 }
 
 #[derive(Serialize)]
@@ -36,6 +39,7 @@ pub struct CrawlListItem {
     pub completed: i64,
     pub failed: i64,
     pub cancelled: i64,
+    pub targeted: bool,
 }
 
 #[derive(Serialize)]
diff --git a/manager/src/routes/crawl.rs b/manager/src/routes/crawl.rs
index b766967..067f044 100644
--- a/manager/src/routes/crawl.rs
+++ b/manager/src/routes/crawl.rs
@@ -43,6 +43,23 @@ pub async fn create_crawl(
 
     // 1. Normalize root URL
     let (root_name, http_type) = url_normalize::normalize_url(&req.url);
+    let targeted = req.targeted.unwrap_or(false);
+
+    // 1b. Compute target domain for targeted crawls
+    let target_domain = if targeted {
+        match url_normalize::registered_domain(&root_name) {
+            Some(rd) => rd,
+            None => {
+                return (
+                    StatusCode::BAD_REQUEST,
+                    Json(json!({"error": "Cannot determine registered domain for targeted crawl (bare public suffix or invalid host)"})),
+                )
+                    .into_response();
+            }
+        }
+    } else {
+        String::new()
+    };
 
     // 2. Fetch page HTML
     let page_data = match crawler::get_page_data(&state.client, &req.url).await {
@@ -85,10 +102,20 @@ pub async fn create_crawl(
     // 6. Resolve DNS for each extracted URL in parallel
     let request_time = format!("{:?}", page_data.elapsed);
 
-    let dns_futures: Vec<_> = extracted_urls
+    // 6a. Normalize extracted URLs and filter by target domain if targeted
+    let normalized_urls: Vec<(String, String)> = extracted_urls
+        .iter()
+        .map(|url| url_normalize::normalize_url(url))
+        .filter(|(norm_name, _)| {
+            !targeted || url_normalize::is_same_registered_domain(norm_name, &target_domain)
+        })
+        .collect();
+
+    let dns_futures: Vec<_> = normalized_urls
         .iter()
-        .map(|url| {
-            let (norm_name, child_http_type) = url_normalize::normalize_url(url);
+        .map(|(norm_name, child_http_type)| {
+            let norm_name = norm_name.clone();
+            let child_http_type = child_http_type.clone();
             let resolver = &state.resolver;
             let max_depth = state.config.max_dns_depth;
             async move {
@@ -117,6 +144,8 @@ pub async fn create_crawl(
         depth: req.depth,
         request_time: &request_time,
         children: &children,
+        targeted,
+        target_domain: &target_domain,
     };
     if let Err(e) = crawl_service::create_crawl_graph(&state.graph, &params).await
     {
diff --git a/manager/src/services/crawl_service.rs b/manager/src/services/crawl_service.rs
index 62fbff2..193902a 100644
--- a/manager/src/services/crawl_service.rs
+++ b/manager/src/services/crawl_service.rs
@@ -11,6 +11,8 @@ pub struct CreateCrawlParams<'a> {
     pub depth: i64,
     pub request_time: &'a str,
     pub children: &'a [(String, String, String, String)],
+    pub targeted: bool,
+    pub target_domain: &'a str,
 }
 
 /// Create ROOT node and child URL nodes in a single transaction with crawl_id.
@@ -25,7 +27,8 @@ pub async fn create_crawl_graph(
         query(
             "CREATE (:ROOT {name: $name, ip: $ip, domain: $domain, http_type: $http_type, \
              requested_depth: $req_depth, current_depth: 0, request_time: $req_time, \
-             crawl_id: $crawl_id, created_at: datetime()})",
+             crawl_id: $crawl_id, created_at: datetime(), \
+             targeted: $targeted, target_domain: $target_domain})",
         )
         .param("name", params.root_name)
         .param("ip", params.root_ip)
@@ -33,7 +36,9 @@ pub async fn create_crawl_graph(
         .param("http_type", params.http_type)
         .param("req_depth", params.depth)
         .param("req_time", params.request_time)
-        .param("crawl_id", params.crawl_id),
+        .param("crawl_id", params.crawl_id)
+        .param("targeted", params.targeted)
+        .param("target_domain", params.target_domain),
     )
     .await?;
 
@@ -46,7 +51,8 @@ pub async fn create_crawl_graph(
                  ON CREATE SET c.ip = $ip, c.domain = $domain, \
                      c.job_status = CASE WHEN 1 = $req_depth THEN 'COMPLETED' ELSE 'PENDING' END, \
                      c.requested_depth = $req_depth, \
-                     c.current_depth = 1, c.request_time = $req_time \
+                     c.current_depth = 1, c.request_time = $req_time, \
+                     c.targeted = $targeted, c.target_domain = $target_domain \
                  MERGE (root)-[:Lead]->(c)",
             )
             .param("crawl_id", params.crawl_id)
@@ -55,7 +61,9 @@ pub async fn create_crawl_graph(
             .param("ip", child_ip.as_str())
             .param("domain", child_domain.as_str())
             .param("http_type", child_http_type.as_str())
-            .param("req_time", params.request_time),
+            .param("req_time", params.request_time)
+            .param("targeted", params.targeted)
+            .param("target_domain", params.target_domain),
         )
         .await?;
     }
@@ -83,6 +91,7 @@ pub async fn get_crawl_progress(
                    sum(CASE WHEN u.job_status = 'FAILED' THEN 1 ELSE 0 END) AS failed, \
                    sum(CASE WHEN u.job_status = 'CANCELLED' THEN 1 ELSE 0 END) AS cancelled \
                  RETURN r.name AS root_url, r.requested_depth AS depth, r.http_type AS http_type, \
+                   r.targeted AS targeted, \
                    total, completed, pending, in_progress, failed, cancelled",
             )
             .param("crawl_id", crawl_id),
@@ -113,6 +122,8 @@ pub async fn get_crawl_progress(
                         "running".to_string()
                     };
 
+                    let targeted: bool = row.get::<bool>("targeted").unwrap_or(false);
+
                     Ok(Some(CrawlProgress {
                         crawl_id: crawl_id.to_string(),
                         status,
@@ -124,6 +135,7 @@ pub async fn get_crawl_progress(
                         cancelled,
                         root_url: format!("{}{}", http_type, url),
                         requested_depth: depth,
+                        targeted,
                     }))
                 }
                 None => Ok(None),
@@ -159,6 +171,7 @@ pub async fn list_crawls(
          UNWIND items[$offset..($offset + $limit)] AS item \
          RETURN item.r.crawl_id AS crawl_id, item.r.name AS root_url, \
            item.r.http_type AS http_type, item.r.requested_depth AS depth, \
+           item.r.targeted AS targeted, \
            item.total AS total, item.completed AS completed, item.failed AS failed, item.cancelled AS cancelled, item.status AS status, \
            total_count"
     } else {
@@ -178,6 +191,7 @@ pub async fn list_crawls(
          UNWIND items[$offset..($offset + $limit)] AS item \
          RETURN item.r.crawl_id AS crawl_id, item.r.name AS root_url, \
            item.r.http_type AS http_type, item.r.requested_depth AS depth, \
+           item.r.targeted AS targeted, \
            item.total AS total, item.completed AS completed, item.failed AS failed, item.cancelled AS cancelled, item.status AS status, \
            total_count"
     };
@@ -208,6 +222,7 @@ pub async fn list_crawls(
             completed: row.get("completed")?,
             failed: row.get("failed")?,
             cancelled: row.get("cancelled")?,
+            targeted: row.get::<bool>("targeted").unwrap_or(false),
         });
     }
 
diff --git a/shared/Cargo.toml b/shared/Cargo.toml
index 393e47b..9a0bc99 100644
--- a/shared/Cargo.toml
+++ b/shared/Cargo.toml
@@ -13,6 +13,7 @@ regex = { workspace = true }
 thiserror = { workspace = true }
 tracing = { workspace = true }
 futures = { workspace = true }
+psl = "2"
 
 [dev-dependencies]
 tokio = { workspace = true }
diff --git a/shared/src/url_normalize.rs b/shared/src/url_normalize.rs
index fd54467..8e01415 100644
--- a/shared/src/url_normalize.rs
+++ b/shared/src/url_normalize.rs
@@ -21,6 +21,39 @@ pub fn normalize_url(url: &str) -> (String, String) {
     (name, proto.to_string())
 }
 
+use psl::Psl;
+
+/// Extracts the registered domain (eTLD+1) from a normalized name.
+///
+/// The input should be an uppercase normalized name (no protocol, no `www.`).
+/// Ports are stripped before lookup. Returns uppercase eTLD+1.
+///
+/// # Examples
+/// - `"EXAMPLE.COM"` -> `Some("EXAMPLE.COM")`
+/// - `"BLOG.EXAMPLE.CO.UK"` -> `Some("EXAMPLE.CO.UK")`
+/// - `"EXAMPLE.COM:8080"` -> `Some("EXAMPLE.COM")`
+/// - `"COM"` (bare TLD) -> `None`
+pub fn registered_domain(normalized_name: &str) -> Option<String> {
+    // Strip port if present
+    let host = normalized_name.split(':').next().unwrap_or(normalized_name);
+    // psl requires lowercase input
+    let lower = host.to_lowercase();
+    let domain = psl::List.domain(lower.as_bytes())?;
+    let domain_str = std::str::from_utf8(domain.as_bytes()).ok()?;
+    Some(domain_str.to_uppercase())
+}
+
+/// Checks if a normalized name belongs to the same registered domain as the target.
+///
+/// Both inputs should be uppercase. The target should already be a registered domain
+/// (output of `registered_domain()`).
+pub fn is_same_registered_domain(normalized_name: &str, target_domain: &str) -> bool {
+    match registered_domain(normalized_name) {
+        Some(rd) => rd == target_domain,
+        None => false,
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -66,4 +99,70 @@ mod tests {
         assert_eq!(name, "SUBDOMAIN.WWW.EXAMPLE.COM");
         assert_eq!(proto, "HTTPS://");
     }
+
+    #[test]
+    fn test_registered_domain_simple() {
+        assert_eq!(registered_domain("EXAMPLE.COM"), Some("EXAMPLE.COM".to_string()));
+    }
+
+    #[test]
+    fn test_registered_domain_subdomain() {
+        assert_eq!(registered_domain("BLOG.EXAMPLE.COM"), Some("EXAMPLE.COM".to_string()));
+    }
+
+    #[test]
+    fn test_registered_domain_deep_subdomain() {
+        assert_eq!(registered_domain("A.B.C.EXAMPLE.COM"), Some("EXAMPLE.COM".to_string()));
+    }
+
+    #[test]
+    fn test_registered_domain_co_uk() {
+        assert_eq!(registered_domain("BLOG.EXAMPLE.CO.UK"), Some("EXAMPLE.CO.UK".to_string()));
+    }
+
+    #[test]
+    fn test_registered_domain_with_port() {
+        assert_eq!(registered_domain("EXAMPLE.COM:8080"), Some("EXAMPLE.COM".to_string()));
+    }
+
+    #[test]
+    fn test_registered_domain_bare_tld() {
+        assert_eq!(registered_domain("COM"), None);
+    }
+
+    #[test]
+    fn test_registered_domain_bare_public_suffix() {
+        assert_eq!(registered_domain("GITHUB.IO"), None);
+    }
+
+    #[test]
+    fn test_registered_domain_localhost() {
+        assert_eq!(registered_domain("LOCALHOST"), None);
+    }
+
+
+    #[test]
+    fn test_is_same_registered_domain_match() {
+        assert!(is_same_registered_domain("BLOG.EXAMPLE.COM", "EXAMPLE.COM"));
+    }
+
+    #[test]
+    fn test_is_same_registered_domain_exact() {
+        assert!(is_same_registered_domain("EXAMPLE.COM", "EXAMPLE.COM"));
+    }
+
+    #[test]
+    fn test_is_same_registered_domain_no_match() {
+        assert!(!is_same_registered_domain("GOOGLE.COM", "EXAMPLE.COM"));
+    }
+
+    #[test]
+    fn test_is_same_registered_domain_with_port() {
+        assert!(is_same_registered_domain("API.EXAMPLE.COM:3000", "EXAMPLE.COM"));
+    }
+
+    #[test]
+    fn test_is_same_registered_domain_co_uk() {
+        assert!(is_same_registered_domain("SHOP.EXAMPLE.CO.UK", "EXAMPLE.CO.UK"));
+    }
 }
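
The `registered_domain` doc comment above describes eTLD+1 extraction. To show roughly what the `psl` lookup computes, here is a std-only sketch that uses a tiny hardcoded suffix list. The list and both `*_sketch` function names are hypothetical stand-ins; the real Public Suffix List is far larger and also has wildcard and exception rules that this sketch ignores. It assumes the input is a bare host (no path), in any case, with an optional port.

```rust
/// Hypothetical, tiny stand-in for the Public Suffix List.
/// The real list also supports wildcard (`*.ck`) and exception (`!www.ck`) rules.
const SUFFIXES: &[&str] = &["com", "uk", "co.uk", "io", "github.io"];

/// Returns the registered domain (eTLD+1) of `host`, uppercased, or `None`
/// if `host` is itself a public suffix or ends in no known suffix.
fn registered_domain_sketch(host: &str) -> Option<String> {
    // Drop an optional `:port`, then compare case-insensitively.
    let host = host.split(':').next()?.to_lowercase();
    let labels: Vec<&str> = host.split('.').collect();

    // Pick the longest suffix (counted in labels) that matches the end of the host.
    let mut best: Option<usize> = None;
    for suffix in SUFFIXES {
        let n = suffix.split('.').count();
        if labels.len() >= n && labels[labels.len() - n..].join(".") == *suffix {
            best = Some(best.map_or(n, |b| b.max(n)));
        }
    }

    // eTLD+1 needs at least one label in front of the public suffix.
    let n = best?;
    if labels.len() <= n {
        return None;
    }
    Some(labels[labels.len() - n - 1..].join(".").to_uppercase())
}

/// Mirrors `is_same_registered_domain`: compare eTLD+1 against the target.
fn is_same_registered_domain_sketch(host: &str, target: &str) -> bool {
    registered_domain_sketch(host).as_deref() == Some(target)
}

fn main() {
    for host in ["BLOG.EXAMPLE.CO.UK", "EXAMPLE.COM:8080", "GITHUB.IO", "LOCALHOST"] {
        println!("{host} -> {:?}", registered_domain_sketch(host));
    }
    println!("{}", is_same_registered_domain_sketch("SHOP.EXAMPLE.COM", "EXAMPLE.COM"));
}
```

Taking the longest match is what makes `BLOG.EXAMPLE.CO.UK` resolve to `EXAMPLE.CO.UK` rather than `CO.UK`, and what makes `GITHUB.IO` a bare suffix instead of a registrable domain.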

From be5b1e7fc3501314ee8850dde6b218397ca52528 Mon Sep 17 00:00:00 2001
From: bluedotiya <ooorannn@gmail.com>
Date: Sun, 22 Feb 2026 19:31:26 +0200
Subject: [PATCH 2/3] =?UTF-8?q?fix:=20address=20PR=20review=20=E2=80=94=20?=
 =?UTF-8?q?4xx=20double-update=20bug,=20normalize-once,=20nits?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix double status update on 4xx: check for a permanent failure before
  the retry-vs-fail branch instead of overwriting PENDING afterward
- Normalize extracted URLs once into a HashMap and reuse it for targeted
  filtering and dedup instead of normalizing twice
- Move `use psl::Psl` to top of url_normalize.rs
- Remove extra blank line in url_normalize tests
- Replace nested ternary with lookup object in GraphView
- Add .claude/ to .gitignore and untrack .claude/settings.json
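
The corrected flow in `validate_job` reduces to one decision per failed
attempt: a 4xx status is permanent and fails at once, anything else is
retried until `max_attempts` is reached. A std-only sketch of that
decision as a pure function; the `Outcome` enum and `next_status` name
are hypothetical, not part of the feeder crate:

```rust
/// Hypothetical summary of a failed fetch; the real code matches on
/// `CrawlerError::HttpStatus { status, .. }`.
#[derive(Debug, Clone, Copy)]
enum Outcome {
    HttpStatus(u16),
    Network,
}

/// Returns the single job status to write after a failed attempt.
fn next_status(outcome: Outcome, attempts: u32, max_attempts: u32) -> &'static str {
    // 4xx responses will not succeed on retry, so fail immediately.
    let is_permanent = matches!(outcome, Outcome::HttpStatus(s) if (400..500).contains(&s));
    if is_permanent || attempts >= max_attempts {
        "FAILED"
    } else {
        // Hand the job back so another feeder can retry it.
        "PENDING"
    }
}

fn main() {
    println!("{}", next_status(Outcome::HttpStatus(404), 1, 3)); // FAILED
    println!("{}", next_status(Outcome::HttpStatus(503), 1, 3)); // PENDING
    println!("{}", next_status(Outcome::Network, 3, 3)); // FAILED
}
```

Computing the status once and writing it once is what removes the
PENDING-then-FAILED double update.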

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .claude/settings.json                 |  6 ---
 .gitignore                            |  1 +
 feeder/src/job.rs                     | 56 +++++++++++++--------------
 frontend/src/components/GraphView.tsx |  2 +-
 shared/src/url_normalize.rs           |  5 +--
 5 files changed, 32 insertions(+), 38 deletions(-)
 delete mode 100644 .claude/settings.json

diff --git a/.claude/settings.json b/.claude/settings.json
deleted file mode 100644
index 6119f7c..0000000
--- a/.claude/settings.json
+++ /dev/null
@@ -1,6 +0,0 @@
-{
-  "enabledPlugins": {
-    "playwright-skill@playwright-skill": true,
-    "skill-creator@claude-plugins-official": true
-  }
-}
diff --git a/.gitignore b/.gitignore
index 3a786ce..3e56785 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,3 +15,4 @@ tests/
 *.tgz
 frontend/node_modules/
 frontend/dist/
+.claude/
diff --git a/feeder/src/job.rs b/feeder/src/job.rs
index b5e1f93..5d619af 100644
--- a/feeder/src/job.rs
+++ b/feeder/src/job.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashMap, HashSet};
 
 use neo4rs::{query, Graph};
 
@@ -116,23 +116,23 @@ async fn validate_job(
 
             tracing::warn!("Request failed: {} -- Attempts: {} -- Error: {}", full_url, attempts, e);
 
-            if attempts >= config.max_attempts {
-                tracing::error!(
-                    "Failure limit reached! Giving up on {} after {} attempts.",
-                    full_url,
-                    attempts
-                );
+            // 4xx errors are permanent — fail immediately without retry
+            let is_permanent = matches!(e, CrawlerError::HttpStatus { status, .. } if (400..500).contains(&status));
+
+            if is_permanent || attempts >= config.max_attempts {
+                if !is_permanent {
+                    tracing::error!(
+                        "Failure limit reached! Giving up on {} after {} attempts.",
+                        full_url,
+                        attempts
+                    );
+                }
                 update_job_status(graph, job, "FAILED", Some(attempts)).await?;
             } else {
-                // Fix: reset to PENDING so other feeders can retry
+                // Reset to PENDING so other feeders can retry
                 update_job_status(graph, job, "PENDING", Some(attempts)).await?;
             }
 
-            // Return permanent failures (4xx) as immediate failure
-            if matches!(e, CrawlerError::HttpStatus { status, .. } if (400..500).contains(&status)) {
-                update_job_status(graph, job, "FAILED", Some(attempts)).await?;
-            }
-
             Ok(None)
         }
     }
@@ -288,24 +288,24 @@ pub async fn feeding(
         None => return Ok(false),
     };
 
-    // Step 2: Extract URLs from HTML
+    // Step 2: Extract URLs from HTML and normalize once
     let extracted_urls = crawler::extract_urls(&page_data.html);
+    let mut normalized_map: HashMap<String, (String, String)> = HashMap::new();
+    for url in &extracted_urls {
+        let (norm_name, http_type) = url_normalize::normalize_url(url);
+        let upper_key = format!("{}{}", http_type, norm_name).to_uppercase();
+        normalized_map.entry(upper_key).or_insert((norm_name, http_type));
+    }
 
     // Step 2b: Filter by target domain when targeted
-    let filtered_urls: Vec<&String> = if job.targeted && !job.target_domain.is_empty() {
-        extracted_urls
-            .iter()
-            .filter(|u| {
-                let (norm_name, _) = url_normalize::normalize_url(u);
-                url_normalize::is_same_registered_domain(&norm_name, &job.target_domain)
-            })
-            .collect()
-    } else {
-        extracted_urls.iter().collect()
-    };
+    if job.targeted && !job.target_domain.is_empty() {
+        normalized_map.retain(|_, (norm_name, _)| {
+            url_normalize::is_same_registered_domain(norm_name, &job.target_domain)
+        });
+    }
 
     // Step 3: Deduplicate against existing DB nodes (server-side)
-    let upper_urls: HashSet<String> = filtered_urls.iter().map(|u| u.to_uppercase()).collect();
+    let upper_urls: HashSet<String> = normalized_map.keys().cloned().collect();
     let new_urls = filter_new_urls(graph, &upper_urls, &job.crawl_id).await?;
 
     if new_urls.is_empty() {
@@ -314,10 +314,10 @@ pub async fn feeding(
         return Ok(true);
     }
 
-    // Step 4: Normalize, DNS resolve in parallel, build child list
+    // Step 4: DNS resolve in parallel, build child list
     let normalized: HashSet<(String, String)> = new_urls
         .iter()
-        .map(|u| url_normalize::normalize_url(u))
+        .filter_map(|key| normalized_map.get(key).cloned())
         .collect();
 
     let request_time = format!("{:?}", page_data.elapsed);
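
In `feeding`, Steps 2 to 4 now normalize each extracted URL once and key the result by `protocol + name`, so duplicate spellings collapse before the database round-trip and the DNS step reuses the cached pair. A std-only sketch of that pattern; `normalize_sketch` is a hypothetical simplification that only uppercases and strips the protocol and a leading `www.` (as the `normalize_url` doc comment describes), and its `HTTP://` default for protocol-less input is an assumption:

```rust
use std::collections::HashMap;

/// Hypothetical simplification of `url_normalize::normalize_url`:
/// returns (name, protocol), uppercased, with a leading `www.` removed.
fn normalize_sketch(url: &str) -> (String, String) {
    let upper = url.to_uppercase();
    let (proto, rest) = match upper.find("://") {
        Some(i) => (upper[..i + 3].to_string(), &upper[i + 3..]),
        None => ("HTTP://".to_string(), upper.as_str()), // assumed default
    };
    let name = rest.strip_prefix("WWW.").unwrap_or(rest).to_string();
    (name, proto)
}

/// Normalizes every URL exactly once; the first spelling of each key wins.
fn normalize_once(urls: &[&str]) -> HashMap<String, (String, String)> {
    let mut map = HashMap::new();
    for url in urls {
        let (name, proto) = normalize_sketch(url);
        let key = format!("{proto}{name}");
        map.entry(key).or_insert((name, proto));
    }
    map
}

fn main() {
    let urls = ["https://www.example.com/a", "HTTPS://EXAMPLE.COM/a", "http://blog.example.com"];
    let map = normalize_once(&urls);
    // The first two URLs collapse into a single key.
    let mut keys: Vec<_> = map.keys().collect();
    keys.sort();
    println!("{keys:?}");
}
```

Keeping the `(name, protocol)` pair as the map value means the later lookup by surviving key needs no second normalization pass.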
diff --git a/frontend/src/components/GraphView.tsx b/frontend/src/components/GraphView.tsx
index 543e513..0d6463c 100644
--- a/frontend/src/components/GraphView.tsx
+++ b/frontend/src/components/GraphView.tsx
@@ -57,7 +57,7 @@ export function GraphView({ data }: GraphViewProps) {
       depth: n.depth,
       status: n.status,
       nodeType: n.node_type,
-      val: n.node_type === "ROOT" ? 4 : n.depth === 1 ? 2.5 : n.depth === 2 ? 1.5 : 1,
+      val: { ROOT: 4, 1: 2.5, 2: 1.5 }[n.node_type === "ROOT" ? "ROOT" : n.depth] ?? 1,
       // Pin root node at origin for stable centering
       ...(n.node_type === "ROOT" ? { fx: 0, fy: 0 } : {}),
     }));
diff --git a/shared/src/url_normalize.rs b/shared/src/url_normalize.rs
index 8e01415..dfff237 100644
--- a/shared/src/url_normalize.rs
+++ b/shared/src/url_normalize.rs
@@ -1,3 +1,5 @@
+use psl::Psl;
+
 /// Normalizes a URL by uppercasing, removing protocol and www prefix.
 ///
 /// Returns (normalized_name, protocol).
@@ -21,8 +23,6 @@ pub fn normalize_url(url: &str) -> (String, String) {
     (name, proto.to_string())
 }
 
-use psl::Psl;
-
 /// Extracts the registered domain (eTLD+1) from a normalized name.
 ///
 /// The input should be an uppercase normalized name (no protocol, no `www.`).
@@ -140,7 +140,6 @@ mod tests {
         assert_eq!(registered_domain("LOCALHOST"), None);
     }
 
-
     #[test]
     fn test_is_same_registered_domain_match() {
         assert!(is_same_registered_domain("BLOG.EXAMPLE.COM", "EXAMPLE.COM"));

From c94b460dff03f406ff2bdd43a4d8bdf15df4ea11 Mon Sep 17 00:00:00 2001
From: bluedotiya <ooorannn@gmail.com>
Date: Sun, 22 Feb 2026 19:53:10 +0200
Subject: [PATCH 3/3] chore: remove CLAUDE.md

---
 CLAUDE.md | 85 -------------------------------------------------------
 1 file changed, 85 deletions(-)
 delete mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
deleted file mode 100644
index c8483a2..0000000
--- a/CLAUDE.md
+++ /dev/null
@@ -1,85 +0,0 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Build & Test Commands
-
-### Rust
-```bash
-cargo check --workspace          # Fast compilation check
-cargo build --release             # Release build (LTO + stripped)
-cargo test --workspace            # Run all tests
-cargo test -p shared              # Test single crate
-cargo clippy --workspace -- -D warnings  # Lint (CI-strict)
-```
-
-### Frontend (from `frontend/`)
-```bash
-npm install                       # Install deps
-npm run dev                       # Dev server on :3000 (proxies /api to :8080)
-npm run build                     # Type-check + production build
-npm run lint                      # ESLint
-npm run type-check                # TypeScript check only
-```
-
-### Docker (from repo root, use minikube docker-env for local k8s)
-```bash
-docker build -t ghcr.io/bluedotiya/web-crawler/manager:latest -f manager/Dockerfile .
-docker build -t ghcr.io/bluedotiya/web-crawler/feeder:latest -f feeder/Dockerfile .
-docker build -t ghcr.io/bluedotiya/web-crawler/frontend:latest -f frontend/Dockerfile .
-```
-
-## Architecture
-
-Three services communicate through a shared Neo4j database (no direct inter-service HTTP):
-
-- **manager** — Axum HTTP server (port 8080). REST API at `/api/v1/crawls/*` + WebSocket for live progress. Creates ROOT nodes and initial URL children when a crawl is submitted.
-- **feeder** — Background workers (8 replicas). Poll Neo4j for PENDING URLs, fetch HTML, extract links, create child nodes. Atomic job claiming prevents worker conflicts.
-- **frontend** — React SPA (Vite/TypeScript/Tailwind). Served by nginx in production, proxied via Vite in dev. Uses React Query for polling and WebSocket for real-time updates.
-- **shared** — Rust library crate used by both manager and feeder. Contains: crawler (HTTP fetch + URL extraction), dns (resolution with iterative domain shortening), neo4j_client, url_normalize, schema (indexes/constraints), error types.
-
-### Data Flow
-1. User submits URL + depth (1-5) via frontend → POST `/api/v1/crawls`
-2. Manager normalizes URL, resolves DNS, creates ROOT + child URL nodes in Neo4j
-3. Feeder workers atomically claim PENDING URLs, fetch HTML, extract/deduplicate links, create children
-4. Frontend polls progress via REST (5s) or WebSocket (2s), displays force-graph visualization
-
-### Neo4j Data Model
-- **ROOT** node (one per crawl, unique on `crawl_id`) — the seed URL
-- **URL** nodes — discovered links with `job_status` (PENDING/IN-PROGRESS/COMPLETED/FAILED/CANCELLED)
-- **Lead** edges — parent → child link relationships
-- All nodes scoped by `crawl_id` for isolation between crawls
-
-## Key Conventions
-
-- **Conventional commits** required on PR titles: `feat:`, `fix:`, `chore:`, etc. (enforced by CI). Breaking changes use `!` suffix (e.g., `feat!:`). Drives automated semver + per-service tagging.
-- **Pre-commit hooks**: `cargo check`, `cargo clippy -D warnings`, `cargo test`, frontend lint+typecheck. Install: `pip install pre-commit && pre-commit install`
-- **Workspace dependency gotcha**: `default-features = false` in `[workspace.dependencies]` is ignored by Cargo. Each member crate must set it explicitly.
-- **TLS in containers**: Use `rustls-tls-webpki-roots` (bundles CAs in binary). Avoid `native-tls` or `native-roots` in slim Docker images.
-- **HTTP clients** in both feeder and manager must set `.user_agent(...)` to avoid 403 responses.
-- **TypeScript**: Strict mode enabled, no unused locals/parameters. Path alias `@/` → `./src/`.
-- **Docker images** must use full GHCR path (`ghcr.io/bluedotiya/web-crawler/{service}:tag`) to match k8s deployment specs.
-
-## API Routes (manager)
-
-| Method | Endpoint | Purpose |
-|--------|----------|---------|
-| POST | `/api/v1/crawls` | Create new crawl |
-| GET | `/api/v1/crawls` | List crawls (filter/pagination) |
-| GET | `/api/v1/crawls/{id}` | Get crawl progress |
-| DELETE | `/api/v1/crawls/{id}` | Cancel crawl |
-| GET | `/api/v1/crawls/{id}/graph` | Graph data (nodes + edges) |
-| GET | `/api/v1/crawls/{id}/stats` | Crawl statistics |
-| GET | `/api/v1/crawls/{id}/ws` | WebSocket for live updates |
-| GET | `/livez`, `/readyz` | Health probes |
-
-## Project Layout
-
-```
-shared/src/          → lib.rs, crawler.rs, dns.rs, neo4j_client.rs, url_normalize.rs, schema.rs, error.rs
-manager/src/         → main.rs, config.rs, routes/{crawl,status,graph,ws}.rs, services/{crawl,graph}_service.rs
-feeder/src/          → main.rs, config.rs, job.rs
-frontend/src/        → App.tsx, pages/{Dashboard,CrawlList,CrawlDetail,NewCrawl}.tsx, components/GraphView.tsx, lib/api.ts, hooks/useWebSocket.ts
-web-crawler/         → Helm parent chart (neo4j, manager, feeder, frontend subcharts)
-docs/                → architecture.md, api-reference.md, neo4j-graph-model.md, deployment.md, development.md
-```