bijay-develops/Concurrent-Web-Crawler-with-Pluggable-Pipelines

Project Documentation

This docs/ directory contains structured documentation for the crawler project. The documentation mirrors the source tree under crawler/, excluding the top-level crawler directory itself.

At a high level, the repository layout is:

Concurrent-Web-Crawler-with-Pluggable-Pipelines/
  crawler/         # Go module (CLI + Web UI + internal packages)
    cmd/
      crawler/     # CLI entrypoint
      webui/       # Web UI entrypoint (also hosts JSON API)
      api/         # Standalone JSON API entrypoint
    internal/
      crawler/     # Core crawler orchestration
      pipeline/    # Pluggable pipeline stages and rate limiting
      shared/      # Shared types such as Item, UseCase, CrawlStats, ModeSummary
      service/     # CrawlService abstraction over the core crawler
      httpapi/     # HTTP handlers exposing the JSON API
      store/       # File-backed persistence for crawl summaries
  docs/            # This documentation tree
  manual/          # How to build and run the project
  WHY_the_PROJECT/ # High-level motivation and use-case explanations
  Problems-and-Solutions/
  QnA/
  code_fixing/
  .github/workflows/  # CI configuration for build + tests
  docker-compose.yml  # Compose file to run Web UI and API in containers

Index

Project-Level Docs

  • ARCHITECTURE – overall system structure and module interactions.
  • DATA_FLOW – conceptual end-to-end data flow and pipeline stages.

Service, API, and Persistence

  • internal/service – describes the CrawlService used by HTTP handlers and other integrations.
  • internal/httpapi – documents the JSON API endpoints (for example POST /api/crawls, GET /api/crawls/history).
  • internal/store – explains how crawl summaries are written to and read from data/crawls.jsonl.

Higher-Level Guides

Source File Docs

How to Navigate

  • Use the links above to jump to documentation for a specific source file.
  • Each file-level document follows a consistent structure:
    1. Overview
    2. File Location
    3. Key Components
    4. Execution Flow
    5. Data Flow
    6. Mermaid Diagrams
    7. Error Handling & Edge Cases
    8. Example Usage
  • Where a source file is currently empty, the documentation explicitly notes that and only describes the intended role implied by its name and placement.
  • Some files may remain empty or legacy leftovers after refactors (for example internal/crawler/item.go); in those cases the docs point you to the new canonical type and its location.
