Implement robots.txt Compliance in Crawler #25

### Description
The current crawler (`crawler/src/main.cpp`) fetches every URL it discovers without checking the site's `robots.txt`. Ignoring `robots.txt` is bad practice and can get our crawler blocked or banned by site administrators.

### Tasks
- [ ] **Crawler (C++)**: Add a `RobotsParser` class, either wrapping an existing C++ robots.txt library or implementing simple parsing logic ourselves.
- [ ] **Crawler (C++)**: Before adding a URL to the queue or fetching it, check whether it is allowed for our user-agent (`MaxSearchEngineBot`).
- [ ] **Crawler (C++)**: Cache each domain's `robots.txt` so it is not re-fetched for every URL on that domain.
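As a starting point for the caching task, here is a minimal sketch of a per-host cache. Everything in it is an assumption for illustration: the `RobotsCache` name, the injected `fetch` callback (standing in for whatever HTTP code the crawler already has), and the choice to key by host string with no expiry. A real implementation would probably also want a TTL and handling for failed fetches.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-host cache: downloads robots.txt once per host and
// serves the stored body on every later lookup.
class RobotsCache {
public:
    // `fetch` is an injected downloader (assumption): host -> robots.txt body.
    explicit RobotsCache(std::function<std::string(const std::string&)> fetch)
        : fetch_(std::move(fetch)) {}

    // Returns the robots.txt body for `host`, calling `fetch` only on a miss.
    const std::string& get(const std::string& host) {
        auto it = cache_.find(host);
        if (it == cache_.end()) {
            it = cache_.emplace(host, fetch_(host)).first;
        }
        return it->second;
    }

    // Number of hosts currently cached.
    std::size_t size() const { return cache_.size(); }

private:
    std::function<std::string(const std::string&)> fetch_;
    std::unordered_map<std::string, std::string> cache_;
};
```

Injecting the downloader keeps the cache testable without network access; the crawler would pass in its own fetch routine.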

### Acceptance Criteria
- Crawler respects `User-agent: *` and `User-agent: MaxSearchEngineBot` rules in `robots.txt`.
- Crawler does *not* fetch disallowed paths.
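One possible reading of these criteria, sketched below: if `robots.txt` has a group for `MaxSearchEngineBot`, use that group; otherwise fall back to the `User-agent: *` group; if neither exists, everything is allowed. This is not a finished `RobotsParser`. It assumes plain prefix matching on `Disallow` values, ignores `Allow` overrides and wildcards (which errs on the side of not fetching), and the `isAllowed` method name is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

namespace {
// Strips leading and trailing whitespace (including a trailing '\r').
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}
// ASCII lowercase, used for case-insensitive field and agent names.
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
}  // namespace

// Sketch only: prefix-based Disallow matching, no Allow or wildcard support.
class RobotsParser {
public:
    explicit RobotsParser(const std::string& body) {
        std::istringstream in(body);
        std::string line;
        std::vector<std::string> agents;  // agents of the group being read
        bool inRules = false;             // true once the group has a rule line
        while (std::getline(in, line)) {
            line = trim(line.substr(0, line.find('#')));  // drop comments
            const auto colon = line.find(':');
            if (colon == std::string::npos) continue;
            const std::string key = lower(trim(line.substr(0, colon)));
            const std::string value = trim(line.substr(colon + 1));
            if (key == "user-agent") {
                // A User-agent line after rules starts a new group.
                if (inRules) { agents.clear(); inRules = false; }
                agents.push_back(lower(value));
                groups_[agents.back()];  // register the group even if it has no rules
            } else if (key == "disallow" || key == "allow") {
                inRules = true;  // Allow only ends the agent list; its value is ignored
                if (key == "disallow" && !value.empty()) {
                    for (const auto& a : agents) groups_[a].push_back(value);
                }
            }
        }
    }

    // True unless `path` starts with a Disallow prefix of the selected group.
    bool isAllowed(const std::string& userAgent, const std::string& path) const {
        auto it = groups_.find(lower(userAgent));
        if (it == groups_.end()) it = groups_.find("*");  // fall back to wildcard group
        if (it == groups_.end()) return true;             // no applicable rules
        for (const auto& prefix : it->second) {
            if (path.compare(0, prefix.size(), prefix) == 0) return false;
        }
        return true;
    }

private:
    std::map<std::string, std::vector<std::string>> groups_;  // agent -> Disallow prefixes
};
```

With this reading, a site that lists `Disallow: /private` under `*` and `Disallow: /search` under `MaxSearchEngineBot` blocks our bot from `/search` only, because the specific group replaces the wildcard group rather than adding to it. If the intent is instead to apply both groups together, `isAllowed` would need to check both.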

