feat(scrapegraph): migrate tool to scrapegraph-py v2 SDK #135

Open
VinciGit00 wants to merge 4 commits into CelestoAI:main from VinciGit00:feat/scrapegraph-sdk-v2

@VinciGit00 VinciGit00 commented Apr 22, 2026

Summary

  • Migrates the ScrapeGraphAI tool to the new scrapegraph-py 2.x SDK. The old Client class and its methods (smartscraper, markdownify, searchscraper, smartcrawler, sitemap) no longer exist upstream, so this is a full rewrite against the new endpoint surface.
  • Bumps the scrapegraph extra to scrapegraph-py>=2.1.0.
  • Accepts SGAI_API_KEY (the new SDK default) and falls back to the legacy SCRAPEGRAPH_API_KEY so existing users aren't broken.

New capabilities

| Capability | Maps to |
| --- | --- |
| scrape(url, format) | client.scrape(...) with Markdown/Html/Links/SummaryFormatConfig |
| extract(prompt, url, schema) | client.extract(...) — AI structured extraction |
| search(query, num_results, prompt) | client.search(...) |
| crawl(url, max_pages, max_depth, include/exclude) | client.crawl.start(...) |
| get_crawl_result(crawl_id) | client.crawl.get(...) |
| monitor(url, interval, name, webhook_url) | client.monitor.create(...) |
| credits() | client.credits() |
| health() | client.health() |

Responses from the SDK are ApiResult objects; the tool turns successful results into a JSON string and surfaces result.error as "Error in <capability>: ..." so the LLM gets a consistent string return.
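
A minimal sketch of that normalisation step, for readers who want to see the shape of it. It is not the tool's actual helper: `FakeApiResult` is a hypothetical stand-in for the SDK's `ApiResult`, the `"success"` status value is an assumption, and `format_result` only mirrors the behaviour described above.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeApiResult:
    """Hypothetical stand-in for the SDK's ApiResult (status, data, error)."""

    status: str
    data: Any = None
    error: Optional[str] = None


def format_result(result: FakeApiResult, capability: str) -> str:
    """Turn a result object into the single string the LLM sees."""
    # API-level failures arrive as a result object, not as an exception.
    if result.status == "error" or result.error:
        return f"Error in {capability}: {result.error}"
    # default=str keeps non-JSON-native values (dates, custom objects) from raising.
    return json.dumps(result.data, default=str)


print(format_result(FakeApiResult("success", {"title": "Example"}), "scrape"))
print(format_result(FakeApiResult("error", error="invalid URL"), "scrape"))
```

Returning a string in both branches means the agent never has to distinguish between a raised exception and an API-level error.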

Breaking change note

The previous capability names (smartscraper, markdownify, etc.) are removed. Any agent prompt that hard-coded those names needs to be updated — see examples/scrapegraphai_example.py for the new surface.

Test plan

  • pytest tests/tools/test_scrapegraphai.py — 16/16 pass (success paths, API error paths, exception paths, missing-dep guard, env-var resolution including legacy fallback)
  • Live-tested against the ScrapeGraphAI API: health, credits, scrape, extract, search, crawl (start + get_crawl_result poll to completion), and error path for invalid URL
  • Reviewer sanity-check on the example

Summary by CodeRabbit

  • New Features

    • Added full ScrapeGraphAI v2 support: scrape, extract, search, crawl, monitor, credits, and health; selectable scrape formats (markdown, html, links, summary).
  • Documentation

    • Updated examples and demo prompts to reflect v2 workflows and added a final "Credits" demo.
    • Primary env var renamed to SGAI_API_KEY (legacy SCRAPEGRAPH_API_KEY still supported).
  • Chores

    • Tightened dependency constraints and bumped ScrapeGraphAI SDK requirement to v2.1.0+ (Python 3.12+).
  • Tests

    • Updated tests to exercise new v2 capabilities and behaviors.

Greptile Summary

This PR migrates the ScrapeGraphAI tool from the deprecated v1 Client API to the scrapegraph-py v2 SDK, replacing the old smartscraper/markdownify/etc. capabilities with a new surface (scrape, extract, search, crawl, monitor, credits, health). The dependency pin is bumped to scrapegraph-py>=2.1.0 with a python_version >= '3.12' marker.

  • src/agentor/tools/scrapegraphai.py: Full rewrite with a _FORMAT_BUILDERS dispatch table for selectable scrape formats, a _format_result helper that normalises SDK ApiResult objects into LLM-friendly strings, and a legacy SCRAPEGRAPH_API_KEY env-var fallback so existing users aren't broken.
  • tests/tools/test_scrapegraphai.py: 16 new tests covering every capability, error paths, missing-dep guard, and all three env-var resolution cases (explicit, SGAI_API_KEY, and legacy fallback).
  • pyproject.toml: scrapegraph extra bumped to >=2.1.0 with a Python 3.12+ marker; the all group is updated in parallel.

Confidence Score: 5/5

Safe to merge; the migration is straightforward, all 8 capabilities are well-tested, and the legacy env-var fallback preserves backward compatibility for existing users.

The rewrite is clean and well-tested. The only noteworthy gap is that the ImportError message on Python 3.10/3.11 would redirect users to run an agentor[scrapegraph] install that already completed silently due to the version marker, but this does not affect runtime correctness for any supported path.

pyproject.toml and the ImportError block in scrapegraphai.py — the Python 3.12+ version marker creates a silent no-install on older runtimes without any hint in the error message.
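
One possible way to close that gap, sketched under assumptions: `missing_dependency_message` and `MIN_PYTHON` are hypothetical names, and the `pip install` wording is illustrative rather than the message the tool currently raises. The idea is only that the ImportError text branches on the interpreter version.

```python
import sys
from typing import Tuple

# Mirrors the python_version >= '3.12' marker on the scrapegraph extra.
MIN_PYTHON = (3, 12)


def missing_dependency_message(
    version_info: Tuple[int, int] = sys.version_info[:2],
) -> str:
    """Build an ImportError message that accounts for the version marker."""
    if tuple(version_info) < MIN_PYTHON:
        running = ".".join(str(part) for part in version_info)
        # On older interpreters the extra installs nothing, so re-running
        # the install would not help; say so explicitly.
        return (
            "scrapegraph-py 2.x requires Python 3.12+; "
            f"this interpreter is {running}, so the agentor[scrapegraph] "
            "extra installs nothing here."
        )
    return "scrapegraph-py is not installed. Run: pip install 'agentor[scrapegraph]'"


print(missing_dependency_message((3, 11)))
```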

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agentor/tools/scrapegraphai.py | Full rewrite to scrapegraph-py v2 SDK with 8 capabilities; well-structured with consistent error handling. The ImportError message doesn't mention the Python 3.12+ requirement, which will confuse users on 3.10/3.11. |
| tests/tools/test_scrapegraphai.py | 16 tests covering success paths, API error paths, exception handling, missing-dep guard, and all three env-var resolution cases; well isolated with proper mock teardown. |
| pyproject.toml | Bumps scrapegraph extra to scrapegraph-py>=2.1.0 with a python_version >= '3.12' marker; creates a silent no-install on Python 3.10/3.11 that doesn't match the top-level requires-python = '>=3.10'. |
| examples/scrapegraphai_example.py | Updated to v2 capability names and SGAI_API_KEY env var; demonstrates scrape, extract, search, crawl, and credits flows. |

Sequence Diagram

sequenceDiagram
    participant Agent as LLM Agent
    participant Tool as ScrapeGraphAI Tool
    participant SDK as scrapegraph-py v2 Client
    participant API as ScrapeGraphAI API

    Agent->>Tool: scrape(url, format)
    Tool->>Tool: _FORMAT_BUILDERS[format]()
    Tool->>SDK: "client.scrape(url, formats=[...])"
    SDK->>API: POST /scrape
    API-->>SDK: ApiResult
    SDK-->>Tool: ApiResult(status, data, error)
    Tool->>Tool: _format_result(result, scrape)
    Tool-->>Agent: JSON string or Error in scrape

    Agent->>Tool: crawl(url, max_pages, max_depth)
    Tool->>SDK: client.crawl.start(url, formats, max_pages, max_depth)
    SDK->>API: POST /crawl/start
    API-->>SDK: ApiResult(crawl_id)
    Tool-->>Agent: crawl_id JSON

    Agent->>Tool: get_crawl_result(crawl_id)
    Tool->>SDK: client.crawl.get(crawl_id)
    SDK->>API: GET /crawl/id
    API-->>SDK: ApiResult(status, pages)
    Tool-->>Agent: JSON string or error
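
The crawl half of the diagram implies a poll loop on the caller's side. A rough sketch follows, with loud assumptions: the `status` field and the `"completed"` / `"failed"` values are guesses at the response shape, and `poll_crawl` is a hypothetical helper, not part of the tool.

```python
import json
import time
from typing import Callable

# Assumed terminal states; the real API's status values may differ.
TERMINAL_STATES = {"completed", "failed"}


def poll_crawl(
    get_crawl_result: Callable[[str], str],
    crawl_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call get_crawl_result until the job reaches a terminal state."""
    for _ in range(max_attempts):
        raw = get_crawl_result(crawl_id)
        # The tool reports failures as plain strings, not JSON.
        if raw.startswith("Error in "):
            raise RuntimeError(raw)
        payload = json.loads(raw)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        time.sleep(interval_s)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_attempts} polls")


# Stub that finishes on the third poll, standing in for the real tool method.
_responses = iter(
    [
        '{"status": "running"}',
        '{"status": "running"}',
        '{"status": "completed", "pages": []}',
    ]
)
print(poll_crawl(lambda _id: next(_responses), "crawl-123", interval_s=0))
```

Bounding the loop with `max_attempts` keeps an agent from polling forever if a job stalls.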

Reviews (3): Last reviewed commit: "test(scrapegraphai): patch FormatConfig ..."

The scrapegraph-py 2.x SDK replaces the old `Client` with `ScrapeGraphAI`
and returns `ApiResult` objects instead of raising exceptions. The old
capability surface (smartscraper, markdownify, searchscraper, smartcrawler,
sitemap) no longer exists upstream, so this is a full rewrite of the tool
against the new endpoints.

Capabilities exposed:
  - scrape(url, format)           — markdown/html/links/summary
  - extract(prompt, url, schema)  — AI structured extraction
  - search(query, num_results)    — web search + optional extraction
  - crawl(url, max_pages, ...)    — start a crawl job
  - get_crawl_result(crawl_id)    — poll crawl status/result
  - monitor(url, interval, ...)   — schedule a page monitor (cron)
  - credits()                     — plan / remaining credits
  - health()                      — API health check

Also:
  - Bump `scrapegraph-py` optional dep to `>=2.1.0`
  - Accept `SGAI_API_KEY` (new SDK default), with fallback to legacy
    `SCRAPEGRAPH_API_KEY` so existing users aren't broken
  - Tests cover success, API-level error (ApiResult.status=="error"),
    exception paths, missing-dep guard, and env-var resolution
  - Example rewritten to exercise the new surface

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 22, 2026

Walkthrough

Updated ScrapeGraph integration to v2: examples and tests now target new methods (scrape, extract, search, crawl, monitor, credits, health), and pyproject optional dependency for scrapegraph-py bumped to >=2.1.0 with a Python >=3.12 marker.

Changes

SDK Version Constraint

| Layer / File(s) | Summary |
| --- | --- |
| Upgrade scrapegraph optional deps (pyproject.toml) | scrapegraph-py requirement changed to >=2.1.0 and an environment marker python_version >= '3.12' was added in both the scrapegraph optional group and the all group. Also bound a2a-sdk to <1.0. |

Tool & Examples

| Layer / File(s) | Summary |
| --- | --- |
| Tool implementation and API surface (src/agentor/tools/scrapegraphai.py) | Replaced legacy capabilities with ScrapeGraphAI v2 methods (scrape, extract, search, crawl, get_crawl_result, monitor, credits, health), added ScrapeFormat type, format-config builder, result serialization helpers, and updated API key resolution to prefer SGAI_API_KEY (falls back to SCRAPEGRAPH_API_KEY). |
| Example & tests updated (examples/scrapegraphai_example.py, tests/tools/test_scrapegraphai.py) | Example updated to v2 method names and SGAI_API_KEY; tests refactored to mock _SGAIClient v2 surface and assert argument forwarding and structured responses; registered-tools assertions updated to expect the eight new tool names. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

We swapped the old for something sharp and new,
Keys renamed, methods tidy — neat and true,
A cleaner set of tools to make the run,
Dependencies lifted — the work is done,
Raise a glass, keep walking — business won.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.48% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(scrapegraph): migrate tool to scrapegraph-py v2 SDK' directly and clearly summarizes the primary change—migrating the ScrapeGraphAI tool to the new v2 SDK. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment on lines +127 to +133
             prompt: Optional extraction prompt applied to the results.
         """
         try:
             result = self.client.search(query, num_results=num_results, prompt=prompt)
             return _format_result(result, "search")
         except Exception as e:
-            logger.exception("ScrapeGraphAI SearchScraper Error")
-            return f"Error in searchscraper: {str(e)}"
+            logger.exception("ScrapeGraphAI search error")

P1 JsonFormatConfig imported but never used

JsonFormatConfig is imported (and nulled out in the fallback) but never referenced in _FORMAT_BUILDERS, in crawl/monitor, or anywhere else. Ruff will flag this as F401 and fail the lint pre-commit hook / CI linter step. Either add "json" as a supported format in _FORMAT_BUILDERS/ScrapeFormat, or drop the import entirely.

Suggested change
from scrapegraph_py import (
    HtmlFormatConfig,
    LinksFormatConfig,
    MarkdownFormatConfig,
    SummaryFormatConfig,
)

Comment on lines +139 to +144
        url: str,
        max_pages: int = 10,
        max_depth: int = 2,
        include_patterns: Optional[List[str]] = None,
        exclude_patterns: Optional[List[str]] = None,
    ) -> str:

P1 JsonFormatConfig also missing from fallback

The except ImportError fallback block does not assign JsonFormatConfig = None. If the import is kept and the library is absent, any reference to JsonFormatConfig would raise a NameError rather than degrade gracefully. If the import is removed from the try-block (see above), also remove it from the fallback.

Suggested change
    _SGAIClient = None
    MarkdownFormatConfig = None
    HtmlFormatConfig = None
    LinksFormatConfig = None
    SummaryFormatConfig = None

Comment on lines 188 to +195
         Args:
-            website_url: The URL of the website to crawl
-            user_prompt: Prompt describing what to extract (used when extraction_mode=True)
-            max_depth: Maximum depth of crawling (default: 1)
-            max_pages: Maximum number of pages to crawl (default: 3)
-            sitemap: Whether to use sitemap for crawling (default: True)
-            extraction_mode: Whether to use extraction mode (requires data_schema if True, default: False)
-            data_schema: Data schema for extraction (required if extraction_mode=True)
+            url: Page to monitor.
+            interval: Cron expression, e.g. "0 * * * *" for hourly.
+            name: Optional monitor name.
+            webhook_url: Optional webhook to receive change notifications.
         """
         try:
-            crawl_params = {
-                "url": website_url,
-                "depth": max_depth,
-                "max_pages": max_pages,
-                "sitemap": sitemap,
-                "extraction_mode": extraction_mode,
-            }
-
-            # Include prompt and data_schema only when extraction_mode=True
-            if extraction_mode:
-                if data_schema is None:
-                    raise ValueError(
-                        "data_schema is required when extraction_mode=True"
-                    )
-                crawl_params["prompt"] = user_prompt
-                crawl_params["data_schema"] = data_schema
-            response = self.client.crawl(**crawl_params)
-            return str(response)
+            result = self.client.monitor.create(

P2 self.api_key stores unresolved value

super().__init__(api_key) is called before resolved_key is computed, so self.api_key ends up holding None (or the raw, unresolved argument) even when the key was actually read from an environment variable. Any code that later reads tool.api_key to inspect the active credential will see None. Computing resolved_key first and then passing it to super().__init__ would keep the stored attribute consistent with what self.client was initialised with.
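
The reordering described here could look roughly like the sketch below. `BaseTool` is a hypothetical stand-in for the tool's real base class, and `client_key` is only a placeholder for the `_SGAIClient(api_key=...)` call, so treat this as an illustration of the ordering rather than the actual constructor.

```python
import os
from typing import Optional


class BaseTool:
    """Hypothetical stand-in for the tool base class; it only stores the key."""

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key


class ScrapeGraphTool(BaseTool):
    def __init__(self, api_key: Optional[str] = None):
        # Resolve first: explicit argument, then the new env var, then the legacy one.
        resolved_key = (
            api_key
            or os.environ.get("SGAI_API_KEY")
            or os.environ.get("SCRAPEGRAPH_API_KEY")
        )
        # Passing the resolved value keeps self.api_key in sync with the client.
        super().__init__(resolved_key)
        self.client_key = resolved_key  # placeholder for _SGAIClient(api_key=...)


os.environ.pop("SGAI_API_KEY", None)
os.environ["SCRAPEGRAPH_API_KEY"] = "legacy-key"
print(ScrapeGraphTool().api_key)
```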

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
examples/scrapegraphai_example.py (1)

24-24: Small thing — the explicit os.environ.get(...) is redundant.

The constructor already resolves SGAI_API_KEY (and the legacy one) on its own, so passing it in from os.environ.get is belt-and-braces. Not wrong, just noise. You could simply write ScrapeGraphAI() here and let the tool do its job. Keep it if you prefer explicitness — no harm done.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/scrapegraphai_example.py` at line 24, The example instantiates
ScrapeGraphAI by explicitly passing os.environ.get("SGAI_API_KEY"), which is
redundant because the ScrapeGraphAI constructor already resolves SGAI_API_KEY
(and the legacy key) internally; update the instantiation to call
ScrapeGraphAI() with no arguments (i.e., remove the os.environ.get(...)
argument) so the constructor handles env var resolution itself, leaving the rest
of the example unchanged.
src/agentor/tools/scrapegraphai.py (3)

74-92: The parameter name format is shadowing a Python builtin — tidy it up.

Ruff's already whistling at us (A002) on line 74. It won't break anything today, but any code inside scrape that reaches for the builtin format() will be in for a surprise. Rename it, and the Literal type stays just as tight.

♻️ Proposed rename
-    def scrape(self, url: str, format: ScrapeFormat = "markdown") -> str:
+    def scrape(self, url: str, output_format: ScrapeFormat = "markdown") -> str:
         """Fetch a webpage and return its content in the requested format.
 
         Args:
             url: The URL to scrape.
-            format: One of "markdown", "html", "links", "summary". Defaults to markdown.
+            output_format: One of "markdown", "html", "links", "summary". Defaults to markdown.
         """
         try:
-            builder = _FORMAT_BUILDERS.get(format)
+            builder = _FORMAT_BUILDERS.get(output_format)
             if builder is None:
                 return (
-                    f"Error in scrape: unsupported format '{format}'. "
+                    f"Error in scrape: unsupported format '{output_format}'. "
                     "Use one of: markdown, html, links, summary."
                 )
             result = self.client.scrape(url, formats=[builder()])
             return _format_result(result, "scrape")

Mind you — this is a public capability signature, and the tests in tests/tools/test_scrapegraphai.py (lines 32, 43) and the example docstring currently call it as format=.... If you take this route, update those too, or slap a # noqa: A002 on the line and leave the signature alone. Your call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 74 - 92, The method scrape
currently uses the parameter name format which shadows the built-in format()
causing linter A002; rename the parameter (for example to out_format or fmt) in
the scrape signature (def scrape(self, url: str, out_format: ScrapeFormat =
"markdown") -> str), update all internal uses (the lookup
_FORMAT_BUILDERS.get(format) -> _FORMAT_BUILDERS.get(out_format) and any
references to format within the function such as the client.scrape call and
result formatting), and update the public/API callers and tests
(tests/tools/test_scrapegraphai.py, example docstring) to pass the new parameter
name or call positionally; if you prefer to keep the public name, remove the
change and instead add a `# noqa: A002` comment to the original parameter to
silence the linter.

81-225: Same try/except/log/format dance repeated eight times — worth a little decorator.

Every capability does the same thing: call the SDK, format the result, catch Exception, log, return f"Error in <name>: ...". It's clean enough now, but the next capability you add will copy-paste the same seven lines. A thin wrapper keeps the intent obvious and the surface tight.

♻️ Sketch of a wrapper
from functools import wraps

def _safe_capability(name: str):
    def deco(fn):
        @wraps(fn)
        def inner(self, *args, **kwargs):
            try:
                result = fn(self, *args, **kwargs)
                return _format_result(result, name)
            except Exception as e:
                logger.exception("ScrapeGraphAI %s error", name)
                return f"Error in {name}: {e}"
        return inner
    return deco

Then each capability just returns the raw SDK result (or the unsupported-format string, which would need a small tweak). Not a blocker — file as "next time you're in here."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 81 - 225, Introduce a small
decorator (e.g. _safe_capability) and apply it to each capability method
(scrape, extract, search, crawl, get_crawl_result, monitor, credits, health) to
centralize the try/except/log/_format_result pattern: the decorator should call
the wrapped method, if the return is a string (existing error message like the
unsupported format case in scrape) return it unchanged, otherwise call
_format_result(result, name); on exception log with
logger.exception("ScrapeGraphAI %s error", name) and return f"Error in {name}:
{e}". Update each capability to return the raw SDK result (or the existing
string error) and remove the repeated try/except blocks so the decorator handles
them.

186-205: The monitor method locks formats to Markdown—same concern as crawl. Worth discussing.

Right, listen. You've spotted something worth noting here. The scrape method lets callers pick their format using that ScrapeFormat knob. Markdown, HTML, links, summary—the lot. But monitor and crawl both hardcode MarkdownFormatConfig() with no way round it. The web confirms monitor.create supports the formats parameter, so the capability's there—it's just not wired up.

It's workable as is, mind you. Markdown's a sensible default for scheduled monitors. But if an agent needs to ask for HTML or a summary on a scheduled run, they've got nothing. The _FORMAT_BUILDERS mapping already exists and handles all four formats cleanly.

The suggested implementation follows the scrape pattern directly: add format: ScrapeFormat = "markdown" to the signature, use the builder, handle unsupported formats properly. Straightforward piece of work, no complications. Low priority for now, but worth considering when you next touch this code.
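
To make the suggestion concrete, here is a self-contained sketch of a format-aware monitor call. Everything in it is a stand-in: the four `*Config` classes replace the SDK's FormatConfig types, `FakeMonitorApi.create` assumes a `formats=` keyword as the comment above reports, and the parameter is named `output_format` only to avoid shadowing the builtin.

```python
from typing import Callable, Dict, Optional, Union


class MarkdownConfig:
    pass


class HtmlConfig:
    pass


class LinksConfig:
    pass


class SummaryConfig:
    pass


# Stand-in for _FORMAT_BUILDERS: format name -> zero-argument builder.
FORMAT_BUILDERS: Dict[str, Callable[[], object]] = {
    "markdown": MarkdownConfig,
    "html": HtmlConfig,
    "links": LinksConfig,
    "summary": SummaryConfig,
}


class FakeMonitorApi:
    """Records the call instead of hitting the network (signature is assumed)."""

    def create(self, url, interval, formats, name=None, webhook_url=None):
        return {
            "url": url,
            "interval": interval,
            "formats": [type(f).__name__ for f in formats],
        }


def monitor(
    api: FakeMonitorApi,
    url: str,
    interval: str,
    output_format: str = "markdown",
    name: Optional[str] = None,
    webhook_url: Optional[str] = None,
) -> Union[dict, str]:
    builder = FORMAT_BUILDERS.get(output_format)
    if builder is None:
        # Reject unknown formats before any SDK call is made.
        supported = ", ".join(FORMAT_BUILDERS)
        return (
            f"Error in monitor: unsupported format '{output_format}'. "
            f"Use one of: {supported}."
        )
    return api.create(
        url, interval, formats=[builder()], name=name, webhook_url=webhook_url
    )


print(monitor(FakeMonitorApi(), "https://example.com", "0 * * * *", "summary"))
```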

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 186 - 205, The monitor
method currently hardcodes Markdown by passing MarkdownFormatConfig() to
self.client.monitor.create; change the signature of monitor (the monitor method)
to accept a format: ScrapeFormat = "markdown" parameter, use the existing
_FORMAT_BUILDERS mapping to build the appropriate format config (like scrape
does), replace the hardcoded MarkdownFormatConfig() with the builder output, and
raise/handle an error if the provided format is unsupported before calling
self.client.monitor.create so monitors can be scheduled in HTML/links/summary as
well as markdown.

📥 Commits

Reviewing files that changed from the base of the PR and between 8eab1f4 and e9732e8.

📒 Files selected for processing (4)
  • examples/scrapegraphai_example.py
  • pyproject.toml
  • src/agentor/tools/scrapegraphai.py
  • tests/tools/test_scrapegraphai.py

Comment on lines +66 to +71
        resolved_key = (
            api_key
            or os.environ.get("SGAI_API_KEY")
            or os.environ.get("SCRAPEGRAPH_API_KEY")
        )
        self.client = _SGAIClient(api_key=resolved_key)

⚠️ Potential issue | 🟡 Minor

Silent fallback to None when no API key is resolved.

If the caller passes nothing and neither env var is set, resolved_key quietly becomes None and gets shovelled into the SDK. The SDK will eventually bark, but the error won't be as clean as the one we raise for missing deps just above. A quick guard here saves a confusing stack trace down the road.

🛡️ Proposed guard
         resolved_key = (
             api_key
             or os.environ.get("SGAI_API_KEY")
             or os.environ.get("SCRAPEGRAPH_API_KEY")
         )
+        if not resolved_key:
+            raise ValueError(
+                "ScrapeGraphAI API key not provided. Pass `api_key=...` or set "
+                "SGAI_API_KEY (or legacy SCRAPEGRAPH_API_KEY) in the environment."
+            )
         self.client = _SGAIClient(api_key=resolved_key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agentor/tools/scrapegraphai.py` around lines 66 - 71, The code silently
passes a None API key into _SGAIClient by assigning resolved_key from api_key or
env vars; add a guard after computing resolved_key (before calling _SGAIClient)
to check if resolved_key is falsy and raise a clear exception (e.g., ValueError
or RuntimeError) with a message instructing the caller to provide api_key or set
SGAI_API_KEY/SCRAPEGRAPH_API_KEY; update the instantiation site where
self.client = _SGAIClient(api_key=resolved_key) to run only after the check so
the error is explicit and not a downstream SDK stack trace.

VinciGit00 and others added 3 commits May 26, 2026 09:24
scrapegraph-py 2.x only ships wheels for Python >= 3.12, so `uv sync`
failed resolution across every CI Python version (the lockfile must
satisfy every interpreter in `requires-python = ">=3.10"`).

Adding the marker keeps agentor itself installable on 3.10/3.11; the
scrapegraph extra simply becomes a no-op there, which the tool's
existing ImportError guard already handles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

a2a-sdk 1.0 moved JSONRPCResponse (and the rest of the 0.3 surface) from
a2a.types into a2a.compat.v0_3.types, so `agentor/core/agent.py:24`
fails to import on the latest SDK and every test module errors at
collection time.

Pinning to the 0.x line restores the import. Main hasn't run pytest
since the 1.0 release so this slipped past; a follow-up PR can migrate
to the 1.x API.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`uv sync` in CI doesn't install the `scrapegraph` extra, so the
top-level `MarkdownFormatConfig` (et al.) imports in
agentor/tools/scrapegraphai.py fall through to the ImportError branch
and become `None`. The tool's `_FORMAT_BUILDERS` lambdas then raise
`TypeError: 'NoneType' object is not callable` the first time scrape /
crawl / monitor are exercised.

Patch the four FormatConfig module attributes in setUp so the tests no
longer require scrapegraph-py to be installed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
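
For context, the failure and the setUp patch from that last commit can be reproduced in isolation. This is a sketch, not the PR's test file: `tool_module` is a `SimpleNamespace` standing in for agentor/tools/scrapegraphai.py after the ImportError fallback, and the builder table holds a single entry for brevity.

```python
import types
import unittest
from unittest.mock import MagicMock, patch

# Stand-in for the tool module after the ImportError fallback ran:
# the FormatConfig names exist but are None.
tool_module = types.SimpleNamespace(
    MarkdownFormatConfig=None,
    HtmlFormatConfig=None,
    LinksFormatConfig=None,
    SummaryFormatConfig=None,
)

# The lambda looks the attribute up at call time, so patching the module
# attribute is enough; the table itself does not need rebuilding.
FORMAT_BUILDERS = {"markdown": lambda: tool_module.MarkdownFormatConfig()}

CONFIG_NAMES = (
    "MarkdownFormatConfig",
    "HtmlFormatConfig",
    "LinksFormatConfig",
    "SummaryFormatConfig",
)


class FormatBuilderTest(unittest.TestCase):
    def setUp(self):
        for name in CONFIG_NAMES:
            patcher = patch.object(tool_module, name, MagicMock(name=name))
            patcher.start()
            # addCleanup restores the original None after each test.
            self.addCleanup(patcher.stop)

    def test_builder_is_callable_without_sdk(self):
        self.assertIsNotNone(FORMAT_BUILDERS["markdown"]())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatBuilderTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Without the patch, calling the builder raises the `TypeError` quoted in the commit message, because `None` is not callable.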