feat: add Tavily Extract as alternative web content reader in visit_web #2

Open
tavily-integrations wants to merge 1 commit into PolarSeeker:main from Tavily-FDE:feat/tavily-migration/jina-to-tavily-extract

@tavily-integrations

Summary

Adds Tavily Extract as a configurable alternative to Jina for URL content extraction in visit_web.py, selected via the VISIT_PROVIDER environment variable. The change is additive: the existing Jina path is fully preserved and remains the default.

When VISIT_PROVIDER=tavily, page content is fetched via TavilyClient.extract() and then passed through the same EXTRACTOR_PROMPT + LLM summarization pipeline used by the Jina path, so downstream code receives the same output regardless of provider.
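
A minimal sketch of that shared summarization step follows. It assumes the LLM is exposed as a plain callable that takes a prompt string and returns text. The `EXTRACTOR_PROMPT` template below is a placeholder with `{goal}` and `{webpage_content}` fields, because the real prompt text is not shown in this PR. `summarize_page`, `max_chars`, and the fallback dict shape are hypothetical names chosen for illustration.

```python
import json
from typing import Callable

# Placeholder template (assumption): the real EXTRACTOR_PROMPT in visit_web.py is not shown here.
EXTRACTOR_PROMPT = (
    "Goal: {goal}\n"
    "Webpage content:\n{webpage_content}\n"
    'Return JSON with keys "evidence" and "summary".'
)


def summarize_page(content: str, goal: str, llm: Callable[[str], str],
                   max_chars: int = 20_000) -> dict:
    """Run page text through EXTRACTOR_PROMPT -> LLM -> JSON parse (hypothetical helper).

    Both readers would call this with plain text, so the output shape does not
    depend on which provider fetched the page.
    """
    # Truncate long pages so the prompt stays within the model's context window.
    prompt = EXTRACTOR_PROMPT.format(goal=goal, webpage_content=content[:max_chars])
    reply = llm(prompt).strip()

    # Models sometimes wrap the JSON in extra prose; parse the outermost {...} span.
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(reply[start:end + 1])
        except json.JSONDecodeError:
            pass

    # Fall back to the raw reply instead of raising.
    return {"evidence": "", "summary": reply}
```

Because both readers hand plain text to one helper, switching providers changes only how the page is fetched, not how it is summarized.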

Changes

tools/tool/visit_web.py

  • Imported TavilyClient from tavily
  • Added VISIT_PROVIDER env var lookup (default: jina)
  • Lazily initialized _tavily_client only when the provider is tavily
  • Added tavily_readpage(url): fetches raw page content via the Tavily Extract API
  • Added readpage_tavily(url, goal): feeds the Tavily content through the existing LLM summarization pipeline
  • Added readpage(url, goal): a dispatcher that routes to Tavily or Jina based on VISIT_PROVIDER
  • Updated visit_web() to use the readpage() dispatcher instead of calling readpage_jina() directly

config/.env

  • Added VISIT_PROVIDER=jina (the default)
  • Added an empty TAVILY_API_KEY= placeholder

requirements.txt

  • Added tavily-python

Notes for reviewers

  • The Tavily client is only initialized when VISIT_PROVIDER=tavily, so no API key is required for the default Jina path
  • The summarization pipeline (EXTRACTOR_PROMPT → LLM → JSON parse) is identical between both paths
  • All existing Jina code, retry loops, and env vars are untouched

Automated Review

  • Passed after 1 attempt(s)
  • Final review: The jina-to-tavily-extract migration is correct and complete. It adds Tavily Extract as a configurable alternative to Jina via the VISIT_PROVIDER env var, preserves all existing Jina logic, wires the dispatcher into visit_web(), and updates requirements.txt and config/.env. The Tavily SDK usage (TavilyClient.extract()) is correct. A few minor, non-blocking issues were noted: code duplication, a potential conflict with the prerequisite unit's additions to requirements.txt and config/.env, and a null-safety gap in tavily_readpage().
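
One way to close the null-safety gap mentioned in the review is to treat a missing response, an empty `results` list, or a `None` or blank `raw_content` as a failed fetch instead of indexing into the response directly. The sketch below assumes the response shape returned by `TavilyClient.extract()` in tavily-python: a dict with a `results` list whose entries carry `raw_content`. It takes the client as a parameter so a fake can stand in for it. The `error_message` return convention is hypothetical, because this PR does not show how the real `tavily_readpage()` reports failure.

```python
from typing import Any, Optional


def tavily_readpage(url: str, client: Any,
                    error_message: str = "[visit] Failed to read page.") -> str:
    """Fetch raw page text via Tavily Extract, tolerating empty or partial responses."""
    try:
        response: Optional[dict] = client.extract(urls=[url])
    except Exception as exc:  # network, auth, or rate-limit errors from the SDK
        return f"{error_message} ({type(exc).__name__})"

    # Guard against a None response and a missing or None "results" field.
    results = (response or {}).get("results") or []
    for item in results:
        # Only one URL was requested, so take the first entry with non-blank text.
        content = (item or {}).get("raw_content")
        if content and content.strip():
            return content
    return error_message
```

Returning a sentinel string rather than raising lets the caller's existing retry or error-reporting logic decide what to do, which is the usual pattern for tool functions that feed text back to an agent.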
