Add Playwright-backed row extraction and local LLM provider support #145
AdamEXu wants to merge 18 commits into
# Conflicts:
#	backend/src/mastra/tools/investigate-tool.ts
WIP: Still need to add sandboxing and other important features like that to make sure it's good and stuff
have not tested the latest commit at all yet btw |
Summary
This PR adds a Playwright-backed row extraction path that can build, validate, cache, run, and repair reusable extractors through TinyFish Browser. It also carries the supporting fixes needed to make that reliable in practice: richer schema contracts, provider-selectable LLM setup, row extractor settings, better refresh/populate cancellation, and UI/backend wiring for the new configuration.
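As a rough illustration of the build, validate, cache, run, and repair cycle described above, here is a minimal sketch. It is not the PR's code: `Row`, `Extractor`, `Lifecycle`, `extractRows`, and `maxRepairs` are all hypothetical names, and the extractor is modeled as a synchronous function over an HTML string instead of a real Playwright script, purely to keep the example self-contained.

```typescript
// Hypothetical types for illustration only; none of these names come from the PR.
type Row = Record<string, string>;
type Extractor = { version: number; run: (html: string) => Row[] };

interface Lifecycle {
  build: () => Extractor;                    // e.g. ask an LLM to write a script
  repair: (broken: Extractor) => Extractor;  // e.g. ask an LLM to patch a failing one
  validate: (rows: Row[]) => boolean;        // e.g. check rows against a schema contract
}

// Cache keyed by page family, so one extractor can serve every page of that shape.
const cache = new Map<string, Extractor>();

function extractRows(
  pageFamily: string,
  html: string,
  lifecycle: Lifecycle,
  maxRepairs = 2,
): Row[] {
  // Reuse a cached extractor when one exists; otherwise build a fresh one.
  let extractor = cache.get(pageFamily) ?? lifecycle.build();

  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    let rows: Row[] = [];
    try {
      rows = extractor.run(html);
    } catch {
      rows = []; // a throwing extractor is handed to validation as an empty result
    }

    if (lifecycle.validate(rows)) {
      cache.set(pageFamily, extractor); // only extractors that just validated are cached
      return rows;
    }
    if (attempt < maxRepairs) {
      extractor = lifecycle.repair(extractor);
    }
  }

  cache.delete(pageFamily); // drop a stale entry so the next call rebuilds from scratch
  throw new Error(
    `extractor for "${pageFamily}" failed validation after ${maxRepairs} repairs`,
  );
}
```

In this sketch an extractor is cached only after its output passes validation, so a script that happens to run without throwing but returns junk is never reused.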
What changed
Validation
git diff --check upstream/main...HEAD
cd backend && npm run build
cd frontend && npm run lint (passed with warnings only: existing/new <img> warnings, React Compiler/TanStack warning, generated Convex eslint-disable warnings, and an existing analytics hook dependency warning)
cd frontend && npm run build
Notes for reviewers
The biggest behavioral change is that row collection can now move from one-off agent investigation to reusable browser extractors when the dataset has a stable page family. The provider/model and schema-contract changes are included because the extractor builder needs explicit model selection, validation rules, and typed normalization to avoid producing brittle scripts.
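To make "validation rules and typed normalization" a little more concrete, the following is a small sketch of what a schema contract could look like. It is an assumption for illustration, not the contract format this PR defines: `FieldContract`, `SchemaContract`, `normalizeValue`, and `normalizeRow` are hypothetical names, and the coercion rules (comma stripping, yes/no booleans) are arbitrary choices.

```typescript
// Hypothetical contract shape; the PR's real schema contract may differ.
type FieldType = "string" | "number" | "boolean";
interface FieldContract { type: FieldType; required: boolean }
type SchemaContract = Record<string, FieldContract>;

type Value = string | number | boolean | null;
type NormalizedRow = Record<string, Value>;

// Coerce one raw scraped string into the declared type, or null if it does not fit.
function normalizeValue(raw: string, type: FieldType): Value {
  const text = raw.trim();
  if (text === "") return null;
  switch (type) {
    case "string":
      return text;
    case "number": {
      // Strip thousands separators and inner whitespace before parsing.
      const n = Number(text.replace(/[,\s]/g, ""));
      return Number.isFinite(n) ? n : null;
    }
    case "boolean": {
      const lower = text.toLowerCase();
      if (["true", "yes", "1"].includes(lower)) return true;
      if (["false", "no", "0"].includes(lower)) return false;
      return null;
    }
  }
}

// Normalize every contracted field and collect errors for required fields
// that are missing or could not be coerced. Keys not in the contract are dropped.
function normalizeRow(
  raw: Record<string, string>,
  contract: SchemaContract,
): { row: NormalizedRow; errors: string[] } {
  const row: NormalizedRow = {};
  const errors: string[] = [];
  for (const [name, field] of Object.entries(contract)) {
    const value = normalizeValue(raw[name] ?? "", field.type);
    row[name] = value;
    if (value === null && field.required) {
      errors.push(`missing or invalid required field: ${name}`);
    }
  }
  return { row, errors };
}
```

A generated extractor whose rows come back with a non-empty `errors` list could then be rejected or sent for repair instead of being cached, which is one way a contract can keep brittle scripts from being reused.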