fix(douban): read search results from page data #1213
warkcod wants to merge 4 commits into jackwener:main
@warkcod thanks for the PR. The Douban page-data fallback is fine, but our reviewers found two browser-core blockers in
Blocker 1: named-tab hijack
When
Fix: in named path, only
Blocker 2: registered Electron apps get implicitly covered
Fix:
Combined patch (apply with
@warkcod follow-up: a third blocker on the adapter side.
Blocker 3: silent empty-result when both DOM and page-data are missing
The current
Fix: when both DOM rendering and page-data are missing, throw
Sequential patch (apply after the cdp combined patch)
diff --git a/clis/douban/utils.js b/clis/douban/utils.js
index c038999..a094e7f 100644
--- a/clis/douban/utils.js
+++ b/clis/douban/utils.js
@@ -564,7 +564,8 @@ export async function searchDouban(page, type, keyword, limit) {
const normalize = (value) => (value || '').replace(/\\s+/g, ' ').trim();
const seen = new Set();
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
- const rawItems = Array.isArray(window.__DATA__?.items) ? window.__DATA__.items : [];
+ const hasPageDataItems = Array.isArray(window.__DATA__?.items);
+ const rawItems = hasPageDataItems ? window.__DATA__.items : [];
const rawItemsById = new Map(
rawItems
.map((item) => [String(item?.id || '').trim(), item])
@@ -648,10 +649,19 @@ export async function searchDouban(page, type, keyword, limit) {
if (results.length >= \${safeLimit}) break;
}
}
- return results;
+ return {
+ results,
+ hasRenderedItems: items.length > 0,
+ hasPageDataItems,
+ };
})()
\`);
});
+ if (Array.isArray(data)) return data;
+ if (!data?.results?.length && data?.hasRenderedItems === false && data?.hasPageDataItems === false) {
+ throw new EmptyResultError('douban search', 'Search page rendered neither DOM results nor window.__DATA__.items. Douban page structure may have changed or the page did not finish loading.');
+ }
+ if (Array.isArray(data?.results)) return data.results;
return Array.isArray(data) ? data : [];
}
/**
diff --git a/clis/douban/utils.test.js b/clis/douban/utils.test.js
index 2c7aa2c..5f41f10 100644
--- a/clis/douban/utils.test.js
+++ b/clis/douban/utils.test.js
@@ -241,6 +241,21 @@ describe('douban utils', () => {
]);
});
+ it('fails fast when neither rendered DOM nor window data items are available', async () => {
+ const page = {
+ goto: vi.fn().mockResolvedValue(undefined),
+ wait: vi.fn().mockResolvedValue(undefined),
+ evaluate: vi.fn()
+ .mockResolvedValueOnce({ blocked: false, title: 'empty - 豆瓣搜索', href: 'https://search.douban.com/book/subject_search?search_text=empty&cat=1001' })
+ .mockImplementationOnce((script) => runSearchEvaluate(script, undefined, [])),
+ };
+
+ await expect(searchDouban(page, 'book', 'empty', 3)).rejects.toMatchObject({
+ code: 'EMPTY_RESULT',
+ hint: expect.stringContaining('neither DOM results nor window.__DATA__.items'),
+ });
+ });
+
it('normalizes douban book subject raw data into structured fields', () => {
const normalized = normalizeDoubanBookSubject({
id: '2567698',

How to apply (after the cdp combined patch)

```bash
# save the diff above as /tmp/douban-empty.patch, then:
git apply /tmp/douban-empty.patch
```

Once both patches are pushed, we'll re-review the new head and run the heavyweight CI gate before merge.
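The fail-fast rule in the patch above can be read as a small pure function over the value returned by the page evaluation. What follows is a minimal sketch, not the repository's code: `EmptyResultError` is a local stand-in that models only the `code` and `hint` fields asserted in the test, and `unwrapSearchPayload` is a hypothetical name introduced for illustration.

```javascript
// Local stand-in for the project's EmptyResultError (assumption: only the
// `code` and `hint` fields checked by the test in the patch are modeled).
class EmptyResultError extends Error {
  constructor(command, hint) {
    super(`${command} returned no data`);
    this.name = 'EmptyResultError';
    this.code = 'EMPTY_RESULT';
    this.hint = hint;
  }
}

// Hypothetical helper: normalizes whatever the page evaluation returned.
function unwrapSearchPayload(data) {
  // Older shape: the evaluate script returned a bare array of results.
  if (Array.isArray(data)) return data;

  const results = Array.isArray(data?.results) ? data.results : [];

  // Both sources must be explicitly reported as missing; an undefined flag
  // does not count, so malformed payloads fall through to an empty list.
  const noSource =
    data?.hasRenderedItems === false && data?.hasPageDataItems === false;

  if (results.length === 0 && noSource) {
    throw new EmptyResultError(
      'douban search',
      'Search page rendered neither DOM results nor window.__DATA__.items.',
    );
  }
  return results;
}
```

With this shape, a page that rendered result containers but matched nothing still yields an empty array, while a page that exposed neither source raises an error instead of silently returning nothing.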
Addressed the review feedback in two follow-up commits:

- Named CDP reuse no longer falls back to arbitrary inspectable tabs. When a named tab is requested, OpenCLI now reuses only a matching window.name target or creates a new target.
- Default named-tab reuse is skipped for registered Electron app workspaces; explicit OPENCLI_CDP_TAB_NAME still opts in.
- Douban search now returns source metadata from page evaluation and throws EMPTY_RESULT when neither rendered result items nor window.__DATA__.items are present.

Verification run locally:

- npx vitest run src/browser/cdp.test.ts src/runtime.test.ts clis/douban/utils.test.js
- npm run typecheck
- npm run build
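The tab-reuse behavior described in the first two bullets can be pictured as two small decisions: which tab name applies, and which target (if any) may be reused. This is only a sketch under stated assumptions. The function names, the `{ id, windowName }` target shape, and the `'opencli'` default name are all hypothetical, and how `window.name` is actually read from each target is left out.

```javascript
// Hypothetical: decide which tab name, if any, applies to this session.
function resolveTabName({ explicitName, defaultName, isRegisteredElectronApp }) {
  // An explicit OPENCLI_CDP_TAB_NAME always opts in.
  if (explicitName) return explicitName;
  // Registered Electron app workspaces get no implicit named-tab reuse.
  if (isRegisteredElectronApp) return null;
  return defaultName;
}

// Hypothetical: pick a target to reuse. Returns null when the caller should
// create a new target; it never falls back to an arbitrary inspectable tab.
function pickNamedTarget(targets, tabName) {
  if (!tabName) return null;
  return targets.find((target) => target.windowName === tabName) ?? null;
}

// Example wiring (all values are illustrative).
const tabName = resolveTabName({
  explicitName: undefined,
  defaultName: 'opencli',
  isRegisteredElectronApp: false,
});
const reusable = pickNamedTarget(
  [
    { id: 'A', windowName: '' },
    { id: 'B', windowName: 'opencli' },
  ],
  tabName,
);
```

Returning `null` rather than the first available target is the point of the fix for the first blocker: a tab the user opened for something else is never taken over.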
Summary
Verification
douban search calls against localhost:9222 reused the same named target instead of opening new tabs