Feat/news feed GitHub hf wechat#199

Merged: davidliuk merged 6 commits into main from feat/news-feed-github-hf-wechat on Apr 27, 2026

@zli12321
Collaborator

Added GitHub, Hugging Face, and WeChat news feeds.

Comment thread server/scripts/research-news/search_huggingface.py Fixed
Collaborator

@davidliuk davidliuk left a comment

Review comments from local pass.

Comment thread server/routes/news.js Outdated
// UI-supplied API tokens override env vars. Stored in plain JSON config
// (same trust model as the rest of news settings), and only applied to the
// child process — never echoed back over the API.
if (sourceName === 'github' && typeof config.api_token === 'string' && config.api_token.trim()) {
Collaborator

High: this stores GitHub/HF access tokens in the regular news config file, and GET /api/news/config/:source returns that config unchanged. That means saved tokens are persisted in plaintext JSON and echoed back to any caller that can read news settings, bypassing the existing credentialsDb pattern used elsewhere for secrets. Please move these tokens into the credential store or, at minimum, redact them from config responses and avoid writing raw token values into the news config JSON.

account = account.strip()
instance = (instance or DEFAULT_INSTANCE).rstrip("/")

if account.startswith(("http://", "https://")):
Collaborator

High: allowing accounts to be a full http:// or https:// URL makes this endpoint an authenticated server-side fetch primitive. A user can point the backend at localhost or private network services, and RSS/Atom-shaped responses may then be reflected into results/logs. Please restrict full URLs to the configured RSSHub host, or disallow arbitrary full URLs and validate scheme/host before fetching.

Comment thread server/routes/news.js Outdated
if (sourceName === 'github' && typeof config.api_token === 'string' && config.api_token.trim()) {
  env.GITHUB_TOKEN = config.api_token.trim();
}
if (sourceName === 'huggingface' && typeof config.api_token === 'string' && config.api_token.trim()) {
Collaborator

Medium: the UI/config now accepts an HF token and passes it as HF_TOKEN, but paper mode still fetches Daily Papers with a hardcoded unauthenticated header in search_huggingface.py rather than hf_auth_headers(). As a result, the token only affects models/datasets/spaces while the default papers mode remains unauthenticated. Please route Daily Papers through the same auth header builder or make the settings copy explicit about the limited scope.

Collaborator Author

High — secrets in plaintext config + echoed via GET

server/routes/news.js

  • New SECRET_FIELDS_BY_SOURCE registry maps each source's secret field to a credential type: news_github_token, news_huggingface_token, news_wechat_access_key. Helpers upsertSingleNewsCredential / deleteNewsCredentialsByType / readActiveNewsCredential route through credentialsDb (the same per-user store used elsewhere for tokens).
  • GET /config/:source: never returns the secret. Instead returns <field>_set: bool flags. On first read, migrateLegacySecretsInPlace moves any plaintext token already on disk into credentialsDb and rewrites the JSON file without the secret — so existing installs are auto-scrubbed.
  • PUT /config/:source: extracts secret fields from the body and routes them to credentialsDb. Semantics: non-empty string → upsert, null → delete, empty/absent → no-op (so accidentally re-saving other settings can't wipe a stored token).
  • handleSearch: tokens are read from credentialsDb per req.user.id, then injected as GITHUB_TOKEN / HF_TOKEN env into the spawned Python child only. WeChat's --access-key is also pulled from the credential store.
  • The defaultConfig entries for api_token / access_key are removed so the JSON shape can't accidentally re-introduce plaintext storage.
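
The GET redaction and the three-way PUT semantics described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the actual route is JavaScript in server/routes/news.js, and its code is not shown in this thread. The helper names `split_secrets` and `redact_for_get` are hypothetical, and the registry below simply mirrors the field-to-credential-type mapping listed above.

```python
from typing import Any, Dict, Set, Tuple

# Assumed shape: mirrors the SECRET_FIELDS_BY_SOURCE mapping described above.
SECRET_FIELDS_BY_SOURCE: Dict[str, Dict[str, str]] = {
    "github": {"api_token": "news_github_token"},
    "huggingface": {"api_token": "news_huggingface_token"},
    "wechat": {"access_key": "news_wechat_access_key"},
}


def split_secrets(
    source: str, body: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Tuple[str, Any]]]:
    """Hypothetical helper: separate secret fields from a PUT body.

    Returns (public_config, actions). Each action is keyed by credential type:
      non-empty string -> ("upsert", value)
      None (JSON null) -> ("delete", None)
      empty or absent  -> no action, so the stored secret is left untouched
    """
    public = dict(body)
    actions: Dict[str, Tuple[str, Any]] = {}
    for field, cred_type in SECRET_FIELDS_BY_SOURCE.get(source, {}).items():
        if field not in public:
            continue  # absent: no-op
        value = public.pop(field)  # never keep the secret in the JSON config
        if value is None:
            actions[cred_type] = ("delete", None)
        elif isinstance(value, str) and value.strip():
            actions[cred_type] = ("upsert", value.strip())
        # empty string: no-op
    return public, actions


def redact_for_get(
    source: str, stored_config: Dict[str, Any], stored_types: Set[str]
) -> Dict[str, Any]:
    """Hypothetical helper: drop secret fields and expose <field>_set flags."""
    fields = SECRET_FIELDS_BY_SOURCE.get(source, {})
    public = {k: v for k, v in stored_config.items() if k not in fields}
    for field, cred_type in fields.items():
        public[f"{field}_set"] = cred_type in stored_types
    return public
```

Treating an empty string as a no-op rather than a delete is what lets a form re-save unrelated settings without wiping a stored token; an explicit null is the only way to remove one.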

SourceSettingsDialog.tsx + i18n

TokenInput now accepts isSaved and onClear. When a credential is stored, it shows a "Saved to your credential store" badge, a "Clear saved token" link, and a placeholder hint that pasting will replace it. Clearing sets the field to null, which the backend treats as an explicit delete; the UI shows a "Will be removed when you save" warning until the user saves.

High — SSRF via WeChat accounts

server/scripts/research-news/search_wechat.py

normalize_account_to_url now enforces same-origin between full-URL accounts entries and the configured RSSHub instance (scheme + netloc must match exactly). Bare IDs and relative paths remain inherently scoped to instance. Smoke test confirmed:

| Input account | Instance | Result |
| -- | -- | -- |
| wechat/ce/huxiu_com | https://rsshub.app | ✓ resolved |
| huxiu_com (bare) | https://rsshub.app | ✓ resolved |
| https://rsshub.app/wechat/ce/x | https://rsshub.app | ✓ resolved |
| http://localhost:8080/admin | https://rsshub.app | ✗ rejected |
| http://169.254.169.254/... | https://rsshub.app | ✗ rejected |
| http://rsshub.app/... (scheme downgrade) | https://rsshub.app | ✗ rejected |
| http://my-rsshub.local:1200/x | http://my-rsshub.local:1200 | ✓ self-host accepted |
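
A same-origin guard of the kind described here might look roughly like the sketch below. It is not the code from search_wechat.py: the function name `resolve_account` is hypothetical, and the `wechat/ce/` route used for bare IDs is an assumption inferred from the smoke-test inputs. Only the scheme and netloc comparison reflects what the thread states.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_INSTANCE = "https://rsshub.app"
BARE_ID_ROUTE = "wechat/ce"  # assumption: default route for bare account IDs


def resolve_account(account: str, instance: Optional[str] = None) -> str:
    """Hypothetical sketch: map an account entry to a URL on the RSSHub instance.

    Full URLs are accepted only when scheme and netloc match the configured
    instance exactly; anything else raises ValueError.
    """
    account = account.strip()
    instance = (instance or DEFAULT_INSTANCE).rstrip("/")

    if account.startswith(("http://", "https://")):
        target, base = urlsplit(account), urlsplit(instance)
        # Comparing the whole netloc also rejects userinfo tricks such as
        # https://rsshub.app@evil.example/, whose netloc differs from the base.
        if (target.scheme, target.netloc) != (base.scheme, base.netloc):
            raise ValueError(f"account URL must be on {instance}: {account!r}")
        return account

    if "/" in account:
        # Relative path: always resolved under the configured instance.
        return f"{instance}/{account.lstrip('/')}"

    # Bare ID: scoped to the instance under the assumed default route.
    return f"{instance}/{BARE_ID_ROUTE}/{account}"
```

Because the comparison covers the scheme as well as the host and port, an http:// URL pointing at an https:// instance is rejected, which matches the scheme-downgrade row in the smoke test.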

Medium — HF Daily Papers ignored HF_TOKEN

server/scripts/research-news/search_huggingface.py

fetch_daily_papers now builds its headers via hf_auth_headers({...}) instead of a hardcoded UA-only dict, so the per-user token forwarded by the Node route is applied to the papers mode the same way it's applied to models / datasets / spaces.
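
For readers unfamiliar with the pattern, a header builder of this shape could be sketched as follows. The body of hf_auth_headers is not shown in this thread, so the signature and behavior here are assumptions: it is taken to accept a base header dict and add a Bearer Authorization header when HF_TOKEN is set in the child process environment. The User-Agent value is a placeholder.

```python
import os
from typing import Dict, Optional


def hf_auth_headers(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Assumed behavior: copy the base headers and add auth if HF_TOKEN is set."""
    headers = dict(base or {})
    token = os.environ.get("HF_TOKEN", "").strip()
    if token:
        # Hugging Face access tokens are sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Every mode, including papers, would build its headers the same way.
papers_headers = hf_auth_headers({"User-Agent": "research-news/placeholder"})
```

Routing every mode through one builder means a token saved in the settings dialog cannot silently apply to some endpoints and not others.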

Verification

  • ReadLints clean across all four edited files.
  • npx tsc --noEmit -p . passes.
  • Python AST parse passes for both edited scripts.
  • All three i18n JSONs parse.
  • SSRF guard smoke-tested above.

if last_modified:
    try:
        last_dt = datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    except (ValueError, TypeError):
@davidliuk davidliuk merged commit 58468c5 into main Apr 27, 2026
3 checks passed