Skip to content

Add Instagram support to video-link import#90

Merged
windoze95 merged 1 commit into
mainfrom
feat/video-platforms
Jun 30, 2026
Merged

Add Instagram support to video-link import#90
windoze95 merged 1 commit into
mainfrom
feat/video-platforms

Conversation

@windoze95

Copy link
Copy Markdown
Owner

What

Extends the now-live video-link import (PR #89) to Instagram reels — the platform you asked about. TikTok was the only wired platform; this adds Instagram and the JSON robustness needed for it.

Changes

  • fetchInstagramGET /v1/instagram/postshortcode (cache key), video_url, caption, and video_duration (seconds → rounded to ms). Transcript is a best-effort GET /v2/instagram/media/transcript (frequently null for reels; we fall back to caption + sampled frames, which already carry the recipe).
  • Control-char sanitizer in the shared get() — Instagram captions contain raw newlines inside JSON string values, which encoding/json rejects (verified: invalid character '\n' in string literal). The sanitizer escapes control chars inside string literals only, so it's a no-op on well-formed JSON and leaves the TikTok path unchanged (just hardened).

Verified live (against real ScrapeCreators)

  • @thecookingfoodie reel DX9a1wjKVRw ("Thai mango sticky rice") parsed correctly: shortcode, caption, 22.547s duration, media URL.
  • The cdninstagram.com media URL downloaded fine from a server (HTTP 206, video/mp4) — the feared IP-bound-CDN caveat did not materialize, so no download_media proxy is needed.

Tests (all offline)

  • TestFetchVideo_Instagram — fixture deliberately embeds a raw-newline caption to prove the sanitizer makes real responses parse end-to-end; asserts shortcode→VideoID, seconds→ms rounding, media URL, transcript.
  • TestFetchVideo_Instagram_NotAVideo — image posts are rejected.
  • TestSanitizeJSONControlChars — raw \n/\t escaped + decode round-trip; already-escaped sequences preserved; clean JSON untouched.

Not in this PR

YouTube / Facebook / Pinterest still detect but return "not yet supported" — each needs its own live shape check (YouTube also needs the finished MP4, not the raw googlevideo URL). Follow-ups.

🤖 Generated with Claude Code

https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19

Wire Instagram reels into the ScrapeCreators client (TikTok was the only
implemented platform):

- fetchInstagram: GET /v1/instagram/post → shortcode, video_url, caption,
  and video_duration (seconds → rounded to ms); best-effort transcript via
  /v2/instagram/media/transcript (often null for reels, falls back to
  caption + sampled frames). Verified live: the cdninstagram media URL
  downloads fine from a server (the IP-bound-CDN concern did not pan out).
- Control-char sanitizer in the shared get(): Instagram captions contain
  raw newlines inside JSON strings, which encoding/json rejects. The
  sanitizer escapes control chars inside string literals and is a no-op on
  well-formed JSON, so the TikTok path is unchanged (and hardened).

Tests: Instagram fetch (fixture carries a real raw-newline caption to prove
the sanitizer end-to-end), non-video rejection, and a direct sanitizer unit
test. All offline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BU4UWZutHd1AnK3XAf7H19
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@windoze95 windoze95 merged commit 0bc4891 into main Jun 30, 2026
1 check passed
@windoze95 windoze95 deleted the feat/video-platforms branch June 30, 2026 05:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant