Add /load-shareurl proxy for hosts blocked on some school networks #623
Conversation
Joe says:
We really want to make sure that shareurl works from raw.githubusercontent.com. Notably this proxies *only* that domain for now, specifically to make sure we support this common use case. In general I expect that less constrained users (e.g. ugrad) just fetch code happily with CORS from the browser in the long run.
This also comes with some improvements to proxying, extracted into a helper:
- Real size limits
- Streaming, not blocking waiting for the whole response
- CSP headers so if someone tries to load something executable (e.g. in SVG) it really can't run
- Some filtering on redirects to avoid weird cases of redirecting to 127.0.0.1 inside the server
Since we have existing proxying for images it's nice to improve this and have a good helper.
Claude says (lightly edited by Joe):
Some classrooms will likely load curriculum via #shareurl from raw.githubusercontent.com on networks that block GitHub. The new /load-shareurl endpoint fetches an allowlisted URL on the browser's behalf and streams the response back, so those classrooms get a working path.
A small window.fetch shim in beforePyret.js routes any fetch to an allowlisted host through the proxy automatically. That covers every browser fetch path in one place: drive.js's makeUrlFile, the url/url-file import prefetches in cpo-main.js, and the Pyret runtime's F.fetch via cross-fetch.
/downloadImg moves to share the same streaming-fetch helper, picking up nosniff + sandbox response headers, an IP-filtering HTTP agent (request-filtering-agent), a response size cap, and a request timeout.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
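The shim idea can be sketched roughly as follows. This is a minimal illustration, not the real beforePyret.js code: `PROXIED_HOSTS`, `toProxyUrl`, and the `url` query-parameter name are assumptions.

```javascript
// Hypothetical sketch of a window.fetch shim that routes allowlisted
// hosts through a /load-shareurl proxy. Names are illustrative.
const PROXIED_HOSTS = new Set(["raw.githubusercontent.com"]);

// Rewrite an allowlisted absolute URL to go through the proxy;
// other hosts and relative paths pass through untouched.
function toProxyUrl(url) {
  try {
    const u = new URL(url);
    if (PROXIED_HOSTS.has(u.hostname)) {
      return "/load-shareurl?url=" + encodeURIComponent(u.href);
    }
  } catch (e) {
    // Relative or malformed URLs fall through unchanged.
  }
  return url;
}

// Installed over window.fetch, one shim covers every browser fetch path.
if (typeof window !== "undefined") {
  const realFetch = window.fetch.bind(window);
  window.fetch = (input, init) =>
    realFetch(typeof input === "string" ? toProxyUrl(input) : input, init);
}
```

Because the rewrite happens below every caller, drive.js, cpo-main.js, and the runtime's cross-fetch all pick it up without individual changes.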
Ben says:
Seems reasonably straightforward. Do we want to proxy raw.githubusercontent automatically, or only if it fails first? Do you intend to have different timeouts in the two proxy usages? Or should those be made constants instead of magic numbers?
Joe says:
I'll do some magic-number fixup, that's good feedback. Re: eager vs. fallback proxying: the problem with fallback is that in the failure case it can take a long time for the first request to fail on a school content filter, so the experience is lousy. If we end up proxying a lot, we can do some heuristic thing where we try for 2s or something, and give up and proxy if that doesn't work.
Ben says:
Maybe we start proxying automatically, and then in the background, we try to request a specific file that we own, without a proxy: if it goes through, we can disable the proxy?
Joe says:
Good point. We can even just do proxy/non-proxy in parallel on the first request, and then disable the proxy if the regular request succeeds. If we add more domains, that becomes a dictionary of "proxy-for" different ones. But that's not the worst thing ever, and it would adapt to firewall rules at schools...
Claude says (Joe felt little need to edit):
Replaces the four magic numbers in the /downloadImg and /load-shareurl proxy call sites with named module-level constants. Bumps the image proxy to 20 MB / 30 s so that fetches of phone-sized images and slow Drive ?export= responses don't get cut off; /load-shareurl stays at 1 MB / 10 s since it's plaintext from a fast host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joe says:
Constraints:
- Don't want to proxy everything (be kind to our servers)
- Don't want to wait a long time for the direct request to fail (flaky network, weird school blocker)
So we kick off both the proxy and the direct request the first time we have a proxy-able candidate.
- If the direct request succeeds, cancel the proxy request to CPO and do direct requests for that domain from this page
- If the direct request fails, use only proxy requests for that domain for this page
A direct request counts as success if it returns 200 OK and text/plain (making sure we don't weirdly accept 200 OK with, say, an HTML login page because of a school SSO portal).
I think this is a good balance of server load and letting working clients do their thing.
Claude says:
Responding to Ben's PR feedback to do better than always-proxy. The first fetch to an allowlisted host now runs direct + /load-shareurl in parallel; whichever returns first that verifies (direct = 2xx text/plain) is served to the caller, and the outcome of the direct request locks per-host shouldProxy state for the rest of the page-load. So unrestricted networks pay the proxy hop only once and rapidly switch to direct; blocked networks see direct fail and switch to proxy.
Header-only verification (rather than body comparison) lets us abort the in-flight proxy fetch as soon as direct is trusted, keeping server load near zero on the healthy path. proxyStreamFetch now also tears down the upstream connection when the client disconnects, so the abort actually saves bandwidth on the server side too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
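The "direct counts as success" rule above boils down to a small predicate. A hedged sketch, assuming the hypothetical name `directLooksTrustworthy`; it only relies on `response.ok` and `headers.get`, which any fetch Response provides:

```javascript
// A direct response is trusted only if it is 2xx AND text/plain, so a
// school SSO portal's 200 OK HTML login page does not get mistaken for
// curriculum code. Name and factoring are illustrative, not CPO's code.
function directLooksTrustworthy(response) {
  const contentType = response.headers.get("content-type") || "";
  return response.ok && contentType.split(";")[0].trim() === "text/plain";
}
```

Checking only headers (not the body) is what allows the in-flight proxy request to be aborted early on healthy networks.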
Joe says:
I think I constrained Claude so much in this session that it had to say things
I like below :-)
This is post-convo with Ben where we were still unhappy with (a) variable names
and (b) closed over mutable variables instead of declarative promise APIs.
Claude says:
Following a second pass of code review: the previous version used a
new Promise((resolve, reject) => { let gotSomeResponse = false; ... })
block with a flag tracking which side had won. That conflated two
concerns — "first success wins the response" and "abort proxy iff direct
beat it" — into one imperative state machine.
This pass separates them. The race is expressed as three composed
Promise primitives:
- directP: direct fetch, gated on content-type verification (rejects
if direct returned the wrong shape, so it drops out of the response
race entirely).
- shouldProxyPromise (Promise.race): false iff direct verified before
the timeout; locks the per-host policy for the page-load.
- directFinishedSuccessfullyAndFirstP (Promise.race): does direct
settle before proxy, AND was that settlement a verified success?
Only then do we proxyCtrl.abort(). This keeps the abort safe — we
never abort once proxy has already returned headers, which would
error the caller's body stream mid-read.
- responsePromise (Promise.any): whichever of directP or proxyP
fulfills first.
Tested both healthy and DevTools-blocked flows. Observed the timing
edge case where both proxy and direct return 200 within a few ms of
each other: shouldProxy still locks 'direct' as expected, since the
verdict is decoupled from whether the abort caught proxy in time.
Also adds a comment explaining the deliberate non-handling of caller-
provided init.signal: the signal overwrite means a caller aborting
won't cancel the proxy fetch, but the alternative (event-listener
forwarding or AbortSignal.any) is more complexity than the
not-quite-fully-aborted case justifies right now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
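The composed-promise structure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `racingFetch`, its parameters, and the stand-in `verify` callback are hypothetical, and real fetch calls are abstracted into injected functions.

```javascript
// Sketch of the three composed Promise primitives: directP gated on
// verification, shouldProxyPromise locking per-host policy, and a
// Promise.any response race. Not the real CPO implementation.
async function racingFetch(directFetch, proxyFetch, verify, timeoutMs) {
  const proxyCtrl = new AbortController();
  const proxyP = proxyFetch(proxyCtrl.signal);

  // directP rejects if the direct response has the wrong shape, so it
  // drops out of the response race entirely (e.g. an SSO login page).
  const directP = directFetch().then((resp) => {
    if (!verify(resp)) throw new Error("direct response failed verification");
    return resp;
  });

  // false iff direct verified before the timeout; this verdict locks
  // the per-host shouldProxy policy for the rest of the page-load.
  const shouldProxyPromise = Promise.race([
    directP.then(() => false, () => true),
    new Promise((res) => setTimeout(() => res(true), timeoutMs)),
  ]);

  // Did direct settle before proxy, AND was it a verified success?
  // Only then abort the proxy; never after proxy has already won.
  const directFirstP = Promise.race([
    directP.then(() => true, () => false),
    proxyP.then(() => false, () => false),
  ]);
  directFirstP.then((directWon) => { if (directWon) proxyCtrl.abort(); });

  // Whichever of directP or proxyP fulfills first serves the caller.
  const responsePromise = Promise.any([directP, proxyP]);
  return { responsePromise, shouldProxyPromise };
}
```

Note how the policy verdict (`shouldProxyPromise`) is decoupled from the abort decision (`directFirstP`), matching the observed timing edge case where both sides return within a few milliseconds.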