
Fix image proxy handling pointing at ourselves #3052

Open

marcelveldt wants to merge 2 commits into dev from fix-image-proxy-own-url

Conversation

@marcelveldt
Member

Fix (edge) case where an image URL is pointing at our own imageproxy

Copilot AI review requested due to automatic review settings January 29, 2026 11:35
Contributor

Copilot AI left a comment


Pull request overview

This pull request fixes an edge case where an image URL points to the server's own imageproxy endpoint, which can occur when the server's external URL is not reachable from within the server itself. The fix adds detection logic to identify such self-referencing imageproxy URLs and directly fetch the original image instead of attempting to fetch from the external URL.

Changes:

  • Added _extract_imageproxy_params helper function to parse and extract path and provider from imageproxy URLs
  • Modified get_image_data to detect and handle self-referencing imageproxy URLs by recursively fetching the original image
  • Improved error messaging when image fetching fails
  • Enhanced docstring documentation for the get_image_data function


@MarvinSchenkel
Contributor

Does this fix music-assistant/support#4818 by any chance?

@marcelveldt
Member Author

> Does this fix music-assistant/support#4818 by any chance?

No

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.



```python
    :param _depth: Internal recursion depth counter (do not set manually).
    """
    max_recursion_depth = 5
    if _depth > max_recursion_depth:
```

Copilot AI Jan 29, 2026


The recursion depth check uses > (greater than) instead of >= (greater than or equal), which allows one extra level of recursion beyond the intended maximum. With max_recursion_depth = 5 and the check _depth > max_recursion_depth, the guard only fires once _depth reaches 6, so recursion is permitted at depths 0 through 5 — six levels instead of the intended five. The check should be if _depth >= max_recursion_depth: to properly limit recursion to a maximum of 5 levels.

Suggested change

```diff
-    if _depth > max_recursion_depth:
+    if _depth >= max_recursion_depth:
```
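The off-by-one is easy to confirm in isolation. A minimal standalone sketch — only the depth-guard pattern mirrors the PR; the names and counting harness are illustrative:

```python
# Count how many nested calls each comparison permits before the guard fires.
def count_calls(guard) -> int:
    calls = 0

    def recurse(_depth: int = 0) -> None:
        nonlocal calls
        if guard(_depth):
            raise RecursionError("max recursion depth reached")
        calls += 1
        recurse(_depth + 1)

    try:
        recurse()
    except RecursionError:
        pass
    return calls

MAX_DEPTH = 5
print(count_calls(lambda d: d > MAX_DEPTH))   # 6 — ">" allows depths 0 through 5
print(count_calls(lambda d: d >= MAX_DEPTH))  # 5 — ">=" allows depths 0 through 4
```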

Comment on lines +51 to +54
```python
            # The path may be double URL-encoded, decode it
            decoded_path = urllib.parse.unquote_plus(path)
            if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
                decoded_path = urllib.parse.unquote_plus(decoded_path)
```

Copilot AI Jan 29, 2026


The double URL decoding logic at lines 52-54 uses a regex pattern to detect if the path is still URL-encoded after the first decoding, then decodes it again. However, this approach may fail to handle edge cases where the regex pattern appears naturally in the decoded string (e.g., a filename containing "%20"). A more reliable approach would be to track whether double encoding was applied when the URL was created and decode accordingly. Alternatively, you could use a loop to decode until no more encoded characters are found, with a safety limit to prevent infinite loops.

Suggested change

```diff
-            # The path may be double URL-encoded, decode it
-            decoded_path = urllib.parse.unquote_plus(path)
-            if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
-                decoded_path = urllib.parse.unquote_plus(decoded_path)
+            # The path may be URL-encoded multiple times; decode until stable
+            decoded_path = path
+            for _ in range(3):
+                new_decoded = urllib.parse.unquote_plus(decoded_path)
+                if new_decoded == decoded_path:
+                    break
+                decoded_path = new_decoded
```
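The decode-until-stable loop suggested above behaves as intended on a double-encoded path. A small self-contained check (the helper name is illustrative):

```python
import urllib.parse

def decode_until_stable(path: str, max_rounds: int = 3) -> str:
    """Decode a possibly multiply URL-encoded path until it stops changing."""
    decoded = path
    for _ in range(max_rounds):
        new_decoded = urllib.parse.unquote_plus(decoded)
        if new_decoded == decoded:
            break
        decoded = new_decoded
    return decoded

# Encode a path twice, then recover the original in one call.
double = urllib.parse.quote(urllib.parse.quote("covers/a b.jpg", safe=""), safe="")
print(double)                       # covers%252Fa%2520b.jpg
print(decode_until_stable(double))  # covers/a b.jpg
```

Note the same caveat the reviewer raises still applies: a filename that legitimately contains a sequence like `%20` will be over-decoded by any heuristic that cannot know how many encoding layers were applied.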

Comment on lines +43 to +58
```python
    try:
        parsed = urllib.parse.urlparse(url)
        query_params = urllib.parse.parse_qs(parsed.query)

        path = query_params.get("path", [None])[0]
        provider = query_params.get("provider", ["builtin"])[0]

        if path:
            # The path may be double URL-encoded, decode it
            decoded_path = urllib.parse.unquote_plus(path)
            if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
                decoded_path = urllib.parse.unquote_plus(decoded_path)
            return (decoded_path, provider)
    except (KeyError, ValueError, IndexError):
        # URL parsing failed, not an imageproxy URL
        pass
```

Copilot AI Jan 29, 2026


The exception handler catches KeyError, ValueError, and IndexError, but these exceptions are unlikely to be raised by the operations in the try block. urllib.parse.urlparse() doesn't raise these exceptions for invalid URLs — it just parses them as best it can. urllib.parse.parse_qs() also doesn't raise these exceptions. The only place where IndexError could potentially occur is lines 47-48 when accessing index [0], but this is already guarded by the .get() method which returns a default value. Consider removing this overly broad exception handling, or at minimum document what specific error conditions you're trying to handle.

Suggested change

```diff
-    try:
-        parsed = urllib.parse.urlparse(url)
-        query_params = urllib.parse.parse_qs(parsed.query)
-        path = query_params.get("path", [None])[0]
-        provider = query_params.get("provider", ["builtin"])[0]
-        if path:
-            # The path may be double URL-encoded, decode it
-            decoded_path = urllib.parse.unquote_plus(path)
-            if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
-                decoded_path = urllib.parse.unquote_plus(decoded_path)
-            return (decoded_path, provider)
-    except (KeyError, ValueError, IndexError):
-        # URL parsing failed, not an imageproxy URL
-        pass
+    parsed = urllib.parse.urlparse(url)
+    query_params = urllib.parse.parse_qs(parsed.query)
+    path = query_params.get("path", [None])[0]
+    provider = query_params.get("provider", ["builtin"])[0]
+    if path:
+        # The path may be double URL-encoded, decode it
+        decoded_path = urllib.parse.unquote_plus(path)
+        if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
+            decoded_path = urllib.parse.unquote_plus(decoded_path)
+        return (decoded_path, provider)
```
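The reviewer's leniency claim is easy to verify: for ordinary malformed input, urlparse and parse_qs do their best rather than raise (the one notable exception, not triggered here, is that urlparse can raise ValueError for a malformed bracketed IPv6 netloc). A quick standalone check:

```python
import urllib.parse

# Garbage input: no exception, it simply parses as a path with an empty query.
parsed = urllib.parse.urlparse("not a url at all ::%%")
print(repr(parsed.query))  # ''

# Invalid percent-escape is kept verbatim; the valueless "provider" key is dropped.
params = urllib.parse.parse_qs("path=%GG&provider")
print(params)  # {'path': ['%GG']}
```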

```python
    server_base_urls = {mass.webserver.base_url, mass.streams.base_url}
    if url_host in server_base_urls:
        if imageproxy_params := _extract_imageproxy_params(path_or_url):
            extracted_path, extracted_provider = imageproxy_params
```

Copilot AI Jan 29, 2026


The self-reference detection at line 93 checks if the URL host matches the server base URLs, but this check may not handle all edge cases. If the imageproxy URL includes query parameters that reference another imageproxy URL (creating a chain), and those query parameters differ (e.g., different size or format parameters), the recursion will continue until hitting the depth limit. Consider also checking if the extracted path from the imageproxy URL is itself an imageproxy URL pointing to the same server, to prevent unnecessary recursion through chains of imageproxy URLs with varying parameters.

Suggested change

```diff
-            extracted_path, extracted_provider = imageproxy_params
+            extracted_path, extracted_provider = imageproxy_params
+            # Iteratively unwrap nested imageproxy URLs that also point to our own server
+            seen_urls: set[str] = set()
+            while True:
+                if extracted_path in seen_urls:
+                    # Detected a loop in proxy URLs; stop unwrapping
+                    break
+                seen_urls.add(extracted_path)
+                inner_params = _extract_imageproxy_params(extracted_path)
+                if not inner_params:
+                    break
+                inner_parsed = urllib.parse.urlparse(extracted_path)
+                inner_host = (
+                    f"{inner_parsed.scheme}://{inner_parsed.netloc}"
+                    if inner_parsed.scheme and inner_parsed.netloc
+                    else ""
+                )
+                # Only unwrap further if the inner proxy URL also points to this server
+                if inner_host and inner_host not in server_base_urls:
+                    break
+                extracted_path, extracted_provider = inner_params
```

```python
    server_base_urls = {mass.webserver.base_url, mass.streams.base_url}
    if url_host in server_base_urls:
        if imageproxy_params := _extract_imageproxy_params(path_or_url):
            extracted_path, extracted_provider = imageproxy_params
```

Copilot AI Jan 29, 2026


When extracting and decoding the imageproxy path parameter, there's no validation that the decoded path is safe before recursively calling get_image_data. While the recursive call will eventually validate the path through existing checks (e.g., is_safe_path for file paths), an attacker could potentially craft a malicious imageproxy URL that decodes to a path attempting directory traversal or other security issues. Consider adding an early validation check on the extracted_path before the recursive call, especially since this is handling user-controlled URL input.

Suggested change

```diff
-            extracted_path, extracted_provider = imageproxy_params
+            extracted_path, extracted_provider = imageproxy_params
+            # Early validation of extracted path before recursive call
+            if not extracted_path.startswith(("http://", "https://", "data:image")):
+                if not is_safe_path(extracted_path):
+                    msg = f"Unsafe image path: {extracted_path}"
+                    raise FileNotFoundError(msg)
```
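The traversal concern behind this suggestion can be illustrated with a standalone containment check in the same spirit as the `is_safe_path` guard the reviewer references (the name and signature below are illustrative, not the project's actual helper):

```python
from pathlib import Path

def is_within_base(candidate: str, base: str) -> bool:
    """Return True if candidate resolves to a location inside base."""
    base_path = Path(base).resolve()
    resolved = (base_path / candidate).resolve()
    # resolve() collapses any ../ segments, so a traversal attempt
    # ends up outside base_path and fails the containment test.
    return resolved == base_path or base_path in resolved.parents

print(is_within_base("covers/a.jpg", "/data/images"))      # True
print(is_within_base("../../etc/passwd", "/data/images"))  # False
```

Resolving before comparing is the important step: a naive string-prefix check on the unresolved path would let `../` segments escape the base directory.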
