Fix image proxy handling pointing at ourselves #3052
Conversation
Pull request overview
This pull request fixes an edge case where an image URL points to the server's own imageproxy endpoint, which can occur when the server's external URL is not reachable from within the server itself. The fix adds detection logic to identify such self-referencing imageproxy URLs and directly fetch the original image instead of attempting to fetch from the external URL.
Changes:
- Added `_extract_imageproxy_params` helper function to parse and extract the path and provider from imageproxy URLs
- Modified `get_image_data` to detect and handle self-referencing imageproxy URLs by recursively fetching the original image (a simplified sketch of this flow follows below)
- Improved error messaging when image fetching fails
- Enhanced docstring documentation for the `get_image_data` function
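For orientation, here is a minimal sketch of that detection flow, assuming a hypothetical `resolve_image_url` helper; the `path` and `provider` query parameters match the diff excerpts further down, everything else is illustrative:

```python
import urllib.parse


def resolve_image_url(url: str, server_base_urls: set[str]) -> str:
    """Return the original image URL if `url` points at our own imageproxy."""
    parsed = urllib.parse.urlparse(url)
    url_host = f"{parsed.scheme}://{parsed.netloc}"
    if url_host in server_base_urls:
        # Self-referencing proxy URL: pull the original image out of the
        # query string instead of fetching our own (unreachable) external URL.
        query = urllib.parse.parse_qs(parsed.query)
        if path := query.get("path", [None])[0]:
            # parse_qs already decoded once; decode again to cover the
            # double-encoded case handled in the diff below.
            return urllib.parse.unquote_plus(path)
    return url
```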
Does this fix music-assistant/support#4818 by any chance?

No
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
```python
    :param _depth: Internal recursion depth counter (do not set manually).
    """
    max_recursion_depth = 5
    if _depth > max_recursion_depth:
```
The recursion depth check uses > (greater than) instead of >= (greater than or equal), which allows one extra level of recursion beyond the intended maximum. With max_recursion_depth = 5 and the check _depth > max_recursion_depth, the function actually permits depths 0 through 5 (six levels) and only raises once _depth reaches 6. The check should be if _depth >= max_recursion_depth: to properly limit recursion to the intended maximum of 5 levels (depths 0 through 4).
```diff
-    if _depth > max_recursion_depth:
+    if _depth >= max_recursion_depth:
```
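A standalone illustration of the off-by-one (illustrative code, not from the PR):

```python
def recurse(depth: int = 0, strict: bool = False) -> int:
    """Recurse until the depth guard trips; return the deepest level that ran."""
    limit = 5
    if (depth >= limit) if strict else (depth > limit):
        raise RecursionError(f"hit limit at depth {depth}")
    try:
        return recurse(depth + 1, strict)
    except RecursionError:
        return depth


print(recurse(strict=False))  # 5 -> six levels ran (depths 0..5)
print(recurse(strict=True))   # 4 -> five levels ran (depths 0..4)
```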
```python
# The path may be double URL-encoded, decode it
decoded_path = urllib.parse.unquote_plus(path)
if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
    decoded_path = urllib.parse.unquote_plus(decoded_path)
```
The double URL decoding logic at lines 52-54 uses a regex pattern to detect if the path is still URL-encoded after the first decoding, then decodes it again. However, this approach may fail to handle edge cases where the regex pattern appears naturally in the decoded string (e.g., a filename containing "%20"). A more reliable approach would be to track whether double encoding was applied when the URL was created and decode accordingly. Alternatively, you could use a loop to decode until no more encoded characters are found, with a safety limit to prevent infinite loops.
```diff
-# The path may be double URL-encoded, decode it
-decoded_path = urllib.parse.unquote_plus(path)
-if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
-    decoded_path = urllib.parse.unquote_plus(decoded_path)
+# The path may be URL-encoded multiple times; decode until stable
+decoded_path = path
+for _ in range(3):
+    new_decoded = urllib.parse.unquote_plus(decoded_path)
+    if new_decoded == decoded_path:
+        break
+    decoded_path = new_decoded
```
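A quick sanity check of the suggested loop on a doubly encoded URL (standalone sketch; note that it still cannot tell a filename that legitimately contains an encoded sequence apart from one that needs another decoding pass, which is the ambiguity raised above):

```python
import urllib.parse

original = "https://example.com/cover art.jpg"
twice = urllib.parse.quote_plus(urllib.parse.quote_plus(original))

# Apply the decode-until-stable loop from the suggestion above.
decoded_path = twice
for _ in range(3):
    new_decoded = urllib.parse.unquote_plus(decoded_path)
    if new_decoded == decoded_path:
        break
    decoded_path = new_decoded

assert decoded_path == original
```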
```python
try:
    parsed = urllib.parse.urlparse(url)
    query_params = urllib.parse.parse_qs(parsed.query)

    path = query_params.get("path", [None])[0]
    provider = query_params.get("provider", ["builtin"])[0]

    if path:
        # The path may be double URL-encoded, decode it
        decoded_path = urllib.parse.unquote_plus(path)
        if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
            decoded_path = urllib.parse.unquote_plus(decoded_path)
        return (decoded_path, provider)
except (KeyError, ValueError, IndexError):
    # URL parsing failed, not an imageproxy URL
    pass
```
The exception handler catches KeyError, ValueError, and IndexError, but these exceptions are unlikely to be raised by the operations in the try block. urllib.parse.urlparse() is lenient with invalid URLs and parses them as best it can; the one case where it raises ValueError is a malformed IPv6 netloc (an unbalanced bracket). urllib.parse.parse_qs() likewise does not raise these exceptions. The only place where IndexError could potentially occur is the [0] indexing at lines 47-48, but the .get() calls always supply a non-empty default list, so it cannot trigger. Consider removing this overly broad exception handling, or at minimum document what specific error conditions you're trying to handle.
```diff
-try:
-    parsed = urllib.parse.urlparse(url)
-    query_params = urllib.parse.parse_qs(parsed.query)
-    path = query_params.get("path", [None])[0]
-    provider = query_params.get("provider", ["builtin"])[0]
-    if path:
-        # The path may be double URL-encoded, decode it
-        decoded_path = urllib.parse.unquote_plus(path)
-        if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
-            decoded_path = urllib.parse.unquote_plus(decoded_path)
-        return (decoded_path, provider)
-except (KeyError, ValueError, IndexError):
-    # URL parsing failed, not an imageproxy URL
-    pass
+parsed = urllib.parse.urlparse(url)
+query_params = urllib.parse.parse_qs(parsed.query)
+path = query_params.get("path", [None])[0]
+provider = query_params.get("provider", ["builtin"])[0]
+if path:
+    # The path may be double URL-encoded, decode it
+    decoded_path = urllib.parse.unquote_plus(path)
+    if re.search(r"%[0-9A-Fa-f]{2}", decoded_path):
+        decoded_path = urllib.parse.unquote_plus(decoded_path)
+    return (decoded_path, provider)
```
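A small standalone experiment showing why the except clause is effectively dead code:

```python
import urllib.parse

# Both functions are lenient: malformed input is parsed best-effort
# rather than raising.
print(urllib.parse.urlparse("not a url at all ::: %%%"))
# ParseResult(scheme='', netloc='', path='not a url at all ::: %%%', ...)
print(urllib.parse.parse_qs("==&&=broken"))  # no exception, best-effort dict

# The one known exception: an unbalanced IPv6 bracket raises ValueError.
try:
    urllib.parse.urlparse("http://[::1")
except ValueError as err:
    print(err)  # Invalid IPv6 URL
```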
```python
server_base_urls = {mass.webserver.base_url, mass.streams.base_url}
if url_host in server_base_urls:
    if imageproxy_params := _extract_imageproxy_params(path_or_url):
        extracted_path, extracted_provider = imageproxy_params
```
The self-reference detection at line 93 checks if the URL host matches the server base URLs, but this check may not handle all edge cases. If the imageproxy URL includes query parameters that reference another imageproxy URL (creating a chain), and those query parameters differ (e.g., different size or format parameters), the recursion will continue until hitting the depth limit. Consider also checking if the extracted path from the imageproxy URL is itself an imageproxy URL pointing to the same server, to prevent unnecessary recursion through chains of imageproxy URLs with varying parameters.
```diff
-extracted_path, extracted_provider = imageproxy_params
+extracted_path, extracted_provider = imageproxy_params
+# Iteratively unwrap nested imageproxy URLs that also point to our own server
+seen_urls: set[str] = set()
+while True:
+    if extracted_path in seen_urls:
+        # Detected a loop in proxy URLs; stop unwrapping
+        break
+    seen_urls.add(extracted_path)
+    inner_params = _extract_imageproxy_params(extracted_path)
+    if not inner_params:
+        break
+    inner_parsed = urllib.parse.urlparse(extracted_path)
+    inner_host = (
+        f"{inner_parsed.scheme}://{inner_parsed.netloc}"
+        if inner_parsed.scheme and inner_parsed.netloc
+        else ""
+    )
+    # Only unwrap further if the inner proxy URL also points to this server
+    if inner_host and inner_host not in server_base_urls:
+        break
+    extracted_path, extracted_provider = inner_params
```
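A self-contained sketch of how this unwrap loop behaves on a proxy-of-proxy chain; `make_proxy_url` and `extract_path` are hypothetical stand-ins for the server's URL builder and `_extract_imageproxy_params`:

```python
import urllib.parse

SERVER = "http://192.168.1.10:8095"


def make_proxy_url(path: str) -> str:
    # Hypothetical: build an imageproxy URL wrapping `path`.
    query = urllib.parse.urlencode({"path": path, "provider": "builtin"})
    return f"{SERVER}/imageproxy?{query}"


def extract_path(url: str) -> str | None:
    # Simplified stand-in for _extract_imageproxy_params.
    query = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    return query.get("path", [None])[0]


origin = "https://cdn.example.com/cover.jpg"
nested = make_proxy_url(make_proxy_url(origin))  # proxy wrapping a proxy

# Unwrap until the path no longer points at our own server, guarding
# against loops exactly as the suggestion does.
path = extract_path(nested)
seen: set[str] = set()
while path and path not in seen and path.startswith(SERVER):
    seen.add(path)
    path = extract_path(path)

assert path == origin
```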
```python
server_base_urls = {mass.webserver.base_url, mass.streams.base_url}
if url_host in server_base_urls:
    if imageproxy_params := _extract_imageproxy_params(path_or_url):
        extracted_path, extracted_provider = imageproxy_params
```
When extracting and decoding the imageproxy path parameter, there's no validation that the decoded path is safe before recursively calling get_image_data. While the recursive call will eventually validate the path through existing checks (e.g., is_safe_path for file paths), an attacker could potentially craft a malicious imageproxy URL that decodes to a path attempting directory traversal or other security issues. Consider adding an early validation check on the extracted_path before the recursive call, especially since this is handling user-controlled URL input.
```diff
-extracted_path, extracted_provider = imageproxy_params
+extracted_path, extracted_provider = imageproxy_params
+# Early validation of extracted path before recursive call
+if not extracted_path.startswith(("http://", "https://", "data:image")):
+    if not is_safe_path(extracted_path):
+        msg = f"Unsafe image path: {extracted_path}"
+        raise FileNotFoundError(msg)
```
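For illustration, a generic traversal guard in the spirit of the `is_safe_path` check referenced above; the codebase's actual implementation may differ:

```python
from pathlib import Path

ALLOWED_BASE = Path("/data/images")  # hypothetical image root


def is_safe_path(candidate: str) -> bool:
    """True if `candidate` resolves to a location inside ALLOWED_BASE."""
    resolved = Path(candidate).resolve()  # normalizes any ".." segments
    return resolved.is_relative_to(ALLOWED_BASE)


assert is_safe_path("/data/images/covers/album.jpg")
assert not is_safe_path("/data/images/../../etc/passwd")
```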
Fix (edge) case where an image URL is pointing at our own imageproxy