
Nickakhmetov/Add ometiff metadata notebook #3940

Open
NickAkhmetov wants to merge 3 commits into main from nickakhmetov/misc-ometiff-metadata-notebook

Conversation

@NickAkhmetov
Collaborator

Summary

This PR adds a notebook for inspecting OME-TIFF metadata without downloading the file. Since this is a common task in visualization creation and troubleshooting, formalizing it into a notebook should save time in the future.

This notebook allows users to fetch and display OME-TIFF metadata from a remote URL without downloading the entire file. It includes functionality to read TIFF headers, parse OME-XML, and extract image and pixel metadata, along with structured annotations and regions of interest (ROIs).
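The core trick (reading just the first header bytes to learn the byte order, TIFF vs. BigTIFF format, and the first IFD offset) can be sketched roughly as follows. This is an illustrative sketch, not the notebook's actual code; the function name and sample bytes are invented for the example:

```python
import struct

def parse_tiff_header(header: bytes):
    """Parse the first bytes of a TIFF/BigTIFF header.

    Returns (byte_order, is_bigtiff, first_ifd_offset).
    """
    if header[:2] == b"II":
        byte_order = "<"  # little-endian
    elif header[:2] == b"MM":
        byte_order = ">"  # big-endian
    else:
        raise ValueError(f"Not a TIFF file: {header[:2]!r}")
    version = struct.unpack(f"{byte_order}H", header[2:4])[0]
    if version == 42:  # classic TIFF: 32-bit offset at bytes 4-8
        offset = struct.unpack(f"{byte_order}I", header[4:8])[0]
        return byte_order, False, offset
    if version == 43:  # BigTIFF: 64-bit offset at bytes 8-16
        offset = struct.unpack(f"{byte_order}Q", header[8:16])[0]
        return byte_order, True, offset
    raise ValueError(f"Unknown TIFF version: {version}")

# Synthetic classic little-endian TIFF header with first IFD at byte 8
sample = b"II" + struct.pack("<H", 42) + struct.pack("<I", 8)
print(parse_tiff_header(sample))  # ('<', False, 8)
```

Only these 16 bytes need to be fetched to decide how to read the rest of the file, which is what makes the range-request approach viable.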
Contributor

Copilot AI left a comment


Pull request overview

Adds a Jupyter notebook utility to inspect OME-TIFF metadata from remote files (via HTTP range requests) to support visualization debugging and troubleshooting workflows.

Changes:

  • Added inspect_ometiff_metadata.ipynb notebook that parses TIFF/BigTIFF headers/IFDs and extracts OME-XML metadata from ImageDescription.
  • Added a root CHANGELOG-ometiff-notebook.md entry documenting the addition.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

  • context/app/notebook/inspect_ometiff_metadata.ipynb: New notebook to fetch/parse remote TIFF headers/IFDs and display OME-XML/pyramid/ROI metadata.
  • CHANGELOG-ometiff-notebook.md: Changelog entry for the new notebook.


Comment on lines +31 to +46
"## Configuration\n",
"\n",
"Paste the full OME-TIFF URL (including any `?token=` query parameter) below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"IMAGE_URL = \"\"\n",
"\n",
"if not IMAGE_URL:\n",
" raise ValueError(\"IMAGE_URL is required. Paste the full OME-TIFF URL above.\")"
]

Copilot AI Feb 27, 2026


The instructions encourage pasting a full URL including a token query parameter into the notebook. That makes it easy to accidentally persist secrets in the .ipynb file and commit them. Consider reading IMAGE_URL from an environment variable (or prompting via input/getpass) and updating the markdown to discourage saving tokens in the notebook.
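One possible shape for that suggestion, using the environment variable with a `getpass` fallback so the URL (and its token) never lands in a committed cell. The variable name `OMETIFF_IMAGE_URL` is hypothetical; nothing in the PR defines it:

```python
import os
from getpass import getpass

def resolve_image_url() -> str:
    """Read the OME-TIFF URL from the environment, falling back to a
    hidden interactive prompt, so tokens are never saved in the notebook."""
    url = os.environ.get("OMETIFF_IMAGE_URL", "")
    if not url:
        # getpass hides the input and keeps it out of captured cell output
        url = getpass("Paste the full OME-TIFF URL (with any ?token=...): ")
    if not url:
        raise ValueError("IMAGE_URL is required.")
    return url

# Example: with the variable set, no prompt is shown
os.environ["OMETIFF_IMAGE_URL"] = "https://example.com/image.ome.tiff"
print(resolve_image_url())
```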

Copilot uses AI. Check for mistakes.
Comment thread context/app/notebook/inspect_ometiff_metadata.ipynb Outdated
Comment on lines +110 to +119
" \"\"\"Unpack the value/offset field of a TIFF IFD entry.\"\"\"\n",
" if dtype == 3: # SHORT\n",
" return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n",
" if dtype == 4: # LONG\n",
" return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n",
" if dtype == 16: # LONG8 (BigTIFF)\n",
" return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n",
" return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n",
"\n",
"\n",

Copilot AI Feb 27, 2026


_unpack_ifd_value() will raise on classic TIFF entries for types other than SHORT/LONG because it unconditionally unpacks a Q from val_bytes, but classic TIFF val_bytes is only 4 bytes. This will break parsing common tags like ImageDescription (ASCII, dtype=2) and prevent the notebook from working on non-BigTIFF OME-TIFFs. Consider unpacking based on the value/offset field size (4 vs 8) or passing bigtiff/ptr_size into this helper and using I for classic TIFF offsets; also treat non-inlined types (e.g., ASCII/RATIONAL) as offsets rather than immediate values.

Suggested change
" \"\"\"Unpack the value/offset field of a TIFF IFD entry.\"\"\"\n",
" if dtype == 3: # SHORT\n",
" return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n",
" if dtype == 4: # LONG\n",
" return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n",
" if dtype == 16: # LONG8 (BigTIFF)\n",
" return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n",
" return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n",
"\n",
"\n",
" \"\"\"Unpack the value/offset field of a TIFF IFD entry.\n",
"\n",
" For classic TIFF, the value/offset field is 4 bytes; for BigTIFF it is 8 bytes.\n",
" SHORT/LONG/LONG8 values are handled explicitly. For other dtypes, this helper\n",
" returns the field interpreted as an offset whose size matches `val_bytes`.\n",
" \"\"\"\n",
" # Explicit handling for known integer value types\n",
" if dtype == 3: # SHORT\n",
" return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n",
" if dtype == 4: # LONG (classic TIFF 32-bit)\n",
" return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n",
" if dtype == 16: # LONG8 (BigTIFF 64-bit)\n",
" return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n",
"\n",
" # For other dtypes (e.g. ASCII, RATIONAL), this field is an offset into the file.\n",
" field_size = len(val_bytes)\n",
" if field_size >= 8:\n",
" offset_fmt, size = \"Q\", 8 # BigTIFF-style 64-bit offset\n",
" elif field_size >= 4:\n",
" offset_fmt, size = \"I\", 4 # classic TIFF 32-bit offset\n",
" else:\n",
" # Fallback for unexpectedly short fields; use 16-bit to avoid struct errors.\n",
" offset_fmt, size = \"H\", 2\n",
" return struct.unpack(f\"{byte_order}{offset_fmt}\", val_bytes[:size])[0]\n",
"\n",
"\n",

Comment on lines +65 to +67
" \"\"\"Fetch a byte range from a remote URL.\"\"\"\n",
" r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n",
" r.raise_for_status()\n",

Copilot AI Feb 27, 2026


fetch_range() doesn't verify that the server honored the Range request (e.g., HTTP 206 and/or a valid Content-Range header). If a server ignores Range and responds 200, this code can accidentally download the entire TIFF (potentially multi-GB), defeating the notebook's purpose and risking timeouts/memory pressure. Consider explicitly requiring 206 for ranged reads (and raising a clear error otherwise).

Suggested change
" \"\"\"Fetch a byte range from a remote URL.\"\"\"\n",
" r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n",
" r.raise_for_status()\n",
" \"\"\"Fetch a byte range from a remote URL.\n",
"\n",
" This function requires the server to honor the HTTP Range request and\n",
" return 206 Partial Content with a valid Content-Range header.\n",
" \"\"\"\n",
" r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n",
" r.raise_for_status()\n",
"\n",
" # Ensure the server actually honored the Range request to avoid\n",
" # accidentally downloading the entire file.\n",
" if r.status_code != 206:\n",
" raise RuntimeError(\n",
" f\"Server did not honor HTTP Range request: expected status 206, \"\n",
" f\"got {r.status_code}. Aborting to avoid downloading the full file.\"\n",
" )\n",
"\n",
" content_range = r.headers.get(\"Content-Range\")\n",
" if not content_range or not content_range.startswith(\"bytes \"):\n",
" raise RuntimeError(\n",
" \"Server response is missing a valid Content-Range header for a \"\n",
" \"range request. Aborting to avoid downloading the full file.\"\n",
" )\n",
"\n",

Comment on lines +54 to +55
"All metadata is fetched via `Range` headers so only a few KB are downloaded,\n",
"even for multi-GB pyramid files."

Copilot AI Feb 27, 2026


The notebook claims "only a few KB are downloaded" via range requests, but Step 2 fetches the full ImageDescription payload (OME-XML) which can be many MB (your example output shows ~23 MB). Consider updating this description to reflect that the OME-XML itself may require a larger download even though pixel data is avoided.

Suggested change
"All metadata is fetched via `Range` headers so only a few KB are downloaded,\n",
"even for multi-GB pyramid files."
"All metadata is fetched via `Range` headers so only the required portions of the file\n",
"are downloaded, avoiding any pixel data even for multi-GB pyramid files. Note that the\n",
"OME-XML ImageDescription block itself can be relatively large (up to many MB), so\n",
"metadata inspection may still transfer more than just a few kilobytes, but remains\n",
"much smaller than downloading the full image."

Comment on lines +249 to +266
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File size: 34,385,907,269 bytes (32.02 GB)\n",
"Accept-Ranges: bytes\n",
"Format: BigTIFF (big-endian)\n",
"First IFD offset: 34,361,978,704\n",
"\n",
"First IFD: 49152x65536, 16-bit x1ch, compression=None, tiles=512x512\n",
" 18 tags, 8 sub-IFDs\n",
" Software: OME Bio-Formats 7.1.0\n"
]
}
],

Copilot AI Feb 27, 2026


This notebook is committed with non-null execution_count values and captured cell outputs. Other notebooks in context/app/notebook (e.g., files.ipynb:12-16, metadata.ipynb:12-16) keep execution_count=null and outputs=[] to minimize diff noise and avoid committing potentially sensitive output. Consider clearing outputs and resetting execution counts before committing.

Suggested change
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File size: 34,385,907,269 bytes (32.02 GB)\n",
"Accept-Ranges: bytes\n",
"Format: BigTIFF (big-endian)\n",
"First IFD offset: 34,361,978,704\n",
"\n",
"First IFD: 49152x65536, 16-bit x1ch, compression=None, tiles=512x512\n",
" 18 tags, 8 sub-IFDs\n",
" Software: OME Bio-Formats 7.1.0\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],

"# Read the 8-byte TIFF header\n",
"header = fetch_range(IMAGE_URL, 0, 15) # fetch 16 bytes to cover BigTIFF header\n",
"\n",
"byte_order = \"<\" if header[:2] == b\"II\" else \">\"\n",

Copilot AI Feb 27, 2026


byte_order detection defaults to big-endian for any file whose first two bytes are not b"II". For invalid/unsupported headers this can lead to confusing struct unpacking errors later. Consider explicitly validating for b"II"/b"MM" and raising a clear error if neither is present.

Suggested change
"byte_order = \"<\" if header[:2] == b\"II\" else \">\"\n",
"if header[:2] == b\"II\":\n",
" byte_order = \"<\"\n",
"elif header[:2] == b\"MM\":\n",
" byte_order = \">\"\n",
"else:\n",
" raise ValueError(f\"Unrecognized TIFF byte order in header: expected b'II' or b'MM', got {header[:2]!r}\")\n",

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

enhancement New feature or request


2 participants