Nickakhmetov/Add ometiff metadata notebook#3940
Conversation
This notebook allows users to fetch and display OME-TIFF metadata from a remote URL without downloading the entire file. It includes functionality to read TIFF headers, parse OME-XML, and extract image and pixel metadata, along with structured annotations and regions of interest (ROIs).
Pull request overview
Adds a Jupyter notebook utility to inspect OME-TIFF metadata from remote files (via HTTP range requests) to support visualization debugging and troubleshooting workflows.
Changes:
- Added `inspect_ometiff_metadata.ipynb` notebook that parses TIFF/BigTIFF headers/IFDs and extracts OME-XML metadata from ImageDescription.
- Added a root `CHANGELOG-ometiff-notebook.md` entry documenting the addition.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| context/app/notebook/inspect_ometiff_metadata.ipynb | New notebook to fetch/parse remote TIFF headers/IFDs and display OME-XML/pyramid/ROI metadata. |
| CHANGELOG-ometiff-notebook.md | Changelog entry for the new notebook. |
| "## Configuration\n", | ||
| "\n", | ||
| "Paste the full OME-TIFF URL (including any `?token=` query parameter) below." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "IMAGE_URL = \"\"\n", | ||
| "\n", | ||
| "if not IMAGE_URL:\n", | ||
| " raise ValueError(\"IMAGE_URL is required. Paste the full OME-TIFF URL above.\")" | ||
| ] |
The instructions encourage pasting a full URL including a token query parameter into the notebook. That makes it easy to accidentally persist secrets in the .ipynb file and commit them. Consider reading IMAGE_URL from an environment variable (or prompting via input/getpass) and updating the markdown to discourage saving tokens in the notebook.
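One way to do this is a sketch along the following lines; the `OMETIFF_IMAGE_URL` environment variable name and the `resolve_image_url` helper are hypothetical illustrations, not part of the PR:

```python
import os
from getpass import getpass


def resolve_image_url(env_var="OMETIFF_IMAGE_URL"):
    """Return the OME-TIFF URL from the environment, prompting only as a fallback.

    Keeping the URL out of the notebook source means any ?token= query
    parameter is never persisted in the committed .ipynb file.
    """
    url = os.environ.get(env_var, "")
    if not url:
        # getpass hides the pasted value so it is not echoed into cell output.
        url = getpass("Paste the full OME-TIFF URL (input is hidden): ")
    if not url:
        raise ValueError("IMAGE_URL is required.")
    return url
```

The markdown cell could then simply say to export the variable before launching Jupyter.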
| " \"\"\"Unpack the value/offset field of a TIFF IFD entry.\"\"\"\n", | ||
| " if dtype == 3: # SHORT\n", | ||
| " return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n", | ||
| " if dtype == 4: # LONG\n", | ||
| " return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n", | ||
| " if dtype == 16: # LONG8 (BigTIFF)\n", | ||
| " return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n", | ||
| " return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n", | ||
| "\n", | ||
| "\n", |
_unpack_ifd_value() will raise on classic TIFF entries for types other than SHORT/LONG because it unconditionally unpacks a Q from val_bytes, but classic TIFF val_bytes is only 4 bytes. This breaks parsing of common tags like ImageDescription (ASCII, dtype=2) and prevents the notebook from working on non-BigTIFF OME-TIFFs. Consider unpacking based on the value/offset field size (4 vs 8 bytes), or passing bigtiff/ptr_size into this helper and using I for classic TIFF offsets. Also treat non-inlined types (e.g., ASCII, RATIONAL) as offsets rather than immediate values.
| " \"\"\"Unpack the value/offset field of a TIFF IFD entry.\"\"\"\n", | |
| " if dtype == 3: # SHORT\n", | |
| " return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n", | |
| " if dtype == 4: # LONG\n", | |
| " return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n", | |
| " if dtype == 16: # LONG8 (BigTIFF)\n", | |
| " return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n", | |
| " return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n", | |
| "\n", | |
| "\n", | |
| " \"\"\"Unpack the value/offset field of a TIFF IFD entry.\n", | |
| "\n", | |
| " For classic TIFF, the value/offset field is 4 bytes; for BigTIFF it is 8 bytes.\n", | |
| " SHORT/LONG/LONG8 values are handled explicitly. For other dtypes, this helper\n", | |
| " returns the field interpreted as an offset whose size matches `val_bytes`.\n", | |
| " \"\"\"\n", | |
| " # Explicit handling for known integer value types\n", | |
| " if dtype == 3: # SHORT\n", | |
| " return struct.unpack(f\"{byte_order}H\", val_bytes[:2])[0]\n", | |
| " if dtype == 4: # LONG (classic TIFF 32-bit)\n", | |
| " return struct.unpack(f\"{byte_order}I\", val_bytes[:4])[0]\n", | |
| " if dtype == 16: # LONG8 (BigTIFF 64-bit)\n", | |
| " return struct.unpack(f\"{byte_order}Q\", val_bytes[:8])[0]\n", | |
| "\n", | |
| " # For other dtypes (e.g. ASCII, RATIONAL), this field is an offset into the file.\n", | |
| " field_size = len(val_bytes)\n", | |
| " if field_size >= 8:\n", | |
| " offset_fmt, size = \"Q\", 8 # BigTIFF-style 64-bit offset\n", | |
| " elif field_size >= 4:\n", | |
| " offset_fmt, size = \"I\", 4 # classic TIFF 32-bit offset\n", | |
| " else:\n", | |
| " # Fallback for unexpectedly short fields; use 16-bit to avoid struct errors.\n", | |
| " offset_fmt, size = \"H\", 2\n", | |
| " return struct.unpack(f\"{byte_order}{offset_fmt}\", val_bytes[:size])[0]\n", | |
| "\n", | |
| "\n", |
| " \"\"\"Fetch a byte range from a remote URL.\"\"\"\n", | ||
| " r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n", | ||
| " r.raise_for_status()\n", |
fetch_range() doesn't verify that the server honored the Range request (e.g., HTTP 206 and/or a valid Content-Range header). If a server ignores Range and responds 200, this code can accidentally download the entire TIFF (potentially multi-GB), defeating the notebook's purpose and risking timeouts/memory pressure. Consider explicitly requiring 206 for ranged reads (and raising a clear error otherwise).
| " \"\"\"Fetch a byte range from a remote URL.\"\"\"\n", | |
| " r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n", | |
| " r.raise_for_status()\n", | |
| " \"\"\"Fetch a byte range from a remote URL.\n", | |
| "\n", | |
| " This function requires the server to honor the HTTP Range request and\n", | |
| " return 206 Partial Content with a valid Content-Range header.\n", | |
| " \"\"\"\n", | |
| " r = requests.get(url, headers={\"Range\": f\"bytes={start}-{end}\"}, timeout=30)\n", | |
| " r.raise_for_status()\n", | |
| "\n", | |
| " # Ensure the server actually honored the Range request to avoid\n", | |
| " # accidentally downloading the entire file.\n", | |
| " if r.status_code != 206:\n", | |
| " raise RuntimeError(\n", | |
| " f\"Server did not honor HTTP Range request: expected status 206, \"\n", | |
| " f\"got {r.status_code}. Aborting to avoid downloading the full file.\"\n", | |
| " )\n", | |
| "\n", | |
| " content_range = r.headers.get(\"Content-Range\")\n", | |
| " if not content_range or not content_range.startswith(\"bytes \"):\n", | |
| " raise RuntimeError(\n", | |
| " \"Server response is missing a valid Content-Range header for a \"\n", | |
| " \"range request. Aborting to avoid downloading the full file.\"\n", | |
| " )\n", | |
| "\n", |
| "All metadata is fetched via `Range` headers so only a few KB are downloaded,\n", | ||
| "even for multi-GB pyramid files." |
The notebook claims "only a few KB are downloaded" via range requests, but Step 2 fetches the full ImageDescription payload (OME-XML) which can be many MB (your example output shows ~23 MB). Consider updating this description to reflect that the OME-XML itself may require a larger download even though pixel data is avoided.
| "All metadata is fetched via `Range` headers so only a few KB are downloaded,\n", | |
| "even for multi-GB pyramid files." | |
| "All metadata is fetched via `Range` headers so only the required portions of the file\n", | |
| "are downloaded, avoiding any pixel data even for multi-GB pyramid files. Note that the\n", | |
| "OME-XML ImageDescription block itself can be relatively large (up to many MB), so\n", | |
| "metadata inspection may still transfer more than just a few kilobytes, but remains\n", | |
| "much smaller than downloading the full image." |
| "execution_count": 5, | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "File size: 34,385,907,269 bytes (32.02 GB)\n", | ||
| "Accept-Ranges: bytes\n", | ||
| "Format: BigTIFF (big-endian)\n", | ||
| "First IFD offset: 34,361,978,704\n", | ||
| "\n", | ||
| "First IFD: 49152x65536, 16-bit x1ch, compression=None, tiles=512x512\n", | ||
| " 18 tags, 8 sub-IFDs\n", | ||
| " Software: OME Bio-Formats 7.1.0\n" | ||
| ] | ||
| } | ||
| ], |
This notebook is committed with non-null execution_count values and captured cell outputs. Other notebooks in context/app/notebook (e.g., files.ipynb:12-16, metadata.ipynb:12-16) keep execution_count=null and outputs=[] to minimize diff noise and avoid committing potentially sensitive output. Consider clearing outputs and resetting execution counts before committing.
| "execution_count": 5, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "File size: 34,385,907,269 bytes (32.02 GB)\n", | |
| "Accept-Ranges: bytes\n", | |
| "Format: BigTIFF (big-endian)\n", | |
| "First IFD offset: 34,361,978,704\n", | |
| "\n", | |
| "First IFD: 49152x65536, 16-bit x1ch, compression=None, tiles=512x512\n", | |
| " 18 tags, 8 sub-IFDs\n", | |
| " Software: OME Bio-Formats 7.1.0\n" | |
| ] | |
| } | |
| ], | |
| "execution_count": null, | |
| "metadata": {}, | |
| "outputs": [], |
| "# Read the 8-byte TIFF header\n", | ||
| "header = fetch_range(IMAGE_URL, 0, 15) # fetch 16 bytes to cover BigTIFF header\n", | ||
| "\n", | ||
| "byte_order = \"<\" if header[:2] == b\"II\" else \">\"\n", |
byte_order detection defaults to big-endian for any file whose first two bytes are not b"II". For invalid/unsupported headers this can lead to confusing struct unpacking errors later. Consider explicitly validating for b"II"/b"MM" and raising a clear error if neither is present.
| "byte_order = \"<\" if header[:2] == b\"II\" else \">\"\n", | |
| "if header[:2] == b\"II\":\n", | |
| " byte_order = \"<\"\n", | |
| "elif header[:2] == b\"MM\":\n", | |
| " byte_order = \">\"\n", | |
| "else:\n", | |
| " raise ValueError(f\"Unrecognized TIFF byte order in header: expected b'II' or b'MM', got {header[:2]!r}\")\n", |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
This PR adds a notebook for inspecting OME-TIFF metadata without downloading the entire file. Since this is a common task I've had to do for visualization creation and troubleshooting, formalizing it into a notebook should save time in the future.