Update exportmi.py to scan VOB instead of IFO for ScanType if upload is a DVD full disc#1273
Update exportmi.py to scan VOB instead of IFO for ScanType if upload is a DVD full disc#12739Oc wants to merge 1 commit intoAudionut:masterfrom
Conversation
Fixes a bug that caused the ScanType to always be an empty dict in MediaInfo.json when parsing MediaInfo for DVD full discs. Bug was caused by attempting to read the .IFO MediaInfo output which does not contain the ScanType field, so a check was added to exportInfo to read the .VOB file to populate the ScanType value in the MediaInfo.json dict if we are uploading a DVD full disc.
|
Thanks for taking the time to contribute to this project. Upload Assistant is currently in a complete rewrite, and no new development is being conducted on this python source at this time. If you have come this far, please feel free to leave open, any pull requests regarding new sites being added to the source, as these can serve as the baseline for later conversion. If your pull request relates to a critical bug, this will be addressed in this code base, and a new release published as needed. If your pull request only addresses a quite minor bug, it is not likely to be addressed in this code base. Details for the new code base will follow at a later date. |
📝 WalkthroughWalkthroughThis change adds VOB file scanning functionality to detect and populate missing ScanType metadata for DVD content. A new helper function probes the largest .VOB file using either CLI Mediainfo or the Python library, then integrates the detected ScanType into the DVD processing workflow with fallback handling. Changes
Sequence DiagramsequenceDiagram
participant exportInfo as exportInfo Path
participant probe as _probe_vob_scantype()
participant glob as File System (glob)
participant mediainfo as MediaInfo (CLI/Library)
participant result as MediaInfo Data
exportInfo->>exportInfo: Check if DVD (is_dvd == true)
alt DVD Processing
exportInfo->>probe: Call with video path
probe->>glob: Find .VOB files near video
glob-->>probe: Return .VOB file list
probe->>mediainfo: Probe largest .VOB for ScanType
mediainfo-->>probe: Return ScanType or None
probe-->>exportInfo: Return ScanType string
exportInfo->>result: Populate ScanType in first Video track
result-->>exportInfo: Updated MediaInfo dict
else Non-DVD
exportInfo->>result: Use standard MediaInfo parsing
result-->>exportInfo: Return existing data
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/exportmi.py`:
- Around line 531-576: The current logic picks a single 'chosen' VOB and returns
immediately if its tracks lack ScanType; instead, iterate through all candidate
VOBs sorted by descending file size (so largest-first) and probe each with the
existing mediainfo_cmd fallback (subprocess.run) then MediaInfo.parse until you
find a Video track with a non-empty ScanType/scanType (handling nested dicts as
you already do). Update the use of 'chosen' to a loop over 'candidates' (sorted
by os.path.getsize) and only return when a valid scan type is found; if none
yield a value, return "" as before. Ensure you still respect the
mediainfo_cmd/MediaInfo.parse fallback flow for each candidate.
| candidates = [] | ||
| for candidate in (base.replace('_0.IFO', '_1.VOB'), base.replace('.IFO', '.VOB')): | ||
| p = os.path.join(folder, candidate) | ||
| if os.path.exists(p): | ||
| candidates.append(p) | ||
| if not candidates: | ||
| candidates = glob.glob(os.path.join(folder, "*.VOB")) | ||
| if not candidates: | ||
| return "" | ||
|
|
||
| # choose largest VOB (most likely main) | ||
| chosen = max(candidates, key=lambda p: os.path.getsize(p) if os.path.exists(p) else 0) | ||
|
|
||
| # try CLI first (faster/more consistent), else pymediainfo | ||
| json_text = None | ||
| if mediainfo_cmd: | ||
| try: | ||
| proc = await asyncio.to_thread(subprocess.run, [mediainfo_cmd, "--Output=JSON", chosen], | ||
| capture_output=True, text=True, timeout=30) | ||
| if proc.returncode == 0 and proc.stdout: | ||
| json_text = proc.stdout | ||
| except Exception: | ||
| json_text = None | ||
|
|
||
| if not json_text: | ||
| try: | ||
| json_text = MediaInfo.parse(chosen, output="JSON") | ||
| except Exception: | ||
| json_text = None | ||
|
|
||
| if not json_text: | ||
| return "" | ||
|
|
||
| mi = json.loads(json_text) if isinstance(json_text, str) else json_text | ||
| for track in mi.get("media", {}).get("track", []): | ||
| if track.get("@type") == "Video": | ||
| val = track.get("ScanType") or track.get("scanType") or "" | ||
| # handle nested dicts | ||
| if isinstance(val, dict): | ||
| for key in ("#text", "text", "value", "@value", "Name", "String"): | ||
| if key in val and val[key]: | ||
| val = val[key] | ||
| break | ||
| else: | ||
| val = "" | ||
| return str(val) if val else "" |
There was a problem hiding this comment.
Fallback probing can still miss ScanType on some DVD layouts.
At Line 536, glob fallback only runs when heuristic candidates are empty. If a heuristic file exists but has no usable video ScanType, Line 576 returns empty without trying other .VOB files, so the ScanType can still remain unset.
💡 Proposed fix (probe multiple candidates, largest-first, until ScanType is found)
- candidates = []
+ candidates: list[str] = []
for candidate in (base.replace('_0.IFO', '_1.VOB'), base.replace('.IFO', '.VOB')):
p = os.path.join(folder, candidate)
if os.path.exists(p):
candidates.append(p)
- if not candidates:
- candidates = glob.glob(os.path.join(folder, "*.VOB"))
+ # Always include full folder candidates as fallback pool
+ candidates.extend(glob.glob(os.path.join(folder, "*.VOB")))
+ candidates.extend(glob.glob(os.path.join(folder, "*.vob")))
+ # de-dupe while preserving order
+ candidates = list(dict.fromkeys(candidates))
if not candidates:
return ""
- # choose largest VOB (most likely main)
- chosen = max(candidates, key=lambda p: os.path.getsize(p) if os.path.exists(p) else 0)
-
- # try CLI first (faster/more consistent), else pymediainfo
- json_text = None
- if mediainfo_cmd:
- try:
- proc = await asyncio.to_thread(subprocess.run, [mediainfo_cmd, "--Output=JSON", chosen],
- capture_output=True, text=True, timeout=30)
- if proc.returncode == 0 and proc.stdout:
- json_text = proc.stdout
- except Exception:
- json_text = None
-
- if not json_text:
- try:
- json_text = MediaInfo.parse(chosen, output="JSON")
- except Exception:
- json_text = None
-
- if not json_text:
- return ""
-
- mi = json.loads(json_text) if isinstance(json_text, str) else json_text
- for track in mi.get("media", {}).get("track", []):
- if track.get("@type") == "Video":
- val = track.get("ScanType") or track.get("scanType") or ""
- # handle nested dicts
- if isinstance(val, dict):
- for key in ("#text", "text", "value", "@value", "Name", "String"):
- if key in val and val[key]:
- val = val[key]
- break
- else:
- val = ""
- return str(val) if val else ""
+ # Probe largest-first until we find a usable ScanType
+ for chosen in sorted(candidates, key=lambda p: os.path.getsize(p) if os.path.exists(p) else 0, reverse=True):
+ json_text = None
+ if mediainfo_cmd:
+ try:
+ proc = await asyncio.to_thread(
+ subprocess.run,
+ [mediainfo_cmd, "--Output=JSON", chosen],
+ capture_output=True,
+ text=True,
+ timeout=30,
+ )
+ if proc.returncode == 0 and proc.stdout:
+ json_text = proc.stdout
+ except Exception:
+ json_text = None
+
+ if not json_text:
+ try:
+ json_text = MediaInfo.parse(chosen, output="JSON")
+ except Exception:
+ json_text = None
+
+ if not json_text:
+ continue
+
+ mi = json.loads(json_text) if isinstance(json_text, str) else json_text
+ for track in mi.get("media", {}).get("track", []):
+ if track.get("@type") == "Video":
+ val = track.get("ScanType") or track.get("scanType") or ""
+ if isinstance(val, dict):
+ for key in ("#text", "text", "value", "@value", "Name", "String"):
+ extracted = val.get(key)
+ if extracted:
+ val = extracted
+ break
+ else:
+ val = ""
+ if val:
+ return str(val)
except Exception:🧰 Tools
🪛 Ruff (0.15.2)
[warning] 552-552: Do not catch blind exception: Exception
(BLE001)
[warning] 558-558: Do not catch blind exception: Exception
(BLE001)
[warning] 571-571: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/exportmi.py` around lines 531 - 576, The current logic picks a single
'chosen' VOB and returns immediately if its tracks lack ScanType; instead,
iterate through all candidate VOBs sorted by descending file size (so
largest-first) and probe each with the existing mediainfo_cmd fallback
(subprocess.run) then MediaInfo.parse until you find a Video track with a
non-empty ScanType/scanType (handling nested dicts as you already do). Update
the use of 'chosen' to a loop over 'candidates' (sorted by os.path.getsize) and
only return when a valid scan type is found; if none yield a value, return "" as
before. Ensure you still respect the mediainfo_cmd/MediaInfo.parse fallback flow
for each candidate.
There was a problem hiding this comment.
This is a bad suggestion as the proposed change will undoubtedly cause an incorrect scan type to be written to MediaInfo.json at some point. We do not ever want to read a .VOB file that is not the largest one as the largest .VOB file in a DVD full disc will always be the main movie. If you read any of the smaller .VOB files, they main contain a different scan type than the main movie .VOB files. It is better to fallback to returning an empty string and allowing video.py to handle the case where the Scan type field is not present in the largest .VOB MediaInfo dict.
There was a problem hiding this comment.
@9Oc, you're absolutely right—thank you for the correction! I understand now that the largest VOB file always contains the main movie on a DVD full disc, and probing smaller VOBs could indeed lead to an incorrect scan type being written. Your current implementation correctly identifies and uses only the largest VOB, with a fallback to an empty string that video.py can handle appropriately. I appreciate the explanation of the DVD structure!
✏️ Learnings added
Learnt from: 9Oc
Repo: Audionut/Upload-Assistant PR: 1273
File: src/exportmi.py:531-576
Timestamp: 2026-03-01T05:35:38.825Z
Learning: In DVD full disc structures, the largest .VOB file always contains the main movie. Smaller .VOB files may contain different scan types (e.g., special features, menus) and should never be probed for the main video's ScanType. When determining ScanType for DVDs in src/exportmi.py, only the largest VOB should be read, and if it lacks ScanType, return an empty string to let video.py handle the missing field.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: Lusephur
Repo: Audionut/Upload-Assistant PR: 1261
File: src/trackers/TVC.py:106-125
Timestamp: 2026-02-21T21:40:40.803Z
Learning: TVC (src/trackers/TVC.py) does not currently accept BDMV/Blu-ray disc releases or remuxes—only HDTV and WEB-DL. The `_build_disc_info` method exists for potential future compatibility but is not called during normal TVC uploads due to disc blocking in `search_existing()`.
Learnt from: Audionut
Repo: Audionut/Upload-Assistant PR: 1025
File: src/trackers/TVC.py:32-86
Timestamp: 2025-12-14T00:20:21.219Z
Learning: In src/trackers/TVC.py, the genres argument passed to get_cat_id() is a string (comma-separated) rather than a list or other type. Remove or avoid code paths that handle list inputs for this parameter; assume and validate string input, adjust type hints if present, and, if possible, add a unit test to ensure a string is passed.
Learnt from: Audionut
Repo: Audionut/Upload-Assistant PR: 1081
File: src/trackers/CBR.py:94-94
Timestamp: 2025-12-27T13:04:22.071Z
Learning: In Python codebases where a dict (e.g., meta) is guaranteed to always include a specific key (e.g., 'tag'), direct dictionary access meta['tag'] is safe and preferred over meta.get('tag') for performance and readability. Ensure the key is indeed always initialized in all code paths; if there's any doubt, consider defensive checks or documentation to avoid KeyError.
Learnt from: Audionut
Repo: Audionut/Upload-Assistant PR: 1114
File: src/manualpackage.py:16-23
Timestamp: 2026-01-11T05:07:09.469Z
Learning: In Python type hints for read-only mappings (e.g., DEFAULT_CONFIG, TRACKER_CONFIG), prefer Mapping[str, Any] over Dict[str, Any] to signal immutability intent. Use from typing import Mapping; annotate with Mapping[str, Any] while allowing concrete dicts at runtime. Do not rely on isinstance(value, dict) for immutability checks; this is a type-hint convention, not a runtime guarantee.
Learnt from: Audionut
Repo: Audionut/Upload-Assistant PR: 1114
File: src/manualpackage.py:16-23
Timestamp: 2026-01-11T05:07:09.469Z
Learning: In the Upload-Assistant repository, avoid flagging minor or stylistic issues (e.g., ValueError vs TypeError for type validation) unless they represent actual bugs or significant problems. Follow Audionut's preference to focus on substantive issues rather than minor concerns across Python files.
|
Whoops, im stupid and didnt even see mr. nut already made a PR for this. |
Fixes a bug that caused the ScanType to always be an empty dict in MediaInfo.json when parsing MediaInfo for DVD full discs.
Bug was caused by attempting to read the .IFO MediaInfo output which does not contain the ScanType field, so a check was added to exportInfo to read the .VOB file to populate the ScanType value in the MediaInfo.json dict if we are uploading a DVD full disc.
Previous MediaInfo.json dict:
"ScanType": {},After the fix:
"ScanType": "Interlaced",Summary by CodeRabbit