feat(iso): concat Blu-ray main feature across clips and discs #599

Open
javi11 wants to merge 8 commits into main from session/suspicious-torvalds-8598c3

javi11 commented May 20, 2026

Summary

  • Parse BDMV/PLAYLIST/*.mpls to identify the main feature playlist and its ordered M2TS clip list, replacing the heuristic that kept only the largest M2TS per ISO.
  • Detect multi-disc releases by stripping DISC|CD|PART_<n> suffixes from the ISO 9660 volume label (with a filename fallback), grouping discs that arrive in the same NZB.
  • Emit one Content per group whose NestedSources chain spans every M2TS in disc-then-playlist order, so the player sees a single seekable virtual .m2ts end-to-end via WebDAV / FUSE / Stremio.
  • Share the new logic as archive.ExpandISOContents and remove the duplicated expandISOContents from rar and sevenzip.
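
For readers unfamiliar with the disc-grouping step, here is a minimal sketch of how a suffix-stripping regex could bucket volume labels. The pattern, `groupKey`, and `groupByLabel` are illustrative assumptions, not the regex or helpers from this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// discSuffix is a hypothetical pattern, not the regex from this PR. It matches
// a trailing DISC/CD/PART marker preceded by a separator and followed by a
// number, or by a separator plus a single letter ("_DISC_1", " CD2", "-PART_B").
var discSuffix = regexp.MustCompile(`(?i)[ _.\-]+(?:DISC|CD|PART)(?:[ _.\-]*\d+|[ _.\-]+[A-Z])$`)

// groupKey strips any disc suffix and upper-cases the result so labels that
// differ only in case land in the same group.
func groupKey(label string) string {
	trimmed := strings.TrimSpace(label)
	return strings.ToUpper(discSuffix.ReplaceAllString(trimmed, ""))
}

// groupByLabel buckets labels by stripped base name, keeping arrival order
// inside each bucket.
func groupByLabel(labels []string) map[string][]string {
	groups := make(map[string][]string)
	for _, l := range labels {
		k := groupKey(l)
		groups[k] = append(groups[k], l)
	}
	return groups
}

func main() {
	labels := []string{"AVATAR_FIRE_AND_ASH_DISC_1", "AVATAR_FIRE_AND_ASH_DISC_2", "SOME_MOVIE"}
	for base, members := range groupByLabel(labels) {
		fmt.Println(base, members)
	}
}
```

Requiring a separator before the marker, and before a single-letter disc id, keeps labels such as MOVIE_PARTY from being mistaken for a "PART Y" disc.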

Why

Long Blu-ray releases (the trigger here was AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2) split the main feature along two axes: across multiple M2TS clips within a disc, and across discs. The old picker dropped clips 2..N from each disc and treated each disc as an unrelated movie. The metadata layer's MetadataVirtualFile.createNestedReader already concatenates NestedSource chains with mixed encrypted/unencrypted members, so we only needed to teach the importer to produce that ordered list.
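
To illustrate what "concatenates a chain into one seekable file" means in practice, here is a minimal sketch of a random-access reader over ordered parts. It is not the code behind createNestedReader; `part`, `concat`, and `fromStrings` are hypothetical names, and encryption is ignored entirely.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"strings"
)

// part is a hypothetical stand-in for one NestedSource: a random-access
// reader plus its byte length.
type part struct {
	r    io.ReaderAt
	size int64
}

// concat presents several parts as one contiguous byte range.
type concat struct {
	parts  []part
	starts []int64 // starts[i] is the virtual offset where parts[i] begins
	total  int64
}

func newConcat(parts []part) *concat {
	c := &concat{parts: parts, starts: make([]int64, len(parts))}
	for i, p := range parts {
		c.starts[i] = c.total
		c.total += p.size
	}
	return c
}

// fromStrings builds a concat over in-memory strings, for demonstration.
func fromStrings(ss ...string) *concat {
	parts := make([]part, len(ss))
	for i, s := range ss {
		parts[i] = part{strings.NewReader(s), int64(len(s))}
	}
	return newConcat(parts)
}

// ReadAt implements io.ReaderAt across part boundaries.
func (c *concat) ReadAt(buf []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	n := 0
	for n < len(buf) && off < c.total {
		// Binary-search for the last part whose start is <= off.
		i := sort.Search(len(c.starts), func(j int) bool { return c.starts[j] > off }) - 1
		local := off - c.starts[i]
		want := len(buf) - n
		if rem := c.parts[i].size - local; int64(want) > rem {
			want = int(rem)
		}
		got, err := c.parts[i].r.ReadAt(buf[n:n+want], local)
		n += got
		off += int64(got)
		if err != nil && err != io.EOF {
			return n, err
		}
		if got == 0 {
			return n, io.ErrUnexpectedEOF // part shorter than its declared size
		}
	}
	if n < len(buf) {
		return n, io.EOF
	}
	return n, nil
}

func main() {
	c := fromStrings("disc1-clip", "disc2-clip")
	buf := make([]byte, 8)
	n, err := c.ReadAt(buf, 6) // this read straddles the boundary at offset 10
	fmt.Printf("%q %v\n", buf[:n], err)
}
```

Seeking to the summed size of all disc 1 sources lands on byte 0 of disc 2's first source, which is the behaviour the test plan checks by scrubbing across the disc boundary.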

Files

New:

  • internal/importer/archive/iso/mpls.go + _test.go — minimal BDA-spec MPLS parser (clip names, IN/OUT ticks, multi-angle PlayItems skipped via length prefix).
  • internal/importer/archive/iso/volume.go + _test.go — reads the 9660 PVD volume label from sector 16 (hybrid BD ISOs always carry one).
  • internal/importer/archive/iso/bluray.go + _test.go — locates BDMV/PLAYLIST/*.mpls, picks the longest playlist, resolves clip names to ordered BDMV/STREAM/*.M2TS entries.
  • internal/importer/archive/iso_expansion.go + _test.go — shared ExpandISOContents, disc-group regex, main-feature concat assembly.
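
As a rough illustration of what a minimal MPLS parser involves, the sketch below reads clip names and IN/OUT times using the commonly documented playlist layout (big-endian fields, a 2-byte length prefix per PlayItem, times in 45 kHz ticks). It is an assumption-laden stand-in, not the parser in mpls.go; `playItem`, `parseMPLS`, `durationTicks`, and `buildMPLS` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

type playItem struct {
	Clip    string // five-digit clip name, e.g. "00022"
	In, Out uint32 // 45 kHz ticks
}

var errBadMPLS = errors.New("mpls: malformed playlist")

func parseMPLS(b []byte) ([]playItem, error) {
	if len(b) < 20 || string(b[0:4]) != "MPLS" {
		return nil, errBadMPLS
	}
	p := int(binary.BigEndian.Uint32(b[8:12])) // PlayList section offset
	if p < 20 || p+10 > len(b) {
		return nil, errBadMPLS
	}
	count := int(binary.BigEndian.Uint16(b[p+6 : p+8]))
	pos := p + 10
	var items []playItem
	for i := 0; i < count; i++ {
		if pos+2 > len(b) {
			return nil, errBadMPLS
		}
		length := int(binary.BigEndian.Uint16(b[pos : pos+2]))
		body := pos + 2
		if length < 20 || body+length > len(b) {
			return nil, errBadMPLS
		}
		items = append(items, playItem{
			Clip: string(b[body : body+5]),
			In:   binary.BigEndian.Uint32(b[body+12 : body+16]),
			Out:  binary.BigEndian.Uint32(b[body+16 : body+20]),
		})
		pos = body + length // the length prefix skips angle data and stream tables
	}
	return items, nil
}

// durationTicks sums OUT-IN over all items; the longest playlist wins.
func durationTicks(items []playItem) uint64 {
	var total uint64
	for _, it := range items {
		if it.Out > it.In {
			total += uint64(it.Out - it.In)
		}
	}
	return total
}

// buildMPLS assembles a minimal synthetic playlist, for demonstration only.
func buildMPLS(clips []playItem) []byte {
	b := make([]byte, 40)
	copy(b, "MPLS0200")
	binary.BigEndian.PutUint32(b[8:], 40)
	hdr := make([]byte, 10)
	binary.BigEndian.PutUint16(hdr[6:], uint16(len(clips)))
	b = append(b, hdr...)
	for _, c := range clips {
		it := make([]byte, 22) // 2-byte length prefix + 20-byte body
		binary.BigEndian.PutUint16(it, 20)
		copy(it[2:], c.Clip)
		copy(it[7:], "M2TS")
		binary.BigEndian.PutUint32(it[14:], c.In)
		binary.BigEndian.PutUint32(it[18:], c.Out)
		b = append(b, it...)
	}
	return b
}

func main() {
	data := buildMPLS([]playItem{{"00016", 0, 45000 * 60}, {"00022", 0, 45000 * 120}})
	items, err := parseMPLS(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(items), "items,", durationTicks(items)/45000, "seconds")
}
```

Advancing by the length prefix rather than decoding every field is what lets a parser like this step over multi-angle PlayItems without understanding them.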

Modified:

  • internal/importer/archive/iso/types.go — new AnalyzedISO struct (VolumeLabel, Files, MainFeature, DurationTicks).
  • internal/importer/archive/iso/processor.go: AnalyzeISO replaces AnalyzeISOContent; encrypted/unencrypted file build paths factored out.
  • internal/importer/archive/rar/aggregator.go and sevenzip/aggregator.go — call archive.ExpandISOContents; deleted the duplicated local implementations.

Behaviour

  • BDMV disc with one M2TS → concat = single clip (same bytes as before, but now via NestedSources).
  • BDMV disc with N clips in the main playlist → concat = N clips end-to-end (new).
  • Two ISOs whose labels share a stripped base name → one merged Content spanning every clip across both discs in disc-then-playlist order (new).
  • DVD VIDEO_TS / software disc / unparseable MPLS → falls back to the legacy "largest file" picker per ISO (no regression).
  • Mixed BDMV + non-BDMV in the same group → falls back to per-disc handling (defensive against false groupings).
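
The decision table above can be read as the following sketch. `disc`, `planContents`, and `qualify` are hypothetical names, sources are reduced to "label:path" strings, and the group is assumed to arrive already sorted in disc order.

```go
package main

import "fmt"

// disc is a hypothetical summary of one analysed ISO.
type disc struct {
	Label       string
	MainFeature []string // ordered clip paths; empty for non-BDMV or unparseable MPLS
	LargestFile string   // what the legacy picker would choose
}

// planContents returns one ordered source list per Content to emit.
func planContents(group []disc) [][]string {
	allBDMV := len(group) > 0
	for _, d := range group {
		if len(d.MainFeature) == 0 {
			allBDMV = false
			break
		}
	}
	if allBDMV {
		// Every disc has a main feature: one merged Content in
		// disc-then-playlist order.
		var merged []string
		for _, d := range group {
			merged = append(merged, qualify(d.Label, d.MainFeature)...)
		}
		return [][]string{merged}
	}
	// Mixed or non-BDMV group: handle each disc on its own.
	out := make([][]string, 0, len(group))
	for _, d := range group {
		if len(d.MainFeature) > 0 {
			out = append(out, qualify(d.Label, d.MainFeature))
		} else {
			out = append(out, []string{d.Label + ":" + d.LargestFile})
		}
	}
	return out
}

func qualify(label string, clips []string) []string {
	out := make([]string, len(clips))
	for i, c := range clips {
		out[i] = label + ":" + c
	}
	return out
}

func main() {
	group := []disc{
		{Label: "DISC_1", MainFeature: []string{"00016.m2ts", "00022.m2ts"}},
		{Label: "DISC_2", MainFeature: []string{"00028.m2ts"}},
	}
	fmt.Println(planContents(group))
}
```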

Tests

  • go test -race ./... passes across the whole repo.
  • go tool golangci-lint run ./internal/importer/archive/... clean.
  • New unit tests cover: MPLS parsing edge cases (single / 5 PlayItems / multi-angle / wrong magic / truncated / bad offset), PVD volume label reading, main-playlist selection by duration, disc-group regex variants (DISC/CD/PART/letter), encrypted vs unencrypted NestedSource conversion, two-disc concat assembly.
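
For the PVD volume-label case, a synthetic image is enough to exercise the reader. The sketch below assumes the standard ISO 9660 layout (2048-byte sectors, descriptor type 1 and "CD001" at the start of sector 16, a 32-byte space-padded volume identifier at offset 40); `readVolumeLabel`, `syntheticImage`, and `labelOf` are hypothetical names, not the functions in volume.go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

const (
	sectorSize = 2048
	pvdSector  = 16
)

// readVolumeLabel returns the volume identifier from the Primary Volume
// Descriptor, with padding trimmed.
func readVolumeLabel(r io.ReaderAt) (string, error) {
	var sec [sectorSize]byte
	if _, err := r.ReadAt(sec[:], pvdSector*sectorSize); err != nil {
		return "", err
	}
	// Byte 0 is the descriptor type (1 = primary); bytes 1..5 spell "CD001".
	if sec[0] != 1 || string(sec[1:6]) != "CD001" {
		return "", errors.New("no ISO 9660 primary volume descriptor at sector 16")
	}
	return strings.TrimRight(string(sec[40:72]), " \x00"), nil
}

// syntheticImage builds a 17-sector image containing only a PVD.
func syntheticImage(label string) []byte {
	img := make([]byte, (pvdSector+1)*sectorSize)
	pvd := img[pvdSector*sectorSize:]
	pvd[0] = 1
	copy(pvd[1:], "CD001")
	copy(pvd[40:72], fmt.Sprintf("%-32s", label))
	return img
}

// labelOf is a convenience wrapper for in-memory images.
func labelOf(img []byte) (string, error) {
	return readVolumeLabel(bytes.NewReader(img))
}

func main() {
	label, err := labelOf(syntheticImage("AVATAR_FIRE_AND_ASH_DISC_1"))
	fmt.Println(label, err)
}
```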

Test plan

  • Import the actual AVATAR_FIRE_AND_ASH two-disc NZB; confirm a single virtual .m2ts appears at the library path with size ≈ sum of all main-feature M2TS across both discs.
  • Scrub through it in VLC / Stremio across the disc boundary; verify seeking around the disc-1 total size lands at the start of disc 2's first clip.
  • Re-import a standard single-disc BDMV release; confirm the main feature now plays end-to-end across clips that previously got dropped.
  • Re-import a non-BDMV ISO (e.g. plain MKV inside an ISO); confirm legacy behaviour unchanged.

Out of scope

  • Cross-NZB linking when disc 1 and disc 2 ship as separate posts (current scope: both discs in one NZB).
  • DVD VIDEO_TS / IFO playlist parsing.
  • M2TS container rewriting for perfectly seamless seeking across clip joins (players typically tolerate this).

javi11 added 8 commits May 20, 2026 19:51
Long Blu-ray releases split the main feature two ways: across multiple
M2TS clips within a disc (joined by BDMV/PLAYLIST/*.mpls), and across
multiple discs in one NZB (e.g. AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2).
The importer previously kept only the single largest M2TS per ISO,
which both dropped the rest of the movie within a disc and treated each
disc as an unrelated file.

Now ExpandISOContents (shared between rar and sevenzip aggregators)
parses the main MPLS playlist on each ISO, reads the 9660 PVD volume
label, groups ISOs by stripped base name with a DISC|CD|PART suffix
regex, and emits a single Content whose NestedSources chain spans every
M2TS in disc-then-playlist order. The metadata layer's existing nested
multi-reader produces one seamless seekable virtual file. Non-BDMV
discs and unparseable playlists fall back to the legacy largest-file
behaviour so nothing regresses.

On a 3D-only Blu-ray release (e.g. AVATAR_FIRE_AND_ASH_3D), the main
feature playlist references clips that exist only as SSIF files in
BDMV/STREAM/SSIF/ — the M2TS directory holds short extras. The previous
resolver indexed only M2TS, so the long 3D playlist failed to resolve
any clips and a short extras playlist won by default, producing a ~177
MB virtual file for a movie whose NZB carries ~88 GB of source data.
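
A resolver that avoids this failure mode might look like the sketch below: try to satisfy the whole playlist from M2TS, and only if that fails try SSIF. Treating the fallback as all-or-nothing per playlist is an assumption, as are the names `indexStreams`, `lookupAll`, and `resolvePlaylist`.

```go
package main

import (
	"fmt"
	"strings"
)

// indexStreams builds clip-name -> path maps for M2TS and SSIF streams from
// an ISO file listing.
func indexStreams(files []string) (m2ts, ssif map[string]string) {
	m2ts, ssif = map[string]string{}, map[string]string{}
	for _, f := range files {
		up := strings.ToUpper(f)
		name := up[strings.LastIndex(up, "/")+1:]
		switch {
		case strings.HasPrefix(up, "BDMV/STREAM/SSIF/") && strings.HasSuffix(name, ".SSIF"):
			ssif[strings.TrimSuffix(name, ".SSIF")] = f
		case strings.HasPrefix(up, "BDMV/STREAM/") && strings.HasSuffix(name, ".M2TS"):
			m2ts[strings.TrimSuffix(name, ".M2TS")] = f
		}
	}
	return m2ts, ssif
}

// lookupAll resolves every clip against one index, or reports failure.
func lookupAll(clips []string, idx map[string]string) ([]string, bool) {
	out := make([]string, 0, len(clips))
	for _, c := range clips {
		p, ok := idx[c]
		if !ok {
			return nil, false
		}
		out = append(out, p)
	}
	return out, true
}

// resolvePlaylist prefers M2TS and falls back to SSIF only when M2TS cannot
// satisfy the whole playlist.
func resolvePlaylist(clips, files []string) ([]string, bool) {
	m2ts, ssif := indexStreams(files)
	if out, ok := lookupAll(clips, m2ts); ok {
		return out, true
	}
	return lookupAll(clips, ssif)
}

func main() {
	files := []string{"BDMV/STREAM/00001.m2ts", "BDMV/STREAM/SSIF/00022.ssif"}
	paths, ok := resolvePlaylist([]string{"00022"}, files)
	fmt.Println(paths, ok)
}
```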

Resolve clip names against M2TS first (preserves the smaller, more
compatible 2D version on hybrid 3D releases) and fall back to SSIF when
only it can satisfy the playlist. Two new test cases cover the
3D-only-with-SSIF and hybrid-prefers-M2TS paths.

A repeated 88GB-NZB run is still producing a 177MB virtual file with
clips=2 — byte-identical to the pre-SSIF-fix output. Three hypotheses
remain: stale binary, 'no actual SSIF in this BDMV' (release uses M2TS
only), or SSIF lives at a non-standard path.

Add one summary log per ISO (total files, playlist count, M2TS and SSIF
clip counts, 12 sample paths) and one log per evaluated MPLS (resolved
clip count, unresolved count, duration ticks, summed stream bytes) plus
one 'picked' line. All prefixed with [DEBUG-isobd] for cheap cleanup
and to confirm the new binary is live (the prefix won't appear in
prior builds).

Real-ISO run shows all 38 playlists with items=1, max duration 80s,
max stream bytes 141MB — yet the NZB carries ~88GB across 2 ISOs.
Either ListISOFiles is dropping huge files (UDF alloc-type 2/3 not
handled) or reading wrong sizes for them. Add to the bdmv-scan log:
- sum of every file size (across all entries)
- sum of M2TS-only and SSIF-only sizes
- the 6 largest files with human-readable sizes

One log line will distinguish 'sizes truncated', 'big files missing',
and 'release is genuinely tiny'.

Real run shows all_files_sum_bytes=1.13 GiB across 295 files, biggest
single file 135 MiB. NZB is 88 GiB across 2 ISOs. Need to know whether
src.Size (claimed ISO bytes from the outer RAR archive) matches the
sum of what ListISOFiles enumerated, or whether the walker is missing
multi-GB files. One [DEBUG-isobd] iso analyse line per ISO now prints
filename, iso_size, listed_files, listed_sum, and coverage_pct so the
discrepancy is impossible to miss.

Root cause of the 'main feature M2TS files invisible' bug. udfReadDirEntries
parsed every File Identifier Descriptor in a directory but only ever read
the FIRST 2048-byte sector of each allocation descriptor's extent — even
when the extent's ad.length claimed it spanned many sectors. A Blu-ray
BDMV/STREAM/ directory with ~2500 FIDs (~30 KiB of FID data) lost every
entry past the first sector, including the multi-GB main-feature clips
00016/00017/00022/00023/00028/00029 and the corresponding SSIF files.

Local repro against AVATAR_FIRE_AND_ASH_3D_DISC_1.iso (37 GiB):
- Before: listed_files=298  sum=1.16 GiB  coverage=3.1%   (no clip >135 MiB)
- After:  listed_files=2523 sum=74 GiB                    (00022.m2ts=17 GiB ✓)

The fix factors out readMetaExtent / readICBExtent helpers that walk every
sector of an extent until ad.length is exhausted. Both fail soft on EOF, so a
malformed image returns partial data rather than aborting the import.
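
A minimal sketch of the "walk every sector, fail soft on EOF" idea follows. It is not readMetaExtent or readICBExtent; `readExtent` and `readExtentFromBytes` are hypothetical, and partition and metadata-file address translation is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

const sectorSize = 2048

// readExtent reads up to length bytes starting at sector lba, one sector at a
// time. On any read error, including EOF, it returns whatever was read so far.
func readExtent(r io.ReaderAt, lba, length uint32) []byte {
	var out []byte // grown on demand so a bogus length cannot force a huge allocation
	buf := make([]byte, sectorSize)
	off := int64(lba) * sectorSize
	for remaining := int64(length); remaining > 0; {
		want := int64(sectorSize)
		if remaining < want {
			want = remaining
		}
		n, err := r.ReadAt(buf[:want], off)
		out = append(out, buf[:n]...)
		if err != nil {
			break // fail soft: keep the partial data
		}
		off += want
		remaining -= want
	}
	return out
}

// readExtentFromBytes is a convenience wrapper for in-memory images.
func readExtentFromBytes(img []byte, lba, length uint32) []byte {
	return readExtent(bytes.NewReader(img), lba, length)
}

func main() {
	img := make([]byte, 20*sectorSize)
	got := readExtentFromBytes(img, 4, 30*1024) // roughly the 30 KiB of FID data cited above
	fmt.Println(len(got), "bytes read; a first-sector-only reader would stop at", sectorSize)
}
```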

The pre-existing TestUDFReadDirEntriesShortADClampsExtentLength was
pinning the BUGGY behaviour (it asserted the walker would truncate to one
sector); renamed to TestUDFReadDirEntriesTruncatedExtent and now asserts
the new contract: when an extent claims more sectors than the image
contains, the walker returns whatever data it could read without an error.

Adds fs_local_test.go: an ALTMOUNT_LOCAL_ISO=<path> gated integration test
that catches this class of bug instantly against a real ISO. Skipped in CI.

Also strips the [DEBUG-isobd] / [DEBUG-walk] instrumentation added during
the investigation and tones the resolver / processor logs down to one
production-grade INFO line per ISO and per main-feature pick.

The directory-listing fix exposed a second latent bug downstream: the
walker only stored ONE allocation descriptor's LBA per file even though
huge Blu-ray clips are split across hundreds of extents (Avatar's
00022.m2ts: 945, 00023.m2ts: 945, 00028.m2ts: 294, 00016.m2ts: 238).
For every multi-extent file, downstream reads of bytes past the first
extent's length returned wrong sectors (whatever happened to live next
to extent 1 on disc) instead of the file's real data — silent
corruption ~50× the size of the visible bug.

Changes:
- isoFileEntry now carries []isoExtent instead of a single lba field.
- collectFileExtents() walks every inline AD and chases Allocation
  Extent Descriptor (UDF tag 258) chains so files with more ADs than
  fit in the FE sector are fully enumerated. Caps total extent bytes
  at info_length so a malformed FE can't yield more data than the
  file claims.
- ISOFileContent gains a Sources []ISONestedSource slice (one per
  extent) and drops the single-Segments / single-NestedSource fields.
- buildFileContent emits one ISONestedSource per extent: unencrypted
  ISOs pre-slice outer segments to cover each extent; encrypted ISOs
  keep the full outer segments and seek via InnerOffset (AES-CBC IV
  chain still anchors at byte 0 of the outer ISO).
- archive.isoFileContentToNestedSource → isoFileContentToNestedSources
  fans the slice out into one archive.NestedSource per extent.
- buildMainFeatureContent and buildLargestFileContent thread the
  multi-source path so the final concat Content carries every extent
  of every clip in disc-then-playlist order.
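
Two of the ideas in this list, capping the extent list at info_length and locating a file offset across several extents, can be sketched as below. `extent`, `capExtents`, and `locate` are hypothetical and much simpler than isoExtent and collectFileExtents.

```go
package main

import "fmt"

const sectorSize = 2048

// extent is a hypothetical stand-in for one allocation descriptor.
type extent struct {
	LBA    uint32 // first sector of the extent
	Length int64  // bytes of file data stored in the extent
}

// capExtents trims the list so its total length never exceeds infoLength.
func capExtents(in []extent, infoLength int64) []extent {
	out := make([]extent, 0, len(in))
	remaining := infoLength
	for _, e := range in {
		if remaining <= 0 {
			break
		}
		if e.Length > remaining {
			e.Length = remaining
		}
		if e.Length > 0 {
			out = append(out, e)
			remaining -= e.Length
		}
	}
	return out
}

// locate maps a file offset to an absolute byte offset in the ISO image.
func locate(extents []extent, off int64) (int64, bool) {
	if off < 0 {
		return 0, false
	}
	for _, e := range extents {
		if off < e.Length {
			return int64(e.LBA)*sectorSize + off, true
		}
		off -= e.Length
	}
	return 0, false
}

func main() {
	exts := []extent{{LBA: 100, Length: 4096}, {LBA: 500, Length: 4096}}
	abs, _ := locate(exts, 4096)
	// A single-LBA reader would compute 100*2048+4096 and read the wrong sectors.
	fmt.Println("offset 4096 lives at image byte", abs)
	fmt.Println(capExtents(exts, 6000))
}
```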

Verified against the real Avatar disc 1 ISO via fs_local_test.go:
00022.m2ts: 945 extents, sum-of-extent-lengths == 17 GiB info_length.
TestLocalISO_DiscoverBigFiles asserts >=2 extents and full coverage
for the sentinel big-clip set.

A BD3D SSIF is often described by a dozen separate UDF allocation descriptors
for what is a single contiguous run of sectors on disc. After the
multi-extent fix, each AD became its own NestedSource, bloating the proto
metadata, the validation-sample surface, and the per-file open-handle
count for what is logically one extent.

coalesceExtents merges adjacent extents whose physical sectors follow
the previous extent's last sector. Measured against the real Avatar
disc 1 ISO:
- BDMV/STREAM/SSIF/00022.ssif (22 GiB): 23 extents -> 2
- BDMV/STREAM/SSIF/00028.ssif  (7 GiB):  7 extents -> 1
- BDMV/STREAM/SSIF/00016.ssif  (6 GiB):  6 extents -> 1
M2TS files keep their full extent list because BD authoring genuinely
interleaves the M2TS clips with the SSIF dependent-view data on disc.
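
The merge rule can be sketched as follows. This is not the PR's coalesceExtents; `extent` and `coalesce` are hypothetical, and the sketch assumes an extent is mergeable only when the previous one ends exactly on a sector boundary.

```go
package main

import "fmt"

const sectorSize = 2048

// extent is a hypothetical stand-in for one allocation descriptor.
type extent struct {
	LBA    uint32 // first sector of the extent
	Length int64  // bytes of file data stored in the extent
}

// coalesce merges each extent into its predecessor when it starts on the
// sector immediately after the predecessor's last sector.
func coalesce(in []extent) []extent {
	var out []extent
	for _, e := range in {
		if n := len(out); n > 0 {
			prev := &out[n-1]
			// A partially filled last sector leaves a gap in the byte
			// stream, so only whole-sector predecessors are merged.
			if prev.Length%sectorSize == 0 &&
				int64(e.LBA) == int64(prev.LBA)+prev.Length/sectorSize {
				prev.Length += e.Length
				continue
			}
		}
		out = append(out, e)
	}
	return out
}

func main() {
	in := []extent{{100, 4096}, {102, 2048}, {200, 2048}}
	fmt.Println(coalesce(in)) // the first two are physically contiguous
}
```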

Note: the recent import failure ("not a valid ISO 9660 or UDF image"
on disc 1, segment "44c89668..." unreachable during validation) is a
Usenet-side issue — disc 2 analysed cleanly in 30 seconds with the
same code path; disc 1 timed out reading its first sectors for 9
minutes before giving up. The coalescing change reduces the surface
where transient flakes can bite but cannot eliminate it.