feat(iso): concat Blu-ray main feature across clips and discs#599
Open
javi11 wants to merge 8 commits into
Long Blu-ray releases split the main feature two ways: across multiple M2TS clips within a disc (joined by BDMV/PLAYLIST/*.mpls), and across multiple discs in one NZB (e.g. AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2). The importer previously kept only the single largest M2TS per ISO, which both dropped the rest of the movie within a disc and treated each disc as an unrelated file. Now ExpandISOContents (shared between rar and sevenzip aggregators) parses the main MPLS playlist on each ISO, reads the 9660 PVD volume label, groups ISOs by stripped base name with a DISC|CD|PART suffix regex, and emits a single Content whose NestedSources chain spans every M2TS in disc-then-playlist order. The metadata layer's existing nested multi-reader produces one seamless seekable virtual file. Non-BDMV discs and unparseable playlists fall back to the legacy largest-file behaviour so nothing regresses.
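
The disc grouping described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the regex, `splitDiscName`, and `groupDiscs` are hypothetical names, and the separator set (`space . _ -`) is an assumption about how labels are written.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// discSuffix matches a trailing DISC/CD/PART marker plus a number, e.g. "_DISC_2".
// Assumption: the real pattern in the PR may differ; this one requires at least
// one separator before the keyword so that a title ending in "...CD1" without a
// separator is not split by accident.
var discSuffix = regexp.MustCompile(`(?i)[ ._-]+(?:DISC|CD|PART)[ ._-]*(\d+)$`)

// splitDiscName returns the upper-cased base name with the suffix stripped and
// the disc number (0 when the label carries no suffix).
func splitDiscName(label string) (base string, disc int) {
	label = strings.TrimSpace(label)
	m := discSuffix.FindStringSubmatchIndex(label)
	if m == nil {
		return strings.ToUpper(label), 0
	}
	n, _ := strconv.Atoi(label[m[2]:m[3]])
	return strings.ToUpper(label[:m[0]]), n
}

// groupDiscs buckets labels by base name and orders each bucket by disc number.
func groupDiscs(labels []string) map[string][]string {
	groups := make(map[string][]string)
	for _, l := range labels {
		base, _ := splitDiscName(l)
		groups[base] = append(groups[base], l)
	}
	for _, g := range groups {
		sort.SliceStable(g, func(i, j int) bool {
			_, a := splitDiscName(g[i])
			_, b := splitDiscName(g[j])
			return a < b
		})
	}
	return groups
}

func main() {
	g := groupDiscs([]string{
		"AVATAR_FIRE_AND_ASH_DISC_2",
		"AVATAR_FIRE_AND_ASH_DISC_1",
		"OTHER_MOVIE",
	})
	fmt.Println(g["AVATAR_FIRE_AND_ASH"])
	fmt.Println(g["OTHER_MOVIE"])
}
```

Sorting inside each group gives the disc-first half of the disc-then-playlist order; the playlist order within a disc would come from the MPLS parse.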
On a 3D-only Blu-ray release (e.g. AVATAR_FIRE_AND_ASH_3D), the main feature playlist references clips that exist only as SSIF files in BDMV/STREAM/SSIF/ — the M2TS directory holds short extras. The previous resolver indexed only M2TS, so the long 3D playlist failed to resolve any clips and a short extras playlist won by default, producing a ~177 MB virtual file for a movie whose NZB carries ~88 GB of source data. Resolve clip names against M2TS first (preserves the smaller, more compatible 2D version on hybrid 3D releases) and fall back to SSIF when only it can satisfy the playlist. Two new test cases cover the 3D-only-with-SSIF and hybrid-prefers-M2TS paths.
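
A minimal sketch of the M2TS-first, SSIF-fallback lookup, under stated assumptions: `resolveClips` is a hypothetical helper, the fallback is applied per clip (the PR may decide per playlist), and paths are compared case-insensitively.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveClips maps playlist clip names (e.g. "00022") to file paths inside the
// ISO. M2TS entries win when both exist; SSIF is used only when no M2TS with
// that name is present. Clips with neither are returned as unresolved.
func resolveClips(clips, files []string) (resolved, unresolved []string) {
	m2ts := map[string]string{}
	ssif := map[string]string{}
	for _, f := range files {
		up := strings.ToUpper(f)
		ext := path.Ext(up)
		name := strings.TrimSuffix(path.Base(up), ext)
		switch {
		case path.Dir(up) == "BDMV/STREAM/SSIF" && ext == ".SSIF":
			ssif[name] = f
		case path.Dir(up) == "BDMV/STREAM" && ext == ".M2TS":
			m2ts[name] = f
		}
	}
	for _, c := range clips {
		key := strings.ToUpper(c)
		if p, ok := m2ts[key]; ok {
			resolved = append(resolved, p)
			continue
		}
		if p, ok := ssif[key]; ok {
			resolved = append(resolved, p)
			continue
		}
		unresolved = append(unresolved, c)
	}
	return resolved, unresolved
}

func main() {
	files := []string{
		"BDMV/STREAM/00001.m2ts",      // short extra
		"BDMV/STREAM/00005.m2ts",      // hybrid clip, 2D version
		"BDMV/STREAM/SSIF/00005.ssif", // hybrid clip, 3D version
		"BDMV/STREAM/SSIF/00022.ssif", // 3D-only clip
	}
	ok, missing := resolveClips([]string{"00022", "00005", "00099"}, files)
	fmt.Println(ok)
	fmt.Println(missing)
}
```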
A repeated 88GB-NZB run is still producing a 177MB virtual file with clips=2 — byte-identical to the pre-SSIF-fix output. Three hypotheses remain: stale binary, 'no actual SSIF in this BDMV' (release uses M2TS only), or SSIF lives at a non-standard path. Add one summary log per ISO (total files, playlist count, M2TS and SSIF clip counts, 12 sample paths) and one log per evaluated MPLS (resolved clip count, unresolved count, duration ticks, summed stream bytes) plus one 'picked' line. All prefixed with [DEBUG-isobd] for cheap cleanup and to confirm the new binary is live (the prefix won't appear in prior builds).
Real-ISO run shows all 38 playlists with items=1, max duration 80s, max stream bytes 141MB, yet the NZB carries ~88GB across 2 ISOs. Either ListISOFiles is dropping huge files (UDF alloc-type 2/3 not handled) or reading wrong sizes for them. Add to the bdmv-scan log:
- sum of every file size (across all entries)
- sum of M2TS-only and SSIF-only sizes
- the 6 largest files with human-readable sizes
One log line will distinguish 'sizes truncated', 'big files missing', and 'release is genuinely tiny'.
Real run shows all_files_sum_bytes=1.13 GiB across 295 files, biggest single file 135 MiB. NZB is 88 GiB across 2 ISOs. Need to know whether src.Size (claimed ISO bytes from the outer RAR archive) matches the sum of what ListISOFiles enumerated, or whether the walker is missing multi-GB files. One [DEBUG-isobd] iso analyse line per ISO now prints filename, iso_size, listed_files, listed_sum, and coverage_pct so the discrepancy is impossible to miss.
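
The coverage check amounts to a small piece of arithmetic. The sketch below is illustrative only: `listedFile` and `coveragePct` are hypothetical names, and the log line format is an assumption modelled on the fields named above.

```go
package main

import "fmt"

// listedFile is a hypothetical stand-in for one entry returned by the ISO walker.
type listedFile struct {
	Path string
	Size int64
}

// coveragePct sums the listed file sizes and reports them as a percentage of
// the ISO size claimed by the outer archive. A non-positive ISO size yields 0.
func coveragePct(isoSize int64, files []listedFile) (sum int64, pct float64) {
	for _, f := range files {
		sum += f.Size
	}
	if isoSize <= 0 {
		return sum, 0
	}
	return sum, 100 * float64(sum) / float64(isoSize)
}

func main() {
	files := []listedFile{
		{"BDMV/STREAM/00001.m2ts", 10},
		{"BDMV/STREAM/00002.m2ts", 21},
	}
	sum, pct := coveragePct(1000, files)
	fmt.Printf("iso_size=%d listed_files=%d listed_sum=%d coverage_pct=%.1f\n",
		1000, len(files), sum, pct)
}
```

A coverage far below 100% points at the walker missing files rather than at a genuinely small release.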
Root cause of the 'main feature M2TS files invisible' bug. udfReadDirEntries parsed every File Identifier Descriptor in a directory but only ever read the FIRST 2048-byte sector of each allocation descriptor's extent, even when the extent's ad.length claimed it spanned many sectors. A Blu-ray BDMV/STREAM/ directory with ~2500 FIDs (~30 KiB of FID data) lost every entry past the first sector, including the multi-GB main-feature clips 00016/00017/00022/00023/00028/00029 and the corresponding SSIF files.
Local repro against AVATAR_FIRE_AND_ASH_3D_DISC_1.iso (37 GiB):
- Before: listed_files=298 sum=1.16 GiB coverage=3.1% (no clip >135 MiB)
- After: listed_files=2523 sum=74 GiB (00022.m2ts=17 GiB ✓)
The fix factors out readMetaExtent / readICBExtent helpers that walk every sector of an extent until ad.length is exhausted. Both fail soft on EOF, so a malformed image returns partial data rather than aborting the import.
The pre-existing TestUDFReadDirEntriesShortADClampsExtentLength was pinning the BUGGY behaviour (it asserted the walker would truncate to one sector). It is renamed to TestUDFReadDirEntriesTruncatedExtent and now asserts the new contract: when an extent claims more sectors than the image contains, the walker returns whatever data it could read without an error.
Adds fs_local_test.go: an ALTMOUNT_LOCAL_ISO=<path> gated integration test that catches this class of bug instantly against a real ISO. Skipped in CI.
Also strips the [DEBUG-isobd] / [DEBUG-walk] instrumentation added during the investigation and tones the resolver / processor logs down to one production-grade INFO line per ISO and per main-feature pick.
The directory-listing fix exposed a second latent bug downstream: the walker only stored ONE allocation descriptor's LBA per file even though huge Blu-ray clips are split across hundreds of extents (Avatar's 00022.m2ts: 945, 00023.m2ts: 945, 00028.m2ts: 294, 00016.m2ts: 238). For every multi-extent file, downstream reads of bytes past the first extent's length returned wrong sectors (whatever happened to live next to extent 1 on disc) instead of the file's real data: silent corruption ~50× the size of the visible bug.
Changes:
- isoFileEntry now carries []isoExtent instead of a single lba field.
- collectFileExtents() walks every inline AD and chases Allocation Extent Descriptor (UDF tag 258) chains so files with more ADs than fit in the FE sector are fully enumerated. Caps total extent bytes at info_length so a malformed FE can't yield more data than the file claims.
- ISOFileContent gains a Sources []ISONestedSource slice (one per extent) and drops the single-Segments / single-NestedSource fields.
- buildFileContent emits one ISONestedSource per extent: unencrypted ISOs pre-slice outer segments to cover each extent; encrypted ISOs keep the full outer segments and seek via InnerOffset (AES-CBC IV chain still anchors at byte 0 of the outer ISO).
- archive.isoFileContentToNestedSource → isoFileContentToNestedSources fans the slice out into one archive.NestedSource per extent.
- buildMainFeatureContent and buildLargestFileContent thread the multi-source path so the final concat Content carries every extent of every clip in disc-then-playlist order.
Verified against the real Avatar disc 1 ISO via fs_local_test.go: 00022.m2ts: 945 extents, sum-of-extent-lengths == 17 GiB info_length. TestLocalISO_DiscoverBigFiles asserts >=2 extents and full coverage for the sentinel big-clip set.
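
To illustrate why a single LBA per file is not enough, here is a rough sketch of capping an extent list at info_length and mapping a logical file offset to an absolute ISO offset. The `extent` type, `capExtents`, and `locate` are hypothetical and stand in for whatever isoExtent and its consumers actually look like.

```go
package main

import "fmt"

const sectorSize = 2048

// extent is a hypothetical stand-in for one allocation descriptor:
// a starting sector and a byte length.
type extent struct {
	LBA    int64
	Length int64
}

// capExtents trims the list so the summed lengths never exceed infoLength,
// mirroring the "malformed FE can't yield more data than the file claims" rule.
func capExtents(exts []extent, infoLength int64) []extent {
	var out []extent
	remaining := infoLength
	for _, e := range exts {
		if remaining <= 0 {
			break
		}
		if e.Length > remaining {
			e.Length = remaining
		}
		out = append(out, e)
		remaining -= e.Length
	}
	return out
}

// locate maps a logical file offset to an absolute byte offset in the ISO and
// reports how many bytes are left in that extent. ok is false past the end.
func locate(exts []extent, off int64) (isoOff, left int64, ok bool) {
	if off < 0 {
		return 0, 0, false
	}
	for _, e := range exts {
		if off < e.Length {
			return e.LBA*sectorSize + off, e.Length - off, true
		}
		off -= e.Length
	}
	return 0, 0, false
}

func main() {
	exts := capExtents([]extent{{100, 4096}, {500, 4096}, {900, 4096}}, 10000)
	fmt.Println(exts)
	fmt.Println(locate(exts, 5000))
}
```

With only the first extent's LBA stored, offset 5000 would have been read from sector 100 onwards instead of from the second extent at sector 500.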
A BD3D SSIF often emits a dozen separate UDF allocation descriptors for
what's a single contiguous run of sectors on disc. After the multi-
extent fix, each AD became its own NestedSource — bloating the proto
metadata, the validation-sample surface, and the per-file open-handle
count for what is logically one extent.
coalesceExtents merges adjacent extents whose physical sectors follow
the previous extent's last sector. Measured against the real Avatar
disc 1 ISO:
- BDMV/STREAM/SSIF/00022.ssif (22 GiB): 23 extents -> 2
- BDMV/STREAM/SSIF/00028.ssif (7 GiB): 7 extents -> 1
- BDMV/STREAM/SSIF/00016.ssif (6 GiB): 6 extents -> 1
M2TS files keep their full extent list because BD authoring genuinely
interleaves the M2TS clips with the SSIF dependent-view data on disc.
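
A minimal sketch of the merge rule, assuming 2048-byte sectors: two extents are merged when the earlier one ends on a sector boundary and the later one starts at the very next sector. The `extent` type and this `coalesce` function are illustrative and are not the PR's coalesceExtents.

```go
package main

import "fmt"

const sectorSize = 2048

// extent is a hypothetical (start sector, byte length) pair.
type extent struct {
	LBA    int64
	Length int64
}

// coalesce merges each extent into its predecessor when it starts at the
// sector immediately after the predecessor's last sector. Zero-length
// extents are dropped. The input slice is not modified.
func coalesce(exts []extent) []extent {
	var out []extent
	for _, e := range exts {
		if e.Length == 0 {
			continue
		}
		if n := len(out); n > 0 {
			prev := &out[n-1]
			sectorAligned := prev.Length%sectorSize == 0
			if sectorAligned && prev.LBA+prev.Length/sectorSize == e.LBA {
				prev.Length += e.Length
				continue
			}
		}
		out = append(out, e)
	}
	return out
}

func main() {
	in := []extent{
		{100, 4096}, // sectors 100-101
		{102, 2048}, // contiguous with the previous extent
		{200, 2048}, // gap, starts a new run
		{201, 100},  // contiguous tail
	}
	fmt.Println(coalesce(in))
}
```

Interleaved M2TS extents never satisfy the adjacency test, which is consistent with those files keeping their full extent list.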
Note: the recent import failure ("not a valid ISO 9660 or UDF image"
on disc 1, segment "44c89668..." unreachable during validation) is a
Usenet-side issue — disc 2 analysed cleanly in 30 seconds with the
same code path; disc 1 timed out reading its first sectors for 9
minutes before giving up. The coalescing change reduces the surface
where transient flakes can bite but cannot eliminate it.
Summary
- Strips DISC|CD|PART_<n> suffixes from the ISO 9660 volume label (with a filename fallback), grouping discs that arrive in the same NZB.
- Emits one Content per group whose NestedSources chain spans every M2TS in disc-then-playlist order, so the player sees a single seekable virtual .m2ts end-to-end via WebDAV / FUSE / Stremio.
- Moved the shared logic into archive.ExpandISOContents and removed the duplicated expandISOContents from rar and sevenzip.
Why
Long Blu-ray releases (the trigger here was AVATAR_FIRE_AND_ASH_DISC_1 / _DISC_2) split the main feature along two axes: across clips within a disc and across discs. The old picker dropped clips 2..N from each disc and treated each disc as an unrelated movie. The metadata layer's MetadataVirtualFile.createNestedReader already concatenates NestedSource chains with mixed encrypted/unencrypted members; we only needed to teach the importer to produce that ordered list.
Files
New:
- internal/importer/archive/iso/mpls.go + _test.go: minimal BDA-spec MPLS parser (clip names, IN/OUT ticks, multi-angle PlayItems skipped via length prefix).
- internal/importer/archive/iso/volume.go + _test.go: reads the 9660 PVD volume label from sector 16 (hybrid BD ISOs always carry one).
- internal/importer/archive/iso/bluray.go + _test.go: locates BDMV/PLAYLIST/*.mpls, picks the longest playlist, resolves clip names to ordered BDMV/STREAM/*.M2TS entries.
- internal/importer/archive/iso_expansion.go + _test.go: shared ExpandISOContents, disc-group regex, main-feature concat assembly.
Modified:
- internal/importer/archive/iso/types.go: new AnalyzedISO struct (VolumeLabel, Files, MainFeature, DurationTicks).
- internal/importer/archive/iso/processor.go: AnalyzeISO replaces AnalyzeISOContent; encrypted/unencrypted file build paths factored.
- internal/importer/archive/rar/aggregator.go and sevenzip/aggregator.go: call archive.ExpandISOContents; deleted the duplicated local implementations.
Behaviour
Tests
- go test -race ./... passes across the whole repo.
- go tool golangci-lint run ./internal/importer/archive/... clean.
Test plan
- Import the AVATAR_FIRE_AND_ASH two-disc NZB; confirm a single virtual .m2ts appears at the library path with size ≈ sum of all main-feature M2TS across both discs.
Out of scope