[BUG FIX] Fix contact point pruning on Apple Metal.#2831
Conversation
087a4cb to
1957db6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 087a4cbadd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…is-Embodied-AI#2831 Rebase of my cooperative-warp dedup work onto Alexis's Genesis-Embodied-AI#2831 (fix_contact_pruning_batched) which fixes the Quadrants Metal codegen bug that was blocking dedup on macOS. PR Genesis-Embodied-AI#2831's contact.py changes are incorporated verbatim into my cooperative kernel: 1. Memory-fence scratch write between the lower-hull and upper-hull passes of Andrew's monotone chain (contact_hull_stack[max_contact_pairs - 1, i_b] = 0). Without this, on Metal _B >= 2 the upper-hull reads of contact_hull_stack don't observe the lower-hull writes and every candidate is kept (hull size == bucket size, no pruning). Standalone repro: perso_hugh/prot/qd_metal_hull_chain_visibility_repro.py 2. New tunable prune_hull_collinear_tol (1e-3 default) separated from the depth coplanarity tol. Lets users tune the hull-popper cross- product slop independently of the depth gate. 3. Deep-penetration restore threshold changed from average to MAX of hull penetrations. A non-hull contact whose penetration substantially exceeds the deepest hull vertex represents a distinct physical support (deep middle of a long body) that the support-polygon argument doesn't authorize dropping. Side cleanups now possible because Genesis-Embodied-AI#2831 fixes the Metal bug: * Removed the gs.backend not in (gs.cuda, gs.amdgpu) guard from collider.py. Dedup is now enabled on Metal too; only CPU is still excluded (the cooperative kernel adds overhead without parallel speedup on a serial backend). * Removed the @pytest.mark.skipif(darwin) from test_contact_pruning; the test now runs on every GPU backend. The cooperative-warp structure (32 threads/env, qd.simt.subgroup reductions for the per-bucket mean-normal / centroid, coop projection and mark-drop, fused phase 3 compact+spatial-sort) is preserved from the previous branch. Note: this branch is currently based on duburcqa:fix_contact_pruning_batched (Genesis-Embodied-AI#2831). Once Genesis-Embodied-AI#2831 merges, the branch base should be moved back to origin/main and the Genesis-Embodied-AI#2831 commit will simply be absorbed. Backup of the pre-rebase tip: hp/dedup-alexis-perf-9d-stage3-r2-pre-2831-rebase-backup.
7cc824d to
c9a77ca
Compare
|
Hi @duburcqa — heads-up that this PR appears to introduce a regression in I verified this by running the failing tests on The The failing scenes are single-link single-geom (a primitive capsule on either a primitive box or a convex mesh box), so the link-pair pruning kernel isn't even invoked for them ( Same failure shows up identically in deskai6 PR #2830 (rebased on top of this PR), so my downstream cooperative-kernel changes aren't the cause. Let me know if you'd like the full failure logs or a more targeted reproduction. |
7401b13 to
dc0c216
Compare
dc0c216 to
4ca455e
Compare
…backends + n_envs Adopt Alexis's fused func_clamp_prune_and_sort_contacts kernel (single per-env serial pass that does clamp + optional pruning + optional spatial sort, gated at compile time via qd.static). This supersedes the deskai6 cooperative warp-per-env split into func_prune_contacts + func_clamp_and_sort_contacts: the fusion removes one kernel launch + one full 11-field permute pass per step, which on net matches or beats the cooperative-warp approach without needing any backend-specific gating. Remove the deskai6 gates that previously disabled dedup on CPU and at n_envs > 8192. Both were tied to the cooperative-kernel approach (warp-cooperative loops serialise on CPU; n_envs > 8192 oversaturates the GPU grid). The fused per-env serial kernel has no such overhead, so dedup now runs unconditionally whenever the scene is eligible (link_pair_pruning_supported and not autodiff). Also remove _effective_pruning_tolerance / _DEDUP_DEFAULT_TOLERANCE / _DEDUP_MAX_N_ENVS: the kernel now reads contact_pruning_tolerance directly from the user option (upstream default 0.1 post Genesis-Embodied-AI#2831), matching the fused kernel's expectation. Remove the @pytest.mark.parametrize backend=gs.gpu restriction on test_contact_pruning so it runs on CPU + GPU per matrix.
No description provided.