feat(bioemu): Add FKC steering denoiser & refactor steering code#203
feat(bioemu): Add FKC steering denoiser & refactor steering code#203
Conversation
Split steering.py into a steering/ package with modular components: - steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential - steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash) - steering/utils.py: Resampling, reward computation, sequence alignment helpers - steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser - steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser Key changes: - Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered) - Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params - Simplified sample.py: removed steering_config param, denoiser handles everything - Added start/end time window for steering resampling - Simplified loop returns to (batch, log_weights) 2-tuple Tests: - 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline - Chignolin e2e tests (require model weights) Closes: https://github.com/msr-ai4science/feynman/issues/20268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split steering.py into a steering/ package with modular components: - steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential - steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash) - steering/utils.py: Resampling, reward computation, sequence alignment helpers - steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser - steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser Key changes: - Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered) - Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params - Simplified sample.py: removed steering_config param, denoiser handles everything - Added start/end time window for steering resampling - Simplified loop returns to (batch, log_weights) 2-tuple Tests: - 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline - Chignolin e2e tests (require model weights) Closes: https://github.com/msr-ai4science/feynman/issues/20268 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| return self.compute_batch(all_positions * 10.0, sequence) | ||
|
|
||
|
|
||
| class FractionNativeContacts(CollectiveVariable): |
There was a problem hiding this comment.
This one has duplication with the train/foldedness.py, we might want to decide which one to keep
There was a problem hiding this comment.
Do we do something explicitly with bioemu/training/foldedness.py:foldedness? Otherwise can simply use that one and wrap it here with a CollectiveVariable.
- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc - Fix SMC loop bug: log_weights was overwritten with None outside steering window - Rewrite README steering section: document both SMC/FKC algorithms, update CLI examples to use denoiser_config (removed old steering_config param), fix parameter descriptions to match current interface Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py
SummarySummary
Coveragesrc.bioemu - 89.1%
src.bioemu.colabfold_setup -
src.bioemu.hpacker_setup - 58.8%
src.bioemu.openfold.np - 44%
src.bioemu.openfold.utils - 50.1%
src.bioemu.steering - 79.3%
src.bioemu.training - 100%
|
There was a problem hiding this comment.
Pull request overview
Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.
Changes:
- Introduces
bioemu.steeringsubpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc,dpm_smc). - Extracts/shared DPM-Solver helper primitives in
denoiser.pyand simplifies sampling API to steer via a singledenoiser_config. - Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).
Reviewed changes
Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_steering.py | Removes legacy steering e2e test module. |
| tests/steering/init.py | Adds steering test package marker. |
| tests/steering/test_utils.py | Adds unit tests for resampling + helper utilities. |
| tests/steering/test_potentials.py | Adds unit tests for UmbrellaPotential / LinearPotential behavior. |
| tests/steering/test_integration.py | Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring. |
| tests/steering/test_denoisers.py | Adds tests for ESS computation and SO(3) gradient mapping helper. |
| tests/steering/test_collective_variables.py | Adds unit tests for CV implementations. |
| tests/steering/test_chignolin_e2e.py | Adds e2e tests that invoke sample() (requires model weights). |
| src/bioemu/steering/utils.py | New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation. |
| src/bioemu/steering/potentials.py | New potential framework + UmbrellaPotential / LinearPotential. |
| src/bioemu/steering/dpm_smc.py | Adds SMC steered denoiser built on DPM-Solver++ utilities. |
| src/bioemu/steering/dpm_fkc.py | Adds FKC steered denoiser + analytical weight updates + ESS resampling. |
| src/bioemu/steering/collective_variables.py | Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash). |
| src/bioemu/steering/init.py | Exposes steering public API (CVs, potentials, utilities). |
| src/bioemu/steering.py | Removes legacy monolithic steering module. |
| src/bioemu/sample.py | Simplifies sampling: steering now handled by the instantiated denoiser config. |
| src/bioemu/denoiser.py | Refactors DPM-Solver primitives into reusable helper dataclasses/functions. |
| src/bioemu/config/steering/physical_steering.yaml | Converts physical steering into a self-contained Hydra denoiser config. |
| src/bioemu/config/steering/cv_steer.yaml | Adds example self-contained FKC steering config. |
| README.md | Updates steering documentation and usage to the new single-config model. |
| .gitignore | Ignores tests/cross_repo/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
SummarySummary
Coveragesrc.bioemu - 89.1%
src.bioemu.colabfold_setup -
src.bioemu.hpacker_setup - 58.8%
src.bioemu.openfold.np - 44%
src.bioemu.openfold.utils - 50.1%
src.bioemu.steering - 79.4%
src.bioemu.training - 100%
|
|
Good afternoon! I see that this pull request has been open for quite a while. Is there anything I can help with to move it forward and get it closed faster? Especially with preparing the notebooks, tests, and related tasks. |
|
Hi @vkuzniak, we appreciate the keen eye and your interest in the feature. |
|
Hi @ludwigwinkler, I am glad to hear that and will be waiting. Interesting to see this list. |
ludwigwinkler
left a comment
There was a problem hiding this comment.
Thanks for the refactoring and the work you put in!
| return self.compute_batch(all_positions * 10.0, sequence) | ||
|
|
||
|
|
||
| class FractionNativeContacts(CollectiveVariable): |
There was a problem hiding this comment.
Do we do something explicitly with bioemu/training/foldedness.py:foldedness? Otherwise can simply use that one and wrap it here with a CollectiveVariable.
| RtG = R.transpose(-2, -1) @ dJ_dR # (...,3,3) | ||
| A = 0.5 * (RtG - RtG.transpose(-2, -1)) # skew(...) in so(3) | ||
| return 2.0 * skew_matrix_to_vector(A) # (...,3) vee-map | ||
|
|
There was a problem hiding this comment.
where did you find that equation?
Add a bimodal Gaussian mixture model with an analytical score function. Implements a toy steering example in `notebooks/fkc_steering.py`. Adds two numerical tests for for `dpm_fkc` and `dpm_smc` that simulates a large ensemble of samples to numerically match the analytically tractable (in 1D) biased target distribution. --------- Co-authored-by: ludwigwinkler <luwinkler@microsoft.com>
…conversion layer - CVs and potentials now receive Cα positions in nm directly - Remove 10x multiplication at call site and /10 divisions in CVs - Rename Ca_pos/ca_pos to ca_pos_nm for explicit units - Rescale config constants to nm (target, flatbottom, slope, min_dist) - Remove log_physicality helper (unused) - Update all tests to use nm-scale values Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Harden stratified_resample: normalize CDF and clamp indices to avoid out-of-range from FP error. - Fix resample_based_on_log_weights return annotation to match actual (ChemGraph, Tensor, Tensor, Tensor). - Update compute_reward_and_grad docstring to current call signature (coords in nm, no 10* scaling, no N= kwarg). - Switch lazy relative get_score import to explicit bioemu.denoiser import. - Expand reward_grad_rotmat_to_rotvec derivation with self-contained explanation of the 2*vee factor from the skew-Frobenius identity. - Make validate_steering_config strict: require start/end/num_particles/ ess_threshold; drop .get() defaults in dpm_fkc/dpm_smc; propagate to tests and notebook. - Extract Kabsch alignment from RMSD.compute_batch into kabsch_align() in steering/utils.py and export it. - Deduplicate CONTACT_BETA/DELTA/LAMBDA in FractionNativeContacts by importing from training/foldedness. - Convert UmbrellaPotential/LinearPotential.loss_fn from staticmethod to instance methods using self.*; simplify energy_from_cv and tests. - Flip dpm_solver_sde_smc_step default use_x0_for_reward to True (SMC is defined at t=0; False remains available for debug/toy use) and update internal callers. - Accept os.PathLike for sample()'s denoiser_config. - cv_steer.yaml: reference_pdb: ??? (Hydra mandatory sentinel). - README: correct FKC description to mention ESS-based resampling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- sample.py: narrow denoiser_config type check to str | Path (drops os import) - steering/utils.py: warn when batch is smaller than num_particles in resample_based_on_log_weights (silent fallback kept, now audible) - steering/collective_variables.py: dedup FractionNativeContacts contact scoring by reusing bioemu.training.foldedness._compute_contact_score - steering/collective_variables.py: fix openfold import path after main moved residue_constants (..openfold -> openfold via _vendor) - tests/steering/test_integration.py: adapt generate_batch tests to the new colabfold_inline mocking pattern (mock_run_colabfold and run_colabfold were removed during the ColabFold inlining on main) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The GMMScoreWrapper called gmm.score(x, t) without propagating the autograd
graph. When use_x0_for_reward=True, the FKC code computes
x0 = (x_t + sigma^2 * score(x_t, t)) / alpha_t
and differentiates the reward w.r.t. x_t. Without create_graph=True on the
score, autograd treats score as a constant, so d(x0)/d(x_t) drops the
Hessian-of-log-p_t term and the steered distribution ends up biased toward
the bias center (MAE against the analytical target ~0.032 in the toy).
Fix: pass create_graph=batch.pos.requires_grad to gmm.score, matching the
pattern used in bioemu2/enhanced_sampling_paper/scripts/gmm. Also replaced
an in-place assignment into zeros with torch.cat for cleaner autograd flow,
removed the now-unnecessary dummy nn.Parameter, and flipped the notebook's
default to use_x0_for_reward=True. MAE drops to ~0.008 with either flag.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Put steering functionalities into a steering/ package with modular components:
Other key changes:
steeringdir.dpm.yamlfor example.resampling_frequencytoess_thresholdTests:
test_steering.pyTODOs:
FractionNativeConcactCV class versustrain/foldedness.pyCloses: https://github.com/msr-ai4science/feynman/issues/20268