feat(bioemu): Add FKC steering denoiser & refactor steering code by YuuuXie · Pull Request #203 · microsoft/bioemu

YuuuXie · 2026-02-27T20:16:29Z

Put steering functionalities into a steering/ package with modular components:

steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
steering/utils.py: Resampling, reward computation, sequence alignment helpers
steering/dpm_fkc.py: Added FKC (Feynman-Kac Control) steered denoiser
steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser
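As a rough illustration of how a collective variable and a potential from this list might fit together, here is a minimal NumPy sketch. The class names `ToyCaCaDistance` and `ToyUmbrella`, the flat-bottom quadratic form of the energy, and the parameter names are assumptions made for illustration only; they are not the classes in steering/collective_variables.py or steering/potentials.py.

```python
import numpy as np


class ToyCaCaDistance:
    """Hypothetical CV: distance (nm) between the C-alpha atoms of residues i and j."""

    def __init__(self, i: int, j: int) -> None:
        self.i, self.j = i, j

    def __call__(self, ca_pos_nm: np.ndarray) -> np.ndarray:
        # ca_pos_nm has shape (batch, n_residues, 3); the result has shape (batch,).
        diff = ca_pos_nm[:, self.i] - ca_pos_nm[:, self.j]
        return np.linalg.norm(diff, axis=-1)


class ToyUmbrella:
    """Hypothetical flat-bottom quadratic restraint on a CV value (assumed form)."""

    def __init__(self, cv, target: float, flatbottom: float = 0.0, slope: float = 1.0) -> None:
        self.cv = cv
        self.target = target
        self.flatbottom = flatbottom
        self.slope = slope

    def energy(self, ca_pos_nm: np.ndarray) -> np.ndarray:
        # Zero while |cv - target| <= flatbottom, quadratic outside that window.
        deviation = np.abs(self.cv(ca_pos_nm) - self.target)
        excess = np.maximum(deviation - self.flatbottom, 0.0)
        return self.slope * excess**2


# Two 2-residue toy "structures": C-alpha atoms 1.0 nm and 0.55 nm apart.
ca = np.array(
    [
        [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.55, 0.0, 0.0]],
    ]
)
umbrella = ToyUmbrella(ToyCaCaDistance(0, 1), target=0.5, flatbottom=0.1, slope=2.0)
energies = umbrella.energy(ca)
```

The first structure lies outside the flat-bottom window and is penalized; the second lies inside it and gets zero energy. A steered denoiser would typically use the negative energy as a log-reward, which is again an assumption about the design rather than a description of the PR's code.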

Other key changes:

Extracted DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
Code outside the steering/ folder now only provides unsteered sampling; all steering features are isolated in the steering/ directory.
Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params; they can be used in place of dpm.yaml, for example.
Simplified sample.py: removed steering_config param, denoiser handles everything
Kept the start/end time window for steering resampling, but replaced resampling_frequency with ess_threshold
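To make the switch from a fixed resampling frequency to an ESS threshold concrete, here is a minimal NumPy sketch of ESS-triggered stratified resampling. The function names, the convention that ESS is normalized to (0, 1], and the reset of log-weights to zero after resampling are assumptions based on standard SMC practice, not a copy of steering/utils.py.

```python
import numpy as np


def effective_sample_size(log_w: np.ndarray) -> float:
    """Normalized ESS in (0, 1]: (sum w)^2 / (N * sum w^2)."""
    w = np.exp(log_w - np.max(log_w))  # subtract the max for numerical stability
    return float(w.sum() ** 2 / (len(w) * np.sum(w**2)))


def stratified_resample(log_w: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Draw N ancestor indices with one uniform draw per stratum [i/N, (i+1)/N)."""
    n = len(log_w)
    w = np.exp(log_w - np.max(log_w))
    cdf = np.cumsum(w)
    cdf /= cdf[-1]  # normalize so the last entry is exactly 1
    u = (np.arange(n) + rng.uniform(size=n)) / n
    idx = np.searchsorted(cdf, u)
    return np.minimum(idx, n - 1)  # clamp against floating-point overshoot


def maybe_resample(log_w: np.ndarray, ess_threshold: float, rng: np.random.Generator):
    """Resample only when the normalized ESS drops below the threshold."""
    n = len(log_w)
    if effective_sample_size(log_w) >= ess_threshold:
        return np.arange(n), log_w  # keep particles and weights unchanged
    idx = stratified_resample(log_w, rng)
    return idx, np.zeros(n)  # assumed convention: uniform weights after resampling


rng = np.random.default_rng(0)
degenerate = np.array([0.0, -1000.0, -1000.0, -1000.0])
idx, new_log_w = maybe_resample(degenerate, ess_threshold=0.5, rng=rng)
```

With a threshold, particles are resampled only when the weights have actually degenerated, which avoids injecting resampling noise while the ensemble is still well balanced.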

Tests:

60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline
Chignolin e2e tests (require model weights), kept from the original test_steering.py

TODOs:

@ludwigwinkler will add a GMM example (as a notebook) and unit tests for FKC and SMC, tested on several different umbrellas and with a larger number of samples and steps to ensure convergence. This will be a separate PR targeting this one
@YuuuXie will add a notebook example showing a protein
Some of the tests need to be cleaned up
Decide what to do with the FractionNativeContacts CV class versus training/foldedness.py

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Split steering.py into a steering/ package with modular components:
- steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
- steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
- steering/utils.py: Resampling, reward computation, sequence alignment helpers
- steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser
- steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Key changes:
- Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
- Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params
- Simplified sample.py: removed steering_config param, denoiser handles everything
- Added start/end time window for steering resampling
- Simplified loop returns to (batch, log_weights) 2-tuple

Tests:
- 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline
- Chignolin e2e tests (require model weights)

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

YuuuXie · 2026-02-27T20:19:11Z

+        return self.compute_batch(all_positions * 10.0, sequence)
+
+
+class FractionNativeContacts(CollectiveVariable):


This duplicates training/foldedness.py; we should decide which one to keep.

Do we do anything specific with bioemu/training/foldedness.py:foldedness? Otherwise we can simply use that one and wrap it here in a CollectiveVariable.

- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc
- Fix SMC loop bug: log_weights was overwritten with None outside steering window
- Rewrite README steering section: document both SMC/FKC algorithms, update CLI examples to use denoiser_config (removed old steering_config param), fix parameter descriptions to match current interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py

github-actions · 2026-02-27T20:41:01Z

Summary


Generated on:	02/27/2026 - 20:40:59
Parser:	Cobertura
Assemblies:	7
Classes:	31
Files:	31
Line coverage:	75.1% (2437 of 3241)
Covered lines:	2437
Uncovered lines:	804
Coverable lines:	3241
Total lines:	10699
Covered branches:	0
Total branches:	0

Coverage

src.bioemu - 89.1%

Name	Line	Branch
src.bioemu	89.1%	****
__init__.py	100%
chemgraph.py	100%
convert_chemgraph.py	97.6%
denoiser.py	98.3%
get_embeds.py	90.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	92.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	90.3%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_setup -

Name	Line	Branch
src.bioemu.colabfold_setup	****	****
__init__.py

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%	****
__init__.py
setup_hpacker.py	58.8%

src.bioemu.openfold.np - 44%

Name	Line	Branch
src.bioemu.openfold.np	44%	****
protein.py	31.2%
residue_constants.py	60.7%

src.bioemu.openfold.utils - 50.1%

Name	Line	Branch
src.bioemu.openfold.utils	50.1%	****
rigid_utils.py	50.1%

src.bioemu.steering - 79.3%

Name	Line	Branch
src.bioemu.steering	79.3%	****
__init__.py	100%
collective_variables.py	32.8%
dpm_fkc.py	100%
dpm_smc.py	100%
potentials.py	95.5%
utils.py	92.1%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%	****
foldedness.py	100%
loss.py	100%

Copilot

Pull request overview

Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.

Changes:

Introduces bioemu.steering subpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc, dpm_smc).
Extracts shared DPM-Solver helper primitives in denoiser.py and simplifies the sampling API to steer via a single denoiser_config.
Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.


File	Description
tests/test_steering.py	Removes legacy steering e2e test module.
tests/steering/__init__.py	Adds steering test package marker.
tests/steering/test_utils.py	Adds unit tests for resampling + helper utilities.
tests/steering/test_potentials.py	Adds unit tests for `UmbrellaPotential` / `LinearPotential` behavior.
tests/steering/test_integration.py	Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring.
tests/steering/test_denoisers.py	Adds tests for ESS computation and SO(3) gradient mapping helper.
tests/steering/test_collective_variables.py	Adds unit tests for CV implementations.
tests/steering/test_chignolin_e2e.py	Adds e2e tests that invoke `sample()` (requires model weights).
src/bioemu/steering/utils.py	New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation.
src/bioemu/steering/potentials.py	New potential framework + `UmbrellaPotential` / `LinearPotential`.
src/bioemu/steering/dpm_smc.py	Adds SMC steered denoiser built on DPM-Solver++ utilities.
src/bioemu/steering/dpm_fkc.py	Adds FKC steered denoiser + analytical weight updates + ESS resampling.
src/bioemu/steering/collective_variables.py	Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash).
src/bioemu/steering/__init__.py	Exposes steering public API (CVs, potentials, utilities).
src/bioemu/steering.py	Removes legacy monolithic steering module.
src/bioemu/sample.py	Simplifies sampling: steering now handled by the instantiated denoiser config.
src/bioemu/denoiser.py	Refactors DPM-Solver primitives into reusable helper dataclasses/functions.
src/bioemu/config/steering/physical_steering.yaml	Converts physical steering into a self-contained Hydra denoiser config.
src/bioemu/config/steering/cv_steer.yaml	Adds example self-contained FKC steering config.
README.md	Updates steering documentation and usage to the new single-config model.
.gitignore	Ignores `tests/cross_repo/`.


github-actions · 2026-02-27T21:02:24Z

Summary


Generated on:	02/27/2026 - 21:02:23
Parser:	Cobertura
Assemblies:	7
Classes:	31
Files:	31
Line coverage:	75.2% (2439 of 3243)
Covered lines:	2439
Uncovered lines:	804
Coverable lines:	3243
Total lines:	10701
Covered branches:	0
Total branches:	0

Coverage

src.bioemu - 89.1%

Name	Line	Branch
src.bioemu	89.1%	****
__init__.py	100%
chemgraph.py	100%
convert_chemgraph.py	97.6%
denoiser.py	98.3%
get_embeds.py	90.3%
md_utils.py	85.8%
model_utils.py	78%
models.py	94.1%
run_hpacker.py	0%
sample.py	92.3%
sde_lib.py	86.6%
seq_io.py	100%
shortcuts.py	100%
sidechain_relax.py	77.2%
so3_sde.py	90.3%
structure_module.py	84.3%
utils.py	65.6%

src.bioemu.colabfold_setup -

Name	Line	Branch
src.bioemu.colabfold_setup	****	****
__init__.py

src.bioemu.hpacker_setup - 58.8%

Name	Line	Branch
src.bioemu.hpacker_setup	58.8%	****
__init__.py
setup_hpacker.py	58.8%

src.bioemu.openfold.np - 44%

Name	Line	Branch
src.bioemu.openfold.np	44%	****
protein.py	31.2%
residue_constants.py	60.7%

src.bioemu.openfold.utils - 50.1%

Name	Line	Branch
src.bioemu.openfold.utils	50.1%	****
rigid_utils.py	50.1%

src.bioemu.steering - 79.4%

Name	Line	Branch
src.bioemu.steering	79.4%	****
__init__.py	100%
collective_variables.py	32.8%
dpm_fkc.py	100%
dpm_smc.py	100%
potentials.py	95.5%
utils.py	92.1%

src.bioemu.training - 100%

Name	Line	Branch
src.bioemu.training	100%	****
foldedness.py	100%
loss.py	100%

vkuzniak · 2026-03-04T18:18:30Z

Good afternoon!

I see that this pull request has been open for quite a while. Is there anything I can help with to move it forward and get it closed faster? Especially with preparing the notebooks, tests, and related tasks.

ludwigwinkler · 2026-03-04T19:01:05Z

Hi @vkuzniak,

we appreciate the keen eye and your interest in the feature.
We had some more discussions about the setup of the code and what other features we want to extract from our research code base.
We will keep you posted on the progress.

vkuzniak · 2026-03-04T19:20:04Z

Hi @ludwigwinkler,

I am glad to hear that and will be waiting. Interesting to see this list.

ludwigwinkler

Thanks for the refactoring and the work you put in!

ludwigwinkler · 2026-03-05T16:59:40Z

+        return self.compute_batch(all_positions * 10.0, sequence)
+
+
+class FractionNativeContacts(CollectiveVariable):


Do we do anything specific with bioemu/training/foldedness.py:foldedness? Otherwise we can simply use that one and wrap it here in a CollectiveVariable.

ludwigwinkler · 2026-03-05T17:29:54Z

+    RtG = R.transpose(-2, -1) @ dJ_dR  # (...,3,3)
+    A = 0.5 * (RtG - RtG.transpose(-2, -1))  # skew(...) in so(3)
+    return 2.0 * skew_matrix_to_vector(A)  # (...,3) vee-map
+


where did you find that equation?
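One way to sanity-check the formula in the snippet above is to derive it and compare against finite differences. For a right perturbation R -> R exp(hat(w)), the first-order change of J is <dJ/dR, R hat(w)>_F = <R^T dJ/dR, hat(w)>_F. Only the skew part A of R^T dJ/dR contributes, and for skew matrices <hat(a), hat(b)>_F = 2 a·b, which gives the gradient 2 vee(A). The NumPy sketch below checks this numerically; the `hat`/`vee` ordering, the right-perturbation convention, and the toy objective J(R) = <M, R>_F are assumptions, and the helper names are not the ones used in the PR.

```python
import numpy as np


def hat(w: np.ndarray) -> np.ndarray:
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array(
        [
            [0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0],
        ]
    )


def vee(a: np.ndarray) -> np.ndarray:
    """Inverse of hat: read the 3-vector out of a skew-symmetric matrix."""
    return np.array([a[2, 1], a[0, 2], a[1, 0]])


def rodrigues(w: np.ndarray) -> np.ndarray:
    """Rotation matrix exp(hat(w)) via Rodrigues' formula."""
    theta = np.linalg.norm(w)
    k = hat(w)
    if theta < 1e-12:
        return np.eye(3) + k  # first-order expansion near the identity
    return np.eye(3) + np.sin(theta) / theta * k + (1.0 - np.cos(theta)) / theta**2 * (k @ k)


def rotmat_grad_to_rotvec(r: np.ndarray, dj_dr: np.ndarray) -> np.ndarray:
    """Same three steps as the reviewed snippet, for a single 3x3 matrix."""
    rtg = r.T @ dj_dr
    a = 0.5 * (rtg - rtg.T)  # skew part of R^T G
    return 2.0 * vee(a)


# Toy objective J(R) = <M, R>_F, whose Euclidean gradient dJ/dR is M.
rng = np.random.default_rng(0)
m = rng.normal(size=(3, 3))
r = rodrigues(rng.normal(size=3))
analytic = rotmat_grad_to_rotvec(r, m)

# Central finite differences along each tangent direction e_i.
eps = 1e-6
numeric = np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    j_plus = np.sum(m * (r @ rodrigues(e)))
    j_minus = np.sum(m * (r @ rodrigues(-e)))
    numeric[i] = (j_plus - j_minus) / (2.0 * eps)
```

If `analytic` and `numeric` agree, the factor of 2 is accounted for by the Frobenius inner product of skew matrices rather than being an ad hoc scale.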

Add a bimodal Gaussian mixture model with an analytical score function. Implements a toy steering example in `notebooks/fkc_steering.py`. Adds two numerical tests for `dpm_fkc` and `dpm_smc` that simulate a large ensemble of samples to numerically match the analytically tractable (in 1D) biased target distribution.

---------

Co-authored-by: ludwigwinkler <luwinkler@microsoft.com>
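For readers unfamiliar with why a Gaussian mixture makes a convenient toy model: its noised marginal and score are available in closed form. The sketch below assumes a noising of the form x_t = alpha * x_0 + sigma * noise and uses made-up mixture parameters; it is not the model from the notebook, and all names in it are hypothetical.

```python
import numpy as np

# Toy 1D bimodal mixture; all numbers are illustrative assumptions.
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 0.5])


def _log_components(x: float, alpha: float, sigma: float):
    """Return log(w_k N(x; alpha*mu_k, var_k)) per component and the variances var_k."""
    var = alpha**2 * STDS**2 + sigma**2
    log_comp = (
        np.log(WEIGHTS)
        - 0.5 * np.log(2.0 * np.pi * var)
        - 0.5 * (x - alpha * MEANS) ** 2 / var
    )
    return log_comp, var


def log_density(x: float, alpha: float, sigma: float) -> float:
    """log p_t(x) of the noised mixture, computed with a stable log-sum-exp."""
    log_comp, _ = _log_components(x, alpha, sigma)
    m = log_comp.max()
    return float(m + np.log(np.exp(log_comp - m).sum()))


def score(x: float, alpha: float, sigma: float) -> float:
    """d/dx log p_t(x): responsibility-weighted sum of per-component Gaussian scores."""
    log_comp, var = _log_components(x, alpha, sigma)
    resp = np.exp(log_comp - log_comp.max())
    resp /= resp.sum()
    return float(np.sum(resp * (-(x - alpha * MEANS) / var)))


# Compare the analytical score with a finite difference of the log-density.
alpha, sigma, x, eps = 0.8, 0.6, 0.7, 1e-5
fd_score = (log_density(x + eps, alpha, sigma) - log_density(x - eps, alpha, sigma)) / (2.0 * eps)
an_score = score(x, alpha, sigma)
```

Because both the unbiased density and a biased density proportional to p(x) exp(-U(x)) can be evaluated on a 1D grid, sampled histograms from a steered denoiser can be compared directly against the analytical target.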

…conversion layer
- CVs and potentials now receive Cα positions in nm directly
- Remove 10x multiplication at call site and /10 divisions in CVs
- Rename Ca_pos/ca_pos to ca_pos_nm for explicit units
- Rescale config constants to nm (target, flatbottom, slope, min_dist)
- Remove log_physicality helper (unused)
- Update all tests to use nm-scale values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Harden stratified_resample: normalize CDF and clamp indices to avoid out-of-range from FP error.
- Fix resample_based_on_log_weights return annotation to match actual (ChemGraph, Tensor, Tensor, Tensor).
- Update compute_reward_and_grad docstring to current call signature (coords in nm, no 10* scaling, no N= kwarg).
- Switch lazy relative get_score import to explicit bioemu.denoiser import.
- Expand reward_grad_rotmat_to_rotvec derivation with self-contained explanation of the 2*vee factor from the skew-Frobenius identity.
- Make validate_steering_config strict: require start/end/num_particles/ess_threshold; drop .get() defaults in dpm_fkc/dpm_smc; propagate to tests and notebook.
- Extract Kabsch alignment from RMSD.compute_batch into kabsch_align() in steering/utils.py and export it.
- Deduplicate CONTACT_BETA/DELTA/LAMBDA in FractionNativeContacts by importing from training/foldedness.
- Convert UmbrellaPotential/LinearPotential.loss_fn from staticmethod to instance methods using self.*; simplify energy_from_cv and tests.
- Flip dpm_solver_sde_smc_step default use_x0_for_reward to True (SMC is defined at t=0; False remains available for debug/toy use) and update internal callers.
- Accept os.PathLike for sample()'s denoiser_config.
- cv_steer.yaml: reference_pdb: ??? (Hydra mandatory sentinel).
- README: correct FKC description to mention ESS-based resampling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
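Since this commit moves Kabsch alignment into its own helper, a minimal NumPy sketch of the textbook algorithm may help reviewers who want to check the RMSD path. The function names `kabsch_align_sketch` and `rmsd_sketch` and the single-structure (N, 3) signature are assumptions; the actual kabsch_align() in steering/utils.py may be batched and differ in its arguments.

```python
import numpy as np


def kabsch_align_sketch(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rigidly superimpose `mobile` (N, 3) onto `reference` (N, 3) and return the result."""
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = mobile - mob_center
    q = reference - ref_center

    # SVD of the 3x3 covariance matrix between the two centered point sets.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    return p @ rot.T + ref_center


def rmsd_sketch(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N, 3) coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))


# Rotate and translate a random structure, then align it back.
rng = np.random.default_rng(1)
ref = rng.normal(size=(10, 3))
rand_rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rand_rot) < 0.0:
    rand_rot[:, 0] *= -1.0  # make it a proper rotation
mob = ref @ rand_rot.T + np.array([1.0, -2.0, 0.5])
aligned = kabsch_align_sketch(mob, ref)
```

After alignment, the RMSD between `aligned` and `ref` should be zero up to floating-point error, while the RMSD of the unaligned copy is not.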

- sample.py: narrow denoiser_config type check to str | Path (drops os import)
- steering/utils.py: warn when batch is smaller than num_particles in resample_based_on_log_weights (silent fallback kept, now audible)
- steering/collective_variables.py: dedup FractionNativeContacts contact scoring by reusing bioemu.training.foldedness._compute_contact_score
- steering/collective_variables.py: fix openfold import path after main moved residue_constants (..openfold -> openfold via _vendor)
- tests/steering/test_integration.py: adapt generate_batch tests to the new colabfold_inline mocking pattern (mock_run_colabfold and run_colabfold were removed during the ColabFold inlining on main)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The GMMScoreWrapper called gmm.score(x, t) without propagating the autograd graph. When use_x0_for_reward=True, the FKC code computes x0 = (x_t + sigma^2 * score(x_t, t)) / alpha_t and differentiates the reward w.r.t. x_t. Without create_graph=True on the score, autograd treats score as a constant, so d(x0)/d(x_t) drops the Hessian-of-log-p_t term and the steered distribution ends up biased toward the bias center (MAE against the analytical target ~0.032 in the toy).

Fix: pass create_graph=batch.pos.requires_grad to gmm.score, matching the pattern used in bioemu2/enhanced_sampling_paper/scripts/gmm. Also replaced an in-place assignment into zeros with torch.cat for cleaner autograd flow, removed the now-unnecessary dummy nn.Parameter, and flipped the notebook's default to use_x0_for_reward=True. MAE drops to ~0.008 with either flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
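The size of the missing term is easy to see in a case where everything is linear. The sketch below assumes a single Gaussian data distribution N(mu, s0^2) noised as x_t = alpha * x_0 + sigma * noise, so the score and its derivative are known exactly. It uses finite differences in NumPy instead of autograd, and all names and numbers are illustrative assumptions rather than code from the PR.

```python
import numpy as np

MU, S0 = 1.0, 0.5  # assumed toy data distribution N(MU, S0^2)


def gaussian_score(x: float, alpha: float, sigma: float) -> float:
    """Score of the noised marginal N(alpha*MU, alpha^2*S0^2 + sigma^2)."""
    var_t = alpha**2 * S0**2 + sigma**2
    return -(x - alpha * MU) / var_t


def x0_estimate(x: float, alpha: float, sigma: float) -> float:
    """Denoised estimate x0 = (x_t + sigma^2 * score(x_t)) / alpha, as in the commit message."""
    return (x + sigma**2 * gaussian_score(x, alpha, sigma)) / alpha


alpha, sigma, x, eps = 0.8, 0.6, 0.3, 1e-5

# Full derivative d(x0)/d(x_t), including the d(score)/d(x_t) term.
full = (x0_estimate(x + eps, alpha, sigma) - x0_estimate(x - eps, alpha, sigma)) / (2.0 * eps)

# What autograd effectively sees if the score is treated as a constant.
score_as_constant = 1.0 / alpha

# Closed form for this Gaussian case: (1 - sigma^2 / var_t) / alpha.
var_t = alpha**2 * S0**2 + sigma**2
closed_form = (1.0 - sigma**2 / var_t) / alpha
```

In this toy setting the full derivative is roughly 0.38 while the constant-score version is 1.25, so a reward gradient pushed through the truncated graph is substantially too large, which is consistent with the bias toward the bias center described in the commit message.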

Yu Xie and others added 2 commits February 27, 2026 20:05


YuuuXie and others added 2 commits February 27, 2026 20:33

Merge branch 'yuxie1/fkc-steering' of github.com-personal:microsoft/b…

4741092

…ioemu into yuxie1/fkc-steering # Conflicts: # src/bioemu/config/steering/physical_steering.yaml # src/bioemu/steering/dpm_smc.py

YuuuXie requested review from Copilot, ludwigwinkler, nw13slx and sarahnlewis February 27, 2026 20:43

YuuuXie assigned YuuuXie and ludwigwinkler and unassigned YuuuXie Feb 27, 2026

Copilot started reviewing on behalf of YuuuXie February 27, 2026 20:44

Copilot AI reviewed Feb 27, 2026


Comment thread README.md Outdated

Comment thread src/bioemu/steering/utils.py

Comment thread src/bioemu/steering/utils.py

Comment thread src/bioemu/steering/utils.py Outdated

Comment thread src/bioemu/sample.py

Comment thread src/bioemu/config/steering/cv_steer.yaml Outdated

ludwigwinkler reviewed Mar 5, 2026


ludwigwinkler and others added 7 commits March 19, 2026 16:37

resolve conflict

987ca1e

add gmm umbrella sampling with mbar

bfccab6
