Skip to content

feat(bioemu): Add FKC steering denoiser & refactor steering code#203

Draft
YuuuXie wants to merge 11 commits intomainfrom
yuxie1/fkc-steering
Draft

feat(bioemu): Add FKC steering denoiser & refactor steering code#203
YuuuXie wants to merge 11 commits intomainfrom
yuxie1/fkc-steering

Conversation

@YuuuXie
Copy link
Copy Markdown
Collaborator

@YuuuXie YuuuXie commented Feb 27, 2026

Put steering functionalities into a steering/ package with modular components:

  • steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
  • steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
  • steering/utils.py: Resampling, reward computation, sequence alignment helpers
  • steering/dpm_fkc.py: Added FKC (Feynman-Kac Control) steered denoiser
  • steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Other key changes:

  • Extracted DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
  • Now outside the steering folder, there's only unsteered sampling functionalities, all the steering features are isolated in steering dir.
  • Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained Hydra denoiser configs with target, potentials, and steering params, and can be used to replace dpm.yaml for example.
  • Simplified sample.py: removed steering_config param, denoiser handles everything
  • Kept start/end time window for steering resampling, but changed resampling_frequency to ess_threshold

Tests:

  • 60+ steering tests: unit tests for CVs, potentials, utils; integration tests for FKC/SMC loops, ODE consistency, generate_batch pipeline
  • Chignolin e2e tests (require model weights), kept from the original test_steering.py

TODOs:

  • @ludwigwinkler will add GMM as an example (as a notebook) and also unit test for FKC and SMC. Test on multiple different umbrellas, and use bigger number of samples and steps to ensure convergence. This will make a separate PR towards this one
  • @YuuuXie will add a notebook example showing a protein
  • Some of the tests need to be cleaned up
  • Decide what to do with FractionNativeConcact CV class versus train/foldedness.py

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Yu Xie and others added 2 commits February 27, 2026 20:05
Split steering.py into a steering/ package with modular components:
- steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
- steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
- steering/utils.py: Resampling, reward computation, sequence alignment helpers
- steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser
- steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Key changes:
- Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
- Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained
  Hydra denoiser configs with target, potentials, and steering params
- Simplified sample.py: removed steering_config param, denoiser handles everything
- Added start/end time window for steering resampling
- Simplified loop returns to (batch, log_weights) 2-tuple

Tests:
- 60+ steering tests: unit tests for CVs, potentials, utils; integration tests
  for FKC/SMC loops, ODE consistency, generate_batch pipeline
- Chignolin e2e tests (require model weights)

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split steering.py into a steering/ package with modular components:
- steering/potentials.py: Potential base class, UmbrellaPotential, LinearPotential
- steering/collective_variables.py: CV framework (RMSD, FNC, CaCaDistance, PairwiseClash)
- steering/utils.py: Resampling, reward computation, sequence alignment helpers
- steering/dpm_fkc.py: FKC (Feynman-Kac Control) steered denoiser
- steering/dpm_smc.py: SMC (Sequential Monte Carlo) steered denoiser

Key changes:
- Unified DPM-Solver primitives in denoiser.py (shared by FKC, SMC, unsteered)
- Steering configs (cv_steer.yaml, physical_steering.yaml) are self-contained
  Hydra denoiser configs with target, potentials, and steering params
- Simplified sample.py: removed steering_config param, denoiser handles everything
- Added start/end time window for steering resampling
- Simplified loop returns to (batch, log_weights) 2-tuple

Tests:
- 60+ steering tests: unit tests for CVs, potentials, utils; integration tests
  for FKC/SMC loops, ODE consistency, generate_batch pipeline
- Chignolin e2e tests (require model weights)

Closes: https://github.com/msr-ai4science/feynman/issues/20268

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
return self.compute_batch(all_positions * 10.0, sequence)


class FractionNativeContacts(CollectiveVariable):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has duplication with the train/foldedness.py, we might want to decide which one to keep

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do something explicitly with bioemu/training/foldedness.py:foldedness? Otherwise can simply use that one and wrap it here with a CollectiveVariable.

YuuuXie and others added 2 commits February 27, 2026 20:33
- Change physical_steering.yaml target from dpm_solver_fkc to dpm_solver_smc
- Fix SMC loop bug: log_weights was overwritten with None outside steering window
- Rewrite README steering section: document both SMC/FKC algorithms, update CLI
  examples to use denoiser_config (removed old steering_config param), fix
  parameter descriptions to match current interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ioemu into yuxie1/fkc-steering

# Conflicts:
#	src/bioemu/config/steering/physical_steering.yaml
#	src/bioemu/steering/dpm_smc.py
@github-actions
Copy link
Copy Markdown

Summary

Summary
Generated on: 02/27/2026 - 20:40:59
Parser: Cobertura
Assemblies: 7
Classes: 31
Files: 31
Line coverage: 75.1% (2437 of 3241)
Covered lines: 2437
Uncovered lines: 804
Coverable lines: 3241
Total lines: 10699
Covered branches: 0
Total branches: 0
Method coverage: Feature is only available for sponsors

Coverage

src.bioemu - 89.1%
Name Line Branch
src.bioemu 89.1% ****
init.py 100%
chemgraph.py 100%
convert_chemgraph.py 97.6%
denoiser.py 98.3%
get_embeds.py 90.3%
md_utils.py 85.8%
model_utils.py 78%
models.py 94.1%
run_hpacker.py 0%
sample.py 92.3%
sde_lib.py 86.6%
seq_io.py 100%
shortcuts.py 100%
sidechain_relax.py 77.2%
so3_sde.py 90.3%
structure_module.py 84.3%
utils.py 65.6%
src.bioemu.colabfold_setup -
Name Line Branch
src.bioemu.colabfold_setup **** ****
init.py
src.bioemu.hpacker_setup - 58.8%
Name Line Branch
src.bioemu.hpacker_setup 58.8% ****
init.py
setup_hpacker.py 58.8%
src.bioemu.openfold.np - 44%
Name Line Branch
src.bioemu.openfold.np 44% ****
protein.py 31.2%
residue_constants.py 60.7%
src.bioemu.openfold.utils - 50.1%
Name Line Branch
src.bioemu.openfold.utils 50.1% ****
rigid_utils.py 50.1%
src.bioemu.steering - 79.3%
Name Line Branch
src.bioemu.steering 79.3% ****
init.py 100%
collective_variables.py 32.8%
dpm_fkc.py 100%
dpm_smc.py 100%
potentials.py 95.5%
utils.py 92.1%
src.bioemu.training - 100%
Name Line Branch
src.bioemu.training 100% ****
foldedness.py 100%
loss.py 100%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors BioEmu’s steering functionality into a dedicated bioemu.steering package and adds modular, Hydra-configured steered denoisers (FKC + SMC) built on shared DPM-Solver primitives.

Changes:

  • Introduces bioemu.steering subpackage (CVs, potentials, utilities) and new steered denoisers (dpm_fkc, dpm_smc).
  • Extracts/shared DPM-Solver helper primitives in denoiser.py and simplifies sampling API to steer via a single denoiser_config.
  • Replaces the legacy steering test with a broader steering test suite (unit + lightweight integration + optional e2e).

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/test_steering.py Removes legacy steering e2e test module.
tests/steering/init.py Adds steering test package marker.
tests/steering/test_utils.py Adds unit tests for resampling + helper utilities.
tests/steering/test_potentials.py Adds unit tests for UmbrellaPotential / LinearPotential behavior.
tests/steering/test_integration.py Adds lightweight integration tests for configs, solvers, resampling, and pipeline wiring.
tests/steering/test_denoisers.py Adds tests for ESS computation and SO(3) gradient mapping helper.
tests/steering/test_collective_variables.py Adds unit tests for CV implementations.
tests/steering/test_chignolin_e2e.py Adds e2e tests that invoke sample() (requires model weights).
src/bioemu/steering/utils.py New steering utilities: config validation, x0/R0 helpers, resampling, reward/grad computation.
src/bioemu/steering/potentials.py New potential framework + UmbrellaPotential / LinearPotential.
src/bioemu/steering/dpm_smc.py Adds SMC steered denoiser built on DPM-Solver++ utilities.
src/bioemu/steering/dpm_fkc.py Adds FKC steered denoiser + analytical weight updates + ESS resampling.
src/bioemu/steering/collective_variables.py Adds CV framework + implementations (FNC, RMSD, CaCaDistance, PairwiseClash).
src/bioemu/steering/init.py Exposes steering public API (CVs, potentials, utilities).
src/bioemu/steering.py Removes legacy monolithic steering module.
src/bioemu/sample.py Simplifies sampling: steering now handled by the instantiated denoiser config.
src/bioemu/denoiser.py Refactors DPM-Solver primitives into reusable helper dataclasses/functions.
src/bioemu/config/steering/physical_steering.yaml Converts physical steering into a self-contained Hydra denoiser config.
src/bioemu/config/steering/cv_steer.yaml Adds example self-contained FKC steering config.
README.md Updates steering documentation and usage to the new single-config model.
.gitignore Ignores tests/cross_repo/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md Outdated
Comment thread src/bioemu/steering/utils.py
Comment thread src/bioemu/steering/utils.py
Comment thread src/bioemu/steering/utils.py Outdated
Comment thread src/bioemu/sample.py
Comment thread src/bioemu/config/steering/cv_steer.yaml Outdated
@github-actions
Copy link
Copy Markdown

Summary

Summary
Generated on: 02/27/2026 - 21:02:23
Parser: Cobertura
Assemblies: 7
Classes: 31
Files: 31
Line coverage: 75.2% (2439 of 3243)
Covered lines: 2439
Uncovered lines: 804
Coverable lines: 3243
Total lines: 10701
Covered branches: 0
Total branches: 0
Method coverage: Feature is only available for sponsors

Coverage

src.bioemu - 89.1%
Name Line Branch
src.bioemu 89.1% ****
init.py 100%
chemgraph.py 100%
convert_chemgraph.py 97.6%
denoiser.py 98.3%
get_embeds.py 90.3%
md_utils.py 85.8%
model_utils.py 78%
models.py 94.1%
run_hpacker.py 0%
sample.py 92.3%
sde_lib.py 86.6%
seq_io.py 100%
shortcuts.py 100%
sidechain_relax.py 77.2%
so3_sde.py 90.3%
structure_module.py 84.3%
utils.py 65.6%
src.bioemu.colabfold_setup -
Name Line Branch
src.bioemu.colabfold_setup **** ****
init.py
src.bioemu.hpacker_setup - 58.8%
Name Line Branch
src.bioemu.hpacker_setup 58.8% ****
init.py
setup_hpacker.py 58.8%
src.bioemu.openfold.np - 44%
Name Line Branch
src.bioemu.openfold.np 44% ****
protein.py 31.2%
residue_constants.py 60.7%
src.bioemu.openfold.utils - 50.1%
Name Line Branch
src.bioemu.openfold.utils 50.1% ****
rigid_utils.py 50.1%
src.bioemu.steering - 79.4%
Name Line Branch
src.bioemu.steering 79.4% ****
init.py 100%
collective_variables.py 32.8%
dpm_fkc.py 100%
dpm_smc.py 100%
potentials.py 95.5%
utils.py 92.1%
src.bioemu.training - 100%
Name Line Branch
src.bioemu.training 100% ****
foldedness.py 100%
loss.py 100%

@vkuzniak
Copy link
Copy Markdown

vkuzniak commented Mar 4, 2026

Good afternoon!

I see that this pull request has been open for quite a while. Is there anything I can help with to move it forward and get it closed faster? Especially with preparing the notebooks, tests, and related tasks.

@ludwigwinkler
Copy link
Copy Markdown
Member

Hi @vkuzniak,

we appreciate the keen eye and your interest in the feature.
We had some more discussions about the setup of the code and how what other features we wanted to extract from our research code base.
We will keep you posted on the progress.

@vkuzniak
Copy link
Copy Markdown

vkuzniak commented Mar 4, 2026

Hi @ludwigwinkler,

I am glad to hear that and will be waiting. Interesting to see this list.

Copy link
Copy Markdown
Member

@ludwigwinkler ludwigwinkler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring and the work you put in!

return self.compute_batch(all_positions * 10.0, sequence)


class FractionNativeContacts(CollectiveVariable):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do something explicitly with bioemu/training/foldedness.py:foldedness? Otherwise can simply use that one and wrap it here with a CollectiveVariable.

Comment thread src/bioemu/steering/collective_variables.py Outdated
Comment thread src/bioemu/steering/dpm_fkc.py Outdated
Comment thread src/bioemu/steering/dpm_smc.py Outdated
Comment thread src/bioemu/steering/potentials.py Outdated
Comment thread src/bioemu/steering/utils.py
Comment thread src/bioemu/steering/utils.py Outdated
Comment on lines +295 to +298
RtG = R.transpose(-2, -1) @ dJ_dR # (...,3,3)
A = 0.5 * (RtG - RtG.transpose(-2, -1)) # skew(...) in so(3)
return 2.0 * skew_matrix_to_vector(A) # (...,3) vee-map

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did you find that equation?

Comment thread src/bioemu/denoiser.py
Comment thread src/bioemu/sample.py
ludwigwinkler and others added 7 commits March 19, 2026 16:37
Add a bimodal Gaussian mixture model with an analytical score function.

Implements a toy steering example in `notebooks/fkc_steering.py`. 

Adds two numerical tests for for `dpm_fkc` and `dpm_smc` that simulates
a large ensemble of samples to numerically match the analytically
tractable (in 1D) biased target distribution.

---------

Co-authored-by: ludwigwinkler <luwinkler@microsoft.com>
…conversion layer

- CVs and potentials now receive Cα positions in nm directly
- Remove 10x multiplication at call site and /10 divisions in CVs
- Rename Ca_pos/ca_pos to ca_pos_nm for explicit units
- Rescale config constants to nm (target, flatbottom, slope, min_dist)
- Remove log_physicality helper (unused)
- Update all tests to use nm-scale values

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Harden stratified_resample: normalize CDF and clamp indices to avoid
  out-of-range from FP error.
- Fix resample_based_on_log_weights return annotation to match actual
  (ChemGraph, Tensor, Tensor, Tensor).
- Update compute_reward_and_grad docstring to current call signature
  (coords in nm, no 10* scaling, no N= kwarg).
- Switch lazy relative get_score import to explicit bioemu.denoiser import.
- Expand reward_grad_rotmat_to_rotvec derivation with self-contained
  explanation of the 2*vee factor from the skew-Frobenius identity.
- Make validate_steering_config strict: require start/end/num_particles/
  ess_threshold; drop .get() defaults in dpm_fkc/dpm_smc; propagate to
  tests and notebook.
- Extract Kabsch alignment from RMSD.compute_batch into kabsch_align()
  in steering/utils.py and export it.
- Deduplicate CONTACT_BETA/DELTA/LAMBDA in FractionNativeContacts by
  importing from training/foldedness.
- Convert UmbrellaPotential/LinearPotential.loss_fn from staticmethod
  to instance methods using self.*; simplify energy_from_cv and tests.
- Flip dpm_solver_sde_smc_step default use_x0_for_reward to True (SMC
  is defined at t=0; False remains available for debug/toy use) and
  update internal callers.
- Accept os.PathLike for sample()'s denoiser_config.
- cv_steer.yaml: reference_pdb: ??? (Hydra mandatory sentinel).
- README: correct FKC description to mention ESS-based resampling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- sample.py: narrow denoiser_config type check to str | Path (drops os import)
- steering/utils.py: warn when batch is smaller than num_particles in
  resample_based_on_log_weights (silent fallback kept, now audible)
- steering/collective_variables.py: dedup FractionNativeContacts contact
  scoring by reusing bioemu.training.foldedness._compute_contact_score
- steering/collective_variables.py: fix openfold import path after main
  moved residue_constants (..openfold -> openfold via _vendor)
- tests/steering/test_integration.py: adapt generate_batch tests to the
  new colabfold_inline mocking pattern (mock_run_colabfold and
  run_colabfold were removed during the ColabFold inlining on main)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The GMMScoreWrapper called gmm.score(x, t) without propagating the autograd
graph. When use_x0_for_reward=True, the FKC code computes
    x0 = (x_t + sigma^2 * score(x_t, t)) / alpha_t
and differentiates the reward w.r.t. x_t. Without create_graph=True on the
score, autograd treats score as a constant, so d(x0)/d(x_t) drops the
Hessian-of-log-p_t term and the steered distribution ends up biased toward
the bias center (MAE against the analytical target ~0.032 in the toy).

Fix: pass create_graph=batch.pos.requires_grad to gmm.score, matching the
pattern used in bioemu2/enhanced_sampling_paper/scripts/gmm. Also replaced
an in-place assignment into zeros with torch.cat for cleaner autograd flow,
removed the now-unnecessary dummy nn.Parameter, and flipped the notebook's
default to use_x0_for_reward=True. MAE drops to ~0.008 with either flag.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants