new predict method API, and reduced sampling in input space #404

Merged
odunbar merged 36 commits into main from orad/new-predict-method on Apr 2, 2026

@odunbar (Member) commented Mar 6, 2026

Co-authored by @ArneBouillon

Purpose

Content

  • For `predict()` on both `Emulator` and `ForwardMapWrapper`:
    • Added a deprecation notice to the `transform_to_real=false` kwarg, and removed it from all tests (except the one that checks the deprecation message).
    • Added an `add_obs_noise_cov=false` kwarg to `predict`. When set to `true`, the regularization stored in the machine learning tool (which holds the observational noise covariance) is added to the predicted covariance.
    • Added an `encode=nothing` kwarg to `predict`. When set to `"in"`, `"out"`, or `"in_and_out"`, the method takes encoded inputs, returns encoded outputs, or both.
    • When inputs are encoded and a `ForwardMapWrapper` is used, the samples are decoded back to the full space before the forward map is applied.
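To make the `encode` routing concrete, here is a minimal sketch of how such a keyword could dispatch for a ForwardMapWrapper-like object wrapping a full-space map. Everything in it (`AffineEncoder`, `encode`, `decode`, `toy_predict`, the `encode_mode` keyword) is hypothetical and not the package's API; decoding uses the pseudoinverse of `E`, and the null-space noise injection described in this PR is deliberately left out.

```julia
using LinearAlgebra

# Hypothetical affine encoder z = E*x + b (illustration only, not the package API)
struct AffineEncoder
    E::Matrix{Float64}
    b::Vector{Float64}
end

# Encode/decode the columns of a matrix; decoding uses the pseudoinverse of E
encode(enc::AffineEncoder, X) = enc.E * X .+ enc.b
decode(enc::AffineEncoder, Z) = pinv(enc.E) * (Z .- enc.b)

# Toy ForwardMapWrapper-like predict: `f` acts on full-space inputs and outputs.
#   encode_mode = nothing      : full-space inputs in, full-space outputs out
#   encode_mode = "in"         : encoded inputs in (decoded before applying f)
#   encode_mode = "out"        : outputs encoded before returning
#   encode_mode = "in_and_out" : both
function toy_predict(f, X; encode_mode = nothing, in_enc::AffineEncoder, out_enc::AffineEncoder)
    encode_mode in (nothing, "in", "out", "in_and_out") ||
        throw(ArgumentError("unknown encode_mode: $encode_mode"))
    Xfull = encode_mode in ("in", "in_and_out") ? decode(in_enc, X) : X
    Y = f(Xfull)
    return encode_mode in ("out", "in_and_out") ? encode(out_enc, Y) : Y
end
```

With an invertible `E`, calling `toy_predict` on encoded inputs with `encode_mode = "in"` reproduces the full-space prediction, which is the property the real `encode="in"` path relies on.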

The API is now simpler to use:

```julia
# (User) typical call to predict at new points:
y_pred, y_pred_cov = predict(em_or_fmw, new_inputs)

# (Internal) call within encoded MCMC, with full uncertainty
g_pred, g_pred_cov = predict(em_or_fmw, new_inputs, encode="in_and_out", add_obs_noise_cov=true)
```
  • When retrieving the MCMC posterior with a sufficiently lossy encoder, noise drawn from the prior is injected into the encoder's null space (in a correlation-aware way). This step is kept separate so that the required quantities can be precomputed and stored in a `NoiseInjector`; `decode_and_add_noise(noise_injector, samples)` then applies the decoding. Users can also set:
    • `noise_injector_threshold`, which determines how lossy the encoder must be before noise is injected
    • `noise_injector_scaling`, which scales the injected noise in case the injection causes instability due to the Gaussian assumptions
  • Added the utility `get_encoder_from_schedule`, which returns `E, b` from an `encoder_schedule` defining an affine encoding `Ex + b`.
  • Added new docstrings and improved existing docstrings
  • Full unit testing.
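When every stage of a schedule is affine, x ↦ E_k x + b_k, the whole chain collapses into a single pair `(E, b)` with `E = E2*E1` and `b = E2*b1 + b2`. The sketch below shows that composition; it assumes a hypothetical plain vector of `(E_k, b_k)` tuples rather than the package's `encoder_schedule` type, and is not the implementation of `get_encoder_from_schedule`.

```julia
# Hypothetical: `stages` is a vector of (E_k, b_k) pairs, each acting as x -> E_k*x + b_k,
# applied in order. Returns (E, b) such that E*x + b equals applying all stages in sequence.
function compose_affine(stages::AbstractVector)
    E, b = first(stages)
    E, b = copy(E), copy(b)          # avoid aliasing the caller's first stage
    for (Ek, bk) in stages[2:end]
        # Ek*(E*x + b) + bk = (Ek*E)*x + (Ek*b + bk)
        E, b = Ek * E, Ek * b + bk
    end
    return E, b
end
```

For example, a per-coordinate rescaling followed by a 2-to-1 projection composes into a single 1×2 matrix and a length-1 offset.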

Misc

Detail on current encoding setup

Current settings: full (F) vs reduced (R) space

  • Emulator space (R)
  • MCMC: initial conditions (R)
  • MCMC: computing likelihood (R)
  • MCMC: stored encoded prior distribution (R)
  • MCMC: computing `logpdf(encoded_prior, x)` (R)
  • MCMC: output posterior samples (F) (each posterior sample is decoded, and correlated noise from the prior is added in the null space)
  • ForwardMapWrapper: `predict` (F) (each posterior sample is decoded, and correlated noise from the prior is added in the null space)
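Because the encoder is affine, a Gaussian prior N(m, C) stays Gaussian in the reduced space, with mean `E*m + b` and covariance `E*C*E'`. A minimal sketch of encoding the prior and evaluating its log-density follows; the helper names (`encode_gaussian_prior`, `gaussian_logpdf`) are hypothetical, and the log-density is written by hand rather than taken from any particular distributions library.

```julia
using LinearAlgebra

# Hypothetical helper: push the Gaussian prior N(m, C) through z = E*x + b.
# An affine map of a Gaussian is Gaussian: N(E*m + b, E*C*E').
function encode_gaussian_prior(m::AbstractVector, C::AbstractMatrix,
                               E::AbstractMatrix, b::AbstractVector)
    mz = E * m + b
    Cz = E * C * E'
    Cz = (Cz + Cz') / 2          # symmetrize against round-off
    return mz, Cz
end

# Log-density of N(μ, Σ) at z, computed with a Cholesky factorization of Σ.
function gaussian_logpdf(μ::AbstractVector, Σ::AbstractMatrix, z::AbstractVector)
    F = cholesky(Symmetric(Σ))
    r = F.L \ (z - μ)            # whitened residual; dot(r, r) is the Mahalanobis term
    return -0.5 * (dot(r, r) + logdet(F) + length(μ) * log(2π))
end
```

In an encoded MCMC, `mz` and `Cz` would be computed once and `gaussian_logpdf(mz, Cz, z)` evaluated for each reduced-space state `z`.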

Darcy example - [updated since review]

We compare lossless encoding, lossy encoding without inflation, lossy encoding (retaining 99.5% of the variance) with inflation, and the truth. We expect the uncertainty of the lossy encoding to be increased by using the prior variance. The D utility indicates the amount of contraction from prior to posterior (by comparison of determinants).
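One common way to compare determinants is the ratio det(prior covariance) / det(posterior covariance); larger values mean stronger contraction. The sketch below assumes that definition (the exact formula used for the numbers here is not stated) and works in log-determinants so that ratios on the order of 10^29 do not lose precision.

```julia
using LinearAlgebra

# Hypothetical: D utility as det(prior_cov) / det(post_cov), computed via log-determinants.
function log10_d_utility(prior_cov::AbstractMatrix, post_cov::AbstractMatrix)
    lp = logdet(cholesky(Symmetric(prior_cov)))
    lq = logdet(cholesky(Symmetric(post_cov)))
    return (lp - lq) / log(10)
end

d_utility(prior_cov, post_cov) = 10.0^log10_d_utility(prior_cov, post_cov)
```

Reporting `log10_d_utility` directly gives the exponent k in an O(10^k) summary.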

Lossless (20D->50D), D utility O(10^11)
[Figure: GP_posterior_pointwise_uq_noloss]
Lossy with inflation (19D->15D), D utility O(10^11)
[Figure: GP_posterior_pointwise_uq_inflated]
Lossy without inflation (19D->15D), D utility O(10^29)
[Figure: GP_posterior_pointwise_uq_non_inflate]
Truth
[Figure: output_true]

Sinusoid example

The encoding truncates away one of the (independent) parameters, so in the null space of the encoder one samples from the prior. Here we compare inflation with scaling 1, inflation with scaling 0.2, and no inflation.
With inflation (GP, RF, ForwardMap)
[Figures: sinusoid_MCMC_hist_GP_inflate, sinusoid_MCMC_hist_RF_inflate, sinusoid_MCMC_hist_FM_inflate]
With inflation scaled by 0.2 (GP, RF, ForwardMap)
[Figures: sinusoid_MCMC_hist_GP, sinusoid_MCMC_hist_RF, sinusoid_MCMC_hist_FM]
Without inflation (GP, RF, ForwardMap)
[Figures: sinusoid_MCMC_hist_GP, sinusoid_MCMC_hist_RF, sinusoid_MCMC_hist_FM]


  • I have read and checked the items on the review checklist.

odunbar force-pushed the orad/new-predict-method branch from 42f37c7 to 105d172 on March 6, 2026 20:11
odunbar and others added 4 commits March 6, 2026 12:16
Co-authored-by: ArneBouillon <45404227+ArneBouillon@users.noreply.github.com>
deprecate transform_to_real

replace transform_to_real kwarg, and add add_obs_noise_cov
odunbar force-pushed the orad/new-predict-method branch from 105d172 to b02758f on March 6, 2026 20:18
odunbar changed the title "new predict method" to "new predict method API, and reduced sampling in input space" on Mar 6, 2026
@codecov (Bot) commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 94.91525% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.23%. Comparing base (c011a05) to head (5c88c18).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/MachineLearningTools/GaussianProcess.jl | 76.92% | 3 Missing ⚠️ |
| src/Utilities.jl | 95.52% | 3 Missing ⚠️ |
| src/Emulator.jl | 95.65% | 2 Missing ⚠️ |
| src/MachineLearningTools/VectorRandomFeature.jl | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
+ Coverage   94.20%   94.23%   +0.02%
==========================================
  Files          10       10
  Lines        1813     1977     +164
==========================================
+ Hits         1708     1863     +155
- Misses        105      114       +9
```

odunbar requested a review from ArneBouillon on March 27, 2026 20:34
Comment on lines +518 to +521
```julia
if add_obs_noise_cov
    for i in 1:size(σ2, 2)
        σ2[:, i] .= σ2[:, i] + gp.regularization
    end
```
Collaborator


It might be difficult to change, but it seems a bit inconsistent to me to use the very explicit `add_obs_noise_cov` on the one hand and, on the other, the vague `gp.regularization` to hold this noise covariance.
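The loop in the snippet under review adds the same vector to every column of `σ2`. A minimal sketch with hypothetical names (`obs_noise_var` stands in for `gp.regularization`), assuming `σ2` stores one column of predictive variances per test point as the loop's indexing suggests, showing the loop next to an equivalent single broadcast with more explicit naming:

```julia
# Hypothetical: σ2 is (output_dim × n_points) predictive variances,
# obs_noise_var is the per-output observational noise variance.
function add_obs_noise_loop!(σ2::AbstractMatrix, obs_noise_var::AbstractVector)
    for i in 1:size(σ2, 2)
        σ2[:, i] .+= obs_noise_var    # in-place update of column i
    end
    return σ2
end

# Equivalent single broadcast: the vector is added to every column.
add_obs_noise!(σ2::AbstractMatrix, obs_noise_var::AbstractVector) = (σ2 .+= obs_noise_var)
```

The broadcast form avoids allocating the temporary `σ2[:, i] + ...` on each iteration.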

@odunbar (Member, Author) commented Mar 31, 2026

Update:

  • I have encoded the Gaussian prior for the logpdf computations.
  • I have created an offline store, `NoiseInjector`, in utilities that holds the gain and other precomputed quantities for `decode_and_add_noise`. This makes it efficient to apply both in the "one-off" scenario of decoding posterior samples and in the "every-step" scenario of an MCMC that requires the `forward_map_wrapper` to decode.
  • I have added a scaling argument to the `NoiseInjector` so that one can scale the amount of injected noise. This is to increase robustness when the forward map is highly nonlinear.
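One way to realize a precompute-then-reuse injector, assuming a Gaussian prior N(m, C) and an affine encoder z = E*x + b, is to condition the prior on the encoded value: the gain `K = C*E'*inv(E*C*E')` gives the decoded mean, and the conditional covariance `C - K*E*C` lives in the null space of `E`. The sketch below is a hypothetical stand-in (`NullSpaceInjector`, `decode_with_noise`, and its lossiness measure are illustrative names and choices), not the package's `NoiseInjector` or `decode_and_add_noise`; in particular, applying the scaling to the covariance rather than the standard deviation is an assumption.

```julia
using LinearAlgebra, Random

# Hypothetical sketch, not the package's NoiseInjector: precompute once, reuse per decode.
# Prior x ~ N(m, C); affine encoder z = E*x + b.
struct NullSpaceInjector
    m::Vector{Float64}   # prior mean (full space)
    E::Matrix{Float64}   # encoder matrix
    b::Vector{Float64}   # encoder offset
    K::Matrix{Float64}   # gain C*E'*inv(E*C*E')
    L::Matrix{Float64}   # square-root factor of C - K*E*C (null-space covariance)
    inject::Bool         # inject noise only if the encoder is "lossy enough"
    scaling::Float64     # multiplies the injected noise covariance
end

function NullSpaceInjector(m, C, E, b; threshold = 1e-3, scaling = 1.0)
    K = (C * E') / (E * C * E')
    P = Symmetric(C - K * E * C)
    F = eigen(P)
    L = F.vectors * Diagonal(sqrt.(max.(F.values, 0.0)))  # clip round-off negatives
    # hypothetical lossiness measure: fraction of prior variance the encoder cannot see
    lost = tr(P) / tr(C)
    return NullSpaceInjector(m, E, b, K, L, lost > threshold, scaling)
end

# Decode encoded samples (columns of Z) and add correlated prior noise in the null space.
function decode_with_noise(inj::NullSpaceInjector, Z::AbstractMatrix; rng = Random.default_rng())
    # conditional mean m + K*((z - b) - E*m); it satisfies E*x + b = z
    X = inj.m .+ inj.K * ((Z .- inj.b) .- inj.E * inj.m)
    if inj.inject
        # noise with covariance scaling*(C - K*E*C); E*noise = 0 up to round-off
        X .+= sqrt(inj.scaling) .* (inj.L * randn(rng, size(X)...))
    end
    return X
end
```

Because `K` and `L` depend only on the prior and the encoder, the struct can be built once and reused both for a one-off decode of posterior samples and inside every MCMC step.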

odunbar merged commit f6a2727 into main on Apr 2, 2026
13 of 15 checks passed


2 participants