Replies: 5 comments 4 replies
-
Tagging @mamagarobonomon as it may be relevant to his project.
-
I'm not sure what the problem is. Have you tried debugging by hard-coding F and Q to constants?
-
@murphyk I've fixed the issue! As soon as we impose the structured noise via the R matrix, the predictions become dramatically better. I'm really happy with the results: it converges nicely in just 10 iterations (under 2 seconds on my Mac).

```julia
D = 6
H = [1.0, 1.0, 0.0, 1.0, 0.0, 1.0]
X = [[temp] for temp in temperature]

# Block-diagonal transition: level, two harmonic (rotation) blocks, and an AR(1) term.
function transition(F)
    FT = eltype(F)
    M = zeros(FT, 6, 6)
    M[1, 1] = one(FT)                                                # level
    M[2, 2] = F[1]; M[2, 3] = F[2]; M[3, 2] = -F[2]; M[3, 3] = F[1]  # daily seasonal
    M[4, 4] = F[3]; M[4, 5] = F[4]; M[5, 4] = -F[4]; M[5, 5] = F[3]  # weekly seasonal
    M[6, 6] = F[5]                                                   # AR coefficient
    return M
end
```
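Each seasonal block in that transition matrix is a 2×2 rotation, which is what makes the harmonics oscillate without growing or decaying. A standalone sanity check, using only LinearAlgebra (the period s = 24 below is a hypothetical stand-in for the daily seasonality, not a value from the model):

```julia
using LinearAlgebra

# Hypothetical daily period; the learned F[1], F[2] would sit near (cos θ, sin θ).
s = 24
θ = 2π / s
B = [cos(θ) sin(θ); -sin(θ) cos(θ)]   # same layout as M[2:3, 2:3] in transition(F)

# One full season of steps returns the block to the identity.
@assert B^s ≈ Matrix{Float64}(I, 2, 2)

# Unit-modulus eigenvalues: the seasonal component neither grows nor decays.
@assert all(λ -> isapprox(abs(λ), 1.0; atol = 1e-10), eigvals(B))
```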
```julia
@model function rxsts(H, X, y, R, priors)
    τy ~ priors[:τy]
    β ~ priors[:β]
    Q ~ Wishart(priors[:Q].df, priors[:Q].S)
    η ~ MvNormalMeanPrecision(mean(priors[:η]), Q)
    zprev ~ priors[:z0]
    F ~ priors[:F]
    for t in eachindex(y)
        z₁[t] ~ ContinuousTransition(zprev, F, diageye(D))
        z₂[t] ~ R * η
        z[t] ~ z₁[t] + z₂[t]
        μ[t] ~ dot(H, z[t]) + dot(X[t], β)
        y[t] ~ Normal(mean = μ[t], precision = τy)
        zprev = z[t]
    end
end

@constraints function rxsts_constraints()
    q(z, z₁, z₂, zprev, F, Q, η, μ, y, τy, β) = q(z, z₁, z₂, zprev)q(F)q(Q)q(η)q(μ, y)q(τy)q(β)
end

@meta function rxsts_meta()
    ContinuousTransition() -> CTMeta(transition)
end
```
```julia
# Selection matrix: routes the 4-dim noise η into the 6-dim state,
# leaving the sine components (rows 3 and 5) noiseless.
R = [
    1 0 0 0
    0 1 0 0
    0 0 0 0
    0 0 1 0
    0 0 0 0
    0 0 0 1
]
```
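The role of R can be checked in isolation: it embeds a 4-dimensional noise covariance into the 6-dimensional state, so the effective state-noise covariance R\*Q\*R' has zero rows and columns exactly at the two sine components. A minimal sketch, with an arbitrary positive-definite matrix standing in for the learned Q:

```julia
using LinearAlgebra

# Same selection matrix as in the model above.
R = [1 0 0 0; 0 1 0 0; 0 0 0 0; 0 0 1 0; 0 0 0 0; 0 0 0 1]

# Arbitrary 4×4 positive-definite stand-in for the learned noise covariance.
A = randn(4, 4)
Q = A * A' + I

Σ = R * Q * R'   # structured 6×6 state-noise covariance

# The sine components (states 3 and 5) receive no independent noise...
@assert all(iszero, Σ[3, :]) && all(iszero, Σ[:, 5])

# ...while the noisy components carry Q through unchanged.
@assert Σ[[1, 2, 4, 6], [1, 2, 4, 6]] == Q
```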
```julia
priors = Dict(
    :τy => GammaShapeRate(10.0, 1.0),
    :β  => MvNormalMeanPrecision(ones(1), diageye(1)),
    :z0 => MvNormalMeanPrecision(ones(D), diageye(D)),
    :F  => MvNormalMeanPrecision([1.0, 1.0, 1.0, 1.0, 1.0], diageye(5)),
    :Q  => Wishart(4, diagm([1.0, 1.0, 1.0, 1.0])),
    :η  => MvNormalMeanPrecision(zeros(4), diageye(4))
)

@initialization function rxsts_init(priors)
    q(τy) = priors[:τy]
    q(F) = priors[:F]
    q(Q) = priors[:Q]
    q(η) = priors[:η]
    μ(β) = priors[:β]
    μ(zprev) = priors[:z0]
    μ(z) = priors[:z0]
end

n_predict = round(Int, length(demand) * 0.1)  # hold out the last 10% for forecasting

results = infer(
    model          = rxsts(H = H, X = X, R = R, priors = priors),
    data           = (y = [demand[1:end-n_predict]; repeat([missing], n_predict)],),
    constraints    = rxsts_constraints(),
    meta           = rxsts_meta(),
    initialization = rxsts_init(priors),
    returnvars     = KeepLast(),
    iterations     = 10,
    showprogress   = true,
    options        = (limit_stack_depth = 500,)
)
```
@Nimrais this example can be used in the future to check issue #570.
-
BTW, now that the Gaussian case works, it would be cool to try a non-conjugate likelihood, such as Poisson. Just for fun, you could also try a nonlinear link function (e.g. a small neural network) to represent p(y|z) = Cat(y|softmax(MLP(z))). A version of causal impact would also be cool.
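A toy, inference-free sketch of that suggested link function, just to pin down the shapes. The layer sizes, random weights, and K = 3 classes here are arbitrary placeholders, and none of this uses RxInfer's API; it only builds p(y|z) as a Categorical via Distributions.jl:

```julia
using Distributions, LinearAlgebra

# Numerically stable softmax.
softmax(v) = (e = exp.(v .- maximum(v)); e ./ sum(e))

# Hypothetical one-hidden-layer MLP mapping the 6-dim state z to 3 class logits.
function categorical_link(z; W1 = randn(8, 6), b1 = zeros(8), W2 = randn(3, 8), b2 = zeros(3))
    logits = W2 * tanh.(W1 * z .+ b1) .+ b2
    return Categorical(softmax(logits))   # p(y | z) = Cat(y | softmax(MLP(z)))
end

p = categorical_link(randn(6))
@assert isapprox(sum(probs(p)), 1.0; atol = 1e-9)
@assert ncategories(p) == 3
```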
-
Hi! I've been exploring the excellent sts-jax repo by @xinglong-li and @murphyk, and thought it would be interesting to implement something similar in RxInferExamples.jl with a twist: learning the transition matrix F parameters rather than fixing them.
I started with the electricity demand example but added priors on the seasonal frequencies and the AR coefficient. However, my forecasts are quite off compared to what I'd expect (see attached plot).
Model structure:

```julia
# state: [level, daily_cos, daily_sin, weekly_cos, weekly_sin, ar]
F  ~ MvNormal([1.0, 0.1, 1.0, 0.1, -1.0], Σ_F)
QR ~ Wishart(8, diag([1e2, 1.0, 1.0, 1.0, 1.0, 1.0]))
```

Suspected issue: In the original sts-jax implementation, you use a selection matrix R to create the structured covariance R*Q*R'. I'm using an unstructured QR ~ Wishart(...) directly, which might be causing the level component to dominate while the seasonals collapse. I tried to work around this with diagonal scaling in the Wishart prior, but it might not be enough.
Dataset: electricity data gist
Minimal code:
Any thoughts on what might be going wrong?
I think supporting this type of model would be great, as we could run different what-if scenarios!
P.S. I'm also concerned about the number of iterations needed here...