Replies: 4 comments 5 replies
Hey @meditans, thanks for reading the paper and trying to recreate it! I'll talk you through your questions one by one. Mostly you're right, and most of your confusion comes from my own laziness in not cleaning up the code while writing the paper :) Let's go through them.
I hope this helps! If you have any more questions, feel free to ask!
Hi @wouterwln, thank you so much for the answers. No worries, it all makes sense! I have just two follow-up questions that stem from debugging my own adaptation of the code.
I traced this problem back to the published code in biaslab/EFEasVFE, and I wonder if you could confirm. In your function `generate_maze_tensors`, I changed the default noisy observations:

```diff
 generate_maze_tensors(grid_size_x::Int, grid_size_y::Int, n_actions::Int;
     sink_states::Vector{Tuple{Int,Int}}=[(4, 2), (4, 4)],
     stochastic_states::Vector{Tuple{Int,Int}}=[(2, 3), (3, 3), (4, 3)],
-    noisy_observations::Vector{Tuple{Int,Int,Float64}}=[(1, 5, 0.1), (2, 5, 0.1), (3, 5, 0.4), (4, 5, 0.1), (2, 3, 0.4), (2, 4, 0.4), (3, 2, 0.4), (3, 3, 0.4), (4, 3, 0.2), (2, 2, 0.3), (3, 2, 0.4), (3, 4, 0.4)])
+    noisy_observations::Vector{Tuple{Int,Int,Float64}}=[(1, 5, 0.0)])
```

Look at how the performance of the EFE agent drops. It seems to me that the agent was avoiding the stochastic transitions because they happened to lie along paths of noisy observability, but it happily tries them (and often fails) once it can observe clearly. But maybe I'm missing something important. Thank you again!
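For context on what a noise level means here, a minimal sketch of the kind of observation distribution a noise level induces (my own illustration; the helper name `observation_distribution` is hypothetical, not from the repository — I assume a noise level `p` moves probability mass from the true observation to the other outcomes):

```julia
# Hypothetical helper: with noise level p, the true observation keeps
# probability 1 - p and the remaining p is spread over the other outcomes.
function observation_distribution(true_obs::Int, n_obs::Int, p::Float64)
    dist = fill(p / (n_obs - 1), n_obs)
    dist[true_obs] = 1 - p
    return dist
end

# With p = 0.0 the state is perfectly observable:
observation_distribution(2, 5, 0.0)  # [0.0, 1.0, 0.0, 0.0, 0.0]

# With p = 0.4 the state is only weakly observable:
observation_distribution(2, 5, 0.4)  # [0.1, 0.6, 0.1, 0.1, 0.1]
```

Under this reading, setting all noise levels to `0.0` makes every state perfectly observable, which is what removes the epistemic incentive to avoid those cells.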
Hey @meditans and @skoghoern, thanks for your responses. I'll try to give some insight.

Exactly, the point of the paper is that we no longer need to enumerate all possible sequences of actions. Essentially we are doing the same "trick" as variational inference: exact Bayesian inference is an intractable problem, since it requires enumerating all possible configurations. By introducing an auxiliary variational distribution, we turn that enumeration into a tractable optimization problem.

As for your second question, and here's where I have to correct @skoghoern a bit: you're right that the agent prefers stochastic transitions; this is what the epistemic prior on action essentially says. The awkward thing with this optimization objective is that the two epistemic priors have somewhat counteracting effects: one wants to maximize the entropy of beliefs over future states (to explore), while the other wants the entropy of future state beliefs to be minimal (so states can be identified). In my further work on this topic I have found that this objective is very hard to optimize because of these counteracting forces. It could very well be the case that putting in different values of transition and observation noise is detrimental to performance.

However, the behavior is perfectly rational. Think about it like this: the agent does not know whether the uncertainty in the transition distribution of these noisy states is epistemic or purely aleatoric (we know it is aleatoric), so for the agent it makes perfect sense to try these transitions; it is actively exploring them to reduce its uncertainty about them.
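The variational "trick" mentioned above can be made concrete on a toy discrete model (my own illustration, not code from the paper): instead of computing the posterior p(s | o) by enumeration, introduce a variational distribution q(s) and minimize the free energy F[q] = E_q[log q(s) - log p(s, o)], whose unique minimum is the exact posterior, with minimal value -log p(o):

```julia
# Toy model: hidden state s ∈ {1, 2, 3}, one observed outcome o.
prior      = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.7, 0.2]          # p(o | s) for the observed o

joint     = prior .* likelihood       # p(s, o)
posterior = joint ./ sum(joint)       # exact p(s | o), by enumeration

# Free energy of a candidate q(s); equals -log p(o) + KL(q ‖ posterior),
# so it is minimized exactly when q is the posterior.
free_energy(q) = sum(q .* (log.(q) .- log.(joint)))

F_uniform   = free_energy(fill(1/3, 3))
F_posterior = free_energy(posterior)   # ≤ F_uniform, equals -log p(o)
```

Enumeration is affordable here, but the free-energy objective stays well-defined when the state space (or the space of action sequences) is far too large to enumerate, which is the point being made above.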
Since this thread is about EFEasVFE:

```julia
@model function efe_tmaze_agent(reward_observation_tensor, location_transition_tensor, prior_location, prior_reward_location, reward_to_location_mapping, u_prev, T, reward_observation, location_observation)
    old_location ~ prior_location               # = beliefs.location
    reward_location ~ prior_reward_location     # = beliefs.reward_location
    current_location ~ DiscreteTransition(old_location, location_transition_tensor, u_prev)
    location_observation ~ DiscreteTransition(current_location, diageye(5))
    reward_observation ~ DiscreteTransition(current_location, reward_observation_tensor, reward_location)
    previous_location = current_location
    for t in 1:T
        # Epistemic Action Prior (Exploration Node) - ideally u[t] ~ Categorical([0.25, 0.25, 0.25, 0.25])
        u[t] ~ Categorical(calculate_epistemic_action_prior(reward_observation))
        location[t] ~ DiscreteTransition(previous_location, location_transition_tensor, u[t])
        # Epistemic State Prior (Ambiguity Node) - ideally location[t] ~ Categorical([1/3, 1/6, 1/6, 1/6, 1/6])
        location[t] ~ Categorical(calculate_epistemic_prior_vec(reward_observation_tensor))
        previous_location = location[t]
    end
    location[end] ~ DiscreteTransition(reward_location, reward_to_location_mapping)
end
```

where I defined:

```julia
function calculate_epistemic_action_prior(tensor)
    return [0.25, 0.25, 0.25, 0.25]
end

function calculate_epistemic_prior_vec(tensor)
    return [1/3, 1/6, 1/6, 1/6, 1/6]
end
```
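The two helper functions above ignore their tensor argument and return the constant vectors from the comments. Purely as a sketch of how such a prior could instead be derived from the observation model (this is my guess at the intent, not the paper's actual rule; the function name is hypothetical), one might weight states by how unambiguous their observations are:

```julia
# Hypothetical sketch: favor states whose observation distributions have
# low entropy (i.e. unambiguous observations). NOT the paper's definition.
entropy(p) = -sum(x -> x > 0 ? x * log(x) : 0.0, p)

function epistemic_prior_from_tensor(A::Matrix{Float64})
    # A[o, s] = p(o | s); compute one entropy per hidden state (per column)
    H = [entropy(A[:, s]) for s in 1:size(A, 2)]
    w = exp.(-H)              # low ambiguity → high weight
    return w ./ sum(w)
end

A = [1.0 0.5;                 # state 1 is observed perfectly,
     0.0 0.5]                 # state 2 is maximally ambiguous
epistemic_prior_from_tensor(A)  # puts more mass on state 1
```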
Hi! I recently read @wouterwln's, @Nimrais's and @ThijsvdLaar's very insightful paper, "A message passing realization of EFE minimization". While working through the code I distilled the stochastic maze example into a minimal single-file presentation that could eventually be added to the RxInfer examples. While trying to understand the ideas behind it, I've encountered six questions I'd like to hear your thoughts on. I apologize for the combined length of the post, but it seemed better than splitting the questions into separate threads.
1 - Closing `y_future[t]` half-edges
In the model definition, `y_future` is constrained with a `Categorical` distribution. I believe this constraint is added because otherwise the `y_future` edges would remain half-open. I tried substituting the constraint with `y_future[t] ~ Uninformative()`, but I couldn't find a combination that worked. If I understand correctly, the difference is that in the `Uninformative` case no message is sent from the node, while the `Categorical` node nudges the variable toward the uniform distribution. That seems somehow extraneous to the model. Why doesn't this cause problems in practice?
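To make my intuition for question 1 concrete: for discrete variables, incoming messages combine by elementwise product followed by normalization, so a uniform `Categorical` message should be multiplicatively neutral. A small standalone check (my own illustration; `combine` is a hypothetical helper, not RxInfer's internal machinery):

```julia
normalize_probs(v) = v ./ sum(v)

# Combine incoming discrete messages: elementwise product, then normalize.
combine(msgs...) = normalize_probs(reduce((a, b) -> a .* b, msgs))

m1 = [0.7, 0.2, 0.1]
m2 = [0.1, 0.1, 0.8]
uniform = [1/3, 1/3, 1/3]

# The uniform message changes nothing after normalization:
combine(m1, m2) ≈ combine(m1, m2, uniform)  # true
```

If this carries over to the real message-passing schedule, it would explain why the extraneous uniform prior is harmless in practice, though it doesn't explain the `Uninformative()` failure.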
2 - The lack of a prior on observations
The paper defines three types of priors:
Am I correct in thinking that the third type of node is absent because it concerns parameter estimation, and this model doesn't include parameters? For instance, if we were to learn the slippery coefficient of each state (which remains constant across runs), would we then need to include this prior? How?
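To illustrate what I mean by learning the slippery coefficient: in the discrete setting, the standard approach (my sketch, not necessarily the paper's) is a Dirichlet prior over each transition column, which is conjugate to the Categorical likelihood, so the posterior pseudo-counts are just prior counts plus observed counts:

```julia
# Conjugate Dirichlet update for one state's transition distribution.
# alpha: Dirichlet pseudo-counts; counts: observed next-state counts.
posterior_alpha(alpha, counts) = alpha .+ counts

# Posterior-mean estimate of the transition probabilities:
posterior_mean(alpha) = alpha ./ sum(alpha)

alpha0 = [1.0, 1.0, 1.0]     # uninformative prior
counts = [8, 1, 1]           # transitions observed from this state across runs
α = posterior_alpha(alpha0, counts)
posterior_mean(α)            # ≈ [0.69, 0.15, 0.15]: state looks mostly deterministic
```

I assume the third prior type in the paper would play this role, placing a distribution over the tensor entries themselves; how that is wired into the model is exactly what I'm asking.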
3 - On missing marginal rules
While writing a minimal version of the code, I realized that if the marginal rules that introduce the handling of `JointMarginalStorage` were missing, the result would be incorrect (since the version without an explicit meta would be used, and `JointMarginalMeta` would never be updated), yet crucially no error would be raised. Would it be worth notifying the user when the meta parameter goes unused, to prevent this kind of issue?
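The silent fallback I describe comes straight from multiple dispatch: a generic method happily accepts the meta and ignores it, so nothing errors. A minimal standalone illustration (all names here are mine, not from the repository):

```julia
struct MyMeta
    payload::Vector{Float64}
end

# Generic rule: accepts any meta and silently ignores it.
rule(msg, meta) = msg

# The specific method that actually uses the meta. If this definition
# were missing, the call below would silently fall back to the generic one.
rule(msg, meta::MyMeta) = msg .* meta.payload

rule([1.0, 2.0], nothing)              # generic path: returns msg unchanged
rule([1.0, 2.0], MyMeta([10.0, 0.5]))  # specific path: [10.0, 1.0]
```

Deleting the `meta::MyMeta` method changes the second result without any warning, which is exactly the failure mode I hit.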
4 - On which rules are fired
This is related to the previous question. While debugging that problem, I wanted to know which `@rule`s are fired during the inference process. Is there a recommended way to inspect this? By the way, this might be something that could be shown to the user in the new observation library, @bvdmitri.
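For plain Julia functions, `InteractiveUtils.@which` reports the method a given call would invoke; whether this composes usefully with the `@rule` machinery I don't know, but it was my starting point (a generic sketch, not ReactiveMP-specific):

```julia
using InteractiveUtils  # provides @which

g(x::Int) = "int method"
g(x::AbstractFloat) = "float method"

m1 = @which g(1)    # the Method object for g(x::Int64)
m2 = @which g(1.0)  # the Method object for g(x::AbstractFloat)
println(m1)
println(m2)
```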
5 - On the meaning of the `sumdims` parameter
In the definition of `JointMarginalMetaComponent`, there's a `sumdims` parameter. Since it's never used as a runtime value, I assume it helps select the correct dispatch as a type (similar to `out_dim` and `in_dim`). Is that correct? Could you explain why it's needed and what it represents?
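The pattern I'm guessing at (a value carried at the type level purely to select a method) is common in Julia; a generic standalone sketch of it, unrelated to the actual `sumdims` semantics:

```julia
# A value can be lifted into the type domain with Val so that dispatch,
# rather than a runtime branch, selects the implementation.
sum_over(x::Matrix, ::Val{1}) = vec(sum(x; dims=1))  # collapse rows
sum_over(x::Matrix, ::Val{2}) = vec(sum(x; dims=2))  # collapse columns

M = [1.0 2.0;
     3.0 4.0]
sum_over(M, Val(1))  # [4.0, 6.0]
sum_over(M, Val(2))  # [3.0, 7.0]
```

If `sumdims` works like this, it would explain why it never appears as a runtime value; I'd still like to know what the chosen dimensions represent in the joint marginal.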
6 - On the true dependencies of Exploration and Ambiguity nodes
Consider the `Exploration` node. Nominally, it depends on the `y_current` variable (based on how the model is connected), but the rule doesn't actually use the information coming from that node. In fact, the only thing that matters for exploration is the joint distribution calculated during the marginal rule in the state transition node, which is passed through via `JointMarginalMeta`. Here's a crude drawing: the red connections represent the hidden links through which the marginal information is shared. Again, both `Exploration` and `Ambiguity` nodes nominally depend on `y_current`.
My question is: is `y_current` simply a placeholder input that we need because an incoming message is required to trigger the rule? Could we have used something else instead? Moreover, regarding the scheduling of messages, what prompts the `Exploration` node to send a message? It can't be `y_current` again (?), and I'm not sure whether a change in the meta can trigger a message from the node. I'm just confused about this.