Description
The order of vulnerability execution determines the order in which nodes are added to `__discovered_nodes` internally. Existing functionality then uses each node's resulting integer value (1, 2, 3, 4, etc.) for downstream tasks.
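To make the concern concrete, here is a minimal sketch. It is not CyberBattleSim's actual code. It assumes that discovered nodes are appended to a list in discovery order and that a node's integer encoding is its position in that list. The `discover` helper and the node names are hypothetical.

```python
from typing import Dict, List


def discover(order: List[str]) -> Dict[str, int]:
    """Hypothetical stand-in for appending to a discovered-nodes list.

    Each node ID is appended the first time it is discovered, and its
    integer encoding is its position in that list.
    """
    discovered: List[str] = []
    for node_id in order:
        if node_id not in discovered:
            discovered.append(node_id)
    return {node_id: index for index, node_id in enumerate(discovered)}


# Two episodes that reach the same nodes through a different exploit order.
episode_a = discover(["client", "website", "sharepoint"])
episode_b = discover(["client", "sharepoint", "website"])

print(episode_a["website"], episode_b["website"])  # prints "1 2"
```

The same node ("website") receives a different integer in each episode purely because of the order in which it was reached.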
This also means that action masking depends on the order of vulnerability execution (repro).
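A minimal sketch of why the mask inherits that dependence. It assumes, hypothetically, that the mask for a remote-exploit action is a boolean vector indexed by the discovered-node encoding, with `True` only at targets where the exploit applies. `remote_exploit_mask`, `node_index`, and `exploitable` are illustrative names, not the library's API.

```python
from typing import Dict, List, Set


def remote_exploit_mask(node_index: Dict[str, int], exploitable: Set[str], size: int) -> List[bool]:
    """Hypothetical mask: True at the integer index of each exploitable node."""
    mask = [False] * size
    for node_id in exploitable:
        if node_id in node_index:
            mask[node_index[node_id]] = True
    return mask


# Same environment state in both episodes: only "website" is exploitable.
exploitable = {"website"}
# Discovery-order encodings from two episodes (see the assumption above).
index_a = {"client": 0, "website": 1, "sharepoint": 2}
index_b = {"client": 0, "sharepoint": 1, "website": 2}

mask_a = remote_exploit_mask(index_a, exploitable, size=3)
mask_b = remote_exploit_mask(index_b, exploitable, size=3)
print(mask_a)  # [False, True, False]
print(mask_b)  # [False, False, True]
```

Under these assumptions, the valid action sits at a different position in each episode even though the underlying network state is identical.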
Doesn't that mean that, during training, the state-action value approximations are based on an integer encoding that changes with the order of actions after every reset? Checkpointing and transfer learning would also suffer.
For ports, local/remote vulnerabilities, etc., you use `model.Environment.identifiers` when mapping to and from the integer encoding, so those encodings are fixed.
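For contrast, a sketch of an encoding backed by a fixed vocabulary. It assumes a static list of identifiers defined once with the environment, loosely in the spirit of `model.Environment.identifiers`. `PORTS`, `encode`, and `encode_all` are hypothetical names, and the port list is illustrative.

```python
from typing import Dict, List

# Hypothetical fixed vocabulary, defined once alongside the environment.
PORTS: List[str] = ["HTTPS", "SSH", "RDP", "SMB"]


def encode(port: str) -> int:
    """Look up a port's integer encoding in the fixed vocabulary."""
    return PORTS.index(port)


def encode_all(seen: List[str]) -> Dict[str, int]:
    """Encode every port seen in an episode, in whatever order it was seen."""
    return {port: encode(port) for port in seen}


# Two episodes that encounter the same ports in a different order.
a = encode_all(["SSH", "HTTPS"])
b = encode_all(["HTTPS", "SSH"])
print(a == b)  # True
```

Because the lookup goes through a list that never changes, the integer assigned to "SSH" is the same in every episode, regardless of when it is first encountered.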
Let me know if you need any clarification, or if I'm missing something here.
Thanks!