Open
Labels: enhancement (New feature or request)
Description
- Here, the reward at each step is computed as `reward = max(0., reward)`. I see the penalty code in actions.py and I understand why you cancel the penalties: when I tried removing this max() operation, the reward became highly negative and the agent learned nothing. However, clipping to zero makes the agent overfit because the reward is so sparse. I think adding some time cost, such as -1 or -0.5 per step, is necessary (see the first sketch after this list).
- Here, when granting NEW_SUCCESSFULL_ATTACK_REWARD, the code does not take into account whether the attacked node is already owned. It is meaningless to attack a node the attacker already owns; this makes the agent repeatedly launch attacks between owned nodes and ignore discovering new nodes (see the second sketch below).
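
To make the first point concrete, here is a minimal sketch of the shaping I have in mind. The clipping expression is quoted from the linked code; `STEP_COST` and `shape_reward` are hypothetical names I am introducing, and this is just one way to read "add a time cost":

```python
# Sketch: replace the hard clip `reward = max(0., reward)` with a small
# constant time cost on unproductive steps. STEP_COST is a value I made
# up; -1 or -0.5 both seem reasonable.
STEP_COST = -0.5

def shape_reward(raw_reward: float) -> float:
    """Keep positive rewards, but charge a small time cost otherwise.

    Clipping everything to 0 leaves a very sparse signal, while letting
    the full penalties through makes the return hugely negative; clipping
    the negative side at STEP_COST sits between the two.
    """
    return raw_reward if raw_reward > 0 else STEP_COST
```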
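
For the second point, the guard could look something like the sketch below. I am guessing at how ownership is tracked (an `agent_installed`-style flag on the node record) and at the constant's value; substitute whatever actions.py actually uses:

```python
from dataclasses import dataclass

# Minimal stand-in for the node record in actions.py; the real class
# carries much more state, and the flag name is my assumption.
@dataclass
class NodeInfo:
    agent_installed: bool  # True once the attacker owns the node

NEW_SUCCESSFULL_ATTACK_REWARD = 7.0  # placeholder value

def successful_attack_reward(node: NodeInfo) -> float:
    """Pay the attack reward only if the target is not already owned."""
    if node.agent_installed:
        # Re-attacking an owned node is useless, so give nothing;
        # otherwise the agent farms reward between owned nodes
        # instead of discovering new ones.
        return 0.0
    return NEW_SUCCESSFULL_ATTACK_REWARD
```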
In my experiment, I trained an agent with the original reward design in the chain env. The agent can perfectly take ownership of the network during training, but when I saved the model and evaluated it with epsilon-greedy, the success rate was only about 90%. When I patched the two points proposed above and trained an agent with the same parameters, the evaluation success rate was about 100%. I think the original reward design makes the agent overfit. (The evaluation loop I used is sketched below.)
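
By "evaluate with epsilon-greedy" I mean the usual noisy rollout, roughly the loop below; `model.best_action`, the `network_owned` flag, and the epsilon value are placeholders for my actual setup, assuming a gym-style env:

```python
import random

def evaluate(model, env, episodes: int = 100, epsilon: float = 0.05) -> float:
    """Fraction of episodes in which the agent owns the whole network."""
    successes = 0
    for _ in range(episodes):
        obs, done, info = env.reset(), False, {}
        while not done:
            if random.random() < epsilon:
                action = env.action_space.sample()   # random exploration
            else:
                action = model.best_action(obs)      # greedy w.r.t. the model
            obs, reward, done, info = env.step(action)
        successes += int(info.get("network_owned", False))
    return successes / episodes
```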
Could you please take a look at the two points and give some feedback? Anyway, thanks again for your code; it helps with my research, and I would even like to use it in my next research project about online learning :)
Originally posted by @sherdencooper in #46 (comment)