Question about the monotonic improvement guarantee of MAT.

Great work!

I am very interested in why MAT can keep the monotonic improvement guarantee while avoiding sequential updates.

To guarantee monotonic improvement, HAPPO updates the policies one by one during training, with each update using the results of the previous ones. That means that before we can update ${\pi}^2_{old}$, we have to wait for ${\pi}^1_{new}$.

The paper discusses this issue only briefly:
![image](https://github.com/PKU-MARL/Multi-Agent-Transformer/assets/12772774/f087d8de-c8d0-4103-864d-3b421adbc90e)

After carefully checking the HAPPO paper, I found that MAT's Eq. 5 is not the same as Eq. 11 in the HAPPO paper. Specifically, MAT's Eq. 5 drops the first term of $M^{i_{1:m}}$, which depends on the previous update results, e.g., ${\pi}^1_{new}$.
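
To make the difference concrete, here is a minimal sketch of how I understand the HAPPO term. It assumes that $M^{i_{1:m}}$ is the joint advantage multiplied by the probability ratios (new over old) of the agents updated earlier in the order. The function name `happo_m_term` and the numbers are hypothetical and chosen only for illustration; they are not taken from either codebase.

```python
import numpy as np

def happo_m_term(old_probs, new_probs, advantage, m):
    """Sketch of M^{i_{1:m}} for the m-th agent in the update order (1-indexed).

    Assumption: M is the joint advantage scaled by the new/old probability
    ratios of agents i_1..i_{m-1}, which have already been updated.
    """
    # Product over the preceding agents; np.prod of an empty slice is 1.0,
    # so the first agent simply sees the plain advantage.
    prev_ratio = np.prod(new_probs[:m - 1] / old_probs[:m - 1])
    return prev_ratio * advantage

# Hypothetical per-agent probabilities of the sampled actions.
old_probs = np.array([0.5, 0.4, 0.25])
new_probs = np.array([0.6, 0.4, 0.25])  # only agent 1 has changed its policy
advantage = 2.0

m_values = [happo_m_term(old_probs, new_probs, advantage, m) for m in (1, 2, 3)]
print(m_values)
```

In this sketch the term for agent 2 cannot be computed until agent 1's new probabilities are known, which is the sequential dependency I mean. If the ratio is dropped, every agent sees only the plain advantage.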

**Can you explain why Eq. 5 can guarantee monotonic improvement?**

This question has been bothering me for a long time, and I look forward to your reply.

![image](https://github.com/PKU-MARL/Multi-Agent-Transformer/assets/12772774/6bf1e3b5-785a-487b-aef1-b7f9848aa3f2)

![image](https://github.com/PKU-MARL/Multi-Agent-Transformer/assets/12772774/cb8286b8-5bee-4b91-9f1e-03dcd77d3cd2)

![image](https://github.com/PKU-MARL/Multi-Agent-Transformer/assets/12772774/fb508741-d125-46eb-9be0-427c6ecabc8c)

