
Help needed to reproduce paper's experiments #7

@adalrsjr1

Hello @MikukuOvO, I'm a PhD in autonomic computing and runtime adaptation, and I got super interested in your work. I read the paper "The Vision of Autonomic Computing: Can LLMs Make It a Reality?" and watched the video. I tried to experiment with the LLM-based multi-agents; however, I couldn't reproduce the experiments, and the outcomes are quite different from the paper and video.

I tried to run auto-kube against my local LLMs (Ollama -- deepseekr1-1.4 and llama3.2) and it didn't work: it is missing cloudgpt_aoai.py. Then I tried the code from paper_artifact_arXiv_2407_14402, and this one seems to work.
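
For context, this is roughly how I'm calling Ollama in place of the Azure OpenAI client (a simplified sketch; the function name and signature are my own approximation, not the repo's actual interface):

```python
# Stand-in for the missing cloudgpt_aoai.py: route the agents' chat calls to
# Ollama's OpenAI-compatible endpoint instead of Azure OpenAI.
# (The function name below is my own guess at what the agents need, not the
#  repo's actual interface.)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # Ollama accepts any non-empty key
)

def get_chat_completion(messages, model="llama3.2", temperature=0.0):
    """Minimal replacement for the Azure OpenAI chat-completion call."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content
```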

I tried a few examples as in the README:

  • on working_mechanism_1: report CPU from your component --component catalogue, and scale your component to three replicas --component catalogue
  • on working_mechanism_2: Reduce the total P99 latency of catalogue and front-end to under 400 ms --components catalogue,front-end

In all cases, I observed the agents reasoning and trying many different things, but they never converge on what I asked. For example, when I request report CPU from your component --component catalogue on working_mechanism_1, the agent diverges after a dozen iterations and starts trying to fetch the service's response time instead of its CPU, stopping after 50 iterations with no result.

I've tried different models (deepseek-r1:32b, deepseek-r1:14b, llama3.2, etc.), and in all cases the outcome diverges from the original task. I'm not sure whether the issue is that the models are too generic or that the setup prompts need to be fine-tuned.

Could you provide some guidance? Am I missing anything?

Besides GPT, do you have any suggestions about which other models might perform better? (I don't have access to the GPT models.)

Regarding the prompts, have you tried variations on them? Do you have any hints on how I could customize them to achieve better results?
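
To make that question concrete, this is the kind of constraint I had in mind adding on top of the existing system prompt, just to keep smaller local models pinned to the requested metric and component (purely illustrative wording on my side, not anything taken from the repo):

```python
# Illustrative only: an extra instruction prepended to the agents' system prompt
# so that smaller local models stay on the requested metric and component
# instead of drifting (e.g. fetching latency when CPU was asked for).
TASK_GUARD = (
    "You are operating on exactly one component: {component}. "
    "The user asked about exactly one metric: {metric}. "
    "Do not query or reason about any other metric. "
    "If the metric cannot be retrieved, say so and stop; do not switch metrics."
)

def build_system_prompt(base_prompt: str, component: str, metric: str) -> str:
    """Prepend the task guard to whatever system prompt the agent already uses."""
    return TASK_GUARD.format(component=component, metric=metric) + "\n\n" + base_prompt
```

Is something along these lines the direction you'd suggest, or would you tweak the prompts differently?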

Thx!
