
Help needed to reproduce paper's experiments #7

@adalrsjr1

Hello @MikukuOvO, I'm a PhD in autonomic computing and runtime adaptation, and I got super interested in your work. I read the paper "The Vision of Autonomic Computing: Can LLMs Make It a Reality?" and watched the video. I tried to experiment with the LLM-based multi-agents; however, I couldn't reproduce the experiments, and the outcomes are quite different from the paper and video.

I tried to run auto-kube against my local LLMs (Ollama -- deepseekr1-1.4 and llama3.2) and it didn't work: it is missing cloudgpt_aoai.py. Then I tried the code from paper_artifact_arXiv_2407_14402, and this one seems to work.
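
For context, this is roughly how I'm calling Ollama in place of the Azure OpenAI client (a simplified sketch; the function name and signature are my own approximation, not the repo's actual interface):

```python
# Stand-in for the missing cloudgpt_aoai.py: route the agents' chat calls to
# Ollama's OpenAI-compatible endpoint instead of Azure OpenAI.
# (The function name below is my own guess at what the agents need, not the
#  repo's actual interface.)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # Ollama accepts any non-empty key
)

def get_chat_completion(messages, model="llama3.2", temperature=0.0):
    """Minimal replacement for the Azure OpenAI chat-completion call."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content
```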

I tried a few examples as in the README:

  • on working_mechanism_1: report CPU from your component --component catalogue, and scale your component to three replicas --component catalogue
  • on working_mechanism_2: Reduce the total P99 latency of catalogue and front-end to under 400 ms --components catalogue,front-end

In all cases, I observed the agents reasoning and trying many different things, but they never converge on what I asked. For example, when I request report CPU from your component --component catalogue on working_mechanism_1, the agent diverges after a dozen iterations and starts trying to fetch the service's response time instead of its CPU, stopping after 50 iterations with no result.

I've tried different models (deepseek-r1:32b, deepseek-r1:14b, llama3.2, etc.), and in all cases the outcome diverges from the original task. I'm not sure whether the issue is that the models are too generic or that the setup prompts need to be fine-tuned.

Could you provide some guidance? Am I missing anything?

Besides GPT, do you have any suggestions about which other models might perform better? (I don't have access to the GPT models.)

Regarding the prompts, have you tried variations on them? Do you have any hints on how I could customize them to achieve better results?
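
To make that question concrete, this is the kind of constraint I had in mind adding on top of the existing system prompt, just to keep smaller local models pinned to the requested metric and component (purely illustrative wording on my side, not anything taken from the repo):

```python
# Illustrative only: an extra instruction prepended to the agents' system prompt
# so that smaller local models stay on the requested metric and component
# instead of drifting (e.g. fetching latency when CPU was asked for).
TASK_GUARD = (
    "You are operating on exactly one component: {component}. "
    "The user asked about exactly one metric: {metric}. "
    "Do not query or reason about any other metric. "
    "If the metric cannot be retrieved, say so and stop; do not switch metrics."
)

def build_system_prompt(base_prompt: str, component: str, metric: str) -> str:
    """Prepend the task guard to whatever system prompt the agent already uses."""
    return TASK_GUARD.format(component=component, metric=metric) + "\n\n" + base_prompt
```

Is something along these lines the direction you'd suggest, or would you tweak the prompts differently?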

Thx!
