Translation Gym


This is an environment for quickly implementing and benchmarking LLM-based program translation agents. As of now, we support only C to Rust translation.

Why use this tool?

Implementing an LLM-based solution for full-project translation can be tricky. You can't translate the entire project in one go, so you need to break it up into individual functions. These functions have to be extracted with their dependencies and translated. Then you need to incorporate each translated function into the project and test that the project still works.

Translation Gym takes care of all this effort! It gives you:

a function body
the functions that it calls
values of each argument and the returned value, from a sample execution
any global variables used by this function
definitions of any structure or enum used in this function

In exchange, you need to provide it with:

the function translation
"glue code" in the form of a wrapper function to interface with the source language

Translation Gym will merge this into the project, compile it, run tests, and give you:

compiler feedback
stdout/stderr of the run
a runtime trace with values of each argument and the returned value
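
To make the exchange above concrete, here is a minimal sketch of the three pieces of data as Python dataclasses. The class and field names are hypothetical and chosen only for illustration; they are not the types Translation Gym actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionContext:
    """What the environment hands to a translator (hypothetical field names)."""
    body: str                                                  # C source of the function
    callees: list[str] = field(default_factory=list)           # functions it calls
    io_samples: list[dict] = field(default_factory=list)       # argument/return values from a sample run
    globals_used: list[str] = field(default_factory=list)      # global variables it uses
    type_definitions: list[str] = field(default_factory=list)  # struct/enum definitions it uses


@dataclass
class Submission:
    """What the translator hands back (hypothetical field names)."""
    translation: str  # Rust translation of the function
    wrapper: str      # "glue code" that interfaces with the source language


@dataclass
class Feedback:
    """What comes back after merging, compiling, and testing (hypothetical field names)."""
    compiler_output: str
    stdout: str
    stderr: str
    trace: list[dict] = field(default_factory=list)  # runtime argument/return values


# Example context for a small C function, with one recorded sample execution.
ctx = FunctionContext(
    body="int add(int a, int b) { return a + b; }",
    io_samples=[{"args": {"a": 1, "b": 2}, "ret": 3}],
)
```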

You can use this feedback to repair your translation and provide Translation Gym with:

the repaired function translation
repaired "glue code"

In this manner, you can translate all the functions in the project! The entire process is highly customizable, and you can implement your own Translator, Validator and Orchestrator agents (or use our default implementations). We also provide a suite of C projects from Coreutils out of the box for benchmarking, and you can add your own datasets too.
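
The translate, validate, and repair cycle described above can be sketched as a small loop. This is an illustration under stated assumptions, not the tool's actual engine: `translate_with_repair` and its callback parameters are hypothetical names, and the convention that `validate` returns `None` on success is an assumption made for the sketch.

```python
from typing import Callable, Optional


def translate_with_repair(
    translate: Callable[[], str],
    validate: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
    num_attempts: int = 5,
) -> tuple[bool, int]:
    """Return (succeeded, attempts_used).

    `validate` returns None on success, or feedback text on failure (assumed convention).
    """
    candidate = translate()
    for attempt in range(1, num_attempts + 1):
        feedback = validate(candidate)
        if feedback is None:
            return True, attempt
        if attempt < num_attempts:
            # Use the feedback to produce a new candidate for the next attempt.
            candidate = repair(candidate, feedback)
    return False, num_attempts


# Toy stand-ins: the first candidate fails validation, the repaired one passes.
ok, attempts = translate_with_repair(
    translate=lambda: "fn add() {}",
    validate=lambda code: None if "i32" in code else "error: wrong signature",
    repair=lambda code, fb: "fn add(a: i32, b: i32) -> i32 { a + b }",
)
print(ok, attempts)  # True 2
```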

Quickstart

The easiest way to run this tool is with Docker and docker-compose. If you do not already have them installed, follow the official installation instructions for Docker and for docker-compose. To build the Docker container for our tool, run the following script:

bash build.sh

Each test dataset is built into a separate Docker container. Run the following commands to build all datasets (this should take about 10 minutes in total):

cd data
export USER_ID=$(id -u)
export GROUP_ID=$(id -g)
docker-compose build coreutils && docker-compose build

For LLM-based translation, we support OpenAI, Anthropic, and Google models. First put your API key in translation_gym/models/.env. For example:

# Use `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY` or `OPENAI_API_KEY`, depending on your use case
echo 'OPENAI_API_KEY="<your_key_here>"' > translation_gym/models/.env
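
If you want to check that the file contains one of the three supported variable names before starting a run, a short script along these lines may help. It is a sketch only: `read_env_keys` is a hypothetical helper, it handles just simple `KEY="value"` lines, and it does not reflect how Translation Gym itself reads the file.

```python
import tempfile
from pathlib import Path

SUPPORTED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def read_env_keys(path):
    """Parse simple KEY="value" lines and return the supported keys that are set."""
    found = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if name in SUPPORTED_KEYS and value:
            found[name] = value
    return found


# Demonstrated on a throwaway file rather than the real translation_gym/models/.env.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('OPENAI_API_KEY="<your_key_here>"\n')
    keys = read_env_keys(env_path)
    print(sorted(keys))  # ['OPENAI_API_KEY']
```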

Now you are ready to run translation.

bash run.sh toy gpt4o

Here, toy is the name of a sample C program, corresponding to a dataset config in data/datasets.json. gpt4o is a model name, defined in translation_gym/models/__init__.py. You should see output like this:

Translating code in directory: /app/data/toy
Copied over the code to /app/output/toy
Found executable target: toy
Compilation succeeded
Generated executable: /app/output/toy/target/debug/toy
Running tests against the following executable: /app/output/toy/target/debug/toy
Test passed: data/toy/tests/test.sh
Translating function: subtract
Calling LLM for translation
LLM response received
Running tests against the following executable: /app/output/toy/target/debug/toy
Attempt 1/5
Translation succeeded

At the end of this process, the translated Rust project will be in output/toy. The results will be printed like this:

+-------------+---------------+----------+
|   Function  |     Result    | Attempts |
+-------------+---------------+----------+
|   subtract  |    Success    |    1     |
| concatenate |    Success    |    1     |
|     add     |    Success    |    1     |
|    main_0   |    Success    |    5     |
+-------------+---------------+----------+
|   Overall   |      4/4      |          |
+-------------+---------------+----------+

These results are also logged to output/toy/log.json.
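
The layout of log.json is not described here, so the sketch below does not read that file. It assumes a hypothetical record shape (one dict per function with `function`, `result`, and `attempts` keys) and shows how the "Overall" row of the table could be computed from such records.

```python
def summarize(records):
    """Count successful functions and format the total like the table's last row."""
    passed = sum(1 for r in records if r["result"] == "Success")
    return f"{passed}/{len(records)}"


# Records mirroring the table above (hypothetical shape, not the real log.json schema).
records = [
    {"function": "subtract", "result": "Success", "attempts": 1},
    {"function": "concatenate", "result": "Success", "attempts": 1},
    {"function": "add", "result": "Success", "attempts": 1},
    {"function": "main_0", "result": "Success", "attempts": 5},
]
print(summarize(records))  # 4/4
```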

Customization

translation_gym is designed to be extensively customizable! To that end, we are continually adding functionality to allow developers to implement a wider range of translation algorithms. If you have any specific requests, please open an issue.

Currently, translation_gym allows you to implement your own Orchestrator, Translator, and Validator. The templates for each of those classes are in translation_gym/modules. For example, to implement your own custom translation logic, you would inherit from Translator as follows:

class MyTranslator(Translator):

    def __init__(self, arg1, ...):
        # Some code here

You need to implement two compulsory methods, translate and repair.

    def translate(self, func, source_manager, verbose=False):
        # Your translation logic here
        # Return a function translation in the specific format
    
    def repair(self, result, source_manager, verbose=False):
        # Your repair logic here
        # Return a repaired version of the function translation
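
As a self-contained illustration of this pattern, the sketch below defines a stand-in `Translator` base class so that it runs on its own. In the real project you would import the template from translation_gym/modules instead. The shapes of `func`, `result`, and the returned value are assumptions made for the example, since the exact translation format is not spelled out here.

```python
class Translator:
    """Stand-in for the template class in translation_gym/modules (assumed interface)."""

    def translate(self, func, source_manager, verbose=False):
        raise NotImplementedError

    def repair(self, result, source_manager, verbose=False):
        raise NotImplementedError


class CannedTranslator(Translator):
    """Toy translator that returns pre-written Rust code, to show the control flow only."""

    def __init__(self, canned):
        self.canned = canned    # hypothetical: maps a function name to Rust source
        self.repair_calls = 0

    def translate(self, func, source_manager, verbose=False):
        # `func` is treated as a dict with a "name" key; the real type is an assumption.
        if verbose:
            print(f"Translating function: {func['name']}")
        return {"translation": self.canned[func["name"]], "wrapper": ""}

    def repair(self, result, source_manager, verbose=False):
        # A real implementation would use the feedback in `result` to produce a fix.
        # With no model to consult, this toy version resubmits the previous attempt.
        self.repair_calls += 1
        return result["translation"]


t = CannedTranslator({"add": "pub fn add(a: i32, b: i32) -> i32 { a + b }"})
first = t.translate({"name": "add"}, source_manager=None)
again = t.repair({"translation": first, "feedback": "error[E0308]"}, source_manager=None)
```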

Now you can use MyTranslator as part of the translation routine. Modify main.py as follows:

...
orchestrator = DefaultOrchestrator()
my_translator = MyTranslator(arg1, ...) # Custom translation logic
validator = DefaultValidator(compile_attempts=5)

engine = TranslationEngine(dataset=dataset,
                           output_dir=args.output_dir,
                           model=args.model,
                           num_attempts=args.num_attempts,
                           verbose=args.verbose)

engine.run(translator=my_translator,
           orchestrator=orchestrator,
           validator=validator)
...

The translation can be run as usual:

bash run.sh toy gpt4o
