MLIR based Brainfuck Compiler in Python

This project is an experiment to implement a Compiler using MLIR. Brainfuck was choosen as a programming language for its simplicity.

To keep more true to the stack I would use for a real programming language a Lark based grammar is used to lex and parse the Brainfuck code. To create the MLIR i tried both, the native LLVM MLIR Python bindings, and the Python xDSL package.

Syntax Tree

So a simple Brainfuck code like >+>+[<] is parsed as this Syntax Tree:

Tree(Token('RULE', 'start'), [
    Token('MOVE_RIGHT', '>'), 
    Token('INCREMENT', '+'),
    Token('MOVE_RIGHT', '>'),
    Token('INCREMENT', '+'),
    Tree(Token('RULE', 'loop'), [
        Token('MOVE_LEFT', '<')
    ])
])

Initial "free" Dialect

This Tree is then converted (native, xDSL) to the first MLIR dialect (native, xDSL). This first dialect does not contain any SSA values, yet, but mimics the syntax tree very closely:

builtin.module {
  func.func @main() {
    "bf.free.right"() : () -> ()
    "bf.free.inc"() : () -> ()
    "bf.free.right"() : () -> ()
    "bf.free.inc"() : () -> ()
    "bf.free.loop"() ({
      "bf.free.left"() : () -> ()
    }) : () -> ()
    func.return
  }
}

Lowering to linked Dialect

In a first lowering step (native, xDSL) this "free" dialect is lowered to the "linked" dialect (native, xDSL). In this second dialect every Operation that uses the memory index receives the index as an operand. Every operation, that modifies the index will produce a new index as a result. (This lowering was accomplished with a custom Tree-Walk that seems to be unconvential for xDSL?)

builtin.module {
  func.func @main() {
    %0 = arith.constant 0 : index
    %1 = "bf.linked.right"(%0) : (index) -> index
    "bf.linked.inc"(%1) : (index) -> ()
    %2 = "bf.linked.right"(%1) : (index) -> index
    "bf.linked.inc"(%2) : (index) -> ()
    %3 = "bf.linked.loop"(%2) ({
    ^bb0(%4 : index):
      %5 = "bf.linked.left"(%4) : (index) -> index
      "bf.linked.loop.end"(%5) : (index) -> ()
    }) : (index) -> index
    func.return
  }
}

Lowering to builtins

Using a second lowering step (native, xDSL) these linked ops are converted to the builtin dialects. Namely, arith, scf, and memref. The output and input operands (not displayed here, though arguably the most interesting) are converted to llvm.inline_asm to get a read/write syscall.

builtin.module {
  func.func @main() {
    // init
    %0 = arith.constant 0 : index
    %const_one = arith.constant 1 : index
    %1 = arith.constant 0 : i8
    %const_one_ui8 = arith.constant 1 : i8
    %index_mask = arith.constant 32767 : index
    %const_size = arith.constant 32768 : index
    %memory = memref.alloc() : memref<32768xi8>
    scf.for %2 = %0 to %const_size step %const_one {
      memref.store %1, %memory[%2] : memref<32768xi8>
    }
    %3 = arith.constant 0 : index
    // >
    %4 = arith.addi %3, %const_one : index
    %5 = arith.andi %4, %index_mask : index
    // +
    %6 = memref.load %memory[%5] : memref<32768xi8>
    %7 = arith.addi %6, %const_one_ui8 : i8
    memref.store %7, %memory[%5] : memref<32768xi8>
    // >
    %8 = arith.addi %5, %const_one : index
    %9 = arith.andi %8, %index_mask : index
    // +
    %10 = memref.load %memory[%9] : memref<32768xi8>
    %11 = arith.addi %10, %const_one_ui8 : i8
    memref.store %11, %memory[%9] : memref<32768xi8>
    // [ ... ]
    %12 = scf.while (%13 = %9) : (index) -> index {
      %14 = memref.load %memory[%13] : memref<32768xi8>
      %15 = arith.constant 0 : i8
      %16 = arith.cmpi ugt, %14, %15 : i8
      scf.condition(%16) %13 : index
    } do {
    ^bb0(%17 : index):
      // <
      %18 = arith.subi %17, %const_one : index
      %19 = arith.andi %18, %index_mask : index
      scf.yield %19 : index
    }
    func.return
  }
}

This MLIR can then be further lowered/optimized using the mlir-opt tool. With the native bindings the necessary passes can be triggered from Python itself. Afterwards the optimized MLIR can be translated to LLVM-IR with mlir-translate, converted to assembly with llc and then compiled using clang. See the Makefiles (native, xDSL), that can be used to compile .bf code to .out exceutables, for the exact commands.

Devcontainer

This project contains a Devcontainer Configuration, however it compiles the llvm-project to get the mlir executables, so first startup can take a while and it is not very optimized in general.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
.devcontainer		.devcontainer
py_mlir_bf_compiler_native		py_mlir_bf_compiler_native
py_mlir_bf_compiler_xdsl		py_mlir_bf_compiler_xdsl
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Makefile_native		Makefile_native
Makefile_xdsl		Makefile_xdsl
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MLIR based Brainfuck Compiler in Python

Syntax Tree

Initial "free" Dialect

Lowering to linked Dialect

Lowering to builtins

Devcontainer

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MLIR based Brainfuck Compiler in Python

Syntax Tree

Initial "free" Dialect

Lowering to linked Dialect

Lowering to builtins

Devcontainer

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages