Lumi is a small educational programming language implementation written in C#/.NET. It is a hobby project intended to explore language design, parsing, bytecode generation and a simple virtual machine for executing the generated bytecode.
This repository contains several projects that together form the compiler/runtime toolchain:
| Project | Description |
|---|---|
| `Lumi.Lexer` | Lexer/tokenizer that converts source text into a stream of tokens. |
| `Lumi.AST` | Abstract Syntax Tree node types (Program, BinaryExpression, IfStatement, ForStatement, …) and helper structures (NodeSpan). |
| `Lumi.Parser` | Recursive-descent parser that consumes tokens and builds the AST. |
| `Lumi.SemanticAnalyzer` | Semantic analysis layer that validates variable definitions, type compatibility, and symbol references before bytecode generation. |
| `Lumi.Bytecode` | Bytecode definitions (instruction set, constant pool, locals) and a tree-walking bytecode generator that visits the AST and emits instructions. |
| `Lumi.VM` | Stack-based virtual machine that executes Lumi bytecode. |
| `Lumi.Engine` | Entry point: provides a REPL and script runner that wires lexer → parser → semantic analyzer → bytecode generator → VM. |
| `*.Tests` | MSTest unit-test projects for every layer (Lumi.Lexer.Tests, Lumi.AST.Tests, Lumi.Parser.Tests, Lumi.SemanticAnalyzer.Tests, Lumi.Bytecode.Tests, Lumi.VM.Tests, Lumi.Engine.Tests). |
- Open the solution in Visual Studio 2022/2026 or run `dotnet build` from the repository root (projects target .NET 10).
- Run tests from Test Explorer or `dotnet test`.
- Run `Lumi.Engine` to start the REPL or execute a `.lumi` script file.
The language currently supports:
- Literals: numbers (`42`), strings (`"hello"`), booleans (`true`/`false`).
- Arithmetic & comparison: `+`, `-`, `*`, `/`, `%`, `==`, `!=`, `<`, `>`, `<=`, `>=`.
- Logical operators: `&&`, `||`, `!`.
- Variable declarations: `let`, `var`, `const` with optional type annotations (`let x: int -> 42`).
- Control flow: `if`/`else` statements.
- Loops: `for` loops with an iterator, start/end range, and optional step (`for i in 0..10 step 2 { ... }`).
- Block scoping: braces create new lexical scopes; inner declarations shadow outer ones.
- Functions: `fn` declarations with parameters, return statements, and calls.
- Arrays: array literals and basic support (indexing and methods being added).
- Print: built-in `print` statement for output.
For a detailed analysis of implemented vs. missing language features and recommendations for what to add next, see LANGUAGE_FEATURES.md.
Source text
│
▼
Lexer tokenizes into a stream of tokens
│
▼
Parser builds an AST from the token stream
│
▼
SemanticAnalyzer validates variable definitions, type compatibility, symbol references
│
▼
BytecodeGen walks the AST, emits instructions + constant pool
│
▼
VM executes instructions on a value stack
| Category | Instructions |
|---|---|
| Stack | PushConst, Pop |
| Arithmetic | Add, Sub, Mul, Div, Mod, Inc, Dec |
| Comparison | Eq, Neq, Lt, Gt, Leq, Geq |
| Control flow | Jump, JumpIfTrue, JumpIfFalse |
| Variables | LoadVar, StoreVar |
| Functions | CallFn, Return |
| Misc | Print, Nop, Halt |
Each instruction is an Instruction value with an InstructionKind discriminator and an optional int or string operand.
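To make the instruction shape and the stack-based execution model concrete, here is a minimal sketch of a dispatch loop. It is not Lumi's actual VM: `OpSketch`, `InstructionSketch` and `MiniVm` are hypothetical names, only a handful of the instructions from the table are modelled, values are plain `double`s, and `Print` collects output into a list instead of writing to the console.

```csharp
using System;
using System.Collections.Generic;

// Usage: evaluate (2 + 3) * 4 and print the result.
var constants = new double[] { 2, 3, 4 };
var program = new[]
{
    new InstructionSketch(OpSketch.PushConst, 0),
    new InstructionSketch(OpSketch.PushConst, 1),
    new InstructionSketch(OpSketch.Add),
    new InstructionSketch(OpSketch.PushConst, 2),
    new InstructionSketch(OpSketch.Mul),
    new InstructionSketch(OpSketch.Print),
    new InstructionSketch(OpSketch.Halt),
};
Console.WriteLine(string.Join(",", MiniVm.Run(program, constants))); // 20

// Hypothetical subset of the instruction set (assumption, not Lumi's enum).
public enum OpSketch { PushConst, Add, Mul, Print, Halt }

// Plain readonly struct: a kind discriminator plus an optional int operand.
public readonly struct InstructionSketch
{
    public OpSketch Kind { get; }
    public int Operand { get; }

    public InstructionSketch(OpSketch kind, int operand = 0)
    {
        Kind = kind;
        Operand = operand;
    }
}

public static class MiniVm
{
    // Executes the code on a value stack and returns everything that was printed.
    public static List<double> Run(IReadOnlyList<InstructionSketch> code,
                                   IReadOnlyList<double> constants)
    {
        var stack = new Stack<double>();
        var printed = new List<double>();
        int pc = 0; // program counter: index of the next instruction

        while (pc < code.Count)
        {
            InstructionSketch ins = code[pc++];
            switch (ins.Kind)
            {
                case OpSketch.PushConst:
                    stack.Push(constants[ins.Operand]); // operand is a pool index
                    break;
                case OpSketch.Add:
                {
                    double right = stack.Pop();
                    double left = stack.Pop();
                    stack.Push(left + right);
                    break;
                }
                case OpSketch.Mul:
                {
                    double right = stack.Pop();
                    double left = stack.Pop();
                    stack.Push(left * right);
                    break;
                }
                case OpSketch.Print:
                    printed.Add(stack.Pop());
                    break;
                case OpSketch.Halt:
                    return printed;
            }
        }
        return printed;
    }
}
```

Binary operators pop the right operand first because it was pushed last; this matters for non-commutative instructions such as `Sub` and `Div`.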
The ConstantPool stores literal values (numbers, strings, booleans, null, undefined). Number and string constants are deduplicated so repeated literals occupy a single pool slot. Instructions reference constants by index via PushConst <index>.
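The deduplication described above can be sketched as follows. This is an illustration under assumptions, not the real `ConstantPool`: the names `ConstantKindSketch`, `ConstantSketch` and `ConstantPoolSketch` are hypothetical, and the choice to simply append non-number, non-string constants without a lookup is a guess at one reasonable reading of the text.

```csharp
using System;
using System.Collections.Generic;

// Usage: the repeated literal 42 reuses slot 0.
var pool = new ConstantPoolSketch();
int a = pool.Add(new ConstantSketch(ConstantKindSketch.Number, 42.0));
int b = pool.Add(new ConstantSketch(ConstantKindSketch.String, "hello"));
int c = pool.Add(new ConstantSketch(ConstantKindSketch.Number, 42.0));
Console.WriteLine($"{a} {b} {c} count={pool.Count}"); // 0 1 0 count=2

// Hypothetical names; Lumi's real types may look different.
public enum ConstantKindSketch { Number, String, Boolean, Null, Undefined }

// A record struct gets value equality for free, so it can be a dictionary key.
public readonly record struct ConstantSketch(ConstantKindSketch Kind, object Value);

public sealed class ConstantPoolSketch
{
    private readonly List<ConstantSketch> _constants = new List<ConstantSketch>();
    private readonly Dictionary<ConstantSketch, int> _indexByValue =
        new Dictionary<ConstantSketch, int>();

    public int Count => _constants.Count;

    // Returns the pool index to use as the operand of PushConst.
    public int Add(ConstantSketch constant)
    {
        // Assumption: only numbers and strings are deduplicated.
        bool dedup = constant.Kind is ConstantKindSketch.Number
                                   or ConstantKindSketch.String;

        if (dedup && _indexByValue.TryGetValue(constant, out int existing))
            return existing;

        int index = _constants.Count;
        _constants.Add(constant);
        if (dedup) _indexByValue[constant] = index;
        return index;
    }

    public ConstantSketch Get(int index) => _constants[index];
}
```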
The LocalManager maintains a stack of scopes. Each scope is a dictionary mapping variable names to Local entries (name, LocalKind, slot Label, declared VarType). EnterScope / ExitScope push and pop scopes; lookup walks from innermost to outermost, enabling shadowing.
Forward jumps (e.g. skipping an else-branch) are emitted with a placeholder operand and a Label. When the target position is reached, PatchLabel back-patches every recorded jump site to point at _instructions.Count. Backward jumps (e.g. the loop-back in a for statement) record the target position directly at emit time and do not use the patching mechanism.
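The back-patching scheme for forward jumps can be sketched like this. The field names `_instructions` and `_unpatchedJumps` and the method `PatchLabel` are taken from the text; everything else (`EmitterSketch`, `NewLabel`, `EmitJump`, labels as plain `int`s, instructions as tuples, `-1` as the placeholder) is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Usage: an if-without-else shape that skips the body when the condition is false.
var emitter = new EmitterSketch();
int endLabel = emitter.NewLabel();
emitter.Emit("PushConst", 0);              // index 0: condition
emitter.EmitJump("JumpIfFalse", endLabel); // index 1: target not known yet
emitter.Emit("Print", 0);                  // index 2: body
emitter.PatchLabel(endLabel);              // target becomes index 3
emitter.Emit("Halt", 0);                   // index 3
Console.WriteLine(emitter.Code[1].Operand); // 3

public sealed class EmitterSketch
{
    private readonly List<(string Op, int Operand)> _instructions =
        new List<(string Op, int Operand)>();

    // label id -> indices of jump instructions still waiting for a target
    private readonly Dictionary<int, List<int>> _unpatchedJumps =
        new Dictionary<int, List<int>>();

    private int _nextLabel;

    public IReadOnlyList<(string Op, int Operand)> Code => _instructions;

    public int NewLabel()
    {
        int label = _nextLabel++;
        _unpatchedJumps[label] = new List<int>();
        return label;
    }

    public void Emit(string op, int operand) => _instructions.Add((op, operand));

    // Emits a forward jump with a placeholder operand and records its position.
    public void EmitJump(string op, int label)
    {
        _unpatchedJumps[label].Add(_instructions.Count);
        _instructions.Add((op, -1));
    }

    // Points every recorded jump site at the next instruction to be emitted.
    public void PatchLabel(int label)
    {
        int target = _instructions.Count;
        foreach (int site in _unpatchedJumps[label])
            _instructions[site] = (_instructions[site].Op, target);
        _unpatchedJumps.Remove(label);
    }
}
```

A backward jump needs none of this: the loop start index is already known, so it can be passed straight to `Emit` as the operand.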
The SemanticAnalyzer performs validation on the AST before bytecode generation. It uses a visitor pattern similar to the bytecode generator:
- Symbol management: A `ScopeManager` maintains a stack of scopes (dictionaries mapping names to `Symbol` entries). Each scope tracks variable declarations and function definitions.
- Type tracking: `Symbol` records include the inferred or declared type of each variable (`TypeKind`: `Number`, `String`, `Boolean`, `Unknown`, etc.) and whether it is read-only (`const`).
- Validation rules:
  - Undefined variables: A reference to an undefined variable raises an error before bytecode generation.
  - Redefinition: Declaring the same variable twice in the same scope raises an error.
  - Const reassignment: Attempting to assign to a `const` variable raises an error.
  - Invalid assignment targets: Assigning to a literal or non-identifier raises an error.
  - Scope rules: Variables are looked up from the innermost to outermost scope, enabling shadowing; variables declared in an inner scope are not visible outside that scope.
The semantic analyzer returns a SemanticAnalysisResult containing any errors found. The compiler stops and reports errors before attempting bytecode generation if semantic analysis fails.
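A few of the validation rules above (undefined variables, redefinition, const reassignment, shadowing) can be illustrated with a small scope stack that collects errors instead of throwing. This is a sketch only: `ScopeCheckerSketch` and its methods are hypothetical, the real analyzer walks the AST with a visitor and tracks types, and the error messages here are invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Usage: three of the six operations below are rule violations.
var checker = new ScopeCheckerSketch();
checker.Declare("limit", isConst: true);
checker.EnterScope();
checker.Declare("limit", isConst: false); // fine: shadows the outer const
checker.Assign("limit");                  // fine: the inner one is mutable
checker.ExitScope();
checker.Assign("limit");                  // error: const reassignment
checker.Reference("missing");             // error: undefined variable
checker.Declare("limit", isConst: false); // error: redefinition in same scope
foreach (string error in checker.Errors) Console.WriteLine(error);

public sealed class ScopeCheckerSketch
{
    // Each scope maps a variable name to "is it const?". Index 0 is the global scope.
    private readonly List<Dictionary<string, bool>> _scopes =
        new List<Dictionary<string, bool>> { new Dictionary<string, bool>() };

    public List<string> Errors { get; } = new List<string>();

    public void EnterScope() => _scopes.Add(new Dictionary<string, bool>());

    public void ExitScope() => _scopes.RemoveAt(_scopes.Count - 1);

    public void Declare(string name, bool isConst)
    {
        Dictionary<string, bool> current = _scopes[_scopes.Count - 1];
        if (!current.TryAdd(name, isConst))
            Errors.Add($"'{name}' is already declared in this scope");
    }

    public void Reference(string name)
    {
        if (!TryLookup(name, out _))
            Errors.Add($"'{name}' is not defined");
    }

    public void Assign(string name)
    {
        if (!TryLookup(name, out bool isConst))
            Errors.Add($"'{name}' is not defined");
        else if (isConst)
            Errors.Add($"cannot assign to const '{name}'");
    }

    // Walks from the innermost scope outwards, which is what enables shadowing.
    private bool TryLookup(string name, out bool isConst)
    {
        for (int i = _scopes.Count - 1; i >= 0; i--)
        {
            if (_scopes[i].TryGetValue(name, out isConst))
                return true;
        }
        isConst = false;
        return false;
    }
}
```

Collecting errors in a list rather than throwing on the first one matches the idea of returning a result object that can report several problems at once.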
The codebase deliberately uses both plain `readonly struct` and `readonly record struct` types:
- `Instruction` is a plain `readonly struct`, not a `readonly record struct`. `Instruction` sits on the hottest path in the system: it is stored inline in the `List<Instruction>` backing array and read every VM cycle. A `readonly record struct` would synthesize `Equals`, `GetHashCode`, `operator ==`/`!=`, `IEquatable<T>`, `PrintMembers` and a `ToString` override. None of these are needed: instructions are never compared for equality, never used as dictionary keys, and the type already provides a hand-written `ToString`. The synthesized methods would be dead code that contributes to assembly size and can confuse the JIT's inlining budget. A plain `readonly struct` with explicit constructors keeps the type minimal and its intent clear.
- `Constant` is a `readonly record struct`. Constants are stored in the `ConstantPool`, which deduplicates by value. Synthesized equality makes `Constant` usable as a dictionary key or in equality checks without manual boilerplate. Constants are written once at compile time and read at execution time, so the few extra bytes of metadata are not on a hot path.
- `Label` and `Local` are `readonly record struct` types that benefit from positional syntax and synthesized equality (labels are used as dictionary keys in `_unpatchedJumps`).
- `Value` (VM stack value) is a plain `readonly struct`, for the same reasons as `Instruction`: it is pushed and popped thousands of times per program and never needs structural equality.
Use a plain readonly struct when the type is on a hot path and you only need construction + field access. Use a readonly record struct when you need value equality, positional deconstruction, or the type is primarily data that benefits from the synthesized members.
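The practical difference between the two forms shows up as soon as a value is compared or used as a dictionary key. The sketch below uses hypothetical stand-ins (`LabelSketch`, `PlainInstructionSketch`) rather than Lumi's real types.

```csharp
using System;
using System.Collections.Generic;

// Record struct: == and hashing are synthesized from the positional members.
var first = new LabelSketch(7);
var second = new LabelSketch(7);
Console.WriteLine(first == second); // True

var jumpSites = new Dictionary<LabelSketch, List<int>>
{
    [first] = new List<int> { 3 },
};
Console.WriteLine(jumpSites.ContainsKey(second)); // True

// Plain struct: only construction and field access.
// Writing `left == right` on two PlainInstructionSketch values does not compile,
// because no equality operator is defined for the type.
var plain = new PlainInstructionSketch(kind: 1, operand: 0);
Console.WriteLine(plain.Kind + plain.Operand); // 1

// Hypothetical stand-in for a label used as a dictionary key.
public readonly record struct LabelSketch(int Id);

// Hypothetical stand-in for a hot-path type with no synthesized members.
public readonly struct PlainInstructionSketch
{
    public int Kind { get; }
    public int Operand { get; }

    public PlainInstructionSketch(int kind, int operand)
    {
        Kind = kind;
        Operand = operand;
    }
}
```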
See ROADMAP.md for a detailed roadmap of current improvements, planned projects, and long-term vision for the Lumi language.
Contributions are welcome. Open an issue or a pull request with a small, focused change. Keep changes well-documented and include tests where applicable.
This project is licensed under the MIT License — a permissive open-source license that allows you to use, modify, and distribute this code freely, including in commercial projects. The only requirement is to include a copy of the license and copyright notice.
See LICENSE for the full text.
This is a personal/hobby project. For questions or suggestions, open an issue in the repository.