pjotrcasteel/Lumi_NET

Lumi (.NET)

Lumi is a small educational programming language implementation written in C#/.NET. It is a hobby project intended to explore language design, parsing, bytecode generation and a simple virtual machine for executing the generated bytecode.

Projects

This repository contains several projects that together form the compiler/runtime toolchain:

Project                  Description
Lumi.Lexer               Lexer/tokenizer that converts source text into a stream of tokens.
Lumi.AST                 Abstract Syntax Tree node types (Program, BinaryExpression, IfStatement, ForStatement, …) and helper structures (NodeSpan).
Lumi.Parser              Recursive-descent parser that consumes tokens and builds the AST.
Lumi.SemanticAnalyzer    Semantic analysis layer that validates variable definitions, type compatibility, and symbol references before bytecode generation.
Lumi.Bytecode            Bytecode definitions (instruction set, constant pool, locals) and a tree-walking bytecode generator that visits the AST and emits instructions.
Lumi.VM                  Stack-based virtual machine that executes Lumi bytecode.
Lumi.Engine              Entry point: provides a REPL and a script runner that wire lexer → parser → semantic analyzer → bytecode generator → VM.
*.Tests                  MSTest unit-test projects for every layer (Lumi.Lexer.Tests, Lumi.AST.Tests, Lumi.Parser.Tests, Lumi.SemanticAnalyzer.Tests, Lumi.Bytecode.Tests, Lumi.VM.Tests, Lumi.Engine.Tests).

Getting started

  1. Open the solution in Visual Studio 2022/2026 or run dotnet build from the repository root (projects target .NET 10).
  2. Run tests from Test Explorer or dotnet test.
  3. Run Lumi.Engine to start the REPL or execute a .lumi script file.

Language features

The language currently supports:

  • Literals: numbers (42), strings ("hello"), booleans (true / false).
  • Arithmetic & comparison: +, -, *, /, %, ==, !=, <, >, <=, >=.
  • Logical operators: &&, ||, !.
  • Variable declarations: let, var, const with optional type annotations (let x: int -> 42).
  • Control flow: if / else statements.
  • Loops: for loops with an iterator, a start/end range, and an optional step (for i in 0..10 step 2 { ... }).
  • Block scoping: braces create new lexical scopes; inner declarations shadow outer ones.
  • Functions: fn declarations with parameters, return statements, and calls.
  • Arrays: array literals and basic support (indexing and methods are still being added).
  • Print: built-in print statement for output.

For a detailed analysis of implemented vs. missing language features and recommendations for what to add next, see LANGUAGE_FEATURES.md.

Compiler pipeline

Source text
  │
  ▼
Lexer              tokenizes into a stream of tokens
  │
  ▼
Parser             builds an AST from the token stream
  │
  ▼
SemanticAnalyzer   validates variable definitions, type compatibility, symbol references
  │
  ▼
BytecodeGen        walks the AST, emits instructions + constant pool
  │
  ▼
VM                 executes instructions on a value stack

Bytecode overview

Instruction set

Category        Instructions
Stack           PushConst, Pop
Arithmetic      Add, Sub, Mul, Div, Mod, Inc, Dec
Comparison      Eq, Neq, Lt, Gt, Leq, Geq
Control flow    Jump, JumpIfTrue, JumpIfFalse
Variables       LoadVar, StoreVar
Functions       CallFn, Return
Misc            Print, Nop, Halt

Each instruction is an Instruction value with an InstructionKind discriminator and an optional int or string operand.
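To make this encoding concrete, here is a minimal C# sketch of an Instruction struct and a stack-machine dispatch loop over a subset of the instruction set. It is an illustration under stated assumptions, not the Lumi.VM implementation: the IntOperand/StringOperand property names and the MiniVm class are hypothetical, every value is simplified to a double (the real Value struct is richer), and JumpIfFalse treats 0 as false.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical subset of the instruction kinds listed above.
public enum InstructionKind { PushConst, Pop, Add, Sub, Mul, Jump, JumpIfFalse, Print, Halt }

// Plain readonly struct: a discriminator plus an optional int or string operand.
public readonly struct Instruction
{
    public InstructionKind Kind { get; }
    public int? IntOperand { get; }
    public string? StringOperand { get; }

    public Instruction(InstructionKind kind)
    {
        Kind = kind;
        IntOperand = null;
        StringOperand = null;
    }

    public Instruction(InstructionKind kind, int operand)
    {
        Kind = kind;
        IntOperand = operand;
        StringOperand = null;
    }

    public Instruction(InstructionKind kind, string operand)
    {
        Kind = kind;
        IntOperand = null;
        StringOperand = operand;
    }

    public override string ToString() =>
        IntOperand is int i ? $"{Kind} {i}"
        : StringOperand is string s ? $"{Kind} {s}"
        : Kind.ToString();
}

public static class MiniVm
{
    // Runs the program and returns everything Print emitted.
    // Simplification (assumption): every value is a double, and 0 counts as false.
    public static List<string> Run(IReadOnlyList<Instruction> code, IReadOnlyList<double> constants)
    {
        var stack = new Stack<double>();
        var output = new List<string>();
        int ip = 0; // instruction pointer

        while (ip < code.Count)
        {
            Instruction ins = code[ip++];
            switch (ins.Kind)
            {
                case InstructionKind.PushConst: stack.Push(constants[ins.IntOperand!.Value]); break;
                case InstructionKind.Pop: stack.Pop(); break;
                // The right-hand operand is on top of the stack, so it is popped first.
                case InstructionKind.Add: { double b = stack.Pop(), a = stack.Pop(); stack.Push(a + b); break; }
                case InstructionKind.Sub: { double b = stack.Pop(), a = stack.Pop(); stack.Push(a - b); break; }
                case InstructionKind.Mul: { double b = stack.Pop(), a = stack.Pop(); stack.Push(a * b); break; }
                case InstructionKind.Jump: ip = ins.IntOperand!.Value; break;
                case InstructionKind.JumpIfFalse: if (stack.Pop() == 0) ip = ins.IntOperand!.Value; break;
                case InstructionKind.Print: output.Add(stack.Pop().ToString(CultureInfo.InvariantCulture)); break;
                case InstructionKind.Halt: return output;
            }
        }
        return output;
    }
}
```

With constants {2, 3, 4}, the program PushConst 0, PushConst 1, Mul, PushConst 2, Add, Print, Halt prints 10.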

Constant pool

The ConstantPool stores literal values (numbers, strings, booleans, null, undefined). Number and string constants are deduplicated so repeated literals occupy a single pool slot. Instructions reference constants by index via PushConst <index>.
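A minimal sketch of value-based deduplication, assuming a Constant record struct with a kind tag and one payload field per kind (the ConstantKind enum, the field names and the AddNumber/AddString helpers are hypothetical). Because a readonly record struct gets synthesized equality, it can key a Dictionary that maps a constant to its existing pool slot; only numbers and strings go through that lookup, matching the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public enum ConstantKind { Number, String, Boolean, Null, Undefined }

// Synthesized value equality lets this type serve directly as a dictionary key.
public readonly record struct Constant(ConstantKind Kind, double Number = 0, string? Text = null, bool Bool = false);

public sealed class ConstantPool
{
    private readonly List<Constant> _constants = new();
    private readonly Dictionary<Constant, int> _index = new(); // constant -> slot, for deduplicated kinds

    public IReadOnlyList<Constant> Items => _constants;

    // Returns the slot of an equal number/string constant if one exists; otherwise appends.
    public int Add(Constant constant)
    {
        bool dedupe = constant.Kind is ConstantKind.Number or ConstantKind.String;
        if (dedupe && _index.TryGetValue(constant, out int existing))
            return existing;

        _constants.Add(constant);
        int slot = _constants.Count - 1;
        if (dedupe) _index[constant] = slot;
        return slot;
    }

    public int AddNumber(double value) => Add(new Constant(ConstantKind.Number, Number: value));
    public int AddString(string value) => Add(new Constant(ConstantKind.String, Text: value));
}
```

In this sketch, adding 42 twice returns the same slot index, while two true constants still occupy separate slots.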

Locals and scoping

The LocalManager maintains a stack of scopes. Each scope is a dictionary mapping variable names to Local entries (name, LocalKind, slot Label, declared VarType). EnterScope / ExitScope push and pop scopes; lookup walks from innermost to outermost, enabling shadowing.
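A sketch of the scope stack described above, assuming LocalKind mirrors let/var/const and simplifying the declared VarType to a string; the slot counter, Declare and TryLookup are hypothetical names. Scopes live in a list used as a stack, and lookup scans from the last (innermost) entry backwards, which is what makes shadowing work.

```csharp
using System;
using System.Collections.Generic;

public enum LocalKind { Let, Var, Const } // hypothetical mirror of let/var/const
public readonly record struct Label(int Id);
public readonly record struct Local(string Name, LocalKind Kind, Label Slot, string VarType);

public sealed class LocalManager
{
    // Index 0 is the outermost (global) scope; the last entry is the innermost.
    private readonly List<Dictionary<string, Local>> _scopes = new() { new Dictionary<string, Local>() };
    private int _nextSlot;

    public void EnterScope() => _scopes.Add(new Dictionary<string, Local>());

    public void ExitScope()
    {
        if (_scopes.Count == 1) throw new InvalidOperationException("Cannot exit the global scope.");
        _scopes.RemoveAt(_scopes.Count - 1);
    }

    // Declares a name in the innermost scope with a fresh slot.
    public Local Declare(string name, LocalKind kind, string varType)
    {
        var local = new Local(name, kind, new Label(_nextSlot++), varType);
        _scopes[^1][name] = local;
        return local;
    }

    // Walks from innermost to outermost, so inner declarations shadow outer ones.
    public bool TryLookup(string name, out Local local)
    {
        for (int i = _scopes.Count - 1; i >= 0; i--)
            if (_scopes[i].TryGetValue(name, out local)) return true;
        local = default;
        return false;
    }
}
```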

Jump patching

Forward jumps (e.g. skipping an else-branch) are emitted with a placeholder operand and a Label. When the target position is reached, PatchLabel back-patches every recorded jump site to point at _instructions.Count. Backward jumps (e.g. the loop-back in a for statement) record the target position directly at emit time and do not use the patching mechanism.
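The patching scheme can be sketched as follows. _unpatchedJumps and PatchLabel come from the description above; Emitter, Op, OpKind, EmitJump and EmitBackwardJump are hypothetical names, and -1 is an assumed placeholder operand. Forward jumps are recorded per Label and rewritten once the label's position is known; backward jumps take their target directly.

```csharp
using System;
using System.Collections.Generic;

public enum OpKind { PushConst, Jump, JumpIfFalse, Print, Halt } // hypothetical subset
public readonly record struct Label(int Id);
public readonly record struct Op(OpKind Kind, int Operand = 0);

public sealed class Emitter
{
    private readonly List<Op> _instructions = new();
    private readonly Dictionary<Label, List<int>> _unpatchedJumps = new(); // label -> jump sites
    private int _nextLabel;

    public IReadOnlyList<Op> Instructions => _instructions;
    public bool HasUnpatchedJumps => _unpatchedJumps.Count > 0;

    public Label NewLabel() => new Label(_nextLabel++);
    public void Emit(OpKind kind, int operand = 0) => _instructions.Add(new Op(kind, operand));

    // Forward jump: emit a placeholder operand and remember where it sits.
    public void EmitJump(OpKind kind, Label target)
    {
        if (!_unpatchedJumps.TryGetValue(target, out var sites))
            _unpatchedJumps[target] = sites = new List<int>();
        sites.Add(_instructions.Count);
        _instructions.Add(new Op(kind, -1));
    }

    // Target reached: point every recorded site for this label at the current position.
    public void PatchLabel(Label label)
    {
        if (!_unpatchedJumps.Remove(label, out var sites)) return;
        int target = _instructions.Count;
        foreach (int site in sites)
            _instructions[site] = _instructions[site] with { Operand = target };
    }

    // Backward jump: the target index is already known, so no patching is involved.
    public void EmitBackwardJump(OpKind kind, int target) => _instructions.Add(new Op(kind, target));
}
```

Several jumps can target the same label, which is why each label maps to a list of sites rather than a single index.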

Semantic analysis

The SemanticAnalyzer performs validation on the AST before bytecode generation. It uses a visitor pattern similar to the bytecode generator:

  • Symbol management: A ScopeManager maintains a stack of scopes (dictionaries mapping names to Symbol entries). Each scope tracks variable declarations and function definitions.
  • Type tracking: Symbol records include the inferred or declared type of each variable (TypeKind: Number, String, Boolean, Unknown, etc.) and whether it is read-only (const).
  • Validation rules:
    • Undefined variables: A reference to an undefined variable raises an error before bytecode generation.
    • Redefinition: Declaring the same variable twice in the same scope raises an error.
    • Const reassignment: Attempting to assign to a const variable raises an error.
    • Invalid assignment targets: Assigning to a literal or non-identifier raises an error.
    • Scope rules: Variables are looked up from the innermost to outermost scope, enabling shadowing; variables declared in an inner scope are not visible outside that scope.

The semantic analyzer returns a SemanticAnalysisResult containing any errors found. The compiler stops and reports errors before attempting bytecode generation if semantic analysis fails.
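A condensed sketch of these scope and validation rules, assuming errors are collected as plain strings rather than in a SemanticAnalysisResult; TypeKind is reduced to four members, and Declare, Resolve and Assign are hypothetical method names. It covers redefinition, undefined references, const reassignment and inner-scope visibility; the type-compatibility check is a simple equality test that skips Unknown.

```csharp
using System;
using System.Collections.Generic;

public enum TypeKind { Unknown, Number, String, Boolean }
public readonly record struct Symbol(string Name, TypeKind Type, bool IsReadOnly);

public sealed class ScopeManager
{
    private readonly List<Dictionary<string, Symbol>> _scopes = new() { new Dictionary<string, Symbol>() };
    public List<string> Errors { get; } = new();

    public void EnterScope() => _scopes.Add(new Dictionary<string, Symbol>());
    public void ExitScope() => _scopes.RemoveAt(_scopes.Count - 1);

    // Redefinition: a name may appear only once per scope (shadowing across scopes is fine).
    public void Declare(string name, TypeKind type, bool isConst)
    {
        if (!_scopes[^1].TryAdd(name, new Symbol(name, type, isConst)))
            Errors.Add($"'{name}' is already defined in this scope.");
    }

    // Undefined variables: search innermost to outermost; report if nothing matches.
    public Symbol? Resolve(string name)
    {
        for (int i = _scopes.Count - 1; i >= 0; i--)
            if (_scopes[i].TryGetValue(name, out var symbol)) return symbol;
        Errors.Add($"'{name}' is not defined.");
        return null;
    }

    // Const reassignment and a basic type-compatibility check.
    public void Assign(string name, TypeKind valueType)
    {
        if (Resolve(name) is not Symbol symbol) return;
        if (symbol.IsReadOnly)
            Errors.Add($"Cannot assign to const '{name}'.");
        else if (symbol.Type != TypeKind.Unknown && valueType != TypeKind.Unknown && symbol.Type != valueType)
            Errors.Add($"Cannot assign {valueType} to '{name}' of type {symbol.Type}.");
    }
}
```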

Design notes

readonly struct vs readonly record struct

The codebase uses both forms deliberately:

  • Instruction is a plain readonly struct, not a readonly record struct. Instruction sits on the hottest path in the system — it is stored inline in the List<Instruction> backing array and read every VM cycle. A readonly record struct would synthesize Equals, GetHashCode, operator == / !=, IEquatable<T>, PrintMembers and a ToString override. None of these are needed: instructions are never compared for equality, never used as dictionary keys, and the type already provides a hand-written ToString. The synthesized methods would be dead code that contributes to assembly size and can confuse the JIT's inlining budget. A plain readonly struct with explicit constructors keeps the type minimal and its intent clear.

  • Constant is a readonly record struct. Constants are stored in the ConstantPool, which deduplicates by value. Synthesized equality makes Constant usable as a dictionary key or in equality checks without manual boilerplate. Constants are written once at compile time and read at execution time, so the few extra bytes of metadata are not on a hot path.

  • Label and Local are readonly record struct types that benefit from positional syntax and synthesized equality (labels are used as dictionary keys in _unpatchedJumps).

  • Value (VM stack value) is a plain readonly struct, for the same reasons as Instruction: it is pushed and popped thousands of times per program and never needs structural equality.

General struct guidance in this codebase

Use a plain readonly struct when the type is on a hot path and you only need construction + field access. Use a readonly record struct when you need value equality, positional deconstruction, or the type is primarily data that benefits from the synthesized members.
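The guidance above can be illustrated with two small hypothetical types (Operand and Position are not Lumi types). The plain readonly struct offers only construction and field access; comparing two Operand values with == would not compile. The readonly record struct gets ==, a readable ToString and positional deconstruction without any hand-written code, which is also what lets record structs such as Label act as dictionary keys.

```csharp
using System;
using System.Collections.Generic;

// Hot-path shape: construction plus field access, nothing synthesized.
public readonly struct Operand
{
    public readonly int Value;
    public Operand(int value) => Value = value;
}

// Data shape: the compiler synthesizes Equals, GetHashCode, ==, !=, ToString and Deconstruct.
public readonly record struct Position(int Line, int Column);

public static class StructDemo
{
    public static (bool Equal, string Text, int Line) Compare()
    {
        var a = new Position(3, 7);
        var b = new Position(3, 7);
        var (line, _) = a; // positional deconstruction
        return (a == b, a.ToString(), line);
    }

    // Value equality lets a record struct serve as a dictionary key.
    public static string LookUp()
    {
        var names = new Dictionary<Position, string> { [new Position(1, 1)] = "start" };
        return names[new Position(1, 1)];
    }
}
```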

Roadmap

See ROADMAP.md for a detailed roadmap of current improvements, planned projects, and long-term vision for the Lumi language.

Contributing

Contributions are welcome. Open an issue or a pull request with a small, focused change. Keep changes well-documented and include tests where applicable.

License

This project is licensed under the MIT License — a permissive open-source license that allows you to use, modify, and distribute this code freely, including in commercial projects. The only requirement is to include a copy of the license and copyright notice.

See LICENSE for the full text.

Contact

This is a personal/hobby project. For questions or suggestions, open an issue in the repository.
