This is a greenfield educational project. The goal is to build a working SQLite implementation in Haskell — not for production use, but to deeply learn:
- Functional programming: ADTs, pattern matching, monads (IO, State, Reader, Except), monad transformers, property-based testing, binary serialization
- Database internals: B-trees, page-based storage, record formats, SQL parsing, bytecode compilation, virtual machines, transactions
The project directory /Users/jay/lab/experiments/hasql is currently empty.
SQL Text → Tokenizer → Parser → Code Generator → Virtual Machine → B-Tree → Pager → OS/File
───────────────────────────────────── ───────────────────────────────────────────
SQL Compiler (Frontend) Storage Engine (Backend)
We build bottom-up (storage first, SQL last) so each layer has concrete, testable foundations beneath it.
Learning focus: Haskell tooling, project structure, binary I/O with ByteString
- Initialize with
cabal init(library + executable + test-suite) - Core dependencies:
bytestring,binary,megaparsec,text,vector,mtl,QuickCheck,hspec - Module layout:
src/ HaSQLite/ Storage/ Pager.hs -- page I/O Varint.hs -- SQLite varint encoding Record.hs -- record format BTree.hs -- B-tree operations Compiler/ Lexer.hs -- SQL tokenizer Parser.hs -- SQL parser AST.hs -- SQL abstract syntax tree CodeGen.hs -- AST → bytecode VM/ Types.hs -- opcodes, registers, VM state Execute.hs -- bytecode interpreter Core.hs -- top-level API (prepare, step, finalize) app/ Main.hs -- REPL test/ ...
- Implement SQLite's varint format: 1-9 bytes, big-endian, 7 bits per byte with high-bit continuation
- Use
Data.Binary.GetandData.Binary.Putmonads - FP concept:
Get/Putmonads as domain-specific interpreters for binary serialization - Test: QuickCheck round-trip property —
decode . encode ≡ idfor allInt64values
- Fixed-size pages (4096 bytes default)
- Read/write individual pages by page number using
System.IO+hSeek - Database header (first 100 bytes of page 1): magic string, page size, page count
- FP concept:
IOmonad for file operations,ReaderTfor config (page size) - DB concept: Why pages? Disk I/O alignment, caching granularity
Deliverable: Can create a new database file, read/write raw pages, encode/decode varints.
Learning focus: ADTs for data modeling, pattern matching, recursive data structures
- Define
SQLiteValueADT:data SqlValue = SqlNull | SqlInt Int64 | SqlFloat Double | SqlText Text | SqlBlob ByteString
- Implement record format: header (serial types as varints) + body (values)
- Serial type mapping: 0=NULL, 1-6=ints, 7=float, 8=int 0, 9=int 1, >=12 even=blob, >=13 odd=text
- FP concept: ADTs with exhaustive pattern matching — the compiler enforces you handle every case
- Test: Round-trip property for arbitrary
[SqlValue]lists
- Model B-tree page types as ADT:
data BTreePage = TableLeaf { cells :: [TableLeafCell] } | TableInterior { cells :: [TableInteriorCell], rightChild :: PageNum } | IndexLeaf { cells :: [IndexLeafCell] } | IndexInterior { cells :: [IndexInteriorCell], rightChild :: PageNum }
- Parse cell pointer array, read cells from page bytes
- Table leaf cell: payload size (varint) + rowid (varint) + record payload
- Table interior cell: child page (4 bytes) + rowid (varint)
- FP concept: Sum types to model disjoint page types; each variant carries exactly the data it needs
- DB concept: B-tree variants — table B-trees (integer key → data) vs index B-trees (arbitrary key → no data)
- Recursive tree traversal to scan all rows in a table
- Key lookup: binary search within page, recurse into child for interior nodes
- Range scans: in-order traversal
- FP concept: Recursion over tree structures,
IOmonad threading for page reads - Milestone test: Create a small SQLite database with the real
sqlite3CLI, then read it with HaSQLite — verify you get the same rows back
Deliverable: Can read tables from real SQLite database files.
Learning focus: StateT monad transformer, mutable state management in pure FP
- Insert cell into leaf page (find correct position, shift cells)
- Page splitting when a leaf overflows: allocate new page, split cells, push median key up to parent
- Recursive splitting up through interior nodes
- Growing the tree: when root splits, create new root
- FP concept:
StateToverIO— threading mutable database state (free page list, page cache) through pure-looking code - DB concept: B-tree balancing, why splits maintain O(log n) height
- Simple case: remove cell from leaf
- Underflow handling: borrow from sibling or merge pages
- Test: QuickCheck — insert N random keys, delete a subset, verify remaining keys are intact and tree invariants hold (sorted order, balanced height, page fill constraints)
- Page 1 is always the root of the
sqlite_schematable - Rows:
(type, name, tbl_name, rootpage, sql) - Parse CREATE TABLE statements to extract column names and types
- DB concept: Self-describing databases — the schema is stored in the database itself
Deliverable: Can create tables, insert rows, and persist them. Verified by reading back with real sqlite3.
Learning focus: Parser combinators, building ASTs, Applicative/Monad interplay
- Define types for the SQL subset you support:
data Statement = Select { columns :: [Expr], from :: TableName, whereClause :: Maybe Expr } | Insert { table :: TableName, columns :: [ColumnName], values :: [[Expr]] } | CreateTable { name :: TableName, columns :: [ColumnDef] } | Delete { table :: TableName, whereClause :: Maybe Expr } data Expr = LitInt Int64 | LitText Text | LitNull | ColRef ColumnName | BinOp Op Expr Expr | UnaryOp Op Expr
- FP concept: ADTs as the "universal interface" between compiler phases — each phase consumes one ADT and produces another
- Tokenize: keywords (
SELECT,FROM,WHERE,INSERT,CREATE), identifiers, literals, operators, punctuation - Parse with combinators:
select,insert,createTableparsers composed from smaller pieces - FP concept: Parser combinators —
<|>(alternative),<*>(sequencing),try(backtracking). Parsers are values you compose, not grammar rules you declare - Test: Parse → pretty-print → parse again, verify AST equality
UPDATE ... SET ... WHERE ...DELETE FROM ... WHERE ...- Comparison operators:
=,<>,<,>,<=,>= - Logical operators:
AND,OR,NOT ORDER BY,LIMIT
Deliverable: Can parse a useful subset of SQL into a typed AST.
Learning focus: Interpreter design, register machines, the ST monad for local mutability
- Define opcodes as ADT:
data OpCode = Init | Goto | Halt | OpenRead | OpenWrite | Close | Rewind | Next | Column | ResultRow | SeekGE | SeekLE | Integer | String | Eq | Ne | Lt | Le | Gt | Ge | MakeRecord | Insert | Delete | CreateTable | ...
- Instruction:
data Instruction = Instruction { opcode :: OpCode, p1 :: Int, p2 :: Int, p3 :: Int, p4 :: Maybe P4Value } - VM state: program counter, registers (as
Vector), open cursors (B-tree iterators), result accumulator - FP concept: ADTs for opcodes give exhaustive checking — add a new opcode and the compiler tells you every place that needs updating
- Tight recursive loop with pattern matching on opcodes
- Each opcode handler: read operands → perform action → advance PC (or jump)
- Cursor abstraction over B-tree:
cursorRewind,cursorNext,cursorColumn,cursorEof - FP concept:
StateT VMState IOmonad — the VM loop is pure state threading with IO only at B-tree boundaries - DB concept: Why bytecode? Decouples "what to execute" from "how to execute it" — enables
EXPLAINfor free
- Pretty-print bytecode programs (opcode table like real SQLite's
EXPLAIN) - Invaluable for debugging the code generator in Phase 6
Deliverable: Can manually construct bytecode programs and execute them against the B-tree storage layer.
Learning focus: Compiling high-level representations to low-level ones, full system integration
- Walk the SQL AST, emit bytecode instructions:
SELECT: OpenRead → Rewind → [loop: Column, ResultRow, Next] → Close → HaltINSERT: OpenWrite → MakeRecord → Insert → Close → HaltCREATE TABLE: CreateTable → insert into sqlite_schema → HaltWHEREclauses: emit comparison + conditional jump opcodes
- FP concept: Tree-walking compiler as a pure function
Statement -> [Instruction]— no side effects in compilation - DB concept: Query compilation — the same SQL can produce different bytecode depending on available indexes
- Read-eval-print loop: read SQL → parse → compile → execute → display results
- Handle
.tables,.schema,.quitdot-commands - Display results as formatted ASCII tables
- FP concept: The REPL is a small interpreter loop itself —
forever (readLine >>= parse >>= compile >>= execute >>= display)
- Wire everything together:
Core.prepare(parse + compile),Core.step(VM step),Core.finalize(cleanup) - Monad stack:
ReaderT Config (StateT DBState (ExceptT HaSQLiteError IO)) - Test: Create database with HaSQLite, read with real
sqlite3, and vice versa
Deliverable: Working SQLite-compatible REPL that can CREATE TABLE, INSERT, SELECT, DELETE.
Pick based on interest — each is independently valuable:
| Topic | FP Concepts | DB Concepts |
|---|---|---|
| Transactions & WAL | Bracket pattern for resource safety, MonadMask |
ACID, write-ahead logging, crash recovery |
| Indexes | More ADT variants, generalized B-tree | Secondary indexes, covering indexes, query planning |
| Joins | Lazy evaluation for nested-loop joins | Join algorithms (nested loop, sort-merge, hash) |
| Aggregates | Folds and accumulators | GROUP BY, HAVING, aggregate functions |
| Page cache | IORef/MVar for mutable cache, LRU eviction |
Buffer pool management, cache hit rates |
| Concurrent reads | MVar/STM for reader-writer locks |
MVCC, isolation levels |
| Property testing | quickcheck-state-machine |
Model-based testing: verify HaSQLite matches a Map-based reference |
| Library | Purpose | Phase |
|---|---|---|
bytestring |
Binary data, page buffers | 1+ |
binary |
Get/Put monads for serialization |
1+ |
megaparsec |
SQL parser combinators | 4 |
text |
Text handling for SQL strings | 1+ |
vector |
VM registers, efficient arrays | 5 |
mtl |
Monad transformers (ReaderT, StateT, ExceptT) | 3+ |
QuickCheck |
Property-based testing | 1+ |
hspec |
Test framework | 1+ |
At each phase boundary:
- Unit tests: QuickCheck properties + hspec examples for each module
- Integration tests: Round-trip with real
sqlite3CLI starting from Phase 2 - EXPLAIN: From Phase 5 onward, use
EXPLAINoutput to verify compiled bytecode - Compatibility check: Create DB with HaSQLite → read with
sqlite3, and vice versa