- Bear16 is an exploratory, self-directed project made over hundreds of hours during the summer before my freshman year of college.
- Link to Bear16 OS, a retro operating system made with the bear16 tool-chain here.
- The VM and Assembler are written in C++ 23.
- All programming for the Bear16 system must be done in raw assembly.
- The Bear16 architecture is inspired by RISC-V's simplicity, although it deviates in its syntax, directives, and instruction layout. The assembly language is very flexible with automatic immediate inlining.
- Bear16 is a pure Harvard architecture in which ROM and RAM occupy separate address spaces. This was done as a design choice largely to expand system memory given 16-bit address space constraints without the need for banking or wider-than-16-bit register addressing.
- The VM has been heavily optimized and can run at well over 100 MHz in real time (tested on M2, r7800x, & Core Ultra 9 285H), although this is, of course, overkill, so it has been throttled to ~40 MHz. The VM operates at the register-transfer level (RTL) and is cycle-accurate.
- Bear16 was primarily an educational endeavor for myself, though the entire toolchain is usable by anyone.
- CLI Usage
- TUI Preview
- Set-up
- Technical Overview
- The Bear16 Design Process and a Toolchain Overview
- Writing Bear16 Assembly Language
usage: b16 [--help | -h]
b16 [--assemble | assemble | -a] <assembly_file> <target_binary_file>
b16 [--run | run | -r] <binary_file> <--dump/no-flag>
b16 [-ar | -ra] <assembly_file> <binary_file> <--dump/no-flag>
b16 [--ui | --tui]
b16 -sd
b16 -cd
b16 -rd
b16 [--version | -v]
b16 [doctor | --doctor]
b16 <no-args>
options:
-a, --assemble, assemble assemble source file(s)
-r, --run, run run the specified rom image
--dump dump process state at termination
-h, --help, help show help message
-v, --version show version information
-u, --ui, --tui start terminal user interface
-ar, -ra assemble and run
-sd, --set-disk set the active disk image
-cd, --check-disk check the path of the current disk
-rd, --reset-disk reset/zero the contents of the current disk
doctor, --doctor restore ".b16.json" config file and rebuild
disks/projects directories if missing
<no-args> start terminal user interface
*note: all args and flags are order-agnostic
example flows: ~ project set-up ~
cd ~/bear16_projects/
mkdir my_project
cd my_project
touch main.asm
touch file_to_be_included.asm # make sure to include this in your entry file!
nvim main.asm # edit in neovim (or your editor of choice)!
mkdir build
~ assemble and run ~
b16 assemble main.asm build/my_executable.bin
b16 run build/my_executable.bin
~~~~~~~~ or ~~~~~~~~
b16 --assemble main.asm --run build/my_executable.bin
~~~~~~~~ or ~~~~~~~~
b16 -ar my_project_entry.asm build/my_executable.bin
================================================================================
| BEAR16 Emulator & Assembler - v1.0 (2025-08-25) |
| Made by Zach Mahan |
================================================================================
| Open Project: |
| (/Users/zachmahan/dev/asm/bear16-os/) |
================================================================================
| [1] Assemble Project (Release) |
| [2] Run Executable (Release) |
| [3] Assemble & Run (Release) |
|------------------------------------------------------------------------------|
| [4] Assemble Project (Debug) |
| [5] Run Executable (Debug) |
| [6] Assemble & Run (Debug) |
|------------------------------------------------------------------------------|
| [P] Open a Different Project |
| [C] Configure |
| [H] Help |
| [Q] Quit |
--------------------------------------------------------------------------------
- Fedora:
sudo dnf install SDL2-devel
- Debian/Ubuntu:
sudo apt install libsdl2-dev
- Arch:
sudo pacman -S sdl2
git clone --depth=1 https://github.com/zachMahan64/bear16.git ~/where/you/want/bear16/to/live
git clone --depth=1 https://github.com/zachMahan64/bear16.git ~/where/you/want/bear16/to/live
git clone --depth=1 https://github.com/zachMahan64/bear16.git %USERPROFILE%/where/you/want/bear16/to/live
- fill in the path in the git clone command with the actual path you want
- You should export the b16 executable's parent directory to your path for proper CLI support (where your b16 executable lives, like a build directory; directions below).
- Buildable with CMake
- Platform support: Linux, MacOS, and Windows.
- Dependencies:
- CMake >= v3.10
- Make or Ninja
- C++23 compiler:
- Linux & MacOS: gcc or clang
- Windows: MinGW
- SDL2 (for framebuffer and input)
- Nlohmann Json (header-only inside the repo, so no set-up required for this)
cd path/to/bear16
mkdir build && cd build && cmake .. && make
echo 'export PATH="$HOME/path/to/bear16/build:$PATH"' >> ~/.bashrc # or .zshrc, etc.
source ~/.bashrc # ^
- Visit https://www.msys2.org/, then download and run the MSYS2 setup exe
- Open MSYS2 UCRT64 once installed
pacman -Syu
pacman -Su
pacman -S mingw-w64-ucrt-x86_64-gcc mingw-w64-ucrt-x86_64-make mingw-w64-ucrt-x86_64-cmake
cd path/to/bear16
mkdir build && cd build && cmake -G "MinGW Makefiles" .. && mingw32-make
echo 'export PATH="$HOME/path/to/bear16/build:$PATH"' >> ~/.bashrc
source ~/.bashrc
- To export to Windows Path (for use in Powershell or cmd):
- (Windows Key) + R > OK
- Path > Edit > Browse or New & select or type the path to b16's build dir
- Now you can just use b16 in your native windows environment
- Download the Bear16 OS; directions are in the repo's README.
- Run b16 --help to read over a guide to the CLI.
- Run the b16 project-based terminal interface with b16 or b16 --tui -> select "Help" in the menu
- Now you have all the tools to run the OS or make your own Bear16 programs.
- Full ISA spreadsheet available here
- Bear16 is a 16-bit, little-endian architecture with 16-bit memory addressing and 64-bit fixed-width instructions. It sacrifices instruction compactness for immediacy and simplicity.
- Instruction format:
- Each instruction is broken up into four distinct 16-bit fields:
{16} {16} {16} {16}
{opcode} {destination} {source_one} {source_two}
- Each opcode contains two sub-fields:
{2} {14}
{immediate_flags} {instruction_code}
- The immediate flags can be 00, 01, 10, or 11.
- Bit 15 (MSB) indicates that source_one is an immediate; bit 14 indicates the same for source_two.
- This can be specified manually in text by {op}{imm_suffix}, where the suffix can be absent (00), i1 (10), i2 (01), or i (11).
- The immediate system allows for full 16-bit immediate inlining, which makes code much more concise at the cost of each instruction being wide (8 bytes!).
- Example instruction: add t0, t1, 2
- Explanation: This instruction adds the immediate 2 to the value stored in register t1 (temporary/volatile 1) and stores the result into t0. Notice how the source_two field contains the immediate value 2, so written out in full this instruction is addi2 t0, t1, 2. Thankfully, the assembler recognizes this and resolves all inlined immediates automatically. Manual suffixing (i1, i2, i) is entirely optional and generally discouraged.
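The field layout and immediate flags described above can be sketched in C++ as follows. This is only an illustration, not the VM's actual code: the `Fields` struct, the helper names, the in-memory field order (opcode first), the instruction code `0x0001`, and the register numbers used in the usage note are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container for the four 16-bit fields of one instruction.
struct Fields {
    std::uint16_t opcode;       // {2-bit immediate flags}{14-bit instruction code}
    std::uint16_t destination;
    std::uint16_t sourceOne;
    std::uint16_t sourceTwo;
};

constexpr std::uint16_t IMM_SRC_ONE = 0x8000;  // bit 15: source_one is an immediate
constexpr std::uint16_t IMM_SRC_TWO = 0x4000;  // bit 14: source_two is an immediate
constexpr std::uint16_t CODE_MASK = 0x3FFF;    // low 14 bits: instruction code

// Combine a 14-bit instruction code with the two immediate flags.
constexpr std::uint16_t makeOpcode(std::uint16_t code, bool imm1, bool imm2) {
    return static_cast<std::uint16_t>((code & CODE_MASK) |
                                      (imm1 ? IMM_SRC_ONE : 0) |
                                      (imm2 ? IMM_SRC_TWO : 0));
}

constexpr bool sourceOneIsImmediate(std::uint16_t opcode) { return (opcode & IMM_SRC_ONE) != 0; }
constexpr bool sourceTwoIsImmediate(std::uint16_t opcode) { return (opcode & IMM_SRC_TWO) != 0; }

// Serialize to 8 bytes, each field little-endian.
// Assumption: fields are stored in the order opcode, destination, source_one, source_two.
std::array<std::uint8_t, 8> encode(const Fields& f) {
    const std::uint16_t words[4] = {f.opcode, f.destination, f.sourceOne, f.sourceTwo};
    std::array<std::uint8_t, 8> bytes{};
    for (std::size_t i = 0; i < 4; ++i) {
        bytes[2 * i] = static_cast<std::uint8_t>(words[i] & 0xFF);
        bytes[2 * i + 1] = static_cast<std::uint8_t>(words[i] >> 8);
    }
    return bytes;
}

// Inverse of encode(): rebuild the four fields from 8 little-endian bytes.
Fields decode(const std::array<std::uint8_t, 8>& bytes) {
    auto word = [&bytes](std::size_t i) {
        return static_cast<std::uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    };
    return Fields{word(0), word(1), word(2), word(3)};
}
```

Under these assumptions, an instruction like add t0, t1, 2 would be built as `Fields{makeOpcode(0x0001, false, true), 3, 4, 2}`, where only the source_two flag is set because 2 is an immediate and t1 is a register.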
- Directives:
- @include "relative_path_if_in_sub_directory/file_name.asm"
- For C-style linking
- .data
- Denotes constant data
- .text
- Denotes executable code
- .const CONSTANT_NAME = {SOME VALUE}
- Value must evaluate to a 16-bit quantity. Supported types:
- Signed/unsigned integers
- 8.8 fixed-point numbers
- ASCII characters
- Pointers or label references
- Other constants (if forward-defined and file-local)
- Expressions are allowed, but:
- Must be wrapped in parentheses
- Must resolve to one of the above value types
- Expressions:
- Must always be wrapped in parentheses
- Can be used when defining constants or directly inlined in assembly instructions
- Can contain labels or constants
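Since every constant must fit in 16 bits, an 8.8 fixed-point value packs 8 integer bits and 8 fractional bits into one word. The sketch below shows one way such a value could be computed. It assumes a signed two's-complement interpretation and round-to-nearest, neither of which is confirmed here, and the function names are hypothetical rather than part of the assembler.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a real number to a 16-bit 8.8 fixed-point word.
// Assumptions: signed two's complement, round to nearest, no range checking.
std::uint16_t toFixed88(double value) {
    const long scaled = std::lround(value * 256.0);  // 2^8 fractional steps per unit
    return static_cast<std::uint16_t>(scaled & 0xFFFF);
}

// Convert a 16-bit 8.8 fixed-point word back to a real number.
double fromFixed88(std::uint16_t raw) {
    return static_cast<std::int16_t>(raw) / 256.0;
}
```

With these assumptions, 1.5 becomes 0x0180 (integer part 0x01, fraction 0x80 = 128/256), and -1.0 becomes 0xFF00.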
- ROM Layout: 64 KB
- After assembly, the executable's layout will be in this order:
[0x0000: entry_point]
[<varies>: combined_text_section] → 8-byte aligned
[<varies>: combined_data_section] → Byte aligned
- RAM Layout: 64 KB
[0x0000 - 0x17FF: framebuffer] - 6 KB
[0x1800 - 0x1FFF: privileged_space] - 2 KB
- Reserved for memory mapped IO
- See isa.h or the bear16-os repo for details
[0x2000 - 0x9FFF: heap] - 32 KB
[0xA000 - 0xFFFF: stack] - 24 KB
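The RAM regions can be checked against each other with a few compile-time constants. The sketch below derives each region's start address from the stated sizes (6 KB, 2 KB, 32 KB, 24 KB) laid out back to back; that derivation is an assumption, and `Region` and `regionOf` are hypothetical names that do not come from isa.h.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { Framebuffer, Privileged, Heap, Stack };

constexpr std::uint32_t KB = 1024;
constexpr std::uint32_t FRAMEBUFFER_SIZE = 6 * KB;
constexpr std::uint32_t PRIVILEGED_SIZE = 2 * KB;
constexpr std::uint32_t HEAP_SIZE = 32 * KB;
constexpr std::uint32_t STACK_SIZE = 24 * KB;

// Each region starts where the previous one ends (assumed contiguous layout).
constexpr std::uint32_t FRAMEBUFFER_START = 0;
constexpr std::uint32_t PRIVILEGED_START = FRAMEBUFFER_START + FRAMEBUFFER_SIZE;
constexpr std::uint32_t HEAP_START = PRIVILEGED_START + PRIVILEGED_SIZE;
constexpr std::uint32_t STACK_START = HEAP_START + HEAP_SIZE;

// The four regions must exactly fill the 16-bit address space.
static_assert(STACK_START + STACK_SIZE == 64 * KB, "regions must cover 64 KB");
static_assert(PRIVILEGED_START == 0x1800, "framebuffer is 6 KB");
static_assert(STACK_START == 0xA000, "stack begins at 0xA000");

// Classify a RAM address by the region it falls in.
constexpr Region regionOf(std::uint16_t addr) {
    if (addr < PRIVILEGED_START) return Region::Framebuffer;
    if (addr < HEAP_START) return Region::Privileged;
    if (addr < STACK_START) return Region::Heap;
    return Region::Stack;
}
```

Because the sizes sum to exactly 64 KB, the static assertions catch any boundary that would leave a gap or an overlap.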
The design philosophy behind Bear16 is one of maximizing speed. The emulator, despite being cycle-accurate, can comfortably exceed 100 MHz on modern CPUs. However, it has been calibrated to target ~36-40 MHz. The program's design is object-based, following principles of encapsulation, but uses composition with zero inheritance for maximum runtime speed. The general structure of b16 is as follows:
- bear16 interface
- controlled via CLI arguments or the TUI (a click-and-run experience that runs in the terminal).
- owns: assembler, board
- assembler
- assembles .asm files (links included files recursively -> create an executable binary compliant with the Bear16 ISA)
- owns preprocessor for linking
- assembles in a multi-pass fashion: tokenize -> break into text & data sections -> resolve constant expressions -> resolve references to labels and constants -> convert into binary that Bear16 CPU understands!
- board
- owns a CPU16: the CPU core
- owns memory and disk (ram and rom are std::arrays and disk is an std::vector of uint8_t's)
- owns a Screen (wrapper around an SDL2 window with framebuffer rendering functionality)
- owns a DiskController & an InputController (memory-mapped IO controllers)
- all subcomponents of Board have non-owning views into memory/disk
- all objects follow RAII and everything lives on stack (for maximized cache-locality) besides the SDL2 window and the disk (an std::vector).
- here's the pseudocode for the Board's main method ("run"):
Board::run():
    while cpu.isNotHalted:
        poll for inputs
        for (numInnerCycles):
            cpu.step()
            clock.tick()
            // this inner loop lets the cpu wait less often for (slow) SDL2 polling
        diskController.handleDiskOperation()
        if (timeSinceLastFrame >= TARGET_FRAME_TIME):
            renderFramebuffer()
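The benefit of batching CPU steps between polls can be seen in a small simulation. This is a sketch only: `FakeCpu`, `RunStats`, and `runBatched` are hypothetical stand-ins rather than Bear16 classes, and the disk handling and frame rendering from the pseudocode are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CPU core: halts after a fixed number of steps.
struct FakeCpu {
    std::uint64_t stepsUntilHalt;
    std::uint64_t stepsTaken = 0;
    bool halted() const { return stepsTaken >= stepsUntilHalt; }
    void step() {
        if (!halted()) ++stepsTaken;
    }
};

struct RunStats {
    std::uint64_t cycles = 0;  // CPU steps executed
    std::uint64_t polls = 0;   // times the (slow) input poll would have run
};

// Run the CPU to completion, polling once per batch of innerCycles steps.
RunStats runBatched(FakeCpu& cpu, std::uint64_t innerCycles) {
    RunStats stats;
    while (!cpu.halted()) {
        ++stats.polls;  // stands in for polling inputs
        for (std::uint64_t i = 0; i < innerCycles && !cpu.halted(); ++i) {
            cpu.step();
            ++stats.cycles;
        }
    }
    return stats;
}
```

For a program of 1000 steps, a batch size of 100 polls 10 times, while a batch size of 1 polls 1000 times; the cycle count is the same in both cases.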
- cpu16
- main interface through the "step" method:
void CPU16::step() {
    // fetch & decode
    parts::Instruction instr(fetchInstruction());
    // execute & writeback
    execute(instr);
    if (!pcIsFrozenThisCycle) {
        pc += 8;
    }
    pcIsFrozenThisCycle = false;
}
- Use the python scripts included in the repo to automatically convert images into bitmaps or to create blank bitmaps that you can manually draw 1's onto.
- All syntax is provided in the "Assembly Language Details" section here in the README (for directives/other assembly-time related syntax) and the instruction syntax is defined here.
- The best example is Bear16 OS itself! It is entirely written in raw Bear16 asm lang. Check out the repo to read over the source code if you're curious.
#PROPER RECURSIVE FIBONACCI CALCULATOR, MON 20250609
.text
.const FIB_N = 5
.const STO_LOC = 4096
start:
mov a0, FIB_N
call fibonacci
sw STO_LOC, rv
hlt
fibonacci:
# a0 = n
le base_case, a0, 2 # if a0 <= 2 goto base_case
push a0 # save n
sub a0, a0, 1 # a0 = n - 1
call fibonacci # fib(n - 1)
pop t0 # n -> t0
push rv # save result onto stack
sub a0, t0, 2 # a0 = n - 2
call fibonacci # fib(n - 2)
push rv # save result onto stack
pop t1 # get fib(n - 2) from stack
pop t2 # get fib(n - 1) from stack
add rv, t1, t2 # rv = fib(n-1) + fib(n-2)
ret
base_case:
mov rv, 1
ret
#SIMPLE QUADRATIC SOLVER USING A LOOK-UP TABLE, SAT 20250614 (UPDATED TUE 8/19/2025)
.data
#struct: val {2-byte}, sqrt(val) {2-byte}
sqrt_table:
.word 0 0
.word 1 1
.word 4 2
.word 9 3
# ... all the way through 16-bit unsigned integers ->
.word 65025 255
sqrt_table_size:
.word 256
quadratic_params:
_a:
.word 1
_b:
.word -82
_c:
.word 1645
str_a:
.string "a = "
str_b:
.string "b = "
str_c:
.string "c = "
str_ans:
.string "ans: "
str_and:
.string " and "
.text
.const STO_LOC = 4096
.const PRNT_STRT_LOC = 0
start:
call pr_params
call solve_quad
call pr_ans
hlt
pr_ans:
sb s6, ' '
inc s6
add s6, s6, 5
lea t4, 26
romcpy t4, str_ans, 5 #prnt "ans: "
lw a0, STO_LOC
call pr_s_num_four_dig
romcpy s6, str_and, 5 #prnt " and "
add s6, s6, 5
add t0, STO_LOC, 2
lw a0, t0
call pr_s_num_four_dig
sb s6, ' '
ret
pr_params: #print into memory
clr s6 # this will be our print address ptr
add s6, s6, 4
lea t4, 0
romcpy t4, str_a, 4 #prnt "a="
lwrom a0, _a
call pr_s_num_four_dig
sb s6, ' '
add s6, s6, 4
lea t4, 9
romcpy t4, str_b, 4 #prnt "b = "
lwrom a0, _b
call pr_s_num_four_dig
sb s6, ' '
add s6, s6, 4
lea t4, 18
romcpy t4, str_c, 4 #prnt "c = "
lwrom a0, _c
call pr_s_num_four_dig
sb s6, ' '
ret
pr_s_num_four_dig: #fn(a0, &s6) = fn(val_of_num, print_loc_ptr)
comp a0, 0
jcond_neg _neg
fd_main:
divs t0, a0, 1000
add t0, t0, 48
sb s6, t0 # #000
inc s6
divs t0, a0, 100
mods t0, t0, 10
add t0, t0, 48
sb s6, t0 # 0#00
inc s6
divs t0, a0, 10
mods t0, t0, 10
add t0, t0, 48
sb s6, t0 # 00#0
inc s6
mods t0, a0, 10
add t0, t0, 48
sb s6, t0 # 000#
inc s6
ret
_neg:
sb s6, '-'
neg a0, a0
inc s6
jmp fd_main
solve_quad:
lwrom s0, _a
lwrom s1, _b
lwrom s2, _c
call pos_sol
lea t3, STO_LOC
sw t3, rv
call neg_sol
sw t3, 2, rv
ret
pos_sol:
neg t0, s1 # -b
push t0 # save -b
mults t1, s1, s1 # b^2
mults t2, s0, s2 # a*c
mults t2, t2, 4 # 4ac
sub a0, t1, t2 # b^2 - 4ac
lt b_ret, a0, 0 # terminate if sqrt is of a neg number
call sqrt # sqrt(a0) = sqrt(b^2 - 4ac)
mov t1, rv #save sqrt(b^2 - 4ac) into t1
mults t2, s0, 2 # 2a
pop t0 #get our beloved -b back
add t0, t0, t1 # -b + sqrt(b^2 - 4ac)
divs t0, t0, t2 # (-b + sqrt(b^2 - 4ac)) / 2a
mov rv, t0 #return answer
ret #"
b_ret:
mov rv, -1
ret
neg_sol:
neg t0, s1 # -b
push t0 # save -b
mults t1, s1, s1 # b^2
mults t2, s0, s2 # a*c
mults t2, t2, 4 # 4ac
sub a0, t1, t2 # b^2 - 4ac
lt bn_ret, a0, 0 # terminate if sqrt is of a neg number
call sqrt # sqrt(a0) = sqrt(b^2 - 4ac)
mov t1, rv #save sqrt(b^2 - 4ac) into t1
mults t2, s0, 2 # 2a
pop t0 #get our beloved -b back
sub t0, t0, t1 # -b - sqrt(b^2 - 4ac)
divs t0, t0, t2 # (-b - sqrt(b^2 - 4ac)) / 2a
mov rv, t0 #return answer
ret #"
bn_ret:
mov rv, -1
ret
sqrt:
clr t1 # our lil counter/offsetter
s_loop:
lwrom t2, sqrt_table, t1 # <load from rom> t2, *sqrt_table, offset
eq s_hit, a0, t2
ult overshot, a0, t2
add t1, t1, 4 # inc by 2 words
jmp s_loop
s_hit:
mov s7, 7 # debug, lucky # 7, this is just a little marker to indicate a hit!
add t1, t1, 2
mov s3, t1 # debug
lwrom rv, sqrt_table, t1 # <load from rom> rv, *sqrt_table, new_offset (*sqrt(val))
ret
overshot:
sub t3, t1, 2
lwrom rv, sqrt_table, t3 # <load from rom> rv, *sqrt_table, new_offset (*est_down_sqrt(val))
ret
# BABY'S FIRST HELLO WORLD PRINTED TO THE FRAME BUFFER; THIS WAS THE FOUNDATION FOR STRING BLITTING CORE UTILITY IN THE BEAR16 OS
.data
fb_data_start:
.byte 0b00010000 0b10111100 0b01111100 0b10111100 0b11111110 0b11111110 0b01111110 0b10000010 0b11111110 0b01111110 0b10000010 0b01000000 0b10000010 0b10000010 0b01111100 0b10111100 0b01111100 0b10111100 0b01111100 0b11111110 0b10000010 0b10000010 0b10000010 0b10000010 0b10000010 0b11111110 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b00101000 0b11000010 0b10000010 0b11000010 0b10000000 0b10000000 0b10000000 0b10000010 0b00010000 0b00001000 0b10000010 0b01000000 0b11000110 0b11000010 0b10000010 0b11000010 0b10000010 0b11000010 0b10000010 0b00010000 0b10000010 0b10000010 0b10000010 0b01000100 0b10000010 0b00000100 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b01000100 0b10000010 0b10000000 0b10000010 0b10000000 0b10000000 0b10000000 0b10000010 0b00010000 0b00001000 0b10000100 0b01000000 0b10101010 0b10100010 0b10000010 0b10000010 0b10000010 0b10000010 0b10000000 0b00010000 0b10000010 0b10000010 0b10000010 0b00101000 0b01000100 0b00001000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b10000010 0b11111100 0b10000000 0b10000010 0b11111110 0b11111110 0b10001110 0b11111110 0b00010000 0b00001000 0b11111100 0b01000000 0b10010010 0b10010010 0b10000010 0b11000010 0b10000010 0b11000010 0b01111100 0b00010000 0b10000010 0b10000010 0b10010010 0b00010000 0b00101000 0b00010000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b11111110 0b10000010 0b10000000 0b10000010 0b10000000 0b10000000 0b10000010 0b10000010 0b00010000 0b00001000 0b10000010 0b01000000 0b10000010 0b10001010 0b10000010 0b10111100 0b10001010 0b10111100 0b00000010 0b00010000 0b10000010 0b01000100 0b10010010 0b00101000 0b00010000 0b00100000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b10000010 0b11000010 0b10000010 0b11000010 0b10000000 0b10000000 0b11000010 0b10000010 0b00010000 0b10001000 0b10000010 0b01000000 0b10000010 0b10000110 0b10000010 0b10000000 0b10000100 0b10000100 0b10000010 0b00010000 0b10000010 0b00101000 0b10101010 0b01000100 0b00010000 0b01000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b10000010 0b10111100 0b01111100 0b10111100 0b11111110 0b10000000 0b10111110 0b10000010 0b11111110 0b01110000 0b10000010 0b01111100 0b10000010 0b10000010 0b01111100 0b10000000 0b01111010 0b10000010 0b01111100 0b00010000 0b01111100 0b00010000 0b01000100 0b10000010 0b00010000 0b11111110 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
.byte 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000 0b00000000
my_str:
.string " HELLO WORLD"
me_str:
.string " MADE BY ZACH MAHAN"
date_str_0:
.string " ON JUNE SIXTEENTH"
date_str_1:
.string " TWENTY TWENTY FIVE"
.text
.const FB_LOC = 0
.const LINE_SIZE = 256
.const TILE_MAP_SIZE = 256
.const LINE_WIDTH_B = 32
start:
lea s0, FB_LOC
romcpy s0, fb_data_start, TILE_MAP_SIZE #print out char tile set
add s0, s0, LINE_SIZE
mov a0, 2 # line
mov a1, 0 # index
mov a2, my_str
call blit_strl_rom #blitting a str
mov a0, 4 # line
mov a1, 0 # index
mov a2, me_str
call blit_strl_rom #blitting a str
mov a0, 5 # line
mov a1, 0 # index
mov a2, date_str_0
call blit_strl_rom #blitting a str
mov a0, 6 # line
mov a1, 0 # index
mov a2, date_str_1
call blit_strl_rom #blitting a str
call inf_loop # stall
hlt
inf_loop:
jmp inf_loop
ret
blit_cl:
#a0 = line, a1 = index, a2 = desired char
mult t0, a0, LINE_SIZE # set line
add t0, t0, a1 # set index
sub t1, a2, 65 # get char index in rom, also ' ' works essentially by coincidence since that region of ROM is empty
add t1, t1, fb_data_start
clr t2 # cnt
bcl_loop:
lbrom t3, t1 #load byte from rom in t3
sb t0, t3 #store byte in t3 into addr @ t0
add t0, t0, LINE_WIDTH_B # t0 += 32
add t1, t1, LINE_WIDTH_B # t1 += 32
inc t2 # t2++
ult bcl_loop, t2, 8 # check cnt
ret
blit_strl_rom:
#a0 = line, a1 = index, a2 = char*
mov t5, a2 # current char ptr
bstrl_rom_loop:
lbrom a2, t5 # a2 <- *char
eq bstrl_ret, a2, 0
call blit_cl #reuse a0 & a1
inc a1
inc t5
jmp bstrl_rom_loop
bstrl_ret:
ret
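The address arithmetic in blit_cl may be easier to follow in C++. The sketch below mirrors it under a few assumptions: the framebuffer is the 6 KB region with 32 bytes per pixel row (taken from LINE_WIDTH_B), the tile map is the 256-byte table at fb_data_start, and only 'A' through 'Z' are handled. The bounds checks and the names `blitChar`, `Framebuffer`, and `TileMap` are additions for the example and do not exist in the assembly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t LINE_WIDTH_B = 32;    // bytes per pixel row
constexpr std::size_t LINE_SIZE = 256;      // bytes per text line (8 pixel rows)
constexpr std::size_t TILE_MAP_SIZE = 256;  // 8 rows of 32 one-byte glyph slices
constexpr std::size_t FB_SIZE = 6 * 1024;   // framebuffer size from the RAM layout

using Framebuffer = std::array<std::uint8_t, FB_SIZE>;
using TileMap = std::array<std::uint8_t, TILE_MAP_SIZE>;

// Copy one 8x8 glyph into the framebuffer at text position (line, index).
// Returns false when the character or position is out of range.
bool blitChar(Framebuffer& fb, const TileMap& tiles,
              std::size_t line, std::size_t index, char ch) {
    if (ch < 'A' || ch > 'Z' || index >= LINE_WIDTH_B) return false;
    std::size_t dst = line * LINE_SIZE + index;            // top row of the glyph cell
    std::size_t src = static_cast<std::size_t>(ch - 'A');  // glyph column in the tile map
    if (dst + 7 * LINE_WIDTH_B >= FB_SIZE) return false;   // bottom row must fit
    for (std::size_t row = 0; row < 8; ++row) {
        fb[dst] = tiles[src];
        dst += LINE_WIDTH_B;  // next pixel row in the framebuffer
        src += LINE_WIDTH_B;  // next pixel row of the same glyph
    }
    return true;
}
```

Both pointers advance by 32 bytes per row because the tile map stores all glyphs side by side, one byte per glyph per pixel row, in the same row-major shape as the framebuffer.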