OpenFloat

Code Generators for Floating-Point Unit Design in Integrated Circuits

OpenFloat is a parameterized floating-point unit (FPU) design generator developed using Chisel, a hardware construction language embedded in Scala. It generates synthesizable RTL for floating-point arithmetic operations targeting FPGA and ASIC implementations.

Overview

OpenFloat provides a library of highly configurable floating-point arithmetic modules that support multiple IEEE 754 precision formats. All modules are parameterized by a FloatingPointFormat trait, which defines the exponent and mantissa widths. This allows the same module to be instantiated for any standard format (FP16, BF16, FP32, FP64, FP128) or a custom format.

Project Structure

OpenFloat/
├── src/main/scala/
│   ├── FloatingPoint/          # Core FPU modules
│   │   ├── FloatingPointFormat.scala # Format definitions and FPModule base class
│   │   └── fpu.scala           # All floating-point operation implementations
│   ├── Primitives/             # Low-level building blocks
│   │   ├── primitives.scala    # Arithmetic primitives (adders, dividers, CORDIC, etc.)
│   │   └── convert.scala       # IEEE 754 format conversion utilities
│   ├── Generate/               # RTL generation utilities
│   │   └── generate.scala      # SystemVerilog output generator
│   └── TB/                     # Test benches
│       └── testbench.scala     # Verification test suites
├── build.sbt                   # SBT build configuration
├── LICENSE                     # BSD 3-Clause License
└── README.md                   # This file

Supported Floating-Point Formats

OpenFloat supports standard IEEE 754 formats, BFloat16, and custom formats via the FloatingPointFormat trait.

Format	Object	Total Bits	Sign	Exponent	Mantissa	Bias
Half	`FP16`	16	1	5	10	15
BFloat16	`BF16`	16	1	8	7	127
Single	`FP32`	32	1	8	23	127
Double	`FP64`	64	1	11	52	1023
Quad	`FP128`	128	1	15	112	16383

You can also define arbitrary custom formats using CustomFormat(exponent, mantissa).
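The Bias column in the table follows the usual IEEE 754 rule, bias = 2^(exponent - 1) - 1, and the total width is 1 + exponent + mantissa. The sketch below checks those relationships in plain Scala. `FormatSketch` is a hypothetical stand-in written for this README, not the repository's `FloatingPointFormat` trait, and it is an assumption that `CustomFormat` derives its bias the same way.

```scala
// Hypothetical stand-in for a format description (NOT the repository's
// FloatingPointFormat trait). It only reproduces the table's arithmetic.
final case class FormatSketch(exponent: Int, mantissa: Int) {
  require(exponent >= 2 && mantissa >= 1, "widths must be positive")

  // One sign bit plus the exponent and mantissa fields.
  val totalBits: Int = 1 + exponent + mantissa

  // IEEE 754 style exponent bias: 2^(exponent - 1) - 1.
  val bias: Int = (1 << (exponent - 1)) - 1
}

object FormatSketch {
  val FP16: FormatSketch  = FormatSketch(5, 10)
  val BF16: FormatSketch  = FormatSketch(8, 7)
  val FP32: FormatSketch  = FormatSketch(8, 23)
  val FP64: FormatSketch  = FormatSketch(11, 52)
  val FP128: FormatSketch = FormatSketch(15, 112)
}
```

For example, `FormatSketch(8, 7)` gives the same 16-bit width and bias of 127 as the BF16 row.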

Floating-Point Modules

Basic Arithmetic Operations

Module	Description	Parameters
`FP_add`	Floating-point addition	`FORMAT`: FloatingPointFormat, `latency`: pipeline depth (any value >= 1)
`FP_mult`	Floating-point multiplication	`FORMAT`: FloatingPointFormat, `latency`: pipeline depth (any value >= 1)
`FP_div`	Digit-recurrence division	`FORMAT`: FloatingPointFormat, `L`: iterations, `latency`: pipeline stages
`FP_sqrt`	Digit-recurrence square root	`FORMAT`: FloatingPointFormat, `L`: iterations, `latency`: pipeline stages

Transcendental Functions

Module	Description	Parameters
`FP_cos`	Cosine and Sine (CORDIC-based)	`FORMAT`: FloatingPointFormat, `iters`: CORDIC iterations
`FP_atan`	Arctangent (CORDIC-based)	`FORMAT`: FloatingPointFormat, `iters`: CORDIC iterations
`FP_exp`	Exponential function (e^x)	`FORMAT`: FloatingPointFormat

Utility Modules

Module	Description	Parameters
`FP_acc`	Floating-point accumulator	`FORMAT`: FloatingPointFormat, `iters`: accumulation count, `ExpExp`, `ExpMSB`, `LSB`
`FP_floor`	Floor function	`FORMAT`: FloatingPointFormat
`FloatTOFixed`	Float to fixed-point conversion	`FORMAT`: FloatingPointFormat, `ibits`: integer bits, `fbits`: fractional bits
`FixedTOFloat`	Fixed-point to float conversion	`FORMAT`: FloatingPointFormat, `ibits`: integer bits, `fbits`: fractional bits

Pipeline Depth

FP_add and FP_mult accept any latency >= 1. Pipeline registers are automatically distributed across 10 internal stage boundaries using the same pipe_skip/pipe_map algorithm used by the digit-recurrence primitives (divider, frac_sqrt, cordic). Higher latency values shorten the combinational path per stage and so ease timing closure, at the cost of more cycles between input and output; values above 10 stack additional registers at evenly spaced boundaries.
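To make the idea of spreading `latency` registers over 10 boundaries concrete, here is one possible even-distribution rule in plain Scala. It is only an illustration: the repository's pipe_skip/pipe_map algorithm is not reproduced here, and `PipelineSketch` and `registersPerBoundary` are hypothetical names, so the actual register placement may differ.

```scala
// Illustrative only: one way to spread `latency` registers over a fixed
// number of stage boundaries. This is NOT the repository's
// pipe_skip/pipe_map implementation.
object PipelineSketch {
  // Returns the number of registers placed at each boundary.
  // The counts always sum to `latency` and differ by at most one.
  def registersPerBoundary(latency: Int, boundaries: Int = 10): Vector[Int] = {
    require(latency >= 1, "latency must be at least 1")
    require(boundaries >= 1, "need at least one boundary")
    Vector.tabulate(boundaries) { i =>
      // Difference of two running totals; the totals telescope to `latency`.
      ((i + 1) * latency) / boundaries - (i * latency) / boundaries
    }
  }
}
```

Under this rule a latency of 10 places exactly one register at every boundary, a latency of 1 places a single register at the last boundary, and a latency of 12 doubles up at two of the ten boundaries.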

Primitive Modules

The Primitives package provides low-level building blocks:

Module	Description
`LZC`	Leading Zero Counter with tree-based reduction
`full_adder`	Parameterized width adder with carry
`full_subtractor`	Parameterized width subtractor with borrow
`multiplier`	Basic integer multiplication
`divider`	Digit-recurrence integer divider (pipelined)
`frac_sqrt`	Fractional square root for normalized numbers
`cordic`	Fixed-point CORDIC processor (rotation/vectoring modes)
`ucordic`	Universal CORDIC (circular, linear, hyperbolic modes)
`cos`, `atan`, `exp`	Fixed-point trigonometric/exponential wrappers

Software Conversion Utilities

The convert object provides Scala-side IEEE 754 conversion functions for testbench use:

Function	Description
`convert_string_to_IEEE_754(str, fmt)`	Converts a decimal string to an IEEE 754 bit pattern for any `FloatingPointFormat`
`convert_IEEE754_to_Decimal(num, fmt)`	Converts an IEEE 754 bit pattern back to a `BigDecimal` value

These accept any FloatingPointFormat (including BF16, CustomFormat, etc.), making it easy to generate test vectors and examine outputs for any format.
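When generating FP32 test vectors, it can help to cross-check bit patterns against the JVM's own encoding. The sketch below does not use the repository's convert object; it relies only on `java.lang.Float.floatToIntBits` and splits the result into sign, exponent, and mantissa fields. `Fp32Fields` is a hypothetical helper, and the BF16 conversion shown simply truncates the low 16 bits, which is an assumption rather than the rounding the library uses.

```scala
// Independent cross-check for FP32 bit patterns using the JVM encoding.
// Hypothetical helper; it does not call the repository's convert object.
object Fp32Fields {
  final case class Fields(sign: Int, exponent: Int, mantissa: Int)

  // Split an FP32 value into its 1/8/23-bit fields.
  def decode(value: Float): Fields = {
    val bits = java.lang.Float.floatToIntBits(value)
    Fields(
      sign     = (bits >>> 31) & 0x1,
      exponent = (bits >>> 23) & 0xFF,   // biased exponent (bias 127)
      mantissa = bits & 0x7FFFFF         // fraction bits, hidden one not included
    )
  }

  // BF16 shares FP32's sign and exponent layout, so the upper 16 bits of
  // an FP32 word form a BF16 word. This truncates (rounds toward zero).
  def toBf16BitsTruncated(value: Float): Int =
    (java.lang.Float.floatToIntBits(value) >>> 16) & 0xFFFF
}
```

For instance, `Fp32Fields.decode(1.0f)` yields a biased exponent of 127 with a zero mantissa, matching the FP32 row of the format table.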

Interface Specification

All modules implement a standard ready-valid handshaking protocol for flow control:

Input Interface

val in_ready = Output(Bool())  // Module can accept new input
val in_valid = Input(Bool())   // Input data is valid
val in_a     = Input(UInt())   // Input operand A
val in_b     = Input(UInt())   // Input operand B (for binary ops)

Output Interface

val out_ready = Input(Bool())  // Consumer ready to accept output
val out_valid = Output(Bool()) // Output data is valid
val out_s     = Output(UInt()) // Result

Handshake Protocol

A transaction occurs when both valid and ready are high on the same clock edge
in_ready indicates the module can accept new data
Backpressure propagates through module chains via the ready signals
The pipeline stalls when out_valid is high but out_ready is low
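The rules above can be modelled in software by looking at per-cycle traces of a valid signal and a ready signal. This is a minimal behavioural sketch in plain Scala, not Chisel and not code from the repository; `HandshakeSketch` and its method names are hypothetical.

```scala
// Behavioural model of a ready/valid interface over per-cycle traces.
// Hypothetical helper for reasoning about the protocol, not RTL.
object HandshakeSketch {
  // A transaction occurs only when valid and ready are high in the same cycle.
  def fires(valid: Boolean, ready: Boolean): Boolean = valid && ready

  // Number of completed transactions across the trace.
  def transfers(valid: Seq[Boolean], ready: Seq[Boolean]): Int =
    valid.zip(ready).count { case (v, r) => fires(v, r) }

  // Number of stalled cycles: data is offered but the consumer is not ready.
  def stalls(valid: Seq[Boolean], ready: Seq[Boolean]): Int =
    valid.zip(ready).count { case (v, r) => v && !r }
}
```

With `valid = Seq(true, true, false, true)` and `ready = Seq(true, false, true, true)`, the model reports two transfers and one stalled cycle (the second cycle, where data is valid but not accepted).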

Example Usage

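// Connect two modules in a chain (module_a feeds module_b)
module_a.io.out_ready := module_b.io.in_ready   // backpressure flows upstream
module_b.io.in_valid  := module_a.io.out_valid  // valid flows downstream
module_b.io.in_a      := module_a.io.out_s      // result of A becomes operand of B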

Usage

Prerequisites

Scala: 2.13.12
SBT: 1.7.2 or later
Chisel: 6.0.0 (managed by SBT)
Verilator: For simulation (optional)

Generating Verilog

To generate SystemVerilog RTL for a specific module, modify src/main/scala/Generate/generate.scala:

import chisel3._
import chisel3.stage.ChiselGeneratorAnnotation
import circt.stage.{ChiselStage, FirtoolOption}
// Package names as used in the instantiation examples in this README
import FloatingPoint._
import FloatingPoint.fpu._

object generate extends App {
  // Elaborates `mod` and emits SystemVerilog through firtool
  private def genVerilog(mod: => RawModule): Unit = {
    val gen: () => RawModule = () => mod
    (new ChiselStage).execute(
      Array("--target", "systemverilog"),
      Seq(ChiselGeneratorAnnotation(gen),
        FirtoolOption("--disable-all-randomization"),
        FirtoolOption("--strip-debug-info"),
        FirtoolOption("--disable-annotation-unknown")
      ),
    )
  }

  // Generate desired module
  genVerilog(new FP_mult(FP32, 7))  // 32-bit multiplier with 7-stage pipeline
}

Generated SystemVerilog files will be placed in the project root directory.

Module Instantiation Examples

Floating-Point Adder

import FloatingPoint._
import FloatingPoint.fpu._

// 32-bit adder with 7-stage pipeline
val adder = Module(new FP_add(FORMAT = FP32, latency = 7))
adder.io.out_ready := true.B
adder.io.in_valid := input_valid
adder.io.in_a := operand_a
adder.io.in_b := operand_b
val result = adder.io.out_s
val result_valid = adder.io.out_valid

Floating-Point Divider

// 32-bit divider with 23 iterations and 23-cycle latency
val divider = Module(new FP_div(FORMAT = FP32, L = 23, latency = 23))
divider.io.out_ready := downstream_ready
divider.io.in_valid := input_valid
divider.io.in_a := dividend
divider.io.in_b := divisor
val quotient = divider.io.out_s

CORDIC Cosine/Sine

// 32-bit cos/sin with 23 CORDIC iterations
val trig = Module(new FP_cos(FORMAT = FP32, iters = 23))
trig.io.out_ready := true.B
trig.io.in_valid := angle_valid
trig.io.in_angle := angle_ieee754
val cos_result = trig.io.out_cos
val sin_result = trig.io.out_sin

Algorithm Details

Division and Square Root

Both FP_div and FP_sqrt use digit-recurrence algorithms, computing one bit of the result per iteration. The L parameter controls the number of iterations (typically equal to mantissa width), while latency controls how iterations are distributed across pipeline stages.
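The one-bit-per-iteration idea can be illustrated with a radix-2 restoring division on integers. This is a software sketch under stated assumptions: the README does not say whether the hardware uses a restoring, non-restoring, or SRT recurrence, and `DigitRecurrenceSketch` and `divideFraction` are hypothetical names. The `iters` argument plays the role of the L parameter.

```scala
// Radix-2 restoring division, one quotient bit per iteration.
// Software illustration only; the hardware recurrence may differ.
object DigitRecurrenceSketch {
  // Computes floor(num * 2^iters / den) for 0 <= num < den,
  // i.e. the first `iters` fraction bits of num / den.
  def divideFraction(num: BigInt, den: BigInt, iters: Int): BigInt = {
    require(den > 0 && num >= 0 && num < den, "need 0 <= num < den")
    require(iters >= 0, "iteration count must be non-negative")
    var remainder = num
    var quotient  = BigInt(0)
    for (_ <- 0 until iters) {
      remainder = remainder << 1            // shift in the next bit position
      if (remainder >= den) {               // trial subtraction succeeds
        remainder -= den
        quotient = (quotient << 1) | 1      // emit a 1 bit
      } else {
        quotient = quotient << 1            // emit a 0 bit, remainder kept
      }
    }
    quotient
  }
}
```

For example, `divideFraction(1, 3, 8)` returns 85 (binary 01010101), the first eight fraction bits of 1/3. Each additional iteration adds one bit of precision, which is why L is typically chosen close to the mantissa width.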

CORDIC (COordinate Rotation DIgital Computer)

Trigonometric and hyperbolic functions use the CORDIC algorithm, which computes results through iterative shift-and-add operations. The ucordic module supports three modes:

Circular mode (mu = 1): cos, sin, atan
Linear mode (mu = 0): multiplication, division
Hyperbolic mode (mu = -1): sinh, cosh, atanh, exp, ln
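As an illustration of the circular mode, the following sketch runs CORDIC in rotation mode to obtain cosine and sine. It uses Double arithmetic for readability, whereas the hardware modules are fixed-point, so the multiplications by powers of two here correspond to shifts in RTL. `CordicSketch` and `cosSin` are hypothetical names, and the code is not taken from the repository's cordic or ucordic modules.

```scala
// Circular-mode CORDIC in rotation mode, modelled with Doubles.
// In fixed-point hardware each multiplication by 2^-i is a right shift.
object CordicSketch {
  // Returns (cos(theta), sin(theta)) for |theta| within the CORDIC
  // convergence range (about 1.74 rad) using `iters` micro-rotations.
  def cosSin(theta: Double, iters: Int): (Double, Double) = {
    require(iters >= 1, "need at least one iteration")
    require(math.abs(theta) <= 1.74, "angle outside convergence range")

    // Pre-scale by the inverse CORDIC gain so the result has unit magnitude.
    val invGain = (0 until iters)
      .map(i => 1.0 / math.sqrt(1.0 + math.pow(2.0, -2.0 * i)))
      .product

    var x = invGain
    var y = 0.0
    var z = theta                            // residual angle, driven to zero
    for (i <- 0 until iters) {
      val d     = if (z >= 0) 1.0 else -1.0  // rotation direction
      val shift = math.pow(2.0, -i)
      val xNext = x - d * y * shift
      val yNext = y + d * x * shift
      z -= d * math.atan(shift)              // table lookup in hardware
      x = xNext
      y = yNext
    }
    (x, y)
  }
}
```

The residual angle after n iterations is bounded by atan(2^-(n-1)), so each iteration contributes roughly one bit of accuracy. This is why the iters parameter is usually chosen near the mantissa width.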

Exponential (e^x)

FP_exp uses range reduction: it decomposes x / ln(2) = w + f into an integer part w and a fractional part f, so that e^x = 2^w * 2^f. A hyperbolic CORDIC engine computes 2^f as e^(f * ln(2)), while the integer part w is applied as an adjustment to the result's exponent. Constant multiplications by ln(2) and 1/ln(2) are implemented using Canonical Signed Digit (CSD) encoding, which turns each constant multiply into a multiplierless sequence of shifts and adds or subtracts.
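The two ideas in this paragraph, range reduction and CSD recoding, can be sketched in plain Scala. This is an illustration under stated assumptions, not the module's implementation: `ExpSketch` and its methods are hypothetical, `math.exp` stands in for the hyperbolic CORDIC engine, and `csdDigits` recodes a non-negative integer (such as a fixed-point constant) rather than the specific constants used in the RTL.

```scala
// Software illustration of range reduction for e^x and of CSD recoding.
// Hypothetical helper; math.exp stands in for the hyperbolic CORDIC engine.
object ExpSketch {
  private val Ln2 = math.log(2.0)

  // Split x / ln(2) into an integer part w and a fraction f in [0, 1).
  def rangeReduce(x: Double): (Int, Double) = {
    val t = x / Ln2
    val w = math.floor(t).toInt
    (w, t - w)
  }

  // e^x = 2^w * 2^f, with 2^f evaluated as e^(f * ln 2).
  def expViaRangeReduction(x: Double): Double = {
    val (w, f) = rangeReduce(x)
    val twoToF = math.exp(f * Ln2)          // CORDIC would compute this part
    java.lang.Math.scalb(twoToF, w)         // exponent adjustment by w
  }

  // Canonical signed-digit recoding, least significant digit first.
  // Digits are in {-1, 0, 1} and no two adjacent digits are non-zero.
  def csdDigits(n: Int): List[Int] = {
    require(n >= 0, "sketch handles non-negative constants only")
    var v = n.toLong
    val digits = List.newBuilder[Int]
    while (v != 0) {
      if ((v & 1L) == 1L) {
        val d = 2 - (v % 4).toInt           // +1 if v % 4 == 1, -1 if v % 4 == 3
        digits += d
        v -= d
      } else {
        digits += 0
      }
      v >>= 1
    }
    digits.result()
  }
}
```

For example, `csdDigits(7)` returns `List(-1, 0, 0, 1)`, meaning 7 = 8 - 1: multiplying by 7 needs one shift and one subtraction instead of the two additions implied by binary 111.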

Overflow/Underflow Handling

All modules implement saturation arithmetic:

Overflow: Result saturates to maximum representable value
Underflow: Result saturates to minimum normalized value
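The clamping behaviour can be sketched as a function on the biased exponent and mantissa fields. This is a software illustration under assumptions: it takes "maximum representable value" to mean the largest finite value (exponent field all ones minus one, mantissa all ones) rather than infinity, it does not model exact zero or subnormal inputs, and `SaturationSketch` and `saturate` are hypothetical names.

```scala
// Saturation of an out-of-range result into a packed bit pattern.
// Assumes "maximum representable" means the largest finite value.
object SaturationSketch {
  // `biasedExp` may be outside the legal field range after an operation.
  def saturate(sign: Int, biasedExp: Int, mantissa: BigInt,
               expBits: Int, mantBits: Int): BigInt = {
    val maxExp  = (1 << expBits) - 2                 // largest finite exponent field
    val mantMax = (BigInt(1) << mantBits) - 1        // all-ones mantissa
    val (e, m) =
      if (biasedExp > maxExp) (maxExp, mantMax)      // overflow: largest finite value
      else if (biasedExp < 1) (1, BigInt(0))         // underflow: smallest normal value
      else (biasedExp, mantissa & mantMax)           // in range: pass through
    (BigInt(sign & 1) << (expBits + mantBits)) | (BigInt(e) << mantBits) | m
  }
}
```

For FP32 (8 exponent bits, 23 mantissa bits), an overflowing positive result packs to 0x7F7FFFFF and an underflowing one to 0x00800000.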

License

BSD 3-Clause License

See LICENSE for full terms.

Copyright Notice

Code Generators for Floating-Point Unit Design in Integrated Circuits (OpenFloat) Copyright (c) 2025, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and University of Houston-Clear Lake, All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.

This software was developed under funding from the U.S. Department of Energy.
