SaifoSaeed/5-Stage-RISC-V-Processor

Five-Stage Pipelined RISC-V Processor

This repository contains the design and implementation of a five-stage pipelined RISC-V processor, developed for the Computer Architecture course.

The architecture targets high instruction throughput through a 2-bit dynamic branch predictor, separate instruction and data caches, and a hazard detection and forwarding unit.

Project Overview

This project presents the design, simulation, and verification of a 5-stage pipelined RISC-V processor. The core of the architecture divides instruction execution into five distinct stages: Instruction Fetch (IF), Instruction Decode (ID), Execution (EXE), Memory (MEM), and Write-back (WB). This pipelined approach allows for the parallel execution of instructions, leading to a significant performance increase over a single-cycle design.
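As a rough illustration of why overlapping the five stages helps, the sketch below counts cycles for an ideal pipeline with no stalls, flushes, or cache misses. The function name is hypothetical and the model is a textbook simplification, not a measurement of this design.

```python
def ideal_pipeline_cycles(num_instructions: int, stages: int = 5) -> int:
    """Cycles needed by an ideal pipeline with no stalls or flushes.

    The first instruction needs `stages` cycles to pass through the pipe;
    every later instruction completes one cycle after the previous one.
    """
    if num_instructions <= 0:
        return 0
    return stages + num_instructions - 1


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, "instructions ->", ideal_pipeline_cycles(n), "cycles")
```

For long programs the cycle count approaches one per instruction, which is where the ideal 5x figure comes from when each pipelined cycle is one fifth the length of a single-cycle one.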

The processor's functional correctness was validated against diverse workloads, confirming its effectiveness in handling arithmetic operations, memory accesses, and complex control flow.

Key Features

  • Five-Stage Pipeline: Decomposes instruction execution into IF, ID, EXE, MEM, and WB stages to improve throughput.
  • Dual Cache System: Implements separate caches for Instruction Memory (IM) and Data Memory (DM) to reduce memory access latency.
    • 4KB Direct-Mapped Instruction Cache.
    • 1KB Direct-Mapped Data Cache.
  • 2-Bit Branch Predictor: A dynamic branch predictor with a saturating counter is used to minimize control hazards and reduce the penalty of branch mispredictions to a single cycle.
  • Hazard Detection & Forwarding Unit: A sophisticated hazard detection unit works with a forwarding unit to resolve data hazards (including ALU-ALU and Load-Use dependencies), minimizing pipeline stalls and maintaining high throughput.
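The Load-Use case mentioned above cannot be fixed by forwarding alone, because the loaded value is not available until the end of MEM. A minimal sketch of the textbook stall condition follows; the function and parameter names are hypothetical, and the repository's hazard unit may use different signals.

```python
def load_use_stall(ex_is_load: bool, ex_rd: int, id_rs1: int, id_rs2: int) -> bool:
    """Return True when the instruction in ID must stall for one cycle.

    ex_is_load: the instruction currently in EXE is a load (assumed signal)
    ex_rd:      destination register of that load
    id_rs1/2:   source registers of the instruction currently in ID
    Simplification: rs2 is compared even for formats that do not use it.
    """
    if not ex_is_load:
        return False
    if ex_rd == 0:
        # x0 is hard-wired to zero in RISC-V, so it never creates a dependency.
        return False
    return ex_rd in (id_rs1, id_rs2)


if __name__ == "__main__":
    # lw x5, 0(x1) followed by add x6, x5, x2 -> one bubble is needed.
    print(load_use_stall(True, 5, 5, 2))
```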

Processor Architecture

The top-level module encapsulates all components, allowing them to interact within a structured environment. The diagram below illustrates the complete datapath of the five-stage pipelined processor.


Design and Implementation

The processor is organized into distinct modules, one for each pipeline stage plus the supporting units.

1. Instruction Fetch (IF)

The IF stage fetches instructions from a direct-mapped instruction cache. In the case of a cache miss, the instruction is retrieved from the main instruction memory. The stage also works with a 2-bit branch predictor to predict the next PC address for branching instructions. A correct prediction saves a cycle, while a misprediction costs a one-cycle flush, which is no worse than fetching without prediction.
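The behaviour of a 2-bit saturating counter can be sketched in a few lines. This is a generic model with a single counter; the class name, the state encoding (0 = strongly not taken, 3 = strongly taken), and the helper function are assumptions, not taken from the repository's source.

```python
class TwoBitPredictor:
    """Single 2-bit saturating counter: states 0..3, predict taken when >= 2."""

    def __init__(self, state: int = 1) -> None:
        if not 0 <= state <= 3:
            raise ValueError("state must be in 0..3")
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, initial_state: int = 1) -> int:
    """Replay a sequence of branch outcomes and count wrong predictions."""
    predictor = TwoBitPredictor(initial_state)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]  # taken 9 times, then falls through
    for start in range(4):
        print("initial state", start, "->", count_mispredictions(loop_branch, start))
```

Replaying the same loop from each of the four initial states shows how the starting state changes only the warm-up mispredictions, while the final loop exit is always mispredicted.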

2. Instruction Decode (ID)

The ID stage decodes instructions to generate control signals and operands. It houses the PC Controller, which manages branch predictions and jump instructions, triggering pipeline flushes in the event of a misprediction. This stage also reads from the Register File and generates immediate values for subsequent stages.
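Immediate generation follows the bit layouts defined by the RISC-V base ISA. The sketch below covers the I, S, and B formats only; the function names are hypothetical and the repository's decoder may be structured differently.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def imm_i(inst: int) -> int:
    """I-type: imm[11:0] = inst[31:20]."""
    inst &= 0xFFFFFFFF
    return sign_extend(inst >> 20, 12)


def imm_s(inst: int) -> int:
    """S-type: imm[11:5] = inst[31:25], imm[4:0] = inst[11:7]."""
    inst &= 0xFFFFFFFF
    raw = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F)
    return sign_extend(raw, 12)


def imm_b(inst: int) -> int:
    """B-type: imm[12|10:5] = inst[31|30:25], imm[4:1|11] = inst[11:8|7]."""
    inst &= 0xFFFFFFFF
    raw = (
        (((inst >> 31) & 0x1) << 12)
        | (((inst >> 7) & 0x1) << 11)
        | (((inst >> 25) & 0x3F) << 5)
        | (((inst >> 8) & 0xF) << 1)
    )
    return sign_extend(raw, 13)


if __name__ == "__main__":
    print(imm_i(0xFFF00093))  # addi x1, x0, -1
    print(imm_s(0x0020A423))  # sw   x2, 8(x1)
    print(imm_b(0xFE000EE3))  # beq  x0, x0, -4
```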

3. Execution (EXE)

This stage performs arithmetic and logic operations using the ALU. To handle data dependencies, it features a comprehensive hazard detection unit and a forwarding unit, which work together to provide the most recent register data to the ALU. Detection is performed by examining the IF and ID stages, while forwarding decisions are made by checking the EXE/MEM and MEM/WB registers.
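The forwarding decision described above can be sketched as a priority check, newest result first. This follows the standard textbook formulation; the function name, the string labels, and the signal names are assumptions rather than the repository's actual ports.

```python
def forward_select(
    rs: int,
    ex_mem_rd: int,
    ex_mem_reg_write: bool,
    mem_wb_rd: int,
    mem_wb_reg_write: bool,
) -> str:
    """Choose where an ALU operand for source register `rs` should come from.

    Returns "EX_MEM", "MEM_WB", or "REG_FILE" (hypothetical labels).
    The EX/MEM register is checked first because it holds the newer value.
    """
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX_MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM_WB"
    return "REG_FILE"


if __name__ == "__main__":
    # add x5, x1, x2 ; sub x6, x5, x3 -> x5 is forwarded from EX/MEM.
    print(forward_select(5, ex_mem_rd=5, ex_mem_reg_write=True,
                         mem_wb_rd=0, mem_wb_reg_write=False))
```

The same check would be applied independently to each of the two ALU source operands.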

4. Memory Access (MEM)

The MEM stage handles all data memory operations, including loads and stores. It contains a direct-mapped data cache that uses a write-through policy to stay consistent with main memory. In a real memory hierarchy, a single miss that goes to main memory can cost hundreds of cycles, making this stage's efficiency critical to overall performance.
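A behavioural sketch of a direct-mapped, write-through cache is shown below. The 1 KB capacity matches the data cache listed under Key Features; the one-word (4-byte) line size, the no-write-allocate choice, the word-aligned accesses, and all names are assumptions made for illustration.

```python
WORD_BYTES = 4  # assumed line size: one 32-bit word per cache line


class DirectMappedCache:
    def __init__(self, memory: dict, size_bytes: int = 1024) -> None:
        self.memory = memory  # backing store: word-aligned address -> value
        self.num_lines = size_bytes // WORD_BYTES
        self.valid = [False] * self.num_lines
        self.tags = [0] * self.num_lines
        self.data = [0] * self.num_lines
        self.hits = 0
        self.misses = 0

    def _split(self, addr: int):
        """Split a word-aligned byte address into (index, tag)."""
        block = addr // WORD_BYTES
        return block % self.num_lines, block // self.num_lines

    def read(self, addr: int) -> int:
        index, tag = self._split(addr)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fetch from main memory and replace the line.
            self.misses += 1
            self.valid[index] = True
            self.tags[index] = tag
            self.data[index] = self.memory.get(addr, 0)
        return self.data[index]

    def write(self, addr: int, value: int) -> None:
        index, tag = self._split(addr)
        # Write-through: main memory is always updated.
        self.memory[addr] = value
        # Assumed no-write-allocate: only update the cache on a hit.
        if self.valid[index] and self.tags[index] == tag:
            self.data[index] = value


if __name__ == "__main__":
    mem = {0: 11, 1024: 22}
    cache = DirectMappedCache(mem)
    cache.read(0)       # miss
    cache.read(0)       # hit
    cache.read(1024)    # conflict miss: same index as address 0
    print(cache.hits, cache.misses)
```

Addresses 0 and 1024 map to the same line in a 1 KB direct-mapped cache, which is the kind of conflict a load/store test can exercise.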

5. Write-Back (WB)

The final stage writes the results of operations back into the register file. While simple, this stage leaves room for future expansion with features like dynamic instruction scheduling or exception handling.


Performance Analysis

Compared to a non-pipelined single-cycle processor, this design offers a significant improvement in execution speed. An ideal five-stage pipeline would approach a 5x speed-up, but data hazards and branch mispredictions introduce stalls and flushes that keep the real figure below that.

A more realistic performance analysis based on simulation yields an improvement of 3.04x over a non-pipelined architecture.

Improvement = (8193 * 50 ps) / (13491 * 10 ps) = 3.04

This calculation demonstrates that while not achieving the ideal five-fold improvement, the pipelined design is substantially faster.
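The arithmetic can be reproduced directly. The sketch below assumes the four numbers are cycle counts and clock periods (8193 cycles at 50 ps for the non-pipelined design, 13491 cycles at 10 ps for the pipelined one); that reading is inferred from the formula, and the function name is hypothetical.

```python
def speedup(base_cycles: int, base_period_ps: float,
            new_cycles: int, new_period_ps: float) -> float:
    """Ratio of execution times: (cycles * period) of base over new design."""
    base_time = base_cycles * base_period_ps
    new_time = new_cycles * new_period_ps
    return base_time / new_time


if __name__ == "__main__":
    improvement = speedup(8193, 50, 13491, 10)
    print(f"{improvement:.2f}")  # prints 3.04
```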

Testing and Validation

The processor's functionality was rigorously tested using four different benchmarks that cover the required ISA and handle numerous edge cases. The output was verified against the RARS simulator.

The testbenches included:

  • Arithmetic and Logical Operations: To verify ALU functionality and the processing of non-dependent instructions.
  • Logical Shifting: To ensure the validity of shift instructions and the EXE stage as a whole.
  • Loads and Stores: Tests loads and stores in combination with logical and arithmetic instructions to validate memory access and cache functionality.
  • Loops and Branches: A loop test was used to validate the branch handling and prediction mechanism, which passed successfully under different initial states of the saturating counter.
