SemantiC

A Static Code Analyzer & Mini Compiler Pipeline for a simplified C-like language, built entirely in C++17.

SemantiC takes a source file and runs it through eight compiler stages (lexing, parsing, semantic analysis, IR generation, static analysis, optimization, dependency analysis, and parallelism/vectorization), printing the output of each stage so you can follow what the pipeline does to the program.


Features

  • Lexical Analysis — Tokenizes source code into keywords, identifiers, literals, and operators
  • Recursive Descent Parser — Builds an Abstract Syntax Tree (AST) from an LL(1) grammar
  • Semantic Analysis — Type checking, scope validation, and error/warning diagnostics
  • Symbol Table — Scoped variable/parameter/function tracking with shadowing support
  • 3-Address IR Generation — Lowers AST to intermediate representation with temporaries and labels
  • Static Analysis — Detects unused variables, unreachable code, dead code, and redundant assignments
  • IR Optimization — Constant folding, constant propagation, common subexpression elimination (CSE), and dead code elimination (DCE)
  • Data Dependency Analysis — Identifies RAW, WAR, and WAW dependencies between instructions
  • Parallelism Suggestions — Finds instruction pairs that can safely execute in parallel
  • Vectorization Hints — Analyzes for loops for SIMD-friendliness
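
To make the "scoped tracking with shadowing" feature concrete, here is a minimal sketch of a scoped symbol table. The class and method names (`ScopedSymbolTable`, `enterScope`, `declare`, `lookup`) are hypothetical and chosen for illustration only; the actual interface lives in `symbol_table.hpp` and may differ.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: these names are assumptions, not SemantiC's real API.
enum class Type { Int, Float, Void };

class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }  // scope 0 acts as the global scope

    void enterScope() { scopes_.emplace_back(); }

    // The global scope is never popped.
    void exitScope() {
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Returns false if the name is already declared in the current scope.
    bool declare(const std::string& name, Type type) {
        return scopes_.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward, so an inner declaration
    // shadows an outer one with the same name.
    std::optional<Type> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, Type>> scopes_;
};
```

In this sketch, redeclaring a name in the same scope is rejected, while declaring it again after `enterScope()` succeeds and hides the outer entry until `exitScope()` is called.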

Supported Language

SemantiC analyzes a C-like language supporting:

Category       Features
Types          int, float, void
Declarations   Global/local variables, functions with parameters
Control Flow   if/else, while, for, break, continue, return
Operators      + - * / % == != < > <= >= && || ! =
Other          Function calls, nested expressions, // and /* */ comments
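
As a rough illustration of how the language elements above could be tokenized, the following sketch recognizes the listed keywords, operators, literals, and both comment styles while tracking line and column. `TokenKind`, `Token`, and `tokenize` are hypothetical names; the real lexer in `lexer.cpp` may be organized differently.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative only: these types are assumptions, not SemantiC's real API.
enum class TokenKind { Keyword, Identifier, IntLiteral, FloatLiteral, Operator, Punct };

struct Token {
    TokenKind kind;
    std::string text;
    int line;
    int col;
};

std::vector<Token> tokenize(const std::string& src) {
    static const std::unordered_set<std::string> keywords = {
        "int", "float", "void", "if", "else", "while", "for", "break", "continue", "return"};
    static const std::unordered_set<std::string> twoCharOps = {"==", "!=", "<=", ">=", "&&", "||"};

    std::vector<Token> tokens;
    std::size_t i = 0;
    int line = 1, col = 1;
    // Consumes n characters while keeping line/column in sync.
    auto advance = [&](std::size_t n) {
        for (; n > 0 && i < src.size(); --n, ++i) {
            if (src[i] == '\n') { ++line; col = 1; } else { ++col; }
        }
    };
    auto isDigitAt = [&](std::size_t k) {
        return k < src.size() && std::isdigit(static_cast<unsigned char>(src[k])) != 0;
    };

    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { advance(1); continue; }
        if (src.compare(i, 2, "//") == 0) {  // line comment
            while (i < src.size() && src[i] != '\n') advance(1);
            continue;
        }
        if (src.compare(i, 2, "/*") == 0) {  // block comment
            advance(2);
            while (i < src.size() && src.compare(i, 2, "*/") != 0) advance(1);
            advance(2);
            continue;
        }
        const int startLine = line, startCol = col;
        std::size_t j = i;  // one past the end of the token
        TokenKind kind = TokenKind::Punct;
        if (std::isalpha(c) || c == '_') {
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) ++j;
            kind = keywords.count(src.substr(i, j - i)) ? TokenKind::Keyword : TokenKind::Identifier;
        } else if (std::isdigit(c)) {
            while (isDigitAt(j)) ++j;
            kind = TokenKind::IntLiteral;
            if (j < src.size() && src[j] == '.') {  // fractional part makes it a float
                ++j;
                while (isDigitAt(j)) ++j;
                kind = TokenKind::FloatLiteral;
            }
        } else if (twoCharOps.count(src.substr(i, 2))) {
            j = i + 2;
            kind = TokenKind::Operator;
        } else {
            // Any other character is a one-character operator or, in this
            // sketch, plain punctuation (parentheses, braces, ';', ',').
            j = i + 1;
            if (std::string("+-*/%<>!=").find(src[i]) != std::string::npos) kind = TokenKind::Operator;
        }
        tokens.push_back({kind, src.substr(i, j - i), startLine, startCol});
        advance(j - i);
    }
    return tokens;
}
```

For the input `int g = 1;`, this sketch reports the first four tokens at columns 1, 5, 7, and 9 of line 1, matching the positions shown in the TOKENS sample output.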

Build

Requirements: C++17 compiler, CMake 3.16+

mkdir build
cd build
cmake ..
cmake --build .

On Windows with MSVC:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Usage

# Analyze a source file
./semantic tests/sample.sc

# Read from stdin
cat tests/test_1.c | ./semantic

Sample Output Stages

========== TOKENS ==========
line 1, col 1 int 'int'
line 1, col 5 Identifier 'g'
line 1, col 7 = '='
line 1, col 9 IntLiteral '1'
...

========== PARSE / AST ==========
Program
  GlobalVar g : int
    IntLiteral 1
  Fun main -> int
    Block
      VarDecl i : int
      ...

========== SEMANTICS ==========
--- Symbol table (functions) ---
add(int, int) -> int @ line 4, col 1
main() -> int @ line 10, col 1
...

========== IR (3-address) ==========
main:
; function main
  i = 0
  t0 = 2 * 3
  j = t0
  ...

========== STATIC ANALYSIS ==========
[unused] variable 'unused_global' is never read
[redundant] redundant assignment to 'k' overwritten ...

========== OPTIMIZATION ==========
IR instructions: 35 -> 28
  i = 0
  j = 6
  ...

========== DATA DEPENDENCIES ==========
RAW 'i': instr 3 -> 5
WAW 'k': instr 6 -> 7
...

========== PARALLELISM ==========
May run in parallel: IR 3 and IR 5 — no recorded data dependence

========== VECTORIZATION ==========
line 24, col 5 (loop var 'z'): loop appears SIMD-friendly
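
In the sample above, the IR stage emits `t0 = 2 * 3` followed by `j = t0`, and the OPTIMIZATION stage prints `j = 6`. The sketch below shows one way constant folding, constant propagation, and removal of dead temporaries can produce that result. It assumes straight-line code with no labels or branches, integer constants only, and temporaries named `t` followed by digits. The `Instr` layout and function names are hypothetical and are not taken from `optimizer.cpp`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Assumed IR shape: `dest = lhs` (op empty) or `dest = lhs op rhs`.
struct Instr {
    std::string dest, lhs, op, rhs;
};

// Parses an optionally negative decimal integer; anything else is not a constant.
static std::optional<long> asConstant(const std::string& s) {
    const std::size_t start = (!s.empty() && s[0] == '-') ? 1 : 0;
    if (start == s.size()) return std::nullopt;
    for (std::size_t k = start; k < s.size(); ++k)
        if (!std::isdigit(static_cast<unsigned char>(s[k]))) return std::nullopt;
    return std::stol(s);
}

// Assumption of this sketch: temporaries look like t0, t1, ...
static bool isTemp(const std::string& name) {
    return name.size() > 1 && name[0] == 't' &&
           std::all_of(name.begin() + 1, name.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

std::vector<Instr> foldAndPropagate(std::vector<Instr> code) {
    std::unordered_map<std::string, std::string> known;  // name -> constant text
    for (Instr& in : code) {
        // Propagation: replace operands whose value is a known constant.
        for (std::string* operand : {&in.lhs, &in.rhs}) {
            auto it = known.find(*operand);
            if (it != known.end()) *operand = it->second;
        }
        // Folding: evaluate the operation when both operands are constants.
        const auto a = asConstant(in.lhs), b = asConstant(in.rhs);
        if (!in.op.empty() && a && b) {
            std::optional<long> v;
            if (in.op == "+") v = *a + *b;
            else if (in.op == "-") v = *a - *b;
            else if (in.op == "*") v = *a * *b;
            else if (in.op == "/" && *b != 0) v = *a / *b;
            else if (in.op == "%" && *b != 0) v = *a % *b;
            if (v) { in.lhs = std::to_string(*v); in.op.clear(); in.rhs.clear(); }
        }
        // Record or invalidate what is known about the destination.
        if (in.op.empty() && asConstant(in.lhs)) known[in.dest] = in.lhs;
        else known.erase(in.dest);
    }
    return code;
}

// Backward pass: drop assignments to temporaries that are never read later.
std::vector<Instr> dropDeadTemps(const std::vector<Instr>& code) {
    std::unordered_set<std::string> live;
    std::vector<Instr> kept;
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        if (isTemp(it->dest) && !live.count(it->dest)) continue;
        live.erase(it->dest);
        live.insert(it->lhs);
        live.insert(it->rhs);
        kept.push_back(*it);
    }
    std::reverse(kept.begin(), kept.end());
    return kept;
}
```

Calling `dropDeadTemps(foldAndPropagate(code))` on the three instructions `i = 0`, `t0 = 2 * 3`, `j = t0` leaves `i = 0` and `j = 6`. A real optimizer would also need to reset the known-constant map at labels and branch targets, which this sketch leaves out.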

Pipeline Architecture

Source Code (.c / .sc)
       │
       ▼
  1. Lexer ──────────── Breaks code into tokens
       │
       ▼
  2. Parser ─────────── Builds Abstract Syntax Tree (AST)
       │
       ▼
  3. Semantic Analyzer ─ Type checking, scope validation
       │
       ▼
  4. IR Generator ───── Emits 3-address intermediate code
       │
       ├──▶ 5. Static Analysis ─── Finds bugs (unused vars, dead code)
       │
       ▼
  6. Optimizer ──────── Constant folding, CSE, DCE
       │
       ▼
  7. Dependency Analysis ── RAW / WAR / WAW detection
       │
       ▼
  8. Parallelism & Vectorization ── Execution hints
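
Stages 7 and 8 can be pictured with a small sketch: classify the dependence between every ordered pair of instructions, then treat a pair as a parallelism candidate when no dependence was recorded between them. `DefUse`, `Dependence`, `findDependences`, and `mayRunInParallel` are hypothetical names, and the pairwise comparison is a deliberately conservative simplification rather than the algorithm used in `dependency.cpp`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative only: these types are assumptions, not SemantiC's real API.
enum class DepKind { RAW, WAR, WAW };

// One IR instruction reduced to the variable it writes and the ones it reads.
struct DefUse {
    std::string def;                // empty if the instruction writes nothing
    std::vector<std::string> uses;
};

struct Dependence {
    DepKind kind;
    std::string var;
    std::size_t from, to;           // instruction indices, from < to
};

std::vector<Dependence> findDependences(const std::vector<DefUse>& code) {
    auto reads = [](const DefUse& in, const std::string& v) {
        return std::find(in.uses.begin(), in.uses.end(), v) != in.uses.end();
    };
    std::vector<Dependence> deps;
    for (std::size_t i = 0; i < code.size(); ++i) {
        for (std::size_t j = i + 1; j < code.size(); ++j) {
            const DefUse& a = code[i];
            const DefUse& b = code[j];
            // RAW: the later instruction reads what the earlier one wrote.
            if (!a.def.empty() && reads(b, a.def)) deps.push_back({DepKind::RAW, a.def, i, j});
            // WAR: the later instruction overwrites what the earlier one read.
            if (!b.def.empty() && reads(a, b.def)) deps.push_back({DepKind::WAR, b.def, i, j});
            // WAW: both instructions write the same variable.
            if (!a.def.empty() && a.def == b.def) deps.push_back({DepKind::WAW, a.def, i, j});
        }
    }
    return deps;
}

// True when no direct dependence was recorded between instructions i and j.
bool mayRunInParallel(const std::vector<Dependence>& deps, std::size_t i, std::size_t j) {
    const std::size_t lo = std::min(i, j), hi = std::max(i, j);
    return std::none_of(deps.begin(), deps.end(), [&](const Dependence& d) {
        return d.from == lo && d.to == hi;
    });
}
```

Note that `mayRunInParallel` only checks for a direct dependence between the two instructions. A scheduler that actually reorders code would also have to rule out dependences that pass through the instructions in between.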

Project Structure

SemantiC/
├── include/                    # Header files (.hpp)
│   ├── token.hpp                  Token types & Token struct
│   ├── lexer.hpp                  Lexer API
│   ├── location.hpp               Source location tracking
│   ├── ast.hpp                    AST node definitions
│   ├── parser.hpp                 Parser class & grammar
│   ├── symbol_table.hpp           Symbol table
│   ├── semantic.hpp               Semantic analyzer
│   ├── ir.hpp                     IR instructions & generator
│   ├── optimizer.hpp              IR optimizer
│   ├── static_analysis.hpp        Static analysis
│   ├── dependency.hpp             Dependency analysis
│   ├── parallel.hpp               Parallelism suggestions
│   └── vectorize.hpp              Vectorization hints
│
├── src/                        # Source files (.cpp)
│   ├── main.cpp                   Entry point (runs all stages)
│   ├── lexer.cpp                  Lexer implementation
│   ├── parser.cpp                 Parser implementation
│   ├── ast_print.cpp              AST pretty printer
│   ├── symbol_table.cpp           Symbol table operations
│   ├── semantic.cpp               Semantic checks
│   ├── ir.cpp                     IR code generation
│   ├── optimizer.cpp              Optimization passes
│   ├── static_analysis.cpp        Bug detection
│   ├── dependency.cpp             RAW/WAR/WAW analysis
│   ├── parallel_analysis.cpp      Parallel execution hints
│   └── vector_analysis.cpp        SIMD vectorization analysis
│
├── tests/                      # Sample test programs
│   ├── sample.sc                  Full-featured sample
│   ├── test_1.c                   Error detection test
│   └── test_2.c                   Simple test
│
├── docs/                       # Detailed documentation
│   ├── 01_Project_Overview.md
│   ├── 02_Lexical_Analysis.md
│   ├── 03_Parsing_and_AST.md
│   ├── 04_Symbol_Table.md
│   ├── 05_Semantic_Analysis.md
│   ├── 06_IR_Generation.md
│   ├── 07_Static_Analysis.md
│   ├── 08_Optimization.md
│   ├── 09_Dependency_Analysis.md
│   ├── 10_Parallelism_Analysis.md
│   ├── 11_Vectorization_Analysis.md
│   ├── 12_Main_Pipeline.md
│   └── 13_Test_Files.md
│
├── CMakeLists.txt              # Build configuration
├── LICENSE                     # MIT License
└── README.md                   # This file

Documentation

Detailed documentation for each compiler stage is available in the docs/ folder. Each file explains the theory, data structures, algorithms, and code for one topic.


License

This project is licensed under the MIT License — see LICENSE for details.

Author: khajan_bhatt
