A Static Code Analyzer & Mini Compiler Pipeline for a simplified C-like language, built entirely in C++17.
SemantiC takes a source file and runs it through 8 compiler stages — Lexing, Parsing, Semantic Analysis, IR Generation, Static Analysis, Optimization, Dependency Analysis, and Parallelism/Vectorization — printing the output of every stage so you can see exactly what happens inside a compiler.
- Lexical Analysis — Tokenizes source code into keywords, identifiers, literals, and operators
- Recursive Descent Parser — Builds an Abstract Syntax Tree (AST) from an LL(1) grammar
- Semantic Analysis — Type checking, scope validation, and error/warning diagnostics
- Symbol Table — Scoped variable/parameter/function tracking with shadowing support
- 3-Address IR Generation — Lowers AST to intermediate representation with temporaries and labels
- Static Analysis — Detects unused variables, unreachable code, dead code, and redundant assignments
- IR Optimization — Constant folding, constant propagation, common subexpression elimination (CSE), and dead code elimination (DCE)
- Data Dependency Analysis — Identifies RAW, WAR, and WAW dependencies between instructions
- Parallelism Suggestions — Finds instruction pairs that can safely execute in parallel
- Vectorization Hints — Analyzes `for` loops for SIMD-friendliness
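The optimizer feature above names constant folding and constant propagation. These two passes feed each other on 3-address code: propagation substitutes known literal values into operands, and folding evaluates any instruction whose operands have all become literals. The following is a minimal C++17 sketch of that interplay. `Instr`, `evalInt`, and `foldConstants` are hypothetical names and do not come from `ir.hpp` or `optimizer.hpp`. The sketch also assumes straight-line code (no labels or jumps), integer operands only, and no signed overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical 3-address instruction: dst = lhs op rhs, or dst = lhs when op is empty.
// Illustrative only; not SemantiC's actual IR representation.
struct Instr {
    std::string dst, lhs, op, rhs;
};

// True if the operand is an integer literal such as "6" or "-2".
bool isIntLiteral(const std::string& s) {
    std::size_t i = (!s.empty() && s[0] == '-') ? 1 : 0;
    if (i >= s.size()) return false;
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// Evaluates a binary int operation; nullopt for unknown ops or division by zero.
// Assumes the result does not overflow long long.
std::optional<long long> evalInt(long long a, const std::string& op, long long b) {
    if (op == "+") return a + b;
    if (op == "-") return a - b;
    if (op == "*") return a * b;
    if ((op == "/" || op == "%") && b == 0) return std::nullopt;  // leave it for runtime
    if (op == "/") return a / b;
    if (op == "%") return a % b;
    return std::nullopt;
}

// One forward pass of constant propagation + constant folding over straight-line code.
std::vector<Instr> foldConstants(std::vector<Instr> code) {
    std::map<std::string, std::string> known;  // variable -> literal it currently holds
    auto subst = [&](std::string& operand) {
        auto it = known.find(operand);
        if (it != known.end()) operand = it->second;  // propagation
    };
    for (Instr& in : code) {
        subst(in.lhs);
        if (!in.op.empty()) {
            subst(in.rhs);
            if (isIntLiteral(in.lhs) && isIntLiteral(in.rhs)) {
                if (auto v = evalInt(std::stoll(in.lhs), in.op, std::stoll(in.rhs))) {
                    in.lhs = std::to_string(*v);  // folding: dst = literal
                    in.op.clear();
                    in.rhs.clear();
                }
            }
        }
        // Remember the destination's value if it is now a literal; otherwise forget it.
        if (in.op.empty() && isIntLiteral(in.lhs)) known[in.dst] = in.lhs;
        else known.erase(in.dst);
    }
    return code;
}
```

Given `i = 0`, `t0 = 2 * 3`, `j = t0`, this pass rewrites the second instruction to `t0 = 6` and the third to `j = 6`. A separate dead code elimination pass could then drop `t0` if nothing else reads it.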
SemantiC analyzes a C-like language supporting:
| Category | Features |
|---|---|
| Types | int, float, void |
| Declarations | Global/local variables, functions with parameters |
| Control Flow | if/else, while, for, break, continue, return |
| Operators | `+` `-` `*` `/` `%` `==` `!=` `<` `>` `<=` `>=` `&&` `\|\|` `!` `=` |
| Other | Function calls, nested expressions, `//` and `/* */` comments |
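Several operators in the table share a prefix with a shorter one (`<` and `<=`, `!` and `!=`, `=` and `==`). A lexer usually resolves this with the maximal munch rule: it always takes the longest operator that matches at the current position. The C++17 sketch below shows that rule. `Kind`, `Token`, and `lex` are illustrative names and are not the API in `token.hpp` or `lexer.hpp`. For brevity, it skips comments and source locations.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only.
enum class Kind { Keyword, Identifier, IntLiteral, FloatLiteral, Operator, Punct };

struct Token {
    Kind kind;
    std::string text;
};

const std::set<std::string> kKeywords = {"int",   "float", "void",  "if",       "else",
                                         "while", "for",   "break", "continue", "return"};
const std::set<std::string> kTwoCharOps = {"==", "!=", "<=", ">=", "&&", "||"};
const std::string kOneCharOps = "+-*/%<>!=";
const std::string kPunct = "(){};,";

bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isIdentChar(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_'; }

// Tokenizes source text; comments are not handled in this sketch.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {  // keyword or identifier
            std::size_t j = i;
            while (j < src.size() && isIdentChar(src[j])) ++j;
            std::string word = src.substr(i, j - i);
            out.push_back({kKeywords.count(word) ? Kind::Keyword : Kind::Identifier, word});
            i = j;
            continue;
        }
        if (isDigit(c)) {  // int literal, or float literal if a '.' follows the digits
            std::size_t j = i;
            while (j < src.size() && isDigit(src[j])) ++j;
            Kind k = Kind::IntLiteral;
            if (j < src.size() && src[j] == '.') {
                k = Kind::FloatLiteral;
                ++j;
                while (j < src.size() && isDigit(src[j])) ++j;
            }
            out.push_back({k, src.substr(i, j - i)});
            i = j;
            continue;
        }
        // Maximal munch: try two-character operators before one-character ones.
        if (i + 1 < src.size() && kTwoCharOps.count(src.substr(i, 2))) {
            out.push_back({Kind::Operator, src.substr(i, 2)});
            i += 2;
            continue;
        }
        if (kOneCharOps.find(c) != std::string::npos) {
            out.push_back({Kind::Operator, std::string(1, c)});
            ++i;
            continue;
        }
        if (kPunct.find(c) != std::string::npos) {
            out.push_back({Kind::Punct, std::string(1, c)});
            ++i;
            continue;
        }
        throw std::runtime_error(std::string("unexpected character '") + c + "'");
    }
    return out;
}
```

For example, `lex("if (a <= 10 && b != 2.5) x = !y;")` yields `<=`, `&&`, and `!=` as single operator tokens and `2.5` as one float literal, while `= !y` yields the separate tokens `=` and `!`.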
Requirements: C++17 compiler, CMake 3.16+

```shell
mkdir build
cd build
cmake ..
cmake --build .
```

On Windows with MSVC:

```shell
mkdir build
cd build
cmake ..
cmake --build . --config Release
```

Usage:

```shell
# Analyze a source file
./semantic tests/sample.sc

# Read from stdin
cat tests/test_1.c | ./semantic
```

Example output (abbreviated):

```text
========== TOKENS ==========
line 1, col 1 int 'int'
line 1, col 5 Identifier 'g'
line 1, col 7 = '='
line 1, col 9 IntLiteral '1'
...
========== PARSE / AST ==========
Program
GlobalVar g : int
IntLiteral 1
Fun main -> int
Block
VarDecl i : int
...
========== SEMANTICS ==========
--- Symbol table (functions) ---
add(int, int) -> int @ line 4, col 1
main() -> int @ line 10, col 1
...
========== IR (3-address) ==========
main:
; function main
i = 0
t0 = 2 * 3
j = t0
...
========== STATIC ANALYSIS ==========
[unused] variable 'unused_global' is never read
[redundant] redundant assignment to 'k' overwritten ...
========== OPTIMIZATION ==========
IR instructions: 35 -> 28
i = 0
j = 6
...
========== DATA DEPENDENCIES ==========
RAW 'i': instr 3 -> 5
WAW 'k': instr 6 -> 7
...
========== PARALLELISM ==========
May run in parallel: IR 3 and IR 5 — no recorded data dependence
========== VECTORIZATION ==========
line 24, col 5 (loop var 'z'): loop appears SIMD-friendly
```
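The RAW, WAR, and WAW lines in the output follow the standard definitions for a pair of instructions where one comes earlier than the other:

- Read-after-write: the later instruction reads a variable the earlier one writes.
- Write-after-read: the later instruction overwrites a variable the earlier one reads.
- Write-after-write: both instructions write the same variable.

The C++17 sketch below classifies these three cases over straight-line code and adds a pairwise "no recorded dependence" check like the one the parallelism stage reports. `InstrRW`, `findDependences`, and `mayRunInParallel` are hypothetical names. The quadratic pairwise scan is an assumption, not necessarily SemantiC's algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read/write summary of one IR instruction.
struct InstrRW {
    std::string def;                // variable written (empty if none)
    std::vector<std::string> uses;  // variables read
};

enum class DepKind { RAW, WAR, WAW };

struct Dependence {
    DepKind kind;
    std::string var;
    std::size_t from, to;  // instruction indices, from < to
};

bool reads(const InstrRW& in, const std::string& v) {
    for (const auto& u : in.uses)
        if (u == v) return true;
    return false;
}

// Pairwise scan. It does not check whether an intermediate write kills a dependence,
// so it may report extra edges, which only makes the parallelism check more cautious.
std::vector<Dependence> findDependences(const std::vector<InstrRW>& code) {
    std::vector<Dependence> deps;
    for (std::size_t i = 0; i < code.size(); ++i) {
        for (std::size_t j = i + 1; j < code.size(); ++j) {
            const std::string& di = code[i].def;
            const std::string& dj = code[j].def;
            if (!di.empty() && reads(code[j], di)) deps.push_back({DepKind::RAW, di, i, j});
            if (!dj.empty() && reads(code[i], dj)) deps.push_back({DepKind::WAR, dj, i, j});
            if (!di.empty() && di == dj) deps.push_back({DepKind::WAW, di, i, j});
        }
    }
    return deps;
}

// True when no dependence directly links instructions a and b. This is a pairwise test
// only: a scheduler must still respect chains that pass through other instructions.
bool mayRunInParallel(const std::vector<Dependence>& deps, std::size_t a, std::size_t b) {
    for (const auto& d : deps)
        if ((d.from == a && d.to == b) || (d.from == b && d.to == a)) return false;
    return true;
}
```

For `a = x`, `b = a`, `x = y`, `b = z` (indices 0 to 3), the scan reports RAW on `a` (0 to 1), WAR on `x` (0 to 2), and WAW on `b` (1 to 3). It therefore reports instructions 1 and 2 as candidates for parallel execution.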
```text
Source Code (.c / .sc)
    │
    ▼
1. Lexer ──────────── Breaks code into tokens
    │
    ▼
2. Parser ─────────── Builds Abstract Syntax Tree (AST)
    │
    ▼
3. Semantic Analyzer ─ Type checking, scope validation
    │
    ▼
4. IR Generator ───── Emits 3-address intermediate code
    │
    ├──▶ 5. Static Analysis ─── Finds bugs (unused vars, dead code)
    │
    ▼
6. Optimizer ──────── Constant folding, CSE, DCE
    │
    ▼
7. Dependency Analysis ── RAW / WAR / WAW detection
    │
    ▼
8. Parallelism & Vectorization ── Execution hints
```
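Stage 3 depends on the scoped symbol table listed among the features, which tracks variables, parameters, and functions and supports shadowing. One common way to model this is a stack of per-scope maps. Declarations go into the innermost scope, and lookups walk outward, so an inner declaration hides an outer one with the same name. The C++17 sketch below assumes that design. `SymbolTable`, `Symbol`, and `Type` are illustrative and are not the interface in `symbol_table.hpp`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

enum class Type { Int, Float, Void };

struct Symbol {
    std::string name;
    Type type;
    int line;  // declaration line, useful for diagnostics
};

// Hypothetical scoped symbol table: the innermost scope is at the back of the vector.
class SymbolTable {
public:
    SymbolTable() { enterScope(); }  // global scope

    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        assert(scopes_.size() > 1 && "cannot pop the global scope");
        scopes_.pop_back();
    }

    // Declares in the innermost scope. Returns false on redeclaration in the same scope;
    // a name that already exists only in an outer scope is allowed (shadowing).
    bool declare(const Symbol& s) { return scopes_.back().emplace(s.name, s).second; }

    // Resolves a name from the innermost scope outward.
    std::optional<Symbol> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::map<std::string, Symbol>> scopes_;
};
```

In this design, a semantic pass would call `enterScope()` when entering a function body or block and `exitScope()` when leaving it. It would report an error whenever `declare` returns `false` (redeclaration) or `lookup` finds nothing (undeclared name).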
```text
SemantiC/
├── include/                    # Header files (.hpp)
│   ├── token.hpp               Token types & Token struct
│   ├── lexer.hpp               Lexer API
│   ├── location.hpp            Source location tracking
│   ├── ast.hpp                 AST node definitions
│   ├── parser.hpp              Parser class & grammar
│   ├── symbol_table.hpp        Symbol table
│   ├── semantic.hpp            Semantic analyzer
│   ├── ir.hpp                  IR instructions & generator
│   ├── optimizer.hpp           IR optimizer
│   ├── static_analysis.hpp     Static analysis
│   ├── dependency.hpp          Dependency analysis
│   ├── parallel.hpp            Parallelism suggestions
│   └── vectorize.hpp           Vectorization hints
│
├── src/                        # Source files (.cpp)
│   ├── main.cpp                Entry point (runs all stages)
│   ├── lexer.cpp               Lexer implementation
│   ├── parser.cpp              Parser implementation
│   ├── ast_print.cpp           AST pretty printer
│   ├── symbol_table.cpp        Symbol table operations
│   ├── semantic.cpp            Semantic checks
│   ├── ir.cpp                  IR code generation
│   ├── optimizer.cpp           Optimization passes
│   ├── static_analysis.cpp     Bug detection
│   ├── dependency.cpp          RAW/WAR/WAW analysis
│   ├── parallel_analysis.cpp   Parallel execution hints
│   └── vector_analysis.cpp     SIMD vectorization analysis
│
├── tests/                      # Sample test programs
│   ├── sample.sc               Full-featured sample
│   ├── test_1.c                Error detection test
│   └── test_2.c                Simple test
│
├── docs/                       # Detailed documentation
│   ├── 01_Project_Overview.md
│   ├── 02_Lexical_Analysis.md
│   ├── 03_Parsing_and_AST.md
│   ├── 04_Symbol_Table.md
│   ├── 05_Semantic_Analysis.md
│   ├── 06_IR_Generation.md
│   ├── 07_Static_Analysis.md
│   ├── 08_Optimization.md
│   ├── 09_Dependency_Analysis.md
│   ├── 10_Parallelism_Analysis.md
│   ├── 11_Vectorization_Analysis.md
│   ├── 12_Main_Pipeline.md
│   └── 13_Test_Files.md
│
├── CMakeLists.txt              # Build configuration
├── LICENSE                     # MIT License
└── README.md                   # This file
```
Detailed documentation for each compiler stage is available in the docs/ folder. Each file explains the theory, data structures, algorithms, and code for one topic.
This project is licensed under the MIT License — see LICENSE for details.
Author: khajan_bhatt