File Compression Engine

A lossless data compression engine written in C++ using LZ77 and Huffman Coding, built to test custom algorithmic performance against standard zlib.

Core Mechanics

Parallel Processing: Uses C++ std::async to compress multiple file chunks concurrently, then writes the results in their original order.
LZ77 Sliding Window: Uses a 32 KB window to find repeated byte sequences and replace them with back-references.
Fast String Matching: Uses a custom hash function and flat arrays to look up match candidates in expected O(1) time.
Huffman Coding: Builds a frequency tree for every 2 MB block so that common symbols are encoded in fewer bits.
Data Integrity: Uses CRC32 checksums to confirm that the extracted file matches the original.
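
The parallel processing item above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the names compress_chunk and compress_parallel are hypothetical, and compress_chunk is an identity stand-in so that the sketch stays focused on scheduling and ordering.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <future>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for the real per-chunk compressor: it returns the
// chunk unchanged so the sketch only demonstrates scheduling and ordering.
Bytes compress_chunk(Bytes chunk) {
    return chunk;
}

// Splits `input` into fixed-size chunks, processes them concurrently, and
// concatenates the results in their original order.
Bytes compress_parallel(const Bytes& input, std::size_t chunk_size) {
    if (chunk_size == 0) {
        throw std::invalid_argument("chunk_size must be positive");
    }

    std::vector<std::future<Bytes>> jobs;
    for (std::size_t off = 0; off < input.size(); off += chunk_size) {
        const std::size_t len = std::min(chunk_size, input.size() - off);
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(off);
        Bytes chunk(first, first + static_cast<std::ptrdiff_t>(len));
        jobs.push_back(std::async(std::launch::async, compress_chunk, std::move(chunk)));
    }

    Bytes output;
    // Futures are consumed in submission order, so the output order matches
    // the input order regardless of which task finishes first.
    for (auto& job : jobs) {
        Bytes part = job.get();
        output.insert(output.end(), part.begin(), part.end());
    }
    return output;
}
```

Collecting the futures in a vector and calling get() in that same order is what preserves ordering; no extra synchronization is needed for the write step. On some toolchains std::async also requires linking with -pthread.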
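
The sliding-window and string-matching items above can be illustrated with a small hash-table match finder. This is a sketch under stated assumptions: the 3-byte hash, the table size, the minimum and maximum match lengths, and the names hash3 and MatchFinder are all hypothetical and may differ from what the engine uses. Only the 32 KB window comes from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kWindowSize = 32 * 1024;  // 32 KB window, as described above
constexpr std::size_t kMinMatch = 3;            // assumed minimum match length
constexpr std::size_t kMaxMatch = 258;          // assumed cap (the value DEFLATE uses)
constexpr std::size_t kHashBits = 15;           // assumed table size: 2^15 slots
constexpr std::size_t kNone = static_cast<std::size_t>(-1);  // "empty slot" marker

struct Match {
    std::size_t distance = 0;  // how far back the match starts
    std::size_t length = 0;    // 0 means "no match, emit a literal"
};

// Hypothetical 3-byte multiplicative hash; the engine's real hash may differ.
inline std::uint32_t hash3(const std::uint8_t* p) {
    const std::uint32_t v =
        (std::uint32_t(p[0]) << 16) | (std::uint32_t(p[1]) << 8) | std::uint32_t(p[2]);
    return (v * 2654435761u) >> (32 - kHashBits);
}

class MatchFinder {
public:
    MatchFinder() : head_(std::size_t{1} << kHashBits, kNone) {}

    // Returns the match found at `pos` (if any) and records `pos` in the table.
    Match find_and_insert(const std::vector<std::uint8_t>& data, std::size_t pos) {
        Match best;
        if (pos + kMinMatch > data.size()) return best;

        const std::uint32_t h = hash3(&data[pos]);
        const std::size_t cand = head_[h];  // one array read: the O(1) lookup
        head_[h] = pos;

        if (cand == kNone || pos - cand > kWindowSize) return best;

        // Hashes can collide, so the candidate is verified byte by byte.
        std::size_t len = 0;
        while (len < kMaxMatch && pos + len < data.size() &&
               data[cand + len] == data[pos + len]) {
            ++len;
        }
        if (len >= kMinMatch) {
            best.distance = pos - cand;
            best.length = len;
        }
        return best;
    }

private:
    std::vector<std::size_t> head_;  // flat array: hash -> most recent position
};
```

The table lookup is constant time, but the verification loop is proportional to the match length, which is why the lookup is better described as expected O(1) for finding a candidate rather than for the whole match.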
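
For the Huffman coding item above, the sketch below computes a code length for each byte value in a block by building the frequency tree with a min-heap. It is illustrative only: the function name huffman_code_lengths is hypothetical, and the engine's tree representation and any limit on code length are not documented here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns a code length in bits for each byte value; 0 means "symbol unused".
std::array<std::uint8_t, 256> huffman_code_lengths(const std::vector<std::uint8_t>& block) {
    std::array<std::uint64_t, 256> freq{};
    for (std::uint8_t b : block) ++freq[b];

    struct Node { std::uint64_t weight; int left; int right; int symbol; };
    std::vector<Node> nodes;
    using Entry = std::pair<std::uint64_t, int>;  // (weight, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // One leaf per symbol that actually occurs in the block.
    for (int s = 0; s < 256; ++s) {
        if (freq[s] == 0) continue;
        nodes.push_back({freq[s], -1, -1, s});
        heap.push({freq[s], static_cast<int>(nodes.size()) - 1});
    }

    std::array<std::uint8_t, 256> lengths{};
    if (heap.empty()) return lengths;
    if (heap.size() == 1) {  // a single distinct symbol still needs one bit
        lengths[nodes[0].symbol] = 1;
        return lengths;
    }

    // Repeatedly merge the two lightest subtrees until one tree remains.
    while (heap.size() > 1) {
        const Entry a = heap.top(); heap.pop();
        const Entry b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, a.second, b.second, -1});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // A leaf's depth in the tree is its code length.
    std::vector<std::pair<int, int>> stack;  // (node index, depth)
    stack.push_back({heap.top().second, 0});
    while (!stack.empty()) {
        const std::pair<int, int> top = stack.back();
        stack.pop_back();
        const Node& n = nodes[top.first];
        if (n.left < 0) {
            lengths[n.symbol] = static_cast<std::uint8_t>(top.second);
        } else {
            stack.push_back({n.left, top.second + 1});
            stack.push_back({n.right, top.second + 1});
        }
    }
    return lengths;
}
```

For the block "aaaabbc" this yields a length of 1 for 'a' and 2 for both 'b' and 'c', so the most frequent symbol gets the shortest code.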

Build & Run

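Requires a C++17 compiler. Compile with -O3 to enable aggressive optimization, including auto-vectorization; -march=native additionally lets the compiler target the build machine's instruction set.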

g++ -O3 -march=native main.cpp compress.cpp decompress.cpp -o main.exe

To Compress:

./main.exe compress <file_path>

To Decompress:

./main.exe decompress <file_path>.bin

Benchmarks

Evaluated on 131 files (247.6 MB total), including the Silesia Corpus, text, PDFs, and high-entropy media (MP4). zlib was tested at its default level 6.

File Category	C++ Space Saved	ZLIB Space Saved	C++ Mean Time	ZLIB Mean Time
UNKNOWN (Silesia/Binaries)	63.99%	64.71%	0.37s	1.37s
.TXT (Source Code/Text)	61.71%	62.67%	0.07s	0.03s
.PDF (Documents)	17.32%	18.27%	0.05s	0.02s
.MP4 (High-Entropy)	13.22%	13.59%	0.15s	0.63s

Integrity Verification: All 131 files were byte-for-byte identical to the originals after a full compress/decompress cycle.
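
As an illustration of the CRC32 check the engine relies on, here is a minimal table-driven sketch. It assumes the common CRC-32 variant (reflected polynomial 0xEDB88320, as used by zlib and PNG); the repository's crc32.h may be implemented differently, and the names make_crc32_table and crc32_ieee are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Builds the 256-entry lookup table for the reflected polynomial 0xEDB88320.
inline std::array<std::uint32_t, 256> make_crc32_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Computes the CRC-32 of `len` bytes starting at `data`, one byte per step.
inline std::uint32_t crc32_ieee(const std::uint8_t* data, std::size_t len) {
    static const std::array<std::uint32_t, 256> table = make_crc32_table();
    std::uint32_t crc = 0xFFFFFFFFu;  // standard initial value
    for (std::size_t i = 0; i < len; ++i) {
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;  // standard final XOR
}
```

With this variant, the ASCII string "123456789" produces the well-known check value 0xCBF43926. A decompressor would compare the checksum stored at compression time with the one computed over its output. A matching CRC32 makes accidental corruption very unlikely but is not a proof of equality, which is why the byte-for-byte comparison in the test scripts is the stronger check.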

Testing Suite

Python test scripts are included to fetch datasets and verify bitstream integrity.

python fetch_datasets.py
python benchmark.py
python decompress_checker.py
