AShaw110/File-Compressor-and-decompressor

File Compression Engine

A lossless data compression engine written in C++ using LZ77 and Huffman coding, built to benchmark a custom implementation against standard zlib.

Core Mechanics

  • Parallel Processing: Uses C++ std::async to compress multiple file chunks at the same time, writing the final output in the exact original order.

  • LZ77 Sliding Window: Uses a 32KB buffer to identify and replace duplicate byte sequences.

  • Fast String Matching: Uses a custom hash function and flat arrays to look up candidate matches in expected O(1) time per position.

  • Huffman Coding: Builds a Huffman tree from symbol frequencies for every 2 MB block so that common symbols are encoded in fewer bits.

  • Data Integrity: Uses CRC32 checksums to verify that the extracted file matches the original and to detect corruption.
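The parallel step can be sketched as follows. This is a minimal illustration, not the repository's code: `compress_chunk` is a hypothetical stand-in that simply copies its input, and `compress_parallel` and the chunk size are assumed names and parameters. The point it shows is that collecting the futures in submission order keeps the output in the original order, no matter which task finishes first.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for the real per-chunk compressor: it just copies the chunk.
Bytes compress_chunk(const Bytes& chunk) { return chunk; }

// Compress fixed-size chunks concurrently, then concatenate the results in input order.
// chunk_size must be greater than zero.
Bytes compress_parallel(const Bytes& input, std::size_t chunk_size) {
    std::vector<std::future<Bytes>> jobs;
    for (std::size_t off = 0; off < input.size(); off += chunk_size) {
        const std::size_t end = std::min(input.size(), off + chunk_size);
        Bytes chunk(input.begin() + static_cast<std::ptrdiff_t>(off),
                    input.begin() + static_cast<std::ptrdiff_t>(end));
        // std::launch::async runs each task as if on its own thread.
        jobs.push_back(std::async(std::launch::async, compress_chunk, std::move(chunk)));
    }

    Bytes out;
    // Futures are consumed in submission order, so the output order matches the
    // input order regardless of which task completes first.
    for (auto& job : jobs) {
        const Bytes part = job.get();
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}
```

A real implementation would likely cap the number of in-flight tasks rather than launch one per chunk.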
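The hash-based match search could look roughly like the sketch below. Only the 32 KB window comes from the description above; the three-byte multiplicative hash, the table size, the minimum and maximum match lengths, and the names `hash3`, `Token`, `lz77_tokenize` and `lz77_expand` are assumptions made for illustration. It keeps one candidate position per hash slot in a flat array and parses greedily, which is simpler than what a tuned compressor would do.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::size_t kWindow   = 32 * 1024;  // 32 KB window, as described above
constexpr std::size_t kMinMatch = 3;          // assumed minimum match length
constexpr std::size_t kMaxMatch = 258;        // assumed maximum match length
constexpr std::size_t kHashBits = 15;         // assumed table size: 2^15 slots

// length == 0 means "literal byte"; otherwise copy `length` bytes from `distance` back.
struct Token { std::size_t distance; std::size_t length; std::uint8_t literal; };

// Hypothetical multiplicative hash of the next three bytes.
inline std::uint32_t hash3(const std::uint8_t* p) {
    const std::uint32_t v = static_cast<std::uint32_t>(p[0]) |
                            (static_cast<std::uint32_t>(p[1]) << 8) |
                            (static_cast<std::uint32_t>(p[2]) << 16);
    return (v * 2654435761u) >> (32 - kHashBits);
}

std::vector<Token> lz77_tokenize(const Bytes& data) {
    // Flat array: hash slot -> most recent position with that hash (-1 if none).
    std::vector<std::int64_t> head(std::size_t{1} << kHashBits, -1);
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < data.size()) {
        std::size_t best_len = 0, best_dist = 0;
        if (i + kMinMatch <= data.size()) {
            const std::uint32_t h = hash3(&data[i]);
            const std::int64_t cand = head[h];  // single O(1) lookup, no chains
            if (cand >= 0 && i - static_cast<std::size_t>(cand) <= kWindow) {
                const std::size_t c = static_cast<std::size_t>(cand);
                std::size_t len = 0;
                while (len < kMaxMatch && i + len < data.size() &&
                       data[c + len] == data[i + len]) ++len;
                if (len >= kMinMatch) { best_len = len; best_dist = i - c; }
            }
            head[h] = static_cast<std::int64_t>(i);
        }
        if (best_len > 0) { tokens.push_back({best_dist, best_len, 0}); i += best_len; }
        else              { tokens.push_back({0, 0, data[i]});          i += 1; }
    }
    return tokens;
}

// Inverse of lz77_tokenize, used here to check the round trip.
Bytes lz77_expand(const std::vector<Token>& tokens) {
    Bytes out;
    for (const Token& t : tokens) {
        if (t.length == 0) { out.push_back(t.literal); continue; }
        const std::size_t start = out.size() - t.distance;
        for (std::size_t k = 0; k < t.length; ++k) {
            const std::uint8_t b = out[start + k];  // may overlap bytes written in this loop
            out.push_back(b);
        }
    }
    return out;
}
```

Positions skipped inside a match are not inserted into the table here, which trades some compression ratio for simplicity.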
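For the Huffman stage, the following sketch computes a code length per byte value for one block using the textbook min-heap construction. It assumes the symbols are raw bytes; the function name `huffman_code_lengths` is hypothetical, and the repository may well encode LZ77 tokens rather than bytes or limit code lengths differently.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the Huffman code length in bits for each byte value in the block;
// 0 means the byte value does not occur.
std::array<int, 256> huffman_code_lengths(const std::vector<std::uint8_t>& block) {
    std::array<std::uint64_t, 256> freq{};
    for (std::uint8_t b : block) ++freq[b];

    struct Node { std::uint64_t weight; int symbol; int left; int right; };  // left < 0 => leaf
    std::vector<Node> nodes;
    using Item = std::pair<std::uint64_t, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (int s = 0; s < 256; ++s) {
        if (freq[s] == 0) continue;
        nodes.push_back({freq[s], s, -1, -1});
        heap.push({freq[s], static_cast<int>(nodes.size()) - 1});
    }

    std::array<int, 256> lengths{};
    if (nodes.empty()) return lengths;
    if (nodes.size() == 1) { lengths[nodes[0].symbol] = 1; return lengths; }

    // Repeatedly merge the two lightest subtrees until one tree remains.
    while (heap.size() > 1) {
        const Item a = heap.top(); heap.pop();
        const Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, -1, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Walk down from the root; a leaf's depth is its code length.
    std::vector<std::pair<int, int>> stack{{heap.top().second, 0}};
    while (!stack.empty()) {
        const auto [idx, depth] = stack.back();
        stack.pop_back();
        const Node& n = nodes[idx];
        if (n.left < 0) {
            lengths[n.symbol] = depth;
        } else {
            stack.push_back({n.left, depth + 1});
            stack.push_back({n.right, depth + 1});
        }
    }
    return lengths;
}
```

For the input "aaaabbc", the most frequent byte 'a' gets a 1-bit code and 'b' and 'c' get 2-bit codes.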
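The integrity check can be illustrated with a table-driven CRC-32. This sketch assumes the common reflected variant with polynomial 0xEDB88320 (the one zlib uses); the README does not say which variant the project implements, and the name `crc32_ieee` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Table-driven CRC-32 (reflected polynomial 0xEDB88320, initial value and
// final XOR of 0xFFFFFFFF). Assumed variant; see the note above.
std::uint32_t crc32_ieee(const std::uint8_t* data, std::size_t n) {
    static const std::array<std::uint32_t, 256> table = [] {
        std::array<std::uint32_t, 256> t{};
        for (std::uint32_t i = 0; i < 256; ++i) {
            std::uint32_t c = i;
            for (int k = 0; k < 8; ++k)
                c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            t[i] = c;
        }
        return t;
    }();

    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i)
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

A decompressor would compute this over the extracted bytes and compare it with the checksum stored at compression time. A mismatch signals corruption, while a match makes accidental corruption very unlikely rather than impossible.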

Build & Run

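Requires a C++17 compiler. Compile with -O3 to enable aggressive optimizations such as auto-vectorization; -march=native lets the compiler target the host CPU's instruction set. On Linux with GCC, std::async may also require adding -pthread.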

g++ -O3 -march=native main.cpp compress.cpp decompress.cpp -o main.exe

To Compress:

./main.exe compress <file_path>

To Decompress:

./main.exe decompress <file_path>.bin

Benchmarks

Evaluated on 131 files (247.6 MB total), including the Silesia Corpus, text, PDFs, and high-entropy media (MP4). zlib was tested at its default compression level (6).

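| File Category               | C++ Space Saved | zlib Space Saved | C++ Mean Time | zlib Mean Time |
|-----------------------------|-----------------|------------------|---------------|----------------|
| UNKNOWN (Silesia/Binaries)  | 63.99%          | 64.71%           | 0.37s         | 1.37s          |
| .TXT (Source Code/Text)     | 61.71%          | 62.67%           | 0.07s         | 0.03s          |
| .PDF (Documents)            | 17.32%          | 18.27%           | 0.05s         | 0.02s          |
| .MP4 (High-Entropy)         | 13.22%          | 13.59%           | 0.15s         | 0.63s          |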

Integrity Verification: All 131 files were byte-for-byte identical to the originals after a full compress/decompress cycle.

Testing Suite

Python test scripts are included to fetch datasets and verify bitstream integrity.

python fetch_datasets.py
python benchmark.py
python decompress_checker.py
