A high-performance, concurrent implementation of the Unix wc utility written in Go. wcGo processes files and streams with goroutines, efficient chunked I/O, and proper UTF-8 handling to deliver fast, accurate word, line, byte, and character counts.
- Streaming Processing: Processes files in 32KB chunks, enabling memory-efficient handling of arbitrarily large files without loading them entirely into memory
- Concurrent Processing: Leverages goroutines to process multiple files simultaneously, distributing I/O and computation across available system resources
- Correct UTF-8 Handling: Properly decodes and counts multi-byte UTF-8 characters (runes) with full Unicode support, including emoji and international text
- Safe Rune Decoding Across Chunk Boundaries: Intelligently handles incomplete UTF-8 sequences at chunk edges, carrying over partial runes to the next chunk to prevent character corruption
- stdin Support: Reads from standard input when no files are specified, making wcGo compatible with Unix pipes and shell redirection
- Full POSIX Compatibility: Supports all standard flags (`-l`, `-w`, `-c`, `-m`) and produces identical output to GNU wc
Ensure you have Go 1.21 or later installed.
```
git clone https://github.com/brickster241/wc-Go.git
cd wc-Go
go build -o wcGo ./cmd
```
The binary will be created in the current directory.
Make wcGo accessible from anywhere:
```
# Option 1: Copy to a system directory (requires sudo)
sudo cp wcGo /usr/local/bin/

# Option 2: Add the build directory to your PATH
export PATH="$PATH:/path/to/wc-Go"
echo 'export PATH="$PATH:/path/to/wc-Go"' >> ~/.zshrc  # for zsh
```
Verify the installation:
```
wcGo --help
```
wcGo replicates the behavior of the standard Unix wc utility with identical command-line syntax and output formatting.
Count lines, words, and bytes in a file (default behavior):
```
wcGo file.txt
```
Output example:
```
10 50 312 file.txt
```
This shows 10 lines, 50 words, and 312 bytes.
Count only lines:
```
wcGo -l file.txt
```
Count only words:
```
wcGo -w file.txt
```
Count only bytes:
```
wcGo -c file.txt
```
Count only characters (runes):
```
wcGo -m file.txt
```
Combine multiple flags to show specific metrics:
```
wcGo -l -w -c file.txt
```
Process multiple files concurrently; wcGo automatically parallelizes across goroutines:
```
wcGo file1.txt file2.txt file3.txt
```
Output will include counts for each file and a totals line:
```
10 50 312 file1.txt
15 75 425 file2.txt
8 40 210 file3.txt
33 165 947 total
```
Use wcGo with pipes and input redirection:
```
cat file.txt | wcGo
wcGo < file.txt
echo "hello world" | wcGo -w
```

| Flag | Description |
|---|---|
| `-l` | Count lines (number of newline characters) |
| `-w` | Count words (contiguous sequences of non-whitespace characters) |
| `-c` | Count bytes (total size in bytes, not characters) |
| `-m` | Count characters (Unicode runes, accounting for multi-byte characters) |
Default Behavior: When no flags are specified, wcGo outputs lines, words, and bytes (equivalent to `-l -w -c`).
wcGo does not load entire files into memory. Instead, it:
- Reads files in 32KB chunks using a buffered reader
- Processes each chunk independently to compute word, line, byte, and character counts
- Aggregates counts across chunks
- This allows processing of arbitrarily large files with constant memory usage (see the sketch below)
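A minimal sketch of that loop, with hypothetical names (`countChunks` is illustrative, not the repository's actual function):

```go
package wc // illustrative package, not the repository's actual layout

import "io"

// countChunks reads r in 32KB chunks and accumulates line, word, and
// byte counts. Character counting and rune carry-over across chunk
// boundaries are omitted here; see the next section for that part.
func countChunks(r io.Reader) (lines, words, bytes int, err error) {
	buf := make([]byte, 32*1024) // fixed-size chunk buffer: constant memory
	inWord := false
	for {
		n, rerr := r.Read(buf)
		bytes += n
		for _, b := range buf[:n] {
			if b == '\n' {
				lines++
			}
			// A word starts at each non-whitespace byte that follows
			// whitespace (ASCII whitespace only in this sketch).
			switch b {
			case ' ', '\n', '\t', '\r', '\v', '\f':
				inWord = false
			default:
				if !inWord {
					inWord = true
					words++
				}
			}
		}
		if rerr == io.EOF {
			return lines, words, bytes, nil
		}
		if rerr != nil {
			return lines, words, bytes, rerr
		}
	}
}
```

Because the buffer is reused on every iteration, memory usage stays flat regardless of file size.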
UTF-8 is a variable-length encoding where characters can span 1–4 bytes. When a chunk boundary occurs mid-rune, wcGo:
- Attempts to decode each byte sequence using Go's `utf8.DecodeRune()`
- When a rune is incomplete (fewer bytes than needed for a full character), it returns `utf8.RuneError`
- wcGo carries over the incomplete bytes to the next chunk, prepending them before processing
- This ensures that multi-byte Unicode characters spanning chunk boundaries are correctly counted without loss or corruption (see the sketch below)
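The carry-over logic might look like this sketch (`countRunes` is a hypothetical name; the repository's actual code may differ):

```go
package wc // illustrative

import "unicode/utf8"

// countRunes counts the characters in chunk, where carry holds the
// trailing bytes of the previous chunk that did not form a full rune.
// It returns the rune count plus any new incomplete tail, which the
// caller prepends to the next chunk.
func countRunes(carry, chunk []byte) (runes int, rest []byte) {
	// Copy into a fresh slice to avoid aliasing the caller's buffers.
	data := append(append([]byte{}, carry...), chunk...)
	for len(data) > 0 {
		if !utf8.FullRune(data) {
			// Incomplete sequence at the chunk edge: carry it over
			// instead of miscounting it as utf8.RuneError now.
			return runes, data
		}
		// Invalid bytes decode as (utf8.RuneError, 1), so they are
		// counted once each and the loop still advances.
		_, size := utf8.DecodeRune(data)
		runes++
		data = data[size:]
	}
	return runes, nil
}
```

At end of input, one reasonable way to flush is to decode whatever remains in `rest` once more, so a truncated trailing sequence is still counted rather than silently dropped.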
When multiple files are provided:
- Each file is opened and processed in a separate goroutine
- A buffered error channel coordinates completion across goroutines
- Results are aggregated and printed in order with a total line
- All goroutines launch immediately rather than sequentially, so files are read and counted in parallel
This design maximizes throughput on multi-core systems by avoiding synchronous file I/O bottlenecks.
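Under those assumptions, the coordination might be sketched like this (hypothetical names; `countChunks` is the loop sketched earlier, and the column widths are illustrative):

```go
package wc // illustrative

import (
	"fmt"
	"os"
)

// fileCount holds per-file results; illustrative only.
type fileCount struct {
	lines, words, bytes int
	err                 error
}

// processAll counts every file in its own goroutine, waits on a
// buffered error channel, then prints results in input order plus a
// totals line.
func processAll(paths []string) {
	results := make([]fileCount, len(paths))
	errs := make(chan error, len(paths)) // buffered: sends never block
	for i, path := range paths {
		go func(i int, path string) {
			f, err := os.Open(path)
			if err != nil {
				results[i].err = err
				errs <- err
				return
			}
			defer f.Close()
			l, w, b, err := countChunks(f) // sketched above
			results[i] = fileCount{l, w, b, err}
			errs <- err
		}(i, path)
	}
	for range paths { // exactly one receive per goroutine
		<-errs
	}
	var tl, tw, tb int
	for i, r := range results {
		if r.err != nil {
			fmt.Fprintln(os.Stderr, "wcGo:", r.err)
			continue
		}
		fmt.Printf("%8d %7d %7d %s\n", r.lines, r.words, r.bytes, paths[i])
		tl, tw, tb = tl+r.lines, tw+r.words, tb+r.bytes
	}
	if len(paths) > 1 {
		fmt.Printf("%8d %7d %7d total\n", tl, tw, tb)
	}
}
```

Each goroutine writes only its own slot of the results slice, so no mutex is needed; the buffered channel serves purely as a completion count.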
wcGo includes comprehensive tests that verify correctness against GNU wc.
Sample test files are located in the testdata/ directory:
- `empty.txt` – Empty file (validates edge case handling)
- `simple.txt` – Simple ASCII text
- `multiline.txt` – Multiple lines of varying lengths
- `nospace.txt` – Text without spaces (word boundary testing)
- `unicode.txt` – Mix of Unicode characters, emoji, and international text
- `base_test.txt` – Baseline test file
Build wcGo first:
```
go build -o wcGo ./cmd
```
Run the test suite:
```
cd tests
go test -v
```
This will:
- Process each test file with various flag combinations
- Compare wcGo output byte-for-byte with GNU wc
- Report any mismatches (a sketch of such a comparison test follows below)
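A comparison test of that shape might look like the following sketch (illustrative only; it assumes GNU wc is on the PATH and that the wcGo binary was built one directory up):

```go
package tests

import (
	"bytes"
	"os/exec"
	"path/filepath"
	"testing"
)

func TestAgainstGNUWc(t *testing.T) {
	flagSets := [][]string{{"-l"}, {"-w"}, {"-c"}, {"-m"}, {"-l", "-w", "-c"}}
	files, err := filepath.Glob("../testdata/*.txt")
	if err != nil || len(files) == 0 {
		t.Fatalf("no test files found: %v", err)
	}
	for _, file := range files {
		for _, flags := range flagSets {
			args := append(append([]string{}, flags...), file)
			want, err := exec.Command("wc", args...).Output()
			if err != nil {
				t.Fatalf("wc %v: %v", args, err)
			}
			got, err := exec.Command("../wcGo", args...).Output()
			if err != nil {
				t.Fatalf("wcGo %v: %v", args, err)
			}
			// Byte-for-byte comparison, including column alignment.
			if !bytes.Equal(got, want) {
				t.Errorf("%v %s: got %q, want %q", flags, file, got, want)
			}
		}
	}
}
```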
To manually verify wcGo against wc:
```
wc testdata/unicode.txt
wcGo testdata/unicode.txt
```
The output should be identical.
The test suite includes a `TestLargeRandomFile` test that:
- Generates a 500 MB random file with mixed content (ASCII text, digits, punctuation, emoji)
- Runs wcGo and wc on this file
- Compares output to ensure correctness on real-world large data
- Automatically cleans up the temporary file after testing
This verifies that wcGo maintains accuracy and efficiency with large datasets.
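A sketch of that generate-and-compare flow (hypothetical test name and content pool; again assumes GNU wc on the PATH):

```go
package tests

import (
	"bufio"
	"bytes"
	"math/rand"
	"os"
	"os/exec"
	"path/filepath"
	"testing"
)

func TestLargeRandomFileSketch(t *testing.T) {
	// Mixed content pool: ASCII words, digits, punctuation, emoji.
	pool := []string{"hello", "42", "wc,", "🚀🔥", "naïve", " ", "\n"}
	path := filepath.Join(t.TempDir(), "large.txt") // removed automatically
	f, err := os.Create(path)
	if err != nil {
		t.Fatal(err)
	}
	w := bufio.NewWriter(f)
	const target = 500 << 20 // roughly 500 MB
	for written := 0; written < target; {
		n, _ := w.WriteString(pool[rand.Intn(len(pool))])
		written += n
	}
	if err := w.Flush(); err != nil {
		t.Fatal(err)
	}
	f.Close()

	want, err := exec.Command("wc", path).Output()
	if err != nil {
		t.Fatal(err)
	}
	got, err := exec.Command("../wcGo", path).Output()
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, want) {
		t.Errorf("large file mismatch: got %q, want %q", got, want)
	}
}
```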
- Parallel Chunk Processing: Process chunks of a single large file in parallel (currently sequential per file)
- Performance Metrics: Built-in benchmarking and timing output
- Additional Output Formats: JSON, CSV, or TSV output options
- Recursive Directory Processing: `-R` flag to recursively count all files in a directory tree
- Filtering and Exclusion: Pattern-based file inclusion/exclusion for batch processing
- Streaming Statistics: Real-time progress indicators for very large files