This guide explains how to install, run, and extend the static document security scanner.
- OS: Linux, macOS, or Windows
- Go: version 1.18 or newer
Check your Go installation:
go version
From the repository root:
.
├── docscanner/
│   ├── cmd/
│   │   └── scanner/
│   │       └── main.go            # CLI entrypoint
│   ├── internal/
│   │   ├── analyzer/              # Pluggable analyzers
│   │   │   ├── analyzer.go        # Analyzer interface
│   │   │   ├── pdf.go             # PDFAnalyzer
│   │   │   └── word.go            # WordAnalyzer
│   │   ├── scanner/
│   │   │   ├── walker.go          # Directory walker
│   │   │   └── workerpool.go      # Worker pool
│   │   └── model/
│   │       └── result.go          # ScanResult model
│   └── go.mod                     # Go module definition
├── samples/                       # Example documents to scan
└── Guide.md                       # This guide
From the docscanner directory:
cd docscanner
# If go.mod does not exist yet, create it (run once):
go mod init docscanner
# Download and tidy dependencies:
go mod tidy
If go.mod already exists, you only need go mod tidy.
You can scan any directory on your system. Two common options:
- Use the provided samples/ directory in the repo root:
  - Put .pdf, .docx, and .docm files in samples/ (and its subfolders).
  - Example path: /home/<user>/Projects/static-document-security-scanner/samples
- Use your own folder anywhere, for example: ~/Documents/docscanner-input
The scanner just needs the directory path.
Always run the CLI from inside docscanner/.
From the repository root:
cd docscanner
go run ./cmd/scanner ../samples
To scan any other directory, replace <directory-to-scan> with an absolute or relative path:
cd docscanner
go run ./cmd/scanner <directory-to-scan>
Examples:
# Scan the current directory
cd docscanner
go run ./cmd/scanner .
# Scan your Documents folder
cd docscanner
go run ./cmd/scanner ~/Documents
By default, results are printed as JSON to the terminal (stdout).
To save them to a file:
cd docscanner
go run ./cmd/scanner ../samples > results.json
Open results.json in your editor to inspect the output.
The scanner prints an array of ScanResult objects. Each object has:
{
  "file_path": "../samples/Hotel_Management_Report.docx",
  "file_type": "word",
  "sha256": "<sha256 hash of the file>",
  "indicators": ["..."],
  "risk_score": 0
}
- file_path – Path of the scanned file.
- file_type – Logical type of the document (e.g., word, pdf).
- sha256 – SHA‑256 hash of the file contents.
- indicators – List of strings describing what was found (e.g., suspicious features).
- risk_score – Simple numeric score based on indicators (higher means more suspicious).
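To post-process saved output, the JSON fields described above can be decoded into a Go struct. The following is a minimal sketch, not the project's actual model: the Go field names, the `int` type for `risk_score`, and the helper `flagged` are assumptions; only the JSON keys come from this guide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ScanResult mirrors the JSON keys shown in this guide.
// The Go field names and the int score type are assumptions.
type ScanResult struct {
	FilePath   string   `json:"file_path"`
	FileType   string   `json:"file_type"`
	SHA256     string   `json:"sha256"`
	Indicators []string `json:"indicators"`
	RiskScore  int      `json:"risk_score"`
}

// flagged (hypothetical helper) decodes a JSON array of results and
// returns only those whose risk_score is above the threshold.
func flagged(raw []byte, threshold int) ([]ScanResult, error) {
	var results []ScanResult
	if err := json.Unmarshal(raw, &results); err != nil {
		return nil, err
	}
	var out []ScanResult
	for _, r := range results {
		if r.RiskScore > threshold {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	// Inline sample data; in practice this would be the contents of results.json.
	raw := []byte(`[
	  {"file_path": "a.docm", "file_type": "word", "sha256": "x",
	   "indicators": ["Embedded VBA Macro (vbaProject.bin)"], "risk_score": 5},
	  {"file_path": "b.pdf", "file_type": "pdf", "sha256": "y",
	   "indicators": [], "risk_score": 0}
	]`)

	hits, err := flagged(raw, 0)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, r := range hits {
		fmt.Println(r.FilePath, r.RiskScore, r.Indicators)
	}
}
```

The same struct could be filled from a saved results.json by reading the file with `os.ReadFile` and passing the bytes to `flagged`.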
WordAnalyzer is implemented in internal/analyzer/word.go:
- Detects the presence of vbaProject.bin within the OOXML ZIP structure.
- If found, adds an indicator like "Embedded VBA Macro (vbaProject.bin)" and assigns a higher risk_score.
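The macro check described above can be sketched with the standard `archive/zip` package. This is an illustration under assumptions, not the contents of word.go: the function names `hasVBAProject` and `buildZip` are hypothetical, and matching on the entry's base name (rather than a fixed path such as word/vbaProject.bin) is a design choice made here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path"
)

// hasVBAProject (hypothetical name) reports whether the ZIP payload
// contains an entry whose base name is vbaProject.bin.
func hasVBAProject(data []byte) (bool, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return false, err // not a valid ZIP, so not a valid OOXML file
	}
	for _, f := range zr.File {
		if path.Base(f.Name) == "vbaProject.bin" {
			return true, nil
		}
	}
	return false, nil
}

// buildZip creates an in-memory ZIP with empty entries, for the demo only.
func buildZip(names ...string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := zw.Create(n); err != nil {
			panic(err)
		}
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	withMacro := buildZip("word/document.xml", "word/vbaProject.bin")
	clean := buildZip("word/document.xml")

	for name, data := range map[string][]byte{"macro": withMacro, "clean": clean} {
		found, err := hasVBAProject(data)
		fmt.Println(name, found, err)
	}
}
```

Reading entry names does not decompress any entry, so the check stays cheap even for large documents.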
PDFAnalyzer is implemented in internal/analyzer/pdf.go:
- Looks for suspicious markers such as /JavaScript, /JS, /Launch, /OpenAction, /AA, and /EmbeddedFile.
- Each matched marker increases risk_score.
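A marker scan of this kind can be approximated with a plain byte search. The sketch below is an assumption about the approach, not a copy of pdf.go: the function name `findPDFIndicators` and the scoring of one point per matched marker are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// pdfMarkers lists the PDF name tokens mentioned in this guide.
var pdfMarkers = []string{
	"/JavaScript", "/JS", "/Launch", "/OpenAction", "/AA", "/EmbeddedFile",
}

// findPDFIndicators (hypothetical name) returns every marker found in the
// raw bytes, in the order of pdfMarkers, plus a score.
// Assumption: each matched marker adds one point to the score.
func findPDFIndicators(data []byte) (indicators []string, score int) {
	for _, m := range pdfMarkers {
		if bytes.Contains(data, []byte(m)) {
			indicators = append(indicators, m)
			score++
		}
	}
	return indicators, score
}

func main() {
	sample := []byte("<< /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>")
	indicators, score := findPDFIndicators(sample)
	fmt.Println(indicators, score)
}
```

A raw substring search is cheap but approximate: it can miss markers inside compressed streams or names written with `#xx` hex escapes, and a short token such as /AA may also match inside a longer name.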
You can add support for new document types by implementing the Analyzer interface.
Defined in internal/analyzer/analyzer.go:
type Analyzer interface {
	Supports(filename string) bool
	Analyze(filepath string, data []byte) (*model.ScanResult, error)
}
To add a new analyzer:
- Create a new analyzer file in internal/analyzer/, for example excel.go.
- Implement Supports to match your file extensions (e.g., .xlsx, .xlsm).
- Implement Analyze to inspect data and return a ScanResult.
- Register the analyzer in cmd/scanner/main.go by adding it to the analyzers slice:
analyzers := []analyzer.Analyzer{
	&analyzer.WordAnalyzer{},
	&analyzer.PDFAnalyzer{},
	&analyzer.ExcelAnalyzer{}, // new
}
No changes are needed in the walker or worker pool – they automatically use the new analyzer.
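The steps above can be sketched end to end in a single file. Everything here is illustrative: the `ScanResult` and `Analyzer` types are redeclared locally so the example compiles on its own (in the repository they live in internal/model and internal/analyzer), the `.xlsm` extension rule and its score are hypothetical, and the `pick` helper is only an assumption about how the CLI selects an analyzer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"strings"
)

// Local stand-ins for model.ScanResult and analyzer.Analyzer.
type ScanResult struct {
	FilePath   string
	FileType   string
	SHA256     string
	Indicators []string
	RiskScore  int
}

type Analyzer interface {
	Supports(filename string) bool
	Analyze(path string, data []byte) (*ScanResult, error)
}

// ExcelAnalyzer is a skeleton for a new analyzer.
type ExcelAnalyzer struct{}

// Supports matches Excel extensions case-insensitively.
func (ExcelAnalyzer) Supports(filename string) bool {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".xlsx", ".xlsm":
		return true
	}
	return false
}

// Analyze hashes the file and applies one hypothetical rule:
// a macro-enabled extension adds an indicator and one point.
func (ExcelAnalyzer) Analyze(path string, data []byte) (*ScanResult, error) {
	sum := sha256.Sum256(data)
	res := &ScanResult{
		FilePath:   path,
		FileType:   "excel",
		SHA256:     hex.EncodeToString(sum[:]),
		Indicators: []string{},
	}
	if strings.EqualFold(filepath.Ext(path), ".xlsm") {
		res.Indicators = append(res.Indicators, "Macro-enabled workbook extension (.xlsm)")
		res.RiskScore++
	}
	return res, nil
}

// pick (hypothetical) returns the first analyzer that supports the file,
// or nil when no analyzer matches.
func pick(analyzers []Analyzer, filename string) Analyzer {
	for _, a := range analyzers {
		if a.Supports(filename) {
			return a
		}
	}
	return nil
}

func main() {
	analyzers := []Analyzer{&ExcelAnalyzer{}}
	if a := pick(analyzers, "Budget.xlsm"); a != nil {
		res, _ := a.Analyze("Budget.xlsm", []byte("abc"))
		fmt.Printf("%+v\n", *res)
	}
	fmt.Println(pick(analyzers, "notes.txt") == nil)
}
```

Because selection depends only on `Supports`, a new analyzer stays isolated from the file-walking and concurrency code.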
- Go command not found – Install Go and ensure go is on your PATH.
- Import path errors – Make sure you ran go mod init docscanner once inside docscanner/.
- Permission denied reading files – Run the scanner on directories your user can read.
- No results appear – Verify the directory actually contains .pdf, .docx, or .docm files.
- Add more analyzers for other document types (e.g., Excel, PowerPoint).
- Enhance risk_score to consider file size, number of indicators, or custom rules.
- Integrate this CLI into a larger pipeline or CI step for automated document checks.
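As one possible direction for the risk_score enhancement, indicators could be weighted through a rule table. This is purely a sketch: the weights, the default value, and the function name `riskScore` are all hypothetical and not part of the current scanner.

```go
package main

import "fmt"

// weights is a hypothetical rule table mapping indicator strings to points.
var weights = map[string]int{
	"Embedded VBA Macro (vbaProject.bin)": 5,
	"/Launch":                             4,
	"/JavaScript":                         3,
}

// defaultWeight applies to indicators without an explicit rule (assumption).
const defaultWeight = 1

// riskScore sums the weight of every indicator.
func riskScore(indicators []string) int {
	score := 0
	for _, ind := range indicators {
		if w, ok := weights[ind]; ok {
			score += w
		} else {
			score += defaultWeight
		}
	}
	return score
}

func main() {
	fmt.Println(riskScore([]string{"/JavaScript", "/AA"}))
	fmt.Println(riskScore(nil))
}
```

Keeping the weights in a table rather than in code paths would let custom rules be loaded from configuration later.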