A production-ready PowerShell-based runner for serving large language models from HuggingFace using llama.cpp and PM2 process management. This is the PowerShell equivalent of the original Bash script with enhanced Windows compatibility and PowerShell-native features.
- Cross-Platform: Works on Windows, Linux, and macOS with PowerShell 5.1+
- HuggingFace Integration: Automatic model download with resume capability and smart file selection
- PM2 Process Management: Production-ready process management with auto-restart and memory limits
- Instance Lifecycle Management: Built-in start/stop/restart/delete commands with health checks
- GGUF Model Support: Optimized for GGUF quantized models with intelligent selection
- Health Monitoring: Built-in health checks, startup validation, and monitoring
- Enhanced Progress: PowerShell-native progress indicators and colored output
- Configurable: Flexible configuration with parameter validation
- Port Management: Automatic port allocation with conflict detection
- Error Recovery: Robust error handling with cleanup and detailed troubleshooting
- Debug Mode: Detailed debug output with `$env:DEBUG=1`
- Disk Management: Space validation and automatic cleanup
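The disk-space validation mentioned above amounts to comparing free space against a minimum before downloading. A minimal, language-agnostic sketch in Python (the threshold value is hypothetical, not the runner's actual setting):

```python
import shutil

def has_enough_disk_space(path=".", required_gb=10.0):
    """Check that the filesystem holding `path` has at least required_gb free."""
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gb >= required_gb

# A download would be refused (or cleanup triggered) when this is False
print(has_enough_disk_space(".", required_gb=0.001))
```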
```powershell
# Install PM2 (Node.js process manager)
npm install -g pm2

# Install PowerShell (if not already installed)
# Windows: Pre-installed on Windows 10/11
# Linux/macOS: https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell

# Verify installations
pm2 --version
$PSVersionTable.PSVersion
```

**Windows:**
```powershell
# Using Chocolatey
choco install curl jq

# Using winget
winget install curl
winget install stedolan.jq

# Or download manually:
# curl: https://curl.se/windows/
# jq: https://stedolan.github.io/jq/download/
```

**Linux/macOS:**
# Ubuntu/Debian
sudo apt update
sudo apt install curl jq
# macOS
brew install curl jqYou need to have llama.cpp built and available in your system PATH:
```bash
# Clone and build llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# Build for CPU
make

# Build with CUDA support (optional)
make LLAMA_CUDA=1

# Build with OpenCL support (optional)
make LLAMA_OPENCL=1

# Make sure the binary is in your PATH
# Windows: Add to PATH environment variable
# Linux/macOS: sudo cp llama-server /usr/local/bin/
```
1. Clone or download this runner:

   ```powershell
   git clone https://github.com/rizalwfh/llama-cpp-runner-bash.git llama-cpp-runner
   cd llama-cpp-runner
   ```

2. Run the interactive setup:

   ```powershell
   .\Runner.ps1
   ```

3. Follow the prompts:
   - Enter HuggingFace model ID (e.g., `microsoft/Phi-3-mini-4k-instruct`)
   - Specify PM2 instance name (alphanumeric, hyphens, underscores only)
   - Configure optional settings (port: 8080+, context size: 512-32768, threads: 1-CPU cores)

4. Access your model:
   - API: `http://localhost:8080` (or your configured port)
   - Health check: `http://localhost:8080/health`
   - Web UI: `http://localhost:8080` (built-in web interface)

5. Manage your instances:

   ```powershell
   .\Runner.ps1 -List                                    # List all instances
   .\Runner.ps1 -Action stop -InstanceName my-instance   # Stop an instance
   .\Runner.ps1 -Action start -InstanceName my-instance  # Start an instance
   .\Runner.ps1 -Action delete -InstanceName my-instance # Delete an instance
   ```
```powershell
# Interactive setup (default)
.\Runner.ps1

# Show help
.\Runner.ps1 -Help

# List running PM2 processes
.\Runner.ps1 -List

# Show detailed status
.\Runner.ps1 -Status

# Clean up old models and logs
.\Runner.ps1 -Cleanup

# Enable debug mode for troubleshooting
$env:DEBUG=1; .\Runner.ps1
```

```powershell
# Start a stopped instance
.\Runner.ps1 -Action start -InstanceName <instance-name>

# Stop a running instance
.\Runner.ps1 -Action stop -InstanceName <instance-name>

# Restart an instance
.\Runner.ps1 -Action restart -InstanceName <instance-name>

# Delete an instance (with confirmation)
.\Runner.ps1 -Action delete -InstanceName <instance-name>
```

Examples:
```powershell
# Start the 'phi3-mini' instance
.\Runner.ps1 -Action start -InstanceName phi3-mini

# Stop the 'gemma-7b' instance
.\Runner.ps1 -Action stop -InstanceName gemma-7b

# Restart the 'mistral' instance with health check
.\Runner.ps1 -Action restart -InstanceName mistral

# Delete the 'old-model' instance and cleanup files
.\Runner.ps1 -Action delete -InstanceName old-model
```

```powershell
# PowerShell provides built-in parameter validation
# Invalid parameters will show helpful error messages
.\Runner.ps1 -Action invalid  # Shows valid options
```

```powershell
# PowerShell's try/catch provides detailed error information
# Automatic cleanup on script termination
# Graceful handling of Ctrl+C interruption
```

```powershell
# Built-in progress bars for downloads
# Colored output for better readability
# Structured logging with timestamps
```

```powershell
# Works on Windows PowerShell 5.1 and PowerShell Core 6+
# Automatic path handling for different operating systems
# Native PowerShell modules with proper manifest files
```

- Port: 8080 (auto-incremented if busy)
- Context Size: 2048 tokens (range: 512-32768)
- Threads: 4 (or optimal based on CPU cores)
- Temperature: 0.7
- Batch Size: 512
- Memory Limit: 2GB (PM2 restart threshold)
- Model Selection: Prefers Q4_0 or Q4_K_M quantization
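The quantization preference in the last default can be illustrated with a short sketch. Python is used here only for brevity; the file names are made up, and the real selection logic lives in the Download module:

```python
def pick_gguf(files, preferred=("Q4_0", "Q4_K_M")):
    """Pick a GGUF file from a repo listing, preferring quantizations in order."""
    ggufs = [f for f in files if f.lower().endswith(".gguf")]
    for quant in preferred:
        for f in ggufs:
            if quant.lower() in f.lower():
                return f
    # Fall back to the first GGUF file if no preferred quantization exists
    return ggufs[0] if ggufs else None

# Hypothetical repo listing
files = ["model-Q8_0.gguf", "model-Q4_K_M.gguf", "config.json"]
print(pick_gguf(files))  # model-Q4_K_M.gguf
```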
```
lib/
├── Utils.psm1      # Core utilities module
├── Utils.psd1      # Module manifest
├── Download.psm1   # HuggingFace download module
├── Download.psd1   # Module manifest
├── PM2Config.psm1  # PM2 configuration module
└── PM2Config.psd1  # Module manifest
```
```powershell
# Import individual modules for development
Import-Module .\lib\Utils.psm1 -Force
Import-Module .\lib\Download.psm1 -Force
Import-Module .\lib\PM2Config.psm1 -Force

# Get available functions
Get-Command -Module Utils
```

```powershell
# Utils Module
Initialize-Environment -ScriptDirectory $PWD
Test-Dependencies
Find-AvailablePort -StartPort 8080
Write-LogMessage -Level "INFO" -Message "Test message"

# Download Module
Invoke-ModelDownload -ModelId "microsoft/Phi-3-mini-4k-instruct" -ModelType "completion"
Get-LocalModels

# PM2Config Module
New-PM2Config -InstanceName "test" -ModelPath "path/to/model.gguf" -Port 8080
Get-PM2Configs
```
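For orientation, a PM2 ecosystem file for llama-server generally has the following shape. This is an illustrative sketch, not the exact output of `New-PM2Config`; the names, paths, and server flags here are assumptions:

```javascript
// ecosystem.config.js (illustrative shape only; actual values come from New-PM2Config)
module.exports = {
  apps: [{
    name: "my-instance",                 // PM2 instance name
    script: "llama-server",              // llama.cpp server binary on PATH
    args: "--model models/model.gguf --port 8080 --ctx-size 2048 --threads 4",
    autorestart: true,                   // auto-restart on crash
    max_memory_restart: "2G",            // matches the 2GB restart threshold above
    out_file: "logs/my-instance-out.log",
    error_file: "logs/my-instance-error.log"
  }]
};
```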
1. Execution Policy:

   ```powershell
   # Check current execution policy
   Get-ExecutionPolicy

   # Set execution policy (run as Administrator)
   Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
   ```

2. Module Import Issues:

   ```powershell
   # Force reload modules
   Remove-Module Utils, Download, PM2Config -ErrorAction SilentlyContinue
   Import-Module .\lib\Utils.psm1 -Force
   ```

3. Path Issues on Windows:

   ```powershell
   # Use PowerShell-native path handling
   $modelPath = Join-Path $PWD "models\model.gguf"
   # Script automatically handles path separators
   ```

4. PowerShell Version Compatibility:

   ```powershell
   # Check PowerShell version
   $PSVersionTable.PSVersion

   # Script requires PowerShell 5.1 or higher
   # Compatible with both Windows PowerShell and PowerShell Core
   ```
- Model not found: Verify HuggingFace model ID format
- Download failures: Check internet connectivity and disk space
- Port conflicts: Script automatically finds available ports
- Dependency issues: Ensure PM2, curl, jq, and llama-server are installed
- Memory issues: PM2 restarts processes exceeding 2GB memory
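The automatic port allocation behind that last point can be sketched as a simple bind probe. This is a language-agnostic Python illustration; the runner's actual implementation is `Find-AvailablePort` in the Utils module:

```python
import socket

def find_available_port(start_port=8080, max_tries=100):
    """Return the first TCP port at or above start_port that accepts a bind."""
    for port in range(start_port, start_port + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # bind succeeded, so the port is free
            except OSError:
                continue     # port is in use, probe the next one
    raise RuntimeError("no free port in range %d-%d" % (start_port, start_port + max_tries - 1))

print(find_available_port(8080))
```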
```powershell
# Enable verbose logging and detailed tracing
$env:DEBUG=1; .\Runner.ps1

# Debug output includes:
# - API validation responses
# - Download progress details
# - File operation traces
# - Function call validation
# - Environment variable checks
```

The PowerShell version maintains 100% compatibility with the Bash version:
- Same Configuration Files: Uses identical PM2 ecosystem configurations
- Same Directory Structure: Maintains the same models/, logs/, and config/ directories
- Same API Endpoints: Produces identical server configurations
- Same Command Interface: Equivalent command-line options and functionality
You can switch between Bash and PowerShell versions seamlessly:
```bash
# Using Bash version
./runner.sh --list
```

```powershell
# Using PowerShell version (equivalent)
.\Runner.ps1 -List
```

Contributions are welcome! The PowerShell version follows the same patterns as the Bash version while leveraging PowerShell-specific features.
```powershell
# Make sure script can execute
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# Test the script with debug mode
$env:DEBUG=1; .\Runner.ps1 -Help
```

This project is licensed under the MIT License. See LICENSE file for details.
- llama.cpp - Fast LLM inference in C/C++
- PM2 - Production process manager for Node.js
- HuggingFace - Model repository and hosting
- PowerShell - Cross-platform automation framework
For PowerShell-specific support:
- Check the PowerShell troubleshooting section above
- Verify PowerShell version compatibility
- Ensure execution policy allows script execution
- Create an issue with PowerShell version information
Happy model serving with PowerShell!