
Dell EMC Unity Array Zabbix Template & Script

Overview

This documentation covers the Zabbix template and external script for monitoring Dell EMC Unity storage arrays using the REST API. The solution provides comprehensive auto-discovery and monitoring of Unity resources including LUNs, Pools, Disks, Ports, Batteries, Alerts, Hosts, Replication Sessions, and Storage Tiers with health, status, capacity, and performance metrics.

  • Template: dell_unity_health.yaml
  • Script: get_dell_unity_health.py
  • Zabbix Version: 6.0+
  • Author: Simon Jackson / @sjackson0109

Features

  • Auto-discovery of Unity resources (LUNs, Pools, Disks, Ports, Batteries, Alerts, Hosts, Replication Sessions, Storage Tiers)
  • Comprehensive monitoring of health, capacity, firmware, status, and performance metrics
  • Alert monitoring for proactive issue detection and management
  • Host connectivity tracking with last contact times and health status
  • Replication monitoring including source/destination status and health
  • Configurable TLS/SSL certificate validation (enforced or ignored via {$UNITY_TLS_VERIFY})
  • Zabbix integration using zabbix_sender for efficient data transmission
  • Comprehensive logging for troubleshooting and audit trails
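
The script pushes values to Zabbix with zabbix_sender. One way to batch many values into a single call is to write them to an input file in the `<hostname> <key> <value>` format that `zabbix_sender -i` reads, as in the minimal sketch below. The item keys (such as `unity.pool.util[pool_1]`) and helper names are hypothetical; check the imported template for the real trapper item keys.

```python
import os
import subprocess
import tempfile


def quote_field(value) -> str:
    """Double-quote a field, escaping backslashes and embedded quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def build_sender_lines(host, values) -> str:
    """Render (key, value) pairs as zabbix_sender input-file lines."""
    return "".join(
        f"{quote_field(host)} {quote_field(key)} {quote_field(val)}\n"
        for key, val in values
    )


def send_values(server, host, values, port=10051) -> int:
    """Write the batch to a temporary file and pass it to `zabbix_sender -i`."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write(build_sender_lines(host, values))
        path = fh.name
    try:
        result = subprocess.run(
            ["zabbix_sender", "-z", server, "-p", str(port), "-i", path],
            capture_output=True, text=True, check=False,
        )
        return result.returncode  # 0 means every value was accepted
    finally:
        os.unlink(path)  # always remove the temporary input file
```

Quoting every field keeps item keys with brackets or embedded quotes intact; sending one file per run also keeps the number of connections to the Zabbix server low.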

Installation

1. Deploy the Script

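Copy get_dell_unity_health.py to the Zabbix external scripts directory (the ExternalScripts path set in zabbix_server.conf or zabbix_proxy.conf) on the server or proxy that monitors the array, and make it executable by the zabbix user.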

2. Install Python Dependencies

pip install requests urllib3

3. Import the Template

  • Import dell_unity_health.yaml into Zabbix (Configuration > Templates > Import).

4. Configure Host Macros

Add the following macros to your Unity host in Zabbix:

{$UNITY_USER}      # Unity API username
{$UNITY_PASSWORD}  # Unity API password
{$UNITY_PORT}      # Unity API port (default: 443)
{$UNITY_TLS_VERIFY} # Set to 1 to enforce TLS/SSL certificate validation, 0 to ignore (default: 0)

5. Assign the Template

  • Link the template to your Unity array host.
  • Set the macros with correct values for your environment.

Usage

  • The template auto-discovers LUNs, pools, disks, ports, batteries, alerts, hosts, replication sessions, and storage tiers.
  • It collects metrics including health status, capacity, model/serial details, firmware versions, alert states, host connectivity, and replication status.
  • Pre-configured triggers and graphs provide immediate monitoring capabilities.
  • Individual item queries enable detailed metric retrieval for specific resources.

Security Note

  • By default ({$UNITY_TLS_VERIFY} = 0), the script ignores SSL certificate errors. To enforce certificate validation, set {$UNITY_TLS_VERIFY} to 1 and ensure the Unity array presents a certificate trusted by the Zabbix server or proxy running the script.
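
How the macro maps onto TLS behaviour can be sketched with Python's standard ssl module. This is an illustration, not the script's actual code; `build_ssl_context` and its optional `ca_file` argument are hypothetical.

```python
import ssl


def build_ssl_context(tls_verify, ca_file=None) -> ssl.SSLContext:
    """Return an SSL context that honours the {$UNITY_TLS_VERIFY} macro value."""
    if str(tls_verify).strip() == "1":
        # Verify the certificate chain and hostname. Passing ca_file (hypothetical)
        # trusts that private CA bundle instead of the system store.
        return ssl.create_default_context(cafile=ca_file)
    # Default (0): the connection is still encrypted, but the certificate
    # is not checked. check_hostname must be disabled before CERT_NONE.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With validation disabled, traffic is encrypted but the script cannot detect an impersonated array, which is why 1 is preferable wherever the array has a trusted certificate.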

Troubleshooting

  • Check /tmp/unity_state.log for script logs and errors.
  • Ensure all macros are set and correct on the Zabbix host.
  • Verify Python dependencies are installed.
  • Use Zabbix's "Latest Data" to confirm metrics are being collected.
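
To pull only the problems out of a long log, a small filter such as the following can help. It assumes each log line contains its level name (for example `ERROR`), which is an assumption about the script's log format; `recent_errors` is a hypothetical helper.

```python
from pathlib import Path

LOG_PATH = "/tmp/unity_state.log"


def recent_errors(path=LOG_PATH, limit=20, levels=("ERROR", "CRITICAL")):
    """Return up to `limit` most recent log lines that mention one of `levels`."""
    log = Path(path)
    if not log.exists():
        return []
    lines = log.read_text(errors="replace").splitlines()
    matches = [line for line in lines if any(level in line for level in levels)]
    return matches[-limit:]  # keep only the newest matches


if __name__ == "__main__":
    for line in recent_errors():
        print(line)
```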

References

Template Components

Discovery Rules

  • Storage Pool Discovery: Discovers all configured storage pools
  • LUN Discovery: Finds all logical units and their mappings
  • Disk Discovery: Identifies physical disks and their status
  • FC Port Discovery: Discovers Fibre Channel ports and connections
  • iSCSI Port Discovery: Finds iSCSI network interfaces
  • Controller Discovery: Identifies storage processors and controllers
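
Each discovery rule turns a Unity REST collection (for example `/api/types/lun/instances?fields=id,name`) into Zabbix low-level discovery (LLD) data. The sketch below assumes the usual Unity response shape, `{"entries": [{"content": {...}}]}`; the LLD macro names such as `{#LUN.ID}` are hypothetical and may differ from the template's.

```python
import json


def build_lld(entries, prefix):
    """Convert Unity REST collection entries into a Zabbix LLD JSON array."""
    rows = []
    for entry in entries:
        content = entry.get("content", {})
        rows.append({
            f"{{#{prefix}.ID}}": content.get("id", ""),      # e.g. {#LUN.ID}
            f"{{#{prefix}.NAME}}": content.get("name", ""),  # e.g. {#LUN.NAME}
        })
    # Zabbix 4.2+ accepts a plain top-level array for LLD.
    return json.dumps(rows)
```

Item prototypes can then reference `{#LUN.ID}` in their keys so that one prototype expands into an item per discovered LUN.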

Monitored Metrics

System Health

  • Overall system health status
  • Alert count and severity levels
  • Component fault indicators
  • Environmental status (temperature, fans, power)

Storage Capacity

  • Pool capacity and utilisation percentage
  • Available free space per pool
  • LUN allocation and usage statistics
  • Thin provisioning efficiency ratios
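
The capacity figures are simple ratios of the pool sizes. The sketch below assumes the pool record exposes `sizeTotal`, `sizeUsed` and `sizeSubscribed` in bytes; it reports subscription percentage as one thin-provisioning measure (above 100 % means the pool is over-provisioned), which may not be the exact ratio the template uses.

```python
def pool_capacity(content):
    """Derive capacity figures from a Unity pool record (sizes in bytes)."""
    total = float(content.get("sizeTotal", 0))
    used = float(content.get("sizeUsed", 0))
    subscribed = float(content.get("sizeSubscribed", 0))
    if total <= 0:
        # Avoid division by zero for pools that report no capacity.
        return {"util_pct": 0.0, "free_bytes": 0.0, "subscription_pct": 0.0}
    return {
        "util_pct": round(used / total * 100, 2),
        "free_bytes": total - used,
        "subscription_pct": round(subscribed / total * 100, 2),
    }
```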

Performance Metrics

  • Read/write IOPS per pool and LUN
  • Throughput (MB/s) for read and write operations
  • Average response times and latency
  • Queue depth and outstanding I/O operations
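
IOPS and throughput are normally derived from cumulative counters sampled at two points in time. The sketch below shows that arithmetic with a hypothetical `CounterSample` shape; in Zabbix itself, the "Change per second" preprocessing step performs the same calculation on the server side.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative I/O counters captured at one point in time (hypothetical shape)."""
    timestamp: float    # seconds since the epoch
    reads: int          # total read operations
    writes: int         # total write operations
    read_bytes: int     # total bytes read
    write_bytes: int    # total bytes written


def _per_second(new, old, seconds):
    delta = new - old
    if delta < 0:
        # Counter went backwards (e.g. storage processor reboot): skip interval.
        return None
    return delta / seconds


def perf_rates(prev, curr):
    """Turn two counter samples into IOPS and MB/s (1 MB = 10**6 bytes here)."""
    seconds = curr.timestamp - prev.timestamp
    if seconds <= 0:
        raise ValueError("samples must be in chronological order")
    read_bps = _per_second(curr.read_bytes, prev.read_bytes, seconds)
    write_bps = _per_second(curr.write_bytes, prev.write_bytes, seconds)
    return {
        "read_iops": _per_second(curr.reads, prev.reads, seconds),
        "write_iops": _per_second(curr.writes, prev.writes, seconds),
        "read_mbps": None if read_bps is None else read_bps / 1e6,
        "write_mbps": None if write_bps is None else write_bps / 1e6,
    }
```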

Hardware Components

  • Disk health status and failure predictions
  • Controller operational state and failover status
  • Network port link status and utilisation
  • Power supply and cooling system status

Triggers

  • Critical Alerts: System failures, disk failures, pool full conditions
  • Warning Alerts: High utilisation, performance degradation, component warnings
  • Information: Status changes, maintenance events, configuration updates

Graphs

  • Storage capacity trends and growth projections
  • Performance metrics over time (IOPS, throughput, latency)
  • System health and availability statistics
  • Component status and failure tracking

Error Handling

Common Issues

  1. Authentication Failures:

    • Verify the Unity user credentials and role permissions
    • Check for account lockout and password expiry
    • Ensure the REST API service is running
  2. Connection Timeouts:

    • Verify network connectivity to the Unity management interface
    • Check firewall rules for HTTPS access (port 443, or the value of {$UNITY_PORT})
    • Increase the external check timeout if needed (the Timeout parameter in zabbix_server.conf or zabbix_proxy.conf)
  3. Discovery Failures:

    • Confirm the Unity system is fully initialised
    • Check for pending configuration changes
    • Verify the monitoring account has sufficient privileges for discovery operations

Troubleshooting Steps

  1. Test REST API connectivity using curl or similar tools:

    curl -k -u username:password -H "X-EMC-REST-CLIENT: true" -H "Accept: application/json" https://unity-mgmt-ip/api/types/basicSystemInfo/instances
  2. Review the Zabbix server (or proxy) logs for API errors and responses

  3. Check Unity event logs for authentication and access issues

  4. Validate template macros and host configuration
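
The curl test above can be reproduced from Python with only the standard library, which helps when curl is not installed on the proxy. This is a sketch, not the script's own client; `build_request` and `fetch_json` are hypothetical helpers, and the SSL context would come from whatever TLS setting you use.

```python
import base64
import json
import urllib.request


def build_request(host, path, user, password, port=443):
    """Build a GET request for a Unity REST API path with basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"https://{host}:{port}{path}",
        headers={
            "Authorization": f"Basic {token}",
            "X-EMC-REST-CLIENT": "true",   # header the Unity REST API expects
            "Accept": "application/json",
        },
    )


def fetch_json(request, context, timeout=15):
    """Send the request using an ssl.SSLContext and decode the JSON body."""
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_json(build_request("unity-mgmt-ip", "/api/types/basicSystemInfo/instances", "username", "password"), ssl.create_default_context())` mirrors the curl command with certificate validation enabled. An explicit timeout keeps a hung array from holding the external check until Zabbix kills it.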

Best Practices

  • Use a dedicated monitoring account with the minimum required privileges
  • Enable certificate validation ({$UNITY_TLS_VERIFY} = 1) in secure environments
  • Monitor API call frequency to avoid overwhelming the Unity system
  • Set appropriate data collection intervals based on system size
  • Use Zabbix maintenance periods during Unity maintenance windows

Value Mappings

Health Status

  • 0: Unknown
  • 5: OK
  • 7: OK but minor warning
  • 10: Degraded/Warning
  • 15: Minor failure
  • 20: Major failure
  • 25: Critical failure
  • 30: Non-recoverable error

Disk Status

  • 1: Enabled
  • 2: Disabled
  • 3: Removed
  • 4: Missing
  • 5: Faulted
  • 6: Unknown
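
The same value mappings can be applied outside Zabbix, for example when reading raw values while debugging. The dictionaries below copy the tables above; `describe` and `health_is_ok` are hypothetical helpers, and treating 5 and 7 as healthy is a choice made for this sketch.

```python
HEALTH_STATUS = {
    0: "Unknown",
    5: "OK",
    7: "OK but minor warning",
    10: "Degraded/Warning",
    15: "Minor failure",
    20: "Major failure",
    25: "Critical failure",
    30: "Non-recoverable error",
}

DISK_STATUS = {
    1: "Enabled",
    2: "Disabled",
    3: "Removed",
    4: "Missing",
    5: "Faulted",
    6: "Unknown",
}


def describe(value, mapping):
    """Translate a numeric status into its label, tolerating strings and gaps."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        return "Unknown"
    return mapping.get(code, f"Unmapped ({code})")


def health_is_ok(value):
    """Treat 5 (OK) and 7 (OK but minor warning) as healthy."""
    return describe(value, HEALTH_STATUS) in ("OK", "OK but minor warning")
```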

Advanced Configuration

Custom Thresholds

Adjust capacity and performance thresholds using template macros:

  • {$POOL_UTIL_WARN}: Pool utilisation warning threshold (%)
  • {$POOL_UTIL_CRIT}: Pool utilisation critical threshold (%)
  • {$RESPONSE_TIME_WARN}: Response time warning threshold (ms)
  • {$RESPONSE_TIME_CRIT}: Response time critical threshold (ms)
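
Each warning/critical macro pair is evaluated by a trigger that compares the latest item value against the macro. The logic behind such a pair can be sketched as follows; `classify` is hypothetical, and the 80/90 values used in the tests are example thresholds, not the template's defaults.

```python
def classify(value, warn, crit):
    """Return 'ok', 'warning' or 'critical' for a metric against two thresholds."""
    if crit < warn:
        raise ValueError("critical threshold must not be below the warning threshold")
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"
```

Keeping the critical threshold at or above the warning threshold matters in Zabbix too: otherwise the warning trigger can never fire on its own.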

Extended Discovery

Enable additional discovery rules for detailed monitoring:

  • Snapshot and replication status
  • File system and NAS server monitoring
  • Host access and initiator tracking
