Skip to content

Conversation

@bigbrett
Copy link
Owner

@bigbrett bigbrett commented Jan 21, 2026

TL;DR: Makes wolfHSM server thread safe!

This pull request implements a generic framework for thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) and global key cache subsystems.

Note that an instance of a server context itself cannot be shared across threads without serialization by the caller. This PR just ensures that if multiple server contexts share an NVM or global keystore, that accesses to shared resources are properly serialized such that requests from multiple clients can be processed concurrently in their own threads.

Changes:

  • Introduces lock abstraction layer (wh_lock.{c,h}) with callback-based design for platform independence
  • Example POSIX lock implementation using pthread_mutex
  • Refactor NVM and keystore layer internals to use lock abstraction such that global keystore and NVM can be shared by multiple server contexts in a thread-safe manner
  • Thread safe functionality enabled with the WOLFHSM_CFG_THREADSAFE build option. When this option is NOT defined, all lock abstraction operations compile to no-ops, with zero overhead
  • Adds "thread safe stress test" to test suite that attempts the flesh out data races via a large number of contention cases, meant to be run under ThreadSanitizer

Gaps/future work:

  • Serializing access to global crypto state, specifically hardware crypto for ports. A bit of a tricky problem since offload is provided at the port level, and there isn't a good way for wolfHSM to know which algos will be accelerated and which wont

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) and global key cache subsystems. The implementation introduces a platform-agnostic lock abstraction layer with a POSIX pthread_mutex implementation, along with comprehensive stress testing infrastructure.

Changes:

  • Introduces lock abstraction layer (wh_lock.h/c) with callback-based design for platform independence
  • Adds POSIX lock implementation using pthread_mutex
  • Implements thread-safe wrappers for all NVM and keystore operations with proper lock acquisition/release
  • Adds "unlocked" variants of functions for atomic multi-step operations
  • Includes comprehensive stress tests with 18+ contention phases and TSAN (Thread Sanitizer) integration

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

Show a summary per file
File Description
wolfhsm/wh_settings.h Documents WOLFHSM_CFG_THREADSAFE configuration option
wolfhsm/wh_nvm.h Adds lock member to NVM context and declares unlocked function variants
wolfhsm/wh_lock.h Defines platform-agnostic lock abstraction API
src/wh_lock.c Implements lock abstraction with callback-based design
src/wh_nvm.c Adds locking to all NVM operations and implements unlocked variants
src/wh_server_keystore.c Adds locking to all keystore operations with atomic multi-step support
port/posix/posix_lock.h Defines POSIX-specific lock structures and callbacks
port/posix/posix_lock.c Implements pthread_mutex-based locking
test/wh_test_lock.h Header for lock unit tests
test/wh_test_lock.c Basic lock functionality tests
test/wh_test_threadsafe_stress.h Header for stress tests
test/wh_test_threadsafe_stress.c Comprehensive multi-threaded stress tests
test/wh_test_crypto.c Initializes NVM config to avoid uninitialized memory
test/wh_test.c Integrates lock and stress tests into test suite
test/tsan.supp Thread Sanitizer suppressions for known false positives
test/Makefile Adds TSAN support and stress test mode

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ety,

serializing access to shared global resources like NVM and global keycache
API/Error handling:
- Add initialized flag to whLock structure to distinguish init states
- Enhance error handling: acquire/release check initialized flag
- Make wh_Lock_Cleanup zero structure for clear post-cleanup state
- Document init/cleanup must be single-threaded (no atomics)
- Document cleanup preconditions (no active contention required)
- Update all API docs with precise return codes and error conditions
- Change blocking acquire failure from ERROR_LOCKED to ERROR_ABORTED
- Add comment explaining why non-blocking acquire is not provided

POSIX port improvements:
- Enhanced errno mapping in posix_lock.c (EINVAL→BADARGS, etc)
- Trap PTHREAD_MUTEX_ERRORCHECK errors (EDEADLK, EPERM)

Test coverage:
- Add testUninitializedLock to validate error handling
- Enhance testLockLifecycle with post-cleanup validation tests

Misc:
- Apply consistent critical section style pattern in wh_nvm.c
- Update copyright years to 2026
- Rename stress test files to wh_test_posix_threadsafe_stress.*
…nter, img_mgr, and nvm modules

Adds proper thread-safety locking discipline to additional server modules that
perform compound NVM operations. This prevents TOCTOU (Time-Of-Check-Time-Of-Use)
issues where metadata could become stale between check and use/writeback.

Changes:
- wh_server_cert.c: Add NVM locking for atomic GetMetadata + Read operations in
  certificate read and export paths
- wh_server_counter.c: Add NVM locking for atomic read-modify-write counter
  increment operations
- wh_server_img_mgr.c: Add NVM locking for atomic signature load operations
- wh_server_keystore.c: Refactor to use unlocked internal variants for compound
  operations (GetUniqueId + CacheKey, policy check + erase, freshen + export).
  Add locking discipline documentation.
- wh_server_nvm.c: Add NVM locking for DMA read operations to ensure metadata
  remains valid throughout transfer. Add locking discipline documentation.
- wh_test_posix_threadsafe_stress.c: Add new stress test phases for counter
  concurrent increment, counter increment vs read, NVM read vs resize, NVM
  concurrent resize, and NVM read DMA vs resize. Add counter atomicity validation.

All compound operations now follow the pattern:
1. Acquire server->nvm->lock
2. Use only *Unlocked() variants internally
3. Keep lock held for entire operation including DMA
4. Release lock after all metadata-dependent operations complete
…vel server module APIs (keystore, NVM, counter, etc.) and aquire lock in request handling functions (e.g. wh_Server_HandleXXXRequest())
@bigbrett bigbrett force-pushed the server-thread-safe branch 3 times, most recently from 2a07204 to 4de1c8e Compare January 28, 2026 23:23
  protection
- TSAN options to fail-fast in CI on error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants