@bigbrett
Contributor

@bigbrett bigbrett commented Jan 22, 2026

Server thread safety

TL;DR: Makes wolfHSM server safe to use in multithreaded scenarios.

Overview

This pull request implements thread-safe access to shared server resources in wolfHSM, specifically targeting the NVM (non-volatile memory) subsystem, whose lock also protects the global key cache. Crypto is left to a subsequent PR but is the likely next candidate.

Note that a server context itself still cannot be shared across threads without proper serialization by the caller. This PR adds the mechanisms such that, when multiple server contexts share an NVM instance (which includes the global keystore), access to those shared resources is properly serialized, allowing requests from multiple clients to be processed concurrently in separate threads.

Changes

  • Introduces lock abstraction layer (wh_lock.{c,h}) with callback-based design for platform independence
  • Example POSIX lock implementation using pthread_mutex
  • Adds server-level NVM locking API (wh_Server_NvmLock()/wh_Server_NvmUnlock()) with convenience macros WH_SERVER_NVM_LOCK()/WH_SERVER_NVM_UNLOCK()
  • All request handlers that access NVM or global keystore resources acquire the lock at the handler level before performing operations
  • Lower-level modules (NVM, keystore, counter, cert, etc.) remain lock-free; synchronization is the responsibility of the request handler layer
  • Thread-safe functionality is enabled with the WOLFHSM_CFG_THREADSAFE build option. When this option is NOT defined, all lock macros compile to no-ops with zero overhead
  • Adds "thread safe stress test" to test suite that attempts to flush out data races via a large number of contention cases, meant to be run under ThreadSanitizer
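
As a rough illustration of the callback-based lock layer and the no-op macros described in the list above, here is a minimal sketch. Every name in it (exLock, exLockCb, EX_LOCK, EX_THREADSAFE, the toy backend) is hypothetical and is not the actual wh_lock.h API; the toy backend only counts lock depth and stands in for a real platform mutex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the real wh_lock.h API. */
#define EX_THREADSAFE 1 /* stand-in for a build option such as WOLFHSM_CFG_THREADSAFE */

enum { EX_OK = 0, EX_BADARGS = -1 };

/* The platform port supplies these callbacks (e.g. wrapping a mutex). */
typedef struct {
    int (*acquire)(void* ctx);
    int (*release)(void* ctx);
} exLockCb;

typedef struct {
    const exLockCb* cb;
    void*           ctx; /* platform-specific state, opaque to the core */
} exLock;

static int ex_Lock_Acquire(exLock* lock)
{
    if (lock == NULL || lock->cb == NULL || lock->cb->acquire == NULL) {
        return EX_BADARGS;
    }
    return lock->cb->acquire(lock->ctx);
}

static int ex_Lock_Release(exLock* lock)
{
    if (lock == NULL || lock->cb == NULL || lock->cb->release == NULL) {
        return EX_BADARGS;
    }
    return lock->cb->release(lock->ctx);
}

#if EX_THREADSAFE
#define EX_LOCK(l)   ex_Lock_Acquire(l)
#define EX_UNLOCK(l) ex_Lock_Release(l)
#else
/* Without the option the macros reduce to a constant: no call is made. */
#define EX_LOCK(l)   ((void)(l), EX_OK)
#define EX_UNLOCK(l) ((void)(l), EX_OK)
#endif

/* Toy backend that only tracks lock depth, standing in for a real mutex. */
static int toy_acquire(void* ctx) { (*(int*)ctx)++; return EX_OK; }
static int toy_release(void* ctx) { (*(int*)ctx)--; return EX_OK; }

static const exLockCb toyCb = { toy_acquire, toy_release };

/* Returns the depth observed while the lock is held, or -1 on failure. */
static int ex_demo_depth_while_held(void)
{
    int    depth = 0;
    exLock lock  = { &toyCb, &depth };
    int    held;

    if (EX_LOCK(&lock) != EX_OK) {
        return -1;
    }
    held = depth;
    (void)EX_UNLOCK(&lock);
    return (depth == 0) ? held : -1;
}
```

Keeping the platform code behind two function pointers means the core only ever sees an opaque context, which is one way to get the platform independence the PR describes.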

Design Rationale

The locking strategy is intentionally simple: acquire the NVM lock at the start of a request handler, perform all operations (including any compound operations involving multiple NVM/cache accesses), then release the lock. This approach:

  1. Avoids TOCTOU issues - No risk of metadata becoming stale or objects being destroyed/replaced between checks
  2. Makes lock scope visible - Locking is explicit at the handler level rather than hidden in lower layers
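
The handler-level strategy can be sketched as follows, assuming a toy single-object store. The names (exNvm, ex_HandleRead, ex_NvmLock) are hypothetical stand-ins, and the "lock" is just a depth counter; the point is only that the metadata check and the read happen inside one critical section.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { EX_OK = 0, EX_BADARGS = -1 };

/* Toy store: one object plus a depth counter standing in for the NVM lock. */
typedef struct {
    uint8_t  data[32];
    uint16_t len;       /* current object length (the "metadata") */
    int      lockDepth;
} exNvm;

static int ex_NvmLock(exNvm* nvm)   { nvm->lockDepth++; return EX_OK; }
static int ex_NvmUnlock(exNvm* nvm) { nvm->lockDepth--; return EX_OK; }

/* Lower layer: takes no lock itself; the caller must already hold it. */
static int ex_Nvm_GetLen(const exNvm* nvm, uint16_t* outLen)
{
    *outLen = nvm->len;
    return EX_OK;
}

static int ex_Nvm_Read(const exNvm* nvm, uint16_t off, uint16_t len,
                       uint8_t* out)
{
    if ((uint32_t)off + len > nvm->len) {
        return EX_BADARGS;
    }
    memcpy(out, nvm->data + off, len);
    return EX_OK;
}

/* Handler: one lock acquisition covers the check AND the use. */
static int ex_HandleRead(exNvm* nvm, uint16_t off, uint16_t len,
                         uint8_t* out, uint16_t* outLen)
{
    uint16_t objLen = 0;
    int      rc     = ex_NvmLock(nvm);
    if (rc != EX_OK) {
        return rc;
    }

    rc = ex_Nvm_GetLen(nvm, &objLen);
    if (rc == EX_OK && off >= objLen) {
        rc = EX_BADARGS;
    }
    if (rc == EX_OK) {
        /* Clamp to object size; objLen cannot go stale while we hold the lock. */
        if ((uint32_t)off + len > objLen) {
            len = (uint16_t)(objLen - off);
        }
        rc = ex_Nvm_Read(nvm, off, len, out);
    }

    (void)ex_NvmUnlock(nvm);

    if (rc == EX_OK) {
        *outLen = len;
    }
    return rc;
}
```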

Gaps/Future Work

  • Serializing access to global crypto state, specifically hardware crypto for ports. A bit of a tricky problem since offload is provided at the port level, and there isn't a good way for wolfHSM to know which algos will be accelerated and which won't. A naive implementation might consider simply locking the server crypto context, but this contains a mixture of local (CMAC) and quasi-global (RNG) elements and no abstraction for hardware. Locks also need to be synchronized with the wolfCrypt port mutex. We should refactor the server crypto context and perhaps split it into local and global structures, with the global supporting hardware state. Future work...

…ety,

serializing access to shared global resources like NVM and global keycache
Contributor

@billphipps billphipps left a comment

Truly excellent! You solved this just the way I had hoped for!
My requested changes are very limited and not really functional. More just fleshing out the exact requirements for a real implementation and a few minor typos and renaming opportunities.

The stress testing framework is outstanding!

#include "wolfhsm/wh_lock.h"
#include "wolfhsm/wh_error.h"

#ifdef WOLFHSM_CFG_THREADSAFE
Contributor

Is this the best name? Consider the more mundane WOLFHSM_CFG_LOCKS. Threadsafe may imply more than just locks, like cancelability.

Contributor Author

yeah was kind of wishy washy on this. good point. Let me think on it.

Contributor

Consider adding posix to the name of this file, since it relies heavily on POSIX to provide any real functionality.

Member

Yeah it might be nice to organize our posix tests in one spot. maybe test/posix or port/posix/test/ so we can leave our wh_test_*.c stuff generic for all platforms

Contributor

I really like that solution. +1

Contributor Author

That is a good idea. Unfortunately a lot of our generic test modules (e.g. wh_test_clientserver.c) contain both generic drivers and a POSIX harness (e.g. one that spins up the client + server threads). I think it might be best to push this out of scope of this PR and refactor the tests to better split generic test drivers (e.g. whTest_XXXClientCfg(whClientConfig*) and whTest_XXXClientCtx(whClientCtx*)) from the actual underlying test harness. I'd wager we could remove a lot of code that way, with one or two unified harnesses that the drivers just run on top of.

Member

Yeah agreed! Definitely outside the scope of this PR

Contributor

@rizlik rizlik left a comment

I haven't looked at the tests yet.
Great work.
Is this lock enough to properly synchronize client requests?
For example, in _HandleNvmRead:

    rc = wh_Nvm_GetMetadata(server->nvm, id, &meta);
    if (rc != WH_ERROR_OK) {
        return rc;
    }

    if (offset >= meta.len)
        return WH_ERROR_BADARGS;

    /* Clamp length to object size */
    if ((offset + len) > meta.len) {
        len = meta.len - offset;
    }

    rc = wh_Nvm_ReadChecked(server->nvm, id, offset, len, out_data);
    if (rc != WH_ERROR_OK)

The metadata can change between GetMetadata and ReadChecked.
Also, when handling a key request:

            /* get a new id if one wasn't provided */
            if (WH_KEYID_ISERASED(meta->id)) {
                ret     = wh_Server_KeystoreGetUniqueId(server, &meta->id);
                resp.rc = ret;
            }
            /* write the key */
            if (ret == WH_ERROR_OK) {
                ret     = wh_Server_KeystoreCacheKeyChecked(server, meta, in);
                resp.rc = ret;
            }

The id might no longer be unique by the time wh_Server_KeystoreCacheKeyChecked runs.

Would coarser-grained locking at the request level simplify the design?

API/Error handling:
- Add initialized flag to whLock structure to distinguish init states
- Enhance error handling: acquire/release check initialized flag
- Make wh_Lock_Cleanup zero structure for clear post-cleanup state
- Document init/cleanup must be single-threaded (no atomics)
- Document cleanup preconditions (no active contention required)
- Update all API docs with precise return codes and error conditions
- Change blocking acquire failure from ERROR_LOCKED to ERROR_ABORTED
- Add comment explaining why non-blocking acquire is not provided
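
A minimal sketch of the initialized-flag behavior listed above, under stated assumptions: the type and function names are hypothetical, EX_NOTREADY is an invented error code (the actual code returned for an uninitialized lock is not stated here), and the `held` field is a toy stand-in for a platform mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; the real return values are not shown in this PR. */
enum { EX_OK = 0, EX_BADARGS = -1, EX_NOTREADY = -2 };

typedef struct {
    int initialized; /* set by init, cleared by cleanup */
    int held;        /* toy state standing in for a platform mutex */
} exLock;

/* Init and cleanup use a plain (non-atomic) flag, so they must run from a
 * single thread, before the lock is shared or after all users are done. */
static int ex_Lock_Init(exLock* lock)
{
    if (lock == NULL) {
        return EX_BADARGS;
    }
    memset(lock, 0, sizeof(*lock));
    lock->initialized = 1;
    return EX_OK;
}

static int ex_Lock_Acquire(exLock* lock)
{
    if (lock == NULL) {
        return EX_BADARGS;
    }
    if (!lock->initialized) {
        return EX_NOTREADY; /* never initialized, or already cleaned up */
    }
    lock->held = 1;
    return EX_OK;
}

static int ex_Lock_Release(exLock* lock)
{
    if (lock == NULL) {
        return EX_BADARGS;
    }
    if (!lock->initialized) {
        return EX_NOTREADY;
    }
    lock->held = 0;
    return EX_OK;
}

/* Zeroing the structure gives a well-defined post-cleanup state. */
static int ex_Lock_Cleanup(exLock* lock)
{
    if (lock == NULL) {
        return EX_BADARGS;
    }
    memset(lock, 0, sizeof(*lock));
    return EX_OK;
}
```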

POSIX port improvements:
- Enhanced errno mapping in posix_lock.c (EINVAL→BADARGS, etc)
- Trap PTHREAD_MUTEX_ERRORCHECK errors (EDEADLK, EPERM)
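
The error mapping could look roughly like the sketch below. Only EINVAL→BADARGS comes from the list above; the function name, the error code values, and the choice to map EDEADLK and EPERM to an "aborted" code are assumptions for illustration. Note that pthread mutex functions return the error number directly rather than setting errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical error codes for illustration only. */
enum { EX_OK = 0, EX_BADARGS = -1, EX_ABORTED = -2 };

/* Map a pthread_mutex_* return value (an errno-style code returned directly
 * by the call) to a library error code. */
static int ex_MapPthreadError(int err)
{
    switch (err) {
        case 0:
            return EX_OK;
        case EINVAL: /* invalid or uninitialized mutex */
            return EX_BADARGS;
        case EDEADLK: /* error-checking mutex: relock by the owning thread */
        case EPERM:   /* error-checking mutex: unlock by a non-owner */
        default:
            return EX_ABORTED;
    }
}
```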

Test coverage:
- Add testUninitializedLock to validate error handling
- Enhance testLockLifecycle with post-cleanup validation tests

Misc:
- Apply consistent critical section style pattern in wh_nvm.c
- Update copyright years to 2026
- Rename stress test files to wh_test_posix_threadsafe_stress.*
@bigbrett
Contributor Author

@rizlik great catch, thanks. I thought I fixed all of those but clearly there are some non-atomic compound operations still lurking. I will make another pass to ensure I make them all atomic.

@rizlik
Contributor

rizlik commented Jan 27, 2026

> @rizlik great catch, thanks. I thought I fixed all of those but clearly there are some non-atomic compound operations still lurking. I will make another pass to ensure I make them all atomic.

I wonder: if we are going to use a single lock, can't we just acquire it at the start of wh_Server_HandleKeyRequest and release it at the end (same for wh_Server_HandleNvmRequest)?

It's probably a tradeoff: we gain simplicity, since we don't need locked vs. unlocked APIs, but there is a risk that other parts of the code misuse the NVM API and introduce races in the future.

@bigbrett
Contributor Author

> It's probably a tradeoff, we'll gain simplicity as we don't need locked vs unlocked APIs but there is the risk that other part of the code misuse Nvm API and introduce races in the future.

@rizlik yep that is what I was worried about and why I didn't initially try it that way ¯\_(ツ)_/¯

I'm not 100% sold on which is better

…nter, img_mgr, and nvm modules

Adds proper thread-safety locking discipline to additional server modules that
perform compound NVM operations. This prevents TOCTOU (Time-Of-Check-Time-Of-Use)
issues where metadata could become stale between check and use/writeback.

Changes:
- wh_server_cert.c: Add NVM locking for atomic GetMetadata + Read operations in
  certificate read and export paths
- wh_server_counter.c: Add NVM locking for atomic read-modify-write counter
  increment operations
- wh_server_img_mgr.c: Add NVM locking for atomic signature load operations
- wh_server_keystore.c: Refactor to use unlocked internal variants for compound
  operations (GetUniqueId + CacheKey, policy check + erase, freshen + export).
  Add locking discipline documentation.
- wh_server_nvm.c: Add NVM locking for DMA read operations to ensure metadata
  remains valid throughout transfer. Add locking discipline documentation.
- wh_test_posix_threadsafe_stress.c: Add new stress test phases for counter
  concurrent increment, counter increment vs read, NVM read vs resize, NVM
  concurrent resize, and NVM read DMA vs resize. Add counter atomicity validation.

All compound operations now follow the pattern:
1. Acquire server->nvm->lock
2. Use only *Unlocked() variants internally
3. Keep lock held for entire operation including DMA
4. Release lock after all metadata-dependent operations complete
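
The four-step pattern above can be sketched with a counter read-modify-write. All names here are hypothetical, and the lock is a single-threaded toy that reports an error when re-acquired; a real non-recursive mutex would instead deadlock, which is why the compound operation calls only the unlocked variants.

```c
#include <assert.h>
#include <stdint.h>

enum { EX_OK = 0, EX_LOCKED = -1 };

typedef struct {
    int      held;    /* toy non-recursive lock flag */
    uint32_t counter; /* the stored counter value */
} exNvm;

/* Toy lock: reports EX_LOCKED on re-acquire (a real mutex would deadlock). */
static int ex_Lock(exNvm* nvm)
{
    if (nvm->held) {
        return EX_LOCKED;
    }
    nvm->held = 1;
    return EX_OK;
}

static void ex_Unlock(exNvm* nvm) { nvm->held = 0; }

/* Unlocked variants: the caller must already hold the lock. */
static int ex_ReadUnlocked(const exNvm* nvm, uint32_t* out)
{
    *out = nvm->counter;
    return EX_OK;
}

static int ex_WriteUnlocked(exNvm* nvm, uint32_t value)
{
    nvm->counter = value;
    return EX_OK;
}

/* Locked single operation for callers that need only one access. */
static int ex_Read(exNvm* nvm, uint32_t* out)
{
    int rc = ex_Lock(nvm);
    if (rc != EX_OK) {
        return rc;
    }
    rc = ex_ReadUnlocked(nvm, out);
    ex_Unlock(nvm);
    return rc;
}

/* Compound operation: the lock is held across read, modify, and write. */
static int ex_Increment(exNvm* nvm, uint32_t* out)
{
    uint32_t value = 0;
    int      rc    = ex_Lock(nvm);
    if (rc != EX_OK) {
        return rc;
    }
    rc = ex_ReadUnlocked(nvm, &value);
    if (rc == EX_OK) {
        rc = ex_WriteUnlocked(nvm, value + 1);
    }
    if (rc == EX_OK) {
        *out = value + 1;
    }
    ex_Unlock(nvm);
    return rc;
}
```
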
Member

@AlexLanzano AlexLanzano left a comment

Looks really good so far!

My main concern is the addition of the *Unlocked functions. I feel like there has to be a way to remove those and still use the top-level API functions, either by checking whether the current thread has already acquired the NVM lock or by creating a lock for both the keystore and the NVM.

…vel server module APIs (keystore, NVM, counter, etc.) and acquire lock in request handling functions (e.g. wh_Server_HandleXXXRequest())
@bigbrett bigbrett assigned bigbrett and unassigned AlexLanzano and billphipps Jan 28, 2026
@bigbrett bigbrett force-pushed the server-thread-safe branch 3 times, most recently from 2a07204 to 4de1c8e Compare January 28, 2026 23:23
  protection
- TSAN options to fail-fast in CI on error
@bigbrett
Contributor Author

OK @billphipps @rizlik @AlexLanzano I have updated this to dramatically simplify based on our meeting discussion. I recommend reviewing the "fresh" diff against main, and not looking at the diff since your last review, as it will be VERY noisy given how much I ripped out. I will probably want to squash commits before we merge given that I redid it.

Notable changes:

  • Centralized lock acquisition: BIG refactor moving the locking from lower-level server module APIs (NVM, keystore, counter, etc.) up to the request handling layer (wh_Server_HandleXXXRequest() functions)
  • Removed wh_nvm_internal.h: Eliminated the separate internal header containing "unlocked" NVM variants; these are no longer needed with the new locking architecture
  • Added SHE support: Realized I missed the SHE module before this, so went ahead and added it
  • Updated wh_nvm.h documentation: Added missing Doxygen documentation for NVM APIs
  • Test cleanup: Fixed macro protection issues and test housekeeping in lock and stress tests

One thing to note: while the lock acquisition/release has been fully removed from the lower-layer APIs and relocated to the handlers, I did keep the lock Init/Cleanup inside wh_Nvm_Init()/wh_Nvm_Cleanup(), since this should happen before any threads are spawned and before any server contexts are initialized. I can remove this and put the burden on the caller to init NVM and then immediately initialize the lock, but figured this was simpler. It is commented accordingly in the NVM API. Let me know if we think this should instead be left to the caller.

Contributor

@rizlik rizlik left a comment

Very good, I have just minor comments.
I still can't fully follow the TSAN test; I'll try to take a look at it soon.

Comment on lines +111 to +112
context->cb = NULL;
context->context = NULL;
Contributor

NIT: aren't these = NULL redundant?

Comment on lines +1680 to +1684
    if (ret == WH_ERROR_OK) {
        /* Translate server keyId back to client format with flags */
        resp.id = wh_KeyId_TranslateToClient(meta->id);
    }
Contributor

NIT: translateToClient operation can be done outside of the critical section

    if (ret == WH_ERROR_OK) {
        /* Translate server keyId back to client format with flags */
        resp.id = wh_KeyId_TranslateToClient(meta->id);
Contributor

ditto

Comment on lines +156 to +162
    rc = WH_SERVER_NVM_LOCK(server);
    if (rc == WH_ERROR_OK) {
        /* Process the list action */
        rc = wh_Nvm_List(server->nvm, req.access, req.flags,
                         req.startId, &resp.count, &resp.id);

        (void)WH_SERVER_NVM_UNLOCK(server);
Contributor

The List API is probably problematic from the client's point of view, since it is designed to work over multiple rounds (the lock is released between rounds, so the listing can change underneath the client). Consider adding a comment in the List documentation. In the future we might want to provide an alternative API as well.

Comment on lines +216 to +223

    if (rc == 0) {
        resp.id = meta.id;
        resp.access = meta.access;
        resp.flags = meta.flags;
        resp.len = meta.len;
        memcpy(resp.label, meta.label, sizeof(resp.label));
    }
Contributor

NIT: This can be moved out of the critical section
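
One way to read this suggestion, sketched with hypothetical types and a toy lock flag: snapshot the metadata into handler-local storage while the lock is held, then fill in the response after releasing it, since the copy only touches local data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { EX_OK = 0 };

/* Hypothetical types; field names loosely follow the snippet above. */
typedef struct { uint16_t id; uint16_t len; uint8_t label[8]; } exMeta;
typedef struct { exMeta stored; int held; } exNvm;
typedef struct { int rc; uint16_t id; uint16_t len; uint8_t label[8]; } exResp;

/* Toy lock: a flag standing in for the shared NVM lock. */
static int ex_NvmLock(exNvm* nvm)   { nvm->held = 1; return EX_OK; }
static int ex_NvmUnlock(exNvm* nvm) { nvm->held = 0; return EX_OK; }

static int ex_HandleGetMetadata(exNvm* nvm, exResp* resp)
{
    exMeta meta;
    int    rc;

    memset(&meta, 0, sizeof(meta));

    rc = ex_NvmLock(nvm);
    if (rc == EX_OK) {
        /* Critical section: only the access to shared state. */
        meta = nvm->stored;
        (void)ex_NvmUnlock(nvm);
    }

    /* Response formatting reads only the local snapshot: no lock needed. */
    if (rc == EX_OK) {
        resp->id  = meta.id;
        resp->len = meta.len;
        memcpy(resp->label, meta.label, sizeof(resp->label));
    }
    resp->rc = rc;
    return rc;
}
```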

Comment on lines +13 to +19
race:wolfCrypt_Init
race:wolfCrypt_Cleanup

# Races on gCryptoDev array in crypto callback registration
race:wc_CryptoCb_RegisterDevice
race:wc_CryptoCb_UnRegisterDevice
race:wc_CryptoCb_GetDevice
Contributor

I never used TSAN. Can proper locking in tests (or initialization in a single thread) avoid adding these exceptions?

Comment on lines +193 to +197
* NOTE: Use client-facing keyId format (simple ID + flags), NOT server-internal
* format (WH_MAKE_KEYID). The server's wh_KeyId_TranslateFromClient() extracts
* only the lower 8 bits as ID and checks WH_KEYID_CLIENT_GLOBAL_FLAG for
* global. Using WH_MAKE_KEYID with user=1 sets bit 8, which is
* WH_KEYID_CLIENT_GLOBAL_FLAG!
Contributor

I feel that this comment is misplaced.
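
For readers unfamiliar with the bit layout the quoted note refers to, here is a small sketch. The macro names and the server-internal packing (type in the top four bits, user starting at bit 8, id in the low 8 bits) are assumptions chosen to match the note's statement that user=1 sets bit 8; they are not copied from the wolfHSM headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the layout is assumed from the quoted note. */
#define EX_CLIENT_ID_MASK     0x00FFu /* client format: low 8 bits are the ID */
#define EX_CLIENT_GLOBAL_FLAG 0x0100u /* client format: bit 8 marks "global" */

/* Assumed server-internal packing: type | user | id, user field at bit 8. */
#define EX_MAKE_KEYID(type, user, id)                              \
    ((uint16_t)((((type) & 0xFu) << 12) | (((user) & 0xFu) << 8) | \
                ((id) & 0xFFu)))

/* Interprets a value as a client-format keyId and checks the global flag. */
static int ex_ClientIdIsGlobal(uint16_t clientId)
{
    return (clientId & EX_CLIENT_GLOBAL_FLAG) != 0;
}

/* Extracts the ID portion of a client-format keyId. */
static uint16_t ex_ClientIdValue(uint16_t clientId)
{
    return (uint16_t)(clientId & EX_CLIENT_ID_MASK);
}
```

Under these assumptions, passing a server-format value built with user=1 where a client-format keyId is expected would be read as a global key, which is the pitfall the note warns about.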

Comment on lines +215 to +218
#define HOT_NVM_ID ((whNvmId)100)
#define HOT_NVM_ID_2 ((whNvmId)101)
#define HOT_NVM_ID_3 ((whNvmId)102)
#define HOT_COUNTER_ID ((whNvmId)200)
Contributor

can't we use HOT_KEY_ID_GLOBAL?
