fix(ai): serialize sentence-transformer encoding to prevent GPU races by ksaurabhAparavi · Pull Request #1182 · rocketride-org/rocketride-server

ksaurabhAparavi · 2026-06-08T10:21:22Z

Summary

Serialize both the wrapper encode() and raw shared-model access so concurrent inference on the shared NomicBert model does not race or trigger intermittent tensor size mismatches.
Adds a CUDA reproducer and focused regression coverage.

⚠️ Reviewer note

Conflict-resolved by keeping upstream's packages/ai/tests/conftest.py (the downstream commit's conftest additions were dropped). The core fix in sentence_transformers.py applied cleanly. Please confirm the added tests run under upstream's conftest in CI.

Testing

CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1169

coderabbitai · 2026-06-08T10:21:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61db1aaf-2a01-4c86-a0f7-5069281f9746

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and bd4c627.

📒 Files selected for processing (4)

packages/ai/src/ai/common/models/transformers/sentence_transformers.py
packages/ai/tests/ai/common/models/transformers/__init__.py
packages/ai/tests/ai/common/models/transformers/reproduce_sentence_transformer_origin.py
packages/ai/tests/ai/common/models/transformers/test_sentence_transformers.py

📝 Walkthrough

Walkthrough

The PR serializes the local inference path of SentenceTransformer.encode() with a mutex, so concurrent calls can no longer interleave their preprocess, inference, and postprocess steps. It also includes a unit test that verifies the serialization and a standalone GPU reproducer script for manual testing.
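A minimal sketch of the per-instance lock described in this walkthrough. `EncoderSketch`, its counters, and `run_concurrent` are hypothetical stand-ins for illustration, not the code in sentence_transformers.py, and the tokenize/sleep/length steps only imitate preprocess, inference, and postprocess.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class EncoderSketch:
    """Hypothetical stand-in for the wrapper; not the real SentenceTransformer class."""

    def __init__(self):
        self._encode_lock = threading.Lock()   # one lock per wrapper instance
        self._counter_lock = threading.Lock()  # protects the bookkeeping below
        self._active = 0                       # calls currently inside the critical section
        self.max_active = 0                    # highest overlap observed so far

    def _encode_local(self, sentences):
        # The whole preprocess -> inference -> postprocess sequence runs under the lock.
        with self._encode_lock:
            with self._counter_lock:
                self._active += 1
                self.max_active = max(self.max_active, self._active)
            try:
                tokens = [s.split() for s in sentences]  # imitates preprocessing
                time.sleep(0.01)                         # imitates the forward pass
                return [[len(t)] for t in tokens]        # imitates postprocessing
            finally:
                with self._counter_lock:
                    self._active -= 1


def run_concurrent(encoder, batches, workers=4):
    """Submit every batch from a thread pool and collect results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encoder._encode_local, batches))
```

With this shape, `max_active` never exceeds 1, which is the property the unit test in this PR asserts. The cost is that tokenization and postprocessing are serialized along with the forward pass.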

Changes

Concurrent encode serialization

Layer / File(s)	Summary
Serialization lock implementation `packages/ai/src/ai/common/models/transformers/sentence_transformers.py`	`threading` module imported; `self._encode_lock` created in `__init__`; `_encode_local()` acquires lock around batched preprocess → inference → postprocess, serializing concurrent local encodes.
Unit test verification `packages/ai/tests/ai/common/models/transformers/test_sentence_transformers.py`	Monkeypatches `SentenceTransformer` loader and pipeline; spawns concurrent `encode()` calls via `ThreadPoolExecutor`; asserts inference executes serially (max 1 simultaneous active call); validates NumPy array output shape `(4, 1)`.
Manual reproducer and test infrastructure `packages/ai/tests/ai/common/models/transformers/__init__.py`, `reproduce_sentence_transformer_origin.py`	Standalone GPU reproducer generates variable-length synthetic batches, logs encode events with thread/worker identity, and supports sequential/concurrent execution modes via CLI flags; test package `__init__.py` header added.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Concurrent shared sentence-transformer inference races (tensor size mismatches) #1169: Addresses the same concurrent SentenceTransformer.encode() race condition causing tensor size mismatches by serializing access to the shared local model inference path.

Suggested reviewers

jmaionchi
stepmikhaylov
Rod-Christensen

Poem

🐰 A lock on the encoder so fair,
No more shall the threads interfere,
Sequential encode, now pristine,
GPU tensors stay serene—
One thread at a time, we declare!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding serialization to sentence-transformer encoding to prevent GPU race conditions.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-06-08T10:50:10Z

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

Serialize both the wrapper encode() and raw shared-model access so concurrent inference on the shared NomicBert model does not race or trigger intermittent tensor size mismatches during instance processing. Adds a CUDA reproducer and focused regression coverage. Fixes rocketride-org#1169

stepmikhaylov

Requesting changes. Good diagnosis and repro, but the lock is too coarse.

Issue: Wrapping all of SentenceTransformer._encode_local in a per-instance self._encode_lock serializes the entire encode call — tokenization and postprocess included — when only the GPU forward pass needs protection. There's already an established pattern for this in the same package.

Fix: Adopt the WhisperLoader approach (audio/whisper.py): a class-level _model_locks registry keyed by id(model), acquired inside inference() around the forward pass only (see _get_model_lock). Apply the same to SentenceTransformerLoader.inference(), keyed on id(actual_model) (after the model_obj unwrap). This confines the critical section to the unsafe operation and lets tokenization/postprocess overlap. Once it's in the loader, drop the now-redundant self._encode_lock.
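For illustration, the suggested shape might look roughly like the sketch below. It is written from the description above rather than copied from audio/whisper.py: `LoaderSketch`, `_registry_lock`, and the toy pre/postprocessing are assumptions, and only the names `_model_locks`, `_get_model_lock`, `inference`, and `model_obj` come from this review.

```python
import threading


class LoaderSketch:
    """Hypothetical loader showing a class-level lock registry keyed by id(model)."""

    _model_locks = {}                  # id(model) -> threading.Lock
    _registry_lock = threading.Lock()  # guards the registry itself (assumed helper)

    @classmethod
    def _get_model_lock(cls, model):
        # Create the lock on first use; every later caller gets the same object.
        with cls._registry_lock:
            key = id(model)
            if key not in cls._model_locks:
                cls._model_locks[key] = threading.Lock()
            return cls._model_locks[key]

    @staticmethod
    def inference(model, batch):
        # Unwrap first so wrapper and raw callers resolve to the same lock.
        actual_model = getattr(model, "model_obj", model)
        features = [text.lower() for text in batch]  # preprocessing, outside the lock
        with LoaderSketch._get_model_lock(actual_model):
            raw = actual_model(features)             # forward pass only
        return [round(x, 3) for x in raw]            # postprocessing, outside the lock
```

Because the lock lives on the loader class and is looked up by the unwrapped model, any caller that reaches `inference()` shares it, whether it came through the wrapper or not. One caveat of keying on `id()`: CPython may reuse an id after the original object is garbage collected, so a registry like this assumes the shared model stays alive for as long as its entry is used.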

Additional point: the current fix also doesn't account for remote mode — self._encode_lock only covers the local wrapper path, while the static SentenceTransformerLoader.inference() path is left unsynchronized. Moving the lock into the loader resolves this too. (The commit message mentions "raw shared-model access," but only the wrapper is locked — please update it to match the final scope.)

Test: Keep the coverage but retarget it — the current test monkeypatches inference wholesale, which would replace the lock along with it. Stub only the GPU forward and assert serialization through the real inference(). The reproducer is fine to keep as-is.
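A rough sketch of what the retargeted test could look like. Everything here is hypothetical: `ForwardProbe` stands in for the stubbed GPU forward, and the module-level `inference` with `_forward_lock` is a placeholder for the real `SentenceTransformerLoader.inference()`, which the actual test would call unpatched.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ForwardProbe:
    """Stub for the GPU forward pass that records how many callers overlap."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def __call__(self, features):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # widen the window in which an overlap would show up
        with self._guard:
            self.active -= 1
        return [[0.0] for _ in features]


_forward_lock = threading.Lock()  # placeholder for the loader's per-model lock


def inference(model, batch):
    """Placeholder for the real inference(); only the forward call is locked."""
    with _forward_lock:
        return model(batch)


def max_overlap(infer, n_calls=8, workers=4):
    """Drive infer() from several threads; return the peak overlap and the results."""
    probe = ForwardProbe()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: infer(probe, ["a", "b"]), range(n_calls)))
    return probe.max_active, results
```

The point of the design is that the probe replaces only the model call, so the lock inside `inference()` is still exercised. A test that patches `inference()` itself would pass whether or not the lock exists.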

ksaurabhAparavi requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 8, 2026 10:21

github-actions Bot added the module:ai AI/ML modules label Jun 8, 2026

ksaurabhAparavi force-pushed the fix/RR-1169-sentence-transformer-concurrency branch from bdedea6 to bd4c627 June 8, 2026 11:51

stepmikhaylov requested changes Jun 18, 2026

ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:fix/RR-1169-sentence-transformer-concurrency
