Medical Domain Embeddings Adapter by iberi22 · Pull Request #45 · iberi22/isar_agent_memory

iberi22 · 2026-04-16T22:32:42Z

This PR introduces the MedicalEmbeddingsAdapter and MedicalTokenizer to improve the quality of embeddings for medical text in both Spanish and English.

The MedicalTokenizer handles the expansion of common medical abbreviations (e.g., "TA" to "tensión arterial", "BP" to "blood pressure") using a regex-based approach with word boundaries to avoid false positives.
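The sketch below illustrates the approach described above. The PR's real abbreviation table is not shown in this thread, so the three-entry map and the helper name `expandWholeWords` are hypothetical. Keys are tried longest-first and case-insensitively, and each key only matches between `\b` word boundaries.

```dart
/// Hypothetical sample map (not the PR's table); keys are lowercase.
const Map<String, String> sampleAbbreviations = {
  'ta': 'tensión arterial',
  'tac': 'tomografía axial computarizada',
  'bp': 'blood pressure',
};

/// Hypothetical helper: expands each key that appears as a whole word.
String expandWholeWords(String text, Map<String, String> abbreviations) {
  // Longest keys first, so 'tac' is tried before 'ta'.
  final keys = abbreviations.keys.toList()
    ..sort((a, b) => b.length.compareTo(a.length));
  var result = text;
  for (final key in keys) {
    // \b keeps "TA" from matching inside words such as "taza".
    final pattern =
        RegExp('\\b${RegExp.escape(key)}\\b', caseSensitive: false);
    result = result.replaceAll(pattern, abbreviations[key]!);
  }
  return result;
}
```

For example, `expandWholeWords('TA 120/80', sampleAbbreviations)` yields `'tensión arterial 120/80'`, and `'taza'` comes back unchanged.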

The MedicalEmbeddingsAdapter is implemented as a decorator that pre-processes text before passing it to an underlying EmbeddingsAdapter.

The EmbeddingsAdapter interface has been updated with a medicalNormalized method, and all existing implementations (GeminiEmbeddingsAdapter, OnDeviceEmbeddingsAdapter, FallbackEmbeddingsAdapter, and CodeEmbeddingsAdapter) have been updated accordingly.

Unit tests have been added to verify the abbreviation expansion and the integration with embedding adapters.

Fixes #38

PR created automatically by Jules for task 14484991894278155486 started by @iberi22

…ation
- Added `medicalNormalized` to `EmbeddingsAdapter` interface.
- Created `MedicalTokenizer` for Spanish and English abbreviation expansion.
- Implemented `MedicalEmbeddingsAdapter` as a decorator for other adapters.
- Updated all existing adapters to implement the new interface method.
- Added comprehensive unit tests for the new components.

google-labs-jules · 2026-04-16T22:32:43Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!


For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-04-16T22:32:52Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 76d598c0-809c-4df7-96d3-538af52c4593

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


gemini-code-assist

Code Review

This pull request introduces medical text processing enhancements by adding a MedicalTokenizer for expanding Spanish and English medical abbreviations and a MedicalEmbeddingsAdapter decorator to apply these expansions before embedding. It also extends the EmbeddingsAdapter interface with a medicalNormalized method, implemented across existing adapters. Feedback was provided regarding the performance of the MedicalTokenizer, specifically suggesting the caching of regular expressions and sorted keys to avoid redundant computations during text processing.

gemini-code-assist · 2026-04-16T22:34:30Z

```diff
+  String expandAbbreviations(String text) {
+    if (text.isEmpty) return text;
+
+    String expandedText = text;
+
+    // Sort keys by length descending to avoid partial matches (e.g., 'ta' in 'tac')
+    final sortedKeys = _abbreviations.keys.toList()
+      ..sort((a, b) => b.length.compareTo(a.length));
+
+    for (final key in sortedKeys) {
+      // Use regex with word boundaries to avoid matching inside words
+      // e.g. "TA" should match but "taza" should not.
+      // We handle Tª specifically as it has a special character.
+      final escapedKey = RegExp.escape(key);
+      final regex = RegExp('\\b$escapedKey\\b', caseSensitive: false);
+
+      // Special case for Tª since \b might not work as expected with ª
+      if (key == 'tª') {
+        expandedText = expandedText.replaceAll(
+            RegExp(r'Tª', caseSensitive: false), _abbreviations[key]!);
+      } else {
+        expandedText = expandedText.replaceAllMapped(regex, (match) {
+          return _abbreviations[key]!;
+        });
+      }
+    }
+
+    return expandedText;
+  }
```
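Why the `Tª` special case exists: Dart's `RegExp` follows JavaScript semantics, where `\b` treats only ASCII `[A-Za-z0-9_]` as word characters. As a result, `\bTª\b` finds no boundary between `ª` and a following space. One alternative to a hard-coded special case, offered as a sketch and not as what the PR does, is to replace `\b` with Unicode-aware lookarounds. These require `unicode: true`. The helper name `unicodeWholeWord` is hypothetical.

```dart
/// Hypothetical helper: whole-word pattern with Unicode-aware boundaries.
/// The match must not be preceded or followed by a letter, digit or '_'.
RegExp unicodeWholeWord(String key) {
  final escaped = RegExp.escape(key);
  return RegExp(
    r'(?<![\p{L}\p{N}_])' + escaped + r'(?![\p{L}\p{N}_])',
    caseSensitive: false,
    unicode: true, // enables \p{...} property escapes
  );
}
```

With this pattern, `'Tª 38'` matches `unicodeWholeWord('tª')`, while `'taza'` still does not match `unicodeWholeWord('ta')`. The key would no longer need its own branch.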


The expandAbbreviations method is inefficient because it re-calculates the sorted keys and re-compiles multiple RegExp objects on every call. This can lead to performance degradation when processing large texts or when called frequently.

Consider pre-calculating the sorted keys and caching the compiled regular expressions as static members of the class. This avoids redundant work and improves the overall performance of the tokenizer.

```dart
static final List<String> _sortedKeys = _abbreviations.keys.toList()
  ..sort((a, b) => b.length.compareTo(a.length));

static final Map<String, RegExp> _regexCache = {
  for (final key in _sortedKeys)
    key: key == 'tª'
        ? RegExp(r'tª', caseSensitive: false)
        : RegExp(r'\b' + RegExp.escape(key) + r'\b', caseSensitive: false)
};

String expandAbbreviations(String text) {
  if (text.isEmpty) return text;

  String expandedText = text;
  for (final key in _sortedKeys) {
    expandedText =
        expandedText.replaceAll(_regexCache[key]!, _abbreviations[key]!);
  }
  return expandedText;
}
```
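A further variation, not part of the review suggestion and offered only as a sketch, goes beyond caching. It compiles every key into one alternation and expands in a single `replaceAllMapped` pass. The text is then scanned once instead of once per key, and an expansion that happens to contain another abbreviation is never rescanned. The class name `SinglePassExpander` is hypothetical. It keeps plain `\b` boundaries, so a key like `tª` would still need the separate handling discussed above.

```dart
/// Hypothetical class: expands all abbreviations in one regex pass.
class SinglePassExpander {
  SinglePassExpander(Map<String, String> abbreviations)
      : _abbreviations = {
          for (final e in abbreviations.entries) e.key.toLowerCase(): e.value,
        },
        _pattern = _buildPattern(abbreviations.keys);

  final Map<String, String> _abbreviations; // lowercase key -> expansion
  final RegExp _pattern; // compiled once per instance

  static RegExp _buildPattern(Iterable<String> keys) {
    // Longest-first so the alternation prefers 'tac' over 'ta'.
    final sorted = keys.map((k) => k.toLowerCase()).toList()
      ..sort((a, b) => b.length.compareTo(a.length));
    final alternation = sorted.map(RegExp.escape).join('|');
    return RegExp('\\b(?:$alternation)\\b', caseSensitive: false);
  }

  String expand(String text) {
    if (_abbreviations.isEmpty || text.isEmpty) return text;
    // Each match is looked up by its lowercased text; replaced output is
    // never fed back into the pattern.
    return text.replaceAllMapped(
      _pattern,
      (m) => _abbreviations[m[0]!.toLowerCase()] ?? m[0]!,
    );
  }
}
```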

iberi22 · 2026-04-16T23:41:11Z

SCOPE ISSUE: MedicalTokenizer has Spanish medical abbreviations hardcoded (TA, FC, DM, etc.). This violates the principle of a generic package.

REQUIRED SOLUTION:

Delete lib/src/utils/medical_tokenizer.dart
Delete lib/src/embeddings/medical_embeddings_adapter.dart
Create a PLUGGABLE system in lib/src/utils/normalization_config.dart:

```dart
// Generic normalization config - NO hardcoded values
class TextNormalizationConfig {
  TextNormalizationConfig(this.abbreviationMap);

  final Map<String, String> abbreviationMap; // injected from outside

  static TextNormalizationConfig empty() => TextNormalizationConfig({});

  String normalize(String text) { ... }
}
```
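The requested config could be filled in as sketched below. This is a sketch only: the class and member names come from the comment above, but the normalization rule is an assumption that mirrors the current tokenizer's behavior (case-insensitive, whole-word, longest-first replacement using only the injected map). Because the map is now injected, the compiled patterns cannot be cached in `static` fields as in the earlier review suggestion. Here they are cached per instance and built on first use.

```dart
/// Generic normalization config: ships no domain-specific entries.
class TextNormalizationConfig {
  TextNormalizationConfig(this.abbreviationMap);

  /// Injected by the caller (e.g. a medical map maintained by the app).
  final Map<String, String> abbreviationMap;

  static TextNormalizationConfig empty() => TextNormalizationConfig({});

  // Per-instance cache of (pattern, replacement) pairs, built lazily once.
  late final List<MapEntry<RegExp, String>> _rules = _compileRules();

  List<MapEntry<RegExp, String>> _compileRules() {
    final keys = abbreviationMap.keys.toList()
      ..sort((a, b) => b.length.compareTo(a.length)); // longest first
    return [
      for (final key in keys)
        MapEntry(
          RegExp('\\b${RegExp.escape(key)}\\b', caseSensitive: false),
          abbreviationMap[key]!,
        ),
    ];
  }

  /// Replaces every whole-word, case-insensitive occurrence of each key.
  String normalize(String text) {
    var result = text;
    for (final rule in _rules) {
      result = result.replaceAll(rule.key, rule.value);
    }
    return result;
  }
}
```

A consuming app would then build something like `TextNormalizationConfig({'ta': 'tensión arterial'})` itself, while `TextNormalizationConfig.empty()` leaves text untouched.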

In lib/src/embeddings/, create a generic adapter that uses the config:

```dart
class NormalizingEmbeddingsAdapter implements EmbeddingsAdapter {
  NormalizingEmbeddingsAdapter(this.inner, this.config);

  final EmbeddingsAdapter inner;
  final TextNormalizationConfig config;

  // Uses config.abbreviationMap; nothing is hardcoded
}
```
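The following is a sketch of the decorator under two labeled assumptions. First, the package's real `EmbeddingsAdapter` interface is not shown in this thread, so a hypothetical one-method stand-in (`embed`) is declared here. Second, the config is reduced to a minimal whole-word replacer so the example is self-contained. The adapter only normalizes text and then delegates to `inner`. `RecordingAdapter` is a hypothetical test double.

```dart
/// Hypothetical stand-in for the package's EmbeddingsAdapter interface.
abstract class EmbeddingsAdapter {
  Future<List<double>> embed(String text);
}

/// Minimal config: whole-word, case-insensitive replacement from an injected map.
class TextNormalizationConfig {
  TextNormalizationConfig(this.abbreviationMap);
  final Map<String, String> abbreviationMap;

  String normalize(String text) {
    final keys = abbreviationMap.keys.toList()
      ..sort((a, b) => b.length.compareTo(a.length)); // longest first
    var result = text;
    for (final key in keys) {
      final pattern =
          RegExp('\\b${RegExp.escape(key)}\\b', caseSensitive: false);
      result = result.replaceAll(pattern, abbreviationMap[key]!);
    }
    return result;
  }
}

/// Decorator: normalizes the input, then delegates to the wrapped adapter.
class NormalizingEmbeddingsAdapter implements EmbeddingsAdapter {
  NormalizingEmbeddingsAdapter(this.inner, this.config);

  final EmbeddingsAdapter inner;
  final TextNormalizationConfig config;

  @override
  Future<List<double>> embed(String text) =>
      inner.embed(config.normalize(text));
}

/// Hypothetical test double that records the text it receives.
class RecordingAdapter implements EmbeddingsAdapter {
  final List<String> received = [];

  @override
  Future<List<double>> embed(String text) async {
    received.add(text);
    return [text.length.toDouble()];
  }
}
```

Because the wrapper holds no domain data, the same class serves any vocabulary. The medical map stays in the consuming app.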

The medical abbreviations specific to Spain and Latin America will be configured in OrionHealth, not in this package.

Please remove the hardcoded values and make it generic.

iberi22 · 2026-04-17T00:13:47Z

SCOPE VIOLATION: MedicalTokenizer has hardcoded medical abbreviations (TA, FC, DM). Please create a new PR with a generic, pluggable TextNormalizationConfig.

google-labs-jules Bot mentioned this pull request Apr 16, 2026

Feat: Medical Domain Embeddings Adapter #38

Open

gemini-code-assist Bot reviewed Apr 16, 2026


iberi22 closed this Apr 17, 2026

iberi22 wants to merge 1 commit into main from feat/medical-embeddings-adapter-14484991894278155486

1 participant