This repository was archived by the owner on Jul 28, 2025. It is now read-only.

@tomolopolis
Member

Small improvements for fine-tuning the de-id model and using it in a pipeline:

  1. Train / test splits can be performed outside of the train method. DeIdModel.train is changed to support this.
  2. Locale-specific regexes can be useful in a pipeline, as an alternative to collecting annotations and fine-tuning the underlying model. The changes allow arbitrary patterns to be matched and mapped to CDB CUIs, then merged with the model predictions. The eval code also doesn't use the tokenizer split, so it is fully representative of what was annotated.
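A minimal sketch of the regex-merge idea in item 2, not the code in this PR: it uses only the standard library, and the names `Span`, `regex_spans`, `merge_spans`, the `POSTCODE` CUI and the postcode pattern are all hypothetical. It also assumes that model predictions take precedence when a regex match overlaps one, which the description above does not specify.

```python
import re
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    cui: str     # concept identifier the span is mapped to
    source: str  # "model" or "regex"


def regex_spans(text: str, patterns: Dict[str, re.Pattern]) -> List[Span]:
    """Find every pattern match and label it with the CUI its pattern maps to."""
    return [
        Span(m.start(), m.end(), cui, "regex")
        for cui, pattern in patterns.items()
        for m in pattern.finditer(text)
    ]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open character ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_spans(model: List[Span], extra: List[Span]) -> List[Span]:
    """Keep all model spans; add regex spans that overlap nothing already kept.

    Assumption: the model wins on overlap. The real merge rule may differ.
    """
    merged = list(model)
    for span in extra:
        if not any(overlaps(span, kept) for kept in merged):
            merged.append(span)
    return sorted(merged, key=lambda s: s.start)


# Hypothetical locale-specific pattern (simplified UK postcode) and CUI.
patterns = {"POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b")}

text = "Mr Smith of SW1A 1AA was seen today."
model_preds = [Span(3, 8, "NAME", "model")]  # stand-in for model output
merged = merge_spans(model_preds, regex_spans(text, patterns))
```

Working on character offsets rather than tokens keeps the merged spans comparable with annotations exactly as they were made, which is in the same spirit as the eval change described above.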

Collaborator

@mart-r mart-r left a comment

A few comments that I think should be addressed (i.e. the duplicate/typo'd key and the remaining old docstring); the rest are more optional, I'd say.
Though I do feel that the 2 comments regarding cui2preferred_name, as well as the one regarding using filter, make sense to implement as well.

Collaborator

@mart-r mart-r left a comment

Some tests as well - great!
All good on my side.

@tomolopolis tomolopolis merged commit a7661ef into master Jun 4, 2025
8 checks passed
alhendrickson pushed a commit to CogStack/cogstack-nlp that referenced this pull request Jul 1, 2025

3 participants