fix: stop heuristic role expansion after LLM role suggestion #203
michaelmwu merged 4 commits into main
Conversation
Pull request overview
Adjusts resume extraction behavior in LLM mode so heuristic role inference does not run once the model has provided a role suggestion, preventing post-processing from expanding/overriding LLM-provided roles.
Changes:
- Gate heuristic role inference behind a new “LLM provided role suggestion” check.
- Preserve normalized LLM role output even when it normalizes to an empty list (instead of backfilling from heuristics).
- Add a regression unit test intended to ensure heuristic role inference doesn’t expand LLM-suggested roles.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/test_resume_extractor.py | Adds a regression test for “no heuristic role backfill after LLM role suggestion”. |
| packages/shared/src/five08/resume_extractor.py | Updates LLM extraction normalization to skip heuristic role inference when the LLM supplied a role suggestion. |
tests/unit/test_resume_extractor.py (Outdated)

```python
result = extractor.extract("Jane Doe\nSoftware Engineer")

assert result.primary_roles == []
```
This assertion will likely fail: the LLM payload provides `primary_roles: ["platform specialist"]`, and `_normalize_role_collection()` keeps non-empty strings even when they don't map to a canonical role (it falls back to an alphanumeric normalized string). With the new extractor logic, `result.primary_roles` should remain the normalized LLM value (e.g. `["platform specialist"]`), not `[]`. If the intent is to regression-test that heuristics don't add roles like `developer`, assert that `"developer"` is not present (or adjust the fake LLM output to something like `"/"` that normalizes to an empty list).
Suggested change:

```python
# LLM-suggested roles should be preserved, and heuristic roles like "developer"
# should not be added on top.
assert result.primary_roles == ["platform specialist"]
assert "developer" not in result.primary_roles
```
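The reviewer's argument hinges on how `_normalize_role_collection()` treats non-canonical strings. A minimal, self-contained sketch of that behavior under the review's description (the function body and regex here are assumptions for illustration, not the project's real code):

```python
import re


def normalize_role_collection(values):
    # Sketch of the behavior described in the review: non-empty strings that
    # don't map to a canonical role fall back to an alphanumeric-normalized
    # string instead of being dropped; punctuation-only input normalizes away.
    items = values if isinstance(values, (list, tuple)) else [values]
    normalized = []
    for item in items:
        if not isinstance(item, str):
            continue
        cleaned = re.sub(r"[^a-z0-9 ]", "", item.strip().lower()).strip()
        if cleaned:
            normalized.append(cleaned)
    return normalized


print(normalize_role_collection(["platform specialist"]))  # ['platform specialist']
print(normalize_role_collection(["/"]))                    # []
```

Under this model, `["platform specialist"]` survives normalization intact, which is why the review expects the assertion against `[]` to fail, while `"/"` normalizes to an empty list.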
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```python
llm_provided_role_suggestion = False
if isinstance(parsed_primary_roles_raw, str):
    llm_provided_role_suggestion = bool(parsed_primary_roles_raw.strip())
elif isinstance(parsed_primary_roles_raw, (list, tuple)):
    llm_provided_role_suggestion = any(
        isinstance(item, str) and bool(item.strip())
        for item in parsed_primary_roles_raw
    )
```
`llm_provided_role_suggestion` is derived from the raw `primary_roles`/`primary_role` value. For string inputs that are non-whitespace but normalize to an empty role list (e.g. a comma-only value like `","`, which `normalize_roles` splits into empty tokens), this will incorrectly skip heuristic inference and return `primary_roles=[]` (previously heuristics would have run because the normalized list was empty). Consider basing the decision on the normalized `parsed_primary_roles` (or otherwise treating punctuation-only / empty-token strings as "no suggestion") so heuristics still run when no actual roles can be parsed.
Suggested change (replace the type-dispatched raw check with a single check on the normalized list):

```python
llm_provided_role_suggestion = bool(parsed_primary_roles)
```
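The edge case can be shown concretely. This sketch assumes a simplified stand-in for `normalize_roles` (comma-split, empty tokens dropped, as the review describes); the real helper may differ:

```python
def normalize_roles(raw: str) -> list:
    # Hypothetical stand-in: split on commas and drop empty tokens.
    return [token.strip() for token in raw.split(",") if token.strip()]


raw_value = ","  # comma-only LLM output: no parseable roles

# Raw-based check (the code under review): any non-whitespace string counts
# as a suggestion, so heuristic inference is wrongly skipped.
raw_flag = bool(raw_value.strip())                  # True

# Normalized-based check (the suggested fix): no tokens survive
# normalization, so heuristics still run.
normalized_flag = bool(normalize_roles(raw_value))  # False
```

The two flags disagree exactly on punctuation-only input, which is the regression the comment is pointing at.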
```python
parsed_primary_roles_raw = parsed.get("primary_roles")
if not parsed_primary_roles_raw:
    parsed_primary_roles_raw = parsed.get("primary_role")
parsed_primary_roles = _normalize_role_collection(parsed_primary_roles_raw)
```
This change adds support for honoring the legacy `primary_role` field (via the new fallback logic), but there isn't a unit test exercising the `primary_role` path (only `primary_roles`). Adding a regression test where the LLM returns `primary_role` (string) and verifying heuristic role inference does not expand it would help prevent future regressions.
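A self-contained sketch of the regression test the comment asks for. Everything here (`FakeLLM`, the stub `ResumeExtractor`, `Result`, the heuristic) is a hypothetical stand-in so the sketch runs on its own; the real test would use the project's fixtures in tests/unit/test_resume_extractor.py:

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    primary_roles: list = field(default_factory=list)


class FakeLLM:
    def parse(self, text: str) -> dict:
        # Legacy single-string field instead of the `primary_roles` list.
        return {"primary_role": "platform specialist"}


class ResumeExtractor:
    def __init__(self, llm):
        self.llm = llm

    def extract(self, text: str) -> Result:
        parsed = self.llm.parse(text)
        raw = parsed.get("primary_roles") or parsed.get("primary_role")
        if isinstance(raw, str) and raw.strip():
            roles = [raw.strip()]
        else:
            roles = [r for r in (raw or []) if isinstance(r, str) and r.strip()]
        if not roles:
            # Heuristics run only when the LLM gave no usable suggestion.
            roles = self._heuristic_roles(text)
        return Result(primary_roles=roles)

    def _heuristic_roles(self, text: str) -> list:
        return ["developer"] if "Engineer" in text else []


result = ResumeExtractor(FakeLLM()).extract("Jane Doe\nSoftware Engineer")
assert result.primary_roles == ["platform specialist"]
assert "developer" not in result.primary_roles
```

The key point the test pins down: even though the resume text would trigger the heuristic (`"Engineer"` is present), the legacy string field alone suppresses heuristic backfill.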
Description
When resume extraction is running in LLM mode, role heuristics now run only if the model did not provide any role suggestion. If `primary_roles` or `primary_role` is present in the LLM output, we keep only the normalized LLM roles and do not backfill extra heuristic roles.

Added a regression test that verifies LLM-suggested roles are not expanded by heuristic inference during extraction.
Related Issue
None provided.
How Has This Been Tested?
Pre-commit checks ran during commit (`ruff`, `ruff format`, and `mypy`) and passed.