
fix: stop heuristic role expansion after LLM role suggestion#203

Merged
michaelmwu merged 4 commits into main from michaelmwu/workflow-tasks-setup
Mar 9, 2026

Conversation

@michaelmwu (Member) commented Mar 9, 2026

Description

  • When resume extraction runs in LLM mode, role heuristics now run only if the model did not provide any role suggestion.
  • If primary_roles or primary_role is present in the LLM output, only the normalized LLM roles are kept; no extra heuristic roles are backfilled.
  • Added a regression test verifying that LLM-suggested roles are not expanded by heuristic inference during extraction.
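The gating described above can be sketched in isolation. Every name below (`resolve_primary_roles`, `normalize_roles`, the toy heuristic) is illustrative, not the actual helpers in `resume_extractor.py`:

```python
def normalize_roles(raw) -> list[str]:
    # Toy normalizer for illustration: lowercase, strip, drop empties.
    items = [raw] if isinstance(raw, str) else list(raw or [])
    return [s.strip().lower() for s in items if isinstance(s, str) and s.strip()]


def infer_roles_heuristically(text: str) -> list[str]:
    # Toy heuristic stand-in: flag "engineer" titles as "developer".
    return ["developer"] if "engineer" in text.lower() else []


def resolve_primary_roles(llm_payload: dict, resume_text: str) -> list[str]:
    # primary_roles wins; the legacy primary_role key is the fallback.
    raw = llm_payload.get("primary_roles") or llm_payload.get("primary_role")
    if raw:
        # The LLM made a suggestion: keep only its normalized roles,
        # even if normalization yields an empty list. No heuristic backfill.
        return normalize_roles(raw)
    # No LLM suggestion at all: heuristic inference may run.
    return infer_roles_heuristically(resume_text)
```

The key point is the early return: once the LLM payload carries any role suggestion, heuristics never run, even when normalization empties the list.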

Related Issue

None provided.

How Has This Been Tested?

Pre-commit checks ran during commit (ruff, ruff format, and mypy) and passed.

Copilot AI review requested due to automatic review settings March 9, 2026 20:32
@coderabbitai (bot) commented Mar 9, 2026

Warning

Rate limit exceeded

@michaelmwu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e38f148-e3f6-415d-82eb-171680c2a719

📥 Commits

Reviewing files that changed from the base of the PR and between a1351ad and dd02aab.

📒 Files selected for processing (2)
  • packages/shared/src/five08/resume_extractor.py
  • tests/unit/test_resume_extractor.py


Copilot AI left a comment


Pull request overview

Adjusts resume extraction behavior in LLM mode so heuristic role inference does not run once the model has provided a role suggestion, preventing post-processing from expanding/overriding LLM-provided roles.

Changes:

  • Gate heuristic role inference behind a new “LLM provided role suggestion” check.
  • Preserve normalized LLM role output even when it normalizes to an empty list (instead of backfilling from heuristics).
  • Add a regression unit test intended to ensure heuristic role inference doesn’t expand LLM-suggested roles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • tests/unit/test_resume_extractor.py — Adds a regression test for "no heuristic role backfill after LLM role suggestion".
  • packages/shared/src/five08/resume_extractor.py — Updates LLM extraction normalization to skip heuristic role inference when the LLM supplied a role suggestion.



result = extractor.extract("Jane Doe\nSoftware Engineer")

assert result.primary_roles == []

Copilot AI Mar 9, 2026


This assertion will likely fail: the LLM payload provides primary_roles: ["platform specialist"], and _normalize_role_collection() keeps non-empty strings even when they don't map to a canonical role (it falls back to an alphanumeric normalized string). With the new extractor logic, result.primary_roles should remain the normalized LLM value (e.g. ["platform specialist"]), not []. If the intent is to regression-test that heuristics don't add roles like developer, assert that "developer" is not present (or adjust the fake LLM output to something like "/" that normalizes to an empty list).

Suggested change
assert result.primary_roles == []
# LLM-suggested roles should be preserved, and heuristic roles like "developer"
# should not be added on top.
assert result.primary_roles == ["platform specialist"]
assert "developer" not in result.primary_roles

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 9, 2026 21:04

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



Comment on lines +2337 to +2344
llm_provided_role_suggestion = False
if isinstance(parsed_primary_roles_raw, str):
    llm_provided_role_suggestion = bool(parsed_primary_roles_raw.strip())
elif isinstance(parsed_primary_roles_raw, (list, tuple)):
    llm_provided_role_suggestion = any(
        isinstance(item, str) and bool(item.strip())
        for item in parsed_primary_roles_raw
    )

Copilot AI Mar 9, 2026


llm_provided_role_suggestion is derived from the raw primary_roles/primary_role value. For string inputs that are non-whitespace but normalize to an empty role list (e.g. a comma-only value like "," which normalize_roles splits into empty tokens), this will incorrectly skip heuristic inference and return primary_roles=[] (previously heuristics would have run because the normalized list was empty). Consider basing the decision on the normalized parsed_primary_roles (or otherwise treating punctuation-only / empty-token strings as “no suggestion”) so heuristics still run when no actual roles can be parsed.

Suggested change
llm_provided_role_suggestion = False
if isinstance(parsed_primary_roles_raw, str):
    llm_provided_role_suggestion = bool(parsed_primary_roles_raw.strip())
elif isinstance(parsed_primary_roles_raw, (list, tuple)):
    llm_provided_role_suggestion = any(
        isinstance(item, str) and bool(item.strip())
        for item in parsed_primary_roles_raw
    )
llm_provided_role_suggestion = bool(parsed_primary_roles)

Comment on lines +2333 to +2336
parsed_primary_roles_raw = parsed.get("primary_roles")
if not parsed_primary_roles_raw:
    parsed_primary_roles_raw = parsed.get("primary_role")
parsed_primary_roles = _normalize_role_collection(parsed_primary_roles_raw)

Copilot AI Mar 9, 2026


This change adds support for honoring the legacy primary_role field (via the new fallback logic), but there isn’t a unit test exercising the primary_role path (only primary_roles). Adding a regression test where the LLM returns primary_role (string) and verifying heuristic role inference does not expand it would help prevent future regressions.
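A self-contained sketch of such a test, using toy stand-ins rather than the project's real extractor and fixtures (`resolve_primary_roles` and `normalize_roles` below are illustrative, not the actual code):

```python
def normalize_roles(raw) -> list[str]:
    # Toy normalizer: lowercase, strip, drop empties.
    items = [raw] if isinstance(raw, str) else list(raw or [])
    return [s.strip().lower() for s in items if isinstance(s, str) and s.strip()]


def resolve_primary_roles(payload: dict, text: str) -> list[str]:
    # Mirrors the PR's fallback: primary_roles first, legacy primary_role second.
    raw = payload.get("primary_roles") or payload.get("primary_role")
    if raw:
        return normalize_roles(raw)
    # Toy heuristic that the test guards against.
    return ["developer"] if "engineer" in text.lower() else []


def test_legacy_primary_role_string_is_not_expanded():
    # LLM returns the legacy string field; heuristics must not add "developer".
    roles = resolve_primary_roles(
        {"primary_role": "Platform Specialist"},
        "Jane Doe\nSoftware Engineer",
    )
    assert roles == ["platform specialist"]
    assert "developer" not in roles


test_legacy_primary_role_string_is_not_expanded()
```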

@michaelmwu michaelmwu merged commit d98882c into main Mar 9, 2026
5 checks passed
@michaelmwu michaelmwu deleted the michaelmwu/workflow-tasks-setup branch March 9, 2026 21:22
