
fix: stop heuristic role expansion after LLM role suggestion#203

Merged
michaelmwu merged 4 commits into main from michaelmwu/workflow-tasks-setup
Mar 9, 2026

Conversation

@michaelmwu (Member) commented Mar 9, 2026

Description

  • When resume extraction runs in LLM mode, role heuristics now run only if the model did not provide any role suggestion.
  • If primary_roles or primary_role is present in the LLM output, only the normalized LLM roles are kept; no extra heuristic roles are backfilled.
  • Added a regression test verifying that LLM-suggested roles are not expanded by heuristic inference during extraction.
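The gating described above can be sketched in isolation. Every name below (`resolve_primary_roles`, `normalize_roles`, the toy heuristic) is illustrative, not the actual helpers in `resume_extractor.py`:

```python
def normalize_roles(raw) -> list[str]:
    # Toy normalizer for illustration: lowercase, strip, drop empties.
    items = [raw] if isinstance(raw, str) else list(raw or [])
    return [s.strip().lower() for s in items if isinstance(s, str) and s.strip()]


def infer_roles_heuristically(text: str) -> list[str]:
    # Toy heuristic stand-in: flag "engineer" titles as "developer".
    return ["developer"] if "engineer" in text.lower() else []


def resolve_primary_roles(llm_payload: dict, resume_text: str) -> list[str]:
    # primary_roles wins; the legacy primary_role key is the fallback.
    raw = llm_payload.get("primary_roles") or llm_payload.get("primary_role")
    if raw:
        # The LLM made a suggestion: keep only its normalized roles,
        # even if normalization yields an empty list. No heuristic backfill.
        return normalize_roles(raw)
    # No LLM suggestion at all: heuristic inference may run.
    return infer_roles_heuristically(resume_text)
```

The key point is the early return: once the LLM payload carries any role suggestion, heuristics never run, even when normalization empties the list.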

Related Issue

None provided.

How Has This Been Tested?

Pre-commit checks ran during commit (ruff, ruff format, and mypy) and passed.

Copilot AI review requested due to automatic review settings March 9, 2026 20:32
@coderabbitai (bot) commented Mar 9, 2026

Warning

Rate limit exceeded

@michaelmwu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e38f148-e3f6-415d-82eb-171680c2a719

📥 Commits

Reviewing files that changed from the base of the PR and between a1351ad and dd02aab.

📒 Files selected for processing (2)
  • packages/shared/src/five08/resume_extractor.py
  • tests/unit/test_resume_extractor.py


Copilot AI left a comment


Pull request overview

Adjusts resume extraction behavior in LLM mode so heuristic role inference does not run once the model has provided a role suggestion, preventing post-processing from expanding/overriding LLM-provided roles.

Changes:

  • Gate heuristic role inference behind a new “LLM provided role suggestion” check.
  • Preserve normalized LLM role output even when it normalizes to an empty list (instead of backfilling from heuristics).
  • Add a regression unit test intended to ensure heuristic role inference doesn’t expand LLM-suggested roles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • tests/unit/test_resume_extractor.py — Adds a regression test for "no heuristic role backfill after LLM role suggestion".
  • packages/shared/src/five08/resume_extractor.py — Updates LLM extraction normalization to skip heuristic role inference when the LLM supplied a role suggestion.



result = extractor.extract("Jane Doe\nSoftware Engineer")

assert result.primary_roles == []

Copilot AI Mar 9, 2026


This assertion will likely fail: the LLM payload provides primary_roles: ["platform specialist"], and _normalize_role_collection() keeps non-empty strings even when they don't map to a canonical role (it falls back to an alphanumeric normalized string). With the new extractor logic, result.primary_roles should remain the normalized LLM value (e.g. ["platform specialist"]), not []. If the intent is to regression-test that heuristics don't add roles like developer, assert that "developer" is not present (or adjust the fake LLM output to something like "/" that normalizes to an empty list).

Suggested change
assert result.primary_roles == []
# LLM-suggested roles should be preserved, and heuristic roles like "developer"
# should not be added on top.
assert result.primary_roles == ["platform specialist"]
assert "developer" not in result.primary_roles

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 9, 2026 21:04

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.



Comment on lines +2337 to +2344
llm_provided_role_suggestion = False
if isinstance(parsed_primary_roles_raw, str):
    llm_provided_role_suggestion = bool(parsed_primary_roles_raw.strip())
elif isinstance(parsed_primary_roles_raw, (list, tuple)):
    llm_provided_role_suggestion = any(
        isinstance(item, str) and bool(item.strip())
        for item in parsed_primary_roles_raw
    )

Copilot AI Mar 9, 2026


llm_provided_role_suggestion is derived from the raw primary_roles/primary_role value. For string inputs that are non-whitespace but normalize to an empty role list (e.g. a comma-only value like "," which normalize_roles splits into empty tokens), this will incorrectly skip heuristic inference and return primary_roles=[] (previously heuristics would have run because the normalized list was empty). Consider basing the decision on the normalized parsed_primary_roles (or otherwise treating punctuation-only / empty-token strings as “no suggestion”) so heuristics still run when no actual roles can be parsed.

Suggested change
llm_provided_role_suggestion = False
if isinstance(parsed_primary_roles_raw, str):
    llm_provided_role_suggestion = bool(parsed_primary_roles_raw.strip())
elif isinstance(parsed_primary_roles_raw, (list, tuple)):
    llm_provided_role_suggestion = any(
        isinstance(item, str) and bool(item.strip())
        for item in parsed_primary_roles_raw
    )
llm_provided_role_suggestion = bool(parsed_primary_roles)

Comment on lines +2333 to +2336
parsed_primary_roles_raw = parsed.get("primary_roles")
if not parsed_primary_roles_raw:
    parsed_primary_roles_raw = parsed.get("primary_role")
parsed_primary_roles = _normalize_role_collection(parsed_primary_roles_raw)

Copilot AI Mar 9, 2026


This change adds support for honoring the legacy primary_role field (via the new fallback logic), but there isn’t a unit test exercising the primary_role path (only primary_roles). Adding a regression test where the LLM returns primary_role (string) and verifying heuristic role inference does not expand it would help prevent future regressions.
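A self-contained sketch of such a test, using toy stand-ins rather than the project's real extractor and fixtures (`resolve_primary_roles` and `normalize_roles` below are illustrative, not the actual code):

```python
def normalize_roles(raw) -> list[str]:
    # Toy normalizer: lowercase, strip, drop empties.
    items = [raw] if isinstance(raw, str) else list(raw or [])
    return [s.strip().lower() for s in items if isinstance(s, str) and s.strip()]


def resolve_primary_roles(payload: dict, text: str) -> list[str]:
    # Mirrors the PR's fallback: primary_roles first, legacy primary_role second.
    raw = payload.get("primary_roles") or payload.get("primary_role")
    if raw:
        return normalize_roles(raw)
    # Toy heuristic that the test guards against.
    return ["developer"] if "engineer" in text.lower() else []


def test_legacy_primary_role_string_is_not_expanded():
    # LLM returns the legacy string field; heuristics must not add "developer".
    roles = resolve_primary_roles(
        {"primary_role": "Platform Specialist"},
        "Jane Doe\nSoftware Engineer",
    )
    assert roles == ["platform specialist"]
    assert "developer" not in roles


test_legacy_primary_role_string_is_not_expanded()
```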

@michaelmwu michaelmwu merged commit d98882c into main Mar 9, 2026
5 checks passed
@michaelmwu michaelmwu deleted the michaelmwu/workflow-tasks-setup branch March 9, 2026 21:22
