[daily-ai-review] Potential classifier false negatives - relevant articles rejected #153

@github-actions

The classifier rejected several articles that appear highly relevant to the dataset scope:

Potentially relevant articles classified as 'not_relevant':

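  1. Walla article about prostitution enforcement statistics: "Rise of hundreds of percent in enforcement of the prostitution prohibition law in Israel in 2025" (original title: "עלייה במאות אחוזים באכיפת החוק לאיסור הזנות בישראל בשנת 2025") - This directly discusses prostitution law enforcement, which should be core to this dataset.

  2. Mako articles with relevant content:

    • "35,000 shekels robbed from a prostitution victim in Tel Aviv" (original title: "שדדו 35 אלף שקלים מקורבן זנות בתל"א") - Robbery of a prostitution victim
    • "Men in the cycle of prostitution" (original title: "גברים במעגל הזנות") - Men involved in prostitution
    • "In the heart of a residential neighborhood: what the police found in Hadera" (original title: "בלב שכונת מגורים: מה שמצאו השוטרים בחדרה") - Police operation that resulted in arrests (classified as 'not_relevant' with low confidence)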

Evidence: Out of 74 unseen articles, 0 were classified as relevant despite several having titles/snippets directly matching the dataset scope (prostitution, trafficking, enforcement).
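
To gauge how surprising a 0-of-74 result is, here is a minimal sketch that computes the chance of seeing no relevant articles in a batch. It assumes articles are independent and that some fraction of them is truly relevant. The 5% base rate and the function name `prob_zero_relevant` are illustrative assumptions, not values measured from this dataset.

```python
def prob_zero_relevant(n_articles: int, base_rate: float) -> float:
    """Probability that none of n independent articles is relevant.

    base_rate is the assumed fraction of truly relevant articles;
    it is a hypothetical input, not a figure from the ingest logs.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    if n_articles < 0:
        raise ValueError("n_articles must be non-negative")
    return (1.0 - base_rate) ** n_articles


# 74 is the batch size reported in this issue; 0.05 is an assumed base rate.
p_zero = prob_zero_relevant(74, 0.05)
print(f"P(0 relevant out of 74 | base rate 5%) = {p_zero:.4f}")
```

Under that assumed base rate, an all-rejected batch would be unlikely to occur by chance. That is consistent with a false negative bias but does not prove one.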

Next steps:

  • Review classifier prompt/training for false negative bias
  • Manually verify a sample of the rejected articles
  • Consider if classification criteria are too restrictive
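
The manual verification step could start from a keyword pre-filter like the sketch below, which flags rejected items whose title mentions a scope term and then draws a reproducible sample for human review. The record shape (`title`, `label`), the keyword list, and both function names are hypothetical. The real run snapshot schema may differ.

```python
import random

# Hypothetical scope keywords (prostitution, human trafficking, enforcement stem).
# Substring matching is crude and only surfaces candidates for a human to check.
SCOPE_KEYWORDS = ("זנות", "סחר בבני אדם", "אכיפ")

# Hypothetical records using titles quoted in this issue; the field names are assumed.
ARTICLES = [
    {"title": "גברים במעגל הזנות", "label": "not_relevant"},
    {"title": "עלייה במאות אחוזים באכיפת החוק לאיסור הזנות בישראל בשנת 2025",
     "label": "not_relevant"},
    {"title": "בלב שכונת מגורים: מה שמצאו השוטרים בחדרה", "label": "not_relevant"},
]


def flag_possible_false_negatives(items, keywords=SCOPE_KEYWORDS):
    """Return rejected items whose title or snippet contains a scope keyword."""
    flagged = []
    for item in items:
        if item.get("label") != "not_relevant":
            continue
        text = f"{item.get('title', '')} {item.get('snippet', '')}"
        if any(keyword in text for keyword in keywords):
            flagged.append(item)
    return flagged


def sample_for_review(items, k, seed=0):
    """Draw up to k items with a fixed seed so the review sample is reproducible."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


flagged = flag_possible_false_negatives(ARTICLES)
for article in sample_for_review(flagged, k=5):
    print(article["title"])
```

The Hadera police operation title contains none of the keywords, so it is not flagged here. A keyword filter can therefore only complement reading a random sample of all rejected articles, not replace it.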

Review context

  • Run timestamp: 2026-05-18T08:08:14.877167+00:00

  • Run snapshot: state_repo/news_items/ingest/runs/2026-05-18T08-08-14-877167Z.json

  • Debug summary: state_repo/news_items/ingest/logs/2026-05-18T08-08-14-877167Z.summary.json

  • Debug log: state_repo/news_items/ingest/logs/2026-05-18T08-08-14-877167Z.json

  • Workflow run: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/26021317412
