Summary
Implement a Hugging Face token‐classification pipeline to perform NER on normalized news articles, with both a Celery task and a CLI wrapper calling a shared core library.
Motivation
- Transformer‐based NER models (e.g. dslim/bert-base-NER) deliver higher accuracy on diverse news text.
- Organizes code into reusable functions, Celery tasks, and CLI commands.
- Enables both scheduled/asynchronous processing and on-demand debugging.
Scope
None
Acceptance Criteria
Additional Context
- Dependencies:
  - Articles normalized and stored in the database
  - Database connection module (nlp/db.py) available
  - Redis (or other) broker configured for Celery
Architecture
- Core logic (nlp/core.py)
  - run_ner_hf(text: str) → List[Entity]
  - process_article(article_id: str, db) → List[Entity]
- Celery task (nlp/tasks.py)
  - @app.task def ner_task(article_id: str) → process_article(article_id, db)
  - Hook into the Celery Beat schedule for periodic batch runs, or call via ner_task.delay(id)
- CLI wrapper (nlp/cli.py)
  - Thin click or argparse command that calls process_article() and prints a summary
  - Executable via python -m nlp.cli --article-id=<id>
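To make the layering above concrete, here is a minimal sketch of the core functions together with the thin CLI wrapper, shown in one block for brevity (in the repo they would live in nlp/core.py and nlp/cli.py). Several details are assumptions rather than part of this issue: the fields of Entity, the optional tagger parameter (added only so tests can inject a stub instead of loading the model), and the db helpers get_article_text() and save_entities(), which are hypothetical stand-ins for whatever nlp/db.py actually exposes.

```python
import argparse
from dataclasses import dataclass
from functools import lru_cache
from typing import Any, Callable, Dict, List, Optional, Sequence

# A tagger maps text to a list of result dicts, as the HF pipeline does.
Tagger = Callable[[str], List[Dict[str, Any]]]


@dataclass(frozen=True)
class Entity:
    """One entity span; these field names are an assumption."""
    text: str
    label: str
    score: float
    start: int
    end: int


@lru_cache(maxsize=1)
def _load_tagger() -> Tagger:
    # Imported lazily so this module loads without transformers/torch.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model="dslim/bert-base-NER",
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )


def run_ner_hf(text: str, tagger: Optional[Tagger] = None) -> List[Entity]:
    """Tag `text`; `tagger` is an extra parameter added here for testing."""
    if not text.strip():
        return []
    if tagger is None:
        tagger = _load_tagger()
    return [
        Entity(
            text=r["word"],
            label=r["entity_group"],
            score=float(r["score"]),
            start=int(r["start"]),
            end=int(r["end"]),
        )
        for r in tagger(text)
    ]


def process_article(
    article_id: str, db: Any, tagger: Optional[Tagger] = None
) -> List[Entity]:
    """Load one article, tag it, and persist the entities."""
    # get_article_text() and save_entities() are hypothetical db helpers.
    text = db.get_article_text(article_id)
    entities = run_ner_hf(text, tagger)
    db.save_entities(article_id, entities)
    return entities


def main(argv: Optional[Sequence[str]], db: Any, tagger: Optional[Tagger] = None) -> int:
    """CLI entry point: parse --article-id, run the core, print a summary."""
    parser = argparse.ArgumentParser(prog="nlp.cli", description="Run NER on one article")
    parser.add_argument("--article-id", required=True)
    args = parser.parse_args(argv)
    entities = process_article(args.article_id, db, tagger)
    print(f"{args.article_id}: {len(entities)} entities")
    for e in entities:
        print(f"  {e.label:<5} {e.text} ({e.score:.2f})")
    return 0

# A real nlp/cli.py would obtain `db` from nlp/db.py and exit with main()'s return value.
```

Keeping the model load behind a cached helper means a Celery worker or CLI run pays the load cost once per process, and the wrapper contains no NER logic of its own.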
Tasks
- Add dependencies
  - Install transformers, torch, celery, click (update /nlp/requirements.txt)
- Core module
  - Create /nlp/core.py with run_ner_hf() and process_article() as described
- Celery integration
  - Create /nlp/tasks.py with a ner_task Celery task
  - Configure broker URL and include a sample beat schedule entry in project docs
- CLI command
  - Create /nlp/cli.py with a ner_cli command (using click or argparse)
  - Add entry to /nlp/README.md showing both Celery and CLI usage
- Unit tests
  - In /nlp/tests/test_core.py, test run_ner_hf() on sample text
  - In /nlp/tests/test_tasks.py, mock db to verify ner_task calls process_article()
  - In /nlp/tests/test_cli.py, invoke CLI with a dummy --article-id and assert exit code 0
- Documentation
  - Update /nlp/README.md with:
    - Installation steps
    - How to run ner_task via Celery
    - How to invoke python -m nlp.cli
    - Sample Celery Beat schedule snippet
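For the beat schedule snippet requested above, one possible shape is a plain Celery settings module. This is only a sketch: the Redis URL is a placeholder for a local instance, the hourly interval is arbitrary, and nlp.tasks.ner_batch_task is a hypothetical task (not defined in this issue) that would select pending articles and call ner_task.delay(id) for each, since ner_task itself needs an article id.

```python
from datetime import timedelta

# Broker location; this Redis URL is a placeholder for a local instance.
broker_url = "redis://localhost:6379/0"

# Celery names a task after its module and function by default, so
# ner_task in nlp/tasks.py would be registered as "nlp.tasks.ner_task".
# "nlp.tasks.ner_batch_task" below is hypothetical: a task that would
# enqueue ner_task for each article still awaiting NER.
beat_schedule = {
    "ner-batch-hourly": {
        "task": "nlp.tasks.ner_batch_task",
        "schedule": timedelta(hours=1),  # interval is an arbitrary example
    },
}
```

If this were saved as celeryconfig.py on the import path, the Celery app could load it with app.config_from_object("celeryconfig").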