Summary
Build a RAG-style QA system over the news corpus so users can ask questions (e.g. “What happened to Company X?”) and receive synthesized, context-grounded answers.
Motivation
- Provides a powerful, natural-language interface to the article database.
- Exercises end-to-end retrieval, prompt construction, and LLM integration.
- Lays groundwork for advanced analytics and conversational features.
Scope
None
Additional Context
Details
- Category: nlp
- Priority: P1
- Estimate: 3d
- Dependencies:
  - Embedding & vector-store pipeline in place
  - Database connection module (`nlp/db.py`)
  - HF model weights available locally or via Hugging Face Hub
Tasks
- Add dependencies
  - Add `sentence-transformers`, `faiss-cpu`, and `transformers` to `/nlp/requirements.txt`.
- Core function signatures (`/nlp/core.py`)
  - `def retrieve_context(question: str, top_k: int = 5) -> List[str]`
  - `def generate_answer(question: str, contexts: List[str]) -> str`
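A minimal sketch of how these two functions could fit together. Token-overlap ranking and a plain prompt template stand in for the real sentence-transformers/FAISS retrieval and HF generation; the `corpus` parameter and `build_prompt` helper are illustrative additions, not part of the spec:

```python
from typing import List, Sequence

def retrieve_context(question: str, top_k: int = 5,
                     corpus: Sequence[str] = ()) -> List[str]:
    """Rank snippets by token overlap with the question.

    Stand-in for embedding the question and querying the FAISS vector
    store; `corpus` is a hypothetical parameter used here only so the
    sketch is self-contained.
    """
    q_tokens = set(question.lower().split())
    ranked = sorted(corpus,
                    key=lambda s: len(q_tokens & set(s.lower().split())),
                    reverse=True)
    return list(ranked[:top_k])

def build_prompt(question: str, contexts: List[str]) -> str:
    """Prompt-construction step that generate_answer() would feed to the LLM."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return ("Answer the question using only the context below.\n"
            f"Context:\n{joined}\n\n"
            f"Question: {question}\nAnswer:")
```

In the real module, `retrieve_context` would read the corpus from the existing vector-store pipeline rather than taking it as an argument.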
- Celery task hook (`/nlp/tasks.py`)
  - `@app.task def qa_task(question: str, top_k: int = 5) -> str`
  - Should call `retrieve_context()` then `generate_answer()`.
- CLI entrypoint (`/nlp/cli.py`)
  - `python -m nlp.cli qa --question="What happened to Company X?" --top-k=5`
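A possible argparse layout for that invocation. Flag names are taken from the example command line; the wiring into the QA pipeline is only hinted at, since the real `main` would call the core functions or the Celery task:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nlp.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    qa = sub.add_parser("qa", help="Ask a question over the news corpus")
    qa.add_argument("--question", required=True, help="Natural-language question")
    qa.add_argument("--top-k", type=int, default=5, dest="top_k",
                    help="Number of context snippets to retrieve")
    return parser

def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "qa":
        # Hypothetical wiring: the real module would call retrieve_context()
        # and generate_answer() (or dispatch qa_task) and print the answer.
        print(f"question={args.question!r} top_k={args.top_k}")

if __name__ == "__main__":
    main()
```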
- Tests & documentation
  - Retrieval test (`/nlp/tests/test_retrieval.py`): assert `retrieve_context()` returns at least `top_k` snippets.
  - Generation test (`/nlp/tests/test_generation.py`): with sample contexts, assert `generate_answer()` returns a non-empty string.
  - Task test (`/nlp/tests/test_qa_task.py`): mock core functions, assert `qa_task()` returns the expected answer.
  - Update `/nlp/README.md` with installation steps, vector-store setup, Celery usage, and CLI example.
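The task test could follow the standard `unittest.mock` pattern. A stand-in module object keeps this snippet self-contained; the real test would patch the names imported into `nlp.tasks` instead:

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for nlp.core; the real test would patch
# "nlp.tasks.retrieve_context" / "nlp.tasks.generate_answer".
core = SimpleNamespace(
    retrieve_context=lambda question, top_k=5: ["some snippet"],
    generate_answer=lambda question, contexts: "unmocked answer",
)

def qa_task(question: str, top_k: int = 5) -> str:
    # Mirrors the Celery task body: retrieve, then generate.
    contexts = core.retrieve_context(question, top_k=top_k)
    return core.generate_answer(question, contexts)

def test_qa_task_returns_expected_answer():
    with mock.patch.object(core, "generate_answer",
                           return_value="mocked answer"):
        assert qa_task("What happened to Company X?") == "mocked answer"
```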
Acceptance Criteria
- `retrieve_context(question)` returns at least `top_k` context snippets
- `generate_answer(question, contexts)` returns a coherent answer string
- `qa_task(question)` executes end-to-end and returns the generated answer
- The CLI `qa` command runs without errors and prints the answer
- All tests pass in CI and `/nlp/README.md` clearly documents the HF-based RAG workflow