
government_rag example cannot be executed: missing requirements, LangChain import failures, hardcoded credentials/device/paths #535

Description

@veyron-kairo

What happened:

I tried to run examples/government_rag following its README and it is currently impossible for a new user to execute.

Problem 1 — no dependency manifest

There is no requirements.txt and the README has no install instructions. The dependency set (langchain, chromadb, sentence-transformers, unstructured, …) has to be reverse-engineered from import errors one at a time.
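Until a requirements.txt exists, a quick preflight check can at least report everything that is missing in one pass instead of one import error at a time. This is only a sketch: the package list is my guess, assembled from the names mentioned in this issue, and is not an authoritative manifest for the example.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed dependency set, taken from the packages named in this issue.
# This is not a verified manifest for examples/government_rag.
REQUIRED = [
    "langchain-chroma",
    "langchain-huggingface",
    "langchain-text-splitters",
    "chromadb",
    "sentence-transformers",
    "unstructured",
]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


absent = missing_packages(REQUIRED)
if absent:
    print("Missing packages, try: pip install " + " ".join(absent))
else:
    print("All listed packages are installed.")
```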

Problem 2 — import-time failure on current LangChain

With current LangChain installed (pip install langchain langchain-community, tested on langchain 1.3.8 / langchain-community 0.4.2, Python 3.12):

FAIL : from langchain.text_splitter import RecursiveCharacterTextSplitter -> ModuleNotFoundError: No module named 'langchain.text_splitter'
FAIL : from langchain.chains import RetrievalQA -> ModuleNotFoundError: No module named 'langchain.chains'

langchain.text_splitter moved to langchain_text_splitters, and RetrievalQA is a legacy chain that no longer ships in langchain 1.x. Notably, RetrievalQA and HuggingFacePipeline are imported but never used — the example crashes on dead imports. langchain-community itself is sunset (langchain-ai/langchain-community#674), and the code also uses vector_store.persist() (removed in langchain-chroma) and the deprecated get_relevant_documents().
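To show which import path actually resolves on a given install, something like the probe below could be used. It is a minimal sketch using only the standard library; the helper name `resolve` is hypothetical and not part of the example or of LangChain. It tries the new module path first and falls back to the legacy one.

```python
import importlib


def resolve(symbol, candidates):
    """Return (module_path, obj) for the first importable module in
    `candidates` that exposes `symbol`, or (None, None) if none does."""
    for path in candidates:
        try:
            module = importlib.import_module(path)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        if hasattr(module, symbol):
            return path, getattr(module, symbol)
    return None, None


# New location first, legacy location second (both paths are quoted above).
found_in, splitter_cls = resolve(
    "RecursiveCharacterTextSplitter",
    ["langchain_text_splitters", "langchain.text_splitter"],
)
print("RecursiveCharacterTextSplitter found in:", found_in)
```

If `found_in` is `None`, neither package is installed, which is the Problem 1 situation rather than an import-path problem.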

Problem 3 — hardcoded device, credentials and paths in basemodel.py / gov_rag.py

  • module-level device = "cuda", no fallback — fails on CPU-only and Apple Silicon machines
  • default backend is qianfan with literal placeholders in the OAuth URL: client_id=[应用API Key]&client_secret=[应用Secret Key]; the DeepSeek backend has api_key="<DeepSeek API Key>", siliconflow has Bearer <token> — none configurable without editing source
  • embedding model hardcoded to a personal path /home/icyfeather/models/bge-m3
  • base_path defaults to the placeholder /path/ianvs/dataset/gov_rag
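For the hardcoded device, a fallback along these lines would avoid the crash on CPU-only and Apple Silicon machines. This is a sketch, not the code from my local branch; `pick_device` is a hypothetical helper name, and it deliberately degrades to "cpu" when PyTorch is not installed at all.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" based on what the local PyTorch build reports."""
    try:
        import torch
    except ImportError:
        # No PyTorch available: nothing to accelerate with.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


# Replaces the module-level `device = "cuda"`.
device = pick_device()
print("Using device:", device)
```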

Impact:

The example cannot be run from a fresh checkout: there is no dependency list to start from (P1), the imports crash on current LangChain (P2), and the hardcoded device, credentials and paths guarantee runtime failures after that (P3). RAG/LLM examples are exactly what new SIG-AI contributors reach for first, so this feeds the confused-user issue stream described in #230. The gap will keep growing as LangChain moves on, since langchain-community is unmaintained.

Fix (example-only, no ianvs core changes):

  1. Add requirements.txt with maintained packages (langchain-chroma, langchain-huggingface, langchain-text-splitters).
  2. Migrate imports, drop the dead ones, replace persist() and get_relevant_documents().
  3. Auto-pick device (cuda/mps/cpu); read backend + API keys from env vars with clear errors; fix the embedding-model and base_path defaults.
  4. Add an install/configuration section to the README.
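The env-var part of step 3 could look roughly like this. Every variable name here (`GOV_RAG_BACKEND`, `QIANFAN_API_KEY`, and so on) is a placeholder I made up for illustration, as is the `load_backend_config` helper; the final names are up for discussion in the PR. The backend names and the qianfan default come from the current code.

```python
import os

# Hypothetical env var names per backend; not an existing ianvs convention.
BACKEND_KEY_VARS = {
    "qianfan": ("QIANFAN_API_KEY", "QIANFAN_SECRET_KEY"),
    "deepseek": ("DEEPSEEK_API_KEY",),
    "siliconflow": ("SILICONFLOW_API_KEY",),
}


def load_backend_config(env=None):
    """Read the backend name and its credentials from the environment.

    Raises ValueError for an unknown backend and RuntimeError when a
    required variable is unset or empty, naming exactly what is missing.
    """
    env = os.environ if env is None else env
    backend = env.get("GOV_RAG_BACKEND", "qianfan").lower()
    if backend not in BACKEND_KEY_VARS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(BACKEND_KEY_VARS)}"
        )
    required = BACKEND_KEY_VARS[backend]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Backend {backend!r} needs these environment variables: {', '.join(missing)}"
        )
    return {"backend": backend, "credentials": {name: env[name] for name in required}}
```

Passing a plain dict as `env` keeps the function testable without touching the real environment, e.g. `load_backend_config({"GOV_RAG_BACKEND": "deepseek", "DEEPSEEK_API_KEY": "..."})`.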

I have this working locally: knowledge-base build, persist and reload, province filtering and retrieval are all verified against a synthetic corpus (the Kaggle GovAff dataset itself is unchanged). PR incoming.

Environment: macOS ARM64, Python 3.12.13, ianvs @ bf0f596.
