What happened:
I tried to run examples/government_rag following its README and it is currently impossible for a new user to execute.
Problem 1 — no dependency manifest
There is no requirements.txt and the README has no install instructions. The dependency set (langchain, chromadb, sentence-transformers, unstructured, …) has to be reverse-engineered from import errors one at a time.
Problem 2 — import-time failure on current LangChain
With current LangChain installed (pip install langchain langchain-community, tested on langchain 1.3.8 / langchain-community 0.4.2, Python 3.12):
FAIL : from langchain.text_splitter import RecursiveCharacterTextSplitter -> ModuleNotFoundError: No module named 'langchain.text_splitter'
FAIL : from langchain.chains import RetrievalQA -> ModuleNotFoundError: No module named 'langchain.chains'
langchain.text_splitter moved to langchain_text_splitters, and RetrievalQA is a legacy chain that no longer ships in langchain 1.x. Notably, RetrievalQA and HuggingFacePipeline are imported but never used — the example crashes on dead imports. langchain-community itself is sunset (langchain-ai/langchain-community#674), and the code also uses vector_store.persist() (removed in langchain-chroma) and the deprecated get_relevant_documents().
Problem 3 — hardcoded device, credentials and paths in basemodel.py / gov_rag.py
- module-level
device = "cuda", no fallback — fails on CPU-only and Apple Silicon machines
- default backend is qianfan with literal placeholders in the OAuth URL:
client_id=[应用API Key]&client_secret=[应用Secret Key]; the DeepSeek backend has api_key="<DeepSeek API Key>", siliconflow has Bearer <token> — none configurable without editing source
- embedding model hardcoded to a personal path
/home/icyfeather/models/bge-m3
base_path defaults to the placeholder /path/ianvs/dataset/gov_rag
Impact:
The example is 100% non-runnable from a fresh checkout: import crash (P2), then guaranteed runtime failures (P3), with no dependency list to even get started (P1). RAG/LLM examples are exactly what new SIG-AI contributors reach for first, so this feeds the confused-user issue stream described in #230. The drift gets worse monthly since langchain-community is unmaintained.
Fix (example-only, no ianvs core changes):
- Add
requirements.txt with maintained packages (langchain-chroma, langchain-huggingface, langchain-text-splitters).
- Migrate imports, drop the dead ones, replace
persist() and get_relevant_documents().
- Auto-pick device (cuda/mps/cpu); read backend + API keys from env vars with clear errors; fix the embedding-model and base_path defaults.
- README install/configuration section.
I have this working locally — knowledge-base build, persist reload, province filtering and retrieval all verified against a synthetic corpus (the Kaggle GovAff dataset itself is unchanged). PR incoming.
Environment: macOS ARM64, Python 3.12.13, ianvs @ bf0f596.
What happened:
I tried to run
examples/government_ragfollowing its README and it is currently impossible for a new user to execute.Problem 1 — no dependency manifest
There is no
requirements.txtand the README has no install instructions. The dependency set (langchain, chromadb, sentence-transformers, unstructured, …) has to be reverse-engineered from import errors one at a time.Problem 2 — import-time failure on current LangChain
With current LangChain installed (
pip install langchain langchain-community, tested on langchain 1.3.8 / langchain-community 0.4.2, Python 3.12):langchain.text_splittermoved tolangchain_text_splitters, andRetrievalQAis a legacy chain that no longer ships in langchain 1.x. Notably,RetrievalQAandHuggingFacePipelineare imported but never used — the example crashes on dead imports.langchain-communityitself is sunset (langchain-ai/langchain-community#674), and the code also usesvector_store.persist()(removed in langchain-chroma) and the deprecatedget_relevant_documents().Problem 3 — hardcoded device, credentials and paths in
basemodel.py/gov_rag.pydevice = "cuda", no fallback — fails on CPU-only and Apple Silicon machinesclient_id=[应用API Key]&client_secret=[应用Secret Key]; the DeepSeek backend hasapi_key="<DeepSeek API Key>", siliconflow hasBearer <token>— none configurable without editing source/home/icyfeather/models/bge-m3base_pathdefaults to the placeholder/path/ianvs/dataset/gov_ragImpact:
The example is 100% non-runnable from a fresh checkout: import crash (P2), then guaranteed runtime failures (P3), with no dependency list to even get started (P1). RAG/LLM examples are exactly what new SIG-AI contributors reach for first, so this feeds the confused-user issue stream described in #230. The drift gets worse monthly since langchain-community is unmaintained.
Fix (example-only, no ianvs core changes):
requirements.txtwith maintained packages (langchain-chroma, langchain-huggingface, langchain-text-splitters).persist()andget_relevant_documents().I have this working locally — knowledge-base build, persist reload, province filtering and retrieval all verified against a synthetic corpus (the Kaggle GovAff dataset itself is unchanged). PR incoming.
Environment: macOS ARM64, Python 3.12.13, ianvs @ bf0f596.