MukulRay1603/irminsul-corpus

irminsul-corpus

Autonomous knowledge base pipeline for Irminsul — a production RAG assistant for Genshin Impact.

A deterministic corpus builder that pulls from three sources: peer-reviewed theorycrafting (KQM TCL), exact game stats (genshin-db API), and canonical lore (Fandom Wiki). It runs weekly via GitHub Actions. Each vector carries 7 metadata fields, which support filtering by tier, character, content type, and trust score.


Corpus

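| Tier       | Source                      | Files | Trust        |
|------------|-----------------------------|-------|--------------|
| TCL        | KQM Theorycrafting Library  | 305   | Ground truth |
| Structured | genshin-db API              | 405   | Ground truth |
| Wiki       | Genshin Impact Fandom Wiki  | 130   | Canonical    |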

Total: 840 files · 6,876 Pinecone vectors · 7 metadata fields per chunk
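
To illustrate how per-chunk metadata can drive filtering, here is a minimal sketch that builds a MongoDB-style filter dictionary of the kind Pinecone accepts in a query. The field keys (`tier`, `character`, `content_type`, `trust`) and the helper name `build_filter` are assumptions made for illustration; the README names the filter dimensions but not the exact keys stored on each vector.

```python
from __future__ import annotations

from typing import Any, Optional


def build_filter(
    tier: Optional[str] = None,
    character: Optional[str] = None,
    content_types: Optional[list[str]] = None,
    trust_levels: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a metadata filter from whichever criteria are supplied.

    All field keys below are hypothetical; adjust them to the keys the
    ingest step actually writes.
    """
    clauses: dict[str, Any] = {}
    if tier is not None:
        clauses["tier"] = {"$eq": tier}
    if character is not None:
        clauses["character"] = {"$eq": character}
    if content_types:
        clauses["content_type"] = {"$in": content_types}
    if trust_levels:
        clauses["trust"] = {"$in": trust_levels}
    return clauses


# Example: restrict retrieval to ground-truth TCL chunks about one character.
example_filter = build_filter(
    tier="TCL", character="Hu Tao", trust_levels=["Ground truth"]
)
```

The resulting dictionary would typically be passed as the `filter` argument of a Pinecone index query alongside the query embedding.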


Pipeline

discover → tcl → structured → wiki → ingest

Runs every Sunday at 02:00 UTC via GitHub Actions. New characters are discovered automatically when a patch is released, and each run performs incremental updates only.
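
The stage order above can be sketched as a small runner that calls each stage in sequence and threads shared state through them. This is an illustration under assumptions, not the contents of `run_pipeline.py`: the `run_stages` function, the dictionary-based state, and the stub stages are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Optional

# Stage order taken from the pipeline line above.
STAGE_ORDER = ["discover", "tcl", "structured", "wiki", "ingest"]

Stage = Callable[[dict], dict]


def run_stages(stages: dict[str, Stage], state: Optional[dict] = None) -> dict:
    """Run every stage in STAGE_ORDER, passing each one the previous state."""
    current = dict(state or {})
    for name in STAGE_ORDER:
        if name not in stages:
            raise KeyError(f"no implementation registered for stage {name!r}")
        current = stages[name](current)
    return current


def make_stub(name: str) -> Stage:
    """Hypothetical placeholder stage that only records that it ran."""
    def stage(state: dict) -> dict:
        return {**state, "log": state.get("log", []) + [name]}
    return stage


stubs = {name: make_stub(name) for name in STAGE_ORDER}
result = run_stages(stubs)
```

A Sunday 02:00 UTC schedule corresponds to the cron expression `0 2 * * 0` in a GitHub Actions `schedule` trigger; the actual workflow file is not shown here.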


Architecture

  • Smart name resolver with persistent cache — handles full wiki names automatically
  • Incremental timestamp-based updates — only re-fetches pages changed since last run
  • Self-healing structured data — validate_file() detects broken API fields and auto-re-fetches
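
The three ideas in the list above can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the JSON cache format, and the required field list are hypothetical, and the repository's own `validate_file()` may work differently.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path
from typing import Callable

# Hypothetical list of fields a structured record must contain.
REQUIRED_FIELDS = ("name", "element", "base_stats")


def resolve_name(name: str, cache_path: Path, lookup: Callable[[str], str]) -> str:
    """Return the full wiki name, consulting an on-disk JSON cache first."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if name not in cache:
        cache[name] = lookup(name)  # only hit the slow lookup on a cache miss
        cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return cache[name]


def changed_since(pages: dict[str, datetime], last_run: datetime) -> list[str]:
    """Return titles of pages modified after the previous run."""
    return sorted(title for title, modified in pages.items() if modified > last_run)


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "", [], {})]


def heal(key: str, record: dict, refetch: Callable[[str], dict]) -> dict:
    """Re-fetch a record when validation finds broken fields."""
    return refetch(key) if missing_fields(record) else record
```

Passing `lookup` and `refetch` as callables keeps the sketch free of network code, so each piece can be exercised offline with stand-in functions.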

Setup

git clone https://github.com/MukulRay1603/irminsul-corpus
cd irminsul-corpus
pip install -r requirements.txt
cp .env.example .env

Required secrets (GitHub Actions):

  • PINECONE_API_KEY
  • PINECONE_INDEX

Optional (local only):

  • GEMINI_API_KEY (legacy, not used in pipeline)
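
As a sketch of how a script might check these settings before doing any work, the snippet below reads the variables from the process environment and fails fast when a required one is missing. The `load_settings` helper is hypothetical, and it assumes the values from `.env` have already been exported into the environment; how the repository itself loads `.env` is not stated here.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

REQUIRED = ("PINECONE_API_KEY", "PINECONE_INDEX")
OPTIONAL = ("GEMINI_API_KEY",)  # legacy, not used in the pipeline


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect required and optional settings, raising if a required one is unset."""
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED if not source.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {key: source[key] for key in REQUIRED}
    # Optional keys are included only when they are set and non-empty.
    settings.update({key: source[key] for key in OPTIONAL if source.get(key)})
    return settings
```

Accepting an explicit mapping makes the check easy to test without touching real credentials.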

Run:

python run_pipeline.py

Attribution & Legal
