River Algorithm — AI Chat History Edition

River Algorithm is a personal digital profile weighting algorithm for local AI systems.

Existing AI memory systems (ChatGPT Memory, Claude Memory, etc.) are essentially flat lists: a handful of facts with no temporal dimension, no confidence levels, and no contradiction detection. Memories live in the cloud and are owned by the platform, so when you switch providers you start from zero.

The River Algorithm takes a different approach. Conversations flow like water, and key information settles like sediment into profiles, progressively upgrading from "suspected" to "confirmed" to "established" through multi-turn verification. Offline consolidation (Sleep) acts as the river's self-purification: it washes away outdated information, resolves contradictions, and makes cognition clearer over time. All data is stored locally, owned by you, and aggregated across platforms, so nothing is lost when you switch AI providers. The River Algorithm is designed to grow: the more you talk, the more local data accumulates, and the deeper the AI understands you.
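To make the "suspected" to "confirmed" to "established" progression concrete, here is a minimal sketch. The layer names come from the description above; the `ProfileFact` class, the `observe_again` helper, and the mention-count thresholds are assumptions made for illustration, not the project's actual promotion rules.

```python
from dataclasses import dataclass

# Layer names are taken from the README; the thresholds are hypothetical.
LAYERS = ("suspected", "confirmed", "established")
PROMOTE_AFTER = {"suspected": 2, "confirmed": 4}  # assumed mention counts


@dataclass
class ProfileFact:
    category: str
    value: str
    layer: str = "suspected"
    mentions: int = 1


def observe_again(fact: ProfileFact) -> ProfileFact:
    """Record one more supporting mention and promote the layer if warranted."""
    fact.mentions += 1
    threshold = PROMOTE_AFTER.get(fact.layer)  # None once "established"
    if threshold is not None and fact.mentions >= threshold:
        fact.layer = LAYERS[LAYERS.index(fact.layer) + 1]
    return fact


fact = ProfileFact("city", "Tokyo")
history = [fact.layer]
for _ in range(3):
    history.append(observe_again(fact).layer)
print(history)  # ['suspected', 'confirmed', 'confirmed', 'established']
```

A real implementation would likely weigh more than a raw count (recency, source, contradicting evidence); the sketch only shows the one-way promotion between layers.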

This project is a special edition of the River Algorithm, focused on batch-extracting personal profiles from your ChatGPT / Claude / Gemini conversation history — personality, preferences, experiences, relationships, life trajectory. Every conversation you've had with AI is a piece of the real you. This data is invaluable: past conversations record who you were, and the past is fact. The future builds on the present.

This project shares the same database as the Riverse main project. Use it to populate your historical profile first, then start real-time conversations with Riverse, so your AI knows you from day one.

Note: No LLM today is specifically trained or fine-tuned for personal profile extraction, so results will vary across models — some hallucinations are inevitable. Also, since historical conversations were not conducted through the River Algorithm's conversation module, they lack real-time context awareness and multi-turn verification — historical profiles are for reference only and are less accurate than profiles built through live Riverse conversations. If you spot anything inaccurate, you can close or reject it directly in the web viewer without affecting other data. Feel free to open an Issue — I'm continuously improving extraction quality.

Cost warning: When using a remote LLM API (OpenAI, Anthropic, etc.), conversations with lots of code or very long messages can consume significant tokens. Review and clean your export data before running. Local models (Ollama) are free.
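One way to review an export before spending tokens is to rank conversations by rough size. The sketch below is not part of the project: `count_chars`, `estimate_tokens`, and `rank_by_size` are hypothetical helpers, and the 4-characters-per-token ratio is a loose assumption that fits English text better than Chinese or Japanese. It walks any nested JSON value, so it makes no assumption about the export's internal schema.

```python
from typing import Any


def count_chars(node: Any) -> int:
    """Sum the length of every string value in a nested JSON structure."""
    if isinstance(node, str):
        return len(node)
    if isinstance(node, dict):
        return sum(count_chars(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_chars(v) for v in node)
    return 0  # numbers, booleans, null


def estimate_tokens(node: Any, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; the default ratio is an assumption."""
    return round(count_chars(node) / chars_per_token)


def rank_by_size(conversations: list) -> list[tuple[int, int]]:
    """Return (index, estimated_tokens) pairs, largest first."""
    sizes = [(i, estimate_tokens(c)) for i, c in enumerate(conversations)]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Made-up sample standing in for a loaded export.
sample = [
    {"title": "trip", "messages": ["hi", "ok"]},
    {"title": "code", "messages": ["x" * 400]},
]
print(rank_by_size(sample))  # [(1, 101), (0, 2)]
```

To try it on a real file, load the export with `json.load` and pass the result in, assuming the top level is a list of conversations. Entries near the top of the ranking are the ones worth trimming first.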

What's New (May 2026)

Multi-owner support — every business table now carries an owner_id. When the database is shared with JKRiver's family-mode setup, RiverHistory can process imports for one family member at a time. Pick the target account with --owner-name <name> on run.py. Default single-account installs are unchanged.
hypotheses table dropped — its lifecycle was folded into user_profile.layer ('suspected' → 'confirmed'). Storage code now uses save_profile_fact() exclusively.
Prompts moved — multilingual prompts are now in agent/config/prompts/{zh,en,ja}.yaml (loaded via agent.config.prompts.get_prompt). The old agent/core/sleep_prompts.py is removed.
Sleep pipeline split — agent/sleep/orchestration.py now delegates to four step modules (steps_extract.py, steps_analyze.py, steps_maintain.py, steps_output.py). Public entry point run(owner_id=...) is unchanged for CLI callers.
Admin account seeded on init — setup_db.py creates an accounts row at id=1 with name taken from settings.yaml.admin_name or your OS user (whoami). Idempotent — re-running is safe.
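The idempotent admin seed can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in setup_db.py: it uses an in-memory SQLite database as a stand-in for PostgreSQL, a two-column `accounts` table is assumed, and `resolve_admin_name` and `seed_admin` are hypothetical names.

```python
import getpass
import sqlite3


def resolve_admin_name(settings: dict) -> str:
    """Use settings['admin_name'] when set, else fall back to the OS user."""
    return settings.get("admin_name") or getpass.getuser()


def seed_admin(conn: sqlite3.Connection, settings: dict) -> None:
    """Create the accounts table if needed and insert the admin row once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # ON CONFLICT DO NOTHING makes a second run a no-op instead of an error.
    conn.execute(
        "INSERT INTO accounts (id, name) VALUES (1, ?) ON CONFLICT (id) DO NOTHING",
        (resolve_admin_name(settings),),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
seed_admin(conn, {"admin_name": "jk"})
seed_admin(conn, {"admin_name": "someone-else"})  # re-run changes nothing
rows = conn.execute("SELECT id, name FROM accounts").fetchall()
print(rows)  # [(1, 'jk')]
```

The same `ON CONFLICT (id) DO NOTHING` clause is also valid PostgreSQL, which is why it was chosen for the illustration.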

Features

Import your locally exported ChatGPT / Claude / Gemini conversation history into the database
LLM-powered profile extraction (remote LLM API or local Ollama)
Contradiction detection & timeline tracking
Monthly snapshot viewer
Relationship mapping
Local web viewer (Chinese / English / Japanese)
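As an illustration of how contradiction detection and timeline tracking can fit together, the sketch below closes a superseded fact instead of deleting it, so the history stays queryable. `TimelineFact` and `record` are hypothetical names, and the rule "a different value for the same category is a contradiction" is a simplifying assumption rather than the project's actual logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TimelineFact:
    category: str
    value: str
    start: date
    end: Optional[date] = None  # None means "still current"


def record(timeline: list[TimelineFact], category: str, value: str, seen: date) -> bool:
    """Add an observation; return True when it contradicted the current value."""
    current = next(
        (f for f in timeline if f.category == category and f.end is None), None
    )
    if current is not None and current.value == value:
        return False  # same value again, nothing to change
    if current is not None:
        current.end = seen  # close the superseded fact, keep it in the history
    timeline.append(TimelineFact(category, value, seen))
    return current is not None


timeline: list[TimelineFact] = []
first = record(timeline, "city", "Osaka", date(2023, 4, 1))
second = record(timeline, "city", "Tokyo", date(2024, 9, 1))
print(first, second)  # False True
```

After the second call the Osaka entry has an end date of 2024-09-01 and the Tokyo entry is the open one, which is the kind of record a monthly snapshot could be built from.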

Quick Try with Docker

Don't want to install Python or PostgreSQL? Run with Docker — includes demo data, supports OpenAI / DeepSeek / Groq.

Prerequisites

Python 3.11 or 3.12
PostgreSQL
LLM API Key (e.g. OpenAI, Anthropic) or local Ollama

Quick Start (from source)

# 1. Clone the repository
git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

# 2. Create virtual environment and install dependencies
python3 -m venv .venv
source .venv/bin/activate        # macOS / Linux
# .venv\Scripts\activate         # Windows

pip install -r requirements.txt

# 3. Configure
# Edit settings.yaml:
#   - database.user: your PostgreSQL username
#     On macOS with Homebrew this is usually your system username (run whoami in a terminal)
#     On Linux and Windows it is usually postgres
#   - openai.api_key: your API key (or set llm_provider to "local" to use Ollama)

# 4. Initialize database
# This creates all tables needed for both this project and the Riverse main project.
# If you have already run Riverse's schema.sql, you can skip this step.
python setup_db.py --db Riverse

# 5. Import conversation data
# Place your export files in data/ (see data/README.md for details)
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json
python import_data.py --gemini "data/Gemini/My Activity.html"
# Note: The Gemini export filename varies by language. Adjust the filename accordingly.

# 6. Run profile extraction
#    Format: python run.py <source> <count> [--owner-name <name>]
#    source:       chatgpt / claude / gemini / all
#    count:        a number = process N conversations starting from the oldest
#                  max     = process all conversations
#    --owner-name: which family member (account) to write the extracted data under.
#                  Optional — auto-selected if only one account exists.
#                  Required if the DB has multiple accounts (e.g. shared with JKRiver in family mode).
#    All commands process conversations in chronological order (oldest first)

python run.py chatgpt 50                          # ChatGPT only, 50 oldest, auto-pick owner
python run.py claude max --owner-name jk          # Claude all, written under 'jk'
python run.py gemini 100 --owner-name wife        # Gemini 100 oldest, written under 'wife'
python run.py all max                              # All 3 sources mixed by time, auto-pick owner

# 7. View results
python web.py --db Riverse
# Open http://localhost:2345 in your browser

# Optional: protect with an access token (recommended for server deployment)
ACCESS_TOKEN=your-secret python web.py --db Riverse
# First visit will redirect to an unlock page — enter the token to access
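The "all sources mixed by time" ordering and the <count> argument from step 6 can be sketched as a merge of already-sorted lists. The rows and the `pick_conversations` helper below are made up for illustration; how run.py actually queries and orders conversations is not shown in this README.

```python
import heapq
from itertools import islice

# Hypothetical (timestamp, source) rows, each list already sorted oldest first.
chatgpt = [("2023-01-05", "chatgpt"), ("2023-06-01", "chatgpt")]
claude = [("2023-03-10", "claude")]
gemini = [("2022-12-20", "gemini"), ("2023-07-15", "gemini")]


def pick_conversations(sources: list[list[tuple[str, str]]], count: str) -> list[tuple[str, str]]:
    """Merge sources chronologically, then keep the N oldest (or all for 'max')."""
    merged = heapq.merge(*sources, key=lambda row: row[0])
    if count == "max":
        return list(merged)
    return list(islice(merged, int(count)))


oldest_three = pick_conversations([chatgpt, claude, gemini], "3")
everything = pick_conversations([chatgpt, claude, gemini], "max")
print([source for _, source in oldest_three])  # ['gemini', 'chatgpt', 'claude']
```

ISO-formatted date strings sort correctly as plain text, which keeps the example short; real timestamps would more likely be datetime values.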

Note: run.py no longer clears profile tables automatically. To start over, run python reset_db.py first (it leaves your imported source data — chatgpt/claude/gemini/demo tables — untouched).
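The --owner-name rule from step 6 (optional with a single account, required with several) amounts to a small lookup. The sketch below is an assumption about the behavior, not the code in agent/config/owner.py; `resolve_owner_id` and the name-to-id dict standing in for the accounts table are hypothetical.

```python
from typing import Optional


def resolve_owner_id(accounts: dict[str, int], owner_name: Optional[str]) -> int:
    """Map an optional --owner-name value to an account id."""
    if owner_name is not None:
        if owner_name not in accounts:
            raise ValueError(f"unknown account: {owner_name!r}")
        return accounts[owner_name]
    if len(accounts) == 1:
        return next(iter(accounts.values()))  # auto-pick the only account
    raise ValueError("--owner-name is required unless exactly one account exists")


single = resolve_owner_id({"jk": 1}, None)
named = resolve_owner_id({"jk": 1, "wife": 2}, "wife")

error = None
try:
    resolve_owner_id({"jk": 1, "wife": 2}, None)
except ValueError as exc:
    error = str(exc)

print(single, named, error is not None)  # 1 2 True
```

Failing loudly in the ambiguous case is the safer choice here, since silently writing one family member's history under another account would be hard to undo.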

No Chat Data? Try the Demo

The project includes built-in test data, so you can experience the full workflow without exporting your own AI chat history:

Dataset	Character	Language	Sessions	Command
`--demo`	Lin Yutong	Chinese	50	`python import_data.py --demo`
`--demo2`	Shen Yifan	Chinese	15	`python import_data.py --demo2`
`--demo3`	Jake Morrison	English	20	`python import_data.py --demo3`

--demo2 and --demo3 clear the demo table before importing.

python setup_db.py                  # Create database and tables
python import_data.py --demo        # Import demo test data (or --demo2 / --demo3)
python run.py demo max              # Process all demo conversations
python web.py --db Riverse          # View the extracted profile

Reset Profile Data

Clear all processing and profile tables while keeping imported source data (chatgpt/claude/gemini/demo tables are not affected):

python reset_db.py                  # Clear profile data, keep source data
python reset_db.py --db mydb        # Specify database name

Exporting Conversations

Platform	Steps
ChatGPT	Settings → Data controls → Export data → Extract `conversations.json`
Claude	Settings → Account → Export Data → Extract `conversations.json`
Gemini	Google Takeout → Select Gemini Apps → Put `Gemini Apps` folder into `data/`

LLM Configuration

OpenAI API (recommended): Set llm_provider: "openai" in settings.yaml and enter your API key.

Local Ollama: Install Ollama, pull a model with ollama pull qwen2.5:14b, and set llm_provider: "local".

Prompt language: Set the language field in settings.yaml. Supported values: "zh" (Chinese), "en" (English), "ja" (Japanese). This controls the language of LLM prompts, not the web interface.
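A quick sanity check of these settings before a long run can save a failed batch. The sketch below only covers the values named in this README ("openai" and "local" for llm_provider; "zh", "en", "ja" for language); the project may accept other values, and `check_settings` is a hypothetical helper operating on an already-loaded settings dict.

```python
# Values named in this README; the real project may support more.
SUPPORTED_PROVIDERS = {"openai", "local"}
SUPPORTED_LANGUAGES = {"zh", "en", "ja"}


def check_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    provider = settings.get("llm_provider")
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"llm_provider {provider!r} is not one of {sorted(SUPPORTED_PROVIDERS)}")
    if provider == "openai" and not (settings.get("openai") or {}).get("api_key"):
        problems.append("openai.api_key is empty")
    if settings.get("language") not in SUPPORTED_LANGUAGES:
        problems.append(f"language must be one of {sorted(SUPPORTED_LANGUAGES)}")
    return problems


good = {"llm_provider": "local", "language": "en"}
bad = {"llm_provider": "openai", "openai": {"api_key": ""}, "language": "fr"}
print(check_settings(good))       # []
print(len(check_settings(bad)))   # 2
```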

Project Structure

├── settings.yaml        # LLM and database configuration
├── setup_db.py          # Initialize database and tables
├── import_data.py       # Import conversation exports into database
├── run.py               # Run profile extraction (perceive + sleep)
├── web.py               # Local web viewer (Flask, port 2345)
├── reset_db.py          # Clear profile tables, keep source data
├── requirements.txt     # Python dependencies
├── data/                # Conversation export files (git-ignored)
│   ├── demo.json        # Demo: Lin Yutong (Chinese, 50 sessions)
│   ├── demo2.json       # Demo: Shen Yifan (Chinese, 15 sessions)
│   └── demo3.json       # Demo: Jake Morrison (English, 20 sessions)
├── agent/
│   ├── perceive.py      # Perception module — classify user input
│   ├── config/
│   │   ├── __init__.py  # settings.yaml loader
│   │   ├── owner.py     # --owner-name → owner_id resolution (reads `accounts`)
│   │   ├── prompts.py   # Multilingual prompt loader
│   │   └── prompts/     # zh.yaml / en.yaml / ja.yaml prompt strings
│   ├── storage/         # Database operations (modular subpackage, owner_id-aware)
│   │   ├── _db.py       # Connection & helpers
│   │   ├── profile.py   # Profile facts CRUD (replaces old hypotheses module)
│   │   ├── observations.py, events.py, conversation.py, ...
│   │   └── parsing.py   # History format parsers (Claude/ChatGPT/Gemini)
│   ├── utils/           # LLM client, embedding, clustering
│   └── sleep/           # Offline extraction pipeline (modular subpackage)
│       ├── orchestration.py  # run(owner_id) — single-owner entry point
│       ├── _pipeline_state.py # Shared state struct (carries owner_id)
│       ├── steps_extract.py  # Step 1-2: extract observations + tags
│       ├── steps_analyze.py  # Step 3-5: classify, behavior, cross-verify
│       ├── steps_maintain.py # Step 6-7: edges, expiry, maturity decay
│       ├── steps_output.py   # Step 8-end: user_model, trajectory, snapshot
│       ├── extractors.py     # LLM-driven extraction helpers
│       ├── analysis.py       # LLM-driven analysis helpers
│       ├── disputes.py       # Contradiction resolution
│       └── trajectory.py     # Life trajectory summary
└── templates/
    └── profile.html     # Web viewer template
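The layout of agent/sleep/ suggests an orchestrator that passes one shared state object through a fixed sequence of step groups. The sketch below shows that pattern in isolation; the step bodies are placeholders and `PipelineState` is a simplified stand-in, so treat it as an illustration of the structure rather than the project's code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Shared state handed from step to step (simplified stand-in)."""
    owner_id: int
    log: list[str] = field(default_factory=list)


# Placeholder steps; the real modules do LLM-driven work at each stage.
def extract(state: PipelineState) -> None:
    state.log.append("extract")


def analyze(state: PipelineState) -> None:
    state.log.append("analyze")


def maintain(state: PipelineState) -> None:
    state.log.append("maintain")


def output(state: PipelineState) -> None:
    state.log.append("output")


STEPS: list[Callable[[PipelineState], None]] = [extract, analyze, maintain, output]


def run(owner_id: int) -> PipelineState:
    """Single-owner entry point: run every step in order on one state object."""
    state = PipelineState(owner_id=owner_id)
    for step in STEPS:
        step(state)
    return state


result = run(owner_id=1)
print(result.log)  # ['extract', 'analyze', 'maintain', 'output']
```

Carrying owner_id on the state object means no step needs its own owner parameter, which matches the note above that the shared state struct carries owner_id.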

License

License	Usage
AGPL-3.0	Open source, modifications must be open-sourced
Commercial	Contact: mailwangjk@gmail.com

Contact

X (Twitter): @JKRiverse
Discord: Join
Email: mailwangjk@gmail.com
