Skip to content

fix(rag): restore chat history and flashcards during session recovery#568

Merged
FireFistisDead merged 1 commit into
FireFistisDead:masterfrom
Sandeep6135:fix/session-recovery-data-loss
Jun 18, 2026
Merged

fix(rag): restore chat history and flashcards during session recovery#568
FireFistisDead merged 1 commit into
FireFistisDead:masterfrom
Sandeep6135:fix/session-recovery-data-loss

Conversation

@Sandeep6135

@Sandeep6135 Sandeep6135 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Pull Request: Resolve Critical Data Loss inside Session Recovery Flow

Closes #561

📌 Classification & Priority

Metric Value
Change Type bug-fix
Severity Level critical
Target Service rag-service
Database Impact exceptional-data-protection
Fix Strategy Native Code Patch (no external dependencies)

📖 Summary

Important

This PR resolves a critical logical defect in the session recovery flow where a user's entire chat history and flashcards were permanently deleted from disk when an evicted session or restarted session was recovered from the registry database.

🔴 Problem

When an inactive session was recovered using _recover_session_unlocked, it restored metadata values from the general session registry but omitted loading the "chat" and "flashcards" fields. Consequently, the recovered in-memory session object lacked these keys. When the background thread next ran _snapshot_session_for_persistence, it resolved the missing fields to empty lists ([]), overwrote session_meta.json on disk, and wiped out the user's historical data.

🟢 Solution

Updated the session recovery function to:

  1. Check for the existence of <session_dir>/session_meta.json inside the recovered session directory.
  2. Read and parse the session metadata files to fetch the historical chat and flashcards states.
  3. Validate the retrieved data formats using normalize_chat_history.
  4. Properly instantiate the recovered session with the restored history.

🧪 Steps to Reproduce

  1. Ingest a PDF document and start a chat session.
  2. Execute multiple chat interactions to populate chat and flashcards history.
  3. Wait for cache eviction or manually trigger a session reload from the registry.
  4. Call /validate-session-write or /ask to recover the session from registry.
  5. Wait for the background flush thread (default: 10s).
  6. Verify the contents of session_meta.json in the database directory.

🔍 Expected Behaviour

The session recovery routine retrieves the existing session_meta.json file on disk and fully restores the user's chat history and flashcards.

❌ Actual Behaviour (Before Fix)

The chat log and flashcards were missing during the active session after recovery, and the corresponding lists were overwritten with empty arrays ([]) in the JSON file on disk, permanently wiping out user data.


🛠️ Code Diff Walkthrough

rag-service/main.py

@@ -1255,6 +1255,20 @@
         remove_persisted_session(session_id, session_dir)
         return None
 
+    chat = []
+    flashcards = []
+    meta_path = os.path.join(session_dir, "session_meta.json")
+    if os.path.isfile(meta_path):
+        try:
+            with open(meta_path, "r", encoding="utf-8") as f:
+                per = json.load(f)
+            if isinstance(per.get("chat"), list):
+                chat = normalize_chat_history(per["chat"])
+            if isinstance(per.get("flashcards"), list):
+                flashcards = per["flashcards"]
+        except Exception as e:
+            logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)
+
     try:
         vectorstore = _load_vectorstore_from_snapshot(session_id, get_embedding_model())
     except Exception:
@@ -1269,6 +1269,8 @@
         "session_dir": session_dir,
         "created_at": float(entry.get("created_at", last_accessed) or last_accessed),
         "last_accessed": last_accessed,
+        "chat": chat,
+        "flashcards": flashcards,
     }


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Bug Fixes**
  * Sessions now properly restore chat history and flashcards data during recovery operations, preserving user interactions.
  * Enhanced error handling with logging when session data recovery encounters issues.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Copilot AI review requested due to automatic review settings June 18, 2026 17:24
@vercel

vercel Bot commented Jun 18, 2026

Copy link
Copy Markdown

@Sandeep6135 is attempting to deploy a commit to the firefistisdead's projects Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added bug Something isn't working enhancement New feature or request fix A targeted fix or cleanup level:critical rag-service FastAPI / model service work labels Jun 18, 2026
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

In _recover_session_unlocked, 16 lines are added to read session_meta.json from the session directory during recovery. chat and flashcards are initialized to empty lists, populated from the file when present (with chat normalized via normalize_chat_history), and then included in the returned session metadata dict.

Changes

Session Recovery Data Restoration

Layer / File(s) Summary
Load and normalize chat/flashcards during session recovery
rag-service/main.py
_recover_session_unlocked now initializes chat and flashcards to [], attempts to read session_meta.json from the session directory, normalizes loaded chat history via normalize_chat_history, and includes both values in the constructed session metadata dict; failures are caught and logged as warnings.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Possibly related issues

  • #561 [Bug]: Critical Data Loss during Session Recovery in RAG Service — This PR directly implements the recommended fix from issue #561: reading session_meta.json in _recover_session_unlocked and restoring chat and flashcards into the session metadata dict, preventing the background flush thread from overwriting persisted data with empty lists.

Suggested labels

backend

🐇 Hippity-hop, what a fix today,
Lost chats and flashcards? Not anymore, hooray!
The session awakes from its slumber so deep,
With history restored from its JSON sleep.
No data lost — every message we keep! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: restoring chat history and flashcards during session recovery, which is the core fix in this PR.
Description check ✅ Passed The description comprehensively covers the problem, solution, reproduction steps, and code diff. It follows the general structure with Summary, Issue link, and detailed context, though some template sections are not explicitly formatted.
Linked Issues check ✅ Passed The PR fully implements the fix recommended in issue #561: reading session_meta.json, recovering chat and flashcards, validating with normalize_chat_history, and populating these fields in the session metadata dictionary.
Out of Scope Changes check ✅ Passed All changes are scoped to the session recovery function in rag-service/main.py and directly address the data loss issue. No extraneous modifications are present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a data-loss bug in the RAG service’s session recovery path by restoring per-session chat history and flashcards from on-disk metadata before the session is re-hydrated into memory, preventing the background flush thread from overwriting history with empty lists.

Changes:

  • Load <session_dir>/session_meta.json during _recover_session_unlocked to restore chat and flashcards.
  • Normalize recovered chat via normalize_chat_history before storing it in the recovered session metadata.
  • Include restored chat/flashcards in the recovered in-memory meta dict.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rag-service/main.py
Comment on lines +1258 to +1270
chat = []
flashcards = []
meta_path = os.path.join(session_dir, "session_meta.json")
if os.path.isfile(meta_path):
try:
with open(meta_path, "r", encoding="utf-8") as f:
per = json.load(f)
if isinstance(per.get("chat"), list):
chat = normalize_chat_history(per["chat"])
if isinstance(per.get("flashcards"), list):
flashcards = per["flashcards"]
except Exception as e:
logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)
Comment thread rag-service/main.py
Comment on lines +1258 to +1260
chat = []
flashcards = []
meta_path = os.path.join(session_dir, "session_meta.json")

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rag-service/main.py`:
- Around line 1258-1271: The code initializes chat and flashcards as empty lists
and silently proceeds with those defaults if the metadata file exists but cannot
be read or parsed, which can cause data loss. To fix this fail-open
vulnerability, remove the default empty list initialization and instead only set
chat and flashcards if the metadata loads successfully. Wrap the entire metadata
loading logic (from opening the file through the isinstance checks) in a
try-except block that catches only specific expected exceptions like
JSONDecodeError and IOError rather than the broad Exception catch, and when any
of these expected exceptions occur and the file exists, return None or abort the
recovery process instead of proceeding with empty defaults. This ensures that
transient issues reading session_meta.json do not result in empty data being
persisted and overwriting existing chat/flashcard history.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 227f4dfb-00f5-47c5-aa9a-2a2bb34a32ad

📥 Commits

Reviewing files that changed from the base of the PR and between 5590b87 and 216b79d.

📒 Files selected for processing (1)
  • rag-service/main.py

Comment thread rag-service/main.py
Comment on lines +1258 to +1271
chat = []
flashcards = []
meta_path = os.path.join(session_dir, "session_meta.json")
if os.path.isfile(meta_path):
try:
with open(meta_path, "r", encoding="utf-8") as f:
per = json.load(f)
if isinstance(per.get("chat"), list):
chat = normalize_chat_history(per["chat"])
if isinstance(per.get("flashcards"), list):
flashcards = per["flashcards"]
except Exception as e:
logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail-open recovery still permits chat/flashcard data loss on metadata read errors.

On Line 1258 and Line 1259, defaults are []; on Line 1269, any exception during read/parse/normalize is swallowed, and recovery proceeds with those empty lists. That means transient I/O/JSON issues can still lead to empty chat/flashcards being persisted later, recreating the overwrite-loss path this PR is fixing.

Prefer fail-closed when session_meta.json exists but cannot be safely loaded (e.g., abort recovery/return None), and narrow caught exceptions to expected read/parse failures.

Suggested patch
-    chat = []
-    flashcards = []
+    chat = []
+    flashcards = []
     meta_path = os.path.join(session_dir, "session_meta.json")
     if os.path.isfile(meta_path):
         try:
             with open(meta_path, "r", encoding="utf-8") as f:
                 per = json.load(f)
             if isinstance(per.get("chat"), list):
                 chat = normalize_chat_history(per["chat"])
             if isinstance(per.get("flashcards"), list):
                 flashcards = per["flashcards"]
-        except Exception as e:
+        except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:
             logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)
+            return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
chat = []
flashcards = []
meta_path = os.path.join(session_dir, "session_meta.json")
if os.path.isfile(meta_path):
try:
with open(meta_path, "r", encoding="utf-8") as f:
per = json.load(f)
if isinstance(per.get("chat"), list):
chat = normalize_chat_history(per["chat"])
if isinstance(per.get("flashcards"), list):
flashcards = per["flashcards"]
except Exception as e:
logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)
chat = []
flashcards = []
meta_path = os.path.join(session_dir, "session_meta.json")
if os.path.isfile(meta_path):
try:
with open(meta_path, "r", encoding="utf-8") as f:
per = json.load(f)
if isinstance(per.get("chat"), list):
chat = normalize_chat_history(per["chat"])
if isinstance(per.get("flashcards"), list):
flashcards = per["flashcards"]
except (OSError, json.JSONDecodeError, UnicodeDecodeError) as e:
logger.warning("Failed to load session metadata during recovery for %s: %s", session_id, e)
return None
🧰 Tools
🪛 Ruff (0.15.17)

[warning] 1269-1269: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rag-service/main.py` around lines 1258 - 1271, The code initializes chat and
flashcards as empty lists and silently proceeds with those defaults if the
metadata file exists but cannot be read or parsed, which can cause data loss. To
fix this fail-open vulnerability, remove the default empty list initialization
and instead only set chat and flashcards if the metadata loads successfully.
Wrap the entire metadata loading logic (from opening the file through the
isinstance checks) in a try-except block that catches only specific expected
exceptions like JSONDecodeError and IOError rather than the broad Exception
catch, and when any of these expected exceptions occur and the file exists,
return None or abort the recovery process instead of proceeding with empty
defaults. This ensures that transient issues reading session_meta.json do not
result in empty data being persisted and overwriting existing chat/flashcard
history.

Source: Linters/SAST tools

@FireFistisDead FireFistisDead merged commit a55ae92 into FireFistisDead:master Jun 18, 2026
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request fix A targeted fix or cleanup gssoc:approved level:critical quality:exceptional rag-service FastAPI / model service work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Critical Data Loss during Session Recovery in RAG Service

3 participants