Next-Generation Semantic Retrieval Operating System
High Performance · Multi-Stage Retrieval · Graph-RAG · Memory-Enhanced · Self-Evolving
- Project Overview
- Core Features
- System Architecture
- Project Structure
- Quick Start
- Core Components
- SharpAIKit
- SA-AgentOS
- API Documentation
- Testing
- Performance Comparison
- Contributing
- License
SA-RAG is an enterprise-grade Semantic Retrieval Augmented Generation (RAG) framework built with a Rust + Python + C# three-tier architecture, providing:
- 🚀 High-Performance Retrieval Engine: Rust-based core retrieval system
- 🧠 Intelligent Orchestration Layer: Python implementation for LLM integration and query analysis
- 🤖 Agent System: C# Agent framework based on SharpAIKit
- 📊 Multi-Stage Retrieval: Hybrid retrieval combining Vector + BM25 + Graph + Memory
- 🔄 Self-Evolving Capabilities: Learning ranker, consistency checking, autonomous debugging
SA-RAG is not just a RAG framework, but a Semantic Retrieval Operating System with:
- Adaptive: Automatically optimizes retrieval strategies based on data
- Explainable: Complete execution graph tracking and visualization
- Extensible: Plugin-based architecture supporting custom components
- Self-Optimizing: Automatically learns optimal ranking weights
- Learnable: Continuously improves retrieval quality from data
| Feature | Description | Implementation |
|---|---|---|
| Semantic Node Engine | Semantic node segmentation engine | Rust |
| Hybrid Retrieval | Vector + BM25 hybrid retrieval | Rust |
| Graph-RAG | Knowledge graph-enhanced retrieval | Rust |
| Multi-Stage Ranking | Multi-stage ranking fusion | Rust |
| Memory Store | Long-term memory management | Rust |
| Diff Index | Incremental index updates | Rust |
| Feature | Description | Implementation |
|---|---|---|
| Self-Evolving Ranker | Self-evolving ranker (RL + contrastive learning) | Rust + Python |
| Execution Graph | Semantic execution graph visualization | Rust + Python |
| Cognitive Memory | Three-tier memory system (short-term/long-term/semantic consolidation) | Rust + Python |
| Multimodal Nodes | Multimodal node support (images/tables/code/formulas) | Rust |
| Feature Store | High-dimensional feature storage (versioned + TTL) | Rust |
| Autonomous Debugger | Autonomous retrieval debugger | Rust + Python |
| Consistency Checker | Retrieval consistency validator | Rust + Python |
| Plugin System | Plugin-based extensible architecture | Rust + Python |
Complete System Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ SA-AgentOS (C# Agent Layer) │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Agent │ │ Planner │ │ Memory │ │
│ │ Pipeline │ │ │ │ Tool │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ └────────────────┴────────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ SharpAIKit Tools │ │
│ │ ┌──────────────┐ │ │
│ │ │ Knowledge │ │ │
│ │ │ Retrieval │ │ │
│ │ └──────┬───────┘ │ │
│ │ ┌──────┴───────┐ │ │
│ │ │ DeepSeek │ │ │
│ │ │ Reasoning │ │ │
│ │ └──────────────┘ │ │
└──────────────┼────────────────────┼──────────────────────────────┘
│ HTTP/gRPC │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ Python Orchestration Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Query │ │ Structured│ │ Embedding│ │ LLM │ │
│ │ Analyzer │ │ Query │ │ Generator│ │ Service │ │
│ └────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │ │
│ └──────────────┴───────────────┴───────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ RAG Pipeline │ │
│ │ (Orchestrator) │ │
└──────────────┼─────────────────────┼──────────────────────────────┘
│ PyO3 / FFI │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ Rust Core Engine (High Performance) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Semantic │ │ Vector │ │ BM25 │ │ Graph │ │
│ │ Node │ │ Index │ │ Index │ │ RAG │ │
│ └────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │ │
│ └──────────────┴───────────────┴───────────────┘ │
│ │ │
│ ┌──────────┴──────────┐ │
│ │ Multi-Stage │ │
│ │ Ranking Engine │ │
│ └─────────────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Learning │ │Execution │ │ Feature │ │ Memory │ │
│ │ Ranker │ │ Graph │ │ Store │ │ System │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
SA-RAG/
├── SA_RAG/ # Core RAG Framework
│ ├── src/ # Rust Core Engine
│ │ ├── lib.rs # Main entry (PyO3 bindings)
│ │ ├── semantic_node.rs # Semantic node engine
│ │ ├── parser.rs # Document parser
│ │ ├── indexer/ # Indexer modules
│ │ │ ├── vector.rs # HNSW vector index
│ │ │ ├── bm25.rs # BM25 full-text index
│ │ │ ├── hybrid.rs # Hybrid retrieval engine
│ │ │ └── multi_stage.rs # Multi-stage ranking
│ │ ├── graph/ # Graph retrieval module
│ │ │ ├── graph.rs # Knowledge graph
│ │ │ └── expansion.rs # Graph expansion algorithms
│ │ ├── memory.rs # Memory store
│ │ ├── diff.rs # Incremental indexing
│ │ ├── learning_ranker/ # Self-evolving ranker
│ │ ├── execution_graph/ # Execution graph
│ │ ├── feature_store/ # Feature store
│ │ ├── consistency/ # Consistency checker
│ │ ├── debugger/ # Autonomous debugger
│ │ └── plugins/ # Plugin system
│ ├── python/ # Python Orchestration Layer
│ │ ├── sa_rag/
│ │ │ ├── __init__.py # RAG main class
│ │ │ ├── client.py # Client API
│ │ │ ├── rag.py # RAG pipeline
│ │ │ ├── orchestrator.py # Orchestrator
│ │ │ ├── query_analysis.py # Query analysis
│ │ │ ├── llm.py # LLM service
│ │ │ ├── embedding.py # Embedding generation
│ │ │ ├── api_server.py # FastAPI server
│ │ │ └── [next-gen modules] # Next-generation feature modules
│ │ └── pyproject.toml
│ ├── examples/ # Example code
│ ├── tests/ # Test suite
│ └── Cargo.toml
│
├── sa_agentos/ # Agent System
│ ├── agent_csharp/ # C# Agent Project
│ │ └── SaAgentOS/
│ │ ├── Program.cs # Main program
│ │ ├── Agents/ # Agent implementation
│ │ ├── Tools/ # Tool collection
│ │ ├── Pipelines/ # Pipeline orchestration
│ │ └── Config/ # Configuration files
│ └── python_server/ # Python HTTP Service
│ ├── server.py # FastAPI server
│ └── tests/ # Tests
│
├── SharpAIKit/ # C# AI Framework (Dependency)
│ ├── src/SharpAIKit/ # Core library
│ │ ├── Agent/ # Agent framework
│ │ ├── LLM/ # LLM client
│ │ ├── Memory/ # Memory management
│ │ └── [other modules]
│ └── samples/ # Sample projects
│
└── docs/ # Documentation
├── architecture.md # Architecture documentation
├── api_reference.md # API reference
└── NEXT_GEN_FEATURES.md # Next-generation features documentation
- Rust: 1.70+ (for compiling core engine)
- Python: 3.10+ (recommended to use
uvpackage manager) - .NET: 8.0+ (for Agent system)
- DeepSeek API Key: for LLM reasoning
cd SA_RAG
cargo build --releasecd SA_RAG/python
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .cd SA_RAG/tests
uv run pytest test_framework_validation.py -vcd SA_RAG/python
uv run python -m sa_rag.api_server# 1. Start Python RAG service
cd sa_agentos/python_server
uv run python server.py
# 2. Configure DeepSeek API Key (in appsettings.json)
# 3. Run C# Agent
cd sa_agentos/agent_csharp/SaAgentOS
dotnet run- Semantic Node Engine: Intelligent document segmentation
- Hybrid Index: Vector + BM25 hybrid index
- Graph Engine: Knowledge graph construction and querying
- Multi-Stage Ranker: Multi-stage ranking fusion
- Memory Store: Long-term memory management
- Diff Index: Incremental index updates
- Query Analyzer: LLM-driven query analysis
- Structured Query Builder: Structured query generation
- Orchestrator: Retrieval strategy orchestration
- Embedding Generator: Embedding vector generation
- RAG Pipeline: End-to-end RAG workflow
- SaRagAgent: Core Agent class (based on SharpAIKit)
- KnowledgeRetrievalTool: Knowledge retrieval tool
- DeepSeekReasoningTool: DeepSeek reasoning tool
- MemoryTool: Memory management tool
- SaAgentPipeline: Agent execution pipeline
- FastAPI Server: Provides REST API
- RAG Query Endpoint:
/rag/query - Memory Management:
/rag/memory - Health Check:
/health
SharpAIKit is a unified .NET large-model application and agentic AI development framework that powers the SA-AgentOS system. It provides:
- ✅ Unified LLM Interface: One API for all OpenAI-compatible models (OpenAI, DeepSeek, Qwen, Mistral, etc.)
- ✅ LCEL-style Chains: Elegant chain composition with pipe operators
- ✅ Multiple Memory Strategies: Buffer, Window, Summary, Vector, and Entity memory
- ✅ Advanced Agents: ReAct, Plan-and-Execute, and Multi-Agent systems
- ✅ Native C# Code Interpreter: Execute C# code directly using Roslyn, no Python needed
- ✅ SharpGraph: Graph-based orchestration with FSM, supports loops and complex branches
- ✅ DSPy-style Optimizer: Automatically optimize prompts through iterative improvement
SA-AgentOS leverages SharpAIKit's powerful Agent framework:
- Tool System: All SA-RAG tools inherit from
SharpAIKit.Agent.ToolBase - Agent Framework:
SaRagAgentusesSharpAIKit.Agent.AiAgentfor tool orchestration - LLM Client: Uses
SharpAIKit.LLM.LLMClientFactoryfor DeepSeek integration - Memory Management: Uses
SharpAIKit.Memory.BufferMemoryfor conversation history
For more information about SharpAIKit, see: SharpAIKit/README.md
SharpAIKit is a unified .NET large-model application and agentic AI development framework that powers the SA-AgentOS system. It's the .NET equivalent of LangChain, but with killer features that leverage the .NET ecosystem.
- ✅ Unified LLM Interface: One API for all OpenAI-compatible models (OpenAI, DeepSeek, Qwen, Mistral, Yi, Groq, Moonshot, Ollama, etc.)
- ✅ LCEL-style Chains: Elegant chain composition with pipe operators
- ✅ Multiple Memory Strategies: Buffer, Window, Summary, Vector, and Entity memory
- ✅ Advanced Agents: ReAct, Plan-and-Execute, and Multi-Agent systems
- ✅ Native C# Code Interpreter: Execute C# code directly using Roslyn, no Python needed 🔮
- ✅ SharpGraph: Graph-based orchestration with FSM, supports loops and complex branches 🕸️
- ✅ DSPy-style Optimizer: Automatically optimize prompts through iterative improvement 🧬
- 🔮 Native C# Code Interpreter: Execute C# code directly using Roslyn - no Python dependency, blazing fast!
- 🕸️ SharpGraph: Graph orchestration with loops and complex branches - more powerful than LangGraph
- 🧬 DSPy Optimizer: Auto-optimize prompts, gets smarter over time
SA-AgentOS leverages SharpAIKit's powerful Agent framework:
- Tool System: All SA-RAG tools inherit from
SharpAIKit.Agent.ToolBase- Tools are automatically discovered via
[Tool]and[Parameter]attributes - Seamless integration with Agent's tool calling mechanism
- Tools are automatically discovered via
- Agent Framework:
SaRagAgentusesSharpAIKit.Agent.AiAgentfor tool orchestration- Automatic tool discovery and execution
- Built-in conversation history management
- LLM Client: Uses
SharpAIKit.LLM.LLMClientFactoryfor DeepSeek integration- Unified interface for all OpenAI-compatible APIs
- Streaming support
- Memory Management: Uses
SharpAIKit.Memory.BufferMemoryfor conversation history- Automatic context management
- Configurable history length
using SharpAIKit.LLM;
// Create client for any OpenAI-compatible API
var client = LLMClientFactory.Create(
apiKey: "your-api-key",
baseUrl: "https://api.deepseek.com/v1",
model: "deepseek-chat"
);
// Simple chat
var response = await client.ChatAsync("Hello!");
// Streaming output
await foreach (var chunk in client.ChatStreamAsync("Tell me a story"))
{
Console.Write(chunk);
}| Provider | Base URL | Preset Method |
|---|---|---|
| OpenAI | https://api.openai.com/v1 |
CreateOpenAI() |
| DeepSeek | https://api.deepseek.com/v1 |
CreateDeepSeek() |
| Qwen (Alibaba) | https://dashscope.aliyuncs.com/compatible-mode/v1 |
CreateQwen() |
| Mistral | https://api.mistral.ai/v1 |
CreateMistral() |
| Yi (01.AI) | https://api.lingyiwanwu.com/v1 |
CreateYi() |
| Groq | https://api.groq.com/openai/v1 |
CreateGroq() |
| Moonshot (Kimi) | https://api.moonshot.cn/v1 |
CreateMoonshot() |
| Ollama (Local) | http://localhost:11434 |
CreateOllama() |
| Any OpenAI-compatible | Custom | Create(key, url, model) |
For more information about SharpAIKit, see: SharpAIKit/README.md
SA-AgentOS is an intelligent agent system built on top of SA-RAG Core and SharpAIKit, providing:
- ✅ Planning: Intelligent decision-making about when to retrieve knowledge
- ✅ World Model (Memory): Long-term memory management
- ✅ Knowledge Retrieval: Calls SA-RAG for multi-stage retrieval
- ✅ Execution Graph: Visualizes Agent reasoning process
- ✅ Self-Correction: Built-in debugging and reflection capabilities
- ✅ Multi-Turn Dialogue: Supports context-aware conversations
# Start Agent
cd sa_agentos/agent_csharp/SaAgentOS
dotnet run
# Interactive dialogue
> Hello, I'm John
> What's my name? # Agent remembers: Your name is John
> Please help me find information about Python
> clear # Clear conversation history
> exit # ExitFor detailed documentation, see: sa_agentos/README.md
| Feature | SA-RAG | LangChain | LlamaIndex | Haystack | Chroma | Weaviate |
|---|---|---|---|---|---|---|
| Performance | ||||||
| Rust Core Engine | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Native Speed | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ |
| Core Features | ||||||
| Semantic Node Parsing | ✅ | ✅ | ❌ | ❌ | ||
| Multi-stage Retrieval | ✅ | ✅ | ✅ | ❌ | ❌ | |
| Graph-RAG | ✅ | ✅ | ❌ | ❌ | ||
| Differential Indexing | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Long-term Memory | ✅ | ❌ | ❌ | |||
| Hybrid Retrieval | ✅ | ✅ | ✅ | ✅ | ||
| Next-Gen Features | ||||||
| Self-Evolving Ranker | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Execution Graph | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Cognitive Memory System | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Multimodal Support | ✅ | ❌ | ||||
| Feature Store | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Autonomous Debugger | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Consistency Checker | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Plugin System | ✅ | ❌ | ❌ | |||
| Architecture | ||||||
| Python + Rust Hybrid | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| PyO3 Integration | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Extensible Plugin API | ✅ | ❌ | ❌ | |||
| Performance Metrics | ||||||
| Indexing Speed | ~1000 docs/s | ~100 docs/s | ~200 docs/s | ~150 docs/s | ~500 docs/s | ~800 docs/s |
| Retrieval Latency | <10ms | ~50ms | ~30ms | ~40ms | ~20ms | ~15ms |
| Memory Efficiency | High | Medium | Medium | Medium | High | High |
Legend:
- ✅ Full support
⚠️ Partial support or requires additional setup- ❌ Not supported
- Performance: Rust core engine provides native-speed operations, 5-10x faster than pure Python implementations
- Self-Learning: Unique self-evolving ranker learns optimal retrieval strategies from your data
- Explainability: Execution graph visualization shows exactly how queries are processed
- Cognitive Memory: Three-tier memory system (short-term/long-term/semantic) with automatic consolidation
- Autonomous Debugging: Automatically identifies and suggests fixes for retrieval issues
- Consistency: Built-in drift detection ensures reproducible and reliable retrieval
- Extensibility: Comprehensive plugin system for custom rankers, parsers, and policies
- Multimodal-Ready: Native support for images, tables, code blocks, and formulas
cd SA_RAG/tests
uv run pytest test_framework_validation.py -v- ✅ Framework initialization tests
- ✅ Document indexing tests
- ✅ Search functionality tests
- ✅ Q&A functionality tests
- ✅ Memory management tests
- ✅ Document update tests
- ✅ Next-generation feature tests
- ✅ Error handling tests
- ✅ Performance tests
- ✅ Integration tests
from sa_rag import RAG
# Initialize
rag = RAG(llm_provider="deepseek", embedding_provider="deepseek")
# Index documents
rag.index_documents(["Document 1", "Document 2"])
# Search
results = rag.search("query", top_k=5, use_graph=True, use_memory=True)
# Q&A
answer = rag.ask("question", top_k=5, use_graph=True, use_memory=True)
# Add memory
rag.add_memory("Important information", importance=0.8)# Query
curl -X POST http://localhost:8000/rag/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is Python?",
"use_graph": true,
"use_memory": true,
"top_k": 6
}'
# Add memory
curl -X POST "http://localhost:8000/rag/memory?text=User likes Python&importance=0.8"For detailed API documentation, see: docs/api_reference.md
cd SA_RAG
cargo build --releasecd SA_RAG/python
uv sync --dev
uv run pytestcd sa_agentos/agent_csharp/SaAgentOS
dotnet build
dotnet test- Rust core engine
- Python orchestration layer
- Multi-stage retrieval
- Graph-RAG
- Memory management
- Incremental indexing
- Next-generation features (8 major modules)
- SA-AgentOS Agent system
- Conversation history support
- Web UI console
- Distributed deployment support
- More LLM provider support
- Performance optimization
- More examples and tutorials
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the MIT License. See the LICENSE file for details.
- SharpAIKit: C# AI framework support
- PyO3: Rust-Python bindings
- FastAPI: Python web framework
- DeepSeek: LLM service provider
SA-RAG - Next-Generation Semantic Retrieval Operating System
⭐ If this project helps you, please give it a Star!

