FarhadShariatzadeh/StackPilot

StackPilot

A production-grade Agentic AI orchestrator built with .NET 10 and Semantic Kernel 1.74 — featuring a full RAG pipeline, autonomous reasoning loops, stateful multi-turn chat, an embedded React UI, enterprise security, and cloud-ready deployment infrastructure.


What is StackPilot?

StackPilot is a fully engineered AI backend demonstrating senior-level system design across the entire AI application stack: from raw text ingestion through hybrid retrieval, autonomous tool-using agents, stateful conversation memory, observability, and enterprise governance — all served through an embedded React web UI that ships inside the same single binary.

It covers every layer a production AI system needs, not just "call an LLM and return the answer."


Quick Start — no install, no terminal needed

  1. Go to Releases and download the zip for your OS
  2. Unzip it
  3. Open .env in any text editor and set your OpenAI API key:
    OpenAI__ApiKey=sk-your-key-here
    OpenAI__ModelId=gpt-4o-mini
    OpenAI__EmbeddingModelId=text-embedding-3-small
    
  4. Double-click StackPilot.Api.exe (Windows) or run ./StackPilot.Api (Linux/macOS)
  5. Open http://localhost:5050 — the UI loads automatically
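
The keys in step 3 use a double underscore because .NET configuration treats `__` in environment-style variable names as the `:` section separator, so `OpenAI__ApiKey` maps to the `OpenAI:ApiKey` setting. The sketch below illustrates that mapping in Python. It is not StackPilot's loader (which is C# and not shown here), and the `parse_env` helper is a hypothetical name used only for illustration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # "__" in the variable name becomes the ":" section separator.
        settings[key.strip().replace("__", ":")] = value.strip()
    return settings


sample = """
# OpenAI settings
OpenAI__ApiKey=sk-your-key-here
OpenAI__ModelId=gpt-4o-mini
OpenAI__EmbeddingModelId=text-embedding-3-small
"""
config = parse_env(sample)
print(config["OpenAI:ModelId"])
```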

Windows SmartScreen warning: Click More info → Run anyway. This is expected for unsigned open-source binaries.


Architecture

graph TB
    User([User])

    subgraph UI [Embedded React UI - served from wwwroot]
        Chat[Chat Tab]
        Playground[Playground Tab]
        Logs[Log Ingestion Tab]
        AgentTab[Agent Tab]
    end

    subgraph API [ASP.NET Core Minimal API]
        Auth[JWT Auth + RBAC]
        Guard[Guardrail Service]
        PII[PII Masking]
        Tenant[Tenant Middleware]
    end

    subgraph RAG [RAG Pipeline]
        direction TB
        Ingest[Ingestion - SK TextChunker]
        Embed[Embedding - text-embedding-3-small]
        Store[Vector Store - IVectorStore]
        Hybrid[Hybrid Search - Vector + Keyword + RRF]
        Rerank[Reranker]
        Threshold[Score Threshold]
        Compress[Context Compressor]
        Cache[Semantic Cache]
        RagSvc[RagService]
    end

    subgraph Agent [Agentic Layer]
        direction TB
        Plugins[SK Plugins - StackPilot / LogSearch / GitHub]
        AgentSvc[AgentService - RunAsync + SolveAsync]
        Loop[Think-Act-Observe Loop - MaxIterations=5]
    end

    subgraph Memory [Stateful Memory]
        direction TB
        ChatHist[Chat History - IChatHistoryStore]
        Window[Sliding Window - MaxMessages=10]
        Summarise[Conversation Summariser]
        SemMem[Semantic Memory - Vector-stored summaries]
    end

    subgraph Async [Async Infrastructure]
        direction TB
        Queue[IJobQueue - Channel / Service Bus]
        Worker[IngestionWorker - BackgroundService]
        Status[Job Status API]
    end

    subgraph Obs [Observability]
        direction TB
        OTel[OpenTelemetry Tracing]
        Latency[Latency Tracker]
        Budget[Token Budget]
        Health[Health Checks]
        Audit[Audit Log]
    end

    LLM([OpenAI - gpt-4o / gpt-4o-mini])

    User --> UI --> API
    Auth --> Guard --> Tenant
    Tenant --> RAG
    Tenant --> Agent
    Tenant --> Memory
    PII --> Store

    Ingest --> Embed --> Store
    Store --> Hybrid --> Rerank --> Threshold --> Compress --> RagSvc
    Cache --> RagSvc
    RagSvc --> LLM

    Plugins --> AgentSvc --> Loop --> LLM
    ChatHist --> Window --> Summarise --> SemMem
    Memory --> LLM

    Queue --> Worker --> Store
    Worker --> Status

    RagSvc --> Latency
    RagSvc --> Budget
    AgentSvc --> Audit
    OTel --> Health

Feature Highlights

Phase 1 — Production RAG Engine

  • Text Chunking via SK TextChunker with configurable token size and overlap (IngestionOptions)
  • Embeddings Pipeline using OpenAI text-embedding-3-small via IEmbeddingGenerator<string, Embedding<float>> with 96-chunk batching to stay within OpenAI's 300k token/request limit
  • Hybrid Search combining cosine vector similarity + TF keyword scoring fused with Reciprocal Rank Fusion (RRF, k=60)
  • Score Thresholding — configurable quality gate rejects low-confidence chunks before prompting
  • Prompt Optimisation — 6-rule anti-hallucination system instruction, forbidden phrases, fully configurable
  • Response Caching: IResponseCache / MemoryResponseCache, Redis-swap-ready
  • OpenAPI + Scalar UI at /scalar/v1 — all endpoints typed, tagged, and described
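
To make the hybrid search bullet concrete, here is a minimal sketch of Reciprocal Rank Fusion with k=60, written in Python for illustration rather than taken from HybridSearchService.cs. Each chunk scores the sum of 1 / (k + rank) over every ranking it appears in. The chunk IDs and the `rrf_fuse` function are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; rank 1 is the best position in each list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_ranking = ["c2", "c1", "c3"]   # ordered by cosine similarity
keyword_ranking = ["c1", "c4", "c2"]  # ordered by keyword score
fused = rrf_fuse([vector_ranking, keyword_ranking])
print([doc_id for doc_id, _ in fused])
```

Because only ranks enter the formula, the two scoring scales (cosine similarity and term frequency) never need to be normalised against each other.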

Phase 2 — Agentic Reasoning & Memory

  • Native Functions: [KernelFunction] plugins for system status, log search (live vector store), and the live GitHub API
  • Automatic Tool Selection: FunctionChoiceBehavior.Auto() lets the LLM choose tools
  • Think → Act → Observe Loop — explicit ReAct implementation with MaxIterations safety cap
  • Full Reasoning Trace — every step (thought / tool / observation) returned in AgentResponse
  • Stateful Multi-Turn Chat: IChatHistoryStore, sliding window, LLM-driven summarisation
  • Semantic Memory — past conversation summaries embedded and scoped by userId
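
The sliding window and summarisation bullets can be pictured as follows. This is a Python sketch under stated assumptions, not the ChatService.cs implementation: the `Session` type, the `append` helper, and the `summarise` placeholder (which stands in for an LLM call) are all hypothetical, and the window size of 10 is taken from the MaxMessages value in the architecture diagram.

```python
from dataclasses import dataclass, field

MAX_MESSAGES = 10  # window size, matching MaxMessages=10 in the diagram


@dataclass
class Session:
    summary: str = ""
    messages: list[str] = field(default_factory=list)


def summarise(previous: str, dropped: list[str]) -> str:
    """Placeholder for the LLM summariser: concatenates instead of condensing."""
    return " | ".join(part for part in [previous, *dropped] if part)


def append(session: Session, message: str, max_messages: int = MAX_MESSAGES) -> None:
    """Add a message; fold anything that falls out of the window into the summary."""
    session.messages.append(message)
    overflow = len(session.messages) - max_messages
    if overflow > 0:
        dropped = session.messages[:overflow]
        session.messages = session.messages[overflow:]
        session.summary = summarise(session.summary, dropped)


session = Session()
for i in range(12):
    append(session, f"m{i}")
print(session.summary, len(session.messages))
```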

Phase 3 — Engineering & Scale

  • Async Ingestion Queue: IJobQueue<T> over System.Threading.Channels, BackgroundService worker, job status polling
  • Semantic Reranker — retrieve 20 chunks, rerank to top 5 via LLM scoring (IReranker)
  • Metadata Filtering: SearchFilter (TenantId, Source, AfterDate) on all search paths
  • SHA-256 Deduplication — prevents storing identical chunks twice
  • Context Compression — long contexts summarised before the LLM call to reduce token cost
  • OpenTelemetry Tracing — AspNetCore + HTTP instrumentation, OTLP-swappable
  • Per-Stage Latency Tracking — Retrieval, Reranking, LLM Inference as structured log events
  • Token Budget Alerts — cost estimate per query with configurable MaxCostPerQueryUsd
  • Semantic Cache — cosine similarity check against cached query embeddings (threshold: 0.97)
  • JWT Authentication + RBAC — Bearer token auth; Admin / User / Reader roles
  • Soft Multi-Tenancy: X-Tenant-Id header; all vector records tagged with tenantId
  • Prompt Injection Guardrails — static pattern detection on all user-facing inputs
  • PII Masking — email, SSN, phone, credit card regex applied before embedding
  • Tamper-Resistant Audit Log — every query and agent decision recorded
  • Health Checks: /health and /health/detail with per-component status
  • Dockerfile + docker-compose — multi-stage .NET 10 build; API + Redis in one command
  • GitHub Actions CI — build + 158 tests on every push; automatic release on every PR merge to main
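
As one example from the list above, SHA-256 deduplication can be sketched as a store keyed by content hash. The Python below is illustrative only; the `DedupStore` class is a hypothetical name, and whether StackPilot normalises text before hashing is not stated, so this version hashes the chunk exactly as given.

```python
import hashlib


class DedupStore:
    """Keeps one copy of each chunk, keyed by the SHA-256 of its UTF-8 bytes."""

    def __init__(self) -> None:
        self._chunks: dict[str, str] = {}

    def add(self, chunk: str) -> bool:
        """Store the chunk and return True, or return False if it is a duplicate."""
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest in self._chunks:
            return False
        self._chunks[digest] = chunk
        return True

    def __len__(self) -> int:
        return len(self._chunks)


store = DedupStore()
first = store.add("ERROR PowerController voltage spike on channel 3")
second = store.add("ERROR PowerController voltage spike on channel 3")
third = store.add("ERROR PowerController voltage spike on channel 4")
print(first, second, third, len(store))
```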

Phase 4 — Embedded UI & Production Delivery

  • Embedded React UI — Chat, Playground, Log Ingestion, and Agent tabs compiled into wwwroot/ and served directly from the binary via UseStaticFiles() + SPA fallback
  • Double-click to launch — reads .env from the directory next to the exe; Kestrel bound to port 5050 via appsettings.json; no terminal or environment variable setup required
  • Log Ingestion tab — drag-and-drop or paste log files; immediate (sync) or background (async) ingestion with live job status polling; ingestion history persists across tab navigation
  • Per-source file deletion — each ingested file is tagged with metadata["source"]; the trash icon on any history entry removes only that file's chunks from the vector store via DELETE /store/by-source/{source}
  • Auto-release on PR merge: release.yml triggers on every push to main; the version is MAJOR.MINOR (from the VERSION file) plus the GitHub run number as the patch; it builds self-contained single-file binaries for Linux, Windows, and macOS in parallel, then publishes a GitHub Release with all three zips attached
  • Chat history persistence — React Context keeps chat state alive across tab switches; fixed stale-closure bug that caused user messages to disappear from the UI

How to Run

Option A — Download a release (easiest)

See Quick Start above. No SDK or terminal required.

Option B — Run from source

Prerequisites: .NET 10 SDK, Node.js 20+, an OpenAI API key

# 1. Build the React UI
cd StackPilot.UI
npm install
npm run build        # outputs to StackPilot.Api/wwwroot/
cd ..

# 2. Configure secrets
cd StackPilot.Api
dotnet user-secrets set "OpenAI:ApiKey"            "sk-..."
dotnet user-secrets set "OpenAI:ModelId"           "gpt-4o-mini"
dotnet user-secrets set "OpenAI:EmbeddingModelId"  "text-embedding-3-small"

# 3. Run
dotnet run

| URL | Purpose |
|---|---|
| http://localhost:5050 | Embedded React UI |
| http://localhost:5050/health | Health check |
| http://localhost:5050/scalar/v1 | Interactive API docs (dev only) |
| http://localhost:5050/dashboard | Vector store debug view |

Option C — Run with Docker

export OPENAI_API_KEY=sk-...
docker-compose up --build

Run tests

dotnet test
# 158 tests, 0 failures

UI Guide

Chat tab

Stateful multi-turn conversation with semantic memory. Chat history persists when switching tabs. Use New session to start fresh; change the User field to test per-user memory isolation.

Use this tab to ask questions about logs you've uploaded — the agent automatically searches your ingested content.

Log Ingestion tab

Upload log files (drag-and-drop or file picker), paste raw text, or mix both. Choose Immediate (sync) or Background (async) mode. The ingestion history lists every uploaded file with its chunk count and timestamp.

To remove a file from the vector store: click the 🗑️ icon on any history entry and confirm. Only that file's chunks are deleted — everything else is untouched.
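
Per-file deletion works because every chunk carries its source in metadata, so removal is a filter on that tag rather than a lookup by chunk ID. A minimal Python sketch of the idea follows; the `Record` type and `delete_by_source` function are hypothetical and only approximate what an IVectorStore implementation might do.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def delete_by_source(records: list[Record], source: str) -> tuple[list[Record], int]:
    """Return the records that remain and the number of chunks removed."""
    kept = [r for r in records if r.metadata.get("source") != source]
    return kept, len(records) - len(kept)


records = [
    Record("1", "voltage spike on channel 3", {"source": "power-controller.log"}),
    Record("2", "voltage nominal", {"source": "power-controller.log"}),
    Record("3", "login failed", {"source": "auth.log"}),
]
remaining, removed = delete_by_source(records, "power-controller.log")
print(removed, [r.id for r in remaining])
```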

To wipe everything: use the Clear Vector Store button at the bottom of the page. This also clears the ingestion history.

Recommended workflow: Clear → upload your file → ask questions in Chat or Playground.

Playground tab

Direct access to the RAG /ask endpoint and the /store endpoint. Good for testing retrieval quality and inspecting which source chunks were used in an answer.

Agent tab

Runs the autonomous Think→Act→Observe reasoning loop. The agent uses tools (vector store search, system status, GitHub API) to break down complex goals into steps. The full reasoning trace is shown after each run.
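
The loop can be sketched as below. This is a simplified Python illustration, not AgentService.cs: the `decide` callback stands in for the LLM, the tool dictionary and `Step` record are hypothetical, and the cap of 5 mirrors the MaxIterations value shown in the architecture diagram.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str
    tool: Optional[str]
    observation: Optional[str]


# decide(goal, trace) returns (thought, tool name or None, tool input or final answer).
Decide = Callable[[str, list[Step]], tuple[str, Optional[str], str]]


def solve(goal: str, decide: Decide, tools: dict[str, Callable[[str], str]],
          max_iterations: int = 5) -> tuple[str, list[Step]]:
    trace: list[Step] = []
    for _ in range(max_iterations):
        thought, tool, payload = decide(goal, trace)        # Think
        if tool is None:                                    # no tool means a final answer
            trace.append(Step(thought, None, None))
            return payload, trace
        observation = tools[tool](payload)                  # Act
        trace.append(Step(thought, tool, observation))      # Observe
    return "Stopped: iteration limit reached.", trace


def scripted_decide(goal: str, trace: list[Step]) -> tuple[str, Optional[str], str]:
    if not trace:
        return ("I should check the logs first.", "search_logs", "error")
    return ("I have enough information.", None, f"Found: {trace[-1].observation}")


tools = {"search_logs": lambda query: f"1 match for '{query}'"}
answer, trace = solve("Find errors", scripted_decide, tools)
print(answer)
```

The iteration cap is what keeps a model that never produces a final answer from looping indefinitely; the trace is returned either way so the caller can see how far it got.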


API Reference

| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | /store | Chunk → embed → persist text (optional source tag) | |
| POST | /store/async | Async ingestion via background queue → 202 + jobId | |
| POST | /store/deduped | Ingest with SHA-256 deduplication | |
| GET | /store | List all vector store records | |
| DELETE | /store | Remove all records from the vector store | |
| DELETE | /store/by-source/{source} | Remove all chunks tagged with a specific source | |
| GET | /jobs/{jobId} | Poll async ingestion job status | |
| POST | /ask | Full RAG pipeline: retrieve → rerank → compress → LLM | |
| POST | /ask/stream | SSE streaming RAG answer | |
| POST | /search | Hybrid vector + keyword search with metadata filter | |
| POST | /agent | Single-shot agent with automatic tool selection | |
| POST | /agent/solve | Autonomous Think→Act→Observe reasoning loop with trace | |
| POST | /chat | Stateful multi-turn chat with sliding window + memory | |
| GET | /chat/{sessionId} | Retrieve full session message history | |
| POST | /auth/token | Issue JWT for testing (dev only) | |
| POST | /evaluate | Run 10-question RAG accuracy test set | Admin |
| GET | /audit | Retrieve audit log entries | Admin |
| GET | /health | Liveness health check | |
| GET | /health/detail | Detailed per-component health (JSON) | |
| GET | /dashboard | Internal vector store debug UI | |

Project Structure

StackPilot/
├── StackPilot.Api/
│   ├── Async/                   # Queue, Worker, Job Status
│   ├── Deployment/              # Cloud deployment guide
│   ├── Extensions/              # DI registration
│   ├── Middleware/              # Global exception handler
│   ├── Observability/           # OTel, Latency, Token Budget
│   ├── Persistence/             # SQLite chat history + audit log
│   ├── Plugins/                 # SK native function plugins (live vector store)
│   ├── Resilience/              # Polly retry + circuit breaker
│   ├── Search/                  # IReranker
│   ├── Security/                # JWT, RBAC, Tenancy, Guardrails, PII, Audit
│   ├── Storage/                 # IVectorStore + QdrantVectorStore
│   ├── wwwroot/                 # Compiled React UI (generated by npm run build)
│   ├── Program.cs               # Minimal API endpoints
│   ├── RagService.cs            # Core RAG orchestrator
│   ├── AgentService.cs          # Agentic reasoning loop
│   ├── ChatService.cs           # Stateful multi-turn chat
│   ├── HybridSearchService.cs   # RRF fusion search
│   ├── VectorStore.cs           # In-memory vector store
│   ├── SemanticCache.cs         # Embedding-similarity cache
│   ├── appsettings.json         # Kestrel port 5050 + all config sections
│   ├── .env.example             # Template for double-click launch
│   ├── Dockerfile
│   └── docker-compose.yml
├── StackPilot.UI/               # React 18 + Vite + Tailwind CSS
│   ├── src/
│   │   ├── api/client.ts        # Typed API client
│   │   ├── components/          # Sidebar navigation
│   │   ├── context/AppState.tsx # Shared state (chat, ingestion history)
│   │   └── pages/               # Chat, Playground, Logs, Agent
│   └── vite.config.ts           # Dev proxy → localhost:5050
├── StackPilot.Api.Tests/
│   ├── 158 tests across 24 test files
│   └── StackPilotApiFactory.cs  # WebApplicationFactory with DI stubs
├── VERSION                      # MAJOR.MINOR for auto-release versioning
└── .github/
    ├── workflows/ci.yml         # Build + test on every push
    └── workflows/release.yml    # Auto-release on every merge to main

CI / CD

| Workflow | Trigger | What it does |
|---|---|---|
| CI (ci.yml) | Every push to main, features, phase-*; every PR | dotnet build + dotnet test (158 tests) |
| Release (release.yml) | Every merge to main | Builds React UI, publishes self-contained binaries for Linux / Windows / macOS, creates a GitHub Release with all three zips |

Versioning: the VERSION file contains MAJOR.MINOR (e.g. 1.1). The GitHub run number is appended as the patch, producing tags like v1.1.42. To bump the major or minor version, edit VERSION and merge.
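
The versioning rule amounts to simple string composition. A short Python sketch, assuming the VERSION file holds exactly two dot-separated integers; the `release_tag` function is a hypothetical name and the actual logic lives in release.yml.

```python
def release_tag(version_file_text: str, run_number: int) -> str:
    """Combine MAJOR.MINOR from the VERSION file with the run number as the patch."""
    major, minor = version_file_text.strip().split(".")  # raises ValueError if not two parts
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"VERSION must be MAJOR.MINOR, got {version_file_text!r}")
    return f"v{major}.{minor}.{run_number}"


tag = release_tag("1.1\n", 42)
print(tag)
```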


Key Engineering Decisions

| Decision | Chosen | Why |
|---|---|---|
| Search strategy | Hybrid RRF (vector + keyword) | Neither alone is sufficient: keyword catches exact terms, vectors catch semantics; RRF fusion outperforms either individually |
| Plugin isolation | ILlmService + IAgentService interfaces | Decouples RAG and agent logic from Semantic Kernel; full test coverage with zero OpenAI dependency |
| Memory layers | Sliding window + session store + semantic memory | Mirrors human cognition: working memory (window), short-term (session), long-term (vector-embedded summaries) |
| Queue abstraction | `IJobQueue<T>` over `Channel<T>` | Azure Service Bus / RabbitMQ swap requires one new class; no changes to worker or endpoint code |
| Vector store abstraction | IVectorStore interface | Azure AI Search / Qdrant drop-in; all consumers remain unchanged |
| Source tagging | metadata["source"] on every chunk | Enables per-file deletion without tracking chunk IDs in the frontend; works for both sync and async ingestion |
| UI delivery | React SPA compiled into wwwroot/, served by Kestrel | No separate web server, no CORS configuration, no deploy step; the binary is the full product |
| Single-file publish | PublishSingleFile=true + Environment.ProcessPath | AppContext.BaseDirectory points to the temp extraction dir in single-file apps; ProcessPath finds the actual exe dir so .env and wwwroot are resolved correctly |
| Semantic cache threshold | 0.97 cosine similarity | Conservative: avoids returning a cached answer for a subtly different question; tunable per use case |
| Reranker default | PassThroughReranker | Zero token cost by default; LlmReranker activated by DI swap when quality > cost matters |
| Auth scope | JWT on admin endpoints only | Integration tests run without auth headers; production hardens all endpoints at the API Gateway layer |
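
To show what the 0.97 semantic cache threshold means in practice, here is a small Python sketch. It is not SemanticCache.cs: the class shape and the two-dimensional toy embeddings are assumptions made for illustration, and real embeddings from text-embedding-3-small have far more dimensions.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.97) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: list[float], answer: str) -> None:
        self._entries.append((embedding, answer))

    def get(self, embedding: list[float]) -> Optional[str]:
        """Return the closest cached answer, but only if it clears the threshold."""
        best_score, best_answer = 0.0, None
        for cached, answer in self._entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
near_hit = cache.get([0.99, 0.05])  # similarity is about 0.999, above the threshold
miss = cache.get([0.8, 0.6])        # similarity is 0.8, below the threshold
print(near_hit, miss)
```

Lowering the threshold raises the hit rate at the risk of answering a subtly different question from cache, which is the trade-off the table describes.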

Full Demo

# 1. Store a log file with a source tag
curl -X POST http://localhost:5050/store \
  -H "Content-Type: application/json" \
  -d '{"text": "2024-01-15 ERROR PowerController voltage spike on channel 3", "source": "power-controller.log"}'

# 2. Ask a grounded question
curl -X POST http://localhost:5050/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What errors occurred in the power controller?", "topK": 3}'

# 3. Delete all chunks from a specific file
curl -X DELETE http://localhost:5050/store/by-source/power-controller.log

# 4. Run the autonomous reasoning agent
curl -X POST http://localhost:5050/agent/solve \
  -H "Content-Type: application/json" \
  -d '{"goal": "Check system health and search logs for any errors"}'

# 5. Start a stateful conversation
curl -X POST http://localhost:5050/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "demo-1", "userId": "farhad", "message": "What issues did you find in the logs?"}'

Roadmap

  • Phase 1: Production RAG Engine (Days 1–14)
  • Phase 2: Agentic Reasoning & Memory (Days 15–35)
  • Phase 3: Engineering & Scale (Days 36–70)
  • SQLite persistence for ChatHistory and AuditLog
  • Qdrant vector store (HTTP REST, Docker-compose included)
  • Polly retry + circuit breaker for all LLM calls
  • Global exception handler (no stack trace leaks)
  • Rate limiting (60 req/min per IP)
  • CORS policy (configurable origins)
  • Request size limits (10 MB max body)
  • Dev-only endpoints gated behind IsDevelopment()
  • Embedded React UI (Chat, Playground, Logs, Agent)
  • Double-click to launch — .env next to exe, no terminal needed
  • Log ingestion tab with drag-and-drop, sync/async modes, per-file deletion
  • Auto-release on PR merge to main (Linux / Windows / macOS binaries)
  • Swap InMemoryJobQueue → Azure Service Bus (for multi-instance deployments)
  • Add OTLP exporter → Jaeger / Azure Monitor
  • Apply JWT auth to all endpoints at the API Gateway layer

Author

Farhad Shariatzadeh — Senior AI Systems Engineer
Built as a structured programme to demonstrate production-grade AI backend engineering with .NET 10 and Semantic Kernel.
