A production-grade Agentic AI orchestrator built with .NET 10 and Semantic Kernel 1.74 — featuring a full RAG pipeline, autonomous reasoning loops, stateful multi-turn chat, an embedded React UI, enterprise security, and cloud-ready deployment infrastructure.
StackPilot is a fully engineered AI backend demonstrating senior-level system design across the entire AI application stack: from raw text ingestion through hybrid retrieval, autonomous tool-using agents, stateful conversation memory, observability, and enterprise governance — all served through an embedded React web UI that ships inside the same single binary.
It covers every layer a production AI system needs, not just "call an LLM and return the answer."
- Go to Releases and download the zip for your OS
- Unzip it
- Open `.env` in any text editor and set your OpenAI API key and model IDs:
  - `OpenAI__ApiKey=sk-your-key-here`
  - `OpenAI__ModelId=gpt-4o-mini`
  - `OpenAI__EmbeddingModelId=text-embedding-3-small`
- Double-click `StackPilot.Api.exe` (Windows) or run `./StackPilot.Api` (Linux/macOS)
- Open http://localhost:5050; the UI loads automatically
Windows SmartScreen warning: Click More info → Run anyway. This is expected for unsigned open-source binaries.
```mermaid
graph TB
    User([User])
    subgraph UI [Embedded React UI - served from wwwroot]
        Chat[Chat Tab]
        Playground[Playground Tab]
        Logs[Log Ingestion Tab]
        Agent[Agent Tab]
    end
    subgraph API [ASP.NET Core Minimal API]
        Auth[JWT Auth + RBAC]
        Guard[Guardrail Service]
        PII[PII Masking]
        Tenant[Tenant Middleware]
    end
    subgraph RAG [RAG Pipeline]
        direction TB
        Ingest[Ingestion - SK TextChunker]
        Embed[Embedding - text-embedding-3-small]
        Store[Vector Store - IVectorStore]
        Hybrid[Hybrid Search - Vector + Keyword + RRF]
        Rerank[Reranker]
        Threshold[Score Threshold]
        Compress[Context Compressor]
        Cache[Semantic Cache]
        RagSvc[RagService]
    end
    subgraph Agent [Agentic Layer]
        direction TB
        Plugins[SK Plugins - StackPilot / LogSearch / GitHub]
        AgentSvc[AgentService - RunAsync + SolveAsync]
        Loop[Think-Act-Observe Loop - MaxIterations=5]
    end
    subgraph Memory [Stateful Memory]
        direction TB
        ChatHist[Chat History - IChatHistoryStore]
        Window[Sliding Window - MaxMessages=10]
        Summarise[Conversation Summariser]
        SemMem[Semantic Memory - Vector-stored summaries]
    end
    subgraph Async [Async Infrastructure]
        direction TB
        Queue[IJobQueue - Channel / Service Bus]
        Worker[IngestionWorker - BackgroundService]
        Status[Job Status API]
    end
    subgraph Obs [Observability]
        direction TB
        OTel[OpenTelemetry Tracing]
        Latency[Latency Tracker]
        Budget[Token Budget]
        Health[Health Checks]
        Audit[Audit Log]
    end
    LLM([OpenAI - gpt-4o / gpt-4o-mini])
    User --> UI --> API
    Auth --> Guard --> Tenant
    Tenant --> RAG
    Tenant --> Agent
    Tenant --> Memory
    PII --> Store
    Ingest --> Embed --> Store
    Store --> Hybrid --> Rerank --> Threshold --> Compress --> RagSvc
    Cache --> RagSvc
    RagSvc --> LLM
    Plugins --> AgentSvc --> Loop --> LLM
    ChatHist --> Window --> Summarise --> SemMem
    Memory --> LLM
    Queue --> Worker --> Store
    Worker --> Status
    RagSvc --> Latency
    RagSvc --> Budget
    AgentSvc --> Audit
    OTel --> Health
```
- Text Chunking via SK `TextChunker` with configurable token size and overlap (`IngestionOptions`)
- Embeddings Pipeline using OpenAI `text-embedding-3-small` via `IEmbeddingGenerator<string, Embedding<float>>` with 96-chunk batching to stay within OpenAI's 300k token/request limit
- Hybrid Search combining cosine vector similarity + TF keyword scoring fused with Reciprocal Rank Fusion (RRF, k=60)
- Score Thresholding: configurable quality gate rejects low-confidence chunks before prompting
- Prompt Optimisation: 6-rule anti-hallucination system instruction, forbidden phrases, fully configurable
- Response Caching: `IResponseCache` / `MemoryResponseCache`, Redis-swap-ready
- OpenAPI + Scalar UI at `/scalar/v1`: all endpoints typed, tagged, and described
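To make the hybrid search item concrete, here is a minimal TypeScript sketch of Reciprocal Rank Fusion with k=60. It is an illustration only, not the code in `HybridSearchService.cs`; the function name and chunk IDs are hypothetical, and it assumes each retriever returns an ordered list of chunk IDs.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// with 1-based ranks. k = 60 matches the value quoted in the feature list.
function reciprocalRankFusion(rankings: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical chunk IDs: one list ranked by cosine similarity, one by keyword score.
const vectorRanking = ["chunk-a", "chunk-b", "chunk-c"];
const keywordRanking = ["chunk-b", "chunk-d", "chunk-a"];
const fused = reciprocalRankFusion([vectorRanking, keywordRanking]);
console.log(fused.map(([id]) => id));
```

A chunk that appears near the top of both lists (here `chunk-b`) outranks one that leads only a single list, which is the property that makes RRF useful without normalising the two score scales.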
- Native Functions: `[KernelFunction]` plugins for system status, log search (live vector store), and the live GitHub API
- Automatic Tool Selection: `FunctionChoiceBehavior.Auto()` lets the LLM choose tools
- Think → Act → Observe Loop: explicit ReAct implementation with a `MaxIterations` safety cap
- Full Reasoning Trace: every step (thought / tool / observation) returned in `AgentResponse`
- Stateful Multi-Turn Chat: `IChatHistoryStore`, sliding window, LLM-driven summarisation
- Semantic Memory: past conversation summaries embedded and scoped by `userId`
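The Think → Act → Observe loop with its iteration cap can be sketched roughly as follows. This is a language-neutral illustration in TypeScript, not the C# `AgentService`: the `Planner` type stands in for the LLM, and every name here (`solve`, `Decision`, `logSearch`) is hypothetical.

```typescript
// One recorded step of the reasoning trace.
interface Step {
  thought: string;
  tool?: string;
  observation?: string;
}

// What the planner (a stand-in for the LLM) decides on each iteration.
type Decision =
  | { kind: "act"; thought: string; tool: string; input: string }
  | { kind: "finish"; thought: string; answer: string };

type Planner = (goal: string, trace: Step[]) => Decision;
type Tool = (input: string) => string;

interface AgentResult {
  answer: string;
  trace: Step[];
  completed: boolean;
}

function solve(goal: string, planner: Planner, tools: Map<string, Tool>, maxIterations: number = 5): AgentResult {
  const trace: Step[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const decision = planner(goal, trace); // Think
    if (decision.kind === "finish") {
      trace.push({ thought: decision.thought });
      return { answer: decision.answer, trace, completed: true };
    }
    const tool = tools.get(decision.tool); // Act
    const observation = tool === undefined ? `Unknown tool: ${decision.tool}` : tool(decision.input);
    trace.push({ thought: decision.thought, tool: decision.tool, observation }); // Observe
  }
  // Safety cap reached without a final answer.
  return { answer: "Stopped after reaching the iteration cap.", trace, completed: false };
}

// Scripted planner for demonstration: search once, then answer from the observation.
const scriptedPlanner: Planner = (_goal, trace) => {
  if (trace.length === 0) {
    return { kind: "act", thought: "I should search the logs.", tool: "logSearch", input: "ERROR" };
  }
  const last = trace[trace.length - 1];
  const seen = last?.observation ?? "nothing";
  return { kind: "finish", thought: "I have enough information.", answer: `Found: ${seen}` };
};

const demoTools = new Map<string, Tool>([
  ["logSearch", (query) => `1 line matching ${query}`],
]);

const result = solve("Search logs for errors", scriptedPlanner, demoTools);
console.log(result.answer);
```

The cap matters because a planner that never emits a final answer would otherwise loop, and keep spending tokens, indefinitely.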
- Async Ingestion Queue: `IJobQueue<T>` over `System.Threading.Channels`, `BackgroundService` worker, job status polling
- Semantic Reranker: retrieve 20 chunks, rerank to top 5 via LLM scoring (`IReranker`)
- Metadata Filtering: `SearchFilter` (TenantId, Source, AfterDate) on all search paths
- SHA-256 Deduplication: prevents storing identical chunks twice
- Context Compression: long contexts summarised before the LLM call to reduce token cost
- OpenTelemetry Tracing: AspNetCore + HTTP instrumentation, OTLP-swappable
- Per-Stage Latency Tracking: Retrieval, Reranking, LLM Inference as structured log events
- Token Budget Alerts: cost estimate per query with configurable `MaxCostPerQueryUsd`
- Semantic Cache: cosine similarity check against cached query embeddings (threshold: 0.97)
- JWT Authentication + RBAC: Bearer token auth; `Admin` / `User` / `Reader` roles
- Soft Multi-Tenancy: `X-Tenant-Id` header; all vector records tagged with `tenantId`
- Prompt Injection Guardrails: static pattern detection on all user-facing inputs
- PII Masking: email, SSN, phone, credit card regex applied before embedding
- Tamper-Resistant Audit Log: every query and agent decision recorded
- Health Checks: `/health` and `/health/detail` with per-component status
- Dockerfile + docker-compose: multi-stage .NET 10 build; API + Redis in one command
- GitHub Actions CI: build + 158 tests on every push; automatic release on every PR merge to `main`
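As an illustration of the regex-based PII masking step listed above, the sketch below masks the four categories before text would be embedded. The patterns and placeholder labels are assumptions chosen for the example (US-style SSN and phone formats), not the project's actual rules.

```typescript
// Hypothetical patterns; order matters so that longer digit runs (cards)
// are masked before the shorter phone pattern is tried.
const piiPatterns: Array<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD]"], // 13 to 16 digits, optional separators
  [/\b\d{3}[ .-]\d{3}[ .-]\d{4}\b/g, "[PHONE]"],
];

// Apply every pattern in turn and return the masked text.
function maskPii(text: string): string {
  return piiPatterns.reduce((masked, [pattern, label]) => masked.replace(pattern, label), text);
}

const sample = "Contact jane@example.com, SSN 123-45-6789, phone 555-123-4567, card 4111 1111 1111 1111.";
console.log(maskPii(sample));
```

Masking before embedding means the raw values never reach the vector store or the embedding API, at the cost of occasional false positives on digit-heavy log lines.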
- Embedded React UI: Chat, Playground, Log Ingestion, and Agent tabs compiled into `wwwroot/` and served directly from the binary via `UseStaticFiles()` + SPA fallback
- Double-click to launch: reads `.env` from the directory next to the exe; Kestrel bound to port 5050 via `appsettings.json`; no terminal or environment variable setup required
- Log Ingestion tab: drag-and-drop or paste log files; immediate (sync) or background (async) ingestion with live job status polling; ingestion history persists across tab navigation
- Per-source file deletion: each ingested file is tagged with `metadata["source"]`; the trash icon on any history entry removes only that file's chunks from the vector store via `DELETE /store/by-source/{source}`
- Auto-release on PR merge: `release.yml` triggers on every push to `main`; version is `MAJOR.MINOR` (from `VERSION` file) + GitHub run number as patch; builds self-contained single-file binaries for Linux, Windows, and macOS in parallel, then publishes a GitHub Release with all three zips attached
- Chat history persistence: React Context keeps chat state alive across tab switches; fixed stale-closure bug that caused user messages to disappear from the UI
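The `.env` keys use a double underscore (`OpenAI__ApiKey`), which is the standard .NET convention for a `:` section separator in environment-style configuration. The sketch below shows how such a file could be parsed into configuration keys. It is an assumption about the general approach, written in TypeScript for illustration, and `parseEnv` is a hypothetical name rather than the project's loader.

```typescript
// Parse KEY=VALUE lines, skipping blanks and # comments, and map the
// double-underscore separator to ':' as .NET configuration does.
function parseEnv(contents: string): Map<string, string> {
  const settings = new Map<string, string>();
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key, or no '=' at all
    const key = line.slice(0, eq).trim().replace(/__/g, ":");
    const value = line.slice(eq + 1).trim();
    settings.set(key, value);
  }
  return settings;
}

const envFile = [
  "# StackPilot settings",
  "OpenAI__ApiKey=sk-your-key-here",
  "OpenAI__ModelId=gpt-4o-mini",
  "OpenAI__EmbeddingModelId=text-embedding-3-small",
].join("\n");

const settings = parseEnv(envFile);
console.log(settings.get("OpenAI:ModelId"));
```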
See Quick Start above. No SDK or terminal required.
Prerequisites: .NET 10 SDK, Node.js 20+, an OpenAI API key
```bash
# 1. Build the React UI
cd StackPilot.UI
npm install
npm run build   # outputs to StackPilot.Api/wwwroot/
cd ..

# 2. Configure secrets
cd StackPilot.Api
dotnet user-secrets set "OpenAI:ApiKey" "sk-..."
dotnet user-secrets set "OpenAI:ModelId" "gpt-4o-mini"
dotnet user-secrets set "OpenAI:EmbeddingModelId" "text-embedding-3-small"

# 3. Run
dotnet run
```

| URL | Purpose |
|---|---|
| http://localhost:5050 | Embedded React UI |
| http://localhost:5050/health | Health check |
| http://localhost:5050/scalar/v1 | Interactive API docs (dev only) |
| http://localhost:5050/dashboard | Vector store debug view |

To run with Docker:

```bash
export OPENAI_API_KEY=sk-...
docker-compose up --build
```

To run the tests:

```bash
dotnet test
# 158 tests, 0 failures
```

Chat tab: Stateful multi-turn conversation with semantic memory. Chat history persists when switching tabs. Use New session to start fresh; change the User field to test per-user memory isolation.
Use this tab to ask questions about logs you've uploaded — the agent automatically searches your ingested content.
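The chat memory relies on a sliding window (`MaxMessages=10` in the architecture diagram), with older turns handed to the summariser. A minimal sketch of that split, assuming a simple "keep the newest N" policy, could look like this; the type and function names are hypothetical.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface WindowSplit {
  window: ChatMessage[];   // most recent messages, sent to the LLM verbatim
  overflow: ChatMessage[]; // older messages, candidates for summarisation
}

// Keep the newest maxMessages entries; everything older becomes overflow.
function applySlidingWindow(messages: ChatMessage[], maxMessages: number = 10): WindowSplit {
  const cut = Math.max(0, messages.length - maxMessages);
  return { overflow: messages.slice(0, cut), window: messages.slice(cut) };
}

// Twelve alternating demo messages: "message 1" through "message 12".
const demoMessages: ChatMessage[] = Array.from({ length: 12 }, (_unused, i): ChatMessage => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i + 1}`,
}));

const split = applySlidingWindow(demoMessages);
console.log(split.window.length, split.overflow.length);
```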
Log Ingestion tab: Upload log files (drag-and-drop or file picker), paste raw text, or mix both. Choose Immediate (sync) or Background (async) mode. The ingestion history lists every uploaded file with its chunk count and timestamp.
To remove a file from the vector store: click the 🗑️ icon on any history entry and confirm. Only that file's chunks are deleted — everything else is untouched.
To wipe everything: use the Clear Vector Store button at the bottom of the page. This also clears the ingestion history.
Recommended workflow: Clear → upload your file → ask questions in Chat or Playground.
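Per-file deletion works because every chunk carries its file name in `metadata["source"]`. The sketch below shows the idea against an in-memory array; the record shape and `deleteBySource` are illustrative assumptions, not the `IVectorStore` contract.

```typescript
interface VectorRecord {
  id: string;
  text: string;
  metadata: Record<string, string>;
}

// Remove every chunk whose source tag matches, leaving all other files intact.
function deleteBySource(records: VectorRecord[], source: string): { kept: VectorRecord[]; removed: number } {
  const kept = records.filter((record) => record.metadata["source"] !== source);
  return { kept, removed: records.length - kept.length };
}

const demoRecords: VectorRecord[] = [
  { id: "1", text: "ERROR voltage spike", metadata: { source: "power-controller.log" } },
  { id: "2", text: "WARN retry", metadata: { source: "power-controller.log" } },
  { id: "3", text: "INFO started", metadata: { source: "gateway.log" } },
];

const afterDelete = deleteBySource(demoRecords, "power-controller.log");
console.log(afterDelete.removed, afterDelete.kept.map((record) => record.id));
```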
Playground tab: Direct access to the RAG `/ask` endpoint and the `/store` endpoint. Good for testing retrieval quality and inspecting which source chunks were used in an answer.
Agent tab: Runs the autonomous Think→Act→Observe reasoning loop. The agent uses tools (vector store search, system status, GitHub API) to break down complex goals into steps. The full reasoning trace is shown after each run.
| Method | Endpoint | Description | Auth |
|---|---|---|---|
| POST | `/store` | Chunk → embed → persist text (optional source tag) | None |
| POST | `/store/async` | Async ingestion via background queue → 202 + jobId | None |
| POST | `/store/deduped` | Ingest with SHA-256 deduplication | None |
| GET | `/store` | List all vector store records | None |
| DELETE | `/store` | Remove all records from the vector store | None |
| DELETE | `/store/by-source/{source}` | Remove all chunks tagged with a specific source | None |
| GET | `/jobs/{jobId}` | Poll async ingestion job status | None |
| POST | `/ask` | Full RAG pipeline: retrieve → rerank → compress → LLM | None |
| POST | `/ask/stream` | SSE streaming RAG answer | None |
| POST | `/search` | Hybrid vector + keyword search with metadata filter | None |
| POST | `/agent` | Single-shot agent with automatic tool selection | None |
| POST | `/agent/solve` | Autonomous Think→Act→Observe reasoning loop with trace | None |
| POST | `/chat` | Stateful multi-turn chat with sliding window + memory | None |
| GET | `/chat/{sessionId}` | Retrieve full session message history | None |
| POST | `/auth/token` | Issue JWT for testing (dev only) | None |
| POST | `/evaluate` | Run 10-question RAG accuracy test set | Admin |
| GET | `/audit` | Retrieve audit log entries | Admin |
| GET | `/health` | Liveness health check | None |
| GET | `/health/detail` | Detailed per-component health (JSON) | None |
| GET | `/dashboard` | Internal vector store debug UI | None |
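For callers working from TypeScript, the endpoints above could be wrapped in small typed request builders. The sketch below is hypothetical and is not the contents of `api/client.ts`; the request fields are taken from the curl examples in this README, and response shapes are deliberately left out because they are not documented here.

```typescript
// Request bodies, with field names as used in this README's curl examples.
interface StoreRequest { text: string; source?: string; }
interface AskRequest { query: string; topK?: number; }
interface SolveRequest { goal: string; }
interface ChatRequest { sessionId: string; userId: string; message: string; }

interface PreparedRequest {
  url: string;
  method: "POST" | "DELETE";
  headers: Record<string, string>;
  body?: string;
}

const baseUrl = "http://localhost:5050";

// Build a JSON POST for /store, /ask, /agent/solve or /chat.
function post(path: string, payload: StoreRequest | AskRequest | SolveRequest | ChatRequest): PreparedRequest {
  return {
    url: baseUrl + path,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Build DELETE /store/by-source/{source}, escaping the file name for the URL path.
function deleteSourceRequest(source: string): PreparedRequest {
  return { url: `${baseUrl}/store/by-source/${encodeURIComponent(source)}`, method: "DELETE", headers: {} };
}

const askRequest = post("/ask", { query: "What errors occurred in the power controller?", topK: 3 });
console.log(askRequest.url, askRequest.body);
```

Each `PreparedRequest` can be passed to `fetch(request.url, request)` or any other HTTP client. Escaping the source name matters for file names containing spaces or slashes.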
```
StackPilot/
├── StackPilot.Api/
│   ├── Async/                    # Queue, Worker, Job Status
│   ├── Deployment/               # Cloud deployment guide
│   ├── Extensions/               # DI registration
│   ├── Middleware/               # Global exception handler
│   ├── Observability/            # OTel, Latency, Token Budget
│   ├── Persistence/              # SQLite chat history + audit log
│   ├── Plugins/                  # SK native function plugins (live vector store)
│   ├── Resilience/               # Polly retry + circuit breaker
│   ├── Search/                   # IReranker
│   ├── Security/                 # JWT, RBAC, Tenancy, Guardrails, PII, Audit
│   ├── Storage/                  # IVectorStore + QdrantVectorStore
│   ├── wwwroot/                  # Compiled React UI (generated by npm run build)
│   ├── Program.cs                # Minimal API endpoints
│   ├── RagService.cs             # Core RAG orchestrator
│   ├── AgentService.cs           # Agentic reasoning loop
│   ├── ChatService.cs            # Stateful multi-turn chat
│   ├── HybridSearchService.cs    # RRF fusion search
│   ├── VectorStore.cs            # In-memory vector store
│   ├── SemanticCache.cs          # Embedding-similarity cache
│   ├── appsettings.json          # Kestrel port 5050 + all config sections
│   ├── .env.example              # Template for double-click launch
│   ├── Dockerfile
│   └── docker-compose.yml
├── StackPilot.UI/                # React 18 + Vite + Tailwind CSS
│   ├── src/
│   │   ├── api/client.ts         # Typed API client
│   │   ├── components/           # Sidebar navigation
│   │   ├── context/AppState.tsx  # Shared state (chat, ingestion history)
│   │   └── pages/                # Chat, Playground, Logs, Agent
│   └── vite.config.ts            # Dev proxy → localhost:5050
├── StackPilot.Api.Tests/
│   ├── 158 tests across 24 test files
│   └── StackPilotApiFactory.cs   # WebApplicationFactory with DI stubs
├── VERSION                       # MAJOR.MINOR for auto-release versioning
└── .github/
    ├── workflows/ci.yml          # Build + test on every push
    └── workflows/release.yml     # Auto-release on every merge to main
```
| Workflow | Trigger | What it does |
|---|---|---|
| CI (`ci.yml`) | Every push to `main`, `features`, `phase-*`; every PR | `dotnet build` + `dotnet test` (158 tests) |
| Release (`release.yml`) | Every merge to `main` | Builds React UI, publishes self-contained binaries for Linux / Windows / macOS, creates a GitHub Release with all three zips |
Versioning: the VERSION file contains MAJOR.MINOR (e.g. 1.1). The GitHub run number is appended as the patch, producing tags like v1.1.42. To bump the major or minor version, edit VERSION and merge.
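The tag rule is simple enough to state as a function. The sketch below is an illustration of the scheme described above, not the contents of `release.yml`; `releaseTag` is a hypothetical name and the validation is an added assumption.

```typescript
// Combine MAJOR.MINOR from the VERSION file with the CI run number as the patch.
function releaseTag(versionFile: string, runNumber: number): string {
  const majorMinor = versionFile.trim();
  if (!/^\d+\.\d+$/.test(majorMinor)) {
    throw new Error(`VERSION must contain MAJOR.MINOR, got "${majorMinor}"`);
  }
  if (runNumber < 0 || Math.floor(runNumber) !== runNumber) {
    throw new Error(`Run number must be a non-negative integer, got ${runNumber}`);
  }
  return `v${majorMinor}.${runNumber}`;
}

console.log(releaseTag("1.1\n", 42));
```

Because the run number only ever increases, tags stay ordered within a given `MAJOR.MINOR` without any manual patch bump.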
| Decision | Chosen | Why |
|---|---|---|
| Search strategy | Hybrid RRF (vector + keyword) | Neither alone is sufficient: keyword catches exact terms, vectors catch semantics; RRF fusion outperforms either individually |
| Plugin isolation | `ILlmService` + `IAgentService` interfaces | Decouples RAG and agent logic from Semantic Kernel, so the test suite runs with zero OpenAI dependency |
| Memory layers | Sliding window + session store + semantic memory | Mirrors human cognition: working memory (window), short-term (session), long-term (vector-embedded summaries) |
| Queue abstraction | `IJobQueue<T>` over `Channel<T>` | Azure Service Bus / RabbitMQ swap requires one new class; no changes to worker or endpoint code |
| Vector store abstraction | `IVectorStore` interface | Azure AI Search / Qdrant drop-in; all consumers remain unchanged |
| Source tagging | `metadata["source"]` on every chunk | Enables per-file deletion without tracking chunk IDs in the frontend; works for both sync and async ingestion |
| UI delivery | React SPA compiled into `wwwroot/`, served by Kestrel | No separate web server, no CORS configuration, no deploy step: the binary is the full product |
| Single-file publish | `PublishSingleFile=true` + `Environment.ProcessPath` | `AppContext.BaseDirectory` can point to the temp extraction dir in single-file apps; `ProcessPath` finds the actual exe dir so `.env` and `wwwroot` are resolved correctly |
| Semantic cache threshold | 0.97 cosine similarity | Conservative: avoids returning a cached answer for a subtly different question; tunable per use case |
| Reranker default | `PassThroughReranker` | Zero token cost by default; `LlmReranker` is activated by a DI swap when quality matters more than cost |
| Auth scope | JWT on admin endpoints only | Integration tests run without auth headers; production hardens all endpoints at the API Gateway layer |
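To show what the 0.97 threshold means in practice, here is a small sketch of a semantic cache lookup. It assumes a linear scan over cached query embeddings and uses toy two-dimensional vectors; the names are hypothetical and this is not the code in `SemanticCache.cs`.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface CacheEntry {
  embedding: number[];
  answer: string;
}

// Return the best cached answer at or above the threshold, or undefined on a miss.
function lookup(entries: CacheEntry[], queryEmbedding: number[], threshold: number = 0.97): string | undefined {
  let best: CacheEntry | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.embedding, queryEmbedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.answer;
}

// Toy 2-dimensional embeddings; real embeddings have far more dimensions.
const demoCache: CacheEntry[] = [{ embedding: [1, 0], answer: "Voltage spike on channel 3." }];
console.log(lookup(demoCache, [1, 0.01])); // near-identical direction: hit
console.log(lookup(demoCache, [1, 1]));    // similarity about 0.707: miss
```

Lowering the threshold raises the hit rate but also the chance of answering a different question from cache, which is the trade-off the table row describes.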
```bash
# 1. Store a log file with a source tag
curl -X POST http://localhost:5050/store \
  -H "Content-Type: application/json" \
  -d '{"text": "2024-01-15 ERROR PowerController voltage spike on channel 3", "source": "power-controller.log"}'

# 2. Ask a grounded question
curl -X POST http://localhost:5050/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What errors occurred in the power controller?", "topK": 3}'

# 3. Delete all chunks from a specific file
curl -X DELETE http://localhost:5050/store/by-source/power-controller.log

# 4. Run the autonomous reasoning agent
curl -X POST http://localhost:5050/agent/solve \
  -H "Content-Type: application/json" \
  -d '{"goal": "Check system health and search logs for any errors"}'

# 5. Start a stateful conversation
curl -X POST http://localhost:5050/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "demo-1", "userId": "farhad", "message": "What issues did you find in the logs?"}'
```

- Phase 1: Production RAG Engine (Days 1–14)
- Phase 2: Agentic Reasoning & Memory (Days 15–35)
- Phase 3: Engineering & Scale (Days 36–70)
- SQLite persistence for ChatHistory and AuditLog
- Qdrant vector store (HTTP REST, Docker-compose included)
- Polly retry + circuit breaker for all LLM calls
- Global exception handler (no stack trace leaks)
- Rate limiting (60 req/min per IP)
- CORS policy (configurable origins)
- Request size limits (10 MB max body)
- Dev-only endpoints gated behind `IsDevelopment()`
- Embedded React UI (Chat, Playground, Logs, Agent)
- Double-click to launch: `.env` next to exe, no terminal needed
- Log ingestion tab with drag-and-drop, sync/async modes, per-file deletion
- Auto-release on PR merge to `main` (Linux / Windows / macOS binaries)
- Swap `InMemoryJobQueue` → Azure Service Bus (for multi-instance deployments)
- Add OTLP exporter → Jaeger / Azure Monitor
- Apply JWT auth to all endpoints at the API Gateway layer
Farhad Shariatzadeh — Senior AI Systems Engineer
Built as a structured programme to demonstrate production-grade AI backend engineering with .NET 10 and Semantic Kernel.