🔒 Secure LLM Assistant

Air-gapped, on-prem LLM platform for translating legacy Java codebases and requirements into idiomatic Python — with zero internet egress and a full audit trail.

🚀 What Is This & Why Should You Care?

If your team is staring down a mountain of legacy Java code that needs to become Python — and you work in a classified, air-gapped, or otherwise locked-down environment where sending code to OpenAI or any cloud API is simply not an option — this is the platform built for that exact problem.

Secure LLM Assistant runs a full AI-powered code modernisation stack entirely inside your network. No packets leave. No model calls an external server. No code ever touches a vendor's API. The LLM runs on your GPU node. The audit log stays on your hardware.

🎯 The Core Job: Java → Python, Done Right

The platform's primary mission is translating legacy Java codebases into production-quality Python 3.12+. This isn't a simple find-and-replace — it's a structured, context-aware conversion pipeline that understands Java semantics:

Parses your Java first using a pure-Python Java parser (javalang) — no JVM required. It extracts class names, inheritance chains, method signatures, field types, and import graphs before a single token goes to the LLM.
Enriches the prompt with that metadata so the LLM understands the class structure, not just the raw text. A HashMap<String, List<Order>> becomes dict[str, list[Order]] — not dict.
Applies a translation style you choose — "idiomatic" for maximum Pythonic output (dataclasses, comprehensions, context managers), "typed" for strict mypy/pyright-compatible annotations, or "literal" for a line-by-line reference mapping that makes audits easy.
Handles whole projects, not just single files. Submit a {filename: source} dict of your entire Java package. The platform builds a dependency graph, runs a topological sort (Kahn's algorithm), and translates base classes before subclasses — so OrderService already sees a translated Order class when it's generated. No broken imports. No undefined references.
Remembers your conversation. Each translation returns a session_id. Pass it back and say "now make all methods async" or "add type annotations throughout" — the LLM sees the full prior context and refines the output iteratively, exactly the way a human pair-programmer would.
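The base-classes-first ordering described above can be sketched with Kahn's algorithm. This is a minimal illustration, not the platform's actual project translator: the `deps` input shape (file name mapped to the files it depends on) and the function name `translation_order` are assumptions made for the example.

```python
from collections import deque


def translation_order(deps: dict[str, set[str]]) -> list[str]:
    """Return files ordered so that every dependency precedes its dependents.

    `deps` maps a file name to the set of files it depends on (hypothetical
    input shape). Raises ValueError if the graph contains a cycle.
    """
    known = set(deps)
    # Only count dependencies that are part of the submitted project.
    indegree = {name: len(d & known) for name, d in deps.items()}
    dependents: dict[str, list[str]] = {name: [] for name in deps}
    for name, d in deps.items():
        for dep in d & known:
            dependents[dep].append(name)

    # Start with files that depend on nothing; sort for a stable order.
    ready = deque(sorted(n for n, k in indegree.items() if k == 0))
    order: list[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(dependents[current]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any file left unprocessed is part of a cycle.
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

With `Order.java` as a dependency of `OrderService.java`, the function places `Order.java` first, which is the property that lets a subclass prompt include its already-translated base class.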

Full Java → Python type mapping reference — every primitive, collection, and pattern covered — is in the Java → Python Translation section.

🛡️ Security Is Not An Afterthought — It's The Architecture

Every request passes through five independent defence layers regardless of which endpoint is called:

Layer	What It Does	Why It Matters
① Input Guardrail	Blocks 8 families of prompt injection patterns, detects credentials/secrets in the input, enforces size limits	An attacker cannot use the translate endpoint as an LLM jailbreak vector
② JWT + RBAC	RS256 asymmetric token validation against your internal IdP; role → permission mapping enforced per endpoint	The orchestrator holds only the public key: it can verify tokens but cannot mint them, so a compromised orchestrator cannot impersonate users
③ Hardened System Prompt	Anti-override and anti-exfiltration directives that the LLM cannot ignore via user input	Prevents "ignore all previous instructions" style attacks from succeeding even if the input guardrail is somehow bypassed
④ Output Guardrail	Scrubs credentials, secrets, and PII patterns from every LLM response before it reaches the caller	Prevents an LLM that hallucinates a credential string from leaking it to an unprivileged caller
⑤ Immutable Audit Log	Append-only JSONL log of every action (metadata only; no raw code or prompts recorded)	Provides a tamper-evident record for STIG compliance and incident response
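As an illustration of layer ⑤, a metadata-only JSONL event writer might look like the sketch below. The function name `write_audit_event` and the field names are assumptions, not the platform's schema. Opening a file in append mode does not by itself make a log immutable, so real tamper evidence would rest on file-system or OS controls that this example does not cover.

```python
import json
import time
from pathlib import Path


def write_audit_event(
    path: Path,
    *,
    user_id: str,
    action: str,
    model: str,
    blocked: bool,
    input_chars: int,
) -> dict:
    """Append one metadata-only event as a single JSON line (hypothetical schema)."""
    event = {
        "ts": time.time(),
        "user_id": user_id,
        "action": action,
        "model": model,
        "blocked": blocked,
        "input_chars": input_chars,  # size only; the text itself is never logged
    }
    # Mode "a" appends; it never truncates or rewrites earlier lines.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```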

Provider Lock (PROVIDER_LOCK=true) is the IT-administered configuration freeze: once set in the Kubernetes Secret, no developer, env var injection, or misconfiguration can redirect LLM traffic to a public internet endpoint. The assert_egress_url_safe() function blocks public cloud domains at the application layer; the Kubernetes NetworkPolicy drops those packets at the CNI layer. These are two independent layers, and both must be defeated to exfiltrate data.
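A minimal sketch of what an application-layer egress check could look like follows. Only the function name `assert_egress_url_safe` comes from this README; the body and the `BLOCKED_SUFFIXES` list are assumptions for illustration, and the real blocklist is not shown here.

```python
from urllib.parse import urlparse

# Illustrative suffixes only; the real blocklist is not shown in this README.
BLOCKED_SUFFIXES = ("openai.com", "anthropic.com", "googleapis.com")


def assert_egress_url_safe(url: str) -> None:
    """Raise ValueError if `url` has no host or points at a blocked public host."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    for suffix in BLOCKED_SUFFIXES:
        # Match the domain itself and any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            raise ValueError(f"egress to {host!r} is blocked")
```

Calling the check on every constructed URL, rather than once at startup, means a URL assembled later from request data is still covered.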

PRC-origin models are permanently blocked. core/provider_lock.py rejects Qwen (Alibaba), DeepSeek, Baichuan, InternLM, ChatGLM/GLM, MiniMax, and Moonshot/Kimi at startup and on every call. No configuration option re-enables them. This is a hard-coded supply-chain control, not a policy setting.
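The blocklist idea can be sketched as a pattern match on the model tag. The name fragments below are taken from the list above, but the exact patterns in core/provider_lock.py are not shown in this README, so the regex and the function name `assert_model_allowed` are assumptions. Substring matching is deliberately crude and may over-block tags that merely contain one of these fragments.

```python
import re

# Name fragments from the list above; the real patterns are an assumption.
_BLOCKED = re.compile(
    r"qwen|deepseek|baichuan|internlm|chatglm|glm|minimax|moonshot|kimi",
    re.IGNORECASE,
)


def assert_model_allowed(model: str) -> None:
    """Raise ValueError when the model tag matches a blocked family."""
    match = _BLOCKED.search(model)
    if match:
        raise ValueError(
            f"model {model!r} is blocked (matched {match.group(0)!r})"
        )
```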

Built-in static analysis catches dangerous Python patterns in translated output before it reaches the caller: eval/exec calls, pickle deserialisation, subprocess injection, and hardcoded credentials. The Java analyser runs the same checks on Java input to enrich translation prompts with security context.
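To make the scanning idea concrete, here is a small sketch built on the stdlib `ast` module. It covers only `eval`, `exec`, and `pickle.load(s)` and does not follow import aliases; the function name `find_dangerous_calls` and the rule sets are assumptions, not the platform's analyser.

```python
import ast

DANGEROUS_CALLS = {"eval", "exec"}
DANGEROUS_ATTRS = {("pickle", "load"), ("pickle", "loads")}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, call name) for each risky call in `source`.

    The source is parsed, never executed.
    """
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            findings.append((node.lineno, func.id))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in DANGEROUS_ATTRS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)
```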

🔧 Why These Technologies?

Every component in the stack was chosen for a specific reason — not defaults, not hype:

Technology	Why It Was Chosen (Not Just What It Is)
Python 3.12 + FastAPI	FastAPI's Depends() injection chain is ideal for stacking auth → guardrail → session → LLM in a single readable pipeline. Pydantic v2 validates every request at the boundary, so malformed input raises before any business logic runs. The ast stdlib module is used for Python static analysis with no external dependencies.
RS256 JWT (PyJWT)	Asymmetric signing means the orchestrator only needs the public key. A compromised service pod cannot mint tokens. HS256 would require distributing a shared secret to every service, which is a supply-chain risk in a classified environment.
pydantic-settings	Fail-fast config validation at import time. If LLM_ENDPOINT is not set, the service refuses to start with a clear error instead of silently failing on the first LLM call. Catches misconfiguration before a deployment goes live.
javalang (pure Python Java parser)	No JVM needed on the inference node. Parses Java source into an AST with class structure, method signatures, and import graphs, without executing any Java. Running a JVM in an air-gapped environment creates an additional attack surface and a heavyweight dependency.
httpx (async HTTP)	All LLM calls use httpx.AsyncClient with a per-request 120-second timeout. No global session means no shared state between requests. The assert_egress_url_safe() check runs on every constructed URL before the connection opens, not at config time.
PostgreSQL 16 + pgvector	RAG context enrichment without standing up a separate vector database service. pgvector's ivfflat approximate nearest-neighbour search is fast enough for code retrieval workloads. Reuses existing Postgres infrastructure the ops team already knows, monitors, and backs up.
nomic-embed-text	8192-token context window covers entire Java class files in a single embedding, so no chunking is needed for most real-world classes. Apache 2.0 licence. Runs entirely self-hosted via Ollama. No calls to an external embedding API.
Ollama (dev) + vLLM (prod)	Both expose an OpenAI-compatible /v1/chat/completions API, so the same llm_client.py works for both with a one-env-var switch. Ollama is one-command for a developer laptop. vLLM's PagedAttention and tensor parallelism handle the 70B+ model batching required for production throughput on a GPU cluster.
boto3 + AWS Bedrock GovCloud	For teams that need FedRAMP High / IL4/IL5 authorisation without managing GPU hardware, Bedrock GovCloud provides Llama 3 and Claude 3.5 Sonnet with DISA authorization. boto3's Signature V4 is handled automatically; the BedrockGovProvider enforces us-gov-west-1 / us-gov-east-1 regions at startup and blocks all PRC-origin model IDs.
redis>=5.0 (session backend)	The default in-process session store is single-replica only. Set SESSION_BACKEND=redis to switch to the Redis/Valkey-backed store for horizontal scaling across multiple orchestrator pods: no code changes, no API differences, just one env var.
Kubernetes + NetworkPolicy	NetworkPolicy operates at the CNI layer, so it cannot be bypassed by application code. Default-deny with an explicit allowlist means new egress paths cannot appear accidentally. Rolling deploys ensure zero-downtime updates. Namespace isolation limits blast radius if a pod is compromised.
difflib (stdlib diff translation)	The /translate-diff endpoint computes unified diffs between before/after Java versions and translates only the changed hunks, not the whole file. This means a 2-line method change in a 2,000-line class sends ~40 lines to the LLM, not 2,000. Uses only Python stdlib; no additional dependency.
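The hunk-only approach in the difflib row can be sketched with `difflib.unified_diff` from the standard library. The function name `changed_hunks` and the file labels are assumptions; how the real /translate-diff endpoint post-processes hunks is not described in this README.

```python
import difflib


def changed_hunks(before: str, after: str, context: int = 3) -> list[str]:
    """Split a unified diff of two Java sources into one string per hunk."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before.java",
        tofile="after.java",
        n=context,
        lineterm="",
    )
    hunks: list[list[str]] = []
    for line in diff:
        if line.startswith("@@"):
            hunks.append([line])    # a hunk header starts a new hunk
        elif hunks:
            hunks[-1].append(line)  # the ---/+++ file headers are skipped
    return ["\n".join(h) for h in hunks]
```

For a one-line change in a 100-line file with the default three lines of context, this yields a single hunk of nine lines (header, six context lines, one removal, one addition), which is the size reduction the row describes.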

📋 Quick Capability Summary

What You Can Do	Endpoint	Notes
Translate a Java class to Python	POST /api/v1/translate	Choose idiomatic, typed, or literal style
Translate a whole Java project	POST /api/v1/translate-project	Dependency-ordered; handles inheritance graphs
Convert requirements docs to Python stubs	POST /api/v1/translate-requirements	One typed stub + one pytest stub per requirement
Translate only the changed lines (diff)	POST /api/v1/translate-diff	Submit before/after Java; only changed hunks are translated
Ask follow-up questions about translated code	POST /api/v1/chat	Full multi-turn session memory
Get an OWASP code review	POST /api/v1/review	Python and Java supported
Generate a pytest or JUnit 5 test suite	POST /api/v1/generate-tests	Engineers only; contractors are blocked by RBAC
Check if a model has a US government ATO	POST /api/v1/evaluate-model-ato	Scores 6 supply-chain criteria; returns tier + report
Analyse algorithm complexity (Big-O)	POST /api/v1/analyze-algorithm	With session memory for iterative refinement

🔍 Overview

The Secure LLM Assistant is an internal, air-gapped LLM orchestration platform purpose-built for engineering teams modernising legacy Java systems. Its primary mission is to translate Java source code into production-quality Python 3.12+ and convert legacy requirements documents into typed, runnable Python scaffolds — all without a single packet leaving the classified network.

Important

This system makes zero external network calls. Every LLM request, embedding operation, and authentication check routes exclusively to internal services. The K8s NetworkPolicy enforces this at the infrastructure layer — there is no outbound internet rule and none will be added.

Who is it for? Software engineering teams on classified or high-security networks who need to modernise a Java codebase to Python but cannot use cloud LLM APIs. Engineers submit code through the VS Code extension or web app; the platform returns idiomatic, type-annotated Python with an immutable, metadata-only audit record.

What problem does it solve? Large Java codebases (100k–2M LOC) take years to rewrite manually. This platform accelerates that work with an LLM that runs on your GPU node, understands Java class structure via a pure-Python parser, and enforces consistent idiomatic output through explicit, verifiable translation rules — with no IP leaving the building.

Icon	Feature	Description	Impact	Status
🔄	Java → Python Translation	Converts legacy Java to idiomatic Python 3.12+ with type hints, dataclasses, and Pythonic patterns	Primary	✅ Stable
🗂️	Multi-file Project Translation	Dependency graph + topological sort translates whole Java projects in base-class-first order	Primary	✅ Stable
📋	Requirements → Python Scaffold	Translates legacy requirements docs into typed function stubs + pytest stubs	Primary	✅ Stable
💬	Multi-turn Session Memory	Per-user conversation history with TTL, sliding-window budget, and auto session IDs	High	✅ Stable
🧠	RAG Context Enrichment	pgvector semantic search injects internal code context into every translation prompt	High	✅ Stable
🛡️	Prompt Injection Defence	Regex guardrail blocks 8 injection pattern families before any LLM call	Critical	✅ Stable
🔑	JWT/OIDC + RBAC	RS256 token validation against internal IdP; role→permission mapping	Critical	✅ Stable
🔍	Python AST Security Scan	Detects eval/exec/pickle/subprocess injection and hardcoded credentials	High	✅ Stable
☕	Java Structural Analysis	Pure-Python Java parser (no JVM); enriches translation prompts with class metadata	High	✅ Stable
📝	Immutable Audit Trail	Append-only JSONL log; metadata only, with no raw code or prompts ever recorded	Critical	✅ Stable
🔒	Output Redaction	Strips secrets/credentials from every LLM response before reaching the caller	Critical	✅ Stable
🔌	Pluggable LLM Backend + Provider Lock	Switch between on-prem (Ollama/vLLM) and Azure OpenAI Government via env vars; PROVIDER_LOCK=true freezes the config and blocks all PRC-origin models + public egress at startup	Critical	✅ Stable
🚀	One-command Deploy	Docker Compose for dev; K8s manifests for production	High	✅ Stable

Component	File	Responsibility
API Router	src/api/routes.py	9 endpoints; guardrail + session + audit enforcement on all
Auth	src/core/auth.py	RS256 JWT validation; RBAC permission check
Session Store	src/core/session_store.py	In-process per-user conversation history; TTL=3600s; sliding-window budget
LLM Client	src/core/llm_client.py	Internal-only httpx; temp=0; hardened system prompt; history injection
Input Guard	src/guardrails/input_guard.py	Injection · credential · length defence
Output Guard	src/guardrails/output_guard.py	Credential redaction from LLM responses
Translation Tools	src/tools/translation_tools.py	Java→Python · Requirements→Python prompt builders; 3 style directives
Project Translator	src/tools/project_translator.py	Dependency graph + Kahn topological sort for multi-file projects
Java Analyzer	src/tools/java_analyzer.py	Pure-Python Java parser; no JVM required
Python Analyzer	src/tools/python_analyzer.py	AST-based SAST; complexity and security scanning
RAG Retriever	src/rag/retriever.py	Embed query → asyncpg → pgvector top-k → formatted context string
RAG Indexer	services/rag-indexer-python/src/indexer.py	pgvector chunked embedding indexer
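The Session Store row above mentions a TTL and a sliding-window budget. A simplified in-process version might look like the sketch below. It assumes the budget is a message count and uses a single `asyncio.Lock` for the whole store; the class and method names are hypothetical, and the real store may budget by tokens or lock per session.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    messages: list[dict[str, str]] = field(default_factory=list)
    last_used: float = field(default_factory=time.monotonic)


class SessionStore:
    """In-process history keyed by (user, session_id); hypothetical API."""

    def __init__(self, ttl_seconds: float = 3600, max_messages: int = 20) -> None:
        self._ttl = ttl_seconds
        self._max = max_messages  # must be positive
        self._sessions: dict[tuple[str, str], Session] = {}
        self._lock = asyncio.Lock()

    async def append(self, user: str, session_id: str, role: str, content: str) -> None:
        async with self._lock:
            session = self._sessions.setdefault((user, session_id), Session())
            session.messages.append({"role": role, "content": content})
            # Sliding window: keep only the most recent messages.
            del session.messages[: -self._max]
            session.last_used = time.monotonic()

    async def history(self, user: str, session_id: str) -> list[dict[str, str]]:
        async with self._lock:
            session = self._sessions.get((user, session_id))
            return list(session.messages) if session else []

    async def purge_expired(self) -> int:
        """Drop sessions idle longer than the TTL; return how many were removed."""
        now = time.monotonic()
        async with self._lock:
            stale = [k for k, s in self._sessions.items() if now - s.last_used > self._ttl]
            for key in stale:
                del self._sessions[key]
            return len(stale)
```

Keying on both the user and the session ID means one user cannot read another user's history by guessing a session ID.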

Java	Python	Notes
int / long / short	int	Python int is arbitrary precision
float / double	float
boolean	bool
char	str	Single character string
String	str
void return	-> None
byte[]	bytes
Object	Any	From typing
ArrayList<T>	list[T]
LinkedList<T>	collections.deque[T]
HashMap<K,V>	dict[K, V]
LinkedHashMap<K,V>	dict[K, V]	Insertion-ordered since Python 3.7
HashSet<T>	set[T]
Optional<T>	T | None
T[]	list[T]
abstract class	class Foo(ABC)	from abc import ABC, abstractmethod
interface	class Foo(Protocol)	from typing import Protocol
enum	class Foo(enum.Enum)
getter/setter pair	@property + setter
static final field	Module-level CONSTANT
Builder pattern	@dataclass or kwargs
try-with-resources	with statement	Context manager
switch/case	match/case	Python 3.10+
String.format(...)	f-string
System.out.println(...)	print() / logging.info()
instanceof	isinstance()
null	None
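The type rows of the table above can be exercised with a small recursive mapper. This is a sketch only: it handles a subset of the table, passes unknown names through unchanged, and treats `List` like `ArrayList`, which is an assumption based on the `HashMap<String, List<Order>>` example earlier in this README. The names `JAVA_TO_PYTHON` and `map_type` are hypothetical.

```python
# Subset of the mapping table above; unknown names pass through unchanged.
JAVA_TO_PYTHON = {
    "int": "int", "long": "int", "short": "int",
    "float": "float", "double": "float",
    "boolean": "bool", "char": "str", "String": "str",
    "Object": "Any",
    "List": "list", "ArrayList": "list", "LinkedList": "collections.deque",
    "HashMap": "dict", "LinkedHashMap": "dict", "HashSet": "set",
}


def _split_args(args: str) -> list[str]:
    """Split generic arguments on top-level commas only."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(args):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(args[start:i])
            start = i + 1
    parts.append(args[start:])
    return [p.strip() for p in parts]


def map_type(java_type: str) -> str:
    """Translate a Java type expression into a Python annotation string."""
    java_type = java_type.strip()
    if java_type.endswith("[]"):
        if java_type == "byte[]":  # byte[] -> bytes
            return "bytes"
        return f"list[{map_type(java_type[:-2])}]"  # T[] -> list[T]
    if "<" not in java_type:
        return JAVA_TO_PYTHON.get(java_type, java_type)
    base, _, rest = java_type.partition("<")
    base = base.strip()
    args = [map_type(a) for a in _split_args(rest[: rest.rindex(">")])]
    if base == "Optional":  # Optional<T> -> T | None
        return f"{args[0]} | None"
    return f"{JAVA_TO_PYTHON.get(base, base)}[{', '.join(args)}]"
```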

Method	Endpoint	Permission	Session Memory	Description
POST	/api/v1/translate	translate	✅	Java source → idiomatic Python 3.12+ (RAG-enriched)
POST	/api/v1/translate-project	translate	✅	Multi-file Java project → Python package (dependency-ordered, RAG-enriched)
POST	/api/v1/translate-requirements	translate	✅	Requirements doc → typed Python scaffold (RAG-enriched)
POST	/api/v1/assist	code_assist	✅	General code assistance (Python & Java)
POST	/api/v1/review	review	✅	OWASP-informed code review
POST	/api/v1/analyze-algorithm	code_assist	✅	Big-O complexity analysis
POST	/api/v1/generate-tests	test_gen	✅	pytest / JUnit 5 test scaffold generation
POST	/api/v1/chat	code_assist	✅	Pure multi-turn conversational endpoint
DELETE	/api/v1/session/{session_id}	code_assist	-	Clear conversation history for a session
GET	/api/v1/health	none	-	Kubernetes liveness / readiness probe
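The Permission column above implies a role-to-permission lookup before each handler runs. A minimal sketch, assuming hypothetical role names and grants: only the permission names (`translate`, `code_assist`, `review`, `test_gen`) and the fact that contractors cannot generate tests come from this README.

```python
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    # Hypothetical role names and grants, for illustration only.
    "engineer": frozenset({"translate", "code_assist", "review", "test_gen"}),
    "contractor": frozenset({"translate", "code_assist", "review"}),
}


class PermissionDenied(Exception):
    """Would map to an HTTP 403 in a real endpoint."""


def require_permission(roles: list[str], permission: str) -> None:
    """Raise PermissionDenied unless one of `roles` grants `permission`."""
    # Unknown roles grant nothing, so the default is deny.
    granted = set().union(*(ROLE_PERMISSIONS.get(r, frozenset()) for r in roles))
    if permission not in granted:
        raise PermissionDenied(f"missing permission: {permission}")
```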

Agent	Primary Responsibility	Inputs	Outputs	Where It Lives
Orchestrator Agent	Coordinates end-to-end request flow and policy enforcement	API request, user identity, config	Final API response, audit event	services/orchestrator-python/src/main.py, services/orchestrator-python/src/api/routes.py
Auth/RBAC Agent	Verifies JWT and enforces per-endpoint permissions	Bearer token, required permission	Authorized user context or 401/403	services/orchestrator-python/src/core/auth.py
Input Guardrail Agent	Blocks prompt injection, secret leakage, and oversized payloads before model access	Raw code/prompt text	Sanitized text or 400	services/orchestrator-python/src/guardrails/input_guard.py
Java Analysis Agent	Parses Java structure for class metadata and dependency extraction	Java source code	JavaClassInfo metadata	services/orchestrator-python/src/tools/java_analyzer.py
Project Translation Planner Agent	Builds dependency graph and topological order for multi-file translation	{filename: java_source} project map	Ordered translation plan, cycle detection	services/orchestrator-python/src/tools/project_translator.py
RAG Retrieval Agent	Retrieves relevant internal context from pgvector	Query built from class metadata and prompt	Ranked context snippets	services/orchestrator-python/src/rag/retriever.py
Prompt Construction Agent	Produces deterministic translation/review/test prompts with fixed rules	Sanitized input, metadata, style, RAG context	LLM-ready prompt	services/orchestrator-python/src/tools/translation_tools.py
LLM Provider Agent	Sends requests to approved model backends and enforces egress/provider lock policy	Prompt + chat history + provider settings	Raw model response	services/orchestrator-python/src/core/llm_client.py, services/orchestrator-python/src/core/provider_lock.py
Output Guardrail Agent	Redacts secrets and policy-violating output patterns	Raw model response	Safe response payload	services/orchestrator-python/src/guardrails/output_guard.py
Session Memory Agent	Maintains per-user conversational state and TTL cleanup	user_sub, session_id, messages	Retrieved/appended conversation history	services/orchestrator-python/src/core/session_store.py, services/orchestrator-python/src/core/session_backend.py
Audit Agent	Writes immutable metadata-only event records for traceability	Action metadata, user id, tool/model id, blocked flag	JSONL audit event	services/orchestrator-python/src/core/logging.py
Client Agents (VS Code/Web)	Collect user requests and render responses for engineers	User actions and API payloads	Structured calls to orchestrator APIs	frontend/vscode-extension/, frontend/web-app/

Direction	Allowed Peer	Port	Purpose
Ingress	Internal API gateway	8000	API traffic
Egress	LLM inference service	8080	LLM calls
Egress	pgvector (PostgreSQL)	5432	RAG index
Egress	Internal IdP	443	JWKS / token validation
Any	Internet	any	❌ Blocked (no rule exists)

Step	Name	What Happens
1	Categorize	Classify the system's information (Confidentiality / Integrity / Availability impact levels per FIPS 199)
2	Select	Choose security controls from NIST SP 800-53 appropriate to the impact level
3	Implement	Apply the selected controls to the system
4	Assess	An independent assessor (3PAO for FedRAMP, or a DoD SCA for IL systems) verifies the controls work
5	Authorize	The Authorizing Official reviews residual risk and signs the ATO (or issues a Denial of ATO)
6	Monitor	Continuous monitoring: vulnerabilities, configuration drift, and new threats are tracked; ATO is re-evaluated annually or on significant change

Model	Country of Origin	Cloud Required?	Formal US Gov Authorization	Set LLM_MODEL to…
Llama 3.3 70B Instruct	🇺🇸 USA (Meta)	❌ On-prem	⚠️ No blanket ATO; US-origin; RMF-processable	llama3.3:70b
Llama 3.1 405B	🇺🇸 USA (Meta)	❌ On-prem	⚠️ No blanket ATO; highest open-weight capability	llama3.1:405b
Mistral Large / Codestral	🇫🇷 France (EU)	❌ On-prem	⚠️ No US ATO; non-PRC; EU data jurisdiction	mistral-large
Defense Llama	🇺🇸 USA (Scale AI + Meta)	❌ Gov-controlled env	✅ Deployed in classified gov environments (Nov 2024) via Scale Donovan	Contact Scale AI
Claude 3.5 Sonnet v1 / Haiku	🇺🇸 USA (Anthropic)	✅ AWS GovCloud	✅ FedRAMP High + DoD IL4/IL5 via AWS Bedrock GovCloud (May 2025)	via Bedrock adapter
Azure OpenAI GPT-4o	🇺🇸 USA (Microsoft/OpenAI)	✅ Azure Gov cloud	✅ All levels: FedRAMP High (Aug 2024) · IL4/IL5 (Sep 2024) · IL6 Secret (Feb 2025) · ICD-503 Top Secret (Jan 2025)	(set LLM_PROVIDER=azure)
Azure OpenAI GPT-4o-mini	🇺🇸 USA (Microsoft/OpenAI)	✅ Azure Gov cloud	✅ Same authorization as GPT-4o; lower cost	(set LLM_PROVIDER=azure)
Azure Local + Foundry Local	🇺🇸 USA (Microsoft)	❌ Fully on-prem	✅ Runs large models fully disconnected, no cloud needed (Feb 2026)	Contact Microsoft Federal

Backend	Where Does It Run?	Who Controls Weights?	Min VRAM	Connectivity Needed
Ollama (dev)	Your GPU node (Docker)	You (pulled once)	8 GB (quant)	None after pull
vLLM (prod)	Your GPU cluster (K8s)	You	48 GB (Llama 3.3 70B) / 400 GB (405B)	None
Azure OpenAI Gov	Microsoft Azure Gov data centers	Microsoft / OpenAI	None (cloud)	HTTPS to Azure Gov endpoint
Azure Local + Foundry Local	Your on-prem hardware (NVIDIA GPU)	You (Microsoft-managed models)	Large GPU required	None after initial setup
AWS Bedrock GovCloud	Amazon GovCloud region	Amazon / Meta / Anthropic	None (cloud)	HTTPS to AWS GovCloud
Scale Donovan (Defense Llama)	Scale AI gov environment	Scale AI (US cleared)	None	Classified network access

Check	Action on failure
LLM_PROVIDER not in {ollama, vllm, azure}	Process exits; pod restarts indefinitely
Model name matches a blocked pattern (Qwen, DeepSeek, etc.)	Process exits
On-prem endpoint uses localhost or 127.0.0.1	Process exits; a real network address is required
On-prem endpoint resolves to a public cloud domain	Process exits
Azure endpoint doesn't match `*.openai.azure.(com|us)`
Any Azure credential field is empty	Process exits

Enclave / Network	Required Provider Setting	Authorization Basis	Cloud?
Unclassified dev (air-gapped GPU lab)	LLM_PROVIDER=vllm + LLM_MODEL=llama3.3:70b	On-prem; no external auth needed	❌
Unclassified dev (laptop, no GPU)	LLM_PROVIDER=ollama + LLM_MODEL=llama3.3:70b	On-prem dev only	❌
IL2 / Low (NIPRNet-adjacent)	LLM_PROVIDER=azure + GPT-4o	DISA FedRAMP High (Aug 2024)	✅
IL4 / Moderate (CUI, SIPRNet-adjacent)	LLM_PROVIDER=azure + GPT-4o	DISA IL4 PA (Sep 2024)	✅
IL5 / High (National Security Systems)	LLM_PROVIDER=azure + GPT-4o	DISA IL5 PA (Sep 2024)	✅
IL6 / Secret	LLM_PROVIDER=azure + GPT-4o	DISA IL6 auth (Feb 2025)	✅ Gov
ICD-503 / Top Secret	Azure OpenAI in Azure Gov Top Secret cloud	ICD-503 auth (Jan 2025)	✅ Gov TS
Fully disconnected / sovereign (no cloud)	Azure Local + Foundry Local on-prem	Microsoft Sovereign Cloud (Feb 2026)	❌
IL4/IL5 (AWS preference)	LLM_PROVIDER=bedrock (roadmap) + Llama 3 70B or Claude 3.5	FedRAMP High + IL4/IL5 AWS GovCloud (May 2025)	✅ AWS Gov

Scenario	llama.cpp	Ollama	vLLM
No GPU available (CPU-only, classified SCIF)	✅ Best option	⚠️ Slow	❌ Requires CUDA
Edge / disconnected device (Jetson, NUC)	✅	⚠️	❌
Single low-VRAM GPU (8–16 GB, dev)	✅ 4-bit GGUF	✅	⚠️ Limited
Multi-GPU production cluster (high concurrency)	❌ Single-threaded batching	❌	✅ Best option
Minimal attack surface / no Docker daemon	✅ Single binary	❌ Requires Docker	❌

Technology	Version	Role	Why Chosen
Python	3.12+	Orchestrator runtime	Async ecosystem, ast module, match/case
FastAPI	0.115+	REST framework	Pydantic v2, Depends() injection, lifespan hooks
asyncio	stdlib	Concurrency	Per-session locking, background purge task
PyJWT	2.8+	RS256 JWT validation	Asymmetric; orchestrator holds public key only
pydantic-settings	2.3+	Typed config	Fail-fast at boot; zero runtime type surprises
httpx	0.27+	LLM + embedding HTTP client	Async; per-request timeout; no global session
asyncpg	0.29+	pgvector queries	Binary protocol; non-blocking; fastest Python PG driver
javalang	0.13+	Java parsing	Pure Python; no JVM; class/method/field extraction
PostgreSQL 16 + pgvector	pg16	Vector RAG index	Reuses existing Postgres infra; ivfflat ANN search
nomic-embed-text	-	768-dim code embeddings	Long-context (8192 tok); Apache 2.0; self-hosted
Ollama	latest	Dev LLM inference	One-command model pull; OpenAI-compatible API
vLLM	latest	Prod LLM inference	PagedAttention batching; tensor parallelism; same API
Kubernetes	1.28+	Production orchestration	NetworkPolicy CNI-level air-gap enforcement
Docker Compose	v2	Dev stack	One-command full-stack; no K8s locally
Ansible	2.16+	GPU node hardening	Idempotent; disables internet egress on bare metal

Variable	Default	Required	Description
ENV	prod	No	Set dev to enable Swagger UI at /docs
LLM_ENDPOINT	http://llm-inference:8080/v1	Yes	Internal LLM server base URL (Ollama or vLLM)
LLM_MODEL	llama3.3:70b	No	Model tag; PRC-origin models (Qwen, DeepSeek, etc.) are blocked at startup
PROVIDER_LOCK	false	No	Set true to freeze provider config; blocks localhost, public endpoints, and blocked models at startup
LLM_TEMPERATURE	0.0	No	Must remain 0 to ensure deterministic translation output
LLM_MAX_TOKENS	4096	No	Max tokens per LLM response
EMBEDDING_ENDPOINT	http://embedding-server:8080/v1	No	nomic-embed-text server URL (RAG only)
EMBEDDING_MODEL	nomic-embed-text	No	Embedding model name passed to the embeddings API
RAG_ENABLED	true	No	Set false to disable pgvector retrieval entirely
RAG_TOP_K	5	No	Number of RAG chunks injected per translation prompt
VECTOR_DB_URL	-	If RAG_ENABLED	asyncpg-format pgvector connection string
ALLOWED_ORIGINS	https://llm.internal	Yes	CORS allowlist (comma-separated)
AUDIT_LOG_PATH	/var/log/llm-assistant/audit.jsonl	No	Audit log file path (must be append-writable)
MAX_INPUT_TOKENS	8192	No	Input size ceiling (characters); larger inputs are rejected
ENABLE_GUARDRAILS	true	No	Never set false in production
Version	Stability	Tests	Python	What's In It
v1.5	✅ Stable	27 passing	3.12+	Orchestrator, JWT/RBAC, guardrails, Java→Python translation, requirements scaffold, Python/Java static analysis
v2.0	✅ Stable	55 passing	3.12+	RAG retriever (pgvector + nomic-embed-text), multi-file project translation (dependency graph + topo sort), multi-turn session memory (/chat, session TTL, sliding window), VS Code extension (alpha)
v2.5	✅ Complete	169 passing	3.12+	test_provider_lock.py (105 tests), Provider Lock (PROVIDER_LOCK=true), Qwen/DeepSeek model blocklist, egress URL safety enforcement, Azure OpenAI Government support, VS Code extension beta
v3.0	✅ Complete	265 passing	3.12+	Incremental diff translation (/translate-diff), JetBrains plugin scaffold, multi-replica session store (Redis/Valkey), AWS Bedrock GovCloud adapter (LLM_PROVIDER=bedrock), model ATO evaluation framework (/evaluate-model-ato)

Phase	Goals	Target	Status
v1.5	Java→Python + Requirements scaffold, 27-test suite	Q2 2026	✅ Done
v2.0	RAG-enriched translation, multi-file project translation, multi-turn session memory, 55-test suite	Q2 2026	✅ Done
v2.5	test_provider_lock.py (105 tests, 169 total), PROVIDER_LOCK IT config freeze, PRC-origin model blocklist (Qwen/DeepSeek/etc.), egress URL safety, Azure OpenAI Gov support, VS Code extension beta	Q3 2026	✅ Complete
v3.0	Incremental diff translation (/translate-diff), JetBrains plugin, multi-replica session store (Redis/Valkey), AWS Bedrock GovCloud adapter (LLM_PROVIDER=bedrock), model ATO evaluation (/evaluate-model-ato), 265-test suite	Q4 2026	✅ Complete
