Skip to content

Latest commit

 

History

History
537 lines (438 loc) · 19.9 KB

File metadata and controls

537 lines (438 loc) · 19.9 KB

## Overview

Treat AI models as modular, customizable tools that run directly on-device with Foundry Local. This session emphasizes practical workflows for privacy-preserving, low-latency inference and how to integrate these tools via SDKs, APIs, or CLI. You'll also learn how to scale to Azure AI Foundry when needed.

🔄 Updated for Modern SDK: This module has been aligned with the latest Microsoft Foundry-Local repository patterns and matches the intelligent routing implementation in samples/06/. The examples now use the modern foundry-local-sdk and advanced model selection strategies.

🏗️ Architecture Highlights:

  • Intelligent Model Routing: Keyword-based selection between general, reasoning, code, and creative models
  • Modern SDK Integration: Uses FoundryLocalManager with automatic service discovery
  • Environment Configuration: Flexible model assignment via environment variables
  • Health Monitoring: Service validation and model availability checking
  • Production Ready: Comprehensive error handling and fallback mechanisms

📁 Local Implementation:

  • samples/06/router.py - Intelligent model router with keyword-based selection
  • samples/06/model_router.ipynb - Interactive examples and benchmarks
  • samples/06/README.md - Configuration and usage instructions

References:

Overview

Treat AI models as modular, customizable tools that run directly on-device with Foundry Local. This session emphasizes practical workflows for privacy-preserving, low-latency inference and how to integrate these tools via SDKs, APIs, or CLI. You’ll also learn how to scale to Azure AI Foundry when needed.

References:

Learning Objectives

  • Design model-as-a-tool patterns on-device
  • Integrate via OpenAI-compatible REST API or SDKs
  • Customize models to domain-specific use cases
  • Plan for hybrid scaling to Azure AI Foundry

Part 1: Intelligent Model Router (Modern Implementation)

Goal: Implement intelligent model selection with automatic routing based on query content.

📋 Note: This implementation matches the patterns used in samples/06/router.py with advanced keyword-based model selection.

Step 1) Define modern model router with FoundryLocalManager

# router/intelligent_router.py
from foundry_local import FoundryLocalManager
from openai import OpenAI
from typing import Dict, Any, Optional
import os
import json

class ModelRouter:
    """Intelligent model router that selects appropriate models for different task types."""
    
    def __init__(self):
        self.client = None
        self.base_url = None
        self.tools = self._load_tool_registry()
        self._initialize_client()
    
    def _load_tool_registry(self) -> Dict[str, Dict[str, Any]]:
        """Load tool registry from environment or use defaults."""
        default_tools = {
            "general": {
                "model": os.environ.get("GENERAL_MODEL", "phi-4-mini"),
                "notes": "Fast general-purpose chat and Q&A",
                "temperature": 0.7
            },
            "reasoning": {
                "model": os.environ.get("REASONING_MODEL", "deepseek-r1-7b"),
                "notes": "Step-by-step analysis and logical reasoning",
                "temperature": 0.3
            },
            "code": {
                "model": os.environ.get("CODE_MODEL", "qwen2.5-7b"),
                "notes": "Code generation, debugging, and technical tasks",
                "temperature": 0.2
            },
            "creative": {
                "model": os.environ.get("CREATIVE_MODEL", "phi-4-mini"),
                "notes": "Creative writing and storytelling",
                "temperature": 0.9
            }
        }
        
        # Check for environment override
        tools_env = os.environ.get("TOOL_REGISTRY")
        if tools_env:
            try:
                return json.loads(tools_env)
            except json.JSONDecodeError:
                print("Warning: Invalid TOOL_REGISTRY JSON, using defaults")
        
        return default_tools

Step 2) Initialize client with modern SDK and service discovery

    def _initialize_client(self):
        """Initialize OpenAI client with Foundry Local or fallback configuration."""
        try:
            from foundry_local import FoundryLocalManager
            # Try to use any available model for client initialization
            first_model = next(iter(self.tools.values()))["model"]
            manager = FoundryLocalManager(first_model)
            
            self.client = OpenAI(
                base_url=manager.endpoint,
                api_key=manager.api_key
            )
            self.base_url = manager.endpoint
            print(f"✅ Foundry Local SDK initialized")
        except Exception as e:
            print(f"Warning: Could not use Foundry SDK ({e}), falling back to manual configuration")
            # Fallback to manual configuration
            self.base_url = os.environ.get("BASE_URL", "http://localhost:8000")
            api_key = os.environ.get("API_KEY", "")
            
            self.client = OpenAI(
                base_url=f"{self.base_url}/v1",
                api_key=api_key
            )
            print(f"Initialized manual configuration at {self.base_url}")
    
    def select_tool(self, user_query: str) -> str:
        """Select the most appropriate tool based on the user query."""
        query_lower = user_query.lower()
        
        # Code-related keywords
        code_keywords = ["code", "python", "function", "class", "method", "bug", "debug", 
                        "programming", "script", "algorithm", "implementation", "refactor"]
        if any(keyword in query_lower for keyword in code_keywords):
            return "code"
        
        # Reasoning keywords
        reasoning_keywords = ["why", "how", "explain", "step-by-step", "reason", "analyze", 
                             "think", "logic", "because", "cause", "compare", "evaluate"]
        if any(keyword in query_lower for keyword in reasoning_keywords):
            return "reasoning"
        
        # Creative keywords
        creative_keywords = ["story", "poem", "creative", "imagine", "write", "tale", 
                           "narrative", "fiction", "character", "plot"]
        if any(keyword in query_lower for keyword in creative_keywords):
            return "creative"
        
        # Default to general
        return "general"
    
    def chat(self, model: str, content: str, max_tokens: int = 300, temperature: Optional[float] = None) -> str:
        """Send chat completion request to the specified model."""
        try:
            params = {
                "model": model,
                "messages": [{"role": "user", "content": content}],
                "max_tokens": max_tokens
            }
            
            if temperature is not None:
                params["temperature"] = temperature
            
            response = self.client.chat.completions.create(**params)
            return response.choices[0].message.content
        except Exception as e:
            return f"Error generating response with model {model}: {str(e)}"

Step 3) Implement intelligent routing and execution (see samples/06/router.py)

    def route_and_run(self, prompt: str) -> Dict[str, Any]:
        """Route the prompt to the appropriate model and generate response."""
        tool_key = self.select_tool(prompt)
        tool_config = self.tools[tool_key]
        model = tool_config["model"]
        temperature = tool_config.get("temperature", 0.7)
        
        print(f"🎯 Selected tool: {tool_key} (model: {model})")
        
        answer = self.chat(
            model=model, 
            content=prompt, 
            max_tokens=400, 
            temperature=temperature
        )
        
        return {
            "tool": tool_key,
            "model": model,
            "tool_description": tool_config["notes"],
            "temperature": temperature,
            "answer": answer
        }
    
    def check_service_health(self) -> Dict[str, Any]:
        """Check Foundry Local service health and available models."""
        try:
            models_response = self.client.models.list()
            available_models = [model.id for model in models_response.data]
            
            return {
                "status": "healthy",
                "base_url": self.base_url,
                "available_models": available_models,
                "tools_configured": list(self.tools.keys())
            }
        except Exception as e:
            return {
                "status": "error",
                "base_url": self.base_url,
                "error": str(e)
            }

if __name__ == "__main__":
    # Ensure: foundry model run phi-4-mini
    router = ModelRouter()
    
    # Check health
    health = router.check_service_health()
    print(f"Service Health: {json.dumps(health, indent=2)}")
    
    # Test different query types
    queries = [
        "Write a Python function to calculate fibonacci numbers",  # -> code
        "Explain step-by-step why the sky is blue",  # -> reasoning
        "Tell me a creative story about AI",  # -> creative
        "What's the weather like today?"  # -> general
    ]
    
    for query in queries:
        result = router.route_and_run(query)
        print(f"\nQuery: {query}")
        print(f"Selected: {result['tool']} -> {result['model']}")
        print(f"Answer: {result['answer'][:100]}...")

Part 2: Modern SDK Integration (Step-by-step)

Goal: Use the Foundry Local SDK with OpenAI Python SDK for seamless integration.

Step 1) Install dependencies

cd Module08
.\.venv\Scripts\activate
pip install foundry-local-sdk openai

Step 2) Configure environment (optional - see samples/06/README.md)

REM Override default models per tool
set GENERAL_MODEL=phi-4-mini
set REASONING_MODEL=deepseek-r1-7b
set CODE_MODEL=qwen2.5-7b
REM Or provide a full JSON registry
set TOOL_REGISTRY={"general":{"model":"phi-4-mini"},"reasoning":{"model":"deepseek-r1-7b"}}

Step 3) Modern SDK integration

# modern_sdk_demo.py
from foundry_local import FoundryLocalManager
from openai import OpenAI
import sys

def main():
    """Demonstrate modern SDK integration."""
    try:
        # Initialize with FoundryLocalManager
        alias = "phi-4-mini"
        manager = FoundryLocalManager(alias)
        
        # Create OpenAI client using Foundry Local endpoint
        client = OpenAI(
            base_url=manager.endpoint,
            api_key=manager.api_key
        )
        
        # Get model info
        model_info = manager.get_model_info(alias)
        print(f"Using model: {model_info.id}")
        
        # Make request with streaming
        stream = client.chat.completions.create(
            model=model_info.id,
            messages=[{"role": "user", "content": "Explain edge AI benefits in one paragraph."}],
            stream=True,
            max_tokens=200
        )
        
        print("Response: ", end="")
        for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()
        
    except Exception as e:
        print(f"Error: {e}")
        print("Ensure Foundry Local is running with: foundry model run phi-4-mini")
        sys.exit(1)

if __name__ == "__main__":
    main()

Part 3: Domain Customization (Step-by-step)

Goal: Tailor outputs for a domain using prompt templates and JSON schema.

Step 1) Create a domain prompt template

# domain/templates.py
BUSINESS_ANALYST_SYSTEM = """
You are a senior business analyst. Provide:
1) Key insights
2) Risks
3) Next steps
Respond in valid JSON with fields: insights, risks, next_steps.
"""

Step 2) Enforce JSON output

# domain/analyst.py
import requests, os, json

BASE_URL = os.getenv("OPENAI_BASE_URL", "http://localhost:8000/v1")
API_KEY = os.getenv("OPENAI_API_KEY", "local-key")
HEADERS = {"Content-Type":"application/json","Authorization":f"Bearer {API_KEY}"}

from domain.templates import BUSINESS_ANALYST_SYSTEM

def analyze(text: str) -> dict:
    messages = [
        {"role":"system","content": BUSINESS_ANALYST_SYSTEM},
        {"role":"user","content": f"Analyze this business text:\n{text}"}
    ]
    r = requests.post(f"{BASE_URL}/chat/completions", json={
    "model":"phi-4-mini",
        "messages": messages,
        "response_format": {"type":"json_object"},
        "temperature": 0.3
    }, headers=HEADERS, timeout=60)
    r.raise_for_status()
    # Parse JSON content
    content = r.json()["choices"][0]["message"]["content"]
    return json.loads(content)

if __name__ == "__main__":
    print(analyze("Sales dipped 12% in Q3 due to supply constraints and marketing cuts."))

Part 4: Offline and Security Posture (Step-by-step)

Goal: Ensure privacy and resilience when running models as tools locally.

Step 1) Pre-warm and validate local endpoint

foundry model run phi-4-mini
curl http://localhost:8000/v1/models

Step 2) Sanitize inputs

# security/sanitize.py
import re
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")

def sanitize(text: str) -> str:
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text

Step 3) Local-only flag and logging

# security/local_only.py
import os, json, time
LOG = os.getenv("MODELS_AS_TOOLS_LOG", "./tools_logs.jsonl")

def record(event: dict):
    with open(LOG, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

# Usage before each call
def before_call(tool_name, payload):
    record({"ts": time.time(), "tool": tool_name, "event": "before_call"})

# After each call
def after_call(tool_name, result):
    record({"ts": time.time(), "tool": tool_name, "event": "after_call"})

Part 5: Production Deployment and Scaling

Goal: Deploy the intelligent router with monitoring and Azure AI Foundry integration.

📋 Note: The local implementation in samples/06/model_router.ipynb includes comprehensive examples of production deployment patterns.

Step 1) Production router with monitoring (see samples/06/router.py)

# production/router.py
from router.intelligent_router import ModelRouter
import json
import time
import sys

class ProductionModelRouter(ModelRouter):
    """Production-ready model router with monitoring and logging."""
    
    def __init__(self):
        super().__init__()
        self.request_count = 0
        self.error_count = 0
        self.start_time = time.time()
    
    def route_and_run_with_monitoring(self, prompt: str) -> Dict[str, Any]:
        """Route with comprehensive monitoring and error handling."""
        start_time = time.time()
        self.request_count += 1
        
        try:
            result = self.route_and_run(prompt)
            processing_time = time.time() - start_time
            
            # Log successful request
            self._log_request({
                "status": "success",
                "tool": result["tool"],
                "model": result["model"],
                "processing_time": processing_time,
                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
            })
            
            result["processing_time"] = processing_time
            return result
            
        except Exception as e:
            self.error_count += 1
            error_result = {
                "status": "error",
                "error": str(e),
                "processing_time": time.time() - start_time,
                "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
            }
            
            self._log_request(error_result)
            return error_result
    
    def _log_request(self, data: Dict[str, Any]):
        """Log request data for monitoring."""
        print(f"📊 {json.dumps(data)}")
    
    def get_stats(self) -> Dict[str, Any]:
        """Get router statistics."""
        uptime = time.time() - self.start_time
        return {
            "uptime_seconds": uptime,
            "total_requests": self.request_count,
            "error_count": self.error_count,
            "success_rate": (self.request_count - self.error_count) / max(1, self.request_count),
            "requests_per_minute": self.request_count / max(1, uptime / 60)
        }

def main():
    """Production router demo."""
    router = ProductionModelRouter()
    
    # Health check
    health = router.check_service_health()
    if health["status"] == "error":
        print(f"❌ Service health check failed: {health['error']}")
        sys.exit(1)
    
    print(f"✅ Service healthy with {len(health['available_models'])} models")
    
    # Process user query
    user_prompt = " ".join(sys.argv[1:]) or "Write three benefits of on-device AI in JSON format."
    print(f"\n🎯 Processing: {user_prompt}")
    
    result = router.route_and_run_with_monitoring(user_prompt)
    
    if result.get("status") == "error":
        print(f"❌ Error: {result['error']}")
    else:
        print(f"\n📋 Result:")
        print(f"Tool: {result['tool']} -> Model: {result['model']}")
        print(f"Processing Time: {result['processing_time']:.2f}s")
        print(f"Answer: {result['answer']}")
    
    # Show stats
    stats = router.get_stats()
    print(f"\n📊 Statistics: {json.dumps(stats, indent=2)}")

if __name__ == "__main__":
    main()

Hands-On Checklist

  • Implement intelligent model router with keyword-based selection (samples/06/router.py)
  • Configure multiple specialized models (general, reasoning, code, creative)
  • Test the interactive Jupyter notebook (samples/06/model_router.ipynb)
  • Set up environment-based model configuration
  • Implement service health monitoring and error handling
  • Deploy production router with comprehensive logging

Local Sample Integration

Run the complete implementation:

cd Module08
.\.venv\Scripts\activate

REM Start required models
foundry model run phi-4-mini
foundry model run qwen2.5-7b
foundry model run deepseek-r1-7b

REM Test the intelligent router
python samples\06\router.py "Write a Python function to sort a list"
python samples\06\router.py "Explain step-by-step how bubble sort works"
python samples\06\router.py "Tell me a creative story about robots"

REM Explore the interactive notebook
jupyter notebook samples/06/model_router.ipynb

References and Next Steps

  • Local Implementation: samples/06/ - Complete intelligent router with multiple model support
  • Microsoft Samples: Hello Foundry Local
  • Integration Docs: Integrate with Inference SDKs
  • Advanced Patterns: Explore function calling and multi-agent orchestration in Module 5

Wrap-up

Foundry Local enables robust on-device AI where models become intelligent, specialized tools. With automatic model selection, comprehensive monitoring, and production-ready patterns, teams can ship sophisticated AI applications that adapt to different task types while maintaining privacy and performance. The intelligent router pattern demonstrated here provides a foundation for building complex AI systems that can scale from local development to production deployment.