ARGUS (Automated Retrieval and GPT Understanding System) is a document intelligence platform that provides APIs for document processing, AI-powered extraction, and chat-based document interaction. The system is built with FastAPI and integrates with Azure services for scalable document processing.
- Production: Container App endpoint (configured via Azure deployment)
- Local Development:
http://localhost:8000 - Frontend: Next.js app (deployed as a Container App)
- Azure Services: Managed identity authentication via
DefaultAzureCredential— no API keys - Service-to-Service: User-assigned managed identity with RBAC role assignments
- Local Auth: Disabled on all Azure services (
disableLocalAuth: true) - CORS: Enabled for frontend integration
Root Health Check
Simple health check endpoint to verify service availability.
Response:
{
"status": "healthy",
"service": "ARGUS Backend"
}Detailed Health Check
Comprehensive health check including Azure service connectivity.
Response:
{
"status": "healthy",
"timestamp": "2025-07-17T10:30:00.123456Z",
"services": {
"storage": "connected",
"cosmos_db": "connected"
}
}Error Response (503):
{
"detail": "Service unhealthy"
}Handle Event Grid Blob Events
Processes Azure Event Grid blob created events for automatic document processing.
Request Body:
{
"eventType": "Microsoft.Storage.BlobCreated",
"subject": "/blobServices/default/containers/documents/blobs/sample.pdf",
"data": {
"url": "https://storage.blob.core.windows.net/documents/sample.pdf"
}
}Response:
{
"message": "Blob processing queued successfully",
"blob_url": "https://storage.blob.core.windows.net/documents/sample.pdf"
}Manual Blob Processing
Manually trigger processing of a blob in Azure Storage.
Request Body:
{
"blob_url": "https://storage.blob.core.windows.net/documents/sample.pdf",
"dataset": "default-dataset"
}Response:
{
"message": "Blob processing queued successfully",
"blob_url": "https://storage.blob.core.windows.net/documents/sample.pdf"
}Process File (Logic App Integration)
Endpoint called by Azure Logic Apps to process uploaded files.
Request Body:
{
"filename": "invoice.pdf",
"dataset": "invoices",
"blob_path": "invoices/invoice.pdf",
"trigger_source": "logic_app"
}Response:
{
"message": "File invoice.pdf queued for processing",
"filename": "invoice.pdf",
"dataset": "invoices",
"blob_url": "https://storage.blob.core.windows.net/invoices/invoice.pdf"
}Get Configuration
Retrieve current system configuration from Cosmos DB including datasets, prompts and schemas.
Response:
{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {
"default-dataset": {
"system_prompt": "Extract key information from the document...",
"output_schema": {
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"total_amount": {"type": "number"}
}
},
"processing_options": {
"include_ocr": true,
"include_images": true,
"enable_summary": true,
"enable_evaluation": true,
"ocr_provider": "azure"
}
},
"medical-dataset": {
"system_prompt": "Extract medical information...",
"output_schema": {...},
"processing_options": {
"ocr_provider": "mistral"
}
}
}
}Default Response (when no config exists):
{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {}
}Update Configuration
Update system configuration in Cosmos DB with new prompts, schemas, or settings.
Request Body:
{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {
"default-dataset": {
"system_prompt": "Updated extraction prompt...",
"output_schema": {
"type": "object",
"properties": {
"document_type": {"type": "string"},
"key_data": {"type": "object"}
}
}
}
}
}Response:
{
"status": "success",
"message": "Configuration updated"
}Refresh Configuration
Force refresh configuration by reloading demo datasets from filesystem.
Response:
{
"status": "success",
"message": "Configuration refreshed from demo files"
}List Datasets
Get all available dataset configurations.
Response:
{
"datasets": [
{
"name": "default-dataset",
"system_prompt_preview": "Extract all data from the document...",
"schema_fields": ["invoice_number", "vendor_name", "total_amount"],
"max_pages_per_chunk": 10
},
{
"name": "medical-dataset",
"system_prompt_preview": "Extract medical information...",
"schema_fields": ["patient_name", "diagnosis", "medications"],
"max_pages_per_chunk": 5
}
]
}Create Dataset
Create a new dataset configuration for document processing.
Request Body:
{
"dataset_name": "purchase-orders",
"system_prompt": "Extract purchase order information including vendor details, line items, quantities, prices, and total amounts. Return data in the specified JSON format.",
"output_schema": {
"vendor_name": "Name of the vendor/supplier",
"po_number": "Purchase order number",
"order_date": "Date of the order (YYYY-MM-DD format)",
"line_items": [
{
"description": "Item description",
"quantity": "Number of units",
"unit_price": "Price per unit",
"total": "Line item total"
}
],
"subtotal": "Sum of all line items",
"tax": "Tax amount if applicable",
"total_amount": "Final total including tax"
},
"max_pages_per_chunk": 10
}Response (201):
{
"success": true,
"dataset_name": "purchase-orders",
"message": "Dataset 'purchase-orders' created successfully",
"configuration": {
"system_prompt_length": 156,
"output_schema_fields": ["vendor_name", "po_number", "order_date", "line_items", "subtotal", "tax", "total_amount"],
"max_pages_per_chunk": 10
}
}Error Response (400):
{
"detail": "Dataset name must contain only alphanumeric characters and hyphens"
}{
"detail": "Dataset 'purchase-orders' already exists. Use update_dataset to modify it."
}List Dataset Documents
Get all documents in a specific dataset.
Response:
{
"dataset": "default-dataset",
"documents": [
{
"id": "default-dataset__invoice-001.pdf",
"filename": "invoice-001.pdf",
"status": "completed",
"processed_at": "2025-07-17T10:30:00Z"
}
],
"count": 1
}Get Concurrency Settings
Retrieve current Logic App concurrency configuration.
Response:
{
"enabled": true,
"trigger_concurrency": 5,
"workflow_name": "document-processing-workflow",
"subscription_id": "12345678-1234-1234-1234-123456789012",
"resource_group": "argus-rg",
"logic_app_name": "argus-logic-app"
}Error Response (503 - Logic App Manager not initialized):
{
"detail": "Logic App Manager not initialized"
}Update Concurrency Settings
Update Logic App concurrency settings for optimal performance.
Request Body:
{
"max_runs": 10
}Response:
{
"success": true,
"message": "Concurrency updated successfully",
"new_max_runs": 10,
"backend_semaphore_updated": true,
"backend_max_concurrent": 10
}Get Workflow Definition
Retrieve the current Logic App workflow definition.
Response:
{
"definition": {
"triggers": {
"When_a_blob_is_added_or_modified": {
"type": "ApiConnection",
"inputs": {...}
}
},
"actions": {
"Process_Document": {
"type": "Http",
"inputs": {...}
}
}
},
"parameters": {...}
}Update Full Concurrency Settings
Update both trigger and action concurrency settings.
Request Body:
{
"max_runs": 15
}Response:
{
"success": true,
"message": "Full concurrency settings updated successfully",
"trigger_concurrency_updated": true,
"action_concurrency_updated": true
}Concurrency Diagnostics
Get diagnostic information about Logic App Manager setup.
Response:
{
"timestamp": "2025-07-17T10:30:00.123456Z",
"logic_app_manager_initialized": true,
"logic_app_manager_enabled": true,
"subscription_id_configured": true,
"resource_group_configured": true,
"logic_app_name_configured": true,
"environment_variables": {
"AZURE_SUBSCRIPTION_ID": true,
"AZURE_RESOURCE_GROUP_NAME": true,
"LOGIC_APP_NAME": true
},
"environment_values": {
"AZURE_SUBSCRIPTION_ID": "12345678...",
"AZURE_RESOURCE_GROUP_NAME": "argus-rg",
"LOGIC_APP_NAME": "argus-logic-app"
},
"azure_credentials_available": true
}Get OpenAI Settings
Retrieve current OpenAI configuration from environment variables (read-only). Authentication uses managed identity — no API keys are stored or transmitted.
Response:
{
"openai_endpoint": "https://myopenai.openai.azure.com/",
"deployment_name": "gpt-5.4",
"authentication": "managed_identity",
"note": "Configuration is read from environment variables only. Authentication uses Azure Managed Identity (DefaultAzureCredential). Update deployment settings via infrastructure (Bicep/azd)."
}Update OpenAI Settings
Update OpenAI endpoint and deployment name by modifying environment variables. API key is not required — authentication uses managed identity.
Request Body:
{
"openai_endpoint": "https://myopenai.openai.azure.com/",
"deployment_name": "gpt-5.4"
}Response:
{
"message": "Environment variables updated successfully",
"config": {
"openai_endpoint": "https://myopenai.openai.azure.com/",
"deployment_name": "gpt-5.4"
}
}Chat with Document
Interactive chat endpoint for asking questions about processed documents.
Request Body:
{
"document_id": "sample-invoice-123",
"message": "What is the total amount on this invoice?",
"chat_history": [
{
"role": "user",
"content": "Previous question..."
},
{
"role": "assistant",
"content": "Previous response..."
}
]
}Response:
{
"response": "The total amount on this invoice is $1,250.00.",
"finish_reason": "stop",
"usage": {
"prompt_tokens": 456,
"completion_tokens": 23,
"total_tokens": 479
}
}Error Responses:
400: Missing required parameters or no extracted data available404: Document not found500: Chat processing failed
Note: These endpoints are only available in local development mode (main_local.py) for testing without Azure dependencies.
List Documents (Local Development Only)
Get all documents in local memory storage.
Response:
{
"documents": [
{
"id": "sample-invoice-123",
"properties": {
"blob_name": "sample-invoice.pdf",
"blob_size": 12345,
"request_timestamp": "2025-07-17T10:00:00.123456",
"num_pages": 2,
"dataset": "default-dataset"
},
"state": {
"file_landed": true,
"ocr_completed": true,
"gpt_extraction_completed": true,
"gpt_evaluation_completed": false,
"gpt_summary_completed": false,
"processing_completed": false
},
"extracted_data": {
"ocr_output": "Sample OCR text...",
"gpt_output": {"invoice_number": "INV-001", "total": 1250.00},
"gpt_evaluation": {},
"gpt_summary": ""
}
}
],
"count": 1
}Get Document (Local Development Only)
Retrieve a specific document by ID.
Response:
{
"id": "sample-invoice-123",
"properties": {
"blob_name": "sample-invoice.pdf",
"blob_size": 12345,
"request_timestamp": "2025-07-17T10:00:00.123456",
"num_pages": 2,
"dataset": "default-dataset"
},
"state": {
"file_landed": true,
"ocr_completed": true,
"gpt_extraction_completed": true,
"gpt_evaluation_completed": false,
"gpt_summary_completed": false,
"processing_completed": false
},
"extracted_data": {
"ocr_output": "Sample OCR text...",
"gpt_output": {"invoice_number": "INV-001", "total": 1250.00},
"gpt_evaluation": {},
"gpt_summary": ""
}
}Error Response (404):
{
"detail": "Document not found"
}Update Document (Local Development Only)
Update a document in local storage.
Request Body:
{
"id": "sample-invoice-123",
"properties": {
"blob_name": "updated-invoice.pdf",
"blob_size": 15000,
"request_timestamp": "2025-07-17T11:00:00.123456",
"num_pages": 3,
"dataset": "invoices"
},
"state": {
"file_landed": true,
"ocr_completed": true,
"gpt_extraction_completed": true,
"gpt_evaluation_completed": true,
"gpt_summary_completed": true,
"processing_completed": true
},
"extracted_data": {
"ocr_output": "Updated OCR text...",
"gpt_output": {"invoice_number": "INV-002", "total": 1500.00},
"gpt_evaluation": {"quality_score": 0.95},
"gpt_summary": "This is an invoice for services..."
}
}Response:
{
"message": "Document updated successfully",
"id": "sample-invoice-123"
}Delete Document (Local Development Only)
Remove a document from local storage.
Response:
{
"message": "Document deleted successfully",
"id": "sample-invoice-123"
}Error Response (404):
{
"detail": "Document not found"
}Note: These endpoints are only available in local development mode (main_local.py) and are NOT implemented in the production backend.
Upload File (Local Development Only)
Upload files for processing in local development environment.
Request: Multipart form data
file: File content (UploadFile)dataset_name: Target dataset name (default: "default-dataset")
Response:
{
"message": "File uploaded successfully",
"id": "uploaded-20250717-103000-invoice.pdf",
"filename": "invoice.pdf",
"dataset": "default-dataset",
"status": "uploaded"
}List Datasets (Local Development Only)
Get available datasets in local development.
Response:
["default-dataset", "medical-dataset", "test-dataset"]Get Dataset Files (Local Development Only)
List files in a specific dataset (mock data in local development).
Response:
[
{
"filename": "invoice-001.pdf",
"size": 12345,
"uploaded_at": "2025-06-17T09:00:00Z"
},
{
"filename": "receipt-002.pdf",
"size": 8765,
"uploaded_at": "2025-06-17T08:30:00Z"
}
]Process Document (Local Development Only)
Start processing an uploaded document in local development.
Response:
{
"message": "Document processing started",
"id": "uploaded-20250717-103000-invoice.pdf",
"status": "processing"
}Error Response (404):
{
"detail": "Document not found"
}Get Statistics (Local Development Only)
Get processing statistics for local development.
Response:
{
"total_documents": 5,
"completed_documents": 3,
"pending_documents": 2,
"success_rate": 0.6
}Delete Document (Local Development Only)
Remove a document from local storage.
Response:
{
"message": "Document deleted successfully",
"id": "sample-invoice-123"
}Error Response (404):
{
"detail": "Document not found"
}Important: The production backend does NOT have document management endpoints like /api/documents. Document data in production is stored in Azure Cosmos DB and accessed through the chat endpoint or through direct Cosmos DB queries.
For production document access:
- Chat Interface: Use
/api/chatto interact with processed documents - Direct Database Access: Query Cosmos DB directly for document retrieval
- Frontend Integration: The Streamlit frontend handles document listing and management
The frontend expects these endpoints that are NOT implemented in the current backend:
POST /api/upload- ❌ Only available in local developmentGET /api/datasets- ❌ Only available in local developmentGET /api/datasets/{dataset_name}/files- ❌ Only available in local developmentDELETE /api/documents/{document_id}- ❌ Only available in local developmentPOST /api/documents/{document_id}/reprocess- ❌ Not implemented anywhere
Important: The production backend does NOT have a direct file upload endpoint. Instead, files are processed through:
- Azure Blob Storage: Files uploaded to Azure Storage trigger Event Grid events
- Event Grid Integration:
/api/blob-createdprocesses these events - Logic Apps:
/api/process-filehandles Logic App-triggered processing - Manual Processing:
/api/process-blobfor manual blob processing
For production file uploads, you need to:
- Upload files directly to Azure Blob Storage using Azure SDKs or Storage Explorer
- Configure Azure Logic Apps for automated processing triggers
- Use Event Grid to automatically trigger processing when files are uploaded
- Or manually trigger processing via
/api/process-blobwith the blob URL
{
"id": "string",
"properties": {
"blob_name": "string",
"blob_size": "number",
"request_timestamp": "string (ISO datetime)",
"num_pages": "number",
"dataset": "string"
},
"state": {
"file_landed": "boolean",
"ocr_completed": "boolean",
"gpt_extraction_completed": "boolean",
"gpt_evaluation_completed": "boolean",
"gpt_summary_completed": "boolean",
"processing_completed": "boolean"
},
"extracted_data": {
"ocr_output": "string",
"gpt_output": "object",
"gpt_evaluation": "object",
"gpt_summary": "string"
}
}{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {
"dataset-name": {
"system_prompt": "string",
"output_schema": "object",
"max_pages": "number",
"processing_options": {
"include_ocr": "boolean",
"include_images": "boolean",
"enable_summary": "boolean",
"enable_evaluation": "boolean",
"ocr_provider": "string (azure|mistral)"
}
}
}
}ARGUS supports two OCR providers for document text extraction:
-
Azure Document Intelligence (default)
- Uses Azure's Document Intelligence service
- Requires
DOCUMENT_INTELLIGENCE_ENDPOINTenvironment variable - Configured with
"ocr_provider": "azure"
-
Mistral Document AI (alternative)
- Uses Mistral's Document AI API
- Requires
MISTRAL_DOC_AI_ENDPOINTandMISTRAL_DOC_AI_KEYenvironment variables - Configured with
"ocr_provider": "mistral" - Supports base64-encoded PDFs and images
- Can use structured extraction with bbox annotation
Example Configuration with Mistral:
{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {
"medical-dataset": {
"system_prompt": "Extract medical information...",
"output_schema": {...},
"processing_options": {
"include_ocr": true,
"include_images": true,
"enable_summary": true,
"enable_evaluation": true,
"ocr_provider": "mistral"
}
}
}
}Environment Variables Required for Mistral:
MISTRAL_DOC_AI_ENDPOINT=https://your-endpoint.services.ai.azure.com/providers/mistral/azure/ocr
MISTRAL_DOC_AI_KEY=your-mistral-api-key{
"eventType": "Microsoft.Storage.BlobCreated",
"subject": "string",
"data": {
"url": "string",
"api": "string",
"requestId": "string",
"eTag": "string",
"contentType": "string",
"contentLength": "number",
"blobType": "string"
},
"eventTime": "string (ISO datetime)",
"id": "string",
"dataVersion": "string",
"metadataVersion": "string"
}{
"document_id": "string",
"message": "string",
"chat_history": [
{
"role": "user | assistant",
"content": "string"
}
]
}{
"response": "string",
"finish_reason": "stop | length | content_filter",
"usage": {
"prompt_tokens": "number",
"completion_tokens": "number",
"total_tokens": "number"
}
}- 200: Success
- 400: Bad Request - Invalid parameters or missing required fields
- 404: Not Found - Document or resource not found
- 500: Internal Server Error - Processing failed or Azure service unavailable
- 503: Service Unavailable - Health check failed or dependencies unavailable
{
"detail": "Error message describing what went wrong"
}Configuration Issues:
503: "Configuration container not available" - Cosmos DB connection issue503: "Logic App Manager not initialized" - Azure credentials or environment variables missing
Document Processing:
400: "Missing required parameters: filename, dataset, blob_path" - Invalid request body404: "Document not found" - Document ID doesn't exist in storage500: "Internal server error" - Azure service communication failure
Chat Endpoint:
400: "document_id and message are required" - Missing required fields400: "No extracted data available for this document" - Document hasn't been processed404: "Document not found" - Invalid document ID
ARGUS provides MCP support for AI assistant integration, enabling tools like GitHub Copilot, Claude, and other MCP-compatible clients to interact with your document intelligence platform.
ARGUS uses Streamable HTTP transport for MCP communication. This is the modern MCP standard (as of the 2025-03-26 protocol revision) which provides:
- Single endpoint for all MCP communication
- Better reliability and scalability
- Proper session handling with
Mcp-Session-Idheader
MCP Server Information
Get information about the MCP server and available tools.
Response:
{
"name": "argus",
"description": "ARGUS Document Intelligence MCP Server",
"version": "1.0.0",
"transport": "streamable-http",
"endpoints": {
"mcp": "/mcp"
},
"tools": [
{"name": "argus_list_documents", "description": "List all processed documents"},
{"name": "argus_get_document", "description": "Get detailed document information"},
{"name": "argus_chat_with_document", "description": "Ask questions about a document"},
{"name": "argus_list_datasets", "description": "List available dataset configurations"},
{"name": "argus_get_dataset_config", "description": "Get dataset configuration details"},
{"name": "argus_create_dataset", "description": "Create a new dataset configuration"},
{"name": "argus_process_document_url", "description": "Queue document for processing"},
{"name": "argus_get_extraction", "description": "Get extracted data from document"},
{"name": "argus_search_documents", "description": "Search documents by keyword"},
{"name": "argus_get_upload_url", "description": "Get a pre-signed SAS URL for direct blob upload"}
],
"configuration_example": {
"mcpServers": {
"argus": {
"url": "https://<your-backend-url>/mcp"
}
}
}
}Note: The
urlinconfiguration_exampleis dynamically generated based on the request origin. The example above shows a placeholder.
### `/mcp`
**Streamable HTTP MCP Endpoint**
This is the main MCP endpoint that handles all protocol communication. It supports both GET and POST methods:
- **GET**: Returns a streaming response for server-to-client messages
- **POST**: Accepts MCP protocol messages (JSON-RPC format)
**Headers:**
- `Content-Type: application/json` (for POST requests)
- `Mcp-Session-Id` (optional): Session identifier for stateful interactions
**Example Tool Call (POST):**
```json
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "argus_list_documents",
"arguments": {
"dataset": "default-dataset",
"limit": 10
}
}
}
List all processed documents with optional filtering.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
dataset |
string | No | Filter by dataset name |
status |
string | No | Filter by status (pending, processing, completed, failed) |
limit |
integer | No | Maximum number of documents to return (default: 50) |
Get detailed information about a specific document.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
document_id |
string | Yes | The document ID to retrieve |
Ask natural language questions about a document.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
document_id |
string | Yes | The document to chat about |
message |
string | Yes | Your question or message |
chat_history |
array | No | Previous conversation messages |
List all available dataset configurations.
Arguments: None
Get the configuration details for a specific dataset.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
dataset_name |
string | Yes | Name of the dataset |
Queue a document from blob storage for processing.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
blob_url |
string | Yes | Full Azure Blob Storage URL |
dataset |
string | No | Dataset to use (default: "default-dataset") |
Get the extracted structured data from a document.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
document_id |
string | Yes | The document ID |
Search documents by keyword across content and metadata.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
query |
string | Yes | Search keyword or phrase |
dataset |
string | No | Limit search to specific dataset |
limit |
integer | No | Maximum results (default: 20) |
Get a pre-signed SAS URL for direct document upload to Azure Blob Storage.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
filename |
string | Yes | Name for the uploaded file |
dataset |
string | No | Target dataset (default: "default-dataset") |
Response:
{
"upload_url": "https://storage.blob.core.windows.net/datasets/...",
"method": "PUT",
"headers": {
"x-ms-blob-type": "BlockBlob",
"Content-Type": "application/pdf"
},
"filename": "invoice.pdf",
"dataset": "default-dataset",
"blob_path": "default-dataset/invoice.pdf",
"expires_in": "1 hour",
"instructions": [
"Upload your file using HTTP PUT to the upload_url",
"Set header 'x-ms-blob-type: BlockBlob'",
"Set header 'Content-Type: application/pdf'",
"The file body should be the raw file content (not base64)",
"After upload, ARGUS will automatically process the document"
]
}Create a new dataset configuration for document processing.
Arguments:
| Name | Type | Required | Description |
|---|---|---|---|
dataset_name |
string | Yes | Name for the dataset (alphanumeric and hyphens only, 2-50 chars) |
system_prompt |
string | Yes | System prompt guiding document extraction (min 10 chars) |
output_schema |
object | Yes | JSON object defining the expected extraction structure |
max_pages_per_chunk |
integer | No | Maximum pages per processing chunk (default: 10) |
Response:
{
"success": true,
"dataset_name": "purchase-orders",
"message": "Dataset 'purchase-orders' created successfully",
"configuration": {
"system_prompt_length": 245,
"output_schema_fields": ["vendor_name", "order_number", "items", "total_amount"],
"max_pages_per_chunk": 10
},
"next_steps": [
"Upload documents to this dataset using argus_get_upload_url with dataset='purchase-orders'",
"Or list documents using argus_list_documents with dataset='purchase-orders'"
]
}VS Code / GitHub Copilot:
{
"mcpServers": {
"argus": {
"url": "https://<your-backend-url>/mcp"
}
}
}Tip: Get your backend URL from the ARGUS frontend MCP page, or run
azd showafter deployment.
Claude Desktop:
{
"mcpServers": {
"argus": {
"url": "https://<your-backend-url>/mcp"
}
}
}# Run local development server
cd src/containerapp
python main_local.py
# Health check
curl -X GET "http://localhost:8000/health"
# Upload a file (local development only)
curl -X POST "http://localhost:8000/api/upload" \
-F "file=@invoice.pdf" \
-F "dataset_name=invoices"
# List documents
curl -X GET "http://localhost:8000/api/documents"
# Get processing statistics
curl -X GET "http://localhost:8000/api/stats"# 1. Manual blob processing (most common)
curl -X POST "http://your-container-app.azurecontainerapps.io/api/process-blob" \
-H "Content-Type: application/json" \
-d '{
"blob_url": "https://storage.blob.core.windows.net/docs/invoice.pdf",
"dataset": "invoices"
}'
# 2. Logic App integration (automated)
curl -X POST "http://your-container-app.azurecontainerapps.io/api/process-file" \
-H "Content-Type: application/json" \
-d '{
"filename": "invoice.pdf",
"dataset": "invoices",
"blob_path": "invoices/invoice.pdf",
"trigger_source": "logic_app"
}'
# 3. Chat with processed document
curl -X POST "http://your-container-app.azurecontainerapps.io/api/chat" \
-H "Content-Type: application/json" \
-d '{
"document_id": "invoice-123",
"message": "What is the total amount on this invoice?"
}'# Get current configuration
curl -X GET "http://your-container-app.azurecontainerapps.io/api/configuration"
# Update configuration with new dataset
curl -X POST "http://your-container-app.azurecontainerapps.io/api/configuration" \
-H "Content-Type: application/json" \
-d '{
"id": "configuration",
"partitionKey": "configuration",
"datasets": {
"custom-dataset": {
"system_prompt": "Extract all key financial data from the document...",
"output_schema": {
"type": "object",
"properties": {
"amount": {"type": "number"},
"date": {"type": "string"},
"vendor": {"type": "string"}
}
}
}
}
}'
# Refresh configuration from files
curl -X POST "http://your-container-app.azurecontainerapps.io/api/configuration/refresh"# Check current concurrency settings
curl -X GET "http://your-container-app.azurecontainerapps.io/api/concurrency"
# Update concurrency limit
curl -X PUT "http://your-container-app.azurecontainerapps.io/api/concurrency" \
-H "Content-Type: application/json" \
-d '{"max_runs": 10}'
# Get detailed diagnostics
curl -X GET "http://your-container-app.azurecontainerapps.io/api/concurrency/diagnostics"# Get current OpenAI settings
curl -X GET "http://your-container-app.azurecontainerapps.io/api/openai-settings"
# Update OpenAI endpoint and model
curl -X PUT "http://your-container-app.azurecontainerapps.io/api/openai-settings" \
-H "Content-Type: application/json" \
-d '{
"openai_endpoint": "https://my-openai.openai.azure.com/",
"openai_key": "your-api-key",
"deployment_name": "gpt-4"
}'- FastAPI Backend: Container App running the main API
- Azure Cosmos DB: Document storage and configuration
- Azure Blob Storage: File storage with Event Grid integration
- Azure Logic Apps: Automated workflow triggers
- Azure OpenAI: AI processing for extraction and chat
- FastAPI Backend: Local server with in-memory storage
- Mock Services: Simulated Azure services for testing
- File Upload: Direct file upload for testing without Azure dependencies
- File Upload: To Azure Blob Storage (production) or local endpoint (development)
- Event Trigger: Event Grid notification or manual API call
- OCR Processing: Azure Document Intelligence extracts text
- AI Extraction: Azure OpenAI processes text with custom prompts
- Storage: Results stored in Cosmos DB (production) or memory (local)
- Chat Interface: Interactive Q&A about processed documents
- Project Repository: Azure-Samples/ARGUS
- Architecture Overview: See
docs/ArchitectureOverview.png - Setup Instructions: See
README.md - Frontend Documentation: See
frontend/directory - Infrastructure: See
infra/directory for Bicep templates
Last Updated: July 17, 2025 Version: 1.0.0