23 changes: 23 additions & 0 deletions .gitignore
@@ -38,6 +38,8 @@ out/
# Logs
logs/
*.log
verif_log.txt
verif_analytics_log.txt

# Temporary files
tmp/
@@ -49,3 +51,24 @@ backups/
docs/.vitepress/cache/
docs/.vitepress/dist/


# Windows
Thumbs.db
ehthumbs.db
Desktop.ini

# Executables
*.exe
*.dll

# Local/Secret files
test_output.txt
secrets.json
*.local

# Hackathon runtime files
hackathon/watch/processed/
hackathon/watch/failed/
hackathon/sample-data/demo-call.mp3
__pycache__/
*.pyc
40 changes: 40 additions & 0 deletions docs/jsonld-ex-evaluation.md
@@ -0,0 +1,40 @@
# Evaluation of @jsonld-ex/core Integration for vCon MCP Server

## Executive Summary

The `@jsonld-ex/core` library extends JSON-LD with features specifically targeted at AI/ML data modeling, security, and validation. Given the vCon standard's focus on conversation data, analysis, and integrity, this library offers significant potential benefits, particularly for **Analysis** and **Integrity** layers. However, integration should be approached as an **enhancement layer** rather than a core replacement to maintain strict compliance with the IETF vCon draft.

## Feature Mapping

| vCon Feature | @jsonld-ex Feature | Potential Benefit |
| :--- | :--- | :--- |
| **Analysis Confidence** | `@confidence` | **High**. Standardizes confidence scoring (0.0-1.0) in analysis outputs (transcripts, sentiment), replacing ad-hoc fields. |
| **Content Integrity** | `@integrity` | **High**. Provides a standard mechanism for cryptographic content verification, superseding manual `content_hash` checks. |
| **Embeddings** | `@vector` | **Medium**. Standardizes vector representation. Useful if vCons are exchanged between systems using different vector stores. |
| **Provenance** | `@source` | **Medium**. Enhances tracking of which model/vendor generated an analysis, linking directly to model cards or endpoints. |
| **Validation** | `@shape` | **High**. Offers native JSON-LD validation, potentially more robust than JSON Schema for graph-based data. |

## Pros & Cons

### Pros
1. **Standardization**: Moves ad-hoc metadata (like confidence scores) into a standardized, interoperable format.
2. **Security**: Native support for integrity checks and signing (`@integrity`) is critical for trusted AI pipelines.
3. **Interoperability**: Makes vCon data more consumable by other JSON-LD aware AI agents and tools.
4. **Future-Proofing**: Aligns with the trend of using Knowledge Graphs for AI memory.

### Cons
1. **Complexity**: JSON-LD processing (expansion/compaction) introduces overhead compared to raw JSON handling.
2. **Compliance Risk**: The IETF vCon draft defines a strict JSON schema. Adding `@` properties directly might require using the `extensions` mechanism to remain compliant.
3. **Dependency**: Adds a core dependency. If the library is experimental or lacks broad adoption, it introduces maintenance risk.

## Recommendation

**Proceed with integration as a Plugin/Extension.**

We should **NOT** replace the core `VCon` type or storage model immediately. Instead, we should integrate `@jsonld-ex/core` to enhance specific capabilities:

1. **Enhanced Analysis Plugin**: Create a plugin that outputs Analysis objects enriched with `@confidence` and `@source`.
2. **Integrity Verification Tool**: Use `@jsonld-ex` to implement a robust verification tool that checks `@integrity` of vCons.
3. **Export as JSON-LD**: Add an API endpoint `GET /vcons/:uuid/jsonld` that returns the vCon expanded with JSON-LD context, allowing external tools to leverage the semantic data.

This approach provides the benefits of semantic AI data without breaking existing IETF compliance or performance for standard operations.
107 changes: 107 additions & 0 deletions docs/jsonld-integration.md
@@ -0,0 +1,107 @@
# JSON-LD Integration & Integrity Guide

This guide details the `jsonld-ex` integration in the vCon MCP Server, which adds semantic web capabilities, AI metadata enrichment, and cryptographic integrity to vCons.

## Overview

The integration provides three key features:
1. **JSON-LD Context**: Maps vCon terms to standard URIs.
2. **Enrichment**: Adds `@confidence` scores and `@source` provenance to Analysis objects.
3. **Integrity**: Provides tamper-evident signing using SHA-256 hashing.

## 1. JSON-LD Context

The vCon server now supports converting standard vCons to JSON-LD format. This allows vCons to be linked with other semantic data.

### Usage

```typescript
import { toJsonLd } from '../src/jsonld/context.js';
import { VCon } from '../src/types/vcon.js';

const vcon: VCon = { ... }; // Your standard vCon
const jsonLdVcon = toJsonLd(vcon);

console.log(jsonLdVcon['@context']);
// Outputs: ["https://validator.vcon.dev/vcon.jsonld", ...]
```

## 2. Analysis Enrichment

AI extensions allow you to attach metadata to `analysis` blocks, such as confidence scores and model sources.

### `@confidence`
A floating-point value between 0.0 and 1.0 indicating the certainty of the analysis.

### `@source`
A URI indicating the origin of the analysis (e.g., specific model endpoint).

### Example

```typescript
import { enrichAnalysis } from '../src/jsonld/enrichment.js';

const analysis = {
type: "transcript",
vendor: "openai",
body: "Hello world"
};

// Add confidence (0.98) and source
const enriched = enrichAnalysis(
analysis,
0.98,
"https://api.openai.com/v1/chat/completions"
);

// Result:
// {
// ...analysis,
// "@confidence": 0.98,
// "@source": "https://api.openai.com/v1/chat/completions"
// }
```
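For illustration, a minimal sketch of what an enrichment helper could look like internally. This is a hypothetical stand-in, not the actual implementation in `src/jsonld/enrichment.ts`, which may validate inputs differently:

```typescript
// Hypothetical sketch of an enrichment helper; the real enrichAnalysis
// in src/jsonld/enrichment.ts may differ in validation and field handling.
interface Analysis {
  type: string;
  vendor?: string;
  body?: unknown;
  [key: string]: unknown;
}

function enrichAnalysisSketch(
  analysis: Analysis,
  confidence: number,
  source: string
): Analysis {
  // Reject out-of-range scores so downstream consumers can trust the field.
  if (confidence < 0 || confidence > 1) {
    throw new RangeError("@confidence must be within 0.0 and 1.0");
  }
  return { ...analysis, "@confidence": confidence, "@source": source };
}
```

The original object is left untouched; the helper returns a shallow copy with the two `@`-prefixed fields appended.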

## 3. Integrity & Signing

Ensure vCons have not been tampered with by adding a cryptographic signature. The server uses a deterministic SHA-256 hash of the vCon content (excluding the `@integrity` field itself).

### Signing a vCon

```typescript
import { signVCon } from '../src/jsonld/integrity.js';

const vcon = { ... };
const signedVCon = signVCon(vcon);

console.log(signedVCon['@integrity']);
// Outputs: "sha256-a1b2c3d4..."
```

### Verifying Integrity

Verification recalculates the hash and compares it to the `@integrity` field.

```typescript
import { verifyIntegrity } from '../src/jsonld/integrity.js';

const isValid = verifyIntegrity(signedVCon);

if (isValid) {
console.log("vCon is authentic and untampered.");
} else {
  console.error("Integrity check failed! Data may be corrupted or tampered with.");
}
```

### How Verification Works
1. Removes the existing `@integrity` field.
2. Serializes the JSON using `fast-json-stable-stringify` (deterministic ordering).
3. Computes SHA-256 hash.
4. Compares computed hash with the provided hash.
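The four steps above can be sketched as follows. This is a simplified stand-in, not the server's actual code: `stableStringify` approximates `fast-json-stable-stringify`, and the real module may handle edge cases differently.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for fast-json-stable-stringify: recursively
// serializes objects with keys in sorted order for a deterministic result.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return `{${keys
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    .join(",")}}`;
}

function computeIntegrity(vcon: Record<string, unknown>): string {
  // Step 1: remove the existing @integrity field before hashing.
  const { ["@integrity"]: _omit, ...rest } = vcon;
  // Steps 2-3: deterministic serialization, then SHA-256.
  const hash = createHash("sha256").update(stableStringify(rest)).digest("hex");
  return `sha256-${hash}`;
}

function verifyIntegritySketch(vcon: Record<string, unknown>): boolean {
  // Step 4: compare the recomputed hash with the stored one.
  return vcon["@integrity"] === computeIntegrity(vcon);
}
```

Because serialization is deterministic, any reordering of keys in transit still verifies, while any change to a value does not.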

## Best Practices

* **Sign Last**: Always sign the vCon *after* all modifications (including enrichment) are complete.
* **Enrich First**: Add confidence scores and sources before signing so they are protected by the integrity hash.
* **Transport**: JSON-LD vCons are valid JSON and can be stored/transmitted exactly like standard vCons.
62 changes: 62 additions & 0 deletions docs/mongodb/architecture.md
@@ -0,0 +1,62 @@
# MongoDB Architecture in vCon MCP Server

This document outlines the architectural design of the MongoDB integration within the vCon MCP Server.

## Overview

The server supports a dual-database architecture, allowing it to run with either Supabase (PostgreSQL) or MongoDB as the backend. This is achieved through strict interface abstraction and dynamic dependency injection.

## Core Interfaces

All database interactions are governed by the following interfaces defined in `src/db/interfaces.ts` and `src/db/types.ts`:

1. **`IVConQueries`**:
- Defines CRUD operations for vCons (Create, Read, Update, Delete).
- Defines Search operations (Keyword, Semantic, Hybrid).
2. **`IDatabaseInspector`**:
- Provides methods to inspect database structure (collections/tables, indexes, schema).
- Provides database statistics.
3. **`IDatabaseAnalytics`**:
- Provides business logic analytics (growth trends, tagging stats, attachment breakdowns).
4. **`IDatabaseSizeAnalyzer`**:
- Analyzes storage usage and provides smart recommendations for query limits.
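To make the abstraction concrete, here is a hedged sketch of the query contract and a trivial in-memory stand-in. The actual method names and signatures live in `src/db/interfaces.ts` and may differ:

```typescript
// Hypothetical shape of the query interface; see src/db/interfaces.ts
// for the real contract.
interface VCon {
  uuid: string;
  [key: string]: unknown;
}

interface IVConQueriesSketch {
  createVCon(vcon: VCon): Promise<string>;
  getVCon(uuid: string): Promise<VCon | null>;
}

// Minimal in-memory stand-in showing how any backend satisfies the contract.
class InMemoryQueries implements IVConQueriesSketch {
  private store = new Map<string, VCon>();

  async createVCon(vcon: VCon): Promise<string> {
    this.store.set(vcon.uuid, vcon);
    return vcon.uuid;
  }

  async getVCon(uuid: string): Promise<VCon | null> {
    return this.store.get(uuid) ?? null;
  }
}
```

Both `MongoVConQueries` and the Supabase implementation plug into the server the same way, which is what lets the MCP tools stay backend-agnostic.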

## MongoDB Implementation

The MongoDB implementation resides in `src/db/`:

| Component | Class | File | Description |
| :--- | :--- | :--- | :--- |
| **Client** | `MongoDatabaseClient` | `mongo-client.ts` | Manages connection pool. |
| **Queries** | `MongoVConQueries` | `mongo-queries.ts` | Implements `IVConQueries`. |
| **Inspector** | `MongoDatabaseInspector` | `mongo-inspector.ts` | Implements `IDatabaseInspector`. |
| **Analytics** | `MongoDatabaseAnalytics` | `mongo-analytics.ts` | Implements `IDatabaseAnalytics`. |
| **Size** | `MongoDatabaseSizeAnalyzer` | `mongo-size-analyzer.ts` | Implements `IDatabaseSizeAnalyzer`. |

### Data Model

- **Collection**: `vcons`
- Stores the full vCon object as a single document.
- Uses MongoDB Text Index for keyword search.
- **Collection**: `vcon_embeddings`
- Stores vector embeddings separately to allow for optimized vector search.
- Fields: `vcon_id`, `embedding` (array of floats), `created_at`.
- Uses Atlas Vector Search Index (`vector_index`) for semantic search.
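An illustrative document shape for `vcon_embeddings`, with field types inferred from the description above (the actual schema may carry additional fields):

```typescript
// Inferred shape of a vcon_embeddings document; field types are
// assumptions based on the data-model description above.
interface VConEmbeddingDoc {
  vcon_id: string;
  embedding: number[]; // array of floats consumed by Atlas Vector Search
  created_at: Date;
}

const exampleDoc: VConEmbeddingDoc = {
  vcon_id: "example-uuid", // hypothetical identifier
  embedding: [0.12, -0.34, 0.56],
  created_at: new Date(),
};
```

Keeping embeddings in their own collection means the large float arrays are not loaded on ordinary vCon reads.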

### Aggregation Pipelines

Complex analytics are implemented using MongoDB Aggregation Framework.
- **Growth Trends**: Uses `$group` by date parts of `created_at`.
- **Tag Analytics**: Uses `$project` and `$unwind` to normalize tag arrays/objects before grouping.
- **Vector Search**: Uses `$vectorSearch` stage (available in Atlas) for similarity search.
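For example, the growth-trends query could take roughly this shape. This is an illustrative pipeline, not the one in `mongo-analytics.ts`, which may group by different date parts:

```typescript
// Illustrative growth-trends pipeline: counts vCons per month.
// The real pipeline in mongo-analytics.ts may differ.
const growthTrendsPipeline = [
  {
    $group: {
      _id: {
        year: { $year: "$created_at" },
        month: { $month: "$created_at" },
      },
      count: { $sum: 1 },
    },
  },
  { $sort: { "_id.year": 1, "_id.month": 1 } },
];
```

A pipeline like this would be passed to `db.collection("vcons").aggregate(...)`, pushing the grouping work to the database rather than the application.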

## Dependency Injection

The server determines which backend to use at runtime in `src/server/setup.ts`:

1. Checks `process.env.DB_TYPE`.
2. If `'mongodb'`, dynamically imports MongoDB classes and initializes `MongoClient`.
3. If `'supabase'` (default), initializes `SupabaseClient`.
4. Injects the selected implementation into `ServerContext`.

This allows the core server logic and MCP tools to remain agnostic of the underlying database.
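The selection logic can be sketched as below. The helper name and shape are hypothetical; the actual wiring in `src/server/setup.ts` also performs the dynamic imports and client construction:

```typescript
// Hypothetical sketch of the backend selection described above;
// the real logic lives in src/server/setup.ts.
type DbType = "mongodb" | "supabase";

function resolveDbType(env: Record<string, string | undefined>): DbType {
  // Supabase is the default when DB_TYPE is unset or unrecognized.
  return env.DB_TYPE === "mongodb" ? "mongodb" : "supabase";
}
```

In the server, the resolved type then drives a dynamic `import()` of either the MongoDB or Supabase classes before they are injected into `ServerContext`.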
39 changes: 39 additions & 0 deletions docs/mongodb/indexes.md
@@ -0,0 +1,39 @@
# MongoDB Atlas Vector Search Index Definition

To enable Semantic Search and Hybrid Search, you must create a Vector Search Index on the `vcon_embeddings` collection in MongoDB Atlas.

## 1. Create the Index

1. Go to your MongoDB Atlas Cluster.
2. Navigate to **Atlas Search** -> **Vector Search**.
3. Click **Create Index**.
4. Select your Database and the **`vcon_embeddings`** collection.
5. Enter the **Index Name**: `vector_index` (This name is hardcoded in `MongoVConQueries`).
6. Choose **JSON Editor** and paste the following configuration:

```json
{
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 384,
"similarity": "cosine"
},
{
"type": "filter",
"path": "vcon_id"
},
{
"type": "filter",
"path": "content_type"
}
]
}
```

## 2. Verify

Once the index status changes to **Active**, the `semanticSearch` functionality in the vCon MCP server will automatically start returning results.

> **Note**: `numDimensions` is set to **384** here and must match the output dimensionality of the embedding model used in `embed-vcons.ts`. If you use a different model, update this value accordingly (for example, OpenAI's `text-embedding-3-small` outputs 1536 dimensions by default).
84 changes: 84 additions & 0 deletions docs/mongodb/setup.md
@@ -0,0 +1,84 @@
# MongoDB Setup Guide for vCon MCP Server

This guide provides instructions on how to set up the vCon MCP Server with a MongoDB backend.

## Prerequisites
- **Node.js**: v18 or higher
- **MongoDB**: v6.0 or higher (Atlas recommended for Vector Search)
- **OpenAI API Key**: Required for generating embeddings

## 1. Environment Configuration

Create or update your `.env` file with the following variables:

```env
# Database Selection (mongodb or supabase)
DB_TYPE=mongodb

# MongoDB Connection String
# Format: mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?appName=<appname>
MONGO_URL=mongodb+srv://user:pass@cluster.mongodb.net/?appName=vcon-app

# Optional: Specific Database Name (default: vcon)
MONGO_DB_NAME=vcon

# Embedding Configuration (Required for Vector Search)
OPENAI_API_KEY=sk-proj-...
```

## 2. Atlas Vector Search Setup

To enable Semantic and Hybrid search, you must create a Vector Search Index on your MongoDB Atlas cluster.

1. **Create Collection**: Ensure the `vcon_embeddings` collection exists in your database.
2. **Create Index**:
- Go to **Atlas UI** -> **Database** -> **Search**.
- Click **Create Search Index**.
- Select **JSON Editor**.
   - Select your target database and collection: `vcon.vcon_embeddings`.
- Name the index: `vector_index`.
- Input the following definition:

```json
{
"fields": [
{
"numDimensions": 1536,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
},
{
"path": "vcon_id",
"type": "filter"
},
{
"path": "created_at",
"type": "filter"
}
]
}
```

> [!NOTE]
> If you are using a different embedding model (e.g., Azure OpenAI), ensure `numDimensions` matches your model's output (e.g., 1536 for text-embedding-3-small).

## 3. Text Search Index

For standard keyword search functionality, a text index is required on the `vcons` collection. The server will attempt to create this automatically on startup, but you can also create it manually:

```javascript
db.vcons.createIndex({ "$**": "text" }, { name: "TextIndex" })
```

## 4. Verification

Run the verification scripts to ensure everything is configured correctly:

```bash
# Verify Core CRUD and Search
npx tsx scripts/verify-mongo.ts

# Verify Analytics and Inspector
npx tsx scripts/verify-mongo-analytics.ts
```