
Advanced Topics

hitchhiker edited this page Feb 7, 2026 · 6 revisions


This page covers advanced use cases, customisation, and optimisation of Fold.


A-MEM Memory Evolution Tuning

A-MEM automatically suggests and creates links between related memories. You can tune how aggressive this linking is.

Configuration

# How many neighbours to consider for linking
A_MEM_NEIGHBOUR_COUNT=5              # Default: 5

# Minimum confidence score for auto-linking
A_MEM_MIN_CONFIDENCE=0.75            # Default: 0.75 (0-1)

# Whether to auto-create suggested links
A_MEM_AUTO_ACCEPT_LINKS=false        # Default: false (manual review)

How It Works

When a new memory is created:

  1. Find the N nearest neighbours via vector similarity (N = A_MEM_NEIGHBOUR_COUNT)
  2. Ask the LLM: "Should we link these memories?"
  3. The LLM returns link suggestions with confidence scores
  4. Suggestions below A_MEM_MIN_CONFIDENCE are discarded; if A_MEM_AUTO_ACCEPT_LINKS=true, the remaining links are created automatically
  5. Otherwise, they are stored as pending suggestions for manual approval
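The confidence threshold and auto-accept flag interact as in this minimal sketch. The function and field names here are illustrative, not Fold's actual internals:

```python
# Illustrative triage of LLM link suggestions (steps 3-5 above).
MIN_CONFIDENCE = 0.75   # A_MEM_MIN_CONFIDENCE
AUTO_ACCEPT = False     # A_MEM_AUTO_ACCEPT_LINKS

def triage_suggestions(suggestions):
    """Split suggestions into auto-created links and pending reviews."""
    created, pending = [], []
    for s in suggestions:
        if s["confidence"] < MIN_CONFIDENCE:
            continue  # below threshold: dropped entirely
        (created if AUTO_ACCEPT else pending).append(s)
    return created, pending

created, pending = triage_suggestions([
    {"target": "mem_a", "confidence": 0.9},
    {"target": "mem_b", "confidence": 0.6},
])
```

With the defaults shown, the 0.6-confidence suggestion is dropped and the 0.9 one lands in the pending queue for manual review.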

Tuning Strategies

Aggressive linking (startups, fast-moving projects):

A_MEM_NEIGHBOUR_COUNT=10
A_MEM_MIN_CONFIDENCE=0.60
A_MEM_AUTO_ACCEPT_LINKS=true

Conservative linking (stable projects, mature codebases):

A_MEM_NEIGHBOUR_COUNT=3
A_MEM_MIN_CONFIDENCE=0.85
A_MEM_AUTO_ACCEPT_LINKS=false

Memory Decay Customisation

Fine-tune how recent and frequently-accessed memories are prioritised.

Parameters

# How long until memory strength halves (days)
DECAY_HALF_LIFE_DAYS=30

# Blend factor: 0=pure semantic, 1=pure strength
DECAY_STRENGTH_WEIGHT=0.3

Strength Formula

strength = recency × access_boost
recency = exp(-age / half_life)
access_boost = log(retrieval_count + 1)
combined_score = (1 - weight) × relevance + weight × strength
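The formula above can be written out directly. This is a sketch of the scoring maths as documented, not Fold's implementation:

```python
import math

def combined_score(relevance, age_days, retrieval_count,
                   half_life_days=30.0, weight=0.3):
    """Blend semantic relevance with decay-adjusted strength."""
    recency = math.exp(-age_days / half_life_days)       # halves roughly every half-life
    access_boost = math.log(retrieval_count + 1)         # grows slowly with retrievals
    strength = recency * access_boost
    return (1 - weight) * relevance + weight * strength
```

Note that with access_boost = log(retrieval_count + 1), a never-retrieved memory has strength 0 regardless of how recent it is, so its score is purely (1 - weight) × relevance.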

Project-Specific Tuning

Per-project configuration in fold/project.toml:

[decay]
half_life_days = 30          # How quickly memories fade
strength_weight = 0.3        # How much decay affects ranking

Examples:

Fast-moving projects (recent context matters):

[decay]
half_life_days = 7           # Fade in a week
strength_weight = 0.5        # 50% weight to recency

Reference projects (decisions are timeless):

[decay]
half_life_days = 365         # Fade in a year
strength_weight = 0.1        # Only 10% weight to recency

LLM Provider Configuration

Fold supports multiple LLM providers with automatic fallback.

Provider Priority Chain

GOOGLE_API_KEY=...           # Try Gemini first
ANTHROPIC_API_KEY=...        # Fallback to Claude
OPENAI_API_KEY=...           # Fallback to OpenAI
OPENROUTER_API_KEY=...       # Last resort

Fold tries providers in order. If Gemini times out, it tries Claude. If Claude is rate-limited, it tries OpenAI.
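The fallback behaviour amounts to a loop over the configured providers. A minimal sketch, assuming each provider exposes a single completion call that raises on timeouts or rate limits:

```python
# Illustrative fallback chain; provider call shape is an assumption.
def complete_with_fallback(providers, prompt):
    """Try each (name, call) pair in priority order; return the first success."""
    errors = {}
    for name, call in providers:  # ordered: Gemini, Claude, OpenAI, OpenRouter
        try:
            return call(prompt)
        except Exception as exc:  # timeout, 429, etc.
            errors[name] = exc    # remember why this provider was skipped
    raise RuntimeError(f"all providers failed: {errors}")

def gemini(prompt):
    raise TimeoutError("timed out")

def claude(prompt):
    return "ok"

result = complete_with_fallback([("gemini", gemini), ("claude", claude)], "hello")
```

Here Gemini times out, so the request falls through to Claude, matching the behaviour described above.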

Cost Optimisation

Minimise costs (Gemini free tier):

GOOGLE_API_KEY=...           # Only set this
# Don't set others

Automatic failover (redundancy):

GOOGLE_API_KEY=...           # Primary (cheapest)
OPENROUTER_API_KEY=...       # Fallback (more expensive)

Load balancing (OpenRouter):

OPENROUTER_API_KEY=...       # Single endpoint, multiple models

Temperature and Parameters

# Model parameters (per-provider)
# Set via environment or code configuration
LLM_TEMPERATURE=0.7          # Creativity (0=deterministic, 1=creative)
LLM_MAX_TOKENS=1024          # Max output length

Custom Embedding Models

Configure which embedding model to use for semantic search.

Built-in Options

# Gemini embeddings (fast, 768 dimensions)
EMBEDDING_PROVIDER=gemini
EMBEDDING_MODEL=gemini-embedding-001

# OpenAI embeddings (high quality, 1536 dimensions)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small

Dimension Implications

Larger dimensions = more expressive but slower:

| Dimensions | Speed  | Quality   | Use Case                        |
|------------|--------|-----------|---------------------------------|
| 384        | Fast   | Good      | Code and technical docs         |
| 768        | Medium | Very good | General purpose                 |
| 1536       | Slower | Excellent | Complex semantic understanding  |

Fold Storage Customisation

Memories are stored in fold/ as markdown files with git-native storage.

Directory Structure

fold/
├── a/b/hash1.md       # Hash-based storage
├── a/c/hash2.md
├── 9/a/hash3.md
└── project.toml       # Per-project config

Why hash-based:

  • Repo path determines identity (SHA256 first 16 chars)
  • Stable identity across content changes
  • Deterministic paths
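The path scheme above can be sketched in a few lines. The exact string Fold feeds into SHA-256 is an assumption here; what matters is the deterministic mapping from identity to a two-level fan-out directory:

```python
import hashlib

def memory_path(repo_relative_path):
    """Derive a fold/ storage path: SHA-256, first 16 hex chars,
    fanned out by the first two characters of the digest."""
    digest = hashlib.sha256(repo_relative_path.encode()).hexdigest()[:16]
    return f"fold/{digest[0]}/{digest[1]}/{digest}.md"

path = memory_path("src/main.rs")
```

Because the path depends only on the hashed identity, the same file always maps to the same markdown file even as its content changes.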

Per-Project Configuration

Create fold/project.toml:

[project]
id = "proj_abc123"
slug = "my-app"
name = "My Application"

[indexing]
# Include patterns (customise as needed)
# See Configuration docs for full list of 50+ supported file types
include = [
  "**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx",
  "**/*.py", "**/*.rs", "**/*.go", "**/*.java",
  "**/*.cs", "**/*.kt", "**/*.swift", "**/*.rb",
  "**/*.md", "**/*.json", "**/*.yaml", "**/*.txt"
]

# Exclude patterns
exclude = [
  "node_modules/**",
  "dist/**",
  "*.test.ts",
  "*.spec.ts"
]

# Skip large files (KB)
max_file_size = 100

[embedding]
provider = "gemini"
model = "gemini-embedding-001"
dimension = 768

[decay]
half_life_days = 30
strength_weight = 0.3

Rebuilding from Fold

If your SQLite database becomes corrupted, rebuild it from fold/:

# Fold will auto-detect and rebuild on startup
# Or manually trigger:
curl -X POST http://localhost:8765/api/projects/my-app/index/rebuild \
  -H "Authorization: Bearer $TOKEN"

Job Queue Tuning

Background jobs (indexing, embedding generation) are queued and processed asynchronously.

Configuration

# Max concurrent jobs
JOB_WORKER_THREADS=4

# Job timeout (seconds)
JOB_TIMEOUT=300              # 5 minutes

# Max retry attempts
JOB_MAX_RETRIES=3

# Retry backoff (exponential)
JOB_INITIAL_BACKOFF=5        # Start with 5 seconds
JOB_MAX_BACKOFF=600          # Cap at 10 minutes
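With those settings, the retry delays double from the initial backoff until they hit the cap. A quick sketch of the schedule (illustrative, not Fold's scheduler code):

```python
def backoff_schedule(retries=3, initial=5, cap=600):
    """Exponential backoff delays in seconds: initial * 2^attempt, capped."""
    return [min(initial * 2 ** attempt, cap) for attempt in range(retries)]

# Defaults above (3 retries, 5s initial, 600s cap): 5s, 10s, 20s.
delays = backoff_schedule()
```

Only a long retry budget ever reaches the JOB_MAX_BACKOFF cap; with the default three retries the delays stay well below it.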

Monitoring

# Check job queue status
curl http://localhost:8765/status/jobs

# Get job details
curl http://localhost:8765/status/jobs/{job_id}

Job Types

| Job Type            | Purpose                  | Typical Duration |
|---------------------|--------------------------|------------------|
| index_repo          | Index files from a push  | 5-60 seconds     |
| reindex_repo        | Full reindex             | 1-10 minutes     |
| process_webhook     | Handle webhook events    | <1 second        |
| generate_embedding  | Create vector embedding  | 1-5 seconds      |

Rate Limiting and Quotas

Control API usage and prevent abuse.

Configuration

# Requests per minute per token
RATE_LIMIT_REQUESTS=60

# Rate limit window (seconds)
RATE_LIMIT_WINDOW=60

# Burst allowance
RATE_LIMIT_BURST=10

Response Headers

When rate limited:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1643817000

On a 429 response, wait the number of seconds given in the Retry-After header before retrying.
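A client can honour this automatically. A minimal sketch, assuming your HTTP response object exposes status_code and headers (as the popular requests library does):

```python
import time

def request_with_retry(send, max_attempts=3):
    """Call send() and back off per the Retry-After header on 429s.

    send is a zero-argument stand-in for your HTTP call; it should
    return an object with .status_code and .headers.
    """
    resp = None
    for _ in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            return resp
        wait = int(resp.headers.get("Retry-After", 1))
        time.sleep(wait)  # server-specified cool-down
    return resp  # still rate-limited after all attempts

# Stub responses to demonstrate the flow without a live server.
class Response:
    def __init__(self, code, headers=None):
        self.status_code = code
        self.headers = headers or {}

calls = iter([Response(429, {"Retry-After": "0"}), Response(200)])
final = request_with_retry(lambda: next(calls))
```

The first attempt is rate-limited, the client waits the advertised interval, and the second attempt succeeds.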


Observability and Monitoring

Logging

Control log verbosity:

# Detailed logs
RUST_LOG=fold=debug,tower_http=debug

# Production (minimal)
RUST_LOG=fold=info

# Specific modules
RUST_LOG=fold::services::memory=debug,fold::api=info

Metrics

Expose Prometheus metrics:

# Metrics endpoint
GET http://localhost:8765/metrics

Key metrics:

  • fold_memories_indexed_total - Total memories created
  • fold_search_latency_seconds - Search response time
  • fold_embeddings_generated_total - Embeddings created
  • fold_jobs_processed_total - Background jobs completed

Health Checks

# Basic health check
curl http://localhost:8765/health

# Detailed status
curl http://localhost:8765/status/jobs

Security Hardening

API Key Rotation

Rotate LLM API keys regularly:

# Update .env or environment
GOOGLE_API_KEY=new-key-here

# Restart Fold (no downtime with load balancer)
docker restart fold

Token Security

# API tokens stored as hashed values
# Only shown once at creation

# Rotate tokens regularly
# Settings → API Tokens → Delete old tokens

# Use short-lived tokens where possible

Database Backups

# Daily backup (encrypted)
0 2 * * * tar czf - /var/lib/fold | \
  openssl enc -aes-256-cbc > /backups/fold-$(date +%Y%m%d).tar.gz.enc

# Test restore monthly

Performance Optimisation

Query Performance

Enable caching for frequently-accessed memories:

# Redis cache (optional)
REDIS_URL=redis://localhost:6379

# Cache TTL
CACHE_TTL_SECONDS=3600

Batch Operations

Index multiple files at once:

# Batch indexing reduces overhead
POST /api/projects/slug/memories/batch
{
  "memories": [
    { "type": "codebase", "content": "...", "file_path": "..." },
    { "type": "codebase", "content": "...", "file_path": "..." }
  ]
}

Vector Search Optimisation

Configure Qdrant for performance:

# In docker-compose.yml
qdrant:
  environment:
    QDRANT_HNSW_EF_CONSTRUCT: 400   # Index build quality
    QDRANT_HNSW_M: 16               # Connections per node
    QDRANT_HNSW_EF_SEARCH: 100      # Search-time quality

Higher values = better quality but slower indexing.


Troubleshooting Advanced Issues

Memories not linking

Check A-MEM is enabled:

# Verify A_MEM configuration
echo $A_MEM_AUTO_ACCEPT_LINKS

# Check pending suggestions
curl http://localhost:8765/api/projects/slug/memories/{id}/suggested-links

Decay not affecting search

Verify decay is enabled:

# Check decay weight
echo $DECAY_STRENGTH_WEIGHT

# Search with decay disabled to compare
POST /api/projects/slug/search
{
  "query": "...",
  "include_decay": false
}

Slow embeddings

Check LLM provider:

# Verify provider is responding
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"

# Check job queue for stuck jobs
curl http://localhost:8765/status/jobs?status=processing
