diff --git a/CHANGELOG.md b/CHANGELOG.md
index 89bbd01..b5064c5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -69,7 +69,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - OpenTelemetry-based tracing with OTLP export
 - Distributed caching with Rails.cache backend and stampede protection
 - Prompt management (text and chat) with Mustache templating
-- In-memory caching with TTL and LRU eviction
+- In-memory caching with TTL and bounded expiration-ordered eviction
 - Fallback prompt support
 - Global configuration pattern with `Langfuse.configure`
 
diff --git a/docs/API_REFERENCE.md b/docs/API_REFERENCE.md
index 1cd8be3..f0f9adf 100644
--- a/docs/API_REFERENCE.md
+++ b/docs/API_REFERENCE.md
@@ -43,8 +43,8 @@ Block receives a configuration object with these properties:
 | `cache_max_size` | Integer | No | `1000` | Max cached prompts |
 | `cache_backend` | Symbol | No | `:memory` | `:memory` or `:rails` |
 | `cache_lock_timeout` | Integer | No | `10` | Lock timeout (seconds) |
-| `cache_stale_while_revalidate` | Boolean | No | `false` | Enable SWR (requires stale TTL) |
-| `cache_stale_ttl` | Integer | No | `0` | Stale TTL (seconds, >0 enables) |
+| `cache_stale_while_revalidate` | Boolean | No | `false` | Advisory SWR intent flag (effective activation depends on `cache_stale_ttl`) |
+| `cache_stale_ttl` | Integer or `:indefinite` | No | `0` | Stale TTL (seconds, `>0` enables SWR) |
 | `cache_refresh_threads` | Integer | No | `5` | Background refresh threads |
 | `batch_size` | Integer | No | `50` | Score + trace export batch size |
 | `flush_interval` | Integer | No | `10` | Score + trace export interval (s) |
@@ -149,7 +149,7 @@ get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
 | `name` | String | Yes | Prompt name |
 | `version` | Integer | No | Specific version (mutually exclusive with `label`) |
 | `label` | String | No | Version label (e.g., "production") |
-| `fallback` | String | No | Fallback template if not found |
+| `fallback` | String or Array | No | Fallback prompt if not found (`String` for text, `Array` for chat) |
 | `type` | Symbol | Conditional | `:text` or `:chat` (required if `fallback` provided) |
 
 **Returns:** `TextPromptClient` or `ChatPromptClient`
@@ -163,7 +163,7 @@ get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
 **Examples:**
 
 ```ruby
-# Latest version
+# API default selection (no version/label sent)
 prompt = client.get_prompt("greeting")
 
 # Specific version
@@ -218,6 +218,89 @@ messages = client.compile_prompt("chat-assistant",
 # => [{ role: :system, content: "..." }, { role: :user, content: "..." }]
 ```
 
+### `Client#create_prompt`
+
+Create a new prompt (or a new version if the name already exists).
+
+**Signature:**
+
+```ruby
+create_prompt(name:, prompt:, type:, config: {}, labels: [], tags: [], commit_message: nil)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| ---------------- | ------------------ | -------- | ------------------------------------------------------------------------ |
+| `name` | String | Yes | Prompt name |
+| `prompt` | String or Array | Yes | Prompt content (String for text, array of role/content hashes for chat) |
+| `type` | Symbol | Yes | Prompt type (`:text` or `:chat`) |
+| `config` | Hash | No | Prompt config metadata (for example model parameters) |
+| `labels` | Array | No | Labels to assign (for example `["production"]`) |
+| `tags` | Array | No | Tags for categorization |
+| `commit_message` | String | No | Optional commit message |
+
+**Returns:** `TextPromptClient` or `ChatPromptClient`
+
+**Raises:**
+
+- `ArgumentError` for missing/invalid prompt type or content
+- `UnauthorizedError` if credentials invalid
+- `ApiError` on network/server errors
+
+**Example:**
+
+```ruby
+prompt = client.create_prompt(
+  name: "support-assistant",
+  prompt: [
+    { role: "system", content: "You are a helpful assistant for {{product}}" },
+    { role: "user", content: "{{question}}" }
+  ],
+  type: :chat,
+  labels: ["staging"],
+  tags: ["support"],
+  config: { model: "gpt-4o-mini" }
+)
+```
+
+### `Client#update_prompt`
+
+Update labels for an existing prompt version.
+
+**Signature:**
+
+```ruby
+update_prompt(name:, version:, labels:)
+```
+
+**Parameters:**
+
+| Parameter | Type | Required | Description |
+| --------- | ------------- | -------- | ----------------------------------------- |
+| `name` | String | Yes | Prompt name |
+| `version` | Integer | Yes | Prompt version to update |
+| `labels` | Array | Yes | Replacement labels for that prompt version |
+
+**Returns:** `TextPromptClient` or `ChatPromptClient`
+
+**Raises:**
+
+- `ArgumentError` if `labels` is not an array
+- `NotFoundError` if prompt/version not found
+- `UnauthorizedError` if credentials invalid
+- `ApiError` on network/server errors
+
+**Example:**
+
+```ruby
+prompt = client.update_prompt(
+  name: "support-assistant",
+  version: 3,
+  labels: ["production"]
+)
+```
+
 ### `Client#list_prompts`
 
 List all prompts in the project.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index a2d08fc..2b3dcf3 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -95,7 +95,14 @@ end
 HTTP layer with Faraday:
 
 ```ruby
-api_client = Langfuse::ApiClient.new(config, cache)
+api_client = Langfuse::ApiClient.new(
+  public_key: config.public_key,
+  secret_key: config.secret_key,
+  base_url: config.base_url,
+  timeout: config.timeout,
+  logger: config.logger,
+  cache: cache
+)
 prompt_data = api_client.get_prompt("name")
 ```
 
@@ -146,7 +153,7 @@ cached = cache.get(key)
 **Features:**
 - Thread-safe with Monitor
 - TTL expiration
-- LRU eviction
+- Bounded expiration-ordered eviction
 
 #### RailsCacheAdapter (Distributed)
 
diff --git a/docs/CACHING.md b/docs/CACHING.md
index 2d4ff76..3b3500a 100644
--- a/docs/CACHING.md
+++ b/docs/CACHING.md
@@ -19,14 +19,14 @@ For configuration options, see [CONFIGURATION.md](CONFIGURATION.md).
 
 The Langfuse Ruby SDK provides two caching backends to optimize prompt fetching:
 
-1. **In-Memory Cache** (default) - Thread-safe, local cache with TTL and LRU eviction
+1. **In-Memory Cache** (default) - Thread-safe, local cache with TTL and bounded expiration-ordered eviction
 2. **Rails.cache Backend** - Distributed caching with Redis/Memcached
 
-Both backends support TTL-based expiration and automatic stampede protection (Rails.cache only).
+Both backends support TTL-based expiration and stale-while-revalidate (SWR). Distributed stampede protection via locking is specific to the Rails.cache backend; the in-memory backend mitigates stampedes within a single process using Monitor-based single-flight locks.
 
 ## In-Memory Cache (Default)
 
-The default caching backend stores prompts in memory with automatic TTL expiration and LRU eviction.
+The default caching backend stores prompts in memory with automatic TTL expiration and bounded eviction when the cache reaches max size.
 
 ### Configuration
 
@@ -42,7 +42,7 @@ end
 
 - **Thread-safe**: Uses Monitor-based synchronization
 - **TTL-based expiration**: Automatically expires after configured TTL
-- **LRU eviction**: Removes least recently used prompts when max_size is reached
+- **Bounded eviction**: When max_size is reached, removes the entry with earliest expiration (`stale_until`)
 - **Zero dependencies**: No external services required
 - **Fast**: ~1ms cache hits
 
@@ -179,8 +179,8 @@ Total latency: ~1ms
 Langfuse.configure do |config|
   config.cache_backend = :memory # Works with both :memory and :rails
   config.cache_ttl = 300 # Fresh for 5 minutes
-  config.cache_stale_while_revalidate = true # Enable SWR
-  config.cache_stale_ttl = 300 # Serve stale for up to 5 minutes
+  config.cache_stale_while_revalidate = true # Advisory intent flag
+  config.cache_stale_ttl = 300 # `> 0` activates SWR; serve stale for up to 5 minutes
 end
 ```
 
@@ -419,7 +419,7 @@ puts "Cached #{results[:success].size} prompts"
 # Warm with different label
 results = warmer.warm_all(default_label: "staging")
 
-# Warm latest versions (no label)
+# Warm without a label (API-determined selection)
 results = warmer.warm_all(default_label: nil)
 ```
 
@@ -536,8 +536,8 @@ See [CONFIGURATION.md](CONFIGURATION.md) for all cache-related configuration opt
 
 **In-Memory Cache:**
 
-- TTL expiration + LRU eviction
-- Evicts least recently used when max_size reached
+- TTL expiration + bounded eviction
+- Evicts the entry with the earliest expiration (`stale_until`) when max_size is reached
 
 **Rails.cache:**
 
@@ -564,7 +564,7 @@ config.cache_stale_while_revalidate = !Rails.env.development?
 
 # Production: enabled for best performance
 if Rails.env.production?
-  config.cache_stale_ttl = config.cache_ttl # Auto-set, but can customize
+  config.cache_stale_ttl = config.cache_ttl # Set explicitly (common default)
 end
 ```
 
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 1f63ef8..67095a7 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -121,26 +121,25 @@ See [CACHING.md](CACHING.md#stampede-protection) for details.
 
 - **Type:** Boolean
 - **Default:** `false`
-- **Description:** Enable stale-while-revalidate caching pattern
+- **Description:** Advisory SWR intent flag (effective SWR behavior is controlled by `cache_stale_ttl`)
 
 ```ruby
-config.cache_stale_while_revalidate = true # Enable SWR
+config.cache_stale_while_revalidate = true # Optional intent flag
 ```
 
-When enabled, serves stale cached data immediately while refreshing in the background. This dramatically reduces P99 latency by avoiding synchronous API waits after cache expiration.
+This flag does not independently turn SWR on or off. SWR activates when `cache_stale_ttl > 0`; the flag exists only as an advisory indicator of intent.
 
-**Behavior:**
+**Behavior (driven by `cache_stale_ttl`):**
 
-- `false` (default): Cache expires at TTL, next request waits for API (~100ms)
-- `true`: After TTL, serves stale data instantly (~1ms) + refreshes in background
+- `cache_stale_ttl <= 0` (default): Cache expires at TTL, next request waits for API (~100ms)
+- `cache_stale_ttl > 0`: After TTL, serves stale data instantly (~1ms) + refreshes in background
 
-**Important:** SWR only activates when `cache_stale_ttl` is a positive value. Set it explicitly (typically equal to `cache_ttl`).
+**Important:** To activate SWR, set `cache_stale_ttl` to a positive value (typically equal to `cache_ttl`).
 
 **Compatibility:**
 
 - ✅ Works with `:memory` backend
 - ✅ Works with `:rails` backend
-- Set `cache_stale_ttl` to a positive value to activate SWR (often the same as `cache_ttl`)
 
 See [CACHING.md](CACHING.md#stale-while-revalidate-swr) for detailed usage.
 
@@ -414,7 +413,8 @@ Langfuse.configure do |config|
   config.secret_key = Rails.application.credentials.dig(:langfuse, :secret_key)
   config.cache_ttl = 300 # Longer TTL for stability
   config.cache_backend = :rails # Shared cache
-  config.cache_stale_while_revalidate = true # Enable SWR for best latency
+  config.cache_stale_while_revalidate = true # Advisory intent flag (SWR activates via cache_stale_ttl > 0)
+  config.cache_stale_ttl = 300 # Activates SWR
   config.timeout = 10 # Handle network variability
   config.logger = Rails.logger
 end
@@ -435,7 +435,7 @@ Langfuse.configure do |config|
   config.public_key = 'pk-lf-test'
   config.secret_key = 'sk-lf-test'
   config.cache_backend = :memory # Isolated per-process cache
-  config.cache_stale_while_revalidate = false # Disable SWR for predictable tests
+  config.cache_stale_ttl = 0 # Disable SWR for predictable tests
 end
 ```
 
diff --git a/docs/PROMPTS.md b/docs/PROMPTS.md
index df1060d..98469e7 100644
--- a/docs/PROMPTS.md
+++ b/docs/PROMPTS.md
@@ -183,13 +183,13 @@ prompt = client.get_prompt("greeting", label: "production") # => version 3
 
 ### Best Practices
 
-1. **Default to production:** Omitting `version`/`label` fetches the `production`-labeled prompt (matching JS/Python SDK behavior)
-2. **Use labels in production:** Pin to `production` label for stability
+1. **Be explicit in production:** Always pass `label: "production"` for deterministic selection
+2. **Treat implicit selection as API-defined:** Omitting both `version` and `label` sends no selector — the Langfuse API decides which version to return
 3. **Version for rollback:** Keep version numbers for emergency rollbacks
 
 ```ruby
-# Development
-prompt = client.get_prompt("greeting") # Latest version
+# Development (API-defined selection)
+prompt = client.get_prompt("greeting")
 
 # Production
 prompt = client.get_prompt("greeting", label: "production") # Stable
diff --git a/lib/langfuse/cache_warmer.rb b/lib/langfuse/cache_warmer.rb
index 32cb556..4a36ca4 100644
--- a/lib/langfuse/cache_warmer.rb
+++ b/lib/langfuse/cache_warmer.rb
@@ -85,7 +85,7 @@ def warm(prompt_names, versions: {}, labels: {})
     # @example Warm with a different default label
     #   results = warmer.warm_all(default_label: "staging")
     #
-    # @example Warm without any label (latest versions)
+    # @example Warm without any label (API-determined selection)
     #   results = warmer.warm_all(default_label: nil)
     #
     # @example With specific versions for some prompts