diff --git a/CHANGELOG.md b/CHANGELOG.md
index 89bbd01..b5064c5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -69,7 +69,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - OpenTelemetry-based tracing with OTLP export
 - Distributed caching with Rails.cache backend and stampede protection
 - Prompt management (text and chat) with Mustache templating
-- In-memory caching with TTL and LRU eviction
+- In-memory caching with TTL and bounded expiration-ordered eviction
 - Fallback prompt support
 - Global configuration pattern with `Langfuse.configure`
 
diff --git a/docs/API_REFERENCE.md b/docs/API_REFERENCE.md
index 1cd8be3..f0f9adf 100644
--- a/docs/API_REFERENCE.md
+++ b/docs/API_REFERENCE.md
@@ -43,8 +43,8 @@ Block receives a configuration object with these properties:
 | `cache_max_size`               | Integer | No       | `1000`                         | Max cached prompts                |
 | `cache_backend`                | Symbol  | No       | `:memory`                      | `:memory` or `:rails`             |
 | `cache_lock_timeout`           | Integer | No       | `10`                           | Lock timeout (seconds)            |
-| `cache_stale_while_revalidate` | Boolean | No       | `false`                        | Enable SWR (requires stale TTL)   |
-| `cache_stale_ttl`              | Integer | No       | `0`                            | Stale TTL (seconds, >0 enables)   |
+| `cache_stale_while_revalidate` | Boolean | No       | `false`                        | Advisory intent flag only; SWR activates when `cache_stale_ttl > 0` |
+| `cache_stale_ttl`              | Integer or `:indefinite` | No | `0`                  | Stale TTL (seconds, `>0` enables SWR) |
 | `cache_refresh_threads`        | Integer | No       | `5`                            | Background refresh threads        |
 | `batch_size`                   | Integer | No       | `50`                           | Score + trace export batch size   |
 | `flush_interval`               | Integer | No       | `10`                           | Score + trace export interval (s) |
@@ -149,7 +149,7 @@ get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
 | `name`     | String  | Yes         | Prompt name                                          |
 | `version`  | Integer | No          | Specific version (mutually exclusive with `label`)   |
 | `label`    | String  | No          | Version label (e.g., "production")                   |
-| `fallback` | String  | No          | Fallback template if not found                       |
+| `fallback` | String or Array<Hash> | No | Fallback prompt if not found (`String` for text, `Array<Hash>` for chat) |
 | `type`     | Symbol  | Conditional | `:text` or `:chat` (required if `fallback` provided) |
 
 **Returns:** `TextPromptClient` or `ChatPromptClient`
@@ -163,7 +163,7 @@ get_prompt(name, version: nil, label: nil, fallback: nil, type: nil)
 **Examples:**
 
 ```ruby
-# Latest version
+# API default selection (no version/label sent)
 prompt = client.get_prompt("greeting")
 
 # Specific version
@@ -218,6 +218,89 @@ messages = client.compile_prompt("chat-assistant",
 # => [{ role: :system, content: "..." }, { role: :user, content: "..." }]
 ```
 
+### `Client#create_prompt`
+
+Create a new prompt (or a new version if the name already exists).
+
+**Signature:**
+
+```ruby
+create_prompt(name:, prompt:, type:, config: {}, labels: [], tags: [], commit_message: nil)
+```
+
+**Parameters:**
+
+| Parameter        | Type                  | Required | Description                                                             |
+| ---------------- | --------------------- | -------- | ----------------------------------------------------------------------- |
+| `name`           | String                | Yes      | Prompt name                                                             |
+| `prompt`         | String or Array<Hash> | Yes      | Prompt content (String for text, array of role/content hashes for chat) |
+| `type`           | Symbol                | Yes      | Prompt type (`:text` or `:chat`)                                        |
+| `config`         | Hash                  | No       | Prompt config metadata (for example, model parameters)                  |
+| `labels`         | Array<String>         | No       | Labels to assign (for example, `["production"]`)                        |
+| `tags`           | Array<String>         | No       | Tags for categorization                                                 |
+| `commit_message` | String                | No       | Optional commit message                                                 |
+
+**Returns:** `TextPromptClient` or `ChatPromptClient`
+
+**Raises:**
+
+- `ArgumentError` for missing/invalid prompt type or content
+- `UnauthorizedError` if credentials invalid
+- `ApiError` on network/server errors
+
+**Example:**
+
+```ruby
+prompt = client.create_prompt(
+  name: "support-assistant",
+  prompt: [
+    { role: "system", content: "You are a helpful assistant for {{product}}" },
+    { role: "user", content: "{{question}}" }
+  ],
+  type: :chat,
+  labels: ["staging"],
+  tags: ["support"],
+  config: { model: "gpt-4o-mini" }
+)
+```
+
+### `Client#update_prompt`
+
+Update labels for an existing prompt version.
+
+**Signature:**
+
+```ruby
+update_prompt(name:, version:, labels:)
+```
+
+**Parameters:**
+
+| Parameter | Type          | Required | Description                                |
+| --------- | ------------- | -------- | ------------------------------------------ |
+| `name`    | String        | Yes      | Prompt name                                |
+| `version` | Integer       | Yes      | Prompt version to update                   |
+| `labels`  | Array<String> | Yes      | Replacement labels for that prompt version |
+
+**Returns:** `TextPromptClient` or `ChatPromptClient`
+
+**Raises:**
+
+- `ArgumentError` if `labels` is not an array
+- `NotFoundError` if prompt/version not found
+- `UnauthorizedError` if credentials invalid
+- `ApiError` on network/server errors
+
+**Example:**
+
+```ruby
+prompt = client.update_prompt(
+  name: "support-assistant",
+  version: 3,
+  labels: ["production"]
+)
+```
+
 ### `Client#list_prompts`
 
 List all prompts in the project.
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index a2d08fc..2b3dcf3 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -95,7 +95,14 @@ end
 HTTP layer with Faraday:
 
 ```ruby
-api_client = Langfuse::ApiClient.new(config, cache)
+api_client = Langfuse::ApiClient.new(
+  public_key: config.public_key,
+  secret_key: config.secret_key,
+  base_url: config.base_url,
+  timeout: config.timeout,
+  logger: config.logger,
+  cache: cache
+)
 prompt_data = api_client.get_prompt("name")
 ```
 
@@ -146,7 +153,7 @@ cached = cache.get(key)
 **Features:**
 - Thread-safe with Monitor
 - TTL expiration
-- LRU eviction
+- Bounded expiration-ordered eviction
 
 #### RailsCacheAdapter (Distributed)
 
diff --git a/docs/CACHING.md b/docs/CACHING.md
index 2d4ff76..3b3500a 100644
--- a/docs/CACHING.md
+++ b/docs/CACHING.md
@@ -19,14 +19,14 @@ For configuration options, see [CONFIGURATION.md](CONFIGURATION.md).
 
 The Langfuse Ruby SDK provides two caching backends to optimize prompt fetching:
 
-1. **In-Memory Cache** (default) - Thread-safe, local cache with TTL and LRU eviction
+1. **In-Memory Cache** (default) - Thread-safe, local cache with TTL and bounded expiration-ordered eviction
 2. **Rails.cache Backend** - Distributed caching with Redis/Memcached
 
-Both backends support TTL-based expiration and automatic stampede protection (Rails.cache only).
+Both backends support TTL-based expiration and stale-while-revalidate (SWR). Distributed stampede protection via locking is specific to the Rails.cache backend; the in-memory backend mitigates stampedes within a single process using Monitor-based single-flight locks.
 
 ## In-Memory Cache (Default)
 
-The default caching backend stores prompts in memory with automatic TTL expiration and LRU eviction.
+The default caching backend stores prompts in memory with automatic TTL expiration and bounded eviction when the cache reaches max size.
 
 ### Configuration
 
@@ -42,7 +42,7 @@ end
 
 - **Thread-safe**: Uses Monitor-based synchronization
 - **TTL-based expiration**: Automatically expires after configured TTL
-- **LRU eviction**: Removes least recently used prompts when max_size is reached
+- **Bounded eviction**: When max_size is reached, removes the entry with the earliest expiration (`stale_until`)
 - **Zero dependencies**: No external services required
 - **Fast**: ~1ms cache hits
 
@@ -179,8 +179,8 @@ Total latency: ~1ms
 Langfuse.configure do |config|
   config.cache_backend = :memory  # Works with both :memory and :rails
   config.cache_ttl = 300  # Fresh for 5 minutes
-  config.cache_stale_while_revalidate = true  # Enable SWR
-  config.cache_stale_ttl = 300  # Serve stale for up to 5 minutes
+  config.cache_stale_while_revalidate = true  # Advisory intent flag
+  config.cache_stale_ttl = 300  # Any value > 0 activates SWR; serve stale data for up to 5 minutes
 end
 ```
 
@@ -419,7 +419,7 @@ puts "Cached #{results[:success].size} prompts"
 # Warm with different label
 results = warmer.warm_all(default_label: "staging")
 
-# Warm latest versions (no label)
+# Warm without a label (API-determined selection)
 results = warmer.warm_all(default_label: nil)
 ```
 
@@ -536,8 +536,8 @@ See [CONFIGURATION.md](CONFIGURATION.md) for all cache-related configuration opt
 
 **In-Memory Cache:**
 
-- TTL expiration + LRU eviction
-- Evicts least recently used when max_size reached
+- TTL expiration + bounded eviction
+- Evicts the entry with the earliest expiration (`stale_until`) when max_size is reached
 
 **Rails.cache:**
 
@@ -564,7 +564,7 @@ config.cache_stale_while_revalidate = !Rails.env.development?
 
 # Production: enabled for best performance
 if Rails.env.production?
-  config.cache_stale_ttl = config.cache_ttl  # Auto-set, but can customize
+  config.cache_stale_ttl = config.cache_ttl  # Set explicitly; matching cache_ttl is a common choice
 end
 ```
 
diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md
index 1f63ef8..67095a7 100644
--- a/docs/CONFIGURATION.md
+++ b/docs/CONFIGURATION.md
@@ -121,26 +121,25 @@ See [CACHING.md](CACHING.md#stampede-protection) for details.
 
 - **Type:** Boolean
 - **Default:** `false`
-- **Description:** Enable stale-while-revalidate caching pattern
+- **Description:** Advisory SWR intent flag (effective SWR behavior is controlled by `cache_stale_ttl`)
 
 ```ruby
-config.cache_stale_while_revalidate = true  # Enable SWR
+config.cache_stale_while_revalidate = true  # Optional intent flag
 ```
 
-When enabled, serves stale cached data immediately while refreshing in the background. This dramatically reduces P99 latency by avoiding synchronous API waits after cache expiration.
+This flag does not independently turn SWR on or off. SWR activates when `cache_stale_ttl > 0`; the flag exists only as an advisory indicator of intent.
 
-**Behavior:**
+**Behavior (driven by `cache_stale_ttl`):**
 
-- `false` (default): Cache expires at TTL, next request waits for API (~100ms)
-- `true`: After TTL, serves stale data instantly (~1ms) + refreshes in background
+- `cache_stale_ttl <= 0` (default): Cache expires at TTL, next request waits for API (~100ms)
+- `cache_stale_ttl > 0`: After TTL, serves stale data instantly (~1ms) + refreshes in background
 
-**Important:** SWR only activates when `cache_stale_ttl` is a positive value. Set it explicitly (typically equal to `cache_ttl`).
+**Important:** To activate SWR, set `cache_stale_ttl` to a positive value (typically equal to `cache_ttl`).
 
 **Compatibility:**
 
 - ✅ Works with `:memory` backend
 - ✅ Works with `:rails` backend
-- Set `cache_stale_ttl` to a positive value to activate SWR (often the same as `cache_ttl`)
 
 See [CACHING.md](CACHING.md#stale-while-revalidate-swr) for detailed usage.
 
@@ -414,7 +413,8 @@ Langfuse.configure do |config|
   config.secret_key = Rails.application.credentials.dig(:langfuse, :secret_key)
   config.cache_ttl = 300  # Longer TTL for stability
   config.cache_backend = :rails  # Shared cache
-  config.cache_stale_while_revalidate = true  # Enable SWR for best latency
+  config.cache_stale_while_revalidate = true  # Advisory intent flag (SWR activates via cache_stale_ttl > 0)
+  config.cache_stale_ttl = 300  # Activates SWR
   config.timeout = 10  # Handle network variability
   config.logger = Rails.logger
 end
@@ -435,7 +435,7 @@ Langfuse.configure do |config|
   config.public_key = 'pk-lf-test'
   config.secret_key = 'sk-lf-test'
   config.cache_backend = :memory  # Isolated per-process cache
-  config.cache_stale_while_revalidate = false  # Disable SWR for predictable tests
+  config.cache_stale_ttl = 0      # Disable SWR for predictable tests
 end
 ```
 
diff --git a/docs/PROMPTS.md b/docs/PROMPTS.md
index df1060d..98469e7 100644
--- a/docs/PROMPTS.md
+++ b/docs/PROMPTS.md
@@ -183,13 +183,13 @@ prompt = client.get_prompt("greeting", label: "production")  # => version 3
 
 ### Best Practices
 
-1. **Default to production:** Omitting `version`/`label` fetches the `production`-labeled prompt (matching JS/Python SDK behavior)
-2. **Use labels in production:** Pin to `production` label for stability
+1. **Be explicit in production:** Always pass `label: "production"` for deterministic selection
+2. **Treat implicit selection as API-defined:** Omitting both `version` and `label` sends no selector, so the Langfuse API decides which version to return
 3. **Version for rollback:** Keep version numbers for emergency rollbacks
 
 ```ruby
-# Development
-prompt = client.get_prompt("greeting")  # Latest version
+# Development (API-defined selection)
+prompt = client.get_prompt("greeting")
 
 # Production
 prompt = client.get_prompt("greeting", label: "production")  # Stable
diff --git a/lib/langfuse/cache_warmer.rb b/lib/langfuse/cache_warmer.rb
index 32cb556..4a36ca4 100644
--- a/lib/langfuse/cache_warmer.rb
+++ b/lib/langfuse/cache_warmer.rb
@@ -85,7 +85,7 @@ def warm(prompt_names, versions: {}, labels: {})
     # @example Warm with a different default label
     #   results = warmer.warm_all(default_label: "staging")
     #
-    # @example Warm without any label (latest versions)
+    # @example Warm without any label (API-determined selection)
     #   results = warmer.warm_all(default_label: nil)
     #
     # @example With specific versions for some prompts