
Conversation


@parmesant parmesant commented Jan 13, 2026

This is a WIP PR to introduce multi-tenancy to the Parseable server.

Description


This PR has:

  • been tested to ensure log ingestion and log query work.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • New Features

    • Multi-tenant mode: per-tenant isolation for streams, alerts, targets, dashboards, filters, correlations, hot-tier, and tenant management APIs.
  • Improvements

    • Tenant-aware ingestion, querying, execution, storage, stats, retention, migrations, and event handling for correct per-tenant behavior.
    • RBAC, auth flows, and HTTP handlers updated to honor tenant context across UI/API.
    • Schema, manifest, and metastore operations now support tenant scoping.
  • Chores

    • CLI flag and runtime plumbing to toggle multi-tenancy; tenant suspension/resume utilities added.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 13, 2026

Walkthrough

Adds optional tenant_id across the codebase, threading tenant context through alerts, metastore/object-store, parseable/streams, HTTP handlers/middleware, RBAC, query execution, storage/hot-tier/retention, CLI, tenants management, and related utilities.

Changes

Cohort / File(s): Summary

  • Alerts (src/alerts/alert_structs.rs, src/alerts/alert_traits.rs, src/alerts/alert_types.rs, src/alerts/alerts_utils.rs, src/alerts/mod.rs, src/alerts/target.rs)
    Threaded tenant_id into alert structs, traits, manager APIs, MTTR/history, targets, validation, in-memory maps (now tenant→map), and alert query/state flows. Signatures added/updated on multiple public APIs.
  • Metastore & Object-store (src/metastore/..., src/metastore/metastores/object_store_metastore.rs, src/storage/object_storage.rs)
    Added tenant_id: &Option<String> to nearly all Metastore/ObjectStorage methods; many getters now return tenant-keyed collections; path helpers, manifest/schema, and stream operations use tenant-prefixed paths.
  • Parseable & Streams (src/parseable/..., src/parseable/streams.rs, src/parseable/staging/mod.rs)
    Introduced DEFAULT_TENANT and tenant lifecycle APIs; Streams/staging are tenant-scoped maps; stream creation/commit/load/check APIs accept tenant_id.
  • HTTP handlers & Middleware (src/handlers/http/..., src/handlers/http/modal/...)
    Many public handlers now accept HttpRequest and extract tenant_id; middleware exposes the tenant header and suspension checks; handlers propagate tenant_id to services (streams, RBAC, alerts, targets, dashboards, correlations, ingest, LLM, prism).
  • RBAC / Users / Roles (src/rbac/..., src/users/...)
    Roles, Users, UserGroups, sessions, and permissions made tenant-scoped; User/GroupUser gain tenant fields; added SuperAdmin variant and tenant-aware permission/session resolution.
  • Query / Execution / Schema provider (src/query/..., src/query/stream_schema_provider.rs, src/handlers/http/query.rs)
    Query execution and session context accept tenant_id; QUERY_SESSION supports per-tenant schemas; schema provider, manifest resolution, and manifest/snapshot collection are tenant-aware.
  • Storage / Retention / Hot tier / Field stats (src/storage/..., src/hottier.rs, src/storage/field_stats.rs, src/storage/retention.rs)
    Upload, retention, hot-tier, and field-stats flows accept tenant_id; hot-tier paths, manifest/parquet processing, and stats label helpers made tenant-aware.
  • Events / Formats / CLI / Tenants (src/event/..., src/utils/mod.rs, src/cli.rs, src/tenants/mod.rs, src/migration/mod.rs, src/enterprise/utils.rs)
    Event/EventFormat include tenant_id; utils add tenant extraction helpers; Options adds multi_tenancy; new tenants module (TENANT_METADATA); migration/enterprise helpers accept tenant_id.
  • Prism / Analytics / Stats (src/prism/..., src/analytics.rs, src/stats.rs)
    Prism home/logstream and analytics iterate/aggregate per-tenant; stats label construction and retrieval include tenant context.
  • UI & Misc Handlers (src/handlers/http/*, src/users/*, src/handlers/http/modal/*)
    Numerous handler signatures updated to accept HttpRequest and propagate tenant_id; dashboards, filters, targets, ingest, LLM, RBAC, and related endpoints operate per-tenant.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HTTP as HTTP Handler
  participant MW as Middleware
  participant RBAC
  participant Parseable
  participant Metastore
  participant ObjectStore

  Client->>HTTP: HTTP request (with tenant header)
  HTTP->>MW: forward HttpRequest
  MW->>RBAC: extract (user, tenant) & check suspension/permissions
  RBAC-->>MW: authorized (tenant_id)
  MW-->>HTTP: continue
  HTTP->>Parseable: ensure/get stream / perform action (tenant_id)
  Parseable->>Metastore: get_stream_json / schema (tenant_id)
  Metastore->>ObjectStore: read tenant-prefixed path
  ObjectStore-->>Metastore: data
  Metastore-->>Parseable: stream/schema
  HTTP->>Metastore: put/get alerts/targets/metrics (tenant_id)
  Metastore->>ObjectStore: put/get tenant-scoped object
  ObjectStore-->>Metastore: OK
  Metastore-->>HTTP: result
  HTTP-->>Client: Response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable
  • de-sh

Poem

"I'm a rabbit in the code-stack bright,
I hop tenant_ids morning and night,
Maps nested deep and paths now neat,
Per-tenant data marching to the beat,
A small hop — multi-tenant delight!" 🐇✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The PR description is incomplete: the title is clear, but there is no explanation of the goal, no discussion of solutions or rationale, and no summary of key changes; the checklist items are all unchecked, indicating testing and documentation work remains incomplete. Resolution: explain the multi-tenancy implementation goal, key architectural changes, and rationale, then complete the checklist by testing and adding documentation and code comments where needed.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.01%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Title check: ✅ Passed. The title 'feat: Multi-tenancy for Parseable server' clearly and concisely describes the main feature being added and directly relates to the substantial changes introducing tenant-scoped context across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (22)
src/hottier.rs (2)

208-220: Critical: delete_hot_tier does not use tenant_id when constructing the delete path.

The function accepts tenant_id and uses it for the existence check at line 213, but line 216 constructs the path without considering tenant_id:

let path = self.hot_tier_path.join(stream);

For multi-tenant deployments, this will delete the wrong directory (the non-tenant path) instead of the tenant-scoped path.

🐛 Proposed fix
 pub async fn delete_hot_tier(
     &self,
     stream: &str,
     tenant_id: &Option<String>,
 ) -> Result<(), HotTierError> {
     if !self.check_stream_hot_tier_exists(stream, tenant_id) {
         return Err(HotTierValidationError::NotFound(stream.to_owned()).into());
     }
-    let path = self.hot_tier_path.join(stream);
+    let path = if let Some(tid) = tenant_id.as_ref() {
+        self.hot_tier_path.join(tid).join(stream)
+    } else {
+        self.hot_tier_path.join(stream)
+    };
     fs::remove_dir_all(path).await?;

     Ok(())
 }

471-497: fetch_hot_tier_dates and get_stream_path_for_date must accept and use tenant_id parameter.

These functions construct paths without tenant awareness, while hot_tier_file_path() is already tenant-scoped. This causes a mismatch: cleanup_hot_tier_old_data() has access to tenant_id but cannot pass it to fetch_hot_tier_dates(), and process_parquet_file() cannot pass tenant_id to get_stream_path_for_date(). In multi-tenant deployments, this will cause incorrect path resolution for hot-tier data. Update both function signatures to accept tenant_id and construct paths as self.hot_tier_path.join(tenant_id).join(stream) when present, consistent with hot_tier_file_path().
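A minimal sketch of the tenant-aware path construction described above; the helper name and signature are illustrative assumptions, only hot_tier_path, the stream name, and tenant_id come from the review:

use std::path::{Path, PathBuf};

// Hypothetical helper: resolve a stream's hot-tier root, scoped by tenant when
// one is present, mirroring the layout hot_tier_file_path() already uses.
fn tenant_scoped_stream_path(
    hot_tier_path: &Path,
    stream: &str,
    tenant_id: &Option<String>,
) -> PathBuf {
    match tenant_id.as_deref() {
        Some(tenant) => hot_tier_path.join(tenant).join(stream),
        None => hot_tier_path.join(stream),
    }
}

Both fetch_hot_tier_dates and get_stream_path_for_date could then derive their date paths from this single root.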

src/handlers/http/alerts.rs (1)

209-244: Missing tenant_id in list endpoint - potential cross-tenant alert visibility.

The list handler does not extract tenant_id from the request, unlike all other handlers in this file. The list_alerts_for_user call may return alerts across all tenants instead of filtering by the requesting tenant's context.

🐛 Proposed fix to add tenant context
 pub async fn list(req: HttpRequest) -> Result<impl Responder, AlertError> {
     let session_key = extract_session_key_from_req(&req)?;
+    let tenant_id = get_tenant_id_from_request(&req);
     let query_map = web::Query::<HashMap<String, String>>::from_query(req.query_string())
         .map_err(|_| AlertError::InvalidQueryParameter("malformed query parameters".to_string()))?;
 
     // ... existing code ...
 
     // Fetch alerts for the user
     let alerts = alerts
-        .list_alerts_for_user(session_key, params.tags_list)
+        .list_alerts_for_user(session_key, params.tags_list, &tenant_id)
         .await?;
src/storage/store_metadata.rs (1)

301-323: Missing directory creation for tenant-specific staging path.

When tenant_id is provided, the code constructs a path under a tenant subdirectory (line 309), but doesn't ensure this directory exists. The OpenOptions::open() call will fail with NotFound if the tenant directory hasn't been created yet.

🐛 Proposed fix to ensure tenant directory exists
 pub fn put_staging_metadata(meta: &StorageMetadata, tenant_id: &Option<String>) -> io::Result<()> {
     let mut staging_metadata = meta.clone();
     staging_metadata.server_mode = PARSEABLE.options.mode;
     staging_metadata.staging = PARSEABLE.options.staging_dir().to_path_buf();
     let path = if let Some(tenant_id) = tenant_id.as_ref() {
-        PARSEABLE
+        let tenant_dir = PARSEABLE
             .options
             .staging_dir()
-            .join(tenant_id)
-            .join(PARSEABLE_METADATA_FILE_NAME)
+            .join(tenant_id);
+        fs::create_dir_all(&tenant_dir)?;
+        tenant_dir.join(PARSEABLE_METADATA_FILE_NAME)
     } else {
         PARSEABLE
             .options
             .staging_dir()
             .join(PARSEABLE_METADATA_FILE_NAME)
     };
src/handlers/http/targets.rs (2)

35-45: Missing tenant_id in post endpoint - targets created without tenant context.

The post handler doesn't extract tenant_id from the request, unlike list, get, update, and delete. This could result in targets being created without proper tenant association, breaking tenant isolation.

🐛 Proposed fix to add tenant context
 // POST /targets
 pub async fn post(
-    _req: HttpRequest,
+    req: HttpRequest,
     Json(target): Json<Target>,
 ) -> Result<impl Responder, AlertError> {
+    let tenant_id = get_tenant_id_from_request(&req);
     // should check for duplicacy and liveness (??)
     // add to the map
-    TARGETS.update(target.clone()).await?;
+    TARGETS.update(target.clone(), &tenant_id).await?;
 
     // Ok(web::Json(target.mask()))
     Ok(web::Json(target))
 }

72-98: update handler missing tenant_id in TARGETS.update call.

While tenant_id is correctly extracted and used to fetch old_target, the subsequent TARGETS.update(target.clone()) call on line 94 doesn't pass the tenant context. This may cause the updated target to lose tenant association.

🐛 Proposed fix
     // should check for duplicacy and liveness (??)
     // add to the map
-    TARGETS.update(target.clone()).await?;
+    TARGETS.update(target.clone(), &tenant_id).await?;
src/alerts/alerts_utils.rs (1)

77-90: Tenant isolation gap: execute_remote_query does not receive tenant_id parameter.

The execute_local_query path explicitly receives and uses tenant_id for stream creation and query execution (lines 101, 112), but execute_remote_query (line 84) is called without this parameter and does not propagate any tenant context to send_query_request. The Query struct serialized to the remote querier contains no tenant information. If Prism mode requires tenant isolation, either:

  1. Add tenant_id parameter to execute_remote_query and include it in the Query struct or HTTP request, or
  2. Verify that tenant context is derived from the Authorization header on the remote side and document this assumption.
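Option 1 above could look like the following; this is a sketch only, since the real Query struct's field set is not shown in this review and the names here are placeholders:

use serde::{Deserialize, Serialize};

// Hypothetical serialized query carrying tenant context to the remote querier.
#[derive(Serialize, Deserialize)]
pub struct Query {
    pub query: String,
    pub start_time: String,
    pub end_time: String,
    // Skipped when absent so single-tenant payloads remain unchanged.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tenant_id: Option<String>,
}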
src/handlers/http/ingest.rs (1)

426-445: Pass tenant context through the unchecked event path.

push_logs_unchecked and append_temporary_events hardcode tenant_id: None, but the calling context in airplane.rs has access to tenant information via the key (SessionKey) parameter. Extract tenant_id using get_tenant_id_from_key(&key) and thread it through both functions to maintain consistency with the normal ingest flow.

src/handlers/http/modal/ingest/ingestor_rbac.rs (2)

189-213: Metadata persisted before password hash is updated.

Line 198 calls put_staging_metadata before the password hash is actually updated in the metadata (lines 199-211). This means the old password hash is persisted instead of the new one.

🐛 Proposed fix: Move persistence after the mutation
 pub async fn post_gen_password(
     req: HttpRequest,
     username: web::Path<String>,
 ) -> Result<HttpResponse, RBACError> {
     let username = username.into_inner();
     let tenant_id = get_tenant_id_from_request(&req);
     let mut new_hash = String::default();
     let mut metadata = get_metadata(&tenant_id).await?;

-    let _ = storage::put_staging_metadata(&metadata, &tenant_id);
     if let Some(user) = metadata
         .users
         .iter_mut()
         .filter_map(|user| match user.ty {
             user::UserType::Native(ref mut user) => Some(user),
             _ => None,
         })
         .find(|user| user.username == username)
     {
         new_hash.clone_from(&user.password_hash);
     } else {
         return Err(RBACError::UserDoesNotExist);
     }
+    let _ = storage::put_staging_metadata(&metadata, &tenant_id);
     Users.change_password_hash(&username, &new_hash, &tenant_id);
     Ok(HttpResponse::Ok().status(StatusCode::OK).finish())
 }

98-107: Use tenant_id to access the nested roles HashMap.

The roles().get(r) calls at lines 101 and 145 (in remove_roles_from_user) incorrectly attempt to look up role names directly. The roles() function returns HashMap<tenant_id, HashMap<role_name, privileges>>, so the lookup must first access by tenant_id. Both functions have tenant_id available from the request but don't use it:

Change:

if roles().get(r).is_none()

To:

if roles().get(&tenant_id).and_then(|r_map| r_map.get(r)).is_none()

This mirrors the pattern used throughout the codebase (e.g., src/rbac/utils.rs, src/rbac/mod.rs).

src/users/dashboards.rs (1)

244-268: Critical: Dashboard creation silently fails for new tenants.

If dashboards.get_mut(tenant) returns None (tenant doesn't exist in the map), the function returns Ok(()) without creating the dashboard. This is a logic error — new tenants would never be able to create dashboards.

     pub async fn create(
         &self,
         user_id: &str,
         dashboard: &mut Dashboard,
         tenant_id: &Option<String>,
     ) -> Result<(), DashboardError> {
         dashboard.created = Some(Utc::now());
         dashboard.set_metadata(user_id, None);
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         let mut dashboards = self.0.write().await;

-        if let Some(dbs) = dashboards.get_mut(tenant) {
-            let has_duplicate = dbs
-                .iter()
-                .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
-            if has_duplicate {
-                return Err(DashboardError::Metadata("Dashboard title must be unique"));
-            }
-            self.save_dashboard(dashboard, tenant_id).await?;
-
-            dbs.push(dashboard.clone());
+        let dbs = dashboards.entry(tenant.to_owned()).or_default();
+        let has_duplicate = dbs
+            .iter()
+            .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
+        if has_duplicate {
+            return Err(DashboardError::Metadata("Dashboard title must be unique"));
         }
+        self.save_dashboard(dashboard, tenant_id).await?;
+        dbs.push(dashboard.clone());

         Ok(())
     }
src/handlers/http/rbac.rs (1)

128-136: Role existence checks are not tenant-aware.

The roles().contains_key(role) checks query the global roles map without tenant scoping. In a multi-tenant system, this could allow:

  1. Validating against roles from other tenants
  2. Assigning roles that exist in another tenant but not in the user's tenant

Consider using tenant-scoped role lookups:

-    for role in &user_roles {
-        if !roles().contains_key(role) {
-            non_existent_roles.push(role.clone());
-        }
-    }
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
+    for role in &user_roles {
+        if !roles().get(tenant).map_or(false, |r| r.contains_key(role)) {
+            non_existent_roles.push(role.clone());
+        }
+    }

Also applies to: 322-333, 378-389

src/rbac/user.rs (1)

153-164: Use standard SaltString::generate(&mut OsRng) instead of custom salt generation.

RFC 9106 (Argon2 specification) recommends 16 bytes of salt; this implementation uses 32 bytes. While the custom approach with SaltString::encode_b64 is technically compatible with Argon2, it's unnecessarily complex and deviates from the specification without clear justification. The commented-out standard approach (SaltString::generate(&mut OsRng)) handles salt generation correctly and should be used instead for consistency with best practices.
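For reference, the standard approach looks like this; a generic argon2-crate sketch, not the project's exact hashing routine:

use argon2::{
    Argon2,
    password_hash::{PasswordHasher, SaltString, rand_core::OsRng},
};

fn hash_password(password: &str) -> argon2::password_hash::Result<String> {
    // SaltString::generate produces a 16-byte salt, per the RFC 9106 recommendation.
    let salt = SaltString::generate(&mut OsRng);
    Ok(Argon2::default()
        .hash_password(password.as_bytes(), &salt)?
        .to_string())
}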

src/catalog/mod.rs (1)

397-490: Avoid failing snapshot/retention flows if the stream isn’t in memory.
Both create_manifest() and remove_manifest_from_snapshot() can error out on PARSEABLE.get_stream(...)?, which can break cleanup on nodes that haven’t loaded that stream. Prefer best-effort in-memory updates, and keep storage updates authoritative.

Proposed fix (best-effort in-memory updates)
- let mut first_event_at = PARSEABLE
-     .get_stream(stream_name, tenant_id)?
-     .get_first_event();
+ let mut first_event_at = PARSEABLE
+     .get_stream(stream_name, tenant_id)
+     .ok()
+     .and_then(|s| s.get_first_event());

  ...
- match PARSEABLE.get_stream(stream_name, tenant_id) {
-     Ok(stream) => stream.set_first_event_at(first_event_at.as_ref().unwrap()),
-     Err(err) => error!(...),
- }
+ if let Some(first_event_at) = first_event_at.as_deref()
+     && let Ok(stream) = PARSEABLE.get_stream(stream_name, tenant_id)
+ {
+     stream.set_first_event_at(first_event_at);
+ }

 // remove_manifest_from_snapshot():
- PARSEABLE.get_stream(stream_name, tenant_id)?.reset_first_event_at();
+ if let Ok(stream) = PARSEABLE.get_stream(stream_name, tenant_id) {
+     stream.reset_first_event_at();
+ }

Also applies to: 492-527

src/parseable/streams.rs (1)

1188-1725: Tests need updates for new Stream::new(..., tenant_id) + local_stream_data_path(..., tenant_id) signatures.
As written, the test module still uses the old function arity and will fail to compile.

src/rbac/map.rs (1)

201-306: Sessions.user_sessions indexing is inconsistent (will reduce to “always not found”).
track_new() writes user_sessions[user][tenant], but is_session_expired() / remove_session() / remove_user() / remove_expired_session() read it as user_sessions[tenant][user]. Also, remove_expired_session() keeps expired sessions (expiry < now).

Proposed fix (align to user → tenant → sessions, and correct expiry retention)
 pub fn is_session_expired(&self, key: &SessionKey) -> bool {
     let (userid, tenant_id) = if let Some((user, tenant_id, _)) = self.active_sessions.get(key) {
         (user, tenant_id)
     } else {
         return false;
     };

-    let session = if let Some(tenant_sessions) = self.user_sessions.get(tenant_id)
-        && let Some(session) = tenant_sessions.get(userid)
-    {
-        session
-    } else {
-        return false;
-    };
+    let session = self
+        .user_sessions
+        .get(userid)
+        .and_then(|m| m.get(tenant_id));
+    let Some(session) = session else { return false };

     session
         .par_iter()
         .find_first(|(sessionid, expiry)| sessionid.eq(key) && expiry < &Utc::now())
         .is_some()
 }

 pub fn track_new(
     &mut self,
     user: String,
     key: SessionKey,
     expiry: DateTime<Utc>,
     permissions: Vec<Permission>,
     tenant_id: &Option<String>,
 ) {
     let tenant_id = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
     self.remove_expired_session(&user, tenant_id);

-    let sessions = self.user_sessions.entry(user.clone()).or_default();
-    sessions.insert(tenant_id.to_owned(), vec![(key.clone(), expiry)]);
+    self.user_sessions
+        .entry(user.clone())
+        .or_default()
+        .entry(tenant_id.to_owned())
+        .or_default()
+        .push((key.clone(), expiry));

     self.active_sessions
         .insert(key, (user, tenant_id.to_string(), permissions));
 }

 pub fn remove_session(&mut self, key: &SessionKey) -> Option<String> {
     let (user, tenant_id, _) = self.active_sessions.remove(key)?;
-    if let Some(tenant_sessions) = self.user_sessions.get_mut(&tenant_id)
-        && let Some(sessions) = tenant_sessions.get_mut(&user)
+    if let Some(user_sessions) = self.user_sessions.get_mut(&user)
+        && let Some(sessions) = user_sessions.get_mut(&tenant_id)
     {
         sessions.retain(|(session, _)| session != key);
         Some(user)
     } else {
         None
     }
 }

 pub fn remove_user(&mut self, username: &str, tenant_id: &str) {
-    tracing::warn!("removing user- {username}, tenant_id- {tenant_id}");
-    tracing::warn!("active sessions- {:?}", self.active_sessions);
-    tracing::warn!("user sessions- {:?}", self.user_sessions);
-    let sessions = if let Some(tenant_sessions) = self.user_sessions.get_mut(tenant_id) {
-        tenant_sessions.remove(username)
-    } else {
-        None
-    };
+    let sessions = self
+        .user_sessions
+        .get_mut(username)
+        .and_then(|m| m.remove(tenant_id));

     if let Some(sessions) = sessions {
         sessions.into_iter().for_each(|(key, _)| {
             self.active_sessions.remove(&key);
         })
     }
 }

 fn remove_expired_session(&mut self, user: &str, tenant_id: &str) {
     let now = Utc::now();

-    let sessions = if let Some(tenant_sessions) = self.user_sessions.get_mut(tenant_id)
-        && let Some(sessions) = tenant_sessions.get_mut(user)
-    {
-        sessions
-    } else {
-        return;
-    };
-    sessions.retain(|(_, expiry)| expiry < &now);
+    let Some(sessions) = self
+        .user_sessions
+        .get_mut(user)
+        .and_then(|m| m.get_mut(tenant_id))
+    else {
+        return;
+    };
+    // keep only non-expired
+    sessions.retain(|(_, expiry)| expiry >= &now);
 }
src/storage/object_storage.rs (1)

1149-1182: Inconsistent tenant_id handling across path builder functions.

schema_path(), stream_json_path(), and manifest_path() include empty string segments when tenant_id is None, whereas alert_json_path() and mttr_json_path() in the same file use conditional logic to omit the tenant segment entirely. Standardize all path builders to conditionally include tenant only when present, matching the established pattern.

Proposed fix (conditional segments)
 pub fn schema_path(stream_name: &str, tenant_id: &Option<String>) -> RelativePathBuf {
-    let tenant = tenant_id.as_ref().map_or("", |v| v);
+    let tenant = tenant_id.as_deref();

     if PARSEABLE.options.mode == Mode::Ingest {
         ...
-        RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        if let Some(tenant) = tenant {
+            RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        } else {
+            RelativePathBuf::from_iter([stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        }
     } else {
-        RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        if let Some(tenant) = tenant {
+            RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        } else {
+            RelativePathBuf::from_iter([stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        }
     }
 }

 pub fn stream_json_path(stream_name: &str, tenant_id: &Option<String>) -> RelativePathBuf {
-    let tenant = tenant_id.as_ref().map_or("", |v| v);
+    let tenant = tenant_id.as_deref();

     if PARSEABLE.options.mode == Mode::Ingest {
         ...
-        RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        if let Some(tenant) = tenant {
+            RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        } else {
+            RelativePathBuf::from_iter([stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        }
     } else {
-        RelativePathBuf::from_iter([
-            tenant,
-            stream_name,
-            STREAM_ROOT_DIRECTORY,
-            STREAM_METADATA_FILE_NAME,
-        ])
+        if let Some(tenant) = tenant {
+            RelativePathBuf::from_iter([
+                tenant,
+                stream_name,
+                STREAM_ROOT_DIRECTORY,
+                STREAM_METADATA_FILE_NAME,
+            ])
+        } else {
+            RelativePathBuf::from_iter([
+                stream_name,
+                STREAM_ROOT_DIRECTORY,
+                STREAM_METADATA_FILE_NAME,
+            ])
+        }
     }
 }

 pub fn manifest_path(prefix: &str, tenant_id: &Option<String>) -> RelativePathBuf {
-    let tenant = tenant_id.as_ref().map_or("", |v| v);
+    let tenant = tenant_id.as_deref();
     ...
-    RelativePathBuf::from_iter([tenant, prefix, &manifest_file_name])
+    if let Some(tenant) = tenant {
+        RelativePathBuf::from_iter([tenant, prefix, &manifest_file_name])
+    } else {
+        RelativePathBuf::from_iter([prefix, &manifest_file_name])
+    }
 }
src/metastore/metastores/object_store_metastore.rs (5)

342-390: put_alert_state does not use tenant_id in path construction.

Similar to get_alert_state_entry, the tenant_id parameter is accepted but not used when calling alert_state_json_path at line 352.

Proposed fix
-        let path = alert_state_json_path(id);
+        let path = alert_state_json_path(id, tenant_id);

1028-1049: get_all_schemas does not use tenant_id in path construction.

The path is constructed as {stream_name}/{STREAM_ROOT_DIRECTORY} without tenant prefix, which would fetch schemas from the wrong location for tenant-scoped streams.

Proposed fix
     async fn get_all_schemas(
         &self,
         stream_name: &str,
         tenant_id: &Option<String>,
     ) -> Result<Vec<Schema>, MetastoreError> {
-        let path_prefix =
-            relative_path::RelativePathBuf::from(format!("{stream_name}/{STREAM_ROOT_DIRECTORY}"));
+        let path_prefix = if let Some(tenant) = tenant_id {
+            relative_path::RelativePathBuf::from(format!("{tenant}/{stream_name}/{STREAM_ROOT_DIRECTORY}"))
+        } else {
+            relative_path::RelativePathBuf::from(format!("{stream_name}/{STREAM_ROOT_DIRECTORY}"))
+        };

864-866: date_path in get_all_manifest_files doesn't include tenant prefix.

While root is correctly constructed with tenant prefix, the date_path on line 865 only uses stream_name without the tenant, which may cause incorrect path resolution.

Proposed fix
         for date in dates {
-            let date_path = object_store::path::Path::from(format!("{}/{}", stream_name, &date));
+            let date_path = object_store::path::Path::from(format!("{}/{}", root, &date));
             let resp = self.storage.list_with_delimiter(Some(date_path)).await?;

323-340: alert_state_json_path function signature must be updated to accept and use tenant_id.

The get_alert_state_entry, put_alert_state, and delete_alert_state methods accept tenant_id but don't use it when constructing paths. This breaks tenant isolation—different tenants can access and modify each other's alert states.

The root cause is that alert_state_json_path(alert_id: Ulid) doesn't accept tenant_id, unlike related functions such as alert_json_path and mttr_json_path which properly scope paths by tenant. The get_alert_states method correctly demonstrates the pattern by constructing tenant-scoped paths: {tenant}/.alerts/.

Update alert_state_json_path to accept tenant_id and include it in the path construction, similar to how alert_json_path handles tenants. Then update all callers to pass tenant_id.
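A sketch of the tenant-aware variant; ALERTS_ROOT_DIRECTORY and the .json file layout are inferred from the surrounding review text, so the exact body is an assumption:

use relative_path::RelativePathBuf;
use ulid::Ulid;

// Hypothetical replacement: prefix the tenant segment only when present,
// mirroring alert_json_path and mttr_json_path.
pub fn alert_state_json_path(alert_id: Ulid, tenant_id: &Option<String>) -> RelativePathBuf {
    let file_name = format!("{alert_id}.json");
    match tenant_id.as_deref() {
        Some(tenant) => RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY, &file_name]),
        None => RelativePathBuf::from_iter([ALERTS_ROOT_DIRECTORY, &file_name]),
    }
}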


392-403: Unused tenant_id parameter creates inconsistent behavior in delete/put methods.

Methods like delete_alert_state, delete_alert, delete_target, and others accept tenant_id but ignore it when constructing paths. However, the corresponding get_* methods use tenant_id to retrieve the same data (e.g., get_alert_states retrieves from [&tenant, ALERTS_ROOT_DIRECTORY] but delete_alert_state uses the tenant-independent alert_state_json_path(id)). This inconsistency creates cross-tenant data isolation risks.

For example:

  • Target struct has a tenant field, but get_object_path() doesn't include it, yet get_targets() retrieves from tenant-specific paths
  • AlertStateEntry is retrieved with tenant context in get_alert_states() but deleted without it in delete_alert_state()

The pattern affects: delete_alert, delete_alert_state, put_llmconfig, delete_llmconfig, put_dashboard, delete_dashboard, put_chat, delete_chat, put_filter, delete_filter, put_correlation, delete_correlation, put_target, delete_target.

Either remove the unused parameter from the method signature, or ensure the path construction includes tenant context consistently with how data is retrieved.

🤖 Fix all issues with AI agents
In @src/alerts/mod.rs:
- Around line 1244-1254: The update() method (and similarly update_state() and
update_notification_state()) currently ignores writes when
self.alerts.write().await.get_mut(tenant) returns None; change the logic to
ensure a tenant bucket is created when missing before inserting: acquire the
write lock on self.alerts and use an entry-or-insert pattern (or explicitly
insert a default bucket for tenant/DEFAULT_TENANT) so that
alerts.insert(*alert.get_id(), alert.clone_box()) always runs for first-time
tenants or racing initializations; apply the same fix to the other referenced
functions (update_state, update_notification_state) that use get_mut(tenant).

In @src/correlation.rs:
- Around line 203-206: The memory delete is removing from the outer map using
correlation.id (self.write().await.remove(&correlation.id)) which deletes a
tenant entry; instead, acquire the write lock, find the tenant's CorrelationMap
by tenant_id, and remove the correlation.id from that inner map (and optionally
remove the tenant key if the inner map becomes empty). Update the code that
follows PARSEABLE.metastore.delete_correlation to lookup
self.write().await.get_mut(&tenant_id) or equivalent, call
inner_map.remove(&correlation.id), and only fall back to removing the outer
entry if you explicitly intend to delete the tenant when its map is empty.
- Around line 140-144: create() persists correlations but only inserts into the
in-memory map if the tenant bucket already exists, so new tenant buckets are
never created and the in-memory cache is inconsistent; update the write lock
usage to use the HashMap entry API (e.g., on self.write().await) for tenant
(using tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v)) so you call
entry(tenant).or_insert_with(|| /* new map */) and then insert the correlation
(correlation.id / correlation.clone()) into that returned bucket; ensure this
change is applied where the current code checks if let Some(corrs) =
self.write().await.get_mut(tenant) and also verify get_correlation reads from
the same map.
- Around line 176-183: The current update() silently does nothing when the
tenant bucket is missing; change the in-memory update to use the map's entry API
instead of get_mut so a missing bucket is created and the correlation is
inserted. Specifically, replace the get_mut(tenant) branch with an entry for the
tenant (using DEFAULT_TENANT when tenant_id is None) that either returns the
existing corrs map or inserts a new empty map, then put updated_correlation.id
-> updated_correlation.clone() into that map (referencing tenant,
DEFAULT_TENANT, updated_correlation, and corrs.insert).

In @src/handlers/http/cluster/mod.rs:
- Around line 319-322: The sync_streams_with_ingestors call is dropping tenant
context: uncomment and restore the tenant_id parameter in
sync_streams_with_ingestors's signature and update its implementation to attach
tenant_id to the outgoing sync payload/headers, then propagate tenant_id from
put_stream (where it is extracted) into the call; also update
create_update_stream so it preserves and returns headers that include tenant_id
instead of discarding them. Mirror the pattern used by
sync_role_update_with_ingestors and fetch_stats_from_ingestors: add tenant_id
parameter to signatures, forward it through all callers, and ensure ingestor
requests include the tenant_id header/field so stream syncs honor tenant
isolation.

In @src/handlers/http/modal/ingest/ingestor_role.rs:
- Around line 46-52: The tenant validation is inverted: currently it rejects
when a non-default request tenant matches the payload tenant. Update the
condition in the block that uses get_tenant_id_from_request, req_tenant_id,
req_tenant, DEFAULT_TENANT and sync_req.tenant_id to reject only when the
request tenant is not the default AND does not equal the payload tenant (i.e.,
change the second check to a mismatch), and keep returning RoleError::Anyhow
with the same message in that case.

In @src/handlers/http/modal/ingest/mod.rs:
- Around line 27-36: The SyncRole struct currently has private fields which
break external access in ingestor_role.rs; either make the fields public (change
privileges: Vec<DefaultPrivilege> and tenant_id: String to pub privileges:
Vec<DefaultPrivilege> and pub tenant_id: String in SyncRole) or add accessor
methods (e.g., impl SyncRole { pub fn tenant_id(&self) -> &String {
&self.tenant_id } and pub fn privileges(&self) -> &Vec<DefaultPrivilege> {
&self.privileges } }) and update ingestor_role.rs to use those getters if
chosen.
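Option 1 (public fields), sketched; the derive list is an assumption:

use serde::{Deserialize, Serialize};

// Public fields so ingestor_role.rs can read them directly.
#[derive(Serialize, Deserialize)]
pub struct SyncRole {
    pub privileges: Vec<DefaultPrivilege>,
    pub tenant_id: String,
}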

In @src/handlers/http/modal/query/querier_logstream.rs:
- Around line 73-75: The delete_stream call on PARSEABLE.storage currently omits
tenant scoping and may delete across tenants; modify the deletion to use tenant
context by passing tenant_id to delete_stream (or build the tenant-scoped path
before calling it). Locate PARSEABLE.storage.delete_stream and change the
invocation to include tenant_id and stream_name (or compute path = [tenant_id,
stream_name, ...] and call delete_stream(path)) so the removal is scoped to the
requesting tenant.

In @src/handlers/http/oidc.rs:
- Around line 286-290: Existing OIDC user creation is omitting tenant context:
in the match over (existing_user, final_roles) replace the None tenant passed to
put_user with the extracted tenant_id so new users are created within the tenant
scope (i.e., call put_user(&user_id, roles, user_info, bearer, tenant_id)); if
leaving it intentionally unset, instead add a clear TODO with a tracking issue
reference next to the (None, roles) arm to avoid silent multi-tenant gaps;
update related comments to reflect the chosen approach and ensure
existing_user/lookups use the same tenant_id variable.

In @src/handlers/http/rbac.rs:
- Around line 147-148: The user is created without tenant context by calling
user::User::new_basic(username.clone(), None) which breaks tenant scoping;
change that call to pass the extracted tenant_id (e.g. Some(tenant_id.clone())
or wrap/convert tenant_id to the expected tenant type) so the new user is
affiliated with the correct tenant, and ensure the surrounding code imports/uses
tenant_id and matches the function signature of user::User::new_basic.

In @src/hottier.rs:
- Around line 100-116: The loop filtering logic incorrectly uses separate
inequality checks for stream and tenant; replace the dual checks so we skip only
when both the stream and tenant match the current ones. Concretely, in the
for-loop condition that currently uses stream != current_stream && tenant_id !=
*current_tenant_id, change it to exclude entries only when (stream ==
current_stream && tenant_id == *current_tenant_id) — e.g., use && !(stream ==
current_stream && tenant_id == *current_tenant_id) or equivalent — so
check_stream_hot_tier_exists(&stream, &tenant_id) && !(stream == current_stream
&& tenant_id == *current_tenant_id) before calling get_hot_tier and accumulating
into total_hot_tier_size/total_hot_tier_used_size.
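The corrected skip condition, sketched; the loop shape and accumulator names follow the instruction above, while the iterator and the size/used_size fields are assumptions:

for (stream, tenant_id) in existing_hot_tiers {
    // Skip only the entry being resized: same stream AND same tenant.
    if self.check_stream_hot_tier_exists(&stream, &tenant_id)
        && !(stream == current_stream && tenant_id == *current_tenant_id)
    {
        let hot_tier = self.get_hot_tier(&stream, &tenant_id).await?;
        total_hot_tier_size += hot_tier.size;
        total_hot_tier_used_size += hot_tier.used_size;
    }
}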

In @src/metastore/metastores/object_store_metastore.rs:
- Around line 949-954: The put_manifest (and delete_manifest) implementation
uses manifest_path("", tenant_id) which creates a different hierarchy than
get_manifest_path that calls manifest_path(path.as_str(), tenant_id); update
put_manifest (and delete_manifest) to pass the full partition path into
manifest_path (i.e., build the partition_path string first via
partition_path(stream_name, lower_bound, upper_bound) and use that string as the
first argument to manifest_path) so manifest_path, get_manifest_path,
put_manifest, and delete_manifest all construct the same final object key
consistently.

In @src/parseable/mod.rs:
- Around line 1073-1109: The function load_tenants has an empty branch for
single-tenant mode and silently swallows a poisoned write lock; change the
tenant-meta handling so that when get_parseable_metadata returns Some(...) and
is_multi_tenant is false you return an error (e.g., "Found tenant directory
while not in multi-tenant mode") instead of doing nothing, and replace the last
block that currently does if let Ok(mut t) = self.tenants.write() { ... } else {
Ok(None) } with a match that on Ok(mut t) extends and returns Ok(Some(())) and
on Err(poison) converts the poisoned lock into an anyhow::Error and returns
Err(...) so callers always get an Err on real failures rather than Ok(None); use
the existing symbols load_tenants, PARSEABLE.metastore.get_parseable_metadata,
TENANT_METADATA.insert, and self.tenants.write() to locate and update the logic.
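The suggested lock handling, sketched; loaded_tenant_ids is a placeholder for the collected tenant list, and anyhow is assumed from the instruction above:

// Propagate a poisoned lock as an error instead of silently returning Ok(None).
match self.tenants.write() {
    Ok(mut tenants) => {
        tenants.extend(loaded_tenant_ids);
        Ok(Some(()))
    }
    Err(poisoned) => Err(anyhow::anyhow!("tenants lock poisoned: {poisoned}")),
}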

In @src/prism/logstream/mod.rs:
- Around line 71-73: The code has a hardcoded fallback—QueriedStats::default()
with a tracing::warn—that replaces the real query result; revert to using the
actual result by restoring the original assignment (replace the fake "let stats
= QueriedStats::default();" and warning with the real "let stats = stats?;" or
equivalent error-propagating handling), remove the debugging warn, and ensure
the enclosing function signature allows propagation of the error (or map the
error to the appropriate error type) so the real stats are returned instead of
defaults.

In @src/query/mod.rs:
- Around line 136-167: The registered per-tenant schemas (via
catalog.register_schema in create_session_context and GlobalSchemaProvider)
don’t give unqualified table names a way to resolve to the current tenant at
query time; fix by applying tenant-specific schema routing per request rather
than only at transform time: either (A) ensure SQL is rewritten to qualify table
names with the tenant (e.g., "tenant"."table") before parsing/execution, or (B)
set the session’s default schema/search path per request using the tenant_id
(update the SessionContext/SessionState before parsing/execution in the request
path that calls execute()) so unqualified names resolve to the tenant’s
registered schema; update code references create_session_context,
GlobalSchemaProvider, catalog.register_schema and the request/execution
entrypoint that passes tenant_id to apply the per-request default schema.
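Option (B), sketched with DataFusion's session config; whether a per-request context fits Parseable's shared QUERY_SESSION setup, and the "public" fallback schema name, are assumptions:

use datafusion::execution::context::{SessionConfig, SessionContext};

// Build a context whose default schema is the tenant's registered schema, so
// unqualified table names resolve per-tenant without rewriting the SQL.
fn session_for_tenant(tenant_id: &Option<String>) -> SessionContext {
    let schema = tenant_id.as_deref().unwrap_or("public");
    let config = SessionConfig::new().with_default_catalog_and_schema("datafusion", schema);
    SessionContext::new_with_config(config)
}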

In @src/query/stream_schema_provider.rs:
- Around line 284-291: The borrow-of-temporary and unwrap are present here as in
get_hottier_execution_plan: stop passing a reference to a temporary format!
result and remove unwrap; construct an owned String for object_store_url (e.g.
let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
format!("file:///{tenant_id}/") } else { "file:///".to_string() }) and then call
ObjectStoreUrl::parse(&object_store_url) handling the Result (propagate with ?
or map_err to a descriptive error) before passing the parsed ObjectStoreUrl into
create_parquet_physical_plan; update the surrounding function signature to
return Result if needed.
- Around line 224-231: The code currently takes a reference to a temporary
String with &format!(...) and then calls ObjectStoreUrl::parse(...).unwrap(),
which risks a borrow-of-temporary and panics on invalid input; change to build
an owned String (e.g., let object_store_url_string = if let Some(tenant_id) =
self.tenant_id.as_ref() { format!("file:///{tenant_id}/") } else {
"file:///".to_string() }) and then call
ObjectStoreUrl::parse(&object_store_url_string) but handle the Result instead of
unwrap (propagate the error, return a Result, or map_err with a descriptive
error) before passing the parsed ObjectStoreUrl into
create_parquet_physical_plan so no temporary borrow or panic occurs; refer to
tenant_id, object_store_url_string, ObjectStoreUrl::parse, and
create_parquet_physical_plan.
- Around line 631-638: Replace the unwraps with proper error propagation in the
scan() flow: when building object_store_url use
glob_storage.store_url().join(tenant_id).map_err(|e|
DataFusionError::Execution(format!("joining tenant id into store URL failed:
{}", e)))? (or propagate with ? after mapping to DataFusionError), and when
converting to ObjectStoreUrl call
ObjectStoreUrl::parse(object_store_url).map_err(|e|
DataFusionError::Execution(format!("parsing ObjectStoreUrl failed: {}", e)))? so
the errors bubble up from the join and parse calls instead of panicking; keep
the call to self.create_parquet_physical_plan(...) but pass the parsed
ObjectStoreUrl result.
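The owned-string-plus-propagation pattern that all three comments above ask for, sketched once as a fragment:

// Build an owned URL (no borrow of a temporary), then propagate parse errors
// instead of unwrapping.
let url = match self.tenant_id.as_deref() {
    Some(tenant_id) => format!("file:///{tenant_id}/"),
    None => "file:///".to_string(),
};
let object_store_url = ObjectStoreUrl::parse(&url)
    .map_err(|e| DataFusionError::Execution(format!("invalid object store URL {url}: {e}")))?;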

In @src/rbac/map.rs:
- Around line 62-104: Remove the sensitive debug logging that can leak BasicAuth
passwords: delete the tracing::warn! calls in users(), mut_users(), roles(), and
mut_roles() (remove the warn lines that print caller info and {by}). Also update
remove_user() (the code that logs active_sessions and user_sessions around lines
276-306) to stop printing SessionKey::BasicAuth passwords — either remove those
session logs or map/redact sessions before logging (e.g., log only session IDs,
types, or a redacted flag, not the password). Ensure no SessionKey or
Display/Debug impl reveals the password in any log emitted by remove_user(),
users(), mut_users(), roles(), or mut_roles().

In @src/users/filters.rs:
- Around line 128-133: The update method currently uses get_mut on self.0 and
silently drops filters when the tenant bucket doesn't exist; change update to
use the map's entry API so a bucket is created if missing (use
entry(tenant_key_or_DEFAULT_TENANT).or_default()), then remove any existing
filter with matching filter_id (retain) and push the cloned filter into that
bucket; reference the update function, DEFAULT_TENANT, self.0, retain, and push
when locating where to replace the get_mut logic.
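The entry-API rewrite, sketched; the lock type and Filter field names follow the review text and are assumptions:

pub async fn update(&self, filter: &Filter, tenant_id: &Option<String>) {
    let tenant = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
    let mut map = self.0.write().await;
    // Create the tenant bucket on first use instead of dropping the write.
    let bucket = map.entry(tenant.to_owned()).or_default();
    bucket.retain(|f| f.filter_id != filter.filter_id);
    bucket.push(filter.clone());
}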
🟡 Minor comments (7)
src/hottier.rs-596-603 (1)

596-603: Avoid unwrap() on hot_tier_file_path result - could panic on path errors.

hot_tier_file_path returns a Result and can fail (e.g., on invalid path conversion). Using unwrap() here could cause a panic and crash the service. Since this is an existence check, it should gracefully return false on path errors.

Also, remove the commented-out dead code (lines 597-600).

🐛 Proposed fix
 pub fn check_stream_hot_tier_exists(&self, stream: &str, tenant_id: &Option<String>) -> bool {
-    // let path = self
-    //     .hot_tier_path
-    //     .join(stream)
-    //     .join(STREAM_HOT_TIER_FILENAME);
-    let path = self.hot_tier_file_path(stream, tenant_id).unwrap();
-    PathBuf::from(path.to_string()).exists()
+    match self.hot_tier_file_path(stream, tenant_id) {
+        Ok(path) => PathBuf::from(path.to_string()).exists(),
+        Err(_) => false,
+    }
 }
src/utils/mod.rs-79-85 (1)

79-85: Potential panic on invalid UTF-8 header value.

tenant_value.to_str().unwrap() will panic if the tenant header contains non-UTF8 bytes. Consider handling the error gracefully.

🔧 Proposed fix
 pub fn get_tenant_id_from_request(req: &HttpRequest) -> Option<String> {
     if let Some(tenant_value) = req.headers().get("tenant") {
-        Some(tenant_value.to_str().unwrap().to_owned())
+        tenant_value.to_str().ok().map(|s| s.to_owned())
     } else {
         None
     }
 }
src/handlers/http/middleware.rs-167-177 (1)

167-177: Potential panic on invalid tenant_id header value.

HeaderValue::from_str(&tid).unwrap() will panic if tid contains characters that are invalid in HTTP headers (e.g., non-visible ASCII). Consider handling the error gracefully.

🔧 Proposed fix
         let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
             Ok((uid, tid)) => {
-                req.headers_mut().insert(
-                    HeaderName::from_static("tenant"),
-                    HeaderValue::from_str(&tid).unwrap(),
-                );
+                if let Ok(header_value) = HeaderValue::from_str(&tid) {
+                    req.headers_mut().insert(
+                        HeaderName::from_static("tenant"),
+                        header_value,
+                    );
+                }
                 Ok((uid, tid))
             }
             Err(e) => Err(e),
         };
src/rbac/map.rs-95-104 (1)

95-104: Minor: mut_roles() log message says mut_users.
Copy/paste typo makes debugging harder.

src/catalog/mod.rs-455-461 (1)

455-461: warn! for manifest path_url is too loud (and tenant-revealing).
This should likely be debug!/trace! (or removed).

src/storage/object_storage.rs-109-117 (1)

109-117: Remove or downgrade warn! debug spam from the hot path.
These look like temporary diagnostics and will flood logs (and may expose tenant IDs and local paths). Prefer debug!/trace! behind a feature flag if needed.

Also applies to: 460-461, 964-973, 1078-1084, 1086-1116

src/metastore/metastore_traits.rs-172-182 (1)

172-182: Remove unused get_chats() method or align return type with other similar methods.

get_chats() returns DashMap<String, Vec<Bytes>> while all comparable methods (get_dashboards(), get_filters(), get_correlations(), get_keystones(), etc.) return HashMap. The DashMap choice is inconsistent with the pattern used throughout the trait. Additionally, this method is not called anywhere in the codebase, making it dead code. Either remove the method if no longer needed, or change the return type to HashMap to match other similar getters.

Comment on lines 1244 to 1255
    /// Update the in-mem vector of alerts
    async fn update(&self, alert: &dyn AlertTrait) {
        self.alerts
            .write()
            .await
            .insert(*alert.get_id(), alert.clone_box());
        let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
        if let Some(alerts) = self.alerts.write().await.get_mut(tenant) {
            alerts.insert(*alert.get_id(), alert.clone_box());
        }
        // self.alerts
        //     .write()
        //     .await
        //     .insert(*alert.get_id(), alert.clone_box());
    }

⚠️ Potential issue | 🟠 Major

Don’t silently drop alert updates when the tenant bucket doesn’t exist.
update(), the write-back in update_state(), and update_notification_state() only insert if get_mut(tenant) returns Some(_). For first-time tenants (or races with initialization), this can lose writes.

Proposed fix (always create tenant bucket)
 async fn update(&self, alert: &dyn AlertTrait) {
     let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-    if let Some(alerts) = self.alerts.write().await.get_mut(tenant) {
-        alerts.insert(*alert.get_id(), alert.clone_box());
-    }
+    let mut guard = self.alerts.write().await;
+    guard
+        .entry(tenant.to_owned())
+        .or_default()
+        .insert(*alert.get_id(), alert.clone_box());
 }

 // In update_state() write-back block:
 {
     let mut write_access = self.alerts.write().await;
     let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-    if let Some(alerts) = write_access.get_mut(tenant) {
-        alerts.insert(*alert.get_id(), alert.clone_box());
-    }
+    write_access
+        .entry(tenant.to_owned())
+        .or_default()
+        .insert(*alert.get_id(), alert.clone_box());
 }

 // In update_notification_state():
- if let Some(alerts) = write_access.get_mut(tenant) {
-     alerts.insert(*alert.get_id(), alert.clone_box());
- }
+ write_access
+     .entry(tenant.to_owned())
+     .or_default()
+     .insert(*alert.get_id(), alert.clone_box());

Also applies to: 1333-1342, 1347-1386, 1388-1404


Comment on lines +140 to +144
        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
        // Update in memory
        self.write()
            .await
            .insert(correlation.id.to_owned(), correlation.clone());
        if let Some(corrs) = self.write().await.get_mut(tenant) {
            corrs.insert(correlation.id.to_owned(), correlation.clone());
        }

⚠️ Potential issue | 🟠 Major

Correlation silently not added to memory when tenant bucket doesn't exist.

In create(), if the tenant bucket doesn't exist in the in-memory map (e.g., first correlation for a tenant after restart), the correlation is persisted to metastore but not added to the in-memory cache. This could cause inconsistencies where get_correlation fails even after successful creation.

🐛 Proposed fix using entry API
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(correlation.id.to_owned(), correlation.clone());
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(correlation.id.to_owned(), correlation.clone());

Comment on lines +176 to +183
        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
        // Update in memory
        self.write().await.insert(
            updated_correlation.id.to_owned(),
            updated_correlation.clone(),
        );
        if let Some(corrs) = self.write().await.get_mut(tenant) {
            corrs.insert(
                updated_correlation.id.to_owned(),
                updated_correlation.clone(),
            );
        }

⚠️ Potential issue | 🟠 Major

Same issue: update() silently fails when tenant bucket doesn't exist.

Apply the same fix using the entry API to ensure the correlation is always inserted.

🐛 Proposed fix
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(updated_correlation.id.to_owned(), updated_correlation.clone());

Comment on lines 203 to 206
PARSEABLE.metastore.delete_correlation(&correlation, tenant_id).await?;

// Delete from memory
self.write().await.remove(&correlation.id);

⚠️ Potential issue | 🔴 Critical

Critical bug: remove operates on wrong map level, deletes entire tenant's correlations.

Line 206 calls self.write().await.remove(&correlation.id) which removes an entry from the outer HashMap<String, CorrelationMap> using correlation.id as the key. This is incorrect—it should remove the correlation from the inner CorrelationMap for the specific tenant. As written, this could corrupt the correlation store by deleting an unrelated tenant's data (if a tenant_id happens to match a correlation_id) or silently fail.

🐛 Proposed fix
         // Delete from storage
         PARSEABLE.metastore.delete_correlation(&correlation, tenant_id).await?;

         // Delete from memory
-        self.write().await.remove(&correlation.id);
+        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
+        if let Some(corrs) = self.write().await.get_mut(tenant) {
+            corrs.remove(&correlation.id);
+        }

         Ok(())
🤖 Prompt for AI Agents
In @src/correlation.rs around lines 203 - 206, The memory delete is removing
from the outer map using correlation.id
(self.write().await.remove(&correlation.id)) which deletes a tenant entry;
instead, acquire the write lock, find the tenant's CorrelationMap by tenant_id,
and remove the correlation.id from that inner map (and optionally remove the
tenant key if the inner map becomes empty). Update the code that follows
PARSEABLE.metastore.delete_correlation to lookup
self.write().await.get_mut(&tenant_id) or equivalent, call
inner_map.remove(&correlation.id), and only fall back to removing the outer
entry if you explicitly intend to delete the tenant when its map is empty.

Comment on lines 319 to 382
body: Bytes,
stream_name: &str,
// tenant_id: &Option<String>
) -> Result<(), StreamError> {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/handlers/http/cluster/mod.rs | sed -n '310,330p'



🏁 Script executed:

rg -n "sync_streams_with_ingestors" --type rust -B 3 -A 8



🏁 Script executed:

rg -n "SyncStream" --type rust -B 2 -A 5



🏁 Script executed:

# Check how tenant_id is handled in other sync functions
rg -n "fn sync_.*_with_ingestors" --type rust -B 1 -A 3



🏁 Script executed:

cat -n src/handlers/http/cluster/mod.rs | sed -n '317,380p'



🏁 Script executed:

# Check if tenant_id is used elsewhere in cluster sync functions
rg -n "tenant_id" src/handlers/http/cluster/mod.rs -B 2 -A 2



🏁 Script executed:

# Look for tenant context in headers or request extraction
rg -n "extract_tenant_id\|get_tenant\|HeaderMap" src/handlers/http/modal/query/querier_logstream.rs -B 2 -A 2



🏁 Script executed:

# Check the querier_logstream context around the sync_streams call
cat -n src/handlers/http/modal/query/querier_logstream.rs | sed -n '100,150p'



🏁 Script executed:

# Check the parseable/mod.rs context around the sync_streams calls
cat -n src/parseable/mod.rs | sed -n '500,530p'



🏁 Script executed:

# Look for how tenant_id is extracted from headers in other handlers
rg -n "extract_tenant_id\|parse.*tenant\|X-TENANT\|tenant.*header" --type rust -B 1 -A 1



🏁 Script executed:

# Check what get_tenant_id_from_request does
rg -n "fn get_tenant_id_from_request\|extract.*tenant" --type rust -B 2 -A 8



🏁 Script executed:

# Look for tenant context in the returned headers from create_update_stream
rg -n "create_update_stream" src/parseable/mod.rs -B 2 -A 15



🏁 Script executed:

# Check if there's any tenant-related header being set
rg -n "TENANT\|tenant" src/handlers/http/modal/query/querier_logstream.rs -i



🏁 Script executed:

# Check the full create_update_stream method to see what it returns in headers
cat -n src/parseable/mod.rs | sed -n '636,750p'



🏁 Script executed:

# Look for how tenant_id is supposed to be communicated to ingestors
rg -n "SyncRole\|sync_role_update_with_ingestors" src/handlers/http/cluster/mod.rs -B 5 -A 15 | head -60



🏁 Script executed:

# Check if tenant context is embedded in the body or headers when syncing
rg -n "tenant" src/handlers/http/cluster/mod.rs -i -B 2 -A 2 | head -40



🏁 Script executed:

# Find ALL callers of sync_streams_with_ingestors to understand the impact
rg -n "sync_streams_with_ingestors\(" --type rust -B 3 -A 1



🏁 Script executed:

# Check if querier_logstream has access to tenant_id at the point of sync call
cat -n src/handlers/http/modal/query/querier_logstream.rs | sed -n '120,145p'



Tenant context must be propagated to ingestors during stream synchronization.

The tenant_id is available in put_stream (extracted at line 126 of querier_logstream.rs) but is not passed to sync_streams_with_ingestors. The function signature has the parameter commented out (line 321), and create_update_stream discards the tenant context when returning headers. This causes ingestors to receive stream sync requests without tenant association, breaking tenant isolation in multi-tenant deployments.

Other sync functions in the same module (sync_role_update_with_ingestors, fetch_stats_from_ingestors) correctly propagate tenant_id; the same pattern should be applied here.

🤖 Prompt for AI Agents
In @src/handlers/http/cluster/mod.rs around lines 319 - 322, The
sync_streams_with_ingestors call is dropping tenant context: uncomment and
restore the tenant_id parameter in sync_streams_with_ingestors's signature and
update its implementation to attach tenant_id to the outgoing sync
payload/headers, then propagate tenant_id from put_stream (where it is
extracted) into the call; also update create_update_stream so it preserves and
returns headers that include tenant_id instead of discarding them. Mirror the
pattern used by sync_role_update_with_ingestors and fetch_stats_from_ingestors:
add tenant_id parameter to signatures, forward it through all callers, and
ensure ingestor requests include the tenant_id header/field so stream syncs
honor tenant isolation.

Comment on lines 224 to 231
let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
&format!("file:///{tenant_id}/")
} else {
"file:///"
};
self.create_parquet_physical_plan(
execution_plans,
ObjectStoreUrl::parse("file:///").unwrap(),
ObjectStoreUrl::parse(object_store_url).unwrap(),

⚠️ Potential issue | 🟠 Major

Potential borrow-of-temporary issue and unhandled parse error.

  1. The pattern &format!("file:///{tenant_id}/") creates a reference to a temporary String that may be dropped before use. This typically causes a compile error ("temporary value dropped while borrowed") or leads to fragile code.

  2. The unwrap() on ObjectStoreUrl::parse() can panic if tenant_id contains characters that produce an invalid URL.

Suggested fix using owned String
-        let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-            &format!("file:///{tenant_id}/")
-        } else {
-            "file:///"
-        };
+        let object_store_url = match self.tenant_id.as_ref() {
+            Some(tenant_id) => format!("file:///{tenant_id}/"),
+            None => "file:///".to_string(),
+        };
         self.create_parquet_physical_plan(
             execution_plans,
-            ObjectStoreUrl::parse(object_store_url).unwrap(),
+            ObjectStoreUrl::parse(&object_store_url)
+                .map_err(|e| DataFusionError::Plan(format!("Invalid object store URL: {e}")))?,
🤖 Prompt for AI Agents
In @src/query/stream_schema_provider.rs around lines 224 - 231, The code
currently takes a reference to a temporary String with &format!(...) and then
calls ObjectStoreUrl::parse(...).unwrap(), which risks a borrow-of-temporary and
panics on invalid input; change to build an owned String (e.g., let
object_store_url_string = if let Some(tenant_id) = self.tenant_id.as_ref() {
format!("file:///{tenant_id}/") } else { "file:///".to_string() }) and then call
ObjectStoreUrl::parse(&object_store_url_string) but handle the Result instead of
unwrap (propagate the error, return a Result, or map_err with a descriptive
error) before passing the parsed ObjectStoreUrl into
create_parquet_physical_plan so no temporary borrow or panic occurs; refer to
tenant_id, object_store_url_string, ObjectStoreUrl::parse, and
create_parquet_physical_plan.
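
As a standalone sketch of the owned-String alternative (toy function, not the provider's code): returning String avoids any reliance on temporary lifetime extension for the &format!(...) branch, which is fragile across compiler versions.

fn object_store_url(tenant_id: Option<&str>) -> String {
    // Owned return value: no reference to a temporary can escape.
    match tenant_id {
        Some(t) => format!("file:///{t}/"),
        None => "file:///".to_string(),
    }
}

fn main() {
    assert_eq!(object_store_url(Some("acme")), "file:///acme/");
    assert_eq!(object_store_url(None), "file:///");
}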

Comment on lines 284 to 291
let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
&format!("file:///{tenant_id}/")
} else {
"file:///"
};
self.create_parquet_physical_plan(
execution_plans,
ObjectStoreUrl::parse("file:///").unwrap(),
ObjectStoreUrl::parse(object_store_url).unwrap(),

⚠️ Potential issue | 🟠 Major

Same borrow-of-temporary and unwrap issues as hot tier path.

This duplicates the problematic pattern from get_hottier_execution_plan. Apply the same fix using an owned String and proper error handling for ObjectStoreUrl::parse().

🤖 Prompt for AI Agents
In @src/query/stream_schema_provider.rs around lines 284 - 291, The
borrow-of-temporary and unwrap are present here as in
get_hottier_execution_plan: stop passing a reference to a temporary format!
result and remove unwrap; construct an owned String for object_store_url (e.g.
let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
format!("file:///{tenant_id}/") } else { "file:///".to_string() }) and then call
ObjectStoreUrl::parse(&object_store_url) handling the Result (propagate with ?
or map_err to a descriptive error) before passing the parsed ObjectStoreUrl into
create_parquet_physical_plan; update the surrounding function signature to
return Result if needed.

Comment on lines 631 to 638
let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
glob_storage.store_url().join(tenant_id).unwrap()
} else {
glob_storage.store_url()
};
self.create_parquet_physical_plan(
&mut execution_plans,
ObjectStoreUrl::parse(glob_storage.store_url()).unwrap(),
ObjectStoreUrl::parse(object_store_url).unwrap(),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the actual file and the specific lines
cat -n src/query/stream_schema_provider.rs | sed -n '625,645p'



🏁 Script executed:

#!/bin/bash
# Find the store_url() method definition
rg -n "fn store_url" --type rust -B 2 -A 5



🏁 Script executed:

#!/bin/bash
# Check ObjectStoreUrl type and its methods, particularly parse() and join()
rg -n "ObjectStoreUrl" --type rust | head -30



🏁 Script executed:

#!/bin/bash
# Look for the actual implementation of join() on the store_url return type
rg -n "pub fn join" --type rust -A 3



🏁 Script executed:

#!/bin/bash
# Check the url crate's join() method behavior - look for examples in the codebase
rg -n "\.join\(" --type rust src/query/stream_schema_provider.rs -B 2 -A 2



🏁 Script executed:

#!/bin/bash
# Check how other code in this file handles ObjectStoreUrl::parse errors
rg -n "ObjectStoreUrl::parse" --type rust -B 2 -A 2 src/query/stream_schema_provider.rs



🏁 Script executed:

#!/bin/bash
# Look at the create_parquet_physical_plan signature to understand what type it expects
rg -n "fn create_parquet_physical_plan" --type rust -A 8 src/query/stream_schema_provider.rs



🏁 Script executed:

#!/bin/bash
# Check if there are any error handling patterns for url::Url operations elsewhere
rg -n "store_url\(\)\.join" --type rust -B 3 -A 3



🏁 Script executed:

#!/bin/bash
# Search for where tenant_id is assigned/validated in the scan method
rg -n "tenant_id" --type rust src/query/stream_schema_provider.rs | head -20



🏁 Script executed:

#!/bin/bash
# Check the scan method signature and how tenant_id is provided
rg -n "fn scan" --type rust -A 15 src/query/stream_schema_provider.rs | head -50



Handle error cases in URL construction instead of unwrapping.

The join() call on line 632 and ObjectStoreUrl::parse() on line 638 should propagate errors rather than panic, since the scan() method returns Result<Arc<dyn ExecutionPlan>, DataFusionError>. The parse() call is necessary here for type conversion from url::Url to ObjectStoreUrl, not redundant.

Suggested fix with error handling
         let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-            glob_storage.store_url().join(tenant_id).unwrap()
+            glob_storage.store_url().join(tenant_id)
+                .map_err(|e| DataFusionError::Plan(format!("Invalid tenant path: {e}")))?
         } else {
             glob_storage.store_url()
         };
         self.create_parquet_physical_plan(
             &mut execution_plans,
-            ObjectStoreUrl::parse(object_store_url).unwrap(),
+            ObjectStoreUrl::parse(&object_store_url)
+                .map_err(|e| DataFusionError::Plan(format!("Invalid object store URL: {e}")))?,
🤖 Prompt for AI Agents
In @src/query/stream_schema_provider.rs around lines 631 - 638, Replace the
unwraps with proper error propagation in the scan() flow: when building
object_store_url use glob_storage.store_url().join(tenant_id).map_err(|e|
DataFusionError::Execution(format!("joining tenant id into store URL failed:
{}", e)))? (or propagate with ? after mapping to DataFusionError), and when
converting to ObjectStoreUrl call
ObjectStoreUrl::parse(object_store_url).map_err(|e|
DataFusionError::Execution(format!("parsing ObjectStoreUrl failed: {}", e)))? so
the errors bubble up from the join and parse calls instead of panicking; keep
the call to self.create_parquet_physical_plan(...) but pass the parsed
ObjectStoreUrl result.
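
The same pattern in self-contained form, using the url crate directly and a stand-in error type in place of DataFusionError (names are illustrative):

use url::Url;

#[derive(Debug)]
struct PlanError(String);

// Build a tenant-scoped store URL, surfacing join/parse failures to the caller
// instead of panicking.
fn tenant_store_url(base: &str, tenant_id: Option<&str>) -> Result<Url, PlanError> {
    let base = Url::parse(base).map_err(|e| PlanError(format!("invalid base URL: {e}")))?;
    match tenant_id {
        // The trailing slash keeps later joins relative to the tenant directory.
        Some(t) => base
            .join(&format!("{t}/"))
            .map_err(|e| PlanError(format!("invalid tenant path: {e}"))),
        None => Ok(base),
    }
}

fn main() -> Result<(), PlanError> {
    let url = tenant_store_url("file:///", Some("acme"))?;
    assert_eq!(url.as_str(), "file:///acme/");
    Ok(())
}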

Comment on lines +128 to +133
pub async fn update(&self, filter: &Filter, tenant_id: &Option<String>) {
let mut s = self.0.write().await;
s.retain(|f| f.filter_id != filter.filter_id);
s.push(filter.clone());
if let Some(filters) = s.get_mut(tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v)) {
filters.retain(|f| f.filter_id != filter.filter_id);
filters.push(filter.clone());
}

⚠️ Potential issue | 🟠 Major

Filter silently dropped when tenant bucket doesn't exist.

The update method only modifies existing tenant buckets via get_mut(). If the tenant entry doesn't exist in the map (e.g., when adding the first filter for a new tenant after a server restart), the filter will be silently dropped without insertion.

🐛 Proposed fix using entry API
     pub async fn update(&self, filter: &Filter, tenant_id: &Option<String>) {
         let mut s = self.0.write().await;
-        if let Some(filters) = s.get_mut(tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v)) {
-            filters.retain(|f| f.filter_id != filter.filter_id);
-            filters.push(filter.clone());
-        }
+        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
+        let filters = s.entry(tenant.to_owned()).or_default();
+        filters.retain(|f| f.filter_id != filter.filter_id);
+        filters.push(filter.clone());
     }
🤖 Prompt for AI Agents
In @src/users/filters.rs around lines 128 - 133, The update method currently
uses get_mut on self.0 and silently drops filters when the tenant bucket doesn't
exist; change update to use the map's entry API so a bucket is created if
missing (use entry(tenant_key_or_DEFAULT_TENANT).or_default()), then remove any
existing filter with matching filter_id (retain) and push the cloned filter into
that bucket; reference the update function, DEFAULT_TENANT, self.0, retain, and
push when locating where to replace the get_mut logic.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (20)
src/migration/mod.rs (1)

499-504: Missing parent directory creation for tenant-specific staging path.

When tenant_id is Some, the path includes a tenant subdirectory (e.g., staging_dir/tenant_id/.parseable.json). If this tenant directory doesn't exist, OpenOptions::open() will fail with "No such file or directory" since create(true) only creates the file, not parent directories.

Suggested fix: create parent directory if needed
     } else {
         config.options.staging_dir().join(".parseable.json")
     };
+    if let Some(parent) = path.parent() {
+        std::fs::create_dir_all(parent)?;
+    }
     let mut file = OpenOptions::new()
         .create(true)
         .truncate(true)
         .write(true)
         .open(path)?;
src/handlers/http/rbac.rs (3)

128-136: Role existence check is not tenant-scoped.

The check roles().contains_key(role) at line 130 doesn't use tenant_id, but later role lookups (lines 218-224, 241-247) use tenant-scoped access via roles().get(tenant). This inconsistency could allow roles from other tenants to pass validation or cause false negatives.

🔧 Suggested fix
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
     let mut non_existent_roles = Vec::new();
     for role in &user_roles {
-        if !roles().contains_key(role) {
+        if !roles().get(tenant).map_or(false, |r| r.contains_key(role)) {
             non_existent_roles.push(role.clone());
         }
     }

322-333: Role existence check not tenant-scoped (same issue as post_user).

Same issue as in post_user: the role existence check roles().contains_key(role) doesn't use tenant context, while subsequent lookups are tenant-scoped.

🔧 Suggested fix
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
     let mut non_existent_roles = Vec::new();
 
     // check if the role exists
     for role in &roles_to_add {
-        if !roles().contains_key(role) {
+        if !roles().get(tenant).map_or(false, |r| r.contains_key(role)) {
             non_existent_roles.push(role.clone());
         }
     }

378-389: Role existence check not tenant-scoped (same pattern).

The remove_roles_from_user handler has the same tenant-scoping gap in role validation.

🔧 Suggested fix
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
     let mut non_existent_roles = Vec::new();
 
     // check if the role exists
     for role in &roles_to_remove {
-        if !roles().contains_key(role) {
+        if !roles().get(tenant).map_or(false, |r| r.contains_key(role)) {
             non_existent_roles.push(role.clone());
         }
     }
src/handlers/http/cluster/mod.rs (1)

600-641: Unused HttpRequest parameter in sync_password_reset_with_ingestors.

The req: HttpRequest parameter is accepted but never used. If tenant context is needed, it should be extracted via get_tenant_id_from_request(&req). Otherwise, remove the unused parameter.

🔧 Suggested fix - either use it or remove it

Option 1 - Use it:

 pub async fn sync_password_reset_with_ingestors(
     req: HttpRequest,
     username: &str,
 ) -> Result<(), RBACError> {
     let username = username.to_owned();
+    let tenant_id = get_tenant_id_from_request(&req);

Option 2 - Remove if not needed:

 pub async fn sync_password_reset_with_ingestors(
-    req: HttpRequest,
     username: &str,
 ) -> Result<(), RBACError> {
src/alerts/alerts_utils.rs (1)

129-153: Pass auth_token to send_query_request or remove the unused parameter.

The auth_token parameter is accepted by execute_remote_query but always passes None to send_query_request instead (line 148). This causes the auth token from the caller to be ignored, falling back to the querier's token. Either convert and pass the auth_token as a HeaderMap to send_query_request, or remove the unused parameter if it's not needed.
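
If the token is kept, the conversion could look like this minimal sketch, assuming the http crate's HeaderMap (the type reqwest uses); auth_headers is an illustrative name, not an existing function:

use http::header::{AUTHORIZATION, HeaderMap, HeaderValue};

// Convert an optional bearer token into request headers, propagating invalid
// header characters as an error instead of panicking.
fn auth_headers(auth_token: Option<&str>) -> Result<HeaderMap, http::header::InvalidHeaderValue> {
    let mut headers = HeaderMap::new();
    if let Some(token) = auth_token {
        headers.insert(AUTHORIZATION, HeaderValue::from_str(&format!("Bearer {token}"))?);
    }
    Ok(headers)
}

fn main() {
    let headers = auth_headers(Some("secret-token")).expect("valid header value");
    assert!(headers.contains_key(AUTHORIZATION));
}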

src/storage/store_metadata.rs (1)

298-320: Add parent directory creation for tenant-scoped staging metadata.

When tenant_id is present, the path includes a tenant subdirectory (line 306). OpenOptions::open will fail with NotFound if the parent directory doesn't exist. The fix is to create the parent directory before opening the file, following the standard Rust pattern.

Note: create_dir_all is already imported at line 21; the implementation can safely call it.

Proposed fix
 pub fn put_staging_metadata(meta: &StorageMetadata, tenant_id: &Option<String>) -> io::Result<()> {
     let mut staging_metadata = meta.clone();
     staging_metadata.server_mode = PARSEABLE.options.mode;
     staging_metadata.staging = PARSEABLE.options.staging_dir().to_path_buf();
     let path = if let Some(tenant_id) = tenant_id.as_ref() {
-        PARSEABLE
+        let tenant_dir = PARSEABLE
             .options
             .staging_dir()
-            .join(tenant_id)
-            .join(PARSEABLE_METADATA_FILE_NAME)
+            .join(tenant_id);
+        create_dir_all(&tenant_dir)?;
+        tenant_dir.join(PARSEABLE_METADATA_FILE_NAME)
     } else {
         PARSEABLE
             .options
             .staging_dir()
             .join(PARSEABLE_METADATA_FILE_NAME)
     };
src/catalog/mod.rs (2)

529-548: Retention cleanup request does not propagate tenant_id to ingestors.

The for_each_live_node call sends retention cleanup requests without including the tenant_id. In a multi-tenant setup, this could cause ingestors to delete data from the wrong tenant or fail to scope the cleanup correctly.

Consider passing tenant_id to the closure and including it in the cleanup request URL or payload.

+    let tenant_for_closure = tenant_id.clone();
     for_each_live_node(move |ingestor| {
         let stream_name = stream_name_clone.clone();
         let dates = dates_clone.clone();
+        let tenant_id = tenant_for_closure.clone();
         async move {
             let url = format!(
-                "{}{}/logstream/{}/retention/cleanup",
+                "{}{}/logstream/{}/retention/cleanup?tenant_id={}",
                 ingestor.domain_name,
                 base_path_without_preceding_slash(),
-                stream_name
+                stream_name,
+                tenant_id.as_deref().unwrap_or("")
             );

556-569: Inconsistent tenant_id handling pattern.

The partition_path function uses map_or("", |v| v) to handle the optional tenant_id, but this deviates from the established pattern in the same codebase. Functions like alert_json_path (line 1209) and alert_config_mttr_json_path (line 1244) explicitly use if let Some(tenant_id) to conditionally build paths without empty segments.

When tenant_id is None, passing an empty string to from_iter is inconsistent with similar functions and less explicit about intent. Align with the established pattern:

Proposed fix
 pub fn partition_path(
     stream: &str,
     lower_bound: DateTime<Utc>,
     upper_bound: DateTime<Utc>,
     tenant_id: &Option<String>,
 ) -> RelativePathBuf {
-    let root = tenant_id.as_ref().map_or("", |v| v);
     let lower = lower_bound.date_naive().format("%Y-%m-%d").to_string();
     let upper = upper_bound.date_naive().format("%Y-%m-%d").to_string();
-    if lower == upper {
-        RelativePathBuf::from_iter([root, stream, &format!("date={lower}")])
+    let date_segment = if lower == upper {
+        format!("date={lower}")
     } else {
-        RelativePathBuf::from_iter([root, stream, &format!("date={lower}:{upper}")])
+        format!("date={lower}:{upper}")
+    };
+    if let Some(tenant) = tenant_id {
+        RelativePathBuf::from_iter([tenant.as_str(), stream, &date_segment])
+    } else {
+        RelativePathBuf::from_iter([stream, &date_segment])
     }
 }
src/handlers/http/modal/ingest/ingestor_rbac.rs (2)

196-222: post_gen_password does not generate a new password.

This handler is documented as "Resets password for the user to a newly generated one" but it:

  1. Calls put_staging_metadata at line 207 before any password change
  2. Reads the existing password_hash from metadata at line 217 instead of generating a new one
  3. Never calls Basic::gen_new_password() or similar

Compare with src/handlers/http/modal/query/querier_rbac.rs lines 302-334 which correctly generates a new password using user::Basic::gen_new_password().

🐛 Proposed fix
 pub async fn post_gen_password(
     req: HttpRequest,
     username: web::Path<String>,
 ) -> Result<HttpResponse, RBACError> {
     let username = username.into_inner();
     let tenant_id = get_tenant_id_from_request(&req);
-    let mut new_hash = String::default();
     let mut metadata = get_metadata(&tenant_id).await?;
 
-    let _ = storage::put_staging_metadata(&metadata, &tenant_id);
+    let user::PassCode { password, hash } = user::Basic::gen_new_password();
     if let Some(user) = metadata
         .users
         .iter_mut()
         .filter_map(|user| match user.ty {
             user::UserType::Native(ref mut user) => Some(user),
             _ => None,
         })
         .find(|user| user.username == username)
     {
-        new_hash.clone_from(&user.password_hash);
+        user.password_hash.clone_from(&hash);
     } else {
         return Err(RBACError::UserDoesNotExist);
     }
-    Users.change_password_hash(&username, &new_hash, &tenant_id);
-    Ok(HttpResponse::Ok().status(StatusCode::OK).finish())
+    let _ = storage::put_staging_metadata(&metadata, &tenant_id);
+    Users.change_password_hash(&username, &hash, &tenant_id);
+    Ok(HttpResponse::Ok().status(StatusCode::OK).body(password))
 }

107-117: Role existence checks not tenant-scoped.

Both add_roles_to_user (line 110) and remove_roles_from_user (line 154) check role existence using roles().get(r).is_none() without tenant scoping. This could allow referencing roles from other tenants.

Also applies to: 151-161

src/handlers/http/query.rs (1)

82-110: Tenant-scoped schema not configured in get_records_and_fields.

The query() function (lines 121-125) configures the session state with a tenant-aware default schema, but get_records_and_fields obtains the session state without this configuration. Since into_query() relies on session_state.config_options().catalog.default_schema for schema resolution, queries executed through get_records_and_fields may resolve to the wrong schema in a multi-tenant environment.

The function has access to tenant_id but does not apply it to the session configuration. Apply the same pattern:

Suggested fix
 pub async fn get_records_and_fields(
     query_request: &Query,
     creds: &SessionKey,
     tenant_id: &Option<String>,
 ) -> Result<(Option<Vec<RecordBatch>>, Option<Vec<String>>), QueryError> {
-    let session_state = QUERY_SESSION.get_ctx().state();
+    let mut session_state = QUERY_SESSION.get_ctx().state();
+    session_state
+        .config_mut()
+        .options_mut()
+        .catalog
+        .default_schema = tenant_id.as_ref().map_or("public".into(), |v| v.to_owned());
src/handlers/http/modal/query/querier_rbac.rs (1)

60-68: Add tenant-scoped lookup for role existence check.

The roles().contains_key(role) and roles().get(r) checks query the outer HashMap level (checking for tenant_id keys) instead of the inner level where role names are stored. In a multi-tenant setup, this allows users to assign non-existent roles without validation.

The data structure is HashMap<String, HashMap<String, Vec<DefaultPrivilege>>> where the outer key is tenant_id. The correct pattern, already used elsewhere in the codebase (e.g., src/rbac/map.rs:478), is:

if let Some(roles) = roles().get(&tenant_id)
    && let Some(privileges) = roles.get(role_name)
{
    // role exists for this tenant
}

Fix this in:

  • post_user() at line 62
  • add_roles_to_user() at line 197
  • remove_roles_from_user() at line 257

The tenant_id is available in all these functions via get_tenant_id_from_request(&req). This same issue also exists in src/handlers/http/modal/ingest/ingestor_rbac.rs.

src/hottier.rs (2)

208-220: delete_hot_tier ignores tenant_id (can delete wrong directory / leave tenant data behind)

You’re scoping the metadata file under {hot_tier_path}/{tenant}/{stream}/.hot_tier.json, but deletion still uses {hot_tier_path}/{stream}. In multi-tenant this can delete the wrong tree (or fail to delete the right one).

Proposed fix
 pub async fn delete_hot_tier(
     &self,
     stream: &str,
     tenant_id: &Option<String>,
 ) -> Result<(), HotTierError> {
     if !self.check_stream_hot_tier_exists(stream, tenant_id) {
         return Err(HotTierValidationError::NotFound(stream.to_owned()).into());
     }
-    let path = self.hot_tier_path.join(stream);
+    let path = if let Some(t) = tenant_id.as_ref() {
+        self.hot_tier_path.join(t).join(stream)
+    } else {
+        self.hot_tier_path.join(stream)
+    };
     fs::remove_dir_all(path).await?;
 
     Ok(())
 }

186-206: Tenant-scoped metadata storage vs. non-tenant-aware local traversal is inconsistent and breaks hot tier operations

hot_tier_file_path() is tenant-aware and stores metadata under a tenant prefix. Manifest files downloaded via process_manifest() also carry a file_path that includes the tenant prefix (taken from the object store path), so they land at {hot_tier_path}/{tenant}/{stream}/date=.../.... But the retrieval and cleanup functions (fetch_hot_tier_dates(), get_stream_path_for_date(), get_oldest_date_time_entry(), delete_hot_tier()) join only the stream name without the tenant, looking for files at {hot_tier_path}/{stream}/.... This mismatch prevents cleanup and oldest-date calculation from finding files, and risks cross-tenant collisions when multiple tenants share the same stream name.

Affected locations:

  • fetch_hot_tier_dates() (line 473): should include tenant when constructing paths
  • get_stream_path_for_date() (line 529): should include tenant
  • delete_hot_tier() (line 216): should include tenant
  • get_oldest_date_time_entry() (line 708): inherits tenant issue via fetch_hot_tier_dates
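
One way to keep the call sites listed above consistent is a single tenant-aware path helper; a minimal std-only sketch (illustrative names):

use std::path::{Path, PathBuf};

// Resolve the on-disk directory for a stream's hot tier data, optionally
// nested under a tenant directory, so every caller builds the same path.
fn hot_tier_stream_dir(root: &Path, tenant_id: Option<&str>, stream: &str) -> PathBuf {
    match tenant_id {
        Some(t) => root.join(t).join(stream),
        None => root.join(stream),
    }
}

fn main() {
    let root = Path::new("/var/hot-tier");
    assert_eq!(
        hot_tier_stream_dir(root, Some("acme"), "app-logs"),
        PathBuf::from("/var/hot-tier/acme/app-logs")
    );
    assert_eq!(
        hot_tier_stream_dir(root, None, "app-logs"),
        PathBuf::from("/var/hot-tier/app-logs")
    );
}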
src/parseable/streams.rs (2)

117-137: Update tests (and any call sites) for the new tenant_id parameter and nested map shape

Stream::new(..., tenant_id) and Streams::get_or_create(..., tenant_id) changed signatures, but the tests still call the old arity and still assume Streams is a flat HashMap<stream_name, ...>. As-is, unit tests won’t compile / assertions won’t match.

Example pattern to apply across tests
 let options = Arc::new(Options::default());
 let staging = Stream::new(
     options.clone(),
     stream_name,
     LogStreamMetadata::default(),
     None,
+    &None,
 );

 assert_eq!(
     staging.data_path,
-    options.local_stream_data_path(stream_name)
+    options.local_stream_data_path(stream_name, &None)
 );

And for Streams assertions (new nested map):

 let guard = streams.read().expect("Failed to acquire read lock");
-assert!(guard.contains_key(stream_name));
+assert!(guard
+    .get(DEFAULT_TENANT)
+    .is_some_and(|m| m.contains_key(stream_name)));

Also applies to: 1200-1725


1046-1078: Remove/downgrade tracing::warn! that logs full metadata/options in get_or_create

This will be extremely noisy and may leak sensitive config (and potentially user-related metadata) into logs. This should be trace!/debug! at most, and avoid dumping structs.

Proposed fix
-        tracing::warn!(
-            "get_or_create\nstream- {stream_name}\ntenant- {tenant_id:?}\nmetadata- {metadata:?}\noptions- {options:?}"
-        );
+        tracing::debug!(stream_name = %stream_name, tenant_id = ?tenant_id, "streams.get_or_create");
 
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
src/storage/object_storage.rs (1)

618-710: Remove/downgrade warn-level debug logging in hot paths

There are multiple tracing::warn! that look like debug leftovers (e.g., logging every parquet path, logging fetched schema). This will spam production logs and can leak internal paths. Prefer trace!/debug! with minimal fields.

Also applies to: 925-947

src/metastore/metastores/object_store_metastore.rs (2)

342-390: put_alert_state also ignores tenant_id - data isolation issue.

Like get_alert_state_entry, this method accepts tenant_id but constructs the path without it (line 352). Combined with get_alert_states() which filters by tenant path (line 302), this creates a data isolation issue where alert states may be written globally but read tenant-scoped.


323-340: tenant_id parameter is unused across all alert state methods - inconsistent with tenant-scoped get_alert_states().

The tenant_id parameter is accepted but not used in get_alert_state_entry(), put_alert_state(), and delete_alert_state(). All three call alert_state_json_path() which constructs paths without tenant context (format: .alerts/alert_state_{alert_id}.json).

This conflicts with get_alert_states() (line 302), which constructs a tenant-scoped base path using RelativePathBuf::from_iter([&tenant, ALERTS_ROOT_DIRECTORY]).

Fix: Update alert_state_json_path() to accept and use tenant_id as a path component, or remove the tenant_id parameters from the trait methods if alert states are intentionally global. Ensure consistency across all four alert state methods.

🤖 Fix all issues with AI agents
In `@src/handlers/http/middleware.rs`:
- Around line 168-178: The code uses HeaderValue::from_str(&tid).unwrap() inside
the match for get_user_and_tenant_from_request, which can panic for invalid
header characters; replace the unwrap with proper error handling: call
HeaderValue::from_str(&tid) and match or use map_err to convert the header error
into the existing Err branch (or log and skip inserting the header), then only
call req.headers_mut().insert(...) on Ok(val). Update the user_and_tenant_id
assignment so failures to construct the HeaderValue return an Err (propagated)
or a controlled fallback instead of panicking, referencing
get_user_and_tenant_from_request, HeaderValue::from_str,
req.headers_mut().insert and user_and_tenant_id.
- Around line 309-320: check_suspension currently treats missing or unknown
tenants as Authorized; change it to reject those cases: in function
check_suspension, when the tenant header is missing or tenant.to_str() fails
return rbac::Response::Unauthorized (or another appropriate denial variant)
instead of rbac::Response::Authorized, and in the branch where
TENANT_METADATA.is_action_suspended returns Ok(None) (the "tenant does not
exist" case) return rbac::Response::Unauthorized rather than falling through to
Authorized; keep the existing Suspended return when an actual suspension is
found and optionally add a short debug log mentioning the tenant value on
unauthorized paths.

In `@src/handlers/http/modal/ingest/ingestor_rbac.rs`:
- Around line 52-58: The tenant validation in ingestor_rbac.rs is inverted:
change the condition that currently returns an error when req_tenant equals the
requester's tenant to instead return an error when a non-super-admin (req_tenant
!= DEFAULT_TENANT) is trying to create a user for a different tenant;
specifically, update the check that uses req_tenant, DEFAULT_TENANT and
user.tenant (as_ref().map_or(...)) so it tests for inequality (req_tenant !=
user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)) and then return
RBACError::Anyhow(...) when that inequality is true.

In `@src/handlers/http/modal/query/querier_rbac.rs`:
- Line 79: The call to user::User::new_basic uses None for the tenant, creating
users without tenant association; update the call in querier_rbac.rs to pass the
request's tenant_id (e.g., tenant_id.clone()) instead of None so the new user is
associated with the tenant (ensure you pass the same tenant_id variable used
elsewhere in this function when calling user::User::new_basic with username).

In `@src/handlers/http/oidc.rs`:
- Around line 132-162: The cluster sync currently treats any successful TCP
exchange as success because .send().await may return non-2xx responses; update
the closure inside for_each_live_node (the async block using
INTRA_CLUSTER_CLIENT.post(...).send().await) to call .error_for_status() on the
Response (e.g., let resp = INTRA_CLUSTER_CLIENT.post(...).send().await?;
resp.error_for_status()? ) and convert that into the closure Result so non-2xx
becomes Err; additionally catch and log per-node failures with identifying info
(node.domain_name or node.token) before returning Err so tracing shows which
node failed.
- Around line 227-228: get_tenant_id_from_request currently calls
tenant_value.to_str().unwrap(), which can panic on invalid UTF-8; change it to
handle the conversion failure and return None instead of panicking. Update
get_tenant_id_from_request(req: &HttpRequest) to check
req.headers().get("tenant") and call tenant_value.to_str().ok().map(|s|
s.to_owned()) (or equivalent) so malformed header values produce None rather
than causing a process panic.
- Around line 104-118: The basic-auth branch incorrectly uses
get_tenant_id_from_key(&session_key) which yields None for
SessionKey::BasicAuth; replace the tenant lookup inside the
SessionKey::BasicAuth arm to call get_tenant_id_from_request(&req) (or compute a
separate tenant_id_for_basic_auth = get_tenant_id_from_request(&req) before
calling Users.get_user) and pass that tenant_id to Users.get_user(&username,
&tenant_id_for_basic_auth); keep the existing tenant_id usage for non-basic-auth
branches and ensure you only switch tenant source for the SessionKey::BasicAuth
pattern.

In `@src/hottier.rs`:
- Around line 595-603: The helper check_stream_hot_tier_exists currently calls
self.hot_tier_file_path(stream, tenant_id).unwrap() which can panic; change
check_stream_hot_tier_exists to handle the Result/Option from hot_tier_file_path
without unwrapping (e.g., match or if let Ok(path) / Some(path) -> return
path.exists(); Err(_) / None -> return false), avoid unnecessary
to_string/PathBuf::from conversions and ensure any path conversion error or
missing tenant returns false rather than panicking.
- Around line 779-808: In create_pstats_hot_tier, change the StreamHotTier
initialization to use INTERNAL_STREAM_HOT_TIER_SIZE_BYTES (like pmeta) instead
of MIN_STREAM_HOT_TIER_SIZE_BYTES: set StreamHotTier.size and
StreamHotTier.available_size to INTERNAL_STREAM_HOT_TIER_SIZE_BYTES (leave
used_size at 0 and version/oldest_date_time_entry as-is), then call put_hot_tier
as before; this ensures pstats uses the internal-stream default rather than the
user-stream minimum.

In `@src/metastore/metastore_traits.rs`:
- Around line 108-112: get_alert_state_entry currently ignores tenant_id when
building the storage path; update this and the helper so alert state is
tenant-scoped. Change alert_state_json_path signature in object_storage.rs to
accept the tenant_id (e.g., &Option<String> or Option<&str>) and return a path
that includes tenant context (matching the pattern used by mttr_json_path), then
update calls: in object_store_metastore.rs modify get_alert_state_entry to call
alert_state_json_path(alert_id, tenant_id) (or the chosen arg order) and adjust
any other callers (e.g., get_all_alert_states) to use the new signature so all
alert state reads/writes are tenant-isolated. Ensure types/signatures line up
across trait and impls (get_alert_state_entry declaration, its implementation,
and alert_state_json_path).

In `@src/migration/mod.rs`:
- Around line 490-498: The path construction uses a hardcoded ".parseable.json"
instead of the established PARSEABLE_METADATA_FILE_NAME constant, causing
potential mismatches with get_staging_metadata; update the two occurrences to
use PARSEABLE_METADATA_FILE_NAME when building the path (in the branch that uses
tenant_id and the else branch), referencing tenant_id,
config.options.staging_dir(), and PARSEABLE_METADATA_FILE_NAME so the produced
path matches get_staging_metadata.
- Around line 168-199: The loop currently uses
PARSEABLE.metastore.list_streams(&tenant_id).await? which returns early on error
and skips remaining tenants; change this to handle errors per-tenant (e.g.,
match or if let Err(e) = ...) so failures from list_streams are logged/collected
and the loop continues, while successful list_streams still produce the stream
migration futures; keep the existing migration_stream(&stream_name, &*storage,
&id) handling and config.get_or_create_stream(&stream_name,
&id).set_metadata(...) logic unchanged, but ensure you aggregate or return a
composed error result after iterating all tenants instead of propagating
immediately from list_streams.

In `@src/parseable/mod.rs`:
- Around line 1116-1144: delete_tenant currently removes tenant data from
TENANT_METADATA, users and roles but never removes the tenant entry from
self.tenants, so list_tenants() still returns it; update delete_tenant to also
remove the tenant from self.tenants (e.g., by acquiring a mutable borrow of
self.tenants and calling remove(tenant_id) or filtering/retaining entries that
don't match tenant_id), ensuring you reference the same tenant_id string; keep
existing cleanup (mut_users(), Users.delete_user, mut_roles(),
TENANT_METADATA.delete_tenant) and perform the self.tenants removal before
returning Ok(()) so the in-memory tenant list and list_tenants() reflect the
deletion.
- Around line 1057-1076: The add_tenant method has a TOCTOU race: it does a
contains() under a read lock then pushes under a separate write lock; fix by
taking a single write lock once (let mut tenants =
self.tenants.write().unwrap()), perform the contains() check on that guard,
return Err if present, otherwise push the tenant_id and call
TENANT_METADATA.insert_tenant(tenant_id, tenant_meta) while still holding that
write lock so the check-and-insert is atomic; see the sketch after this block of prompts.

In `@src/query/mod.rs`:
- Around line 86-121: create_session_context currently ignores errors from
catalog.register_schema (used around lines referenced) with `let _ = ...`;
update that to handle the Result and log any Err using the project's logging
facility (e.g., tracing::error! or log::error!), e.g. replace the `let _ =
catalog.register_schema(...)` with an `if let Err(e) =
catalog.register_schema(...) { error!("failed to register schema for tenant {}:
{:?}", tenant_id, e); }` pattern so startup schema registration failures are
visible; do not change the existing InMemorySessionContext::add_schema behavior
that uses .expect().

In `@src/query/stream_schema_provider.rs`:
- Around line 529-534: The logging call using tracing::warn! inside the scan
routine is too noisy for per-scan instrumentation; change it to a lower level
(tracing::debug! or tracing::trace!) so it doesn’t flood production logs—locate
the invocation that logs self.tenant_id, self.schema, and self.stream (the
tracing::warn! call in the scan path of the StreamSchemaProvider implementation)
and replace with tracing::debug! (or tracing::trace!) keeping the same message
and fields.
- Line 645: Remove or reduce the noisy warning by deleting or lowering the log
level of the tracing macro call
tracing::warn!(object_store_url=?object_store_url); — either remove it entirely
or change it to tracing::debug! or tracing::trace! (or guard it behind a
verbose/diagnostic flag) so the object_store_url is not logged as a warn on
every scan.
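
On the add_tenant race flagged above for src/parseable/mod.rs, a minimal single-lock sketch (illustrative types, std RwLock; the real method also persists tenant metadata):

use std::sync::RwLock;

struct Tenants {
    tenants: RwLock<Vec<String>>,
}

impl Tenants {
    // One write lock covers both the existence check and the insert, so two
    // concurrent add_tenant calls cannot both pass the check.
    fn add_tenant(&self, tenant_id: &str) -> Result<(), String> {
        let mut tenants = self
            .tenants
            .write()
            .map_err(|_| "tenants lock poisoned".to_string())?;
        if tenants.iter().any(|t| t == tenant_id) {
            return Err(format!("tenant '{tenant_id}' already exists"));
        }
        tenants.push(tenant_id.to_owned());
        Ok(())
    }
}

fn main() {
    let registry = Tenants { tenants: RwLock::new(Vec::new()) };
    assert!(registry.add_tenant("acme").is_ok());
    assert!(registry.add_tenant("acme").is_err());
}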
♻️ Duplicate comments (13)
src/query/stream_schema_provider.rs (3)

224-232: Tenant-aware URL construction is disabled; unwrap() remains risky.

The tenant-aware object store URL construction is commented out (lines 224-228), and the current code uses a hardcoded "file:///" with an unwrap() that could panic on parse failure. While ObjectStoreUrl::parse("file:///") is unlikely to fail, the pattern should handle errors gracefully.

When re-enabling tenant support, ensure proper error handling is added.


282-293: Same pattern: commented tenant URL and unwrap() on parse.

This duplicates the issue from get_hottier_exectuion_plan. The tenant-aware URL logic is commented out and unwrap() is used on parse.


639-648: Tenant-aware object store URL not yet implemented for remote storage.

The commented code (lines 639-643) shows the intended tenant-aware URL construction using glob_storage.store_url().join(tenant_id), but it's currently disabled. The active code uses glob_storage.store_url() directly without tenant scoping.

This means queries will not be properly tenant-isolated when reading from object storage. The past review comment about unwrap() on join() and parse() still applies when this is re-enabled.

src/rbac/map.rs (1)

142-168: SessionKey should not derive Debug due to password exposure.

The SessionKey enum (line 187) derives Debug with the BasicAuth variant containing plaintext passwords. The Sessions struct (line 193) also derives Debug and contains active_sessions: HashMap<SessionKey, ...>. If either struct were logged with {:?} formatting, credentials would leak—even though current logging in remove_user (lines 275-286) is commented out.

Either remove Debug from SessionKey or implement a custom Debug impl that redacts the password field to prevent accidental credential exposure if logging is later enabled.
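
A sketch of the redaction option (toy enum shape; the real SessionKey variants may differ):

use std::fmt;

enum SessionKey {
    BasicAuth { username: String, password: String },
    SessionId(String),
}

// Manual Debug that never prints the password, so `{:?}` logging of sessions
// cannot leak credentials.
impl fmt::Debug for SessionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SessionKey::BasicAuth { username, .. } => f
                .debug_struct("BasicAuth")
                .field("username", username)
                .field("password", &"<redacted>")
                .finish(),
            SessionKey::SessionId(id) => f.debug_tuple("SessionId").field(id).finish(),
        }
    }
}

fn main() {
    let key = SessionKey::BasicAuth {
        username: "alice".into(),
        password: "hunter2".into(),
    };
    assert!(!format!("{key:?}").contains("hunter2"));
    println!("{:?}", SessionKey::SessionId("sess-1".into()));
}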

src/handlers/http/rbac.rs (1)

147-148: User now created with tenant context - previous issue addressed.

The user creation now correctly passes tenant_id.clone() instead of None, ensuring proper tenant affiliation for new users.

src/handlers/http/cluster/mod.rs (2)

327-332: Tenant context must be propagated during stream synchronization.

The tenant_id parameter is commented out (line 331), meaning stream sync requests to ingestors/queriers won't include tenant context. This breaks tenant isolation in multi-tenant deployments.

🔧 Suggested fix
 pub async fn sync_streams_with_ingestors(
     headers: HeaderMap,
     body: Bytes,
     stream_name: &str,
-    // tenant_id: &Option<String>
+    tenant_id: &Option<String>,
 ) -> Result<(), StreamError> {

Then add the tenant header to the request:

+                    .header("tenant", tenant_id.clone().unwrap_or_default())

539-544: sync_user_creation missing tenant_id propagation.

Similar to stream sync, the tenant_id parameter is commented out. User creation sync requests won't include tenant context, breaking tenant isolation.

🔧 Suggested fix
 pub async fn sync_user_creation(
     user: User,
     role: &Option<HashSet<String>>,
-    // tenant_id: &str
+    tenant_id: &Option<String>,
 ) -> Result<(), RBACError> {

And add tenant header to the sync request.

src/handlers/http/modal/ingest/ingestor_role.rs (1)

46-52: Inverted tenant validation logic (previously flagged).

The condition req_tenant.ne(DEFAULT_TENANT) && (req_tenant.eq(&sync_req.tenant_id)) rejects requests when the request tenant matches the payload tenant, which is the opposite of the intended behavior based on the error message.

The second condition should use .ne() to check for a mismatch:

-    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.eq(&sync_req.tenant_id)) {
+    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.ne(&sync_req.tenant_id)) {
src/handlers/http/oidc.rs (1)

216-369: Fix tenantless OAuth user creation (put_user(..., None))

New OIDC users are currently persisted with tenant=None even though tenant_id is extracted from the request. That’s a multi-tenant isolation bug (and the inline comment suggests it’s knowingly incomplete).

Proposed fix (keep tenant_id available and pass it through)
-    let existing_user = find_existing_user(&user_info, tenant_id);
+    let existing_user = find_existing_user(&user_info, &tenant_id);

     let user = match (existing_user, final_roles) {
         (Some(user), roles) => update_user_if_changed(user, roles, user_info, bearer).await?,
-        // LET TENANT BE NONE FOR NOW!!!
-        (None, roles) => put_user(&user_id, roles, user_info, bearer, None).await?,
+        (None, roles) => put_user(&user_id, roles, user_info, bearer, tenant_id.clone()).await?,
     };
-fn find_existing_user(user_info: &user::UserInfo, tenant_id: Option<String>) -> Option<User> {
+fn find_existing_user(user_info: &user::UserInfo, tenant_id: &Option<String>) -> Option<User> {
     if let Some(sub) = &user_info.sub
-        && let Some(user) = Users.get_user(sub, &tenant_id)
+        && let Some(user) = Users.get_user(sub, tenant_id)
         && matches!(user.ty, UserType::OAuth(_))
     {
         return Some(user);
     }
     ...
 }

Also applies to: 499-529

src/hottier.rs (1)

92-119: Fix get_hot_tiers_size exclusion logic (and avoid moving current_tenant_id)

The current predicate stream != current_stream && tenant_id != *current_tenant_id both (a) excludes all other streams in the current tenant (undercount), and (b) won’t compile because *current_tenant_id moves an Option<String>. You want to skip only when both stream and tenant match.

Proposed fix
 for tenant_id in tenants {
     for stream in PARSEABLE.streams.list(&tenant_id) {
-        if self.check_stream_hot_tier_exists(&stream, &tenant_id)
-            && stream != current_stream
-            && tenant_id != *current_tenant_id
-        {
+        if self.check_stream_hot_tier_exists(&stream, &tenant_id)
+            && !(stream == current_stream
+                && tenant_id.as_deref() == current_tenant_id.as_deref())
+        {
             let stream_hot_tier = self.get_hot_tier(&stream, &tenant_id).await?;
-            total_hot_tier_size += &stream_hot_tier.size;
+            total_hot_tier_size += stream_hot_tier.size;
             total_hot_tier_used_size += stream_hot_tier.used_size;
         }
     }
 }
src/metastore/metastores/object_store_metastore.rs (1)

939-965: Manifest path construction inconsistency persists.

This was flagged in a previous review. The put_manifest and delete_manifest methods use:

partition_path(stream_name, lower_bound, upper_bound, tenant_id).join(&manifest_path("").to_string())

While get_manifest and get_manifest_path use:

manifest_path(partition_path(stream_name, lower_bound, upper_bound, tenant_id).as_str())

These approaches may produce different path structures depending on how manifest_path handles the prefix argument.

src/alerts/mod.rs (1)

1244-1254: Silent write drops persist for missing tenant buckets.

The update() method still silently drops alert updates when the tenant bucket doesn't exist. This was flagged in a previous review. The same issue affects update_state() (lines 1336-1340) and update_notification_state() (lines 1380-1382).

Use entry().or_default() pattern to ensure tenant buckets are created on first write:

 async fn update(&self, alert: &dyn AlertTrait) {
     let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-    if let Some(alerts) = self.alerts.write().await.get_mut(tenant) {
-        alerts.insert(*alert.get_id(), alert.clone_box());
-    }
+    self.alerts
+        .write()
+        .await
+        .entry(tenant.to_owned())
+        .or_default()
+        .insert(*alert.get_id(), alert.clone_box());
 }
src/parseable/mod.rs (1)

1146-1182: Incomplete logic and silent lock failure in load_tenants.

This was flagged in a previous review:

  1. Empty else branch (line 1168): The else if !is_multi_tenant { } does nothing - unclear what should happen for single-tenant mode with tenant directories.

  2. Silent lock failure (lines 1176-1180): Returns Ok(None) on write lock failure instead of propagating the error, masking potential poisoned lock issues.

Suggested fix
-            } else if !is_multi_tenant {
-            } else {
+            } else if is_multi_tenant {
+                // Tenant directory without metadata is invalid in multi-tenant mode
                 return Err(anyhow::Error::msg(format!(
                     "Found invalid tenant directory with multi-tenant mode- {tenant_id}.\nExiting."
                 )));
             }
+            // Single-tenant mode: directories without .parseable.json are ignored
         }
 
-        if let Ok(mut t) = self.tenants.write() {
-            t.extend(dirs);
-            Ok(Some(()))
-        } else {
-            Ok(None)
-        }
+        let mut t = self.tenants.write().expect("tenants lock poisoned");
+        t.extend(dirs);
+        Ok(Some(()))
🧹 Nitpick comments (22)
src/migration/mod.rs (1)

473-483: Consider renaming to avoid confusion with store_metadata::put_remote_metadata.

This function has the same name as the one in src/storage/store_metadata.rs but accepts serde_json::Value instead of &StorageMetadata. While they serve different contexts (migration vs. normal operation), the naming overlap may cause confusion when importing or maintaining the code.

Consider renaming to something like put_remote_metadata_from_json or making it private since it's only used within this migration module.

src/tenants/mod.rs (3)

57-67: Silent no-op when tenant doesn't exist in suspend/resume operations.

Both suspend_service and resume_service silently do nothing if the tenant doesn't exist. This could mask configuration errors or race conditions where a tenant was deleted but suspension operations are still being attempted.

Consider returning a Result<(), TenantNotFound> to allow callers to handle missing tenants appropriately, similar to how is_action_suspended handles this case.

♻️ Suggested approach
-    pub fn suspend_service(&self, tenant_id: &str, service: Service) {
-        if let Some(mut tenant) = self.tenants.get_mut(tenant_id) {
-            tenant.suspended_services.insert(service);
-        }
+    pub fn suspend_service(&self, tenant_id: &str, service: Service) -> Result<(), TenantNotFound> {
+        if let Some(mut tenant) = self.tenants.get_mut(tenant_id) {
+            tenant.suspended_services.insert(service);
+            Ok(())
+        } else {
+            Err(TenantNotFound(tenant_id.to_owned()))
+        }
     }

-    pub fn resume_service(&self, tenant_id: &str, service: Service) {
-        if let Some(mut tenant) = self.tenants.get_mut(tenant_id) {
-            tenant.suspended_services.remove(&service);
-        }
+    pub fn resume_service(&self, tenant_id: &str, service: Service) -> Result<(), TenantNotFound> {
+        if let Some(mut tenant) = self.tenants.get_mut(tenant_id) {
+            tenant.suspended_services.remove(&service);
+            Ok(())
+        } else {
+            Err(TenantNotFound(tenant_id.to_owned()))
+        }
     }

73-92: Minor: Redundant return keyword.

The explicit return on line 90 is unnecessary since it's the last expression in the function.

♻️ Suggested fix
         } else {
-            return Err(TenantNotFound(tenant_id.to_owned()));
+            Err(TenantNotFound(tenant_id.to_owned()))
         }

106-153: Consider removing or tracking commented-out code.

This large commented-out section represents planned functionality that isn't yet implemented. Having substantial commented code in the codebase can:

  1. Become stale as surrounding code evolves
  2. Cause confusion about what's active vs planned
  3. Add noise during code reviews

Consider either removing this and tracking the planned implementation in an issue, or adding a TODO comment explaining when this will be enabled.

src/rbac/map.rs (1)

313-411: Consider removing commented-out debug statements or converting to structured logging.

Multiple commented-out tracing::warn! calls remain in check_auth (lines 320-322, 330, 370, 390, 403-408). While they're disabled, they add noise. Consider either:

  1. Removing them entirely
  2. Converting to proper trace! level logging with structured fields that don't risk leaking sensitive data
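
For example, option 2 as a minimal sketch, assuming userid and tenant_id are the values check_auth already has in scope and that neither is sensitive:

use tracing::trace;

// Structured, trace-level fields instead of ad-hoc warn!s; only identifiers
// that are already safe to expose are recorded.
fn log_auth_check(userid: &str, tenant_id: &Option<String>) {
    trace!(user = %userid, tenant = ?tenant_id, "check_auth: evaluating permissions");
}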
src/rbac/mod.rs (3)

72-80: Consider simplifying the nested option handling.

The pattern .filter().map().unwrap() is verbose. Consider using and_then for cleaner code:

♻️ Suggested simplification
     pub fn get_user(&self, userid: &str, tenant_id: &Option<String>) -> Option<User> {
         let tenant_id = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
-        let u = users()
+        users()
             .get(tenant_id)
-            .filter(|users| users.get(userid).is_some())
-            .map(|users| users.get(userid).unwrap().to_owned());
-        u
-        // .get(userid).cloned()
+            .and_then(|users| users.get(userid).cloned())
     }

119-126: Simplify the empty match arm.

The None => {} branch does nothing and can be replaced with if let:

♻️ Suggested fix
     fn remove_user(&mut self, userid: &str, tenant_id: &str) {
-        match mut_users().get_mut(tenant_id) {
-            Some(users) => {
-                users.remove(userid);
-            }
-            None => {}
+        if let Some(users) = mut_users().get_mut(tenant_id) {
+            users.remove(userid);
         }
     }

274-286: Consider caching or indexing for cross-tenant user lookup.

get_user_from_basic performs a full scan across all tenants and users (O(tenants × users)). While necessary for basic auth where tenant context isn't known upfront, this could become a performance bottleneck at scale.

Additionally, the iteration order may leak timing information about which tenant a user belongs to. Consider:

  1. Adding a username → tenant index for faster lookups
  2. Using constant-time comparison to prevent timing attacks
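
A rough sketch of both ideas; the index is hypothetical (it would need to be kept in sync with the per-tenant user maps), and the subtle crate is one common choice for constant-time comparison:

use std::collections::HashMap;
use subtle::ConstantTimeEq;

// Hypothetical secondary index: username -> tenant_id, updated whenever a
// user is created or deleted, turning the O(tenants × users) scan into O(1).
struct UserTenantIndex(HashMap<String, String>);

impl UserTenantIndex {
    fn tenant_of(&self, username: &str) -> Option<&str> {
        self.0.get(username).map(String::as_str)
    }
}

// Constant-time equality for secrets: the comparison does not exit early on
// the first mismatching byte (slice lengths are still compared up front).
fn secrets_equal(a: &[u8], b: &[u8]) -> bool {
    a.ct_eq(b).into()
}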
src/handlers/http/middleware.rs (1)

322-330: Simplify suspension check pattern.

The match with _ => {} is verbose for checking a single variant. Consider using if let:

♻️ Suggested fix for all three auth functions
 pub fn auth_no_context(req: &mut ServiceRequest, action: Action) -> Result<rbac::Response, Error> {
     // check if tenant is suspended
-    match check_suspension(req.request(), action) {
-        rbac::Response::Suspended(msg) => return Ok(rbac::Response::Suspended(msg)),
-        _ => {}
+    if let rbac::Response::Suspended(msg) = check_suspension(req.request(), action) {
+        return Ok(rbac::Response::Suspended(msg));
     }
     let creds = extract_session_key(req);
     creds.map(|key| Users.authorize(key, action, None, None))
 }

Apply the same pattern to auth_resource_context and auth_user_context.

src/handlers/http/role.rs (3)

56-61: Remove commented-out dead code.

The commented line // mut_roles().insert(name.clone(), privileges.clone()); is superseded by the tenant-scoped implementation above it. Consider removing to improve readability.


143-147: Remove commented-out dead code.

The commented line // mut_roles().remove(&name); duplicates the functionality of the tenant-scoped removal above.


183-190: Remove commented-out dead code block.

This large commented block in get_default should be removed as it's replaced by the tenant-scoped implementation.

src/handlers/http/cluster/mod.rs (1)

1789-1803: Auth token handling in send_query_request.

The function now accepts an optional HeaderMap for auth; the fallback creates a new map containing the querier's token. This is a reasonable pattern, though the commented-out line at 1803 should be removed.

         .headers(auth.into())
-        // .header(header::AUTHORIZATION, auth)
         .header(header::CONTENT_TYPE, "application/json")
src/catalog/mod.rs (1)

460-460: Remove debug logging before merging.

This tracing::warn! appears to be debug output that should not remain in production code. Either remove it or downgrade to trace! level.

-    tracing::warn!("manifest path_url= {path_url}");
src/handlers/http/modal/query/querier_rbac.rs (1)

115-121: Repeated username lookup pattern.

The same pattern for looking up username by userid appears in delete_user, add_roles_to_user, and remove_roles_from_user. Consider extracting this into a helper function to reduce duplication.

Also applies to: 185-191, 245-251
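
One possible shape for such a helper, built on the get_user accessor this PR adds; the error variant and username accessor are assumptions:

use crate::rbac::Users;

// Hypothetical helper consolidating the username-by-userid lookup repeated
// in delete_user, add_roles_to_user, and remove_roles_from_user.
fn username_for(userid: &str, tenant_id: &Option<String>) -> Result<String, RBACError> {
    match Users.get_user(userid, tenant_id) {
        Some(user) => Ok(user.username().to_owned()), // assumed accessor
        None => Err(RBACError::UserDoesNotExist),     // assumed variant
    }
}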

src/handlers/http/query.rs (1)

117-120: Redundant tenant_id extraction.

tenant_id is extracted at line 118 for create_streams_for_distributed, then extracted again at line 120. Consider extracting once and reusing:

+    let tenant_id = get_tenant_id_from_request(&req);
     // check or load streams in memory
-    create_streams_for_distributed(tables.clone(), &get_tenant_id_from_request(&req)).await?;
-
-    let tenant_id = get_tenant_id_from_request(&req);
+    create_streams_for_distributed(tables.clone(), &tenant_id).await?;
src/parseable/streams.rs (1)

1091-1100: contains() warning on missing tenant bucket is likely a normal case

A tenant can legitimately exist with zero streams (or before the first get_or_create call), so warning here may spam logs. Consider returning false without logging, or downgrading the message to debug!.
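
A sketch of the quieter variant, assuming the nested tenant -> stream map this PR introduces:

use std::collections::HashMap;
use std::sync::RwLock;

// Stand-in for the real Streams type: tenant_id -> (stream_name -> stream).
struct Streams(RwLock<HashMap<String, HashMap<String, ()>>>);

impl Streams {
    // A missing tenant bucket just means "no streams yet for this tenant":
    // return false without logging rather than warning on a normal state.
    fn contains(&self, stream_name: &str, tenant: &str) -> bool {
        self.0
            .read()
            .expect("streams lock poisoned")
            .get(tenant)
            .is_some_and(|streams| streams.contains_key(stream_name))
    }
}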

src/storage/object_storage.rs (1)

168-214: Per-stream metrics may collide across tenants (if stream names are not globally unique)

STORAGE_SIZE, EVENTS_STORAGE_SIZE_DATE, etc. label by stream_name only. In multi-tenant mode, the same stream name in different tenants will be merged into a single series. Consider adding the tenant as an additional label, or prefixing stream_name with the tenant.
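
A sketch of the label approach using the prometheus crate; the metric name and label set here are assumptions modeled on the existing ["data", stream, "parquet"] labels:

use prometheus::{opts, IntGaugeVec, Registry};

// Adding "tenant" as a label gives same-named streams in different tenants
// distinct series instead of one merged series.
fn storage_size_gauge(registry: &Registry) -> IntGaugeVec {
    let gauge = IntGaugeVec::new(
        opts!("parseable_storage_size", "storage size by stream"),
        &["type", "stream", "format", "tenant"],
    )
    .expect("valid metric definition");
    registry
        .register(Box::new(gauge.clone()))
        .expect("metric registered exactly once");
    gauge
}

// usage: gauge.with_label_values(&["data", stream_name, "parquet", tenant]).add(size);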

src/query/mod.rs (1)

946-1021: PartitionedMetricMonitor.tenant_id is currently unused

Either remove it for now, or (if you intend per-tenant billing/metrics) use it to label the emitted metrics to avoid cross-tenant aggregation.

src/parseable/mod.rs (2)

1184-1191: Consider propagating lock errors instead of returning None.

The list_tenants() method returns None on lock failure, which callers handle with map_or(). However, a poisoned lock typically indicates a panic occurred while holding the lock, which should probably propagate as an error rather than be silently treated as "no tenants."


250-265: Debug logging uses warn! level - should be trace! or debug!

Lines 255 and 259 use tracing::warn! for what appears to be debug/diagnostic logging during normal operation. This will clutter production logs.

-        tracing::warn!("check or load streams- {stream_name}, {tenant_id:?}");
+        tracing::trace!("check or load streams- {stream_name}, {tenant_id:?}");
         if self.streams.contains(stream_name, tenant_id) {
             return true;
         }
-        tracing::warn!("check or load streams not present, creating {stream_name}, {tenant_id:?}");
+        tracing::trace!("check or load streams not present, creating {stream_name}, {tenant_id:?}");
src/metastore/metastores/object_store_metastore.rs (1)

1200-1273: Remove commented-out debug logging before merge.

The method has extensive commented-out tracing::warn! statements (lines 1220-1227, 1232-1233, 1238, 1245, 1256-1260, 1262-1267). These should be cleaned up or converted to proper tracing::trace! if still needed.

Comment on lines +91 to +123
let auth = if let Some(tenant) = &self.tenant_id
&& let Some(tenant_users) = users().get(tenant)
&& let Some(tenant_roles) = roles().get(tenant)
&& let Some(user) = tenant_users.iter().find_map(|(_, user)| {
let mut res = None;
for role in &user.roles {
if let Some(role) = tenant_roles.get(role)
&& role.contains(&DefaultPrivilege::Admin)
{
res = Some(user.clone());
break;
}
}
res
}) {
// fetch admin credentials for tenant
match user.ty {
crate::rbac::user::UserType::Native(basic) => {
// Create a protected user whose details can't be edited
// save that user's basic auth
// use that to send request
None
},
crate::rbac::user::UserType::OAuth(_) => {
tracing::warn!("admin user is oauth");
None
},
}
} else {
None
};
let query_result =
execute_alert_query(auth, self.get_query(), &time_range, &self.tenant_id).await?;

⚠️ Potential issue | 🟠 Major

Auth credential extraction is incomplete - always returns None.

The complex logic to find an admin user for the tenant (lines 91-105) correctly identifies a user with admin privileges, but the credential extraction (lines 107-118) always returns None:

  • Native user: returns None with TODO comments about protected user
  • OAuth user: logs warning and returns None

This means execute_alert_query will always receive None for auth_token, which may cause remote alert queries in Prism mode to fail authentication.

This appears to be work-in-progress. Would you like help implementing the credential extraction for Native users, or should this be tracked as a follow-up issue?
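
For reference, a hedged sketch of the Native arm: Basic auth needs a plaintext secret, which the stored hash cannot yield, so this assumes the protected service user the TODO describes is provisioned with a known password:

use base64::{engine::general_purpose::STANDARD, Engine as _};

// Build the Authorization header value for HTTP Basic auth. The
// `service_password` must come from provisioning; it cannot be recovered
// from the password hash stored on the user.
fn basic_auth_header(username: &str, service_password: &str) -> String {
    format!(
        "Basic {}",
        STANDARD.encode(format!("{username}:{service_password}"))
    )
}

The OAuth arm has no reusable secret at all, so it likely needs a different mechanism (for example, an internal service token) rather than Basic auth.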

Comment on lines 168 to 177
// append tenant id if present
let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
Ok((uid, tid)) => {
req.headers_mut().insert(
HeaderName::from_static("tenant"),
HeaderValue::from_str(&tid).unwrap(),
);
Ok((uid, tid))
}
Err(e) => Err(e),
};

⚠️ Potential issue | 🟡 Minor

Handle potential panic from HeaderValue::from_str().unwrap().

If tenant_id contains characters not valid in HTTP headers (e.g., control characters), HeaderValue::from_str() will return an error and unwrap() will panic. Consider handling this gracefully:

♻️ Suggested fix
         let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
             Ok((uid, tid)) => {
-                req.headers_mut().insert(
-                    HeaderName::from_static("tenant"),
-                    HeaderValue::from_str(&tid).unwrap(),
-                );
+                if let Ok(header_val) = HeaderValue::from_str(&tid) {
+                    req.headers_mut().insert(
+                        HeaderName::from_static("tenant"),
+                        header_val,
+                    );
+                }
                 Ok((uid, tid))
             }
             Err(e) => Err(e),
         };
📝 Committable suggestion


Suggested change
// append tenant id if present
let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
Ok((uid, tid)) => {
req.headers_mut().insert(
HeaderName::from_static("tenant"),
HeaderValue::from_str(&tid).unwrap(),
);
Ok((uid, tid))
}
Err(e) => Err(e),
};
// append tenant id if present
let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
Ok((uid, tid)) => {
if let Ok(header_val) = HeaderValue::from_str(&tid) {
req.headers_mut().insert(
HeaderName::from_static("tenant"),
header_val,
);
}
Ok((uid, tid))
}
Err(e) => Err(e),
};
🤖 Prompt for AI Agents
In `@src/handlers/http/middleware.rs` around lines 168 - 178, The code uses
HeaderValue::from_str(&tid).unwrap() inside the match for
get_user_and_tenant_from_request, which can panic for invalid header characters;
replace the unwrap with proper error handling: call HeaderValue::from_str(&tid)
and match or use map_err to convert the header error into the existing Err
branch (or log and skip inserting the header), then only call
req.headers_mut().insert(...) on Ok(val). Update the user_and_tenant_id
assignment so failures to construct the HeaderValue return an Err (propagated)
or a controlled fallback instead of panicking, referencing
get_user_and_tenant_from_request, HeaderValue::from_str,
req.headers_mut().insert and user_and_tenant_id.

Comment on lines +309 to +315
pub fn check_suspension(req: &HttpRequest, action: Action) -> rbac::Response {
if let Some(tenant) = req.headers().get("tenant")
&& let Ok(tenant) = tenant.to_str()
{
if let Ok(Some(suspension)) = TENANT_METADATA.is_action_suspended(tenant, &action) {
return rbac::Response::Suspended(suspension);
} else {
// tenant does not exist
}
}
rbac::Response::Authorized
}

⚠️ Potential issue | 🟠 Major

Security: Consider stricter handling when tenant doesn't exist.

The check_suspension function returns Authorized when:

  1. No tenant header is present (line 319)
  2. Tenant doesn't exist in TENANT_METADATA (lines 315-317)

This could allow requests to bypass tenant-level controls. Consider whether:

  • Missing tenant header should be an error for tenant-scoped endpoints
  • Non-existent tenant should return Unauthorized rather than Authorized

The empty else branch with the "tenant does not exist" comment suggests this needs further handling.

♻️ Suggested approach
 pub fn check_suspension(req: &HttpRequest, action: Action) -> rbac::Response {
     if let Some(tenant) = req.headers().get("tenant")
         && let Ok(tenant) = tenant.to_str()
     {
-        if let Ok(Some(suspension)) = TENANT_METADATA.is_action_suspended(tenant, &action) {
-            return rbac::Response::Suspended(suspension);
-        } else {
-            // tenant does not exist
+        match TENANT_METADATA.is_action_suspended(tenant, &action) {
+            Ok(Some(suspension)) => return rbac::Response::Suspended(suspension),
+            Ok(None) => {} // Tenant exists, action not suspended
+            Err(_) => {
+                // Tenant not found - consider returning UnAuthorized or logging
+                tracing::warn!(tenant = tenant, "Tenant not found in metadata");
+            }
         }
     }
     rbac::Response::Authorized
 }
🤖 Prompt for AI Agents
In `@src/handlers/http/middleware.rs` around lines 309 - 320, check_suspension
currently treats missing or unknown tenants as Authorized; change it to reject
those cases: in function check_suspension, when the tenant header is missing or
tenant.to_str() fails return rbac::Response::Unauthorized (or another
appropriate denial variant) instead of rbac::Response::Authorized, and in the
branch where TENANT_METADATA.is_action_suspended returns Ok(None) (the "tenant
does not exist" case) return rbac::Response::Unauthorized rather than falling
through to Authorized; keep the existing Suspended return when an actual
suspension is found and optionally add a short debug log mentioning the tenant
value on unauthorized paths.

Comment on lines +52 to +57
if req_tenant.ne(DEFAULT_TENANT)
&& (req_tenant.eq(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
{
return Err(RBACError::Anyhow(anyhow::Error::msg(
"non super-admin user trying to create user for another tenant",
)));
}

⚠️ Potential issue | 🔴 Critical

Inverted tenant validation logic.

Same issue as in ingestor_role.rs: the condition checks for a match when it should check for a mismatch. The error message indicates this should prevent cross-tenant operations, but the current logic prevents same-tenant operations.

-        if req_tenant.ne(DEFAULT_TENANT)
-            && (req_tenant.eq(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
+        if req_tenant.ne(DEFAULT_TENANT)
+            && (req_tenant.ne(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
📝 Committable suggestion


Suggested change
if req_tenant.ne(DEFAULT_TENANT)
&& (req_tenant.eq(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
{
return Err(RBACError::Anyhow(anyhow::Error::msg(
"non super-admin user trying to create user for another tenant",
)));
}
if req_tenant.ne(DEFAULT_TENANT)
&& (req_tenant.ne(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
{
return Err(RBACError::Anyhow(anyhow::Error::msg(
"non super-admin user trying to create user for another tenant",
)));
}
🤖 Prompt for AI Agents
In `@src/handlers/http/modal/ingest/ingestor_rbac.rs` around lines 52 - 58, The
tenant validation in ingestor_rbac.rs is inverted: change the condition that
currently returns an error when req_tenant equals the requester's tenant to
instead return an error when a non-super-admin (req_tenant != DEFAULT_TENANT) is
trying to create a user for a different tenant; specifically, update the check
that uses req_tenant, DEFAULT_TENANT and user.tenant (as_ref().map_or(...)) so
it tests for inequality (req_tenant !=
user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)) and then return
RBACError::Anyhow(...) when that inequality is true.

Comment on lines +1057 to +1075
pub fn add_tenant(
&self,
tenant_id: String,
tenant_meta: StorageMetadata,
) -> Result<(), anyhow::Error> {
if !self.options.is_multi_tenant() {
return Err(anyhow::Error::msg("P_MULTI_TENANCY is set to false"));
}

if self.tenants.read().unwrap().contains(&tenant_id) {
return Err(anyhow::Error::msg(format!(
"Tenant with id- {tenant_id} already exists"
)));
} else {
self.tenants.write().unwrap().push(tenant_id.clone());
TENANT_METADATA.insert_tenant(tenant_id, tenant_meta);
}

Ok(())
}

⚠️ Potential issue | 🟠 Major

TOCTOU race condition in tenant addition.

The method has a time-of-check-to-time-of-use race: the existence check (line 1066) uses a read lock, but the insertion (line 1071) acquires a separate write lock. Another thread could add the same tenant between these operations.

Fix using single write lock
 pub fn add_tenant(
     &self,
     tenant_id: String,
     tenant_meta: StorageMetadata,
 ) -> Result<(), anyhow::Error> {
     if !self.options.is_multi_tenant() {
         return Err(anyhow::Error::msg("P_MULTI_TENANCY is set to false"));
     }

-    if self.tenants.read().unwrap().contains(&tenant_id) {
-        return Err(anyhow::Error::msg(format!(
-            "Tenant with id- {tenant_id} already exists"
-        )));
-    } else {
-        self.tenants.write().unwrap().push(tenant_id.clone());
+    let mut tenants = self.tenants.write().unwrap();
+    if tenants.contains(&tenant_id) {
+        return Err(anyhow::Error::msg(format!(
+            "Tenant with id- {tenant_id} already exists"
+        )));
+    }
+    tenants.push(tenant_id.clone());
+    drop(tenants); // Release lock before external call
     TENANT_METADATA.insert_tenant(tenant_id, tenant_meta);
-    }

     Ok(())
 }
📝 Committable suggestion


Suggested change
pub fn add_tenant(
&self,
tenant_id: String,
tenant_meta: StorageMetadata,
) -> Result<(), anyhow::Error> {
if !self.options.is_multi_tenant() {
return Err(anyhow::Error::msg("P_MULTI_TENANCY is set to false"));
}
if self.tenants.read().unwrap().contains(&tenant_id) {
return Err(anyhow::Error::msg(format!(
"Tenant with id- {tenant_id} already exists"
)));
} else {
self.tenants.write().unwrap().push(tenant_id.clone());
TENANT_METADATA.insert_tenant(tenant_id, tenant_meta);
}
Ok(())
}
pub fn add_tenant(
&self,
tenant_id: String,
tenant_meta: StorageMetadata,
) -> Result<(), anyhow::Error> {
if !self.options.is_multi_tenant() {
return Err(anyhow::Error::msg("P_MULTI_TENANCY is set to false"));
}
let mut tenants = self.tenants.write().unwrap();
if tenants.contains(&tenant_id) {
return Err(anyhow::Error::msg(format!(
"Tenant with id- {tenant_id} already exists"
)));
}
tenants.push(tenant_id.clone());
drop(tenants); // Release lock before external call
TENANT_METADATA.insert_tenant(tenant_id, tenant_meta);
Ok(())
}
🤖 Prompt for AI Agents
In `@src/parseable/mod.rs` around lines 1057 - 1076, The add_tenant method has a
TOCTOU race: it does a contains() under a read lock then pushes under a separate
write lock; fix by taking a single write lock once (let mut tenants =
self.tenants.write().unwrap()), perform the contains() check on that guard,
return Err if present, otherwise push the tenant_id and call
TENANT_METADATA.insert_tenant(tenant_id, tenant_meta) while still holding that
write lock so the check-and-insert is atomic.

Comment on lines +1116 to +1143
pub fn delete_tenant(&self, tenant_id: &str) -> Result<(), anyhow::Error> {
// let mut metadata = get_metadata(&Some(tenant_id.to_owned())).await?;
// delete users and sessions
let users = mut_users().remove(tenant_id);
if let Some(users) = users {
tracing::warn!("found tenant users, deleting");
for (userid, user) in users {
// metadata
// .users
// .retain(|u| u.tenant.eq(&Some(tenant_id.to_owned())));

Users.delete_user(&userid, &user.tenant);
}
}

// delete roles
mut_roles().remove(tenant_id);
// if let Some(roles) = mut_roles().remove(tenant_id) {
// for (role, _) in roles {
// // metadata.roles.retain(|r, _| !role.eq(r));
// }
// }

// delete resources

// delete from in-mem
TENANT_METADATA.delete_tenant(&tenant_id);
Ok(())
}

⚠️ Potential issue | 🔴 Critical

Incomplete tenant deletion - missing tenants list cleanup.

The method removes the tenant from TENANT_METADATA but does not remove it from self.tenants. This leaves the system in an inconsistent state where list_tenants() still returns the deleted tenant.

Add tenant list cleanup
 pub fn delete_tenant(&self, tenant_id: &str) -> Result<(), anyhow::Error> {
     // delete users and sessions
     let users = mut_users().remove(tenant_id);
     if let Some(users) = users {
         tracing::warn!("found tenant users, deleting");
         for (userid, user) in users {
             Users.delete_user(&userid, &user.tenant);
         }
     }

     // delete roles
     mut_roles().remove(tenant_id);

+    // Remove from tenants list
+    if let Ok(mut tenants) = self.tenants.write() {
+        tenants.retain(|t| t != tenant_id);
+    }

     // delete from in-mem
     TENANT_METADATA.delete_tenant(&tenant_id);
     Ok(())
 }
🤖 Prompt for AI Agents
In `@src/parseable/mod.rs` around lines 1116 - 1144, delete_tenant currently
removes tenant data from TENANT_METADATA, users and roles but never removes the
tenant entry from self.tenants, so list_tenants() still returns it; update
delete_tenant to also remove the tenant from self.tenants (e.g., by acquiring a
mutable borrow of self.tenants and calling remove(tenant_id) or
filtering/retaining entries that don't match tenant_id), ensuring you reference
the same tenant_id string; keep existing cleanup (mut_users(),
Users.delete_user, mut_roles(), TENANT_METADATA.delete_tenant) and perform the
self.tenants removal before returning Ok(()) so the in-memory tenant list and
list_tenants() reflect the deletion.

Comment on lines +86 to +121
pub static QUERY_SESSION: Lazy<InMemorySessionContext> = Lazy::new(|| {
let ctx = Query::create_session_context(PARSEABLE.storage());
InMemorySessionContext {
session_context: Arc::new(RwLock::new(ctx)),
}
});

pub struct InMemorySessionContext {
session_context: Arc<RwLock<SessionContext>>,
}

impl InMemorySessionContext {
pub fn get_ctx(&self) -> SessionContext {
let ctx = self
.session_context
.read()
.expect("SessionContext should be readable");
ctx.clone()
}

pub fn add_schema(&self, tenant_id: &str) {
self.session_context
.write()
.expect("SessionContext should be writeable")
.catalog("datafusion")
.expect("Default catalog should be available")
.register_schema(
tenant_id,
Arc::new(GlobalSchemaProvider {
storage: PARSEABLE.storage().get_object_store(),
tenant_id: Some(tenant_id.to_owned()),
}),
)
.expect("Should be able to register new schema");
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the code context around the mentioned lines and the secondary location
cat -n src/query/mod.rs | sed -n '70,130p'

Repository: parseablehq/parseable

Length of output: 2638


🏁 Script executed:

# Check the secondary location (lines 171-205)
cat -n src/query/mod.rs | sed -n '165,210p'

Repository: parseablehq/parseable

Length of output: 2225


🏁 Script executed:

# Search for how add_schema is called throughout the codebase
rg "add_schema" -B 3 -A 3 --type rust

Repository: parseablehq/parseable

Length of output: 662


🌐 Web query:

DataFusion Rust CatalogProvider register_schema behavior duplicate schema error handling

💡 Result:

Short answer:

  • CatalogProvider::register_schema adds or replaces a schema; if a schema with the same name existed it is replaced and the previous schema is returned as Option<Arc>. The provided default implementation returns a "Not Implemented" error, while MemoryCatalogProvider implements the replace-and-return-old behavior. If you want a "duplicate schema" error instead of replace, implement a custom CatalogProvider/register_schema that checks existence and returns an Err(DataFusionError). [1][2]

Sources:

  • CatalogProvider trait docs (register_schema semantics and default behavior). [1]
  • MemoryCatalogProvider / catalog docs and Python wrapper showing register_schema behavior. [2]

Log schema registration errors instead of silently ignoring them during initialization

Lines 191 and 201 in create_session_context silently drop errors with let _ = catalog.register_schema(...). These errors should be logged since failed schema registration at startup may cause query failures.

Note: The .expect() in add_schema (line 119) is safe—DataFusion's register_schema replaces existing schemas rather than erroring on duplicates, so it will only panic on legitimate I/O failures.

Also applies to: 171-205

🤖 Prompt for AI Agents
In `@src/query/mod.rs` around lines 86 - 121, create_session_context currently
ignores errors from catalog.register_schema (used around lines referenced) with
`let _ = ...`; update that to handle the Result and log any Err using the
project's logging facility (e.g., tracing::error! or log::error!), e.g. replace
the `let _ = catalog.register_schema(...)` with an `if let Err(e) =
catalog.register_schema(...) { error!("failed to register schema for tenant {}:
{:?}", tenant_id, e); }` pattern so startup schema registration failures are
visible; do not change the existing InMemorySessionContext::add_schema behavior
that uses .expect().
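
Concretely, the pattern described above, sketched against DataFusion's CatalogProvider trait:

use std::sync::Arc;

use datafusion::catalog::{CatalogProvider, SchemaProvider};

// Surface registration failures at startup instead of discarding them with
// `let _ = ...`; register_schema replaces duplicates rather than erroring,
// so an Err here indicates a real problem.
fn register_tenant_schema(
    catalog: &dyn CatalogProvider,
    tenant_id: &str,
    schema: Arc<dyn SchemaProvider>,
) {
    if let Err(e) = catalog.register_schema(tenant_id, schema) {
        tracing::error!("failed to register schema for tenant {tenant_id}: {e:?}");
    }
}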

Comment on lines +529 to +534
tracing::warn!(
"entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
self.tenant_id,
self.schema,
self.stream
);

⚠️ Potential issue | 🟡 Minor

Reduce logging level from warn to debug or trace.

This logging statement runs on every table scan and outputs schema/tenant/stream info. Using warn level will flood logs in production. This appears to be debug instrumentation that should be reduced to debug or trace level.

♻️ Suggested fix
-        tracing::warn!(
+        tracing::debug!(
             "entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
             self.tenant_id,
             self.schema,
             self.stream
         );
📝 Committable suggestion


Suggested change
tracing::warn!(
"entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
self.tenant_id,
self.schema,
self.stream
);
tracing::debug!(
"entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
self.tenant_id,
self.schema,
self.stream
);
🤖 Prompt for AI Agents
In `@src/query/stream_schema_provider.rs` around lines 529 - 534, The logging call
using tracing::warn! inside the scan routine is too noisy for per-scan
instrumentation; change it to a lower level (tracing::debug! or tracing::trace!)
so it doesn’t flood production logs—locate the invocation that logs
self.tenant_id, self.schema, and self.stream (the tracing::warn! call in the
scan path of the StreamSchemaProvider implementation) and replace with
tracing::debug! (or tracing::trace!) keeping the same message and fields.

// glob_storage.store_url()
// };
let object_store_url = glob_storage.store_url();
tracing::warn!(object_store_url=?object_store_url);

⚠️ Potential issue | 🟡 Minor

Remove or reduce debug logging.

This warn! log on every scan will be noisy in production.

♻️ Suggested fix
-        tracing::warn!(object_store_url=?object_store_url);
+        tracing::trace!(object_store_url=?object_store_url);
📝 Committable suggestion


Suggested change
tracing::warn!(object_store_url=?object_store_url);
tracing::trace!(object_store_url=?object_store_url);
🤖 Prompt for AI Agents
In `@src/query/stream_schema_provider.rs` at line 645, Remove or reduce the noisy
warning by deleting or lowering the log level of the tracing macro call
tracing::warn!(object_store_url=?object_store_url); — either remove it entirely
or change it to tracing::debug! or tracing::trace! (or guard it behind a
verbose/diagnostic flag) so the object_store_url is not logged as a warn on
every scan.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (20)
src/storage/store_metadata.rs (1)

298-320: Missing parent directory creation for tenant-specific staging path.

When tenant_id is Some, the staging metadata is written to staging_dir/tenant_id/.parseable.json. However, the tenant subdirectory may not exist, causing OpenOptions::open() to fail with NotFound. The migration module's put_staging_metadata (shown in relevant snippets) has the same pattern but also doesn't create the directory.

🐛 Proposed fix to ensure parent directory exists
 pub fn put_staging_metadata(meta: &StorageMetadata, tenant_id: &Option<String>) -> io::Result<()> {
     let mut staging_metadata = meta.clone();
     staging_metadata.server_mode = PARSEABLE.options.mode;
     staging_metadata.staging = PARSEABLE.options.staging_dir().to_path_buf();
     let path = if let Some(tenant_id) = tenant_id.as_ref() {
-        PARSEABLE
+        let tenant_path = PARSEABLE
             .options
             .staging_dir()
-            .join(tenant_id)
-            .join(PARSEABLE_METADATA_FILE_NAME)
+            .join(tenant_id);
+        std::fs::create_dir_all(&tenant_path)?;
+        tenant_path.join(PARSEABLE_METADATA_FILE_NAME)
     } else {
         PARSEABLE
             .options
             .staging_dir()
             .join(PARSEABLE_METADATA_FILE_NAME)
     };
src/stats.rs (1)

151-168: Inconsistent tenant_id usage in metric labels within update_deleted_stats.

The function accepts tenant_id and uses it when calling get_current_stats (line 169), but the metric updates between lines 151-168 use hardcoded 2-element and 3-element label arrays without tenant_id. This creates a mismatch: metrics are written without tenant context but stats are read with tenant context.

These labels should include tenant_id to match the label structure used elsewhere (e.g., event_labels returns 3 elements, storage_size_labels returns 4 elements).

🐛 Proposed fix
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
     EVENTS_DELETED
-        .with_label_values(&[stream_name, "json"])
+        .with_label_values(&[stream_name, "json", tenant])
         .add(num_row);
     EVENTS_DELETED_SIZE
-        .with_label_values(&[stream_name, "json"])
+        .with_label_values(&[stream_name, "json", tenant])
         .add(ingestion_size);
     DELETED_EVENTS_STORAGE_SIZE
-        .with_label_values(&["data", stream_name, "parquet"])
+        .with_label_values(&["data", stream_name, "parquet", tenant])
         .add(storage_size);
     EVENTS_INGESTED
-        .with_label_values(&[stream_name, "json"])
+        .with_label_values(&[stream_name, "json", tenant])
         .sub(num_row);
     EVENTS_INGESTED_SIZE
-        .with_label_values(&[stream_name, "json"])
+        .with_label_values(&[stream_name, "json", tenant])
         .sub(ingestion_size);
     STORAGE_SIZE
-        .with_label_values(&["data", stream_name, "parquet"])
+        .with_label_values(&["data", stream_name, "parquet", tenant])
         .sub(storage_size);
src/hottier.rs (1)

208-220: delete_hot_tier doesn't use tenant_id in path construction.

The function accepts tenant_id but line 216 constructs the path as self.hot_tier_path.join(stream) without tenant isolation. This is inconsistent with hot_tier_file_path which includes the tenant subdirectory. In a multi-tenant environment, this could delete another tenant's data.

🐛 Proposed fix
     pub async fn delete_hot_tier(
         &self,
         stream: &str,
         tenant_id: &Option<String>,
     ) -> Result<(), HotTierError> {
         if !self.check_stream_hot_tier_exists(stream, tenant_id) {
             return Err(HotTierValidationError::NotFound(stream.to_owned()).into());
         }
-        let path = self.hot_tier_path.join(stream);
+        let path = if let Some(tid) = tenant_id.as_ref() {
+            self.hot_tier_path.join(tid).join(stream)
+        } else {
+            self.hot_tier_path.join(stream)
+        };
         fs::remove_dir_all(path).await?;

         Ok(())
     }
src/rbac/user.rs (1)

153-164: Revert to the recommended salt generation approach.

The manual 32-byte salt generation with fill_bytes and encode_b64 is technically valid but not aligned with best practices. The Argon2 ecosystem and password-hash crate recommend using SaltString::generate(&mut OsRng) directly, which provides simpler, safer, and more idiomatic code. This approach handles PHC-safe encoding and appropriate salt length automatically, eliminating the need for manual encoding. Restore the commented line at 158 and remove the manual implementation at lines 154-157.
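
For reference, the recommended form (the password-hash API, re-exported by the argon2 crate):

use argon2::password_hash::{rand_core::OsRng, SaltString};

// SaltString::generate picks an appropriate salt length and produces a
// PHC-safe encoding in one step, replacing the manual fill_bytes + encode_b64.
fn new_salt() -> SaltString {
    SaltString::generate(&mut OsRng)
}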

src/metastore/metastores/object_store_metastore.rs (5)

393-403: tenant_id parameter unused in delete_alert_state.

Similar to put_alert_state, the tenant_id parameter at line 396 is accepted but unused. The path comes from obj.get_object_path() which may not be tenant-aware.


541-566: get_chats is not tenant-aware unlike similar methods.

While get_dashboards, get_filters, and get_correlations iterate over tenants via PARSEABLE.list_tenants(), get_chats only uses a single USERS_ROOT_DIR path without tenant prefixing. This inconsistency means chats won't be properly scoped per tenant.

🐛 Suggested fix for tenant-aware chats
     async fn get_chats(&self) -> Result<DashMap<String, Vec<Bytes>>, MetastoreError> {
         let all_user_chats = DashMap::new();
-
-        let users_dir = RelativePathBuf::from(USERS_ROOT_DIR);
-        for user in self.storage.list_dirs_relative(&users_dir).await? {
-            if user.starts_with(".") {
-                continue;
-            }
-            let mut chats = Vec::new();
-            let chats_path = users_dir.join(&user).join("chats");
-            let user_chats = self
-                .storage
-                .get_objects(
-                    Some(&chats_path),
-                    Box::new(|file_name| file_name.ends_with(".json")),
-                )
-                .await?;
-            for chat in user_chats {
-                chats.push(chat);
+        let base_paths = PARSEABLE.list_tenants().map_or(vec!["".into()], |v| v);
+        for tenant in base_paths {
+            let users_dir = RelativePathBuf::from_iter([&tenant, USERS_ROOT_DIR]);
+            for user in self.storage.list_dirs_relative(&users_dir).await? {
+                if user.starts_with(".") {
+                    continue;
+                }
+                let mut chats = Vec::new();
+                let chats_path = users_dir.join(&user).join("chats");
+                let user_chats = self
+                    .storage
+                    .get_objects(
+                        Some(&chats_path),
+                        Box::new(|file_name| file_name.ends_with(".json")),
+                    )
+                    .await?;
+                for chat in user_chats {
+                    chats.push(chat);
+                }
+                all_user_chats.insert(user, chats);
             }
-
-            all_user_chats.insert(user, chats);
         }
 
         Ok(all_user_chats)
     }

297-321: Tenant path mismatch between get_alert_states and get_alert_state_entry.

get_alert_states constructs a tenant-specific base path at lines 301-302, but get_alert_state_entry (line 328) calls alert_state_json_path(*alert_id) which ignores the tenant_id parameter and returns a non-tenant-prefixed path. This breaks tenant isolation—get_alert_states lists from {tenant}/alerts/ while get_alert_state_entry reads from alerts/. The same issue affects put_alert_state (line 352).


342-390: alert_state_json_path is missing the tenant_id parameter.

The put_alert_state method accepts tenant_id but never uses it. The path construction at line 352 uses alert_state_json_path(id) without tenant context, storing alert states in a global location instead of per-tenant.

This is inconsistent with related path functions:

  • alert_json_path(alert_id, tenant_id) accepts and uses tenant_id
  • mttr_json_path(tenant_id) accepts and uses tenant_id
  • schema_path(stream_name, tenant_id) accepts and uses tenant_id

The get_alert_states method (lines 297–322) demonstrates the correct pattern by manually constructing a tenant-scoped path with tenant_id. The singular get_alert_state_entry method has the same issue.

Update alert_state_json_path signature to accept tenant_id: &Option<String> and pass it in both get_alert_state_entry and put_alert_state calls.
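
A sketch of that signature, mirroring alert_json_path and mttr_json_path; the "alerts" directory name and the {id}.json layout are assumptions here:

use relative_path::RelativePathBuf;
use ulid::Ulid;

// Tenant-scoped path for an alert's persisted state: prefix with the tenant
// when one is set, otherwise fall back to the single-tenant layout.
fn alert_state_json_path(alert_id: Ulid, tenant_id: &Option<String>) -> RelativePathBuf {
    let file = format!("{alert_id}.json");
    match tenant_id {
        Some(tenant) => RelativePathBuf::from_iter([tenant.as_str(), "alerts", file.as_str()]),
        None => RelativePathBuf::from_iter(["alerts", file.as_str()]),
    }
}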


569-594: Unused tenant_id parameters create dead code and path inconsistency.

The methods put_chat, put_filter, put_correlation, put_target, and put_llmconfig accept tenant_id but never use it. Their implementations rely solely on obj.get_object_path(), which bypasses tenant context:

  • Filter: Path uses filter_path(user_id, ...) without tenant scoping
  • CorrelationConfig: Path uses self.path() with user_id only
  • Target: Path uses target_json_path(&self.id) while the object has a pub tenant: Option<String> field that is ignored; target_json_path() includes a TODO comment "Needs to be updated for distributed mode"

This contrasts with put_conversation and put_alert, which correctly build paths using the tenant_id parameter. In a distributed/multi-tenant scenario, this inconsistency could lead to operations on incorrect paths or data isolation issues.

src/alerts/alerts_utils.rs (2)

130-153: The auth_token parameter is received but never used.

The execute_remote_query function accepts auth_token: Option<String> but passes None to send_query_request on line 148. This means the auth context from the caller is completely ignored, and the remote query will always fall back to the internal cluster token.

Looking at the send_query_request signature (from the relevant snippets), it expects Option<HeaderMap>. The auth_token should be converted and passed through.

🐛 Proposed fix to use the auth_token
+use http::header::HeaderValue;
+use reqwest::header::HeaderMap;
+
 /// Execute alert query remotely (Prism mode)
 async fn execute_remote_query(
     auth_token: Option<String>,
     query: &str,
     time_range: &TimeRange,
 ) -> Result<AlertQueryResult, AlertError> {
     let session_state = QUERY_SESSION.get_ctx().state();
     let raw_logical_plan = session_state.create_logical_plan(query).await?;
 
     let query_request = Query {
         query: query.to_string(),
         start_time: time_range.start.to_rfc3339(),
         end_time: time_range.end.to_rfc3339(),
         streaming: false,
         send_null: false,
         fields: false,
         filter_tags: None,
     };
 
-    let (result_value, _) = send_query_request(None,&query_request)
+    let auth_header = auth_token.map(|token| {
+        let mut map = HeaderMap::new();
+        map.insert(
+            http::header::AUTHORIZATION,
+            HeaderValue::from_str(&token).expect("valid auth token"),
+        );
+        map
+    });
+
+    let (result_value, _) = send_query_request(auth_header, &query_request)
         .await
         .map_err(|err| AlertError::CustomError(format!("Failed to send query request: {err}")))?;
 
     convert_result_to_group_results(result_value, raw_logical_plan)
 }

77-91: Pass tenant_id to remote query execution in Prism mode.

The execute_remote_query function does not accept or forward the tenant_id parameter, even though the parent execute_alert_query function receives it. This breaks multi-tenant isolation in Prism deployments. The send_query_request signature also lacks tenant context (note the commented-out tenant_id parameter on line 380 of cluster/mod.rs), preventing proper tenant-scoped query execution on remote nodes. Both functions need to be updated to propagate tenant context.

src/handlers/http/modal/ingest/ingestor_logstream.rs (1)

70-89: Consider consistency in stream validation for the delete handler.

This delete handler uses get_stream (line 74) without an upfront stream existence check, while similar handlers in src/handlers/http/logstream.rs (line 52) and src/handlers/http/modal/query/querier_logstream.rs (line 55) verify stream existence before calling get_or_create_stream.

Using get_stream will error if the stream exists in storage but isn't loaded in memory, whereas the other handlers safeguard against this with explicit checks. If this stricter behavior is intentional for ingestor nodes (where streams should always be in memory), add a comment explaining why the pattern differs from other delete handlers.

src/handlers/http/targets.rs (2)

71-98: Missing tenant_id enforcement in update handler allows potential cross-tenant modifications.

The handler extracts tenant_id from the request and uses it to fetch the existing target, but TARGETS.update(target.clone()) accepts the target object deserialized directly from the JSON request body without enforcing that target.tenant matches the authenticated tenant_id. A user could send a target with a different or missing tenant value, allowing the update to bypass tenant isolation.

Set target.tenant = tenant.clone() before calling TARGETS.update() to enforce tenant isolation, matching the pattern used in get_target_by_id() which properly validates the tenant context.


34-45: Both post and update handlers must set target.tenant before calling TARGETS.update().

The post handler doesn't extract tenant_id, and the update handler extracts it but never assigns it to target.tenant. Since TargetConfigs.update() relies on target.tenant to determine the storage location (defaulting to DEFAULT_TENANT if unset), both handlers will incorrectly store targets under the default tenant, breaking multi-tenant isolation.

Add target.tenant = tenant_id; in both handlers before calling TARGETS.update(), and extract tenant_id in post() using get_tenant_id_from_request().
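
Sketched against the handler shown above, using names from this review (error handling elided; post() takes the same shape):

// in update(), after deserializing `target` from the request body:
let tenant_id = get_tenant_id_from_request(&req);
// The authenticated tenant always wins over whatever the body carried.
target.tenant = tenant_id.clone();
TARGETS.update(target.clone()).await?;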

src/handlers/http/modal/ingest/ingestor_rbac.rs (3)

108-112: Role existence check not tenant-scoped.

The roles() map is now tenant-scoped (HashMap<String, HashMap<String, Vec<DefaultPrivilege>>>), so roles().get(r) where r is a role name will always return None. You need to first get the tenant's role map.

🐛 Proposed fix
     // check if all roles exist
     let mut non_existent_roles = Vec::new();
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
+    let tenant_roles = roles();
+    let tenant_role_map = tenant_roles.get(tenant);
     roles_to_add.iter().for_each(|r| {
-        if roles().get(r).is_none() {
+        if tenant_role_map.map_or(true, |m| m.get(r).is_none()) {
             non_existent_roles.push(r.clone());
         }
     });

150-156: Role existence check not tenant-scoped.

Same issue as in add_roles_to_user - the role lookup needs to be scoped to the tenant's role map.

🐛 Proposed fix
     // check if all roles exist
     let mut non_existent_roles = Vec::new();
+    let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
+    let tenant_roles = roles();
+    let tenant_role_map = tenant_roles.get(tenant);
     roles_to_remove.iter().for_each(|r| {
-        if roles().get(r).is_none() {
+        if tenant_role_map.map_or(true, |m| m.get(r).is_none()) {
             non_existent_roles.push(r.clone());
         }
     });

197-221: Missing password generation logic - critical bug.

This implementation is fundamentally broken compared to the other post_gen_password implementations in src/handlers/http/rbac.rs (line 180) and src/handlers/http/modal/query/querier_rbac.rs (line 313). Both of those call user::Basic::gen_new_password() to actually generate a new password and hash, then persist the modified metadata.

In ingestor_rbac.rs:

  • Line 206 writes unmodified metadata to staging (pointless, nothing changed yet)
  • Lines 207-219 copy the existing password hash instead of generating a new one
  • Line 220 updates the in-memory Users table with that existing hash
  • The function returns a success response instead of returning the new generated password

The function comment claims it "Resets password for the user to a newly generated one" but this implementation skips the generation entirely and just syncs an unchanged password. Add the missing call to user::Basic::gen_new_password() before line 207, update metadata with the new hash, and persist the modified metadata (like rbac.rs does at line 196).

src/parseable/streams.rs (1)

1627-1632: Streams tests also missing tenant_id parameter.

Streams::get_or_create calls in tests (lines 1627-1632, 1635-1640, 1665-1670, 1703, 1712) are missing the required tenant_id parameter. Additionally, the assertions at lines 1647 and 1661 use guard.len() and guard.contains_key(), which no longer work with the nested HashMap structure.

         let stream1 = streams.get_or_create(
             options.clone(),
             stream_name.to_owned(),
             metadata.clone(),
             ingestor_id.clone(),
+            &None,
         );

The assertion logic also needs updating to account for the nested HashMap:

-        assert_eq!(guard.len(), 1);
+        assert_eq!(guard.values().map(|m| m.len()).sum::<usize>(), 1);

Also applies to: 1661-1661, 1676-1678

src/handlers/http/rbac.rs (1)

128-136: Role existence check should use tenant context.

The roles().contains_key(role) check at line 130 bypasses tenant scoping. While tenant_id is available on line 117, the validation doesn't use it. Other handlers like get_role correctly access tenant-scoped roles via roles().get(tenant). Since roles() is now keyed by tenant, contains_key(role) tests tenant names rather than role names, so valid tenant-specific roles will be incorrectly rejected.
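
A sketch of the tenant-scoped check, following the get_role pattern (the error variant is an assumption):

// inside post_user validation, replacing the global contains_key check:
let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
let role_exists = roles()
    .get(tenant)
    .is_some_and(|tenant_roles| tenant_roles.contains_key(role));
if !role_exists {
    return Err(RBACError::RoleDoesNotExist); // assumed variant
}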

src/handlers/http/logstream.rs (1)

380-392: Pass tenant_id to get_first_and_latest_event_from_storage call.

Event data is stored under tenant-scoped paths (e.g., {tenant}/{stream_name}/date=.../...), but get_first_and_latest_event_from_storage receives only stream_name and calls list_dates(stream_name), which will not find data stored under the tenant prefix. For multi-tenant deployments, this causes retrieval of timestamps from the wrong path or no data at all. Add tenant_id parameter to the function signature and storage method chain to ensure proper path scoping.

🤖 Fix all issues with AI agents
In `@src/handlers/http/cluster/mod.rs`:
- Around line 697-703: The function sync_role_update currently accepts an unused
HttpRequest parameter named req; remove the unused parameter from the signature
(change sync_role_update(req: HttpRequest, ...) to sync_role_update(name:
String, privileges: Vec<DefaultPrivilege>, tenant_id: &str)) and update every
call site to stop passing an HttpRequest, or alternatively rename it to _req to
silence the unused warning if callers cannot be changed; ensure the function
signature and any trait impls or tests referencing sync_role_update are updated
consistently.
- Around line 654-657: The function signature for
sync_password_reset_with_ingestors currently accepts an unused HttpRequest
parameter (req); either remove the unused parameter from the signature and all
call sites (update any invocations of sync_password_reset_with_ingestors) or use
req inside the function to extract and propagate tenant/context info (e.g., read
tenant header or extractor used elsewhere) and forward that context to any
downstream calls; update the function signature and callers consistently and
adjust any RBAC or tenant-related logic to use the extracted context if you
choose to keep req.
- Around line 593-598: post_user currently constructs the user with
user::User::new_basic(username.clone(), None) which drops tenant context; change
the call to pass the extracted tenant_id (e.g.,
user::User::new_basic(username.clone(), Some(tenant_id.clone()))) so the tenant
is preserved when syncing to ingestors/queriers. Ensure the tenant_id variable
extracted earlier in post_user is used and cloned as needed; the User::new_basic
call is the only change required to match the pattern used in rbac.rs.

In `@src/handlers/http/modal/query/querier_rbac.rs`:
- Line 163: The call to sync_user_deletion_with_ingestors(&userid).await? omits
tenant context so ingestors' delete_user reads tenant_id from the incoming
request and may delete from the wrong tenant; update the querier's delete_user
to pass the correct tenant_id into sync_user_deletion_with_ingestors (e.g.
sync_user_deletion_with_ingestors(&tenant_id, &userid).await?) and modify the
ingestor request builder inside sync_user_deletion_with_ingestors to include
tenant_id (preferably as a dedicated HTTP header like "X-Tenant-ID" or an
explicit query parameter) so the ingestor_rbac::delete_user can unambiguously
target the correct tenant.

In `@src/handlers/http/role.rs`:
- Around line 176-190: Remove the leftover commented-out match block and its
surrounding commented lines so only the active let-chains code remains: keep the
existing assignment to res using DEFAULT_ROLE.read().unwrap().get(tenant_id)
with the let-chain and serde_json::Value variants, and delete the old commented
match example that references DEFAULT_ROLE and role to avoid clutter and stale
code.
- Around line 162-168: The code updates the in-memory DEFAULT_ROLE via
DEFAULT_ROLE.write().unwrap() before calling put_metadata, risking inconsistency
if persistence fails and risking panic on lock poisoning; change the order to
call await put_metadata(&metadata, &tenant_id) first and only on Ok update
DEFAULT_ROLE, and replace write().unwrap() with proper error handling (e.g.,
.write().map_err(|e| …) or .write().expect("failed to acquire DEFAULT_ROLE write
lock") or propagate a mapped error) when inserting the tenant key
(tenant_id.map_or(DEFAULT_TENANT, |v| v).to_owned()) and value Some(name) to
ensure no panic and consistency between store and memory.

In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 487-510: In get_dashboards, the code currently overwrites the
HashMap entry for a tenant each time a new user's dashboards are inserted
(dashboards.insert(tenant.to_owned(), dashboard_bytes)), so change it to
accumulate/merge dashboard_bytes into the existing Vec for that tenant: ensure
you normalize empty tenant to DEFAULT_TENANT before using it, then use
dashboards.entry(tenant.to_owned()).or_insert_with(Vec::new) and extend that Vec
with dashboard_bytes so all users' dashboards for the tenant are preserved
instead of replaced.
- Around line 1229-1244: The code that builds streams from resp.common_prefixes
(using flat_map(|path| path.parts()) and mapping to strings) doesn't remove the
tenant prefix when tenant_id is provided; adjust the logic in the same block
that constructs streams (referencing resp, common_prefixes, path.parts(),
streams, and tenant_id) so that if tenant_id.is_some() you first strip the
"{tenant_id}/" prefix from each path (or only take the last non-empty path
component after splitting) before mapping to a stream name, then apply the
existing filters; this ensures the tenant segment is not included in the
resulting stream names.

In `@src/parseable/mod.rs`:
- Around line 1184-1191: The current list_tenants method silently returns None
when tenants.as_ref().read() fails, masking poisoned lock errors; change the
read() call to unwrap/expect (e.g., self.tenants.as_ref().read().expect("tenants
lock poisoned")) so the function panics consistently on lock poisoning and then
return the cloned Vec<String> (remove the None return branch), preserving the
existing clone and Some(...) return behavior.

In `@src/prism/logstream/mod.rs`:
- Around line 256-260: In get_datasets (around get_tenant_id_from_key and the
call to PARSEABLE.streams.list), remove the debug log call
tracing::warn!(get_datasets_streams=?self.streams); so the method no longer
emits debug/warn output; simply keep the tenant lookup and streams population
logic (self.streams = PARSEABLE.streams.list(&tenant_id)) and delete the
tracing::warn! line.
- Around line 66-70: Remove the debug tracing statements in
src/prism/logstream/mod.rs by deleting the three tracing::warn! calls ("starting
dataset info", "got info", and "got schema") that surround the lines assigning
let info = info?; and let schema = schema?; so production code no longer
contains those temporary debug logs; keep the info and schema assignments intact
and ensure compilation (no unused import of tracing) after removal.
- Around line 115-116: The debug logging call inside stats::get_current_stats
(used here as get_current_stats) should be removed to avoid noisy debug output;
open the get_current_stats implementation and delete any tracing::debug! /
println! / log debug statements (or guard them behind a feature flag if needed),
ensuring the function still returns the stats value unchanged and that
tracing::warn!("starting stats") and the let stats =
stats::get_current_stats(...) call remain as-is.
- Around line 338-348: In build_dataset_response remove the temporary debug
traces: delete the tracing::warn!("got counts"); and
tracing::warn!(prism_logstream_res=?res); lines so the function only constructs
and returns PrismDatasetResponse (res) without emitting development logs; ensure
no other residual debug-only tracing remains in the build_dataset_response code
path.
- Around line 297-303: In process_stream (function process_stream) remove the
debug/tracing logs: delete the tracing::warn!("not authorized for datasets") and
the tracing::warn!("unable to load stream {stream} for tenant {tenant_id:?}")
calls so the function no longer emits those debug warnings; preserve the
existing control flow (the return Ok(None) and the check_or_load_stream call and
its conditional behavior) but eliminate the two tracing::warn invocations (or
replace them with trace-level logging if you prefer quieter diagnostics) to
satisfy the "remove debug logging" request.

In `@src/tenants/mod.rs`:
- Around line 51-55: TenantOverview currently duplicates suspended_services
(stored separately and inside StorageMetadata.meta), causing state divergence
when suspend_service/resume_service update only the HashSet; remove the
duplication by deleting the suspended_services field from TenantOverview and
update insert_tenant to store only meta (no cloned suspensions), then update
suspend_service and resume_service to mutate
StorageMetadata.meta.suspended_services (or helper methods on TenantOverview
that forward to meta) so get_tenants persists the correct suspension state;
update any accessors that relied on the old suspended_services field to read
from meta.suspended_services instead.

In `@src/users/dashboards.rs`:
- Around line 255-267: The code path using dashboards.get_mut(tenant) skips
creation when the tenant key is missing, causing silent no-ops; change to obtain
a mutable bucket with dashboards.entry(tenant).or_default() (or equivalent) so a
Vec is created when missing, then perform the duplicate check using that bucket,
call self.save_dashboard(dashboard, tenant_id).await? and push the dashboard
into the bucket; ensure you still return Err(DashboardError::Metadata(...)) on
duplicate and Ok(()) on success.

In `@src/utils/mod.rs`:
- Around line 79-85: The get_tenant_id_from_request function currently calls
to_str().unwrap() which can panic on non-UTF8 header values; change it to handle
the Result returned by to_str() safely (e.g., use .ok() or match) and return
None when to_str() fails instead of unwrapping, preserving the existing
signature; update the branch that extracts tenant_value to call
tenant_value.to_str().ok().map(|s| s.to_owned()) or equivalent error-safe logic
so malformed header bytes do not cause a panic.
♻️ Duplicate comments (29)
src/users/filters.rs (1)

128-133: Filter silently dropped when tenant bucket doesn't exist.

This is a duplicate of the past review comment. The update method only modifies existing tenant buckets via get_mut(). If the tenant entry doesn't exist in the map (e.g., when adding the first filter for a new tenant after a server restart), the filter will be silently dropped without insertion.

🐛 Proposed fix using entry API
     pub async fn update(&self, filter: &Filter, tenant_id: &Option<String>) {
         let mut s = self.0.write().await;
-        if let Some(filters) = s.get_mut(tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v)) {
-            filters.retain(|f| f.filter_id != filter.filter_id);
-            filters.push(filter.clone());
-        }
+        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
+        let filters = s.entry(tenant.to_owned()).or_default();
+        filters.retain(|f| f.filter_id != filter.filter_id);
+        filters.push(filter.clone());
     }
src/correlation.rs (3)

140-144: Correlation silently not added to memory when tenant bucket doesn't exist.

This is a duplicate of the past review comment. In create(), if the tenant bucket doesn't exist in the in-memory map, the correlation is persisted to metastore but not added to the in-memory cache.

🐛 Proposed fix using entry API
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(correlation.id.to_owned(), correlation.clone());
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(correlation.id.to_owned(), correlation.clone());

176-183: Same issue: update() silently fails when tenant bucket doesn't exist.

This is a duplicate of the past review comment. Apply the same fix using the entry API.

🐛 Proposed fix
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(updated_correlation.id.to_owned(), updated_correlation.clone());

204-211: Critical bug: remove operates on wrong map level, corrupts correlation store.

This is a duplicate of the past review comment. Line 211 calls self.write().await.remove(&correlation.id) which removes an entry from the outer HashMap<String, CorrelationMap> using correlation.id as the key. This is incorrect—it should remove the correlation from the inner CorrelationMap for the specific tenant. As written, this could delete an unrelated tenant's data (if a tenant_id happens to match a correlation_id) or silently fail.

🐛 Proposed fix
         // Delete from storage
         PARSEABLE
             .metastore
             .delete_correlation(&correlation, tenant_id)
             .await?;

         // Delete from memory
-        self.write().await.remove(&correlation.id);
+        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
+        if let Some(corrs) = self.write().await.get_mut(tenant) {
+            corrs.remove(&correlation.id);
+        }

         Ok(())
src/query/stream_schema_provider.rs (2)

529-534: Reduce logging level from warn to debug or trace.

This logging runs on every table scan and will flood production logs. This appears to be debug instrumentation.

♻️ Suggested fix
-        tracing::warn!(
+        tracing::debug!(
             "entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
             self.tenant_id,
             self.schema,
             self.stream
         );

639-648: Same issues: commented tenant URL code, unwrap, and noisy logging.

Lines 639-643 have commented-out tenant-aware URL logic (same pattern as hot tier). Line 645 has warn! logging that should be trace!. Line 648 has ObjectStoreUrl::parse(...).unwrap() that can panic.

♻️ Combined fix
-        // let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-        //     glob_storage.store_url().join(tenant_id).unwrap()
-        // } else {
-        //     glob_storage.store_url()
-        // };
         let object_store_url = glob_storage.store_url();
-        tracing::warn!(object_store_url=?object_store_url);
+        tracing::trace!(object_store_url=?object_store_url);
         self.create_parquet_physical_plan(
             &mut execution_plans,
-            ObjectStoreUrl::parse(object_store_url).unwrap(),
+            ObjectStoreUrl::parse(&object_store_url)
+                .map_err(|e| DataFusionError::Plan(format!("Invalid object store URL: {e}")))?,
src/hottier.rs (3)

794-800: create_pstats_hot_tier should use INTERNAL_STREAM_HOT_TIER_SIZE_BYTES like pmeta.

Both pstats and pmeta are internal streams, but pstats allocates MIN_STREAM_HOT_TIER_SIZE_BYTES (10 GiB) while pmeta allocates INTERNAL_STREAM_HOT_TIER_SIZE_BYTES (10 MiB). For consistency, pstats should use the internal stream constant.

♻️ Proposed fix
                     let mut stream_hot_tier = StreamHotTier {
                         version: Some(CURRENT_HOT_TIER_VERSION.to_string()),
-                        size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
+                        size: INTERNAL_STREAM_HOT_TIER_SIZE_BYTES,
                         used_size: 0,
-                        available_size: MIN_STREAM_HOT_TIER_SIZE_BYTES,
+                        available_size: INTERNAL_STREAM_HOT_TIER_SIZE_BYTES,
                         oldest_date_time_entry: None,
                     };

105-116: Logic error: stream/tenant exclusion uses incorrect boolean logic.

The condition at lines 108-109 chains && over both inequalities, so a stream is processed only when BOTH the stream name and the tenant differ from the current ones. The intent is to skip only the exact current stream/tenant pair, so the current logic incorrectly skips streams when just one of the two matches.

🐛 Proposed fix
             for stream in PARSEABLE.streams.list(&tenant_id) {
                 if self.check_stream_hot_tier_exists(&stream, &tenant_id)
-                    && stream != current_stream
-                    && tenant_id != *current_tenant_id
+                    && !(stream == current_stream && tenant_id == *current_tenant_id)
                 {

595-603: Avoid panics in check_stream_hot_tier_exists.

hot_tier_file_path(...).unwrap() can crash on path conversion errors. This is an existence check and should be best-effort, returning false on errors.

🐛 Proposed fix
     pub fn check_stream_hot_tier_exists(&self, stream: &str, tenant_id: &Option<String>) -> bool {
-        let path = self.hot_tier_file_path(stream, tenant_id).unwrap();
-        PathBuf::from(path.to_string()).exists()
+        match self.hot_tier_file_path(stream, tenant_id) {
+            Ok(path) => PathBuf::from(path.to_string()).exists(),
+            Err(err) => {
+                warn!("Failed to resolve hot tier path for stream={stream} tenant={tenant_id:?}: {err}");
+                false
+            }
+        }
     }
src/handlers/http/modal/ingest/mod.rs (1)

27-37: Private fields lack accessors for external usage.

The SyncRole struct has private fields (privileges, tenant_id), but if ingestor_role.rs needs to access these fields directly, either make them public or add getter methods.

♻️ Option 1: Make fields public
 #[derive(Deserialize, Serialize)]
 pub struct SyncRole {
-    privileges: Vec<DefaultPrivilege>,
-    tenant_id: String
+    pub privileges: Vec<DefaultPrivilege>,
+    pub tenant_id: String,
 }
♻️ Option 2: Add getter methods
 impl SyncRole {
     pub fn new(privileges: Vec<DefaultPrivilege>, tenant_id: String) -> Self {
         Self { privileges, tenant_id }
     }
+
+    pub fn privileges(&self) -> &[DefaultPrivilege] {
+        &self.privileges
+    }
+
+    pub fn tenant_id(&self) -> &str {
+        &self.tenant_id
+    }
 }
src/handlers/http/modal/query/querier_logstream.rs (1)

73-75: Critical: delete_stream still lacks tenant context.

The delete_stream call doesn't include tenant_id, while all surrounding operations (stream existence check, local directory cleanup, hot tier deletion, in-memory cleanup) properly use tenant context. Storage paths are tenant-scoped, so deleting only by stream_name could delete data belonging to other tenants.

This needs the same tenant-scoping treatment as other operations in this function.

src/alerts/alert_types.rs (1)

91-123: Auth credential extraction remains incomplete.

This is a known work-in-progress issue. The complex logic to find an admin user for the tenant (lines 91-105) identifies a user but the credential extraction (lines 107-118) always returns None for both Native and OAuth user types. This means execute_alert_query will receive None for auth_token, which may cause remote alert queries in Prism mode to fail authentication.

src/handlers/http/middleware.rs (2)

167-177: Handle potential panic from HeaderValue::from_str().unwrap().

If tenant_id contains characters not valid in HTTP headers (e.g., control characters), HeaderValue::from_str() will return an error and unwrap() will panic. Consider handling this gracefully.
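
A hedged sketch of the graceful path, assuming the plain "tenant" header name used elsewhere in this PR:

use actix_web::http::header::{HeaderName, HeaderValue};

// Returns None (callers can log and skip the header) instead of panicking
// on bytes that are invalid in an HTTP header value.
fn tenant_header(tenant_id: &str) -> Option<(HeaderName, HeaderValue)> {
    let value = HeaderValue::from_str(tenant_id).ok()?;
    Some((HeaderName::from_static("tenant"), value))
}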


304-315: Security: Consider stricter handling when tenant doesn't exist.

The check_suspension function returns Authorized when:

  1. No tenant header is present (line 314)
  2. Tenant doesn't exist in TENANT_METADATA (line 310-312)

This could allow requests to bypass tenant-level controls. The empty else branch with the "tenant does not exist" comment suggests this needs further handling.
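
One possible stricter policy, sketched as a pure function under the assumption that rejecting unknown tenants is acceptable (this is not what the PR currently does):

enum TenantCheck {
    Authorized,
    UnknownTenant,
    Suspended,
}

fn check_tenant(tenant_header: Option<&str>, known: bool, suspended: bool) -> TenantCheck {
    match tenant_header {
        // No header: fall through to the default tenant.
        None => TenantCheck::Authorized,
        // Header names a tenant that is not registered: reject explicitly.
        Some(_) if !known => TenantCheck::UnknownTenant,
        Some(_) if suspended => TenantCheck::Suspended,
        Some(_) => TenantCheck::Authorized,
    }
}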

src/handlers/http/modal/ingest/ingestor_role.rs (1)

46-52: Inverted tenant validation logic still present.

The condition on line 48 checks if the request tenant matches the payload tenant (req_tenant.eq(&sync_req.tenant_id)), but the error message indicates this should block cross-tenant operations. The second condition should check for a mismatch (ne) to prevent non-super-admin users from creating roles for other tenants.

🐛 Proposed fix
-    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.eq(&sync_req.tenant_id)) {
+    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.ne(&sync_req.tenant_id)) {
         return Err(RoleError::Anyhow(anyhow::Error::msg(
             "non super-admin user trying to create role for another tenant",
         )));
     }
src/handlers/http/modal/ingest/ingestor_rbac.rs (1)

51-57: Inverted tenant validation logic.

Same issue as in ingestor_role.rs: the condition checks for equality when it should check for inequality. This will reject valid same-tenant operations instead of cross-tenant operations.

🐛 Proposed fix
         if req_tenant.ne(DEFAULT_TENANT)
-            && (req_tenant.eq(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
+            && (req_tenant.ne(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
         {
             return Err(RBACError::Anyhow(anyhow::Error::msg(
                 "non super-admin user trying to create user for another tenant",
             )));
         }
src/migration/mod.rs (2)

168-170: Early return on list_streams failure prevents migration of remaining tenants.

If list_streams fails for one tenant, the ? operator causes an early return, skipping migration for all subsequent tenants. This should handle errors per-tenant to allow other tenants to proceed.

🐛 Suggested fix
     for tenant_id in tenants {
         // Get all stream names
-        let stream_names = PARSEABLE.metastore.list_streams(&tenant_id).await?;
+        let stream_names = match PARSEABLE.metastore.list_streams(&tenant_id).await {
+            Ok(names) => names,
+            Err(e) => {
+                warn!("Failed to list streams for tenant {:?}: {:?}", tenant_id, e);
+                continue;
+            }
+        };

490-498: Use PARSEABLE_METADATA_FILE_NAME constant instead of hardcoded string.

Lines 495 and 497 use the hardcoded string ".parseable.json" while other parts of the codebase use the PARSEABLE_METADATA_FILE_NAME constant. This inconsistency could cause path mismatches.

🐛 Suggested fix
     let path = if let Some(tenant) = tenant_id.as_ref() {
         config
             .options
             .staging_dir()
             .join(tenant)
-            .join(".parseable.json")
+            .join(PARSEABLE_METADATA_FILE_NAME)
     } else {
-        config.options.staging_dir().join(".parseable.json")
+        config.options.staging_dir().join(PARSEABLE_METADATA_FILE_NAME)
     };
src/handlers/http/cluster/mod.rs (1)

376-381: Tenant context not propagated to ingestors during stream synchronization.

The tenant_id parameter is commented out (line 380). Stream sync requests to ingestors won't include tenant context, breaking tenant isolation.
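
A hedged sketch of re-threading the tenant, assuming the reqwest builder already used for these sync calls and the "tenant" header the handlers read (url and body stand in for the existing request pieces):

// Attach the tenant header only when a tenant is set, so default-tenant
// deployments keep their current behavior.
let mut request = INTRA_CLUSTER_CLIENT.put(url).json(&body);
if let Some(tenant) = tenant_id.as_deref() {
    request = request.header("tenant", tenant);
}
let response = request.send().await?;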

src/handlers/http/modal/query/querier_rbac.rs (1)

79-79: User created without tenant association.

User::new_basic is called with None for the tenant parameter, but tenant_id is available from the request. New users won't be associated with their tenant.

-    let (user, password) = user::User::new_basic(username.clone(), None);
+    let (user, password) = user::User::new_basic(username.clone(), tenant_id.clone());
src/handlers/http/oidc.rs (2)

130-159: Cluster sync should check HTTP response status.

The for_each_live_node call sends login sync requests but doesn't verify that the remote node actually accepted them. reqwest::send() succeeds even on 4xx/5xx responses, so failed syncs go undetected.

Proposed fix
                     async move {
-                        INTRA_CLUSTER_CLIENT
+                        let resp = INTRA_CLUSTER_CLIENT
                             .post(url)
                             .header(header::AUTHORIZATION, node.token)
                             .header(header::CONTENT_TYPE, "application/json")
                             .json(&json!(
                                 {
                                     "sessionCookie": _session,
                                     "user": _user,
                                     "expiry": EXPIRY_DURATION
                                 }
                             ))
                             .send()
                             .await?;
+                        resp.error_for_status()?;
                         Ok::<(), anyhow::Error>(())
                     }

325-326: Address incomplete tenant implementation in OIDC user creation.

New OAuth users are created without tenant association (None passed to put_user), despite tenant_id being extracted at line 229 and used for existing user lookups. This breaks multi-tenant isolation for new OIDC users.

Replace None with the extracted tenant_id, or add a TODO with tracking issue if intentional WIP:

-        // LET TENANT BE NONE FOR NOW!!!
-        (None, roles) => put_user(&user_id, roles, user_info, bearer, None).await?,
+        (None, roles) => put_user(&user_id, roles, user_info, bearer, tenant_id.clone()).await?,
src/prism/logstream/mod.rs (1)

71-73: Critical: Stats are hardcoded to default values.

The actual stats result is commented out and replaced with QueriedStats::default(). This breaks the stats functionality entirely and appears to be debugging code left in.

-    // let stats = stats?;
-    let stats = QueriedStats::default();
-    tracing::warn!("got FAKE stats");
+    let stats = stats?;
src/query/mod.rs (1)

191-205: Log schema registration errors instead of silently ignoring them.

Lines 191 and 201 silently drop errors from catalog.register_schema(...) with let _ = .... Failed schema registration at startup may cause query failures that are difficult to diagnose.

Suggested fix
                     // tracing::warn!("registering_schema- {schema_provider:?}\nwith tenant- {t}");
-                    let _ = catalog.register_schema(t, schema_provider);
+                    if let Err(e) = catalog.register_schema(t, schema_provider) {
+                        tracing::error!("Failed to register schema for tenant {}: {:?}", t, e);
+                    }
                     // tracing::warn!("result=> {r:?}");
src/alerts/mod.rs (1)

1246-1255: Don't silently drop alert updates when the tenant bucket doesn't exist.

update() only inserts if get_mut(tenant) returns Some(_). For first-time tenants or races with initialization, this silently loses writes.

Proposed fix
     async fn update(&self, alert: &dyn AlertTrait) {
-        let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-        if let Some(alerts) = self.alerts.write().await.get_mut(tenant) {
-            alerts.insert(*alert.get_id(), alert.clone_box());
-        }
+        let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v).to_owned();
+        let mut guard = self.alerts.write().await;
+        guard
+            .entry(tenant)
+            .or_default()
+            .insert(*alert.get_id(), alert.clone_box());
     }
src/parseable/mod.rs (3)

1057-1076: TOCTOU race condition persists.

This issue was previously flagged. The existence check (line 1066) uses a read lock, but the insertion (line 1071) acquires a separate write lock. Another thread could add the same tenant between these operations.


1116-1144: Incomplete tenant deletion persists.

This issue was previously flagged. The method removes the tenant from TENANT_METADATA, users, and roles, but does not remove it from self.tenants. This leaves list_tenants() returning the deleted tenant.


1146-1182: Incomplete logic and silent lock failure persist.

This issue was previously flagged:

  1. Empty else if !is_multi_tenant { } block at lines 1168-1169
  2. Returns Ok(None) on write lock failure at lines 1179-1180 instead of propagating the error
src/metastore/metastore_traits.rs (1)

108-112: tenant_id parameter unused in implementation.

This issue was previously flagged. The tenant_id parameter is accepted in the trait signature but the implementation in object_store_metastore.rs (line 328) calls alert_state_json_path(*alert_id) without tenant context, breaking tenant isolation for alert state storage.
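
A hedged sketch of the missing plumbing; the ".alerts" directory and exact layout are assumptions patterned on the other tenant-prefixed *_json_path helpers in this PR:

use relative_path::RelativePathBuf;

fn alert_state_json_path(alert_id: &str, tenant_id: &Option<String>) -> RelativePathBuf {
    let file_name = format!("{alert_id}.json");
    match tenant_id.as_deref() {
        // Assumed layout: "{tenant}/.alerts/{id}.json" vs ".alerts/{id}.json".
        Some(tenant) => RelativePathBuf::from_iter([tenant, ".alerts", file_name.as_str()]),
        None => RelativePathBuf::from_iter([".alerts", file_name.as_str()]),
    }
}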

Comment on lines +593 to 599
// forward the create user request to all ingestors and queriers to keep them in sync
pub async fn sync_user_creation(
user: User,
role: &Option<HashSet<String>>,
// tenant_id: &str
) -> Result<(), RBACError> {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find User struct definition and check its fields
rg -n "pub struct User" --type rust -A 30

Repository: parseablehq/parseable

Length of output: 11209


🏁 Script executed:

# Check how User is being used in sync_user_creation function
fd "mod.rs" --path "*cluster*" --exec cat -n {} \; | head -100

Repository: parseablehq/parseable

Length of output: 280


🏁 Script executed:

# Check for Serialize implementation for User
rg -n "impl.*Serialize" --type rust | grep -i user

Repository: parseablehq/parseable

Length of output: 47


🏁 Script executed:

# Get the full sync_user_creation function implementation
sed -n '593,660p' src/handlers/http/cluster/mod.rs

Repository: parseablehq/parseable

Length of output: 2044


🏁 Script executed:

# Check where sync_user_creation is called from
rg -n "sync_user_creation" --type rust -B 3 -A 3

Repository: parseablehq/parseable

Length of output: 1706


🏁 Script executed:

# Check the context around the sync_user_creation call in querier_rbac.rs
sed -n '70,95p' src/handlers/http/modal/query/querier_rbac.rs

Repository: parseablehq/parseable

Length of output: 835


🏁 Script executed:

# Verify that tenant field is serialized by checking the User struct derivations
rg -n "^pub struct User" --type rust -B 3 -A 15

Repository: parseablehq/parseable

Length of output: 7487


🏁 Script executed:

# Check broader context of the create user function
sed -n '50,95p' src/handlers/http/modal/query/querier_rbac.rs

Repository: parseablehq/parseable

Length of output: 1489


🏁 Script executed:

# Find all other calls to sync_user_creation to see if this pattern is consistent
rg -n "sync_user_creation" --type rust -B 5 -A 1

Repository: parseablehq/parseable

Length of output: 1656


🏁 Script executed:

# Get more context around the create user function
sed -n '40,100p' src/handlers/http/modal/query/querier_rbac.rs

Repository: parseablehq/parseable

Length of output: 1868


🏁 Script executed:

# Check if new_basic is being called correctly elsewhere
rg -n "new_basic" --type rust -B 2 -A 2

Repository: parseablehq/parseable

Length of output: 1105


Pass the extracted tenant_id when creating the user.

The User struct does contain a tenant field, and it's properly serialized. However, in post_user (line 79 of querier_rbac.rs), the user is created with new_basic(username.clone(), None) instead of new_basic(username.clone(), tenant_id.clone()). The tenant_id is extracted from the request on line 52 but never used.

This means the user will be synced to ingestors/queriers with a null tenant, causing them to lose tenant context. The correct pattern is already implemented in src/handlers/http/rbac.rs (line 148), which passes the tenant. Update line 79 to:

let (user, password) = user::User::new_basic(username.clone(), Some(tenant_id.clone()));
🤖 Prompt for AI Agents
In `@src/handlers/http/modal/query/querier_rbac.rs` around line 79 (flagged at the
sync_user_creation call site in `@src/handlers/http/cluster/mod.rs` lines 593 -
598), post_user currently constructs the user with
user::User::new_basic(username.clone(), None) which drops tenant context; change
the call to pass the extracted tenant_id (e.g.,
user::User::new_basic(username.clone(), Some(tenant_id.clone()))) so the tenant
is preserved when syncing to ingestors/queriers. Ensure the tenant_id variable
extracted earlier in post_user is used and cloned as needed; the User::new_basic
call is the only change required to match the pattern used in rbac.rs.

Comment on lines +654 to +658
pub async fn sync_password_reset_with_ingestors(
req: HttpRequest,
username: &str,
) -> Result<(), RBACError> {

⚠️ Potential issue | 🟡 Minor

Unused req parameter in sync_password_reset_with_ingestors.

The HttpRequest parameter is accepted but never used in the function body. Either remove it or use it to extract and propagate tenant context.

 pub async fn sync_password_reset_with_ingestors(
-    req: HttpRequest,
     username: &str,
 ) -> Result<(), RBACError> {
🤖 Prompt for AI Agents
In `@src/handlers/http/cluster/mod.rs` around lines 654 - 657, The function
signature for sync_password_reset_with_ingestors currently accepts an unused
HttpRequest parameter (req); either remove the unused parameter from the
signature and all call sites (update any invocations of
sync_password_reset_with_ingestors) or use req inside the function to extract
and propagate tenant/context info (e.g., read tenant header or extractor used
elsewhere) and forward that context to any downstream calls; update the function
signature and callers consistently and adjust any RBAC or tenant-related logic
to use the extracted context if you choose to keep req.

Comment on lines +697 to 704
// forward the put role request to all ingestors and queriers to keep them in sync
pub async fn sync_role_update(
req: HttpRequest,
name: String,
privileges: Vec<DefaultPrivilege>,
tenant_id: &str,
) -> Result<(), RoleError> {

⚠️ Potential issue | 🟡 Minor

Unused req parameter in sync_role_update.

The HttpRequest parameter is accepted but not used. The tenant_id is already passed as a separate parameter.

 pub async fn sync_role_update(
-    req: HttpRequest,
     name: String,
     privileges: Vec<DefaultPrivilege>,
     tenant_id: &str,
 ) -> Result<(), RoleError> {
🤖 Prompt for AI Agents
In `@src/handlers/http/cluster/mod.rs` around lines 697 - 703, The function
sync_role_update currently accepts an unused HttpRequest parameter named req;
remove the unused parameter from the signature (change sync_role_update(req:
HttpRequest, ...) to sync_role_update(name: String, privileges:
Vec<DefaultPrivilege>, tenant_id: &str)) and update every call site to stop
passing an HttpRequest, or alternatively rename it to _req to silence the unused
warning if callers cannot be changed; ensure the function signature and any
trait impls or tests referencing sync_role_update are updated consistently.

put_metadata(&metadata, &tenant_id).await?;

sync_user_deletion_with_ingestors(&userid).await?;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check sync_user_deletion_with_ingestors signature and usage
rg -n "sync_user_deletion_with_ingestors" --type rust -B 2 -A 10

Repository: parseablehq/parseable

Length of output: 3088


🏁 Script executed:

#!/bin/bash
# Get the full implementation of sync_user_deletion_with_ingestors
sed -n '554,600p' src/handlers/http/cluster/mod.rs

Repository: parseablehq/parseable

Length of output: 1476


🏁 Script executed:

#!/bin/bash
# Check sync_user_creation and other sync functions
rg -n "sync_user" --type rust -A 15 | head -150

Repository: parseablehq/parseable

Length of output: 10127


🏁 Script executed:

#!/bin/bash
# Find User struct definition
rg -n "struct User" --type rust -A 10

Repository: parseablehq/parseable

Length of output: 4772


🏁 Script executed:

#!/bin/bash
# Check how the ingestor user deletion endpoint is defined
rg -n "user.*sync" --type rust -B 3 -A 15 | grep -A 15 "DELETE\|delete.*sync"

Repository: parseablehq/parseable

Length of output: 2693


🏁 Script executed:

#!/bin/bash
# Check the ingestor delete_user implementation
rg -n "pub async fn delete_user" --type rust -A 30 | head -80

Repository: parseablehq/parseable

Length of output: 6377


Pass tenant context to user deletion sync.

sync_user_deletion_with_ingestors sends only userid to ingestors, but the ingestor's delete_user endpoint extracts tenant_id from the HTTP request context (line 75 of src/handlers/http/modal/ingest/ingestor_rbac.rs). In multi-tenant deployments, this causes the ingestor to delete the user from whatever tenant context is in the request, potentially deleting the wrong user if the same userid exists across tenants. The querier's delete_user function already has access to the correct tenant_id—pass it to the ingestor either as a URL parameter, query parameter, or request header so the ingestor can target the correct user.

🤖 Prompt for AI Agents
In `@src/handlers/http/modal/query/querier_rbac.rs` at line 163, The call to
sync_user_deletion_with_ingestors(&userid).await? omits tenant context so
ingestors' delete_user reads tenant_id from the incoming request and may delete
from the wrong tenant; update the querier's delete_user to pass the correct
tenant_id into sync_user_deletion_with_ingestors (e.g.
sync_user_deletion_with_ingestors(&tenant_id, &userid).await?) and modify the
ingestor request builder inside sync_user_deletion_with_ingestors to include
tenant_id (preferably as a dedicated HTTP header like "X-Tenant-ID" or an
explicit query parameter) so the ingestor_rbac::delete_user can unambiguously
target the correct tenant.
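
A minimal sketch of the header variant, written as a reusable helper (the helper and header name are hypothetical; the ingestor side would need to prefer this header over its own request context):

use reqwest::RequestBuilder;

// Attach tenant context to any intra-cluster request before send().
fn with_tenant(request: RequestBuilder, tenant_id: &Option<String>) -> RequestBuilder {
    match tenant_id.as_deref() {
        Some(tenant) => request.header("X-Tenant-ID", tenant),
        None => request,
    }
}

sync_user_deletion_with_ingestors could then wrap its existing builder in with_tenant(...) so the ingestor's delete_user resolves the tenant from the header rather than guessing from its own request context.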

Users.add_roles(&userid.clone(), roles_to_add.clone(), &tenant_id);

sync_users_with_roles_with_ingestors(&userid, &roles_to_add, "add").await?;

⚠️ Potential issue | 🟠 Major

Tenant context missing in role sync operations.

Both sync_users_with_roles_with_ingestors (lines 223, 295) don't receive tenant_id. This follows the same pattern as the user deletion sync issue.

Also applies to: 295-295

Comment on lines +297 to +303
tracing::warn!("not authorized for datasets");
return Ok(None);
}

// Skip streams that don't exist
if !PARSEABLE.check_or_load_stream(&stream).await {
if !PARSEABLE.check_or_load_stream(&stream, tenant_id).await {
tracing::warn!("unable to load stream {stream} for tenant {tenant_id:?}");

🛠️ Refactor suggestion | 🟠 Major

Remove debug logging in process_stream.

         if !self.is_authorized(&stream, &key) {
-            tracing::warn!("not authorized for datasets");
             return Ok(None);
         }
 
         // Skip streams that don't exist
         if !PARSEABLE.check_or_load_stream(&stream, tenant_id).await {
-            tracing::warn!("unable to load stream {stream} for tenant {tenant_id:?}");
             return Ok(None);
         }
🤖 Prompt for AI Agents
In `@src/prism/logstream/mod.rs` around lines 297 - 303, In process_stream
(function process_stream) remove the debug/tracing logs: delete the
tracing::warn!("not authorized for datasets") and the tracing::warn!("unable to
load stream {stream} for tenant {tenant_id:?}") calls so the function no longer
emits those debug warnings; preserve the existing control flow (the return
Ok(None) and the check_or_load_stream call and its conditional behavior) but
eliminate the two tracing::warn invocations (or replace them with trace-level
logging if you prefer quieter diagnostics) to satisfy the "remove debug logging"
request.

Comment on lines +338 to +348
tracing::warn!("got counts");
let res = PrismDatasetResponse {
stream,
info: info.info,
schema: info.schema,
stats: info.stats,
retention: info.retention,
counts,
};
tracing::warn!(prism_logstream_res=?res);
Ok(res)

🛠️ Refactor suggestion | 🟠 Major

Remove debug logging in build_dataset_response.

         let counts = self.get_counts(&stream, tenant_id).await?;
-        tracing::warn!("got counts");
         let res = PrismDatasetResponse {
             stream,
             info: info.info,
             schema: info.schema,
             stats: info.stats,
             retention: info.retention,
             counts,
         };
-        tracing::warn!(prism_logstream_res=?res);
         Ok(res)
🤖 Prompt for AI Agents
In `@src/prism/logstream/mod.rs` around lines 338 - 348, In build_dataset_response
remove the temporary debug traces: delete the tracing::warn!("got counts"); and
tracing::warn!(prism_logstream_res=?res); lines so the function only constructs
and returns PrismDatasetResponse (res) without emitting development logs; ensure
no other residual debug-only tracing remains in the build_dataset_response code
path.

Comment on lines +51 to +55
pub fn insert_tenant(&self, tenant_id: String, meta: StorageMetadata) {
let suspensions = meta.suspended_services.clone().unwrap_or_default();
self.tenants
.insert(tenant_id, TenantOverview { suspended_services: suspensions, meta });
}

⚠️ Potential issue | 🟠 Major

Potential data duplication in tenant storage.

suspended_services is stored both in TenantOverview.suspended_services and within TenantOverview.meta.suspended_services. When suspend_service or resume_service is called, only the HashSet is updated, leaving meta.suspended_services stale.

If get_tenants() returns meta for persistence, the suspension state changes won't be reflected.

🐛 Option 1: Sync meta when suspending/resuming
 pub fn suspend_service(&self, tenant_id: &str, service: Service) {
     if let Some(mut tenant) = self.tenants.get_mut(tenant_id) {
         tenant.suspended_services.insert(service.clone());
+        // Keep meta in sync
+        let services = tenant.meta.suspended_services.get_or_insert_with(HashSet::new);
+        services.insert(service);
     }
 }
♻️ Option 2: Remove duplication, derive from meta
 pub struct TenantOverview {
-    suspended_services: HashSet<Service>,
     meta: StorageMetadata,
 }
 
+// Access suspended_services through meta; returning an owned set avoids
+// borrowing from a temporary when the Option is None
+impl TenantOverview {
+    fn suspended_services(&self) -> HashSet<Service> {
+        self.meta.suspended_services.clone().unwrap_or_default()
+    }
+}
🤖 Prompt for AI Agents
In `@src/tenants/mod.rs` around lines 51 - 55, TenantOverview currently duplicates
suspended_services (stored separately and inside StorageMetadata.meta), causing
state divergence when suspend_service/resume_service update only the HashSet;
remove the duplication by deleting the suspended_services field from
TenantOverview and update insert_tenant to store only meta (no cloned
suspensions), then update suspend_service and resume_service to mutate
StorageMetadata.meta.suspended_services (or helper methods on TenantOverview
that forward to meta) so get_tenants persists the correct suspension state;
update any accessors that relied on the old suspended_services field to read
from meta.suspended_services instead.

Comment on lines +255 to 267
if let Some(dbs) = dashboards.get_mut(tenant) {
let has_duplicate = dbs
.iter()
.any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
if has_duplicate {
return Err(DashboardError::Metadata("Dashboard title must be unique"));
}
self.save_dashboard(dashboard, tenant_id).await?;

dbs.push(dashboard.clone());
}

Ok(())

⚠️ Potential issue | 🔴 Critical

Dashboard creation silently fails for new tenants.

When tenant doesn't exist in the HashMap, dashboards.get_mut(tenant) returns None, causing the method to return Ok(()) without creating the dashboard. This silently fails for any tenant that hasn't had dashboards loaded yet.

Use entry().or_default() to ensure the tenant bucket exists:

🐛 Proposed fix
-        if let Some(dbs) = dashboards.get_mut(tenant) {
-            let has_duplicate = dbs
-                .iter()
-                .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
-            if has_duplicate {
-                return Err(DashboardError::Metadata("Dashboard title must be unique"));
-            }
-            self.save_dashboard(dashboard, tenant_id).await?;
-
-            dbs.push(dashboard.clone());
-        }
+        let dbs = dashboards.entry(tenant.to_owned()).or_default();
+        let has_duplicate = dbs
+            .iter()
+            .any(|d| d.title == dashboard.title && d.dashboard_id != dashboard.dashboard_id);
+        if has_duplicate {
+            return Err(DashboardError::Metadata("Dashboard title must be unique"));
+        }
+        self.save_dashboard(dashboard, tenant_id).await?;
+        dbs.push(dashboard.clone());
🤖 Prompt for AI Agents
In `@src/users/dashboards.rs` around lines 255 - 267, The code path using
dashboards.get_mut(tenant) skips creation when the tenant key is missing,
causing silent no-ops; change to obtain a mutable bucket with
dashboards.entry(tenant).or_default() (or equivalent) so a Vec is created when
missing, then perform the duplicate check using that bucket, call
self.save_dashboard(dashboard, tenant_id).await? and push the dashboard into the
bucket; ensure you still return Err(DashboardError::Metadata(...)) on duplicate
and Ok(()) on success.

Comment on lines +79 to +85
pub fn get_tenant_id_from_request(req: &HttpRequest) -> Option<String> {
if let Some(tenant_value) = req.headers().get("tenant") {
Some(tenant_value.to_str().unwrap().to_owned())
} else {
None
}
}

⚠️ Potential issue | 🟡 Minor

Potential panic on malformed header value.

Using .unwrap() on to_str() can panic if the tenant header contains non-UTF8 bytes. Consider handling this gracefully.

💡 Suggested fix
 pub fn get_tenant_id_from_request(req: &HttpRequest) -> Option<String> {
     if let Some(tenant_value) = req.headers().get("tenant") {
-        Some(tenant_value.to_str().unwrap().to_owned())
+        tenant_value.to_str().ok().map(|s| s.to_owned())
     } else {
         None
     }
 }
🤖 Prompt for AI Agents
In `@src/utils/mod.rs` around lines 79 - 85, The get_tenant_id_from_request
function currently calls to_str().unwrap() which can panic on non-UTF8 header
values; change it to handle the Result returned by to_str() safely (e.g., use
.ok() or match) and return None when to_str() fails instead of unwrapping,
preserving the existing signature; update the branch that extracts tenant_value
to call tenant_value.to_str().ok().map(|s| s.to_owned()) or equivalent
error-safe logic so malformed header bytes do not cause a panic.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (16)
src/handlers/airplane.rs (2)

106-124: Flight get_schema lacks tenant context extraction.

The tenant_id is hardcoded to &None. In the HTTP query handler (see src/handlers/http/query.rs), the tenant is extracted via get_tenant_id_from_request(&req). For consistent multi-tenancy support, the Flight service should extract tenant context from gRPC request metadata.

Consider extracting tenant from request metadata:

// Example: extract tenant from gRPC metadata
let tenant_id = req.metadata()
    .get("x-p-tenant-id")
    .and_then(|v| v.to_str().ok())
    .map(String::from);

126-248: Propagate tenant context throughout do_get instead of hardcoding &None.

Multiple calls within do_get use hardcoded &None for tenant_id:

  • Line 202: user_auth_for_datasets(&permissions, &streams, &None)
  • Line 209: execute(query, false, &None)
  • Line 238: PARSEABLE.get_or_create_stream(&stream_name, &None).clear()

For consistent multi-tenancy, extract the tenant at the start of do_get and propagate it:

Suggested approach
 async fn do_get(&self, req: Request<Ticket>) -> Result<Response<Self::DoGetStream>, Status> {
     let key = extract_session_key(req.metadata())
         .map_err(|e| Status::unauthenticated(e.to_string()))?;
+    
+    // Extract tenant from request metadata
+    let tenant_id: Option<String> = req.metadata()
+        .get("x-p-tenant-id")
+        .and_then(|v| v.to_str().ok())
+        .map(String::from);

     // ... later in the function ...
     
-    user_auth_for_datasets(&permissions, &streams, &None)
+    user_auth_for_datasets(&permissions, &streams, &tenant_id)
     
-    let (records, _) = execute(query, false, &None)
+    let (records, _) = execute(query, false, &tenant_id)
     
-    PARSEABLE.get_or_create_stream(&stream_name, &None).clear();
+    PARSEABLE.get_or_create_stream(&stream_name, &tenant_id).clear();
src/alerts/alerts_utils.rs (1)

130-153: auth_token parameter is received but never used.

The function accepts auth_token: Option<String> but line 148 passes None to send_query_request instead of the received token. This means remote alert queries in Prism mode will always lack authentication, likely causing failures.

Additionally, there's a type mismatch: send_query_request expects Option<HeaderMap> (per the relevant snippet), but auth_token is Option<String>.

🐛 Suggested fix

Convert the auth_token to a HeaderMap before passing:

 async fn execute_remote_query(
-    auth_token: Option<String>,
+    auth_token: Option<HeaderMap>,
     query: &str,
     time_range: &TimeRange,
 ) -> Result<AlertQueryResult, AlertError> {
     // ...
-    let (result_value, _) = send_query_request(None,&query_request)
+    let (result_value, _) = send_query_request(auth_token, &query_request)
         .await
         .map_err(|err| AlertError::CustomError(format!("Failed to send query request: {err}")))?;

This also requires updating execute_alert_query to pass the correct type.
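
A minimal sketch of the conversion, assuming the stored token is already a complete Authorization value (e.g., "Basic ..." or "Bearer ..."):

use reqwest::header::{AUTHORIZATION, HeaderMap, HeaderValue};

fn auth_headers(auth_token: Option<String>) -> Option<HeaderMap> {
    let token = auth_token?;
    // Invalid header bytes yield None rather than a panic.
    let value = HeaderValue::from_str(&token).ok()?;
    let mut headers = HeaderMap::new();
    headers.insert(AUTHORIZATION, value);
    Some(headers)
}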

src/storage/store_metadata.rs (1)

298-320: Missing directory creation for tenant-specific staging path.

When tenant_id is provided, the code constructs a path under a tenant subdirectory (Line 306), but doesn't ensure the parent directory exists before attempting to write. This will fail with io::ErrorKind::NotFound if the tenant directory hasn't been created.

Compare with src/migration/mod.rs (lines 484-505) which has the same pattern but also doesn't create the directory. Consider adding directory creation:

🐛 Proposed fix
 pub fn put_staging_metadata(meta: &StorageMetadata, tenant_id: &Option<String>) -> io::Result<()> {
     let mut staging_metadata = meta.clone();
     staging_metadata.server_mode = PARSEABLE.options.mode;
     staging_metadata.staging = PARSEABLE.options.staging_dir().to_path_buf();
     let path = if let Some(tenant_id) = tenant_id.as_ref() {
-        PARSEABLE
+        let tenant_dir = PARSEABLE
             .options
             .staging_dir()
-            .join(tenant_id)
-            .join(PARSEABLE_METADATA_FILE_NAME)
+            .join(tenant_id);
+        std::fs::create_dir_all(&tenant_dir)?;
+        tenant_dir.join(PARSEABLE_METADATA_FILE_NAME)
     } else {
         PARSEABLE
             .options
             .staging_dir()
             .join(PARSEABLE_METADATA_FILE_NAME)
     };
src/handlers/http/modal/ingest/ingestor_rbac.rs (2)

106-116: Roles lookup should be tenant-scoped to match other operations in these handlers.

The role existence checks at lines 109 and 153 use roles().get(r), which queries a global roles map. However, all other operations in both add_roles_to_user and remove_roles_from_user are tenant-aware: user validation, role updates, and metadata operations all use tenant_id. This inconsistency may allow roles from one tenant to be validated/applied to another.

Consider using a tenant-scoped role lookup function instead of the global roles() map, or document why role validation is intentionally global while role application is tenant-scoped.


197-221: Ingest sync handler doesn't generate a new password and writes metadata prematurely.

Line 206 calls put_staging_metadata() before reading the password hash (lines 207-219), and the metadata hasn't been modified at that point. More critically, the function clones the existing user.password_hash instead of generating a new password—compare with the working implementations in rbac.rs (line 180) and querier_rbac.rs (line 313) which both call user::Basic::gen_new_password(). The sync version should generate a new password, update metadata with the new hash, and then persist the metadata.

src/catalog/mod.rs (1)

556-569: Use conditional logic to exclude empty tenant_id from path, matching patterns elsewhere in the codebase.

The partition_path function uses map_or("", |v| v) to extract the root, passing an empty string to from_iter when tenant_id is None. This differs from similar path-building functions like mttr_json_path() and alert_json_path() at lines 1240-1246 and 1205-1214 of src/storage/object_storage.rs, which use conditional logic to exclude the tenant segment entirely when absent.

Update partition_path to follow the established pattern:

Suggested approach
pub fn partition_path(
    stream: &str,
    lower_bound: DateTime<Utc>,
    upper_bound: DateTime<Utc>,
    tenant_id: &Option<String>,
) -> RelativePathBuf {
    let lower = lower_bound.date_naive().format("%Y-%m-%d").to_string();
    let upper = upper_bound.date_naive().format("%Y-%m-%d").to_string();
    if lower == upper {
        if let Some(tenant) = tenant_id.as_ref() {
            RelativePathBuf::from_iter([tenant, stream, &format!("date={lower}")])
        } else {
            RelativePathBuf::from_iter([stream, &format!("date={lower}")])
        }
    } else {
        if let Some(tenant) = tenant_id.as_ref() {
            RelativePathBuf::from_iter([tenant, stream, &format!("date={lower}:{upper}")])
        } else {
            RelativePathBuf::from_iter([stream, &format!("date={lower}:{upper}")])
        }
    }
}
src/parseable/streams.rs (1)

1200-1270: Tests won't compile due to signature change.

Multiple test functions call Stream::new with 4 arguments, but the signature now requires 5 (adding the tenant_id: &Option<String> parameter). At least 8 test functions are affected (beyond the 4 shown below), totaling 14+ Stream::new calls that need updating.

Update all test calls to include the tenant_id parameter. For tests that don't have a tenant, pass &None::<String>:

Example fix for test_staging_new_with_valid_stream
         let staging = Stream::new(
             options.clone(),
             stream_name,
             LogStreamMetadata::default(),
             None,
+            &None::<String>,
         );
src/alerts/mod.rs (2)

1039-1140: Avoid holding self.alerts.write().await across .await in load() (startup deadlock/latency risk).

load() takes a write lock (Line 1043) and then awaits migrations and channel sends (Line 1064-1127). Even if this is “startup-only”, it can still block other alert operations and is an easy footgun later.

Refactor suggestion: parse/migrate alerts into a local Vec<(tenant_id, Box<dyn AlertTrait>, should_start_task)> without holding the lock; then:

  1. insert into self.alerts under a short write lock, and
  2. send AlertTask::Create outside the lock.

733-759: Pass tenant context to alert query parsing functions to ensure correct schema resolution in multi-tenant setups.

Alert queries are parsed without setting the tenant's default_schema, unlike the HTTP query path (line 122–126 in src/handlers/http/query.rs), which explicitly configures it. This affects:

  • get_number_of_agg_exprs() / get_aggregate_projection() in src/alerts/mod.rs (validation)
  • execute_local_query() / execute_remote_query() in src/alerts/alerts_utils.rs (execution)

In multi-tenant, unqualified table names like FROM "stream" may resolve incorrectly or fail if the default schema differs from the tenant schema. The validate() method in src/alerts/alert_types.rs has self.tenant_id available (line 39) but doesn't pass it to parsing functions.

Suggestion: modify parsing functions to accept tenant_id: &Option<String> and set session_state.config_mut().options_mut().catalog.default_schema before calling create_logical_plan(), matching the HTTP handler pattern.
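
A sketch of the session plumbing using DataFusion's standard SessionConfig API; mapping the tenant id directly to a schema name is an assumption:

use datafusion::prelude::{SessionConfig, SessionContext};

fn tenant_session(tenant_id: &Option<String>) -> SessionContext {
    let mut config = SessionConfig::new();
    if let Some(tenant) = tenant_id {
        // Unqualified table names then resolve inside the tenant's schema.
        config.options_mut().catalog.default_schema = tenant.clone();
    }
    SessionContext::new_with_config(config)
}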

src/storage/object_storage.rs (1)

896-951: Warn-level logging in hot loop is likely too noisy.

tracing::warn!(process_parquet_files_path=?path); (Line 936) and other warn logs around per-file operations will spam logs under normal ingestion. Suggest dropping to trace!/debug! or gating behind a feature.

Also applies to: 953-995

src/parseable/mod.rs (2)

210-264: Reduce warn-level logs and confirm tenant_id validation.

check_or_load_stream() logs at warn on normal control flow (Line 254, 258). This will be noisy at scale; consider debug!/trace!.

Also, tenant ids are used to partition in-memory streams and (elsewhere) object-store paths; please ensure tenant ids are validated (no /, .., etc.) at creation/extraction time.
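
One possible guard, assuming tenant ids must be single path-safe segments (the exact character set is an assumption):

fn is_valid_tenant_id(tenant: &str) -> bool {
    // A single path-safe segment: the charset rules out '/', "..", and
    // anything else that could change the object-store path shape.
    !tenant.is_empty()
        && tenant
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}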


462-537: Bug risk: missing tenant header when syncing internal streams to ingestors.

create_internal_stream_if_not_exists() creates internal streams per tenant (Line 464-491), but sync_streams_with_ingestors(...) calls don’t include the "tenant" header (Line 508-533). Since request handlers derive tenant via get_tenant_id_from_request() (header "tenant"), this likely causes ingestors to create/sync these internal streams under the default tenant instead.

Proposed fix
             let mut header_map = HeaderMap::new();
             header_map.insert(
                 HeaderName::from_str(STREAM_TYPE_KEY).unwrap(),
                 HeaderValue::from_str(&StreamType::Internal.to_string()).unwrap(),
             );
             header_map.insert(CONTENT_TYPE, HeaderValue::from_static("application/json"));
+            if let Some(t) = tenant_id.as_deref() {
+                header_map.insert(
+                    HeaderName::from_static("tenant"),
+                    HeaderValue::from_str(t).map_err(|e| StreamError::Anyhow(e.into()))?,
+                );
+            }

             // Sync only the streams that were created successfully
src/metastore/metastores/object_store_metastore.rs (3)

342-390: The tenant_id parameter is unused in put_alert_state path construction.

Line 352 calls alert_state_json_path(id) without passing tenant_id, mirroring the same issue in get_alert_state_entry. This causes all tenants to write to the same alert state file, breaking tenant isolation and causing data corruption across tenants.

🐛 Suggested fix

Update alert_state_json_path in object_storage.rs to accept tenant_id and modify this call:

-        let path = alert_state_json_path(id);
+        let path = alert_state_json_path(id, tenant_id);

541-566: get_chats is not tenant-aware, inconsistent with other similar methods.

Unlike get_dashboards, get_filters, get_correlations, etc., the get_chats method does not iterate over tenants. It reads from a single USERS_ROOT_DIR path without tenant prefix, causing all tenants' chats to be mixed together. This breaks tenant isolation.

🐛 Suggested fix to add tenant awareness
     async fn get_chats(&self) -> Result<DashMap<String, Vec<Bytes>>, MetastoreError> {
         let all_user_chats = DashMap::new();
-
-        let users_dir = RelativePathBuf::from(USERS_ROOT_DIR);
-        for user in self.storage.list_dirs_relative(&users_dir).await? {
-            if user.starts_with(".") {
-                continue;
-            }
-            let mut chats = Vec::new();
-            let chats_path = users_dir.join(&user).join("chats");
-            let user_chats = self
-                .storage
-                .get_objects(
-                    Some(&chats_path),
-                    Box::new(|file_name| file_name.ends_with(".json")),
-                )
-                .await?;
-            for chat in user_chats {
-                chats.push(chat);
+        let base_paths = PARSEABLE.list_tenants().map_or(vec!["".into()], |v| v);
+        for tenant in base_paths {
+            let users_dir = RelativePathBuf::from_iter([&tenant, USERS_ROOT_DIR]);
+            for user in self.storage.list_dirs_relative(&users_dir).await? {
+                if user.starts_with(".") {
+                    continue;
+                }
+                let mut chats = Vec::new();
+                let chats_path = users_dir.join(&user).join("chats");
+                let user_chats = self
+                    .storage
+                    .get_objects(
+                        Some(&chats_path),
+                        Box::new(|file_name| file_name.ends_with(".json")),
+                    )
+                    .await?;
+                for chat in user_chats {
+                    chats.push(chat);
+                }
+                // Consider keying by tenant+user for proper isolation
+                all_user_chats.insert(user, chats);
             }
-
-            all_user_chats.insert(user, chats);
         }
-
         Ok(all_user_chats)
     }

Note: The return type DashMap<String, Vec<Bytes>> may also need to change to support tenant-keyed results like other methods.
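
One tenant-keyed shape that would mirror the other getters (an assumption, not the final API) is an outer tenant map wrapping the per-user map:

async fn get_chats(&self) -> Result<HashMap<String, HashMap<String, Vec<Bytes>>>, MetastoreError>;
// outer key: tenant (DEFAULT_TENANT when unset), inner key: user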


852-870: Path construction issues in get_all_manifest_files when tenant_id is provided.

Two issues:

  1. Lines 852-858: The dates extraction filters out stream_name but not the tenant_id. When tenant_id is provided, path.parts() will include the tenant name as the first component, which will incorrectly be treated as a date.

  2. Lines 860-861: The date_path is constructed as format!("{}/{}", stream_name, &date) without including the tenant prefix, but the objects were listed under {tenant}/{stream_name}/.... This path mismatch will cause the second list_with_delimiter call to fail or return empty results.

🐛 Suggested fix
         let dates = resp
             .common_prefixes
             .iter()
             .flat_map(|path| path.parts())
-            .filter(|name| name.as_ref() != stream_name && name.as_ref() != STREAM_ROOT_DIRECTORY)
+            .filter(|name| {
+                let name_str = name.as_ref();
+                name_str != stream_name 
+                    && name_str != STREAM_ROOT_DIRECTORY
+                    && tenant_id.as_ref().map_or(true, |t| name_str != t.as_str())
+            })
             .map(|name| name.as_ref().to_string())
             .collect::<Vec<_>>();

         for date in dates {
-            let date_path = object_store::path::Path::from(format!("{}/{}", stream_name, &date));
+            let date_path = if let Some(tenant) = tenant_id {
+                object_store::path::Path::from(format!("{}/{}/{}", tenant, stream_name, &date))
+            } else {
+                object_store::path::Path::from(format!("{}/{}", stream_name, &date))
+            };
             let resp = self.storage.list_with_delimiter(Some(date_path)).await?;
🤖 Fix all issues with AI agents
In `@src/handlers/http/cluster/mod.rs`:
- Around line 1890-1895: The insertion using
HeaderValue::from_str(&querier.token).unwrap() can panic on invalid header
characters; replace the unwrap with a fallible conversion
(reqwest::header::HeaderValue::try_from(&querier.token)) and handle the Result
instead of panicking: validate querier.token before inserting into map (the
AUTHORIZATION header), and on Err return or propagate an appropriate error
(e.g., map to an HTTP 400/BadRequest or use the enclosing function's error type)
so the code around map and querier.token safely handles invalid tokens.

In `@src/metastore/metastores/object_store_metastore.rs`:
- Around line 621-661: The code currently calls unwrap() on
filter_value.as_object() after calling migrate_v1_v2, which can panic if
migration yields a non-object; replace these unwraps with safe checks: after
filter_value = migrate_v1_v2(filter_value) do an if let Some(obj) =
filter_value.as_object() and then extract user_id, filter_id, stream_name via
obj.get(...).and_then(...); only proceed to build path with filter_path, call
to_bytes and storage.put_object when all three are Some, otherwise log/skip the
malformed migrated value (or return a controlled error) instead of unwrapping to
avoid panics in the loop that processes filter_bytes and affects methods like
migrate_v1_v2, storage.delete_object, to_bytes, and storage.put_object.

In `@src/parseable/streams.rs`:
- Around line 1091-1100: The contains method logs a warning whenever a tenant is
missing, which noisily floods logs; change the tracing::warn! call inside
contains to a lower level (tracing::debug! or tracing::trace!) so missing
tenants during normal checks aren’t noisy, keeping the same message/context
(tenant_id and stream_name); update the log invocation in the contains function
(where LOCK_EXPECT and DEFAULT_TENANT are used) accordingly.
- Around line 1055-1078: The get_or_create path contains several development
debug logs using tracing::warn! (the multi-line call logging
stream_name/tenant_id/metadata/options, the "creating new stream- {stream_name}"
call, and the "inserted stream in mem" call); replace these with either removed
statements or downgrade them to tracing::trace! to avoid polluting production
logs while keeping low-verbosity traces, keeping the rest of the logic (tenant
lookup, Stream::new, guard.entry(...).or_default().insert(...), and the returned
stream) unchanged.

In `@src/query/mod.rs`:
- Around line 97-120: The add_tenant() flow registers tenant metadata with
TENANT_METADATA.insert_tenant(...) but never registers the tenant schema with
the query session; after calling TENANT_METADATA.insert_tenant(...) in
add_tenant(), call QUERY_SESSION.add_schema(tenant_id) (using the same
tenant_id) so the DataFusion catalog contains the new schema and runtime queries
for that tenant succeed.
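
A minimal sketch of that wiring, assuming add_tenant() owns the tenant id as a String (insert_tenant arguments elided):

    // In add_tenant(), after persisting tenant metadata:
    TENANT_METADATA.insert_tenant(/* tenant metadata */)?;
    // Register the tenant schema so DataFusion can resolve tables for it.
    QUERY_SESSION.add_schema(&tenant_id);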

In `@src/rbac/map.rs`:
- Line 303: The session expiration check in remove_expired_session is inverted:
sessions.retain(|(_, expiry)| expiry < &now) keeps expired sessions and drops
valid ones; update the predicate to retain only non-expired entries by using
expiry > &now (or expiry >= &now if you want to treat equal as valid) so
sessions.retain(|(_, expiry)| expiry > &now) preserves sessions that have not
yet expired.
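
The corrected predicate, written out:

    // Keep only sessions that have not yet expired.
    sessions.retain(|(_, expiry)| expiry > &now);
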
♻️ Duplicate comments (34)
src/handlers/http/cluster/mod.rs (4)

377-382: Tenant context not propagated to ingestors during stream sync.

The tenant_id parameter is commented out, meaning stream synchronization to ingestors lacks tenant context. This breaks tenant isolation in multi-tenant deployments. The past review flagged this issue and it remains unaddressed.


594-599: Tenant context not propagated during user creation sync.

The tenant_id parameter is commented out in sync_user_creation. While the User struct contains a tenant field that will be serialized, this was flagged in a past review - the calling code in querier_rbac.rs creates users with None tenant instead of using the extracted tenant_id.


655-658: Unused req parameter in sync_password_reset_with_ingestors.

The HttpRequest parameter is accepted but never used. Either remove it or use it to extract and propagate tenant context.


698-704: Unused req parameter in sync_role_update.

The HttpRequest parameter is accepted but not used. The tenant_id is already passed as a separate parameter, making req redundant.

src/tenants/mod.rs (1)

51-67: Data duplication causes state divergence (previously flagged).

suspended_services is stored both in TenantOverview.suspended_services and within TenantOverview.meta.suspended_services. When suspend_service or resume_service is called, only the HashSet is updated, leaving meta.suspended_services stale. Since get_tenants() returns meta for persistence, suspension state changes won't be persisted correctly.
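
A sketch of keeping the two copies in sync (field and method names follow this comment; exact types are assumptions):

pub fn suspend_service(&mut self, tenant_id: &str, service: String) -> Result<(), TenantError> {
    let overview = self
        .tenants
        .get_mut(tenant_id)
        .ok_or_else(|| TenantNotFound(tenant_id.to_owned()))?;
    overview.suspended_services.insert(service.clone());
    // Mirror the change into the persisted metadata so get_tenants() sees it.
    overview.meta.suspended_services.insert(service);
    Ok(())
}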

src/alerts/alert_types.rs (1)

91-123: Auth credential extraction always returns None (previously flagged).

The complex logic to find an admin user for the tenant (lines 91-105) correctly identifies a user with admin privileges, but the credential extraction (lines 107-118) always returns None:

  • Native user: returns None with TODO comments
  • OAuth user: logs warning and returns None

This means execute_alert_query will always receive None for auth_token, which will cause remote alert queries in Prism mode to fail authentication.

src/handlers/http/middleware.rs (2)

167-177: Handle potential panic from HeaderValue::from_str().unwrap().

If tenant_id contains characters not valid in HTTP headers (e.g., control characters), HeaderValue::from_str() will return an error and unwrap() will panic.

♻️ Suggested fix
         let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
             Ok((uid, tid)) => {
-                req.headers_mut().insert(
-                    HeaderName::from_static("tenant"),
-                    HeaderValue::from_str(&tid).unwrap(),
-                );
+                if let Ok(header_val) = HeaderValue::from_str(&tid) {
+                    req.headers_mut().insert(
+                        HeaderName::from_static("tenant"),
+                        header_val,
+                    );
+                }
                 Ok((uid, tid))
             }
             Err(e) => Err(e),
         };

304-315: Security: Consider stricter handling when tenant doesn't exist.

The check_suspension function returns Authorized when the tenant doesn't exist in TENANT_METADATA (Lines 310-312). This could allow requests to bypass tenant-level controls. The empty else branch suggests this needs further handling.

♻️ Suggested approach
 pub fn check_suspension(req: &HttpRequest, action: Action) -> rbac::Response {
     if let Some(tenant) = req.headers().get("tenant")
         && let Ok(tenant) = tenant.to_str()
     {
-        if let Ok(Some(suspension)) = TENANT_METADATA.is_action_suspended(tenant, &action) {
-            return rbac::Response::Suspended(suspension);
-        } else {
-            // tenant does not exist
+        match TENANT_METADATA.is_action_suspended(tenant, &action) {
+            Ok(Some(suspension)) => return rbac::Response::Suspended(suspension),
+            Ok(None) => {} // Tenant exists, action not suspended
+            Err(_) => {
+                tracing::warn!(tenant = tenant, "Tenant not found in metadata");
+                // Consider returning UnAuthorized for unknown tenants
+            }
         }
     }
     rbac::Response::Authorized
 }
src/correlation.rs (3)

140-144: Correlation silently not added to memory when tenant bucket doesn't exist.

In create(), if the tenant bucket doesn't exist in the in-memory map, the correlation is persisted to metastore but not added to the in-memory cache. This could cause inconsistencies.

🐛 Proposed fix using entry API
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(correlation.id.to_owned(), correlation.clone());
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(correlation.id.to_owned(), correlation.clone());

176-183: Same issue: update() silently fails when tenant bucket doesn't exist.

Apply the same fix using the entry API to ensure the correlation is always inserted.

🐛 Proposed fix
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
         // Update in memory
-        if let Some(corrs) = self.write().await.get_mut(tenant) {
-            corrs.insert(
-                updated_correlation.id.to_owned(),
-                updated_correlation.clone(),
-            );
-        }
+        self.write()
+            .await
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(updated_correlation.id.to_owned(), updated_correlation.clone());

205-211: Critical bug: remove operates on wrong map level, deletes entire tenant's correlations.

Line 211 calls self.write().await.remove(&correlation.id) which removes an entry from the outer HashMap<String, CorrelationMap> using correlation.id as the key. This should remove from the inner CorrelationMap for the specific tenant.

🐛 Proposed fix
         // Delete from storage
         PARSEABLE
             .metastore
             .delete_correlation(&correlation, tenant_id)
             .await?;

         // Delete from memory
-        self.write().await.remove(&correlation.id);
+        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v.as_str());
+        if let Some(corrs) = self.write().await.get_mut(tenant) {
+            corrs.remove(&correlation.id);
+        }

         Ok(())
src/handlers/http/modal/ingest/ingestor_rbac.rs (1)

40-67: Inverted tenant validation logic.

The condition at Lines 51-53 checks for equality when it should check for inequality. The error message indicates this should prevent cross-tenant operations, but the current logic prevents same-tenant operations instead.

-        if req_tenant.ne(DEFAULT_TENANT)
-            && (req_tenant.eq(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
+        if req_tenant.ne(DEFAULT_TENANT)
+            && (req_tenant.ne(user.tenant.as_ref().map_or(DEFAULT_TENANT, |v| v)))
src/handlers/http/modal/query/querier_rbac.rs (3)

79-79: User created without tenant association.

Despite extracting tenant_id from the request at line 50 and using it for metadata operations, User::new_basic is still called with None for the tenant parameter. This creates users without tenant association, breaking multi-tenant isolation.

🐛 Proposed fix
-    let (user, password) = user::User::new_basic(username.clone(), None);
+    let (user, password) = user::User::new_basic(username.clone(), tenant_id.clone());

163-163: Tenant context missing in user deletion sync.

sync_user_deletion_with_ingestors(&userid) sends only userid to ingestors without tenant context. In multi-tenant deployments, ingestors may delete users from the wrong tenant if the same userid exists across tenants.


223-223: Tenant context missing in role sync operations.

sync_users_with_roles_with_ingestors at lines 223 and 295 doesn't receive tenant_id. This follows the same pattern as the user deletion sync issue—ingestors won't know which tenant's user to update.

Also applies to: 295-295
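
The call sites would then thread the tenant through, along the lines of (argument names here are illustrative, not the actual signature):

    sync_users_with_roles_with_ingestors(&username, &roles, &tenant_id).await?;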

src/query/stream_schema_provider.rs (4)

282-293: Same incomplete tenant implementation and unwrap issue in staging path.

This duplicates the pattern from the hot tier path—commented-out tenant logic and unwrap() on ObjectStoreUrl::parse().


529-534: Reduce logging level from warn to debug or trace.

This logging statement runs on every table scan and outputs schema/tenant/stream info. Using warn level will flood logs in production.

♻️ Suggested fix
-        tracing::warn!(
+        tracing::debug!(
             "entered scan with\ntenant- {:?}\nschema- {:?}\nstream- {}",
             self.tenant_id,
             self.schema,
             self.stream
         );

639-648: Incomplete tenant implementation and noisy logging in object store scan path.

The tenant-aware URL construction is commented out (lines 639-644), and tracing::warn! at line 645 will be noisy in production. The unwrap() at line 648 should use proper error handling since scan() returns Result.

🐛 Proposed fix
-        // let object_store_url = if let Some(tenant_id) = self.tenant_id.as_ref() {
-        //     glob_storage.store_url().join(tenant_id).unwrap()
-        // } else {
-        //     glob_storage.store_url()
-        // };
+        // TODO: Enable tenant-aware paths once object store supports multi-tenancy
         let object_store_url = glob_storage.store_url();
-        tracing::warn!(object_store_url=?object_store_url);
+        tracing::trace!(object_store_url=?object_store_url);
         self.create_parquet_physical_plan(
             &mut execution_plans,
-            ObjectStoreUrl::parse(object_store_url).unwrap(),
+            ObjectStoreUrl::parse(&object_store_url)
+                .map_err(|e| DataFusionError::Plan(format!("Invalid object store URL: {e}")))?,

224-232: Incomplete tenant-aware object store URL construction in hot tier and other execution paths.

The struct has a tenant_id field but the hot tier execution plan uses hardcoded "file:///" instead. This pattern repeats in other execution paths (lines 284-293, 631-648). Additionally, ObjectStoreUrl::parse(...).unwrap() should use proper error handling instead of panic-on-error.

  • Apply tenant-aware path construction consistently where self.tenant_id is available, or document if global object store isolation is intentional.
  • Replace .unwrap() with .map_err() to convert parsing errors into Result for proper error propagation.
src/handlers/http/modal/ingest/ingestor_role.rs (1)

46-52: Inverted tenant validation logic.

The condition req_tenant.ne(DEFAULT_TENANT) && (req_tenant.eq(&sync_req.tenant_id)) rejects requests when the request tenant matches the payload tenant. Based on the error message ("non super-admin user trying to create role for another tenant"), the second condition should check for a mismatch instead.

🐛 Proposed fix
-    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.eq(&sync_req.tenant_id)) {
+    if req_tenant.ne(DEFAULT_TENANT) && (req_tenant.ne(&sync_req.tenant_id)) {
         return Err(RoleError::Anyhow(anyhow::Error::msg(
             "non super-admin user trying to create role for another tenant",
         )));
     }
src/handlers/http/oidc.rs (3)

102-116: BasicAuth flow uses incorrect tenant lookup.

For SessionKey::BasicAuth, get_tenant_id_from_key() returns None because basic credentials have no pre-established session. This causes Users.get_user(&username, &tenant_id) at line 115 to default to DEFAULT_TENANT, breaking multi-tenant support for basic-auth users. Use get_tenant_id_from_request(&req) instead.

🐛 Proposed fix
-    let tenant_id = get_tenant_id_from_key(&session_key);
     match session_key {
         // We can exchange basic auth for session cookie
-        SessionKey::BasicAuth { username, password } => match Users.get_user(&username, &tenant_id)
-        {
+        SessionKey::BasicAuth { username, password } => {
+            let tenant_id = get_tenant_id_from_request(&req);
+            match Users.get_user(&username, &tenant_id)
+        {

130-160: Intra-cluster login sync should treat non-2xx as failure.

reqwest::send() succeeds on non-2xx responses, so the sync can silently fail. Consider using .error_for_status() and logging per-node failures.

🐛 Proposed fix
                         INTRA_CLUSTER_CLIENT
                             .post(url)
                             .header(header::AUTHORIZATION, node.token)
                             .header(header::CONTENT_TYPE, "application/json")
                             .json(&json!(
                                 {
                                     "sessionCookie": _session,
                                     "user": _user,
                                     "expiry": EXPIRY_DURATION
                                 }
                             ))
                             .send()
-                            .await?;
+                            .await?
+                            .error_for_status()?;
                         Ok::<(), anyhow::Error>(())

325-326: User created without tenant association.

The comment "LET TENANT BE NONE FOR NOW!!!" at line 325 indicates new OAuth users are intentionally created without tenant context, despite tenant_id being available. This creates a multi-tenant isolation gap.

Replace with a proper TODO comment with tracking reference, or pass tenant_id:

-        // LET TENANT BE NONE FOR NOW!!!
-        (None, roles) => put_user(&user_id, roles, user_info, bearer, None).await?,
+        // TODO(multi-tenancy): Pass tenant_id once OIDC user creation supports it
+        (None, roles) => put_user(&user_id, roles, user_info, bearer, tenant_id).await?,
src/handlers/http/role.rs (2)

154-169: Persist default-role metadata before mutating DEFAULT_ROLE (consistency).

Currently DEFAULT_ROLE is updated before put_metadata() (Line 162-168). If persistence fails, in-memory and storage diverge. Also, write().unwrap() can panic on poisoning.

This matches prior feedback.

Proposed fix
 pub async fn put_default(
     req: HttpRequest,
     name: web::Json<String>,
 ) -> Result<impl Responder, RoleError> {
     let name = name.into_inner();
     let tenant_id = get_tenant_id_from_request(&req);
     let mut metadata = get_metadata(&tenant_id).await?;
     metadata.default_role = Some(name.clone());
-    DEFAULT_ROLE.write().unwrap().insert(
-        tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v).to_owned(),
-        Some(name),
-    );
-    // *DEFAULT_ROLE.lock().unwrap() = Some(name);
     put_metadata(&metadata, &tenant_id).await?;
+    DEFAULT_ROLE
+        .write()
+        .expect("failed to acquire DEFAULT_ROLE write lock")
+        .insert(
+            tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v).to_owned(),
+            Some(name),
+        );
     Ok(HttpResponse::Ok().finish())
 }

173-193: Remove commented-out block in get_default.

This matches prior feedback.

src/query/mod.rs (1)

173-218: Don’t ignore register_schema errors (and remove stale commented code).

let _ = catalog.register_schema(...) (Line 191, 201-204) silently drops failures; if registration fails, later queries can fail in surprising ways. Also, the large commented-out block (Line 207-215) should be removed.

This matches prior feedback.
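
A sketch of propagating the failure instead of discarding it (the error mapping is illustrative):

    catalog
        .register_schema(tenant, schema_provider)
        .map_err(|e| anyhow::anyhow!("failed to register schema for tenant {tenant}: {e}"))?;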

src/alerts/mod.rs (1)

1245-1255: Ensure tenant buckets are created on update/write paths (don’t drop writes).

update(), the write-back in update_state(), and update_notification_state() only update if get_mut(tenant) returns Some(_) (Line 1247-1250, 1338-1341, 1381-1383). For first-time tenants / races with initialization, this loses updates.

This matches prior feedback.

Proposed fix pattern
-        let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-        if let Some(alerts) = self.alerts.write().await.get_mut(tenant) {
-            alerts.insert(*alert.get_id(), alert.clone_box());
-        }
+        let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
+        let mut guard = self.alerts.write().await;
+        guard
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(*alert.get_id(), alert.clone_box());

Also applies to: 1334-1343, 1349-1387

src/parseable/mod.rs (4)

1056-1075: Fix TOCTOU race in add_tenant() (check+insert must be atomic).

This matches prior feedback.
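
A sketch of making the check and insert atomic by doing both under a single write-lock acquisition (the helper name and error handling are assumptions):

fn add_tenant_in_mem(&self, tenant_id: &str) -> anyhow::Result<()> {
    // Hold the write lock across the existence check and the insert so no
    // concurrent caller can slip in between the two steps.
    let mut tenants = self
        .tenants
        .write()
        .map_err(|_| anyhow::anyhow!("tenants lock poisoned"))?;
    if tenants.iter().any(|t| t == tenant_id) {
        anyhow::bail!("tenant {tenant_id} already exists");
    }
    tenants.push(tenant_id.to_owned());
    Ok(())
}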


1115-1143: delete_tenant() leaves self.tenants inconsistent (tenant still listed).

This matches prior feedback.
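
A companion sketch for deletion that prunes the in-memory list only after the metastore delete succeeds (names are assumptions):

async fn delete_tenant(&self, tenant_id: &str) -> anyhow::Result<()> {
    self.metastore.delete_tenant(tenant_id).await?;
    self.tenants
        .write()
        .map_err(|_| anyhow::anyhow!("tenants lock poisoned"))?
        .retain(|t| t != tenant_id);
    Ok(())
}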


1145-1181: load_tenants() logic is incomplete and lock failures are swallowed.

  • The else if !is_multi_tenant {} branch (Line 1167-1168) does nothing, but the function-level comment says startup should fail when multi-tenant traces exist but the flag is off.
  • Returning Ok(None) on poisoned lock (Line 1175-1180) masks a fundamental failure (better to Err/panic). Based on learnings, fail fast on critical state persistence/locking issues.

This matches prior feedback.


1183-1190: Don’t silently return None on tenants lock failure.

This matches prior feedback.

src/metastore/metastores/object_store_metastore.rs (3)

323-340: The tenant_id parameter is unused in get_alert_state_entry.

Line 328 calls alert_state_json_path(*alert_id) without passing tenant_id. This breaks tenant isolation as all tenants would read from the same alert state path. The alert_state_json_path function should be updated to accept tenant_id and construct tenant-scoped paths, matching the pattern used by mttr_json_path.
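
A sketch of the tenant-aware helper, following the conditional-prefix pattern described for mttr_json_path (the "states" directory and constants are assumptions about the current layout):

pub fn alert_state_json_path(alert_id: Ulid, tenant_id: &Option<String>) -> RelativePathBuf {
    let file_name = format!("{alert_id}.json");
    if let Some(tenant) = tenant_id.as_deref() {
        RelativePathBuf::from_iter([tenant, ALERTS_ROOT_DIRECTORY, "states", &file_name])
    } else {
        RelativePathBuf::from_iter([ALERTS_ROOT_DIRECTORY, "states", &file_name])
    }
}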


487-510: Bug: Dashboard HashMap overwrites entries for each user within same tenant.

The loop at lines 492-506 iterates over users within a tenant, but line 504 always inserts with the same tenant key, overwriting the previous user's dashboards. Only the last user's dashboards will be retained per tenant.

🐛 Suggested fix to accumulate dashboards correctly
     async fn get_dashboards(&self) -> Result<HashMap<String, Vec<Bytes>>, MetastoreError> {
         let mut dashboards = HashMap::new();
         let base_paths = PARSEABLE.list_tenants().map_or(vec!["".into()], |v| v);
         for mut tenant in base_paths {
             let users_dir = RelativePathBuf::from_iter([&tenant, USERS_ROOT_DIR]);
+            let mut tenant_dashboards = Vec::new();
             for user in self.storage.list_dirs_relative(&users_dir).await? {
                 let dashboards_path = users_dir.join(&user).join("dashboards");
                 let dashboard_bytes = self
                     .storage
                     .get_objects(
                         Some(&dashboards_path),
                         Box::new(|file_name| file_name.ends_with(".json")),
                     )
                     .await?;
-                if tenant.eq(&mut "") {
-                    tenant.clone_from(&DEFAULT_TENANT.to_string());
-                }
-                dashboards.insert(tenant.to_owned(), dashboard_bytes);
+                tenant_dashboards.extend(dashboard_bytes);
             }
+            if tenant.eq(&mut "") {
+                tenant.clone_from(&DEFAULT_TENANT.to_string());
+            }
+            dashboards.insert(tenant, tenant_dashboards);
         }
         Ok(dashboards)
     }

1229-1244: Missing tenant prefix filter in stream extraction.

When tenant_id is provided, list_with_delimiter returns paths with the full tenant prefix (e.g., "tenant/stream1/"). The flat_map(|path| path.parts()) extracts all path components including the tenant name. The filter at lines 1237-1243 doesn't exclude the tenant prefix, causing the tenant ID to be incorrectly added to the stream list.

🐛 Suggested fix
             let streams = resp
                 .common_prefixes
                 .iter()
                 .flat_map(|path| {
                     path.parts()
                 })
                 .map(|name| name.as_ref().to_string())
                 .filter(|name| {
                     name != PARSEABLE_ROOT_DIRECTORY
                         && name != USERS_ROOT_DIR
                         && name != SETTINGS_ROOT_DIRECTORY
                         && name != ALERTS_ROOT_DIRECTORY
+                        && tenant_id.as_ref().map_or(true, |t| name != t)
                 })
                 .collect::<Vec<_>>();
🧹 Nitpick comments (22)
src/handlers/airplane.rs (1)

186-198: Consider a more informative error message for suspended users.

The handling of rbac::Response::Suspended is correct for completeness, but the error message "Suspended" is terse. Consider providing more context:

-            rbac::Response::Suspended(_) => return Err(Status::permission_denied("Suspended")),
+            rbac::Response::Suspended(msg) => {
+                return Err(Status::permission_denied(format!("User account suspended: {msg}")));
+            }
src/prism/home/mod.rs (1)

101-110: Consider extracting tenant_id without the reference-to-temporary pattern.

The pattern &get_tenant_id_from_key(key) relies on temporary lifetime extension. While valid, it's more conventional to bind the owned value first:

let tenant_id = get_tenant_id_from_key(key);
// then pass &tenant_id where needed

Also note that get_alerts_summary(key) on line 108 doesn't receive tenant_id. If alerts should be tenant-scoped for consistency with streams, this may need updating.

src/rbac/map.rs (3)

147-155: Remove commented-out code blocks.

Multiple sections of commented-out code remain throughout this file (lines 127-131, 147-155, 446-450, 465-467, 485-487, 496-498, 525-529). These make the code harder to read and should be removed before merging.


62-69: Unnecessary braces around function bodies.

The extra braces in users() and roles() functions serve no purpose and can be removed.

♻️ Suggested cleanup
 pub fn users() -> RwLockReadGuard<'static, Users> {
-    {
-        USERS
-            .get()
-            .expect("map is set")
-            .read()
-            .expect("not poisoned")
-    }
+    USERS
+        .get()
+        .expect("map is set")
+        .read()
+        .expect("not poisoned")
 }

274-290: Remove commented debug logging that previously leaked credentials.

The debug logging was correctly disabled to prevent credential leakage, but the commented-out code should be removed entirely rather than left in place.

♻️ Suggested cleanup
     pub fn remove_user(&mut self, username: &str, tenant_id: &str) {
-        // tracing::warn!("removing user- {username}, tenant_id- {tenant_id}");
-        // tracing::warn!("active sessions- {:?}", self.active_sessions);
-        // tracing::warn!("user sessions- {:?}", self.user_sessions);
         let sessions = if let Some(tenant_sessions) = self.user_sessions.get_mut(tenant_id) {
-            // tracing::warn!("found session for tenant- {tenant_id}");
             tenant_sessions.remove(username)
         } else {
-            // tracing::warn!("not found session for tenant- {tenant_id}");
             None
         };
         if let Some(sessions) = sessions {
-            // tracing::warn!("found active sessions for user {username}-   {sessions:?}");
             sessions.into_iter().for_each(|(key, _)| {
                 self.active_sessions.remove(&key);
             })
         }
     }
src/rbac/mod.rs (1)

119-125: Simplify match with if let or remove empty arm.

The None => {} arm is unnecessary and can be simplified.

♻️ Suggested cleanup
     fn remove_user(&mut self, userid: &str, tenant_id: &str) {
-        match mut_users().get_mut(tenant_id) {
-            Some(users) => {
-                users.remove(userid);
-            }
-            None => {}
+        if let Some(users) = mut_users().get_mut(tenant_id) {
+            users.remove(userid);
         }
     }
src/handlers/http/query.rs (1)

118-125: Duplicate call to get_tenant_id_from_request.

get_tenant_id_from_request(&req) is called at line 118 and again at line 120. Extract it once and reuse.

♻️ Suggested cleanup
+    let tenant_id = get_tenant_id_from_request(&req);
     // check or load streams in memory
-    create_streams_for_distributed(tables.clone(), &get_tenant_id_from_request(&req)).await?;
-
-    let tenant_id = get_tenant_id_from_request(&req);
+    create_streams_for_distributed(tables.clone(), &tenant_id).await?;
     session_state
         .config_mut()
src/handlers/livetail.rs (1)

119-122: Livetail doesn't propagate tenant context.

get_stream is called with &None for tenant_id, meaning livetail operates without tenant awareness. If multi-tenancy requires tenant isolation for livetail, the tenant context should be extracted from the request (similar to query handlers).

Is tenant-aware livetail required for this PR? If so, consider extracting tenant_id from the request metadata similar to how it's done in HTTP handlers.
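
If it is, a sketch of reading the tenant from the gRPC request metadata, mirroring the HTTP "tenant" header convention (the metadata key for livetail is an assumption):

    // Pull the tenant out of tonic request metadata, defaulting to None.
    let tenant_id: Option<String> = req
        .metadata()
        .get("tenant")
        .and_then(|v| v.to_str().ok())
        .map(str::to_owned);
    let stream = PARSEABLE.get_stream(&stream_name, &tenant_id)?;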

src/handlers/http/modal/query/querier_role.rs (1)

57-61: Remove commented-out dead code.

Line 61 contains a commented-out line that is no longer needed since the tenant-scoped insertion on lines 57-60 replaces it.

♻️ Suggested cleanup
     mut_roles()
         .entry(tenant.to_owned())
         .or_default()
         .insert(name.clone(), privileges.clone());
-    // mut_roles().insert(name.clone(), privileges.clone());
src/tenants/mod.rs (2)

89-91: Redundant return keyword.

The explicit return on line 90 is unnecessary in Rust when it's the last expression in a branch.

♻️ Suggested fix
         } else {
-            return Err(TenantNotFound(tenant_id.to_owned()));
+            Err(TenantNotFound(tenant_id.to_owned()))
         }

106-153: Remove large block of commented-out code.

This 48-line commented block adds noise and should be removed. If this code is intended for future use, consider tracking it in an issue instead.

src/catalog/mod.rs (1)

460-460: Debug logging at warn level should be reduced.

This tracing::warn! appears to be debug/development logging. Consider changing to trace! or debug! level, or removing it before merge.

-    tracing::warn!("manifest path_url= {path_url}");
+    tracing::debug!("manifest path_url= {path_url}");
src/parseable/streams.rs (1)

1031-1033: Remove commented-out code.

Multiple blocks of commented-out code remain from the refactoring. These should be removed to improve code clarity.

Also applies to: 1066-1069, 1088-1088, 1128-1135, 1179-1185

src/handlers/http/role.rs (1)

41-89: Tenant propagation looks consistent; consider avoiding tenant_id shadowing + validate tenant header.

This handler does the right thing by persisting metadata before updating mut_roles() (Line 54-60). Two nits:

  • Shadowing tenant_id from Option<String> to &str (Line 56) is a bit error-prone; consider tenant_key.
  • get_tenant_id_from_request() (used on Line 47) currently unwrap()s header UTF-8 per the snippet; that’s a crash vector, and tenant ids also need validation (no /, .., etc.) since they are used as storage path prefixes elsewhere. A validation sketch follows below.
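
A sketch of the kind of validation meant here; the allowed character set is a policy choice, not the crate's current rule:

fn is_valid_tenant_id(tenant_id: &str) -> bool {
    // Alphanumerics plus '-' and '_' cannot produce path separators or "..",
    // so a valid id cannot escape its storage prefix.
    !tenant_id.is_empty()
        && tenant_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
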
src/query/mod.rs (3)

76-121: Drop commented-out legacy QUERY_SESSION + consider lock choice for SessionContext.

  • The commented-out QUERY_SESSION (Line 76-78) should be removed before merge.
  • std::sync::RwLock is probably fine here since you don’t hold guards across .await, but it’s worth confirming this won’t become a contention point under query load.

280-376: Minor: avoid repeated get_ctx() calls inside Query::execute().

You can grab let ctx = QUERY_SESSION.get_ctx(); once and reuse it for execute_logical_plan, state(), and task_ctx() to avoid repeated lock+clone.


946-1021: Consider tenant-scoping metrics or drop unused tenant_id field.

PartitionedMetricMonitor stores tenant_id (Line 954-969) but check_if_last_stream() doesn’t use it; either wire it into metrics labels (if desired) or remove the field to keep intent clear.

src/storage/object_storage.rs (2)

1239-1246: Minor: mttr_json_path() can avoid &tenant double-ref.

RelativePathBuf::from_iter([&tenant, ...]) (Line 1242-1243) can just use tenant.as_str() for clarity.


1070-1087: Use conditional pattern to avoid empty string components in path construction.

schema_path() and stream_json_path() build tenant = "" via map_or("", |v| v) then pass it to RelativePathBuf::from_iter([tenant, ...]). While the relative-path crate normalizes empty string components, the safer and more explicit pattern—already used for alert_json_path() and mttr_json_path() in the same file—is to conditionally include the tenant only when Some.

Refactor to match the existing pattern:

Proposed fix
 pub fn schema_path(stream_name: &str, tenant_id: &Option<String>) -> RelativePathBuf {
-    let tenant = tenant_id.as_ref().map_or("", |v| v);
     if PARSEABLE.options.mode == Mode::Ingest {
         ...
-        RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        if let Some(tenant) = tenant_id.as_deref() {
+            RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        } else {
+            RelativePathBuf::from_iter([stream_name, STREAM_ROOT_DIRECTORY, &file_name])
+        }
     } else {
-        RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        if let Some(tenant) = tenant_id.as_deref() {
+            RelativePathBuf::from_iter([tenant, stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        } else {
+            RelativePathBuf::from_iter([stream_name, STREAM_ROOT_DIRECTORY, SCHEMA_FILE_NAME])
+        }
     }
 }

Also applies to: stream_json_path() (lines 1166–1182)

src/metastore/metastores/object_store_metastore.rs (2)

460-484: Consider using tenant_id parameter directly in put_llmconfig and delete_llmconfig.

Both methods accept tenant_id but rely entirely on obj.get_object_path() for the storage path. This creates an implicit contract that the object must be constructed with the correct tenant-prefixed path. While functional, this pattern is inconsistent with methods like put_keystone which explicitly construct paths using tenant_id.


1100-1105: Consider validating tenant_id before deletion.

The delete_tenant method accepts any &str including empty string. An empty tenant_id would create a path from "", potentially deleting more than intended depending on how delete_prefix handles empty paths.

🛡️ Suggested validation
     async fn delete_tenant(&self, tenant_id: &str) -> Result<(), MetastoreError> {
+        if tenant_id.is_empty() {
+            return Err(MetastoreError::Error {
+                status_code: StatusCode::BAD_REQUEST,
+                message: "tenant_id cannot be empty".into(),
+                flow: "delete_tenant".into(),
+            });
+        }
         self.storage
             .delete_prefix(&RelativePathBuf::from(tenant_id))
             .await
             .map_err(MetastoreError::ObjectStorageError)
     }
src/metastore/metastore_traits.rs (1)

171-182: Note: get_chats has different return type pattern.

Unlike other similar methods that return HashMap<String, Vec<Bytes>> for tenant-keyed results, get_chats returns DashMap<String, Vec<Bytes>> keyed by user, not tenant. This asymmetry is compounded by the implementation not iterating over tenants (see implementation file review). Consider whether get_chats should follow the same tenant-keyed pattern.

Comment on lines +1890 to +1895
    } else {
        map.insert(
            reqwest::header::AUTHORIZATION,
            reqwest::header::HeaderValue::from_str(&querier.token).unwrap(),
        );
    };

⚠️ Potential issue | 🟡 Minor

Potential panic if querier token contains invalid header characters.

HeaderValue::from_str(&querier.token).unwrap() will panic if the token contains non-ASCII or control characters. Consider using try_from with proper error handling.

🐛 Proposed fix
     } else {
-        map.insert(
-            reqwest::header::AUTHORIZATION,
-            reqwest::header::HeaderValue::from_str(&querier.token).unwrap(),
-        );
+        match reqwest::header::HeaderValue::from_str(&querier.token) {
+            Ok(val) => { map.insert(reqwest::header::AUTHORIZATION, val); }
+            Err(e) => {
+                mark_querier_available(&domain_name).await;
+                return Err(QueryError::Anyhow(anyhow::anyhow!("Invalid token header: {}", e)));
+            }
+        }
     };

Comment on lines +621 to +661
                for filter in filter_bytes {
                    // deserialize into Value
                    let mut filter_value =
                        serde_json::from_slice::<serde_json::Value>(&filter)?;

                    if let Some(meta) = filter_value.clone().as_object() {
                        let version = meta.get("version").and_then(|version| version.as_str());

                        if version == Some("v1") {
                            // delete older version of the filter
                            self.storage.delete_object(&filters_path).await?;

                            filter_value = migrate_v1_v2(filter_value);
                            let user_id = filter_value
                                .as_object()
                                .unwrap()
                                .get("user_id")
                                .and_then(|user_id| user_id.as_str());
                            let filter_id = filter_value
                                .as_object()
                                .unwrap()
                                .get("filter_id")
                                .and_then(|filter_id| filter_id.as_str());
                            let stream_name = filter_value
                                .as_object()
                                .unwrap()
                                .get("stream_name")
                                .and_then(|stream_name| stream_name.as_str());

                            // if these values are present, create a new file
                            if let (Some(user_id), Some(stream_name), Some(filter_id)) =
                                (user_id, stream_name, filter_id)
                            {
                                let path = filter_path(
                                    user_id,
                                    stream_name,
                                    &format!("{filter_id}.json"),
                                );
                                let filter_bytes = to_bytes(&filter_value);
                                self.storage.put_object(&path, filter_bytes.clone()).await?;
                            }

⚠️ Potential issue | 🟡 Minor

Potential panics from .unwrap() calls during filter migration.

Lines 636, 639, 642, 645, and 648 use .unwrap() on filter_value.as_object() results. While these are guarded by the outer if let Some(meta) = filter_value.clone().as_object() check, after migrate_v1_v2 transforms the value, there's no guarantee the result is still a valid object. If migration produces an unexpected structure, this will panic.

🛡️ Suggested defensive approach
                             filter_value = migrate_v1_v2(filter_value);
-                            let user_id = filter_value
-                                .as_object()
-                                .unwrap()
-                                .get("user_id")
-                                .and_then(|user_id| user_id.as_str());
-                            let filter_id = filter_value
-                                .as_object()
-                                .unwrap()
-                                .get("filter_id")
-                                .and_then(|filter_id| filter_id.as_str());
-                            let stream_name = filter_value
-                                .as_object()
-                                .unwrap()
-                                .get("stream_name")
-                                .and_then(|stream_name| stream_name.as_str());
+                            let migrated_obj = filter_value.as_object();
+                            let user_id = migrated_obj
+                                .and_then(|o| o.get("user_id"))
+                                .and_then(|v| v.as_str());
+                            let filter_id = migrated_obj
+                                .and_then(|o| o.get("filter_id"))
+                                .and_then(|v| v.as_str());
+                            let stream_name = migrated_obj
+                                .and_then(|o| o.get("stream_name"))
+                                .and_then(|v| v.as_str());

Comment on lines +1055 to 1078
        tracing::warn!(
            "get_or_create\nstream- {stream_name}\ntenant- {tenant_id:?}\nmetadata- {metadata:?}\noptions- {options:?}"
        );
        let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);

        if let Some(tenant_streams) = guard.get(tenant)
            && let Some(stream) = tenant_streams.get(&stream_name)
        {
            return stream.clone();
        }

        let stream = Stream::new(options, &stream_name, metadata, ingestor_id);
        guard.insert(stream_name, stream.clone());

        // if let Some(stream) = guard.get(&stream_name) {
        //     return stream.clone();
        // }
        // guard.insert(stream_name, stream.clone());

        let stream = Stream::new(options, &stream_name, metadata, ingestor_id, tenant_id);
        tracing::warn!("creating new stream- {stream_name}");
        guard
            .entry(tenant.to_owned())
            .or_default()
            .insert(stream_name, stream.clone());
        tracing::warn!("inserted stream in mem");
        stream

⚠️ Potential issue | 🟡 Minor

Remove debug logging statements.

Multiple tracing::warn! calls (lines 1055-1057, 1072, 1077) are used for debugging during development. These will flood production logs and should be removed or reduced to trace level.

♻️ Suggested fix
-        tracing::warn!(
-            "get_or_create\nstream- {stream_name}\ntenant- {tenant_id:?}\nmetadata- {metadata:?}\noptions- {options:?}"
-        );
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
 
         if let Some(tenant_streams) = guard.get(tenant)
@@ -1069,9 +1066,6 @@
 
         let stream = Stream::new(options, &stream_name, metadata, ingestor_id, tenant_id);
-        tracing::warn!("creating new stream- {stream_name}");
         guard
             .entry(tenant.to_owned())
             .or_default()
             .insert(stream_name, stream.clone());
-        tracing::warn!("inserted stream in mem");
         stream

Comment on lines +1091 to +1100
    pub fn contains(&self, stream_name: &str, tenant_id: &Option<String>) -> bool {
        let tenant_id = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
        if let Some(tenant) = self.read().expect(LOCK_EXPECT).get(tenant_id) {
            tenant.contains_key(stream_name)
        } else {
            tracing::warn!(
                "Tenant with id {tenant_id} does not exist! Shouldn't happen (stream- {stream_name})"
            );
            false
        }

⚠️ Potential issue | 🟡 Minor

Noisy warning for missing tenant.

The warn! at lines 1096-1098 triggers on every contains check when a tenant doesn't exist. This could flood logs during normal operation (e.g., checking streams before tenant is registered). Consider using debug or trace level.

♻️ Suggested fix
         } else {
-            tracing::warn!(
+            tracing::debug!(
                 "Tenant with id {tenant_id} does not exist! Shouldn't happen (stream- {stream_name})"
             );
             false
         }

Comment on lines +97 to +120
impl InMemorySessionContext {
    pub fn get_ctx(&self) -> SessionContext {
        let ctx = self
            .session_context
            .read()
            .expect("SessionContext should be readable");
        ctx.clone()
    }

    pub fn add_schema(&self, tenant_id: &str) {
        self.session_context
            .write()
            .expect("SessionContext should be writeable")
            .catalog("datafusion")
            .expect("Default catalog should be available")
            .register_schema(
                tenant_id,
                Arc::new(GlobalSchemaProvider {
                    storage: PARSEABLE.storage().get_object_store(),
                    tenant_id: Some(tenant_id.to_owned()),
                }),
            )
            .expect("Should be able to register new schema");
    }

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain (collapsed)

The chain ran a long series of repository searches (rg, sed, fd) across src/query/mod.rs, src/parseable/mod.rs, src/tenants/mod.rs, and the handlers tree, tracing where add_schema, add_tenant, QUERY_SESSION, TENANT_METADATA, and GlobalSchemaProvider are defined and invoked. The searches support the critical finding already captured in the AI-agent prompt above: add_tenant() registers tenant metadata, but nothing ever calls QUERY_SESSION.add_schema() for the new tenant, so the DataFusion catalog never gains the tenant's schema.

Length of output: 47


🏁 Script executed:

# Get full view of add_tenant function implementation
sed -n '1056,1074p' src/parseable/mod.rs

Repository: parseablehq/parseable

Length of output: 708


🏁 Script executed:

# Check if there's anywhere that calls add_schema after tenant creation
rg -n "add_schema|QUERY_SESSION\.add" --type rust

Repository: parseablehq/parseable

Length of output: 248


🏁 Script executed:

# Verify the initialization flow - check what PARSEABLE.list_tenants returns
rg -n "list_tenants" --type rust -B 2 -A 5

Repository: parseablehq/parseable

Length of output: 13841


🏁 Script executed:

# Check how GlobalSchemaProvider is used in query execution
rg -n "GlobalSchemaProvider\|get_schema\|table_names" src/query/stream_schema_provider.rs --type rust | head -30

Repository: parseablehq/parseable

Length of output: 47


🏁 Script executed:

# Look at the DataFusion catalog/schema resolution
rg -n "catalog\|schema.*resolve" src/query/mod.rs --type rust -B 2 -A 2

Repository: parseablehq/parseable

Length of output: 47


Add schema registration when tenants are created at runtime.

The add_tenant() function registers tenant metadata but does not register the tenant's schema with QUERY_SESSION. When a new tenant is added at runtime, queries for that tenant will fail because the catalog will have no schema for it. Call QUERY_SESSION.add_schema(tenant_id) after TENANT_METADATA.insert_tenant() in the add_tenant() function.

🤖 Prompt for AI Agents
In `@src/query/mod.rs` around lines 97 - 120, The add_tenant() flow registers
tenant metadata with TENANT_METADATA.insert_tenant(...) but never registers the
tenant schema with the query session; after calling
TENANT_METADATA.insert_tenant(...) in add_tenant(), call
QUERY_SESSION.add_schema(tenant_id) (using the same tenant_id) so the DataFusion
catalog contains the new schema and runtime queries for that tenant succeed.
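
For reference, a minimal sketch of the suggested ordering inside add_tenant(); the signature, error type, and method names here mirror the review comment rather than verified project APIs, and only the ordering is the point:

// Sketch only: names follow the review comment, not confirmed signatures.
pub async fn add_tenant(&self, tenant_id: &str) -> Result<(), TenantError> {
    // ... existing validation and persistence ...
    TENANT_METADATA.insert_tenant(tenant_id)?;
    // Register the new tenant's schema so the DataFusion catalog can
    // resolve queries for this tenant without a restart.
    QUERY_SESSION.add_schema(tenant_id);
    Ok(())
}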

        } else {
            return;
        };
        sessions.retain(|(_, expiry)| expiry < &now);

⚠️ Potential issue | 🔴 Critical

Critical bug: Session expiration logic is inverted.

The remove_expired_session method retains sessions where expiry < &now, but this keeps expired sessions and removes valid ones. The condition should be expiry > &now to retain sessions that have not yet expired.

🐛 Proposed fix
-        sessions.retain(|(_, expiry)| expiry < &now);
+        sessions.retain(|(_, expiry)| expiry > &now);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sessions.retain(|(_, expiry)| expiry < &now);
sessions.retain(|(_, expiry)| expiry > &now);
🤖 Prompt for AI Agents
In `@src/rbac/map.rs` at line 303, The session expiration check in
remove_expired_session is inverted: sessions.retain(|(_, expiry)| expiry < &now)
keeps expired sessions and drops valid ones; update the predicate to retain only
non-expired entries by using expiry > &now (or expiry >= &now if you want to
treat equal as valid) so sessions.retain(|(_, expiry)| expiry > &now) preserves
sessions that have not yet expired.
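
A self-contained illustration of the inverted predicate (std-only; assumes at least a minute of process uptime so the Instant subtraction does not underflow):

use std::time::{Duration, Instant};

fn main() {
    let now = Instant::now();
    let mut sessions = vec![
        ("expired", now - Duration::from_secs(60)),
        ("valid", now + Duration::from_secs(60)),
    ];
    // Buggy predicate: retains entries whose expiry is already in the past,
    // so the expired session survives and the valid one is dropped.
    sessions.retain(|(_, expiry)| expiry < &now);
    assert_eq!(sessions.len(), 1);
    assert_eq!(sessions[0].0, "expired");
}

With the comparison flipped to expiry > &now, only the valid session survives, which is the intended behavior.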

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/handlers/http/correlation.rs (1)

89-96: Same issue: tenant_id discarded in modify operation.

Similar to the post function, tenant_id is extracted but discarded, and CORRELATIONS.update() is called without tenant context. This should be addressed alongside the post function fix for consistent multi-tenant behavior.

🔧 Proposed fix
-    correlation.user_id = get_user_and_tenant_from_request(&req)
-        .map(|(s, _)| get_hash(&s.to_string()))
+    let (user_id, tenant_id) = get_user_and_tenant_from_request(&req)
+        .map(|(s, t)| (get_hash(&s.to_string()), t))
         .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
+    correlation.user_id = user_id;

     let session_key = extract_session_key_from_req(&req)
         .map_err(|err| CorrelationError::AnyhowError(anyhow::Error::msg(err.to_string())))?;

-    let correlation = CORRELATIONS.update(correlation, &session_key).await?;
+    let correlation = CORRELATIONS.update(correlation, &session_key, &tenant_id).await?;
src/handlers/http/users/dashboards.rs (1)

78-90: Same tenant isolation concern in get_dashboard.

Similar to list_dashboards, this function extracts tenant_id from the header (line 83) rather than the authenticated session. This could allow cross-tenant dashboard access.

src/parseable/streams.rs (1)

1197-1267: Tests are broken: Stream::new calls missing tenant_id parameter.

Multiple test functions call Stream::new with 4 arguments, but the updated signature requires 5 parameters (including tenant_id). Additionally, calls to local_stream_data_path need to be updated to pass tenant_id as the second parameter. This will cause compilation failures.

🔧 Suggested fix (example for one test)
     fn test_staging_new_with_valid_stream() {
         let stream_name = "test_stream";

         let options = Arc::new(Options::default());
         let staging = Stream::new(
             options.clone(),
             stream_name,
             LogStreamMetadata::default(),
             None,
+            &None,
         );

         assert_eq!(
             staging.data_path,
-            options.local_stream_data_path(stream_name)
+            options.local_stream_data_path(stream_name, &None)
         );
     }

Apply similar fixes to all test functions: test_staging_with_special_characters, test_staging_data_path_initialization, test_staging_with_alphanumeric_stream_name, test_arrow_files_empty_directory, generate_correct_path_with_current_time_and_no_custom_partitioning, generate_correct_path_with_current_time_and_custom_partitioning, test_convert_to_parquet_with_empty_staging, write_log, different_minutes_multiple_arrow_files_to_parquet, same_minute_multiple_arrow_files_to_parquet, miss_current_arrow_file_when_converting_to_parquet, get_or_create_returns_existing_stream, create_and_return_new_stream_when_name_does_not_exist, and get_or_create_stream_concurrently.

🤖 Fix all issues with AI agents
In `@src/handlers/http/correlation.rs`:
- Around line 45-52: The handler currently uses get_tenant_id_from_request
(header) to derive tenant_id which is inconsistent with delete; replace that
header-based extraction with the session-based get_user_and_tenant_from_request
flow: call get_user_and_tenant_from_request(&req, &session_key) (or the
project's equivalent) and use the returned tenant (e.g., from the (user, tenant)
tuple) as tenant_id, propagate errors the same way as delete does, then pass
that tenant_id into CORRELATIONS.get_correlation(&correlation_id, &tenant_id).
Ensure you remove or stop using get_tenant_id_from_request in this function so
tenant is always taken from the authenticated session.

In `@src/handlers/http/users/dashboards.rs`:
- Around line 248-253: list_tags currently uses get_tenant_id_from_request
(header-based) which breaks tenant isolation; change it to extract the tenant id
the same way list_dashboards does (i.e., from the authenticated session/context
rather than a raw header). Locate the list_tags function and replace the call to
get_tenant_id_from_request(&req) with the same tenant-extraction helper used by
list_dashboards (or call into the auth/session object retrieved from the
request), ensure the tenant value passed to DASHBOARDS.list_tags(...) comes from
the authenticated session, and keep the existing return/error handling (same
types: list_tags, DASHBOARDS, HttpRequest, DashboardError).

In `@src/parseable/streams.rs`:
- Around line 1163-1169: The debug tracing statements in flush_and_convert are
left over and should be removed: delete the two tracing::warn! calls that log
flush_and_convert_tenants and parseable_streams_tenants (the lines that
reference tenants from PARSEABLE.list_tenants()/DEFAULT_TENANT and
self.read().unwrap().keys()). Leave the tenants selection logic (using
PARSEABLE.list_tenants and DEFAULT_TENANT) intact and do not replace them with
other logging.
- Line 649: The tracing::warn!(part_path=?part_path) call is a debug artifact
that will flood production logs; either remove it or downgrade it to trace level
by replacing tracing::warn!(part_path=?part_path) with
tracing::trace!(part_path=?part_path) (or delete the statement entirely) in the
same function/scope where the macro appears so logging noise is eliminated.
♻️ Duplicate comments (9)
src/utils/mod.rs (1)

79-85: Potential panic on malformed header value.

Using .unwrap() on to_str() can panic because HeaderValue::to_str() fails whenever the header value contains bytes outside visible ASCII. This was previously flagged.

Suggested fix
 pub fn get_tenant_id_from_request(req: &HttpRequest) -> Option<String> {
     if let Some(tenant_value) = req.headers().get("tenant") {
-        Some(tenant_value.to_str().unwrap().to_owned())
+        tenant_value.to_str().ok().map(|s| s.to_owned())
     } else {
         None
     }
 }
src/parseable/mod.rs (4)

1066-1085: TOCTOU race condition in tenant addition.

This issue was previously flagged. The existence check (line 1075) uses a read lock, but the insertion (line 1080) acquires a separate write lock. Another thread could add the same tenant between these operations.
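
One way to close the window is to perform the check and the insert under a single write lock; a self-contained sketch of the pattern (the real field layout on Parseable may differ):

use std::collections::HashSet;
use std::sync::RwLock;

struct Tenants(RwLock<HashSet<String>>);

impl Tenants {
    /// Returns true if the tenant was newly added, false if it already
    /// existed. Check and insert happen under one write lock, so no other
    /// thread can interleave between them.
    fn add(&self, tenant_id: &str) -> bool {
        let mut guard = self.0.write().expect("tenants lock poisoned");
        guard.insert(tenant_id.to_owned())
    }
}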


1125-1153: Incomplete tenant deletion - missing tenants list cleanup.

This issue was previously flagged. The method removes the tenant from TENANT_METADATA but does not remove it from self.tenants. This leaves the system in an inconsistent state where list_tenants() still returns the deleted tenant.
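
A sketch of the missing cleanup, assuming self.tenants is an RwLock<Vec<String>> (adjust if the list is wrapped in an Option):

// In delete_tenant(), after removing the tenant from TENANT_METADATA,
// also drop it from the in-memory list so list_tenants() stays consistent.
self.tenants
    .write()
    .expect("tenants lock poisoned")
    .retain(|t| t != tenant_id);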


1155-1191: Incomplete logic in load_tenants and silent lock failure.

This issue was previously flagged. The empty else if !is_multi_tenant { } block (lines 1177-1178) does nothing, and the silent lock failure at lines 1185-1190 masks potential issues.


1193-1201: Silent lock failure masks potential issues.

This issue was previously flagged. Consider using expect("tenants lock poisoned") for consistency with lock handling elsewhere.

src/handlers/http/middleware.rs (2)

167-180: Handle potential panic from HeaderValue::from_str().unwrap().

If tenant_id contains characters not valid in HTTP headers (e.g., control characters), HeaderValue::from_str() will return an error and unwrap() will panic. This is a duplicate of a prior review comment that should be addressed.

🔧 Suggested fix
         let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
             Ok((uid, tid)) => {
-                if tid.is_some() {
-                    req.headers_mut().insert(
-                        HeaderName::from_static("tenant"),
-                        HeaderValue::from_str(&tid.as_ref().unwrap()).unwrap(),
-                    );
+                if let Some(ref tenant) = tid {
+                    if let Ok(header_val) = HeaderValue::from_str(tenant) {
+                        req.headers_mut().insert(
+                            HeaderName::from_static("tenant"),
+                            header_val,
+                        );
+                    }
                 }
-
                 Ok((uid, tid))
             }
             Err(e) => Err(e),
         };

316-327: Security: Consider stricter handling when tenant doesn't exist.

The check_suspension function returns Authorized when:

  1. No tenant header is present (line 325-326)
  2. Tenant doesn't exist in TENANT_METADATA (empty else block at line 322-324)

This could allow requests to bypass tenant-level controls. The empty else branch with the "tenant does not exist" comment suggests this needs further handling. This is a duplicate of a prior review comment that should be addressed.

🔧 Suggested approach
 pub fn check_suspension(req: &HttpRequest, action: Action) -> rbac::Response {
     if let Some(tenant) = req.headers().get("tenant")
         && let Ok(tenant) = tenant.to_str()
     {
-        if let Ok(Some(suspension)) = TENANT_METADATA.is_action_suspended(tenant, &action) {
-            return rbac::Response::Suspended(suspension);
-        } else {
-            // tenant does not exist
+        match TENANT_METADATA.is_action_suspended(tenant, &action) {
+            Ok(Some(suspension)) => return rbac::Response::Suspended(suspension),
+            Ok(None) => {} // Tenant exists, action not suspended - continue
+            Err(_) => {
+                tracing::warn!(tenant = tenant, "Tenant not found in metadata during suspension check");
+                // Consider returning UnAuthorized for unknown tenants in production
+            }
         }
     }
     rbac::Response::Authorized
 }
src/parseable/streams.rs (2)

1056-1079: Remove debug logging statements.

Multiple tracing::warn! calls (lines 1056-1058, 1073, 1078) are used for debugging during development. These will flood production logs and should be removed or reduced to trace level. This is a duplicate of a prior review comment.

🔧 Suggested fix
     pub fn get_or_create(
         ...
     ) -> StreamRef {
         let mut guard = self.write().expect(LOCK_EXPECT);
-        tracing::warn!(
-            "get_or_create\nstream- {stream_name}\ntenant- {tenant_id:?}\nmetadata- {metadata:?}\noptions- {options:?}"
-        );
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);

         if let Some(tenant_streams) = guard.get(tenant)
             && let Some(stream) = tenant_streams.get(&stream_name)
         {
             return stream.clone();
         }

-        // if let Some(stream) = guard.get(&stream_name) {
-        //     return stream.clone();
-        // }
-        // guard.insert(stream_name, stream.clone());
-
         let stream = Stream::new(options, &stream_name, metadata, ingestor_id, tenant_id);
-        tracing::warn!("creating new stream- {stream_name}");
         guard
             .entry(tenant.to_owned())
             .or_default()
             .insert(stream_name, stream.clone());
-        tracing::warn!("inserted stream in mem");
         stream
     }

1092-1101: Noisy warning for missing tenant.

The warn! at lines 1097-1099 triggers on every contains check when a tenant doesn't exist. This could flood logs during normal operation (e.g., checking streams before a tenant is registered). Consider using debug or trace level. This is a duplicate of a prior review comment.

🔧 Suggested fix
         } else {
-            tracing::warn!(
+            tracing::debug!(
                 "Tenant with id {tenant_id} does not exist! Shouldn't happen (stream- {stream_name})"
             );
             false
         }
🧹 Nitpick comments (16)
src/utils/mod.rs (2)

61-77: Inconsistent tenant_id handling between BasicAuth and SessionId paths.

The BasicAuth path (line 68) returns user.tenant.clone() which is already Option<String>, while the SessionId path (line 76) wraps the tenant_id in Some(). This creates inconsistent behavior:

  • BasicAuth users may have tenant_id = None if user.tenant is None
  • SessionId users always have tenant_id = Some(...) even if the underlying value represents "no tenant"

Consider normalizing the return value to ensure consistent semantics across both authentication methods.
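
A shared helper applied on both branches would give them identical semantics; a sketch, where treating the empty string as "no tenant" is an assumption about the codebase's conventions:

/// Sketch: map both auth paths to the same Option semantics.
fn normalize_tenant(raw: Option<String>) -> Option<String> {
    raw.filter(|t| !t.is_empty())
}

The BasicAuth branch would then return normalize_tenant(user.tenant.clone()) and the SessionId branch normalize_tenant(Some(tenant_id)), so None consistently means "no tenant".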


87-93: Unnecessary clone on tenant_id.

The tenant_id returned from get_userid_from_session is already a String. The .clone() on line 89 is redundant since you're returning by value.

Suggested fix
 pub fn get_tenant_id_from_key(key: &SessionKey) -> Option<String> {
     if let Some((_, tenant_id)) = Users.get_userid_from_session(key) {
-        Some(tenant_id.clone())
+        Some(tenant_id)
     } else {
         None
     }
 }
src/storage/object_storage.rs (6)

116-117: Remove commented-out debug logging.

These commented debug statements should be removed before merging.

-    // tracing::warn!("upload single stream_relative_path- {stream_relative_path:?}");
-    // tracing::warn!("upload single path- {path:?}");

634-636: Consider reducing log level for production paths.

The tracing::warn! statement here logs on every stream lookup miss, which could generate excessive logs in production. Consider using tracing::debug! or tracing::trace! instead.

-        tracing::warn!(
-            "unable to find stream- {stream_name} with tenant- {tenant_id:?} in PARSEABLE.get_stream"
-        );
+        tracing::debug!(
+            "Stream {stream_name} with tenant {tenant_id:?} not found in memory, loading from storage"
+        );

939-939: Remove debug logging artifact.

-        tracing::warn!(process_parquet_files_path=?path);

971-979: Remove commented debug code.

-    // tracing::warn!("spawn parquet file name- {filename}");
...
-    // tracing::warn!("spawn parquet stream_relative_path- {stream_relative_path}");

1061-1061: Remove or reduce debug logging in production path.

-        tracing::warn!(upload_context_schema_files=?path);
+        tracing::trace!(upload_context_schema_files=?path);

1156-1169: Consider consistent tenant handling in path functions.

The schema_path and stream_json_path functions unconditionally include an empty string when tenant_id is None (via .map_or("", |v| v)), creating paths like ["", stream_name, ...]. While the relative-path crate normalizes empty path segments and this doesn't cause runtime issues, this pattern is inconsistent with similar functions like alert_json_path and alert_state_json_path, which use conditional inclusion (if let Some(tenant_id)). For consistency and clarity, consider adopting the same pattern: only include the tenant component when it's present.
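
The conditional-inclusion pattern referred to above, as a sketch against the relative-path crate (the metadata file name is a placeholder, not the crate's actual constant):

use relative_path::RelativePathBuf;

fn stream_json_path(tenant_id: &Option<String>, stream_name: &str) -> RelativePathBuf {
    let mut path = RelativePathBuf::new();
    // Only include the tenant segment when one is present, mirroring
    // alert_json_path instead of pushing an empty "" component.
    if let Some(tenant) = tenant_id {
        path.push(tenant);
    }
    path.push(stream_name);
    path.push(".stream.json"); // placeholder file name
    path
}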

src/parseable/mod.rs (3)

252-267: Consider reducing log level for check_or_load_stream.

The tracing::warn! calls at lines 257 and 261 will fire frequently in normal operation. Consider using tracing::debug! instead.

-        tracing::warn!("check or load streams- {stream_name}, {tenant_id:?}");
+        tracing::debug!("check or load streams- {stream_name}, {tenant_id:?}");
...
-        tracing::warn!("check or load streams not present, creating {stream_name}, {tenant_id:?}");
+        tracing::debug!("check or load streams not present, creating {stream_name}, {tenant_id:?}");

821-831: Reduce debug logging in stream creation path.

Multiple tracing::warn! statements in the stream creation success path will generate excessive logs in production.

-                tracing::warn!(created_stream_at=?created_at);
-                tracing::warn!(stream_name=?stream_name);
-                tracing::warn!(schema=?schema);
-                tracing::warn!(tenant_id=?tenant_id);
+                tracing::debug!(created_stream_at=?created_at, stream_name=?stream_name, tenant_id=?tenant_id);

460-462: Reduce log level for schema commit.

-        tracing::warn!(commit_schema=?schema);
+        tracing::trace!(commit_schema=?schema);
src/handlers/http/middleware.rs (1)

329-337: Simplify suspension check with if let instead of match.

The match statement only handles the Suspended variant explicitly; consider using if let for cleaner code.

♻️ Suggested refactor
 pub fn auth_no_context(req: &mut ServiceRequest, action: Action) -> Result<rbac::Response, Error> {
     // check if tenant is suspended
-    match check_suspension(req.request(), action) {
-        rbac::Response::Suspended(msg) => return Ok(rbac::Response::Suspended(msg)),
-        _ => {}
+    if let rbac::Response::Suspended(msg) = check_suspension(req.request(), action) {
+        return Ok(rbac::Response::Suspended(msg));
     }
     let creds = extract_session_key(req);
     creds.map(|key| Users.authorize(key, action, None, None))
 }

The same pattern applies to auth_resource_context (lines 343-347) and auth_user_context (lines 370-374).

src/parseable/streams.rs (2)

413-421: Minor: Unnecessary variable binding.

The dirs variable is immediately returned. Consider returning the expression directly.

♻️ Suggested fix
-        let dirs = dir
+        dir
             .flatten()
             .map(|file| file.path())
             .filter(|file| {
                 file.extension().is_some_and(|ext| ext.eq("parquet"))
                     && Self::is_valid_parquet_file(file, &self.stream_name)
             })
-            .collect();
-        dirs
+            .collect()
     }

1032-1036: Breaking change: Streams type now requires nested tenant lookup.

The type change from HashMap<String, StreamRef> to HashMap<String, HashMap<String, StreamRef>> is a significant structural change. The commented-out old definition should be removed.

♻️ Suggested fix
-// #[derive(Deref, DerefMut, Default)]
-// pub struct Streams(RwLock<HashMap<String, StreamRef>>);
-
 #[derive(Deref, DerefMut, Default)]
 pub struct Streams(RwLock<HashMap<String, HashMap<String, StreamRef>>>);
src/rbac/user.rs (2)

153-164: Salt generation updated to use 32-byte random value.

The change from SaltString::generate(&mut OsRng) to manual 32-byte generation with encode_b64 is functionally equivalent but more explicit. The commented-out line should be removed.

♻️ Suggested fix
 fn gen_hash(password: &str) -> String {
     let mut bytes = [0u8; 32];
     let r = &mut OsRng;
     r.fill_bytes(&mut bytes);
     let salt = SaltString::encode_b64(&bytes).unwrap();
-    // let salt = SaltString::generate(&mut OsRng);
     let argon2 = Argon2::default();
     argon2
         .hash_password(password.as_bytes(), &salt)
         .expect("can hash random alphanumeric")
         .to_string()
 }

472-478: Remove commented-out code.

The commented-out update_in_metadata method should be removed or implemented. Leaving dead code makes maintenance harder.

♻️ Suggested fix
-    // pub async fn update_in_metadata(&self, tenant_id: &Option<String>) -> Result<(), RBACError> {
-    //     let mut metadata = get_metadata(tenant_id).await?;
-    //     metadata.user_groups.retain(|x| x.name != self.name);
-    //     metadata.user_groups.push(self.clone());
-    //     put_metadata(&metadata).await?;
-    //     Ok(())
-    // }
 }
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9802cbf and be1a48b.

📒 Files selected for processing (9)
  • src/handlers/http/correlation.rs
  • src/handlers/http/middleware.rs
  • src/handlers/http/users/dashboards.rs
  • src/handlers/http/users/filters.rs
  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/rbac/user.rs
  • src/storage/object_storage.rs
  • src/utils/mod.rs
🧰 Additional context used
🧠 Learnings (26)
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.

Applied to files:

  • src/handlers/http/users/filters.rs
  • src/handlers/http/users/dashboards.rs
  • src/handlers/http/middleware.rs
📚 Learning: 2025-09-05T09:27:12.659Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/users/filters.rs:116-121
Timestamp: 2025-09-05T09:27:12.659Z
Learning: The Filters::load() function in src/users/filters.rs is only called once at server initialization, so there's no risk of duplicate entries from repeated invocations.

Applied to files:

  • src/handlers/http/users/filters.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
Repo: parseablehq/parseable PR: 1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.

Applied to files:

  • src/utils/mod.rs
  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-05-01T12:22:42.363Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/users/dashboards.rs:154-165
Timestamp: 2025-05-01T12:22:42.363Z
Learning: Title validation for dashboards is performed in the `create_dashboard` HTTP handler function rather than in the `DASHBOARDS.create` method, avoiding redundant validation.

Applied to files:

  • src/handlers/http/users/dashboards.rs
📚 Learning: 2025-05-01T10:33:51.767Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.

Applied to files:

  • src/handlers/http/users/dashboards.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-10-21T02:22:24.403Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/parseable/mod.rs:419-432
Timestamp: 2025-10-21T02:22:24.403Z
Learning: In Parseable's internal stream creation (`create_internal_stream_if_not_exists` in `src/parseable/mod.rs`), errors should not propagate to fail server initialization. The function creates both pmeta and pbilling internal streams, and failures are logged but the function always returns `Ok(())` to ensure server startup resilience. Individual stream creation failures should not prevent syncing of successfully created streams.

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:13:48.898Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.

Applied to files:

  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Applied to files:

  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-09T14:08:45.809Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1427
File: resources/ingest_demo_data.sh:440-440
Timestamp: 2025-09-09T14:08:45.809Z
Learning: In the resources/ingest_demo_data.sh demo script, hardcoded stream names like "demodata" in alert queries should be ignored and not flagged for replacement with $P_STREAM variables.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-09-14T15:17:59.234Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1432
File: src/storage/object_storage.rs:124-132
Timestamp: 2025-09-14T15:17:59.234Z
Learning: In Parseable's upload validation system (src/storage/object_storage.rs), the validate_uploaded_parquet_file function should not include bounded retries for metadata consistency issues. Instead, failed validations rely on the 30-second sync cycle for natural retries, with staging files preserved when manifest_file is set to None.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.

Applied to files:

  • src/storage/object_storage.rs
🧬 Code graph analysis (7)
src/handlers/http/correlation.rs (1)
src/utils/mod.rs (4)
  • get_hash (95-100)
  • get_tenant_id_from_request (79-85)
  • get_user_and_tenant_from_request (61-77)
  • user_auth_for_datasets (114-180)
src/handlers/http/users/filters.rs (1)
src/utils/mod.rs (3)
  • get_hash (95-100)
  • get_user_and_tenant_from_request (61-77)
  • is_admin (182-199)
src/utils/mod.rs (1)
src/utils/actix.rs (2)
  • req (31-31)
  • extract_session_key_from_req (51-71)
src/handlers/http/users/dashboards.rs (1)
src/utils/mod.rs (4)
  • get_hash (95-100)
  • get_tenant_id_from_request (79-85)
  • get_user_and_tenant_from_request (61-77)
  • is_admin (182-199)
src/handlers/http/middleware.rs (2)
src/rbac/mod.rs (1)
  • roles_to_permission (313-328)
src/utils/mod.rs (1)
  • get_user_and_tenant_from_request (61-77)
src/rbac/user.rs (2)
src/rbac/mod.rs (1)
  • add_roles (142-150)
src/rbac/map.rs (2)
  • roles (80-88)
  • mut_sessions (106-112)
src/parseable/mod.rs (5)
src/handlers/http/modal/utils/rbac_utils.rs (1)
  • get_metadata (24-34)
src/rbac/map.rs (3)
  • mut_roles (90-96)
  • mut_users (72-78)
  • users (62-70)
src/storage/store_metadata.rs (1)
  • put_remote_metadata (290-296)
src/metastore/metastore_traits.rs (1)
  • delete_tenant (306-306)
src/tenants/mod.rs (1)
  • delete_tenant (69-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (33)
src/handlers/http/correlation.rs (3)

26-28: LGTM!

The new imports for tenant-related utilities are appropriate for the multi-tenancy changes.


101-114: LGTM!

The delete function correctly extracts and propagates both user_id and tenant_id, implementing proper tenant-scoped deletion. This pattern should be applied consistently to the other endpoints (get, post, modify).


73-78: Tenant isolation is maintained via session_key; the current implementation is correct.

The create() method does not require an explicit tenant_id parameter because it internally derives the tenant context from the session_key via get_tenant_id_from_key(session_key) (line 134 in src/correlation.rs). This tenant_id is then correctly passed to the metastore and used for in-memory storage, ensuring tenant isolation.

The extracted tenant_id in the post function is unused, but this does not represent a bug—it's vestigial code. The design intentionally passes session_key to create(), which handles tenant derivation internally (unlike delete(), which takes an explicit tenant_id parameter for a different purpose).

src/utils/mod.rs (2)

102-180: LGTM - tenant context properly threaded through authorization.

The user_auth_for_query and user_auth_for_datasets functions correctly propagate tenant context through check_or_load_stream and get_stream calls. The permission matching logic properly handles tenant-scoped stream access.


182-199: LGTM - admin check updated for tenant-aware permissions.

The is_admin function now correctly matches Some(ParseableResourceType::All) to align with the updated RBAC model.

src/handlers/http/users/filters.rs (4)

45-62: LGTM - tenant context properly integrated into filter retrieval.

The get handler correctly extracts both user_id and tenant_id, then passes tenant_id to FILTERS.get_filter for tenant-scoped filter access.


64-78: LGTM - filter creation with tenant context.

The post handler correctly propagates tenant_id to both metastore persistence (put_filter) and in-memory state update (FILTERS.update).


80-107: LGTM - filter update with tenant context.

The update handler correctly uses tenant_id for both authorization check (get_filter) and persistence operations.


109-131: LGTM - filter deletion with tenant context.

The delete handler correctly propagates tenant_id through authorization, metastore deletion, and in-memory state cleanup.

src/handlers/http/users/dashboards.rs (5)

92-106: LGTM - create_dashboard uses auth-based tenant extraction.

The create_dashboard handler correctly uses get_user_and_tenant_from_request to derive tenant context from the authenticated user.


108-195: LGTM - update_dashboard with proper tenant context.

The handler correctly uses get_user_and_tenant_from_request and passes tenant_id through both authorization (get_dashboard_by_user) and update operations.


197-212: LGTM - delete_dashboard with proper tenant context.

The handler correctly uses auth-based tenant extraction for deletion operations.


214-246: LGTM - add_tile properly checks ownership with tenant context.

Based on learnings, get_dashboard_by_user is correctly used for authorization before modification, and tenant_id is properly propagated.


36-76: No security issue. The list_dashboards function correctly uses the tenant ID from the request. All dashboard endpoints are protected by the .authorize() middleware, which invokes the AuthMiddleware that validates the tenant from the authenticated user session and overwrites the request tenant header with the authenticated user's actual tenant before the handler executes. This prevents header-based tenant spoofing attacks, ensuring consistent tenant isolation across all operations.

Likely an incorrect or invalid review comment.

src/storage/object_storage.rs (3)

1074-1093: LGTM - tenant-prefixed path construction.

The stream_relative_path function correctly handles tenant prefixing, using DEFAULT_TENANT comparison to determine whether to include tenant in the path.


1095-1123: LGTM - multi-tenant stream sync orchestration.

The sync_all_streams function correctly iterates over tenants (or uses None for single-tenant mode) and spawns upload tasks per tenant/stream combination.


1211-1221: LGTM - tenant-aware alert path construction.

The alert_json_path function correctly handles optional tenant prefix for alert storage paths.

src/parseable/mod.rs (3)

44-45: LGTM - DEFAULT_TENANT constant.

Good practice to define a constant for the default tenant identifier.


213-227: LGTM - tenant-aware stream lookup.

The get_stream function correctly falls back to DEFAULT_TENANT when no tenant is specified, maintaining backward compatibility.


467-541: LGTM - per-tenant internal stream creation.

The create_internal_stream_if_not_exists function correctly iterates over all tenants (or uses None for single-tenant mode) to create internal streams per tenant.

src/handlers/http/middleware.rs (4)

200-216: Tenant-aware user lookup during session refresh looks correct.

The code correctly uses map_or(DEFAULT_TENANT, |v| v) to handle the Option<String> tenant_id and retrieves the user from the tenant-scoped map. The nested if let pattern appropriately handles both the tenant and user lookups.


251-264: Consistent tenant handling in user mutation path.

The mutable user lookup mirrors the read path with the same tenant resolution pattern. The early return with an appropriate error message when user is not found is correct.


266-289: Tenant propagation to track_new and roles_to_permission is correct.

The tenant_id is properly passed through to both roles_to_permission (which expects &str per the relevant snippet) and track_new. The fallback to DEFAULT_TENANT is consistent with the rest of the codebase.


305-308: New Suspended response mapped to BadRequest.

The suspension response is properly handled and converted to an HTTP 400 error. This aligns with the pattern of informing clients about the specific suspension reason.

src/parseable/streams.rs (1)

123-126: Tenant-aware data path derivation.

The Stream::new function correctly accepts tenant_id and passes it to local_stream_data_path for tenant-scoped storage paths. This aligns with the multi-tenancy design.

src/rbac/user.rs (8)

48-48: New tenant field added to User struct.

The addition of the optional tenant field properly supports multi-tenancy. The field is correctly propagated through constructors.


53-67: new_basic correctly accepts and stores tenant.

The constructor properly initializes the new tenant field from the parameter.


69-86: new_oauth correctly accepts and stores tenant.

The OAuth user constructor properly handles the tenant parameter.


131-131: Password length increased from 16 to 32 characters.

This is a security improvement, providing more entropy for generated passwords.


246-247: New tenant_id field added to GroupUser.

The addition supports tenant-scoped user group operations. The field is correctly propagated in from_user.


374-384: add_roles now requires tenant_id: &str parameter.

The standalone add_roles in src/rbac/mod.rs (see the relevant snippet) accepts tenant_id: &Option<String>, whereas this method takes &str directly. Ensure callers resolve Option<String> to &str before calling; see the sketch below.
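
For example (group and role_names are placeholders; DEFAULT_TENANT is assumed to be in scope):

// Resolve Option<String> to &str with the fallback used elsewhere in this PR.
let tenant: &str = tenant_id.as_deref().unwrap_or(DEFAULT_TENANT);
group.add_roles(role_names, tenant);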


393-397: Tenant resolution in add_users uses GroupUser.tenant_id.

The code correctly uses the tenant_id from each GroupUser with DEFAULT_TENANT as fallback for session removal.


171-185: No action required. All callers of the renamed function and role definitions have been updated consistently. The function get_admin_user has been completely removed with no orphaned references, and the single caller in src/rbac/map.rs:138 correctly uses get_super_admin_user. The role definition in src/rbac/map.rs:135 is consistent with the new "super-admin" name.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +45 to +52
let tenant_id = get_tenant_id_from_request(&req);
let correlation_id = correlation_id.into_inner();
let session_key = extract_session_key_from_req(&req)
.map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;

let correlation = CORRELATIONS.get_correlation(&correlation_id).await?;
let correlation = CORRELATIONS
.get_correlation(&correlation_id, &tenant_id)
.await?;

⚠️ Potential issue | 🟠 Major

Inconsistent tenant_id extraction creates potential security risk.

The get function extracts tenant_id from an HTTP header via get_tenant_id_from_request, while delete extracts it from the authenticated session via get_user_and_tenant_from_request. This inconsistency could allow a user to access correlations from other tenants by manipulating the tenant header.

For consistency and security, consider using the session-based approach here as well:

🔒 Proposed fix
 pub async fn get(
     req: HttpRequest,
     correlation_id: Path<String>,
 ) -> Result<impl Responder, CorrelationError> {
-    let tenant_id = get_tenant_id_from_request(&req);
     let correlation_id = correlation_id.into_inner();
     let session_key = extract_session_key_from_req(&req)
         .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
+    let tenant_id = get_user_and_tenant_from_request(&req)
+        .map(|(_, t)| t)
+        .map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;

     let correlation = CORRELATIONS
         .get_correlation(&correlation_id, &tenant_id)
         .await?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let tenant_id = get_tenant_id_from_request(&req);
let correlation_id = correlation_id.into_inner();
let session_key = extract_session_key_from_req(&req)
.map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
let correlation = CORRELATIONS.get_correlation(&correlation_id).await?;
let correlation = CORRELATIONS
.get_correlation(&correlation_id, &tenant_id)
.await?;
let correlation_id = correlation_id.into_inner();
let session_key = extract_session_key_from_req(&req)
.map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
let tenant_id = get_user_and_tenant_from_request(&req)
.map(|(_, t)| t)
.map_err(|err| CorrelationError::AnyhowError(Error::msg(err.to_string())))?;
let correlation = CORRELATIONS
.get_correlation(&correlation_id, &tenant_id)
.await?;
🤖 Prompt for AI Agents
In `@src/handlers/http/correlation.rs` around lines 45 - 52, The handler currently
uses get_tenant_id_from_request (header) to derive tenant_id which is
inconsistent with delete; replace that header-based extraction with the
session-based get_user_and_tenant_from_request flow: call
get_user_and_tenant_from_request(&req, &session_key) (or the project's
equivalent) and use the returned tenant (e.g., from the (user, tenant) tuple) as
tenant_id, propagate errors the same way as delete does, then pass that
tenant_id into CORRELATIONS.get_correlation(&correlation_id, &tenant_id). Ensure
you remove or stop using get_tenant_id_from_request in this function so tenant
is always taken from the authenticated session.

Comment on lines +248 to 253
pub async fn list_tags(req: HttpRequest) -> Result<impl Responder, DashboardError> {
let tags = DASHBOARDS
.list_tags(&get_tenant_id_from_request(&req))
.await;
Ok((web::Json(tags), StatusCode::OK))
}

⚠️ Potential issue | 🟠 Major

Tenant isolation concern in list_tags.

Similar to list_dashboards, this extracts tenant from header rather than authenticated session.

🤖 Prompt for AI Agents
In `@src/handlers/http/users/dashboards.rs` around lines 248 - 253, list_tags
currently uses get_tenant_id_from_request (header-based) which breaks tenant
isolation; change it to extract the tenant id the same way list_dashboards does
(i.e., from the authenticated session/context rather than a raw header). Locate
the list_tags function and replace the call to get_tenant_id_from_request(&req)
with the same tenant-extraction helper used by list_dashboards (or call into the
auth/session object retrieved from the request), ensure the tenant value passed
to DASHBOARDS.list_tags(...) comes from the authenticated session, and keep the
existing return/error handling (same types: list_tags, DASHBOARDS, HttpRequest,
DashboardError).
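
A sketch of the session-based variant, reusing the helper the other handlers use (the error mapping is a placeholder for whichever DashboardError variant fits):

pub async fn list_tags(req: HttpRequest) -> Result<impl Responder, DashboardError> {
    // Derive the tenant from the authenticated session, not the spoofable
    // "tenant" header.
    let tenant_id = get_user_and_tenant_from_request(&req)
        .map(|(_, t)| t)
        .map_err(|err| DashboardError::Custom(err.to_string()))?; // placeholder variant
    let tags = DASHBOARDS.list_tags(&tenant_id).await;
    Ok((web::Json(tags), StatusCode::OK))
}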

let schema = Arc::new(merged_schema);

let part_path = parquet_path.with_extension("part");
tracing::warn!(part_path=?part_path);

⚠️ Potential issue | 🟡 Minor

Remove debug logging statement.

The tracing::warn!(part_path=?part_path) call appears to be a debug artifact that will flood production logs. Remove or downgrade to trace level.

🔧 Suggested fix
-            tracing::warn!(part_path=?part_path);
🤖 Prompt for AI Agents
In `@src/parseable/streams.rs` at line 649: the
tracing::warn!(part_path=?part_path) call is a debug artifact that will flood
production logs; either remove it or downgrade it to trace level by replacing
tracing::warn!(part_path=?part_path) with tracing::trace!(part_path=?part_path)
(or delete the statement entirely) in the same function/scope where the macro
appears so logging noise is eliminated.

Comment on lines +1163 to +1169
let tenants = if let Some(tenants) = PARSEABLE.list_tenants() {
    tenants
} else {
    vec![DEFAULT_TENANT.to_owned()]
};
tracing::warn!(flush_and_convert_tenants=?tenants);
tracing::warn!(parseable_streams_tenants=?self.read().unwrap().keys());
⚠️ Potential issue | 🟡 Minor

Remove debug logging in flush_and_convert.

The tracing::warn! calls at lines 1168-1169 are debug artifacts that should be removed before merging.

🔧 Suggested fix
         let tenants = if let Some(tenants) = PARSEABLE.list_tenants() {
             tenants
         } else {
             vec![DEFAULT_TENANT.to_owned()]
         };
-        tracing::warn!(flush_and_convert_tenants=?tenants);
-        tracing::warn!(parseable_streams_tenants=?self.read().unwrap().keys());
         for tenant_id in tenants {
🤖 Prompt for AI Agents
In `@src/parseable/streams.rs` around lines 1163-1169: the debug tracing
statements in flush_and_convert are left over and should be removed: delete the
two tracing::warn! calls that log flush_and_convert_tenants and
parseable_streams_tenants (the lines that reference tenants from
PARSEABLE.list_tenants()/DEFAULT_TENANT and self.read().unwrap().keys()). Leave
the tenants selection logic (using PARSEABLE.list_tenants and DEFAULT_TENANT)
intact and do not replace them with other logging.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
src/handlers/http/correlation.rs (1)

89-96: Missing tenant_id propagation in correlation update.

Same issue as post — the tenant_id is extracted but discarded. If CORRELATIONS.update supports tenant scoping, it should be passed here for consistency with the delete handler.

src/handlers/http/users/dashboards.rs (2)

36-69: Inconsistent tenant_id extraction in read operations.

The list_dashboards function extracts tenant_id from an HTTP header via get_tenant_id_from_request (line 37), while write operations like create_dashboard (line 99) use get_user_and_tenant_from_request, which derives the tenant from the authenticated session.

This inconsistency could allow users to view dashboards from other tenants by manipulating the tenant header. For proper tenant isolation, use session-based extraction:

🔒 Suggested fix
 pub async fn list_dashboards(req: HttpRequest) -> Result<impl Responder, DashboardError> {
-    let tenant_id = get_tenant_id_from_request(&req);
+    let (_, tenant_id) = get_user_and_tenant_from_request(&req)?;
     let query_map = web::Query::<HashMap<String, String>>::from_query(req.query_string())

78-90: Inconsistent tenant_id extraction in get_dashboard.

Same issue as list_dashboards — uses header-based get_tenant_id_from_request instead of session-based extraction, which could allow cross-tenant data access.

🔒 Suggested fix
 pub async fn get_dashboard(
     req: HttpRequest,
     dashboard_id: Path<String>,
 ) -> Result<impl Responder, DashboardError> {
     let dashboard_id = validate_dashboard_id(dashboard_id.into_inner())?;
-    let tenant_id = get_tenant_id_from_request(&req);
+    let (_, tenant_id) = get_user_and_tenant_from_request(&req)?;
     let dashboard = DASHBOARDS
src/rbac/user.rs (1)

241-305: Fix GroupUser equality/hash to include tenant_id (prevents cross-tenant collisions).
Now that GroupUser has tenant_id (Line 246), keeping Eq/Hash based only on userid risks treating two different-tenant users as the same set element.

Proposed diff
 impl PartialEq for GroupUser {
     fn eq(&self, other: &Self) -> bool {
-        self.userid == other.userid
+        self.userid == other.userid && self.tenant_id == other.tenant_id
     }
 }
 impl Eq for GroupUser {}
 impl std::hash::Hash for GroupUser {
     fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
-        self.userid.hash(state)
+        self.userid.hash(state);
+        self.tenant_id.hash(state);
     }
 }
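Covering the same (userid, tenant_id) pair in both implementations also preserves the Eq/Hash contract: values that compare equal must produce identical hashes.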
src/parseable/streams.rs (1)

117-137: Update 14 test call sites for the new Stream::new(..., tenant_id) signature.

Stream::new now requires a 5th parameter tenant_id: &Option<String> (line 123), but 14 test calls in the file are still using the old 4-parameter signature. These need to be updated to pass the missing parameter:

  • Lines 1202, 1220, 1238, 1256, 1277, 1300, 1334, 1363, 1420, 1469, 1519 (multi-line calls with 4 parameters)
  • Lines 1446, 1496, 1550 (single-line calls with 4 parameters)

Production code at line 1072 already uses the correct signature. Test calls should add &None as the 5th argument.
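A representative fix at each call site; the first four argument names below are placeholders for whatever each test already passes:

-        Stream::new(options, stream_name, metadata, ingestor_id)
+        Stream::new(options, stream_name, metadata, ingestor_id, &None)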

src/handlers/http/query.rs (1)

112-162: Validate tenant headers before using them as DataFusion schema identifiers.

The tenant header is extracted via get_tenant_id_from_request() (which calls .to_str().unwrap() on the raw header value) and then passed directly to default_schema on lines 121–125 without any format validation or normalization. Since tenant IDs are used as SQL schema identifiers and as partitioning keys throughout the system, an attacker-controlled tenant header could contain characters that violate SQL identifier rules (e.g., spaces, special characters) or cause tenant-isolation issues.

This pattern is also repeated in src/storage/field_stats.rs:108. Add explicit validation (e.g., alphanumeric + underscore only, length limits) and normalize tenant IDs before using them as schema names or auth scope. Ideally, also verify that the tenant exists in your system.
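A minimal sketch of such a check (the helper name, the 64-character cap, and the charset are illustrative assumptions, not part of the PR):

/// Accept only identifier-safe tenant IDs before they are used as
/// DataFusion schema names or storage path segments.
fn validate_tenant_id(raw: &str) -> Option<String> {
    let valid = !raw.is_empty()
        && raw.len() <= 64
        && raw.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
    valid.then(|| raw.to_owned())
}

Callers would reject the request (rather than fall back to the default tenant) when this returns None.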

src/storage/object_storage.rs (1)

1073-1093: Unify tenant prefix rules across path builders (avoid empty/DEFAULT_TENANT path segments).

schema_path and stream_json_path unconditionally include a tenant element using map_or("", ...), resulting in empty string path components when tenant_id is None. This diverges from stream_relative_path (skips tenant for None/DEFAULT_TENANT), alert_json_path, and mttr_json_path (conditional inclusion). The metastore code explicitly works around this by converting empty tenant strings back to DEFAULT_TENANT when reading—evidence the inconsistency causes actual path mismatches. Standardize all path builders to either skip tenant components when None or uniformly include the tenant value without empty segments.
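One way to standardize, sketched here with an assumed DEFAULT_TENANT value standing in for the constant in src/parseable:

const DEFAULT_TENANT: &str = "default"; // assumed value, for illustration only

/// Returns the tenant path segment, or None for the absent/default
/// tenant, so callers never emit an empty path component.
fn tenant_prefix(tenant_id: &Option<String>) -> Option<&str> {
    match tenant_id.as_deref() {
        Some(t) if !t.is_empty() && t != DEFAULT_TENANT => Some(t),
        _ => None,
    }
}

Every path builder would then extend its path from tenant_prefix(...) uniformly instead of hand-rolling map_or("", ...).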

🤖 Fix all issues with AI agents
In `@src/handlers/http/query.rs`:
- Around lines 126-131: The local binding schema_names is computed from
session_state.catalog_list().catalog("datafusion").unwrap().schema_names() but
never used; remove the unused variable and its computation or, if you intended
to use it later, reference it appropriately—specifically delete the let
schema_names = ... statement (or rename to _schema_names if you want to suppress
the unused warning temporarily) so the build won’t fail under deny(warnings).

In `@src/rbac/user.rs`:
- Around lines 374-383: The add_roles method extends self.roles then refreshes
sessions using the caller-provided tenant_id for every user, which can
invalidate the wrong tenant; update the session invalidation loop in
UserGroup::add_roles to call mut_sessions().remove_user(group_user.userid(),
group_user.tenant_id()) (i.e., use each group_user.tenant_id instead of the
function parameter) so session invalidation uses each user’s actual tenant.

In `@src/storage/object_storage.rs`:
- Around lines 939-949: Replace noisy per-file warning logs with lower-severity
debug/trace logs: change the tracing::warn! that logs process_parquet_files_path
in process_parquet_files to tracing::debug! (or tracing::trace! if extremely
verbose) and likewise change the tracing::warn!/warn! usage around schema
processing in the same module (the block that iterates schema files ~lines
1055–1069) to tracing::debug!/trace!. Update any similar per-file warn calls
passed into spawn_parquet_upload_task or nearby helpers so only actionable
operator issues remain at warn level; keep error-level logs for actual failures.
♻️ Duplicate comments (12)
src/handlers/http/correlation.rs (1)

45-52: Inconsistent tenant_id extraction creates potential security risk.

The get function extracts tenant_id from an HTTP header via get_tenant_id_from_request, while delete (line 106) extracts it from the authenticated session via get_user_and_tenant_from_request. This inconsistency could allow a user to access correlations from other tenants by manipulating the tenant header.

For consistency and security, use the session-based approach here as well.

src/handlers/http/middleware.rs (2)

167-180: Handle potential panic from HeaderValue::from_str().unwrap().

If tenant_id contains characters not valid in HTTP headers (e.g., control characters), HeaderValue::from_str() will return an error and unwrap() will panic.

♻️ Suggested fix
         let user_and_tenant_id = match get_user_and_tenant_from_request(req.request()) {
             Ok((uid, tid)) => {
-                if tid.is_some() {
-                    req.headers_mut().insert(
-                        HeaderName::from_static("tenant"),
-                        HeaderValue::from_str(&tid.as_ref().unwrap()).unwrap(),
-                    );
-                }
+                if let Some(ref tenant) = tid {
+                    if let Ok(header_val) = HeaderValue::from_str(tenant) {
+                        req.headers_mut().insert(
+                            HeaderName::from_static("tenant"),
+                            header_val,
+                        );
+                    }
+                }
 
                 Ok((uid, tid))
             }
             Err(e) => Err(e),
         };

316-327: Security: Consider stricter handling when tenant doesn't exist.

The check_suspension function returns Authorized when:

  1. No tenant header is present (line 326)
  2. Tenant doesn't exist in TENANT_METADATA (lines 322-324, empty else branch)

This could allow requests to bypass tenant-level controls. Consider whether non-existent tenants should return Unauthorized rather than Authorized.
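A deny-by-default sketch of that policy; the map parameter stands in for TENANT_METADATA, and the struct/variant names are assumptions rather than the PR's actual types:

use std::collections::HashMap;

struct TenantMeta {
    suspended: bool,
}

enum Response {
    Authorized,
    Unauthorized,
    Suspended(String),
}

fn check_suspension(tenant: Option<&str>, meta: &HashMap<String, TenantMeta>) -> Response {
    match tenant {
        // No tenant header: single-tenant / default-tenant flow.
        None => Response::Authorized,
        Some(t) => match meta.get(t) {
            Some(m) if m.suspended => Response::Suspended(format!("tenant {t} is suspended")),
            Some(_) => Response::Authorized,
            // Unknown tenant: reject instead of silently authorizing.
            None => Response::Unauthorized,
        },
    }
}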

src/utils/mod.rs (1)

84-90: Potential panic on malformed header value.

Using .unwrap() on to_str() can panic if the tenant header contains non-UTF8 bytes.

♻️ Suggested fix
 pub fn get_tenant_id_from_request(req: &HttpRequest) -> Option<String> {
     if let Some(tenant_value) = req.headers().get("tenant") {
-        Some(tenant_value.to_str().unwrap().to_owned())
+        tenant_value.to_str().ok().map(|s| s.to_owned())
     } else {
         None
     }
 }
src/handlers/http/users/dashboards.rs (1)

248-253: Tenant isolation concern in list_tags.

Similar to list_dashboards, this extracts tenant from header rather than authenticated session, which could allow cross-tenant data access.

🔒 Suggested fix
 pub async fn list_tags(req: HttpRequest) -> Result<impl Responder, DashboardError> {
+    let (_, tenant_id) = get_user_and_tenant_from_request(&req)?;
     let tags = DASHBOARDS
-        .list_tags(&get_tenant_id_from_request(&req))
+        .list_tags(&tenant_id)
         .await;
     Ok((web::Json(tags), StatusCode::OK))
 }
src/parseable/streams.rs (2)

1055-1079: Remove/downgrade the new tracing::warn! debug logs (noise + possible secrets).
The warn logs added in get_or_create/contains/flush_and_convert are development-level noise, and they also format options/metadata details into the output.

Also applies to: 1092-1101, 1163-1169


648-650: Drop tracing::warn!(part_path=?part_path) debug artifact.
This is likely to flood logs during normal parquet conversion.

src/parseable/mod.rs (3)

1066-1085: Make tenant add atomic (avoid TOCTOU between read and write locks).
add_tenant currently checks contains() under a read lock and then pushes under a separate write lock (lines 1075-1081), so two concurrent callers can both pass the check and insert the same tenant.
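A sketch of the atomic version, with the check and the insert performed under one write lock (types simplified):

use std::sync::RwLock;

fn add_tenant(tenants: &RwLock<Vec<String>>, tenant: &str) -> bool {
    // Holding the write lock across both steps closes the window in
    // which another thread could insert the same tenant.
    let mut guard = tenants.write().expect("tenants lock poisoned");
    if guard.iter().any(|t| t.as_str() == tenant) {
        return false; // already present
    }
    guard.push(tenant.to_owned());
    true
}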


1125-1153: delete_tenant must also remove the tenant from self.tenants (in-memory list).
Right now it cleans users/roles and TENANT_METADATA, but list_tenants() can still return the deleted tenant.
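Under the same write lock, a single retain would cover it (sketch): self.tenants.write().expect("lock poisoned").retain(|t| t != tenant_id); with whatever poisoned-lock handling the surrounding code settles on.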


1155-1191: load_tenants / list_tenants shouldn’t silently swallow poisoned locks or have empty branches.
load_tenants has an empty else if !is_multi_tenant {} (Line 1177-1178) and returns Ok(None) on write-lock failure (Line 1185-1190); list_tenants similarly returns None on lock failure.

Also applies to: 1193-1201

src/alerts/mod.rs (2)

1331-1339: Silent write drop still possible in update_state().

The write-back block still uses get_mut(tenant) which silently skips the insert if the tenant bucket doesn't exist. This was flagged in a previous review and should use the same entry().or_default() pattern as update().

🐛 Proposed fix
         {
             let mut write_access = self.alerts.write().await;
-
             let tenant = alert.get_tenant_id().as_ref().map_or(DEFAULT_TENANT, |v| v);
-            if let Some(alerts) = write_access.get_mut(tenant) {
-                alerts.insert(*alert.get_id(), alert.clone_box());
-            }
-            // write_access.insert(*alert.get_id(), alert.clone_box());
+            write_access
+                .entry(tenant.to_owned())
+                .or_default()
+                .insert(*alert.get_id(), alert.clone_box());
         }
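The entry(...).or_default() form creates the per-tenant bucket on first use, so a state write for a brand-new tenant inserts instead of being silently skipped.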

1377-1381: Silent write drop still possible in update_notification_state().

Same issue as update_state() - uses get_mut(tenant) which silently skips the insert if the tenant bucket doesn't exist.

🐛 Proposed fix
         alert
             .update_notification_state(new_notification_state)
             .await?;
-        if let Some(alerts) = write_access.get_mut(tenant) {
-            alerts.insert(*alert.get_id(), alert.clone_box());
-        }
-        // write_access.insert(*alert.get_id(), alert.clone_box());
+        write_access
+            .entry(tenant.to_owned())
+            .or_default()
+            .insert(*alert.get_id(), alert.clone_box());

         Ok(())
🧹 Nitpick comments (9)
src/handlers/http/middleware.rs (1)

329-337: Consider simplifying the suspension check pattern.

The match statement can be simplified using if let for better readability:

♻️ Suggested refactor
 pub fn auth_no_context(req: &mut ServiceRequest, action: Action) -> Result<rbac::Response, Error> {
     // check if tenant is suspended
-    match check_suspension(req.request(), action) {
-        rbac::Response::Suspended(msg) => return Ok(rbac::Response::Suspended(msg)),
-        _ => {}
+    if let rbac::Response::Suspended(msg) = check_suspension(req.request(), action) {
+        return Ok(rbac::Response::Suspended(msg));
     }
     let creds = extract_session_key(req);
     creds.map(|key| Users.authorize(key, action, None, None))
 }

The same pattern applies to auth_resource_context (lines 343-347) and auth_user_context (lines 370-374).

src/handlers/http/query.rs (1)

423-473: Consider propagating distributed stream-load failures instead of always returning Ok(()).

create_streams_for_distributed logs failures (Line 458-463) but the caller proceeds; that can turn “stream couldn’t be loaded” into opaque execution errors later. If callers require these streams to exist, consider collecting task results and returning an error if any load fails.
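A sketch of collecting the results and failing fast; load_stream here stands in for the real per-stream loading call:

use tokio::task::JoinSet;

async fn load_stream(_name: &str) -> anyhow::Result<()> {
    Ok(()) // stand-in for the real loader
}

async fn load_all_streams(streams: Vec<String>) -> anyhow::Result<()> {
    let mut set = JoinSet::new();
    for name in streams {
        set.spawn(async move { load_stream(&name).await.map_err(|e| (name, e)) });
    }
    while let Some(res) = set.join_next().await {
        // `?` surfaces panics/cancellations; the inner Err carries load failures.
        if let Err((name, e)) = res? {
            anyhow::bail!("failed to load stream {name}: {e}");
        }
    }
    Ok(())
}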

src/parseable/mod.rs (1)

252-267: Downgrade/remove tracing::warn! debug logs that can flood production and expose internals.
Examples include check_or_load_stream (Line 257/261), create_stream_and_schema_from_storage (Line 385/392/393/460), and create_stream (Line 821/827-831).

Also applies to: 376-465, 645-871

src/alerts/mod.rs (6)

1045-1050: Consider simplifying tenant Option construction.

The current logic converts empty string to &None and non-empty to &Some(tenant_id.clone()). This works but could be cleaner.

♻️ Suggested simplification
-        for (tenant_id, raw_bytes) in raw_objects {
-            let tenant = if tenant_id.is_empty() {
-                &None
-            } else {
-                &Some(tenant_id.clone())
-            };
+        for (tenant_id, raw_bytes) in raw_objects {
+            let tenant: Option<String> = (!tenant_id.is_empty()).then(|| tenant_id.clone());

Then use &tenant where needed. This avoids the awkward reference-to-temporary pattern.


1165-1169: Remove commented-out code.

This commented-out code block appears to be leftover from refactoring. It should be removed to maintain code clarity.

♻️ Proposed cleanup
             } else {
                 vec![]
             }
-            // alerts_guard
-            //     .values()
-            //     .map(|alert| alert.to_alert_config())
-            //     .collect()
         };

1395-1399: Remove commented-out code.

♻️ Proposed cleanup
         } else {
             warn!("Alert ID- {alert_id} not found in memory!");
         }
-        // if self.alerts.write().await.remove(&alert_id).is_some() {
-        //     trace!("removed alert from memory");
-        // } else {
-        //     warn!("Alert ID- {alert_id} not found in memory!");
-        // }
         Ok(())

1443-1458: Remove commented-out code and approve tenant-scoped logic.

The tenant-scoped tag listing is correct. Remove the commented line for cleanliness.

♻️ Proposed cleanup
     async fn list_tags(&self, tenant_id: &Option<String>) -> Vec<String> {
         let tenant = tenant_id.as_ref().map_or(DEFAULT_TENANT, |v| v);
-        // let alerts = self.alerts.read().await;
         let mut tags = if let Some(alerts) = self.alerts.read().await.get(tenant) {

1470-1472: Remove commented-out code.

♻️ Proposed cleanup
         } else {
             HashMap::new()
         }
-        // let alerts = self.alerts.read().await;
-        // alerts.iter().map(|(k, v)| (*k, v.clone_box())).collect()
     }

1282-1298: Remove large commented-out code block.

This entire block of commented-out code should be removed. It shows the old implementation that has been replaced by the tenant-aware version above.

♻️ Proposed cleanup
             } else {
                 return Err(AlertError::CustomError(format!(
                     "No alert found for the given ID- {alert_id}"
                 )));
             };
-            // let alert = if let Some(alert) = read_access.get(&alert_id) {
-            //     match &alert.get_alert_type() {
-            //         AlertType::Threshold => Box::new(ThresholdAlert::from(alert.to_alert_config()))
-            //             as Box<dyn AlertTrait>,
-            //         AlertType::Anomaly(_) => {
-            //             return Err(AlertError::NotPresentInOSS("anomaly"));
-            //         }
-            //         AlertType::Forecast(_) => {
-            //             return Err(AlertError::NotPresentInOSS("forecast"));
-            //         }
-            //     }
-            // } else {
-            //     return Err(AlertError::CustomError(format!(
-            //         "No alert found for the given ID- {alert_id}"
-            //     )));
-            // };

             let current_state = *alert.get_state();
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be1a48b and ca421da.

📒 Files selected for processing (11)
  • src/alerts/mod.rs
  • src/handlers/http/correlation.rs
  • src/handlers/http/middleware.rs
  • src/handlers/http/query.rs
  • src/handlers/http/users/dashboards.rs
  • src/handlers/http/users/filters.rs
  • src/parseable/mod.rs
  • src/parseable/streams.rs
  • src/rbac/user.rs
  • src/storage/object_storage.rs
  • src/utils/mod.rs
🧰 Additional context used
🧠 Learnings (30)
📚 Learning: 2025-05-01T10:27:56.858Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/handlers/http/users/dashboards.rs:0-0
Timestamp: 2025-05-01T10:27:56.858Z
Learning: The `add_tile()` function in `src/handlers/http/users/dashboards.rs` should use `get_dashboard_by_user(dashboard_id, &user_id)` instead of `get_dashboard(dashboard_id)` to ensure proper authorization checks when modifying a dashboard.

Applied to files:

  • src/handlers/http/correlation.rs
  • src/handlers/http/middleware.rs
  • src/handlers/http/users/filters.rs
  • src/handlers/http/users/dashboards.rs
📚 Learning: 2025-09-05T09:27:12.659Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/users/filters.rs:116-121
Timestamp: 2025-09-05T09:27:12.659Z
Learning: The Filters::load() function in src/users/filters.rs is only called once at server initialization, so there's no risk of duplicate entries from repeated invocations.

Applied to files:

  • src/handlers/http/users/filters.rs
📚 Learning: 2025-02-14T09:49:25.818Z
Learnt from: de-sh
Repo: parseablehq/parseable PR: 1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.

Applied to files:

  • src/utils/mod.rs
  • src/handlers/http/query.rs
  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-06-18T06:39:04.775Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1340
File: src/query/mod.rs:64-66
Timestamp: 2025-06-18T06:39:04.775Z
Learning: In src/query/mod.rs, QUERY_SESSION_STATE and QUERY_SESSION serve different architectural purposes: QUERY_SESSION_STATE is used for stats calculation and allows dynamic registration of individual parquet files from the staging path (files created every minute), while QUERY_SESSION is used for object store queries with the global schema provider. Session contexts with schema providers don't support registering individual tables/parquets, so both session objects are necessary for their respective use cases.

Applied to files:

  • src/handlers/http/query.rs
  • src/alerts/mod.rs
📚 Learning: 2025-06-18T12:44:31.983Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1347
File: src/handlers/http/query.rs:0-0
Timestamp: 2025-06-18T12:44:31.983Z
Learning: The counts API in src/handlers/http/query.rs does not currently support group_by functionality in COUNT queries, so the hard-coded fields array ["start_time", "end_time", "count"] is appropriate for the current scope.

Applied to files:

  • src/handlers/http/query.rs
📚 Learning: 2025-10-28T02:10:41.140Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1453
File: src/parseable/mod.rs:397-400
Timestamp: 2025-10-28T02:10:41.140Z
Learning: In Parseable enterprise deployments with multiple query nodes, hot tier configuration must be persisted in object storage so that newly started query nodes can fetch and synchronize the hot tier settings at startup (file: src/parseable/mod.rs, function: create_stream_and_schema_from_storage).

Applied to files:

  • src/handlers/http/query.rs
  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-10-21T02:22:24.403Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/parseable/mod.rs:419-432
Timestamp: 2025-10-21T02:22:24.403Z
Learning: In Parseable's internal stream creation (`create_internal_stream_if_not_exists` in `src/parseable/mod.rs`), errors should not propagate to fail server initialization. The function creates both pmeta and pbilling internal streams, and failures are logged but the function always returns `Ok(())` to ensure server startup resilience. Individual stream creation failures should not prevent syncing of successfully created streams.

Applied to files:

  • src/handlers/http/query.rs
  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-05-01T12:22:42.363Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/users/dashboards.rs:154-165
Timestamp: 2025-05-01T12:22:42.363Z
Learning: Title validation for dashboards is performed in the `create_dashboard` HTTP handler function rather than in the `DASHBOARDS.create` method, avoiding redundant validation.

Applied to files:

  • src/handlers/http/users/dashboards.rs
📚 Learning: 2025-05-01T10:33:51.767Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.

Applied to files:

  • src/handlers/http/users/dashboards.rs
📚 Learning: 2025-03-26T06:44:53.362Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Applied to files:

  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-25T01:31:41.786Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metadata.rs:63-68
Timestamp: 2025-08-25T01:31:41.786Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metadata.rs and src/storage/object_storage.rs are designed to track total events across all streams, not per-stream. They use labels [origin, parsed_date] to aggregate by format and date, while per-stream metrics use [stream_name, origin, parsed_date] labels.

Applied to files:

  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/parseable/streams.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-07-28T17:10:39.448Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.448Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, v5_v6 function produces v7 format output when these constants are set to "v7", not v6 format.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-08-25T01:32:25.980Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:163-173
Timestamp: 2025-08-25T01:32:25.980Z
Learning: The TOTAL_EVENTS_INGESTED_DATE, TOTAL_EVENTS_INGESTED_SIZE_DATE, and TOTAL_EVENTS_STORAGE_SIZE_DATE metrics in src/metrics/mod.rs are intentionally designed to track global totals across all streams for a given date, using labels ["format", "date"] rather than per-stream labels. This is the correct design for global aggregation purposes.

Applied to files:

  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-09T14:08:45.809Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1427
File: resources/ingest_demo_data.sh:440-440
Timestamp: 2025-09-09T14:08:45.809Z
Learning: In the resources/ingest_demo_data.sh demo script, hardcoded stream names like "demodata" in alert queries should be ignored and not flagged for replacement with $P_STREAM variables.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-10-20T17:48:53.444Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1448
File: src/handlers/http/cluster/mod.rs:1370-1400
Timestamp: 2025-10-20T17:48:53.444Z
Learning: In src/handlers/http/cluster/mod.rs, the billing metrics processing logic should NOT accumulate counter values from multiple Prometheus samples with the same labels. The intended behavior is to convert each received counter from nodes into individual events for ingestion, using `.insert()` to store the counter value directly.

Applied to files:

  • src/parseable/streams.rs
📚 Learning: 2025-09-18T09:52:07.554Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/storage/object_storage.rs:173-177
Timestamp: 2025-09-18T09:52:07.554Z
Learning: In Parseable's upload system (src/storage/object_storage.rs), the update_storage_metrics function can safely use path.metadata().map_err() to fail on local file metadata read failures because parquet validation (validate_uploaded_parquet_file) ensures file integrity before this step, and the system guarantees local staging files remain accessible throughout the upload flow.

Applied to files:

  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T12:37:47.732Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/parseable/mod.rs:528-533
Timestamp: 2025-08-18T12:37:47.732Z
Learning: In Parseable, the validate_time_partition function in src/utils/json/flatten.rs already provides a default time partition limit of 30 days using `map_or(30, |days| days.get() as i64)` when time_partition_limit is None, so no additional defaulting is needed in the stream creation logic in src/parseable/mod.rs.

Applied to files:

  • src/parseable/streams.rs
  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T14:56:18.463Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/storage/object_storage.rs:997-1040
Timestamp: 2025-08-18T14:56:18.463Z
Learning: In Parseable's staging upload system (src/storage/object_storage.rs), failed parquet file uploads should remain in the staging directory for retry in the next sync cycle, while successful uploads remove their staged files immediately. Early return on first error in collect_upload_results is correct behavior as concurrent tasks handle their own cleanup and failed files need to stay for retry.

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-03-26T06:13:48.898Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-18T19:10:11.941Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/ingest.rs:163-164
Timestamp: 2025-08-18T19:10:11.941Z
Learning: Field statistics calculation in src/storage/field_stats.rs uses None for the time_partition parameter when calling flatten_and_push_logs(), as field stats generation does not require time partition functionality.

Applied to files:

  • src/parseable/mod.rs
  • src/storage/object_storage.rs
📚 Learning: 2025-08-21T11:47:01.279Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1410
File: src/storage/object_storage.rs:0-0
Timestamp: 2025-08-21T11:47:01.279Z
Learning: In Parseable's object storage implementation (src/storage/object_storage.rs), the hour and minute directory prefixes (hour=XX, minute=YY) are generated from arrow file timestamps following proper datetime conventions, so they are guaranteed to be within valid ranges (0-23 for hours, 0-59 for minutes) and don't require additional range validation.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-08-21T14:41:55.462Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1410
File: src/storage/object_storage.rs:876-916
Timestamp: 2025-08-21T14:41:55.462Z
Learning: In Parseable's object storage system (src/storage/object_storage.rs), date directories (date=YYYY-MM-DD) are only created when there's actual data to store, which means they will always contain corresponding hour and minute subdirectories. There can be no case where a date directory exists without hour or minute subdirectories.

Applied to files:

  • src/parseable/mod.rs
📚 Learning: 2025-09-14T15:17:59.234Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1432
File: src/storage/object_storage.rs:124-132
Timestamp: 2025-09-14T15:17:59.234Z
Learning: In Parseable's upload validation system (src/storage/object_storage.rs), the validate_uploaded_parquet_file function should not include bounded retries for metadata consistency issues. Instead, failed validations rely on the 30-second sync cycle for natural retries, with staging files preserved when manifest_file is set to None.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-08-20T17:01:25.791Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1409
File: src/storage/field_stats.rs:429-456
Timestamp: 2025-08-20T17:01:25.791Z
Learning: In Parseable's field stats calculation (src/storage/field_stats.rs), the extract_datetime_from_parquet_path_regex function correctly works with filename-only parsing because Parseable's server-side filename generation guarantees the dot-separated format date=YYYY-MM-DD.hour=HH.minute=MM pattern in parquet filenames.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-06-16T09:50:38.636Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-08-18T18:01:22.834Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1405
File: src/handlers/http/modal/utils/ingest_utils.rs:271-292
Timestamp: 2025-08-18T18:01:22.834Z
Learning: In Parseable's ingestion validation, validate_stream_for_ingestion is designed to prevent regular log ingestion endpoints (ingest() and post_event()) from ingesting into streams that exclusively contain OTEL traces or metrics. The function allows mixed streams (regular logs + OTEL) but blocks ingestion into OTEL-only streams, maintaining architectural separation between regular log and OTEL ingestion pathways.

Applied to files:

  • src/storage/object_storage.rs
📚 Learning: 2025-07-24T11:09:21.781Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1388
File: src/alerts/mod.rs:88-104
Timestamp: 2025-07-24T11:09:21.781Z
Learning: In the Parseable alert system (src/alerts/mod.rs), alert versions are server-generated and controlled via CURRENT_ALERTS_VERSION constant, not user input. The AlertVerison enum's From<&str> implementation correctly defaults unknown versions to V2 since the server only generates known versions (v1, v2). Unknown versions would only occur in exceptional cases like file corruption, making the current fallback approach appropriate.

Applied to files:

  • src/alerts/mod.rs
📚 Learning: 2025-08-14T10:14:50.453Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1398
File: src/alerts/mod.rs:712-718
Timestamp: 2025-08-14T10:14:50.453Z
Learning: In the alerts module, get_number_of_agg_exprs() function validates that exactly 1 aggregate expression is present in SQL queries before other aggregate-related processing, which prevents empty aggr_expr scenarios in downstream functions like _get_aggregate_projection().

Applied to files:

  • src/alerts/mod.rs
🧬 Code graph analysis (9)
src/handlers/http/correlation.rs (1)
src/utils/mod.rs (4)
  • get_hash (102-107)
  • get_tenant_id_from_request (84-90)
  • get_user_and_tenant_from_request (61-82)
  • user_auth_for_datasets (121-187)
src/handlers/http/middleware.rs (2)
src/rbac/mod.rs (1)
  • roles_to_permission (313-328)
src/utils/mod.rs (1)
  • get_user_and_tenant_from_request (61-82)
src/handlers/http/users/filters.rs (1)
src/utils/mod.rs (3)
  • get_hash (102-107)
  • get_user_and_tenant_from_request (61-82)
  • is_admin (189-206)
src/utils/mod.rs (6)
src/utils/actix.rs (2)
  • req (31-31)
  • extract_session_key_from_req (51-71)
src/handlers/http/rbac.rs (1)
  • Users (74-74)
src/rbac/user.rs (1)
  • username (262-264)
src/handlers/http/cluster/mod.rs (3)
  • stream (1036-1039)
  • stream (1199-1202)
  • stream (1574-1577)
src/cli.rs (1)
  • format (669-670)
src/parseable/mod.rs (1)
  • storage (338-340)
src/handlers/http/query.rs (3)
src/utils/mod.rs (1)
  • get_tenant_id_from_request (84-90)
src/handlers/http/mod.rs (1)
  • fetch_schema (89-94)
src/event/mod.rs (1)
  • commit_schema (136-157)
src/handlers/http/users/dashboards.rs (2)
src/utils/mod.rs (4)
  • get_hash (102-107)
  • get_tenant_id_from_request (84-90)
  • get_user_and_tenant_from_request (61-82)
  • is_admin (189-206)
src/users/dashboards.rs (4)
  • dashboards (435-439)
  • dashboards (462-466)
  • get_dashboard (364-383)
  • list_tags (458-478)
src/parseable/streams.rs (2)
src/parseable/mod.rs (1)
  • new (196-211)
src/rbac/mod.rs (1)
  • contains (163-169)
src/storage/object_storage.rs (3)
src/storage/field_stats.rs (1)
  • calculate_field_stats (88-178)
src/parseable/streams.rs (3)
  • new (118-136)
  • len (1105-1111)
  • v (606-608)
src/catalog/mod.rs (1)
  • update_snapshot (108-130)
src/rbac/user.rs (2)
src/rbac/mod.rs (1)
  • add_roles (142-150)
src/rbac/map.rs (2)
  • roles (80-88)
  • mut_sessions (106-112)
🔇 Additional comments (22)
src/handlers/http/correlation.rs (2)

106-112: LGTM!

The delete handler correctly extracts both user_id and tenant_id from the authenticated session and passes them to CORRELATIONS.delete. This is the correct pattern that should be followed by the other handlers.


73-78: No action needed. The tenant_id is properly handled through the session_key mechanism. The CORRELATIONS.create() method extracts the tenant_id directly from the session_key (via get_tenant_id_from_key()) and uses it to store the correlation in both the metastore and in-memory cache. Discarding the request-level tenant_id is intentional and correct—it ensures the correlation is created under the authenticated user's tenant derived from the session, not from untrusted request data.

Likely an incorrect or invalid review comment.

src/handlers/http/users/filters.rs (4)

49-54: LGTM!

The get handler correctly extracts tenant_id from the authenticated session and passes it to get_filter for proper tenant isolation.


68-75: LGTM!

The post handler correctly propagates tenant_id to both the metastore (put_filter) and the in-memory cache (FILTERS.update).


85-104: LGTM!

The update handler correctly uses tenant_id for both authorization (checking filter ownership) and persistence operations.


113-128: LGTM!

The delete handler correctly implements tenant-aware deletion with proper authorization checks.

src/handlers/http/middleware.rs (2)

200-290: LGTM!

The token refresh logic correctly threads tenant_id through user lookups, permission resolution, and session tracking with appropriate DEFAULT_TENANT fallback.


305-307: LGTM!

Properly maps the Suspended response to ErrorBadRequest with the suspension message.

src/utils/mod.rs (4)

61-82: LGTM!

The get_user_and_tenant_from_request function correctly handles both BasicAuth and SessionId authentication methods, and properly normalizes DEFAULT_TENANT to None for consistent downstream handling.


92-100: LGTM!

The get_tenant_id_from_key function correctly extracts tenant context from a SessionKey with consistent DEFAULT_TENANT normalization.


109-119: LGTM!

The user_auth_for_query function correctly derives tenant_id from the session key and propagates it to dataset authorization.


121-157: LGTM!

The user_auth_for_datasets function correctly accepts tenant_id and uses it for tenant-scoped stream existence checks and metadata lookups.

src/handlers/http/users/dashboards.rs (4)

99-104: LGTM!

The create_dashboard handler correctly uses session-based tenant extraction and propagates tenant_id to the create operation.


113-191: LGTM!

The update_dashboard handler correctly uses session-based tenant extraction and propagates tenant_id through authorization and update operations.


201-208: LGTM!

The delete_dashboard handler correctly uses session-based tenant extraction.


223-242: LGTM!

The add_tile handler correctly uses session-based tenant extraction and get_dashboard_by_user for proper authorization checks. Based on learnings, this aligns with the expected pattern.

src/alerts/mod.rs (6)

107-150: LGTM!

The migrate_from_v1 function correctly propagates tenant_id through the migration flow - passing it to build_query_from_v1, storing it in the migrated alert config, and using it when persisting via put_alert.


187-207: LGTM!

Tenant context is correctly threaded through build_query_from_v1add_where_conditionsfetch_schema, ensuring schema lookups during v1 migration are tenant-aware.

Also applies to: 274-301


620-624: LGTM!

Target lookup during notification dispatch correctly uses the alert's tenant_id.


1224-1240: LGTM!

The get_alert_by_id method correctly handles tenant-scoped lookups with appropriate error handling when the alert is not found.


1242-1251: Previous issue addressed.

The update() method now correctly uses entry(tenant.to_owned()).or_default() to ensure the tenant bucket exists before insertion, preventing silent write drops for first-time tenants.


1404-1420: LGTM!

The get_state() method correctly handles tenant-scoped lookups with appropriate error messaging.


Comment on lines +126 to +131
let schema_names = session_state
    .catalog_list()
    .catalog("datafusion")
    .unwrap()
    .schema_names();

⚠️ Potential issue | 🟡 Minor

Remove unused schema_names (can fail builds with deny(warnings)).
Lines 126-131 compute schema_names and never use it.

Proposed diff
-    let schema_names = session_state
-        .catalog_list()
-        .catalog("datafusion")
-        .unwrap()
-        .schema_names();
-    
🤖 Prompt for AI Agents
In `@src/handlers/http/query.rs` around lines 126-131: the local binding
schema_names is computed from
session_state.catalog_list().catalog("datafusion").unwrap().schema_names() but
never used; remove the unused variable and its computation or, if you intended
to use it later, reference it appropriately—specifically delete the let
schema_names = ... statement (or rename to _schema_names if you want to suppress
the unused warning temporarily) so the build won’t fail under deny(warnings).

Comment on lines +374 to 383
 pub fn add_roles(&mut self, roles: HashSet<String>, tenant_id: &str) -> Result<(), RBACError> {
     if roles.is_empty() {
         return Ok(());
     }
     self.roles.extend(roles);
     // also refresh all user sessions
     for group_user in &self.users {
-        mut_sessions().remove_user(group_user.userid());
+        mut_sessions().remove_user(group_user.userid(), tenant_id);
     }
     Ok(())
⚠️ Potential issue | 🟠 Major

UserGroup::add_roles should invalidate sessions using each user’s tenant (not the caller-provided tenant_id).
Lines 381-382 use the function parameter for all users, but other mutation paths use group_user.tenant_id. Aligning this avoids invalidating the wrong tenant’s sessions if the group ever contains mixed-tenant users (even accidentally).
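A minimal diff along those lines, assuming GroupUser exposes a tenant_id() accessor as the other mutation paths suggest:

     for group_user in &self.users {
-        mut_sessions().remove_user(group_user.userid(), tenant_id);
+        mut_sessions().remove_user(group_user.userid(), group_user.tenant_id());
     }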

🤖 Prompt for AI Agents
In `@src/rbac/user.rs` around lines 374-383: the add_roles method extends
self.roles then refreshes sessions using the caller-provided tenant_id for every
user, which can invalidate the wrong tenant; update the session invalidation
loop in UserGroup::add_roles to call
mut_sessions().remove_user(group_user.userid(), group_user.tenant_id()) (i.e.,
use each group_user.tenant_id instead of the function parameter) so session
invalidation uses each user’s actual tenant.

Comment on lines +939 to 949
tracing::warn!(process_parquet_files_path=?path);
spawn_parquet_upload_task(
    &mut join_set,
    semaphore.clone(),
    object_store.clone(),
    upload_context,
    stream_name,
    path,
    tenant_id.clone(),
)
.await;
⚠️ Potential issue | 🟡 Minor

Downgrade per-file tracing::warn! logs during sync.
process_parquet_files logs every path at warn (Line 939) and schema processing logs each schema file path (Line 1061); these should be debug/trace unless there’s an actionable operator issue.

Also applies to: 1055-1069
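For the line quoted above, the downgrade is a one-word change:

-            tracing::warn!(process_parquet_files_path=?path);
+            tracing::debug!(process_parquet_files_path=?path);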

🤖 Prompt for AI Agents
In `@src/storage/object_storage.rs` around lines 939-949: replace noisy per-file
warning logs with lower-severity debug/trace logs: change the tracing::warn!
that logs process_parquet_files_path in process_parquet_files to tracing::debug!
(or tracing::trace! if extremely verbose) and likewise change the
tracing::warn!/warn! usage around schema processing in the same module (the
block that iterates schema files ~lines 1055–1069) to tracing::debug!/trace!.
Update any similar per-file warn calls passed into spawn_parquet_upload_task or
nearby helpers so only actionable operator issues remain at warn level; keep
error-level logs for actual failures.
