Skip to content

feat(BA-2938): Migrate Session data to RBAC database#9636

Draft
fregataa wants to merge 6 commits intomainfrom
feature/BA-2938-migrate-session-rbac
Draft

feat(BA-2938): Migrate Session data to RBAC database#9636
fregataa wants to merge 6 commits intomainfrom
feature/BA-2938-migrate-session-rbac

Conversation

@fregataa
Copy link
Member

@fregataa fregataa commented Mar 4, 2026

Summary

  • Add SESSION entity-type permissions to all role+scope combinations (skip domain member roles)
  • Create AUTO edges from User scope → Session (via user_uuid) and Project scope → Session (via group_id)
  • Follow new RBAC pattern with direct role+scope in permissions table (no permission_groups)
  • Use keyset pagination for scalability with large session tables

This migration brings Session entities into the RBAC system, enabling fine-grained access control. Unlike VFolder, Session has no invitation/sharing mechanism, making the migration simpler with only entity-type permissions and AUTO edges.

Test plan

  • Migration applies successfully (alembic upgrade head)
  • Migration rolls back correctly (alembic downgrade -1)
  • All quality checks pass (fmt, lint, check)
  • CI tests pass
  • Manual verification in dev/staging environment

Resolves BA-2938

Add alembic migration to migrate Session entities to the new RBAC system:

- Add SESSION entity-type permissions to all role+scope combinations
  - Skip domain member roles (scope too broad)
  - Member roles get READ operation only
  - Owner/admin roles get all operations
- Create AUTO edges from User scope → Session (via user_uuid)
- Create AUTO edges from Project scope → Session (via group_id)
- Use keyset pagination for scalability with large session tables
- Support both upgrade and downgrade operations

This migration follows the new RBAC pattern where permissions table
includes role_id, scope_type, scope_id directly (no permission_groups).
The association_scopes_entities table uses relation_type='auto' to mark
automatically managed scope-entity relationships.

Unlike VFolder, Session has no invitation/sharing mechanism, making the
migration simpler with only entity-type permissions and AUTO edges.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 4, 2026 18:17
@github-actions github-actions bot added the size:L 100~500 LoC label Mar 4, 2026
@github-actions github-actions bot added comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated labels Mar 4, 2026
@fregataa fregataa modified the milestones: Backlog, 26.3 Mar 4, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates Session entities into the RBAC system by adding entity-type permissions and creating AUTO scope→entity associations via an Alembic migration.

Changes:

  • Added an Alembic migration to backfill SESSION permissions for existing role+scope combinations.
  • Added AUTO association edges from user/project scopes to Session records using batched pagination.
  • Added a changelog entry describing the migration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File Description
src/ai/backend/manager/models/alembic/versions/30c8308738ee_migrate_session_data_to_rbac.py Implements the RBAC data migration for Session permissions and scope associations.
changes/9636.feature.md Documents the feature/migration at a high level.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +68 to +71
values_list.append(
f"('{row.role_id}', '{row.scope_type}', '{row.scope_id}', "
f"'{EntityType.SESSION.value}', '{operation.value}')"
)
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds SQL by interpolating database values into the statement text. Even if these fields are expected to be UUID/enums, this pattern is brittle (quoting/escaping issues) and makes it easier for unexpected data to break the migration. Prefer parameterized inserts (e.g., executemany with bind params) or a set-based INSERT ... SELECT that avoids string-building.

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +104
last_id = "00000000-0000-0000-0000-000000000000"
while True:
query = sa.text("""
SELECT id::text AS id, user_uuid::text AS user_uuid
FROM sessions
WHERE id::text > :last_id
ORDER BY id
LIMIT :limit
""")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasons why I don't use OFFSET here:

Using OFFSET to migrate huge data is inefficient.

fregataa added 4 commits March 5, 2026 03:37
…tent ordering

Replace text-based UUID comparison with UUID type comparison to avoid
potential ordering discrepancies between lexicographic (text) and binary
(UUID) sort orders.

Changes:
- Import UUID from uuid module
- Initialize last_id as UUID object instead of string
- Remove ::text casting in WHERE and SELECT clauses
- Use native UUID comparison (id > :last_id) instead of text comparison

This ensures that filtering and ordering use the same sort order,
preventing potential data skips or duplicates during batch processing.

Addresses Copilot review comment about UUID/text ordering mismatch.
…LECT

Replace application-side OFFSET pagination with a single set-based
INSERT ... SELECT query for better performance and consistency.

Changes:
- Remove while loop with OFFSET pagination
- Use CTE (WITH clause) to derive role+scope combinations
- Use UNION ALL to combine member and owner operations
- Use unnest() to expand operation arrays inline
- Single database round-trip instead of multiple batches

Benefits:
- O(1) time complexity vs O(N) with OFFSET
- No risk of row skips/duplicates from concurrent changes
- Simpler code without manual batching logic
- Better query plan from database optimizer

Addresses Copilot review comment about OFFSET inefficiency.
…d queries

Replace string interpolation in INSERT and DELETE statements with
parameterized queries to improve security and maintainability.

Changes for INSERT:
- Build values_list with dicts instead of string formatting
- Use parameterized query with :named parameters
- Execute individual inserts in loop (safe for ON CONFLICT)

Changes for DELETE:
- Replace 'SELECT + string join' pattern with subquery
- Use DELETE ... WHERE id IN (SELECT ... LIMIT N)
- Check rowcount instead of empty result set
- Keep all parameters bound safely

Benefits:
- Eliminates SQL injection risks from malformed UUIDs
- Prevents quoting/escaping issues
- Avoids oversized query strings with large batches
- More maintainable and readable code

Addresses Copilot review comments about SQL injection and DELETE pattern.
Rename constant to use more accurate terminology. 'SUFFIX' better
describes the pattern matching with str.endswith() than 'POSTFIX'.

Addresses Copilot review comment about naming convention.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +113 to +187
# Process User scope edges
last_id = UUID("00000000-0000-0000-0000-000000000000")
while True:
query = sa.text("""
SELECT id, user_uuid
FROM sessions
WHERE id > :last_id
ORDER BY id
LIMIT :limit
""")
rows = db_conn.execute(query, {"last_id": last_id, "limit": BATCH_SIZE}).all()
if not rows:
break

last_id = rows[-1].id

# Bulk insert using parameterized query
values_list = [
{
"scope_type": "user",
"scope_id": str(row.user_uuid),
"entity_type": entity_type,
"entity_id": str(row.id),
"relation_type": relation_type,
}
for row in rows
]

if values_list:
insert_query = sa.text("""
INSERT INTO association_scopes_entities (scope_type, scope_id, entity_type, entity_id, relation_type)
VALUES (:scope_type, :scope_id, :entity_type, :entity_id, :relation_type)
ON CONFLICT (scope_type, scope_id, entity_id) DO NOTHING
""")
for values in values_list:
db_conn.execute(insert_query, values)

# Process Project scope edges
last_id = UUID("00000000-0000-0000-0000-000000000000")
while True:
query = sa.text("""
SELECT id, group_id
FROM sessions
WHERE id > :last_id
ORDER BY id
LIMIT :limit
""")
rows = db_conn.execute(query, {"last_id": last_id, "limit": BATCH_SIZE}).all()
if not rows:
break

last_id = rows[-1].id

# Bulk insert using parameterized query
values_list = [
{
"scope_type": "project",
"scope_id": str(row.group_id),
"entity_type": entity_type,
"entity_id": str(row.id),
"relation_type": relation_type,
}
for row in rows
]

if values_list:
insert_query = sa.text("""
INSERT INTO association_scopes_entities (scope_type, scope_id, entity_type, entity_id, relation_type)
VALUES (:scope_type, :scope_id, :entity_type, :entity_id, :relation_type)
ON CONFLICT (scope_type, scope_id, entity_id) DO NOTHING
""")
for values in values_list:
db_conn.execute(insert_query, values)


Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project-scope association block largely duplicates the user-scope block above. Refactoring the shared keyset-pagination + insert logic into a helper would reduce duplication and the risk of fixing a bug in only one path.

Suggested change
# Process User scope edges
last_id = UUID("00000000-0000-0000-0000-000000000000")
while True:
query = sa.text("""
SELECT id, user_uuid
FROM sessions
WHERE id > :last_id
ORDER BY id
LIMIT :limit
""")
rows = db_conn.execute(query, {"last_id": last_id, "limit": BATCH_SIZE}).all()
if not rows:
break
last_id = rows[-1].id
# Bulk insert using parameterized query
values_list = [
{
"scope_type": "user",
"scope_id": str(row.user_uuid),
"entity_type": entity_type,
"entity_id": str(row.id),
"relation_type": relation_type,
}
for row in rows
]
if values_list:
insert_query = sa.text("""
INSERT INTO association_scopes_entities (scope_type, scope_id, entity_type, entity_id, relation_type)
VALUES (:scope_type, :scope_id, :entity_type, :entity_id, :relation_type)
ON CONFLICT (scope_type, scope_id, entity_id) DO NOTHING
""")
for values in values_list:
db_conn.execute(insert_query, values)
# Process Project scope edges
last_id = UUID("00000000-0000-0000-0000-000000000000")
while True:
query = sa.text("""
SELECT id, group_id
FROM sessions
WHERE id > :last_id
ORDER BY id
LIMIT :limit
""")
rows = db_conn.execute(query, {"last_id": last_id, "limit": BATCH_SIZE}).all()
if not rows:
break
last_id = rows[-1].id
# Bulk insert using parameterized query
values_list = [
{
"scope_type": "project",
"scope_id": str(row.group_id),
"entity_type": entity_type,
"entity_id": str(row.id),
"relation_type": relation_type,
}
for row in rows
]
if values_list:
insert_query = sa.text("""
INSERT INTO association_scopes_entities (scope_type, scope_id, entity_type, entity_id, relation_type)
VALUES (:scope_type, :scope_id, :entity_type, :entity_id, :relation_type)
ON CONFLICT (scope_type, scope_id, entity_id) DO NOTHING
""")
for values in values_list:
db_conn.execute(insert_query, values)
def _paginate_and_associate(scope_type: str, scope_id_column: str) -> None:
last_id = UUID("00000000-0000-0000-0000-000000000000")
insert_query = sa.text("""
INSERT INTO association_scopes_entities (scope_type, scope_id, entity_type, entity_id, relation_type)
VALUES (:scope_type, :scope_id, :entity_type, :entity_id, :relation_type)
ON CONFLICT (scope_type, scope_id, entity_id) DO NOTHING
""")
while True:
query = sa.text(f"""
SELECT id, {scope_id_column} AS scope_id
FROM sessions
WHERE id > :last_id
ORDER BY id
LIMIT :limit
""")
rows = db_conn.execute(
query,
{"last_id": last_id, "limit": BATCH_SIZE},
).all()
if not rows:
break
last_id = rows[-1].id
# Bulk insert using parameterized query
values_list = [
{
"scope_type": scope_type,
"scope_id": str(row.scope_id),
"entity_type": entity_type,
"entity_id": str(row.id),
"relation_type": relation_type,
}
for row in rows
]
if values_list:
for values in values_list:
db_conn.execute(insert_query, values)
# Process User scope edges
_paginate_and_associate("user", "user_uuid")
# Process Project scope edges
_paginate_and_associate("project", "group_id")

Copilot uses AI. Check for mistakes.
Comment on lines +147 to +149
for values in values_list:
db_conn.execute(insert_query, values)

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This inserts association rows one-by-one inside the batch (for values in values_list: db_conn.execute(...)). For large sessions tables this causes many DB round-trips and can make the migration impractically slow. Prefer a single executemany call (passing the full values_list to execute) or a set-based insert per batch.

Copilot uses AI. Check for mistakes.
Comment on lines +184 to +186
for values in values_list:
db_conn.execute(insert_query, values)

Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This batch insert also executes one INSERT per row. Please switch to executemany (single execute call with a list of param dicts) or a set-based INSERT ... SELECT to avoid O(n) database calls.

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +220
delete_query = sa.text("""
DELETE FROM association_scopes_entities
WHERE id IN (
SELECT id FROM association_scopes_entities
WHERE entity_type = :entity_type
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration comment says it removes SESSION "AUTO" edges, but this DELETE filters only by entity_type. Either add a relation_type = 'auto' filter or update the wording; current behavior removes all session associations regardless of relation_type.

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +39
# Precompute operation lists
member_ops = [op.value for op in OperationType.member_operations()]
owner_ops = [op.value for op in OperationType.owner_operations()]
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OperationType.member_operations() / owner_operations() return sets; building lists directly from them can produce nondeterministic ordering. It’s safer to sort the operation values to keep the migration deterministic and easier to reason about.

Suggested change
# Precompute operation lists
member_ops = [op.value for op in OperationType.member_operations()]
owner_ops = [op.value for op in OperationType.owner_operations()]
# Precompute operation lists (sorted for deterministic ordering)
member_ops = sorted(op.value for op in OperationType.member_operations())
owner_ops = sorted(op.value for op in OperationType.owner_operations())

Copilot uses AI. Check for mistakes.
@fregataa fregataa marked this pull request as draft March 5, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:manager Related to Manager component require:db-migration Automatically set when alembic migrations are added or updated size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants