fix(databricks): sanitize username in memtable volume name#709

Merged
Bl3f merged 2 commits into getnao:main from jeannicolasfine:fix/databricks-dotted-username-volume on May 7, 2026

@jeannicolasfine
Contributor

Problem

Users with a dot in their OS username (e.g. john.doe) get a UC_VOLUME_NOT_FOUND error when connecting to Databricks:

[UC_VOLUME_NOT_FOUND] Volume `catalog`.`schema`.`john`.`doe-py=3124-pid=98765` does not exist.

Ibis builds the memtable_volume name as {getpass.getuser()}-py={version}-pid={pid}. When the username contains a dot, Databricks Unity Catalog interprets it as a path separator, producing an invalid 4-part identifier instead of a valid 3-part one.
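To illustrate the failure mode, here is a minimal sketch that rebuilds the default name following the pattern described above and counts the dot-separated parts of the resulting identifier. The helper names `default_volume_name` and `count_identifier_parts` are hypothetical, and the naive split on dots only approximates how Unity Catalog parses an unquoted identifier.

```python
def default_volume_name(user: str, py_version: str, pid: int) -> str:
    """Mimic the pattern {user}-py={version}-pid={pid} described above."""
    return f"{user}-py={py_version}-pid={pid}"


def count_identifier_parts(catalog: str, schema: str, volume: str) -> int:
    """Naively count the dot-separated parts of catalog.schema.volume."""
    return len(f"{catalog}.{schema}.{volume}".split("."))


plain = default_volume_name("johndoe", "3124", 98765)
dotted = default_volume_name("john.doe", "3124", 98765)

print(count_identifier_parts("catalog", "schema", plain))   # 3
print(count_identifier_parts("catalog", "schema", dotted))  # 4
```

The dotted username adds one extra part, which matches the 4-part name shown in the error message.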

Fix

Sanitize the username before constructing the volume name by replacing any character outside [a-zA-Z0-9_-] with an underscore, then pass the result explicitly as memtable_volume to ibis.databricks.connect().
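A minimal sketch of that sanitization step, assuming the version component is the concatenated digits of `sys.version_info` (inferred from the `py=3124` in the error message, not confirmed here). The function names `sanitize_user` and `build_memtable_volume` are hypothetical and are not taken from the PR diff.

```python
import getpass
import os
import re
import sys
from typing import Optional


def sanitize_user(user: str) -> str:
    """Replace every character outside [a-zA-Z0-9_-] with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", user)


def build_memtable_volume(user: Optional[str] = None) -> str:
    """Build a volume name free of dots, following the pattern above."""
    if user is None:
        user = getpass.getuser()
    # Assumption: version digits are concatenated, e.g. 3.12.4 -> "3124".
    version = "".join(map(str, sys.version_info[:3]))
    return f"{sanitize_user(user)}-py={version}-pid={os.getpid()}"


print(sanitize_user("john.doe"))           # john_doe
print(sanitize_user("jean-nicolas.fine"))  # jean-nicolas_fine
```

The returned string would then be passed as the `memtable_volume` argument to `ibis.databricks.connect()` alongside the usual connection arguments.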

Test

Run nao debug with a Databricks connection configured — previously failed with UC_VOLUME_NOT_FOUND, now connects successfully.

🤖 Generated with Claude Code

@github-actions
Contributor

github-actions Bot commented May 4, 2026

This PR was auto-closed. Only contributors approved with lgtm can open PRs. Open an issue first.

Maintainers review auto-closed issues daily. Issues that do not meet the quality bar in CONTRIBUTING.md will not be reopened or receive a reply.

If a maintainer replies lgtmi, your future issues will stay open. If a maintainer replies lgtm, your future issues and PRs will stay open.

See CONTRIBUTING.md.

@github-actions github-actions Bot closed this May 4, 2026
@github-actions
Contributor

github-actions Bot commented May 4, 2026

🧹 Preview Removed

The preview deployment for this PR has been cleaned up.

Contributor

@Bl3f Bl3f left a comment

Small question


def connect(self) -> BaseBackend:
    """Create an Ibis Databricks connection."""
    import getpass
Contributor

These 3 imports should be imported at the top level of the file.

Contributor Author

Moved to the top level of the file and trimmed to the strictly necessary ones.

# Usernames with dots (e.g. "jean-nicolas.fine") break UC volume name parsing:
# Databricks treats the dot as a catalog/schema separator, creating a 4-part
# path instead of a valid 3-part one. Sanitize before ibis constructs the name.
safe_user = re.sub(r"[^a-zA-Z0-9_-]", "_", getpass.getuser())
Contributor

Genuine question: why do you need getpass.getuser(), which returns the login name of the OS user running the server process? That seems odd, since we already have a user/password given in nao_config.yaml.

I'm not sure it's a good idea to fetch a username on the fly like this.

Contributor

OK, I just had a look at https://github.com/ibis-project/ibis/blob/e0402155f017fb94bf1129e26773d9ffff2136f5/ibis/backends/databricks/__init__.py#L642. I don't fully understand why Databricks/Ibis needs this, but 🤷

Contributor Author

Thanks for your feedback, @Bl3f.
You're right, there is a better and shorter way to handle it.
Instead of calling getpass.getuser(), we can use the connection name set in nao_config.yaml, which avoids relying on the OS username.
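The merged diff is not shown in this conversation, so the following is only a guess at the shape of that alternative. `DatabricksConfig`, its `name` field, and `memtable_volume_for` are hypothetical, and keeping a PID suffix is an assumption rather than something the PR confirms.

```python
import os
import re
from dataclasses import dataclass


@dataclass
class DatabricksConfig:
    """Hypothetical stand-in for one connection entry in nao_config.yaml."""

    name: str


def memtable_volume_for(config: DatabricksConfig) -> str:
    """Derive the volume name from the connection name, not the OS user."""
    # Same character whitelist as the original fix, applied to the config name.
    safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", config.name)
    # Assumption: a PID suffix keeps concurrent processes from colliding.
    return f"{safe_name}-pid={os.getpid()}"


print(memtable_volume_for(DatabricksConfig(name="prod.warehouse")))
```

Deriving the name from configuration makes it independent of the machine the server runs on, which addresses the concern raised in the review.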

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file

@jeannicolasfine jeannicolasfine force-pushed the fix/databricks-dotted-username-volume branch from 05e1f08 to e885975 Compare May 4, 2026 19:41
Usernames containing dots (e.g. "john.doe") caused a UC_VOLUME_NOT_FOUND
error because Databricks interprets the dot as a Unity Catalog path
separator, turning a valid 3-part volume identifier into an invalid
4-part one.

Fix by replacing any character outside [a-zA-Z0-9_-] with an underscore
when building the memtable_volume name passed to ibis.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jeannicolasfine jeannicolasfine force-pushed the fix/databricks-dotted-username-volume branch from e885975 to 7083a9e Compare May 4, 2026 19:55
Signed-off-by: Christophe Blefari <christophe.blefari@gmail.com>
@Bl3f Bl3f merged commit 13cc586 into getnao:main May 7, 2026
2 checks passed
@github-actions
Contributor

github-actions Bot commented May 7, 2026

🚀 Preview Deployment

URL https://pr-709-f6bb5e0.preview.getnao.io
Commit f6bb5e0

⚠️ No LLM API keys configured - you'll see the API key setup flow when trying to chat.


Preview will be automatically removed when this PR is closed.
