fix(databricks): sanitize username in memtable volume name #709
This PR was auto-closed. Only contributors approved with Maintainers review auto-closed issues daily. Issues that do not meet the quality bar in CONTRIBUTING.md will not be reopened or receive a reply. If a maintainer replies See CONTRIBUTING.md.
    def connect(self) -> BaseBackend:
        """Create an Ibis Databricks connection."""
        import getpass
These 3 imports should be imported at the top level of the file.
Moved to the top of the file and trimmed to only the strictly necessary ones.
        # Usernames with dots (e.g. "jean-nicolas.fine") break UC volume name parsing:
        # Databricks treats the dot as a catalog/schema separator, creating a 4-part
        # path instead of a valid 3-part one. Sanitize before ibis constructs the name.
        safe_user = re.sub(r"[^a-zA-Z0-9_-]", "_", getpass.getuser())
Genuine question: why do you need getpass.getuser() here? It returns the login name of the OS user running the server process, which seems odd given that a user/password is already provided in nao_config.yaml.
I'm not sure that picking up a username on the fly like this is a good idea.
OK, I just had a look at https://github.com/ibis-project/ibis/blob/e0402155f017fb94bf1129e26773d9ffff2136f5/ibis/backends/databricks/__init__.py#L642. I don't fully understand why Databricks/Ibis needs this, but 🤷
Thanks for your feedback @Bl3f
You're right, there is a better and shorter way to handle it.
Instead of using getpass.getuser(), we can use the connection name set in nao_config.yaml. This avoids relying on the OS username.
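A minimal sketch of that idea, assuming the connection name is available as a plain string. The helper name `volume_name_from_connection` and the `nao-` prefix are hypothetical and not taken from this PR; the character filter is the same one used in the diff above, kept in case a connection name itself contains a dot.

```python
import re


def volume_name_from_connection(connection_name: str) -> str:
    """Derive a memtable volume name from a configured connection name.

    Hypothetical helper: the "nao-" prefix is an illustrative assumption.
    """
    # Replace anything outside [a-zA-Z0-9_-] so the name stays a single
    # identifier segment (no dots that could be read as separators).
    safe_name = re.sub(r"[^a-zA-Z0-9_-]", "_", connection_name)
    return f"nao-{safe_name}"


print(volume_name_from_connection("databricks.prod"))
```

Because the connection name comes from configuration rather than from the host environment, the resulting volume name would be the same whichever OS user runs the process.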
05e1f08 to e885975
Usernames containing dots (e.g. "john.doe") caused a UC_VOLUME_NOT_FOUND error because Databricks interprets the dot as a Unity Catalog path separator, turning a valid 3-part volume identifier into an invalid 4-part one. Fix by replacing any character outside [a-zA-Z0-9_-] with an underscore when building the memtable_volume name passed to ibis.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
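To illustrate the 3-part versus 4-part problem described in this commit message, here is a small sketch that mimics naive dot-splitting of an unquoted identifier. The catalog `main`, the schema `default`, and the helper `identifier_parts` are assumptions made for the example; this is not how Databricks parses names internally, only a model of the symptom.

```python
def identifier_parts(catalog: str, schema: str, volume: str) -> list[str]:
    """Join the three segments with dots, then split on dots again,
    the way an unquoted dotted identifier would be read."""
    return f"{catalog}.{schema}.{volume}".split(".")


# A dotted username leaks an extra separator into the volume segment.
broken = identifier_parts("main", "default", "john.doe-py=3119-pid=1234")
fixed = identifier_parts("main", "default", "john_doe-py=3119-pid=1234")

print(len(broken), broken)
print(len(fixed), fixed)
```

The first call yields four parts because `john.doe` is split in two, while the sanitized name keeps the expected three.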
e885975 to 7083a9e
Signed-off-by: Christophe Blefari <christophe.blefari@gmail.com>
Problem
Users with a dot in their OS username (e.g. `john.doe`) get a `UC_VOLUME_NOT_FOUND` error when connecting to Databricks.
Ibis builds the `memtable_volume` name as `{getpass.getuser()}-py={version}-pid={pid}`. When the username contains a dot, Databricks Unity Catalog interprets it as a path separator, producing an invalid 4-part identifier instead of a valid 3-part one.
Fix
Sanitize the username before constructing the volume name by replacing any character outside `[a-zA-Z0-9_-]` with an underscore, then pass the result explicitly as `memtable_volume` to `ibis.databricks.connect()`.
Test
Run `nao debug` with a Databricks connection configured: it previously failed with `UC_VOLUME_NOT_FOUND` and now connects successfully.
🤖 Generated with Claude Code
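The fix described in this PR can be sketched roughly as follows. The name format follows the `{getpass.getuser()}-py={version}-pid={pid}` pattern quoted in the Problem section; the helper name `safe_memtable_volume` is hypothetical, and the exact way the version string is formatted is an assumption.

```python
import getpass
import os
import re
import sys
from typing import Optional


def safe_memtable_volume(user: Optional[str] = None) -> str:
    """Build a memtable volume name with a sanitized username (hypothetical helper)."""
    raw_user = getpass.getuser() if user is None else user
    # Replace every character outside [a-zA-Z0-9_-] with an underscore,
    # so a dot in the username cannot act as a path separator.
    safe_user = re.sub(r"[^a-zA-Z0-9_-]", "_", raw_user)
    # Assumption: the version is rendered as concatenated digits, e.g. "3119".
    version = "".join(map(str, sys.version_info[:3]))
    return f"{safe_user}-py={version}-pid={os.getpid()}"


print(safe_memtable_volume("john.doe"))
```

The returned string would then be passed as the `memtable_volume` argument to `ibis.databricks.connect()`, as stated in the Fix section.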