Fix __file__ NameError in power run scripts on Databricks #255

Merged
gerashegalov merged 4 commits into NVIDIA:dev from nartal1:db_nameError_fix on Apr 10, 2026

@nartal1 nartal1 commented Apr 7, 2026

This PR fixes a regression on the Databricks (DB) platform that was introduced when #243 was merged.

  • Fixes NameError: name '__file__' is not defined, which occurs when running nds_power.py or nds_h_power.py on Databricks, where scripts are executed via exec(compile(...)) rather than invoked directly.

  • Uses inspect.stack() to resolve the calling script's location from the bytecode (co_filename), which correctly returns the script path that Databricks passes to compile().

  • Extracted the path resolution logic into a reusable add_utils_to_sys_path() function in setup_utils.py to avoid duplicating the multi-line fix across files.

  • Applied the fix to all five affected files
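
For readers without the diff at hand, the helper described above can be sketched roughly as follows. This is a minimal sketch based only on the description in this PR, not the actual contents of setup_utils.py; in particular, the return value is an assumption added here for illustration.

```python
import inspect
import os
import sys


def add_utils_to_sys_path():
    """Add the sibling ``utils`` directory of the calling script to sys.path.

    Sketch only: the real helper in setup_utils.py may differ in details.
    """
    # Frame 0 is this function; frame 1 is the script that called it.
    # The filename is taken from the code object's co_filename, which is set by
    # compile(source, filename, 'exec') even when __file__ is undefined.
    caller_file = inspect.stack()[1].filename
    parent_dir = os.path.abspath(os.path.join(os.path.dirname(caller_file), '..'))
    utils_dir = os.path.join(parent_dir, 'utils')
    # Dedup guard: repeated calls must not add the same entry twice.
    if utils_dir not in sys.path:
        sys.path.insert(0, utils_dir)
    # Returned for illustration only (assumption, not stated in the PR).
    return utils_dir
```

A script would then call add_utils_to_sys_path() once, before importing anything from utils.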

Fixes the error below:

NameError: name '__file__' is not defined
[Trace ID: 00-0ec742bbbcdc1e38f6f7f0a9a405a7b9-28f79e43e09612e0-00]

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File ~/.ipykernel/2187/command--1-3962067571:8
      5 del sys
      7 with open(filename, "rb") as f:
----> 8   exec(compile(f.read(), filename, 'exec'))

File /Workspace/Repos/.internal/ff6841501e_commits/058f945478263612838602cdd586e7ffb23766e7/nds/nds_power.py:49
     45 from nds_schema import get_schemas
     47 # Python doesn't automatically include sibling directories in the import path.
     48 # We need to explicitly add the utils directory to sys.path to import shared utilities.
---> 49 parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
     50 utils_dir = os.path.join(parent_dir, 'utils')
     51 if utils_dir not in sys.path:

NameError: name '__file__' is not defined
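
The failure can be reproduced outside Databricks with a small sketch like the one below. It mimics the exec(compile(...)) call visible in the traceback; the run_like_databricks name, the namespace contents, and the fake path are assumptions made for illustration and are not taken from Databricks itself.

```python
import os
import tempfile


def run_like_databricks(source, filename):
    """Execute source via exec(compile(...)) in a namespace that has no __file__."""
    namespace = {"__name__": "__main__"}
    exec(compile(source, filename, "exec"), namespace)
    return namespace


# Hypothetical script path; the file does not need to exist on disk.
fake_path = os.path.join(tempfile.gettempdir(), "nds", "nds_power.py")

# Referencing __file__ fails, because exec() does not define it.
try:
    run_like_databricks("script_path = __file__\n", fake_path)
    error = None
except NameError as exc:
    error = exc

# The filename passed to compile() is still recorded on the code object,
# so inspect can recover it from the running frame.
ns = run_like_databricks(
    "import inspect\nscript_path = inspect.stack()[0].filename\n",
    fake_path,
)
print(error)
print(ns["script_path"] == fake_path)
```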

Signed-off-by: Niranjan Artal <nartal@nvidia.com>

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

This PR moves the fix for the Databricks __file__ regression (introduced in #243) into a shared helper, add_utils_to_sys_path(), which resolves the caller's path via inspect.stack()[1].filename. That value is read from co_filename in the compiled bytecode, so it does not depend on __file__ or sys.argv[0]. Seven files across both NDS variants are touched (five scripts plus the two new setup_utils.py modules), and the previously flagged unguarded files (nds_maintenance.py, nds_h_gen_data.py, nds_h_gen_query_stream.py) are now covered as well.

Confidence Score: 5/5

Safe to merge — the fix is correct and complete, with no remaining P0/P1 issues.

The inspect.stack()[1].filename approach is the right solution: it reads co_filename from the compiled bytecode, which is exactly the filename argument Databricks passes to compile(). All seven previously affected files are updated. The only remaining finding is a P2 about duplicating setup_utils.py across nds/ and nds-h/, which does not block merging.

No files require special attention.

Vulnerabilities

No security concerns identified. The change manipulates sys.path using a path derived from the calling script's bytecode filename, which is no different in trust level from the previous __file__-based approach.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nds/setup_utils.py | New helper module. Uses inspect.stack()[1].filename to resolve the caller's path, which works correctly in both standard Python execution and Databricks exec(compile(...)) contexts. |
| nds-h/setup_utils.py | New helper module, byte-for-byte identical to nds/setup_utils.py. Same correct implementation, but duplicated across both NDS variants. |
| nds/nds_power.py | Replaces bare __file__ path manipulation with add_utils_to_sys_path(); import ordering and dedup guard are correct. |
| nds/nds_maintenance.py | Replaces bare __file__ path manipulation with add_utils_to_sys_path(). The utils dir is already added when nds_power is imported earlier, so the explicit call is a harmless no-op thanks to the dedup guard. |
| nds-h/nds_h_power.py | Replaces bare __file__ path manipulation with add_utils_to_sys_path(); straightforward and correct. |
| nds-h/nds_h_gen_data.py | Addresses the previously flagged unguarded file regression; also gains the dedup guard that the old unconditional sys.path.insert(0,...) lacked. |
| nds-h/nds_h_gen_query_stream.py | Addresses the previously flagged unguarded file regression; also gains the dedup guard that the old unconditional sys.path.insert(0,...) lacked. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Databricks: exec(compile(f.read(), filename, 'exec'))"] -->|"__file__ undefined"| B["nds_power.py / nds_h_power.py\nnds_maintenance.py\nnds_h_gen_data.py\nnds_h_gen_query_stream.py"]
    C["Standard Python: python nds_power.py"] --> B
    B --> D["from setup_utils import add_utils_to_sys_path\nadd_utils_to_sys_path()"]
    D --> E["inspect.stack()[1].filename\n→ co_filename from bytecode\n= actual script path"]
    E --> F["os.path.dirname(caller_file) / '..' / 'utils'"]
    F --> G{"utils_dir in sys.path?"}
    G -->|No| H["sys.path.insert(0, utils_dir)"]
    G -->|Yes| I["no-op (dedup guard)"]
    H --> J["from spark_utils import ...\nfrom profiler import ..."]
    I --> J

Reviews (3): Last reviewed commit: "address review comments"

Comment thread nds/nds_power.py Outdated
Comment thread nds-h/nds_h_power.py Outdated
Comment thread nds/nds_power.py Outdated
nartal1 added 2 commits April 7, 2026 11:32
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Comment thread nds-h/nds_h_gen_data.py Outdated
Comment on lines +39 to +48
# Python doesn't automatically include sibling directories in the import path.
# We need to explicitly add the utils directory to sys.path to import shared utilities.
# Note: __file__ is not defined when Databricks runs scripts via exec(compile(...)),
# so fall back to inspect to retrieve the filename from the compiled bytecode.
try:
    parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
except NameError:
    import inspect
    _this_file = inspect.getfile(inspect.currentframe())
    parent_dir = os.path.abspath(os.path.join(os.path.dirname(_this_file), '..'))
Collaborator

This is not a one-liner anymore. Please refactor into a reusable function.
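
One detail worth keeping in mind for such a refactor: once the lookup lives in a helper module, inspect.currentframe() refers to the helper's own frame, so the caller has to be found one level up the stack. The sketch below illustrates this by compiling two snippets under made-up filenames; the /repo/... paths and the own_file / caller_file names are hypothetical and exist only for the demonstration.

```python
# Source of a stand-in helper module (hypothetical, for illustration only).
HELPER_SRC = '''
import inspect

def own_file():
    # Inline-style lookup: reports the file that contains this function.
    return inspect.getfile(inspect.currentframe())

def caller_file():
    # Helper-style lookup: reports the file of whoever called this function.
    return inspect.stack()[1].filename
'''

# Source of a stand-in script that calls the helper.
SCRIPT_SRC = '''
seen_by_currentframe = own_file()
seen_by_stack = caller_file()
'''

helper_ns = {}
exec(compile(HELPER_SRC, "/repo/nds/setup_utils.py", "exec"), helper_ns)

script_ns = {
    "own_file": helper_ns["own_file"],
    "caller_file": helper_ns["caller_file"],
}
exec(compile(SCRIPT_SRC, "/repo/nds/nds_power.py", "exec"), script_ns)

print(script_ns["seen_by_currentframe"])  # the helper's own filename
print(script_ns["seen_by_stack"])         # the calling script's filename
```

With the currentframe-based lookup, a shared helper would resolve paths relative to itself, which only gives the right answer when helper and caller sit in the same directory.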

@nartal1 nartal1 requested a review from gerashegalov April 9, 2026 22:14
Collaborator

@gerashegalov gerashegalov left a comment

LGTM

@gerashegalov gerashegalov merged commit af0cd8a into NVIDIA:dev Apr 10, 2026
2 checks passed