Fix __file__ NameError in power run scripts on Databricks #255
gerashegalov merged 4 commits into NVIDIA:dev
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
Greptile Summary
This PR refactors the Databricks …
Confidence Score: 5/5
Safe to merge: the fix is correct and complete, with no remaining P0/P1 issues. The inspect.stack()[1].filename approach is the right solution: it reads co_filename from the compiled bytecode, which is exactly the filename argument Databricks passes to compile(). All seven previously affected files are updated. The only remaining finding is a P2 about duplicating setup_utils.py across nds/ and nds-h/, which does not block merging. No files require special attention.
| Filename | Overview |
|---|---|
| nds/setup_utils.py | New helper module — uses inspect.stack()[1].filename to resolve the caller's path, which works correctly in both standard Python execution and Databricks exec(compile(...)) contexts. |
| nds-h/setup_utils.py | New helper module — byte-for-byte identical to nds/setup_utils.py; same correct implementation, but duplicated across both NDS variants. |
| nds/nds_power.py | Replaces bare `__file__` path manipulation with add_utils_to_sys_path(); import ordering and dedup guard are correct. |
| nds/nds_maintenance.py | Replaces bare `__file__` path manipulation with add_utils_to_sys_path(); the utils dir is already added when nds_power is imported earlier, so the explicit call is a harmless no-op thanks to the dedup guard. |
| nds-h/nds_h_power.py | Replaces bare `__file__` path manipulation with add_utils_to_sys_path(); straightforward and correct. |
| nds-h/nds_h_gen_data.py | Addresses the previously flagged unguarded `__file__` regression; also gains the dedup guard that the old unconditional sys.path.insert(0,...) lacked. |
| nds-h/nds_h_gen_query_stream.py | Addresses the previously flagged unguarded `__file__` regression; also gains the dedup guard that the old unconditional sys.path.insert(0,...) lacked. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Databricks: exec(compile(f.read(), filename, 'exec'))"] -->|"__file__ undefined"| B["nds_power.py / nds_h_power.py\nnds_maintenance.py\nnds_h_gen_data.py\nnds_h_gen_query_stream.py"]
C["Standard Python: python nds_power.py"] --> B
B --> D["from setup_utils import add_utils_to_sys_path\nadd_utils_to_sys_path()"]
D --> E["inspect.stack()[1].filename\n→ co_filename from bytecode\n= actual script path"]
E --> F["os.path.dirname(caller_file) / '..' / 'utils'"]
F --> G{"utils_dir in sys.path?"}
G -->|No| H["sys.path.insert(0, utils_dir)"]
G -->|Yes| I["no-op (dedup guard)"]
H --> J["from spark_utils import ...\nfrom profiler import ..."]
I --> J
Reviews (3): Last reviewed commit: "address review comments"
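The steps in the flowchart can be sketched roughly as follows. This is a minimal illustration of the described approach, not the contents of the merged setup_utils.py: the `utils_dirname` parameter and the return value are assumptions added here to keep the sketch easy to test.

```python
import inspect
import os
import sys


def add_utils_to_sys_path(utils_dirname="utils"):
    """Put <caller's dir>/../<utils_dirname> on sys.path once; return the path.

    The parameter and the return value are illustrative assumptions.
    """
    # stack()[0] is this function and stack()[1] is the caller. The caller's
    # filename comes from its code object (co_filename), so it is available
    # even when __file__ was never defined in the caller's globals.
    caller_file = inspect.stack()[1].filename
    caller_dir = os.path.dirname(os.path.abspath(caller_file))
    utils_dir = os.path.abspath(os.path.join(caller_dir, "..", utils_dirname))

    # Dedup guard: a second call from another script in the same directory
    # must not insert the same entry again.
    if utils_dir not in sys.path:
        sys.path.insert(0, utils_dir)
    return utils_dir
```

A script would call `add_utils_to_sys_path()` once, before importing from the shared utils directory. Because the lookup uses the caller's frame, the helper itself can live in a module separate from the scripts that use it.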
Signed-off-by: Niranjan Artal <nartal@nvidia.com>
    # Python doesn't automatically include sibling directories in the import path.
    # We need to explicitly add the utils directory to sys.path to import shared utilities.
    # Note: __file__ is not defined when Databricks runs scripts via exec(compile(...)),
    # so fall back to inspect to retrieve the filename from the compiled bytecode.
    try:
        parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
    except NameError:
        import inspect
        _this_file = inspect.getfile(inspect.currentframe())
        parent_dir = os.path.abspath(os.path.join(os.path.dirname(_this_file), '..'))
This is not a one-liner anymore. Please refactor it into a reusable function.
This PR fixes a regression on the Databricks platform that was introduced when #243 was merged.
Fixes `NameError: name '__file__' is not defined`, which occurs when running nds_power.py or nds_h_power.py on Databricks, where scripts are executed via exec(compile(...)) rather than by direct invocation.
Uses inspect.stack() to resolve the calling script's location from the bytecode (co_filename), which correctly returns the script path that Databricks passes to compile().
Extracted the path resolution logic into a reusable add_utils_to_sys_path() function in setup_utils.py to avoid duplicating the multi-line fix across files.
Applied the fix to all five affected files.
Fixes the error below:
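To illustrate why the filename is still recoverable, the sketch below runs a small script through exec(compile(...)) with a globals dict that has no `__file__` entry. The helper `run_like_exec_compile` and the path `/tmp/fake/nds_power.py` are hypothetical and only approximate how Databricks is described as launching scripts; this is not Databricks code.

```python
import inspect

# A tiny stand-in for a power run script: it tries __file__ first and falls
# back to the code object's filename when __file__ is undefined.
SCRIPT = '''
import inspect
try:
    where = __file__
    how = "__file__"
except NameError:
    where = inspect.currentframe().f_code.co_filename
    how = "co_filename"
'''


def run_like_exec_compile(source, filename):
    """Hypothetical helper: execute source the way exec(compile(...)) would."""
    namespace = {"__name__": "__main__"}  # deliberately no "__file__" key
    # The filename given to compile() is stored on the code object as
    # co_filename, independent of what the globals dict contains.
    exec(compile(source, filename, "exec"), namespace)
    return namespace["how"], namespace["where"]


how, where = run_like_exec_compile(SCRIPT, "/tmp/fake/nds_power.py")
print(how, where)
```

In this setup the `__file__` lookup raises NameError, so the script falls back to co_filename and gets the path that was handed to compile().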