Fall back to sys.argv[0] when __file__ is not defined#251
Fall back to sys.argv[0] when __file__ is not defined#251mbid wants to merge 1 commit intoNVIDIA:devfrom
Conversation
Greptile SummaryThis PR applies a uniform defensive fix across five benchmark entry-point scripts to prevent Key observations:
Confidence Score: 3/5
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Script starts] --> B{"__file__ in dir?"}
B -- Yes --> C[Use __file__]
B -- "No: IPython/Databricks" --> D["Use sys.argv[0]"]
C --> E["os.path.dirname(path)"]
D --> E
E --> F["parent_dir = abspath(dirname + '..')"]
F --> G["utils_dir = parent_dir/utils"]
G --> H{utils_dir in sys.path?}
H -- No --> I["sys.path.insert(0, utils_dir)"]
H -- Yes --> J[Skip insert]
I --> K[Import utils modules]
J --> K
Last reviewed commit: a7ec577 |
|
|
||
| #For adding utils to path | ||
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) | ||
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..')) |
There was a problem hiding this comment.
sys.argv[0] may not point to the script in all IPython/Databricks contexts
In certain Databricks execution modes (e.g. when code is exec'd by IPython rather than run via ipython script.py on the CLI), sys.argv[0] will be the IPython launcher path (e.g. /usr/local/lib/python3.x/dist-packages/ipykernel_launcher.py) rather than the user script path. In that case os.path.dirname(sys.argv[0]) resolves to the wrong directory, and the subsequent sys.path.insert silently adds a non-existent utils directory, causing the immediately following from check import ... or from spark_utils import ... to raise an ImportError.
The same issue applies identically in all five changed files: nds-h/nds_h_gen_data.py:40, nds-h/nds_h_gen_query_stream.py:37, nds-h/nds_h_power.py:45, nds/nds_maintenance.py:48, nds/nds_power.py:49.
A more robust fallback would use inspect to resolve the actual source file, or at minimum validate that the resolved utils_dir exists before inserting it:
import inspect as _inspect
def _get_script_path():
try:
return __file__
except NameError:
pass
frame = _inspect.currentframe()
if frame and frame.f_back:
src = _inspect.getfile(frame.f_back)
if src != '<string>':
return src
return sys.argv[0]
parent_dir = os.path.abspath(os.path.join(os.path.dirname(_get_script_path()), '..'))At minimum, please verify (ideally with a CI test or a note in the PR) that sys.argv[0] reliably holds the script path in the specific Databricks Git-sourced task execution mode you are targeting.
|
|
||
| #For adding utils to path | ||
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) | ||
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..')) |
There was a problem hiding this comment.
Prefer try/except NameError over dir() membership check
'__file__' in dir() is less idiomatic than a try/except NameError block and produces a harder-to-read one-liner. The standard Python pattern for handling a potentially-undefined built-in name is:
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..')) | |
| _script_file = sys.argv[0] | |
| try: | |
| _script_file = __file__ | |
| except NameError: | |
| pass | |
| parent_dir = os.path.abspath(os.path.join(os.path.dirname(_script_file), '..')) |
This also applies to the identical line in nds-h/nds_h_gen_query_stream.py:37, nds-h/nds_h_power.py:45, nds/nds_maintenance.py:48, and nds/nds_power.py:49.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
@mbid please add sign-off when commit, to ease the CI check. |
Databricks runs Git-sourced Python files via IPython where __file__ is not defined, causing NameError. Use sys.argv[0] as fallback. Signed-off-by: Martin Bidlingmaier <bidlingmaier@nvidia.com>
116736d to
a7ec577
Compare
I force pushed after |
|
Is anything else required from my side to get this merged? |
We strive to avoid forced-pushs. The check is for just at least one commit on the PR branch to have been signed. Signing an empty commit
I find the comments raised by greptile legitimate and easy to address. |
Databricks runs Git-sourced Python files via IPython where file is not defined, causing NameError. Use sys.argv[0] as fallback.