Fall back to sys.argv[0] when __file__ is not defined #251

Closed
mbid wants to merge 1 commit into NVIDIA:dev from mbid:fix-databricks-file-compat

Conversation

@mbid mbid commented Mar 2, 2026

Databricks runs Git-sourced Python files via IPython, where __file__ is not defined, causing a NameError. Use sys.argv[0] as a fallback.

greptile-apps Bot commented Mar 2, 2026

Greptile Summary

This PR applies a uniform defensive fix across five benchmark entry-point scripts to prevent NameError when scripts are executed via IPython on Databricks Git-sourced runs, where __file__ is not injected into the module namespace. The fix adds a conditional fallback to sys.argv[0] when __file__ is unavailable.

Key observations:

  • The guard mechanism ('__file__' in dir()) is functionally correct at module level: dir() without arguments lists the names in the current local scope, which at module level is the module's global namespace, so it accurately detects whether __file__ has been defined by the Python runtime.
  • No new dependencies or logic are introduced; this is purely defensive path-resolution boilerplate.
  • The fix prevents a hard NameError crash on Databricks IPython kernel execution.
  • The choice of sys.argv[0] as a fallback introduces some uncertainty: in certain IPython execution modes (e.g., when code is exec'd rather than run via CLI), sys.argv[0] may point to the kernel launcher rather than the user script, which could result in an incorrect utils_dir path. However, this concern is a known architectural trade-off in the fallback strategy and does not invalidate the fix's primary purpose of preventing the NameError.
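The guard's behaviour can be reproduced outside Databricks with a minimal sketch. It assumes that executing the guard expression in a namespace with no __file__ is a fair stand-in for the IPython case; the helper name resolve_script and the path /repo/nds/nds_power.py are hypothetical and not part of the PR.

```python
import sys

# Same expression as the PR's guard, held as source text so it can be
# executed in namespaces with and without __file__.
SNIPPET = "script = __file__ if '__file__' in dir() else sys.argv[0]"


def resolve_script(namespace):
    """Run the guard as module-level code of the given namespace (hypothetical helper)."""
    namespace = dict(namespace, sys=sys)
    exec(SNIPPET, namespace)
    return namespace["script"]


# Simulates a normal run: __file__ is present and is used.
with_file = resolve_script({"__file__": "/repo/nds/nds_power.py"})  # hypothetical path

# Simulates the IPython/Databricks case: no __file__, so sys.argv[0] is used.
without_file = resolve_script({})
```

In the second call the result is whatever sys.argv[0] holds for the running process, which is exactly the value whose reliability the last bullet questions.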

Confidence Score: 3/5

  • The fix prevents a hard crash in Databricks IPython contexts, but the fallback value sys.argv[0] may not always resolve to the expected script path in all kernel execution modes.
  • The five-file change is small, self-contained, and correct at the functional level: the guard logic cannot regress normal (__file__-present) execution. The remaining uncertainty stems from the reliability of sys.argv[0] in various Databricks IPython kernel configurations, which could potentially cause silent failures (wrong utils_dir path) rather than loud ones. This architectural limitation on the fallback strategy is known and documented but unresolved, justifying a 3/5 rather than a higher score.
  • No files require special attention. All five modified files apply the same pattern with consistent safety properties.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Script starts] --> B{"__file__ in dir?"}
    B -- Yes --> C[Use __file__]
    B -- "No: IPython/Databricks" --> D["Use sys.argv[0]"]
    C --> E["os.path.dirname(path)"]
    D --> E
    E --> F["parent_dir = abspath(dirname + '..')"]
    F --> G["utils_dir = parent_dir/utils"]
    G --> H{utils_dir in sys.path?}
    H -- No --> I["sys.path.insert(0, utils_dir)"]
    H -- Yes --> J[Skip insert]
    I --> K[Import utils modules]
    J --> K
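
The flowchart's path handling can be read as the following sketch. The function name add_utils_to_path and the explicit path_list parameter are assumptions added so the steps can be exercised without touching the real sys.path; the changed scripts inline this logic at module level.

```python
import os
import sys


def add_utils_to_path(script_path, path_list=None):
    """Mirror the flowchart: script path -> parent dir -> utils dir -> path list."""
    if path_list is None:
        path_list = sys.path
    script_dir = os.path.dirname(script_path)
    parent_dir = os.path.abspath(os.path.join(script_dir, ".."))
    utils_dir = os.path.join(parent_dir, "utils")
    # Only insert once, as in the "utils_dir in sys.path?" decision node.
    if utils_dir not in path_list:
        path_list.insert(0, utils_dir)
    return utils_dir


# Hypothetical repository layout, used only for illustration.
demo_script = os.path.join(os.sep, "repo", "nds", "nds_power.py")
demo_paths = []
first = add_utils_to_path(demo_script, demo_paths)
second = add_utils_to_path(demo_script, demo_paths)  # second call skips the insert
```

Note that os.path.abspath only normalises the path; nothing in this flow checks that the utils directory actually exists.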

Last reviewed commit: a7ec577

@greptile-apps greptile-apps Bot left a comment

5 files reviewed, 2 comments

Comment thread nds-h/nds_h_gen_data.py

  #For adding utils to path
- parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
+ parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..'))

sys.argv[0] may not point to the script in all IPython/Databricks contexts

In certain Databricks execution modes (e.g. when code is exec'd by IPython rather than run via ipython script.py on the CLI), sys.argv[0] will be the IPython launcher path (e.g. /usr/local/lib/python3.x/dist-packages/ipykernel_launcher.py) rather than the user script path. In that case os.path.dirname(sys.argv[0]) resolves to the wrong directory, and the subsequent sys.path.insert silently adds a non-existent utils directory, causing the immediately following from check import ... or from spark_utils import ... to raise an ImportError.

The same issue applies identically in all five changed files: nds-h/nds_h_gen_data.py:40, nds-h/nds_h_gen_query_stream.py:37, nds-h/nds_h_power.py:45, nds/nds_maintenance.py:48, nds/nds_power.py:49.

A more robust fallback would use inspect to resolve the actual source file, or at minimum validate that the resolved utils_dir exists before inserting it:

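import inspect as _inspect
import os
import sys

def _get_script_path():
    # Normal execution: the interpreter defines __file__ for the module.
    try:
        return __file__
    except NameError:
        pass
    # Otherwise ask the caller's frame which source file it was compiled from.
    frame = _inspect.currentframe()
    if frame and frame.f_back:
        src = _inspect.getfile(frame.f_back)
        # '<string>' is the placeholder filename of exec'd source.
        if src != '<string>':
            return src
    # Last resort: the path the interpreter was launched with.
    return sys.argv[0]

parent_dir = os.path.abspath(os.path.join(os.path.dirname(_get_script_path()), '..'))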

At minimum, please verify (ideally with a CI test or a note in the PR) that sys.argv[0] reliably holds the script path in the specific Databricks Git-sourced task execution mode you are targeting.
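
The "validate that the resolved utils_dir exists" option mentioned above could look roughly like this. It is a sketch under assumptions: the helper name insert_utils_dir, the choice of FileNotFoundError, and the temporary directory layout are illustrative and not taken from the PR.

```python
import os
import tempfile


def insert_utils_dir(script_path, path_list):
    """Insert <parent>/utils into path_list, failing loudly if it is missing."""
    parent_dir = os.path.abspath(os.path.join(os.path.dirname(script_path), ".."))
    utils_dir = os.path.join(parent_dir, "utils")
    if not os.path.isdir(utils_dir):
        # Surface the bad resolution here instead of a later, less obvious ImportError.
        raise FileNotFoundError(
            f"expected utils directory at {utils_dir!r}; "
            f"script path resolved to {script_path!r}"
        )
    if utils_dir not in path_list:
        path_list.insert(0, utils_dir)
    return utils_dir


# Throwaway layout: <root>/utils and <root>/nds/nds_power.py (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "utils"))
    os.makedirs(os.path.join(root, "nds"))
    good_script = os.path.join(root, "nds", "nds_power.py")
    # Stand-in for a launcher path that has no utils directory next to its parent.
    bad_script = os.path.join(root, "kernel", "bin", "launcher.py")

    good_paths = []
    resolved = insert_utils_dir(good_script, good_paths)

    try:
        insert_utils_dir(bad_script, [])
        rejected = False
    except FileNotFoundError:
        rejected = True
```

Failing at the point of path resolution keeps the error message close to the cause, which addresses the silent-failure concern without depending on how sys.argv[0] is populated.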

Comment thread nds-h/nds_h_gen_data.py

  #For adding utils to path
- parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
+ parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..'))

Prefer try/except NameError over dir() membership check

'__file__' in dir() is less idiomatic than a try/except NameError block and produces a harder-to-read one-liner. The standard Python pattern for handling a potentially undefined name such as __file__ (a module global, not a built-in) is:

Suggested change
- parent_dir = os.path.abspath(os.path.join(os.path.dirname(__file__ if '__file__' in dir() else sys.argv[0]), '..'))
+ _script_file = sys.argv[0]
+ try:
+     _script_file = __file__
+ except NameError:
+     pass
+ parent_dir = os.path.abspath(os.path.join(os.path.dirname(_script_file), '..'))

This also applies to the identical line in nds-h/nds_h_gen_query_stream.py:37, nds-h/nds_h_power.py:45, nds/nds_maintenance.py:48, and nds/nds_power.py:49.
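
One further reason to prefer try/except (or a globals() check) is that a dir() membership test depends on the scope it runs in. The sketch below illustrates this under the assumption that the check might one day be moved into a helper function; the function names and the path are hypothetical.

```python
SOURCE = '''
def via_dir():
    # dir() here lists the function's local names, so __file__ is never found.
    return "__file__" in dir()

def via_globals():
    # globals() always refers to the module namespace, regardless of scope.
    return "__file__" in globals()

def via_try():
    # Name lookup falls through to module globals, so this works in any scope.
    try:
        return __file__ is not None
    except NameError:
        return False
'''

# Execute the helpers in a namespace that does define __file__ (hypothetical path).
module_ns = {"__file__": "/repo/nds/nds_power.py"}
exec(SOURCE, module_ns)

results = (module_ns["via_dir"](), module_ns["via_globals"](), module_ns["via_try"]())
```

Here results is (False, True, True): the dir() check misses __file__ as soon as it is evaluated inside a function, while the other two forms still see it.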

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@wjxiz1992
Collaborator

@mbid please add a sign-off when committing, to ease the CI check.

Databricks runs Git-sourced Python files via IPython where __file__
is not defined, causing NameError. Use sys.argv[0] as fallback.

Signed-off-by: Martin Bidlingmaier <bidlingmaier@nvidia.com>
@mbid mbid force-pushed the fix-databricks-file-compat branch from 116736d to a7ec577 Compare March 3, 2026 13:29
@mbid
Author

mbid commented Mar 3, 2026

> @mbid please add a sign-off when committing, to ease the CI check.

I force pushed after git commit --amend --signoff.

@mbid
Author

mbid commented Mar 15, 2026

Is anything else required from my side to get this merged?

@gerashegalov
Collaborator

gerashegalov commented Mar 17, 2026

> I force pushed after git commit --amend --signoff.

We strive to avoid force-pushes. The check only requires that at least one commit on the PR branch is signed off. An empty signed-off commit, git commit -s --allow-empty -m signing, would have sufficed.

> Is anything else required from my side to get this merged?

I find the comments raised by greptile legitimate and easy to address.

@gerashegalov
Collaborator

Since there has been no activity on this PR, we fixed it via #255. Thank you @mbid for contributing a fix idea in this PR.
