Add analyze_tables #257
Conversation
Run COMPUTE STATISTICS before running queries. It only runs on the raw dataset, not on Iceberg or Delta Lake. Signed-off-by: Gary Shen <gashen@nvidia.com>
Greptile Summary

This PR adds an optional analyze-tables step that computes table statistics before the benchmark queries run. Confidence Score: 5/5. Safe to merge — the only findings are P2 style comments with no functional impact (inconsistent identifier quoting and a stale docstring). The core logic — catalog table creation, ANALYZE execution, and CLI wiring — is correct. No P0 or P1 issues found. No files require special attention.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_query_stream called] --> B{input_format\niceberg or delta?}
    B -- Yes, iceberg --> C[register_iceberg_tables]
    B -- Yes, delta --> D[register_delta_tables]
    B -- No --> E{analyze_tables flag?}
    E -- False --> F[reader.load.createOrReplaceTempView\nfor each table]
    E -- True --> G[_create_data_source_table\nCREATE TABLE IF NOT EXISTS USING LOCATION]
    G --> H[ANALYZE TABLE COMPUTE STATISTICS]
    F --> I[Run benchmark queries]
    H --> I
    C --> I
    D --> I
```
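The branch in the flowchart above can be sketched in plain Python. This is not the PR's actual code: `spark` here is any object exposing `.sql()` and `.read`, and the function shape, table locations, and return value are illustrative assumptions based on the flow described.

```python
import time

def setup_tables(spark, input_prefix, input_format, table_names,
                 analyze_tables=False):
    """Register each benchmark table and optionally ANALYZE it.

    Returns a list of (table, seconds) entries, mirroring the per-table
    timings the PR records in execution_time_list. Sketch only.
    """
    execution_time_list = []
    for table in table_names:
        start = time.time()
        location = f"{input_prefix}/{table}"
        if analyze_tables:
            # Create a catalog table over the raw files so ANALYZE can
            # attach statistics to it, then compute the stats.
            spark.sql(f"CREATE TABLE IF NOT EXISTS {table} "
                      f"USING {input_format} LOCATION '{location}'")
            spark.sql(f"ANALYZE TABLE {table} COMPUTE STATISTICS")
        else:
            # Plain TempView path: no catalog entry, no statistics.
            df = spark.read.format(input_format).load(location)
            df.createOrReplaceTempView(table)
        execution_time_list.append((table, time.time() - start))
    return execution_time_list
```

Note that, as the commit message says, the ANALYZE path applies only to raw formats; Iceberg and Delta Lake tables take the separate registration branches.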
Reviews (2): Last reviewed commit: "Fix typo"
```python
choices=['parquet', 'orc', 'avro', 'csv', 'json', 'iceberg', 'delta'],
default='parquet')
parser.add_argument('--analyze_tables',
                    action='store_true',
```
Redundant `default=False` with `store_true`

argparse's `store_true` action already sets `default=False` implicitly; the explicit `default=False` is redundant. The same pattern appears in nds/nds_power.py. No functional impact, but it adds noise.
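The reviewer's point is easy to verify with a minimal standalone parser (the flag name matches the PR; everything else here is just for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
# store_true already implies default=False, so adding an explicit
# default=False to this call would be redundant:
parser.add_argument('--analyze_tables', action='store_true')

print(parser.parse_args([]).analyze_tables)                    # False
print(parser.parse_args(['--analyze_tables']).analyze_tables)  # True
```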
Pull request overview
Adds an optional “analyze tables” step to the NDS and NDS-H power-run drivers so users can precompute Spark table statistics before executing benchmark queries (requested in issue #256).
Changes:

- Add `--analyze_tables` CLI flag to enable `ANALYZE TABLE … COMPUTE STATISTICS` after creating each TempView.
- Plumb the `analyze_tables` option through `run_query_stream()` → `setup_tables()` in both NDS and NDS-H power-run scripts.
- Record per-table ANALYZE timing in `execution_time_list` alongside TempView creation timings.
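The third bullet's timing bookkeeping can be sketched as follows. The `execution_time_list` name and the per-table (label, seconds) shape come from the PR description; the `timed()` helper itself is hypothetical, not the PR's code.

```python
import time

execution_time_list = []  # (label, seconds) entries, as in the PR

def timed(label, fn, *args, **kwargs):
    """Run fn and append (label, elapsed_seconds) to execution_time_list."""
    start = time.time()
    result = fn(*args, **kwargs)
    execution_time_list.append((label, time.time() - start))
    return result

# Usage: both the TempView creation and the ANALYZE step get an entry,
# so each table contributes two timings when --analyze_tables is set.
timed("store_sales TempView", lambda: None)   # stand-in for view creation
timed("store_sales ANALYZE", lambda: None)    # stand-in for ANALYZE TABLE
```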
Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `nds/nds_power.py` | Adds `--analyze_tables` flag and runs `ANALYZE TABLE … COMPUTE STATISTICS` after each TempView creation when enabled. |
| `nds-h/nds_h_power.py` | Mirrors the same `--analyze_tables` behavior for the NDS-H power-run script. |
Signed-off-by: Gary Shen <gashen@nvidia.com>
Remove `default=False` since it's false by default. Correct some typos. Signed-off-by: Gary Shen <gashen@nvidia.com>
Fix #256
Run COMPUTE STATISTICS before running queries.
It only runs on the raw dataset, not on Iceberg or Delta Lake.