Skip to content
This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Conversation

@apoorvedave1
Copy link
Contributor

What is the context for this pull request?

As discussed previously, this PR enables stats on spark's query plans

What changes were proposed in this pull request?

Enable stats in native spark's gold standard test cases. The stats are picked directly from apache spark 3.0+ codebase from this file https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSTableStats.scala

This ensures that for joining large tables, Broadcast join is replaced by SortMergeJoin.

Does this PR introduce any user-facing change?

No

How was this patch tested?

unit test

== Physical Plan ==
CollectLimit 100
+- *(5) Project [cs_bill_customer_sk#1, ss_customer_sk#2]
+- *(5) SortMergeJoin [cs_sold_date_sk#3], [ss_sold_date_sk#4], Inner
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: SortMergeJoin is used here because stats are enabled.

""".stripMargin)
}

private val originalCBCEnabled = conf.cboEnabled
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all changes below this line onward are directly picked from spark codebase. Refer https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala#L609

Copy link
Collaborator

@sezruby sezruby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @apoorvedave1!

))
)
// scalastyle:on line.size.limit
} No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line

Copy link
Contributor

@andrei-ionescu andrei-ionescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants