delete #2

Open
scottsand-db wants to merge 1 commit into diff_tests_00 from diff_tests_01

Conversation

@scottsand-db

Owner

No description provided.

scottsand-db added a commit that referenced this pull request Nov 1, 2024
…rClient` API (delta-io#3797)

This is a stacked PR. Please view this PR's diff here:
- delta_kernel_cc_1...delta_kernel_cc_2

#### Which Delta project/connector is this regarding?
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Adds new `TableDescriptor` and `CommitCoordinatorClient` API. Adds a new
`getCommitCoordinatorClient` API to the `Engine` (with a default
implementation that throws an exception).
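
The "default implementation that throws" pattern can be sketched roughly as below. This is only an illustration: the method parameters, the `name()` method, and the `LegacyEngine` / `CoordinatedEngine` classes are assumptions made for the example and are not taken from the Kernel code.

```java
import java.util.Map;

// Hypothetical stand-ins: the real Kernel types and signatures may differ.
interface CommitCoordinatorClient {
    String name();
}

interface Engine {
    // The default implementation throws, so existing Engine implementations
    // keep compiling and only fail if a caller actually asks for a coordinator.
    default CommitCoordinatorClient getCommitCoordinatorClient(
            String coordinatorName, Map<String, String> conf) {
        throw new UnsupportedOperationException(
            "This engine does not provide a commit coordinator client: " + coordinatorName);
    }
}

// An engine that does not override the method inherits the throwing default.
final class LegacyEngine implements Engine {}

// An engine that opts in by overriding the default.
final class CoordinatedEngine implements Engine {
    @Override
    public CommitCoordinatorClient getCommitCoordinatorClient(
            String coordinatorName, Map<String, String> conf) {
        return () -> coordinatorName;
    }
}
```

Adding the method with a default body keeps the interface change source-compatible for engines that never use coordinated commits.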

## How was this patch tested?

N/A; the change is trivial.

## Does this PR introduce _any_ user-facing changes?

Yes. See the above.
scottsand-db added a commit that referenced this pull request Dec 3, 2024
scottsand-db added a commit that referenced this pull request Dec 4, 2024
…inatorClient API" (delta-io#3917)

This reverts commit 6ae4b62

We seem to be rethinking our Coordinated Commits CUJ / APIs, and we
don't want these APIs leaked in Delta 3.3.
scottsand-db pushed a commit that referenced this pull request Mar 31, 2025
…ColumnMapping (delta-io#4319)

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

<!--
- Describe what this PR changes.
- Describe why we need the change.
 
If this PR resolves an issue be sure to include "Resolves #XXX" to
correctly link and close the issue upon merge.
-->
Split from the main PR delta-io#4265 for faster
review.

Add a util func `convertToPhysicalColumnNames` in ColumnMapping to get
the corresponding physical column name for a logical column.

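
As a rough sketch of what a logical-to-physical name conversion involves, the example below walks a nested schema one level at a time. The `SchemaField` class, the `toPhysicalPath` method, and the sample physical names are hypothetical simplifications, not the actual `convertToPhysicalColumnNames` signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified schema node (not the Kernel StructField class).
final class SchemaField {
    final String physicalName;
    final Map<String, SchemaField> children; // keyed by logical name

    SchemaField(String physicalName, Map<String, SchemaField> children) {
        this.physicalName = physicalName;
        this.children = children;
    }
}

final class ColumnMappingSketch {
    // Walks the logical path one level at a time, collecting physical names.
    static List<String> toPhysicalPath(
            Map<String, SchemaField> schema, List<String> logicalPath) {
        List<String> physical = new ArrayList<>();
        Map<String, SchemaField> level = schema;
        for (String logicalName : logicalPath) {
            SchemaField field = level.get(logicalName);
            if (field == null) {
                throw new IllegalArgumentException("Unknown logical column: " + logicalName);
            }
            physical.add(field.physicalName);
            level = field.children;
        }
        return physical;
    }
}
```
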
## How was this patch tested?

<!--
If tests were added, say they were added here. Please make sure to test
the changes thoroughly including negative and positive cases if
possible.
If the changes were tested in any way other than unit tests, please
clarify how you tested step by step (ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future).
If the changes were not tested, please explain why.
-->
Added unit test cases to ColumnMappingSuite.scala.

## Does this PR introduce _any_ user-facing changes?

<!--
If yes, please clarify the previous behavior and the change this PR
proposes - provide the console output, description and/or an example to
show the behavior difference if possible.
If possible, please also clarify if this is a user-facing change
compared to the released Delta Lake versions or within the unreleased
branches such as master.
If no, write 'No'.
-->
scottsand-db pushed a commit that referenced this pull request May 19, 2025
…sage with replace table (delta-io#4520)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Update SchemaUtils and ColumnMapping, with unit tests, to support
REPLACE TABLE with column mapping + fieldId re-use in PR #2.
Specifically, this involves the following changes (not necessarily
related, but combined in this PR):

1) When a connector provides its own pre-populated column mapping info
in the schema, we require that it is complete (i.e. fieldId AND
physicalName must both be present).
2) We add an argument, `allowNewNonNullableFields`, to our schema
validation checks. This is useful in cases where we can be sure the
table state has been completely cleared, and thus new non-null fields
are valid (like REPLACE).
3) We don't allow adding a new column with a fieldId less than the
maxColId. For now, we do this proactively for safety. In the future, for
something like RESTORE, we will likely need a config to bypass this
check.
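
The three rules above can be sketched as standalone checks. This is a minimal illustration under stated assumptions: the `NewField` class and the method names are hypothetical, and the real SchemaUtils and ColumnMapping code is structured differently. Only the `allowNewNonNullableFields` flag and the "less than the maxColId" comparison come from the description.

```java
// Hypothetical, simplified field descriptor (not the Kernel StructField).
// fieldId and physicalName are null when the connector did not supply them.
final class NewField {
    final String name;
    final boolean nullable;
    final Integer fieldId;
    final String physicalName;

    NewField(String name, boolean nullable, Integer fieldId, String physicalName) {
        this.name = name;
        this.nullable = nullable;
        this.fieldId = fieldId;
        this.physicalName = physicalName;
    }
}

final class SchemaChecksSketch {
    // Rule 1: connector-supplied column mapping info must be all-or-nothing.
    static void checkMappingInfoComplete(NewField f) {
        boolean hasId = f.fieldId != null;
        boolean hasPhysicalName = f.physicalName != null;
        if (hasId != hasPhysicalName) {
            throw new IllegalArgumentException(
                "Field " + f.name + " needs both fieldId and physicalName");
        }
    }

    // Rule 2: new non-nullable fields are rejected unless the caller knows
    // the table state was fully cleared (for example, REPLACE).
    static void checkNullability(NewField f, boolean allowNewNonNullableFields) {
        if (!f.nullable && !allowNewNonNullableFields) {
            throw new IllegalArgumentException(
                "Cannot add non-nullable field " + f.name);
        }
    }

    // Rule 3: a new column may not use a fieldId less than the current maxColId.
    static void checkFieldId(NewField f, int maxColId) {
        if (f.fieldId != null && f.fieldId < maxColId) {
            throw new IllegalArgumentException(
                "Field " + f.name + " has fieldId " + f.fieldId + " < maxColId " + maxColId);
        }
    }
}
```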

## How was this patch tested?

Updates unit tests.

Also, all the changes in this PR are used by
delta-io#4520 which adds a lot more E2E
tests with multiple schema scenarios.

## Does this PR introduce _any_ user-facing changes?

No.
scottsand-db pushed a commit that referenced this pull request Jun 13, 2025
…elta-io#4732)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR is a refactoring only. It moves all the IcebergCompatChecks from
`IcebergCompatV2MetadataValidatorAndUpdater.java` to its base class
`IcebergCompatMetadataValidatorAndUpdater.java` so that the
`IcebergCompatV3MetadataValidatorAndUpdater.java` to be added later can
use them.
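
The shape of this refactoring, with shared checks hoisted into a base class so that each version only lists the checks it needs, might look roughly like the sketch below. All class names, the `Check` interface, and the `example.v3.flag` property are hypothetical; the column mapping check is only an illustrative example of a shared check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified shapes: the real validators use different types.
abstract class CompatValidatorBase {
    // A check returns an error message, or null when it passes.
    interface Check {
        String validate(Map<String, String> tableProps);
    }

    // Shared checks live in the base class so every version can reuse them.
    protected static final Check COLUMN_MAPPING_ENABLED = props -> {
        String mode = props.getOrDefault("delta.columnMapping.mode", "none");
        return mode.equals("none") ? "column mapping must be enabled" : null;
    };

    // Each version declares which checks apply to it.
    abstract List<Check> checks();

    final List<String> validate(Map<String, String> props) {
        List<String> errors = new ArrayList<>();
        for (Check check : checks()) {
            String error = check.validate(props);
            if (error != null) {
                errors.add(error);
            }
        }
        return errors;
    }
}

final class V2ValidatorSketch extends CompatValidatorBase {
    @Override
    List<Check> checks() {
        return List.of(COLUMN_MAPPING_ENABLED);
    }
}

final class V3ValidatorSketch extends CompatValidatorBase {
    // Made-up version-specific check, for illustration only.
    private static final Check V3_ONLY = props ->
        props.containsKey("example.v3.flag") ? null : "example.v3.flag must be set";

    @Override
    List<Check> checks() {
        return List.of(COLUMN_MAPPING_ENABLED, V3_ONLY);
    }
}
```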

## How was this patch tested?

Covered by existing unit tests, since this PR is a refactoring only.

## Does this PR introduce _any_ user-facing changes?

scottsand-db pushed a commit that referenced this pull request Jun 16, 2025
…lta-io#4734)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [x] Kernel
- [ ] Other (fill in here)

## Description

This PR is a refactoring only. It moves all the validation checks from
`IcebergWriterCompatV1MetadataValidatorAndUpdater.java` to a new base
class `IcebergWriterCompatMetadataValidatorAndUpdater.java` so that the
`IcebergWriterCompatV3MetadataValidatorAndUpdater.java` to be added
later can use them.

## How was this patch tested?

Covered by existing unit tests, since this PR is a refactoring only.

## Does this PR introduce _any_ user-facing changes?

scottsand-db pushed a commit that referenced this pull request Jan 16, 2026
…an show the parameters. (delta-io#5817)

#### Which Delta project/connector is this regarding?
- [X] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Convert `@ParameterizedTest` tests to dynamic tests so that SBT can show
the parameters.
Also upgrade the JUnit-related packages so that the display name
actually shows up in the SBT output.
This addresses the comment:
delta-io#5772 (comment)

Before this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testBasicInsertOperations(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#1 started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#testMergeWithDeleteAction(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType):#2 started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#1 started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable(io.sparkuctest.UCDeltaTableIntegrationBaseTest$TableType, boolean, boolean, boolean, boolean):#2 started
```

After this PR it shows:
```
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #1:testUpdateOperations(MANAGED) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableDMLTest#allTableTypesTestsFactory() #2:testDeleteOperations(MANAGED) started
...
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=true started
[info] Test io.sparkuctest.UCDeltaTableCreationTest#testCreateTable():tableType=EXTERNAL, withPartition=true, withCluster=false, withAsSelect=true, replaceTable=false started
```
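
The idea behind the new output can be modeled in plain Java: one test body is expanded into one named case per parameter value, so a runner prints `testName(PARAM)` instead of an opaque `#1`, `#2` index. The sketch below is a model of that idea, not the JUnit API and not the PR's code. `NamedCase` and `forAllTableTypes` are hypothetical names. In JUnit 5 itself the counterpart would be a `@TestFactory` method returning `DynamicTest` instances created with `DynamicTest.dynamicTest(displayName, executable)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of dynamic test generation; names here are hypothetical.
enum TableType { EXTERNAL, MANAGED }

final class NamedCase {
    final String displayName;
    final Runnable body;

    NamedCase(String displayName, Runnable body) {
        this.displayName = displayName;
        this.body = body;
    }
}

final class DynamicCasesSketch {
    // Expands one test body into one named case per table type, putting the
    // parameter value into the display name.
    static List<NamedCase> forAllTableTypes(String testName, Consumer<TableType> body) {
        List<NamedCase> cases = new ArrayList<>();
        for (TableType tableType : TableType.values()) {
            cases.add(new NamedCase(
                testName + "(" + tableType + ")",
                () -> body.accept(tableType)));
        }
        return cases;
    }
}
```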

## How was this patch tested?

## Does this PR introduce _any_ user-facing changes?
No.

---------

Signed-off-by: Yi Li <yi.li@databricks.com>
scottsand-db pushed a commit that referenced this pull request Jan 24, 2026
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Added a new UC E2E streaming test suite to cover Delta structured
streaming read/write behavior across managed/external tables, including
UC commit version validation for managed tables. The changes live in
spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java
to exercise MemoryStream writes, streaming reads in V2 strict mode, and
checkpointed queries.
<!--
⚠️⚠️ **Note to reviewers** ⚠️⚠️ this branch is based on
delta-io#5651; therefore there are a
number of changes that are not directly related to the E2E test
integration for streaming reads (those changes are to enable these
streaming reads). Please just review
[spark/unitycatalog/src/test/java/io/sparkuctest/UCDeltaStreamingTest.java](https://github.com/delta-io/delta/pull/5833/changes#diff-04879565d272ba02d9c1e47707ec9bfdb1044460e957ec769d1914178554383b)
-->
## How was this patch tested?
Locally:
```
export UC_REMOTE=false
$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"

...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 24.65s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 44 s, completed Jan 16, 2026, 12:17:45 AM
```

Also tested against a remote UC server:
```
export UC_REMOTE=true
export UC_URI=$UC_URI
export UC_TOKEN=$UC_TOKEN
export UC_CATALOG_NAME=main
export UC_SCHEMA_NAME=demo_zh
export UC_BASE_TABLE_LOCATION=$S3_BASE_LOCATION

$ build/sbt "sparkUnityCatalog/testOnly io.sparkuctest.UCDeltaStreamingTest"

...
[info] Test run started (JUnit Jupiter)
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #1:testStreamingWriteToManagedTable(MANAGED) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaStreamingTest#allTableTypesTestsFactory() #2:testStreamingReadFromTable(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 4 total, 250.336s
[info] Passed: Total 4, Failed 0, Errors 0, Passed 4
[success] Total time: 274 s (04:34), completed Jan 16, 2026, 12:10:14 AM
```

## Does this PR introduce _any_ user-facing changes?
No.

---------

Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
scottsand-db pushed a commit that referenced this pull request Apr 9, 2026
…6145)


#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description
Fix integration tests not running due to test reporting. Remove the
common setting that generates XML reports, since sbt already generates
them by default:

https://www.scala-sbt.org/1.x/docs/Testing.html
> By default, sbt will generate JUnit XML test reports for all tests in
the build, located in the target/test-reports directory for a project.
This can be disabled by disabling the JUnitXmlReportPlugin

## How was this patch tested?
Locally
```sh
$ build/sbt -mem 4096 "sparkUnityCatalog/testOnly io.sparkuctest.UCDelta*"
...
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #1:testDeltaTableForPath(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #1:testDeltaTableForPath(MANAGED) started
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #2:testChangeDataFeed(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #2:testChangeDataFeed(MANAGED) started
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #3:testTimeTravelRead(EXTERNAL) started
[info] Test io.sparkuctest.UCDeltaTableReadTest#allTableTypesTestsFactory() #3:testTimeTravelRead(MANAGED) started
[info] Test run finished: 0 failed, 0 ignored, 6 total, 9.253s
[info] Passed: Total 63, Failed 0, Errors 0, Passed 63
[success] Total time: 157 s (02:37), completed Feb 26, 2026, 6:54:12 PM
```

## Does this PR introduce _any_ user-facing changes?
No