
Apache main rebase #28

Merged
leborchuk merged 341 commits into main from apache-main on Mar 10, 2026

Conversation

@leborchuk

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


higuoxing and others added 30 commits February 27, 2026 17:30
This PR fixes an incorrect JOIN condition in test_blackmap.sql. Relfilenodes
are not always distinct across segments, so we must add an additional JOIN
condition to the test case or it will produce unstable results.
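A stabilized join of this shape might look as follows (illustrative SQL only; the blackmap column names are hypothetical, and gp_dist_random() is used here to read pg_class rows from the segments):

```sql
-- Illustrative only: join on the segment id in addition to the
-- relfilenode, because relfilenodes are not unique across segments.
-- Column names on diskquota.blackmap are hypothetical.
SELECT r.relname, b.target_type
FROM diskquota.blackmap b
JOIN gp_dist_random('pg_class') r
  ON  b.target_relfilenode = r.relfilenode
  AND b.segid = r.gp_segment_id;
```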
Add calculate_table_size() to calculate any table's size, including uncommitted tables.
1. Monitor active tables on the master.
2. Replace relation_open() with SearchSysCache() lookups on pg_appendonly and pg_index to fetch the Form_pg_class of tables and indexes, avoiding deadlock.
3. Use DiskQuotaRelationCacheEntry instead of pg_table_size() to calculate table sizes, avoiding deadlock.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
This patch adds support for adding uncommitted relations to the blackmap on
segment servers. Most of the code shares its logic with adding committed
relations to the blackmap. Tests will be added in the following commits.
The extension file with (segno=0, column=1) is not traversed by
ao_foreach_extent_file(), so we need to handle its size separately.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
This patch adds support for dispatching blackmap to segments. This patch also
introduces two UDFs:

diskquota.enable_hardlimit()
diskquota.disable_hardlimit()

Users can enable and disable the hardlimit feature with these UDFs.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
This PR fixes a CI build failure by ignoring the distribution notice in
CTAS statements.

Co-authored-by: Hao Zhang <hzhang2@vmware.com>
…full (apache#112)

* Fix bug
The relation_cache_entry of a temporary table created during VACUUM FULL is not removed after VACUUM FULL finishes. The table is then treated as uncommitted even though it was dropped, and its size remains in diskquota.table_size, which makes the quota size larger than the real usage.
Use RelidByRelfilenode() to check whether the table is committed, and remove its relation_cache_entry.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
The SQL statement does not match the expected output in test_vacuum.sql.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
…e index on ao_table` is committed (apache#113)

We cannot calculate the size of pg_aoblkdir_xxxx before `create index on ao_table` is committed:
1. We lack the ability to parse the name pg_aoblkdir_xxxx.
2. pg_aoblkdir_xxxx is created by `create index on ao_table` and cannot be found by diskquota_get_appendonly_aux_oid_list() before the index's creation is committed.
Solution:
1. Parse names beginning with `pg_aoblkdir`.
2. When blkdirrelid is missing, try to fetch it by traversing relation_cache.

Co-authored-by: hzhang2 <hzhang2@vmware.com>
Co-authored-by: Xing Guo <higuoxing+github@gmail.com>
Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
Co-authored-by: Xuebin Su (苏学斌) <12034000+xuebinsu@users.noreply.github.com>
Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: Hao Zhang <hzhang2@vmware.com>
Consider a user session that does a DELETE followed by a VACUUM FULL to
reclaim the disk space. If, at the same time, the bgworker loads config
by doing a SELECT, and the SELECT begins before the DELETE ends and ends
after the VACUUM FULL begins:

bgw: ---------[ SELECT ]----------->
usr: ---[ DELETE ]-[ VACUUM FULL ]-->

then the tuples deleted will be marked as RECENTLY_DEAD instead of DEAD.
As a result, the deleted tuples cannot be removed by VACUUM FULL.

The fix lets the user session wait for the bgworker to finish the current
SELECT before starting VACUUM FULL.
…ache#116)

When doing VACUUM FULL, the table size may not be updated if
the table's oid is pulled before its relfilenode is swapped.

This fix keeps the table's oid in the shared memory if the table
is being altered, i.e., is locked in ACCESS EXCLUSIVE mode.

Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, diskquota.pause() only takes effect on quota checking.
Bgworkers still run the refresh loop even if diskquota is paused, which
wastes computation resources and can cause flaky issues.

This fix makes bgworkers skip refreshing quota when the user pauses
diskquota entirely to avoid those issues. Table sizes can be updated
correctly after resume.
ci: create rhel8 release build.

Signed-off-by: Sasasu <i@sasa.su>
Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, deadlock can occur when

1. A user session is doing DROP EXTENSION, and
2. A bgworker is loading quota configs using SPI.

This patch fixes the issue by pausing diskquota before DROP
EXTENSION so that the bgworker will not load config anymore.

Note that this cannot be done using object_access_hook() because
the extension object is dropped AFTER dropping all tables that belong
to the extension.
Test case test_primary_failure will stop/start segment to produce a
mirror switch. But the segment start could fail while replaying xlog.
The failure was caused by the deleted tablespace directories in previous
test cases.

This commit removes the "rm" statements in those tablespace test cases and
adds "-p" to the "mkdir" command line. The corresponding sub-directories
will be deleted by "DROP TABLESPACE" if the case passes.

Relevant logs:
2022-02-08 10:09:30.458183 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","entering standby mode",,,,,,,0,,"xlog.c",6537,
2022-02-08 10:09:30.458670 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","redo starts at E/24638A28",,,,,,,0,,"xlog.c",7153,
2022-02-08 10:09:30.468323 CST,"cc","postgres",p1182588,th1235613568,"[local]",,2022-02-08 10:09:30 CST,0,,,seg1,,,,,"FATAL","57P03","the database system is starting up"
,"last replayed record at E/2481EA70",,,,,,0,,"postmaster.c",2552,
2022-02-08 10:09:30.484792 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"FATAL","58P01","directory ""/tmp/test_spc"" does not exist",,"Create this directory for the table
space before restarting the server.",,,"xlog redo create tablespace: 2590660 ""/tmp/test_spc""",,0,,"tablespace.c",749,
Otherwise, compiler reports a warning:
"comparison of constant ‘20’ with boolean expression is always false"
Each time the state of Diskquota is changed, we need to wait for the
change to take effect using diskquota.wait_for_worker_new_epoch().
However, when the bgworker is not alive, such wait can last forever.

This patch fixes the issue by adding a timeout GUC so that wait() will
throw a NOTICE if it times out, making it more user-friendly.

To fix a race condition when CREATE EXTENSION, the user needs to
SELECT wait_for_worker_new_epoch() manually before writing data.
This is to wait until the current database is added to the monitored
db cache so that active tables in the current database can be
recorded.

This patch also fixes the test script for activating the standby and
renames some of the cases to make them clearer.
zhrt123 and others added 28 commits February 27, 2026 17:48
Co-authored-by: Chen Mulong <chenmulong@gmail.com>
Co-authored-by: Xing Guo <admin@higuoxing.com>
Merge all types of quota info maps into one in each bgworker.
The package name is incorrect when releasing packages on Rocky9/RHEL9
platforms (<package>-<version>-unknown_x86_64.tar.gz). This patch fixes
it to <package>-<version>-rhel9_x86_64.tar.gz.

Cherry-pick https://github.com/pivotal/timestamp9/pull/41
- DDL message is only used on master, so it is unnecessary to allocate the
memory on segments.
- The launcher shmem should not be initialized on segments.
refresh_rejectmap looks up tuples using SearchSysCacheCopy1, which retrieves
a copy of each tuple and allocates memory for it. However, refresh_rejectmap
did not free these tuple copies after use. If many oids were passed, diskquota
could work incorrectly because of the resulting memory leak. This patch frees
these tuples and prevents the leak.
fix: update actions

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Diskquota periodically calculates table sizes and stores them in the
diskquota.table_size table, pausing for diskquota.naptime (2 seconds by
default) between runs. If we restart the cluster during this pause,
diskquota loses all changes that have occurred since the last save to
diskquota.table_size. For example, we could create temporary tables,
wait until they are flushed to diskquota.table_size, and restart the
cluster; diskquota would still remember the temporary tables. Or we
could delete tables, restart the cluster, and diskquota would still
remember the deleted tables. This happens because at cluster start
diskquota reads back everything written to diskquota.table_size but
does not check whether some of those tables have already been deleted.

As a solution, we invalidate diskquota.table_size during diskquota
worker start, in addition to the pg_class validation.

The remaining problem: an incorrect table size cannot be refreshed until
the corresponding table becomes an active table.

Solution: call diskquota.init_table_size_table().
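Per the solution above, the manual rebuild is a single call (the UDF name is taken from the commit message itself):

```sql
-- Recompute diskquota.table_size from scratch so that sizes of
-- dropped or restart-skewed tables are corrected.
SELECT diskquota.init_table_size_table();
```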
Clean up obsolete files after importing the diskquota extension from upstream.

The specific changes include:
1. Remove .github directory: The upstream GitHub Actions workflows are
   no longer applicable in the Cloudberry repository.
2. Remove .gitmessage, .editorconfig, .clang-format: Code style and
   commit templates should follow Cloudberry's main repository standards.
3. Remove SECURITY.md: Legacy security policy.
4. Remove concourse directory: Legacy CI scripts.

This cleanup makes the extension structure cleaner and more consistent
with other contrib modules.
Integrate diskquota extension into Apache Cloudberry build system and
adapt the codebase for Cloudberry 2.0+ (PostgreSQL 14 based).

Main changes:

Build system integration:
* Add new Makefile for building with Cloudberry source tree
    ```
    make
    make install
    make installcheck
    make clean
    ```
* Update gpcontrib/Makefile to include diskquota in build and installcheck
* Simplify CMakeLists.txt by removing GP6 version conditionals
* Add PG_SRC_DIR availability check for isolation2 tests

Code modernization (remove GP6 compatibility code):
* Remove GP_VERSION_NUM < 70000 conditionals throughout codebase
* Replace deprecated APIs:
    heap_open -> table_open,
    heap_beginscan_catalog -> table_beginscan_catalog,
    heap_endscan -> table_endscan, etc.
* Replace init_ps_display() with set_ps_display() for process status
* Replace StrNCpy() with snprintf() for safer string handling
* Remove WaitForBackgroundWorkerShutdown() polyfill (now in core)
* Remove MemoryAccounting_Reset() calls (removed in GP7+)
* Update tuple descriptor attribute access from pointer to direct access

Documentation:
* Rewrite README.md for Apache Cloudberry with updated build instructions

Other improvements:
* Update extension comment to be more descriptive
* Ensure postgres.h is included first in all source files

CI:
* add ic-diskquota to `build-cloudberry.yml` workflow
* For `build-deb-cloudberry.yml`, the installation and configure prefixes
  are not consistent, which results in a test error. The ic-diskquota test
  will be added back once the deb workflow is updated.

See: https://lists.apache.org/thread/1zd80r1hvpwwh5fjd5yqgcc7sr4f27qr
Add diskquota extension license information to comply with Apache
release requirements.

Changes:
- Add diskquota entry to top-level LICENSE file under Greenplum section
- Create licenses/LICENSE-diskquota.txt with PostgreSQL License text
- Add gpcontrib/diskquota/** to pom.xml excludes for apache-rat checks

The diskquota extension is licensed under the PostgreSQL License,
originally developed by Pivotal Software and VMware.
This is a code defect in the original GPDB with resource groups enabled.
There is a bug in calculating the length of pg_wchar in the `gpvars_check_gp_resource_group_cgroup_parent` function.
For example, the value "greenplum database" was supposed to be judged an illegal name and report the error:
"gp_resource_group_cgroup_parent can only contains alphabet, number and non-leading . _ -".
But it was wrongly judged as legal.
Use absolute artifact paths in the GPG verification step of
devops/release/cloudberry-release.sh.

Previously, the script verified SHA-512 using an absolute path but
called `gpg --verify` with relative file names. When running with
`--repo` from a different working directory, this could fail with
"No such file or directory" even though the `.asc` file existed in
the artifacts directory.

This change aligns the GPG verify command with the SHA-512 check by
verifying:
  $ARTIFACTS_DIR/${TAR_NAME}.asc
against:
  $ARTIFACTS_DIR/$TAR_NAME

No behavior change for successful local runs besides making path
resolution robust.
Add GUC_GPDB_NEED_SYNC flag to pax.enable_sparse_filter and
pax.enable_row_filter so their values are dispatched from QD
to QE segments. Without this flag, SET on the coordinator has
no effect because scans run on QE segments.
…scoding

COPY FROM with SEGMENT REJECT LIMIT had two bugs when encountering
invalid multi-byte encoding sequences:

1. Encoding errors were double-counted: HandleCopyError() incremented
   rejectcount, then RemoveInvalidDataInBuf() incremented it again for
   the same error. This caused the reject limit to be reached twice as
   fast as expected.

2. SREH (Single Row Error Handling) was completely disabled when
   transcoding was required (file encoding != database encoding). Any
   encoding error during transcoding would raise an ERROR instead of
   skipping the bad row.

Fix by removing the duplicate rejectcount++ from RemoveInvalidDataInBuf(),
removing the !need_transcoding guard that blocked SREH for transcoding,
and adding proper buffer cleanup for the transcoding case (advance
raw_buf past the bad line using FindEolInUnverifyRawBuf).

Add regression tests covering both non-transcoding (invalid UTF-8) and
transcoding (invalid EUC_CN to UTF-8) cases with various reject limits.

Fixes apache#1425
src/test/regress/sql/misc.sql is generated from
src/test/regress/input/misc.source; it should not be added to the sql directory.
macOS BSD sed requires an explicit empty string argument after
-i (sed -i '' 'script' file), unlike GNU sed which takes -i
without a suffix argument. Without this fix, BSD sed misinterprets
the sed script as a backup suffix and treats the filename as the
script, causing an "unterminated substitute pattern" error.
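The difference, and a portable workaround, can be shown with a small shell sketch (file.txt is a placeholder):

```shell
printf 'foo\n' > file.txt
# GNU sed accepts -i with no suffix; BSD/macOS sed would parse the
# script as a backup suffix in that case. The portable form below
# supplies an explicit suffix, which both implementations accept,
# then deletes the backup file.
sed -i.bak 's/foo/bar/' file.txt && rm file.txt.bak
cat file.txt
```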
Previously, GetViewBaseRelids() rejected any query with more than one
base table, so materialized views defined with JOINs were never
registered in gp_matview_aux/gp_matview_tables. This meant no status
tracking and no staleness propagation for join matviews.

Add a recursive helper extract_base_relids_from_jointree() that walks
RangeTblRef, JoinExpr, and FromExpr nodes to collect all base relation
OIDs. This is the only C function changed -- the existing downstream
infrastructure (InsertMatviewTablesEntries, SetRelativeMatviewAuxStatus,
MaintainMaterializedViewStatus, reference counting) already supports
N base tables per matview.

This is a first step toward AQUMV support for join queries. Users can
also inspect a join matview's freshness status manually via
gp_matview_aux.

Key behaviors:
- Self-joins (t1 JOIN t1) are deduplicated to one catalog entry
- All join types supported: INNER, LEFT, RIGHT, FULL, implicit cross
- Subquery/function RTEs in FROM are still rejected
- Partitioned tables in joins propagate DML status correctly
- Status escalation across multiple base tables works (i→e on delete)
- Transaction rollback correctly reverts status changes

Includes regression tests for: two/three-table joins, implicit joins,
self-joins, all outer join types, mixed join types, join with GROUP BY,
shared base tables across multiple MVs, multi-DML transactions,
transaction rollback, cross joins, partitioned tables in joins,
VACUUM FULL, TRUNCATE, WITH NO DATA, and DROP CASCADE.
ADD_DEFINITIONS(-DRUN_GTEST) and ADD_DEFINITIONS(-DRUN_GBENCH)
are directory-scoped CMake commands that apply to ALL targets,
including the production pax shared library. This caused test-
only macros to be defined in production builds.

In pax_porc_adpater.cc, the leaked RUN_GTEST activates:

    expect_hdr = rel_tuple_desc_->attrs[index].attlen == -1 &&
                 rel_tuple_desc_->attrs[index].attbyval == false;

    #ifdef RUN_GTEST
    expect_hdr = false;
    #endif

This forces expect_hdr to false in production, skipping the
stripping of PostgreSQL varlena headers from dictionary
entries. As a result, dictionary-encoded string columns
return garbled data (varlena header bytes are included as
part of the string content).

Replace ADD_DEFINITIONS with target_compile_definitions
scoped to test_main and bench_main targets only, so
RUN_GTEST and RUN_GBENCH are no longer defined when
building pax.so.
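The scoping change can be sketched like this (target names test_main and bench_main are taken from the commit message):

```cmake
# Before (directory-scoped: leaks the test macros into pax.so):
#   add_definitions(-DRUN_GTEST)
#   add_definitions(-DRUN_GBENCH)

# After (target-scoped: defined only for the test/bench binaries):
target_compile_definitions(test_main  PRIVATE RUN_GTEST)
target_compile_definitions(bench_main PRIVATE RUN_GBENCH)
```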
Oid is an unsigned int, so when an Oid reaches 2^31, printing it with %d displays a negative value.
This is a defect in the original GPDB. GPDB fixed similar defects in commit 7279a1e ('Fix getResUsage integer overflow'), but there are still omissions.
…gn partitions (apache#1524)

The storage type detection logic failed to properly identify mixed storage when
foreign and non-foreign partitions coexisted, leading to incorrect metadata that
could cause issues with scan type selection and query planning.
When ALTER TABLE ... SET WITH (reorganize=true) runs concurrently with
COPY TO, COPY may return 0 rows instead of all rows.  The root cause is
a snapshot/lock ordering problem: PortalRunUtility() pushes the active
snapshot before calling DoCopy(), so the snapshot predates any
concurrent reorganize that had not yet committed.  After COPY TO blocks
on AccessExclusiveLock and the reorganize commits, the stale snapshot
cannot see the new physical files (xmin = reorganize_xid is invisible)
while the old physical files have already been removed, yielding 0 rows.

Three code paths are fixed:

1. Relation-based COPY TO (copy.c, DoCopy):
   After table_openrv() acquires AccessShareLock — which blocks until
   any concurrent reorganize commits — pop and re-push the active
   snapshot so it reflects all committed data at lock-grant time.

2. Query-based COPY TO, RLS COPY TO, and CTAS (copyto.c, BeginCopy):
   After pg_analyze_and_rewrite() -> AcquireRewriteLocks() acquires
   all direct relation locks, refresh the snapshot.  This covers
   COPY (SELECT ...) TO, COPY on RLS-protected tables (internally
   rewritten to a query), and CREATE TABLE AS SELECT.

3. Partitioned table COPY TO (copy.c, DoCopy):
   Before entering BeginCopy, call find_all_inheritors() to eagerly
   acquire AccessShareLock on all child partitions.  Child partition
   locks are normally acquired later in ExecutorStart -> ExecInitAppend,
   after PushCopiedSnapshot has already embedded a stale snapshot.
   Locking all children upfront ensures the snapshot refresh in fixes
   1 and 2 covers all concurrent child-partition reorganize commits.

In REPEATABLE READ or SERIALIZABLE isolation, GetTransactionSnapshot()
returns the same transaction-level snapshot, so the Pop/Push is a
harmless no-op.

Tests added:
- src/test/isolation2/sql/copy_to_concurrent_reorganize.sql
  Tests 2.1-2.5 for relation-based, query-based, partitioned, RLS,
  and CTAS paths across heap, AO row, and AO column storage.
- contrib/pax_storage/src/test/isolation2/sql/pax/
  copy_to_concurrent_reorganize.sql
  Same coverage for PAX columnar storage.

See: Issue#1545 <apache#1545>
For RC tags like X.Y.Z-incubating-rcN, generate the source tarball
filename and top-level directory using BASE_VERSION (without -rcN).

This keeps the voted bits ready for promotion without rebuilding and
avoids -rcN showing up in the extracted source directory.
@leborchuk merged commit 283cc00 into main on Mar 10, 2026
40 of 42 checks passed