
feat(catalog): enrich catalog interface with more methods#102

Open
ChaomingZhangCN wants to merge 16 commits into alibaba:main from ChaomingZhangCN:catalog-api


@ChaomingZhangCN (Contributor) commented Feb 2, 2026

Purpose

Close #84

API and Format

virtual Status DropDatabase(const std::string& name, bool ignore_if_not_exists, bool cascade);
virtual Status AlterDatabase(const std::string& name, const std::map<std::string, std::string>& changes, bool ignore_if_not_exists);
virtual Result<std::shared_ptr<Database>> GetDatabase(const std::string& name) const;
virtual Result<std::shared_ptr<Table>> GetTable(const Identifier& identifier) const;
virtual Result<std::shared_ptr<Table>> GetTableById(const std::string& table_id) const;
virtual Status DropTable(const Identifier& identifier, bool ignore_if_not_exists);
virtual Status RenameTable(const Identifier& from_table, const Identifier& to_table, bool ignore_if_not_exists);
virtual Status MarkDonePartitions(const Identifier& identifier, const std::vector<std::map<std::string, std::string>>& partitions);
virtual Result<std::vector<std::map<std::string, std::string>>> ListPartitions(const Identifier& identifier) const;
virtual Status CreatePartitions(const Identifier& identifier, const std::vector<std::map<std::string, std::string>>& partitions);
virtual Status DropPartitions(const Identifier& identifier, const std::vector<std::map<std::string, std::string>>& partitions);
virtual Status AlterPartitions(const Identifier& identifier, const std::vector<PartitionStatistics>& partitions);
virtual Result<std::shared_ptr<View>> GetView(const Identifier& identifier) const;
virtual Status DropView(const Identifier& identifier, bool ignore_if_not_exists);
virtual Status CreateView(const Identifier& identifier, const View& view, bool ignore_if_exists);
virtual Result<std::vector<std::string>> ListViews(const std::string& database_name) const;
virtual Status RenameView(const Identifier& from_view, const Identifier& to_view, bool ignore_if_not_exists);
virtual Status RepairCatalog();
virtual Status RepairDatabase(const std::string& database_name);
virtual Status RepairTable(const Identifier& identifier);
virtual Status RegisterTable(const Identifier& identifier, const std::string& path);
virtual bool SupportsListObjectsPaged() const;
virtual bool SupportsListByPattern() const;
virtual bool SupportsListTableByType() const;
virtual bool SupportsVersionManagement() const;
virtual Result<bool> CommitSnapshot(const Identifier& identifier, const std::string& table_uuid, const std::shared_ptr<Snapshot>& snapshot, const std::vector<PartitionStatistics>& statistics);
virtual Result<std::shared_ptr<Snapshot>> LoadSnapshot(const Identifier& identifier, const std::string& version) const;
virtual Status RollbackTo(const Identifier& identifier, const std::chrono::system_clock::time_point& instant);
virtual Status RollbackTo(const Identifier& identifier, const std::chrono::system_clock::time_point& instant, const std::optional<int64_t>& from_snapshot);
virtual Status CreateBranch(const Identifier& identifier, const std::string& branch, const std::optional<std::string>& from_tag = std::nullopt);
virtual Status DropBranch(const Identifier& identifier, const std::string& branch);
virtual Status FastForward(const Identifier& identifier, const std::string& branch);
virtual Result<std::vector<std::string>> ListBranches(const Identifier& identifier) const;
virtual Result<TagInfo> GetTag(const Identifier& identifier, const std::string& tag_name) const;
virtual Status CreateTag(const Identifier& identifier, const std::string& tag_name, const std::optional<int64_t>& snapshot_id, const std::optional<std::string>& time_retained, bool ignore_if_exists);
virtual Status DeleteTag(const Identifier& identifier, const std::string& tag_name);
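To make the ignore_if_not_exists flag on DropTable and RenameTable concrete, here is a minimal sketch of the intended semantics. Status, InMemoryCatalog, and the "db.table" string keys are hypothetical stand-ins (the real methods take an Identifier and return paimon::Status), and treating a rename onto an existing table as an error is an assumption, not something this PR specifies.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for paimon::Status, reduced to the codes used here.
struct Status {
  enum class Code { kOk, kNotExist, kAlreadyExist };
  Code code = Code::kOk;
  std::string message;

  static Status OK() { return {}; }
  static Status NotExist(std::string m) { return {Code::kNotExist, std::move(m)}; }
  static Status AlreadyExist(std::string m) { return {Code::kAlreadyExist, std::move(m)}; }
  bool ok() const { return code == Code::kOk; }
};

// Hypothetical in-memory catalog keyed by "db.table" strings. It only
// illustrates the ignore_if_not_exists flag, not FileSystemCatalog.
class InMemoryCatalog {
 public:
  void AddTable(const std::string& name) { tables_.insert(name); }
  bool TableExists(const std::string& name) const { return tables_.count(name) > 0; }

  Status DropTable(const std::string& name, bool ignore_if_not_exists) {
    if (tables_.erase(name) == 0) {
      // Missing table: succeed silently only if the caller asked to ignore it.
      return ignore_if_not_exists ? Status::OK()
                                  : Status::NotExist("Table " + name + " does not exist.");
    }
    return Status::OK();
  }

  Status RenameTable(const std::string& from, const std::string& to,
                     bool ignore_if_not_exists) {
    if (!TableExists(from)) {
      return ignore_if_not_exists ? Status::OK()
                                  : Status::NotExist("Table " + from + " does not exist.");
    }
    if (TableExists(to)) {
      // Assumption: renaming onto an existing table is always an error.
      return Status::AlreadyExist("Table " + to + " already exists.");
    }
    tables_.erase(from);
    tables_.insert(to);
    return Status::OK();
  }

 private:
  std::set<std::string> tables_;
};
```

Note that the flag only relaxes the "source does not exist" case; it does not turn a name collision on the target into a success.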

Documentation

@ChaomingZhangCN ChaomingZhangCN changed the title [feature] Enrich catalog interface with more methods feat: Enrich catalog interface with more methods Feb 2, 2026
@ChaomingZhangCN ChaomingZhangCN changed the title feat: Enrich catalog interface with more methods feat(catalog): Enrich catalog interface with more methods Feb 2, 2026
@ChaomingZhangCN ChaomingZhangCN changed the title feat(catalog): Enrich catalog interface with more methods feat(catalog): enrich catalog interface with more methods Feb 2, 2026
@ChaomingZhangCN (Contributor Author)

@lxy-9602 @lszskye PTAL~

/// status.
virtual Result<std::vector<std::string>> ListTables(const std::string& db_name) const = 0;

/// Drops a database.
Collaborator


This PR includes too many unimplemented APIs at once, which may make the API documentation confusing. Would it be better to submit only a few APIs with their implementations in each PR?

Contributor Author


Thank you for your suggestion. I will implement some of them in the filesystem catalog.

Collaborator


I think @lucasfang meant that the catalog's public interface should only expose methods that are already implemented — for example, DropDatabase, DropTable, RenameTable, and GetTable in this PR.

The unimplemented interfaces can be left in TODOs and added later when they're fully implemented. This helps avoid confusion in auto-generated documentation, which might otherwise include incomplete APIs.

Just wondering — for methods like DropDatabase in catalog, since they're meant to be implemented by subclasses, would it make sense to make them pure virtual (= 0) instead of providing a default "NotImplemented" implementation?
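The trade-off raised in the last question can be sketched as follows. Status, CatalogWithDefaults, PureCatalog, StubbedCatalog, and FileCatalog are hypothetical names for illustration only; the sketch contrasts a default NotImplemented body with a pure virtual declaration for DropDatabase.

```cpp
#include <string>
#include <utility>

// Hypothetical minimal Status; the real paimon::Status carries more detail.
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return {}; }
  static Status NotImplemented(std::string m) { return {false, std::move(m)}; }
};

// Option A: default body. Every subclass compiles, but an unsupported call
// only fails at runtime with NotImplemented.
class CatalogWithDefaults {
 public:
  virtual ~CatalogWithDefaults() = default;
  virtual Status DropDatabase(const std::string& /*name*/, bool /*ignore_if_not_exists*/,
                              bool /*cascade*/) {
    return Status::NotImplemented("DropDatabase not implemented");
  }
};

// Option B: pure virtual. A subclass that forgets DropDatabase stays abstract,
// so instantiating it is a compile-time error.
class PureCatalog {
 public:
  virtual ~PureCatalog() = default;
  virtual Status DropDatabase(const std::string& name, bool ignore_if_not_exists,
                              bool cascade) = 0;
};

// Inherits the NotImplemented stub from option A.
class StubbedCatalog : public CatalogWithDefaults {};

// Must override DropDatabase to be instantiable under option B.
class FileCatalog : public PureCatalog {
 public:
  Status DropDatabase(const std::string& /*name*/, bool /*ignore_if_not_exists*/,
                      bool /*cascade*/) override {
    return Status::OK();
  }
};

// struct Forgot : PureCatalog {};  Forgot f;  // would not compile: Forgot is abstract
```

Option B surfaces a missing override at compile time. The cost is that adding a new pure virtual method later breaks every existing subclass, including out-of-tree ones, which is the usual reason interfaces ship NotImplemented defaults instead.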

@alibaba alibaba deleted a comment from lszskye Feb 6, 2026
const std::string& table_name)
: schema_(schema), database_(database), table_name_(table_name) {}

virtual ~Table() = default;
Collaborator


Will Table have subclasses? It seems like many functions are declared as virtual.

@lucasfang lucasfang requested a review from Copilot March 2, 2026 01:23

Copilot AI left a comment


Pull request overview

This PR expands the C++ Catalog public interface to better align with the Java catalog API (issue #84), and begins implementing a subset of the newly added methods for the filesystem-backed catalog.

Changes:

  • Extends include/paimon/catalog/catalog.h with many additional catalog APIs (databases/tables/views/partitions/version-management).
  • Implements and tests GetTable, DropDatabase, DropTable, and RenameTable in FileSystemCatalog.
  • Introduces new Database (public header) and Table / View abstractions (currently under src/).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

File | Description
include/paimon/catalog/catalog.h | Adds many new catalog methods and supporting type aliases/forward declarations.
include/paimon/catalog/database.h | Adds a new Database interface type for catalog APIs.
src/paimon/core/catalog/file_system_catalog.h | Extends the filesystem catalog class with new overrides (DropDatabase, DropTable, RenameTable, GetTable).
src/paimon/core/catalog/file_system_catalog.cpp | Implements GetTable, DropDatabase, DropTable, RenameTable.
src/paimon/core/catalog/file_system_catalog_test.cpp | Adds unit tests covering the newly implemented filesystem catalog methods.
src/paimon/core/table/table.h | Adds a basic Table abstraction used by GetTable.
src/paimon/core/view/view.h | Adds a View interface abstraction (not yet implemented by the filesystem catalog).


Comment on lines +38 to +45
class Database;
class Table;
class View;
class Schema;
class Snapshot;
class PartitionStatistics;
class Tag;
class Identifier;

Copilot AI Mar 2, 2026


This public header forward-declares Table, View, Snapshot, Tag, etc., and returns/accepts them via std::shared_ptr<> in the public API. Since only include/ is installed (not src/), API consumers won’t have access to these type definitions, and destroying a std::shared_ptr<IncompleteType> (created in the library with the default deleter) typically requires the complete type in the consumer TU. Please either move these interface types into public headers under include/, or adjust the API to avoid owning shared_ptr to incomplete types (e.g., return metadata/IDs, or use an exported custom deleter pattern).
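For reference, std::shared_ptr<T> does not need T to be complete when it is copied or destroyed: the deleter is captured in the control block where the object was created, and only creating the object needs the complete type. The practical problem with forward-declared types in the public API is therefore that consumers cannot call any member on them, which is a strong argument for moving the interfaces under include/. The sketch below uses a hypothetical Snapshot class and LoadLatestSnapshot factory and puts the "public" and "library" halves in one file; in a real build they would be separate translation units.

```cpp
#include <cstdint>
#include <memory>

// ---- "Public header": only a forward declaration is visible. ----
class Snapshot;                                   // incomplete for consumers
std::shared_ptr<Snapshot> LoadLatestSnapshot();   // hypothetical factory
int LiveSnapshots();                              // hypothetical counter, for the demo

// Consumer code: holding, copying, and releasing the pointer is fine while
// Snapshot is incomplete, because the control block owns the deleter.
bool ConsumerHoldsAndReleases() {
  std::shared_ptr<Snapshot> s = LoadLatestSnapshot();
  std::shared_ptr<Snapshot> copy = s;
  // s->Id() would not compile at this point: Snapshot has no visible members.
  bool alive = LiveSnapshots() == 1;
  s.reset();
  copy.reset();  // last owner: the library-side deleter destroys the object
  return alive && LiveSnapshots() == 0;
}

// ---- "Library implementation": Snapshot is complete only from here on. ----
namespace {
int g_live = 0;
}  // namespace

class Snapshot {
 public:
  Snapshot() { ++g_live; }
  ~Snapshot() { --g_live; }
  int64_t Id() const { return 1; }
};

std::shared_ptr<Snapshot> LoadLatestSnapshot() { return std::make_shared<Snapshot>(); }
int LiveSnapshots() { return g_live; }
```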

#pragma once

#include <map>
#include <memory>

Copilot AI Mar 2, 2026


This header uses std::optional in Comment() but does not include <optional>, which can cause compilation failures depending on include order. Add #include <optional> to this header.

Suggested change
-#include <memory>
+#include <memory>
+#include <optional>

Comment on lines +34 to +40
class PAIMON_EXPORT Database {
public:
/// ================== Table Metadata =====================

/// A name to identify this database.
virtual std::string Name() const = 0;


Copilot AI Mar 2, 2026


Database declares virtual methods but does not declare a virtual destructor. Deleting a derived instance through a Database* (or via smart pointers) would be undefined behavior. Add virtual ~Database() = default; (or a protected non-virtual destructor if you want to prevent deletion via base pointer).
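A minimal sketch of the fix, assuming a cut-down Database with only Name(); FileSystemDatabase is a hypothetical implementation that records its own destruction so the effect of the virtual destructor is observable.

```cpp
#include <memory>
#include <string>
#include <utility>

// Cut-down Database interface with the virtual destructor the review asks for.
class Database {
 public:
  virtual ~Database() = default;  // makes deletion through Database* well-defined
  virtual std::string Name() const = 0;
};

// Hypothetical implementation that records when it is destroyed.
class FileSystemDatabase : public Database {
 public:
  FileSystemDatabase(std::string name, bool* destroyed)
      : name_(std::move(name)), destroyed_(destroyed) {}
  ~FileSystemDatabase() override { *destroyed_ = true; }
  std::string Name() const override { return name_; }

 private:
  std::string name_;
  bool* destroyed_;
};

// std::unique_ptr<Database> calls delete on a Database*; without the virtual
// destructor above, this would be undefined behavior.
bool DestroyThroughBase() {
  bool destroyed = false;
  {
    std::unique_ptr<Database> db = std::make_unique<FileSystemDatabase>("default", &destroyed);
  }
  return destroyed;
}
```

A std::shared_ptr<Database> created by std::make_shared<FileSystemDatabase> would run the right destructor even without the virtual declaration, because it captures the deleter at creation; raw delete and std::unique_ptr<Database> do not, so the virtual destructor remains the safe default.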

Comment on lines +41 to +46
/// Get the table-level options associated with this schema.
/// @return Options
virtual const std::map<std::string, std::string>& Options() const = 0;

/// Get an optional comment describing the table.
/// @return The table comment if set, or std::nullopt otherwise.

Copilot AI Mar 2, 2026


The docstrings here refer to “table-level options” and “table comment”, but this interface is for a database. Please update the wording (database-level options/comment) to match the actual type and avoid confusing generated API docs.

Suggested change
-/// Get the table-level options associated with this schema.
-/// @return Options
-virtual const std::map<std::string, std::string>& Options() const = 0;
-/// Get an optional comment describing the table.
-/// @return The table comment if set, or std::nullopt otherwise.
+/// Get the database-level options associated with this database.
+/// @return Options
+virtual const std::map<std::string, std::string>& Options() const = 0;
+/// Get an optional comment describing the database.
+/// @return The database comment if set, or std::nullopt otherwise.


#include <chrono>
#include <map>
#include <memory>

Copilot AI Mar 2, 2026


Catalog now uses std::optional in multiple method signatures (e.g., RollbackTo/CreateBranch/CreateTag) but the header does not include <optional>, which will cause compilation failures for translation units that include this header without already including <optional>. Add #include <optional> here.

Suggested change
-#include <memory>
+#include <memory>
+#include <optional>
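The signatures that need <optional> use std::optional with a std::nullopt default, as in CreateBranch. DescribeCreateBranch below is a hypothetical free function with the same parameter shape, returning a string instead of a Status, to show that the header compiles on its own once <optional> is included and how callers use the defaulted argument.

```cpp
#include <optional>  // required for std::optional and std::nullopt below
#include <string>

// Hypothetical function mirroring the shape of CreateBranch(identifier, branch, from_tag).
std::string DescribeCreateBranch(const std::string& branch,
                                 const std::optional<std::string>& from_tag = std::nullopt) {
  if (from_tag.has_value()) {
    return "branch " + branch + " from tag " + *from_tag;
  }
  return "branch " + branch + " (no source tag)";
}
```

Callers can omit from_tag entirely or pass a plain string, which converts implicitly to std::optional<std::string>.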

/// @return A result containing true if commit succeeded, or an error status.
virtual Result<bool> CommitSnapshot(const Identifier& identifier, const std::string& table_uuid,
const std::shared_ptr<Snapshot>& snapshot,
const std::vector<PartitionStatistics>& statistics) {

Copilot AI Mar 2, 2026


PartitionStatistics is only forward-declared here and appears inside std::vector<PartitionStatistics> in the public API. Standard containers generally require the element type to be complete, so this will either fail to compile or force users to include internal src/ headers. Consider moving PartitionStatistics into an installed public header (under include/), or change the API to use a public/opaque representation (e.g., a serializable struct, shared_ptr to a public interface, etc.).

Suggested change
-const std::vector<PartitionStatistics>& statistics) {
+const std::vector<std::shared_ptr<PartitionStatistics>>& statistics) {

Comment on lines +453 to +460
/// Get tag for table.
///
/// @param identifier Path of the table, cannot be system name.
/// @param tag_name Tag name
/// @return A result containing the tag information, or an error status.
virtual Result<std::shared_ptr<Tag>> GetTag(const Identifier& identifier,
const std::string& tag_name) const {
return Status::NotImplemented("GetTag not implemented");

Copilot AI Mar 2, 2026


The PR description lists GetTag returning TagInfo, but the public API here declares GetTag returning std::shared_ptr<Tag>. Please align the PR description and the public interface (either update the description or adjust the API) to avoid confusion and documentation drift.

Comment on lines +49 to +52
virtual std::string Query() = 0;

/// Loads the schema of view.
virtual std::shared_ptr<Schema> GetSchema() = 0;

Copilot AI Mar 2, 2026


Query() and GetSchema() are getter-style APIs but are not marked const, which prevents calling them on a const View& and is inconsistent with Name()/FullName() being const. Consider making these methods const (e.g., virtual std::string Query() const = 0; and virtual std::shared_ptr<Schema> GetSchema() const = 0;).

Suggested change
-virtual std::string Query() = 0;
-/// Loads the schema of view.
-virtual std::shared_ptr<Schema> GetSchema() = 0;
+virtual std::string Query() const = 0;
+/// Loads the schema of view.
+virtual std::shared_ptr<Schema> GetSchema() const = 0;
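A short sketch of why the const qualifiers matter: with them, a function that only reads a view can take const View&. Schema here is an empty placeholder and SimpleView and DescribeView are hypothetical; only the const-qualified signatures follow the suggestion above.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Schema {};  // placeholder for the real paimon Schema

class View {
 public:
  virtual ~View() = default;
  virtual std::string Name() const = 0;
  virtual std::string Query() const = 0;                  // const per the suggestion
  virtual std::shared_ptr<Schema> GetSchema() const = 0;  // const per the suggestion
};

// Hypothetical implementation backed by fixed values.
class SimpleView : public View {
 public:
  SimpleView(std::string name, std::string query)
      : name_(std::move(name)), query_(std::move(query)), schema_(std::make_shared<Schema>()) {}
  std::string Name() const override { return name_; }
  std::string Query() const override { return query_; }
  std::shared_ptr<Schema> GetSchema() const override { return schema_; }

 private:
  std::string name_;
  std::string query_;
  std::shared_ptr<Schema> schema_;
};

// Compiles only because Query() and GetSchema() are const.
std::string DescribeView(const View& view) {
  return view.Name() + ": " + view.Query() + (view.GetSchema() ? " (schema loaded)" : "");
}
```

The same reasoning applies to Table::LatestSchema(), discussed in the next comment.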

}

/// Loads the latest schema of table.
virtual std::shared_ptr<Schema> LatestSchema() {

Copilot AI Mar 2, 2026


LatestSchema() returns the cached schema pointer and does not appear to mutate state, but it is not marked const. Making it const will improve const-correctness and allow calling it through const std::shared_ptr<Table>& / const Table&.

Suggested change
-virtual std::shared_ptr<Schema> LatestSchema() {
+virtual std::shared_ptr<Schema> LatestSchema() const {

/// Interface of a database in a catalog.
class PAIMON_EXPORT Database {
public:
/// ================== Table Metadata =====================
Collaborator


virtual ~Database() = default;

if (IsSystemTable(identifier)) {
return Status::Invalid(fmt::format("Cannot drop system table {}.", identifier.ToString()));
}

Collaborator


Add a CheckNotBranch check here so that DropTable cannot be used to drop a branch.
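One possible shape for such a check, as a sketch only. It assumes a branch is encoded in the object name as "<table>$branch_<name>", following the Java Paimon naming convention; BranchName, CheckNotBranch, and the Status stand-in here are hypothetical and are not the paimon-cpp Identifier API.

```cpp
#include <optional>
#include <string>

// Hypothetical Status stand-in.
struct Status {
  bool ok = true;
  std::string message;
};

// Assumption: branch encoded as "<table>$branch_<name>", optionally followed by
// a "$<system_table>" suffix. Returns the branch name if present.
std::optional<std::string> BranchName(const std::string& object_name) {
  const std::string marker = "$branch_";
  const auto pos = object_name.find(marker);
  if (pos == std::string::npos) return std::nullopt;
  const auto start = pos + marker.size();
  const auto end = object_name.find('$', start);  // a system-table suffix may follow
  return object_name.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

// Hypothetical CheckNotBranch: reject operations that must target the main branch.
Status CheckNotBranch(const std::string& object_name, const std::string& method) {
  if (auto branch = BranchName(object_name)) {
    return {false, method + " does not support branch " + *branch + " (" + object_name + ")"};
  }
  return {};
}
```

DropTable would call CheckNotBranch right after the system-table check and return early when the result is not ok.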

return Status::Invalid(fmt::format("Cannot drop system table {}.", identifier.ToString()));
}

PAIMON_ASSIGN_OR_RAISE(bool exist, TableExists(identifier));
Collaborator


Support dropping branches/external paths here, or leave a TODO.



Development

Successfully merging this pull request may close these issues.

[Improvement] Improve catalog interface

5 participants