
Feature/schema statistics and samples #4

Merged
Nechja merged 2 commits into main from feature/schema-statistics-and-samples on Oct 4, 2025

@Nechja (Owner) commented Oct 4, 2025

This pull request adds optional schema statistics and data sampling to the export and compare commands, making it easier to analyze and understand database schemas. The main changes are support for collecting table and column statistics (such as row counts and sample values), updates to the schema data model, and implementations for the MySQL, PostgreSQL, and Oracle backends.

New statistics and sampling features:

  • Added command-line flags to enable collection of schema statistics, row counts, and column sample values, including a configurable sample size, in both the export and compare commands (cmd/schemalyzer/commands/export.go, cmd/schemalyzer/commands/compare.go).
  • Implemented the collectStatistics function to aggregate statistics (table count, view count, total columns, index count, row counts, and column samples) when exporting or comparing schemas (cmd/schemalyzer/commands/common.go).
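A minimal sketch of how a collectStatistics-style aggregation could work. The trimmed-down types, the StatsSource interface, the Options struct, and fakeReader are hypothetical stand-ins, not the PR's actual definitions. Following the commit note that collection continues even if some queries fail, each failed row-count or sample query is recorded and skipped instead of aborting the run.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, trimmed-down model types; the real ones live in pkg/models.
type Column struct {
	Name    string
	Samples []string
}
type Table struct {
	Name     string
	Columns  []Column
	Indexes  []string
	RowCount *int64 // nil means "not collected"
}
type SchemaStats struct{ TableCount, ViewCount, ColumnCount, IndexCount int }
type Schema struct {
	Tables []Table
	Views  []string
	Stats  *SchemaStats
}

// StatsSource is a hypothetical stand-in for the PR's StatisticsReader.
type StatsSource interface {
	RowCount(table string) (int64, error)
	ColumnSamples(table, column string, limit int) ([]string, error)
}

// Options mirrors --with-stats, --with-row-count, --with-samples, --sample-size.
type Options struct {
	WithStats    bool
	WithRowCount bool
	WithSamples  bool
	SampleSize   int
}

// collectStatistics fills in schema-level counts and, when requested, row counts
// and column samples. A failed query is recorded and skipped, never fatal.
func collectStatistics(s *Schema, r StatsSource, opt Options) []error {
	var errs []error
	if opt.WithStats {
		st := &SchemaStats{TableCount: len(s.Tables), ViewCount: len(s.Views)}
		for _, t := range s.Tables {
			st.ColumnCount += len(t.Columns)
			st.IndexCount += len(t.Indexes)
		}
		s.Stats = st
	}
	for i := range s.Tables {
		t := &s.Tables[i]
		if opt.WithRowCount {
			if n, err := r.RowCount(t.Name); err != nil {
				errs = append(errs, fmt.Errorf("row count for %s: %w", t.Name, err))
			} else {
				t.RowCount = &n
			}
		}
		if !opt.WithSamples {
			continue
		}
		for j := range t.Columns {
			c := &t.Columns[j]
			vals, err := r.ColumnSamples(t.Name, c.Name, opt.SampleSize)
			if err != nil {
				errs = append(errs, fmt.Errorf("samples for %s.%s: %w", t.Name, c.Name, err))
				continue
			}
			c.Samples = vals
		}
	}
	return errs
}

// fakeReader is a demo reader; row counts for "audit_log" always fail.
type fakeReader struct{}
func (fakeReader) RowCount(table string) (int64, error) {
	if table == "audit_log" {
		return 0, errors.New("permission denied")
	}
	return 42, nil
}
func (fakeReader) ColumnSamples(table, column string, limit int) ([]string, error) {
	vals := []string{"a", "b", "c", "d"}
	if limit < len(vals) {
		vals = vals[:limit]
	}
	return vals, nil
}

func main() {
	users := Table{Name: "users", Columns: []Column{{Name: "id"}, {Name: "email"}}, Indexes: []string{"users_pkey"}}
	audit := Table{Name: "audit_log", Columns: []Column{{Name: "id"}}}
	s := &Schema{Tables: []Table{users, audit}, Views: []string{"active_users"}}
	errs := collectStatistics(s, fakeReader{}, Options{WithStats: true, WithRowCount: true, WithSamples: true, SampleSize: 3})
	fmt.Printf("%+v\n%d %v %v\n", *s.Stats, *s.Tables[0].RowCount, s.Tables[0].Columns[1].Samples, errs)
}
```

In the demo, the failed row count for audit_log leaves its RowCount nil and adds one entry to the returned error slice, while the samples for its columns are still collected.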

Schema model enhancements:

  • Extended the Schema, Table, and Column structs to include fields for overall statistics (Stats), per-table row counts (RowCount), and per-column sample values (Samples). Added a new SchemaStats struct to represent aggregated statistics (pkg/models/schema.go).

Backend support for statistics:

  • Defined a new StatisticsReader interface for retrieving row counts and column samples, and implemented it for MySQL, PostgreSQL, and Oracle readers, including safe identifier quoting and value conversion for each backend (internal/database/interfaces.go, internal/database/mysql/reader.go, internal/database/postgres/reader.go, internal/database/oracle/reader.go).
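A sketch of what the interface and the per-backend quoting might look like. Only the StatisticsReader name comes from the PR; the method signatures, the dialect strings, and the helpers quoteIdent, sampleQuery, and toSampleString are hypothetical. Quoting rules differ by backend: PostgreSQL escapes an embedded double quote by doubling it, MySQL does the same with backticks, and Oracle does not allow a double quote inside a quoted identifier at all, so this sketch rejects such names. The sample limit is passed as a bind parameter instead of being formatted into the SQL.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// StatisticsReader is named in the PR; these method signatures are assumed.
type StatisticsReader interface {
	GetRowCount(ctx context.Context, schema, table string) (int64, error)
	GetColumnSamples(ctx context.Context, schema, table, column string, limit int) ([]string, error)
}

// quoteIdent quotes one identifier for the given dialect ("postgres",
// "mysql" or "oracle") so a user-supplied name cannot break out of the quotes.
func quoteIdent(dialect, name string) (string, error) {
	switch dialect {
	case "mysql":
		// MySQL escapes an embedded backtick by doubling it.
		return "`" + strings.ReplaceAll(name, "`", "``") + "`", nil
	case "oracle":
		// Oracle quoted identifiers may not contain a double quote at all.
		if strings.Contains(name, `"`) {
			return "", fmt.Errorf("invalid Oracle identifier %q", name)
		}
		return `"` + name + `"`, nil
	default:
		// PostgreSQL escapes an embedded double quote by doubling it.
		return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`, nil
	}
}

// sampleQuery builds a query for up to N distinct non-NULL values of one
// column; N is bound as a parameter rather than formatted into the SQL.
func sampleQuery(dialect, schema, table, column string) (string, error) {
	var q [3]string
	for i, name := range []string{schema, table, column} {
		quoted, err := quoteIdent(dialect, name)
		if err != nil {
			return "", err
		}
		q[i] = quoted
	}
	sql := fmt.Sprintf("SELECT DISTINCT %s FROM %s.%s WHERE %s IS NOT NULL", q[2], q[0], q[1], q[2])
	switch dialect {
	case "mysql":
		return sql + " LIMIT ?", nil
	case "oracle":
		return sql + " FETCH FIRST :1 ROWS ONLY", nil // row-limiting clause, Oracle 12c+
	default:
		return sql + " LIMIT $1", nil
	}
}

// toSampleString turns a value scanned into an interface{} into display text.
func toSampleString(v any) string {
	switch x := v.(type) {
	case nil:
		return "NULL"
	case []byte:
		return string(x) // some drivers return text columns as raw bytes
	case time.Time:
		return x.Format(time.RFC3339)
	default:
		return fmt.Sprint(x)
	}
}

func main() {
	for _, d := range []string{"postgres", "mysql", "oracle"} {
		sql, err := sampleQuery(d, "app", `odd"name`, "email")
		fmt.Println(d, sql, err)
	}
	fmt.Println(toSampleString([]byte("abc")), toSampleString(nil), toSampleString(int64(7)))
}
```

Because quoted identifiers are case-sensitive in PostgreSQL and Oracle, names should be passed exactly as the catalog stores them (for example, uppercase for Oracle objects created without quotes). SELECT DISTINCT on a large, unindexed column may also scan the whole table, which is one reason to keep the sample size small.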

Nechja added 2 commits October 4, 2025 11:07
- Added --with-stats flag to include schema statistics (table count, column count, etc.)
- Added --with-row-count flag to include row counts for each table
- Added --with-samples flag to include sample values for each column
- Added --sample-size flag to control number of samples (default: 3)
- Implemented StatisticsReader interface for PostgreSQL
- Updated models to include optional statistics fields
- Statistics collection continues even if some queries fail

This helps users understand data volume and content patterns without manual queries.
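The four flags from the commit message above, sketched with Go's standard flag package. Only the flag names and the default sample size of 3 come from the PR; the real commands may use a different CLI library, and the statsFlags struct, the parseStatsFlags helper, and the check that rejects a sample size below 1 when --with-samples is set are assumptions for illustration. The standard flag package accepts both the -name and --name spellings, so the double-dash form shown in the commit message parses as-is.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// statsFlags holds the options added by this PR (hypothetical struct name).
type statsFlags struct {
	WithStats    bool
	WithRowCount bool
	WithSamples  bool
	SampleSize   int
}

// parseStatsFlags registers the options on a fresh FlagSet and parses args.
func parseStatsFlags(args []string) (statsFlags, error) {
	var f statsFlags
	fs := flag.NewFlagSet("export", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the sketch's output
	fs.BoolVar(&f.WithStats, "with-stats", false, "include schema statistics (table count, column count, etc.)")
	fs.BoolVar(&f.WithRowCount, "with-row-count", false, "include row counts for each table")
	fs.BoolVar(&f.WithSamples, "with-samples", false, "include sample values for each column")
	fs.IntVar(&f.SampleSize, "sample-size", 3, "number of sample values per column")
	if err := fs.Parse(args); err != nil {
		return f, err
	}
	// Assumed validation, not taken from the PR.
	if f.WithSamples && f.SampleSize < 1 {
		return f, fmt.Errorf("--sample-size must be at least 1, got %d", f.SampleSize)
	}
	return f, nil
}

func main() {
	f, err := parseStatsFlags([]string{"--with-stats", "--with-samples", "--sample-size=5"})
	fmt.Printf("%+v %v\n", f, err)
}
```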
@Nechja merged commit c1242ae into main on Oct 4, 2025
7 checks passed
@Nechja deleted the feature/schema-statistics-and-samples branch on December 10, 2025 05:59