
Fix -filter flag and extend stdout streaming to returns tables #70

Open
cjohnson-confluent wants to merge 2 commits into gregrahn:master from cjohnson-confluent:cjohnson/filter

@cjohnson-confluent

What was broken

The -filter flag (which streams generated data to stdout instead of writing
files) was silently broken by the v2.3.0 upstream import (12caac0), which
reverted three fixes Greg Rahn had originally landed in 7992dbb:

  1. params.h: Option was named _FILTER, but is_set() searches for
    "FILTER" using prefix matching; it never matched, so filter mode never
    activated for any table.
  2. print.c (print_start): fpOutfile = pTdef->outfile ran
    unconditionally, overwriting the stdout assignment with NULL on every
    row.
  3. w_store_sales.c: Returns generation ran even in filter mode, writing
    interleaved sales and returns rows to stdout in a single pass.

What's new

  • Restores all three fixes above
  • Extends -filter support to the three returns tables (store_returns,
    catalog_returns, web_returns). Because returns are generated as a side
    effect of their parent sales table, a g_filter_tabid global tracks the
    target table; driver.c redirects child table requests to the parent
    generator, and print_start routes only the target to stdout and
    suppresses the parent's output by writing it to /dev/null.
  • Auto-detects OS in makefile (Darwin -> MACOS, else LINUX) so the
    same build works on macOS and Linux without a manual OS= override.
  • Adds compiler flags for compatibility with modern GCC/clang on K&R-style C
    (-Wno-implicit-int -Wno-deprecated-non-prototype -fcommon).
  • Adds a clear error when -filter is used without specifying -table <name>.

Verification

Output verified byte-for-byte identical to non-filter mode for all tables.

Usage

./dsdgen -scale N -table store_sales    -filter -quiet 2>/dev/null | gzip > store_sales.gz
./dsdgen -scale N -table store_returns  -filter -quiet 2>/dev/null | gzip > store_returns.gz
# same pattern for catalog_sales/returns, web_sales/returns

cjohnson-confluent and others added 2 commits March 4, 2026 12:47
The -filter flag was silently broken in the v2.3.0 upstream import due to
three separate bugs, all of which were originally fixed by Greg Rahn in
2013 (commit 7992dbb) and later reverted:

1. params.h: option was named _FILTER but is_set() looks up "FILTER";
   prefix matching never matched, so filter mode never activated
2. print.c (print_start): fpOutfile = pTdef->outfile ran unconditionally,
   overwriting the stdout assignment with NULL on every row
3. w_store_sales.c: returns generation ran even in filter mode, writing
   both sales and returns rows to stdout in a single pass (interleaved)

This commit restores and extends those fixes:

- Fix all three bugs above
- Add -filter support for the three returns tables (store_returns,
  catalog_returns, web_returns). Because returns are generated as a
  side effect of their parent sales table, a g_filter_tabid global
  tracks the target table and driver.c redirects child table requests
  to the parent generator; print_start routes the target to stdout and
  suppresses the parent's output by writing it to /dev/null
- Auto-detect OS in makefile (Darwin -> MACOS, else LINUX) so the same
  build works on both macOS and Linux without manual OS= override
- Add -Wno-implicit-int -Wno-deprecated-non-prototype to MACOS_CFLAGS
  for compatibility with modern clang's stricter K&R C handling
- Add validation error when -filter is used without -table <name>

Output verified byte-for-byte identical to non-filter mode.

Usage:
  ./dsdgen -scale N -table store_sales    -filter -quiet 2>/dev/null | gzip > store_sales.gz
  ./dsdgen -scale N -table store_returns  -filter -quiet 2>/dev/null | gzip > store_returns.gz
  (same pattern for catalog_sales/returns, web_sales/returns)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

print.c is linked into both dsdgen and dsqgen; driver.c is dsdgen-only.
The extern declaration of g_filter_tabid in print.c caused an undefined
symbol error when linking dsqgen, because the definition lived in driver.c.

Fix: move the definition (and its initializer) to print.c, and change
driver.c to declare it extern. print.c is the right owner — it is the
translation unit that actually reads the variable in print_start().

Smoke tested: normal file output, -filter stdout for sales and returns
tables, error on -filter without -table, and dsqgen -filter all pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>