[BOUNTY] Add deterministic seed support to data_generator (#4) by xcapselx · Pull Request #8 · thanhle74/kickama

xcapselx · 2026-06-18T23:16:10Z

Summary

Add deterministic seed support to tools/data_generator.py so generated data can be reproduced exactly from a seed. The --seed flag already existed, but the helper functions drew from the global random module instead of the seeded instance, which broke determinism.
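A minimal sketch of the fix described above, assuming the helpers take an optional `rng` argument and that `DataGenerator` keeps its seeded generator in `self.random`. The body of `random_phone` and the `phone` method are hypothetical stand-ins, not the code in this PR.

```python
import random
import string


def random_phone(rng=None):
    """Return a fake 10-digit phone number (hypothetical body).

    `rng` is an optional random.Random instance. Without it the helper
    falls back to the module-level generator, which --seed does not control.
    """
    rng = rng if rng is not None else random
    return "".join(rng.choice(string.digits) for _ in range(10))


class DataGenerator:
    def __init__(self, seed=None):
        # One seeded generator per instance; every helper call must use it.
        self.random = random.Random(seed)

    def phone(self):
        # Passing self.random is what keeps the output reproducible.
        return random_phone(self.random)


first = DataGenerator(seed=42)
second = DataGenerator(seed=42)
assert [first.phone() for _ in range(3)] == [second.phone() for _ in range(3)]
```

Any call site that omits the `rng` argument silently reverts to the unseeded global generator, which is the failure mode the summary describes.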

Changes

tools/data_generator.py:
- Fixed random_phone(), random_email(), and random_datetime() to accept an optional rng parameter
- Updated all DataGenerator method call sites to pass self.random to helper functions
- Changed --seed default from 42 to None (random seed when not specified)
- Added --print-seed flag to print the seed used so a random run can be reproduced
- Auto-enable --print-seed when no --seed is supplied
- Write _metadata.json with seed and parameters in output directory
- Print reproduction command when --print-seed is active
tests/test_data_generator_seed.py (new file):
- test_same_seed_produces_identical_output: the same seed yields byte-for-byte identical output
- test_different_seeds_produce_different_output: different seeds yield different output
- test_deterministic_across_three_seeds: verifies that each of 3 seeds (42, 123, 999) produces identical hashes across runs
- test_print_seed_flag_exists: verifies that the --print-seed flag is recognized
- test_seed_none_generates_random_seed: verifies that the default seed is None
data/README.md:
- Added "Test Data Generation" section with usage examples for --seed and --print-seed
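The CLI behavior listed above might be wired together roughly as follows. This is a sketch under assumptions: the `--output` option, the function names, the metadata layout, and the exact wording of the printed lines are hypothetical; only `--seed`, `--print-seed`, and `_metadata.json` come from the PR description.

```python
import argparse
import json
import random
from pathlib import Path


def build_parser():
    parser = argparse.ArgumentParser(prog="data_generator.py")
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--print-seed", action="store_true")
    parser.add_argument("--output", default="out")  # hypothetical option
    return parser


def resolve_seed(args):
    """Pick a seed and force --print-seed on when none was supplied."""
    if args.seed is None:
        args.seed = random.SystemRandom().randrange(2**32)
        args.print_seed = True
    return args.seed


def write_metadata(output_dir, args):
    """Record the seed and parameters next to the generated data."""
    path = Path(output_dir) / "_metadata.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"seed": args.seed, "parameters": vars(args)}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True))
    return path


def main(argv=None):
    args = build_parser().parse_args(argv)
    seed = resolve_seed(args)
    write_metadata(args.output, args)
    if args.print_seed:
        print(f"Seed: {seed}")
        print(f"Reproduce with: python tools/data_generator.py --seed {seed}")
    return seed
```

Calling `main(["--seed", "42"])` leaves `--print-seed` off unless it is passed explicitly, while `main([])` picks a seed, prints it, and records it in the metadata file.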

Testing

python -m unittest tests.test_data_generator_seed -v

Result:

test_deterministic_across_three_seeds ... ok
test_different_seeds_produce_different_output ... ok
test_print_seed_flag_exists ... ok
test_same_seed_produces_identical_output ... ok
test_seed_none_generates_random_seed ... ok

Ran 5 tests in 0.021s
OK
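The determinism checks named in the test list could look roughly like the sketch below. `generate_rows` is a hypothetical stand-in for the real generator, and the SHA-256 digest is an assumption about how the tests compare output; the actual tests in tests/test_data_generator_seed.py may differ.

```python
import hashlib
import random
import unittest


def generate_rows(seed, count=5):
    """Hypothetical stand-in for the generator: seeded rows as bytes."""
    rng = random.Random(seed)
    rows = [f"{i},{rng.randint(0, 10**9)}" for i in range(count)]
    return "\n".join(rows).encode("utf-8")


def digest(seed):
    """Hash the generated bytes so runs can be compared cheaply."""
    return hashlib.sha256(generate_rows(seed)).hexdigest()


class SeedDeterminismSketch(unittest.TestCase):
    def test_same_seed_produces_identical_output(self):
        self.assertEqual(generate_rows(42), generate_rows(42))

    def test_different_seeds_produce_different_output(self):
        self.assertNotEqual(generate_rows(42), generate_rows(123))

    def test_deterministic_across_three_seeds(self):
        for seed in (42, 123, 999):
            with self.subTest(seed=seed):
                self.assertEqual(digest(seed), digest(seed))
```

Comparing digests rather than raw bytes keeps failure messages short when the generated files are large.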

Build diagnostic: diagnostic/build-549809c9.json (encryptly .logd unavailable on Windows; JSON metadata included).

Checklist

Relevant modules affected by these changes build locally
Tests pass locally
Diagnostic build log is committed in this PR
Documentation has been updated (data/README.md)
Changes are scoped to the PR purpose and avoid unrelated cleanup
Security, privacy, and error-handling implications have been considered

Closes #4

- Fix helper functions (random_phone, random_email, random_datetime) to accept rng parameter
- Update all DataGenerator call sites to pass self.random for full determinism
- Change --seed default from 42 to None (random seed when not specified)
- Add --print-seed flag to print seed for reproducibility
- Auto-enable --print-seed when no seed is supplied
- Write _metadata.json with seed and parameters in output directory
- Add tests proving deterministic output for seeds 42, 123, 999
- Update data/README.md with seed usage examples

coderabbitai · 2026-06-18T23:16:18Z

Warning

Review limit reached

@xcapselx, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 57 minutes and 5 seconds. Learn how PR review limits work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94dd05cf-f5d7-4fce-bb5f-ce0581268d99

📥 Commits

Reviewing files that changed from the base of the PR and between 94e0fb0 and 7da0e03.

📒 Files selected for processing (4)

data/README.md
diagnostic/build-549809c9.json
tests/test_data_generator_seed.py
tools/data_generator.py

LE-VAI added 2 commits June 18, 2026 19:15

Add build diagnostics for 549809c

7da0e03

xcapselx wants to merge 2 commits into thanhle74:main from xcapselx:feat/seed-support-thanhle74
