[BOUNTY] Add deterministic seed support to data_generator (#4) #8
xcapselx wants to merge 2 commits into
- Fix helper functions (random_phone, random_email, random_datetime) to accept an rng parameter
- Update all DataGenerator call sites to pass self.random for full determinism
- Change --seed default from 42 to None (random seed when not specified)
- Add --print-seed flag to print seed for reproducibility
- Auto-enable --print-seed when no seed is supplied
- Write _metadata.json with seed and parameters in output directory
- Add tests proving deterministic output for seeds 42, 123, 999
- Update data/README.md with seed usage examples
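The core of the fix described above is that helpers draw from an explicit generator instead of the global `random` module. The following is a minimal sketch of that pattern, not the PR's actual code: the phone format, the `resolve_seed` helper, and the `phones` method are hypothetical names used only for illustration, and only `random_phone`, `DataGenerator`, and `self.random` come from the description.

```python
import random
from typing import List, Optional, Tuple


def random_phone(rng: Optional[random.Random] = None) -> str:
    """Return a fake phone number drawn from rng (the format is illustrative)."""
    # Fall back to the module-level generator only when no rng is supplied.
    source = random if rng is None else rng
    return "555-{:03d}-{:04d}".format(source.randint(0, 999), source.randint(0, 9999))


def resolve_seed(seed: Optional[int]) -> Tuple[int, bool]:
    """Hypothetical helper: return (seed, print_seed).

    When no seed is given, pick one at random and flag it for printing,
    so the run can still be reproduced later.
    """
    if seed is None:
        return random.SystemRandom().randrange(2**32), True
    return seed, False


class DataGenerator:
    """Simplified stand-in for the real class; only the seeding pattern matters."""

    def __init__(self, seed: int) -> None:
        self.seed = seed
        self.random = random.Random(seed)  # private, seeded generator

    def phones(self, count: int) -> List[str]:
        # Every helper call receives self.random, never the global module.
        return [random_phone(self.random) for _ in range(count)]
```

Because each `DataGenerator` owns its own `random.Random` instance, reseeding or using the global `random` module elsewhere in the process should not change the generated values.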
Summary
Add deterministic seed support to `tools/data_generator.py` so generated data can be reproduced exactly from a seed. The existing `--seed` flag was present, but the helper functions used the global `random` module instead of the seeded instance, which broke determinism.

Changes

`tools/data_generator.py`:
- Fix `random_phone()`, `random_email()`, and `random_datetime()` to accept an optional `rng` parameter
- Update all `DataGenerator` method call sites to pass `self.random` to helper functions
- Change `--seed` default from `42` to `None` (random seed when not specified)
- Add `--print-seed` flag to print the seed used so a random run can be reproduced
- Auto-enable `--print-seed` when no `--seed` is supplied
- Write `_metadata.json` with seed and parameters in output directory
- … `--print-seed` is active

`tests/test_data_generator_seed.py` (new file):
- `test_same_seed_produces_identical_output`: same seed = byte-for-byte identical
- `test_different_seeds_produce_different_output`: different seeds = different output
- `test_deterministic_across_three_seeds`: verifies 3 seeds (42, 123, 999) produce identical hashes across runs
- `test_print_seed_flag_exists`: verifies `--print-seed` flag is recognized
- `test_seed_none_generates_random_seed`: verifies default seed is None

`data/README.md`:
- Add usage examples for `--seed` and `--print-seed`

Testing
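To illustrate what a hash-based determinism check can look like, here is a small self-contained sketch. It is not the contents of `tests/test_data_generator_seed.py`: `generate_records` and `output_hash` are hypothetical stand-ins for the real generator, and only the test names and the seeds 42, 123, and 999 are taken from the description.

```python
import hashlib
import json
import random
from typing import Dict, List


def generate_records(seed: int, count: int = 5) -> List[Dict[str, int]]:
    """Hypothetical stand-in for the data generator: all draws use one seeded rng."""
    rng = random.Random(seed)
    return [{"id": i, "value": rng.randint(0, 10**6)} for i in range(count)]


def output_hash(seed: int) -> str:
    """Serialize the generated records canonically and hash the bytes."""
    payload = json.dumps(generate_records(seed), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def test_deterministic_across_three_seeds() -> None:
    # Two independent runs with the same seed must hash identically.
    for seed in (42, 123, 999):
        assert output_hash(seed) == output_hash(seed)


def test_different_seeds_produce_different_output() -> None:
    # Different seeds are expected to yield different output.
    assert output_hash(42) != output_hash(123)
```

Comparing hashes of the serialized output rather than the objects themselves keeps the check close to the "byte-for-byte identical" claim, since any change in the written bytes changes the digest.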
Result:
Build diagnostic: `diagnostic/build-549809c9.json` (encryptly .logd unavailable on Windows; JSON metadata included).
Checklist
Closes #4