Persist normalization stats to data/normalization_stats.json by riley-1995 · Pull Request #17 · SensorFusion2026/ELP-Rumble-Detector

riley-1995 · 2026-03-05T13:26:30Z

Summary

Persist computed preprocessing normalization statistics to a single JSON artifact so teammates and downstream scripts can consume stable values without recomputation.

This PR closes #15.

What changed

Added JSON upsert helper in src/elp_rumble/data_creation/utils.py.
Updated src/elp_rumble/data_creation/create_tfrecords.py to write:
- audio_mean
- audio_std
Updated src/elp_rumble/data_creation/convert_audio_to_spec_tfrecords.py to write:
- spec_mean
- spec_std
Updated README.md with:
- output file location
- generation steps
- behavior notes
Added .github/ to .gitignore

Output location

data/normalization_stats.json

Expected JSON shape

{
  "audio_mean": <float>,
  "audio_std": <float>,
  "spec_mean": <float>,
  "spec_std": <float>
}

Overwrite behavior

If data/normalization_stats.json already exists, rerunning scripts overwrites the keys they compute with latest values:

create_tfrecords overwrites audio_mean / audio_std
convert_audio_to_spec_tfrecords overwrites spec_mean / spec_std

Validation run

Executed full README preprocessing flow with repo .venv:

python -m elp_rumble.data_creation.pos_audio_clips --mode train
python -m elp_rumble.data_creation.pos_audio_clips --mode test
python -m elp_rumble.data_creation.neg_audio_clips
python -m elp_rumble.data_creation.create_tfrecords
python -m elp_rumble.data_creation.convert_audio_to_spec_tfrecords

Observed output example:

{
  "audio_mean": -0.013995947316288948,
  "audio_std": 0.017343521118164062,
  "spec_mean": 2.7122833728790283,
  "spec_std": 1.0036183595657349
}

All four required keys are present, numeric, and non-null.

Copilot

Pull request overview

This PR adds persistence of preprocessing normalization statistics into a single JSON artifact (data/normalization_stats.json) so downstream scripts/teammates can reuse stable values without recomputing them, addressing issue #15.

Changes:

Added a JSON “upsert” helper to merge newly-computed stats into an existing stats file.
Updated audio and spectrogram TFRecord creation scripts to write their respective mean/std keys.
Documented the output location in the README and adjusted .gitignore.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`src/elp_rumble/data_creation/utils.py`	Adds `upsert_normalization_stats()` helper for read/merge/write of stats JSON
`src/elp_rumble/data_creation/create_tfrecords.py`	Writes `audio_mean`/`audio_std` into the shared stats JSON
`src/elp_rumble/data_creation/convert_audio_to_spec_tfrecords.py`	Writes `spec_mean`/`spec_std` into the shared stats JSON
`README.md`	Notes where normalization stats are written during preprocessing
`.gitignore`	Adds `.github/` ignore rule

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-05T13:32:25Z

+
+def upsert_normalization_stats(stats_path, updates):
+    """Merge normalization stats into JSON at stats_path, creating file if needed."""
+    os.makedirs(os.path.dirname(os.fspath(stats_path)), exist_ok=True)


os.makedirs(os.path.dirname(os.fspath(stats_path)), ...) will raise if stats_path has no parent directory (e.g., just "normalization_stats.json"), because os.path.dirname(...) becomes an empty string. Consider resolving the parent directory via Path(stats_path).parent (or guarding for empty dirname) before calling makedirs so the helper works for both nested and non-nested paths.

Suggested change

os.makedirs(os.path.dirname(os.fspath(stats_path)), exist_ok=True)

dir_name = os.path.dirname(os.fspath(stats_path))

if dir_name:

os.makedirs(dir_name, exist_ok=True)

Copilot · 2026-03-05T13:32:26Z

+
+    tmp_path = f"{stats_path}.tmp"
+    with open(tmp_path, "w", encoding="utf-8") as f:
+        json.dump(existing, f, indent=2)


Issue #15 requires deterministic JSON output, but json.dump(existing, ..., indent=2) does not enforce a stable key order; the serialized ordering will depend on insertion/update history. Add sort_keys=True (and keep indentation) so repeated runs produce a deterministic file layout.

Suggested change

json.dump(existing, f, indent=2)

json.dump(existing, f, indent=2, sort_keys=True)

Copilot · 2026-03-05T13:32:26Z

+During preprocessing, normalization statistics are saved to `data/normalization_stats.json`.
+`create_tfrecords` writes `audio_mean` and `audio_std`, and
+`convert_audio_to_spec_tfrecords` writes `spec_mean` and `spec_std`.
+Running both scripts produces a single JSON file with all four keys.


This section mentions the output path and which scripts write which keys, but it’s still missing two items called out in issue #15/PR description: (1) an example JSON snippet showing the expected shape, and (2) a brief note on overwrite/upsert behavior (reruns overwrite only the keys they compute). Adding both here would make the README sufficient for downstream consumers without needing to read the PR description.

Persist normalization stats JSON and ignore .github

118f5f2

Copilot AI review requested due to automatic review settings March 5, 2026 13:26

Copilot started reviewing on behalf of riley-1995 March 5, 2026 13:27 View session

Copilot AI reviewed Mar 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Persist normalization stats to data/normalization_stats.json#17

Persist normalization stats to data/normalization_stats.json#17
riley-1995 wants to merge 1 commit intomainfrom
normalization-stats-json

riley-1995 commented Mar 5, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Copilot AI Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-    os.makedirs(os.path.dirname(os.fspath(stats_path)), exist_ok=True)
+    dir_name = os.path.dirname(os.fspath(stats_path))
+    if dir_name:
+        os.makedirs(dir_name, exist_ok=True)

	json.dump(existing, f, indent=2)
	json.dump(existing, f, indent=2, sort_keys=True)

Conversation

riley-1995 commented Mar 5, 2026

Summary

What changed

Output location

Expected JSON shape

Overwrite behavior

Validation run

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants