docs: usability pass — gotchas page, glossary, labelled waveform, less jargon #7
Merged
…s jargon

Reorganises the docsite for a tired-volunteer reading lens. Major changes:

- New "Common mistakes" page consolidates scattered pitfalls (has_violence derivation, hardcoded speaker dirs, peak ~= 0.79, UPPERCASE/lowercase, NEG-isn't-violent, split: train, quality_flags semantics) so each one costs one read instead of recurring across pages.
- New Glossary page maps AGG/VIC/SW/BEN, F0, SSML, IR, ISM, dBFS, RMS, prosody cap, dirty file, weak vs strong labels, voice IDs.
- Home page rewritten: leads with a real labelled waveform of an SV clip (PNG generated from corpus data + .jsonl event boundaries), then a 4-line load snippet, then side-by-side team cards instead of tabbed product picker. Toy-corpus warning and "what's not here" callout moved below the fold.
- Schema reference rewritten as a single annotated JSON example with click-to-expand explanations, ordered by frequency-of-use rather than by Pydantic object hierarchy. Tables retained only for EventLabel, manifest columns, and the .txt transcript format.
- Audio format leads with consumer facts (peak ~= 0.79, padding included in timestamps, two-peak-fields convention); pipeline internals (M3a per-turn RMS, Stage 1/2/3 normalization, target rationale) collapsed into optional admonitions.
- Taxonomy adds an explicit intensity-vs-typology coupling table and flags that scripts are LLM-generated, not human-written.
- She-Proves / Elephant pages reframed as the differential vs the shared reference, with cleaner speaker tables (split speaker_id / voice columns) and full clip listings collapsed.
- Critical ??? collapsed admonitions opened to !!! visible ones (peak-is-0.79, ACOU vs DIST, background event types, Tier A meaning, NEG-trap, casing convention). ??? reserved for skippable detail (per-turn RMS rationale, peak-target rationale).
- Operator jargon stripped — milestone codes (M3a/M8a/M10a), "wet test", "spec validation", "pipeline bootstrapping". SSML/F0/IR/ISM/Whisper defined on first use via Glossary cross-links.
- Custom CSS adds status pills and team cards. Logo, palette, search, nav structure unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
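The "peak ~= 0.79" fact and the dBFS/RMS glossary terms mentioned above are related by simple arithmetic. The sketch below is only an illustration, assuming that 0.79 is a linear sample amplitude on a scale where 1.0 is full scale; the function names are hypothetical and are not part of the corpus tooling.

```python
import math


def linear_to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    if amplitude <= 0:
        # Silence has no finite level in decibels.
        return float("-inf")
    return 20.0 * math.log10(amplitude)


def peak_and_rms_dbfs(samples):
    """Return (peak_dbfs, rms_dbfs) for float samples assumed to lie in [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return linear_to_dbfs(peak), linear_to_dbfs(rms)


# Under the stated assumption, a linear peak of 0.79 sits about 2 dB below full scale.
print(round(linear_to_dbfs(0.79), 2))  # -2.05
```

If the assumption holds, a consumer who sees a peak near 0.79 rather than 1.0 is looking at roughly 2 dB of headroom, not at a clipped or mis-scaled file.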
pr-agent-context report: This run includes a failing check on PR #7 in repository https://github.com/DataHackIL/avdp-synth-corpus
Diagnose and fix the failing checks below, then push all of these changes in a single commit.
# Failing Checks
## FAIL-1
Type: Commit status
Context: pre-commit.ci - pr
Status: failure
URL: https://results.pre-commit.ci/run/github/1210843386/1778619602.J6z3ALh9QDCnBBTxqrEqDg
Summary:
checks completed with failures
Run metadata: |
Summary
Critical re-read of the docsite from the lens of a tired DS volunteer (2 hours/week, scanning for value). Reorganises content so the first scroll surfaces what a consumer actually needs — and the operator-grade detail moves out of the way without being lost.
What changed
New pages
- `docs/gotchas.md` — Common mistakes. Consolidates ten previously-scattered pitfalls into one ~2-minute read: `has_violence` derivation rule, hardcoded speaker-dir trap, peak-is-0.79 surprise, UPPERCASE/lowercase casing, NEG-isn't-violent, `split: train` warning, `quality_flags` semantics, Google-clip flag, timestamp/pad relationship, `.json` vs `.jsonl`. Plus a clone-verification snippet.
- `docs/glossary.md` — One canonical place for AGG/VIC/SW/BEN, F0, SSML, IR, ISM, dBFS, RMS, prosody cap, dirty file, weak vs strong labels, voice IDs.
New visual
- `docs/assets/sp_sv_a_0001_00_waveform.png` — Real labelled waveform of an SV clip rendered from the corpus data + its `.jsonl` events. Now the lead element on the home page. Shows the typical escalation arc (VERB→DIST→PHYS) with intensity badges.
Home page
- … provisional · 2026-05-12) + a 4-line load snippet + the labelled waveform.
- … `extra.css` grid).
Schema Reference
- … `(1)!` explanations.
- … `weak_label` → `speakers` → paths → `quality_flags` → `acoustic_scene` (Tier B) → `preprocessing_applied` → `generation_metadata` (collapsed) → diagnostic/reserved fields.
- … `.txt` transcript format section (previously undocumented).
Audio Format
- … `loudness_target_peak_dbfs` vs `normalized_dbfs`).
- … `???` admonitions for the curious.
Taxonomy
- … `???` (collapsed) to `!!!` (visible): NEG-trap, ACOU vs DIST.
- … `max_intensity` ranges.
Team guides
- … `speaker_id` and `voice` columns (no more arrow-joined cells).
- … `???` blocks; summary counts surfaced.
Deliveries
Operator jargon stripped
- … `M3a`, `M8a`, `M10a`) removed from consumer-facing pages.
Plumbing
- `mkdocs.yml`: site_name → "AVDP Synthetic Corpus", new nav entries (`Start here`, `Common mistakes`, `Glossary`), `abbr` extension, `extra_css` for `assets/extra.css`.
- `docs/assets/extra.css`: status pills + team-card grid (responsive, dark-mode aware).
Test plan
- `mkdocs build` passes with zero broken anchors / missing-link warnings
- … `<div class="team-cards">`, `<span class="status-pill">`) renders correctly through `md_in_html`
- … `/assets/sp_sv_a_0001_00_waveform.png` …
- … `#1-dont-derive...`, `#4-uppercase-in-json...`, etc.)
- … `.github/workflows/docs.yml` (triggers on push to `main` touching `docs/` or `mkdocs.yml`)
🤖 Generated with Claude Code
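The `.json` vs `.jsonl` pitfall listed under New pages comes down to how each file is parsed. The sketch below is a generic illustration, not the corpus loader: it assumes the `.json` file is a single JSON document and the `.jsonl` file holds one event object per line. The sample contents, field names (`clip_id`, `label`, `start`, `end`), and timestamps are hypothetical; only the VERB/DIST/PHYS labels come from the description above.

```python
import json

# Hypothetical file contents; field names and values are made up for illustration.
CLIP_JSON = '{"clip_id": "example_clip", "split": "train"}'
EVENTS_JSONL = (
    '{"label": "VERB", "start": 0.5, "end": 2.0}\n'
    '{"label": "DIST", "start": 2.0, "end": 3.5}\n'
    '{"label": "PHYS", "start": 3.5, "end": 5.0}\n'
)


def load_json(text):
    """Parse a .json payload: the whole text is one JSON document."""
    return json.loads(text)


def load_jsonl(text):
    """Parse a .jsonl payload: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parses_as_single_json(text):
    """Return True if the whole text is valid as one JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


metadata = load_json(CLIP_JSON)
events = load_jsonl(EVENTS_JSONL)
print(metadata["clip_id"], [e["label"] for e in events])

# Feeding multi-line JSONL to a plain JSON parser fails, which is the mix-up to avoid.
print(parses_as_single_json(EVENTS_JSONL))
```

Parsing line by line keeps each event independent, so one malformed line can be reported by line number instead of invalidating the whole file.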