Skip to content

UDP protocol overhaul and vision pipeline improvements#456

Open
phurley67 wants to merge 2 commits into
mainfrom
453-networking-vision
Open

UDP protocol overhaul and vision pipeline improvements#456
phurley67 wants to merge 2 commits into
mainfrom
453-networking-vision

Conversation

@phurley67

Copy link
Copy Markdown
Contributor

Closes #453

Summary

  • Rewrite Mac↔RIO packet protocol with magic number, version, sequence counter
  • Set explicit BIG_ENDIAN byte order on both sides
  • Split Mac into producer/sender threads with heartbeat packets
  • Add error recovery: Mac retry loop and RIO socket recreation
  • Add 500ms receive socket timeout to prevent silent hangs
  • Capture RIO time in receive thread for accurate time offset
  • Use EMA for vision timestamp offset instead of one-shot calibration
  • Add minimum floor to vision trust standard deviation
  • Use tighter vision trust for rotation than translation
  • Fix ambiguity threshold off-by-one between Mac and RIO
  • Fix ambiguity check to use filtered result instead of raw result
  • Add vision staleness detection to distinguish app crash from no tags
  • Resolve RoboRIO InetAddress once instead of per-packet
  • Document why 3-arg addVisionMeasurement override is absent
  • Add NETWORK_FINDINGS.md documenting UDP networking issues

Split from #337.

Test plan

  • Verify Mac Mini sends heartbeat packets when no tags are visible
  • Confirm magic number / version mismatch is rejected with clear log message
  • Test error recovery: unplug/replug network cable, verify auto-reconnect
  • Check vision trust values on Shuffleboard during match play
  • Verify EMA time offset converges and doesn't drift

🤖 Generated with Claude Code

- Rewrite Mac-RIO packet protocol with magic number, version, sequence counter
- Set explicit BIG_ENDIAN byte order on both sides
- Split Mac into producer/sender threads with heartbeat packets
- Add error recovery: Mac retry loop and RIO socket recreation
- Add 500ms receive socket timeout to prevent silent hangs
- Capture RIO time in receive thread for accurate time offset
- Use EMA for vision timestamp offset instead of one-shot calibration
- Add minimum floor to vision trust standard deviation
- Use tighter vision trust for rotation than translation
- Fix ambiguity threshold off-by-one between Mac and RIO
- Fix ambiguity check to use filtered result instead of raw result
- Add vision staleness detection to distinguish app crash from no tags
- Resolve RoboRIO InetAddress once instead of per-packet
- Document why 3-arg addVisionMeasurement override is absent
- Add NETWORK_FINDINGS.md documenting UDP networking issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 22:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR overhauls the Mac Mini ↔ RoboRIO UDP vision protocol and improves the RoboRIO-side vision ingestion pipeline to be more robust (validation, timeouts, reconnect) and more accurate (better time alignment and measurement trust tuning).

Changes:

  • Introduces a versioned UDP packet format with magic number, explicit BIG_ENDIAN encoding, sequence counter, and heartbeat packets.
  • Adds receive-side robustness: socket timeout, packet validation/rejection, staleness detection, and automatic socket recreation.
  • Improves vision fusion quality: EMA-based Mac↔RIO time offset estimation and clamped/tuned vision measurement standard deviations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/main/java/frc/robot/subsystems/Swerve.java Documents time-domain expectations for vision timestamp handling and why the 3-arg overload is intentionally not overridden.
src/main/java/frc/robot/subsystems/PhotonVision.java Implements the new UDP protocol parsing/validation, receive timeout + recovery loop, staleness tracking, EMA time offset, and updated trust modeling.
src/main/java/frc/robot/mac/Main.java Wraps the Mac-side vision app in a retry loop for self-recovery after failures.
src/main/java/frc/robot/mac/MacMini.java Reworks Mac-side sending into producer/scheduled-sender threads, adds heartbeats, and builds the new binary packet format.
NETWORK_FINDINGS.md Adds a networking findings document describing the discovered UDP issues and the intended fixes.

Comment on lines +53 to +54
// If someone interrupts our sleep, just retry immediately
Thread.currentThread().interrupt();

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this thread is interrupted during the retry sleep, the code sets the interrupt flag but continues the infinite retry loop. With the interrupt flag set, subsequent Thread.sleep(...) calls will immediately throw again, which can turn this into a tight log-spamming loop. Consider breaking/returning from main when interrupted (or clearing the interrupt and handling shutdown explicitly).

Suggested change
// If someone interrupts our sleep, just retry immediately
Thread.currentThread().interrupt();
// If someone interrupts our sleep, treat it as a shutdown request.
System.out.println("[PHOTON VISION] Interrupted during retry delay, shutting down.");
Thread.currentThread().interrupt();
break;

Copilot uses AI. Check for mistakes.
Comment thread NETWORK_FINDINGS.md
Comment on lines +113 to +124
## 9. Mac sends 20× faster than the RIO consumes (efficiency)

**File:** `MacMini.java` — `run()` loop

The Mac loops with a 1 ms sleep (~1000 Hz), but the RIO's `periodic()` runs at
50 Hz. The `AtomicReference.set()` overwrites ~19 out of every 20 packets
before they are read. All that pose-estimation work is wasted, and it burns CPU
that the PhotonVision pipeline could use.

**Fix:** Increased the sleep to 15 ms (~66 Hz), which still provides multiple
fresh readings per RIO cycle while dramatically reducing wasted work.

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says the fix for “Mac sends 20× faster than the RIO consumes” was changing a sleep to 15ms, but the current implementation uses a producer thread + a 10ms scheduled sender (and a 5ms producer sleep). Please update the write-up to match the actual design so it stays a reliable reference.

Copilot uses AI. Check for mistakes.
Comment thread NETWORK_FINDINGS.md
Comment on lines +81 to +83
**Fix:** Set a 500 ms socket timeout. On timeout, the thread logs a warning and
loops back to receive again. This also enables staleness detection (see #7).

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section says the receive timeout “logs a warning” on each timeout, but the current receive-thread handler for SocketTimeoutException is intentionally silent. Either update this document to match the implementation, or add a throttled warning/metric so the claim is accurate.

Suggested change
**Fix:** Set a 500 ms socket timeout. On timeout, the thread logs a warning and
loops back to receive again. This also enables staleness detection (see #7).
**Fix:** Set a 500 ms socket timeout. On timeout, the `receive()` call throws a
`SocketTimeoutException`, and the thread simply loops back to receive again
without logging each timeout. This also enables staleness detection (see #7).

Copilot uses AI. Check for mistakes.
// packets — with UDP, occasional reordering is normal and the data
// is still valid. The sequence number is useful for debugging: if
// you see big jumps, packets are being dropped on the network.
int seq = buffer.getInt();

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seq is read from the packet but never used, and the surrounding comment says it’s logged. Either remove the sequence number read (and comment), or actually record/log it (e.g., DataLog/Shuffleboard) so it’s useful for diagnosing packet drops/reordering.

Suggested change
int seq = buffer.getInt();
int seq = buffer.getInt();
DataLogManager.log("[PHOTON VISION] Received vision packet seq=" + seq);

Copilot uses AI. Check for mistakes.
Comment on lines +216 to +220
} catch (SocketTimeoutException e) {
// This is EXPECTED and not an error. It means no packet
// arrived within SOCKET_TIMEOUT_MS. We just loop back and
// try again. The staleness detection in periodic() will
// handle alerting if we go too long without data.

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SocketTimeoutException path is intentionally silent, but several comments above say the timeout “lets us log a warning”. Either add a throttled warning/log/metric on timeouts (so loss of packets is visible), or update the comments/docs so they match the actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +103
// Three states the driver cares about:
// 1. commsAlive=false → Mac is dead / network down
// 2. commsAlive=true, visionStale=true → Mac is alive, but no tags visible
// 3. commsAlive=true, visionStale=false→ Everything working, getting poses
//
// The LED system uses these to show different colors while disabled.

Copilot AI Mar 19, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment describes three driver-facing states using both commsAlive and a separate visionStale flag, but the code only implements commsAlive (and publishes vision data stale as !commsAlive). Please update the comment to reflect the actual state model, or add an explicit “no-tags/pose-recently-missing” state if that distinction is intended.

Suggested change
// Three states the driver cares about:
// 1. commsAlive=false → Mac is dead / network down
// 2. commsAlive=true, visionStale=true → Mac is alive, but no tags visible
// 3. commsAlive=true, visionStale=false→ Everything working, getting poses
//
// The LED system uses these to show different colors while disabled.
// Driver-visible states are collapsed onto this single flag:
// - commsAlive = false → Mac/network/vision app is not providing data
// - commsAlive = true → Mac and vision app are up and have sent
// a recent pose/heartbeat packet (not stale).
//
// "Vision data stale" is published elsewhere as !commsAlive, based on how
// recently we received a packet (see lastReceiveTimeRio and timeout logic).
// The LED system uses this to show different colors while disabled.

Copilot uses AI. Check for mistakes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UDP protocol overhaul and vision pipeline improvements

2 participants