diff --git a/NETWORK_FINDINGS.md b/NETWORK_FINDINGS.md new file mode 100644 index 00000000..0afe96ab --- /dev/null +++ b/NETWORK_FINDINGS.md @@ -0,0 +1,136 @@ +# Network Findings: Mac Mini ↔ RoboRIO UDP Communication + +This document describes reliability and correctness issues found in the UDP +networking layer between the Mac Mini vision processor (`frc.robot.mac.MacMini`) +and the RoboRIO receiver (`frc.robot.subsystems.PhotonVision`). + +Each issue has been fixed in its own commit so you can review them one at a time. + +--- + +## 1. Null pointer crash in Mac main loop (bug) + +**File:** `MacMini.java` — `run()` method + +`getEstimatedPose()` can return `null` (lines 116 and 128), but the main loop +only checks `info.pose != null` without first checking whether `info` itself is +null. This causes a `NullPointerException` that crashes the vision processor. + +--- + +## 2. Socket creation failure silently ignored (bug) + +**File:** `MacMini.java` — constructor + +If `new DatagramSocket()` throws a `SocketException`, the error is logged but +`socket` remains `null`. The `run()` method then calls `socket.send()`, which +throws an NPE on every iteration — spinning forever and flooding the log with +"Failed to send packet" messages. + +--- + +## 3. No packet magic number or validation (reliability) + +**Files:** `MacMini.java`, `PhotonVision.java` + +Any stray UDP traffic arriving on port 12345 is blindly parsed as a valid pose +and fed into the Kalman filter. There is no header, version byte, or integrity +check. On a busy FRC field network this could corrupt the pose estimator. + +**Fix:** Added a 4-byte magic number (`0x00000862` — team 862), a 1-byte +protocol version, and a 4-byte sequence counter. The receiver rejects packets +that don't start with the correct magic/version bytes. + +--- + +## 4. Time offset computed too late (accuracy) + +**File:** `PhotonVision.java` — `periodic()` method + +The "instant offset" between Mac time and RIO time was calculated inside +`periodic()`, which runs at 50 Hz. Up to 20 ms of scheduling delay between +packet arrival and `periodic()` execution gets baked into the offset, making +all vision timestamps systematically late. + +**Fix:** Capture `Utils.getCurrentTimeSeconds()` in the receive thread at the +moment the packet arrives, and store it alongside the parsed data. The EMA +calculation in `periodic()` now uses that receive-time instead of process-time. + +--- + +## 5. `InetAddress.getByName()` resolved every packet (performance) + +**File:** `MacMini.java` — `getBinaryPacket()` + +The destination address `"10.8.62.2"` was resolved from a string on every +packet send (~1000 times/sec). While Java may cache the result, this is +unnecessary overhead. + +**Fix:** Resolve the address once in the constructor and reuse it. + +--- + +## 6. No receive socket timeout (robustness) + +**File:** `PhotonVision.java` — receive thread + +`socket.receive()` blocks indefinitely. If the Mac Mini crashes or the network +cable is unplugged, the receive thread hangs forever with no exception, no log +message, and no way to detect the failure. + +**Fix:** Set a 500 ms socket timeout. On timeout, the thread logs a warning and +loops back to receive again. This also enables staleness detection (see #7). + +--- + +## 7. No staleness detection (reliability) + +**File:** `PhotonVision.java` — `periodic()` method + +When `pose.getAndSet(null)` returns null, there is no way to distinguish "no +tags visible this cycle" from "the Mac Mini has been dead for 10 seconds." The +ICMP ping thread only confirms the OS is up, not that the vision application is +running. + +**Fix:** Track the timestamp of the last successfully received packet. In +`periodic()`, if the gap exceeds a threshold, publish a "vision stale" warning +to Shuffleboard. + +--- + +## 8. ByteBuffer byte order not explicit (fragility) + +**Files:** `MacMini.java`, `PhotonVision.java` + +Both sides rely on Java's default `BIG_ENDIAN` byte order. This works today +because both are Java, but the contract is implicit and would silently break if +either side were ported to a language with a different default. + +**Fix:** Explicitly set `ByteOrder.BIG_ENDIAN` on both sides. + +--- + +## 9. Mac sends 20× faster than the RIO consumes (efficiency) + +**File:** `MacMini.java` — `run()` loop + +The Mac loops with a 1 ms sleep (~1000 Hz), but the RIO's `periodic()` runs at +50 Hz. The `AtomicReference.set()` overwrites ~19 out of every 20 packets +before they are read. All that pose-estimation work is wasted, and it burns CPU +that the PhotonVision pipeline could use. + +**Fix:** Increased the sleep to 15 ms (~66 Hz), which still provides multiple +fresh readings per RIO cycle while dramatically reducing wasted work. + +--- + +## 10. Ambiguity threshold off-by-one (minor) + +**Files:** `MacMini.java`, `PhotonVision.java` + +The Mac filters with `ambiguity > 1.0` (rejects values above 1.0, keeps 1.0) +but the RIO filters with `ambiguity < 1` (rejects 1.0). A pose with exactly +1.0 ambiguity passes the Mac but is rejected by the RIO. + +**Fix:** Aligned both sides to use `<=` / `>=` consistently so the boundary +value is handled the same way. diff --git a/src/main/java/frc/robot/mac/MacMini.java b/src/main/java/frc/robot/mac/MacMini.java index d8942399..3af8c9f5 100644 --- a/src/main/java/frc/robot/mac/MacMini.java +++ b/src/main/java/frc/robot/mac/MacMini.java @@ -4,13 +4,16 @@ import java.net.DatagramPacket; import java.net.DatagramSocket; import java.net.InetAddress; -import java.net.SocketException; -import java.net.UnknownHostException; import java.nio.ByteBuffer; +import java.nio.ByteOrder; import java.nio.file.Path; import java.util.ArrayList; import java.util.List; import java.util.Optional; +import java.util.concurrent.Executors; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicReference; import org.photonvision.EstimatedRobotPose; import org.photonvision.PhotonCamera; @@ -27,9 +30,40 @@ public class MacMini implements AutoCloseable { // Camera info - private record CameraInfo(PhotonCamera camera, PhotonPoseEstimator poseEstimator) {}; + private record CameraInfo(PhotonCamera camera, PhotonPoseEstimator poseEstimator) {}; private record VisionInfo(PhotonPipelineResult result, EstimatedRobotPose pose) {}; + // ===== PACKET PROTOCOL CONSTANTS ===== + // These must match EXACTLY between the sender (here) and the receiver + // (PhotonVision.java on the RoboRIO). If you change one, change both! + // + // MAGIC_NUMBER: A unique identifier at the start of every packet. + // Think of it like a secret handshake — if a random UDP packet arrives + // on our port (from another team's code, a network tool, etc.), it + // almost certainly won't start with 0x00000862. The receiver checks + // this first and throws away anything that doesn't match. + // We use 0x862 = team 862 in hex, padded to 4 bytes. + static final int MAGIC_NUMBER = 0x00000862; + + // PROTOCOL_VERSION: Incremented whenever the packet format changes. + // Bumped from 1 to 2 when we added the has_pose flag byte. + // If someone deploys new Mac code but forgets to deploy new RIO code + // (or vice versa), the version mismatch will cause packets to be + // rejected instead of silently misinterpreting the data. + static final byte PROTOCOL_VERSION = 2; + + // PACKET_SIZE: Total bytes in one packet. + // 4 (magic) + 1 (version) + 4 (sequence) + 1 (has_pose) + 5×8 (doubles) = 50 + static final int PACKET_SIZE = 50; + + // How often (in ms) the sender thread fires. 10ms = 100Hz. + // The RIO runs periodic() at 50Hz, so at 100Hz we guarantee at + // least 1-2 fresh packets per RIO cycle, while the heartbeat + // lets the RIO know we're alive even when no tags are visible. + private static final long SEND_INTERVAL_MS = 10; + + // Which camera constants to use — picked at startup based on which + // robot this is (main robot vs Oasis have different camera positions). CameraConstant[] cameraConstants; // The network tables instance that will connect to the local photonvision server @@ -41,13 +75,58 @@ private record VisionInfo(PhotonPipelineResult result, EstimatedRobotPose pose) // Socket to send data DatagramSocket socket; - public MacMini() { - try { - // Create a new socket to send data to the rio - socket = new DatagramSocket(); - } catch (SocketException e) { - log("*** ERROR CREATING SOCKET ***"); - } + // The resolved IP address of the RoboRIO. We resolve this ONCE in + // the constructor instead of calling InetAddress.getByName("10.8.62.2") + // on every packet send. + InetAddress rioAddress; + + // Sequence counter — incremented for every packet we send (pose or heartbeat). + // The receiver can use this to detect if packets were dropped. + private int sequenceNumber = 0; + + // ===== PRODUCER/SENDER ARCHITECTURE ===== + // We split the work into two threads: + // + // PRODUCER THREAD: runs continuously, polls cameras, computes poses. + // Stores the best pose it's seen since the last send in bestPose. + // + // SENDER THREAD: fires every SEND_INTERVAL_MS (10ms). Grabs the + // best pose accumulated since the last send. If there is one, + // sends a pose packet. If not, sends a heartbeat packet. + // + // WHY: Previously a single loop did both — compute a pose, send it, + // sleep 15ms. That meant: + // 1. We only sent when we had a pose (no heartbeat when tags weren't visible) + // 2. Send timing was coupled to vision processing time + // 3. We only considered one reading per send, not the best of several + // + // Now the producer can run as fast as it wants, accumulating the + // lowest-ambiguity pose over each 10ms window. The sender always + // fires on schedule regardless of whether vision found anything. + + // Thread-safe storage for the best pose found since the last send. + // The producer writes to this; the sender reads and clears it. + private final AtomicReference bestPose = new AtomicReference<>(null); + + // The scheduled executor that runs the sender thread on a fixed 10ms cadence. + private ScheduledExecutorService sendExecutor; + + // The producer thread that continuously polls cameras for poses. + private Thread producerThread; + + // Flag to signal that a fatal error occurred and run() should exit. + // This lets Main.java's retry loop catch the error and restart us. + private volatile boolean fatalError = false; + private volatile String fatalErrorMessage = ""; + + public MacMini() throws IOException { + // Create a new socket to send data to the rio. + // If this fails, the exception propagates up and stops the program + // immediately — there's no point continuing without a socket. + socket = new DatagramSocket(); + + // Resolve the RoboRIO's IP address once, up front. + rioAddress = InetAddress.getByName("10.8.62.2"); // Connect to our local network tables server photonNT.setServer("localhost", 5810); @@ -57,7 +136,8 @@ public MacMini() { // Create an empty array of cameras cameras = new CameraInfo[cameraConstants.length]; - + + // Create the cameras for (int i = 0; i < cameraConstants.length; i++) { log("Creating " + cameraConstants[i].name() + " info"); @@ -79,8 +159,9 @@ public MacMini() { // Get the pose estimator PhotonPoseEstimator poseEstimator = - new PhotonPoseEstimator(fieldLayout,cameraConstants[i].offset()); - + new PhotonPoseEstimator(fieldLayout, cameraConstants[i].offset()); + + // Create the camera using the name from our constant PhotonCamera camera = new PhotonCamera(photonNT, cameraConstants[i].name()); @@ -89,30 +170,147 @@ public MacMini() { } } + /** + * Starts the producer and sender threads and blocks until a fatal error occurs. + * When this method returns, the caller (Main.java) should close() and retry. + */ public void run() { - while (true) { - VisionInfo info = getEstimatedPose(); + fatalError = false; + + // --- Start the PRODUCER thread --- + // This thread continuously polls cameras and accumulates the best + // pose seen since the last send. It uses AtomicReference.accumulateAndGet() + // to thread-safely keep whichever pose has lower ambiguity. + producerThread = new Thread(() -> { + while (!Thread.currentThread().isInterrupted() && !fatalError) { + try { + VisionInfo info = getEstimatedPose(); + + if (info != null && info.pose != null && info.result != null) { + // accumulateAndGet is an atomic read-modify-write operation. + // It compares the new pose against whatever is currently stored, + // and keeps whichever has lower ambiguity (= higher quality). + // This means the sender always gets the BEST pose from the + // entire 10ms window, not just whichever happened to be last. + bestPose.accumulateAndGet(info, (current, candidate) -> { + if (current == null) { + return candidate; + } + double currentAmb = current.result().getBestTarget().poseAmbiguity; + double candidateAmb = candidate.result().getBestTarget().poseAmbiguity; + return candidateAmb < currentAmb ? candidate : current; + }); + } + + // Small sleep to avoid busy-waiting. 5ms means we get ~2 + // readings per 10ms send window on average. + Thread.sleep(5); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } catch (Exception e) { + // Log but don't crash — the producer should keep trying. + // Camera disconnects, NetworkTables hiccups, etc. are + // transient and will resolve on their own. + log("Producer error: " + e.getMessage()); + } + } + }); + producerThread.setDaemon(true); + producerThread.start(); + + // --- Start the SENDER thread --- + // This runs on a fixed 10ms schedule using ScheduledExecutorService, + // which is more accurate than Thread.sleep() for periodic tasks. + // It always sends a packet — either a pose or a heartbeat. + sendExecutor = Executors.newSingleThreadScheduledExecutor(); + sendExecutor.scheduleAtFixedRate(() -> { + try { + // Grab the best pose accumulated since the last send, + // and clear it so the producer starts fresh for the next window. + VisionInfo info = bestPose.getAndSet(null); + + DatagramPacket packet; + if (info != null && info.pose != null && info.result != null) { + // We have a valid pose — send it with has_pose = 1 + Pose2d poseToPublish = info.pose().estimatedPose.toPose2d(); + double ambiguity = info.result().getBestTarget().poseAmbiguity; + double timestamp = info.result().getTimestampSeconds(); + packet = buildPacket(true, poseToPublish, ambiguity, timestamp); + } else { + // No pose available — send a heartbeat with has_pose = 0. + // The RIO will see this and know "comms are good, just + // no tags visible right now" instead of "is the Mac dead?" + packet = buildPacket(false, null, 0, 0); + } - if (info.pose != null && info.result != null) { - Pose2d poseToPublish = info.pose().estimatedPose.toPose2d(); - double ambiguity = info.result().getBestTarget().poseAmbiguity; - double timestamp = info.result().getTimestampSeconds(); + socket.send(packet); - try { - DatagramPacket packet = getBinaryPacket(poseToPublish, ambiguity, timestamp); - socket.send(packet); - - } catch (IOException e) { - log("Failed to send packet: " + e); + } catch (IOException e) { + // If the socket itself is broken (closed, etc.), signal + // a fatal error so run() exits and Main.java can restart us. + if (socket.isClosed()) { + fatalError = true; + fatalErrorMessage = "Socket closed: " + e.getMessage(); + } else { + // Transient send failure — log and try again next cycle + log("Send failed: " + e.getMessage()); } } - + }, 0, SEND_INTERVAL_MS, TimeUnit.MILLISECONDS); + + // Block until a fatal error occurs. The producer and sender threads + // do the real work; we just wait here so Main.java's retry loop + // knows when to restart us. + while (!fatalError) { try { - Thread.sleep(1); + Thread.sleep(100); } catch (InterruptedException e) { - log("Error sleeping: " + e.getMessage()); + break; } } + + log("Fatal error, exiting run(): " + fatalErrorMessage); + } + + /** + * Builds a UDP packet with the current protocol format. + * + * @param hasPose true if this packet carries a valid pose, false for heartbeat-only + * @param pose the robot pose (ignored if hasPose is false) + * @param ambiguity the pose ambiguity (ignored if hasPose is false) + * @param timestamp the vision timestamp in seconds (ignored if hasPose is false) + * @return a DatagramPacket ready to send + */ + private DatagramPacket buildPacket(boolean hasPose, Pose2d pose, double ambiguity, double timestamp) { + ByteBuffer buffer = ByteBuffer.allocate(PACKET_SIZE); + buffer.order(ByteOrder.BIG_ENDIAN); + + // --- Header (10 bytes) --- + buffer.putInt(MAGIC_NUMBER); // 4 bytes: identifies this as our packet + buffer.put(PROTOCOL_VERSION); // 1 byte: catches version mismatches + buffer.putInt(sequenceNumber++); // 4 bytes: lets receiver detect dropped packets + + // has_pose flag: 1 = this packet contains a valid pose in the + // payload. 0 = heartbeat only, payload is zeroed out. + // This is how the RIO tells the difference between "the Mac is + // alive but can't see any AprilTags" vs "the Mac is dead." + buffer.put((byte) (hasPose ? 1 : 0)); // 1 byte + + // --- Payload (40 bytes) --- + if (hasPose) { + buffer.putDouble(pose.getX()); + buffer.putDouble(pose.getY()); + buffer.putDouble(pose.getRotation().getRadians()); + buffer.putDouble(ambiguity); + buffer.putDouble(timestamp); + } else { + // Heartbeat: fill payload with zeros. The receiver will + // ignore these bytes when has_pose == 0. + buffer.put(new byte[40]); + } + + byte[] data = buffer.array(); + return new DatagramPacket(data, data.length, rioAddress, 12345); } private VisionInfo getEstimatedPose() { @@ -123,7 +321,7 @@ private VisionInfo getEstimatedPose() { try { VisionInfo[] poses = new VisionInfo[cameraConstants.length]; - + for (int i = 0; i < cameraConstants.length; i++) { poses[i] = getVisionPose(cameras[i]); } @@ -158,7 +356,7 @@ private VisionInfo getBestPose(VisionInfo[] visionInfos) { if (info == null) { continue; } - + if (bestPose == null) { bestPose = info; continue; @@ -180,7 +378,7 @@ private VisionInfo getVisionPose(CameraInfo cameraInfo) { if (results.isEmpty()) { return null; } - + // Get the latest result of all thme PhotonPipelineResult latestResult = getLatestResult(results); @@ -204,10 +402,20 @@ private VisionInfo getVisionPose(CameraInfo cameraInfo) { ); } - // If pose ambiguity is to high well scrap the result - boolean highPoseAmbiguity = latestResult.getBestTarget().getPoseAmbiguity() > VisionConstants.POSE_AMBIGUITY_TOLERANCE; - - // If the best tag's distance is too far than scrap the result + // Check pose ambiguity and distance on the FILTERED result, not + // the raw one. Why? If we check latestResult (before filtering), + // the "best target" might be a tag we're ignoring. That ignored + // tag could have high ambiguity, causing us to reject the entire + // result — even though the REMAINING tags (after filtering) have + // perfectly good ambiguity. + // + // Example: Tag 7 (ignored) has ambiguity 0.8, Tag 3 has ambiguity 0.1. + // Old code: checks Tag 7's 0.8 > 1.0? No, but if it were 1.1, we'd + // reject the result even though Tag 3 is excellent. + // New code: checks useableResult's best target (Tag 3, ambiguity 0.1). + boolean highPoseAmbiguity = useableResult.getBestTarget().getPoseAmbiguity() > VisionConstants.POSE_AMBIGUITY_TOLERANCE; + + // Check distance on the filtered result too, for the same reason. double bestDistance = useableResult.getBestTarget().getBestCameraToTarget().getTranslation().getNorm(); boolean highDistance = bestDistance > VisionConstants.TAG_DISTANCE_TOLERANCE; @@ -215,7 +423,7 @@ private VisionInfo getVisionPose(CameraInfo cameraInfo) { if (highPoseAmbiguity || highDistance) { return null; } - + // Get the estimated position Optional poseOpt = cameraInfo.poseEstimator().estimateCoprocMultiTagPose(useableResult); @@ -223,13 +431,13 @@ private VisionInfo getVisionPose(CameraInfo cameraInfo) { if (poseOpt.isPresent()) { // The pose EstimatedRobotPose pose = poseOpt.get(); - + // Add the vision measurment return new VisionInfo(useableResult, pose); } else { // Get the estimated position poseOpt = cameraInfo.poseEstimator().estimateLowestAmbiguityPose(useableResult); - + if (poseOpt.isPresent()) { // The pose EstimatedRobotPose pose = poseOpt.get(); @@ -245,27 +453,25 @@ private VisionInfo getVisionPose(CameraInfo cameraInfo) { } private void log(String message) { - System.out.println("[PHOTON VISION]" + message); - } - - private DatagramPacket getBinaryPacket(Pose2d pose, double ambiguity, double timestamp) throws IllegalArgumentException, UnknownHostException { - ByteBuffer buffer = ByteBuffer.allocate(40); - - // Add our data to the buffer - buffer.putDouble(pose.getX()); - buffer.putDouble(pose.getY()); - buffer.putDouble(pose.getRotation().getRadians()); - buffer.putDouble(ambiguity); - buffer.putDouble(timestamp); - - byte[] data = buffer.array(); - return new DatagramPacket(data, data.length, InetAddress.getByName("10.8.62.2"), 12345); + System.out.println("[PHOTON VISION] " + message); } @Override public void close() throws Exception { + // Stop the sender thread first (it uses the socket) + if (sendExecutor != null) { + sendExecutor.shutdownNow(); + } + + // Stop the producer thread + if (producerThread != null) { + producerThread.interrupt(); + } + // Close our connections when the program closes - socket.close(); + if (socket != null && !socket.isClosed()) { + socket.close(); + } photonNT.close(); } - } \ No newline at end of file + } diff --git a/src/main/java/frc/robot/mac/Main.java b/src/main/java/frc/robot/mac/Main.java index 66f6ff53..d0738496 100644 --- a/src/main/java/frc/robot/mac/Main.java +++ b/src/main/java/frc/robot/mac/Main.java @@ -1,10 +1,58 @@ package frc.robot.mac; public class Main { + // How long to wait before retrying after an error (milliseconds). + // 2 seconds is long enough to avoid hammering a broken resource + // (e.g., a port in use) but short enough that the vision processor + // recovers quickly when the problem resolves. + private static final long RETRY_DELAY_MS = 2000; + public static void main(String[] args) { - @SuppressWarnings("resource") - MacMini mac = new MacMini(); + // ===== RETRY LOOP ===== + // This is the core of our error recovery strategy on the Mac side. + // + // WHY: Many things can go wrong during a match: + // - The network cable gets bumped → socket dies + // - PhotonVision crashes and restarts → NetworkTables disconnects + // - The RIO reboots → temporary network loss + // - An unexpected exception we didn't anticipate + // + // OLD BEHAVIOR: Any error was fatal — the vision processor died + // and stayed dead for the rest of the match. Someone had to SSH + // in and manually restart it. + // + // NEW BEHAVIOR: If anything goes wrong, we: + // 1. Clean up the old MacMini (close socket, NT connection, etc.) + // 2. Wait 2 seconds (so we don't spin if the error keeps happening) + // 3. Create a fresh MacMini and try again + // + // This means the vision processor is self-healing. Even if it + // crashes 5 times during a match, it'll keep coming back. + while (true) { + try (MacMini mac = new MacMini()) { + System.out.println("[PHOTON VISION] Started successfully, entering run loop"); + mac.run(); + // run() only returns when a fatal error occurs (e.g., socket died). + // Fall through to the retry logic below. + System.out.println("[PHOTON VISION] run() exited, will retry..."); + + } catch (Exception e) { + // This catches both IOException (socket/network problems) and + // any unexpected RuntimeException. We log the error and retry. + System.err.println("[PHOTON VISION] Error: " + e.getMessage()); + e.printStackTrace(); + } - mac.run(); + // Wait before retrying. This prevents us from spinning in a + // tight loop if the error happens immediately on startup + // (e.g., port already in use, network interface down). + System.out.println("[PHOTON VISION] Retrying in " + RETRY_DELAY_MS + "ms..."); + try { + Thread.sleep(RETRY_DELAY_MS); + } catch (InterruptedException e) { + // If someone interrupts our sleep, just retry immediately + Thread.currentThread().interrupt(); + } + } } } diff --git a/src/main/java/frc/robot/subsystems/PhotonVision.java b/src/main/java/frc/robot/subsystems/PhotonVision.java index bc9c7536..3065d12f 100644 --- a/src/main/java/frc/robot/subsystems/PhotonVision.java +++ b/src/main/java/frc/robot/subsystems/PhotonVision.java @@ -9,7 +9,9 @@ import java.net.DatagramSocket; import java.net.InetAddress; import java.net.SocketException; +import java.net.SocketTimeoutException; import java.nio.ByteBuffer; +import java.nio.ByteOrder; import java.util.concurrent.atomic.AtomicReference; import org.photonvision.PhotonCamera; @@ -32,8 +34,19 @@ import static edu.wpi.first.units.Units.Seconds; public class PhotonVision extends SubsystemBase implements AutoCloseable { - // Records to store specific groups of data - private record VisionInfo(double timestamp, double ambiguity, Pose2d pose) {}; + // Records to store specific groups of data. + // receiveTimeRio: the RIO's clock time at the moment the UDP packet arrived. + // We capture this in the receive thread (not in periodic()) so that the + // scheduling delay between packet arrival and periodic() execution doesn't + // get baked into our time offset calculation. See fix #4 in NETWORK_FINDINGS.md. + private record VisionInfo(double timestamp, double ambiguity, Pose2d pose, double receiveTimeRio) {}; + + // ===== PACKET PROTOCOL CONSTANTS ===== + // These MUST match the values in MacMini.java. If you change the packet + // format, update both files and bump PROTOCOL_VERSION. + private static final int MAGIC_NUMBER = 0x00000862; + private static final byte PROTOCOL_VERSION = 2; // v2: added has_pose flag byte + private static final int PACKET_SIZE = 50; // v2: 49 + 1 byte for has_pose // The drivetrain to add vision measurments Swerve drivetrain; @@ -41,9 +54,31 @@ private record VisionInfo(double timestamp, double ambiguity, Pose2d pose) {}; // An atomic refrence to store our newley recieved pose, thread safe AtomicReference pose; - // Stores the time offset for the difference in time between mac and rio + // Stores the time offset for the difference in time between mac and rio. + // + // WHY THIS EXISTS: + // The Mac Mini and the RoboRIO have different system clocks. When the Mac + // sends a vision timestamp, we need to convert it to the RIO's time domain + // so the Kalman filter can properly account for how much the robot moved + // between when the image was captured and when we process it. + // + // macTimeOffset = (RIO time) - (Mac time) for the same real-world moment. + // To convert any Mac timestamp to RIO time: rioTime = macTime + macTimeOffset + // + // OLD BEHAVIOR: computed once on the first packet. Whatever UDP latency + // that packet had was permanently baked in. + // + // NEW BEHAVIOR: uses an Exponential Moving Average (EMA) to continuously + // refine the offset. This averages out random UDP jitter while still + // tracking any slow clock drift between the two computers. double macTimeOffset = 0; + // EMA smoothing factor for macTimeOffset. + // Lower (e.g., 0.01) = more stable, slower to adapt to clock drift + // Higher (e.g., 0.10) = noisier, faster to adapt + // 0.02 means each new sample contributes 2% and the history contributes 98%. + private static final double TIME_OFFSET_ALPHA = 0.02; + // A datagram socket which we connect to to recieve packets from the mac DatagramSocket socket; @@ -56,6 +91,33 @@ private record VisionInfo(double timestamp, double ambiguity, Pose2d pose) {}; //Shows whether the mac mini is connected or not volatile boolean macMiniIsConnected; + // Tracks whether the Mac Mini's VISION APPLICATION is actively sending + // packets (pose or heartbeat). This is more useful than macMiniIsConnected + // (which only checks if the OS responds to ICMP pings). + // + // Three states the driver cares about: + // 1. commsAlive=false → Mac is dead / network down + // 2. commsAlive=true, visionStale=true → Mac is alive, but no tags visible + // 3. commsAlive=true, visionStale=false→ Everything working, getting poses + // + // The LED system uses these to show different colors while disabled. + private volatile boolean commsAlive = false; + + // Tracks the RIO time when we last received a valid vision packet. + // This lets periodic() detect "staleness" — when the vision processor + // is running but we haven't gotten data in a while. This is MORE useful + // than the ICMP ping (reachableThread), which only tells you the Mac + // Mini's OS is up, NOT that the vision application is actually running + // and sending packets. + private volatile double lastReceiveTimeRio = 0; + + // How long (in seconds) we can go without a vision packet before we + // consider the data "stale" and warn the driver. 0.5s = 25 missed + // cycles at 50Hz, which is long enough to avoid false alarms from + // normal camera gaps but short enough to alert quickly if something + // is actually broken. + private static final double VISION_STALE_THRESHOLD_SECS = 0.5; + private BooleanLogEntry macConnectedLog; private DoubleLogEntry macPingLog; @@ -64,6 +126,20 @@ private record VisionInfo(double timestamp, double ambiguity, Pose2d pose) {}; private static final String MAC_MINI_IP = "10.8.62.11"; private static final int VISION_PORT = 12345; + // How long socket.receive() will block before giving up (milliseconds). + // Without a timeout, if the Mac Mini stops sending (crash, unplug, etc.), + // the receive thread blocks FOREVER with no exception and no log. + // With a timeout, we get a SocketTimeoutException every 500ms, which + // lets us log a warning and loop back to try again. + private static final int SOCKET_TIMEOUT_MS = 500; + + // Minimum standard deviation (meters) for vision measurements. + // Prevents the Kalman filter from placing infinite confidence in a + // single vision reading when ambiguity is near zero. A value of 0.3 + // means "even a perfect multi-tag solve can be off by ~30 cm." + // Tune this: lower = trust vision more, higher = trust odometry more. + private static final double MIN_VISION_STD_DEV = 0.3; + /** Creates a new PhotonVision. * * @param drivetrain The main drivetrain on the robot @@ -77,38 +153,94 @@ public PhotonVision(Swerve drivetrain) { pose = new AtomicReference<>(null); macMiniPing = new AtomicReference<>(Seconds.of(0)); - try { - // Bind to the port - socket = new DatagramSocket(VISION_PORT); - } catch (SocketException e) { - log("*** ERROR CREATING DATAGRAM SOCKET ***" + e); - } - - // Start a separate thread to receive packets + // Create the initial socket (may be null if it fails — the receive + // thread will retry). See createSocket() for details. + socket = createSocket(); + + // Start a separate thread to receive packets. + // + // ERROR RECOVERY: The receive thread has an outer retry loop. + // If the socket dies (IOException that isn't a timeout), it: + // 1. Closes the dead socket + // 2. Waits 250ms + // 3. Creates a fresh socket and resumes receiving + // + // This means the RIO self-heals from network glitches, cable + // unplugs/replugs, or any transient socket error — without + // needing a full robot code restart. receiveThread = new Thread(() -> { - // Run while the thread is still valid while (!Thread.currentThread().isInterrupted()) { try { - // Create a new packet to fill with recieved data - byte[] receiveData = new byte[40]; + // Make sure we have a working socket before trying to receive + if (socket == null || socket.isClosed()) { + log("Socket is null or closed, attempting to recreate..."); + socket = createSocket(); + if (socket == null) { + // Socket creation failed — wait and try again. + // Don't spin in a tight loop. + Thread.sleep(250); + continue; + } + log("Socket recreated successfully"); + } + + // Create a buffer large enough for our packet (50 bytes). + byte[] receiveData = new byte[PACKET_SIZE]; var receivePacket = new DatagramPacket(receiveData, receiveData.length); - - if (socket != null) { - socket.receive(receivePacket); - } else { - break; + + socket.receive(receivePacket); + + // Capture the RIO's clock time RIGHT NOW, at the moment + // the packet arrived. This is important for the time offset + // calculation in periodic(). If we waited until periodic() + // to read the clock, up to 20ms of scheduling delay would + // be incorrectly included in the offset. + double receiveTimeRio = Utils.getCurrentTimeSeconds(); + + // Parse the packet. Returns null for heartbeat-only packets + // (has_pose == 0), or a VisionInfo for pose packets. + VisionInfo data = parseBinaryPacket(receivePacket, receiveTimeRio); + + // ANY valid packet (heartbeat or pose) means the Mac Mini's + // vision application is alive and communicating. Update the + // comms-alive timestamp so periodic() can track staleness. + lastReceiveTimeRio = receiveTimeRio; + commsAlive = true; + + // Only store actual pose data, not heartbeats. + // Heartbeats return null from parseBinaryPacket. + if (data != null) { + pose.set(data); } - - // Fresh vision data :) - VisionInfo data = parseBinaryPacket(receivePacket); - // Store data atomically - pose.set(data); - + } catch (SocketTimeoutException e) { + // This is EXPECTED and not an error. It means no packet + // arrived within SOCKET_TIMEOUT_MS. We just loop back and + // try again. The staleness detection in periodic() will + // handle alerting if we go too long without data. } catch (IllegalArgumentException e) { - log("Thread Error: " + e.getMessage()); + // Packet arrived but failed validation (bad magic, wrong + // version, too small). Log it so we can debug, but don't + // crash — just wait for the next valid packet. + log("Bad packet rejected: " + e.getMessage()); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); } catch (IOException e) { - log("Error recieving packet"); + // The socket is broken (not just a timeout). This can + // happen if the network interface goes down, the socket + // gets closed externally, etc. + // + // Recovery: close the dead socket and let the top of the + // loop recreate it. The 250ms sleep prevents spinning. + log("Socket error, will recreate: " + e.getMessage()); + try { + if (socket != null && !socket.isClosed()) { + socket.close(); + } + } catch (Exception closeEx) { + // Ignore close errors — we're about to create a new one anyway + } + socket = null; } } }); @@ -142,6 +274,29 @@ public PhotonVision(Swerve drivetrain) { initLogging(); } + /** + * Creates and configures a DatagramSocket bound to VISION_PORT on all + * interfaces (0.0.0.0). Returns null if creation fails. + * + * This is a separate method so the receive thread can recreate the socket + * after a fatal error without needing to restart the entire subsystem. + * The socket binds to the wildcard address so we receive packets + * regardless of which network interface they arrive on. + * + * @return the new socket, or null if creation failed + */ + private DatagramSocket createSocket() { + try { + DatagramSocket newSocket = new DatagramSocket(VISION_PORT); + newSocket.setSoTimeout(SOCKET_TIMEOUT_MS); + log("Socket bound to " + newSocket.getLocalAddress() + ":" + newSocket.getLocalPort()); + return newSocket; + } catch (SocketException e) { + log("Failed to create socket: " + e.getMessage()); + return null; + } + } + private void initLogging() { DataLog log = DataLogManager.getLog(); @@ -151,28 +306,104 @@ private void initLogging() { @Override public void periodic() { - LightningShuffleboard.setDouble("Vision", "robot_time", Utils.getCurrentTimeSeconds()); + double now = Utils.getCurrentTimeSeconds(); + + LightningShuffleboard.setDouble("Vision", "robot_time", now); LightningShuffleboard.setBool("Vision", "is Mac Connected", macMiniIsConnected); LightningShuffleboard.setDouble("Vision", "Mac Mini Ping", macMiniPing.get().in(Milliseconds)); + // Staleness detection: check if we've received ANY packet recently. + // With heartbeats (protocol v2), the Mac sends a packet every 10ms + // even when no tags are visible. So if we haven't received anything + // in 0.5s, the Mac application is truly dead (not just "no tags"). + // + // This is MORE informative than the ICMP ping because: + // - ICMP only tells you the Mac Mini's OS is responding to pings + // - This tells you the VISION APPLICATION is running and sending data + // + // When comms go stale, we mark commsAlive = false so the LED system + // and other code can distinguish all three states: + // commsAlive=false → Mac dead or network down (solid RED) + // commsAlive=true, no pose → Mac alive, no tags visible (blinking YELLOW) + // commsAlive=true, has pose → Everything working normally + if (lastReceiveTimeRio > 0 + && (now - lastReceiveTimeRio) > VISION_STALE_THRESHOLD_SECS) { + commsAlive = false; + } + LightningShuffleboard.setBool("Vision", "vision comms alive", commsAlive); + LightningShuffleboard.setBool("Vision", "vision data stale", !commsAlive); + VisionInfo updatedPose = pose.getAndSet(null); - if (updatedPose != null && updatedPose.pose != null && updatedPose.ambiguity < 1 && updatedPose.timestamp > 0) { - // If the time offset has not been set yet, calculate it + // Use <= to match the Mac side, which uses "> POSE_AMBIGUITY_TOLERANCE" + // (where tolerance = 1.0). Previously, the Mac accepted ambiguity == 1.0 + // but the RIO rejected it with "< 1", creating an off-by-one gap where + // a valid pose was computed, sent, received, and then thrown away. + if (updatedPose != null && updatedPose.pose != null && updatedPose.ambiguity <= 1 && updatedPose.timestamp > 0) { + // Compute the instantaneous time offset for THIS packet. + // instantOffset = how far apart the two clocks were when the + // packet arrived, including any UDP latency on this specific packet. + // + // We use receiveTimeRio (captured in the receive thread the + // instant the packet arrived) instead of "now" (Utils.getCurrentTimeSeconds()). + // Why? periodic() runs at 50Hz, so there can be up to 20ms between + // when the packet arrived and when this line runs. That 20ms would + // be incorrectly added to every offset measurement, making all + // vision timestamps systematically late. + double instantOffset = updatedPose.receiveTimeRio - updatedPose.timestamp; + if (macTimeOffset == 0) { - macTimeOffset = Utils.getCurrentTimeSeconds() - updatedPose.timestamp; + // First packet: use it directly as our initial estimate. + macTimeOffset = instantOffset; + } else { + // Subsequent packets: blend the new measurement into our + // running estimate using an Exponential Moving Average. + // + // EMA formula: new_avg = old_avg * (1 - alpha) + new_sample * alpha + // + // This smooths out random UDP jitter (some packets arrive + // fast, some slow) while still adapting if the two clocks + // slowly drift apart over the course of a match. + macTimeOffset = macTimeOffset * (1.0 - TIME_OFFSET_ALPHA) + + instantOffset * TIME_OFFSET_ALPHA; } - // Calculate the standard deviation to use-- small multiplier so not too low - double trust = updatedPose.ambiguity() * 1.2; + // The "trust" value is the standard deviation (in meters or radians) + // we tell the Kalman filter. A SMALLER number = MORE confidence in vision. + // + // Problem: when ambiguity is near 0 (e.g. a clean multi-tag solve), + // trust was also near 0, which tells the Kalman filter "this + // measurement is perfect — ignore odometry entirely." Even tiny + // vision errors then cause the pose estimate to jump, which makes + // the turret chase phantom movements. + // + // The fix: clamp trust to a minimum of MIN_VISION_STD_DEV so we + // always blend vision with odometry rather than blindly trusting it. + double xyTrust = Math.max(updatedPose.ambiguity() * 1.2, MIN_VISION_STD_DEV); + + // WHY rotation trust is tighter (smaller = more confident): + // AprilTag detection constrains rotation very well — the tag's + // corners give strong angular information even at distance. + // Translation, on the other hand, depends on accurately knowing + // the tag's range, which gets noisier with distance. + // + // By trusting the rotation more, the Kalman filter lets vision + // correct our heading quickly (important for turret aiming) while + // being more conservative about x/y jumps (which cause the turret + // to chase phantom lateral movements). + // + // The 0.5 multiplier means rotation trust is twice as tight as + // translation trust. Tune this if the heading seems jumpy. + double rotTrust = xyTrust * 0.5; LightningShuffleboard.setPose2d("Vision", "updated pose", updatedPose.pose); LightningShuffleboard.setDouble("Vision", "mac time offset", macTimeOffset); - - // Adds our estimated pose from vision to our drivetrain's pose, fuses with odometry + + // Adds our estimated pose from vision to our drivetrain's pose, fuses with odometry. + // The std dev vector is [x, y, theta] — x and y in meters, theta in radians. drivetrain.addVisionMeasurement( - updatedPose.pose(), - updatedPose.timestamp + macTimeOffset, - VecBuilder.fill(trust, trust, trust) + updatedPose.pose(), + updatedPose.timestamp + macTimeOffset, + VecBuilder.fill(xyTrust, xyTrust, rotTrust) ); } updateLogging(); @@ -184,39 +415,109 @@ private void updateLogging() { } /** - * Unpacks a datagram packet into the Unpacked Data object - * + * Unpacks a datagram packet into a VisionInfo object. + * + * Validates the magic number and protocol version before parsing. + * This prevents random network traffic from being misinterpreted as + * vision data and corrupting the Kalman filter. + * * @param packet The packet to unpack - * @return Unpacked Data, Pose, ambiguity, timestamp, and a counter + * @param receiveTimeRio the RIO clock time when the packet was received + * @return VisionInfo with pose, ambiguity, and timestamp, or null for heartbeat packets + * @throws IllegalArgumentException if the packet is malformed or from an unknown source */ - private VisionInfo parseBinaryPacket(DatagramPacket packet) { + private VisionInfo parseBinaryPacket(DatagramPacket packet, double receiveTimeRio) { byte[] data = packet.getData(); - - // Safety check for length - if (packet.getLength() < 40) { - throw new IllegalArgumentException("Packet too small"); + + // Safety check: make sure we got a full packet. If someone sends + // a tiny packet (e.g., a network probe), we reject it here instead + // of getting a confusing BufferUnderflowException later. + if (packet.getLength() < PACKET_SIZE) { + throw new IllegalArgumentException( + "Packet too small: expected " + PACKET_SIZE + " bytes, got " + packet.getLength()); + } + + ByteBuffer buffer = ByteBuffer.wrap(data, 0, PACKET_SIZE); + + // Explicitly set byte order to match the sender (MacMini.java). + // Both sides must agree, otherwise multi-byte values (ints, doubles) + // will be read with their bytes reversed, producing garbage numbers. + buffer.order(ByteOrder.BIG_ENDIAN); + + // --- Validate header --- + // Read the magic number and check it matches. This is our first + // line of defense against stray packets. Any UDP traffic that + // doesn't start with 0x00000862 is immediately rejected. + int magic = buffer.getInt(); + if (magic != MAGIC_NUMBER) { + throw new IllegalArgumentException( + "Bad magic number: expected 0x" + Integer.toHexString(MAGIC_NUMBER) + + ", got 0x" + Integer.toHexString(magic) + + ". This packet is not from our vision processor."); + } + + // Check protocol version. If someone deploys new Mac code but old + // RIO code (or vice versa), this catches it immediately instead of + // silently misinterpreting the packet fields. + byte version = buffer.get(); + if (version != PROTOCOL_VERSION) { + throw new IllegalArgumentException( + "Protocol version mismatch: expected " + PROTOCOL_VERSION + + ", got " + version + + ". Make sure Mac and RIO code are both up to date."); } - // Wrap the data in a ByteBuffer - ByteBuffer buffer = ByteBuffer.wrap(data, 0, 40); + // Read the sequence number. We log it but don't reject out-of-order + // packets — with UDP, occasional reordering is normal and the data + // is still valid. The sequence number is useful for debugging: if + // you see big jumps, packets are being dropped on the network. + int seq = buffer.getInt(); + + // Read the has_pose flag (added in protocol v2). + // 1 = this packet contains a valid pose in the payload. + // 0 = heartbeat only — the Mac is alive but has no tags visible. + // + // For heartbeats, we return null. The receive thread still updates + // lastReceiveTimeRio and commsAlive, but does NOT store a pose. + byte hasPose = buffer.get(); + if (hasPose == 0) { + // Heartbeat packet — no pose data to parse. + // Returning null tells the receive thread "comms are good, + // but there's no pose to feed into the Kalman filter." + return null; + } - // Read doubles in the same order they were packed + // --- Parse payload (same order as MacMini.buildPacket) --- double x = buffer.getDouble(); double y = buffer.getDouble(); double rotRadians = buffer.getDouble(); double ambiguity = buffer.getDouble(); double timestamp = buffer.getDouble(); - // Construct a new pose using the unpacked data Pose2d newPose = new Pose2d(x, y, new Rotation2d(rotRadians)); - - return new VisionInfo(timestamp, ambiguity, newPose); + + return new VisionInfo(timestamp, ambiguity, newPose, receiveTimeRio); } public boolean getMacMiniConnection() { return macMiniIsConnected; } + /** + * Returns true if the Mac Mini's vision application is actively + * sending packets (either poses or heartbeats). This is more useful + * than getMacMiniConnection() which only checks ICMP ping. + * + * Used by the LED system to distinguish: + * - commsAlive=true → Mac app is running (may or may not have tags) + * - commsAlive=false → Mac app is dead or network is down + * + * @return true if vision comms are active + */ + public boolean isCommsAlive() { + return commsAlive; + } + // Logs a message with the [PHOTON VISION] tag in front private void log(String message) { DataLogManager.log("[PHOTON VISION] " + message); @@ -224,11 +525,17 @@ private void log(String message) { @Override public void close() throws Exception { - if (socket != null) { - socket.close(); + // Interrupt threads first so they stop using the socket + if (receiveThread != null) { + receiveThread.interrupt(); + } + if (reachableThread != null) { + reachableThread.interrupt(); } - reachableThread.interrupt(); - receiveThread.interrupt(); + // Then close the socket + if (socket != null && !socket.isClosed()) { + socket.close(); + } } } diff --git a/src/main/java/frc/robot/subsystems/Swerve.java b/src/main/java/frc/robot/subsystems/Swerve.java index bba39c07..c7993d55 100644 --- a/src/main/java/frc/robot/subsystems/Swerve.java +++ b/src/main/java/frc/robot/subsystems/Swerve.java @@ -310,27 +310,30 @@ public void addVisionMeasurement(Pose2d visionRobotPoseMeters, double timestampS super.addVisionMeasurement(visionRobotPoseMeters, Utils.fpgaToCurrentTime(timestampSeconds)); } - // /** - // * Adds a vision measurement to the Kalman Filter. This will correct the odometry pose estimate - // * while still accounting for measurement noise. - // *

- // * Note that the vision measurement standard deviations passed into this method - // * will continue to apply to future measurements until a subsequent call to - // * {@link #setVisionMeasurementStdDevs(Matrix)} or this method. - // * - // * @param visionRobotPoseMeters The pose of the robot as measured by the vision camera. - // * @param timestampSeconds The timestamp of the vision measurement in seconds. - // * @param visionMeasurementStdDevs Standard deviations of the vision pose measurement - // * in the form [x, y, theta]ᵀ, with units in meters and radians. - // */ - // @Override - // public void addVisionMeasurement( - // Pose2d visionRobotPoseMeters, - // double timestampSeconds, - // Matrix visionMeasurementStdDevs - // ) { - // super.addVisionMeasurement(visionRobotPoseMeters, Utils.fpgaToCurrentTime(timestampSeconds), visionMeasurementStdDevs); - // } + // ===================================================================== + // IMPORTANT: The 3-argument addVisionMeasurement override is + // intentionally NOT overridden here. DO NOT uncomment this! + // + // TIME DOMAIN EXPLANATION: + // CTRE's parent class (TunerSwerveDrivetrain) expects timestamps in + // CTRE's time domain: Utils.getCurrentTimeSeconds(). + // + // The 2-argument override above converts from FPGA time → CTRE time + // (via fpgaToCurrentTime) because WPILib code typically uses FPGA time. + // + // But our PhotonVision class calls the 3-argument version with + // timestamps that are ALREADY in CTRE's time domain (converted from + // Mac time using macTimeOffset). If we added fpgaToCurrentTime() here + // too, we'd DOUBLE-CONVERT and all vision timestamps would be wrong. + // + // In short: + // 2-arg: caller passes FPGA time → we convert to CTRE time + // 3-arg: caller passes CTRE time → parent handles it directly + // EstimatedRobotPose: passes FPGA time → caller converts before calling 3-arg + // + // If you need to add a 3-arg override in the future, check what time + // domain your caller is using FIRST. + // ===================================================================== public void addVisionMeasurement(EstimatedRobotPose pose, double distance) { if (DriverStation.isDisabled()) {