feat(ocs): define parsed OCS message types by andrewdolce · Pull Request #16 · mbta/schemas

andrewdolce · 2025-06-09T14:22:53Z

Asana task: Add com.mbta.ocs parsed types

To support upcoming features, Orbit will need to listen to certain OCS events to supplement GTFS. Rather than listen to the existing ocs.raw_message events emitted by Trike, the team has proposed changing Trike to emit additional "parsed" events, which will have already done the work of parsing the OCS raw comma-separated format and converting to JSON.

Trike will still emit the existing raw messages unaltered, so this should be backward-compatible (assuming all existing consumers already filter out unknown message types). But the thinking is that Orbit and future consumers can subscribe to these parsed events and not have to duplicate the raw data parsing.
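To make the backward-compatibility assumption above concrete, here is a minimal sketch of a consumer that dispatches on the CloudEvents `type` attribute and silently skips anything it does not recognize. The type strings and the `handle_raw` handler are hypothetical placeholders for illustration, not names confirmed by this PR.

```python
from typing import Callable, Dict, Optional

# ASSUMPTION: illustrative type strings; the real names come from the schemas.
RAW_TYPE = "com.mbta.ocs.raw_message.v1"
TSCH_NEW_TYPE = "com.mbta.ocs.tsch_new.v1"


def handle_raw(event: dict) -> str:
    """Hypothetical handler for the existing raw events."""
    return f"raw:{event['id']}"


# An existing consumer only registers the types it already understands.
HANDLERS: Dict[str, Callable[[dict], str]] = {RAW_TYPE: handle_raw}


def dispatch(event: dict) -> Optional[str]:
    """Route an event by its CloudEvents `type`; ignore unknown types."""
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        # Unknown type (e.g. a newly added parsed event): skip, do not fail.
        return None
    return handler(event)
```

A consumer written this way keeps working when Trike starts emitting new event types alongside the raw ones. A consumer that raises on an unrecognized `type` would not, which is the case the assumption is meant to rule out.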

This PR includes JSON schema, examples, and minimal docs for the TMOV and TSCH messages, which are the two that we believe Orbit will need soon. Additional types may need to be spec'd out later.

I have a number of in-line "TODOs" as points to get specific feedback, but here are some general things I want to point out:

- Like the raw events, these new events are intended to fit with the CloudEvents spec. If I've violated that anywhere, please let me know.
- I've represented each sub-type of TSCH (e.g. TSCH_NEW, TSCH_OFF) as a separate top-level event type (e.g. ocs.tsch_new, ocs.tsch_off), instead of, say, having a single ocs.tsch event with a subtype field. My instinct is that this is cleaner because there is a lot of variation in the data fields between the subtypes, but I'm curious what others think.
- Perhaps more controversially, I also broke TMOV into three types, for heavy-rail, light-rail, and deletions. I did this after looking at how the RTR codebase represents these messages, and felt it might make sense to do the same because of the variations in data between the three.
- I went with camel case for property names, to mimic the glides event schemas and fall more generally in line with most JSON conventions. I had considered using underscores to better match how things are named in our Elixir codebases, but that felt too implementation-dependent.
- Some of my biggest open questions are about the counter field on OCS events.
  - Does RTR or any consumer actually use this to detect dropped messages?
  - Are the counter rules well-understood? From looking at Splunk, it seems like each transitline gets its own independent counter across all event types, but I do see occasional missing ones, or counters that look drastically out-of-sequence.
  - If the counter is important, then does that mean that Orbit will need to listen to all events (not just TMOV and TSCH) in order to keep the count?

rudiejd · 2025-06-09T15:51:24Z

Does RTR or any consumer actually use this to detect dropped messages?

Most of the business logic for how RTR uses the OCS message counter can be found in OCS.Stream.SequenceMonitor. I believe the only consequence of a sequence being out of order is that we emit a log, but tagging @lemald here since he probably has more context and my understanding is just from grokking RTR code.

Are the counter rules well-understood? From looking at splunk, it seems like each transitline gets its own independent counter across all event types, but I do see occasional missing ones, or counters that look drastically out-of-sequence.

The SequenceMonitor indicates that the counter should be per Trike IP, so if the Trike IP changes, some out-of-sequence values could be permitted. You should get an ocs_sequence_monitor missing_sequence event when the counter skips numbers.

If the counter is important, then does that mean that Orbit will need to listen to all events (not just TMOV and TSCH) in order to keep the count?

This is probably above my pay grade, but the fact that a sequence mismatch is not actionable in RTR makes me think that you might be able to get away with ignoring them.
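For reference, the kind of gap detection discussed in this comment can be sketched as below. This is not the logic of OCS.Stream.SequenceMonitor, which is not shown in this thread; it only assumes a counter that increments by one per message for each source key (for example a Trike IP), and the class and method names are made up for illustration.

```python
from typing import Dict, List


class SequenceTracker:
    """Tracks the last counter seen per source and reports skipped values."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def observe(self, source: str, counter: int) -> List[int]:
        """Record a counter and return the values skipped since the last one."""
        last = self._last.get(source)
        self._last[source] = counter
        if last is None or counter <= last:
            # First message from this source, or the counter went backwards
            # (restart or out-of-order delivery): nothing is reported missing.
            return []
        return list(range(last + 1, counter))


tracker = SequenceTracker()
tracker.observe("10.0.0.1", 1)
tracker.observe("10.0.0.1", 2)
missing = tracker.observe("10.0.0.1", 5)  # counters 3 and 4 were never seen
```

If the counter really is shared across all event types, as the Splunk observation suggests, a consumer that subscribes only to TMOV and TSCH would see gaps from a tracker like this even when nothing was dropped.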

rudiejd · 2025-06-09T17:20:58Z

+            },
+            "heading": {
+              "description": "Directional information. Dependent on dataSource.",
+              "$comment": "TODO: Should we break this up into separate fields instead of a union type?",


I would definitely be in favor of splitting this up into separate fields, and potentially only including each field in a distinct message type. I believe this is heading information when we're getting a GPS reading and AB information when we're getting an AVI read. The unit for heading information should be degrees offset from Magnetic North. I split these into separate fields in the light rail events schema already.

Ah ok, that makes sense. Definitely on board with splitting it into two fields.

In terms of distinct message types, are you considering something like ocs.tmov_light_rail_gps.v1, ocs.tmov_light_rail_avi.v1, and ocs.tmov_light_rail_nvl.v1 for the different data sources? Since the light rail events schema has a single event type with both fields, I think I would be inclined to do the same, and just mark them as nullable in cases where they aren't provided, rather than introduce more types. But let me know if you have a strong opinion otherwise.

But that leads to another question: RTR codebase uses an explicit :unreported atom in cases where this (and speed) are missing. Is there a reason that couldn't just be null here? Maybe there's some known distinction between "this is expectedly not provided" vs "this is missing and that's a hole in the data error"?

andrewdolce · 2025-06-10T12:13:41Z

+            "orientation": {
+              "description": "Whether the 'rear' or 'front' of each train is facing forward for each train in the consist. B means backward, A means forward. Only provided with AVI data source.",
+              "type": ["string", "null"]
+            }


Took another pass at this and tried to make it align with the RTR lightrail messages. If I'm understanding it, that means each individual character in the string represents the orientation of a train car, as opposed to, say, having this be an array with "AB" vs "BA" values, like RTR has internally. E.g. "AAB" instead of ["AB", "AB", "BA"].
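The relationship between the two representations in the example above can be sketched as a small conversion. The function name is hypothetical, and the mapping of "A" to "AB" and "B" to "BA" is inferred from the "AAB" example in this comment rather than taken from RTR code.

```python
from typing import List, Optional

# Inferred from the example: "AAB" corresponds to ["AB", "AB", "BA"].
_PAIR_BY_CHAR = {"A": "AB", "B": "BA"}


def expand_orientation(orientation: Optional[str]) -> Optional[List[str]]:
    """Expand a per-car orientation string into one "AB"/"BA" value per car.

    Returns None when the field is null (e.g. a non-AVI data source).
    Raises KeyError on any character other than "A" or "B".
    """
    if orientation is None:
        return None
    return [_PAIR_BY_CHAR[char] for char in orientation]
```

A consumer that prefers the array form can derive it this way, so the compact string in the schema does not lose information.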

andrewdolce · 2025-06-10T13:08:28Z

+        "leadCar": {
+          "description": "Lead car number, as represented internally in OCS, which may differ from the number displayed on a physical train car. Example: For a red line car physically numbered 15xx, internally numbered 25xx, this field would contain 25xx.",
+          "$comment": "TODO: Should this just be an integer.",
+          "type": "string"
+        },
+        "leadCarDisplay": {
+          "description": "Lead car number, as displayed on the physical vehicle. Example: For a red line car physically numbered 15xx, internally numbered 25xx, this field would contain 15xx.",
+          "$comment": "TODO: Should this just be an integer.",
+          "type": "string"
+        },


Just put up a commit to add this. Wondering whether people like this idea. Essentially, this is an attempt (at the cost of being more verbose) to get rid of every downstream listener needing to perform the special case mapping of red line 25xx -> 15xx trains, and to potentially account for future scenarios where train cars may have special numbering internal to OCS.

Torn on this. I get what you're after, but most TID systems except for Orbit probably want 15xx (I think RTR uses 15xx, right?). So I wonder if we should continue to base TID on 15xx and Orbit should shift back to 25xx in its presentation.

I know we discussed this a little already, but how would you feel about including this, but reversing the naming scheme? Something like:

leadCar -> "Physical car number"
leadCarInternal -> "Car number as represented internally in OCS"

I could also make the internal field optional and only present if there's a difference between the two? Would save us from being overly chatty in the majority of cases where there is no difference.

Pushed a version revised the way I proposed. Let me know if you think this is better, or worse, or if I should just drop this idea entirely.
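As a rough illustration of the revised naming with an optional internal field, a producer could build the two fields as sketched below. The remapping rule here is an assumption made for the example (any four-digit number starting with "25" maps to "15xx"); the actual rule would need to account for the line and the real car number ranges, and both function names are hypothetical.

```python
from typing import Dict


def internal_to_physical(internal: str) -> str:
    """Map an OCS-internal car number to the number shown on the vehicle.

    ASSUMPTION: only four-digit numbers starting with "25" are remapped.
    """
    if len(internal) == 4 and internal.isdigit() and internal.startswith("25"):
        return "15" + internal[2:]
    return internal


def lead_car_fields(internal: str) -> Dict[str, str]:
    """Build leadCar, adding leadCarInternal only when the two differ."""
    physical = internal_to_physical(internal)
    fields = {"leadCar": physical}
    if physical != internal:
        fields["leadCarInternal"] = internal
    return fields
```

With this shape, most events carry only `leadCar`, and consumers that need the OCS-internal number fall back to `leadCar` when `leadCarInternal` is absent.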

andrewdolce · 2025-06-11T13:27:17Z

Just realized I missed a TSCH subtype - TSCH_TAG. I'll work on adding it.

mathcolo · 2025-06-11T17:55:49Z

@@ -0,0 +1,116 @@
+{


I know we talked about this verbally earlier, but are there enough differences between light and heavy rail to warrant a separate event type, rather than using optional fields?

mathcolo · 2025-06-11T18:04:10Z

+        "dispatchTags": {
+          "description": "A set of tags that dispatchers can assign to trains.",
+          "type": "array",
+          "items": {
+            "type": "string"
+          }
+        }


I noticed some TODOs elsewhere to consider enumerating strings. This is another case where a list of possible tags, either as an enum or in the description, could act as documentation of what's possible.

From conversation in slack, it sounds like there isn't an official doc of known values here, so I'm actually leaning in the direction of not making it an enum, so as to allow for arbitrary values. Let me know if you feel strongly otherwise? How would you feel about allowing arbitrary strings, but documenting the set of commonly expected values, in the description and/or docs?

mathcolo · 2025-06-11T18:05:20Z

+      "minLength": 20,
+      "$comment": "RFC3339 timestamp",
+      "format": "date-time",
+      "pattern": "^[0-9]{4}-[01][0-9]-[0-3][0-9]T[012][0-9]:[0-5][0-9]:[0-6][0-9](.[0-9]*)?(Z|[+-][012][0-9]:[0-5][0-9])$",


I've seen OCS timestamps in local time or RFC3339 without a timezone designator (including Z), just checking this is right?

At the moment, I'm basing this on RTR because in most cases they convert local times into full RFC3339 at the point of parsing, so I was thinking Trike could do the same. I will double check though.
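To check what the quoted schema fragment accepts, here is a small sketch that applies the same `minLength` and `pattern` in Python, plus a conversion from a naive local time to a full RFC3339 string. The fixed UTC-4 offset is an assumption for illustration only; a real conversion would use a time zone database so that daylight saving changes are handled. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern copied verbatim from the schema fragment above.
PATTERN = re.compile(
    r"^[0-9]{4}-[01][0-9]-[0-3][0-9]T[012][0-9]:[0-5][0-9]:[0-6][0-9]"
    r"(.[0-9]*)?(Z|[+-][012][0-9]:[0-5][0-9])$"
)

# ASSUMPTION: a fixed offset stands in for the real local time zone rules.
ASSUMED_LOCAL_OFFSET = timezone(timedelta(hours=-4))


def is_schema_timestamp(value: str) -> bool:
    """Apply the schema's minLength (20) and pattern to a string."""
    return len(value) >= 20 and PATTERN.match(value) is not None


def local_to_rfc3339(naive_local: datetime) -> str:
    """Attach the assumed offset to a naive local time and format it."""
    return naive_local.replace(tzinfo=ASSUMED_LOCAL_OFFSET).isoformat()


stamp = local_to_rfc3339(datetime(2025, 6, 9, 10, 22, 53))
```

Under this pattern, a timestamp without `Z` or an offset, such as `2025-06-09T14:22:53`, is rejected, so local times would have to be converted before emitting. Note also that the `.` before the fractional seconds is unescaped in the pattern as quoted, so it matches any character in that position.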

mathcolo · 2025-06-11T18:11:19Z

+        "leadCar": {
+          "description": "Lead car number, as represented internally in OCS, which may differ from the number displayed on a physical train car. Example: For a red line car physically numbered 15xx, internally numbered 25xx, this field would contain 25xx.",
+          "$comment": "TODO: Should this just be an integer.",
+          "type": "string"
+        },
+        "leadCarDisplay": {
+          "description": "Lead car number, as displayed on the physical vehicle. Example: For a red line car physically numbered 15xx, internally numbered 25xx, this field would contain 15xx.",
+          "$comment": "TODO: Should this just be an integer.",
+          "type": "string"
+        },


Torn on this. I get what you're after, but most TID systems except for Orbit probably want 15xx (I think RTR uses 15xx, right?). So I wonder if we should continue to base TID on 15xx and Orbit should shift back to 25xx in its presentation.

andrewdolce · 2025-08-07T12:56:48Z

From recent conversation, it sounds like we may want to re-evaluate the decision as to whether Trike should emit parsed messages. So this schema may not be needed.

I'm converting this PR back to a draft for now. (May end up just closing and re-opening later if needed.)

andrewdolce · 2026-03-26T18:04:24Z

Pretty sure this is no longer relevant. Closing for now.

andrewdolce added 3 commits June 9, 2025 09:15

feat(ocs): add schemas/examples for ocs.tmov events

4591252

feat(ocs): add schemas/examples for ocs.tsch events

2691369

feat(ocs): reorg ocs message docs and add parsed events

d35acfb

rudiejd reviewed Jun 9, 2025


Comment thread docs/events/ocs/com.mbta.ocs.tmov_light_rail.v1.mdx Outdated

rudiejd reviewed Jun 9, 2025


andrewdolce added 3 commits June 10, 2025 08:01

feat(ocs): touch up ocs event docs

861a529

feat(ocs): split out orientations from heading

861aab0

feat(ocs): make orientation field consistent with rtr.lightrail

bee06de

andrewdolce force-pushed the feat/ocs-parsed-message-types branch from 7c8490e to bee06de on June 10, 2025 12:06

andrewdolce commented Jun 10, 2025


feat(ocs): add explicit display fields for car numbers

4af1a61

andrewdolce force-pushed the feat/ocs-parsed-message-types branch from 40b4489 to 4af1a61 on June 10, 2025 13:13

andrewdolce force-pushed the feat/ocs-parsed-message-types branch from b6abb74 to bb1bf4e on June 11, 2025 17:51

andrewdolce added 2 commits June 11, 2025 13:56

feat(ocs): add schema/examples/docs for tsch_tag events

ba27fdb

feat(ocs): revise TMOV to use single-char dispatch tags

e0bc7a3

andrewdolce force-pushed the feat/ocs-parsed-message-types branch from bb1bf4e to e0bc7a3 on June 11, 2025 17:56

mathcolo reviewed Jun 11, 2025


andrewdolce added 2 commits June 12, 2025 14:14

feat(ocs): resolve TODO

72d4bb2

feat(ocs): replace consist "display" with "internal"

a952954

andrewdolce mentioned this pull request Jun 12, 2025

feat: emit "parsed" cloud events for OCS TSCH messages mbta/trike#44

Draft

3 tasks

andrewdolce marked this pull request as draft August 7, 2025 12:56

andrewdolce closed this Mar 26, 2026
