It's entirely plausible that they don't actually have good enough data to do this kind of analysis.
Sorry for posting a screenshot of Excel, but I don't know how else to show this. All I did was filter on car 3881 (literally a random car I thought of) for October 1, 2025 and sort from earliest to latest event.
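For anyone who wants to reproduce that filter-and-sort step outside Excel, here's a rough Python sketch. The column names (`car_id`, `service_date`, `event_time`) are my guesses and not necessarily what the LREvents export actually uses. It also assumes one car per row and ISO 8601 timestamps.

```python
import csv
from datetime import datetime

# Hypothetical column names; the real LREvents export may label these differently.
CAR_COL, DATE_COL, TIME_COL = "car_id", "service_date", "event_time"


def load_rows(path):
    """Read a CSV export into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def car_timeline(rows, car, service_date):
    """Return one car's events for one service date, earliest first.

    Mirrors the Excel steps: filter on car and date, then sort by event time.
    Assumes event_time is an ISO 8601 string such as "2025-10-01T05:12:30".
    """
    selected = [
        r for r in rows
        if r[CAR_COL] == car and r[DATE_COL] == service_date
    ]
    return sorted(selected, key=lambda r: datetime.fromisoformat(r[TIME_COL]))
```

Usage would be something like `car_timeline(load_rows("lrevents.csv"), "3881", "2025-10-01")`, where the filename is just a placeholder.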
According to the LREvents dataset, this is how it started its day:
[Screenshot: LREvents rows for car 3881 on 10/1/2025, sorted by event time]
By the time it got from Riverside to Union Square, it had been part of 3 different trip_ids, had 2 different mates, and teleported backwards.
Trip 71193703 was normal from Union Square to Riverside, except that there was no DEP event at Union Square, so we don't know how long it actually takes to get from Union Square to Riverside. We could infer the departure from the time it arrives at Lechmere, but that takes additional steps and assumptions.
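One way to find trips like this in bulk is a simple heuristic: a complete trip should begin with a DEP at its origin, so any trip whose earliest recorded event is not a DEP is missing its origin departure. This is only a sketch. It assumes columns named `trip_id`, `event_type` ("ARR"/"DEP") and `event_time` (ISO 8601), which may not match the real export.

```python
from collections import defaultdict
from datetime import datetime


def trips_missing_origin_departure(rows):
    """Return sorted trip_ids whose earliest recorded event is not a DEP.

    If the first event of a trip is an ARR, the departure from the origin
    was never recorded, so origin-to-destination run time can't be measured
    directly. Column names are assumptions.
    """
    by_trip = defaultdict(list)
    for r in rows:
        by_trip[r["trip_id"]].append(r)

    missing = []
    for trip_id, events in by_trip.items():
        # Earliest event for this trip, by timestamp.
        first = min(events, key=lambda r: datetime.fromisoformat(r["event_time"]))
        if first["event_type"] != "DEP":
            missing.append(trip_id)
    return sorted(missing)
```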
Trips ADDED-1583150555 and ADDED-1583150682 have only the records shown above; that's the entirety of those trips on 10/1/2025.
These kinds of issues are extremely common in the dataset and make it incredibly difficult to do analysis, and there are others as well (e.g., Trip 71193770 from row 2 above departed Boston College at 20:02 but arrived at Washington Street at 06:54).
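A trip like 71193770 can be caught by checking that event times never go backwards along the route. Here's a rough sketch. It needs the stop order for the route and direction as an input (`stop_order` is my own hypothetical parameter; I don't know whether the dataset provides it), and it assumes columns `stop_name` and `event_time` (ISO 8601).

```python
from datetime import datetime


def backwards_in_time(events, stop_order):
    """Return (earlier_stop, later_stop) pairs where time runs backwards.

    events: one trip's rows with stop_name and ISO 8601 event_time.
    stop_order: stops in the order a train on this route/direction visits
    them (hypothetical input, supplied by the analyst).
    Uses the earliest recorded event at each stop.
    """
    rank = {stop: i for i, stop in enumerate(stop_order)}
    first_seen = {}
    for r in events:
        stop = r["stop_name"]
        if stop not in rank:
            continue  # stop not in the supplied sequence; ignore it
        t = datetime.fromisoformat(r["event_time"])
        if stop not in first_seen or t < first_seen[stop]:
            first_seen[stop] = t

    # Visited stops in route order; any non-monotonic run shows up as a
    # decreasing adjacent pair.
    visited = sorted(first_seen, key=rank.get)
    problems = []
    for prev, nxt in zip(visited, visited[1:]):
        if first_seen[nxt] < first_seen[prev]:
            problems.append((prev, nxt))
    return problems
```

With `stop_order=["Boston College", "Washington Street"]` (a partial list just for illustration), the 20:02 departure and 06:54 arrival above would come back as one flagged pair.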
Of the 173 inbound C "trips" in the dataset on 10/1/2025, only 56 have a DEP event at all four of Cleveland Circle, Coolidge Corner, Kenmore, and Boylston.
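Here's roughly how that count could be done in Python. It's a sketch that assumes the rows are already filtered to inbound C on 10/1/2025 and that the columns are `trip_id`, `stop_name` and `event_type`. The exact stop labels in the dataset may also differ from the names I've used.

```python
from collections import defaultdict

# Stop names as they are assumed to appear in the dataset; adjust to the real labels.
CHECKPOINTS = {"Cleveland Circle", "Coolidge Corner", "Kenmore", "Boylston"}


def count_complete_trips(rows, checkpoints=CHECKPOINTS):
    """Return (complete, total) for trips with a DEP at every checkpoint.

    rows should already be filtered to one service date, route and direction.
    Every trip_id that appears counts toward total, even if it has no DEPs.
    """
    departures = defaultdict(set)
    for r in rows:
        trip_stops = departures[r["trip_id"]]  # accessing the key registers the trip
        if r["event_type"] == "DEP":
            trip_stops.add(r["stop_name"])
    complete = sum(1 for stops in departures.values() if checkpoints <= stops)
    return complete, len(departures)
```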
I don't want to say that collecting this data is easy because it's not, but trying to do predictive analytics off of flawed data is very difficult.
Of course, it's also entirely plausible that I just haven't figured out the trick to using this data effectively.