General MBTA Topics (Multi Modal, Budget, MassDOT)

One of the beauties of tap-in, tap-out systems is the operators get real, accurate data about the actual trips passengers take. (Not the weak-sauce, infrequent survey pseudo-info the T collects.) With that data you can optimize the system profile (routes, timing, expansion planning, etc.) to actually serve your customers.
Tap-in / tap-out data is nice to have, but by no means necessary. It's only worth implementing if you have a distance-based fare system (like the commuter rail) where the tap-out is needed to finalize the fare. There are other data sources that are sufficient for planning work:
  • Automatic Passenger Counters (APCs): These sensors detect movement across the door thresholds of buses and rail cars. They're standard on newer vehicles - all buses except the oldest 8 in service have them, and I believe the Type 9s and the new RL/OL fleet do as well. (Not sure about commuter rail cars.) They give you the number of ons and offs at each stop (when combined with vehicle location data), so you have data for crowding and time-of-day ridership at a stop-by-stop level of detail. Some systems even do fancy things like real-time indication of which cars on a train are the least crowded.
  • Origin-Destination-Interchange (ODX): This method uses successive taps on the same farecard to infer what trips were made. For example, if the system sees my (anonymized) card tap at Copley and then 15 minutes later on a 57 at Kenmore, it can assign that first trip to the first westbound Green Line train that arrived after I tapped in. If I tap on a northbound 86 at Brighton Center 3 hours later, it can infer that I took that 57 to Brighton Center. (There are fancier things going on in the algorithm, but that's the basics.) The MBTA and MIT were actually some of the earliest to develop the method. It's not 100% accurate (though impressively so when I tested it against my own trips) but the end result is basically the same as tap-in/tap-out data.
  • Trip modeling: This is used to estimate larger-scale demand by inputting existing service and ridership data into a model that simulates people deciding on when/if and how to make trips. It's used for estimating future demand, and for figuring out how changes to the network (like building a new line) will affect travel patterns.
Combine those with faregate data and smaller focused investigations when needed (like manual counts to verify the high-tech methods), and that's the basic data that transit planners use.
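To make the ODX idea a bit more concrete, here's a minimal Python sketch of just the destination-inference step from the Copley / 57 / 86 example. The `Tap` record, the `STOPS_ON_ROUTE` lookup, and the exact-stop matching rule are all made up for illustration; the real algorithm does a lot more (assigning specific vehicles, checking distances and times, detecting transfers), so treat this as a toy under those assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Tap(NamedTuple):
    card: str     # anonymized card ID (hypothetical field names)
    minute: int   # minutes since midnight
    route: str
    stop: str

# Toy route -> stops lookup, assumed for this example (not a real schedule extract).
STOPS_ON_ROUTE = {
    "Green": {"Copley", "Kenmore"},
    "57": {"Kenmore", "Brighton Center"},
    "86": {"Brighton Center", "Harvard"},
}

def infer_destinations(taps):
    """Return (tap, inferred_destination) pairs; destination is None when unknown."""
    by_card = defaultdict(list)
    for tap in taps:
        by_card[tap.card].append(tap)

    result = []
    for card_taps in by_card.values():
        card_taps.sort(key=lambda t: t.minute)
        for i, tap in enumerate(card_taps):
            dest = None
            if i + 1 < len(card_taps):
                nxt = card_taps[i + 1]
                # Assume the rider got off where they next tapped on, but only
                # if the route they were riding actually serves that stop.
                if nxt.stop in STOPS_ON_ROUTE.get(tap.route, set()):
                    dest = nxt.stop
            result.append((tap, dest))
    return result

taps = [
    Tap("card_a", 8 * 60, "Green", "Copley"),
    Tap("card_a", 8 * 60 + 15, "57", "Kenmore"),
    Tap("card_a", 11 * 60 + 15, "86", "Brighton Center"),
]
destinations = [dest for _, dest in infer_destinations(taps)]
```

The last tap of the day has no following tap, so its destination stays unknown in this sketch.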
 
Funny enough, I've actually had some questions on this topic recently:

1. How much do transit agencies balance the use of APCs and ODX? The APC data for MBTA buses each fall is easily accessible, but a major drawback is that it doesn't capture origin-destination pairs well*. On the other hand, ODX data is highly inaccessible, and I'm not sure whether that's because they haven't run it for a long time or just don't publicize the data due to privacy and other concerns.
  • (*) For example, suppose 10 passengers board the 66 bus at Harvard, 10 board at Union Square Allston, 10 alight at Coolidge Corner, and 10 alight at Roxbury Crossing. The same data can be interpreted with two extremes: one would indicate that people use the 66 for either Red-to-Orange transfers and shorter trips to nearby neighborhoods, and another would suggest that there's demand from the Allston-Brighton-CC area specifically to Harvard and Roxbury that can't be served well by anything along the Green Line.
2. I saw that MBTA had a post in 2016 about the Origin-Destination-Transfer (ODX) model developed by a group of MIT researchers, but the demonstration website (below the video) no longer works. Does anyone happen to know if there's still accessible data on the model (for MBTA)?
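The route 66 footnote above can be checked with a few lines of Python. This is only an illustration of the ambiguity: the two origin-destination tables below are the hypothetical extremes from the example, and `on_off_counts` is a helper invented here to collapse a table into the kind of per-stop totals an APC would report.

```python
from collections import Counter

def on_off_counts(od_flows):
    """Collapse an origin-destination table into per-stop boardings and alightings."""
    ons, offs = Counter(), Counter()
    for (origin, destination), riders in od_flows.items():
        ons[origin] += riders
        offs[destination] += riders
    return ons, offs

# Extreme 1: Red-to-Orange transfers plus short neighborhood trips.
through_riders = {
    ("Harvard", "Roxbury Crossing"): 10,
    ("Union Square Allston", "Coolidge Corner"): 10,
}
# Extreme 2: Allston-Brighton-CC demand to Harvard and to Roxbury.
local_riders = {
    ("Harvard", "Coolidge Corner"): 10,
    ("Union Square Allston", "Roxbury Crossing"): 10,
}

same_counts = on_off_counts(through_riders) == on_off_counts(local_riders)
```

Both tables produce identical ons and offs at every stop, which is why stop-level counts alone can't tell the two stories apart.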
 
Because of how detailed ODX data is, it is not publicly accessible due to privacy concerns. Even though the actual fare card numbers are anonymized (except for when using known trips to verify), all trips with a given fare card are still grouped because of the way the algorithm works. When I was working with the data as a student, I could find myself just by knowing my regular commute, as I was the only person in the system who regularly commuted between the two specific endpoints. Even a rider with a more common pattern would be easy to find just by knowing one or two less common single trips they made.

More aggregated data can be safely shared. Group by hour, by line, by region, whatever, and there's no longer personally identifiable data. For example, BART (which is tap-in/tap-out) publishes station ridership data as an origin/destination matrix.
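As a rough sketch of that kind of aggregation (the record layout and station names here are invented, not the format of any real ODX or BART export), you can drop the card ID and count trips per hour and origin/destination pair:

```python
from collections import Counter

# Hypothetical card-level records: (card_id, origin, destination, minute_of_day)
trips = [
    ("c1", "A", "B", 8 * 60 + 5),
    ("c2", "A", "B", 8 * 60 + 40),
    ("c3", "A", "C", 8 * 60 + 10),
    ("c1", "B", "A", 17 * 60 + 30),
]

def hourly_od_matrix(trips):
    """Count trips per (hour, origin, destination), discarding the card ID."""
    counts = Counter()
    for _card, origin, destination, minute in trips:
        counts[(minute // 60, origin, destination)] += 1
    return counts

matrix = hourly_od_matrix(trips)
```

The output no longer links a morning trip to an evening trip by the same card. Whether the grouping is coarse enough for a given dataset is a separate judgment this sketch doesn't make.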
 
OK, but doesn't ODX data miss all the passengers paying in cash or single trip tix? In poorer neighborhoods most bus passengers seem to pay in cash.
 
I was the only person in the system who regularly commuted between the two specific endpoints.
Do the "tap-on" bus events get registered at the granularity of the individual stop? When I first saw this comment, I was blown away by the possibility that, for example, there might be only one individual specifically commuting between (say) Woodland and Northeastern Prudential. But then it occurred to me that journeys from specific bus stops might reach that level of uniqueness.
 
OK, but doesn't ODX data miss all the passengers paying in cash or single trip tix? In poorer neighborhoods most bus passengers seem to pay in cash.

It does, just as nonpayment does. This is generally accounted for by scaling the number of known trips: if 10% of passengers (per APC) pay in cash or don't pay, then you scale all known trips up by that much. (Tap-in/tap-out systems scale to account for those who hop the faregate.) You can even scale just the trips originating at a particular stop or area if the cash or nonpayment rate varies across the route.
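A minimal sketch of that scaling, assuming you have per-stop APC boardings and per-stop card boardings to compare (the function names and numbers are made up for illustration). Note that if 10% of boardings are missing from the card data, the factor works out to 100/90, a bit more than 1.1:

```python
def expansion_factors(apc_boardings, card_boardings):
    """Per-stop factor that scales card-based trips up to the APC total."""
    return {
        stop: apc_boardings[stop] / card_boardings[stop]
        for stop in apc_boardings
        if card_boardings.get(stop, 0) > 0
    }

def scale_od(od_flows, factors):
    """Scale each origin's known trips by that origin's expansion factor."""
    return {
        (origin, dest): riders * factors.get(origin, 1.0)
        for (origin, dest), riders in od_flows.items()
    }

# Hypothetical counts: Stop A has 10% of boardings missing from card data.
apc = {"Stop A": 100, "Stop B": 50}
cards = {"Stop A": 90, "Stop B": 50}
od = {
    ("Stop A", "Stop C"): 60,
    ("Stop A", "Stop D"): 30,
    ("Stop B", "Stop D"): 50,
}

factors = expansion_factors(apc, cards)
scaled = scale_od(od, factors)
```

Scaling by origin stop like this bakes in the assumption that the missing riders at a stop go to the same places as the card riders boarding there.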
Do the "tap-on" bus events get registered at the granularity of the individual stop? When I first saw this comment, I was blown away by the possibility that, for example, there might be only one individual specifically commuting between (say) Woodland and Northeastern Prudential. But then it occurred to me that journeys from specific bus stops might reach that level of uniqueness.

Yes, they do - the timestamp of the tap is correlated with the vehicle location data to determine the stop. There probably are some specific subway stop pairings with one (or none) regular riders. With the BART data, with 165,000 daily rides over just 50 stations, there are a dozen or so station pairs averaging no daily riders! In my case, I was the only rider between a certain Green Line surface stop and the route 1 stop outside MIT Building 7.
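The tap-to-stop matching can be sketched as a lookup against the vehicle's location log. This assumes a simplified log of (time, stop) arrivals for a single bus; the data and the "last stop reached at or before the tap" rule are illustrative assumptions, not the actual MBTA logic.

```python
from bisect import bisect_right

# Hypothetical location log for one bus: (seconds since midnight, stop), sorted by time.
avl_log = [
    (28800, "Stop 1"),
    (28920, "Stop 2"),
    (29100, "Stop 3"),
]

def stop_for_tap(tap_time, avl_log):
    """Return the last stop the vehicle reached at or before the tap, else None."""
    times = [t for t, _ in avl_log]
    idx = bisect_right(times, tap_time) - 1
    return avl_log[idx][1] if idx >= 0 else None

matched = stop_for_tap(28950, avl_log)
```

A tap that comes before the first logged arrival returns `None` here rather than guessing.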
 
Interesting! Yeah, I figured the precision (and therefore possible number of unique journeys) would be greater on journeys involving multiple surface routes —Green to Green, Green to bus, bus to bus — but that’s fascinating about the BART data (and the possibility of something similar in Boston — though, BART is more commuter rail-like, especially in its western half, and it wouldn’t surprise me if there were MBTA CR station-pairs with no riders).

(EDIT: Holy crap, BART’s Orange Line is something like 50 miles long. That’d be like a line running from Lowell to Attleboro using heavy rail rolling stock. Admittedly, its headways are 20 min, but I believe it’s interlined the whole way, yielding cumulative frequencies in the rapid transit range.)
 
It does, just as nonpayment does. This is generally accounted for by scaling the number of known trips: if 10% of passengers (per APC) pay in cash or don't pay, then you scale all known trips up by that much. (Tap-in/tap-out systems scale to account for those who hop the faregate.) You can even scale just the trips originating at a particular stop or area if the cash or nonpayment rate varies across the route.


I think you missed my point about ODX data.

Sure, you can scale the card purchases up to account for the number of cash riders. But an underlying assumption is being made that the transit use pattern of cash riders is the same as that of card payers. Yet we know that cash riders are in a different socio-economic stratum from card payers. So that assumption may not be valid, and we may not be accounting for their real transit use patterns (and hence needs) at all.
 
With modern-day technology, the anonymous location of everyone’s GPS-capable cell phone is available to be purchased by private companies. This is what the MBTA used for trip data in the Bus Network Redesign.
According to the project and company pages, their algorithm uses travel velocity and other metrics derived from GPS, which also lets them determine how someone accessed a transit station, e.g. walking or biking. The problem is that it needs to be purchased from a 3rd party, as opposed to being gathered from the T’s own infrastructure.
 
Yes, you're entirely right. There will be assumptions made no matter what system you're using; one of the jobs of those processing the data (and sometimes that of the planners using the data) is to check whether those assumptions are valid. One of the ways to check this assumption would be to compare the ODX data to the APC data. If cash users are riding differently than other riders, the calculated loads from the two methods won't match, and from that you could infer the patterns of cash users. You can also look at the locations of cash boardings in both directions - if the vast majority in one direction are at the terminal, you can safely assume that the vast majority in the other direction are going to the terminal.
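Here's a rough sketch of that ODX-vs-APC load comparison on a made-up four-stop route. The counts are invented, and the ODX figures are assumed to have already been scaled up to match total APC boardings:

```python
from itertools import accumulate

def load_profile(ons, offs):
    """Passengers on board after each stop, from per-stop ons and offs."""
    return list(accumulate(on - off for on, off in zip(ons, offs)))

# Hypothetical per-stop counts along one trip.
apc_ons, apc_offs = [40, 10, 0, 0], [0, 10, 15, 25]   # everyone, per door sensors
odx_ons, odx_offs = [40, 10, 0, 0], [0, 5, 20, 25]    # card riders, scaled up

apc_load = load_profile(apc_ons, apc_offs)
odx_load = load_profile(odx_ons, odx_offs)

# Negative gap: fewer people actually on board than the card data predicts.
gap = [a - o for a, o in zip(apc_load, odx_load)]
```

In this toy case the only mismatch is after the second stop, which would suggest that riders missing from the card data get off there more often than card riders do.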

It's also important to distinguish between users paying in cash, and users refilling a CharlieCard in cash then tapping. The latter will still be associated with the card for ODX purposes. This 2019 thesis (page 48) indicates that 8.8% of farebox interactions involve cash: 5% are cash fares, while 3.8% are CharlieCard refills.

That thesis also indicates that cash boardings are heavily concentrated at a relatively small number of stops. Out of the 7000+ farebox stops (bus stops, Green Line surface stops, and Mattapan Line stops), just 0.52% (less than 40) accounted for 20% of cash boardings. (Those were mostly major rapid transit transfers - which already have fare machines that passengers should use instead - plus a few major destinations like malls.) Half of cash boardings were at ~250 stops. That makes it a lot easier to infer overall travel patterns.
 
With modern-day technology, the anonymous location of everyone’s GPS-capable cell phone is available to be purchased by private companies. This is what the MBTA used for trip data in the Bus Network Redesign.
According to the project and company pages, their algorithm uses travel velocity and other metrics derived from GPS, which also lets them determine how someone accessed a transit station, e.g. walking or biking. The problem is that it needs to be purchased from a 3rd party, as opposed to being gathered from the T’s own infrastructure.
Holy crap.

Wow, that's exactly the dataset I need.

(Edit: They have "Boston Back Bay" to the north of "Boston Copley". Totally not confusing at all.)
 
Okay, I have to put this in a separate post because I can't get over how ridiculous this is:

[Attached image: 1705134407630.png]
 
(This reminds me of the NY Times neighborhood mapping project.)

I'm actually not particularly bothered by this!
  • The east-west grid north of Boylston is definitely Back Bay -- it literally was the bay at the "back" of Boston before being filled
  • The neighborhood south of Columbus Ave (and maybe simply south of the Orange Line) is definitely the South End
    • Yes, this means that "Back Bay Station" is less clearly in the Back Bay than it might be
    • Which probably is why (especially historically) the station was signed as "Back Bay/South End", suggesting that it is between the two neighborhoods
    • "Back Bay/South End" examples:
  • The area between the Orange Line and Boylston St is definitely more ambiguous
  • Plus, the character north of Boylston is significantly different than south
    • North is more residential and "shorter"
    • While south of Boylston is almost entirely the High Spine
That last point, in particular, is why it seems like it would make sense to treat these as separate neighborhoods for the purposes of analyzing commuting data (even if not for formal neighborhood demarcation).
 
I do agree with treating the two regions separately (and it already uncovers interesting data on demands, particularly when it comes to Cambridge - I can share the analysis if there's interest). I just think there's a better way of naming them, especially for the southern part that they call "Boston Copley", which doesn't even include the actual Copley Square.

Perhaps "Boston Prudential" would be more indicative of the region it's referring to. Or call them "Boston Back Bay N" and "Boston Back Bay S", which have many precedents in the dataset (Harvard was split into E, W and N).

  • Plus, the character north of Boylston is significantly different than south
    • North is more residential and "shorter"
    • While south of Boylston is almost entirely the High Spine
Also note that the entire shopping district of Newbury St is to the north of the dividing line.
 
Milton vote next month will be a crucial test for state’s ambitious new housing law
It will be the greatest test yet for what’s known as the MBTA Communities Act, which compels cities and towns served by the transit agency to zone for multifamily housing and is widely considered Massachusetts’ most powerful tool for chipping away at the housing crisis.

A yes vote by Milton residents would be a major victory for the state — all 12 communities that were required under the law to create new zoning by the end of 2023 will have done so. (None of the other 11 had community-wide referendums on their zoning changes.) A no vote could spark an ugly chain of events that might include financial penalties and legal action against Milton by the attorney general’s office and, advocates say, send a signal to other towns that compliance is optional.
The vote is set for February 13.
 
Kind of an obvious point, but anecdotally my buddy has decided to start permanently commuting 30+ minutes by car to a different office rather than take the Green Line to the downtown office amidst the shutdowns. I also rely on the Green Line to get between Cambridge and Allston a couple times a week and have started biking/ubering rather than spending ~1.5 hours traveling with 4 or so transfers (69 bus, Green Line, Orange Line, Green Line Shuttle). I know the shutdowns and restrictions are necessary, but in terms of respecting transit riders we haven't done enough. There are material effects on our quality of life.
 
I just think there's a better way of naming them, especially for the southern part that they call "Boston Copley", which doesn't even include the actual Copley Square.
Omg I missed this. Okay yeah that’s bad, hahaha. I like Boston Prudential too.
 
Kind of an obvious point, but anecdotally my buddy has decided to start permanently commuting 30+ minutes by car to a different office rather than take the Green Line to the downtown office amidst the shutdowns. I also rely on the Green Line to get between Cambridge and Allston a couple times a week and have started biking/ubering rather than spending ~1.5 hours traveling with 4 or so transfers (69 bus, Green Line, Orange Line, Green Line Shuttle). I know the shutdowns and restrictions are necessary, but in terms of respecting transit riders we haven't done enough. There are material effects on our quality of life.
The thing is, the alternative is to let all riders endure a slow, unsafe ride for an eternity, which is much worse at "respecting transit riders" and their "quality of life".
 
The back-and-forth of doing the Green Line shutdowns in 3 separate phases seems like poor planning to me. I would think it would be best to just do one long 36-day shutdown between North Station and Kenmore instead of three 12-day shutdowns.

One long 36-day shutdown would be easier, as riders would just adjust to shuttles at the start and return to full-speed service at the end. Instead we've had this flip-flop over the past 2 months: close for 12 days, open for 12 days, close for 12 days, open for 3 days, close for 12 days, reopen again. That flip-flopping is more disruptive IMO.
 
It appears that the shutdowns (Dec 11-20 for the D branch*, Jan 3-12, Jan 16-28) and their gaps are mostly scheduled to avoid public holidays (Christmas, New Year, MLK day). I assume it was deemed undesirable or infeasible for track workers to report to work on those days.

* Note that for non-D-branch riders, the previous shutdown was on Dec 3, a month before the January ones.
 