Add an available vehicles endpoint to provider #310

Closed
2 of 6 tasks
johnpena opened this issue May 9, 2019 · 13 comments · Fixed by #376
Labels
enhancement New feature or request Provider Specific to the Provider API
Milestone

Comments

@johnpena

johnpena commented May 9, 2019

Is your feature request related to a problem? Please describe.

Agencies often use our MDS status change feed to figure out which vehicles are available in their region. Many are trying to calculate parking caps, or trying to get simple counts of vehicle availability. Using status changes is problematic. Most agencies try to do this by replaying all of the status changes in their feed, in an attempt to replay all the state changes in the vehicle state machine. Most just want to answer the simple question: what's the current status of the provider's fleet?
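To make the replay approach concrete, here is a minimal sketch of what an agency ends up doing today. The `StatusChange` type and the `replay` helper are hypothetical, and the sketch assumes that each event's `event_type` names the state the vehicle ends up in, which simplifies the real state machine.

```python
from typing import Dict, Iterable, NamedTuple


class StatusChange(NamedTuple):
    device_id: str
    event_type: str  # assumption: names the resulting state, e.g. "available"
    event_time: int  # assumption: milliseconds since the Unix epoch


def replay(changes: Iterable[StatusChange]) -> Dict[str, str]:
    """Return the latest known state per device after replaying every event."""
    state: Dict[str, str] = {}
    # Process events in time order so later events overwrite earlier ones.
    for change in sorted(changes, key=lambda c: c.event_time):
        state[change.device_id] = change.event_type
    return state


feed = [
    StatusChange("a", "available", 1000),
    StatusChange("b", "available", 1500),
    StatusChange("a", "reserved", 2000),
    StatusChange("a", "available", 3000),
    StatusChange("b", "removed", 3500),
]
current = replay(feed)
```

Any missing or late event in the feed silently produces a wrong state here, which is the fragility a current-state endpoint would avoid.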

Describe the solution you'd like

A vehicles endpoint in MDS provider, similar to the one in MDS agency, wherein a provider publishes a list of the vehicles that are currently registered in the region. This endpoint could take as inspiration the same functionality that's provided in GBFS. It could contain the following information:

  • Device ID/Vehicle ID
  • Most recent Lat/Long
  • Current status/availability
  • Most recent event ID
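As a rough sketch of what one entry in such a response could carry, the four fields above might map onto a record like the following. The `VehicleRecord` class and all field names are assumptions for illustration, not part of any MDS schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VehicleRecord:
    """Hypothetical shape of one vehicle in the proposed endpoint."""
    device_id: str                # provider-internal identifier
    vehicle_id: str               # identifier visible on the vehicle
    lat: float                    # most recent latitude
    lng: float                    # most recent longitude
    status: str                   # current status, e.g. "available"
    last_event_id: Optional[str]  # most recent event, if any


record = VehicleRecord(
    device_id="device-1",
    vehicle_id="ABC123",
    lat=34.05,
    lng=-118.25,
    status="unavailable",
    last_event_id=None,
)
payload = asdict(record)  # plain dict, ready for JSON serialization
```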

Is this a breaking change

  • Yes, breaking
  • No, not breaking
  • I'm not sure

Provider or agency

For which API is this feature being requested:

  • provider
  • agency
  • both

Describe alternatives you've considered

The most obvious alternative is to add additional status change types, but this would still require agencies to do their own analysis to replay each vehicle's state machine. I believe this work should be done regardless. But since many agencies are trying to understand vehicle availability, I believe it would be easiest for all parties if the provider made this available through a simple API where little-to-no analytics or data processing would be necessary.

@black-tea
Contributor

Agencies often use our MDS status change feed to figure out which vehicles are available in their region.

Why not just use GBFS to serve this functionality? To me this seems like the most obvious alternative, and I believe the standard includes 3/4 pieces of information you are looking for (with the exception being the most recent event ID).

I know that there is still work being done to generalize the standard for dockless (and companies have implemented their own various flavors of it for scooters / other devices), but many of them (including Lime) already have GBFS for various places. In fact, there is a field in providers.csv to provide the GBFS discovery URL. Now that you mention it, Lime's is empty....😳

I guess the next question would be: in what case would you want fleet status information that wouldn't be shown in an open, public feed? If I am understanding correctly that your focus is on availability, I'd be curious to hear how GBFS falls short.

@rf-
Contributor

rf- commented May 9, 2019

For use cases like parking caps, cities often want to see both available and unavailable devices to make sure they're getting a full picture of the provider's presence in the ROW. GBFS doesn't work for this use case since it only includes available devices.
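A small sketch of the difference, assuming a hypothetical list of vehicle records with a `status` field: counting only available devices (what a GBFS-style feed supports) understates the provider's presence compared with counting every device in the list.

```python
from collections import Counter
from typing import Dict, Iterable


def count_by_status(vehicles: Iterable[Dict[str, str]]) -> Counter:
    """Tally vehicles per status value."""
    return Counter(v["status"] for v in vehicles)


# Hypothetical snapshot of a provider's fleet.
fleet = [
    {"device_id": "a", "status": "available"},
    {"device_id": "b", "status": "unavailable"},
    {"device_id": "c", "status": "available"},
    {"device_id": "d", "status": "reserved"},
]
counts = count_by_status(fleet)
available_only = counts["available"]  # what an available-only feed would show
full_presence = sum(counts.values())  # every device in the snapshot
```

Which statuses should count toward a cap is a policy decision for each city; the sketch only shows that the data has to include unavailable devices for that decision to be possible.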

@dyakovlev
Contributor

some providers also purposefully don't expose consistent/reconcilable device identifiers between MDS and GBFS.

@black-tea
Contributor

ah...I see.

@thekaveman
Collaborator

Agencies often use our MDS status change feed to figure out which vehicles are available in their region... Most just want to answer the simple question: what's the current status of the provider's fleet?

I don't mean to sound dismissive, but to me this proposal does not address the above concern.

some providers also purposefully don't expose consistent/reconcilable device identifiers between MDS and GBFS.

Agencies need raw data to be able to do the analysis precisely because getting reliable device counts/information from companies is so problematic.

This endpoint would further remove consumers of MDS Provider from the raw data and set up an implicit trust in any given company's methods of counting/aggregation. From the basic "level playing field" point of view, how is a regulatory agency supposed to know if Company A altered their counts or otherwise counted in a "unique" way, different or altogether incompatible with Company B, C, D, etc.?

@fscottfoti

fscottfoti commented May 14, 2019

@thekaveman I don't think @johnpena was suggesting pre-aggregating or pre-counting with provider-specific methods, but rather to provide the raw state of vehicles at a given point in time similar to GBFS but including non-public info like real device_ids and the state of non-available vehicles. Since status_changes is a messy feed at this point (and probably always will be), it seems super helpful to be able to check against the current state periodically.

The main problem I see with the proposal is that the nice thing about status_changes is it's a concise way to represent several months of data (basically it's a differential encoding), whereas this "current state" functionality would be quite large to represent historically with any granularity (GBFS gets around this by only giving the live state). Still it would be quite nice to have access to the full state of the system historically with something like e.g. an hourly granularity.

@billdirks
Contributor

We spoke briefly about this at the MDS meeting and discussed a similar solution to what @fscottfoti suggests (though nothing was decided on). Consumers of MDS want to get an accurate world state both now and at times in the past. As long as status_changes are reported in a timely manner, a non-public feed which provided the positions, state, and perhaps other information, of all available and unavailable vehicles at a regular interval (e.g. once per day at midnight, once per hour, or some other interval) would allow consumers to reconstruct the world state now or at any time without replaying all status change events.
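One way to picture this is a sketch in which the consumer starts from the latest snapshot at or before the time of interest and replays only the events after it. The `state_at` helper and the tuple layouts are hypothetical, and event statuses are assumed to name the resulting state.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, str]]  # (snapshot_time, {device_id: status})
Change = Tuple[int, str, str]          # (event_time, device_id, new_status)


def state_at(t: int, snapshots: List[Snapshot], changes: List[Change]) -> Dict[str, str]:
    """Reconstruct fleet state at time t from a snapshot plus later events."""
    candidates = [s for s in snapshots if s[0] <= t]
    if not candidates:
        raise ValueError("no snapshot at or before the requested time")
    snap_time, snap_state = max(candidates, key=lambda s: s[0])
    state = dict(snap_state)  # copy so the stored snapshot is not mutated
    # Only events after the snapshot and up to t need to be replayed.
    for _, device_id, status in sorted(c for c in changes if snap_time < c[0] <= t):
        state[device_id] = status
    return state


snapshots = [
    (0, {"a": "available"}),
    (100, {"a": "reserved", "b": "available"}),
]
changes = [
    (50, "a", "reserved"),
    (120, "b", "unavailable"),
    (180, "a", "available"),
]
at_150 = state_at(150, snapshots, changes)
```

The replay window is bounded by the snapshot interval, so an error in an old event cannot propagate past the next snapshot.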

@johnpena
Author

To be clear, I'm not proposing removing status_changes, or altering anything about it at this time. To echo what others have already said, I'm proposing something like GBFS, but with the addition of unavailable vehicles, device IDs, a most recent event ID, most recent trip ID, and possibly other helpful information.

allow consumers to reconstruct the world state now or at any time without replaying all status change events

This is the primary use case I would like to address with this change. Replaying status changes is error prone and something it seems a lot of cities would rather not do.

whereas this "current state" functionality would be quite large to represent historically with any granularity (GBFS gets around this by only giving the live state). Still it would be quite nice to have access to the full state of the system historically with something like e.g. an hourly granularity.

It's a fair point that retaining historical data might be cumbersome, but it's something we should consider. The fact that GBFS only gives you a live snapshot rules it out for a lot of interesting use cases.

hunterowens added this to the 0.3.2 milestone May 21, 2019

@craastad
Contributor

Thank you @johnpena for creating this issue. This is exactly what I was suggesting one or two MDS calls ago. I believe cities want GBFS implemented for the free_bike_status.json call, which lacks the currently riding vehicles. With this under the GBFS umbrella, the ids won't necessarily line up with MDS ids.

As a provider I may choose to implement the use cases of GBFS and MDS differently.

As long as status_changes are reported in a timely manner, a non-public feed ... would allow consumers to reconstruct the world state now or at any time without replaying all status change events

MDS provider APIs do not specify what counts as timely. I interpret this as allowing the use of DB replicas, cron jobs, etc., where some seconds or minutes of lag are acceptable. MDS agency attempts to alleviate this, but hasn't seen much adoption among cities. A realtime API would require realtime data sources and be implemented like GBFS free_bike_status but with more data allowed in an authenticated API.

Implementing this feature would allow MDS to remove the dependency on the subset of GBFS outlined in the Realtime Data of the spec. Is the goal of MDS to replace GBFS for scooter and dockless vehicle shares? Or complement the GBFS spec with more data?

@thekaveman
Collaborator

@fscottfoti: I don't think @johnpena was suggesting pre-aggregating or pre-counting with provider-specific methods, but rather to provide the raw state of vehicles at a given point in time similar to GBFS but including non-public info like real device_ids and the state of non-available vehicles.

Right, and I could have chosen my words more carefully... I was trying to get across that status_changes and trips map back to (theoretically anyway) real-world events: a device was placed on the street, a trip started, etc. These roughly correlate with business activity - the companies must pay their device chargers/deployers and must charge their customers for usage.

Whereas this proposal suggests a more subjective stream of data - what a given company "thinks" is out there; this is slightly removed from the ground-truth of operations that the current endpoints were designed to capture.

Under this re-framing, my main point still stands:

...how is a regulatory agency supposed to know if Company A altered their counts or otherwise counted in a "unique" way, different or altogether incompatible with Company B, C, D, etc.

If Company A decides "these are the devices we have available right now" - how did they make that decision? What conditions determine whether a device makes it on this list or not? Do Companies B, C, D, and all others share those conditions and make those decisions the same way? Even the relatively simple description of this endpoint, "... a list of the vehicles that are currently registered in the region", contains a ton of ambiguity - what does "currently registered" mean (especially in the absence of an agency implementation)? What region? Who decides the region?

From the standpoint of fairly administering a regulation, e.g. for parking or device caps: we need consistency in how these things are measured/accounted for. I'm concerned that this proposal does not address that need for consistency.

@johnpena
Author

@thekaveman you bring up good points, but a lot of these strike me as broader issues with MDS that are outside the scope of this issue on its own.

With respect to determining the set of vehicles published in this endpoint, I think we could specify that the provider should include any vehicle for which it would publish a status change or trip during that time period.

@billdirks
Contributor

We discussed this issue in the weekly Thursday meeting this week. I believe there was a consistently voiced need for a well-defined way to calculate realtime metrics for caps.

My concern with a realtime-only endpoint is that if a consumer's (e.g. a city's) system goes down for some time window, they miss data with no way to recover it.

@fscottfoti

I was thinking today it might be useful to separate this into two separate problems (which I sort of helped conflate in my comments above, sorry).

The original intent of the issue as presented by @johnpena was to provide a simple mechanism to check caps for cities, inspired by GBFS and possibly only for the current state of the system (no historic data). This is mainly to allow cities to see the fleet of vehicles out there right now with some privacy protections that GBFS wouldn't provide.

In the spirit of @thekaveman's comments, perhaps this is insufficient to compute a more nuanced definition of cap compliance (e.g. a rolling average of some kind) and for that we need historic data. Or like @billdirks says, maybe we want to compute cap compliance for some time before we started harvesting the feed, or the feed went down for a bit.
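As an illustration of why a rolling definition needs history, here is a sketch that compares an instantaneous check with a rolling-average check over hourly vehicle counts. The counts, the cap, and the three-hour window are made-up values, and `rolling_mean` is a hypothetical helper.

```python
from typing import List


def rolling_mean(counts: List[int], window: int) -> List[float]:
    """Mean of each run of `window` consecutive counts, in order."""
    if window <= 0:
        raise ValueError("window must be positive")
    means: List[float] = []
    total = 0
    for i, count in enumerate(counts):
        total += count
        if i >= window:
            total -= counts[i - window]  # drop the value leaving the window
        if i >= window - 1:
            means.append(total / window)
    return means


hourly_counts = [90, 110, 100, 120, 80]  # made-up hourly fleet sizes
cap = 110                                # made-up cap

means = rolling_mean(hourly_counts, 3)
instant_ok = all(c <= cap for c in hourly_counts)
rolling_ok = all(m <= cap for m in means)
```

With these numbers the fleet briefly exceeds the cap in one hour, so the instantaneous check fails while the three-hour rolling average stays within it. A live-only snapshot cannot distinguish the two definitions.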

My biggest problem is that MDS feeds are noisy - some events aren't recorded, and I think we frequently have no evidence of when a vehicle stops pinging the operator. Basically we have no representation of what's out there at a given point in time, we just have its most recent event in the status_changes feed to that point in time. It's an excellent compression scheme, but it's a lossy compression!

I had proposed using this quasi-GBFS feed as a way to get the state of the system at any historic point in time, but alternatively, what if we just added a "ping" event that essentially says "still available" at some reasonable cadence, perhaps as infrequent as an hourly basis. This would solve having to create assumptions about when a vehicle disappears from the feed, at least within an hourly resolution. I can think of other ways to do this rather than a ping - operators could rigorously add service_ends after they haven't heard from a vehicle for a given period of time - but it might be nice to have the confidence of a good solid ping ;)
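A sketch of how a consumer might use such pings, assuming timestamps in milliseconds and a hypothetical `present_at` helper: a vehicle is treated as present at time `t` only if its most recent ping at or before `t` is no older than the agreed cadence.

```python
from typing import Iterable

HOUR_MS = 60 * 60 * 1000  # assumed hourly ping cadence, in milliseconds


def present_at(t: int, ping_times: Iterable[int], max_gap: int = HOUR_MS) -> bool:
    """True if the latest ping at or before t is within max_gap of t."""
    earlier = [p for p in ping_times if p <= t]
    return bool(earlier) and t - max(earlier) <= max_gap


pings = [0, HOUR_MS, 2 * HOUR_MS]  # vehicle last heard from at hour 2

still_there = present_at(2 * HOUR_MS + 1000, pings)  # shortly after last ping
gone_quiet = present_at(4 * HOUR_MS, pings)          # two hours of silence
```

The consumer no longer has to guess when a vehicle dropped out of the feed; the uncertainty is bounded by `max_gap`.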

Even if you disagree with that specific proposal, the larger point still holds. One could have the new endpoint proposed by @johnpena to give a GBFS-like view of the current world out there and we could solve the messiness of status_changes some other way, in order to give sufficient flexibility in the definition of cap compliance. Put another way, perhaps the endpoint proposed here should give "the vehicles that are available in the city right now" rather than "cap compliance."

10 participants