[Requirements] Ability to express data sharing requirements in Policy #608

Closed
quicklywilliam opened this issue Dec 16, 2020 · 34 comments · Fixed by #646
Labels: Policy (Specific to the Policy API), privacy (Implications around privacy for the attention of the OMF Privacy Committee)

@quicklywilliam
Contributor

quicklywilliam commented Dec 16, 2020

Is your feature request related to a problem? Please describe.

It has come up in a number of contexts that it would be useful to allow agencies to express data sharing requirements in MDS. I'm probably forgetting a few use cases, but here are some I have heard:

  • In Provider Reports - Static #607, allow agencies to state which geographies they would like aggregate metrics or reports for
  • In Method to Exclude some Provider Fields from Response #507, allow certain fields or endpoints to become "optional" in the sense of not being required by the spec, but still allow them to be "required" in the sense of the city explicitly requiring them of providers
  • In Geography-Driven Events #503, allow agencies to explicitly opt in to new optional or beta endpoints
  • In Make Policy/Geography API Endpoints Optionally Public #585, allow agencies to invite transparency and public accountability around the data they collect and why
  • Improve data security by allowing operators to allocate access properly so that a city that does not request a given field/endpoint does not in fact have access to it
  • Improve system efficiency by allowing operators to know ahead of time which fields/endpoints need to be stored and/or precomputed

Describe the solution you'd like

I think of MDS Policy as the digital representation of a city's mobility program policies: we are digitizing their program PDF, in a sense. Data sharing requirements, including the requirement to provide MDS, are common in nearly any such policy. Hence, I think this belongs in MDS and in Policy specifically. One possibility that might meet some of the use cases above would be adding a new rule type or set of rule types within the Policy endpoint that describe a city's data sharing needs.

Is this a breaking change

I think this could probably be done in a non-breaking way.

Impacted Spec

Policy.

Describe alternatives you've considered

There has also been discussion of this living in a new endpoint. I'm open to that, but I like the simplicity of having a single (public!) endpoint return everything about a city's policy. In addition, there are often direct relationships between rules and data sharing needs (cities need data to assess compliance with a rule). I think these relationships might be made more clear and explicit through a carefully considered augmentation of MDS Policy.

@marie-x
Collaborator

marie-x commented Dec 16, 2020

I have given some thought to putting this information in the Jurisdictions API, but I'm not strongly attached to that.

@schnuerle
Member

See notes from the WG Call today. Some good interest in this concept for the next release.

  • Discovery for what cities require - APIs, endpoints, optional fields, version supported
  • Can be tied to a multi-jurisdictional thing
  • Like a specification guide for cities and MDS - digital representation of what’s in the data section of the permit (as it relates to MDS).
  • OMF could create a process/checklist to pick what you want and generate text you put into the permit.
  • Permits can reference this API directly
  • Meg from Baltimore: immensely helpful - could be public for discovery and aggregation and transparency
  • Would be public like Policy, Geography
  • Could be part of Policy, but it seems clearer to have it be a higher level API served independently.

@schnuerle
Member

See notes from the WG call this week.

  • Meg Baltimore - would be great for clarity for what cities need
  • Steve: auto-catalog these feeds? Maybe if needed, but manual collection could work too.
  • Version settings - each supported version would be in the file
  • Good idea to collect options in the Issue, leave thoughts

@schnuerle schnuerle added the Policy Specific to the Policy API label Jan 25, 2021
@schnuerle schnuerle added this to the Next Release milestone Jan 25, 2021
@schnuerle
Member

We will be discussing this topic at tomorrow's Working Group meeting.

@schnuerle
Member

schnuerle commented Jan 29, 2021

Notes from the public working group meeting:

  • In which areas/APIs is GPS required, or where could you specify geographies?
  • How does aggregation fit into this, GDEs?
  • Could add telemetry information requirements to metadata
  • Is this for discovery, eg. a root level api?
  • What can be added to existing APIs like Policy/Jurisdiction? Timezone, contact info could be in Jurisdiction
  • What makes it a new API?
  • Healthy for this to be public for transparency and accountability, able to see what agencies are requiring and using
  • Need to add versioning/date changed. And guidance with policy. Need to talk to providers just like Policy now.
  • Notification period could be a parameter in metadata section.

See a draft proposal for this from me (@schnuerle). Open for comments here or directly in the document:

https://docs.google.com/document/d/1zHhg9YpGhp63mu1T3xUSRxfMmPSk1brn78ER5thTnCI/edit

Open Questions (also in the draft document):

  1. Published publicly or authenticated?
  2. Provider IDs per MDS version or per Requirements API feed? May want to ask some providers to use older versions for specific infrastructure or use case needs, eg, older docked bikeshare systems, or newer delivery robots.
  3. Any other fields that could be added to the agency metadata area?
  4. Should it be called Requirements, or are there better names, like Manifest?
  5. What level should it go to, eg, endpoint only or down to optional fields?
  6. Should this be in Policy, or a higher level API?
  7. Does vehicle type (or mode) need to be added as a distinction?

@schnuerle schnuerle modified the milestones: Next Release, 1.2.0 Feb 12, 2021
@schnuerle
Member

I've made some additions to the draft document based on some conversations with folks. Notably:

Summary of benefits of doing this (any others?):

A JSON file that explains all the parts of MDS that an agency would like to see from providers, as well as information about the APIs the agency is serving up, can help:

  1. clarify vague wording in agency policy documents.
  2. take some weight off the shoulders of providers for custom work and special code needed for each city’s unique requirements interpretation.
  3. allow discovery of the agency URLs, like GBFS does, for both the public and providers.
  4. allow agreement and clarity on provider URLs as communicated to agencies.
  5. provide the OMF with information about which agencies and providers are using which MDS versions, APIs, endpoints, and optional variables.

Addition of an agency id field

agency_uuid - Created just like providers.csv, but in a new agencies.csv, with agency name, website, and Requirements API URL. Would help with discovery of MDS usage by agencies, providers, the public, etc., and could be used in Jurisdictions.
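
One way the proposed agencies.csv could work, sketched in Python under assumptions: the column names (agency_uuid, agency_name, agency_website_url, requirements_api_url) are hypothetical, and IDs are random UUIDs from uuid.uuid4(). The snippet creates a row, round-trips it through CSV text, and looks it up by ID.

```python
import csv
import io
import uuid

# Hypothetical column names; the proposal only lists agency name, website,
# and Requirements API URL alongside the new agency_uuid.
FIELDNAMES = ["agency_uuid", "agency_name", "agency_website_url", "requirements_api_url"]


def new_agency_row(name: str, website: str, requirements_url: str) -> dict:
    """Create a manifest row with a freshly generated random UUID."""
    return {
        "agency_uuid": str(uuid.uuid4()),
        "agency_name": name,
        "agency_website_url": website,
        "requirements_api_url": requirements_url,
    }


def write_manifest(rows: list[dict]) -> str:
    """Serialize rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_manifest(text: str) -> dict:
    """Index manifest rows by agency_uuid for lookup."""
    return {row["agency_uuid"]: row for row in csv.DictReader(io.StringIO(text))}


if __name__ == "__main__":
    row = new_agency_row(
        "Example City",
        "https://www.cityname.gov",
        "https://mds.cityname.gov/policy/requirements",
    )
    manifest = read_manifest(write_manifest([row]))
    print(manifest[row["agency_uuid"]]["agency_name"])  # Example City
```

Keeping the file as plain CSV mirrors providers.csv, so it can live in the MDS repo and be edited through ordinary pull requests.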

Idea of an OMF manifest, and/or hosting of the file

Could the OMF host a manifest of Requirements APIs by agency, which can link to an external agency location of the file, or to a file in a GitHub repo maintained by the OMF? The file should not change very much except when permit policy is updated or new MDS versions are adopted. Agencies can open issues or pull requests or just email the OMF for updates.

If the OMF hosts the files initially during a testing phase, we can see how it goes and do the heavy lifting to kickstart the project and remove barriers to entry. Also allows for the easy creation of agency IDs.

Online generation tool

Should an online generator be created where an agency can input via form what they want, and the JSON be generated? It could run in the browser with no backend, and be hosted and maintained by OMF. Might not be needed if OMF hosts and manages the files.
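
A generator does not need much logic. The sketch below (Python for illustration; the same mapping would port directly to in-browser JavaScript) turns simple form-style inputs into a requirements document. The field names (metadata, required_apis, required_endpoints, endpoint_name, required_fields) and the function build_requirements are hypothetical, not a published schema.

```python
import json


def build_requirements(agency_name: str, agency_timezone: str,
                       provider_ids: list[str],
                       apis: dict[str, tuple[str, dict[str, list[str]]]]) -> dict:
    """Assemble a requirements document from simple form inputs.

    `apis` maps an API name to (version, {endpoint_name: [required_fields]}).
    All field names here are illustrative, not the adopted spec.
    """
    required_apis = []
    for api_name, (version, endpoints) in apis.items():
        required_endpoints = []
        for endpoint_name, fields in endpoints.items():
            entry = {"endpoint_name": endpoint_name}
            if fields:  # only emit required_fields when the agency picked some
                entry["required_fields"] = list(fields)
            required_endpoints.append(entry)
        required_apis.append({
            "api_name": api_name,
            "version": version,
            "required_endpoints": required_endpoints,
        })
    return {
        "metadata": {
            "agency_name": agency_name,
            "agency_timezone": agency_timezone,
            "provider_ids": list(provider_ids),
        },
        "required_apis": required_apis,
    }


if __name__ == "__main__":
    doc = build_requirements(
        "Example City", "America/New_York", ["70aa475d-1fcd-4504-b69c-2eeb2107f7be"],
        {"gbfs": ("2.1.0", {"gbfs.json": [],
                            "system_information.json": ["feed_contact_email"]})},
    )
    print(json.dumps(doc, indent=2))
```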

@schnuerle
Member

From our Working Group meeting yesterday.

  • Notes have been incorporated into the working draft document and welcome feedback in the comments there or on the issue.
  • Additional notes:
    • SF and Spin like the idea of OMF hosting the files.
    • DC brings up a good point that cities could be required to host for legal reasons.
    • Consensus is OMF could optionally host files.
    • Manifest of agencies.csv pointing to OMF files or city hosted files should be in MDS repo (not outside), just like providers.csv (and that file should stay there too).
    • Need to be clear about the frequency of updates in the spec, and provide guidance for how to add this to city policy docs.

@schnuerle
Member

The City and Provider Working Group Steering Committees met last week for the Midway Checkpoint for the 1.2.0 proposed release. The Checkpoint let them review feature proposals, align current work to goals, and ensure the release features and work are on track.

For this work, we have created a new rubric to help guide the evaluation, looking at feature utility, stakeholder adoption, implementation simplicity, direction consensus, and work completed as part of the evaluation criteria. The outcomes and actions from these discussions are summarized here:

Good utility and consensus on this, but not sure who will commit to adopting it yet, and the administrative implementation of this is mostly unknown. 

Would need an up-front effort from the OMF to get adoption and build tools/hosting for the new format. It won't replace the paperwork done now, but is a step in that direction. Good for agencies to do their homework and be clear about what parts of MDS they are requesting. Would help with discovery of the services offered by other agencies and providers/third parties, and is good for transparency. There is a need for this but not necessarily urgency.

Actions: Get agencies/cities/providers on board to commit to using this once complete.

@schnuerle schnuerle added the privacy Implications around privacy for the attention of the OMF Privacy Committee label Apr 12, 2021
@alexdemisch
Collaborator

I'd be interested in getting this adopted on the SFMTA side. We'd probably be able to do it in a more timely manner if we had access to a generator tool and OMF hosted it.

@joshuaandrewjohnson1

I think this could be valuable, particularly for discovery and comparison of regulations across cities rather than chasing down paper docs to review. With that said, I would like to see an effort to get broad adoption among cities to avoid a complicated mix of some using paper docs, some using Requirements API, some with detail on their GitHub sites, etc...so with that +1 to the need for generator tool and OMF hosting.

@schnuerle
Member

We will be deep diving into how to move this idea forward in this week's Working Group.

@schnuerle
Member

@joshuaandrewjohnson1 even if a city uses it along with other methods of paper doc regulations/permit requirements, wouldn't it be good for this to exist for each city? It seems useful for the city to go through the exercise of creating it (with optional OMF guidance) to be clear on what they are asking for, and once it exists, useful for providers to see what they need to provide (more useful than just "the latest version of MDS") without ambiguity.

The way it's proposed, you wouldn't have to build anything to machine-read the file if you didn't want to. It's very human readable and could be used by a provider/third party without building a tool to ingest it.

@schnuerle
Member

@alexdemisch I think part of the proposal could be a workshop for cities where the OMF helps them create the file and additionally hosts the file if needed. The OMF could even offer some one on one help to make it.

I'm thinking that a generator tool at this stage might be over engineering, since it can be created by hand quite quickly (like I did in the examples in the document draft).

@ezmckinn
Contributor

@schnuerle @alexdemisch LINK / Superpedestrian would be glad to beta test the Requirements API feature, once the spec is developed!

@joshuaandrewjohnson1

joshuaandrewjohnson1 commented Apr 26, 2021

If it is intended that the Requirements API will be used to allow for changes to data sharing requirements during the term of a program without need for a more formal process (i.e. council approval), it should be structured in a way that preserves important policymaking elements. I've listed some considerations/questions for that process below:

  • Ensure responsible department head has authority to make changes within a program's term without formal process
  • Guidance or template for language to be used in a "paper" doc (council approved) as well as the Requirements file
  • Allow for notification/comment periods prior to changes being implemented
  • If a comment period is allowed, how are comments reviewed and responded to, and by whom?
  • Maintain appropriate grace periods based on the scale of changes (e.g., a minimum of 90 days for a major release)
  • Should this be used to publish requirements that may have been made public in an RFP, but not yet formally approved? How could changes be tracked between the RFP and formally approved doc?
  • If the intention is to make this public facing, are there any accessibility considerations?
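
A grace-period rule like this is simple to check mechanically once publication and effective dates are recorded. A hypothetical sketch: the scale labels, the helper name meets_grace_period, and the 30-day minor-release value are placeholders; only the 90-day major-release minimum comes from the list above.

```python
from datetime import date, timedelta

# Minimum notice before a requirements change takes effect. The 90-day figure
# for a major release is the example given in this thread; the minor value is
# a placeholder assumption.
MIN_NOTICE = {"major": timedelta(days=90), "minor": timedelta(days=30)}


def meets_grace_period(published: date, effective: date, scale: str) -> bool:
    """True if the change was published at least MIN_NOTICE[scale] before it takes effect."""
    return effective - published >= MIN_NOTICE[scale]


if __name__ == "__main__":
    print(meets_grace_period(date(2021, 1, 1), date(2021, 4, 1), "major"))   # True (90 days)
    print(meets_grace_period(date(2021, 1, 1), date(2021, 3, 31), "major"))  # False (89 days)
```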

Spin is happy to test this as well, we would just want to ensure the policymaking process is not intentionally or unintentionally supplanted, as well as minimize confusion between the various documents/versions of an agency's data sharing requirements.

Some of these may also be applicable for the Policy API as well.

@schnuerle
Member

We will be discussing a draft feature branch of this new /requirements endpoint within the MDS Policy API at this week's Working Group meeting.

@schnuerle schnuerle linked a pull request May 20, 2021 that will close this issue
@schnuerle
Member

Finished a round of updates and added examples based on last week's WG conversations. Please review and provide additional thoughts of your own and on the comments that are already here. Still needs more outside links with supporting documentation.

See the Policy Requirement PR for specific comments people left about boolean value formats in MDS, SLA latency, and mode specification.

Another question I have is when to add the url field to the required_endpoints data. Right now it's listed only if the URL is 1) allowed to be public per MDS and 2) chosen to be public by the agency. But is there value in listing URLs that are authenticated too? For example, listing the Agency API endpoints or non-public Policy API endpoints would be useful for providers to see, even if they are authenticated. Or agency employees could reference this to see where the Metrics API is for them to use. If so, could there be a new field like url_authentication with values like 'public' or 'authenticated' to make it clear whether you need credentials? Does posting the URLs create a security risk?

@quicklywilliam
Contributor Author

Glad you called this out, @schnuerle. It has actually been my expectation all along that required_endpoints would contain all endpoints (both public and private) – I missed it in the PR that it was limited to public ones.

I think there is tremendous value in listing private endpoints, because it allows cities to be explicit about what they require and what they don't. It also would be confusing to exclude private endpoints for agencies who want to use the required_fields property on private endpoints.

From a data-security perspective, I think including private endpoints actually makes things more secure, because providers could turn off access to endpoints and fields that are not being used by an agency.
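
To illustrate the security argument: a provider could derive an allow-list from an agency's requirements file and refuse everything else. This is a minimal sketch assuming the required_apis / required_endpoints layout sketched later in this thread; allowed_endpoints and authorize are hypothetical helpers, not part of any provider's real access-control system.

```python
def allowed_endpoints(requirements: dict) -> set[tuple[str, str]]:
    """Collect the (api_name, endpoint_name) pairs an agency lists as required."""
    allowed = set()
    for api in requirements.get("required_apis", []):
        for endpoint in api.get("required_endpoints", []):
            allowed.add((api["api_name"], endpoint["endpoint_name"]))
    return allowed


def authorize(requirements: dict, api_name: str, endpoint_name: str) -> bool:
    """Deny any request for an endpoint the agency did not list (hypothetical policy)."""
    return (api_name, endpoint_name) in allowed_endpoints(requirements)


if __name__ == "__main__":
    reqs = {"required_apis": [{"api_name": "mds-provider", "version": "1.1.0",
                               "required_endpoints": [{"endpoint_name": "trips"}]}]}
    print(authorize(reqs, "mds-provider", "trips"))   # True
    print(authorize(reqs, "mds-provider", "events"))  # False
```

The design choice here is deny-by-default: anything not listed is unreachable, which only works if private endpoints are listed too.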

@schnuerle
Member

I think when to publish the URL wasn't really clear or defined, and I initially also thought that all of them could be published. But as I put those in the examples docs, I started to doubt the idea and so only put public urls.

The big question for me is, for providers, is there any credible risk if their MDS urls are known? The main MDS stubs are known in providers.csv, but not the full URLs once you add cities, versions, APIs, and endpoints to the URL. Eg. is there a difference between knowing "https://web.spin.pm/api/mds/v1" and "https://web.spin.pm/api/mds/v1/louisville/provider/trips". Though arguably that could be guessable, and if you know the format for one city you know them all, and also the URLs are being sent by email now anyway. Some providers have public, shareable, and discoverable API docs too.

It would be beneficial to have these URLs listed, for all parties involved, for multiple reasons, I agree, I just want to make sure it's ok with the community first.

@quicklywilliam
Contributor Author

quicklywilliam commented May 28, 2021

Gosh, I would sure hope not, Michael. If a known API URL represents a credible security threat for any provider, then they have no business dealing with sensitive location data.

Edit: hopefully the above didn't come across as just a snide comment. I truly believe that it's the right thing from a security perspective for all endpoints to be listed. If any vendors are relying (perhaps implicitly) on the "security through obscurity" of their endpoint URLs not being publicly known, making them public would provide a good opportunity for them to clean up their act. It would also make vulnerability scans trivial.

@schnuerle
Member

We will be discussing some of these items and updates to the PR on tomorrow's Working Group call: url security, boolean values, modes, metadata.

@quicklywilliam
Contributor Author

quicklywilliam commented Jun 3, 2021

Below is a quick sketch of what came to mind in our conversation today. Conceptually, the object hierarchy is as follows:

  • The top level object corresponds to a given version of an agency's program, e.g. Portland May 2021 Scooter Program. It has info about the agency itself, says when the data requirements go into effect, which operators they apply to, etc.
  • The required_apis array specifies the required APIs for the program version - ie for the May 2021 Scooter Program Portland requires MDS-provider 1.1 and GBFS 2.1
  • The required_endpoints specifies the required endpoints within the API, and any required fields within each endpoint
{
  "metadata": {
    "version": "2",
    "last_updated": "1611958740",
    "max_update_interval": "P1D",
    "agency_uuid": "737a9c62-c0cb-4c93-be43-271d21b784b5",
    "agency_name": "Louisville Metro",
    "agency_timezone": "America/New_York",
    "agency_language": "en-US",
    "agency_currency": "USD",
    "agency_policy_website_url": "https://www.cityname.gov/transportation/shared-devices.html",
    "agency_policy_document_url": "https://www.cityname.gov/mds_data_policy.pdf",
    "url": "https://mds.cityname.gov/policy/requirements/1.2.0",
    # removed mds_release field, gbfs_required
    "provider_ids": [ # moved up from mds_versions object
      "70aa475d-1fcd-4504-b69c-2eeb2107f7be",
      "2411d395-04f2-47c9-ab66-d09e9e3c3251",
      "420e6e94-55a6-4946-b6b3-4398fe22e912"
    ],
    "start_date": 1611958740, # moved up from mds_versions object
    "end_date": 1611970539 # moved up from mds_versions object
  },
  "required_apis": [
    {
      "api_name": "mds-provider", # added mds prefix
      "version": "1.1.0", # moved version level down to this level
      "required_endpoints": [
        {
          "endpoint_name": "status"
        },
        {
          "endpoint_name": "trips"
        },
        {
          "endpoint_name": "vehicles"
        }
      ]
    },
    { # here's an example of a required GBFS API, with optional requirements
      "api_name": "gbfs",
      "version": "2.1.0",
      "required_endpoints": [
        {
          "endpoint_name": "gbfs.json" # note that this endpoint is always required per the GBFS spec
        },
        {
          "endpoint_name": "system_information.json",
          "required_fields": [
            "feed_contact_email" # example of an optional GBFS field being required by the city
          ]
        },
        {
          "endpoint_name": "geofencing_zones.json" # example of an optional GBFS endpoint, as discussed today
        }
      ]
    }
  ]
}
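
The nesting in this sketch can be walked mechanically. A minimal Python sketch, assuming a trimmed, comment-free copy of the JSON (JSON itself does not allow # comments) and treating start_date/end_date as Unix seconds because the sketch's values are 10 digits long (that unit is an assumption); summarize and in_effect are hypothetical helper names.

```python
import json

# A trimmed, comment-free version of the sketch (JSON has no comment syntax).
SKETCH = """
{
  "metadata": {"agency_name": "Louisville Metro",
               "start_date": 1611958740, "end_date": 1611970539},
  "required_apis": [
    {"api_name": "mds-provider", "version": "1.1.0",
     "required_endpoints": [{"endpoint_name": "status"},
                            {"endpoint_name": "trips"},
                            {"endpoint_name": "vehicles"}]},
    {"api_name": "gbfs", "version": "2.1.0",
     "required_endpoints": [{"endpoint_name": "gbfs.json"},
                            {"endpoint_name": "system_information.json",
                             "required_fields": ["feed_contact_email"]},
                            {"endpoint_name": "geofencing_zones.json"}]}
  ]
}
"""


def in_effect(doc: dict, now: int) -> bool:
    """True if `now` (assumed Unix seconds) falls within the program's start/end dates."""
    meta = doc["metadata"]
    return meta["start_date"] <= now <= meta["end_date"]


def summarize(doc: dict) -> list[str]:
    """One line per required endpoint: api@version/endpoint plus any required fields."""
    lines = []
    for api in doc["required_apis"]:
        for ep in api["required_endpoints"]:
            fields = ep.get("required_fields", [])
            suffix = f" requires {', '.join(fields)}" if fields else ""
            lines.append(f"{api['api_name']}@{api['version']}/{ep['endpoint_name']}{suffix}")
    return lines


if __name__ == "__main__":
    doc = json.loads(SKETCH)
    for line in summarize(doc):
        print(line)
    print(in_effect(doc, 1611960000))  # True
```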

@johnclary
Contributor

johnclary commented Jun 29, 2021

Thanks @quicklywilliam for advancing this. I like this proposal a lot and want to emphasize our support for making this data public.

Some feedback on William's sketch (above)—it would be nice if agencies could optionally define the specific fields they require from each endpoint. For example, if our agency elected to exclude the route object from trips.

{
    "endpoint_name" : "trips",
    "required_fields" : ["provider_id", "device_id", "device_type"]
}

This may be an edge case, but I can see it being more useful as the standard continues to designate more optional fields.

Edit: derp, I see you covered this use case in the GBFS definition.

@schnuerle
Member

We will be talking about this and recent updates on tomorrow's working group call.

@schnuerle
Member

schnuerle commented Jul 8, 2021

Great discussion today and some decisions made. See the meeting notes for details.

Since the meeting I've taken action and completed the following in the PR spec and examples:

  • renamed 'version' to 'file_version' in metadata section
  • renamed policy links to be program links
  • dropped policy_id
  • added available apis, endpoints, fields for agency hosted apis
  • added dot notation option for nested fields in specs
  • created new agency requirements repo and linked to it.

Action items include:

  • getting feedback here from providers about the burden of having optional fields being sent per agency.
  • discussion here about the concept of 'disallowed_fields' for required MDS fields that agencies may want to not receive, question of burden on providers for this capability per agency, and if this is a breaking change.

@schnuerle
Member

schnuerle commented Jul 29, 2021

Hi all, we will be talking about this proposal and the concept of 'disallowed' endpoints and fields on today's working group call.

Example of how disallowed APIs, endpoints, and fields could work in the Requirements endpoint.

UPDATE: the WG agreed that the concept of disallowing APIs did not make much sense and should be removed. Disallowing endpoints and how to implement that is up for discussion. Disallowing fields seemed to have agreement.

  ...
    "required_data_specs": [
        {
          "data_spec_name": "MDS",
          "version": "1.1.0",
          "required_apis": [
            {
              "api_name": "provider",
              "required_endpoints": [
                {
                  "endpoint_name": "status_changes",
                  "required_fields": [
                    "associated_ticket"
                  ]
                },
                {
                  "endpoint_name": "trips",
                  "required_fields": [
                    "parking_verification_url"
                  ],
                  "disallowed_fields": [ // field level example
                    "vehicle_id",
                    "route"
                  ]
                }
              ]
            }
          ],
          "disallowed_endpoints": [ // endpoint level example
            {
             "api_name": "events"
            }
          ]
        }
      ],
      "disallowed_apis": [ // is API level needed?
        {
         "api_name": "agency"
        }
      ]
    ...

@schnuerle
Member

See the action items and discussion from yesterday's public call and meeting.

There is a new discussion area to dig deeper into some of the topics discussed including disallowed endpoint implementation, and feedback from providers on the burden of disallowed fields.

@quicklywilliam
Contributor Author

quicklywilliam commented Jul 31, 2021 via email

@schnuerle
Member

@quicklywilliam I think this item "William H to start a discussion in the repo about getting trip start/end point data" is a different discussion about how we can solve, in a future release, for the question of how to get parts of the route points (eg the start and end) through /trips in a way that could be specified in the Req endpoint, to satisfy the problem Alex brought up of not having to use trip_id to connect to status_changes/events to get start/end points if they don't want route at all.

My idea on the call was "Could add properties to the route GeoJSON that describe a point (start, on trip, end) so start and end could only be returned and specified via Req endpoint", but there may be better solutions to discuss.

@schnuerle
Member

Per our working group meeting, I've updated the PR with the following:

  1. Added the concept of disallowed_fields in the spec and an example with this.
    • Note for disallowed fields, I wrote that the field names must still be returned to preserve the structure, but the values must be null
    • Did not include disallowed_endpoints yet, awaiting discussion on the usefulness and logic of this within MDS.
  2. A new 'Beta Limitations' section that makes it clear required and disallowed fields are 'requested' if only sent digitally with this endpoint to providers, unless agencies have had an external discussion or requirement for these.
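
The null-but-present rule for disallowed fields can be expressed in a few lines. A minimal sketch; the dot-notation handling for nested paths and the helper name apply_disallowed_fields are assumptions layered on the PR text.

```python
import copy


def apply_disallowed_fields(record: dict, disallowed: list[str]) -> dict:
    """Return a copy of `record` with each disallowed field present but null.

    Paths may use dot notation for nested objects (an assumption); a path whose
    parent object does not exist is skipped rather than created.
    """
    out = copy.deepcopy(record)  # never mutate the provider's source record
    for path in disallowed:
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node[leaf] = None  # keep the key so the response structure is preserved
    return out


if __name__ == "__main__":
    trip = {"trip_id": "t1", "vehicle_id": "V-1",
            "route": {"type": "FeatureCollection", "features": []}}
    print(apply_disallowed_fields(trip, ["vehicle_id", "route"]))
    # {'trip_id': 't1', 'vehicle_id': None, 'route': None}
```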

@schnuerle
Member

Adding a note here that organizations like Transit @gcamp may be interested in this feature, and could update their guidance to include using the public Requirements endpoint in regulations, like in this example. Note that @mplsmitch from MobilityData (GBFS) has already been following along and could update their guidance.

@schnuerle schnuerle changed the title [Policy API] Ability to express data sharing requirements [Requirements] Ability to express data sharing requirements in Policy Aug 18, 2021
@schnuerle
Member

To wrap up some discussions on supporting other data specs in this MDS feature, in the area where the data_spec_name field is defined in the spec, I've added this language:

Supported values are: 'MDS', 'GBFS'. Others like GOFS, GTFS, TOMP, etc can be tested by agencies and officially standardized here in the future -- leave your feedback on this issue.

So for this first release we specify how MDS and GBFS can be used explicitly, leave it open for people to use any others they want to try, and provide a way to leave feedback for official incorporation in future releases. 
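
A validator could treat the two named values as supported and merely flag anything else as experimental, matching the "leave it open" approach. A sketch assuming exact, case-sensitive matching on data_spec_name inside required_data_specs; check_data_specs is a hypothetical helper.

```python
SUPPORTED_DATA_SPECS = frozenset({"MDS", "GBFS"})


def check_data_specs(doc: dict) -> tuple[list[str], list[str]]:
    """Split data_spec_name values into officially supported and experimental.

    Unlisted specs (e.g. GOFS, GTFS, TOMP) are reported rather than rejected,
    since agencies may test them. Exact, case-sensitive matching is an assumption.
    """
    supported: list[str] = []
    experimental: list[str] = []
    for spec in doc.get("required_data_specs", []):
        name = spec["data_spec_name"]
        (supported if name in SUPPORTED_DATA_SPECS else experimental).append(name)
    return supported, experimental


if __name__ == "__main__":
    doc = {"required_data_specs": [{"data_spec_name": "MDS"},
                                   {"data_spec_name": "GBFS"},
                                   {"data_spec_name": "GOFS"}]}
    print(check_data_specs(doc))  # (['MDS', 'GBFS'], ['GOFS'])
```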

@alexdemisch
Collaborator

I really like where this is going. Having the flexibility for cities to clearly define what's required and disallowed will support some of our existing use cases (e.g., SFMTA's e-moped program, where we do not collect trip routing data). I think Requirements will become even more useful as MDS starts to incorporate additional modes and nuances of mobility programs/permit requirements.

@schnuerle schnuerle self-assigned this Sep 2, 2021
@schnuerle
Member

Completed with #646! Thanks to everyone for the great work on this feature!

7 participants