Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the MediaStreamTrack list its cropping regions directly? #6

Open
youennf opened this issue Jan 19, 2022 · 11 comments
Open

Can the MediaStreamTrack list its cropping regions directly? #6

youennf opened this issue Jan 19, 2022 · 11 comments

Comments

@youennf
Copy link
Contributor

youennf commented Jan 19, 2022

When trying to apply a region to a track, you currently do not know whether the region can be used for the track.
This might be getting more complex if transferring the MediaStreamTrack say to a worker. You then need to transfer the regions where you transfer the track. This might be worse if allowing cropping for any tab in the future. This also goes against the idea of tracks being descriptive in their capabilities.

It would seem more natural if the regions could be retrieved directly from the track itself.

Here is an API example that could illustrate this:

interface MediaStreamTrackRegion {
    USVString origin;
    DOMString name;
}

partial interface MediaStreamTrack {
    readonly attribute sequence<MediaStreamTrackRegion> regions;
    attribute EventHandler onregionschange;

    Promise<undefined> cropTo(MediaStreamTrackRegion);
}

The main issue with that idea is that this might require to expose an API on the capturee-side, so that the MediaStreamTrack source can explicitly list its regions. This is the part I am most annoyed with, in particular with regards to the handling of third-party iframes. Here is nonetheless an API example to illustrate what could be done if the source is a browser tab:

dictionary TabRegion {
    required Element or DOMString target;
    USVString or sequence<USVString> origins; // ‘self’ by default, can be ‘*’, an origin or a list of origins.
}
interface MediaDevicesTabRegions {
    maplike<DOMString, TabRegion>;
}
partial navigator MediaDevices {
    readonly attribute regions;
}
@eladalon1983
Copy link
Member

eladalon1983 commented Jan 19, 2022

I generally like the idea, but I see some issues. Discussion follows.

The main issue with that idea is that this might require to expose an API on the capturee-side, so that the MediaStreamTrack source can explicitly list its regions.

Not necessarily. We could say that calling produceCropTarget() immediately registers the crop-ID on all relevant tracks, present and future.

interface MediaStreamTrackRegion {
    USVString origin;
    DOMString name;
}

partial interface MediaStreamTrack {
    readonly attribute sequence<MediaStreamTrackRegion> regions;

As mentioned, I like the idea. Some potential issues here, curious to hear if you have good solutions.

  1. There's no guarantee that name is unique, even within a given origin (consider same-origin iframes that are unaware of each other). In such a case, it'd still be necessary for the entity minting crop-IDs/TabRgions to communicate them directly to would-be collaborators. (Consider multiple non-collaborating iframes each declaring "main".) If that is indeed necessary, perhaps the effort of exposing regions on the track would be fruitless. Wdyt?
  2. Whoever mints a token, has to keep in mind that this is a "loud" and public action, that can be "heard" by unexpected entities, if they happen to be allowed to call gDM (and if the user approves the capture).

partial navigator MediaDevices {
readonly attribute regions;
}

nit: I think you mean readonly attribute MediaDevicesTabRegions regions. Please let me know otherwise.

This might be getting more complex if transferring the MediaStreamTrack say to a worker.

Not sure if "complex" is the right word, but it's a bit more ergonomic if tokens are already attached to the track, than if the sender has to "manually" attach them to the message (or send a separate message). Modulo aforementioned issues. I do not see this as a strong selling point.

@youennf
Copy link
Contributor Author

youennf commented Jan 24, 2022

  1. There's no guarantee that name is unique, even within a given origin (consider same-origin iframes that are unaware of each other). In such a case, it'd still be necessary for the entity minting crop-IDs/TabRgions to communicate them directly to would-be collaborators. (Consider multiple non-collaborating iframes each declaring "main".) If that is indeed necessary, perhaps the effort of exposing regions on the track would be fruitless. Wdyt?

Right, the goal of the proposal is indeed to use sensible names like "main", "slides"...
Given there is an origin associated to the name, conflict may only happen for iframes of the same origin, so I would hope they are somehow collaborating and choose unique names that actually match within their origin.

If it is not possible to fully collaborate (would be sad), the application could do something like "main-XYZ", "slides-XYZ", XYZ being given to the capturer for extra safety, "main" and "slides" having an application specific semantic.

2. Whoever mints a token, has to keep in mind that this is a "loud" and public action, that can be "heard" by unexpected entities, if they happen to be allowed to call gDM (and if the user approves the capture).

This token is only visible to trusted parties, i.e. the top level frames or frames that are deemed capturing in the name of the top level frame. I think this gives a good enough story.
That said, the proposal also allows to reduce exposure to specific origins (which makes most sense for non-self capture).

think you mean readonly attribute MediaDevicesTabRegions regions

Right.

it's a bit more ergonomic if tokens are already attached to the track

It is indeed more ergonomic because it is attached to the track.

The additional point is that it allows a more declarative approach so that an iframe may create its regions without ever reacting to a message from its parent, or sending a message to its parent.

@eladalon1983
Copy link
Member

We agree that exposing the regions on the track[*] would be ergonomic.

Exposure of all CropTargets on the track would naturally co-exist with produceCropTarget() returns CropTarget. Therefore, exposing on the track becomes an additive improvement. I suggest we leave it out of the MVP. Avoiding scope-creep would make it easier to reach consensus.

That we can add this later is especially true because CropTarget (which you call MediaStreamTrackRegion) is an opaque interface. This currently-opaque interface can be easily extended later with your suggestion of origin and name. (Modulo that it would raise some concerns if we wish to extend this mechanism for non-self-capture; please see my next comment.)

I'll continue the discussion of the merits of expose-on-track in my next comment, but I hope we can agree that this is a non-blocking early discussion about later improvements? @youennf? @jan-ivar?


[*] This refers back to issue #10, as we are now discussing yet another property to be exposed on tracks, and so the issue of irrelevant-exposure becomes more important. It strikes me as odd that when getUserMedia returns an audio track associated with the user's mic, cropTo and regions would be exposed on it. I think it's more ergonomic for developers if the controls are only exposed where they're relevant, and developers rank higher than implementers and spec-authors.

@eladalon1983
Copy link
Member

eladalon1983 commented Jan 25, 2022

This comment continues the discussion of expose-on-track. Please note that, as per my previous comment, I argue that this should be a later improvement.

interface MediaStreamTrackRegion {
    USVString origin;
    DOMString name;
}

I think we want these to be readonly.

Assume we (later) extends produceCropId() to receive name as a parameter. (The origin is inferred from the call-site.) There's a decision to be made here. Do we...

  1. ...reject produceCropTarget(name) if the pair (name, origin) would not be unique?
  2. ...add a random field that ensures uniqueness?
  3. ...specify that two CropTargets with the same fields can nevertheless distinct? That is, that there might be an unexposed property that's part of any comparison?

My order of preference is currently with no1 as the most preferred and no3 as the least preferred.

This token is only visible to trusted parties, i.e. the top level frames or frames that are deemed capturing in the name of the top level frame. I think this gives a good enough story.

It's a credible story only so long as we don't try to reuse this mechanism for cropping non-own-tabs. The assertion that "the author of name allowed the the top-level document to read it, and the top-level in turn deputized gDM's caller to read name" only works if the top-level is from the same browsing context. Otherwise, the author of name does not know the top-level reader and cannot allow it to read anything[*], let alone allow this permission to be transitive. I suggest we avoid scope-creep and leave both topics for later. That is, neither crop-other-tab-capture nor expose-crop-targets-on-track belong in the MVP, is my opinion, and both topics will have to be dealt with at the same time.


[*] That is, not without extending the API further.

@youennf
Copy link
Contributor Author

youennf commented Jan 26, 2022

This token is only visible to trusted parties, i.e. the top level frames or frames that are deemed capturing in the name of the top level frame. I think this gives a good enough story.

It's a credible story only so long as we don't try to reuse this mechanism for cropping non-own-tabs.

I was answering for the case of self tabs and creator of the region not providing any origin restriction.
The creator of the region can decide which origins can learn about the region.

That said, I am not sure I like it since it is yet another hole in our partitioning client rules.
While calling getDisplayMedia on a tab is breaking partitioning client rules, it seems desirable to limit this hole in terms of API .

@youennf
Copy link
Contributor Author

youennf commented Jan 26, 2022

I'll continue the discussion of the merits of expose-on-track in my next comment, but I hope we can agree that this is a non-blocking early discussion about later improvements? @youennf? @jan-ivar?

I missed that question.
If we think expose-on-track is useful, I think we should look at the impact on the current API.
For instance, if we have expose-on-track, RegionCapture does not need to be transferable, maybe not even needed.

@eladalon1983
Copy link
Member

The creator of the region can decide which origins can learn about the region.

That requires extending the API even further. For me, this is beyond the scope of an MVP.


This thread has me convinced that we have a viable way forward for both (i) exposing region on the track and (ii) doing so in a way that's consistent with other-tab-capture. Tabling this discussion would not risk us painting ourselves into a corner.

The only thing I see us potentially losing by not having this discussion right away, is that we might specify that CropTarget is transferable and is returned by produceCropTarget. This seems quite reasonable to me. With your improvement-proposal, collaborating frames would have to postMessage the name anyway. I don't think the delta between postMessage(name) and postMessage(cropTarget) is significant.

Wdyt? Improvement label and revisit later? (Please consider that it might be more fruitful to have these discussions after more implementations - Safari, Firefox - have had the time to try this out and see what's challenging and what's not, and what the right trade-off would be.)

@eladalon1983
Copy link
Member

eladalon1983 commented Jan 31, 2022

Another interesting corollary of your suggestion to expose CropTargets on MediaStreamTrack, is that whenever a CropTarget is minted, it "magically" produces a new entry elsewhere with a message in the form of your name field. This amounts to introducing yet another potentially-cross-origin message channel. (Contrast with my own suggestion of having it explicitly postMessage-ed using existing mechanisms.) I see this as highly undesirable for an MVP.

@youennf
Copy link
Contributor Author

youennf commented Feb 2, 2022

That requires extending the API even further. For me, this is beyond the scope of an MVP.

This is part of the API example exposed in the initial message of this thread.

whenever a CropTarget is minted, it "magically" produces a new entry elsewhere with a message in the form of your name field.

It depends on the API used to create a CropTarget, this seems fine in scope of the API example exposed in the initial message of this thread.

This amounts to introducing yet another potentially-cross-origin message channel.

Yes, that is my main concern, even though in practice it does not really matter (the fact you self capture is already breaking the isolation), it seems best to stick with something simple.
It is also true that the simple case of transferring a track to worker and cropping in worker is simpler to implement with transferring a CropTarget.

Wdyt? Improvement label

Sounds good to me.

@eladalon1983
Copy link
Member

This is part of the API example exposed in the initial message of this thread.

There are two separate access-controls here:

  1. Who may read, directly from a track, the CropTargets that exist in the current browsing context.
  2. Who may cropTo(CropTarget) for a given CropTarget.

I gather from your latest message that you intend the same access-control to apply to both simultaneously. I'd just like to bring to your attention that, so long CropTargets are transferable, the first does not imply the second. You'd have to explicitly state that your access-control applies to both. And then you'd also need to discuss what happens to a track that's transferred after a crop has been applied, if the new owner of the track would not have otherwise been allowed to crop by itself. Access-control is hard and best left as post-MVP whenever possible[*].

Wdyt? Improvement label

Sounds good to me.

Great, seems like we agree about it being post-MVP. I look forward to resuming this discussion when we've reaches consensus on the MVP parts.

--
[*] "Whenever possible" added as a clarification in anticipation of this topic being debated, with inverted roles, around Capture Handle.

@youennf
Copy link
Contributor Author

youennf commented Feb 2, 2022

And then you'd also need to discuss what happens to a track that's transferred

Transferring a track is not transferring the source, so the source origin remains stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants