Should we support strings in addition or in lieu of opaque identifiers? #46

Open
youennf opened this issue May 16, 2022 · 31 comments

Comments

@youennf
Contributor

youennf commented May 16, 2022

The current CropTarget proposal is basically an opaque wrapper around some kind of element identifier.
@eladalon1983's original proposal was to expose a string, with some use cases in mind that could benefit from the identifier being a string.
The WG decided to move to an opaque identifier.
Since then, @eladalon1983 mentioned several times the string use cases with the potential idea to allow CropTarget stringification.

This raises a few questions:

  • Should we reconsider our position and directly use string identifiers?
  • In that case, should we go with web app provided identifiers or user agent generated identifiers (stringification would mean the latter).
  • Should we instead look to support these use cases without stringification, for instance by allowing MessagePort communication in CaptureHandle?
@youennf
Contributor Author

youennf commented May 16, 2022

Some general thoughts against user agent generated strings.
Lifetime of a CropTarget is well defined. Lifetime of a string value is potentially endless.
If there is some clean-up being made when a CropTarget is GCed, this cannot be easily done when a string gets GCed since the value can be recreated at any point in the future.
For instance, browsers might have a look-up table to identify the process where the element lives from the string value.

To avoid this, using web page provided IDs or using CropTarget serialisation seem more promising approaches.

@eladalon1983
Member

eladalon1983 commented May 16, 2022

Should we reconsider our position and directly use string identifiers?

I think that'd be for the best. Modulo the next points.

In that case, should we go with web app provided identifiers or user agent generated identifiers

I much prefer UA-assigned strings. App-assigned strings would mean we have to consider collisions (both in spec and in implementation). If the user agent assigns, it can trivially guarantee no collisions by randomizing a UUID. Less to worry about. Simple.

Lifetime of a CropTarget is well defined. Lifetime of a string value is potentially endless.

So long as we never assign the same string value to a new Element, what's the problem? And if we randomize a UUID, that's easy to guarantee.

If there is some clean-up being made when a CropTarget is GCed

As the identifier will be attached to the Element, we can GC when the Element is GCed. Am I missing some complication?

@youennf
Contributor Author

youennf commented May 23, 2022

As the identifier will be attached to the Element, we can GC when the Element is GCed. Am I missing some complication?

A CropTarget life can be much shorter than an Element. This is one thing that makes CropTarget appealing.
With generated IDs, there is no other way than either having a revoke mechanism (like blob URLs, which is a very leaky mechanism) or expanding the lifetime of the ID to that of the Element itself. An Element might have a very long lifetime, especially with page cache, so it is risky to rely on element destruction to do the clean-up.

I believe CropTarget serialisation/postMessage should be sufficient for all current and future scenarios.
We can certainly expand APIs to support CropTarget exchange through serialization/postMessage as needed.

@eladalon1983
Member

Practically speaking, I think CropTarget with serialization and a possible future-path to stringification are good enough, and I am fine with sticking with them, as the consensus appears to be (WG interim meeting; your messages). I don't think it's necessary to go back to the drawing board over this.

Theoretically speaking, I don't yet understand your concerns over ID lifetime; if you have the time to educate me, I'd be happy to hear more. Namely:

  const token = makeToken(document.getElementById('some_id'));
  const uuid = makeUUID(document.getElementById('some_id'));

Assume the relevant element is garbage-collected but token and uuid are still alive. Both token and uuid now hold "something", but nothing of interest can be done with that "something". What's concerning about this state of affairs? How is token less concerning than uuid?

@youennf
Contributor Author

youennf commented May 23, 2022

I think CropTarget with serialization and a possible future-path to stringification are good enough

I disagree here, it does not make sense to do stringification on top of CropTarget.
If the WG validates the use cases/support of strings, it seems best to deprecate the CropTarget APIs and go with a single string based API.

Assume the relevant element is garbage-collected but token and uuid are still alive

The interesting case is when the element is still present, not when it went away.

How is token less concerning than uuid?

If you GC uuid, it does not mean the actual uuid value is lost from the app knowledge, the app can reconstruct it.
For instance, the app might not store uuid but might store uuid.substring(0, 1) and uuid.substring(1).
It can later create a new uuid2 representing the exact same string.
Or it might store the uuid in IDB and query it later on.

You cannot recreate a CropTarget like you can with strings.
With strings, you will need to keep the mapping between the string and its element as long as the element is alive.
With CropTarget, you will need to keep the mapping only until either the element or the CropTarget goes away.
I would expect the lifetime of a CropTarget to be much shorter than the lifetime of an element.
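
The asymmetry described above can be modelled in plain JavaScript. This is only a toy sketch, not the CropTarget API: `stringTable`, `tokenTable`, and the element stand-in are hypothetical names, and a real user agent's bookkeeping would differ. It shows that an equal string can be rebuilt from stored fragments, while an opaque object key cannot be recreated.

```javascript
// Toy model, not the CropTarget API: all names here are hypothetical.
const element = { id: 'some_id' }; // stand-in for a DOM Element

// String identifier: the table has to keep the entry for as long as the
// element lives, because any code can rebuild an equal string later.
const stringTable = new Map();
const uuid = '123e4567-e89b-12d3-a456-426614174000'; // fixed example value
stringTable.set(uuid, element);

// The app stores only two fragments instead of the original string...
const head = uuid.substring(0, 1);
const tail = uuid.substring(1);
// ...and later rebuilds an equal string that still resolves.
const uuid2 = head + tail;
const viaString = stringTable.get(uuid2);

// Opaque token: a WeakMap entry becomes collectable once the token object
// is unreachable, and no freshly created object can stand in for the key.
const tokenTable = new WeakMap();
const token = Object.freeze({});
tokenTable.set(token, element);
const viaToken = tokenTable.get(token);
const viaLookalike = tokenTable.get(Object.freeze({}));
```

In this model `viaString` and `viaToken` both resolve to the element, while `viaLookalike` is `undefined`, which is the property that lets the token-keyed entry be dropped early.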

@eladalon1983
Member

eladalon1983 commented May 23, 2022

I disagree here, it does not make sense to do stringification on top of CropTarget.
If the WG validates the use cases/support of strings, it seems best to deprecate the CropTarget APIs and go with a single string based API.

Whether we deprecate the token in favor of a string, or add stringification on top, is an issue for future discussion. At the moment, what's important for me is that both options are technically feasible.

The interesting case is when the element is still present, not when it went away.

What's interesting about that case?

If you GC uuid...
You cannot recreate a CropTarget like you can with strings.

Recreating the UUID is equivalent to never letting go of the CropTarget. What I'm missing is a scenario where the app would be forced to let go of CropTarget, but could somehow (i) keep hold of the UUID, which is easy, AND (ii) have it still be meaningful, which is... impossible, I think...? What am I missing?

I would expect the lifetime of a CropTarget to be much shorter than the lifetime of an element.

We've specified that CropTarget holds a weak-reference to the Element. If the Element is GCed, the CropTarget becomes meaningless. (And btw - that's another way GC can be observed here, regardless of whether we go with a UUID or a CropTarget.)

@youennf
Contributor Author

youennf commented May 23, 2022

The interesting case is when the element is still present, not when it went away.

What's interesting about that case?

If the element went away, you can clear the mapping between CropTarget/uuid and the element. There is no difference between CropTarget and UUID.
If the element is there, you can clear the mapping if CropTarget is GCed. You cannot clear the mapping if UUID is GCed.

Recreating the UUID is equivalent to never letting go of the CropTarget.

Exactly, and that is not great. We like to free memory as soon as we can.
We cannot do this with UUIDs. Please look at what blob URLs are as a precedent.

If the Element is GCed, the CropTarget becomes meaningless. (And btw - that's another way GC can be observed here, regardless of whether we go with a UUID or a CropTarget.)

No. Before being GCed, the element will be detached from the DOM. At that point, cropping/cropTo should give us the same result, whether the element is GCed or not.

@steely-glint

One advantage of keeping it as an opaqueToken is that from the developer point of view you can be pretty certain of its provenance - it probably came in a postMessage via a message port you opened. This makes it easy to decide how much to trust it.
With a UUID/String you have to do a lot more thinking about how it got to you before you decide to trust it. I don't think the benefit is sufficient for the risk.

@steely-glint

The benefit of Stringifying is that it allows the token to round-trip via a server or two. But this risks forcing apps to speculatively send their cropTargets to all the big players on the off chance that their page may be being captured, producing a potential privacy honeypot.

@eladalon1983
Member

With a UUID/String you have to do a lot more thinking about how it got to you before you decide to trust it. I don't think the benefit is sufficient for the risk.

Why would it be more/less trustworthy if you got it as CropTarget or as string-which-is-a-UUID? Could you please lay out a concrete problematic case that developers would need to be wary of?

But this risks forcing apps to speculatively send their cropTargets to all the big players

Or they could use Capture Handle.

  1. Expose to whomever.
  2. Exposure can be done locally - quick and simple.
  3. Capturer can ignore if they don't know and trust the capturee. (Capture Handle lets you know the origin of the capturee - if they opt-in - and you can just ignore otherwise.)

@steely-glint

I can't tell if it is more or less trustworthy - that's the point - but there are many more ways that a uuid could arrive in my app and be manipulated/tracked on the way. An opaque token ensures that it was minted in this user agent, for this user, in this session and gives you a bunch of origin info too. It can't be tracked or correlated, which makes it much easier to reason about security or lack thereof.

@steely-glint

As to a specific risk - one I have in mind goes like this:

A major Video Conference app chooses to offer a server based webAPI for co-operating web apps to submit their cropTargets (to avoid cross origin issues). Perhaps it even penalises sites that don't with a user warning or something.

Now suddenly every app that ever wants to be capable of being screenshared without the warning will have to (speculatively because it can't know the user intent to capture it) always send cropTargets to the video conference server's API for every user session - even if this user has never and will never use that conference server.
As a reward the conference app gets detailed usage stats for all screen-shareable apps. This is not a good thing IMHO and we should not set up a situation which permits such leverage.

None of this happens with an opaque token because unless the user actually has a session with the videoconference app, there is nowhere to post message the token to, so no stats can be collected.

@eladalon1983
Member

eladalon1983 commented Jun 24, 2022

there are many more ways that a uuid could arrive in my app and be manipulated/tracked on the way

Assume we specify:

  1. Serializing the same CropTarget multiple times yields different UUIDs. cropTarget.serialize() != cropTarget.serialize().
  2. CropTargets cannot be compared. cropTarget1 != cropTarget1 as well as cropTarget1 != cropTarget2. Essentially, all comparisons evaluate to false.
  3. Clarification: Naturally, deserializing different UUIDs derived from the same CropTarget, derives a CropTarget referencing the same original Element. That's not an issue because of 1 and 2.

Tracking becomes impossible unless you actually get the user to capture the tab from which the CropTarget comes, which is sufficiently rare and user-driven as to be uninteresting. (At that stage, tracking using CropTarget is the least of your worries; you have plenty of other surfaces.)
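
Rules 1 and 3 above can be modelled with a small registry. This is a hedged sketch only: `ToySerializer` and its methods are hypothetical names, not part of any spec, the identifiers are random hex strings rather than real UUIDs, and rule 2 (comparisons between CropTarget objects) is not modelled.

```javascript
// Toy model of rules 1 and 3; not a real or proposed API.
class ToySerializer {
  #elements = new Map(); // serialized id -> element stand-in

  // Rule 1: every call mints a fresh identifier, even for the same element.
  // Math.random is used only for illustration; it is not a UUID generator.
  serialize(element) {
    let id;
    do {
      id = Math.random().toString(16).slice(2);
    } while (this.#elements.has(id)); // never hand out the same id twice
    this.#elements.set(id, element);
    return id;
  }

  // Rule 3: any identifier minted for an element resolves back to it.
  deserialize(id) {
    return this.#elements.get(id) ?? null;
  }
}

const serializer = new ToySerializer();
const video = { label: 'video' }; // stand-in for a DOM Element
const first = serializer.serialize(video);
const second = serializer.serialize(video);
```

Here `first` and `second` differ as strings, yet both deserialize to the same `video` object, and an identifier the registry never minted resolves to `null`.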

A major Video Conference app chooses to offer a server based webAPI for co-operating web apps to submit their cropTargets (to avoid cross origin issues). Perhaps it even penalises sites that don't with a user warning or something.

  1. Without an active screen-capture of the tab which minted the CropTarget, the UUID is indistinguishable from any randomly generated UUID. What good does it do the video conference app to get arbitrary UUIDs that can't be used for tracking?
  2. Suppose the video conferencing app has captured another tab. What now? Start checking the billions of CropTargets it's received? Assuming for the sake of argument that it could feasibly reduce the number of reliable candidates - what would it learn of the captured tab, that it would not learn from just examining all of the app's pixels, which it's already observing?

@steely-glint

There are 2 issues here, which I put in separate comments above, and I'll try and keep separated again here.

From the point of view of the recipient of a cropTarget (e.g. a video conference app), I claim it is much easier to be sure that it is genuinely from where it appears to be than a UUID string that may have been passed through several layers of servers. No amount of improvement of the rules on creation of the CropTarget UUID makes any difference to proof of provenance.

@steely-glint

To be clear, my second point is about the threat of a theoretical VC service that uses the arrival of cropTarget UUIDs as a way of collecting usage data on its competitor's other apps (say slideshow apps) - especially interestingly of users who don't use the VC app, but the slideshow vendor still has to send cropTargetUUIDs anyway on the off chance that the user might use the VC app. It doesn't need to actually apply the UUIDs in a live capture, it still gets usage data.
(Yes one could spam the service with fake UUIDs to prevent the stats being accurate, but that's a road I'd rather avoid).

The existence of cropTargets as UUIDs enables this risk in a way that an opaque token prevents.

@eladalon1983
Member

eladalon1983 commented Jun 25, 2022

From the point of view of the recipient of a cropTarget (e.g. a video conference app), I claim it is much easier to be sure that it is genuinely from where it appears to be than a UUID string that may have been passed through several layers of servers. No amount of improvement of the rules on creation of the CropTarget UUID makes any difference to proof of provenance.

  1. Assume you got a CropTarget/UUID and you're not sure of its origins. What compels you to use it?
  2. Suppose you've used it. What are the ramifications?
  3. Suppose for the sake of argument that there are ramifications. What's compelling you to keep on using these untrusted CropTargets?

It doesn't need to actually apply the UUIDs in a live capture, it still gets usage data.

So the video conferencing application would twist the arm of the entire Web by giving degraded service to those who do not comply with its demand to be tracked? What's more important for Wikipedia - their compliance with various regulations, or the off-chance that someone might be tab-sharing them? Your hypothetical video conferencing tracking-tyrant would find that:

  1. Nobody is sending it any CropTargets.
  2. All of its users have left for better products that don't arbitrarily degrade performance.
  3. Its CEO is going to jail.

I don't think this is a credible concern. But I am open to hearing more and having my mind changed.

@eladalon1983
Member

eladalon1983 commented Jun 25, 2022

Here is an alternative version of the attack you proposed, which would not require any CropTarget. It works equally well with today's means:

  1. [Different step] Demand that all sites be watermarked with some pixel-pattern that is unique, machine-readable, and not human-readable, and that this pattern be posted to the video-conferencing service.
  2. [Identical step] Punish those sites that don't comply by giving them degraded service.
  3. [Identical step] Collect tracking-data the same way you would have with the CropTarget UUIDs.

Since this attack is equally technically feasible with/without serialization, I believe we can now forget about this concern.

@steely-glint

steely-glint commented Jun 25, 2022

Sure, but I'm afraid my scenario is plausible and yours hopefully isn't because it involves a lot of compute cycles to add and detect watermarks and they offer no user benefit in return for breaking e2e crypto promises.

CropTarget UUIDs are lightweight and offer apparent user benefits. My point is that you can get all the user benefit without the risks by keeping it as an opaque token.

My example wouldn't be described as tracking, it would be 'enhanced collaboration', but of course once the data arrived it would be irresistible to use it in additional ways.

@eladalon1983
Member

eladalon1983 commented Jun 25, 2022

CropTarget UUIDs are lightweight and offer apparent user benefits. My point is that you can get all the user benefit without the risks by keeping it as an opaque token.

The UUID is still opaque if it observes these guidelines. I believe the crux of your concern is not opaqueness, but rather transmissibility to remote servers. I believe transmissibility to be desirable, because it allows cross-origin collaboration that would easily hook up to existing infrastructure. You have brought up a possible abuse scenario. I don't personally see it as credible, but we can have different opinions, that's fine. I have shown that technically, the same concern exists already; however, now you believe my scenario is not credible. (That's also fine.) How do we resolve this? Do you have a suggestion for determining which concerns are credible and which are not?

@steely-glint

Yes, you are right - Transmissibility is the word I should have been using all along.

As to how to evaluate such concerns - I guess there are a couple of axes.

  1. look at the benefits.
  2. look at how likely the risk is to occur.

Can you outline a scenario where lack of transmissibility would prevent cooperation between cross-origin services in the same user agent session?

I think the credibility is partly to do with how small the steps are that you have to take to get to the risk. My scenario is quite incremental, with legitimate user 'benefits' offered initially, but once the data has been gathered for a legitimate purpose it is difficult to prevent it being repurposed for others. E.g. we have seen that corps were unable to stop themselves from misusing 2FA phone numbers as advertising targeting data.

@eladalon1983
Member

eladalon1983 commented Jun 29, 2022

look at how likely the risk is to occur.

Video conferencing is big in our world, but it's not so big that Wikipedia and other non-video-conferencing sites would worry that they must comply with arbitrary demands for fear of tab-sharing not working too well.

Let's run a thought experiment. You and I together quit our current jobs and start a new video-conferencing application. We're tremendously successful and gain 125% of the market - we're just that good. Now we write an email to Wikipedia and explain to them that, unless they start sending us CropTargets whenever they do any page-load, we will penalize them! They write back - "penalize how?" To which we respond - "we will annoy our OWN users with anti-Wikipedia messages whenever they try to tab-share Wikipedia pages!! (Not if they share a window or the entire screen, though. We have technical limitations and decided not to use watermarking to circumvent them; we'll only harm your tab-sharing game.)"

What do you think happens next? I think we might give someone in Wikipedia the laugh of their life. But I don't think they'll bother emailing us back. After all - why should they care? What percentage of Wikipedia page loads culminate in a user trying to tab-share that page?

Also - how long will our users stick with us if we do follow-through on our threat?

To summarize - I don't think this is a real risk.

Can you outline a scenario where lack of transmissibility would prevent cooperation between cross-origin services in the same user agent session?

I can outline a scenario and simultaneously give a shoutout to the greatest musical genius of our generation.

image

Meet captures YouTube running in another tab. YouTube wants to send over a CropTarget focusing on the video and cropping away the list of suggested next videos. Meet could then show the user controls to crop to the video or share the entire tab. [1] The benefit here is that the user could easily hide their playlist, which they might find embarrassing. But how can youtube.com send a message to meet.google.com? Enter shared cloud infrastructure...[2]

--
[1] Jan-Ivar has previously asked about Meet's plans for user-facing crop-controls. To clarify, this is a hypothetical scenario I have concocted in the last minute. It's not an announcement of future plans.
[2] Or enter specialized fields on Capture Handle. I also intend to suggest those.

@eladalon1983
Member

eladalon1983 commented Jun 29, 2022

As for legitimate purposes for sending CropTarget optimistically to unknown capturers - if these purposes are truly legitimate, then the real question is - how do we provide them in such a way, that sites can reap the benefits without:

  1. Paying ridiculous costs (post CropTarget to all video-conferencing services that potentially exist).
  2. Compromising privacy.

I think these are great questions, and we should tackle them as follow-ups. (Exposure using Capture Handle is my answer there, btw.)

The use-case outlined in the second half of my previous message, assumes the capturer/capturee have an intimate relationship despite being cross-origin, and stringification solves that problem simply and easily.

@steely-glint

But how can youtube.com send a message to meet.google.com? Enter shared cloud infrastructure...

So vimeo users will continue to get their playlists shown. Unless vimeo also participate by sending API messages to an open meet.google.com.

@eladalon1983
Member

But how can youtube.com send a message to meet.google.com? Enter shared cloud infrastructure...

So vimeo users will continue to get their playlists shown. Unless vimeo also participate by sending API messages to an open meet.google.com.

I am calling into question your basic premise. I don't believe that a non-negligible number of sites would ever send stringified CropTargets optimistically, because that's just too expensive, and the upside is nil in 99.9999% of the cases. I think the only case where CAPTUREE would send CAPTURER a CropTarget, is if:

  1. CAPTUREE and CAPTURER already trust each other.
  2. CAPTUREE and CAPTURER are often used together, e.g. if they are part of a bundled offering.
  3. Key requirement: Through a pre-existing channel, CAPTURER has alerted CAPTUREE to the existence of a capture session, thereby soliciting CropTarget with a pre-existing well-defined meaning that both sides understand as mutually useful. That means that a video-conferencing app could expose user-facing controls to focus on video/share entire tab. Without a well-understood semantic for the CropTarget, it is useless. The user does not understand "apply CropTarget"/"remove CropTarget". Sane apps are not going to expose to the user unknown CropTargets that they had received over an open API that arbitrary sites can trigger.

As such, I still hold that the risk of tracking by soliciting the entire Web to send you CropTargets is unfounded.

@steely-glint

I don't (entirely) disagree. But the practical effect is the same - either vimeo performs worse in meet than youtube does or they optimistically send cropTargets. Because they don't share server infra/trust with meet.
(service names are for illustration only - I have no idea of the relationship between meet and vimeo now or in the future)

@steely-glint

However.... I have had an idea - what if the CropTarget UUID was also made visible as a property of the track.

Then an app can set an 'area of interest' CropTarget - and any capturing apps can retrieve it and apply the CropTarget or not at their (or the user's) discretion.

@eladalon1983
Member

eladalon1983 commented Jul 3, 2022

what if the CropTarget UUID was also made visible as a property of the track.

Do you mean a property of the track via being a property of Capture Handle? If so, yes, I've had this idea too. The idea is basically for the capturee to expose a mapping like {name1: cropTarget1, name2: cropTarget2}. But please note that this is still not useful for arbitrary pairs of capturer/capturee, because "name1" is not meaningful to the capturer if it does not recognize the capturee. I just don't think that a video-conferencing tool is going to invest engineering effort in receiving and applying crop-targets "of interest" from unknown websites. From known websites where a collaboration is set up manually by product managers - that's a different story. But there, stringifiability is a much simpler solution and it works with no concerns.
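
The name-to-target mapping idea can be sketched from the capturer's side as follows. This is an illustration under stated assumptions: the `cropTargets` field on the handle, the `knownCapturees` table, and `pickCropTarget` are all hypothetical, and plain strings stand in for CropTarget objects. Only the `origin` field reflects what the thread says Capture Handle already offers.

```javascript
// Hypothetical shape: `cropTargets` is NOT part of Capture Handle today.
// origin -> name of the crop target this capturer knows how to use
const knownCapturees = new Map([
  ['https://slides.example', 'presentation'],
]);

function pickCropTarget(captureHandle) {
  // The capturee did not opt in to exposing anything.
  if (!captureHandle || !captureHandle.origin) return null;
  // Unknown capturee: its names carry no agreed meaning, so ignore them.
  const wantedName = knownCapturees.get(captureHandle.origin);
  if (wantedName === undefined) return null;
  const targets = captureHandle.cropTargets ?? {};
  return Object.prototype.hasOwnProperty.call(targets, wantedName)
    ? targets[wantedName]
    : null;
}

const fromKnown = pickCropTarget({
  origin: 'https://slides.example',
  cropTargets: { presentation: 'target-A', notes: 'target-B' },
});
const fromUnknown = pickCropTarget({
  origin: 'https://other.example',
  cropTargets: { presentation: 'target-C' },
});
```

In this sketch `fromKnown` is `'target-A'` and `fromUnknown` is `null`, mirroring the point that a name is only useful when the capturer already recognizes the capturee.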

@youennf
Contributor Author

youennf commented Jul 4, 2022

In general, I agree with @steely-glint.
The WG made a decision to replace UUIDs by serializable opaque objects, we should build on this instead of revisiting this decision.
Given CropTarget is serializable, we should be able to serialise a CropTarget from capturee to capturer.
@steely-glint mentioned hooking this at the track level, which is basically #6.

An alternative mentioned during past discussions is to postMessage the CropTarget and/or add a new API via the Capture Handle API to expose CropTarget to capturers, for instance by allowing serializable objects to be passed in addition to strings, or by allowing a MessagePort-based communication channel to be opened between a capturee and any capturer.

@danjenkins

I really like the idea of not needing to use post message at all and relying on other constructs we've already got access to. While discussing this with @steely-glint the other day I started thinking about what those could be, ranging from "let any page listening for a "createdCropTarget" event know about an available crop target" to "have a list of crop targets on a media stream track and also allow events to tell tab A that's already sharing tab B that tab B has a new crop target available" and then we get away from needing to talk about strings/opaque identifiers etc... when creating a crop target you could limit a crop target to a certain origin if you wanted your crop to only be available to your set of domains.... you'd also need to be able to set a description against crop targets so that an end user could select "I want to share the presentation and not the notes" - there would be no in-between server or post message for you to add any metadata - you'd be reliant on the browser to have that information in a crop target and give that information to the other tab etc.

I'd kinda like to offer crop targets direct to an end user visually too - think about selecting a tab from the share your tab gui...... you select a tab that has active crop targets..... and it asks if you want to share the whole tab or a crop target - giving the nice visual representation of that crop target just like they get for a tab.... this kind of media stream track wouldn't be allowed to remove that crop target as it had been explicitly shared. (but that's getting side tracked)

Getting back to this topic.... how would this work? No access to crop targets until you have access to the media stream track that has corresponding crop targets associated with it.... see a list of available crop targets (and their descriptions etc) on the media stream track with a new event telling you about any new crop targets that have been made since the media stream track was initialised.....

Maybe we could allow a crop target to have a user generated ID associated with it.... just like a description. That way Meet could still be told that google slides has a list of crop targets and one of them has a user generated ID of X so that Meet can auto crop to X.....
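
The idea above (a list of crop targets on the track, an event for new ones, optional origin limits, and app-chosen IDs and descriptions) could look roughly like the sketch below. Everything in it is hypothetical: `ToyCapturedTrack`, the `croptargetadded` event name, and the `allowedOrigins` option are invented for illustration and do not exist in any spec.

```javascript
// Toy model of the idea above; none of these names exist in any spec.
class CropTargetAddedEvent extends Event {
  constructor(entry) {
    super('croptargetadded');
    this.entry = entry;
  }
}

class ToyCapturedTrack extends EventTarget {
  #entries = [];
  #capturerOrigin;

  constructor(capturerOrigin) {
    super();
    this.#capturerOrigin = capturerOrigin;
  }

  // Called on behalf of the captured page when it creates a crop target.
  // Omitting `allowedOrigins` means "visible to any capturer".
  addCropTarget({ id, description, allowedOrigins }) {
    if (allowedOrigins && !allowedOrigins.includes(this.#capturerOrigin)) return;
    const entry = Object.freeze({ id, description });
    this.#entries.push(entry);
    this.dispatchEvent(new CropTargetAddedEvent(entry));
  }

  // Snapshot of the crop targets visible to this capturer.
  get cropTargets() {
    return [...this.#entries];
  }
}

const track = new ToyCapturedTrack('https://meet.example');
const announced = [];
track.addEventListener('croptargetadded', (event) => announced.push(event.entry.id));
track.addCropTarget({ id: 'slides', description: 'Presentation only' });
track.addCropTarget({
  id: 'notes',
  description: 'Speaker notes',
  allowedOrigins: ['https://other.example'],
});
```

In this model the capturer at `https://meet.example` is told about `slides` but never learns that `notes` exists, since that target was limited to a different origin.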

Sorry for the brain dump....

@eladalon1983
Member

@jan-ivar, during today's meeting, you asked "why not make the CropTargets stringable?"
The arguments for and against are here. What say you? Shall we go with that option?

Tagging @alvestrand, @youennf and @dontcallmedom for visibility.

@alvestrand

Note - if a CropTarget() is stringable, there are only two ways to use that string that I can think of:

  • Compare it to the string value of a cropTarget you already have ("verification")
  • Create a CropTarget from it (new API)

All the verification that this is a valid crop target that you're supposed to have access to would be in the "create a CropTarget" function. The stringification part alone is basically harmless, because it's mostly useless.
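
A minimal sketch of that split, assuming a hypothetical store in which the string is just a lookup key and every check happens when a target is created from it. `ToyCropTargetStore`, `mint`, `fromString`, and the origin-based access rule are assumptions made for illustration; the thread does not say what the access check would be.

```javascript
// Toy model: stringification is inert, creation does all the checking.
class ToyCropTargetStore {
  #records = new Map(); // string id -> { elementLabel, allowedOrigins }
  #nextId = 0;

  // Producing the string grants nothing by itself.
  mint(elementLabel, allowedOrigins) {
    const id = `toy-${this.#nextId++}`;
    this.#records.set(id, { elementLabel, allowedOrigins: new Set(allowedOrigins) });
    return id;
  }

  // "Create a CropTarget from it": validity and access are verified here.
  fromString(id, callerOrigin) {
    const record = this.#records.get(id);
    if (!record) return null;                                  // not a valid target
    if (!record.allowedOrigins.has(callerOrigin)) return null; // caller lacks access
    return Object.freeze({ elementLabel: record.elementLabel });
  }
}

const store = new ToyCropTargetStore();
const id = store.mint('video', ['https://capturer.example']);
const granted = store.fromString(id, 'https://capturer.example');
const denied = store.fromString(id, 'https://elsewhere.example');
const bogus = store.fromString('toy-999', 'https://capturer.example');
```

The other use named above, comparison, is just string equality on `id` and needs no support from the store.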


5 participants