Virtual MIDI ports #45

Open
cwilso opened this issue Jan 29, 2013 · 75 comments
Labels: class: substantive (https://www.w3.org/policies/process/#correction-classes); status: blocked (another issue or external dependency needs to be resolved before proceeding)

@cwilso
Contributor

cwilso commented Jan 29, 2013

In speaking with Korg at NAMM, they really wanted to have the ability to define virtual MIDI input/output ports - e.g., to build a software synthesizer and add it to the available MIDI devices when other apps/pages query the system.

Yamaha requested the same feature, even to the point of potentially creating a reference software synth.

We had talked about this feature early on, but cut it from v1; truly adding the device to the system's list of available MIDI devices (so that, e.g., other native Windows/OSX apps could access it) would likely be quite hard, involving writing virtual device drivers, etc., which is why we decided not to include it in v1. We might consider a more limited feature of virtual devices that are only exposed to web applications; this might still be a substantial amount of work to add, but I wanted to capture the feedback.

@jussi-kalliokoski
Member

I've spent some time thinking about this feature lately and I want to share some ideas and open questions I have. At the simplest, what I see we could have is this:

dictionary MIDIPortOptions {
  DOMString name;
  DOMString manufacturer;
  DOMString version;
};

partial interface MIDIAccess {
  MIDIInput createVirtualOutput(MIDIPortOptions options);
  MIDIOutput createVirtualInput(MIDIPortOptions options);
};

As you can see, createVirtualInput() creates a MIDIOutput, naturally, since the return value is not the resulting port but an API for making it do things, i.e. in order for a MIDI output port to generate output, you have to define that output.

The resulting ports would naturally have no id since they don't correspond to any actual ports, virtual or physical.
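To make the inversion concrete, here is a minimal in-memory sketch of the proposed createVirtualInput(). Nothing in it exists in the Web MIDI API: `FakeMIDIAccess`, `ExposedInput` and `CreatorHandle` are hypothetical names used only to model the idea that the creator gets an output-like handle while everyone else sees an input port.

```typescript
// Hypothetical model of the proposal; none of these names exist in the Web MIDI API.
interface MIDIPortOptions {
  name: string;
  manufacturer: string;
  version: string;
}

type MIDIListener = (data: Uint8Array) => void;

// What other applications would see: an input port they can listen to.
class ExposedInput {
  readonly options: MIDIPortOptions;
  private listeners: MIDIListener[] = [];

  constructor(options: MIDIPortOptions) {
    this.options = options;
  }

  addListener(listener: MIDIListener): void {
    this.listeners.push(listener);
  }

  // In this model only the creator's handle calls deliver().
  deliver(data: Uint8Array): void {
    this.listeners.forEach((listener) => listener(data));
  }
}

// What the creating page holds: an output-like handle that feeds the port.
class CreatorHandle {
  private readonly port: ExposedInput;

  constructor(port: ExposedInput) {
    this.port = port;
  }

  send(data: number[]): void {
    this.port.deliver(new Uint8Array(data));
  }
}

class FakeMIDIAccess {
  readonly inputs: ExposedInput[] = [];

  // Registers a new input port and returns the handle used to drive it.
  createVirtualInput(options: MIDIPortOptions): CreatorHandle {
    const port = new ExposedInput(options);
    this.inputs.push(port);
    return new CreatorHandle(port);
  }
}
```

The creator calls send() on the returned handle; anything listening on the corresponding entry in `inputs` receives the same bytes.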

One open question is whether we want to have these as factories on the MIDIAccess interface or as global constructors. On one hand, for the basic uses of the ports it doesn't make any sense to require user consent: there's no fingerprinting potential since even the id is not exposed, nor do the ports really do anything unless the user explicitly connects them somewhere manually or via another piece of software (s)he's running.

However, this needs to be thought through. I've heard that some iOS apps use virtual MIDI ports to communicate with each other. If that is the case, we need to consider whether a web app pretending to be another native application should be considered a potential risk. In the worst case nightmare scenario an app would be transmitting Lua (or similar) code via MIDI which could result in a cross-application scripting attack, possibly leveraging all the privileges the user has granted that application. Another, much likelier case would be that a user's credentials would be transferred from one application to another, similar to OAuth except that the authentication would happen in another application instead of on the web, and an intercepting application could steal these credentials.

@toyoshim
Contributor

toyoshim commented Aug 7, 2013

Here are my thoughts on virtual ports.
Sorry, but this note focuses on another point.

If we start to support virtual ports, we should consider each device's latency more seriously.
Here, I suppose that the major use case for virtual ports is software synthesizers.
Software synthesizers cause much more latency than real devices. As a result, without latency information, it would be very hard to use both hardware and software simultaneously in one web application.

Virtual ports can also be used for controlling remote MIDI devices through the internet. Even in this use case, latency is important and should be handled correctly.

So in the v2 spec, MIDIPort may want to have an additional attribute for the latency from event arrival to audio playback.

@jussi-kalliokoski
Member

I agree about the latency, we need to take that into account. Some use cases:

  • Virtual instrument on a web page. Has inherent latency from the audio graph that the host needs to take into account. Needs a way to set the latency of its virtual input.
  • Web page that does sequencing. Needs to take the latency of an external instrument into account. Needs a way to read the latency of its output.
  • A web page that takes a guitar input and converts it to MIDI. Has latency in the A/D conversion and pitch detection and needs to convey that to the consumer. Needs a way to set the latency of its virtual output.
  • Notation software. Needs to be able to sync various MIDI sources in order to get a synced composition. Needs a way to read the latency of its inputs.

So basically, I think what we need is a way to read the latency of normal ports (reporting 0 if it is not available) and a way to write the latency of virtual ports, e.g.

partial interface MIDIInput {
  readonly attribute double latency;
};

partial interface MIDIOutput {
  readonly attribute double latency;
};

partial interface VirtualMIDIInput {
  attribute double latency;
};

partial interface VirtualMIDIOutput {
  attribute double latency;
};
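As a rough illustration of how a sequencer might use such an attribute, the sketch below subtracts a port's latency from the intended playback time to get the timestamp it would pass to send(). The `latency` attribute is only the proposal above, and treating it as milliseconds is an assumption made here; `PortWithLatency`, `compensatedSendTime` and `alignmentDelays` are hypothetical names.

```typescript
// Hypothetical shape: only the proposed `latency` attribute, assumed to be in milliseconds.
interface PortWithLatency {
  latency: number;
}

// Timestamp to hand to send() so that the result is heard at targetTimeMs.
function compensatedSendTime(targetTimeMs: number, port: PortWithLatency): number {
  return targetTimeMs - port.latency;
}

// Extra delay to apply to each port so that all of them line up with the slowest one.
function alignmentDelays(ports: PortWithLatency[]): number[] {
  let maxLatency = 0;
  ports.forEach((port) => {
    if (port.latency > maxLatency) {
      maxLatency = port.latency;
    }
  });
  return ports.map((port) => maxLatency - port.latency);
}
```

With a 2 ms hardware synth and a 30 ms software synth, the hardware port would be delayed by 28 ms and the software port not at all.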

@marcoscaceres
Contributor

I'd prefer these to be constructors instead of a factory. I agree about the latency, but I'm not sure about using 0 to mean both zero latency and unknown... but then, making the attribute nullable might not be great either.

@jussi-kalliokoski
Member

I'm not sure about using 0 as meaning both 0 latency and unknown

I can't think of any case where the default behavior in the case of unknown latency would not be to assume zero latency, so for most cases there would be no extra work to account for the unknown latency situation, hence the suggestion. If we're able to come up with sane scenarios where you'd benefit from it being non-zero and unknown, I'll be happy to use another value.
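The two options being weighed can be compared in a few lines. This is only a sketch with hypothetical helper names: with a nullable attribute, scheduling code would still fall back to zero, and the only place the distinction shows is where the value is presented to the user.

```typescript
// Scheduling code: unknown latency is treated as zero either way.
function effectiveLatency(reported: number | null): number {
  return reported === null ? 0 : reported;
}

// Display code: the one place where "unknown" and "0" would differ.
function describeLatency(reported: number | null): string {
  return reported === null ? "unknown" : reported + " ms";
}
```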

I'd prefer these to be constructors instead of a factory

I agree, but we'll have to carefully assess whether there's a security risk in that.

@marcoscaceres
Contributor

I agree, but we'll have to carefully assess whether there's a security risk in that.

I understand the security issues you mentioned above - but those appear to be orthogonal to having a constructor or factory (maybe I'm missing something).

@jussi-kalliokoski
Member

but those appear to be orthogonal to having a constructor or factory

It comes down to whether we need to ask for permission or not, and if we do, the factory method has the permission model already set up (to get the MIDIAccess instance), whereas for global constructors there isn't one. That is, unless the constructor takes a MIDIAccess instance as an argument (in which case I'd argue that it doesn't make sense to detach it from the MIDIAccess) or we throw if a valid MIDIAccess instance hasn't been created during the current session.
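The last option (a global constructor that throws unless access has already been granted) could look roughly like this. It is a sketch only: `midiAccessGranted` and `grantAccessForSession` stand in for whatever state the browser would keep after a successful requestMIDIAccess() call, and the `VirtualMIDIInput` class is the hypothetical interface from the proposal, not an existing API.

```typescript
// Hypothetical session state: set once the user has granted MIDI access.
let midiAccessGranted = false;

function grantAccessForSession(): void {
  midiAccessGranted = true;
}

// Hypothetical global constructor that piggybacks on the existing permission.
class VirtualMIDIInput {
  readonly name: string;

  constructor(name: string) {
    if (!midiAccessGranted) {
      throw new Error("SecurityError: MIDI access has not been granted in this session");
    }
    this.name = name;
  }
}
```

The factory form avoids the hidden global state, since holding a MIDIAccess instance is itself the proof that permission was granted.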

@cwilso
Contributor Author

cwilso commented Aug 27, 2013

I'm not sure that the security/privacy model for virtual ports will be the same as for non-virtual ports, as I expect one would want to have virtual ports exposed to native software as well?

@cwilso cwilso modified the milestones: V1, V2 Sep 26, 2014
@cwilso
Contributor Author

cwilso commented Sep 26, 2014

I continue to hear demand for this from nearly every vendor I talk to.

@notator

notator commented Mar 9, 2015

#126 is pretty close to this issue, but:
1 I'm only asking for a virtual output device. :-)
2 I'm asking for one that can be loaded with custom sounds.

The situation has become more urgent than it was last year because operating systems are no longer providing GM Synths.

Whether this gets into the Web MIDI API itself or not, it would be great to have a shim.

@toyoshim
Contributor

toyoshim commented Mar 9, 2015

Just for playing back an SMF file with GM synths or your own custom synths, Web MIDI is not needed at all. Web Audio is enough for that purpose. Did you see the link I posted in #126?

The important thing here is that we need a standardized way to share software synths written in JavaScript.

@cwilso
Contributor Author

cwilso commented Mar 9, 2015

@notator The delta between virtual input and virtual output ports is nearly zero. If we do one, we should do the other.

On the other hand, for "I'm asking for 'one' that can be loaded with custom sounds" - you're asking for an IMPLEMENTATION of a virtual device, that enables custom sound loading; this isn't going to get baked into the Web MIDI API, since it's like declaring one programmable synth is the right one.

@joeberkovitz

I think this is a great idea, but it feels wide open to very different definitions of what constitutes a "virtual device", and where its implementation might live (in a browser page? in the browser itself, persistently? in native apps?).

Also there's some overlap with existing "virtual" device mechanisms for MIDI, e.g. in iOS.

And how persistent or transient is such a device? Is it only there when the page that instantiates it happens to be open?

Not to mention all the security considerations.

In short virtual devices seem cool and I'm sure vendors are asking for them (hey I want them too), but I wonder if we all know exactly what we mean when we say it, and if we mean the same thing. It feels like more research and discussion is needed to nail the use cases and make this idea definite enough to implement.

Also, Web Audio has a very similar problem to solve in terms of inter-app communication. I would hope that Web Audio and Web MIDI could wind up adopting a similar approach to abstracting sources and destinations for both audio and MIDI data.

@agoode

agoode commented Mar 10, 2015

If we solved issue #99 and then used ServiceWorkers (https://github.com/slightlyoff/ServiceWorker/blob/master/explainer.md) then it wouldn't be hard to extend the spec to allow for virtual ports and have a reasonable lifecycle story.

(At least on Linux and Mac. Windows has no standard way to do virtual ports.)

@toyoshim
Contributor

SGTM on the Worker story. I think exposing virtual ports to applications outside the web browser could be optional, available on platforms whose underlying system supports such a feature.

@notator

notator commented Mar 10, 2015

@cwilso Hi Chris, I think it would be a good idea to concentrate on virtual output devices first, then see what the answer implies for input. That's because I can see from toyoshim's link [1] that output looks pretty ripe already...

@toyoshim Hi! Thanks for the link. Great stuff! Is that your site?

There is, of course, quite a lot to discuss:

  1. Web MIDI applications send messages to real output devices in Uint8Arrays. As an application author, I don't want to have to convert the message to a string, just so that the device can do string analysis on every message I send. It would save a lot of code and processor time if the device just read the Uint8Array. That could easily be implemented in WebMidiLink, and in the existing devices, by defining a new message type (maybe "webmidi").
  2. Unfortunately, Yuuta Imaya's sf2player.js fails to load. Yes, there's even an sf2player there, that says it supports loading soundFonts!
  3. WebMidiLink uses parallel processes created by opening new tabs in Chrome. That's not ideal. Many of the devices look great, but I don't actually want/need to look at them. Also, I think I need to spawn subthreads in the device's thread, and that's not going to be easy. We need a better approach to creating threads.
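Point 1 above is about the cost of going through strings. As a sketch of what that conversion involves, the helpers below turn a Uint8Array into a comma-separated hex string and back. The "midi,90,3c,7f" shape is my understanding of the WebMidiLink convention and should be treated as an assumption, as should the helper names.

```typescript
// Encodes raw MIDI bytes as a "midi,<hex>,<hex>,..." string (format assumed, see lead-in).
function toLinkString(message: Uint8Array): string {
  const parts: string[] = ["midi"];
  message.forEach((byte) => {
    parts.push(byte.toString(16));
  });
  return parts.join(",");
}

// Parses such a string back into raw bytes; this is the per-message work a
// Uint8Array-based message type would make unnecessary.
function fromLinkString(text: string): Uint8Array {
  const parts = text.split(",");
  if (parts[0] !== "midi") {
    throw new Error("not a MIDI link message");
  }
  const bytes: number[] = [];
  parts.slice(1).forEach((hex) => {
    bytes.push(parseInt(hex, 16));
  });
  return new Uint8Array(bytes);
}
```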

To help thinking about threading, here's a use case:
Let's say I'm writing a prepared piano emulator. (That's actually not quite true, but close enough to the truth for present purposes.)
There's a (real) midi keyboard attached to my application, and I have things set up so that each key is associated with a (rather noisy) sequence of midi messages waiting to be played.
Sending a noteOn from the keyboard triggers its sequence. Sending the noteOff tells the sequence to stop. I have no control over how many keys are playing at once, or when they are depressed relative to each other. The whole point is that the keyboard is never blocked. The performer is free to just play.

I'm a beginner with web workers, but currently imagine setting up something like this before the performance begins:
The input device is in the browser's user thread.
The output device is in a SharedWorker (@agoode or ServiceWorker?). Let's call this the Marshall.
The keys' sequences run in their own (ordinary) web workers, and access the output device by sending messages to the Marshall.

If things do indeed work like that, then I'm the one who has control over the life cycle of all the threads and devices.

But can SharedWorkers or ServiceWorkers access virtual output devices? I have been unable to find out.

[1] http://www.g200kg.com/en/docs/webmidilink/

@toyoshim
Contributor

@notator It is not my site, but my friend's. He is a famous music application developer in Japan.
This is a good example of how many software synths can be developed in a short period by the community. Once he proposed the WebMidiLink idea, many people developed their own synths supporting WebMidiLink. This is the reason why I do not insist on OS-provided synths. We can develop a great synth with Web Audio, and the web community has the power to create various kinds of synths.

@notator

notator commented Mar 10, 2015

@toyoshim Ah yes, I forgot: +1 for "The important thing here is that we need a standardized way to share software synths written in JavaScript."

I think the interface that should be implemented by software synths can be more or less dictated by the powers-that-be here. This is much easier than defining an interface for hardware manufacturers. The interface should, I think, be modelled on the one for hardware synthesizers.
For starters, I think, there should be a function that returns the synth's metadata. See the properties in
http://www.g200kg.com/en/docs/webmidilink/synthlist.html

And, if it's not clear enough already, my original request from #126 is no longer on the table. :-)

@cwilso
Contributor Author

cwilso commented Mar 10, 2015

@agoode Service Worker WILL NOT fix this. You wouldn't be able to keep an AudioContext alive inside a SW; SWs are designed to come alive when needed, but not be resident/running all the time. For a soft synth, you need to wake up and be alive (when routed, usually).

Let's keep this focused: THIS issue is about creating virtual MIDI ports, input and output, that can then be used by other programs on the system while this application is resident - i.e., creating a MIDI "pipe". The other issue (#124) is for managing and referring to virtual instruments - including, presumably, how you initialize them. (ServiceWorker might be involved there, but it's not going to solve it by itself.)

Whether the MIDI API is available to Workers (#99) is relevant in that you'll probably need it in the context of whatever the initialization context is for #124, but it's also useful in more narrow contexts (e.g. I want to run my sequencer in a non-UI thread).

@cwilso
Contributor Author

cwilso commented Mar 10, 2015

Forgot to say: @joeberkovitz: Note the above. I'm trying to keep each of these issues separated, because the bedrock they detail is independently useful. This issue, for example: you could utilize a Web MIDI synth that you loaded in your browser from Ableton (since it could show up as a MIDI device in OSX). The top-of-the-heap "virtual device" spec is #124, and yes, it's heavily related to the virtual device/routing issue in Web Audio; I'd expect at the very least you'd want to be able to bundle them.

@mjwilson-google
Contributor

Yes, I think the "Security Considerations" section is trying to enumerate all possible security issues and describe how we will mitigate them. Requiring an explicit user gesture is one form of mitigation. That section needs some more work; I only did the bare minimum to satisfy the requirements for the next spec stage so far.

Having thought about this a little bit now, I think there are two main security concerns that are being brought up:

  1. Interaction with native applications. I think this is still important to consider, because most native applications weren't designed with the consideration that their MIDI inputs and outputs could be connected to the Internet. Maybe the user gesture is enough, or maybe there isn't really much of a difference after all, but I don't think we can assume this. For instance, the Windows software synthesizer crash that @cwilso referenced is a bug that went undiscovered for years until someone tried using it with Web MIDI in a particular way.
  2. Cross-domain sidechannel-type attacks. These are unique to web applications, and the basic idea is that a malicious site can manipulate the user (sometimes automatically) to perform actions on a trusted site or set of sites that allows the malicious site to learn information about the user. I'm not sure if virtual MIDI ports will change the risk of this kind of attack or not. I also don't know of an effective mitigation for this kind of attack if it does increase the risk.

I also see we have MIDIOptions.software specified: https://webaudio.github.io/web-midi-api/#dom-midioptions-software -- Chromium has not implemented this (tracking bug: https://crbug.com/502127) and I'm not sure about Firefox. This is a different but related API just for software synthesizers, so maybe it can fill this use case or be modified to fit.

@notator

notator commented Dec 29, 2023

@mjwilson-google Apropos 1. Interaction with native applications:
FYI: the original (June 2015) thread, in which I reported the Windows synthesizer crash, can be found here:
https://bugs.chromium.org/p/chromium/issues/detail?id=499279
As I understood it at the time, the synth had to be banned because it was part of the Windows operating system, and could not be reliably patched across all installations. This was really Microsoft's problem, not the Web MIDI API's.
Comment 75 in that thread says:

If we have strong evidence that MS have done a significant clean up of the code based on the issue reported by Project Zero, we might reconsider this decision.

Am I right in thinking that this Issue #45 is about defining an interface between MIDI input and output devices and operating systems? Nowadays, operating systems are regularly patched to solve security problems, so I don't see why there should be a problem in principle for OSs to have MIDI ports. Maybe the OS programmers just need a well-defined API to work with?

@rianhunter

rianhunter commented Dec 29, 2023

For instance, the Windows software synthesizer crash that @cwilso referenced is a bug that went undiscovered for years until someone tried using it with Web MIDI in a particular way.

Chromium has not implemented this

Perhaps this issue was surfaced before the opt-in browser dialog but this issue seems comparable in terms of security to a trusted native application misusing the Windows software synthesizer, either by accident or maliciously. Without the special opt-in, it definitely makes sense to me to generically block access to the software synth if it can cause system instability.


I just also want to surface something I got wrong. It seems that at least as of Chromium version 118, no opt-in permissions dialog is required to use WebMIDI unless SysEx is requested. Given my previous comments, my recommendation would be to always require permissions before WebMIDI can be used, SysEx or not. As @cwilso said there are just so many unintended interactions that could happen, even with short MIDI messages, even without virtual ports. Of course this is probably a larger discussion and the usability and security concerns should be weighed. I guess my question would be, is there anyone who thinks permission-less WebMIDI is essential? @mjwilson-google should this type of question be moved to a separate gh issue?

@notator

notator commented Dec 30, 2023

Currently, I'm a bit confused, but would very much like to understand this.

So much has happened since the original posting of this issue in January 2013, that I think it would be helpful to start a fresh one. In particular, I suspect that the meaning of "virtual port" has changed.

@rianhunter (I've now added a thumbs up to your comment above). Could you say more precisely what you are trying to achieve? A motivation and concrete use-case would be very helpful!
@cwilso said:

...this feature would (intentionally!) open up communications between arbitrary native apps as well as web domains, and it is really hard to ensure that such interactions are safe, given the unknown interactions possible.

If communications were going via a well defined API, then restrictions (e.g. no SysEx messages) could apply. As I understand it, the port could either be created by the native application itself, or be part of the operating system that is running the application. In either case, the API's restrictions could be strictly enforced by the OS. Maybe that would be a necessary requirement for the API: It would need the cooperation of the OS vendors. So they'd have to be involved in development of the API at an early stage.

@rianhunter

Could you say more precisely what you are trying to achieve? A motivation and concrete use-case would be very helpful!

Sure, that might be generally helpful. I think my use case represents a generic use case but please lmk if you disagree.

I provide a web application to users of my synthesizer that exposes all MIDI functions that my synthesizer supports. Critically, I provide an interface to upload samples to the synthesizer (which uses SysEx messages) but also every other CC, NRPN, and RPN that my synth supports for helping them quickly get up to speed. Another core feature is that the web application allows users to save "sessions" which represent every MIDI setting currently set by the web application. This allows them to quickly restore their current settings. This application intends to operate and feel like a native application, so asking the user permission to use MIDI is perfectly fine.

As my product has been around for a few years, a common use case has emerged where my users will have multiple MIDI sources connected to the synthesizer, one of which is my web application. This is a problem for users who use the web application to manage device state because if they change a CC through a separate MIDI controller then my web application is not aware of that change, so they have to key it in twice if they want to preserve that setting in the web application: once through their MIDI controller and once through the web application. My idea is to provide a virtual MIDI input port by the web application, through which they can route all of their controllers and the web application would resend those messages to the output port corresponding to the device. That way, every setting changed through a hardware controller will be reflected in the web application.
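The routing described above amounts to a pass-through that records Control Change values on the way. A minimal sketch, assuming messages arrive as plain byte arrays on the hypothetical virtual input and that `forward` wraps the real output port's send(); `ControllerMirror` is an illustrative name, not part of the application mentioned here.

```typescript
type SendFn = (data: number[]) => void;

class ControllerMirror {
  // Last seen value per "channel:controller" key.
  readonly state: { [key: string]: number } = {};
  private readonly forward: SendFn;

  constructor(forward: SendFn) {
    this.forward = forward;
  }

  // Call for every message arriving on the (hypothetical) virtual input.
  handle(data: number[]): void {
    const [status = 0, controller = 0, value = 0] = data;
    // 0xB0..0xBF are Control Change messages; the low nibble is the channel.
    if ((status & 0xf0) === 0xb0 && data.length === 3) {
      this.state[(status & 0x0f) + ":" + controller] = value;
    }
    // Everything is passed on to the device unchanged.
    this.forward(data);
  }
}
```

Saving a "session" would then be a matter of serializing `state`. NRPN and RPN tracking would need extra bookkeeping across several CC messages and is left out of this sketch.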

You can find the application in question here: https://www.supermidipak.com/app/

@mjwilson-google
Contributor

@notator

I reported the Windows synthesizer crash

Thank you, this is a famous bug for us. Yes, technically it's a Windows bug. We still have a responsibility to design web APIs that are possible to implement safely, with the understanding that OSes and other applications will often have bugs.

Am I right in thinking that this Issue #45 is about defining an interface between MIDI input and output devices and operating systems? Nowadays, operating systems are regularly patched to solve security problems, so I don't see why there should be a problem in principle for OSs to have MIDI ports. Maybe the OS programmers just need a well-defined API to work with?

I think the ultimate goal is to allow web applications to send and receive MIDI messages to and from native applications, and possibly to and from other web applications. This issue #45 is about creating virtual MIDI input and output ports associated with a web application that are visible to native applications on the same system, which may be a step in that direction. My current understanding of "virtual port" is that it shows up in the operating system as if it were any other MIDI device, and can be accessed via the Web MIDI API like any other MIDIPort, but it's actually just a software interface. But that may not be the best definition, so please feel free to consider other ideas.

I read through things in more detail and discussed with @cwilso directly, and it looks like the big potential security issue here is what the last paragraph of the second comment on this issue mentions: two different web applications could open virtual MIDI ports and have a channel to send arbitrary data to each other. This may circumvent the protections the web platform provides for certain types of data transmission. It's subtly different from existing loopback MIDI devices on the user's computer because the user wouldn't necessarily have knowledge of or control over ports created by the web applications. Even restricting to non-SysEx messages would allow arbitrary data transmission, since the data could be encoded into CC or note on / off messages. I don't think the OS could do much to help with this either, since the data would all still be valid MIDI messages.
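To illustrate why excluding SysEx does not close the channel: any byte fits into two Control Change values. The sketch below splits each byte into two nibbles and sends them on two controller numbers. The choice of controllers 20 and 21 and the function names are arbitrary assumptions; every message it produces is an ordinary, valid three-byte CC message.

```typescript
const CC_STATUS = 0xb0; // Control Change on the first channel
const HIGH_CC = 20; // arbitrary controller carrying the high nibble (assumption)
const LOW_CC = 21; // arbitrary controller carrying the low nibble (assumption)

// Encodes each payload byte as two CC messages with values 0..15.
function bytesToCC(payload: number[]): number[][] {
  const messages: number[][] = [];
  payload.forEach((byte) => {
    messages.push([CC_STATUS, HIGH_CC, (byte >> 4) & 0x0f]);
    messages.push([CC_STATUS, LOW_CC, byte & 0x0f]);
  });
  return messages;
}

// Rebuilds the payload: a byte is complete when its low nibble arrives.
function ccToBytes(messages: number[][]): number[] {
  const payload: number[] = [];
  let high = 0;
  messages.forEach((message) => {
    const [, controller = 0, value = 0] = message;
    if (controller === HIGH_CC) {
      high = value;
    } else if (controller === LOW_CC) {
      payload.push((high << 4) | value);
    }
  });
  return payload;
}
```

Nothing at the OS or driver level could distinguish this traffic from a user turning two knobs, which is the point made above.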

@rianhunter

It seems that at least as of Chromium version 118, no opt-in permissions dialog is required to use WebMIDI unless SysEx is requested.

This is something we are actively working on in Chromium, so you can consider it a Chromium bug (see https://crbug.com/1420307). The spec is clear that the user should be prompted for all access (https://webaudio.github.io/web-midi-api/#dom-navigator-requestmidiaccess), although that section should be updated a bit (see #220).

My idea is to provide a virtual MIDI input port by the web application, through which they can route all of their controllers and the web application would resend those messages to the output port corresponding to the device. That way, every setting changed through a hardware controller will be reflected in the web application.

It seems like your use case would be satisfied by an input-only port, which would also satisfy the soft-synth case. Only allowing virtual input ports feels like it would be safer to me, although at some point we should involve an actual security expert in this discussion. Of course, if we could safely implement both input and output that would be more versatile.

@rianhunter

rianhunter commented Jan 3, 2024

it looks like the big potential security issue here is what the last paragraph of the second comment on this issue mentions: two different web applications could open virtual MIDI ports and have a channel to send arbitrary data to each other.

I think this is the only WebMIDI-specific issue that has been raised in this discussion on this feature. When considering the threat model, this doesn't seem like an issue if both WebMIDI applications have gained access to WebMIDI through an explicit permission by the user. Right now WebMIDI is considered an API that only trusted applications may use, i.e. there are no untrusted use-cases for WebMIDI (even though this may have been the initial goal). If there were a mitigation for malicious use of virtual ports by applications that have already been granted trust by the user, that would then imply the existence of a third level of trust: applications which may use virtual ports unrestricted. This would make the trust-model more complex but I also think it would require justification for this extra level of threat: Why would we distrust an otherwise trusted application to specifically use virtual ports maliciously?

In any case, my suggested way to mitigate that issue is to hide WebMIDI-created virtual ports from other WebMIDI applications by default. Perhaps later, there can be an extra level of permissions requested to allow those ports to be exposed to other WebMIDI applications if there is enough demand.

Only allowing virtual input ports feels like it would be safer to me,

Just want to point out that only allowing virtual input ports isn't any safer than providing both ports when considering the inter-domain communication channel threat specifically. Was there another threat you were considering for this mitigation?

This is something we are actively working on in Chromium, so you can consider it a Chromium bug (see https://crbug.com/1420307).

That's a relief. Thanks for the reference!

@notator

notator commented Jan 3, 2024

@rianhunter said

In any case, my suggested way to mitigate that issue is to hide WebMIDI-created virtual ports from other WebMIDI applications by default. Perhaps later, there can be an extra level of permissions requested to allow those ports to be exposed to other WebMIDI applications if there is enough demand.

Completely hiding virtual ports from other WebMIDI apps seems to me to be overkill, but I agree with you that there could well be a way to solve the problem by having an extra level of permissions.

@mjwilson-google Thanks for the clarification, I think we're now on the same page. :-)
You said (my emphasis):

I read through things in more detail and discussed with @cwilso directly, and it looks like the big potential security issue here is what the last paragraph of the second comment on this issue mentions: two different web applications could open virtual MIDI ports and have a channel to send arbitrary data to each other. This may circumvent the protections the web platform provides for certain types of data transmission. It's subtly different from existing loopback MIDI devices on the user's computer because the user wouldn't necessarily have knowledge of or control over ports created by the web applications.

Could the problem be solved by requiring that ports created by web applications (=virtual ports) always ask for the user's permission, regardless of whether the request for use was coming from an OS/native application or another web application? That would elevate the permission status of virtual ports to that of the existing hardware ports.

@mjwilson-google
Contributor

@rianhunter

Why would we distrust an otherwise trusted application to specifically use virtual ports maliciously?

I'm not a security expert, but my understanding is that there is a type of side-channel attack that manipulates users to do things on trusted sites that expose information about themselves. So it's not necessarily about the application being malicious, it's the existence of the side-channel that is the security risk. I'm not sure if it's a real danger in this particular case given the permissions in place, but it's something that may come up in a security review.

my suggested way to mitigate that issue is to hide WebMIDI-created virtual ports from other WebMIDI applications by default.

That seems like it would eliminate this threat to me, too. We should probably think about how this could be implemented.
We would need a way to tell Web MIDI virtual ports apart from other MIDI ports on the system.
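One way to picture this: if the browser kept a marker on every port it created, the port list handed to a page could be filtered before enumeration. This is only a sketch; `createdByOrigin` is a hypothetical marker, and letting the creating origin see its own port is an assumption, not something decided in this thread.

```typescript
interface PortInfo {
  name: string;
  // Hypothetical marker: origin of the page that created the port,
  // or null for ports that were not created through Web MIDI.
  createdByOrigin: string | null;
}

// Ports a given web origin would be allowed to enumerate.
function portsVisibleTo(requestingOrigin: string, allPorts: PortInfo[]): PortInfo[] {
  return allPorts.filter(
    (port) => port.createdByOrigin === null || port.createdByOrigin === requestingOrigin
  );
}
```

Native applications would bypass this filter entirely, so it addresses only the web-to-web channel, not the interaction with native software.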

Just want to point out that only allowing virtual input ports isn't any safer than providing both ports when considering the inter-domain communication channel threat specifically.

Right, because the "listener" site could open its port first and the "sender" site could connect to it directly. Good point.

@notator

Could the problem be solved by requiring that ports created by web applications (=virtual ports) always ask for the user's permission, regardless of whether the request for use was coming from an OS/native application or another web application? That would elevate the permission status of virtual ports to that of the existing hardware ports.

Something like always showing a prompt every time a virtual port is created or connected to? It could make the user more aware, although we want to avoid "prompt fatigue" where users start clicking OK on everything without reading it because there are too many prompts. @cwilso also suggested a persistent indicator, although we didn't work through any details.

If we can find a similar existing API (not necessarily MIDI-related) and examine what it does, that might give us some ideas, too.

Also, just to be clear about why we're having this discussion, even if something is in the specification the browser vendors could block their own implementations due to security concerns. I think we're doing the right thing for now thinking through the possible security issues and mitigations. Once we have rough consensus here I will ask the browser vendors for opinions from their security teams. In other words, even though I'm a spec editor I'm not the final authority on if a mitigation is good enough, and my goal is to specify something that will actually get implemented safely by all the implementers.

@rianhunter

Also, just to be clear about why we're having this discussion, even if something is in the specification the browser vendors could block their own implementations due to security concerns. I think we're doing the right thing for now thinking through the possible security issues and mitigations. Once we have rough consensus here I will ask the browser vendors for opinions from their security teams. In other words, even though I'm a spec editor I'm not the final authority on if a mitigation is good enough, and my goal is to specify something that will actually get implemented safely by all the implementers.

Thanks Michael, that sounds great. I'm going to file a bug on Chromium regarding this specific issue (I'll CC you) just so there is some documented progress being tracked on that front and maybe to provoke more interested parties there. Working on my own, I may be able to have a prototype available for Windows and Linux in the next month or so. Maybe sooner if others help me out.

@rianhunter

@mjwilson-google

It seems like I'm not able to classify the issue as "Feature" or add you to the CC list but you can find the created issue here: https://bugs.chromium.org/p/chromium/issues/detail?id=1515390

@bradisbell

Hello, just wanted to share some thoughts as a regular Web MIDI and virtual MIDI port user:

On platform capability... To my knowledge, defining virtual MIDI devices is not generally available. @cwilso You mentioned the possibility of a newer Windows API that might enable this feature? I checked the docs and didn't see one, but could you please take a look? Maybe I'm missing it or am not understanding. In any case, if the platform does not support virtual MIDI devices, then this all seems moot. I agree with previous comments in that it does not make sense to ship device drivers with the browser. The only working virtual MIDI driver I'm aware of on Windows is from Tobias Erichsen. https://www.tobias-erichsen.de/software/virtualmidi.html

On security considerations around cross-domain/cross-application communication... The purpose of such a virtual port is to enable applications to communicate with each other over MIDI. Hampering this in any way defeats almost all of the usefulness of the functionality. Yes, we want web applications to be able to communicate with other web and non-web applications alike. Cross-domain MIDI communication should be possible if the users allow it to be. It is up to the users to decide what they want to do, and up to the user agent to carry out what they want.

@mjwilson-google mjwilson-google added the status: needs discussion Needs to be discussed on GitHub before proceeding label Jan 3, 2024
@cwilso
Contributor Author

cwilso commented Jan 3, 2024

@bradisbell I was thinking of the WinRT MIDI API (https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/midi). I suspect this is not applicable, but I'm a long way out from when I used to write Windows apps. :) Tobias' work is pretty much what would need to be incorporated on Windows, I think.

As for the security considerations: trust me, I understand the purpose of virtual ports, and I 100% understand how useful they would be in integration between web and native as well as web-to-web. I first started using MIDI (and programming it) in the late '80s/early '90s. At the same time, I've helped build the Web pretty much since its inception too - and we can't just make something possible because a set of users want it, when it might very negatively impact a vastly larger set of users. We will need to create a design that is bulletproof for all users, and it is going to have to pass muster with the security and privacy horizontal reviews (both Chromium ones and W3C ones, I mean). I'm just setting expectations that I doubt very much "just put it behind a permission" is going to be good enough.

@mjwilson-google
Contributor

@bradisbell

Yes, we want web applications to be able to communicate with other web and non-web applications alike.

Thank you, it's important to hear as many perspectives as possible.

Cross-domain MIDI communication should be possible if the users allow it to be. It is up to the users to decide what they want to do, and up to the user agent to carry out what they want.

I agree, and I think the key point here is "if the users allow it". I think most users wouldn't expect that allowing site A to use their MIDI devices and then allowing site B to use their MIDI devices would also allow sites A and B to send arbitrary data to each other. That is what we have to be careful about: that we don't allow more than the user intended to allow.

@bradisbell

I think most users wouldn't expect allowing site A to use their MIDI devices and then allowing site B to use their MIDI devices would also allow sites A and B to send arbitrary data between each other.

@mjwilson-google If the user allows Site A to create virtual MIDI devices, and Site B to use MIDI devices, then the user will certainly expect that Site A and Site B could communicate arbitrarily. Again, that is a key, if not the, use case: allowing interconnection between applications, web or otherwise.

If it's necessary, I see no problem with another tier of permissions. Currently, in practice we have:

  1. Basic MIDI messages (no permission required, yet)
  2. SysEx Messages (permission required)

This could be added to mitigate concerns:

  3. Virtual Devices / "Manage MIDI Devices" (permission required).

Adding a virtual MIDI device implies adding a device with full MIDI capability. MIDI devices are, without known exception, available to any application on the host that supports MIDI. Therefore, the user should not expect a virtual MIDI device to be any different or otherwise limited.

As long as we indicate to the user that they are allowing permission to the application for adding virtual MIDI devices, I don't see a problem.
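
To make the tiers concrete, here is a rough sketch of how a request could map to the permissions a user would be asked for. This is only an illustration: `sysex` is the existing option on `navigator.requestMIDIAccess()`, while the `virtualPorts` flag, the `requiredPermissions` helper, and the permission labels are hypothetical names made up for this comment and are not part of the Web MIDI specification.

```typescript
// Options a page might pass when asking for MIDI access.
// `sysex` exists in the Web MIDI API today; `virtualPorts` is a
// hypothetical flag used only to illustrate the proposed extra tier.
interface MidiAccessRequest {
  sysex?: boolean;
  virtualPorts?: boolean; // hypothetical, not in the spec
}

// Labels for the tiers listed above (hypothetical identifiers).
type MidiPermission = "basic-midi" | "sysex" | "manage-midi-devices";

// Return every permission the request would need. A user agent could
// prompt once per entry, or combine them into a single prompt.
function requiredPermissions(request: MidiAccessRequest): MidiPermission[] {
  const needed: MidiPermission[] = ["basic-midi"];
  if (request.sysex) needed.push("sysex");
  if (request.virtualPorts) needed.push("manage-midi-devices");
  return needed;
}
```

A page that only plays notes would need just the basic tier, while a page that wants to register a software synth would additionally need the "Manage MIDI Devices" tier, which is where the user would be told that other applications can talk to the new device.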

@rianhunter

@cwilso I just checked that documentation page and unfortunately it doesn't look like it lists an example showing how to create virtual MIDI ports. If this feature requires a kernel driver, then that makes things a lot more complicated. I have done Windows driver development in the past and writing a virtual MIDI driver doesn't seem too hard (as in wouldn't take longer than a year, end-to-end) but logistically would be a lot more difficult in terms of deploying with browsers. I've been in this sort of situation before and the ideal thing to do would be to lobby Microsoft to add this functionality to Windows. The odds of that working out might seem far-fetched but in comparison to getting Firefox, Chrome, and others to coordinate on the bundling, installing, and auto-upgrading of a kernel driver, it's probably more likely :) Plus, Microsoft has a stake in not incentivizing the creation of more third-party kernel drivers. I'm happy to attempt to reach out to Microsoft myself but maybe this is something better suited for @mjwilson-google. I'm also happy to coordinate with Microsoft engineers on building such a bundled first-party driver.

@bradisbell I think we are all in agreement in terms of keeping the permissions model as simple as possible while also making the API as powerful as possible. I think we have decent first-pass ideas on the table to address this issue but it would be best to get more feedback from someone who spends their working days thinking about web security.

@mjwilson-google
Contributor

@bradisbell Yes, if we add another explicit permission it should be clear. As @cwilso pointed out, the security reviews will have the final say and experience has shown that only adding a permission isn't always enough.

That said, we can definitely propose it and see what feedback we get. I will try to summarize the recent discussions:

  • It looks like there isn't any real security benefit for only allowing virtual input ports.
  • Only allowing virtual output ports would break most of the use cases.
  • There may be a security benefit for not allowing virtual ports to connect to each other, but this breaks important use cases and it's not clear how implementers would implement this.
  • We are relying on Permissions to indicate that users trust sites enough to allow virtual ports, possibly introducing a separate permission just for virtual ports.
  • If we had direct support from OS vendors we may be able to define a more secure API, but we don't have details of what that might look like yet.
  • We don't think it's a problem if web applications send weird sequences to native software. Software MIDI has been around long enough that native applications should be robust to any input. OS bugs like in the Microsoft soft synthesizer are already protected against.

I think you brought up a good point that doing this is basically installing a device on the user's system. Some of my coworkers work on other device-related web APIs, and I talked with one of them who doesn't think that there is any other web API that installs a virtual device currently, and that it could break cross-origin protections.

This might be the difficult point to understand; I don't fully understand all the details either but much of the web is built on assumptions about how different origins can communicate with each other. Anything that breaks these assumptions will receive a lot of scrutiny.

Again, this isn't to say that this is impossible. But it looks like we might be doing something new, and if we can't satisfy the security and privacy reviews we won't be able to move forward.

If the above list looks complete then I can bring it up at the next Audio Working Group meeting and ask for opinions and for the browser vendors to do a preliminary security review. Also, reminder that anyone is welcome to join the W3C audio community group and attend working group meetings (https://www.w3.org/community/audio-comgp/).

@rianhunter

I'm happy to attempt to reach out to Microsoft myself but maybe this is something better suited for @mjwilson-google. I'm also happy to coordinate with Microsoft engineers on building such a bundled first-party driver.

Thank you for your confidence in me, but realistically I don't think I can drive this effectively right now. I will bring it up at the next Audio Working Group meeting though.

@rianhunter

There may be a security benefit for not allowing virtual ports to connect to each other, but this breaks important use cases and it's not clear how implementers would implement this.

Just FYI I can think of a few ways to implement this. OSes usually provide an ID with a MIDI device, even on Windows. If not, there are application-level workarounds, e.g. adding a forced prefix to the names of devices created through Web MIDI.
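
As a rough sketch of the forced-prefix workaround, assuming the browser controls the final port name: the `PortInfo` shape, the `"[WebMIDI] "` prefix, and all function names below are hypothetical and only illustrate the idea; none of them come from the spec or from any browser implementation.

```typescript
// Minimal view of a MIDI port as the browser sees it (hypothetical shape).
interface PortInfo {
  id: string;
  name: string;
}

// Hypothetical marker the browser would force onto every port created via Web MIDI.
const WEB_MIDI_PREFIX = "[WebMIDI] ";

// Applied by the browser when a page asks to create a virtual port.
function markAsWebCreated(requestedName: string): string {
  return WEB_MIDI_PREFIX + requestedName;
}

// True if the port name carries the forced prefix.
function isWebCreated(port: PortInfo): boolean {
  return port.name.startsWith(WEB_MIDI_PREFIX);
}

// Ports exposed to a web page: web-created ports are hidden by default
// and only shown when the caller explicitly opts in.
function portsVisibleToWeb(all: PortInfo[], allowWebCreated = false): PortInfo[] {
  return all.filter((port) => allowWebCreated || !isWebCreated(port));
}
```

A name prefix is the weaker of the two options, since a native application could create a port with the same prefix. An OS-provided ID that the browser records at creation time would be harder to spoof, but the filtering step would look the same.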

Thank you for your confidence in me, but realistically I don't think I can drive this effectively right now. I will bring it up at the next Audio Working Group meeting though.

No problem. I'll see what I can do. I'll try to join the next audio working group meeting as well.

@mjwilson-google mjwilson-google added status: needs WG review Needs to be discussed with the Audio Working Group before proceeding and removed status: needs discussion Needs to be discussed on GitHub before proceeding labels Jan 4, 2024
@Psychlist1972

@cwilso I just checked that documentation page and unfortunately it doesn't look like it lists an example showing how to create virtual MIDI ports. If this feature requires a kernel driver, then that makes things a lot more complicated. I have done Windows driver development in the past and writing a virtual MIDI driver doesn't seem too hard (as in wouldn't take longer than a year, end-to-end) but logistically would be a lot more difficult in terms of deploying with browsers. I've been in this sort of situation before and the ideal thing to do would be to lobby Microsoft to add this functionality to Windows. The odds of that working out might seem far-fetched but in comparison to getting Firefox, Chrome, and others to coordinate on the bundling, installing, and auto-upgrading of a kernel driver, it's probably more likely :) Plus, Microsoft has a stake in not incentivizing the creation of more third-party kernel drivers. I'm happy to attempt to reach out to Microsoft myself but maybe this is something better suited for @mjwilson-google. I'm also happy to coordinate with Microsoft engineers on building such a bundled first-party driver.

@bradisbell I think we are all in agreement in terms of keeping the permissions model as simple as possible while also making the API as powerful as possible. I think we have decent first-pass ideas on the table to address this issue but it would be best to get more feedback from someone who spends their working days thinking about web security.

Thanks for reaching out to me on Discord.

Windows MIDI Services, which will ship in-box in the latest supported Windows 10 and 11 releases in 2024, includes a number of features you need.

Because we now have a Windows service in the middle, new transports are written as user-mode service plugins (COM components), not kernel drivers. We do want to avoid writing KS drivers for anything which doesn't need it. App-to-app MIDI is part of that, just like how the built-in diagnostics loopback endpoints are. Network MIDI 2.0 (coming), app-to-app MIDI, our diagnostics loopbacks, Bluetooth MIDI (coming), are all written as user-mode components in the service.

The project is OSS, but is mirrored internally to include in Windows builds. I'll have another release out within approximately one week (there are a few bugs in the message scheduling logic in Release 2 which block some folks) that you may want to look at.

Note that only apps using the new API will be able to create the virtual endpoints. That means that they need to be UMP-aware apps. Once created, they will likely be made available to the legacy APIs (winmm, older WinRT MIDI). We need to verify no issues there. We do recommend anyone writing new MIDI code this year use the new API completely, as it can do everything the old API can do, including talking with MIDI 1.0 devices, plus more. It also has a much faster USB implementation, auto-translation of MIDI 1.0 bytestream messages to/from a MIDI 1.0 device, etc. The API itself uses only the new Universal MIDI Packet for messaging, however.
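
For anyone who hasn't worked with UMP yet, here is a small sketch of what wrapping a MIDI 1.0 channel voice message in a Universal MIDI Packet looks like. It is not the Windows MIDI Services API; `midi1ToUmp` is a hypothetical helper that only shows the 32-bit word layout for UMP message type 0x2 (MIDI 1.0 channel voice): 4 bits of message type, 4 bits of group, then the status byte and two data bytes.

```typescript
// Pack a MIDI 1.0 channel voice message into one 32-bit UMP word.
// Layout: [type:4][group:4][status:8][data1:8][data2:8], with type = 0x2.
// For two-byte messages (program change, channel pressure) pass data2 = 0.
function midi1ToUmp(group: number, status: number, data1: number, data2: number): number {
  if (group < 0 || group > 15) {
    throw new RangeError("group must be 0..15");
  }
  if (status < 0x80 || status > 0xef) {
    throw new RangeError("status must be a channel voice status byte (0x80..0xEF)");
  }
  if (data1 < 0 || data1 > 0x7f || data2 < 0 || data2 > 0x7f) {
    throw new RangeError("data bytes must be 0..127");
  }
  const MESSAGE_TYPE_MIDI1_CHANNEL_VOICE = 0x2;
  // `>>> 0` keeps the result as an unsigned 32-bit value.
  return (
    ((MESSAGE_TYPE_MIDI1_CHANNEL_VOICE << 28) |
      (group << 24) |
      (status << 16) |
      (data1 << 8) |
      data2) >>> 0
  );
}

// Note On, channel 1 (status 0x90), middle C (60), velocity 100, group 0.
const noteOnWord = midi1ToUmp(0, 0x90, 60, 100);
console.log(noteOnWord.toString(16)); // prints "20903c64"
```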

https://aka.ms/midirepo
https://aka.ms/mididiscord

PS: @cwilso nice to see you around :)

@Psychlist1972

Psychlist1972 commented Jan 4, 2024

BTW, while considering any new features for Web MIDI, you may want to consider MIDI 2.0 as well. Our new API in Windows is MIDI 2.0-centered, and Apple, Linux (ALSA), and Android also have MIDI 2.0 support now. It's taking us longer on Windows because we've completely rewritten MIDI from the ground up to support all this.

In the MIDI Association (I am the chair of the executive board), there have been some folks interested in Web MIDI 2.0, but no takers yet for working with the W3C to formalize it.

@mjwilson-google
Contributor

BTW, while considering any new features for Web MIDI, you may want to consider MIDI 2.0 as well.

Yes, we have an issue for that here: #211. I am trying to get the current specification to Recommendation status first. I am in support of this: I think if we can get MIDI 2.0 on the web that will help drive adoption, and it's good to know that the platform support is there.

@mjwilson-google
Contributor

Quick update: the next WG meeting is tentatively scheduled for January 31, 2024. I already put this on the schedule.

@mjwilson-google
Contributor

Another update: the WG/CG meeting has actually been scheduled for January 25 at 09:00 Pacific time:
https://www.w3.org/groups/wg/audio/calendar/

@mjwilson-google
Contributor

Conclusion from WG meeting today:

@mjwilson-google mjwilson-google self-assigned this Jan 25, 2024
@mjwilson-google mjwilson-google added status: blocked Another issue or external dependency needs to be resolved before proceeding and removed status: needs WG review Needs to be discussed with the Audio Working Group before proceeding labels Jan 25, 2024