Browser Music: Google’s Chris Wilson

Chris Wilson has been involved with web browsers since 1993, when he co-authored the original Windows version of NCSA Mosaic. From 1995 to 2010 he worked on Internet Explorer at Microsoft. He has since moved to Google, where he develops Chrome and leads the Web Audio API team. The MBJ spoke to him recently at the A3e conference in Boston.

MBJ: Why are you currently involved with music?

Chris Wilson: I graduated in computer science and engineering, but I was a huge synthesizer fan when I was a teenager and played quite a bit. I kept it up into my thirties and had a band, which broke up. My wife and I decided to start having kids almost right after that, and any free time I've had since then I spend with them!

I have been working with web browsers my entire career, which started at the University of Illinois. I worked at the National Center for Supercomputing Applications (NCSA) and got involved in NCSA Mosaic; I co-wrote the Windows version. These were very early browsers. Then I moved to Microsoft in Seattle and helped develop Internet Explorer. About four years ago I was hired by Google, but due to a non-compete clause in my contract I had to take a year off from browsers.

I moved into the developer team for Chrome after that and was looking for something to do when I ran across the Web Audio API (Application Programming Interface). Chris Rogers was the lead engineer on it, and he had designed the API to do all kinds of really powerful audio work simply. With any other audio API, you had to study a book on Digital Signal Processing (DSP) and process every bit of output yourself. I liked the idea of scheduling sounds like oscillators, and of setting up a biquad filter with a cutoff frequency and a Q factor. Putting all those plugins together in a web browser made perfect sense to me.
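
For a sense of what that looks like in practice, here is a minimal sketch of the kind of node graph Wilson describes, using the standard Web Audio API in browser TypeScript: an oscillator routed through a biquad lowpass filter with a cutoff frequency and a Q, scheduled to start at a precise time. The waveform, frequencies, and timings are illustrative choices, not values from the interview.

```ts
// Minimal sketch: oscillator -> biquad lowpass filter -> speakers,
// with sample-accurate scheduling. Values are illustrative only.
const ctx = new AudioContext();

const osc = ctx.createOscillator();
osc.type = "sawtooth";
osc.frequency.value = 220;       // A3

const filter = ctx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.value = 1200;   // cutoff frequency in Hz
filter.Q.value = 8;              // resonance (Q factor)

// Wire the nodes together like plugins.
osc.connect(filter).connect(ctx.destination);

// Schedule a precise start and stop relative to the audio clock.
osc.start(ctx.currentTime + 1);
osc.stop(ctx.currentTime + 3);
```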

Chris was a really hardcore engineer, and there were parts of the API that weren't easy or transparent for developers and end users to use. As I was representing the perspective of developers, I was the perfect foil for him. With Chris, I learned a tremendous amount about DSP and how the Web Audio API could work, so when he later left Google, I took up the slack in the Web Audio Working Group. I then started standardizing the API and pushing it further inside Google. I do other jobs too, but the development of the Web Audio API is definitely my biggest passion.

MBJ: Why is the development of a Web Audio API important to musicians?

CW: Because it will allow a kind of long-distance collaboration between musicians that we don't have today because of all the hardware and software incompatibilities. Think about collaborations in a web browser that go from inception to actual creation, and the potential for community building. Imagine a digital audio workstation that lets you do your work and immediately hit publish so that other people can poke into your production. There are things that need to be fixed and features that need to be added, and we're discovering the challenges as we go along.

MBJ: What are those challenges?   

CW: Collaboration in music is very natural; we react to sound instantaneously. Let's take WiFi and TCP/IP (Transmission Control Protocol/Internet Protocol, the core protocols of the Internet). When a group of laptops uses the same WiFi network to play music, they send data packets on request. Each computer asks, "Did my packet get there?" When packets happen to collide with one another, the system waits some random period of time and then tries again. If only a few data packets are sent this is not a problem, but with many packets the collisions escalate; now imagine adding video to the mix.

The problem with WiFi is that it cannot use a scalable protocol. It has built-in latency, up to about six milliseconds of latency for every packet. If I'm using my WiFi-connected laptop at one end, and I'm trying to collaborate with my buddy at the other end, also on WiFi, we'll have twelve milliseconds of latency just on account of the WiFi network. Latency happens for other reasons too. In fact, getting below fifty milliseconds of latency across a WiFi network today is pretty much impossible because of the restrictions of the TCP/IP protocol.
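
The send-and-retry behavior Wilson describes can be sketched in a few lines. This is a toy illustration of the idea only; the real backoff happens inside the network stack, and `sendPacket` is a hypothetical placeholder, not an actual API.

```ts
// Toy sketch of retry-with-random-backoff, the pattern Wilson describes
// for colliding WiFi packets. Every retry adds waiting time, which is why
// latency escalates as traffic grows. `sendPacket` is a hypothetical
// placeholder; real backoff is handled by the network stack, not app code.
async function sendWithBackoff(
  sendPacket: () => Promise<boolean>, // resolves true if the packet got there
  maxAttempts = 5
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await sendPacket()) return true;                 // "did my packet get there?"
    const waitMs = Math.random() * (2 ** attempt) * 10;  // random, growing wait
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
  return false;
}
```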

So musical collaboration online in real time will evolve slowly.  A truly live musical collaboration would need a new network. 4G LTE connections have much lower latency and could be a solution, but they are very expensive.

In the meantime, you will see a lot more Google Docs-style collaboration, where maybe you and your friend are working on the same Ableton Live tracks, and as long as you're producing audio on your system and she's producing audio on her system, you both send control messages and don't have to worry about latency. From a latency point of view, control messages are a lot easier to handle than audio. Gobbler, a new program, makes this kind of hand-off possible with an Ableton Live pack: I can work on the music for a couple of hours and then hand it over to my buddy. JAM with Chrome is trying that too. However, doing something like jazz drumming and pushing it live, in real time, to someone who is not in the room is really hard. It probably won't happen soon.
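
To make the control-messages-instead-of-audio idea concrete, here is a rough sketch of one way it could look: each side renders audio locally and only small note events travel over the network. The event shape and the server URL are invented for illustration; they are not part of any product mentioned above.

```ts
// Sketch: collaborating by exchanging small control messages instead of
// streaming audio. Each side synthesizes sound locally, so network latency
// only affects when an event arrives, not the audio quality.
interface NoteEvent {
  kind: "note-on" | "note-off";
  pitch: number;     // MIDI note number
  velocity: number;  // 0-127
  time: number;      // sender's timestamp in ms
}

// Hypothetical collaboration server URL, for illustration only.
const socket = new WebSocket("wss://example.com/session/42");

function sendNote(ev: NoteEvent): void {
  socket.send(JSON.stringify(ev)); // a few dozen bytes instead of an audio stream
}

socket.onmessage = (msg) => {
  const ev: NoteEvent = JSON.parse(msg.data);
  // Render the incoming note on the local synth here (not shown).
};
```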

MBJ: Can we expect the development of a new network soon?  

CW: Unfortunately, nothing else on the web has the same network needs as live music collaboration, which makes it difficult to advocate for. Who is going to want to replace the existing network infrastructure just to make live music collaboration possible? For video, you can easily push 1080p (Full HD) through our current pipelines. In gaming, frame rates are 60 frames per second, which means you have almost 17 milliseconds between frames. At that rate you can lag several frames behind your buddy before it gets really obnoxious.

However, there are systems being built inside academia that could help, but I don’t think they are going to get deployed in homes any time soon. You might see it happen with Google Fiber, which is 100 times faster than the average broadband connection. But the last ten feet is problematic—unless you plug your system into the wall to get rid of the WiFi hop.

MBJ: Wouldn’t that new network need a new business model?

CW: Yes. In the US, people get really frustrated when they hear about heavy data users getting throttled by networks. But if the alternative is someone trying to push five HD video streams through the pipe at the same time, sucking up all the bandwidth and impacting the entire neighborhood, then throttling is not so bad. We tend to wildly prefer unlimited data plans and don't like metered use. In Europe they're used to paying by the byte, and that changes how people think about the network. Indeed, paying by the byte would be better for live music collaboration online.

MBJ: Can you go a bit more in depth into the Web Audio API?

CW: At the time Chris Rogers started developing Web Audio, audio on the web was awful. You could play back an audio file, but you couldn't tell precisely when it would start, because the loading and decoding of the audio were wrapped together. Also, triggering audio events from JavaScript produced jitter, depending on what else was going on.

Building a sound editor was difficult too, for a variety of reasons. The maximum number of audio tracks you could have was five, so people who were trying to build, say, a platform-jumper game had to manage those five playing audio instances and swap between them. You had the ability to position sounds precisely, but the audio quality was very poor.

Web Audio was an attempt to build a pro audio platform, so that you would have precise, sample-accurate access to audio playback. You could hook in and do processing, you could get filtering and things like that built into the system, and you could also do audio analysis and visualization, because one of the first things you always want to do is visualize whatever you're hearing.
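
One way to do the analysis and visualization he mentions is the API's AnalyserNode. A small sketch follows, assuming you already have some source node to tap; the oscillator here is just a stand-in.

```ts
// Sketch: tapping a signal for visualization with an AnalyserNode.
// `source` stands in for whatever audio source you already have.
const ctx = new AudioContext();
const source: AudioNode = ctx.createOscillator(); // placeholder source

const analyser = ctx.createAnalyser();
analyser.fftSize = 2048;

source.connect(analyser);
analyser.connect(ctx.destination); // the analyser is a pass-through tap

const bins = new Uint8Array(analyser.frequencyBinCount);

function draw(): void {
  analyser.getByteFrequencyData(bins); // current spectrum, 0-255 per bin
  // ...paint `bins` to a canvas here...
  requestAnimationFrame(draw);
}
draw();
```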

We solved the encoding and decoding problem by giving developers access to a decoder. Not a fabulous one, but it worked: you can load and decode your own audio and then play it back precisely when you need it. You can have many simultaneous overlapping sounds, beyond the ear's ability to actually detect them. The software is doing the math, so there are no hardware limitations. With Web Audio you can build a 128-voice synthesizer and never hear voices drop. Then there's a routing and effects pipeline, and hooks to visualize, analyze, encode, and record.
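
The decode-then-schedule flow he describes looks roughly like this with the standard API: decode a file once up front, then play it back at sample-accurate times, with each overlapping playback getting its own lightweight source node. The file name and the 1/8-note grid are placeholders.

```ts
// Sketch: load and decode once, then schedule precise, overlapping playback.
// Assumes an ES module context (top-level await); "clap.wav" is a placeholder.
const ctx = new AudioContext();

async function loadSound(url: string): Promise<AudioBuffer> {
  const response = await fetch(url);
  const encoded = await response.arrayBuffer();
  return ctx.decodeAudioData(encoded); // decode once, reuse many times
}

function playAt(buffer: AudioBuffer, when: number): void {
  // Each playback gets its own source node, so overlapping instances are
  // cheap; the mixing happens in software.
  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);
  src.start(when);
}

const clap = await loadSound("clap.wav");
for (let i = 0; i < 16; i++) {
  playAt(clap, ctx.currentTime + i * 0.125); // a precise eighth-note grid
}
```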

At its core, this is a great, easy-to-use platform for building sound in games. One of my co-workers was recently preparing a talk about building HTML5 games for Google I/O, Google's annual conference for software developers. He came to me two weeks before the event and told me he needed a sound manager for his game and that he knew he should be using the Web Audio API. So I took a look at his project, and twenty minutes later came up with a sound manager. Doing anything from panning to 3D sound effects is quite easy in the Web Audio API. As a matter of fact, I always tell people that there is no framework you need to learn first. Just use Web Audio, because it's easier to figure out how the pieces plug together, and from there you can tackle other stuff.
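
A minimal game sound manager in the spirit of the one he describes might look like the sketch below: preloaded buffers, one-line playback, and simple stereo panning. The class name, structure, and sound files are illustrative assumptions, not his actual code.

```ts
// Sketch of a small game sound manager: preload buffers, play on demand,
// pan with a StereoPannerNode. Names and file URLs are placeholders.
class SoundManager {
  private ctx = new AudioContext();
  private buffers = new Map<string, AudioBuffer>();

  async load(name: string, url: string): Promise<void> {
    const data = await (await fetch(url)).arrayBuffer();
    this.buffers.set(name, await this.ctx.decodeAudioData(data));
  }

  // pan: -1 (hard left) .. 0 (center) .. 1 (hard right)
  play(name: string, pan = 0): void {
    const buffer = this.buffers.get(name);
    if (!buffer) return;
    const src = this.ctx.createBufferSource();
    src.buffer = buffer;
    const panner = this.ctx.createStereoPanner();
    panner.pan.value = pan;
    src.connect(panner).connect(this.ctx.destination);
    src.start();
  }
}

// Usage (in a module; URLs are placeholders):
const sounds = new SoundManager();
await sounds.load("jump", "jump.wav");
sounds.play("jump", -0.5); // jump sound, panned slightly left
```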

MBJ: What is your take on Music AI (Artificial Intelligence)?

CW: I don't think that anybody has ever figured out how to make a computer truly creative. For example, even today, coming up with a lead track or a unique riff is not something we really consider a computer task. Computers may help our creativity and assist us. Band-in-a-Box helped me when I was using it to fill in the parts I didn't really care to come up with myself. "Give me something random and I'll see if I like it": that idea has been around, in my time, since the first arpeggiators.
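
In that "give me something random" spirit, here is a toy random arpeggiator built on Web Audio scheduling. The scale, tempo, and envelope are arbitrary choices for illustration, not anything Wilson specifies.

```ts
// Toy random arpeggiator: pick notes at random from a scale and schedule
// short oscillator blips on a fixed grid. All musical choices are arbitrary.
const ctx = new AudioContext();
const scale = [60, 62, 64, 67, 69, 72]; // a pentatonic-ish set of MIDI notes

const midiToHz = (n: number) => 440 * 2 ** ((n - 69) / 12);

function scheduleArpeggio(startTime: number, steps = 16, stepDur = 0.125): void {
  for (let i = 0; i < steps; i++) {
    const note = scale[Math.floor(Math.random() * scale.length)];
    const t = startTime + i * stepDur;

    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    osc.frequency.value = midiToHz(note);
    gain.gain.setValueAtTime(0.3, t);
    gain.gain.exponentialRampToValueAtTime(0.001, t + stepDur); // quick decay

    osc.connect(gain).connect(ctx.destination);
    osc.start(t);
    osc.stop(t + stepDur);
  }
}

scheduleArpeggio(ctx.currentTime + 0.1);
```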

Also, the idea that computers will replace musicians is a bit far-fetched. People feared that drum machines were going to replace drummers and that organs were going to replace chamber musicians, but none of that happened. Compared to the 1700s, the average person today likely listens to more music throughout the day, and there are many people making music, not just professionals.

When you look at some of the music services like Pandora or Spotify, they are effectively using AI to figure out which tracks to play you next. These programs can ideally sense your mood in context and make recommendations. You can track listeners' habits and do sound analytics well, but humans must be in the room at some point too.

MBJ: What role has open licensing of software had on music/audio development?

CW: It's one of the things I find most powerful about working for Google and doing what I do. Pretty much everything I build is open source. The bar for this software is quite high, and development depends on sharing ideas and educating. I believe that open source software has been tremendously beneficial to the industry, particularly in the music space over the last five years. I believe it can coexist with closed source software, and I have nothing against making money as a developer. But I particularly enjoy letting people learn from what we're doing. That is what our Web Audio API is all about.

Beyond that, I have pet favorites. There are hundreds of convolution impulse response files available for free on the Internet today that you can mix into your own recordings. You can find some guy who recorded the reverb of a cathedral in the middle of Germany and made it available on the web for everyone else.
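
Applying such a downloaded impulse response is what the Web Audio ConvolverNode is for. A short sketch follows; the IR file name is a placeholder, and the gain node stands in for whatever source you want to run through the reverb.

```ts
// Sketch: load an impulse response (e.g. a cathedral reverb) and apply it
// with a ConvolverNode. "cathedral-ir.wav" is a placeholder file name.
const ctx = new AudioContext();

async function makeReverb(irUrl: string): Promise<ConvolverNode> {
  const encoded = await (await fetch(irUrl)).arrayBuffer();
  const convolver = ctx.createConvolver();
  convolver.buffer = await ctx.decodeAudioData(encoded);
  return convolver;
}

const reverb = await makeReverb("cathedral-ir.wav");
const dryInput: AudioNode = ctx.createGain(); // stand-in for your source
dryInput.connect(reverb).connect(ctx.destination);
```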

By Griffin Davis and William Kiendl
