#fronteers14

Shwetank Dixit - WebRTC: A front-end perspective

Fronteers 2014 | Amsterdam, October 9, 2014

WebRTC gives us a way to do real-time, peer-to-peer communication on the web. In this talk, we'll go over the current state of WebRTC (both the awesome parts and the parts which need to be improved) as well as what could come in the future. Mostly though, we'll take a look at how to combine WebRTC with other web technologies to create great experiences on the front-end for real-time, p2p web apps.

Slides

Transcript

--and give a warm welcome to Shwetank Dixit.

[APPLAUSE] Hi everyone.

My name is Shwetank.

This talk is going to be about WebRTC, but more from a front-end perspective.

We'll see what the latest parts of WebRTC are, what the good things are, what needs to be improved, and what's coming up in the future. And we'll get a sense of the overall state of WebRTC.

First a little bit about me.

My name is Shwetank. I work in developer relations at Opera, and I've been keeping a close eye on WebRTC right from the days when we just had one part of WebRTC, called getUserMedia.

We at Opera also made a bunch of demos with getUserMedia back in the day on a site called shinydemos.com.

You can check it out.

And a little bit more information about me.

I grew up mostly in India, and when I was making this talk, a few memories came flooding in of my first experiences of using some kind of technology for communication.

And I remember these things.

Back in the early '90s, at least in my country, we didn't have landline phones everywhere.

So my family, for example, didn't.

And we had to go to these places to pay a lot of money to make a long distance call to a few of our relatives.

And I remember my mother saying, OK, hurry up, hurry up.

And those are some of my cherished childhood memories, you could say.

One more memory that I had concerned something like this.

How many people over here have a cellphone with some kind of lock screen? Like you enter a code or a gesture or something? I encountered stuff like this back in my childhood.

In case you don't know what it is, it opens up like this.

And this is because when people finally started getting some landline phones, they were still a little bit new.

And the children in the house were like, yay, we have a landline phone.

And they started to make crank calls and dial random numbers and talk to people, without getting the concept that it actually cost money.

So at the end of the month, the parents would get the phone bill.

And they would be like whoa.

They used to install some kind of lock (these were actual physical phone locks) to prevent the children from dialing unless one of the parents was in the house.

So the reason why I wanted to mention these things is because I want you to just see how far we've come.

And part of the reason we've come so far is the World Wide Web.

It completely changed communications forever.

It changed the way in which we communicated, the speed and frequency in which we communicated, and the cost at which we communicated as well.

But something even-- actually, in some ways almost equally revolutionary happened, at least in developing countries, which is the emergence of mobile phones.

Previously, there were people in parts of the world which were very remote, and they had no way of having an online connection.

But now with mobile phones, they can have it and not even think about landline anymore.

So these things have been really revolutionary, when it comes to communication.

And the question arises, what's going to be the next thing? Maybe it's not going to be revolutionary, but at the same time it's going to be really, really awesome.

And that's WebRTC.

Why? Because it's real-time, peer-to-peer communication.

And the thing is, I haven't really seen a lot of front-end developers really getting into it and talking about it.

But whenever you have someone who really takes a look at WebRTC, you see that they are extremely excited.

I've seen very few web people talk about it.

And that's why I wanted to talk about it, from the perspective of a web guy.

Because as I said, not a lot of people talk about it.

There was a survey asking: what's the biggest barrier to WebRTC adoption?

If the number one answer had been lack of browser support, that would have been fine, because with every emerging technology, there are some browsers that support it first and some later on.

But the number one answer over here was lack of awareness.

So how many people over here have built stuff with WebRTC? OK, very few.

That's what I'm talking about.

WebRTC is for web developers.

Real-time communication technologies have existed for a long, long time.

But these have always been for native app developers and all the telecommunication guys.

But this is the first time that we, as web developers, can make something peer to peer, real time on websites.

So this is really, really awesome.

So let's take a look at the various parts of WebRTC.

The first is MediaStream.

The second is RTCPeerConnection.

And the third is RTCDataChannel.

These are the three big pillars of WebRTC.

And MediaStream is pretty much just getUserMedia.

And we'll take a look at getUserMedia, because this is one of the things that we, as front-end developers, will be using quite a lot.

So when it comes to webcam access and the output, generally speaking, sites don't really do that much.

There's a site called Sqwiggle, which uses some kind of filters and effects to create some kind of fun output.

But generally speaking, it's just a cookie cutter rectangle, and that's it.

But you can do a lot.

Generally speaking, a lot of people have this misconception that to do any kind of manipulation on getUserMedia, you have to put it into a canvas and then do something.

But that's not true.

You can actually use CSS with getUserMedia to do a lot of cool effects.

So I just made a small demo to illustrate this.

Let's see.

So as you can see over here, you have the normal camera output.

It's fine.

We can apply CSS filters over here pretty easily, without using any kind of canvas or whatever.

You can use 3D transforms to change this around.

You can use CSS masks to have a round cut, or you can even use border radius.

Blend modes, you can do like this.

And this is stuff that you had to use Photoshop for.

And now you can do it on real-time video using just CSS.

I haven't seen a lot of people give attention to it.

But I think it's something that we should also explore.

It's really, really nice.
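To make that concrete, here's a minimal sketch of applying those effects from script; the element id is an assumption, and the stream is assumed to come from getUserMedia.

```js
// A minimal sketch: styling a live camera stream with plain CSS, no canvas.
// Assumes a <video id="cam"> element already playing a getUserMedia stream.
const video = document.getElementById('cam');

video.style.filter = 'sepia(1) contrast(1.2)';                 // CSS filters
video.style.transform = 'perspective(600px) rotateY(25deg)';   // 3D transform
video.style.borderRadius = '50%';                              // round cut, like a mask
video.style.mixBlendMode = 'multiply';                         // blend mode
```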

So let's go down.

One often ignored part of getUserMedia is the sound, the microphone access.

So you can actually hook up the sound too, and hook it up with the Web Audio API to do some really cool stuff.

People have done guitar tuners and that kind of stuff.

You can also do sound powered actions and navigations.

And of course, at first I wanted, as a demo, to make something which is just a clap-based interface.

You clap once, it does something.

Clap twice, it does something else.

But I thought that would be a little bit too cliche.

Let's do something a little bit more fun.

I once again got back to nostalgia.

In the '90s, I used to watch two sitcoms.

One was Seinfeld.

The other was Friends.

How many people over here watched Friends? So maybe you'll get this.

I hope it works.

Pivot.

Pivot.

[APPLAUSE] So I kind of cheated over here.

I wasn't actually using speech processing.

I was pretty much just analyzing the volume, and if it's above a certain volume, then it uses 2D transforms to rotate it.

And once it goes over 360 degrees, it replaces Ross' image with Chandler.
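A rough sketch of that volume trick, assuming getUserMedia already succeeded with { audio: true } and handed us `stream`; the element id and the threshold are made up.

```js
// Rotate an element whenever the mic volume crosses a threshold.
const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
const source = audioCtx.createMediaStreamSource(stream); // `stream` from getUserMedia
const analyser = audioCtx.createAnalyser();
source.connect(analyser);

const data = new Uint8Array(analyser.frequencyBinCount);
const image = document.getElementById('ross');           // hypothetical demo element
let rotation = 0;

(function checkVolume() {
  analyser.getByteFrequencyData(data);
  const avg = data.reduce((sum, v) => sum + v, 0) / data.length;
  if (avg > 60) {                                        // arbitrary threshold; tune it
    rotation += 10;
    image.style.transform = `rotate(${rotation}deg)`;    // the 2D transform from the demo
  }
  requestAnimationFrame(checkVolume);
})();
```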

So you can do a lot of stuff using getUserMedia mic access.

There's a project called Pocketsphinx.js, which uses the normal Web Audio API to do some rudimentary speech processing.

You should check it out.

It's still early days, but it's something worth checking out.

How many people know about the Trojan room coffee pot? A few.

So for the people who don't, this is the story of the world's first webcam, since we're talking about webcams and stuff like that.

Back in 1991, I think, the early '90s, there were a few computer programmers at the University of Cambridge who really liked their coffee.

So much so that regularly during the day, they used to go up and check the coffee pot.

If it was full, they would get the coffee.

If it wasn't, they had to go back empty-handed.

Of course, this was a problem which needed to be solved with computer science.

So what they did was, they wrote a client and server program and hooked up a camera to watch the coffee pot live, 24/7, with a 128 by 128 gray-scale image.

So whenever the coffee pot was full, it's only then that they would get up and go and take the coffee.

So the world's first webcam was actually a surveillance monitor to watch coffee, which I found really amazing.

So of course, you can make surveillance stuff as well using getUserMedia.

People have made stuff like security cams, baby monitors.

People have made stuff like emotion recognition libraries, in which you can smile and it detects it.

If you frown, it detects that.

People have made stuff like hotspot recognition.

For example, if you wave your hand in one direction, it does something.

If you wave your hand in the other direction, it does something.

And once again, I made a small thing, which isn't complete yet.

But I think it demonstrates the concept of hotspots.

So let's hope this works.

So you have a keyboard.

I call this the airboard.

Let's see.

I hope this works.

[APPLAUSE] Go back.

Where's the mouse? So yeah, you can do all kinds of crazy stuff like that.

And of course, it has some accessibility applications, and applications in some games as well.

You can do this.

Now with getUserMedia, let's take a deeper look.

Does everyone over here know how to use getUserMedia? You'd use navigator.getUserMedia, with prefixes of course.
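For reference, the callback-based form of the time looks roughly like this (a sketch, with the video element assumed):

```js
// The callback-based API of the time, normalized across vendor prefixes.
navigator.getUserMedia = navigator.getUserMedia ||
                         navigator.webkitGetUserMedia ||
                         navigator.mozGetUserMedia;

navigator.getUserMedia(
  { video: true, audio: true },
  function success(stream) {
    // An exception thrown in here is NOT caught by the error callback below;
    // you'd need your own try-catch, which is the icky part described next.
    document.querySelector('video').src = URL.createObjectURL(stream);
  },
  function error(err) {
    console.error('getUserMedia failed:', err); // only MediaStream errors land here
  }
);
```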

Now, about errors. If there's an error, OK, that's fine. But what if there's an exception? Generally speaking, the error callback just handles MediaStream errors.

That's it.

But if you go into the success function, and then some kind of exception happens, then you have to do a try-catch block within the success function.

And that's a little bit icky.

I haven't seen a lot of people do this.

So in the future, what's going to happen is we're shifting getUserMedia to navigator.mediaDevices.getUserMedia.

This change is still being made at the spec level.

But it's pretty much-- consensus has been built around this.

And this is in recognition of the fact that computers can have multiple media devices.

You can have multiple cameras.

You can have multiple sound outputs or mics.

And you need a way to detect which one you want to use.

So you'll also have a function called enumerateDevices, which will list all the output devices or input devices you could use.

And of course, getUserMedia will be on the mediaDevices object.

But it's also going to be promise based.

I mean, that's the discussion right now.

And almost everyone is convinced that, yes this is the way to go.

So this is good.

Pretty much every API which has come up in the last year is using promises, and it makes sense for us to use promises as well.
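A sketch of that promise-based shape, as it was being discussed (and has since shipped):

```js
// List the available inputs/outputs, then grab a stream, promise-style.
navigator.mediaDevices.enumerateDevices()
  .then(devices => devices.forEach(d => console.log(d.kind, d.label)));

navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(stream => {
    document.querySelector('video').srcObject = stream;
  })
  .catch(err => console.error(err)); // errors AND exceptions both land here
```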

So this is good.

There's still debate.

There's a few people who are against this.

There's some people who are for this.

Mainly, however, the discussion is about what to do with navigator.getUserMedia.

So do we keep the same callback-based approach? Or do we have an approach in which we just shelve navigator.getUserMedia but keep navigator.mozGetUserMedia and navigator.webkitGetUserMedia, with the prefixes? My opinion, though, is that since it's already behind prefixes, everyone knows that it's open to change.

Let's not go with the callback based approach.

Let's move to promises.

And let's keep navigator.mozGetUserMedia and navigator.webkitGetUserMedia for people who want to do callbacks.

But overall, we want people to use navigator.mediaDevices.getUserMedia with promises.

I think that's the way to go.

If you want to join the discussion, you can comment on the mailing list.

These things are still being discussed.

There are other concerns with getUserMedia as well.

You cannot really use getUserMedia in low light.

You cannot use zoom.

You cannot use focus.

You cannot use the flash.

These are major concerns.

Paul Kinlan went a whole week just using web apps.

And he wrote about it.

And he wrote a lot of stuff.

This is what he said about getUserMedia.

It doesn't offer any kind of advanced features, like focus, flash, or zoom.

It's clearly been designed just for P2P applications like web chat, rather than a dedicated camera experience, which puts us way below native applications when it comes to making something as simple as a camera application.

So this is something which we, in the web platform, need to improve.

Fortunately there is a spec going on.

Work is being done, which could provide that.

It's called the MediaStream Image Capture spec.

And it provides all these kinds of things: red-eye reduction, exposure, brightness, even zoom, which is pretty cool.

One important thing in this spec is the takePhoto function.

So right now what's happening is if you want to take a photograph using getUserMedia, it's kind of like a hack.

It's not exactly a photograph.

It's just a frame grab.

You have video.

And you just capture a frame.

That's it.
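The frame-grab hack looks roughly like this, assuming a playing `video` element:

```js
// Not a real photo: just copy the current video frame into a canvas.
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
canvas.getContext('2d').drawImage(video, 0, 0);
const snapshot = canvas.toDataURL('image/png'); // capped at the stream's resolution
```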

So takePhoto is a little bit different, because you can specify: I want to take an actual photograph with this kind of sharpness, this kind of zoom, this kind of exposure.

So I think this is something which is really, really needed.

And the sooner it gets implemented in browsers, the happier I'll be.
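Based on the Image Capture draft, a takePhoto call would look something like this sketch; the exact names may shift while the spec is in flux.

```js
// Sketch of takePhoto() from the MediaStream Image Capture draft.
const track = stream.getVideoTracks()[0];         // `stream` from getUserMedia
const imageCapture = new ImageCapture(track);

imageCapture.takePhoto({ redEyeReduction: true }) // a real photo, not a frame grab
  .then(blob => {
    document.querySelector('img').src = URL.createObjectURL(blob);
  })
  .catch(err => console.error('takePhoto failed:', err));
```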

But this isn't enough.

Even in the spec right now, there isn't any support for focus and flash.

And if you make a camera application, no matter how advanced it is, if you can't even focus or use the flash, it's going to be useless.

So I filed a bunch of bugs and talked to the people on the mailing list about having a provision like this.

And if you agree, I'd request you guys to do the same.

One more thing with asking for mic and camera access is the fact that users don't seem to notice the permissions dialogue in a lot of these browsers.

So if you go to any kind of site which uses getUserMedia right now, any kind of big site which does any kind of video chat, you'll find stuff like this.

You'll find this big red arrow, pointing upwards, saying hey, notice this thing over here.

Click over here.

In GoToMeeting Free, for example, in Firefox, they actually have an entire illustration like this, asking the person to click.

So this is something which is a problem.

Yeah, and appear.in once again.

And this is in Opera.

But this is further complicated by the fact that permission styles are different in different browsers and in different platforms.

So Chrome on desktop might have the permissions dialogue on the top, but Chrome on mobile might have the permissions dialogue at the bottom.

Opera on mobile has the permissions dialogue in the middle, and Firefox has it at the top.

So it's very difficult to test all the time, and try to see, OK where the permissions dialogue is in which browser, in which platform, and where to point the user to say, hey click over here.

And if you don't do that, then they might not even notice it.

So this is something of an issue right now.

I hope we, as browsers, get a consensus and do a better job of it.

One thing which helps, though, is HTTPS, because with HTTPS, the browser remembers the permission, which means that the user only has to come once and click Yes.

The next time the person comes to your site, the browser will not ask for permission for enabling the camera.

One more thing, which is being discussed, and this is a little bit-- there isn't clear consensus right now.

But one of the things on the mailing list, which is being discussed right now is making getUserMedia HTTPS only.

And this is because it's a highly sensitive API.

So any kind of man in the middle attack could be there and compromise the security.

We have seen a lot of specifications in the recent past moving to being HTTPS only.

For example, service workers is one of the most widely known.

And in my personal opinion I think it makes sense.

We have to place the users above the short-term discomfort of developers.

So what do you think? I will be happy to know your views as well.

So this was about taking a good hard look at getUserMedia.

What are the problems? What are the great things? What are the things which are coming? But the other part of WebRTC is RTCPeerConnection.

And it's pretty much the heart of WebRTC.

But as a web developer who has no background in telecommunications, reading about this, watching the tutorials on YouTube, and reading the spec is a little bit discomforting.

Because sometimes people assume that you are from a telecommunications background.

They sometimes have terminologies related to that.

So as a web developer you feel kind of like this.

You know, what the hell? And the thing is, you're inundated with so many acronyms.

There's so many acronyms in WebRTC.

In fact, there are so many acronyms in WebRTC that a person actually took the time to create something like WebRTCglossary.com, which lists all the acronyms that you encounter in WebRTC and what they mean.

So this is actually pretty helpful.

I would advise you to check it out.

So I just wanted to briefly explain how this works, in words that web developers would understand.

So the first thing, when you want to make a connection to the other person using WebRTC, you cannot just make a connection like this.

There are firewalls in between.

There are NATs in between.

So even before you try to make a connection, you need a way to punch through these firewalls, to punch through these NATs.

And this is because this is a big problem nowadays.

So you have a protocol called ICE, Interactive Connectivity Establishment.

And what this ICE protocol says is you can use something called a STUN server.

Once again, an acronym.

And what it does is something very simple, which is it just tells you what's your public IP address and port.

That's it.

And generally, about 85% of the time, it works.

But about 15% of the time, it doesn't.

And in those cases, you have to use something called a TURN server.

A TURN server is just a relay server.

So what it does is it just takes stuff from one side to the other and vice versa.

So in this case, the connection won't be exactly P2P, because you'll be using a server in between.

But at least the connection will be made.

And if you specify in your options a STUN server as well as a TURN server, then it'll of course try STUN first, so that you make a P2P connection.

If it fails, then it'll fall back to TURN.

And if you're wondering where you can find more information about all these STUN servers and TURN servers: there's a project called the free ICE project, which lists all the freely available STUN servers that you could use in your project right now.
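In code, specifying both looks something like this sketch; the URLs and credentials are placeholders, and browsers at the time still needed prefixed constructors.

```js
// STUN is tried first for a true P2P route; TURN is the relayed fallback.
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.org' },  // placeholder STUN server
    {
      urls: 'turn:turn.example.org',    // placeholder TURN relay
      username: 'user',
      credential: 'secret'
    }
  ]
});
```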

But the thing is, this was about punching through NATs and firewalls.

But how do you actually make a connection? And this is something, which I have struggled with.

Because a lot of people who explain this explain it in the terminology of someone from a telecommunications background who already knows this stuff.

So I thought that maybe I can explain a little bit differently.

So WebRTC doesn't really say what kind of protocol or service you should use.

You can use WebSockets, XHR, whatever.

But the main thing is to get a certain piece of information from one side to the other, so that afterwards you can make a proper connection.

So what happens is you use a central server.

You have to use it once so that it exchanges information.

And once the P2P connection is made, the server just steps away.

It just plays the role of a matchmaker.

So what happens is in WebRTC, there's a concept of a caller and a callee.

So the caller makes something called a session description.

And it doesn't matter what's said over here.

Basically, it's information about your audio and video codecs and what kind of quality they support.

That's it.

So you set the local session description, and you create something called an offer based on it and send it to the server.

The server receives the offer, and sends it to the other person.

The other person receives this offer, sets it as its remote description, then creates its own session description, which has information about its own codecs and so on.

And it sends the answer back.

So the first person now receives the answer and sets it as the remote description.

Both sides have the remote and the local description set.

Now we can begin the process of the connectivity checks, punching through firewalls and that kind of stuff.

And afterwards, we get something called ICE candidates, which we can use to create a P2P connection.

So this is, in a nutshell, how signaling works in WebRTC.
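A sketch of that dance in code, using the promise forms of the API; sendToServer() is a hypothetical stand-in for whatever signaling channel you pick.

```js
const pc = new RTCPeerConnection(); // each side creates its own

// Caller: create an offer, set it as the local description, ship it off.
pc.createOffer()
  .then(offer => pc.setLocalDescription(offer))
  .then(() => sendToServer({ type: 'offer', sdp: pc.localDescription }));

// Callee: on receiving the offer, set it as the remote description and answer.
function onOfferReceived(offer) {
  pc.setRemoteDescription(offer)
    .then(() => pc.createAnswer())
    .then(answer => pc.setLocalDescription(answer))
    .then(() => sendToServer({ type: 'answer', sdp: pc.localDescription }));
}

// Caller: on receiving the answer, both sides now have local + remote set.
function onAnswerReceived(answer) {
  pc.setRemoteDescription(answer);
}

// ICE candidates from the connectivity checks travel over the same channel.
pc.onicecandidate = e => { if (e.candidate) sendToServer(e.candidate); };
```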

And if you think that this is a little bit too complicated, then there are libraries that you could use as well.

Stuff like SimpleWebRTC, PeerJS, and rtc.io.

But one of the things that you have to realize, and that I also found with WebRTC, is that in a normal WebRTC peer-to-peer connection, things will not scale well.

For example, if you have just two people in a call, that's fine.

If you have three people in a call, that's going to be somewhat of a problem.

If it's four, then it's an even bigger problem, because each and every WebRTC connection is encrypted, which means that the more peers you add to a call, the more streams you have to encrypt in real time.

So after a while, it becomes too much load on the CPU, especially if you're on a mobile connection.

So for example, appear.in has a limit of eight people, and even then it kind of struggles.

The best thing would be around four people.

But CPU is not the only thing you have to be concerned about.

WebRTC also uses VP8 for its video codec-- at least in the browsers which support it right now-- and the Opus codec for audio.

So if you want to send, say, a 720p video, it's going to take about one to two Mbps of bandwidth.

So for a typical HD call, you have to be prepared for sending about 2.5 Mbps of bandwidth upstream.

And that's going to be somewhat of a problem because a lot of network connections cannot really handle this.

So in those cases, I think it's better to err on the side of caution and actually send a lower resolution video rather than a higher one.
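Erring on the side of caution can be as simple as asking for a modest resolution up front. A sketch, using the current constraint syntax rather than the mandatory/optional style of the time:

```js
// Ask for 640x360 rather than the full HD the uplink may not sustain.
navigator.mediaDevices.getUserMedia({
  audio: true,
  video: { width: { ideal: 640 }, height: { ideal: 360 } }
}).then(stream => {
  // attach the lower-bitrate stream to the peer connection
});
```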

So this is about bandwidth.

One more thing about WebRTC and signaling and dealing with all these things as a developer is debugging.

And this is somewhat disappointing right now.

When you make a WebRTC call, you can go to webrtc-internals in Opera or Chrome, and you get a bunch of useful information about the call.

You also have something called the getStats method, which is basically just a dump of the entire call.

But these things are a little bit hard to work with.
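For what it's worth, the spec's promise form of that dump looks like this sketch (browsers at the time shipped a callback variant):

```js
// `pc` is the RTCPeerConnection for the call being debugged.
pc.getStats().then(report => {
  report.forEach(stat => console.log(stat.type, stat)); // e.g. 'outbound-rtp'
});
```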

What we really need right now, and what's really missing from the platform, is proper developer tools-- tooling in the actual browser developer tools to deal with and analyze WebRTC properly.

And this is something which I would really like to see.

One more thing is that when you're dealing with WebRTC as a developer, no matter what you do, you're at the mercy of the network.

No matter how well you test your application, in the end, your users in the wild, they'll have their own network connections, which might be crappy, which might be good.

They might be on a mobile phone, which might be crappy, which might be good.

And dealing and debugging is going to be somewhat of a pain.

So this is about debugging.

The third leg of WebRTC is data channels.

And of course, it's about delivering data, whether it's text or binary data.

And the best way to look at data channels is to compare and contrast it with WebSockets.

So data channels provide a very high performance, very low latency P2P connection to the other person.

Now even with something as fast as WebSockets, you have a server in between.

So there is some time spent sending data to the server, and for the server to send it to the other side.

But with WebRTC data channels, it's P2P.

So you know, it's less latency.

If you work with WebSockets, you'll feel right at home with data channels.

It works pretty much exactly the same way, almost by design.

You want people who are familiar with WebSockets to feel really at home when it comes to data channels.

So you have the same event handlers, the same functions, and that's good.
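A minimal sketch of that WebSocket-shaped API:

```js
const pc = new RTCPeerConnection();
const channel = pc.createDataChannel('chat'); // deliberately WebSocket-shaped

channel.onopen    = () => channel.send('hello'); // same send() as a WebSocket
channel.onmessage = e  => console.log(e.data);   // same message event
channel.onclose   = () => console.log('channel closed');
```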

But the really cool thing about data channels is that you can make it work exactly like WebSockets, like exactly the same, but you don't have to.

And this is where things get interesting.

Because data channels actually use a different protocol stack.

They use SCTP as the primary transfer protocol, just like you have TCP or UDP.

It runs over DTLS, which is used for encryption, and in the end it goes over UDP.

Nowhere over here, you'll notice, is TCP mentioned.

TCP is used for WebSockets.

But data channels don't have to abide by those rules.

TCP is ordered and reliable in terms of packet transmission.

But you don't have to do that with data channels.

So you have a few options in data channels to configure it almost exactly as you want it.

And this is really, really cool for stuff like games.

So you can make it such that you know, it works exactly like TCP.

You can just say ordered true and reliable true, and that's it.

But if you want, you can make it work exactly like UDP.

You can just say, I don't want it ordered, and I don't want any kind of retransmissions or retries if a packet is missed.

And you can also have something partially reliable, so you can say: I don't want it fully reliable, but partially reliable, in the sense that if a packet is missed, it tries five times before giving up, or tries for three seconds before giving up.
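As a sketch, using the option names in the current spec (early implementations spelled some of these differently):

```js
const pc = new RTCPeerConnection();

// Reliable and ordered: behaves like TCP.
pc.createDataChannel('tcp-like', { ordered: true });

// Unordered, no retries: fire and forget, like UDP.
pc.createDataChannel('udp-like', { ordered: false, maxRetransmits: 0 });

// Partially reliable: retry up to five times, then give up...
pc.createDataChannel('retry-5', { ordered: false, maxRetransmits: 5 });

// ...or retry for three seconds, then give up. (maxRetransmits and
// maxPacketLifeTime are mutually exclusive on a single channel.)
pc.createDataChannel('retry-3s', { ordered: false, maxPacketLifeTime: 3000 });
```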

And this is where you have really fine grained control over the kind of data that you send and how it's sent.

So this is really, really cool.

And this kind of differentiates it from WebSockets.

In the wild you have projects like Peerflix, PeerCDN, WebTorrent.

And these are the people who are making them. The surprising thing for me about WebRTC is the fact that when it was just being made, people were always talking about video chat and that kind of stuff.

But some of the most innovative stuff, some of the most groundbreaking stuff, has actually happened on the data channel side.

Now we have Peerflix, which is like streaming torrents.

You have Feross, who made PeerCDN, which is a peer-to-peer CDN.

Amazingly cool.

So data channels are something which you should really, really explore.

Now we come to browser support.

So when it comes to browser support: Chrome, Opera, and Firefox support it.

But when it comes to Apple, we don't know anything.

So I just expect that a few years from now, Apple will be like, yeah, now we support WebRTC.

And half of the world will be like, huh? And the other half of the world will be like, I knew it.

So let's see what happens over there.

On the other hand, you have Microsoft.

And Microsoft has been a little bit more open about its plans.

They have a site on which people can request features.

WebRTC is the fourth most requested feature over there.

If you look at their platform status site, you'll see that Media Capture and Streams is in development, which means that getUserMedia is probably coming soon.

But when it comes to the WebRTC spec itself, making the actual connection and that kind of stuff, that probably is not going to happen for the current version, WebRTC 1.0.

They are showing good signs, positive signs, for something called the ORTC spec, which is going to be WebRTC 1.1, you could say.

So what is that? Back when WebRTC was being standardized, there was a group of people who said that they were not truly happy with it.

And they made a community group called the ORTC community group.

And they worked on a kind of a spec on their own.

Later on, they discussed this thing with the original WebRTC group.

And a lot of the stuff is now being integrated.

So you could say that ORTC is kind of like WebRTC 1.1, rather than a competing specification.

And one of the highlights of WebRTC 1.1, or ORTC, is the fact that you don't need this SDP exchange which happens right now.

This will make things-- should make things a little bit less complicated.

At the same time, it offers a little bit more advanced features, a bit more low level stuff.

So you can do stuff like simulcast and all the things that you can't really do right now with WebRTC.

A little bit more advanced stuff, and low level stuff.

And just as an outside observer, I've seen people talk about compatibility, making sure that if you implement ORTC, you don't break existing WebRTC apps.

So what work is being done over there? People are talking about making some kind of shim library, so that existing WebRTC applications can use that and not break.

But at the same time, people who want to use ORTC, or WebRTC 1.1, can dig in and use the more low-level stuff, to have more power in their applications.

In the end, though, the end user doesn't really care about what technologies are being used.

You could use any kind of technology, but the experience has to be good.

And this is where WebRTC provides you with more options.

The kind of power that you have with WebRTC, you didn't have before.

I just remember, I was talking at the beginning of the presentation about calling my grandfather by going to these PCO booths.

And just recently I made an application-- like a test application-- and I put it up on my private server.

And I asked one of my relatives, who lives with my grandfather, to just go to the site using his mobile phone.

And he did.

And I had an actual conversation.

And I could see my grandfather's face.

And that was just so cool, trying to see and trying to think, oh you know how far we've come.

I made something in an afternoon, and I could see my grandfather's face.

And I could do it for free.

So this is something which is really, really awesome.

So WebRTC provides you with the kind of stuff that wasn't available before.

For example, with appear.in, the amount of money spent on server costs was really, really low.

I think it was about $300 to $400 for the entire year.

That's amazingly cheap, because the bandwidth isn't really paid by them if it's a peer-to-peer connection.

So it just lowers the barrier to entry when it comes to making really, really kick-ass, world-class communication services.

So people have already made peer-to-peer CDNs.

There's going to be someone who's going to make the next peer to peer social network, probably.

There's going to be someone who makes the next peer to peer video sharing site.

The possibilities are just up to you.

So I hope that you go ahead and play with WebRTC.

And thank you for listening.

[APPLAUSE] Come and join me in my lounge.

We can do a few questions.

All right.

So my experience of WebRTC, I've had a lot of issues with reliability of the connections and stuff.

And is this related to some of the real-time problems that we were hearing about earlier, or am I just bad at code? No, it is related to the kind of problems that were talked about earlier in this talk.

A lot of the stuff that WebRTC does, it kind of hides the amount of complexity which is required in making a peer to peer connection.

There's a lot of stuff that the browser does under the hood that is just not exposed to web developers because we don't want them to know.

But getting the connection, and for that to be a peer-to-peer connection, is hard, especially behind firewalls and NATs.

So sometimes it's real work.

But sometimes you have to use some other techniques.

So we've seen protesters in Hong Kong recently create a mesh network for communication.

Is that something that WebRTC could do? Could we do this with the web? Could we build that kind of network? Yeah, you could.

You don't necessarily have to use a server which is located somewhere else.

You can use any kind of thing.

So yeah, it is pretty much possible to do this.

There's also a project called Named WebSockets, which one of my colleagues, Rich Tibbett is working on.

And what it does is create a way to do totally local connections and use WebRTC to do it.

And it also has a provision for web applications to talk with other things, like native applications using Bonjour and stuff like this.

So this is a really, really cool project.

I'll ask you all to check it out.

Named WebSockets, it's on GitHub.

What do you need in terms of centralization in order to make that work? If you wanted to create a mesh network, would you just have to set up STUN and TURN on your small network and you're good to go? I'm not really sure about this.

But with the Named WebSockets thing, it is completely possible to do it because it creates a local service.

And then you can use that local service on the local network.

And it has automatic peer discovery as well.

So I think that would be a better way to do the stuff that you're talking about.

If I'm using a TURN server, that's one where the full data is going through another server, right? What are the security implications of that? Yeah, so a TURN server's only job is to relay stuff.

So there is no decryption happening at the TURN server.

And if you want to use a TURN server, there's an open source one called, I think, rfc5766-turn-server that you could use.

So it's open source.

You can see how it works.

But in general, a TURN server's job is just to relay stuff to the other side.

So it's not actually possible for it to do decryption? Decryption, no.

That's really cool.

What about a STUN server? I mean that's just negotiating the IPs.

But is there a security issue there? Is the-- I mean, are you kind of leaking information about calls being made? Yeah so you have to expose some kind of information, which is like your public IP.

But you know, that's needed, right? So what can you do? That is needed.

If you're under a firewall and you have like a private IP address and a public IP address, you need some way to negotiate that kind of stuff.

So is there anything stopping us building a true mobile phone app, just in the browsers? Or have we got everything we need to do that today? I didn't get exactly what you're talking about.

So if I wanted to recreate the dialer app on my phone, but just in the web, assuming that I'm just going to call people on the web, are we there with that? Is everything in the browser? There's actually a library called sipML5, which you can use to dial people, actually dial people, and make phone calls and stuff like this.

So there are a number of projects going on, in which you can hook up WebRTC with conventional communication technologies to do some really cool stuff.

You can hook it up with SIP.

You can hook up with XMPP, whatever.

So in that case, there's some kind of server that's acting as a WebRTC client.

I guess it's dealing with the old telephone network.

Yeah, dealing with the signaling and everything.

So are there libraries already that it acts as a WebRTC client, but on the server? Does that stuff exist? There are libraries like what I talked about, in which they can make phone calls on your behalf.

You just connect it through.

So those things are possible right now.

When we're dealing with some of your demos, we're processing video, and I guess that processing was also happening on the main thread? Was that using Canvas? It was using Canvas.

And Canvas, as you know, doing something in real time with the camera is expensive.

So you can use stuff like web workers and shared workers, and stuff like that.

I didn't use it right now, but it is a good idea to do it.

I guess in that case, you would be-- because you would still receive the frame in the main thread.

But you'd pass it off and then pass it back.

It was a suggestion that came from the audience.

What would be really interesting is, would there be a way-- I guess it's a little bit like service workers-- to have the information arrive in another thread, where you can maybe transform it at that stage before you send it on.

I don't know if that's been thought about before.

I don't know.

I just thought of it off the top of my head. I just wondered if it had been thought about.

Is there a three.js equivalent for RTC? Is there a library that sort of takes out that really low-level stuff, into something that's going to be easier to deal with? Yeah, so when it comes to the signaling part, you have stuff like rtc.io or SimpleWebRTC.

If you just want to make a proper video chat, you can do that using that.

When it comes to the computer vision stuff, like emotion recognition and that kind of thing, there's a library called clmtrackr, I think, with which you can do emotion recognition.

There's a library called jsfeat, with which you can detect different features in computer vision and do stuff with it, like basic edge detection and that kind of thing.

So there's a number of libraries on a number of facets of WebRTC, whether it's computer vision, whether it's signaling, whether it's just normal video chat.

If we're having problems with WebRTC adoption-- you showed the graph there-- if this move is going to happen to force it to be over HTTPS only, that's going to hurt adoption a lot more, isn't it? Yeah, but in the end-- my opinion, by the way, changed.

Initially, I was against having getUserMedia as HTTPS only.

But then Anne van Kesteren made some really good points, which is that you can't really defend against a man-in-the-middle attack.

And getUserMedia's such a privacy sensitive thing.

Someone getting access to your camera, that is something which is pretty hardcore.

So you have to take all kinds of steps for it.

And I think with all of the different moves that are happening in web standards right now, with service workers going HTTPS only and a few others, I think this is the right time to make it HTTPS only.

But this is just my opinion.

Other people might disagree.

We had exactly the same discussion with the service worker stuff.

Like I really was fighting for it to be, no, we want to be able to use caches on HTTP.

We want this to be.

But yeah, when you can see the kinds of attacks people can do, there's no-- it's a pretty big bug that AppCache is over HTTP.

In fact, maybe HTTP in general is just a bug, right.

It should've been secure from the very start.

So Microsoft are considering it.

They're starting to do stuff.

Are they cautious with this stuff because of their work with Skype, a product they own? I'm not a Microsoft guy, so I can't say.

But as an outside observer, just looking at the stuff which is happening with ORTC and the new stuff being discussed, they're pretty active.

They're taking a pretty good look.

They have some good points in discussions, as well.

So I think they're taking a pretty serious look, when it comes to WebRTC and having it in the browser eventually.

So at least they're taking a serious look.

We don't know about the other browser.

Looking at some of the API changes-- the change from navigator.getUserMedia to navigator.mediaDevices.getUserMedia-- how about navigator.mediaDevices.get? Can we just have .get?

Can we make this decision right now? Let's do it.

That's brilliant, sorted, OK.

This is how standards groups should operate, just two people petrified on stage.

Well I'm petrified.

Don't know about you.

The image capture API, that stuff looked really interesting.

What about video capture? It seems like we want an equivalent to do that as well.

There's an API-- I didn't have time-- but there's an API called the MediaStream Recorder API.

I haven't, to be honest, taken a good hard look at it.

But there is something called MediaStream Recorder, in which there is a provision for recording stuff as well.

And of course, there is the HTML Media Capture spec, or something like this, in which you overload the input element, so that if you're on a mobile phone, for example, you say the input type is equal to something-- I think, camera or something.

There's a whole article on Dev.Opera that you can take a look at.

And what it does is it overloads the input element.

So if you click on it, it opens up the native camera application.

And then you can do whatever you want.

You can use the flash, or you can use zoom or whatever.
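That declarative route is the HTML Media Capture spec; as a sketch, wiring it up from script looks like this:

```js
// Clicking this input on a mobile browser hands off to the native camera app.
const input = document.createElement('input');
input.type = 'file';
input.setAttribute('accept', 'image/*');
input.setAttribute('capture', 'environment'); // hint: use the rear camera
document.body.appendChild(input);
```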

But it's just ceding control to the native application.

And as web developers, we don't want to cede control to a native application to take pictures.

We want very, very low level control on what kind of brightness, what kind of zoom we have, using JavaScript.

We don't want to just give control away like this.

Well, I suppose it makes sense to have both.

We definitely need the low level stuff but.

If I wanted to build something for a site, just upload your avatar, or click here to take a photo, I guess-- In those cases, it's fine.

But in some other cases, like if you make a dedicated camera application, you want all of these things in JavaScript.

That's my opinion.

So when you're having an RTC call with many people at once, does the TURN server act like a multicast? Or is that the simulcast thing you mentioned? Simulcast-- I'm not-- I regularly get confused between simulcast and multicast.

So don't ask me about this.

But yeah, in general, a TURN server's only job is to just send it to the other side.

And that's it.

And the more people there are in the call, the more activity has to be done by the TURN server.

If you've got five calls going through a TURN server, you end up with five streams, potentially coming from different TURN servers, I guess.

There's also a solution called an MCU that you could use, which is like a big fat really, really powerful server in between that you could use to offload this kind of stuff to that particular node.

That's more like what Google are doing, with the hangout stuff.

They're probably using an MCU.

That makes sense.

Or some other, more powerful stuff.

You mentioned improving debugging.

Are there any plans there? What do you think these debugging tools might look like? What do we need? We need-- well some of the stuff is, Google is doing, as you know-- I don't actually.

In the WebRTC internally as well, they just have their own prefix things that they use for measuring different called parameters.

So we need this inside the developer tools itself, rather than going to a particular page, or using a particular API, I think.

If it's like one more tab in the developer tools, I think that'll be really cool.

That's excellent stuff.

That was absolutely brilliant.

Ladies and gentlemen, Shwetank Dixit.

[APPLAUSE]
