Some time ago I wanted to stream a video from my webcam to someone
else. Of course there are a lot of online services that allow you
to do that. But I wanted to set up my own server to have full
control of the video data. While searching for some open source
software I
discovered WebRTC. WebRTC is
short for Web Real-Time Communication and is an official standard by
W3C and IETF. That means all you need to access your webcam is a
modern browser (so no Internet Explorer) and some HTML+JavaScript.
The communication in WebRTC is meant to be peer-to-peer. All of that
sounded so great that I wanted to dive into it. Here are my
lessons learned.
To get started with WebRTC I found the samples at
https://webrtc.github.io/samples/ pretty useful, especially the ones
under getUserMedia. They show really nicely how to access your
webcam via JavaScript.
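For instance, a minimal sketch (just requesting camera and
microphone access and showing the local stream, without any real
error handling) could look like this:

    // Ask for camera and microphone access and show the local stream
    // in a <video> element (which should have the autoplay attribute).
    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            document.querySelector('video').srcObject = stream;
        })
        .catch(error => console.error('Could not access the webcam', error));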
Accessing the camera is one thing; getting the data to the other
party is another. When searching for how to do this you will hear
the term signaling server. A server? What about peer-to-peer? Well,
it turns out you need a server to find your peer, but once the peer
is found you don't need the server any more. So how does it work?
Both parties need to create an RTCPeerConnection. Party A uses it to
create an offer. The offer needs to get to Party B. Using websockets
is the easiest way to do this, because it is better when the server
tells Party B that there is an offer than when Party B has to guess
when to fetch the offer from the server. Party B uses the offer to
set its remote description and then creates an answer. The answer
needs to get to Party A. As you see, it also helps when Party A uses
websockets. Party A uses the answer from Party B to set its remote
description. The connection isn't established yet, though. Both
RTCPeerConnections emit icecandidate events. ICE is short for
Interactive Connectivity Establishment. The candidate data in each
event needs to be sent to the other party and added to its candidate
list.
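A rough sketch of this exchange could look like the following.
sendViaWebsocket and onWebsocketMessage are placeholders for
whatever the signaling server connection provides, not part of the
standard:

    const pc = new RTCPeerConnection();

    // Exchange ICE candidates with the other party as they show up.
    pc.onicecandidate = event => {
        if (event.candidate) {
            sendViaWebsocket({ type: 'candidate', candidate: event.candidate });
        }
    };

    // Party A: create the offer and send it to Party B.
    function callPartyB() {
        pc.createOffer()
            .then(offer => pc.setLocalDescription(offer))
            .then(() => sendViaWebsocket({ type: 'offer', sdp: pc.localDescription }));
    }

    // Both parties: react to messages from the signaling server.
    function onWebsocketMessage(message) {
        if (message.type === 'offer') {         // Party B got the offer
            pc.setRemoteDescription(message.sdp)
                .then(() => pc.createAnswer())
                .then(answer => pc.setLocalDescription(answer))
                .then(() => sendViaWebsocket({ type: 'answer', sdp: pc.localDescription }));
        } else if (message.type === 'answer') { // Party A got the answer
            pc.setRemoteDescription(message.sdp);
        } else if (message.type === 'candidate') {
            pc.addIceCandidate(message.candidate);
        }
    }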
Congratulations, now you've got your connection. But to exchange
audio/video data, the media tracks must be added to the connection
by the streaming party. The receiving party needs to listen on the
connection for that. Once an event tells the receiving party that a
track has been added, it must set this track as the source of an
audio/video element (or anything else that can play audio/video
data).
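In code this could look roughly like the following, with pc being
the RTCPeerConnection from the sketch above:

    // Streaming party: add the webcam tracks to the connection.
    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            stream.getTracks().forEach(track => pc.addTrack(track, stream));
        });

    // Receiving party: play whatever tracks arrive.
    pc.ontrack = event => {
        document.querySelector('video').srcObject = event.streams[0];
    };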
I have created an example app. You can get it from https://github.com/de-jluger/WebRTCSignalingSample.
Please note that I have tested it only with computers in my local
network. As far as I've researched, it will not work when one party
is behind NAT (network address translation) or similar techniques.
For that you need additional servers that support STUN (Session
Traversal Utilities for NAT) and TURN (Traversal Using Relays around
NAT). So at least three servers for a peer-to-peer standard? Yep.
Oh, and I didn't find a way to easily scale the system to more
participants, except for having every member in a video session
connect to the newly joined member.
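For completeness, this is roughly how such servers would be
configured on the RTCPeerConnection; the URLs and credentials here
are placeholders, not real servers:

    const pc = new RTCPeerConnection({
        iceServers: [
            { urls: 'stun:stun.example.org:3478' },
            { urls: 'turn:turn.example.org:3478', username: 'user', credential: 'secret' },
        ],
    });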
So I looked around the WebRTC standard and found out that you can
use it to record video. Yes you can record a video with just your
browser. Of course the video length is limited because it is stored
in memory. This wasn't a problem for my purpose. I wanted to record
only very small chunks that are then transferred to the other
participants.
The object to record videos is MediaRecorder. It takes a stream as
constructor argument. Such a stream is provided by getUserMedia
(via a Promise you have to resolve). The start method of the
MediaRecorder takes a time in milliseconds: the smaller the time,
the smaller the recorded chunks. Keeping the chunk size small is
important, or else there will be a delay that annoys the user.
Unfortunately there is a lower limit of around 50 milliseconds;
below that the value is ignored and a much longer default is used.
The chunks are delivered as events to the ondataavailable event
handler.
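A minimal sketch of the recording side could look like this;
sendToReceivers is a placeholder for the websocket transport
described below:

    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            const recorder = new MediaRecorder(stream);
            recorder.ondataavailable = event => {
                // event.data is a Blob with roughly 100ms of video
                sendToReceivers(event.data);
            };
            recorder.start(100); // chunk length in milliseconds
        });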
To get the chunks to the receivers I chose to use a websocket
connection. Unfortunately websockets don't like binary data and they
have a limit of 64K for the messages. To make things worse, the
built-in base64 encoding doesn't work well with the video binary
data. To solve all of this I created my own Base64 en-/decoder and
split one video chunk into several data chunks.
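The splitting part could be sketched roughly like this; encodeBase64
stands in for the custom encoder mentioned above, the real code is
in the repository:

    const MAX_PIECE_SIZE = 48 * 1024; // stay safely below the 64K limit

    function sendVideoChunk(socket, blob, chunkId) {
        blob.arrayBuffer().then(buffer => {
            const bytes = new Uint8Array(buffer);
            const pieceCount = Math.ceil(bytes.length / MAX_PIECE_SIZE);
            for (let i = 0; i < pieceCount; i++) {
                const piece = bytes.subarray(i * MAX_PIECE_SIZE, (i + 1) * MAX_PIECE_SIZE);
                socket.send(JSON.stringify({
                    chunkId,
                    pieceIndex: i,
                    pieceCount,
                    data: encodeBase64(piece), // hypothetical custom encoder
                }));
            }
        });
    }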
On the receiving side you need to create a MediaSource. Once you
receive a sourceopen event you can call addSourceBuffer to get a
sourceBuffer. When calling addSourceBuffer you specify the codecs.
While Firefox is pretty relaxed about the values (but doesn't allow
everything), Chromium is pretty strict. The sourceBuffer accepts
video chunks, but only while it is not updating. This results in two
queues: one collects the data chunks until one video chunk is
complete, the other stores complete video chunks while the
sourceBuffer is updating. Ignoring the updating flag and directly
appending any complete video chunk results in seemingly
unpredictable stalls of the video stream.
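A rough sketch of the receiving side, showing the updating check and
the second queue (the reassembly of data chunks into video chunks is
left out, and the codec string is just an example):

    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    const queue = []; // video chunks waiting while the SourceBuffer updates
    let sourceBuffer = null;

    mediaSource.addEventListener('sourceopen', () => {
        sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp8,opus"');
        // When an append has finished, feed the next queued chunk.
        sourceBuffer.addEventListener('updateend', () => {
            if (queue.length > 0) {
                sourceBuffer.appendBuffer(queue.shift());
            }
        });
    });

    // Called once all data chunks of one video chunk have arrived.
    function onVideoChunk(arrayBuffer) {
        if (sourceBuffer && !sourceBuffer.updating && queue.length === 0) {
            sourceBuffer.appendBuffer(arrayBuffer);
        } else {
            queue.push(arrayBuffer);
        }
    }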
To see an example of this go to https://github.com/de-jluger/MediaRecorderStreamingSample.
Please note that it doesn't work very well when streamer and
receiver use different browsers, as they seem to use different
encodings which aren't 100% compatible. I've tested the code mostly
with Firefox.
Also note that you can't join an existing video conference with this
scheme, as the video encoding uses a container format that has some
important information at the start of the stream. You would need
re-encoding on the server to solve this, which I haven't implemented.
As a last note I want to say that WebRTC offers way more than what
I've presented. For example, you can use your desktop instead of
your webcam as video source (a small sketch of that follows below).
Please take a look for yourself at the standard.
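As a sketch of the desktop idea: getDisplayMedia (from the related
Screen Capture spec) returns a stream that can be used just like the
one from getUserMedia:

    navigator.mediaDevices.getDisplayMedia({ video: true })
        .then(stream => {
            document.querySelector('video').srcObject = stream;
        });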