Some time ago I wanted to stream a video from my webcam to someone
else. Of course there are a lot of online services that allow you
to do that. But I wanted to set up my own server to have full
control of the video data. While searching for some open source
software I
discovered WebRTC. WebRTC is
short for Web Real-Time Communication and is an official standard by
W3C and IETF. That means all you need to access your webcam is a
modern browser (so no Internet Explorer) and some HTML+JavaScript.
The communication in WebRTC is meant to be peer-to-peer. All of that
sounded so great that I wanted to dive into it. Here are my
lessons learned.
To get started with WebRTC I found the samples at
https://webrtc.github.io/samples/ pretty useful, especially the ones
under getUserMedia. They show really nicely how to access your
webcam via JavaScript.
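For instance, a minimal sketch (just requesting camera and
microphone access and showing the local stream, without any real
error handling) could look like this:

    // Ask for camera and microphone access and show the local stream
    // in a <video> element (which should have the autoplay attribute).
    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            document.querySelector('video').srcObject = stream;
        })
        .catch(error => console.error('Could not access the webcam', error));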
Accessing the camera is one thing; getting the data to the other
party is another. When searching for how to do this you will hear
the term signaling server. A server? What about peer-to-peer? Well,
it turns out you need a server to find your peer, but once the peer
is found you don't need the server any more. So how does it work?
Both parties need to create an RTCPeerConnection. Party A uses it to
create an offer. The offer needs to get to Party B. Using websockets
is the easiest way to do this, because it is better when the server
tells Party B that there is an offer than when Party B has to guess
when to fetch the offer from the server. Party B uses the offer to
set its remote description and then creates an answer. The answer
needs to get to Party A. As you see, it also helps when Party A uses
websockets. Party A uses the answer from Party B to set its remote
description. The connection isn't established yet, though. Both
RTCPeerConnections emit icecandidate events. ICE is short for
Interactive Connectivity Establishment. The candidate data in each
event needs to be sent to the other party and added to its candidate
list.
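A rough sketch of this exchange could look like the following.
sendViaWebsocket and onWebsocketMessage are placeholders for
whatever the signaling server connection provides, not part of the
standard:

    const pc = new RTCPeerConnection();

    // Exchange ICE candidates with the other party as they show up.
    pc.onicecandidate = event => {
        if (event.candidate) {
            sendViaWebsocket({ type: 'candidate', candidate: event.candidate });
        }
    };

    // Party A: create the offer and send it to Party B.
    function callPartyB() {
        pc.createOffer()
            .then(offer => pc.setLocalDescription(offer))
            .then(() => sendViaWebsocket({ type: 'offer', sdp: pc.localDescription }));
    }

    // Both parties: react to messages from the signaling server.
    function onWebsocketMessage(message) {
        if (message.type === 'offer') {         // Party B got the offer
            pc.setRemoteDescription(message.sdp)
                .then(() => pc.createAnswer())
                .then(answer => pc.setLocalDescription(answer))
                .then(() => sendViaWebsocket({ type: 'answer', sdp: pc.localDescription }));
        } else if (message.type === 'answer') { // Party A got the answer
            pc.setRemoteDescription(message.sdp);
        } else if (message.type === 'candidate') {
            pc.addIceCandidate(message.candidate);
        }
    }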
Congratulations, now you've got your connection. But to exchange
audio/video data, the media tracks must be added to the connection
by the streaming party. The receiving party needs to listen on the
connection for that. Once an event tells the receiving party that a
track has been added, it must set this track as the source of an
audio/video element (or anything else that can play audio/video
data).
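In code this could look roughly like the following, with pc being
the RTCPeerConnection from the sketch above:

    // Streaming party: add the webcam tracks to the connection.
    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            stream.getTracks().forEach(track => pc.addTrack(track, stream));
        });

    // Receiving party: play whatever tracks arrive.
    pc.ontrack = event => {
        document.querySelector('video').srcObject = event.streams[0];
    };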
I have created an example app. You can get it from https://github.com/de-jluger/WebRTCSignalingSample.
Please note that I have tested it only with computers in my local
network. As far as I've researched, it will not work when one party
is behind NAT (network address translation) or similar techniques.
For that you need additional servers that support STUN (Session
Traversal Utilities for NAT) and TURN (Traversal Using Relays around
NAT). So at least three servers for a peer-to-peer standard? Yep.
Oh, and I didn't find a way to easily scale the system to more
participants, except for having every member in a video session
connect to the newly joined member.
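For completeness, this is roughly how such servers would be
configured on the RTCPeerConnection; the URLs and credentials here
are placeholders, not real servers:

    const pc = new RTCPeerConnection({
        iceServers: [
            { urls: 'stun:stun.example.org:3478' },
            { urls: 'turn:turn.example.org:3478', username: 'user', credential: 'secret' },
        ],
    });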
So I looked around the WebRTC standard and found out that you can
use it to record video. Yes you can record a video with just your
browser. Of course the video length is limited because it is stored
in memory. This wasn't a problem for my purpose. I wanted to record
only very small chunks that are then transferred to the other
participants.
The object to record videos is MediaRecorder. It takes a stream as
constructor argument. Such a stream is provided by getUserMedia
(via a Promise you have to resolve). The start method of the
MediaRecorder takes a time in milliseconds: the smaller the time,
the smaller the recorded chunks. Keeping the chunk size small is
important, or else there will be a delay that annoys the user.
Unfortunately there is a lower limit of around 50 milliseconds;
below that the value is ignored and a much longer default is used.
The chunks are delivered as events to the ondataavailable event
handler.
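A minimal sketch of the recording side could look like this;
sendToReceivers is a placeholder for the websocket transport
described below:

    navigator.mediaDevices.getUserMedia({ video: true, audio: true })
        .then(stream => {
            const recorder = new MediaRecorder(stream);
            recorder.ondataavailable = event => {
                // event.data is a Blob with roughly 100ms of video
                sendToReceivers(event.data);
            };
            recorder.start(100); // chunk length in milliseconds
        });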
To get the chunks to the receivers I chose to use a websocket
connection. Unfortunately websockets don't like binary data and they
have a limit of 64K for the messages. To make things worse, the
built-in base64 encoding doesn't work well with the video binary
data. To solve all of this I created my own Base64 en-/decoder and
split one video chunk into several data chunks.
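The splitting part could be sketched roughly like this; encodeBase64
stands in for the custom encoder mentioned above, the real code is
in the repository:

    const MAX_PIECE_SIZE = 48 * 1024; // stay safely below the 64K limit

    function sendVideoChunk(socket, blob, chunkId) {
        blob.arrayBuffer().then(buffer => {
            const bytes = new Uint8Array(buffer);
            const pieceCount = Math.ceil(bytes.length / MAX_PIECE_SIZE);
            for (let i = 0; i < pieceCount; i++) {
                const piece = bytes.subarray(i * MAX_PIECE_SIZE, (i + 1) * MAX_PIECE_SIZE);
                socket.send(JSON.stringify({
                    chunkId,
                    pieceIndex: i,
                    pieceCount,
                    data: encodeBase64(piece), // hypothetical custom encoder
                }));
            }
        });
    }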
On the receiving side you need to create a MediaSource. Once you
receive a sourceopen event you can call addSourceBuffer to get a
sourceBuffer. When calling addSourceBuffer you specify the codecs.
While Firefox is pretty relaxed about the values (but doesn't allow
everything), Chromium is pretty strict. The sourceBuffer accepts
video chunks, but only while it is not updating. This results in two
queues: one collects the data chunks until one video chunk is
complete, the other stores complete video chunks while the
sourceBuffer is updating. Ignoring the updating flag and directly
appending any complete video chunk results in seemingly
unpredictable stalls of the video stream.
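A rough sketch of the receiving side, showing the updating check and
the second queue (the reassembly of data chunks into video chunks is
left out, and the codec string is just an example):

    const video = document.querySelector('video');
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    const queue = []; // video chunks waiting while the SourceBuffer updates
    let sourceBuffer = null;

    mediaSource.addEventListener('sourceopen', () => {
        sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp8,opus"');
        // When an append has finished, feed the next queued chunk.
        sourceBuffer.addEventListener('updateend', () => {
            if (queue.length > 0) {
                sourceBuffer.appendBuffer(queue.shift());
            }
        });
    });

    // Called once all data chunks of one video chunk have arrived.
    function onVideoChunk(arrayBuffer) {
        if (sourceBuffer && !sourceBuffer.updating && queue.length === 0) {
            sourceBuffer.appendBuffer(arrayBuffer);
        } else {
            queue.push(arrayBuffer);
        }
    }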
To see an example of this go to https://github.com/de-jluger/MediaRecorderStreamingSample.
Please note that it doesn't work very well when streamer and
receiver use different browsers, as they seem to use different
encodings which aren't 100% compatible. I've tested the code mostly
with Firefox.
Also note that you can't join an existing video conference with this
scheme, as the video encoding uses a container format that has some
important information at the start of the stream. You would need
re-encoding on the server to solve this, which I haven't implemented.
As a last note I want to say that WebRTC offers way more than what
I've presented. For example, you can use your desktop instead of
your webcam as video source (a small sketch of that follows below).
Please take a look for yourself at the standard.
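As a sketch of the desktop idea: getDisplayMedia (from the related
Screen Capture spec) returns a stream that can be used just like the
one from getUserMedia:

    navigator.mediaDevices.getDisplayMedia({ video: true })
        .then(stream => {
            document.querySelector('video').srcObject = stream;
        });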