Obtaining a live video and audio feed from a user's webcam and microphone is the first step to creating a communication platform on WebRTC. This has traditionally been done through browser plugins, but we will use the getUserMedia API to do this all in JavaScript.

In this chapter, we will cover the following topics:

  • Getting access to media devices
  • Constraining the media stream
  • Handling multiple devices
  • Modifying the stream data

Getting access to media devices

There is a long history of attempts to bring media devices to the browser screen. Many developers have struggled with Flash-based or other plugin-based solutions that required users to download and install something in the browser before it could capture the camera. This is why the W3C finally decided to create a group to bring this functionality into the browser. The latest browsers now give JavaScript access to the getUserMedia API, also known as the MediaStream API.

This API has a few key points of functionality:

  • It provides a stream object that represents a real-time media stream, either in the form of audio or video
  • It handles the selection of input devices when there are multiple cameras or microphones connected to the computer
  • It provides security through user preferences and permissions that ask the user before a web page can start fetching a stream from a computer's device

Before we move any further, we should set a few standards about our coding environment. First off, you should have a text editor that allows you to edit HTML and JavaScript. There are tons of ways to accomplish this and, if you have purchased this book, the chances are high that you have a preferred editor already.

The other requirement for working with the media APIs is having a server to host and serve the HTML and JavaScript files. Opening up the files directly by double-clicking them will not work for the code presented in this book. This is due to the permission and security restrictions set forth by the browser, which will not allow a page to connect to cameras or microphones unless it is served by an actual web server.

Setting up a static server

Setting up a local web server is the first step in any web developer's tool belt. In conjunction with text editors, static web servers are also plentiful and vary from language to language. My personal favorite is using Node.js with node-static, a great and easy-to-use web server:

  1. Visit the Node.js website at http://nodejs.org/. There should be a big INSTALL button on the home page that will help you with installing Node.js on your OS.
  2. Once Node.js is installed on your system, you will also have its package manager, called node package manager (npm), installed.
  3. Open up a terminal or command line interface and type npm install -g node-static (you will, more than likely, need administrator privileges).
  4. Now you can navigate to any directory that contains the HTML files you would like to host on the server.
  5. Run the static command to start a static web server in this directory. You can navigate to http://localhost:8080 to see your file in the browser!

Creating our first MediaStream page

Our first WebRTC-enabled page will be a simple one. It will show a single <video> element on the screen, ask to use the user's camera, and show a live video feed of the user right in the browser. The video tag is a powerful HTML5 feature in itself. It will not only allow us to see our video on the screen, but can also be used to play back a variety of video sources. We will start by creating a simple HTML page with a video element contained in the body tag. Create a file named index.html and type the following:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8" />
        <title>Learning WebRTC - Chapter 2: Get User Media</title>
    </head>
    <body>
        <video autoplay></video>
        <script src="main.js"></script>
    </body>
</html>

If you open this page, there is nothing exciting going on yet. It should be a blank white page, and the browser console will report that it cannot find the main.js file. We can fix this by creating the main.js file and adding the getUserMedia code to it:

function hasUserMedia() {
  return !!(navigator.getUserMedia || navigator.webkitGetUserMedia ||
            navigator.mozGetUserMedia || navigator.msGetUserMedia);
}

if (hasUserMedia()) {
  // Normalize the prefixed implementations onto one common function
  navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
                           navigator.mozGetUserMedia || navigator.msGetUserMedia;

  navigator.getUserMedia({ video: true, audio: true }, function (stream) {
    var video = document.querySelector('video');
    video.src = window.URL.createObjectURL(stream);
  }, function (err) {
    console.log("Failed to get the stream:", err);
  });
} else {
  alert("Sorry, your browser does not support getUserMedia.");
}
Now, you should be able to refresh your page and see everything in action! First, you should see a permission popup similar to the one you saw in the previous example when you ran the WebRTC demo. If you select Allow, it should get access to your camera and display your face in the <video> element on the web page.

The first step to working with the new browser APIs is to deal with the browser prefixes. Most of the time, browsers like to be ahead of the curve and implement features before they become an official standard. When they do this, they tend to create a prefix based on the name of the browser engine (that is, webkit for Chrome or moz for Firefox). This allows the code to know whether the API is a standard one or not, and deal with it accordingly. Unfortunately, this also creates several different methods for accessing the API in different browsers. We overcome this by creating a function to test if any of these functions exist in the current browser and, if they do, we assign them all to one common function that we can use in the rest of the code.
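The prefix check described above can be factored into a small, testable helper. The following is a sketch; the helper name and the mock navigator object are illustrative, not part of any browser API:

```javascript
// Return the first defined property from a list of candidate names, or null.
// This is the prefix-normalization pattern from the main.js listing, pulled
// out so it can be exercised against a mock object outside the browser.
function firstDefined(obj, names) {
  for (var i = 0; i < names.length; i++) {
    if (obj[names[i]]) {
      return obj[names[i]];
    }
  }
  return null;
}

// Mock "navigator" where only the WebKit-prefixed function exists:
var mockNavigator = {
  webkitGetUserMedia: function () { return 'webkit'; }
};

var getUserMedia = firstDefined(mockNavigator, [
  'getUserMedia', 'webkitGetUserMedia', 'mozGetUserMedia', 'msGetUserMedia'
]);
console.log(getUserMedia()); // 'webkit'
```

In a real browser, you would pass `navigator` itself instead of the mock object, which is exactly what the one-line `||` chain in main.js does.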

The next thing we do is call the getUserMedia function. This function takes a set of parameters (to customize what the browser will capture) and a callback function. The success callback accepts one parameter: the stream object coming from the media devices on the computer.

This object points to a media stream that the browser is holding onto for us. It is constantly capturing data from the camera and microphone, waiting for instructions from the web application to do something with it. We then get the <video> element on the screen and load this stream into that element using window.URL.createObjectURL. Since the video element cannot accept a JavaScript object as its source, it needs a string URL to fetch the video stream from. This function takes the stream object and turns it into a local URL that the element can get the stream data from.
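One caveat worth noting: newer browsers have removed support for passing a MediaStream to window.URL.createObjectURL and instead attach the stream directly to the element's srcObject property. A minimal sketch follows, where plain objects stand in for the browser objects (the `video` and `stream` values here are stand-ins, not real DOM or media objects):

```javascript
// In current browsers, the stream is attached directly via srcObject rather
// than via createObjectURL. Plain objects stand in for the real ones here:
var video = { srcObject: null };     // stand-in for document.querySelector('video')
var stream = { id: 'mock-stream' };  // stand-in for the MediaStream object

// In a browser this single assignment replaces the createObjectURL line:
video.srcObject = stream;

console.log(video.srcObject.id); // 'mock-stream'
```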

Constraining the media stream

Now that we know how to get a stream from the browser, we will cover configuring this stream using the first parameter of the getUserMedia API. This parameter expects an object of keys and values telling the browser how to look for and process streams coming from the connected devices. The first options we will cover are simply turning on or off the video or audio streams:

navigator.getUserMedia({ video: false, audio: true }, function (stream) {
  // Now our stream does not contain any video!
}, function (err) {
  console.log("Failed to get the stream:", err);
});
When you add this stream to the <video> element, it will now not show any video coming from the camera. You can also do the opposite and get just a video feed and no audio. This is great while developing a WebRTC application when you do not want to listen to yourself talk all day!

If you refresh the browser, you will see a drop-down popup stating that the page needs access to the microphone.

Constraining the video capture

The options for constraining the getUserMedia API allow not only true or false values, but also an object with a complex set of constraints. You can see the full set of constraints provided in the specification detailed at https://tools.ietf.org/html/draft-alvestrand-constraints-resolution-03. These allow you to constrain options such as minimum required resolution and frameRate, video aspect ratio, and optional parameters all through the configuration object passed into the getUserMedia API.

This helps developers tackle several different scenarios that are faced while creating a WebRTC application. It gives the developer an option to request certain types of streams from the browser depending on the situation that the user is currently in. Some of these streams are listed here:

  • Asking for a minimum resolution in order to create a good user experience for everyone participating in a video call
  • Providing a certain width and height of a video in order to stay in line with a particular style or brand associated with the application
  • Limiting the resolution of the video stream in order to save computational power or bandwidth if on a limited network connection

For instance, let's say that we wanted to ensure the video playback is always set to the aspect ratio of 16:9. This would be to avoid the video coming back in a smaller than desired aspect ratio, such as 4:3. If you change your getUserMedia call to the following, it will enforce the correct aspect ratio:

navigator.getUserMedia({
    video: {
      mandatory: {
        minAspectRatio: 1.777,
        maxAspectRatio: 1.778
      },
      optional: [
        { maxWidth: 640 },
        { maxHeight: 480 }
      ]
    },
    audio: false
  }, function (stream) {
    var video = document.querySelector('video');
    video.src = window.URL.createObjectURL(stream);
  }, function (error) {
    console.log("Raised an error when capturing:", error);
  });

When you refresh your browser and give permission to the page to capture your camera, you should see the video is now wider than it used to be. In the first section of the configuration object, we gave it a mandatory aspect ratio of 16:9 or 1.777. In the optional section, we told the browser that we would like to stay under a width and height of 640 x 480. The optional block tells the browser to try and meet these requirements, if at all possible. You will probably end up with a 640 x 360 width and height for your video as this is a common solution to these constraints that most cameras support.
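To see why 640 x 360 satisfies these constraints while 640 x 480 does not, the aspect-ratio check can be sketched as a small helper. The function name is hypothetical and is not part of the getUserMedia API; it only mirrors the arithmetic the browser performs against the mandatory block:

```javascript
// Hypothetical helper that checks whether a resolution falls inside the
// mandatory minAspectRatio/maxAspectRatio bounds used in the example above.
function satisfiesAspectRatio(width, height, minRatio, maxRatio) {
  var ratio = width / height;
  return ratio >= minRatio && ratio <= maxRatio;
}

console.log(satisfiesAspectRatio(640, 360, 1.777, 1.778)); // true  (16:9)
console.log(satisfiesAspectRatio(640, 480, 1.777, 1.778)); // false (4:3)
```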

You will also notice that we passed a second callback function to the getUserMedia call. This is the error callback, and it gets called if there are any issues with capturing the media stream. This could happen to you in the preceding example if your camera does not support 16:9 resolutions. Be sure to keep an eye on the development console in your browser to see any errors that get raised when this happens. If you successfully run this project, you can also change minAspectRatio or maxAspectRatio to see which parameters your browser can successfully run.

The power this gives us is the ability to adapt to the situations of the user's environment to provide the best video stream possible. This is incredibly helpful since the environment for browsers is vast and varied from user to user. If your WebRTC application plans to have a lot of users, you will have to find unique solutions to every unique environment. One of the biggest pains is supporting mobile devices. Not only do they have limited resources, but also limited screen space. You might want the phone to only capture a 480 x 320 resolution or smaller video stream in order to conserve power, processing, and bandwidth. A good way to test whether the user is on a mobile device is to use the user agent string in the browser and test it against the names of common mobile web browsers. Changing the getUserMedia call to the following will accomplish this:

var constraints = {
  video: {
    mandatory: {
      minWidth: 640,
      minHeight: 480
    }
  },
  audio: true
};

if (/Android|webOS|iPhone|iPad|iPod|BlackBerry|IEMobile|Opera Mini/i.test(navigator.userAgent)) {
  // The user is on a mobile device, so lower our minimum resolution
  constraints = {
    video: {
      mandatory: {
        minWidth: 480,
        minHeight: 320,
        maxWidth: 1024,
        maxHeight: 768
      }
    },
    audio: true
  };
}

navigator.getUserMedia(constraints, function (stream) {
  var video = document.querySelector('video');
  video.src = window.URL.createObjectURL(stream);
}, function (error) {
  console.log("Raised an error when capturing:", error);
});

Constraints are not something to glance over quickly. They are the easiest way to increase the performance of a WebRTC application.