At some point when creating a WebRTC application, you will have to break away from developing for a client and build a server. Most WebRTC applications do not depend solely on audio and video communication and typically need many other features to be interesting. We are going to dive into server programming using JavaScript and Node.js, and build the foundation of a basic signaling server.

We will cover the following topics:

  • Setting up our environment to develop in Node.js
  • Connecting to the client using WebSockets
  • Identifying users
  • Initiating and answering a WebRTC call
  • Handling ICE candidate transfers
  • Hanging up a call

Throughout this chapter, we are going to focus solely on the server part of the application. Next, we will build the client part of this example. Our example server will be bare-bones, giving us just enough to set up a WebRTC peer connection.

Building a signaling server

The server we are going to build will help us connect two users together who are not located on the same computer. The goal of the server is to replace the signaling mechanism with something that travels over a network. The server will be straightforward and simple, supporting only the most basic WebRTC connections.

Our implementation will have to respond to requests from multiple users. It will do this through a simple bidirectional messaging system between clients. It will allow one user to call another and set up a WebRTC connection between them. Once a user has called another, the server will pass the offer, answer, and ICE candidates between the two users, allowing them to set up a WebRTC connection.

The preceding diagram shows the messaging flow between clients when using the signaling server to set up a connection. Each side starts by registering with the server. Logging in simply sends a string-based user identifier to the server, which makes sure it is not already taken. Once both users have registered, one can call the other by making an offer that includes the identifier of the user they wish to call. The other user should answer in turn. Finally, candidates are sent between the clients until they can successfully make a connection. At any point, a user can terminate the connection by sending the leave message. The implementation will be simple, acting mostly as a pass-through for the users to send messages to each other.
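To make the flow concrete, the messages exchanged during one call could look roughly like the following sketch. The field names match the handlers built later in this chapter; the user names and the placeholder payload strings are purely illustrative, and exampleCallFlow is a hypothetical helper.

```javascript
// Sketch of the messages one call might involve, in the order they are sent.
// The offer, answer, and candidate payloads are placeholders for data that
// would be produced by the WebRTC API on each client.
function exampleCallFlow(caller, callee) {
  return [
    { from: caller, message: { type: "login", name: caller } },
    { from: callee, message: { type: "login", name: callee } },
    { from: caller, message: { type: "offer", name: callee, offer: "<offer>" } },
    { from: callee, message: { type: "answer", name: caller, answer: "<answer>" } },
    { from: caller, message: { type: "candidate", name: callee, candidate: "<candidate>" } },
    { from: callee, message: { type: "candidate", name: caller, candidate: "<candidate>" } },
    { from: caller, message: { type: "leave", name: callee } }
  ];
}

// Print each step as "sender -> JSON payload"
exampleCallFlow("alice", "bob").forEach(function (step) {
  console.log(step.from, "->", JSON.stringify(step.message));
});
```

In every message after login, the name field identifies the user the message should be delivered to, not the sender.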

Keep in mind that this is just one example of a signaling server. Since there are no rules when implementing signaling, you can use any protocol, technology, or pattern that you like!

Getting a connection

The steps required to create a WebRTC connection have to happen in real time. This means that clients must be able to exchange messages in real time before a WebRTC peer connection exists. This is where we will utilize another powerful feature of HTML5 called WebSockets.

A WebSocket is exactly what it sounds like—an open bidirectional socket connection between two endpoints—a web browser and a web server. You can send messages back and forth over the socket using strings and binary information. It is designed to be implemented in both web browsers and web servers to enable communication between them, outside of the realm of AJAX requests.

The WebSocket protocol has been around since about 2010 and is a well-defined standard that is available in most browsers today. It has wide support across web clients, and many server technologies have frameworks dedicated to their use. There are even entire frameworks that rely on WebSocket technology such as the Meteor JavaScript framework.

The big difference between the WebSocket protocol and WebRTC is the transport. WebSockets was designed to be client-to-server in nature and uses TCP for a reliable connection. This means it has many of the bottlenecks that WebRTC does not have, which we described in the Understanding UDP transport and real-time transfer section of Creating a Basic WebRTC Application. This is also the reason it works well as a signaling transport. Since it is reliable, our signals are less likely to get dropped between users, giving us more successful connections. It is also built into the browser and easy to set up in Node.js, which makes the implementation of our signaling server easier to understand.

To utilize the power of WebSockets in our project, we must first install a supported WebSockets library for Node.js. We will be using the ws project from the npm registry. To install the library, navigate to the directory of the server and run the following command:

npm install ws

npm is a package manager for Node.js. It hosts and keeps a list of open source frameworks that anyone can download and use in their projects. Navigate to https://www.npmjs.org/ for more information on this subject.

Now that we have installed the WebSocket library, we can start using it in our server. You can insert the following code in our index.js file:

var WebSocketServer = require('ws').Server,
    wss = new WebSocketServer({ port: 8888 });

wss.on('connection', function (connection) {
  console.log("User connected");

  connection.on('message', function (message) {
    console.log("Got message:", message);
  });

  connection.send('Hello World');
});

The first line requires the WebSocket library that we installed with the previous command. We then create the WebSocket server, telling it which port to listen on. You can specify any port you like if you need to change this setting.

Next, we listen to the connection event coming from the server. This code will get called whenever a user makes a WebSocket connection to the server. It will give you a connection object that has all sorts of information about the user who has just connected.

We then listen to any messages that are being sent by the user. For now, we just log these messages to the console.

Finally, we send a response to the client saying Hello World. This happens immediately when the server has completed the WebSocket connection with the client.

Note that the connection event happens for any user connecting to the server. This means that you can have multiple users connecting to the same server and each one will trigger the connection event individually. This asynchronous, event-based style is often seen as one of the strong points of programming in Node.js.

Now we can run our server by running node index.js. The process should start and simply wait to handle WebSocket connections. It will do this indefinitely until you stop the process from running.

Testing our server

To test whether our code is functioning properly, we can use the wscat command-line tool. The great thing about npm is that you can not only install libraries to use in your application, but also install packages globally to be used as command-line tools. wscat originally shipped with the ws library and could be installed by running npm install -g ws; current releases of ws no longer bundle it, and it is published as its own package, so run npm install -g wscat instead. You might need to use administrator privileges when running this command.

This should give us a new command called wscat. This tool allows us to connect directly to WebSocket servers from the command line and test out commands against them. To do this, we run our server in one terminal window, then open a new one and run the wscat -c ws://localhost:8888 command. You will notice ws://, which is the URL scheme for the WebSocket protocol, used in place of http://. Your output should look similar to this:

Your server should also log the connection to its console:

If either of these does not work, check the code against the listing and read the documentation for the ws library as well as Node.js and npm. These tools may work differently in different environments and require extra setup in some cases. If everything does work, pat yourself on the back for writing a WebSocket server in Node.js with 12 lines of code.

Identifying users

In a typical web application, the server needs a way to distinguish between connected clients. Most applications today use the one-identity rule and have each user log in with a string-based identifier known as their username. We will use the same rule in our signaling application. It will not be as sophisticated as some of the methods used today, since we will not even require a password from the user. We simply need an ID for each connection so we know where to send messages.

To start, we are going to change our connection handler a bit, to look similar to this:

  connection.on('message', function (message) {
    var data;

    try {
      data = JSON.parse(message);
    } catch (e) {
      console.log("Error parsing JSON");
      data = {};
    }
  });

This will change our WebSocket implementation to only accept JSON messages. Since WebSocket connections are limited to strings and binary data, we need a way to send structured data across the wire. JSON allows us to define structured data and then serialize it to a string that we can send over a WebSocket connection. It is also the easiest form of serialization to use in JavaScript.
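The parsing step can also be pulled out into a small function, which makes it easy to try on its own. The following is a minimal sketch; parseMessage is a hypothetical helper name, and the extra check for non-object values is an addition that the handler above does not perform.

```javascript
// Parse an incoming WebSocket message, falling back to an empty object
// when the payload is not valid JSON.
function parseMessage(message) {
  try {
    var data = JSON.parse(message);
    // JSON.parse also accepts primitives such as 42, "text", and null.
    // Treat any such non-object value as an empty message.
    return (data !== null && typeof data === "object") ? data : {};
  } catch (e) {
    console.log("Error parsing JSON");
    return {};
  }
}

console.log(parseMessage('{ "type": "login", "name": "Foo" }').type); // prints "login"
console.log(parseMessage("not json").type);                           // prints undefined
```

Returning an empty object means data.type is undefined for bad input, so the message falls through to whatever default handling the server provides.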

Next, we will need a way to store all of our connected users. Since our server is simplistic in nature, we will use a hash map, known in JavaScript as an object, to store our data. We can change the top of our file to look similar to this:

var WebSocketServer = require('ws').Server,
    wss = new WebSocketServer({ port: 8888 }),
    users = {};

To log in, we will need to know that a user is sending a login type message. To support this, we are going to add a type field to every message that is sent from the client. This will allow our server to know what to do with the data that it is receiving. First, we will define what to do when the user tries to log in:

  connection.on('message', function (message) {
    var data;

    try {
      data = JSON.parse(message);
    } catch (e) {
      console.log("Error parsing JSON");
      data = {};
    }

    switch (data.type) {
      case "login":
        console.log("User logged in as", data.name);
        if (users[data.name]) {
          sendTo(connection, {
            type: "login",
            success: false
          });
        } else {
          users[data.name] = connection;
          connection.name = data.name;
          sendTo(connection, {
            type: "login",
            success: true
          });
        }

        break;
      default:
        sendTo(connection, {
          type: "error",
          message: "Unrecognized command: " + data.type
        });

        break;
    }
  });

We use a switch statement to handle each message type accordingly. If the user sends a message with the login type, we first need to see if anyone has already logged into the server with that ID. If they have, we tell the client that they have not successfully logged in and need to pick a new name. If no one is using this ID, we store the connection in our users object, using the ID as the key. If we run into any commands we do not recognize, we send a message back to the client saying there was an error processing their request.
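The login rule can be exercised without a network by using plain objects in place of real connections. This is only a sketch: handleLogin is a hypothetical function that returns the reply instead of sending it, and the empty objects stand in for connections that would normally come from the ws library.

```javascript
// Stand-alone sketch of the login rule: the first user to claim a name
// gets it, and anyone else asking for the same name is refused.
var users = {};

function handleLogin(connection, name) {
  if (users[name]) {
    return { type: "login", success: false };
  }
  users[name] = connection;
  connection.name = name;
  return { type: "login", success: true };
}

var first = {};   // fake connection for the first client
var second = {};  // fake connection for the second client

console.log(handleLogin(first, "Foo"));   // the name is free
console.log(handleLogin(second, "Foo"));  // the name is already taken
```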

I also added a helper function called sendTo in the code that handles sending a message to a connection. This can be added anywhere in the file:

function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

What this function does is ensure that all of our messages are always encoded in the JSON format. This also helps reduce the amount of code we have to write. It is always good practice to keep message sending in one place in case something else has to be done when sending messages to clients.
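One way to see what sendTo puts on the wire is to call it with a stand-in connection that records whatever it is asked to send. The createFakeConnection helper below is hypothetical and exists only for this illustration.

```javascript
function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

// Hypothetical stand-in for a WebSocket connection that records sent text.
function createFakeConnection() {
  var sent = [];
  return {
    sent: sent,
    send: function (text) { sent.push(text); }
  };
}

var fake = createFakeConnection();
sendTo(fake, { type: "login", success: true });
console.log(fake.sent[0]); // prints {"type":"login","success":true}
```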

The last thing we have to do is provide a way to clean up client connections when they disconnect. Luckily, our library provides an event for exactly this case. We can listen to this event and delete our user in this way:

  connection.on('close', function () {
    if (connection.name) {
      delete users[connection.name];
    }
  });

This should be added inside the connection event handler, alongside the message handler.

Now it is time to test our server with our login command. We can use the client as we did before to test out our login command. One thing to keep in mind is that messages we send now have to be encoded in the JSON format for them to be accepted by the server.

Once we connect, we can send the following message to our server:

{ "type": "login", "name": "Foo" }

The output you receive should look similar to this:

Initiating a call

From here on, our code does not get any more complex than the login handler. We will create a set of handlers to pass our message correctly for each step of the way. One of the first calls that is made after logging in is the offer handler, which designates that one user would like to call another.

It is a good idea not to get call initiations mixed up with the offer step of WebRTC. In this example, we have combined the two to make our API easier to work with. In most settings, these steps will be separated. This can be seen in an application, such as Skype, where the other user has to accept the incoming call before a connection is established between the two users.

We can now add the offer handler into this code:

      case "offer":
        console.log("Sending offer to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          connection.otherName = data.name;
          sendTo(conn, {
            type: "offer",
            offer: data.offer,
            name: connection.name
          });
        }

        break;

The first thing we do is get the connection of the user we are trying to call. This is easy to do since the ID of the other user is the key under which their connection is stored in our user-lookup object. We then check whether the other user exists and, if so, send them the details of the offer. We also add an otherName property to the caller's connection object so that we can look this up easily later on in the code. You might also notice that none of this code is WebRTC-specific. This could potentially refer to any sort of calling technology between two users.

Something you may also notice is the lack of error handling. This is perhaps one of the most tedious parts of WebRTC. Since a call can fail at any point of the process, we have many places where making a connection can fail. It can also fail for various reasons, such as network availability, firewalls, and more. We leave it up to the user to handle each error case individually in the manner that they would like.
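As one example of an error case you may want to cover, the server could tell the caller when the user they are trying to reach is not registered. The sketch below is an assumption about how that might look, not part of the server in this chapter: relayOffer and fakeConnection are hypothetical names, and the error reply borrows the message shape used for unrecognized commands.

```javascript
var users = {};

function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

// Hypothetical stand-in for a registered WebSocket connection.
function fakeConnection(name) {
  var conn = { name: name, sent: [] };
  conn.send = function (text) { conn.sent.push(JSON.parse(text)); };
  users[name] = conn;
  return conn;
}

// Relay an offer, or report an error back to the caller if the
// target user is unknown. Returns true when the offer was relayed.
function relayOffer(connection, data) {
  var conn = users[data.name];

  if (conn == null) {
    sendTo(connection, {
      type: "error",
      message: "User not found: " + data.name
    });
    return false;
  }

  connection.otherName = data.name;
  sendTo(conn, { type: "offer", offer: data.offer, name: connection.name });
  return true;
}

var alice = fakeConnection("alice");
var bob = fakeConnection("bob");

relayOffer(alice, { type: "offer", name: "bob", offer: "<offer>" });   // relayed to bob
relayOffer(alice, { type: "offer", name: "carol", offer: "<offer>" }); // carol is unknown

console.log(bob.sent);
console.log(alice.sent);
```

With a reply like this, the calling client can show a message to its user instead of waiting indefinitely for an answer.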

Answering a call

Answering a call is just as easy as making an offer. We follow a similar pattern and let the clients do most of the work. Our server will simply pass any message of the answer type through to the other user. We can add this after the offer handling case:

      case "answer":
        console.log("Sending answer to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          connection.otherName = data.name;
          sendTo(conn, {
            type: "answer",
            answer: data.answer
          });
        }

        break;

You can see how similar the code looks to the preceding listing. Note that we are also relying on the answer to come from the other user. If a user were to send an answer first, instead of an offer, it could potentially mess up our server implementation. There are many use cases where this server will not be sufficient, but it will work well for our next implementation.
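If you want to guard against an answer arriving before any offer, one possible approach is to check that the other user has already recorded the answering user as their call partner. This is a sketch under that assumption; relayAnswer, fakeConnection, and the error reply are hypothetical and not part of the server built in this chapter.

```javascript
var users = {};

function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

// Hypothetical stand-in for a registered WebSocket connection.
function fakeConnection(name) {
  var conn = { name: name, otherName: null, sent: [] };
  conn.send = function (text) { conn.sent.push(JSON.parse(text)); };
  users[name] = conn;
  return conn;
}

// Forward an answer only if the target user previously offered to us.
// The offer handler stores the callee's name in the caller's otherName.
function relayAnswer(connection, data) {
  var conn = users[data.name];

  if (conn == null || conn.otherName !== connection.name) {
    sendTo(connection, {
      type: "error",
      message: "No pending offer from " + data.name
    });
    return false;
  }

  connection.otherName = data.name;
  sendTo(conn, { type: "answer", answer: data.answer });
  return true;
}

var alice = fakeConnection("alice");
var bob = fakeConnection("bob");

// Bob answers before Alice has made an offer, so the answer is rejected.
var early = relayAnswer(bob, { type: "answer", name: "alice", answer: "<answer>" });

// Simulate what the offer handler does when Alice calls Bob.
alice.otherName = "bob";
var onTime = relayAnswer(bob, { type: "answer", name: "alice", answer: "<answer>" });

console.log(early, onTime); // prints false true
```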

This should be a good start to the offer and answer mechanism in WebRTC. You should see that it follows the createOffer and createAnswer functions on RTCPeerConnection. This is exactly where we will start plugging in our server connection to handle remote clients.

We can even test our current implementation using the WebSocket client we used before. Connecting two clients at the same time allows us to send an offer and an answer between the two. This should give you more insight into how this will work in the end. You can see the results from running two clients simultaneously in the terminal window, as shown in the following screenshot:

In my case, my offer and answer were simple string messages.

Handling ICE candidates

The final piece of the WebRTC signaling puzzle is handling ICE candidates between users. Here, we use the same technique as before to pass messages between users. The difference with the candidate message is that it might be sent multiple times per user and in any order between the two users. Thankfully, our server is designed in a way that can handle this easily. You can add this candidate handler code to your file:

      case "candidate":
        console.log("Sending candidate to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          sendTo(conn, {
            type: "candidate",
            candidate: data.candidate
          });
        }

        break;

Since the call is already set up, we do not need to record the other user's name in this handler. Go ahead and test this one on your own using the terminal WebSocket client. It should work similarly to the offer and answer handlers, passing messages between the two users.

Hanging up a call

Our last bit is not part of the WebRTC specification, but is still a good feature to have: hanging up. This will allow our users to disconnect from another user so they are available to call someone else. This will also notify our server to remove any user references we have in our code. You can add the leave handler as detailed in the following code:

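      case "leave":
        console.log("Disconnecting user from", data.name);
        var conn = users[data.name];

        if (conn != null) {
          // Clear the peer reference only after confirming the user exists;
          // touching conn before this check would throw for unknown names.
          conn.otherName = null;
          sendTo(conn, {
            type: "leave"
          });
        }

        break;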

This will also notify the other user of the leave event so they can disconnect their peer connection accordingly. Another thing we have to do is handle the case of when a user drops their connection from the signaling server. This means we can no longer serve them and that we need to terminate their calls. We can change the close handler we used before to look similar to this:

  connection.on('close', function () {
    if (connection.name) {
      delete users[connection.name];

      if (connection.otherName) {
        console.log("Disconnecting user from", connection.otherName);
        var conn = users[connection.otherName];

        if (conn != null) {
          // The other user may already be gone, so check before using conn
          conn.otherName = null;
          sendTo(conn, {
            type: "leave"
          });
        }
      }
    }
  });

This will now disconnect our users if they happen to terminate their connection to the server unexpectedly. This can help in cases where we are still in the offer, answer, or candidate state but the other user closes their browser window. In this case, the WebRTC API will not send any event to report this, and we need another way to know that the user has left. Having the signaling server handle this case helps make our application more reliable and stable overall.
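The cleanup logic can be tried in isolation with stand-in connections. The sketch below follows the same steps as the close handler, with the existence check placed before the peer is touched; handleClose and fakeConnection are hypothetical names used only for this illustration.

```javascript
var users = {};

function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

// Hypothetical stand-in for a registered WebSocket connection.
function fakeConnection(name) {
  var conn = { name: name, otherName: null, sent: [] };
  conn.send = function (text) { conn.sent.push(JSON.parse(text)); };
  users[name] = conn;
  return conn;
}

// Remove the closing user and, if they were in a call, notify their peer.
function handleClose(connection) {
  if (!connection.name) {
    return;
  }

  delete users[connection.name];

  if (connection.otherName) {
    var conn = users[connection.otherName];

    // The peer may have disconnected already, so check before using it.
    if (conn != null) {
      conn.otherName = null;
      sendTo(conn, { type: "leave" });
    }
  }
}

var alice = fakeConnection("alice");
var bob = fakeConnection("bob");
alice.otherName = "bob";
bob.otherName = "alice";

handleClose(alice);     // bob is told that alice left
console.log(bob.sent);  // prints [ { type: 'leave' } ]
console.log(Object.keys(users)); // prints [ 'bob' ]
```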

Complete signaling server

Here is the entire code for our signaling server. This includes logging in and handling all response types. I also added a listening handler at the end to notify you when the server is ready to accept WebSocket connections:

var WebSocketServer = require('ws').Server,
    wss = new WebSocketServer({ port: 8888 }),
    users = {};

wss.on('connection', function (connection) {
  connection.on('message', function (message) {
    var data;

    try {
      data = JSON.parse(message);
    } catch (e) {
      console.log("Error parsing JSON");
      data = {};
    }

    switch (data.type) {
      case "login":
        console.log("User logged in as", data.name);
        if (users[data.name]) {
          sendTo(connection, {
            type: "login",
            success: false
          });
        } else {
          users[data.name] = connection;
          connection.name = data.name;
          sendTo(connection, {
            type: "login",
            success: true
          });
        }

        break;
      case "offer":
        console.log("Sending offer to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          connection.otherName = data.name;
          sendTo(conn, {
            type: "offer",
            offer: data.offer,
            name: connection.name
          });
        }

        break;
      case "answer":
        console.log("Sending answer to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          connection.otherName = data.name;
          sendTo(conn, {
            type: "answer",
            answer: data.answer
          });
        }

        break;
      case "candidate":
        console.log("Sending candidate to", data.name);
        var conn = users[data.name];

        if (conn != null) {
          sendTo(conn, {
            type: "candidate",
            candidate: data.candidate
          });
        }

        break;
      case "leave":
        console.log("Disconnecting user from", data.name);
        var conn = users[data.name];

        if (conn != null) {
          // Clear the peer reference only after confirming the user exists
          conn.otherName = null;
          sendTo(conn, {
            type: "leave"
          });
        }

        break;
      default:
        sendTo(connection, {
          type: "error",
          message: "Unrecognized command: " + data.type
        });

        break;
    }
  });

  connection.on('close', function () {
    if (connection.name) {
      delete users[connection.name];

      if (connection.otherName) {
        console.log("Disconnecting user from", connection.otherName);
        var conn = users[connection.otherName];

        if (conn != null) {
          // The other user may already be gone, so check before using conn
          conn.otherName = null;
          sendTo(conn, {
            type: "leave"
          });
        }
      }
    }
  });
});

function sendTo(conn, message) {
  conn.send(JSON.stringify(message));
}

wss.on('listening', function () {
    console.log("Server started...");
});

You may notice that we are using an insecure WebSocket server in our example. In the real world, the WebSocket protocol supports SSL/TLS, similar to how HTTP is secured as HTTPS. Once the server has been set up with a certificate, clients connect using the wss:// scheme instead of ws://.

Feel free to test our server application using the WebSocket client just as you did earlier. You can even try connecting three, four, or more users to the server and see how it handles multiple connections. You will probably also find many use cases that our server does not handle. It is a good idea to note these cases and even improve the server to work around them.

Signaling in the real world

It has taken us a lot of effort to get to a basic signaling server to connect two WebRTC users. At this point, you may be wondering how signaling servers are built in the real world for production applications. Since signaling is such an abstract concept that is not defined by the WebRTC specification, the answer is that anything goes.

Signaling is such a complex and difficult issue to solve because of the "anything goes" mentality it brings. There are many resources out there offered by the WebRTC makers, but none of them details how exactly signaling is best implemented for users. There are many issues to solve here and not all of them are the same for every use case. Some developers might need a highly scalable solution that can connect millions of users across the globe. Another developer might need a solution that integrates with Facebook, and another might need to integrate with Twitter. It is an extremely tough topic to cover and will require lots of time and research to find the best solution. Here, we will detail a few of the common pitfalls and solutions when researching signaling servers.

The woes of WebSockets

The great thing about WebSockets is that it has brought bidirectional communication to browsers. Many consider WebSockets to be the answer to all their problems, enabling faster socket connections directly to servers. This being said, there are still a few wrinkles to iron out in the WebSocket space.

One of these wrinkles is the network firewall problem. Under ideal conditions, WebSockets is a reliable connection but, unlike its HTTP counterpart, it is easy for it to become unstable under proxy configurations. The additional overhead of a Virtual Private Network (VPN) or complex firewall systems can cause the connection success rate to drop significantly. This means you will have to fall back on other technologies such as HTTP streaming to accomplish the same task.

This now introduces some race conditions in the WebRTC space. Any latency in the pipeline can cause out-of-order message processing, giving poor results when connecting in WebRTC. Remember that when making a WebRTC connection, the order is important and doing things out of order can cause a failed connection.
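One common way to cope with out-of-order delivery is to number the messages and have the receiver hold back anything that arrives early. The following is a minimal sketch of that idea only; the seq field and createOrderedReceiver are assumptions made for illustration, and the signaling server in this chapter does not number its messages.

```javascript
// Deliver messages in sequence order even if they arrive out of order.
// Each message is assumed to carry a seq number starting at 0.
function createOrderedReceiver(deliver) {
  var nextSeq = 0;   // sequence number we are waiting for
  var pending = {};  // early arrivals, keyed by sequence number

  return function receive(message) {
    pending[message.seq] = message;

    // Flush every consecutive message that is now available.
    while (pending[nextSeq] !== undefined) {
      var ready = pending[nextSeq];
      delete pending[nextSeq];
      nextSeq += 1;
      deliver(ready);
    }
  };
}

var delivered = [];
var receive = createOrderedReceiver(function (message) {
  delivered.push(message.type);
});

receive({ seq: 1, type: "candidate" }); // arrives early and is held back
receive({ seq: 0, type: "offer" });     // releases the offer, then the candidate
receive({ seq: 2, type: "candidate" });

console.log(delivered); // prints [ 'offer', 'candidate', 'candidate' ]
```

A production version would also need a timeout or retransmission strategy for messages that never arrive, which this sketch leaves out.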

All this aside, WebSockets is an amazing piece of technology. The moral of the story is that when creating a real-world product there will be hiccups using WebSockets as a signaling server technology. Many companies are using them effectively today but have many fallbacks in cases where WebSockets does not work well in given network conditions.

Connecting with other services

One of the most exciting parts of WebRTC is that it will not only work well as a standalone solution, but also pairs well with other technologies. There have been numerous peer connectivity applications before WebRTC came around and since its introduction, efforts have been put forward to make WebRTC backward compatible. This includes using common frameworks seen in instant messaging systems and even technologies that our cellular phones use today.

XMPP

XMPP is an instant messaging protocol that dates back to the late 1990s, when it was known as Jabber. The protocol was aimed at defining a common way to implement instant messaging, user presence, and contact lists. It is an open standard that anyone can use and integrate into their application. A large number of big instant messaging platforms have integrated XMPP into their service at some point, including Google Talk, Facebook Chat, and AOL Instant Messenger.

It is this long history that makes XMPP an easy platform to use. It gives a lot of power to any typical WebRTC application, since many video and audio communication platforms will at some point need presence and contact list data. On top of this, it is secure, has lots of documentation, and is extremely flexible in its implementation. There are a number of well-built ports to JavaScript and WebRTC, as well as companies that are dedicated to offering WebRTC-based XMPP as a service. If you are able to get one working, it would be a great deal better than the simple signaling server we have built.

Session Initiation Protocol

The Session Initiation Protocol (SIP) is another standard dating back to the 1990s. It is a signaling protocol designed for cellular networks and phone systems. It is a well-defined and extremely well-supported protocol, in use by major cell networks and network equipment providers.

The aim of SIP integration and WebRTC is to provide communication support with SIP-based phone devices that do not have WebRTC support. If we made our connection to a server that could translate our information, it could easily connect to a mobile phone or other communication devices. If it used SIP, it would also come with support for many of the features that come with phones today.

SIP is another large topic in itself. You can find countless ports and resources on integrating SIP with WebRTC on the Web. Compared to XMPP, it sits at the other end of the spectrum as far as difficulty and complexity are concerned. Phone-based communication is a completely different topic with its own technologies and standards.

Self-test questions

Q1. The goal of the signaling server is to connect two users on separate networks so that they can make a peer connection between them. True or False?

Q2. What technology does WebSockets use to make a bidirectional connection between client and server?

  1. UDP
  2. TCP
  3. ICE
  4. STUN

Q3. Using JSON for client-to-server messages gives us which of the following benefits?

  1. String-based packet data for easy transport
  2. Complex structure definition inside messages
  3. Widely supported encoding and decoding methods
  4. All of the above

Q4. In signaling, the order of operations is offer, answer, then sending candidates back and forth until a connection is made. True or False?