In this part, we will focus on the Basic WebRTC Application.

The first step of any WebRTC application is to create an RTCPeerConnection. Creating a successful RTCPeerConnection will require an understanding of the inner workings of how a browser creates peer connections. Firstly, in this chapter, we will lay the groundwork to understand the internals of WebRTC. Then we will utilize this knowledge to create a basic WebRTC video chat application.

In this chapter, we will cover the following topics:

  • Understanding UDP transport and real-time transfer
  • Signaling and negotiating with other users locally
  • Finding other users on the Web and NAT traversal
  • Creating an RTCPeerConnection

Understanding UDP transport and real-time transfer

Real-time transfer of data requires a fast connection between the two users. A typical connection needs to capture a frame of both audio and video and send it to the other user between 40 and 60 times per second in order to be considered good quality. Given this constraint, audio and video applications are allowed to miss certain frames of data in order to keep up the speed of the connection. This means that sending the most recent frame of data is more important than making sure that every frame gets to the other side.

A similar effect can already be seen with any video-playing application today. Video games and streaming media players can tolerate losing a few frames of video because of how the human brain works. Our minds fill in the gaps as we watch and process a video or game. If our goal is to play 30 frames in one second and we miss frame 28, most of the time, the user will not even notice. This gives our video applications a different set of requirements from most web applications.

That is why User Datagram Protocol (UDP) is the transport protocol of choice when dealing with WebRTC applications. It gives us the power, or rather the lack of control, we need when dealing with a high-performance application. Most web applications today are built on top of the Transmission Control Protocol (TCP) because of the guarantees it makes for its users, some of which are listed here:

  • Any data sent will be acknowledged as received
  • Any data that does not make it to the other side will get resent and halt the sending of any more data
  • Data will be unique and no data will be duplicated on the other side

These features are the reason why TCP is a great choice for most things on the Web today. If you are sending an HTML page, it makes sense to have all the data come in the correct order with a guarantee that it got to the other side. Unfortunately, this technology is not a great fit for all use cases. Take, for instance, streaming data in a multiplayer game. Most data in a video game becomes stale in seconds or even less than that. This means that the user only cares about what has happened in the last few seconds and nothing more. If every piece of data needs to be guaranteed to make it to the other side, this can lead to a large bottleneck when data goes missing, because everything sent after the lost piece has to wait until it is resent.

It is the need to work around the constraints of TCP that led the WebRTC developers to choose UDP as their preferred method of transport. The audio, video, and data requirements of WebRTC are not meant to be the most reliable connection, but rather to be the fastest one between the two browsers. This means we can afford to lose frames, which in turn means that UDP is a much better choice for these types of applications.
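The bottleneck described above can be illustrated with a toy model. The following is a minimal sketch, not a real TCP implementation: the function name deliverInOrder and the use of plain sequence numbers are assumptions made for illustration. It shows how one late packet holds back every packet behind it when delivery must be in order.

```javascript
// Toy model of in-order (TCP-like) delivery. Not a real TCP implementation.
// `arrivals` lists sequence numbers in the order they reach the receiver.
// Returns, for each arrival, the packets the application can read at that moment.
function deliverInOrder(arrivals) {
  const buffered = new Set();
  let next = 0; // next sequence number the application is waiting for
  return arrivals.map((seq) => {
    buffered.add(seq);
    const released = [];
    // Release packets only while there is no gap in the sequence.
    while (buffered.has(next)) {
      buffered.delete(next);
      released.push(next);
      next += 1;
    }
    return released;
  });
}

// Packet 1 is lost and only arrives after being resent, behind packets 2 and 3.
const timeline = deliverInOrder([0, 2, 3, 1]);
console.log(JSON.stringify(timeline)); // [[0],[],[],[1,2,3]]
```

Packets 2 and 3 reach the receiver on time, but the application sees nothing until packet 1 finally shows up. For a video frame, that wait is usually worse than simply skipping the lost data.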

This does not mean that WebRTC never uses TCP as a mode of transportation. Later on, we will learn about Traversal Using Relays around NAT (TURN) servers and how they assist in transporting the WebRTC data between networks with heavy security using TCP.

UDP enables this scenario by making a lot of non-guarantees. It was built to be a less reliable transport layer that makes fewer assumptions about the data you are sending. You can see why in this list of things it does not guarantee:

  • It does not guarantee the order your data is sent in or the order in which it will arrive on the other side
  • It does not guarantee that every packet of data will make it to the other side; some may get lost along the way
  • It does not track the state of every single data packet and will continue to send data even if data has been lost by the other client
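Because UDP leaves ordering and loss handling to the application, a real-time receiver can simply prefer the newest frame. The following is a rough sketch of that idea under stated assumptions: createLatestFrameReceiver and the packet shape ({ seq }) are hypothetical names, not part of WebRTC, and real media stacks use more elaborate jitter buffers.

```javascript
// Toy receiver for a UDP-like stream: render only frames newer than the last one.
// `createLatestFrameReceiver` and the packet shape are illustrative, not a WebRTC API.
function createLatestFrameReceiver() {
  let latestSeq = -1;
  let dropped = 0;
  return {
    // Returns true if the frame should be rendered, false if it is stale.
    receive(packet) {
      if (packet.seq <= latestSeq) {
        dropped += 1; // late or duplicate frame: newer data was already shown
        return false;
      }
      latestSeq = packet.seq;
      return true;
    },
    stats() {
      return { latestSeq, dropped };
    },
  };
}

const receiver = createLatestFrameReceiver();
// Frame 28 never arrives, and frame 27 shows up late, after frame 29.
const rendered = [26, 29, 27, 30].filter((seq) => receiver.receive({ seq }));
console.log(rendered, receiver.stats());
```

Nothing waits for the missing frame 28, and the late frame 27 is discarded rather than shown out of order.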

Now, WebRTC can send audio and video in the fastest way possible. This should also reveal why WebRTC can be such a complex topic. Not every network allows UDP traffic across it. Large networks with corporate firewalls can block UDP traffic outright to try and protect against malicious connections. These connections have to travel along a different path than most of the web page downloads do today. Many workarounds and processes have to be built around UDP to get it to work properly for a wide audience. This is just the tip of the iceberg when it comes to WebRTC technology. In the next few sections, we will cover the other supporting technologies that enable WebRTC in the browser.

UDP and TCP are not just used for web pages, but for most of the Internet-based traffic you see today. You will find them being used in mobile devices, TVs, cars, and more. This is why it is important to understand these technologies and how they work.

The WebRTC API

The WebRTC API is a set of functions and objects that allow developers to communicate with the WebRTC layer and make peer connections to other users. It consists of a few main pieces of technology:

  • The RTCPeerConnection object
  • Signaling and negotiation
  • Session Description Protocol (SDP)
  • Interactive Connectivity Establishment (ICE)

The RTCPeerConnection object

The RTCPeerConnection object is the main entry point to the WebRTC API. It is what allows us to initialize a connection, connect to peers, and attach media stream information. It handles the creation of a UDP connection with another user.

The job of the RTCPeerConnection object is to maintain the session and state of a peer connection in the browser. It also handles the setup and creation of a peer connection. It encapsulates all of these things and exposes a set of events that get fired at key points in the connection process. These events give you access to the configuration and internals of what happens during a peer connection.

The RTCPeerConnection object is a simple object in the browser and can be instantiated using the new operator as follows:

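// The configuration object is covered later; an empty one uses the defaults.
var configuration = {};
var myConnection = new RTCPeerConnection(configuration);
// The handler receives an event; the remote stream is in event.stream.
// Note: current browsers deprecate onaddstream in favor of ontrack.
myConnection.onaddstream = function (event) {
  // Use event.stream here
};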

The connection accepts a configuration object, which we will see later. In the example, we have also added a handler for the onaddstream event. This is fired when the remote user adds a video or audio stream to their peer connection.
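To see the events mentioned above as they fire, handlers can be attached for the main points in the connection process. The following is a minimal sketch: watchConnection and log are hypothetical names chosen for this example, while onicecandidate, oniceconnectionstatechange, and ontrack are standard RTCPeerConnection handler properties (ontrack being the replacement for onaddstream in current browsers).

```javascript
// Attach logging handlers to a peer connection.
// `watchConnection` and `log` are hypothetical names for this sketch.
function watchConnection(connection, log) {
  connection.onicecandidate = function (event) {
    // event.candidate is null once candidate gathering has finished.
    log(event.candidate ? 'new ICE candidate' : 'ICE gathering finished');
  };
  connection.oniceconnectionstatechange = function () {
    log('ICE state: ' + connection.iceConnectionState);
  };
  connection.ontrack = function (event) {
    // Fired when an audio or video track from the remote peer is received.
    log('remote ' + event.track.kind + ' track');
  };
  return connection;
}

// In a browser: watchConnection(new RTCPeerConnection(), console.log);
```

Passing the logger in as a parameter keeps the sketch easy to try outside the browser with a stand-in object.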

Signaling and negotiation

Typically, connecting to another browser requires finding where that other browser is located on the Web. This is usually in the form of an IP address and port number, which act as a street address to navigate to your destination. The IP address of your computer or mobile device allows other Internet-enabled devices to send data directly between each other; this is what RTCPeerConnection is built on top of. Once these devices know how to find each other on the Internet, they also need to know how to talk to each other. This means exchanging data about which protocols each device supports as well as video and audio codecs and more.

This means that, in order to connect to another user, you need to know quite a bit about them. One possible solution would be to store a list on your computer of the users that you can connect to. To enable communication with another user, you would simply have to exchange contact information and let WebRTC handle the rest. This has the drawback, however, of your having to manually share information with each user that you want to connect to. You would have to maintain a big list of any users you wanted to connect with and exchange information through some other channel of communication. With WebRTC, we can make this process much more automated.

Luckily, the Web has already solved this problem in most of the communication applications we use today. To connect with anyone on popular services such as Facebook or LinkedIn, you just need to know their name and search for them. You can then add them to your list of known contacts and access their information at any time. In WebRTC, the equivalent process of finding another user and exchanging connection details is known as signaling and negotiation.

The process of signaling consists of a few steps:

  1. Generate a list of potential candidates for a peer connection.
  2. Either the user or a computer algorithm will select a user to make a connection with.
  3. The signaling layer will notify that user that someone would like to connect with them, and they can accept or decline.
  4. The first user is notified of the acceptance of the offer to connect.
  5. If accepted, the first user will initiate RTCPeerConnection with the other user.
  6. Both the users will exchange hardware and software information about their computers over the signaling channel.
  7. Both the users will also exchange location information about their computers over the signaling channel.
  8. The connection will either succeed or fail between the users.
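The steps above can be sketched with an in-memory stand-in for a signaling server. WebRTC does not define the signaling protocol, so everything here is an assumption made for illustration: the SignalingHub class, the message types (request, accept, sdp, candidate), and the placeholder payloads are all hypothetical.

```javascript
// In-memory stand-in for a signaling server. All names here (SignalingHub,
// message types) are hypothetical; WebRTC does not define a signaling protocol.
class SignalingHub {
  constructor() {
    this.users = new Map(); // user name -> callback that receives messages
  }

  // Step 1: a user becomes a candidate for connections by registering.
  join(name, onMessage) {
    this.users.set(name, onMessage);
  }

  // Steps 3 to 7: relay a message from one user to another.
  send(from, to, type, payload) {
    const deliver = this.users.get(to);
    if (!deliver) {
      return false; // target user is not online
    }
    deliver({ from, type, payload });
    return true;
  }
}

const hub = new SignalingHub();
const inbox = { alice: [], bob: [] };
hub.join('alice', (msg) => inbox.alice.push(msg));
hub.join('bob', (msg) => inbox.bob.push(msg));

hub.send('alice', 'bob', 'request', null);              // step 3: ask to connect
hub.send('bob', 'alice', 'accept', null);               // step 4: acceptance
hub.send('alice', 'bob', 'sdp', 'v=0 ...');             // step 6: placeholder description
hub.send('bob', 'alice', 'sdp', 'v=0 ...');
hub.send('alice', 'bob', 'candidate', 'candidate:...'); // step 7: placeholder location info
console.log(inbox.bob.map((msg) => msg.type));
```

In a real application the hub would sit on a server and the callbacks would be network connections, but the order of messages stays the same.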

Session Description Protocol

To get connected with another user, you need to know a bit about them first. Some of the most important things to know about the other client are what audio and video codecs they support, how their network looks, and how much data their computer can handle. This information also needs to be easily transportable between clients. Since we do not specify how this data should be transferred, it should also be capable of being sent over numerous types of transport protocols. This means we need a string-based business card with all the information about a user that we can send to other users. Luckily, this is exactly what SDP provides us with.

The great thing about SDP is that it has been around a long time, with the first draft dating back to the late 90s. This means that SDP is a tried-and-true method of establishing media-based connections between clients. It has been used in numerous other types of applications before WebRTC, such as phones and text-based chatting. This also means there are a lot of great resources out there on using and implementing it.

The SDP is a string-based data blob provided by the browser. The format of this string is a set of key-value pairs, all separated by line breaks:

<key>=<value>\n

The key is a single character that establishes the type of value this is. The value is a structured set of text that comprises a machine-readable configuration value. The different key-value pairs are then split by line breaks.
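As a rough illustration of this format, the following sketch splits an SDP string into its key-value pairs. The function name parseSdpLines is an assumption for this example, and the code does no validation of the values. It accepts both \n and \r\n line endings, since SDP produced by browsers typically uses the latter.

```javascript
// Split an SDP blob into { key, value } pairs. A sketch only: it does not
// validate the values or group lines into media sections.
function parseSdpLines(sdp) {
  return sdp
    .split(/\r?\n/)
    // Keep only lines of the form <single character>=<value>.
    .filter((line) => /^[a-z]=/.test(line))
    .map((line) => ({ key: line[0], value: line.slice(2) }));
}

const lines = parseSdpLines('v=0\r\ns=-\r\nt=0 0\r\na=mid:audio\r\n');
console.log(lines);
```

Each entry keeps the raw value string; lines such as a=mid:audio carry further structure inside the value that a fuller parser would split again.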

The SDP will cover the description, timing configuration, and media constraints for a given user. The SDP is given by the RTCPeerConnection object during the process of establishing a connection with another user. When we start working with the RTCPeerConnection object later in the chapter, you can easily print this to the JavaScript console. This will allow you to see exactly what is contained in the SDP, which may look something like this:

v=0
o=- 1167826560034916900 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS K44HTOZVjyAyAlvUVD3pOLu8i0LdytHiWRp1
m=audio 1 RTP/SAVPF 111 103 104 0 8 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:Vl5FBUBecw/U3EzQ
a=ice-pwd:OtsNG6FzUH8uhNEhOg9/hprb
a=ice-options:google-ice
a=fingerprint:sha-256 FB:56:7D:B6:E0:C7:E7:39:FE:47:5A:12:6C:B4:4E:0E:2D:18:CE:AE:33:92:A9:60:3F:14:E4:D9:AA:0D:BE:0D
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:zE+3pkUbJyFG4UmmvPxG/OFC4+QE24X8Zf3iOSCf
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
a=ssrc:4274470304 cname:+j4Ma6UfMsCcQCWK
a=ssrc:4274470304 msid:K44HTOZVjyAyAlvUVD3pOLu8i0LdytHiWRp1 a1751f6b-98de-469b-b6c0-81f46e19009d
a=ssrc:4274470304 mslabel:K44HTOZVjyAyAlvUVD3pOLu8i0LdytHiWRp1
a=ssrc:4274470304 label:a1751f6b-98de-469b-b6c0-81f46e19009d
m=video 1 RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:Vl5FBUBecw/U3EzQ
a=ice-pwd:OtsNG6FzUH8uhNEhOg9/hprb
a=ice-options:google-ice
a=fingerprint:sha-256 FB:56:7D:B6:E0:C7:E7:39:FE:47:5A:12:6C:B4:4E:0E:2D:18:CE:AE:33:92:A9:60:3F:14:E4:D9:AA:0D:BE:0D
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=sendrecv
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:zE+3pkUbJyFG4UmmvPxG/OFC4+QE24X8Zf3iOSCf
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=ssrc:3285139021 cname:+j4Ma6UfMsCcQCWK
a=ssrc:3285139021 msid:K44HTOZVjyAyAlvUVD3pOLu8i0LdytHiWRp1 bd02b355-b8af-4b68-b82d-7b9cd03461cf
a=ssrc:3285139021 mslabel:K44HTOZVjyAyAlvUVD3pOLu8i0LdytHiWRp1
a=ssrc:3285139021 label:bd02b355-b8af-4b68-b82d-7b9cd03461cf

This is taken from my own machine during the session initiation process. As you can see, the generated text is hard to understand at first glance. It starts off by identifying the connection with the IP address. Then, it sets up basic information about the request, such as whether I am requesting audio, video, or both. Next, it sets up some audio information, including the encryption type and the ICE configuration. It also sets up the video information in the same manner. In the end, the goal is not to understand every line, but to get familiar with what SDP is used for. You will never have to work with it directly during the course of this book, but may need to at some point in the future.
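Although you will not need to read SDP by hand, a small script can make the listing above less intimidating. The following sketch, under the assumption that a helper named listCodecs is acceptable for illustration, walks the m= sections and collects the codec names from the a=rtpmap lines. The sample input is a shortened excerpt of the listing above.

```javascript
// Collect the codec names announced under each m= section of an SDP string.
// `listCodecs` is a hypothetical helper, not part of the WebRTC API.
function listCodecs(sdp) {
  const media = {};
  let current = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      current = line.slice(2).split(' ')[0]; // "audio" or "video"
      media[current] = [];
    } else if (current && line.startsWith('a=rtpmap:')) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const codec = line.split(' ')[1].split('/')[0];
      media[current].push(codec);
    }
  }
  return media;
}

const sample = [
  'm=audio 1 RTP/SAVPF 111 0',
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:0 PCMU/8000',
  'm=video 1 RTP/SAVPF 100',
  'a=rtpmap:100 VP8/90000',
].join('\r\n');
console.log(listCodecs(sample)); // { audio: ['opus', 'PCMU'], video: ['VP8'] }
```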

Overall, the SDP acts as a business card for your computer to other users trying to connect with you. The SDP, combined with signaling and negotiation, is the first half of the peer connection. In the next few sections, we will cover what happens after both users know how to find each other.

Finding a clear route to another user

A big part of most networks today is security. The chances are that any network you are using has several layers of access control, telling your data where and how it can be sent. This means that connecting to another user requires finding a clear path around not just your own network, but the other user's network as well. There are multiple technologies involved to achieve this inside WebRTC:

  • Session Traversal Utilities for NAT (STUN)
  • Traversal Using Relays around NAT (TURN)
  • Interactive Connectivity Establishment (ICE)

SESSION TRAVERSAL UTILITIES FOR NAT

STUN is the first step in finding a good connection between two peers. It helps identify each user on the Internet, and is intended to be used by other protocols in making a peer connection. It starts by making a request to a server, enabled with the STUN protocol. The server then identifies the IP address of the client making the request, and returns that to the client. The client can then identify itself with the given IP address.

Using the STUN protocol requires having a STUN-enabled server to connect to. Currently, in Firefox and Chrome, default servers are provided directly from the browser vendors. This is great for getting up-and-running quickly and testing things out.

Although you may be praising the joys of serverless communication, setting up a good quality WebRTC application actually requires several servers to be enabled. You will need to provide your own set of STUN and TURN servers for your clients to use. There are plenty of great services already providing this today, so be sure to search around to find more information.
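Servers like these are passed to the browser through the configuration object of RTCPeerConnection, in its iceServers list. The following is a minimal sketch of building that object. The helper name buildConfiguration is an assumption, and the server addresses and credentials are placeholders on example.org, not real services.

```javascript
// Build the configuration object accepted by the RTCPeerConnection constructor.
// The server addresses and credentials below are placeholders, not real services.
function buildConfiguration(stunUrl, turn) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    // TURN servers normally require credentials.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const iceConfiguration = buildConfiguration('stun:stun.example.org:3478', {
  url: 'turn:turn.example.org:3478',
  username: 'demo-user',
  credential: 'demo-password',
});
console.log(JSON.stringify(iceConfiguration, null, 2));
// In a browser: var myConnection = new RTCPeerConnection(iceConfiguration);
```

Leaving out the second argument produces a STUN-only configuration, which is enough for many networks.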

TRAVERSAL USING RELAYS AROUND NAT

In some cases, a firewall might be too restrictive and not allow any STUN-based traffic to the other user. This may be the case in an enterprise NAT that utilizes port randomization to allow thousands more devices than you would typically find. In this case, we need a different method of connecting with another user. The standard for this is called TURN.

TURN works by adding a relay between the clients that acts as the other end of the peer-to-peer connection on behalf of each client. The client then gets its information from the TURN server, much like streaming a video from a popular video service by making a request out to the server. This requires the TURN server to download, process, and redirect every packet that gets sent to it for each client. This is why using TURN is often considered a last resort when making a WebRTC connection, as the cost of setting up a quality TURN service is high.

There are many different statistics published on the use of STUN versus TURN, but they all seem to point to the same conclusion—most of the time, your users will be fine without TURN. The use of WebRTC with STUN will work with most network configurations. When setting up your own WebRTC service, it is a good idea to track this information and decide for yourself if the cost of using a TURN service is worth it.

You may notice that none of the examples have configuration values for TURN servers. We assume that the network you are on will be compatible with STUN. If you are having trouble connecting, it may be necessary to find a public low-use TURN server and use it while following the examples.

INTERACTIVE CONNECTIVITY ESTABLISHMENT

Now that we have covered STUN and TURN, we can learn how it is all brought together through another standard called ICE. It is the process that utilizes STUN and TURN to provide a successful route for peer-to-peer connections. It works by finding a range of addresses available to each user and testing each address in sorted order, until it finds a combination that will work for both clients.

The process of ICE starts off by making no assumptions about each user's network configuration. It will incrementally go through a set of steps to discover how each client's network is set up. This process will use different sets of technologies to do this. The goal is to discover enough information about each network to make a successful connection.

Each ICE candidate is found through the use of STUN and TURN. It will query the STUN server to find the external IP address and append the location of a TURN server as a backup if the connection fails. Whenever the browser finds a new candidate, it notifies the client application that it needs to send the ICE candidate through the signaling channel. After enough addresses have been found and tested, and a connection is made, the process finally comes to an end.
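Each candidate travels through the signaling channel as a line of text. The following sketch parses the leading fields of such a line and sorts candidates by priority to illustrate the idea of testing addresses in sorted order. It is a simplification: real ICE ranks pairs of local and remote candidates, the helper name parseCandidate is hypothetical, and the example values are made up using addresses from documentation ranges.

```javascript
// Parse the fixed leading fields of an ICE candidate line. Format:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').replace(/^candidate:/, '').split(' ');
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // e.g. "host", "srflx" (found via STUN) or "relay" (via TURN)
  };
}

// Example values are made up; the addresses come from documentation ranges.
const candidates = [
  'candidate:3 1 udp 41885439 198.51.100.7 50000 typ relay raddr 203.0.113.5 rport 46154',
  'candidate:1 1 udp 2122260223 192.168.1.10 46154 typ host',
  'candidate:2 1 udp 1686052607 203.0.113.5 46154 typ srflx raddr 192.168.1.10 rport 46154',
].map(parseCandidate);

// Higher priority candidates are tried first.
candidates.sort((a, b) => b.priority - a.priority);
console.log(candidates.map((c) => c.type)); // ['host', 'srflx', 'relay']
```

The relayed candidate sorts last, which matches the idea of TURN as a fallback when the direct routes fail.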