WebRTC — The technology that powers Google Meet/Hangout, Facebook Messenger and Discord

https://res.cloudinary.com/practicaldev/image/fetch/s--dFPYxrOT--/c_imagga_scale,f_auto,fl_progressive,h_420,q_auto,w_1000/https://dev-to-uploads.s3.amazonaws.com/i/9tsx7bf26ejjb1wcj2ve.gif

Here is what happens during a P2P connection, and all you need to know about Web Real-Time Communication

Everything you need to know about WebRTC in 9 minutes

History of Real-time Communication

  • Until the early 2010s, real-time communication in the browser was only possible with additional software, plugins, or Adobe Flash.
  • In 2013, the first cross-browser video call between Chrome & Firefox took place.
  • In 2014, the first cross-browser data transfer occurred, opening the way for a new trend in client-side real-time communication.

Today, this technology is known as WebRTC, and we use it every day in Chrome, Mozilla Firefox, Opera, Safari, Edge, iOS, and Android.

Source by CometChat

Overview

WebRTC stands for Web Real-Time Communication. It is a networking technology that Google introduced in 2011 to enable real-time audio, video, and data transmission across web browsers and native apps.

"Its mission is to enable rich, high-quality RTC applications to be developed for the browser, mobile platforms, and IoT devices, and allow them all to communicate via a common set of protocols."

WebRTC allows web apps to create peer-to-peer (P2P) connections. WebRTC is a vast topic, so in this post, we'll focus on the following aspects of it:

  1. Why do developers & companies love WebRTC?
  2. What happens during the P2P connection
    • Signaling
    • NATs & ICE
    • STUN & TURN Servers
    • VP9 Video Codec
  3. WebRTC APIs
  4. Security

Why do developers & companies love WebRTC?

  1. Free and open source
    • It provides browsers with direct end-to-end communication and makes it easy for developers to set up this connection.
  2. Speed enhancement
    • Data no longer needs to be routed through a server, which reduces latency and bandwidth consumption.
    • Direct communication improves the speed of data transfer & file sharing.
  3. No third-party app required
    • It requires no additional software, plugins, or continuous server involvement (well, it does need a server, but only at the beginning; you will see why later).
    • It can easily be embedded in any website to connect peers across the internet.
  4. Easy to implement
    • It takes less time and effort to set up a peer-to-peer (P2P) connection.
    • All the functionality is available on the client side. Developers only need a WebRTC-compatible browser.
  5. Compatible
    • Supported by most popular browsers: Microsoft Edge, Google Chrome, Mozilla Firefox, Safari, Opera, Vivaldi.
    • Supported by Android, Chrome OS, Firefox OS, BlackBerry 10, iOS, Tizen.
  6. Provides a safe connection across many browsers
    • Encryption is mandatory for all WebRTC components.
    • Since it is not a plugin, it runs inside the browser's sandbox without creating a new process, which makes it harder for malware to get into the user's system.
    • There is no need to keep track of updates. Browsers update automatically, so the user gets a patch as soon as it is available.

What happens during the P2P connection

Image by PubNub

To connect two browsers, WebRTC performs five steps to set up a P2P connection.

  1. Signal processing to remove ambient noise from the audio or video.
  2. Codec handling to compress and decompress the audio or video.
  3. Routing from one peer to another through firewalls, NATs, and relays, using Interactive Connectivity Establishment (ICE).
  4. Encrypting user data before it is transmitted across the connection.
  5. Managing bandwidth according to what each peer's connection can provide.

Signaling

  • P2P connections in the browser are set up with the help of a signaling server, which ensures all peers agree to the session.
  • Information such as session keys, error messages, media metadata, codecs, bandwidth, and public IP addresses and ports is shared between peers to create the connection.
  • Through the server, both peers settle on which media format to use and what each peer wants to send to the other.
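
The exchange above can be sketched with a tiny in-memory relay. WebRTC itself does not define a signaling format or transport, so the message shape, the SignalingRelay class, the peer names, and the placeholder payloads below are all assumptions made for illustration; a real application would typically carry these messages over something like a WebSocket.

```typescript
// Hypothetical message shape: WebRTC does not prescribe one.
type SignalKind = "offer" | "answer" | "candidate";

interface SignalMessage {
  kind: SignalKind;
  from: string;
  to: string;
  payload: string; // e.g. a session description or a serialized ICE candidate
}

type Inbox = (message: SignalMessage) => void;

// In-memory stand-in for a signaling server: it only forwards messages
// between registered peers and never touches the media itself.
class SignalingRelay {
  private inboxes = new Map<string, Inbox>();

  register(peerId: string, inbox: Inbox): void {
    this.inboxes.set(peerId, inbox);
  }

  // Returns false when the recipient is unknown, true when delivered.
  send(message: SignalMessage): boolean {
    const inbox = this.inboxes.get(message.to);
    if (inbox === undefined) return false;
    inbox(message);
    return true;
  }
}

// Usage: one peer sends an offer, the other replies with an answer.
const relay = new SignalingRelay();
const calvinLog: SignalMessage[] = [];
const elanaLog: SignalMessage[] = [];

relay.register("calvin", (m) => {
  calvinLog.push(m);
});
relay.register("elana", (m) => {
  elanaLog.push(m);
  if (m.kind === "offer") {
    relay.send({ kind: "answer", from: "elana", to: m.from, payload: "answer-placeholder" });
  }
});

relay.send({ kind: "offer", from: "calvin", to: "elana", payload: "offer-placeholder" });
```

Note that the relay only passes messages along. Once both sides have agreed on the session, the media flows between the peers, not through this server.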

https://dev-to-uploads.s3.amazonaws.com/i/ty0arvc09kvgpeeedvh2.png

Network Address Translations (NATs) and ICE

A NAT, typically running on a device like a home router, translates the private IP addresses of devices on a local network to a public IP address. Firewalls and NATs get in the way of direct connections by blocking specific protocols or ports. The solution WebRTC uses is a framework called ICE. ICE establishes a P2P connection over the internet by trying all possible connections in parallel and selecting the most efficient path. To do this, it relies on two types of servers: STUN & TURN.
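
As a rough illustration of how ICE favors the more direct paths, here is a sketch of the candidate priority formula and the recommended type preferences from the ICE specification (RFC 8445). The function and constant names are made up for this example, and real ICE goes further: it pairs local and remote candidates and runs connectivity checks, so priority only decides the order in which paths are tried.

```typescript
// Candidate types used by ICE:
//   host  = the device's own local address
//   srflx = "server reflexive", the public address learned from a STUN server
//   relay = an address allocated on a TURN server
type CandidateType = "host" | "srflx" | "relay";

// Type preferences recommended by RFC 8445 (higher is preferred).
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
function candidatePriority(
  type: CandidateType,
  localPreference: number = 65535,
  componentId: number = 1,
): number {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

// Sort candidate types from most to least preferred.
const unordered: CandidateType[] = ["relay", "host", "srflx"];
const ranked = [...unordered].sort((a, b) => candidatePriority(b) - candidatePriority(a));
// ranked is ["host", "srflx", "relay"]
```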

STUN Servers

ICE first tries to connect with the help of a STUN (Session Traversal Utilities for NAT) server, which lets a peer discover its public address so that a direct link can be set up.

A STUN server tells the requester the public IP address it can use to communicate with others. Its purpose is to help a requester answer the question, "What's my IP Address?"

How STUN servers work

https://dev-to-uploads.s3.amazonaws.com/i/jca6kacmkgeg6hefd7dc.png

To set up a connection with other peers, an endpoint needs to know its public IP address so it can share it with them.

  1. When an endpoint (Calvin) is behind a NAT/firewall, it only knows its local IP address, and the other endpoint (Elana) cannot connect to that local IP because of the firewall.
  2. This endpoint asks the STUN server to report its public IP address and the type of NAT it is behind.
  3. The other endpoint (Elana) can then attempt to connect using the public IP address reported by the STUN server.
  4. If successful, media will flow directly between the endpoints without a third party or another server.
  5. Once it has answered, the STUN server takes no further part in the session and simply waits for the next query.
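
The steps above can be imitated with a toy model. This is not the real STUN wire protocol: ToyNat, stunBindingResponse, and the addresses (taken from ranges reserved for documentation) are assumptions for illustration only. The point it shows is that the STUN server simply reports the source address it sees after the NAT has rewritten it.

```typescript
interface Address {
  ip: string;
  port: number;
}

// A toy NAT: rewrites a private source address to a public one and
// remembers the mapping so the same source always gets the same result.
class ToyNat {
  private publicIp: string;
  private mappings = new Map<string, Address>();
  private nextPort = 40000;

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  translate(source: Address): Address {
    const key = `${source.ip}:${source.port}`;
    let mapped = this.mappings.get(key);
    if (mapped === undefined) {
      mapped = { ip: this.publicIp, port: this.nextPort++ };
      this.mappings.set(key, mapped);
    }
    return mapped;
  }
}

// A toy STUN server: it answers with the source address it observed.
function stunBindingResponse(observedSource: Address): Address {
  return { ip: observedSource.ip, port: observedSource.port };
}

// Calvin only knows his local address...
const calvinLocal: Address = { ip: "192.168.1.20", port: 5000 };
const homeNat = new ToyNat("203.0.113.7");

// ...but the STUN server sees the request after the NAT rewrote it,
// and reports that public address back.
const calvinPublic = stunBindingResponse(homeNat.translate(calvinLocal));
// calvinPublic is { ip: "203.0.113.7", port: 40000 }
```

Calvin can now share calvinPublic with Elana through the signaling channel.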

Limitations - Symmetric NAT

However, the approach above sometimes fails, because the public IP address and port that the NAT assigns can change from one destination to the next.

This situation is called "symmetric NAT": the public address reported by the STUN server is not enough to establish connectivity, because the NAT uses a different port mapping when talking to the other peer.

Some routers use symmetric NAT to add another layer of security to the endpoint and to keep strangers from connecting to your device. A symmetric NAT not only translates the IP address from private to public but also assigns a separate port mapping for each destination.

In other words, the router will only accept connections from known peers that the user previously connected to. Hence, another solution is needed to make sure the two peers can still connect: the TURN server.
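
A toy model makes the failure easier to see. ToySymmetricNat and every address below are hypothetical; the sketch only captures the one property that matters here, namely that the public port depends on the destination.

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

// Toy symmetric NAT: the mapping key includes the destination, so each
// new destination gets a fresh public port.
class ToySymmetricNat {
  private mappings = new Map<string, number>();
  private nextPort = 50000;

  publicPortFor(source: Endpoint, destination: Endpoint): number {
    const key = `${source.ip}:${source.port}->${destination.ip}:${destination.port}`;
    let port = this.mappings.get(key);
    if (port === undefined) {
      port = this.nextPort++;
      this.mappings.set(key, port);
    }
    return port;
  }
}

const symmetricNat = new ToySymmetricNat();
const localEndpoint: Endpoint = { ip: "192.168.1.20", port: 5000 };
const stunServer: Endpoint = { ip: "198.51.100.1", port: 3478 };
const remotePeer: Endpoint = { ip: "198.51.100.99", port: 6000 };

// The port the STUN server observes...
const portSeenByStun = symmetricNat.publicPortFor(localEndpoint, stunServer);
// ...is not the port the NAT uses when talking to the remote peer.
const portUsedForPeer = symmetricNat.publicPortFor(localEndpoint, remotePeer);

// false: the address learned through STUN does not help the remote peer.
const stunAnswerIsUsable = portSeenByStun === portUsedForPeer;
```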

Why STUN servers are useful

As a protocol, STUN is fast, lightweight, and straightforward. It only helps set up the connection, so the media can start flowing directly between the peers after a short time, which is what real-time applications need.

The effect is similar to downloading over a wired LAN instead of Wi-Fi: the more direct path is faster. Most importantly, STUN allows the media to travel directly between both endpoints. Public STUN servers are also available free of charge.

TURN Servers

A TURN (Traversal Using Relays around NAT) server acts as a relay in case a direct peer-to-peer connection cannot be established. While STUN servers are only used to establish the connection, TURN servers remain active throughout the session.

A TURN server keeps relaying the media between the WebRTC peers. That is why the term "relay" is used to define TURN.

How TURN servers work

https://dev-to-uploads.s3.amazonaws.com/i/mu1qcuy8h8t951caxhk2.png

This relay server is used to relay traffic if the direct connection through STUN fails, and it also offers the functions of a STUN server.

The TURN server is a STUN server with relaying capability built in. More specifically, TURN is used to relay audio, video, and data streams between peers, not signaling data.

  1. Follow the steps for STUN servers.
  2. If STUN fails, the endpoint creates a connection with a TURN server and tells all peers to send their data to that server, which is in charge of forwarding it to the endpoint.

One main reason why STUN is always tried first is that a TURN server is expensive to run: it uses a massive amount of bandwidth when HD video is streamed through it.
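
In a browser, the STUN and TURN servers are handed to the connection as configuration. The sketch below builds such an object using local types that mirror the shape of the browser's RTCConfiguration (iceServers and iceTransportPolicy). The server URLs, the username, and the credential are placeholders, not real services, and buildPeerConfig is a name invented for this example.

```typescript
// Local types mirroring the shape of the browser's RTCConfiguration,
// declared here so the sketch does not depend on browser typings.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServerEntry[];
  // "all" lets ICE try direct paths first; "relay" forces TURN only.
  iceTransportPolicy: "all" | "relay";
}

function buildPeerConfig(forceRelay: boolean = false): PeerConfig {
  return {
    iceServers: [
      // Placeholder STUN server: cheap, used to discover the public address.
      { urls: "stun:stun.example.org:3478" },
      // Placeholder TURN server: relays media, so it usually needs credentials.
      { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" },
    ],
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

const peerConfig = buildPeerConfig();
// In a browser, this could be used as: new RTCPeerConnection(peerConfig)
```

Leaving the policy at "all" keeps the expensive relay as a fallback, while "relay" can be handy for testing that the TURN server actually works.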

VP9 Video Codec

One of the main reasons many people start using WebRTC is video streaming. As live video becomes more mainstream and its quality gets higher, data has to be transferred faster, or the packets have to be smaller, so that it can be delivered easily.

This is where the VP9 video codec comes in: it compresses and decompresses the video. It helps video stream faster and look clearer. Safari supports VP8 as of version 12.1, which lets it exchange live video with other peers.

https://dev-to-uploads.s3.amazonaws.com/i/aik6ikypshs5ojhj1cpt.png

VP9 is a video compression format developed by Google as an improvement on VP8, which was created by On2 Technologies and is now owned by Google.

Its main features are concealing packet loss and cleaning up noisy images, along with capture and playback capabilities across multiple platforms.

With VP9, users can stream 720p video over WebRTC with less packet loss and delay. It can also support a 1080p video call at the same bandwidth, which helps on poor connections and reduces data usage and cost for users.
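
If an application wants to favor VP9 when both peers support it, one option is to reorder the list of available codecs before negotiation. The sketch below is a plain helper with a hypothetical name (preferCodec) and a hand-written codec list. In a browser, a similar list could come from RTCRtpReceiver.getCapabilities("video"), and the reordered result could be passed to RTCRtpTransceiver.setCodecPreferences(), assuming the browser supports that method.

```typescript
// Minimal local stand-in for a codec capability entry.
interface CodecInfo {
  mimeType: string;
  clockRate: number;
}

// Moves every codec matching the preferred MIME type to the front,
// keeping the relative order of the rest unchanged.
function preferCodec(codecs: CodecInfo[], preferredMimeType: string): CodecInfo[] {
  const wanted = preferredMimeType.toLowerCase();
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const others = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return [...preferred, ...others];
}

// Hand-written example list; real capabilities depend on the browser.
const availableCodecs: CodecInfo[] = [
  { mimeType: "video/VP8", clockRate: 90000 },
  { mimeType: "video/H264", clockRate: 90000 },
  { mimeType: "video/VP9", clockRate: 90000 },
];

const orderedCodecs = preferCodec(availableCodecs, "video/VP9");
// orderedCodecs starts with video/VP9, followed by video/VP8 and video/H264
```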


JavaScript APIs

There are three main JavaScript APIs that handle audio capturing, video conferencing, and data transmission:

MediaStream

  • Uses the user's camera and microphone to capture and stream audio and video. This API gives you access to input devices such as the microphone and the web camera.
  • When a developer integrates WebRTC into their website, they can set constraints on how the audio and video are streamed, such as the frame rate, the size of the video frame, the resolution, and much more.
  • This API was provided as part of HTML5, whereas the other two APIs are offered specifically for WebRTC.
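
As a minimal sketch of such constraints, the helper below builds an object in the shape that navigator.mediaDevices.getUserMedia() accepts. The helper name, the local types, and the chosen values (1280x720 at up to 30 fps) are assumptions for illustration; browsers treat "ideal" values as hints and may deliver something different.

```typescript
// Local types describing the subset of media constraints used here.
interface VideoSettings {
  width: { ideal: number };
  height: { ideal: number };
  frameRate: { max: number };
}

interface CaptureConstraints {
  audio: boolean;
  video: VideoSettings;
}

// Builds constraints asking for audio plus video near the given size,
// capped at the given frame rate.
function buildCaptureConstraints(width: number, height: number, maxFps: number): CaptureConstraints {
  return {
    audio: true,
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { max: maxFps },
    },
  };
}

const hdConstraints = buildCaptureConstraints(1280, 720, 30);
// In a browser, this could be used as:
//   const stream = await navigator.mediaDevices.getUserMedia(hdConstraints);
```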

RTCPeerConnection

  • Sends the captured stream of audio and video in real time across the internet to another WebRTC endpoint. This API enables users to transmit audio and video captured by getUserMedia to other peers.
  • Has features to connect to a remote peer, maintain and monitor the connection, and close the connection once it's done.
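
Monitoring usually means reacting to the connection's state. The state names in this sketch are the values of RTCPeerConnection.connectionState; the actions attached to them are only one possible policy, and actionFor and MonitorAction are hypothetical names.

```typescript
// The values RTCPeerConnection.connectionState can take.
type PeerConnectionState =
  | "new"
  | "connecting"
  | "connected"
  | "disconnected"
  | "failed"
  | "closed";

// Hypothetical actions an application might take in response.
type MonitorAction = "wait" | "use" | "restart-ice" | "cleanup";

function actionFor(state: PeerConnectionState): MonitorAction {
  switch (state) {
    case "connected":
      return "use"; // media and data can flow
    case "failed":
      return "restart-ice"; // e.g. call restartIce() and renegotiate
    case "closed":
      return "cleanup"; // release streams and UI resources
    default:
      // "new", "connecting", and "disconnected" may still resolve on their own
      return "wait";
  }
}

// In a browser, this could be wired up as:
//   pc.onconnectionstatechange = () => console.log(actionFor(pc.connectionState));
```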

RTCDataChannel

  • Transmits arbitrary data. Each data channel is associated with an RTCPeerConnection.
  • Built-in security (DTLS) and congestion control.
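
When sending something large, such as a file, applications commonly split it into smaller messages first. The sketch below uses 16 KiB chunks, a size often suggested as a conservative choice across browsers; treat that number, and the splitIntoChunks helper itself, as assumptions, not part of the API.

```typescript
// Conservative message size often used for data channels (an assumption).
const CHUNK_SIZE = 16 * 1024;

// Splits a byte array into consecutive chunks of at most chunkSize bytes.
// The chunks are views onto the original buffer, not copies.
function splitIntoChunks(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// A 40,000-byte stand-in for a file.
const fileBytes = new Uint8Array(40000);
const fileChunks = splitIntoChunks(fileBytes);
// fileChunks has 3 entries: 16384, 16384, and 7232 bytes

// In a browser, each chunk could then be passed to RTCDataChannel.send().
```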

Security

One of the security risks in any real-time communication application arises during the transmission of data. For this reason, encryption is a mandatory feature of WebRTC and is enforced on all components.

WebRTC uses two standardized encrypting protocols:

Datagram Transport Layer Security (DTLS)

  • A standardized protocol that is built into the browser. It is used to encrypt data streams and is based on Transport Layer Security (TLS).
  • Because DTLS runs over the User Datagram Protocol (UDP), it preserves the semantics of the underlying transport.
  • TLS is the successor to Secure Sockets Layer (SSL); DTLS applies the same kind of protection to WebRTC data, allowing end-to-end encryption.

Secure Real-time Transport Protocol (SRTP)

  • Used to encrypt media streams.
  • It is an extension of the Real-Time Transport Protocol (RTP), which does not have any built-in security mechanisms. It adds confidentiality, integrity, and message authentication to RTP.
  • Downside: While it provides encryption for the RTP packets, it does not encrypt the header.

Putting these together, a secure WebRTC session is set up as follows:

  1. The signaling process starts and exchanges metadata between the two peers.
  2. ICE checks are performed, and ICE establishes a channel between the parties.
  3. The DTLS handshake is performed. If media is transported, SRTP uses the keys that were exported during the DTLS handshake.
  4. All peers have secure channels with keys that are not known publicly.
  5. Keys are exchanged between the peers.

Applications that use WebRTC

  1. Google Meet / Google Hangouts
  2. Facebook Messenger
  3. Discord
  4. Amazon Chime
  5. ....

For more, you can check out this link for a list of apps that use WebRTC http://www.webrtcworld.com/webrtc-list.aspx