
1 Introduction

Historically, real-time communication was only available to large companies that could afford to pay expensive licensing fees and/or to buy specific proprietary hardware (HW). Since then, videoconferencing systems have evolved into a low-cost, standards-based technology that is available to the general public. Besides audio and video communications, current videoconferencing systems also support a variety of other services, including instant messaging, document sharing, screen sharing, and video recording. Technological developments in the 2010s have further extended the capabilities of videoconferencing systems for use not only in static environments but in mobile environments as well.

A videoconferencing technology typically employs both audio and video channels, jointly interconnecting two users (point-to-point) or several users (multi-point) located at different sites. Multi-point communication is enabled by a unit called a Multipoint Control Unit (MCU), which represents a bridge interconnecting the communicating users (similarly to an audio conference call). Either all participating users can call the MCU, or the MCU can call the involved users itself. An MCU can be characterized by the number of simultaneous calls it supports, its ability to transpose data rates and protocols, or the way it is implemented (a stand-alone HW device vs. a unit embedded in a dedicated videoconferencing system). Certain videoconferencing platforms employ a standards-based H.323 technique known as “decentralized multi-point”, where each end-point device involved in a multi-point communication exchanges video and audio directly with the other end-point devices without using an MCU.

Videoconferencing systems can be operated in two modes: (a) Voice-Activated Switch (VAS) or (b) Continuous Presence. In the VAS mode, the MCU switches between end-point devices based on voice levels, so that the currently speaking user is the one seen by the other end-point devices. For example, if there are four users in a videoconference, only the currently talking user is displayed. In the Continuous Presence mode, all users are displayed at the same time; the user streams are assembled by the MCU into a single video stream that is transmitted to all users.
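The VAS switching decision described above can be sketched as a simple selection over the participants' audio levels. The function name and the `audioLevel` field below are illustrative assumptions, not part of any real MCU API:

```javascript
// Hypothetical sketch of the Voice-Activated Switch (VAS) logic: the MCU
// forwards the stream of whichever participant currently has the highest
// audio level. Field names are assumptions for illustration only.
function selectActiveSpeaker(participants) {
  // participants: array of { id, audioLevel }, audioLevel in the range 0..1
  let active = null;
  for (const p of participants) {
    if (active === null || p.audioLevel > active.audioLevel) {
      active = p;
    }
  }
  return active ? active.id : null; // null when nobody is connected
}
```

In the four-user example above, only the stream of the participant returned by this selection would be forwarded to the other three end-points.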

Although videoconferencing systems are gaining popularity, they still face several issues, among them: complexity (not all users are technically skilled; users prefer a simple interface and usage), interoperability (not all systems can readily be interconnected, as there are different standards, features, and qualities), and bandwidth (it is not always possible to have a high-quality connection to support good-quality conferencing communication).

In general, the videoconferencing systems can be classified into the following three categories:

  • Dedicated systems: these systems have all required components packed into a single platform (console), which contains all necessary interfaces, codecs, and control unit.

  • Desktop systems: they are designed as add-ons (HW boards or SW codecs) to common PCs and laptops, transforming them into videoconferencing platforms. A range of different cameras and microphones, containing the necessary codecs and interfaces, can be used.

  • WebRTC platforms: they represent videoconferencing solutions that are available through standard web browsers. Typically, a browser uses the local camera and/or microphone to establish the connection.

Current approaches supporting real-time communications are based either on utilizing a separate application or on employing a specific plug-in (e.g., the Flash plug-in). Using a separate application means leaving the web browser and launching a dedicated application, i.e., the browser content and the real-time content are independent. The plug-in solution provides a tighter integration between the real-time and browser contents. However, plug-ins are proprietary solutions and do not work in all environments. By contrast, the WebRTC platforms include all necessary audio and video components to support real-time communications directly via web browsers.

The rest of the paper is organized as follows. The next section briefly describes the basic features of the WebRTC technology, including related works. A comparison of selected real-time communication platforms with WebRTC is provided in Sect. 3. The impact on the multiplexing server’s CPU load and memory requirements for different numbers of communicating end-point devices is analyzed in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Web Real-Time Communication

2.1 Basic Features

Web Real-Time Communication, or WebRTC, is a specification for plug-in-free, real-time communication via a web browser. It provides communication protocols and defines application programming interfaces (APIs) allowing two or more users to communicate with each other in real time. The WebRTC technology is being standardized by the World Wide Web Consortium (W3C) at the application level and by the Internet Engineering Task Force (IETF) at the protocol level.

WebRTC is composed of three main elements:

  • Web browser: To send and receive audio and video streams directly from a web browser, the web browser has to be enhanced with capabilities for controlling the local audio and video elements on the device on which the web browser is running (e.g., a PC or smartphone).

  • Web application: A user is typically asked to download a JavaScript application from a web server. This script runs locally on the user’s device and interacts with the web server.

  • Web server: The server provides the JavaScript applications for the users and executes the necessary application logic.
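The interplay between these three elements can be sketched as a minimal signaling relay: the web server forwards session descriptions and connectivity candidates between two browsers so that they can establish a direct media connection. The class and the message shape below are illustrative assumptions, not part of the WebRTC specification:

```javascript
// Minimal in-memory signaling relay sketch (illustrative only; the message
// format and method names are assumptions, not a standard API). Browsers
// send messages addressed to a peer; the relay queues them until the
// addressed peer polls for them.
class SignalingRelay {
  constructor() {
    this.queues = new Map(); // peerId -> array of pending messages
  }
  // Called when a browser sends an offer/answer/candidate to a peer.
  send(toPeerId, message) {
    if (!this.queues.has(toPeerId)) this.queues.set(toPeerId, []);
    this.queues.get(toPeerId).push(message);
  }
  // Called when a browser polls for messages addressed to it.
  receive(peerId) {
    const pending = this.queues.get(peerId) || [];
    this.queues.set(peerId, []); // drain the queue
    return pending;
  }
}
```

Once the offer/answer exchange has passed through such a relay, the media streams themselves flow directly between the browsers, not through the web server.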

Initially, the WebRTC technology was designed to provide real-time communication between two users. Later on, it was enhanced with multi-connection support. There are two approaches to supporting multi-connections: (a) the centralized mode and (b) the fully meshed mode. In the first case, a user agent is in charge of the dialog and of mixing the media streams. The assumption here is that only the host mixes the streams and sends them to each end-point device; the host can be a dedicated server or a standalone device with sufficient computational power. To join the videoconference call, a new end-point device has to notify the host. In the centralized mode, the communication fully depends on the host; if the host leaves the conference, all connections are disconnected. In the second mode, the fully meshed mode, the end-point devices are interconnected with each other, and any end-point device can join or leave the call without affecting the other end-points.
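The scalability difference between the two modes can be made concrete by counting the connections each one requires for n end-point devices (the function names are illustrative):

```javascript
// Connection counts for the two WebRTC multi-connection modes.
function centralizedConnections(n) {
  // Centralized mode: every end-point keeps one connection to the host.
  return n;
}
function meshConnections(n) {
  // Fully meshed mode: every pair of end-points keeps a direct
  // connection, i.e. n * (n - 1) / 2 connections in total.
  return (n * (n - 1)) / 2;
}
```

The mesh count grows quadratically, which is why the fully meshed mode is usually considered practical only for small groups, while the centralized mode shifts the mixing load onto the host.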

The WebRTC technology can nowadays be integrated within a SIP-based system, thus offering additional functions [1, 2]. A videoconferencing architecture can include a unit that allows legacy SIP user agents and SIP WebRTC clients to interoperate. A real-time chatting tool based on HTML5 and WebRTC can also be proposed; it includes a basic information management module and a communication module allowing text communication by means of a dedicated server and voice/video sessions through a point-to-point connection between web browsers.

2.2 Related Works

The WebRTC technology has received a lot of attention from researchers over the last couple of years.

In [3], a study of WebSocket and WebRTC usage from the point of view of mobile-device power consumption in LTE networks is provided. Based on the realized power consumption analysis, the authors provide a couple of recommendations for the standardization process.

Performance of WebRTC’s adaptive video streaming capabilities in an IEEE 802.16e network environment is evaluated in [4].

The applicability of the HTML5 and WebRTC technologies for P2P video streaming is investigated in [5]. The work describes a BitTorrent-based P2P video streaming solution, which is used to identify the performance bottlenecks in WebRTC-based P2P video streaming implementations.

In [6], the WebRTC-based communication performance is evaluated for the case of star and full-mesh network topologies, with a focus on the performance of the congestion control mechanism used with WebRTC.

The feasibility of live video streaming protocols over the WebRTC technology is investigated in [7]. The experiments illustrate the possibility of implementing a pull-based P2P streaming protocol with WebRTC, at least for small-scale P2P networks.

The performance of WebRTC on mobile devices, taking into account different types of mobile devices, wireless network connectivity, and web browser configurations, is evaluated in [8].

In [9], the authors present a benchmarking tool, called WebRTCBench, which measures WebRTC peer-connection establishment and communication performance. The authors discuss the performance of the WebRTC technology across a range of implementations and devices.

WebRTC communications over an LTE mobile network simulated with the NS-3 tool are investigated in [10]. Several multimedia WebRTC streams between two end-point devices are analyzed, and empirical CDFs of typical performance figures, including throughput, jitter, and packet loss, are derived under different LTE scenarios.

2.3 Security and Identification

As to the WebRTC security aspects, the WebRTC stack includes the Datagram Transport Layer Security (DTLS) protocol, which is designed to prevent eavesdropping and information tampering, and which provides key negotiation and association management for the Secure Real-time Transport Protocol (SRTP) used to encrypt the real-time data.

An issue with traditional desktop SW is whether a user can trust the application itself [11]. The installation of new SW or a plug-in can surreptitiously install malware or other undesirable SW. Typically, users have no idea where the SW was created or from whom they are downloading it. Thus, a malicious third party has the possibility of repackaging perfectly safe and trusted SW to include malware, and of offering this package on free SW websites.

Advantageously, WebRTC is not a plug-in, nor is there any installation process for any of its components. The WebRTC platform is simply installed as part of a WebRTC-compatible browser (e.g., Chrome or Firefox). Therefore, there is no risk of installing malware or a virus through the use of an appropriate WebRTC application. Nevertheless, WebRTC applications should still be accessed via an HTTPS website signed by a valid certificate authority.

Moreover, the WebRTC technology is designed in such a way that a user is explicitly asked for permission before the camera or microphone is used. As the permission can be granted on a one-time or permanent basis, a WebRTC application cannot arbitrarily gain access to or operate any device. Furthermore, if the microphone or camera is being used, the user interface is required to expressly show the user that the microphone or camera is in operation.

It is desirable that users are able to verify the identity of their communication partners, i.e., users naturally want to be certain that they are communicating with the users they believe they are speaking to, and not with an imposter. The authentication of peers has to be performed independently of the signaling server, as the signaling server itself cannot be trusted. To do this, web-based identity providers can be utilized, such as Facebook Connect, BrowserID, or OAuth. The role of these providers is to verify a user’s identity to other users, based on the authority of the identity provider itself. This allows users to tie their authentication on other services to their main account on a trusted service. The implementations of the individual identity providers may differ, as they are developed independently by different companies rather than being based on an open standard, but the underlying principle and functions remain the same.

3 Comparison of Videoconferencing Platforms

In this section, we provide a brief comparison of selected real-time communication platforms and we highlight differences between them and the WebRTC technology.

3.1 Skype

In contrast to WebRTC, Skype is a stand-alone application. Skype is available for different platforms and provides video and audio services between two or more users. Additionally, users may exchange digital documents (images, text, video, and others), as well as text and video messages.

3.2 Hangouts

Hangouts is a communication platform, developed by Google, supporting instant messaging, video, audio, and SMS between two or more users. The service can be accessed online (via the Gmail or Google+ websites) using either PCs or mobile phones. In contrast to the WebRTC technology, Hangouts uses a proprietary protocol, although with WebRTC elements.

3.3 WebEx

WebEx Meeting Center is part of a commercial solution developed by Cisco. It is a web conferencing platform delivered through the Cisco Collaboration Cloud. The stream processing is done in the cloud, and the multiplexing of media streams is not the responsibility of the end-point device. It uses a proprietary format named Universal Communication Format that deals with a large range of media.

3.4 Jitsi

Compared to the previously mentioned solutions, Jitsi is a free and open-source multi-platform application supporting video, audio, and instant messaging. It is based on the OSGi framework (a collection of Java libraries). Jitsi is not web-oriented; a user is asked to download and install specific SW to run it. In terms of functions, its libraries are not as rich as the WebRTC ones, but the main capabilities are effective. Similarly to WebEx, the media stream processing can be done at a server instead of at the end-point device.

The Jitsi Videobridge is a videoconferencing solution supporting WebRTC. It is a Selective Forwarding Unit (SFU), i.e., a centralized server that forwards selected streams to the other participating users in the videoconference call.

3.5 AnyMeeting

AnyMeeting is a web conferencing and webinar service allowing users to host and attend web-based conferences and meetings and to share their desktop screens with other users via a web browser.

The above-mentioned platforms are summarized and compared in Table 1. The comparison takes into account aspects such as the place of stream multiplexing, the set of functions (text chat, screen sharing, etc.), web-browser compatibility, and multi-connection support.

Table 1. Comparison of videoconferencing platforms.

4 Measurements and Results

In this section, we investigate the impact on the multiplexing server’s CPU load and memory requirements for different numbers of end-point devices; distributed processing issues are discussed in more detail, for example, in [12, 13]. We set up a testbed that considers three PCs with various HW and SW configurations, as indicated in Table 2. The configuration of the server is indicated in the table as well. Each participant receives a return video with the multiplexed streams of all participants in one video stream, i.e., the end-point device has to decode one flow instead of one flow per end-point device. The multiplexing is provided by the Kurento media server [14, 15].

Table 2. Parameters of PCs.

In total, 28 scenarios have been tested, which differ in the number of involved end-point devices, the PCs used (1, 2, and 3), and the display resolution (160\(\,\times \,\)120, 320\(\,\times \,\)240, 640\(\,\times \,\)480, and 1280\(\,\times \,\)720); see Table 3. Notice that the configuration id is constructed in such a way that, for each end-point device, the first digit indicates the PC (using Table 2) and the second digit is associated with the display resolution; for example, the configuration 1222 refers to two interconnected end-point devices (PC1 and PC2) where the display resolution is set to 640\(\,\times \,\)480. As for the resolution of the return video, 800\(\,\times \,\)600 is used for the single-user configuration, and in the other cases a 200\(\,\times \,\)150 video per user is sent back to the end-point device. PC1 and PC2 use all the available bandwidth, while the data rate of PC3 is limited to 600 kB/s.
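The configuration-id encoding can be sketched as a small decoder. The 0-based resolution indexing below is an assumption inferred from the example in the text (digit 2 mapping to 640x480); the function name is illustrative:

```javascript
// Sketch of decoding a configuration id into its end-point devices,
// assuming digit pairs of (PC number, resolution index). The 0-based
// resolution table is inferred from the example "1222" -> 640x480.
const RESOLUTIONS = ['160x120', '320x240', '640x480', '1280x720'];

function decodeConfigId(id) {
  const endpoints = [];
  for (let i = 0; i < id.length; i += 2) {
    endpoints.push({
      pc: Number(id[i]),                         // PC number from Table 2
      resolution: RESOLUTIONS[Number(id[i + 1])], // display resolution
    });
  }
  return endpoints;
}
```

Under this reading, a six-digit id such as 102030 describes a three-party scenario with all three PCs at the lowest resolution.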

All measurements are done using the Firefox browser, and the values are given as the arithmetic average of ten consecutive measurements (which is a common way to evaluate a CPU load).

Figure 1 shows the server’s CPU load for the different testing scenarios (please notice that the server has 4 vCPUs, therefore the load can theoretically reach up to 4\(\,\times \,\)100%). As can be observed, the server’s CPU load is the lowest for PC3, where the bit rate is the lowest. As expected, the bit rate, and therefore the video quality, has a considerable impact on the server’s CPU load. For an increasing number of involved end-point devices, the server’s CPU load increases as well (scenarios 102030–132333). From the point of view of display resolution, the server’s CPU load strongly depends on the resolution and bit rate, often given by the HW/SW configuration of the involved end-point devices. Figure 1 indicates that about one CPU per end-point device is needed.

The impact on the server’s RAM is illustrated in Fig. 2. As one can expect, the requirements on the RAM are directly proportional to the number of communicating end-point devices, more or less regardless of the HW/SW configuration of these end-point devices. The display resolution, on the other hand, has a negligible impact on the memory requirements.

Scenarios with one PC are special cases that illustrate the memory and CPU consumption when no multiplexing is performed, i.e. the scenarios represent the load needed for decoding and encoding of the video flow.

Table 3. Testing configurations.
Fig. 1. Server’s CPU load.

Fig. 2. Requirements on the size of the server’s RAM.

5 Conclusion

In this paper, we have discussed the basic characteristics of the WebRTC technology and compared it with some other existing real-time videoconferencing systems. Additionally, we have evaluated the impact on the multiplexing server’s CPU load and RAM requirements for different numbers of end-point devices running WebRTC sessions. In total, we have tested 28 different configurations. The obtained measurements illustrate a strong relation between the video resolution and bit rate of the involved end-point devices and the server’s CPU load. The requirements on the server’s RAM are directly proportional to the number of involved end-point devices, regardless of the HW/SW configurations and the considered display resolutions. The obtained results illustrate that the server has to be dimensioned in such a way that about one CPU is considered per end-point device.

The current scenarios only include PCs. In our future work, we plan to take into account a more heterogeneous environment and to consider mobile devices as end-points as well.