1 Introduction

The COVID-19 pandemic forced millions of people worldwide to stay at home [19]. As a result, meetings, classrooms, and private events were held online, and demand for and interest in online video conferencing systems increased [42]. In April 2020, Zoom Video Communications reported that its number of users had increased significantly due to the pandemic: while Zoom had 10 million daily users in December 2019, it had 200 million daily users in March 2020. Besides Zoom, there are numerous other video conferencing services, such as Microsoft Teams, Webex by Cisco, Google Meet, Skype, Zoho Meeting, BlueJeans, LifeSize, Whereby, and many others [30].

Security of Video Conferencing Systems. With the rise of video conferencing systems, security and privacy concerns grew. In April 2020, Google, SpaceX, and others banned Zoom over privacy concerns regarding its end-to-end encryption (E2EE) [22, 38, 45]. To eliminate vulnerabilities and increase security, several video conferencing providers, such as Zoom Video Communications, Inc. (Zoom) and 8x8, Inc. (Jitsi Meet), started bug bounty programs [1, 48]. These programs bear fruit: in 2021, Zoom received 401 reports and awarded $1.8 million in bounties [12].

The demand for security and privacy in conferencing technologies also helped open-source video conferencing systems gain popularity. Open-source software allows one to analyze the source code and to self-host conferencing servers; this requires expertise but has the advantage that data remains on known servers. This is especially important for deployments that must comply with European regulations for the protection of personal data. For example, in 2022, the German federal state of Rhineland-Palatinate banned the use of Microsoft Teams in schools because it does not comply with the General Data Protection Regulation (GDPR) [16].

Towards Systematic Security Analysis of Video Conferencing Systems. Despite the importance of security and privacy in video conferencing systems, there is, to our knowledge, no systematic research on the security of video conferencing systems yet. To gain insights into the attack surface associated with video conferencing systems, we selected two open-source systems widely used in research and education: BigBlueButton and eduMEET.

BigBlueButton has been developed since 2007 with the goal of enabling teachers and researchers to adopt a new style of hybrid teaching, in which BigBlueButton serves as an online classroom [27]. Like other video conferencing systems, BigBlueButton gained popularity during the COVID-19 pandemic. In France, for example, it became the primary mode of communication and learning in schools [7], following a recommendation by the French Ministry of Education, which is responsible for 65,000 schools serving 12 million students. Even after the pandemic, BigBlueButton remained the recommended education tool in France [5] and in several German federal states [16, 46].

eduMEET was released in April 2020 by GÉANT, a European research network [15]. According to GÉANT, the release was accelerated by lockdown measures and the need for an alternative, trustworthy video conferencing solution [26]. GÉANT’s main arguments for having its own video conferencing system were that it comes from its community and is self-hosted, so that traffic stays within its network. GÉANT therefore considers the tool trustworthy and cost-efficient compared to commercial alternatives [15].

To analyze the security of the chosen video conferencing systems, we must first understand how both systems are composed and which features they provide. This leads us to our research questions:

  1. RQ1

    Which architecture concepts do BigBlueButton and eduMEET follow?

  2. RQ2

    What are the common features and user roles, and how are permissions assigned to individual features?

  3. RQ3

    What types of attacks result from the given architecture, features, and user roles?

Our Approach. To perform a systematic analysis of a video conferencing system, we need to know how it is structured, that is, which components it comprises and what their responsibilities and tasks are. Therefore, we break down the complex structure of each system into shared components and identify their main tasks. Furthermore, we examine the relationship between features, permissions, and user roles that are common to video conferencing systems. Using this information, we define our attacker model and use it to perform a source code analysis in which we follow the data flow within and between components. This lets us check whether the components fulfill their responsibilities and correctly enforce the permissions assigned by user roles.

Results. Although both are web-based video conferencing systems with similar user roles, the architectures of BigBlueButton and eduMEET differ drastically. BigBlueButton has many features and a very complex structure, while eduMEET is minimal in comparison. Our architecture analyses laid the groundwork for systematic security analyses of both conferencing systems, in which we found 50 vulnerabilities and 7 bugs. Among them are classic security flaws such as broken access control, NoSQL injection, and DoS, but also feature-specific vulnerabilities that we could detect thanks to our in-depth architecture analyses.

Contributions. In summary, we make the following contributions:

  • We provide a structured security analysis of two modern open-source video conferencing systems: BigBlueButton and eduMEET.

  • We present a common structure of both systems and introduce their main components, features, and user roles.

  • Our security analyses identified 57 vulnerabilities and bugs, ranging from attacks on confidential meeting chats, participant lists, and streams to impersonation and DoS attacks.

Responsible Disclosure. We responsibly disclosed all vulnerabilities and bugs to the developers of BigBlueButton and eduMEET.

2 Background

In this section, we cover WebRTC, which both analyzed video conferencing systems use for real-time audio and video transfer, and how conferencing systems build on it.

2.1 WebRTC

WebRTC [8] is a suite of protocols for real-time communication (RTC) over the Internet. For web applications, it defines a JavaScript API to access media devices and to manage WebRTC connections. WebRTC supports media streams and message-based transfer of arbitrary data. It supports peer-to-peer (P2P) connections, where two users exchange data directly, without the data flowing over a server.

Before two peers can establish a direct WebRTC connection, they need to exchange information using a signaling server. They negotiate the initial media streams, with configuration such as codecs and bitrate, in the form of a Session Description Protocol (SDP) offer and answer [6]. These also contain the information needed for opening the direct connection, including NAT traversal (Interactive Connectivity Establishment (ICE) candidates). Once a direct connection has been established, the peers can transfer the negotiated streams. WebRTC currently supports only one media transport protocol, DTLS-SRTP [25].
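The offer/answer exchange can be illustrated with a small TypeScript model of a signaling relay. It is a sketch under simplifying assumptions: the `SignalingRelay` class and the message shapes are hypothetical, the SDP and ICE strings are placeholders rather than valid SDP, and a real deployment would use a network transport such as a WebSocket instead of in-memory inboxes. The point is that the relay only forwards negotiation messages; the media itself later flows over the direct connection.

```typescript
// Hypothetical signaling message shapes; real systems define their own formats.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

// In-memory signaling relay: it only forwards negotiation messages between
// registered peers and never handles media.
class SignalingRelay {
  private inboxes = new Map<string, SignalMessage[]>();

  register(peerId: string): void {
    this.inboxes.set(peerId, []);
  }

  // Queues a message for its recipient; returns false for unknown recipients.
  send(msg: SignalMessage): boolean {
    const inbox = this.inboxes.get(msg.to);
    if (inbox === undefined) return false;
    inbox.push(msg);
    return true;
  }

  // Returns and clears all messages queued for a peer.
  receive(peerId: string): SignalMessage[] {
    const inbox = this.inboxes.get(peerId);
    if (inbox === undefined) return [];
    this.inboxes.set(peerId, []);
    return inbox;
  }
}

// Usage: Alice sends an offer and an ICE candidate, and Bob answers.
const relay = new SignalingRelay();
relay.register("alice");
relay.register("bob");
relay.send({ kind: "offer", from: "alice", to: "bob", sdp: "<offer SDP>" });
relay.send({ kind: "candidate", from: "alice", to: "bob", candidate: "<ICE candidate>" });
const bobInbox = relay.receive("bob");
relay.send({ kind: "answer", from: "bob", to: "alice", sdp: "<answer SDP>" });
const aliceInbox = relay.receive("alice");
```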

2.2 WebRTC Architectures in Conferencing Systems

In a typical conferencing setting, a group of users exchanges media data, for example, audio and camera streams. Using a P2P architecture for broadcasting media streams minimizes latency and avoids server bandwidth overhead. However, this approach does not scale well due to the limited bandwidth of end users. Therefore, using a P2P architecture is often infeasible for conferences.

Instead of using a P2P architecture, conferencing systems use servers that receive and redistribute each user’s media streams. A WebRTC server can follow one of two architectures. In the Selective Forwarding Unit (SFU) architecture, the server distributes incoming streams unmodified. If the server instead processes and combines incoming media streams, the architecture is called a Multipoint Control Unit (MCU). An MCU lowers the bandwidth requirements for the clients at the cost of additional processing on the server.
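The scaling argument can be made concrete by counting the streams each client sends and receives when all n participants (n ≥ 2) share one video stream. The following TypeScript sketch is a simplified model that ignores audio, simulcast, and differing bitrates; it only illustrates why per-client upload grows linearly in a P2P mesh while it stays constant with an SFU or MCU.

```typescript
type Architecture = "p2p-mesh" | "sfu" | "mcu";

interface ClientLoad {
  upstreams: number; // streams a client sends
  downstreams: number; // streams a client receives
}

// Per-client stream counts for n >= 2 participants who each send one video
// stream and want to receive everyone else's (simplified model).
function clientLoad(arch: Architecture, n: number): ClientLoad {
  const others = n - 1;
  switch (arch) {
    case "p2p-mesh":
      // Every client sends a separate copy of its stream to each other peer.
      return { upstreams: others, downstreams: others };
    case "sfu":
      // One upload; the server forwards the other streams unmodified.
      return { upstreams: 1, downstreams: others };
    case "mcu":
      // One upload; the server combines the other streams into one composite.
      return { upstreams: 1, downstreams: 1 };
  }
}

// Usage: compare the architectures for a 10-person meeting.
const loads = {
  p2p: clientLoad("p2p-mesh", 10), // 9 up, 9 down
  sfu: clientLoad("sfu", 10), // 1 up, 9 down
  mcu: clientLoad("mcu", 10), // 1 up, 1 down
};
```

In this model, the SFU removes the client’s upload bottleneck but not its download volume, which is the part an MCU trades for server-side processing.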

3 Analysis Method

Due to the lack of systematic analyses of video conferencing systems, there is no existing methodology for us to use or adapt. Thus, we developed our own approach and refined it during our analysis. The structure of our paper reflects the steps of our analysis.

In the first part of this section, we outline the procedure for analyzing the architecture and user roles of the chosen systems. The second part of this section deals with the structured source code analysis. We assume attackers can reach a conferencing server over the network, with the server operator being a trustworthy party. The source code analysis requires a detailed attacker model based on the architecture, so we defer the detailed attacker model to Sect. 6.

3.1 High-Level Analysis

In the primary analysis, we use the respective documentation and the publicly available source code to get a broad overview of each system. This overview helps us assess the complexity of the system and understand its functionality and use case (e.g., education). We divide the primary analysis into the following steps.

Architecture and Components (RQ1). The first step covers the architecture of the respective system (see Sect. 4). Besides the main components of the architecture, such as the web client and server, we are especially interested in the WebRTC components and the messaging between components, since these aspects contribute most to understanding the systems.

Conferencing Features (RQ2). Then, we look at the features that each system offers (see Sect. 5.1). The features are needed for our analysis because each feature interacts with a meeting in a certain way (e.g., removing users from a meeting). These interactions are mostly limited to certain groups of users, such as moderators, and therefore require access control.

User Roles (RQ2). Finally, we check the user roles and permissions (see Sect. 5.2). We map these to the features that we gathered in the previous step. This allows us to get an overview and understanding of the system, which facilitates a more detailed source code analysis.

3.2 Source Code Supported Security Analysis

In our source code analysis, we chose a manual approach since automated approaches do not work in our case (see Sect. 8.3). As the first step, we perform a detailed analysis of the implementation. This step shares commonalities with the primary architecture analysis, but now we focus on the internal implementation of each component. We manually validate that the implementation matches the documentation and our understanding of the features. This step also reveals internal assumptions, for example, which parts of internal messages the components treat as trustworthy. Every component is responsible for satisfying these often implicit internal assumptions.

Because almost all server logic is triggered by user actions, we perform a data flow analysis for each possible user action. We confirm the overall behavior in practice, for example, via the browser’s developer tools. During this data flow analysis, we consider the responsibilities of each component (e. g., access control on the conferencing server). Whenever we cannot confirm that a part of the conferencing system correctly fulfills its responsibilities, we investigate further.

When investigating a potential vulnerability, we may move directly to building a proof of concept, or we may first re-evaluate whether the issue is handled somewhere other than expected. In either case, we conclude once we have either demonstrated an exploit or established complete reasoning that the behavior is correct.

4 Architectures of the Analyzed Open-Source Conferencing Systems (RQ1)

We answer RQ1 by analyzing the architectures of BigBlueButton and eduMEET. From both systems, we first derive a shared architecture that gives a high-level overview of common components and outlines their main tasks. This not only helps in understanding the analyzed video conferencing systems but might also facilitate future work. Then, we describe the implementation specifics of BigBlueButton and eduMEET. Finally, we compare the feature sets they offer to users.

4.1 Shared Architecture

We first focus on the commonalities of the analyzed video conferencing systems by deliberately abstracting from specific features of BigBlueButton and eduMEET. This results in the architecture of a video conferencing system with minimal features. Figure 1 shows a summary of the components of such a video conferencing system. In the following, we describe the main components of the shared architecture. Then, we describe how each analyzed system implements each component, including its particularities.

Web Client. The web client has three main tasks. The first, closest to the user, is rendering the user interface (UI), which allows users to interact with the meeting. The web client updates the UI in response to actions by both the local user and other users in the meeting, such as a user enabling their camera or sending a chat message. The web client is also responsible for displaying the conferencing system’s features, such as video chat, which depend on the conferencing system (see Sect. 5.1).

Fig. 1.

Shared architecture of the analyzed video conferencing systems, showing the three components “web client”, “conferencing server”, and the “WebRTC component” with their main tasks. Arrows represent communication, and media streams are marked in green. The dotted arrows mark the creation of the WebRTC connection. The cylinders represent data storage.

The second main task of the web client is handling media streams. This includes establishing a WebRTC connection (see Sect. 2.1), where one peer is the web client and the other peer is the WebRTC component. Once a streaming session has been established, the peers can start sending and receiving media data. On the client side, incoming media streams are connected to the UI, where the videos are displayed. The client also displays its outgoing video streams.

The third main task is processing and sending meeting state updates. When a user performs an action in the UI, the client sends the user action, i. e., an intended change to the meeting state, to the server. If the intended change is valid, the server notifies the web clients of the resulting change to the meeting state; we refer to such a notification as an event. When a web client receives an event, it processes it and updates its local state in near real time. Possible events include new chat messages, changes to permissions, muting audio, or starting and stopping a video. The possible events depend on the features of the conferencing system.
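The action/event cycle can be sketched in TypeScript as follows. The type names (`UserAction`, `MeetingEvent`, `LocalMeetingState`) and the two example events are hypothetical and not taken from either system; the sketch only shows the division of labor in which the client sends intended changes and updates its local state from events that the server publishes.

```typescript
// Hypothetical user actions: intended changes the client sends to the server.
type UserAction =
  | { type: "sendChatMessage"; text: string }
  | { type: "setCamera"; enabled: boolean };

// Hypothetical events: validated changes the server publishes to all clients.
type MeetingEvent =
  | { type: "chatMessage"; userId: string; text: string }
  | { type: "cameraChanged"; userId: string; enabled: boolean };

interface LocalMeetingState {
  chat: { userId: string; text: string }[];
  camerasOn: Set<string>;
}

// In this sketch, the client updates its local state only from server events,
// returning a new state object instead of mutating the old one.
function applyEvent(state: LocalMeetingState, event: MeetingEvent): LocalMeetingState {
  switch (event.type) {
    case "chatMessage":
      return { ...state, chat: [...state.chat, { userId: event.userId, text: event.text }] };
    case "cameraChanged": {
      const camerasOn = new Set(state.camerasOn);
      if (event.enabled) camerasOn.add(event.userId);
      else camerasOn.delete(event.userId);
      return { ...state, camerasOn };
    }
  }
}

// Usage: the client sends an action; the server later publishes events.
const outgoing: UserAction = { type: "sendChatMessage", text: "Hello" }; // sent to the server
let state: LocalMeetingState = { chat: [], camerasOn: new Set<string>() };
state = applyEvent(state, { type: "chatMessage", userId: "u1", text: "Hello" });
state = applyEvent(state, { type: "cameraChanged", userId: "u2", enabled: true });
```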

Server-Side Components. The analyzed conferencing systems consist of two server-side components: the conferencing server and a WebRTC component.

Conferencing Server. The first task of the conferencing server is processing incoming user actions. This involves three main steps. First, the server performs access control by checking whether the user may perform the requested user action. Second, if the action is valid, the conferencing server executes it. This may involve additional processing by the server and results in changes to the meeting state. Finally, the server publishes events to the clients.
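The three steps can be illustrated with a minimal TypeScript handler. The roles, action names, and the `allowedRoles` table are hypothetical simplifications; in particular, publishing is reduced to appending to a list, whereas the analyzed systems push events to connected clients.

```typescript
type Role = "viewer" | "moderator";

interface User {
  id: string;
  role: Role;
}

// Hypothetical user actions handled by the conferencing server.
type Action =
  | { type: "sendChatMessage"; text: string }
  | { type: "kickUser"; targetId: string };

interface Meeting {
  users: Map<string, User>;
  chat: string[];
  published: string[]; // stand-in for events pushed to subscribed clients
}

// Hypothetical permission table: roles allowed to perform each action.
const allowedRoles: Record<Action["type"], Role[]> = {
  sendChatMessage: ["viewer", "moderator"],
  kickUser: ["moderator"],
};

function handleAction(meeting: Meeting, userId: string, action: Action): boolean {
  const user = meeting.users.get(userId);
  // Step 1: access control; reject unknown users and disallowed actions.
  if (user === undefined || !allowedRoles[action.type].includes(user.role)) {
    return false;
  }
  // Step 2: execute the action, changing the meeting state.
  switch (action.type) {
    case "sendChatMessage":
      meeting.chat.push(`${user.id}: ${action.text}`);
      break;
    case "kickUser":
      meeting.users.delete(action.targetId);
      break;
  }
  // Step 3: publish an event describing the change.
  meeting.published.push(`${action.type} by ${user.id}`);
  return true;
}

// Usage: a viewer may chat but not kick; a moderator may kick.
const meeting: Meeting = {
  users: new Map<string, User>([
    ["alice", { id: "alice", role: "moderator" }],
    ["bob", { id: "bob", role: "viewer" }],
  ]),
  chat: [],
  published: [],
};
const bobChats = handleAction(meeting, "bob", { type: "sendChatMessage", text: "hi" });
const bobKicks = handleAction(meeting, "bob", { type: "kickUser", targetId: "alice" });
const aliceKicks = handleAction(meeting, "alice", { type: "kickUser", targetId: "bob" });
```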

The second task of the conferencing server is managing streaming sessions. For this purpose, it controls the WebRTC component, which may be an external media server or a library embedded in the conferencing system. The conferencing server participates in establishing streaming sessions by creating them in the WebRTC component and relaying the initial negotiation between the client and the WebRTC component. Because it mediates this initial communication, the conferencing server is responsible for access control. After the negotiation, the WebRTC component and the client establish a direct communication channel, which the conferencing server can no longer mediate. Therefore, if a user’s permissions are revoked, the conferencing server is responsible for closing their streaming sessions via the WebRTC component’s management interface.
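A minimal TypeScript sketch of this responsibility follows. `MediaServer`, `FakeMediaServer`, and `ConferencingServer` are hypothetical names, and the media server is a stand-in that only records open sessions. The sketch highlights two points from the text: the permission check happens while the server still mediates the negotiation, and revoking a permission must actively close sessions because later media traffic bypasses the conferencing server.

```typescript
// Hypothetical management interface of a WebRTC component (media server).
interface MediaServer {
  createSession(userId: string, sdpOffer: string): { sessionId: string; sdpAnswer: string };
  closeSession(sessionId: string): void;
}

// Stand-in media server that only records which sessions are open.
class FakeMediaServer implements MediaServer {
  open = new Set<string>();
  private counter = 0;

  createSession(userId: string, sdpOffer: string): { sessionId: string; sdpAnswer: string } {
    const sessionId = `${userId}-${++this.counter}`;
    this.open.add(sessionId);
    // A real media server would derive the answer from the offer.
    return { sessionId, sdpAnswer: `<answer to ${sdpOffer}>` };
  }

  closeSession(sessionId: string): void {
    this.open.delete(sessionId);
  }
}

class ConferencingServer {
  private media: MediaServer;
  private canStream = new Set<string>();
  private sessionsByUser = new Map<string, string[]>();

  constructor(media: MediaServer) {
    this.media = media;
  }

  grant(userId: string): void {
    this.canStream.add(userId);
  }

  // Access control happens while the server still mediates the negotiation.
  requestStream(userId: string, sdpOffer: string): string | null {
    if (!this.canStream.has(userId)) return null;
    const { sessionId, sdpAnswer } = this.media.createSession(userId, sdpOffer);
    this.sessionsByUser.set(userId, [...(this.sessionsByUser.get(userId) ?? []), sessionId]);
    return sdpAnswer;
  }

  // Media bypasses this server after negotiation, so revoking a permission
  // must also close existing sessions via the management interface.
  revoke(userId: string): void {
    this.canStream.delete(userId);
    for (const id of this.sessionsByUser.get(userId) ?? []) this.media.closeSession(id);
    this.sessionsByUser.delete(userId);
  }
}

// Usage: Bob may stream, Eve may not; revoking Bob closes his session.
const media = new FakeMediaServer();
const server = new ConferencingServer(media);
server.grant("bob");
const answer = server.requestStream("bob", "offer-1");
const denied = server.requestStream("eve", "offer-2");
const openBeforeRevoke = media.open.size;
server.revoke("bob");
```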

WebRTC Component. Finally, there is a WebRTC component that is loosely coupled to the conferencing server. The WebRTC component relies on commands from the conferencing server for management, and its first task is to establish streaming sessions. Its second task is to route media streams. The conferencing systems covered here use the SFU architecture for all video streams, so the WebRTC component redistributes media streams unmodified (see Sect. 2.2).

4.2 Implementation of BigBlueButton

In this section, we show how BigBlueButton’s components implement their tasks.

Web Client. BigBlueButton uses the frontend framework React (Footnote 1) for its UI. Since React does not provide communication between server and client, BigBlueButton’s server and client use the web framework Meteor.js for this purpose, which provides remote procedure call (RPC) and publish/subscribe capabilities. Where possible, Meteor.js communicates internally over a WebSocket. Using the publish/subscribe capabilities of Meteor.js, the client mirrors the meeting state of the server and receives state changes triggered by user actions. Therefore, the web client of BigBlueButton only needs to perform limited state management.

Server-Side Components. The server side of BigBlueButton is split into a conferencing server and two standalone servers for WebRTC.

Conferencing Server. The conferencing server of BigBlueButton is internally split into several individual components. It receives user actions on the WebSocket connection provided by Meteor.js and routes them within the conferencing server. At the destination, a handler performs access control checks and updates the meeting state. These updates to the meeting state are propagated internally. The conferencing server keeps a copy of the meeting state in a MongoDB database and uses the publish/subscribe mechanism of Meteor.js to pass change events to the clients, with access control in the publishing step.

For managing media streams, BigBlueButton interacts with its two media servers. The web client takes the initiative to open both its outgoing and incoming media streams. For the audio conference, BigBlueButton does not mediate signaling between the client and the WebRTC component; instead, it relies on the client’s knowledge of a five-digit voice conference number for access control. For video streams, the server performs a permission check when clients want to open a stream. When clients are removed from the meeting, the server component reacts to the event by closing their video streams.
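For a sense of scale, the following TypeScript arithmetic quantifies the space of five-digit voice conference numbers. It assumes that every digit string from 00000 to 99999 is a valid number and that numbers are chosen uniformly at random; an actual deployment may use a narrower range, which would shrink the space further. The snippet only illustrates the size of this secret and is not part of the analyzed code.

```typescript
// Assumption: every five-digit string 00000-99999 is valid and equally likely.
const DIGITS = 5;
const spaceSize = 10 ** DIGITS; // 100000 possible numbers
const entropyBits = Math.log2(spaceSize); // roughly 16.6 bits

// Expected number of guesses to find one specific number when trying
// distinct candidates in random order: (N + 1) / 2.
const expectedGuesses = (spaceSize + 1) / 2; // 50000.5

// Probability that a single random guess matches one of `active` numbers in use.
function hitProbability(active: number): number {
  return active / spaceSize;
}
```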

WebRTC Component. BigBlueButton 2.3.3, the version analyzed here, uses two media servers: FreeSWITCH and Kurento Media Server (Kurento; Footnote 2). The voice conference of meetings is handled by FreeSWITCH, with clients connecting directly to FreeSWITCH to perform media negotiation. The video streams are handled by Kurento, with the conferencing server mediating media negotiation. BigBlueButton also uses Kurento to relay the voice conference to participants who only listen. The conferencing server communicates with both media servers for access control and necessary configuration, for example, media routing.

Extensions to the Shared Architecture. BigBlueButton does not provide user management; instead, it relies on external software that integrates BigBlueButton’s meeting functionality, for example, Greenlight (Footnote 3) or Moodle (Footnote 4). The 3rd-party application performs authentication and access control for joining meetings. For this purpose, BigBlueButton provides a custom HTTP management API, access to which is controlled by a secret shared between BigBlueButton and the 3rd-party server applications.
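The shared-secret mechanism can be illustrated with a TypeScript sketch that signs an API call. It follows the checksum scheme described in the BigBlueButton API documentation, where the checksum is the SHA-1 hash of the concatenated API call name, query string, and shared secret; later releases may also accept other SHA variants. The server URL, meeting parameters, and secret are hypothetical placeholders, and the sketch uses Node.js’s `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Builds a signed API URL using checksum = SHA-1(callName + queryString + sharedSecret).
function signedApiUrl(
  apiBase: string,
  callName: string,
  params: Record<string, string>,
  sharedSecret: string,
): string {
  // The checksum must be computed over exactly the query string that is sent.
  const query = new URLSearchParams(params).toString();
  const checksum = createHash("sha1").update(callName + query + sharedSecret).digest("hex");
  const separator = query.length > 0 ? "&" : "";
  return `${apiBase}/${callName}?${query}${separator}checksum=${checksum}`;
}

// Usage with hypothetical values: a 3rd-party application creating a meeting.
const url = signedApiUrl(
  "https://bbb.example.org/bigbluebutton/api",
  "create",
  { name: "Test Meeting", meetingID: "abc123" },
  "hypothetical-shared-secret",
);
```

Because only holders of the secret can compute a valid checksum, the 3rd-party application, not the end user, decides who may create and join meetings.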

For processing uploaded presentation slides, BigBlueButton uses several external programs, depending on the file type, and serves the resulting files to other clients from disk. BigBlueButton allows users to record meetings. If a meeting is recorded, BigBlueButton stores the audio and video recordings and the internal messages of the entire meeting as files. BigBlueButton embeds an instance of the collaborative text editor Etherpad (Footnote 5) to implement its shared notes.

The media negotiation between the client and FreeSWITCH extends our general model as it is not mediated by the conferencing server. This allows server operators to connect FreeSWITCH to an external telephony provider via Session Initiation Protocol (SIP), allowing users to join the conference by telephone.

4.3 Implementation of eduMEET

Web Client. For its web client, eduMEET uses the frontend framework React. For maintaining the meeting state on the client, eduMEET uses Redux (Footnote 6), a JavaScript library for state management in web applications. Redux keeps the application state in a store, which is updated when user actions or events are dispatched. User-triggered actions can either modify the meeting room (e. g., locking the room) or change the user settings. The client may pass these user actions to the server over a WebSocket. To establish and manage a WebRTC streaming session, the client uses the mediasoup client library.

Server-Side Components. The server side of eduMEET is split into a conferencing server, which consists of an Express (Footnote 7) web server with WebSocket support, and the Node.js library mediasoup for WebRTC.

Conferencing Server. The conferencing server handles all incoming connections and user authorization. Because of the WebSocket support in the Express web server, WebSocket handlers and HTTP request handlers share access to a session object for all requests from the same client. Each WebSocket is attached to a peer object representing a user. The peer object contains relevant information such as user roles, a unique peer ID, a room ID, and a socket. Any modification to a peer object is done via the peer ID, which references the peer object in a dictionary.
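The peer dictionary can be sketched in TypeScript as follows. The `Peer` fields mirror the information listed above, but the field names, the single global `Map`, and the `findPeerInRoom` helper are hypothetical. Since a peer ID referenced in a request can originate from any client, the helper also checks the room ID before returning a target; a per-room dictionary would make this check implicit.

```typescript
// Hypothetical, simplified peer object; field names are illustrative.
interface Peer {
  id: string;
  roomId: string;
  roles: string[];
  socketId: string; // stand-in for the attached WebSocket
}

// Single global dictionary from peer ID to peer object (for brevity).
const peers = new Map<string, Peer>();

function addPeer(peer: Peer): void {
  peers.set(peer.id, peer);
}

// Resolves a peer ID taken from a request, but only within the requester's room.
function findPeerInRoom(requester: Peer, targetId: string): Peer | undefined {
  const target = peers.get(targetId);
  if (target === undefined || target.roomId !== requester.roomId) return undefined;
  return target;
}

// Usage: p1 and p2 share room r1, p3 is in room r2.
const alice: Peer = { id: "p1", roomId: "r1", roles: ["moderator"], socketId: "s1" };
addPeer(alice);
addPeer({ id: "p2", roomId: "r1", roles: ["normal"], socketId: "s2" });
addPeer({ id: "p3", roomId: "r2", roles: ["normal"], socketId: "s3" });
```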

WebRTC Component. For media handling on the server, eduMEET uses the Node.js library mediasoup, a layer of JavaScript that communicates with a set of C/C++ subprocesses. The internal architecture of mediasoup has its own terminology, which contains workers, routers, transports, producers, and consumers [11]. When a new user joins, the client and the conferencing server create a producer instance. The conferencing server then notifies the other peers and creates a consumer instance for each. The notified peers create local consumer instances for themselves.
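The producer/consumer bookkeeping can be illustrated with a simplified TypeScript model. It does not use the mediasoup API; `RoomModel` and its methods are hypothetical and only record which peer receives which stream. Each produced stream leads to one consumer per other peer in the room, so if n peers have joined and each produces one stream, the server ends up with n(n-1) consumers.

```typescript
// Simplified bookkeeping model inspired by mediasoup's terminology;
// it is not the mediasoup API.
interface Producer {
  id: string;
  peerId: string;
}

interface Consumer {
  producerId: string;
  peerId: string; // the peer that receives the stream
}

class RoomModel {
  peers: string[] = [];
  producers: Producer[] = [];
  consumers: Consumer[] = [];
  private nextId = 0;

  join(peerId: string): void {
    this.peers.push(peerId);
  }

  // When a peer produces a stream, create one consumer for every other peer.
  produce(peerId: string): Producer {
    const producer: Producer = { id: `p${this.nextId++}`, peerId };
    this.producers.push(producer);
    for (const other of this.peers) {
      if (other !== peerId) this.consumers.push({ producerId: producer.id, peerId: other });
    }
    return producer;
  }
}

// Usage: three peers join, then each produces one stream.
const room = new RoomModel();
for (const id of ["a", "b", "c"]) room.join(id);
for (const id of ["a", "b", "c"]) room.produce(id);
```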

Extensions to the Shared Architecture. The first additional component in eduMEET is a torrent tracker for its file sharing feature (see Sect. 5.1). It keeps track of users participating in uploads and downloads, helping them connect to each other. In the web client, eduMEET uses the WebTorrent (Footnote 8) library, which uses WebRTC for peer-to-peer communication. Furthermore, eduMEET uses the Passport (Footnote 9) module for external authentication strategies. Depending on the authentication strategy, additional components may be involved, for example, an Identity Provider for OpenID Connect (OIDC) [29].

5 Features and User Roles (RQ2)

In this section, we first compare features the analyzed conferencing systems offer. We then present user roles shared by both analyzed conferencing systems. Finally, we go into detail on how each of the analyzed video conferencing systems handles user roles, permissions, and the mapping to features.

5.1 Comparison of Features

Table 1 shows an overview of the features of both systems. While there is some overlap, there are also several features specific to BigBlueButton or eduMEET. Features specific to BigBlueButton are, for example, polls or shared notes. On the other hand, eduMEET offers file sharing, which is not implemented in BigBlueButton.

Some features require additional libraries or application logic. Other features require extending the conferencing system with new components, which are either controlled by the server operator or an external entity. Components can be additional servers or important libraries that play a vital role in the video conferencing system (e. g., mediasoup in eduMEET). A component controlled by the server operator is, for example, a WebRTC media server. A torrent tracking server for file sharing would be an example of a component that is controlled by an external entity (see Sect. 4.3). Importantly, additional features and components introduce a new level of complexity and a broader attack surface.

Table 1. Conferencing features supported by BigBlueButton and eduMEET, with their required roles. Several features are present both in BigBlueButton and eduMEET, while others are only supported by one. The table lists which role a user needs to actively use a feature, where the role “everyone” includes users without access to the meeting. Note that some features are accessible by multiple user groups.

5.2 User Roles

Common User Roles. BigBlueButton and eduMEET use user roles combined with permissions for their access control; users who participate in meetings have different roles, which give them permission to access or use certain features. Such permissions allow users to share their audio or video, or give users access to moderation features.

The analyzed conferencing systems have two main user roles in common: “viewer” and “moderator”. The viewer role, also referred to as “normal” in eduMEET, gives users basic permissions and allows them, for example, to send and receive media streams. The moderator role allows for managing the meeting room, the users, and access to other features. Furthermore, depending on the features of the respective conferencing system, we can differentiate between users in a waiting room (or lobby) and users in a meeting. Often, such restrictions are not implemented by creating new user roles but by using properties or flags that are part of the user objects. Thus, two viewers might have different permissions or access to different features. For example, a viewer in a waiting room cannot receive audio and video streams from other users, while viewers who are already in the meeting do not have this restriction.
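The combination of roles and flags can be sketched in TypeScript. The `Participant` type and its `inWaitingRoom` flag are hypothetical simplifications of what the analyzed systems store in their user objects; the example shows two viewers whose permissions differ even though their role is the same.

```typescript
type UserRole = "viewer" | "moderator";

// Hypothetical user object: the role is combined with a state flag.
interface Participant {
  name: string;
  role: UserRole;
  inWaitingRoom: boolean;
}

// Receiving streams depends on the flag, not on the role alone.
function mayReceiveStreams(p: Participant): boolean {
  return !p.inWaitingRoom;
}

// Usage: same role, different permissions.
const waiting: Participant = { name: "carol", role: "viewer", inWaitingRoom: true };
const admitted: Participant = { name: "dave", role: "viewer", inWaitingRoom: false };
```

An access control check that inspects only `role` would treat both participants identically, which is why the flag has to be checked alongside it.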

Because user roles and the associated permissions are heavily influenced by features and the current meeting state, access control is a complex topic. The following sections explain the details of each analyzed conferencing system. Table 1 gives an overview of the requirements to access individual features. Some features have additional requirements besides the user role; for example, regular users in BigBlueButton may only draw on the whiteboard if the presenter has given them permission.

BigBlueButton-Specific User Roles. In addition to the viewer and moderator role, every meeting has at most one presenter, who gets permissions related to a presentation area in the meeting. These additional permissions are limited to the presenter; other users, including moderators, cannot affect the presentation area.

Permissions may depend on context. Within breakout rooms, there is no distinction between moderators and viewers, and all users can interact with the meeting as viewers. BigBlueButton has a guest waiting room, allowing moderators to limit access to the meeting until they approve new users. Users calling in via telephone do not have access to the web interface of BigBlueButton and thus only have access to a very limited set of user actions.

BigBlueButton also allows moderators to “lock” viewers and presenters, to take away specific permissions. One may use this to aid in the moderation of large meetings or for specific use cases, like online exams, where participants should not see each other.

eduMEET-Specific User Roles. In eduMEET, one can use a configuration file to define new roles and to assign specific permissions. This also permits changing existing assignments of roles and permissions. The default configuration contains the roles “normal” (here referred to as “viewer”), “moderator”, and “admin”. A user can have multiple roles.

A moderator can kick users, disable audio, video, and screen sharing for users (which the users can re-enable), take down raised hands, clear the chat, and end the meeting. Furthermore, a moderator can use the role manager to give and remove roles during a meeting. Each role has a “promotable” flag, which determines whether moderators can give and take the role. In addition, each role has a configurable level: to modify a role of a target user, the moderator’s level must be at least equal to that role’s level. The admin role, which has the highest level, allows users to enter a full or locked room; entering such a room normally sends users to the room’s lobby to wait for approval. As long as no moderator is in the meeting, viewers can also lock and unlock the room. This permission is revoked as soon as a user with the moderator role joins.
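The role-management rules described above can be expressed as a small TypeScript sketch. The role IDs follow the default configuration, but the numeric levels, the promotable flags, and the `manageRoles` field are hypothetical values chosen for illustration, and the level check reflects our reading that the moderator’s level must be at least the level of the role being given or taken.

```typescript
// Hypothetical role definitions; IDs follow eduMEET's default roles,
// but levels, flags, and manageRoles are illustrative values.
interface RoleDef {
  id: string;
  level: number;
  promotable: boolean;
  manageRoles: boolean; // may use the role manager
}

const VIEWER: RoleDef = { id: "normal", level: 10, promotable: false, manageRoles: false };
const MODERATOR: RoleDef = { id: "moderator", level: 40, promotable: true, manageRoles: true };
const ADMIN: RoleDef = { id: "admin", level: 50, promotable: false, manageRoles: true };

// A user may give or take `role` if they may use the role manager, the role
// is promotable, and one of their roles has at least the role's level.
function mayChangeRole(actorRoles: RoleDef[], role: RoleDef): boolean {
  return (
    actorRoles.some((r) => r.manageRoles) &&
    role.promotable &&
    actorRoles.some((r) => r.level >= role.level)
  );
}

// Viewers may lock or unlock the room only while no moderator is present.
function mayToggleLock(actorRoles: RoleDef[], moderatorPresent: boolean): boolean {
  return actorRoles.some((r) => r.id === MODERATOR.id) || !moderatorPresent;
}
```

Since a user can hold multiple roles, both checks take a list of roles rather than a single one.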

6 Attacker Model

After performing the primary analysis described in Sect. 3, we developed an attacker model that fits the setting of a video conferencing system.

We assume an attacker may send arbitrary network requests. They do not have access to any private information regarding the server or the users, and they cannot read or interfere with the network traffic of other users. The server operator is assumed to be entirely trustworthy. We impose no conditions on the setting of a meeting because conferencing systems are used to host various types of events. The attacker may be a viewer, presenter, or even moderator in a meeting. The attacker may also be a non-participant without any role or be in the waiting room. The attacker may create their own meetings. Since users’ roles may change during a meeting, we also consider cases where a moderator revokes an attacker’s permissions.

We consider an attack successful if the attacker breaks any of the aspects of the CIA triad. The attacker breaks confidentiality guarantees if, for example, they join a locked meeting and retrieve sensitive streams or public chat content. The integrity of the meeting state is broken when an attacker oversteps the permissions of their role, by performing any action that modifies the meeting state in a way that they are not allowed to. This includes an attacker joining a meeting without permission. For availability, we consider an attack successful if the attacker performs DoS against any feature in any meeting, affecting any user other than the attacker themselves. We exclude DoS by resource exhaustion and only consider cases of DoS in the application logic, for example, an attacker blocking seats in a meeting.

We limit the scope of our analysis to the first-party code of BigBlueButton and eduMEET, respectively. External components and libraries are out of scope, and we deem them safe for the purpose of our evaluation; that is, we expect them to conform to their documentation when used with the configuration files distributed with the conferencing systems. Our analysis targets the server-side code because it implements the main application logic. For eduMEET, however, all clients take an active role in maintaining the meeting state, so we consider both the server and client side of eduMEET. BigBlueButton relies on the external framework Meteor.js to maintain the client state, which is out of the scope of this analysis. Since the server operator is fully trustworthy, we assume that any additional configuration by the server operator is secure (Footnote 10).

7 Evaluation (RQ3)

We performed the evaluation on BigBlueButton 2.3.3 and eduMEET 3.5.0-beta.1, the most recent versions at the time of analysis (Footnote 11). Because our attacker model is relatively broad, we identified not only high-impact vulnerabilities but also several minor vulnerabilities without significant impact on the meeting. To avoid overstating their impact, we explicitly classify such vulnerabilities as “bugs”. Hereafter, we refer to vulnerabilities and bugs collectively as “findings”.

Table 2. Summary of all findings in BigBlueButton (BXX) and in eduMEET (EXX). The final two columns denote which role a legitimate user needs to access the feature, and the role an attacker needs to perform the attack. The role “everyone” includes users without access to the meeting.

Our evaluation resulted in 45 findings in BigBlueButton (38 vulnerabilities and 7 bugs) and 12 findings in eduMEET (all 12 vulnerabilities). Table 2 gives a short description of each finding, states which property of the CIA triad it violates (see Sect. 6), and assigns it to the features it affects (see Table 1). A finding may affect multiple features because some features share parts of their implementation, for example, the voice and video conference. If a finding is not related to any specific feature but to the core implementation for meeting state and communication of the respective conferencing system, we assign it to the core implementation. The last two columns of Table 2 show the user roles needed to perform the actions associated with the findings. Some of these actions are available as features to specific user roles, shown in the column legitimate roles, while others are not intended to be accessible at all.

As can be seen in Table 2, most of the findings in BigBlueButton are in the core implementation and in the video conferencing feature. The rest of the findings are distributed across the other features. In eduMEET, most of the findings are also in the core implementation and in the text chat feature.

7.1 BigBlueButton

In this section, we present five representative findings out of the 45 findings in BigBlueButton.

B1: Read Other Meetings’ Public Chat. This finding allows an attacker to access sensitive data from other meetings hosted on the same server.

To transfer chat messages from the conferencing server to the web client, BigBlueButton uses the publish/subscribe mechanism of Meteor.js (see Sect. 4.2). In particular, the client subscribes to a publisher called group-chat-msg, which publishes both the public chat messages of the client's meeting and the messages of the client's private chats. The client establishes the subscription with a WebSocket message to the server. Listing 1.1 shows how the server restricts its responses in the publisher. In this query, the server inserts the meetingId of the meeting. The first branch only matches messages whose chatId value is "MAIN-PUBLIC-GROUP-CHAT", that is, the public messages of the client's own meeting. The second branch matches all messages with a chat ID contained in the chatsIds array, a parameter sent by the client. Because chatsIds is not validated, the server can leak the public chats of every meeting hosted on the server.

[Listing not preserved in this version]

For the attack, we assume an attacker who participates in any meeting hosted on a BigBlueButton server and thus has access to the public chat of their own meeting. Using a modified client or the browser developer tools, the attacker can modify the parameters their client sends to the server for the subscription. If the attacker adds "MAIN-PUBLIC-GROUP-CHAT" to the chatsIds list, which is intended for private chats (see Listing 1.2), their client's subscription applies to the public chat of every meeting on the server, and the publisher delivers the public chat messages of all meetings. The attacker thus gains access to every public chat message of every meeting hosted on that server.
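To make the two-branch filter and the B1 attack concrete, the following is a simplified, hypothetical model in Python rather than BigBlueButton's actual Meteor.js code. The field names meetingId, chatId, and chatsIds follow the text; the message data and function names are invented. It assumes, as the attack implies, that the second branch is not scoped to the caller's meeting, and it ignores the restriction of private chats to their participants.

```python
# Hypothetical model of the group-chat-msg publisher filter (B1).
# Field names follow the text; data and function names are invented.
PUBLIC_CHAT_ID = "MAIN-PUBLIC-GROUP-CHAT"

MESSAGES = [
    {"meetingId": "m1", "chatId": PUBLIC_CHAT_ID, "text": "hello from m1"},
    {"meetingId": "m2", "chatId": PUBLIC_CHAT_ID, "text": "secret from m2"},
    {"meetingId": "m1", "chatId": "private-42", "text": "dm in m1"},
]


def publish_vulnerable(meeting_id, chats_ids):
    """Filter of the assumed shape: (own meeting AND public) OR chatId in chatsIds.

    Assumption: the second branch is not restricted to meeting_id.
    """
    return [
        msg for msg in MESSAGES
        if (msg["meetingId"] == meeting_id and msg["chatId"] == PUBLIC_CHAT_ID)
        or msg["chatId"] in chats_ids
    ]


def publish_fixed(meeting_id, chats_ids):
    """Validate chatsIds, drop the public chat ID, and scope both branches."""
    if not isinstance(chats_ids, list) or not all(isinstance(c, str) for c in chats_ids):
        raise ValueError("chatsIds must be a list of strings")
    private_ids = {c for c in chats_ids if c != PUBLIC_CHAT_ID}
    return [
        msg for msg in MESSAGES
        if msg["meetingId"] == meeting_id
        and (msg["chatId"] == PUBLIC_CHAT_ID or msg["chatId"] in private_ids)
    ]
```

Calling `publish_vulnerable("m1", ["private-42", PUBLIC_CHAT_ID])` also returns the public message of meeting m2, whereas `publish_fixed` with the same arguments returns only the two messages of m1. Scoping every branch to the server-side meetingId, rather than trusting the client-supplied list, is the key design choice.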

B2: Read Arbitrary Private Chats. This finding builds on B1 and increases its impact. The publisher group-chat-msg is also vulnerable to NoSQL injection. The parameter chatsIds can contain arbitrary values supported by EJSON, an extension of JSON used by Meteor.js, and the server does not check the value’s type. As in the previous attack, the attacker modifies the parameters their client sends to the server. In particular, the attacker can set the publisher’s parameter chatsIds to , causing it to provide all messages from all public and private chats in all meetings on the server (Footnote 12).
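The missing type check can be illustrated with a small validator. This is a sketch in Python, not BigBlueButton's fix, and the function name is hypothetical. In Meteor itself, the same rule would typically be expressed as `check(chatsIds, [String])`. The payload used in the paper is not reproduced here; the rejected inputs below are generic examples of non-string values.

```python
def validate_chats_ids(value):
    """Accept only a list of plain, non-empty strings; reject everything else.

    Objects such as dicts with "$"-prefixed keys are rejected here, so they
    never reach the database query, where MongoDB may interpret such keys
    as query operators depending on where the value is placed.
    """
    if not isinstance(value, list):
        raise TypeError("chatsIds must be a list")
    for item in value:
        if not isinstance(item, str) or not item:
            raise TypeError("every chat ID must be a non-empty string")
    return value
```

Validating the type at the publisher boundary is cheap and closes the whole class of operator-injection inputs, independent of how the query is later built.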

B4: Retain Full Access to Shared Notes After Leaving. This finding affects shared notes. BigBlueButton relies on the external server component Etherpad for shared notes, so the conferencing server must enforce access control itself, including revoking access when a user loses access to the meeting. For this purpose, the conferencing server checks HTTP requests to Etherpad: it rejects all users without a BigBlueButton session and verifies that the requested Etherpad pad belongs to the user's meeting.

However, this check is insufficient because Etherpad uses a long-running WebSocket connection between server and client. When a user leaves or is kicked from a meeting, the conferencing server cannot close the user's WebSocket to Etherpad, so an attacker can continue reading and editing the shared notes. In addition, the session used for the server-side check stays valid after the user leaves the meeting, so the server also accepts new WebSocket connections to Etherpad.
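The gap between a connect-time check and revocation can be modeled with a hypothetical gatekeeper class; it is not BigBlueButton or Etherpad code. With `revoke_on_leave=False`, it reproduces the two problems described above: open connections survive leaving, and the still-valid session allows new connections.

```python
import itertools


class PadGateway:
    """Hypothetical gatekeeper between meeting sessions and a notes server.

    revoke_on_leave=False models B4: access is checked only when a
    connection opens, and leaving neither invalidates the session nor
    closes open connections.
    """

    def __init__(self, revoke_on_leave):
        self.revoke_on_leave = revoke_on_leave
        self.sessions = {}   # session_id -> meeting_id
        self.sockets = {}    # session_id -> set of open socket ids
        self._ids = itertools.count(1)

    def join(self, session_id, meeting_id):
        self.sessions[session_id] = meeting_id

    def connect(self, session_id, pad_meeting_id):
        # The access check runs once, when the long-lived connection opens.
        if self.sessions.get(session_id) != pad_meeting_id:
            raise PermissionError("no valid session for this pad")
        socket_id = next(self._ids)
        self.sockets.setdefault(session_id, set()).add(socket_id)
        return socket_id

    def leave(self, session_id):
        if self.revoke_on_leave:
            self.sessions.pop(session_id, None)   # session no longer valid
            self.sockets.pop(session_id, None)    # close every open connection

    def can_edit(self, session_id, socket_id):
        return socket_id in self.sockets.get(session_id, set())
```

A real fix needs a channel through which the conferencing server can terminate Etherpad connections or invalidate Etherpad sessions; the model only shows that both revocation steps are required.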

B26: View Unshown Presentation Slides in Current Meeting. To download the slides of an uploaded presentation, a BigBlueButton client only needs to know the presentation's ID. However, the server leaks these IDs.

The server sends presentation IDs to clients so that they can display the slides, but due to incomplete filtering it reveals the IDs of all presentations in the meeting. This allows an attacker in the meeting to view all uploaded slides, including upcoming slides of the current presentation and slides of presentations that were uploaded but never shown to the viewers.
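Because possession of a presentation ID acts as the only credential, even perfect ID filtering would still expose upcoming slides of the current presentation. The hypothetical Python sketch below contrasts ID-based access with a per-page server-side check; the data model and function names are assumptions, not BigBlueButton's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Presentation:
    pres_id: str
    num_pages: int
    shown_pages: set = field(default_factory=set)  # pages already shown to viewers


def can_download_vulnerable(presentations, pres_id, page):
    """Knowing the presentation ID is enough to fetch any page."""
    return any(p.pres_id == pres_id and 1 <= page <= p.num_pages
               for p in presentations)


def can_download_checked(presentations, pres_id, page):
    """Serve only pages that the presenter has already shown."""
    return any(p.pres_id == pres_id and page in p.shown_pages
               for p in presentations)
```

With `Presentation("p1", 10, {1, 2, 3})` and a never-shown `Presentation("p2", 5)`, the vulnerable check allows page 7 of p1 and page 1 of p2, while the checked variant allows only pages 1 to 3 of p1.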

B34: Receive Audio and Screen Share After Leaving. The final finding described here affects the voice and video conference and the screen share feature of BigBlueButton. It allows the attacker to listen to audio and watch screen shares secretly, even after they leave the meeting.

We assume the attacker is a viewer in a meeting who later leaves or is kicked. While still in the meeting, the attacker uses a modified client that sends additional requests to open multiple viewing sessions for each media stream. For the screen share and the listen-only audio, the conferencing server closes only one of these sessions when the attacker leaves the meeting. The remaining sessions stay valid in the WebRTC component and are closed only when the screen share or the meeting, respectively, ends.
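One hypothetical way such a leak can arise is bookkeeping that remembers only one session per user. The Python sketch below models that pattern; the actual cause in BigBlueButton may differ, and all names are invented.

```python
class MediaSessionRegistry:
    """Hypothetical bookkeeping of viewing sessions per user.

    track_all=False models the flaw in B34: only the latest session per
    user is remembered, so cleanup on leave misses the earlier ones.
    """

    def __init__(self, track_all):
        self.track_all = track_all
        self.by_user = {}    # user -> list of session ids to close on leave
        self.open = set()    # sessions currently alive in the media server

    def open_session(self, user, session_id):
        self.open.add(session_id)
        if self.track_all:
            self.by_user.setdefault(user, []).append(session_id)
        else:
            self.by_user[user] = [session_id]   # earlier sessions are forgotten

    def on_leave(self, user):
        for session_id in self.by_user.pop(user, []):
            self.open.discard(session_id)
```

An alternative design is to reject a second viewing session for the same user and stream, which removes the need to track more than one.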

7.2 eduMEET

We explain three representative findings in eduMEET. In Sect. A.1, we cover an additional interesting yet more complex finding in depth (E3).

E1: Forge Malicious Chat Objects. This finding points to one of the root causes of several findings in eduMEET and enables a multitude of attacks. A client sends a chat message to the server as a chat message object over its WebSocket connection. The server forwards this object to all other participants in the same room as long as the sender has the SEND_CHAT permission. An attacker can manipulate fields in the messages they send; we describe three possible attacks. First, the attacker can manipulate the name field, which is used to display the name of the sender, and thereby impersonate other users. Second, the attacker can manipulate the time field, which allows them to rearrange the chat conversation and send messages dated in the past or future. Third, the attacker can set the name field to null or another invalid object. This causes a DoS against the receiving clients: the client does not expect other data types and does not handle the resulting errors, so the application crashes. Interestingly, when users affected by such a DoS attack try to rejoin the meeting, they are usually redirected to the index page instead of joining the meeting. This happens because joining users receive the chat and file history, which automatically replays the attack. The attacker can stay in the room to prevent it from resetting, effectively blocking the room indefinitely.
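A common countermeasure is for the server to rebuild chat objects from its own state instead of forwarding them. The sketch below is hypothetical Python, not eduMEET code: the name and time fields come from the text, while the text field, the millisecond timestamp, and the function names are assumptions. A client-side guard against unexpected types is included as a second line of defense against the crash.

```python
import time


def sanitize_chat(raw, sender_name, now_ms=None):
    """Rebuild a chat object on the server instead of forwarding the raw one.

    Only "text" (assumed field name) is taken from the client; "name" and
    "time" come from server-side state, so they cannot be forged.
    """
    if not isinstance(raw, dict):
        raise TypeError("chat message must be an object")
    text = raw.get("text")
    if not isinstance(text, str):
        raise TypeError("chat text must be a string")
    return {
        "text": text,
        "name": sender_name,   # display name the server already knows
        "time": now_ms if now_ms is not None else int(time.time() * 1000),
    }


def display_name(msg):
    """Client-side guard: never crash on an unexpected name type."""
    name = msg.get("name")
    return name if isinstance(name, str) else "Unknown"
```

Because the stored history would then contain only sanitized objects, replaying it to joining users could no longer repeat the DoS.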

E2: Rejoin After Kick, Bypassing Locked Room. This finding allows an attacker to bypass the room lock, which can serve as a security mechanism to prevent other users from joining the room without approval. In this attack scenario, a moderator kicks the attacker from the room and then locks it, so that users cannot join without approval. The attacker cannot simply rejoin the meeting, because the client generates a new peer ID and the server prevents new users from joining a locked room. However, the attacker can set their client’s peer ID to any value, for example, by overwriting the client-generated value with the browser developer tools. If the attacker sets their peer ID to the one they used while in the meeting, the server treats them as a returning user, which allows them to bypass a locked or even a full room and rejoin after being kicked.
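The join logic can be sketched as follows. This is a hypothetical model, not eduMEET's server code. It assumes the server keeps a set of known peer IDs that it treats as returning users, and it shows how forgetting the peer ID on kick closes this particular path.

```python
class Room:
    """Hypothetical join logic; returning peers skip the lock check (E2)."""

    def __init__(self, forget_on_kick):
        self.forget_on_kick = forget_on_kick
        self.locked = False
        self.active = set()
        self.returning = set()   # peer IDs treated as returning users

    def join(self, peer_id):
        if peer_id in self.returning:
            self.active.add(peer_id)
            return "joined"          # lock is bypassed for returning users
        if self.locked:
            return "lobby"           # new users wait for approval
        self.active.add(peer_id)
        self.returning.add(peer_id)
        return "joined"

    def kick(self, peer_id):
        self.active.discard(peer_id)
        if self.forget_on_kick:
            self.returning.discard(peer_id)
```

Since the peer ID is chosen by the client, a more robust design would tie returning-user status to a secret issued by the server rather than to the peer ID alone.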

E10: Prevent Getting Muted. This finding allows an attacker to disrupt a meeting without others being able to mute them. Moderators can mute participants for everyone (global mute); since the affected user can unmute themselves, this is not a security mechanism. Participants can also mute another participant for themselves only (local mute). However, an attacker can avoid being muted by requesting a second microphone producer and muting their first one. Neither global nor local mutes affect the attacker’s second microphone producer.
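The effect can be modeled by a mute operation that targets only one microphone producer. The Python sketch below is hypothetical: it assumes a mute that hits the first producer and shows two possible countermeasures, muting every producer of the peer and limiting each peer to one microphone producer.

```python
import itertools


class Peer:
    """Hypothetical server-side view of one participant's microphone producers."""

    def __init__(self, max_mics=None):
        self.max_mics = max_mics          # None = unlimited (models E10)
        self.mics = {}                    # producer id -> paused?
        self._ids = itertools.count(1)

    def create_mic(self):
        if self.max_mics is not None and len(self.mics) >= self.max_mics:
            raise PermissionError("microphone producer limit reached")
        producer_id = next(self._ids)
        self.mics[producer_id] = False
        return producer_id

    def audible(self):
        return [pid for pid, paused in self.mics.items() if not paused]


def mute_first_mic(peer):
    """Assumed flawed mute: only the first microphone producer is paused."""
    if peer.mics:
        peer.mics[min(peer.mics)] = True


def mute_all_mics(peer):
    """Countermeasure: pause every microphone producer of the peer."""
    for pid in peer.mics:
        peer.mics[pid] = True
```

If the attacker mutes their first producer and then a moderator applies `mute_first_mic`, the second producer stays audible; `mute_all_mics` or `Peer(max_mics=1)` prevents this.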

7.3 Responsible Disclosure

We reproduced all findings on unmodified instances of BigBlueButton 2.3.3 and eduMEET 3.5.0-beta.1. We worked in local environments to avoid affecting real video conferencing deployments and their users. We reported the findings to the developers of BigBlueButton and eduMEET between July 2021 and May 2022. The developers of eduMEET thanked us for the findings but had not released fixed versions as of December 2023. The developers of BigBlueButton acknowledged the findings and started publishing fixes with BigBlueButton 2.3.9. As of December 2023, they have fully addressed 37 of the 45 findings and assigned CVEs to 14 of them (see Table 3). The remaining findings are still to be fixed.

8 Discussion

We discuss our findings from Sect. 7 by considering the potential root causes in the respective conferencing system. For this, we identify commonalities between the findings. Finally, we discuss the limitations of our evaluation.

8.1 BigBlueButton

BigBlueButton offers many features, making it the more complex of the two conferencing systems. Because of this breadth, its attack surface is naturally larger than that of eduMEET. In addition, the interplay between features makes a correct implementation more difficult. We observed two major types of root causes for our findings in BigBlueButton, both related to the complexity of the software.

Several vulnerabilities result from subtle logic bugs in the internal server logic. This is evident when an attacker opens multiple media streams (B35), but also in several other findings, such as B1, B3, B10, and B18. These bugs can, to some extent, be traced back to the internal logical complexity of BigBlueButton, which results from its large feature set and its evolution over time.

For several other vulnerabilities, the common factor is incomplete or missing security considerations in the design. For example, in B33, an attacker can join voice conferences without legitimate access because access control relies on a 5-digit voice conference ID. When users leave, the server cannot revoke their access via this ID, as it is identical for all participants. A mitigation is in place, but it is not sufficient to prevent attacks. There are also more subtle cases, for example, B27, where guessable secrets allow an attacker to gain read access to uploaded slides.
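The design difference behind B33 and B27 can be made concrete. A shared 5-digit decimal ID has at most 10^5 = 100,000 possible values and cannot be withdrawn from a single participant. Per-participant random tokens avoid both problems. The Python sketch below is a hypothetical illustration using the standard `secrets` module, not BigBlueButton's mechanism; the shared ID value is invented.

```python
import secrets

SHARED_VOICE_ID = "12345"   # one 5-digit ID for everyone (hypothetical value)


def may_join_shared(supplied_id):
    # Anyone who ever learned the ID keeps access until it changes for all.
    return supplied_id == SHARED_VOICE_ID


class VoiceAccess:
    """Per-participant, unguessable, individually revocable join tokens."""

    def __init__(self):
        self.tokens = {}   # token -> participant

    def admit(self, participant):
        token = secrets.token_urlsafe(16)   # 16 random bytes (128 bits)
        self.tokens[token] = participant
        return token

    def revoke(self, participant):
        self.tokens = {t: p for t, p in self.tokens.items() if p != participant}

    def may_join(self, token):
        return token in self.tokens
```

Revoking one participant's token leaves all other participants unaffected, which is exactly what a shared ID cannot offer.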

8.2 eduMEET

The root causes of most of our findings for eduMEET are of a different nature. Often, the server trusts the client and forwards its messages without properly checking the input, for example, in E1, E4, and E6.

While the technical details of the other findings differ, they may stem from a similar root cause. For example, E2, which results from an implementation error, can also be seen as a missing feature, because the moderator cannot effectively ban the attacker from the meeting. The same applies to E10, where the moderator cannot force an attacker to stop sharing audio. Here, a more fine-grained permission system, like the “lock settings” feature in BigBlueButton, would help. Such a feature could allow the moderator to withdraw permissions from viewers, for example, the permission to share audio.
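A fine-grained permission system of this kind might look like the following hypothetical Python sketch. Only SEND_CHAT is taken from the text; the other permission names, roles, and methods are invented for illustration. The server checks the permission for every new producer and closes existing producers when a moderator withdraws the permission. In this simplified model, producers are tracked per media kind only.

```python
# Hypothetical permission names; only SEND_CHAT is taken from the text.
KIND_TO_PERMISSION = {"audio": "SHARE_AUDIO", "video": "SHARE_VIDEO"}


class LockableRoom:
    def __init__(self):
        self.role_permissions = {
            "moderator": {"SEND_CHAT", "SHARE_AUDIO", "SHARE_VIDEO", "MODERATE"},
            "viewer": {"SEND_CHAT", "SHARE_AUDIO", "SHARE_VIDEO"},
        }
        self.roles = {}       # peer -> role
        self.producers = {}   # peer -> set of active media kinds

    def add_peer(self, peer, role):
        self.roles[peer] = role
        self.producers[peer] = set()

    def has(self, peer, permission):
        return permission in self.role_permissions[self.roles[peer]]

    def produce(self, peer, kind):
        # Checked on the server for every new producer, not just the first.
        if not self.has(peer, KIND_TO_PERMISSION[kind]):
            raise PermissionError(f"{peer} may not share {kind}")
        self.producers[peer].add(kind)

    def withdraw(self, actor, role, permission):
        if not self.has(actor, "MODERATE"):
            raise PermissionError("only moderators can change lock settings")
        self.role_permissions[role].discard(permission)
        # Close producers that the withdrawn permission no longer covers.
        for peer, peer_role in self.roles.items():
            if peer_role == role:
                self.producers[peer] = {
                    k for k in self.producers[peer]
                    if KIND_TO_PERMISSION[k] != permission
                }
```

After `withdraw("mod", "viewer", "SHARE_AUDIO")`, a viewer's audio is closed and any further attempt to create an audio producer fails, regardless of how many producers the client requests.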

In summary, most findings in eduMEET stem either from excessive trust in the client or from missing moderation features. Both factors weaken security and leave moderators without means to eliminate disruptive factors within a meeting. Consequently, these findings show that filtering client messages and providing moderation features are critical to ensuring secure meetings.

8.3 Limitations

Scope. To understand the architecture and behavior of conferencing systems, we analyzed the functionality and interaction of both conferencing systems. We examined server-side and client-side components in eduMEET. In BigBlueButton, we concentrated on the server-side components since these implement most of the logic and functionality. BigBlueButton’s client delegates state management to the third-party framework Meteor.js, which is out of scope for our analysis. For this reason, we did not examine the BigBlueButton client, which might yield further findings related to web security.

Automation. For comparison with our manual approach, we used SonarCloud (Footnote 13) to scan for bugs automatically. While it found code snippets that could be improved, it did not find any vulnerabilities. This result is expected because most of our findings are logical flaws that require user interactions and a specific meeting state, conditions that a static code analysis tool cannot automatically establish. BigBlueButton has publicly used SonarCloud as part of its quality control since June 2021 (Footnote 14).

Architecture of Conferencing Systems. Comparing two architectures as different as those of eduMEET and BigBlueButton was not a trivial task. We therefore derived a shared architecture by breaking down the architectures of the respective conferencing systems. This shared architecture can be reused in future work; however, depending on the conferencing system and its architecture, it may be necessary to extend the model. Our model uses the SFU WebRTC architecture, while other systems may use P2P or other WebRTC architectures that allow direct communication between clients. Furthermore, other conferencing systems may use different communication mechanisms, for example, the Extensible Messaging and Presence Protocol (XMPP).

Analysis of Further Conferencing Systems. We limited the scope of our analysis so that we could cover the chosen conferencing systems and their architectures in detail. Further open-source conferencing systems may be analyzed using a similar process, adapted to their respective architectures. Our analysis process is not directly applicable to closed-source software. Nevertheless, the logical flaws we detected may point to similar vulnerabilities in closed-source conferencing systems that support the affected features.

9 Related Work

Although various vulnerabilities have been found in web conferencing systems in the past, there is little exhaustive scientific research in the general area of video conferencing systems. Thus, we consider previous research, vulnerability reports, and talks regarding conferencing systems to get a grasp of the attack surface.

Most of the vulnerabilities found in web conferencing systems are classic web security vulnerabilities such as cross-site scripting (XSS) [43, 44], server-side request forgery (SSRF) [9], SQL injection via custom URI scheme [18], and different types of misconfigurations [40, 41]. Also common are vulnerabilities resulting from missing checks [40, 41], flawed role management [4, 40, 47], missing security considerations [2, 47], and image or document conversions [2, 10]. While all these vulnerabilities are interesting, we focus on factors that extend the attack surface beyond that of our architecture.

Among the previously mentioned reports, some stand out in particular because the described vulnerabilities are located in the client, but the client differs from our architecture. In our architecture, the client is a web browser. In some reports [3, 20, 28, 43], the client is an Electron app (Footnote 15). These applications are built with web technologies and use Chromium and Node.js. Vulnerabilities in these applications are critical since they can lead to client-side remote code execution (RCE) [3, 20, 28, 43]. Other kinds of conferencing clients are classic executables on Windows, Mac, or Linux, which extend the attack surface as well, for example, due to memory-related issues [31, 37]. Thus, different types of clients introduce different types of attacks, and the more types of clients the conferencing system offers, the larger the attack surface. The same applies to other components, such as Zoom’s Multimedia Router (MMR), which is responsible for transmitting audio and video between Zoom clients; this component was affected by a buffer overflow found by Google Project Zero [37].

Another interesting component used in conferencing systems is the login mechanism. Sudhodanan and Paverd found an attack related to Single Sign-On (SSO), where an attacker creates a Zoom account with the victim’s email address before the victim creates an account [39]. When the victim later uses an identity provider with the same email address to create a Zoom account, Zoom merges the accounts, which allows the attacker to log in to the victim’s account with the attacker’s password.
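The merge flaw can be illustrated with a hypothetical account store; it is not Zoom's implementation. The flag `discard_unverified_credentials` models one plausible mitigation: when an identity provider vouches for an address, credentials that were set on the unverified pre-existing account are dropped. The SHA-256 hashing is for illustration only.

```python
import hashlib


def _hash(password):
    # Illustration only; real systems should use a slow password KDF.
    return hashlib.sha256(password.encode()).hexdigest()


class AccountStore:
    """Hypothetical store that merges password and SSO sign-ups by e-mail."""

    def __init__(self, discard_unverified_credentials):
        self.safe = discard_unverified_credentials
        self.accounts = {}   # email -> {"pw_hash": str or None, "verified": bool}

    def sign_up_with_password(self, email, password):
        # No e-mail verification: anyone can register any address.
        self.accounts[email] = {"pw_hash": _hash(password), "verified": False}

    def sign_in_with_idp(self, email):
        # The identity provider vouches for the address; accounts are merged.
        account = self.accounts.setdefault(email, {"pw_hash": None, "verified": False})
        if self.safe and not account["verified"]:
            account["pw_hash"] = None   # drop credentials set by an unverified party
        account["verified"] = True

    def sign_in_with_password(self, email, password):
        account = self.accounts.get(email)
        return bool(account and account["pw_hash"] == _hash(password))
```

Without the mitigation, the attacker's password still works after the victim signs in via SSO; with it, the merged account no longer accepts the attacker's credentials.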

Natalie Silvanovich from Google Project Zero published articles in 2018 in which she analyzed and fuzzed the WebRTC implementation in Chrome and closed-source video conferencing applications such as FaceTime and WhatsApp [32,33,34,35,36]. Four years later, she found one memory-related vulnerability in Zoom’s client and another in Zoom’s MMR [37]. She pointed out that closed-source software poses many challenges for researchers, which hinders further progress in verifying security properties, and recommended making closed-source software available to security researchers [37]. In the same year, Ivan Fratric from Google Project Zero presented a 0-click RCE vulnerability in Zoom at Black Hat USA [14]. Fratric found that different components use different XML parsers, which allowed him to smuggle XMPP messages (stanza smuggling) [14].

In recent years, cryptographic vulnerabilities in Matrix clients and libraries have become public [13, 24]. In 2021, Kasak et al. drew attention to two vulnerabilities in which vulnerable clients could be tricked into disclosing encryption keys [13]. In 2022, Albrecht et al. presented six attacks affecting the Matrix standard and its flagship client Element [24]. These attacks break authentication and confidentiality but require the cooperation of the homeserver, which stores communication history and account information and relays messages [23, 24].

While we discussed vulnerabilities in conferencing systems from a technical point of view, Ling et al. focused on the attacker as a person who is responsible for disruptions in a meeting, i. e., Zoombombing [21]. Their results indicate that such attackers often have help from an insider within the meeting. Therefore, passwords and meeting IDs are rather ineffective mechanisms for preventing Zoombombing; the authors argue that unique join links would be an effective security mechanism.

In summary, there are many reports and findings in different areas regarding video conferencing systems and their components. However, scientific approaches are lacking, especially for open-source video conferencing systems. Our work is a first step toward closing this gap.

10 Conclusions and Future Work

In our work, we systematically analyzed two open-source conferencing systems and detected 57 vulnerabilities and bugs. While the root cause of the vulnerabilities in BigBlueButton mostly lies in the complexity of the system and the interplay between its features, the vulnerabilities in eduMEET mainly result from missing strict authorization checks and excessive trust in client messages. We want to highlight that our findings do not imply that BigBlueButton and eduMEET are less secure than commercial closed-source alternatives. The high number of findings was largely enabled by the open-source implementations, which facilitated our in-depth evaluations. On the negative side, both systems lack a swift vulnerability patching process. In the case of eduMEET, none of the reported vulnerabilities have been fixed. This is not acceptable for systems processing security-critical data.

The high number of findings shows that there is indeed a research gap in the security of video conferencing systems. With our systematic security analyses, we want to draw attention to this topic and want to stress that video conferencing systems offer a large attack surface due to their large number of components and used technologies. This is also confirmed by many related vulnerabilities, mostly found in non-systematic analyses by bug bounty hunters in recent years.

Our work can be extended in different directions. XML parsers within XMPP implementations are underexplored and an interesting attack vector, since XMPP is often used in video conferencing systems [14]. Moreover, the systematic approach that we applied to BigBlueButton and eduMEET could be applied to other open-source conferencing systems. Closed-source software is often more difficult to analyze if it is not freely and openly available [37]. Commercial providers should consider facilitating further security research, and we hope that future work will help improve the security of video conferencing systems.