
1 Introduction

A NoSQL database, also referred to as “Non-SQL” or “Not Only SQL”, is a database that stores data in formats other than relational tables. NoSQL databases emerged in the late 2000s, when the cost of storage decreased drastically, and they are now one of the buzzwords of modern data storage systems. This is due to the large amount of data that currently exists and to rapidly growing heterogeneous data sources such as sensors, GPS devices, and many other kinds of smart gadgets. Web 2.0 companies such as Amazon, Facebook, and Google are the main drivers of NoSQL databases because of their growing data and infrastructure demands. Web 2.0 has produced numerous new applications that rely on storing and processing large amounts of data and that require high availability and scalability, which poses additional challenges for relational databases (RDBs) [1]. The primary objective of NoSQL databases is to distribute large amounts of data across many cloud servers. There is growing interest in efficiently processing this unstructured data, commonly referred to as “big data”, and incorporating it into traditional applications. Security, however, was not the primary design concern, and only recently have NoSQL databases begun to include built-in protection mechanisms against security attacks [2,3,4].

This paper analyzes the security features and issues of the four most popular NoSQL databases, one from each of the four major categories [45] of NoSQL databases [44]: Cassandra (column-based database), MongoDB (document database), Redis (key-value store), and Neo4j (graph database). It describes the key security features and issues of these four systems with respect to authentication, authorization, communication encryption, auditing, and vulnerability to DoS/injection attacks. Most survey literature on NoSQL security features leaves the graph database Neo4j out of its comparisons, which distinguishes this survey from others.

2 Overview

2.1 Cassandra

Cassandra is an open-source, distributed, and decentralized storage system (database) that offers a highly available service with no single point of failure [5]. It is a column-oriented, fault-tolerant, and scalable database with tunable consistency, and it runs on networks of hundreds of nodes. Its replication model is based on Amazon’s Dynamo [6], and its data model is based on Google’s BigTable “column family” model [7]. Cassandra is thus a hybrid data management system that combines a column-oriented DBMS (such as Bigtable) with a row-oriented store. Cassandra was originally designed to power Facebook’s Inbox Search feature [5]. It combines BigTable’s data structures with Dynamo’s high availability to serve over 100 million users daily. Twitter, Facebook, Cisco, eBay, Rackspace, and Netflix are among the largest companies using it. Its outstanding features are described in [5, 8,9,10,11,12] and summarized in Table 1.
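The Dynamo-style replication mentioned above places data on a ring of tokens. The following Python sketch illustrates the idea under simplifying assumptions: each node owns a single token (no virtual nodes), tokens come from a truncated MD5 hash (a stand-in only; Cassandra's default partitioner uses Murmur3), and replicas are chosen clockwise around the ring in a SimpleStrategy-like manner. The node names and the key are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a 64-bit ring position (illustrative MD5-based hash)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class TokenRing:
    def __init__(self, nodes):
        # Sort nodes by token so that the ring can be walked clockwise.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, replication_factor: int):
        """Return the nodes holding `key`: the first node whose token is >= the
        key's token (wrapping around the ring), then the next nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        count = min(replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("user:42", replication_factor=3))
```

Because every node can compute the same placement from the ring alone, no coordinator or master is needed to locate data, which is one way such designs avoid a single point of failure.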

2.2 MongoDB

MongoDB is a document database created by 10gen and designed for ease of development and scaling [13]. Written in C++, MongoDB is a schema-free, document-oriented database that manages collections of JSON-like documents. This allows data to be nested in rich hierarchical structures while remaining queryable and indexable, so many applications can model their data more naturally. MongoDB uses collections and documents instead of the tables and rows of relational database management systems (RDBMS). Its key features include scalability through sharding, the MongoDB Query Language, indexing, data replication, and its document-oriented nature [14,15,16,17,18].
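To make the nested-document idea concrete, the sketch below represents documents as Python dictionaries and resolves a dotted field path such as "address.city", the notation MongoDB queries use for embedded fields. It is a simplification for illustration only: real MongoDB matching has richer semantics (for example, implicit matching over array elements), and the collection contents are hypothetical.

```python
from typing import Any

MISSING = object()  # sentinel meaning "field not present"


def get_path(document: dict, path: str) -> Any:
    """Follow a dotted path ('address.city', 'tags.0') into a nested document."""
    value: Any = document
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif isinstance(value, list) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]  # a numeric segment indexes into an array
        else:
            return MISSING
    return value


def find(collection: list, path: str, expected: Any) -> list:
    """Return the documents whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


users = [
    {"_id": 1, "name": "Ada", "address": {"city": "London", "zip": "N1"}, "tags": ["admin"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Accra"}, "tags": []},
]
print([doc["_id"] for doc in find(users, "address.city", "Accra")])  # prints [2]
```

With the PyMongo driver, the corresponding query would be written as `collection.find({"address.city": "Accra"})`, and an index on that embedded field keeps it efficient.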

2.3 Redis

Remote Dictionary Server, better known as Redis, is a fast, open-source, in-memory key-value data store. This in-memory NoSQL database supports a variety of data structures, including strings, lists, sets, hashes, and sorted sets. Salvatore Sanfilippo wrote Redis in C and released it in 2009. Redis provides sub-millisecond latency, enabling very high request rates in real-time applications such as gaming, healthcare, and advertising. Unlike many other key-value stores, Redis provides data structures that can hold any form of binary data, including arrays, bytes, numbers, strings, XML documents, and images [19]. Redis also provides hashes for storing and querying objects. Its features include scalability and high availability, in-memory performance, replication and persistence, and rich data structures [20,21,22,23,24,25,26].

Table 1. Features of presented NoSQL databases.

2.4 Neo4j

Neo4j is arguably the most popular open-source graph database in the world. Written in Java, Neo4j follows the native property graph model: the graph contains nodes (entities) linked by relationships, and both nodes and relationships store data as key-value pairs called properties. Because every piece of data is explicitly linked, Neo4j can traverse connected data quickly and at scale. Its native graph storage manages data in a naturally connected form, allowing very fast queries, deeper context for analysis, and easily modifiable relationships between data. Its key features include reliability and scalability, its data model, the Cypher query language, and indexing [27,28,29,30].
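The property graph model described above (nodes and relationships, both carrying key-value properties) can be sketched in a few lines of Python. This is a hypothetical in-memory illustration, not Neo4j's storage engine, and the labels, relationship type, and property names are made up. In Cypher, the traversal at the end would correspond roughly to `MATCH (:Person {name: 'Alice'})-[:KNOWS]->(b) RETURN b.name`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        # Each node keeps direct references to its outgoing relationships, so a
        # traversal follows stored links instead of joining tables.
        self.outgoing = {}

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)
        self.outgoing.setdefault(node_id, [])

    def relate(self, start, end, rel_type, **properties):
        self.outgoing[start].append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type):
        """Nodes reachable from node_id over one relationship of the given type."""
        return [self.nodes[r.end] for r in self.outgoing[node_id] if r.rel_type == rel_type]


g = PropertyGraph()
g.add_node(1, ["Person"], name="Alice")
g.add_node(2, ["Person"], name="Bob")
g.relate(1, 2, "KNOWS", since=2019)
print([n.properties["name"] for n in g.neighbours(1, "KNOWS")])  # prints ['Bob']
```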

Table 1 summarizes the features of MongoDB, Cassandra, Redis, and Neo4j, the NoSQL databases presented in this paper.

3 Security Features in Cassandra, MongoDB, Redis and Neo4j

Security has been a weakness of all NoSQL databases, and no NoSQL database provides complete security. As noted in the introduction, security was not the primary concern of NoSQL database designers, so their designs raise numerous security concerns. This section examines the security features and issues of Cassandra, MongoDB, Redis, and Neo4j. Specifically, it assesses each system against the criteria of authentication, authorization, communication encryption, auditing, and vulnerability to DoS/injection attacks [43], and briefly outlines the main issues of each.

3.1 Cassandra Security Features

  1.

    Authentication:

    Cassandra supports pluggable authentication, configured through the “authenticator” setting in “cassandra.yaml”. Cassandra’s default distribution offers two choices. The default, “AllowAllAuthenticator”, performs no checks and is therefore used to disable authentication completely. The other option, “PasswordAuthenticator”, stores usernames together with unsalted MD5 password hashes [31] in the system’s “auth.credentials” table. Enterprise Cassandra deployments can also manage security with external, third-party packages such as Kerberos authentication, which requires installing separate Kerberos servers as well as Kerberos client software on every participating Cassandra host.

  2.

    Authorization:

    Like authentication, authorization in Cassandra is pluggable and is configured through the “authorizer” setting in “cassandra.yaml”. It also comes with two main options. The default is “AllowAllAuthorizer”, which performs no checks and thus provides no authorization: every user receives full permissions regardless of role. The second option, “CassandraAuthorizer”, provides full permission management and stores its data in Cassandra’s system tables. With this option, privileged administrators can grant any privilege on any resource to a selected user by running CQL GRANT statements. The reported drawback of this approach is that it does not refresh its permission data on each access, so the effective permissions cannot be modified without restarting the entire Cassandra process.

  3.

    Communication encryption:

    Encryption in Cassandra is transparent to end-user activity: data can be read, inserted, and updated without any change on the application side. Cassandra supports several levels of encryption. Client-to-node encryption protects traffic from the client machine to the database cluster; it is disabled by default and can be enabled once a valid server certificate has been generated. Client-to-server SSL ensures that data in flight is not compromised and is transferred securely between client machines and the cluster. Similarly, node-to-node encryption secures data as it moves between cluster nodes and is configured through the “server_encryption_options” settings in the “cassandra.yaml” file. Because SSL is disabled by default, organizations that keep the default settings send data over the network in plain text, which may lead to data breaches. Finally, in DataStax Enterprise, transparent data encryption (TDE) protects data at rest against theft and unauthorized use [32]. Because the encryption is kept locally, TDE must be used together with a secure file system. Cassandra’s commit log (where writes are first recorded) is not secured either.

  4.

    Auditing:

    Cassandra 4.0 and later versions include audit logging [33], which records all incoming CQL requests as well as authentication attempts against a Cassandra node. A custom logger can be implemented and plugged in by giving its class name as a parameter in the cassandra.yaml file. With data auditing, an administrator can determine “who looked at what, and when” and “who changed what, and when”. Note that, for prepared statements, Cassandra logs the query as supplied by the client in the prepare call, together with the execution timestamp and all other attributes.

  5.

    Vulnerability to DoS/Injection Attack:

    Cassandra uses a “thread-per-client” model in its network code. An attacker can exploit this to stop the Cassandra server from accepting new client connections by making it allocate all its resources to fake connection attempts. In addition, Cassandra lets users create user-defined functions (UDFs) to perform custom data processing inside the database. JFrog’s Security Research team [34] recently disclosed a remote code execution vulnerability in this feature that they describe as “easy to exploit and has the potential to wreak havoc on systems.” Although the vulnerability does not affect default Cassandra installations, where UDFs are disabled, many Cassandra configurations enable them, leaving those instances vulnerable to DoS and other attacks.
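The authentication subsection above reports that the PasswordAuthenticator stores unsalted MD5 password hashes [31]. The following Python sketch (standard library only) shows why unsalted fast hashes are weak in general: identical passwords always yield identical digests, so one precomputed lookup table cracks every account that shares a password. A per-user random salt combined with a deliberately slow key-derivation function avoids this. PBKDF2 is used here only because it ships with Python; the passwords and the iteration count are illustrative assumptions, and readers should confirm which hashing scheme their Cassandra version actually uses.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not a Cassandra setting


def unsalted_md5(password: str) -> str:
    # Deterministic: equal passwords always give equal digests, so a single
    # precomputed table of common passwords cracks every matching account.
    return hashlib.md5(password.encode()).hexdigest()


def salted_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # A fresh random salt per user makes equal passwords hash differently,
    # and the iteration count makes every brute-force guess more expensive.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    _, candidate = salted_hash(password, salt)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


# Two hypothetical users who happen to choose the same password.
alice_md5, bob_md5 = unsalted_md5("winter2024"), unsalted_md5("winter2024")
print(alice_md5 == bob_md5)  # prints True: the shared password is visible

alice_salt, alice_digest = salted_hash("winter2024")
bob_salt, bob_digest = salted_hash("winter2024")
print(alice_digest == bob_digest)  # prints False: different random salts
print(verify("winter2024", alice_salt, alice_digest))  # prints True
```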

3.2 MongoDB Security Features

  1.

    Authentication:

    Enabling authentication is essential for MongoDB security because it is not enabled by default. MongoDB has no separate user directory; authentication data is stored in MongoDB databases themselves. When authentication is enabled, MongoDB uses the Salted Challenge Response Authentication Mechanism (SCRAM) by default. SCRAM is based on the IETF standard RFC 5802 and provides mutual authentication between client and server, with a customizable iteration count and a unique random salt for each user; MongoDB supports it with both SHA-1 and SHA-256 hashing. MongoDB also offers other authentication options, such as x.509 certificate authentication, Kerberos authentication, Microsoft Active Directory authentication, and Lightweight Directory Access Protocol (LDAP) authentication [35]. Members of replica sets and sharded clusters can use x.509 certificates both for client authentication and for internal authentication. x.509 authentication, however, requires a secure TLS/SSL connection, and in this setting MongoDB’s authentication must be active so that each server can be verified before it joins the cluster [36].

  2.

    Authorization:

    Like authentication, MongoDB authorization is not enabled by default. It can be enabled with the “--auth” option or the “security.authorization” setting [3, 37]; enabling internal authentication also enables client authorization. MongoDB uses role-based access control (RBAC) to govern access to the system. Once authorization is enabled, privileges can be granted explicitly to a role, inherited from another role, or both. The built-in database roles can be used, or new roles can be defined if the built-in ones are insufficient. Each user is granted one or more roles that determine which resources the user can access and which operations the user can perform; outside these role assignments, users have no access to the system. MongoDB 3.4 and later also support LDAP authorization: for an authenticated user, MongoDB queries the LDAP server for the LDAP groups the user belongs to, maps the Distinguished Name (DN) of each group to roles in the admin database, and then authorizes the user according to the mapped roles and privileges.

  3.

    Communication encryption:

    MongoDB provides robust encryption features, some of which are built into the MongoDB Atlas database-as-a-service platform. MongoDB Atlas requires client-to-server TLS encryption. MongoDB’s encryption at rest is an Enterprise feature that requires the Enterprise binaries; it adds a layer of security ensuring that written files and storage are readable only after an authorized process or application has decrypted them. MongoDB 4.2 also provides encryption in use (client-side field level encryption), which lets MongoDB clients such as drivers and the shell automatically encrypt and decrypt fields using keys stored in a secure vault.

  4.

    Auditing:

    MongoDB Enterprise provides an auditing facility for mongod and mongos instances that lets administrators and users track system activity in deployments with multiple users and applications [38]. Audit logging is enabled in the mongod.conf configuration file. When enabled, the auditing system can record schema operations, replica set and sharded cluster operations, authentication and authorization events, and CRUD operations. MongoDB Atlas also supports auditing on M10 and larger clusters.

  5.

    Vulnerability to DoS attack:

    As already noted, MongoDB does not enforce authentication by default. In many deployments this allows anyone on the network to access all data in the database, which leaves MongoDB vulnerable to DoS attacks. An attacker does not need administrator rights to carry out such an attack, because any legitimate user’s credentials suffice.
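The SCRAM mechanism described for MongoDB authentication (RFC 5802, here with SHA-256 as in SCRAM-SHA-256) lets the server check a password without storing it or receiving it over the wire. The sketch below computes the RFC's key-derivation and proof steps with Python's standard library. It deliberately omits message parsing, nonce generation, and SASLprep normalization, and the auth message, salt, iteration count, and credentials are placeholder assumptions rather than a real MongoDB handshake.

```python
import hashlib
import hmac
import os


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def derive_keys(password: str, salt: bytes, iterations: int):
    """SaltedPassword = Hi(password, salt, i), where Hi is PBKDF2 with HMAC."""
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()  # the server stores this...
    server_key = _hmac(salted, b"Server Key")         # ...and this, not the password
    return client_key, stored_key, server_key


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    # ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)
    client_key, stored_key, _ = derive_keys(password, salt, iterations)
    return _xor(client_key, _hmac(stored_key, auth_message))


def server_accepts(stored_key: bytes, proof: bytes, auth_message: bytes) -> bool:
    # Undo the XOR with the expected signature, then check the recovered ClientKey.
    recovered = _xor(proof, _hmac(stored_key, auth_message))
    return hmac.compare_digest(hashlib.sha256(recovered).digest(), stored_key)


def server_signature(server_key: bytes, auth_message: bytes) -> bytes:
    # Returned to the client, which recomputes it to authenticate the server.
    return _hmac(server_key, auth_message)


# Registration: the server keeps only (salt, iterations, StoredKey, ServerKey).
salt, iterations = os.urandom(16), 15_000  # illustrative values
_, stored_key, server_key = derive_keys("s3cret", salt, iterations)

# Login: placeholder standing in for the concatenated SCRAM messages.
auth_message = b"n=alice,r=cNonce,r=cNonceSNonce,s=...,i=15000,c=biws,r=cNonceSNonce"
proof = client_proof("s3cret", salt, iterations, auth_message)
print(server_accepts(stored_key, proof, auth_message))  # prints True
bad_proof = client_proof("guess", salt, iterations, auth_message)
print(server_accepts(stored_key, bad_proof, auth_message))  # prints False
```

The XOR step is what makes the exchange useful: the server can verify the proof using only StoredKey, while an eavesdropper who captures the proof still lacks the ClientKey needed to log in later with a fresh nonce.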

3.3 Redis Security Features

  1.

    Authentication:

    Although Redis traditionally did not attempt to provide access control, it offers a thin layer of optional authentication that is activated by editing the redis.conf file (the requirepass directive). Redis versions before Redis 6 understood only the one-argument form of the command, AUTH <password>. In this configuration, Redis rejects every command from a newly connected client until the connection is authenticated with AUTH. Redis 6 also accepts a two-argument form, AUTH <username> <password>, while still supporting the one-argument form for backward compatibility. Like all other Redis commands, AUTH is sent in clear text and is not protected against eavesdropping by an intruder with sufficient access to the network.

  2.

    Authorization:

    Redis ships with an authorization layer. Once this layer is enabled, Redis rejects any query from an unauthenticated client. A client authenticates by sending the AUTH command followed by the password, which the system administrator sets in clear text in the redis.conf file. Although a strong password can be generated with the ACL GENPASS command, attackers can exploit Redis’s high performance to try a large number of passwords in a short time. In addition, the Redis server has to be restarted after the configuration file is edited.

  3.

    Communication encryption:

    By default, Redis does not provide any form of encryption. Because it was designed for use only within trusted private networks, it does not natively support SSL-encrypted connections, so additional tools are needed if the client-server connection must be encrypted. Redis offers no encryption for data at rest (which is stored as plain text), and data in transit between the Redis client and server is not encrypted. Redis deployments therefore commonly use stunnel, an SSL encryption wrapper between a local client and a local or remote server, which tunnels unencrypted traffic through an encrypted SSL tunnel to another server [39]. Although stunnel adds SSL encryption, it does not guarantee that unencrypted traffic is never captured: an attacker who compromises the server or the client machine can intercept the unencrypted local traffic before it reaches stunnel.

  4.

    Auditing:

    Redis provides service logs that record operations performed on various Redis entities, such as the account itself, users, API keys, subscriptions, databases, and payment methods. Redis offers two logging mechanisms: syslog and local text log files. Syslog receives log messages, routes them to different on-disk log files, and handles rotation and deletion of old logs. Logging to local files can become problematic when numerous services write to numerous log files.

  5.

    Vulnerability to DoS/Injection Attack:

    As noted earlier, Redis is an open-source, in-memory database that persists data to disk. By default, Redis can be accessed without credentials and can be exploited to corrupt the heap, potentially resulting in remote code execution. Denial of service is a key threat that Redis does not address. Such an attack can be carried out by inserting crafted elements into an input set so that an operation that normally takes constant time degrades to linear or even exponential time. This renders the system inoperable, resulting in a denial of service.
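The brute-force concern raised for Redis AUTH above can be quantified with simple arithmetic. The sketch below generates a random hexadecimal password in the spirit of ACL GENPASS (which by default returns 256 random bits as 64 hex characters) and estimates how long exhaustive guessing would take. The guessing rate is an assumed figure for illustration, not a measured Redis benchmark.

```python
import math
import secrets

ASSUMED_GUESSES_PER_SECOND = 1_000_000  # hypothetical attacker throughput


def generate_password(bits: int = 256) -> str:
    """Random hex string (4 bits per hex digit), rounded up to whole bytes."""
    return secrets.token_hex(math.ceil(bits / 8))


def entropy_bits(alphabet_size: int, length: int) -> float:
    return length * math.log2(alphabet_size)


def years_to_exhaust(alphabet_size: int, length: int,
                     guesses_per_second: float = ASSUMED_GUESSES_PER_SECOND) -> float:
    """Worst-case time, in years, to try every password of the given shape."""
    seconds = alphabet_size ** length / guesses_per_second
    return seconds / (365 * 24 * 3600)


print(len(generate_password()))         # prints 64
print(entropy_bits(26, 8))              # eight lowercase letters: about 37.6 bits
print(years_to_exhaust(26, 8) * 365)    # about 2.4 days at the assumed rate
print(years_to_exhaust(16, 64) > 1e50)  # prints True: 64 hex chars are out of reach
```

The comparison shows why a short human-chosen password offers little protection against a server that answers AUTH attempts quickly, while a generated 256-bit password remains infeasible to guess even at much higher rates.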

3.4 Neo4j Security Features

  1.

    Authentication:

    Neo4j authenticates users by username and password, and passwords are hashed with SHA-256. Its authentication module uses the AuthenticationPlugin interface. Authentication is enabled by default in Neo4j but can be turned off with the dbms.security.auth_enabled setting. Neo4j includes a native auth provider that stores users and their role information in the database, and an LDAP auth provider is also available. In addition, Neo4j provides a single sign-on provider and custom-built plugin auth providers for clients with special requirements that neither the native nor the LDAP provider handles. Neo4j also supports Kerberos for authentication with single sign-on.

  2.

    Authorization:

    As with authentication, authorization is enabled by default in Neo4j. Its authorization module uses the AuthorizationPlugin interface. Neo4j connects data through intuitive relationships, which makes identity and access management quick and effective. Neo4j 3.1 introduced role-based access control (RBAC), which lets administrators create users and grant them specific roles in the database; Neo4j 4.0 enhanced this significantly by adding privileges. However, different instances of a cluster cannot have different security privileges [40], because the whole cluster shares the privileges configured in the system database through Cypher administrative commands. Users therefore have the same privileges regardless of which server in the cluster they access.

  3.

    Communication encryption:

    Neo4j does not currently handle encryption of data at rest explicitly [41]. It does, however, secure data in transit with TLS/SSL, implemented through the Java Cryptography Extension (JCE), a digital certificate, and a set of configuration options in neo4j.conf. This SSL framework uses common SSL/TLS technology to secure Neo4j’s communication channels [42]. Neo4j also provides APIs (OGM, the Object Graph Mapper) for Java-based application-level encryption [3, 37].

  4.

    Auditing:

    Neo4j offers limited auditing in its open-source edition and logging facilities in its Enterprise edition. The root directory where the general log files are stored can be configured with “dbms.directories.logs”, and logging of queries executed in the database can be enabled or disabled with the dbms.logs.query.enabled parameter. Neo4j also includes security event logging, which records login attempts, authorization failures from role-based access control, and all administration commands and security procedures run against the system database.

  5.

    Vulnerability to DoS/Injection Attack:

    Neo4j prevents Cypher injection when input is passed to the query as a parameter. In a parameterized query, placeholders stand in for parameters whose values are supplied at execution time, so developers do not need to build queries by string concatenation. Parameters also make it much easier for Cypher to cache execution plans, which results in faster query execution. Parameters can be used for literals and expressions as well as for node and relationship ids. However, because Cypher is a declarative query language, Neo4j remains vulnerable to injection attacks when queries are built by string concatenation.
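The contrast between string concatenation and parameterized Cypher queries can be shown without a running database. In the sketch below, the label User, the property name, and the payload are hypothetical. The concatenated version lets the payload close the string literal, add an always-true condition, and comment out the rest of the query with //, so it would match every user. The parameterized version keeps the query text fixed and returns the value separately; with the official Neo4j Python driver, such a pair would typically be passed as `session.run(query, params)`.

```python
def find_user_unsafe(name: str) -> str:
    # String concatenation: the input becomes part of the Cypher text itself.
    return "MATCH (u:User) WHERE u.name = '" + name + "' RETURN u"


def find_user_safe(name: str):
    # Parameterized: the query text never changes; the value travels separately
    # and is treated as data, never parsed as Cypher.
    return "MATCH (u:User) WHERE u.name = $name RETURN u", {"name": name}


payload = "x' OR 1=1 RETURN u //"
print(find_user_unsafe(payload))
# MATCH (u:User) WHERE u.name = 'x' OR 1=1 RETURN u //' RETURN u
query, params = find_user_safe(payload)
print(query)   # the unchanged template with the $name placeholder
print(params)  # the payload is just a string value
```

A fixed query text is also what allows the plan caching mentioned above: every call shares one template, so the database can reuse a single execution plan.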

Table 2. Security features of presented NoSQL databases

Table 2 shows the security features of the four databases. All the databases offer some security for the data in each of the security categories presented in this work. The next section therefore places more emphasis on the strengths of each database in each security category.

4 Comparative Findings and Discussions

4.1 Security Assessment Key

The following key [43] is used to assess the security of the databases in all categories:

  • High

    A database is rated high in a security category if and only if the features it provides completely secure the data.

  • Medium

    If the features provided to secure the data are partial or limited, the database is said to provide medium security in that category.

  • Low

    The database provides none or few of the features required to secure the data.

4.2 Criteria for Assessing Security

With the key established, the security categories [43] are described with respect to the metric values as follows:

  1.

    Authentication

    • High - Logon authentication, such as password-oriented, multifactor, certificate, and SSL-based authentication. Logon uses a combination of user identifier and password; examples of logon authentication include captcha images, PINs, and biometrics.

      Network-based authentication uses authenticated user session through drivers and network protocol stack.

      IP-based authentication uses IPsec security modules to validate the source and destination IPs.

    • Medium - The database supports only one means of logon, network-based or IP-based authentication.

    • Low - No means of authentication or a basic password requirement.

  2.

    Auditing

    • High - NoSQL databases must be able to audit and analyze transaction logs (including external and internal activities), database connections, and privilege grants.

    • Medium - The database can log all user profile activities.

    • Low - No mechanism to secure the system or data.

  3.

    Authorization

    • High - The database must support all three levels (database, content, and object level) with popular authorization models such as mandatory access control (MAC), discretionary, policy-based, task-based, role-based access control (RBAC), and fine-grained access control.

    • Medium - The database supports at least one level of authorization with any of the models listed under High.

    • Low - Little or no authorization support by the database.

  4.

    Communication Encryption

    • High - NoSQL databases must provide encryption in two broad categories: data at rest and data in transit. Examples of the former category are MD5 hashing, Data Encryption Standard (DES), AES, and SHA1 and SHA2 hashing. Methods of the latter category include SSL, TLS, SSH, and IPsec. Examples of transport-level security methods are the SSL Record Protocol, Change Cipher Spec Protocol, Secure Shell (SSH), Handshake Protocol, Alert Protocol, and IPsec Protocol.

    • Medium - The database provides either data-at-rest encryption methods or transport-layer security methods.

    • Low - Database does not provide any encryption method to secure data.

  5.

    Vulnerability to DoS/Injection Attack

    • High - Security assurances by the database include input validation, a least-privilege policy, and secure coding practices.

    • Medium - The database provides only one of the mechanisms listed under High.

    • Low - None of the methods are provided by the databases.
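Role-based access control, which the authorization criterion above lists among the expected models and which Sect. 3.2 describes for MongoDB (privileges assigned directly to a role, inherited from other roles, or both), can be sketched as a transitive union of privileges. The role names, resources, and actions below are hypothetical, and the sketch ignores database-specific details such as MongoDB's resource scoping.

```python
from typing import Dict, Iterable, List, Set, Tuple

Privilege = Tuple[str, str]  # (resource, action), both hypothetical names


class Role:
    def __init__(self, name: str, privileges: Iterable[Privilege] = (),
                 inherits: Iterable[str] = ()):
        self.name = name
        self.privileges: Set[Privilege] = set(privileges)
        self.inherits: List[str] = list(inherits)


def effective_privileges(role_name: str, roles: Dict[str, Role]) -> Set[Privilege]:
    """Union of a role's own privileges and those of every role it inherits,
    directly or transitively."""
    seen: Set[str] = set()
    stack = [role_name]
    result: Set[Privilege] = set()
    while stack:
        name = stack.pop()
        if name in seen:  # skip roles already visited (this also stops cycles)
            continue
        seen.add(name)
        role = roles[name]
        result |= role.privileges
        stack.extend(role.inherits)
    return result


def is_allowed(user_roles: Iterable[str], resource: str, action: str,
               roles: Dict[str, Role]) -> bool:
    # A user can do nothing beyond what the assigned roles (and their parents) grant.
    return any((resource, action) in effective_privileges(r, roles) for r in user_roles)


roles = {
    "reader": Role("reader", [("orders", "find")]),
    "writer": Role("writer", [("orders", "insert")], inherits=["reader"]),
    "auditor": Role("auditor", [("audit_log", "find")]),
}
print(is_allowed(["writer"], "orders", "find", roles))     # prints True (inherited from reader)
print(is_allowed(["writer"], "audit_log", "find", roles))  # prints False
```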

Fig. 1. Comparisons under authentication

Fig. 2. Comparisons under auditing

Fig. 3. Comparisons under authorization

Fig. 4. Comparisons under communication encryption

Fig. 5. Comparisons under vulnerability to DoS/injection attack

Figures 1, 2, 3, 4 and 5 show the results of comparing the four NoSQL databases under the security categories described above. In all figures, the Y-axis shows the metric values and the X-axis the NoSQL databases. The metric values are labeled 0, 1, 5, 10 and 15, where 1, 5 and 10 represent low, medium and high, respectively, as defined in Sect. 4.1. Fig. 1 compares the databases under “Authentication”: MongoDB and Neo4j have high security features, while Cassandra and Redis have little or no authentication features. Fig. 2 shows the comparison under “Auditing”: Redis and Neo4j are rated high in this category, while Cassandra and MongoDB have medium (partial) protective features. Under “Authorization” (Fig. 3), MongoDB has high security features, Cassandra and Neo4j have medium features, and only Redis offers low protection. Fig. 4 compares the databases under “Communication Encryption”: only MongoDB has high security features, Cassandra and Neo4j provide medium features, and Redis secures data minimally or not at all. Finally, Fig. 5 presents the category “Vulnerability to DoS/Injection Attack”. None of the compared databases can fully secure data in this category: Cassandra and Neo4j do so partially, while MongoDB and Redis do not.

Taking all five figures together, based on their combined features and ability to protect data, Neo4j can be said to have the best protective features, while Redis performs the weakest across the combined security features.

5 Conclusion, Recommendations and Future Works

Despite the various security improvements made by NoSQL database vendors, research discussing the security flaws of NoSQL systems and the way forward for resolving them remains scarce. This paper presented an extensive overview of the vulnerabilities of four of the most common NoSQL databases (MongoDB, Cassandra, Redis and Neo4j), one from each category. Existing works either compared only a few NoSQL databases or excluded Neo4j. The algorithms each database uses to support its security features were discussed, and each database has its own set of benefits and drawbacks. The databases were also compared under different security categories. Based on the features identified and the comparisons made, NoSQL system developers and administrators can choose a database and devise a better security plan to make their systems more secure. Although NoSQL databases have improved significantly, future studies aimed at designing a more robust security framework are required, targeted in particular at designing and implementing strong security mechanisms against DoS attacks. Further studies could also design a standard security framework for each category of NoSQL databases.