1 Introduction

Smart cities use data lake technology to store data at very large scale, and many types of databases are involved. One of the key building blocks of big data technologies is the NoSQL database. A NoSQL database can store structured, semi-structured and unstructured data [1,2,3] regardless of type or format, addresses the 4Vs of big data (“volume, velocity, variety, veracity”) [8,9,10,11], and comes in several data models such as document, key-value, wide-column, and graph [4,5,6,7,8,9].

A data lake provides a scalable framework for storing large amounts of data and generating analytics that can assist multiple stakeholders in making effective decisions and developing new markets [10,11,12,13]. However, data lakes still have many drawbacks. One of the main issues is the use of trigger functions with ACID (Atomicity, Consistency, Isolation and Durability) guarantees to process complex online transactions and text statements [14,15,16]. This paper focuses on evaluating how NoSQL databases execute as data lake storage in support of trigger functions for managing various transactions. Four features and capabilities were evaluated, namely performance, scalability, accuracy and complexity, to measure the execution of the selected NoSQL database products, i.e. MongoDB, Cassandra, Redis and Neo4j.

  • Performance: The performance measurements in this study cover the select, insert, update and delete operations [9, 17]. Database performance can be defined as optimizing the use of resources in carrying out operations so as to increase throughput and minimize errors, allowing as much workload as possible to be processed.

  • Scalability: The scalability measurements in this study cover data storage (write), retrieval (read), data sharding, data caching and cluster management [17, 18].

  • Accuracy: The accuracy measurements in this study cover the import data, export data and load data operations. Data access accuracy is a component of data quality and refers to whether the value stored for an object is the correct value [19].

  • Complexity: The complexity measurements cover query, function and variety operations. The query operations and functions used in this study include group by, order by, select distinct, aliases and create primary [20].
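To make the query operations named under the complexity criterion concrete, the following is a minimal Python sketch of what group by, order by and select distinct compute over document-style records. It is not the query code used in the study: the `readings` records, their field names, and the three helper functions are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical sensor readings; field names and values are illustrative only
# and are not taken from the study's data set.
readings = [
    {"sensor": "s1", "district": "north", "value": 12},
    {"sensor": "s2", "district": "south", "value": 7},
    {"sensor": "s3", "district": "north", "value": 5},
    {"sensor": "s4", "district": "south", "value": 9},
]


def select_distinct(rows, field):
    """SELECT DISTINCT field: unique values in first-seen order."""
    return list(dict.fromkeys(row[field] for row in rows))


def order_by(rows, field, descending=False):
    """ORDER BY field: return a new sorted list, leaving the input untouched."""
    return sorted(rows, key=lambda row: row[field], reverse=descending)


def group_by_sum(rows, key_field, value_field):
    """GROUP BY key_field with SUM(value_field)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_field]] += row[value_field]
    return dict(totals)
```

Each NoSQL product expresses these operations in its own query language, so the cost of writing and running them is one plausible reason complexity can differ across data models.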

The rest of the paper is organized as follows: Sect. 2 briefly discusses related work on the comparison of NoSQL database features and capabilities, followed by a detailed description of the comparison methodology in Sect. 3. Section 4 discusses the results of the experiment. Finally, Sect. 5 presents the conclusion and proposes relevant extensions.

2 Related Work

In order to collect data, a smart city uses distinct types of electronic Internet of Things (IoT) sensors. These information and communication systems are integrated into digital technology throughout all city functions. The term covers several ICT solutions that safely manage the assets of a community, including transportation systems, waste management, water management, protection systems, information systems of municipal departments and other community services, as well as data management. Data lakes are therefore a suitable place to store large amounts of data redundantly and at scale. The concept is to connect, store and analyze highly heterogeneous data sources; with NoSQL databases, the data must still be systematized, organized and modified for further use.

NoSQL databases are increasingly used for big data and real-time web applications. They are particularly useful for working with vast sets of distributed data and are compatible with smart city data collection. This data explosion is proving too big and too complex for relational databases (RDBMS) to tackle on their own. Unlike relational databases, NoSQL databases are not constrained by a fixed schema model: they apply schema on read instead of schema on write. This makes NoSQL databases especially appropriate for today's high-volume, high-variety online applications.
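The difference between schema on write and schema on read can be illustrated with a short Python sketch. It assumes raw JSON records are stored unchanged and that a structure is imposed only when the data is read. The payloads, the field names `temp_c` and `temperature`, and the function `read_temperature` are hypothetical.

```python
import json

# Raw records are stored as-is: no schema is enforced when they are written.
# The payloads and field names below are hypothetical.
raw_store = [
    '{"id": 1, "temp_c": "21.5", "site": "A"}',
    '{"id": 2, "temperature": 19, "site": "B", "note": "recalibrated"}',
    '{"id": 3, "site": "C"}',
]


def read_temperature(raw):
    """Apply a schema at read time.

    Differing field names are mapped to a single float-valued 'temp_c';
    records without a temperature yield None instead of being rejected.
    """
    doc = json.loads(raw)
    value = doc.get("temp_c", doc.get("temperature"))
    return {"id": doc["id"], "temp_c": None if value is None else float(value)}


# The reader decides which fields matter; unknown fields such as 'note'
# stay in the raw store but are ignored by this particular view.
view = [read_temperature(raw) for raw in raw_store]
```

Under schema on write, the second and third records would have to be transformed or rejected before storage; here they are accepted and interpreted later by whichever reader needs them.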

Currently, the data model is the most important feature in selecting an appropriate NoSQL database. Several researchers have compared NoSQL databases based on features and capabilities such as performance, integration, and security [9, 21,22,23]. Other studies compare performance [17], integrity, cloud service criteria [8, 24], and frameworks [25]. However, those studies do not discuss data access accuracy and scalability, which are important features when evaluating the capabilities of NoSQL databases in order to select an appropriate one to support trigger functions.

Therefore, in this study we compared NoSQL databases on four features and capabilities, namely performance, scalability, accuracy and complexity. We compared four NoSQL database products, MongoDB, Cassandra, Redis and Neo4j, which use the document, wide-column, key-value and graph data models respectively.

3 Research Evaluation Method

This section presents the research method for evaluating NoSQL database products for smart city data lake management based on features and capabilities. The research method follows the framework shown in Fig. 1.

Fig. 1 Framework of research evaluation

In this experiment, 10 simple SQL queries were tested with combinations of functions and operations such as SELECT, INSERT, UPDATE, DELETE, IMPORT DATA, EXPORT DATA, LOAD DATA, ORDER BY, GROUP BY, etc., as shown in Table 1. Each SQL query was implemented in the selected NoSQL database products, MongoDB, Cassandra, Redis and Neo4j, which use the document, wide-column, key-value and graph data models respectively. The identified features and capabilities of the NoSQL databases were measured and compared. The scope of the study covers the four features and capabilities below:

Table 1 Average NoSQL response score
  1. Performance: Capability to find, analyze and then resolve various database congestion that can impact application response times or hinder application performance.

  2. Scalability: Capability of a system to handle a growing amount of work or potential to perform more total work in the same elapsed time when processing power is expanded to accommodate growth.

  3. Accuracy: Capability to represent the right data in a form that is consistent and unambiguous and most relevant to historical records stored on computer-accessible digital media.

  4. Complexity: Capability of the query in evaluating the function and size of the expression.
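As an illustration of how the performance operations could be timed, the following Python sketch measures the mean latency of insert, select, update and delete calls. It is a sketch under stated assumptions, not the harness used in this study: `InMemoryStore` is a hypothetical dict-backed stand-in for a database client, and `time_operation` and `benchmark` are illustrative names.

```python
import time
from statistics import mean


class InMemoryStore:
    """Dict-backed stand-in for a database client (hypothetical).

    A real measurement would wrap the client of the database under test
    behind the same four methods.
    """

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def select(self, key):
        return self._data.get(key)

    def update(self, key, value):
        if key in self._data:
            self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def __len__(self):
        return len(self._data)


def time_operation(operation, keys):
    """Return the mean wall-clock seconds per call of operation(key)."""
    samples = []
    for key in keys:
        start = time.perf_counter()
        operation(key)
        samples.append(time.perf_counter() - start)
    return mean(samples)


def benchmark(store, n=1000):
    """Time the four operations in order: insert, select, update, delete."""
    keys = [f"k{i}" for i in range(n)]
    return {
        "insert": time_operation(lambda k: store.insert(k, {"v": 0}), keys),
        "select": time_operation(store.select, keys),
        "update": time_operation(lambda k: store.update(k, {"v": 1}), keys),
        "delete": time_operation(store.delete, keys),
    }
```

Calling `benchmark(InMemoryStore())` returns one mean latency per operation. Running the same function against wrappers for each database would give comparable per-operation figures, although network and driver overhead would then dominate the timings.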

4 Results and Analysis

The average scores obtained from the experiment, using the functions and operations available in each NoSQL database, are shown in Table 1.

The results were grouped by performance, scalability, accuracy and complexity, as shown in Figs. 2, 3, 4 and 5 respectively.
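The grouping step can be sketched in Python as follows. The mapping from operations to criteria follows the lists given in the Introduction, but the function `criterion_averages`, the use of a plain arithmetic mean, and the `placeholder` scores are assumptions made for illustration. The placeholder numbers are not values from Table 1.

```python
from statistics import mean

# Operation-to-criterion grouping as listed in the Introduction.
CRITERIA = {
    "performance": ["select", "insert", "update", "delete"],
    "scalability": ["write", "read", "sharding", "caching", "cluster management"],
    "accuracy": ["import data", "export data", "load data"],
    "complexity": ["query", "function", "variety"],
}


def criterion_averages(operation_scores):
    """Average one database's per-operation scores within each criterion.

    Operations without a score are skipped; a criterion with no scored
    operations maps to None. The arithmetic mean is an assumption.
    """
    averages = {}
    for criterion, operations in CRITERIA.items():
        scores = [operation_scores[op] for op in operations if op in operation_scores]
        averages[criterion] = mean(scores) if scores else None
    return averages


# Placeholder scores for an unnamed database; NOT values from Table 1.
placeholder = {
    "select": 4, "insert": 2, "update": 3, "delete": 3,
    "import data": 5, "export data": 3,
}
```

With these placeholder inputs, `criterion_averages(placeholder)` yields one average for performance and one for accuracy, and `None` for the two criteria that have no scored operations.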

Fig. 2 Comparison of averages based on performance criteria

Performance: consists of the select, insert, update and delete operations. The average results are shown in Fig. 2.

Scalability: Fig. 3 shows operations such as data storage (write), retrieval (read), data sharding, data caching and cluster management, where virtually all of the NoSQL databases have high values.

Fig. 3 Comparison of averages based on scalability criteria
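Data sharding, one of the scalability operations above, can be illustrated with a generic hash-based scheme. The Python sketch below is not the partitioning strategy of any of the evaluated products, each of which has its own mechanism. The function names `shard_for` and `distribute` and the choice of SHA-256 are assumptions.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would not give repeatable placements.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard index they are assigned to."""
    shards = {index: [] for index in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

A simple modulo scheme like this reassigns most keys when `num_shards` changes, which is one reason production systems tend to use more elaborate partitioning.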

Accuracy: includes operations such as import data, export data and load data. The average results are illustrated in Fig. 4, where MongoDB and Redis have the highest average scores.

Fig. 4 Comparison of averages based on accuracy criteria
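One way to quantify accuracy for export and import operations is a round-trip check: export the records, import them again, and count how many come back unchanged. The Python sketch below uses JSON Lines as the exchange format. This format, the function names, and the definition of accuracy as the fraction of identical records are all assumptions, since the study does not specify how its accuracy scores were computed.

```python
import json


def export_records(records):
    """Export records to a JSON Lines string (one JSON document per line)."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def import_records(text):
    """Import records from a JSON Lines string, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def round_trip_accuracy(records):
    """Fraction of records that are identical after export then import."""
    if not records:
        return 1.0
    restored = import_records(export_records(records))
    matches = sum(1 for before, after in zip(records, restored) if before == after)
    return matches / len(records)
```

Records containing only JSON-native types survive the round trip unchanged, whereas a value such as a Python tuple is restored as a list and counts as a mismatch. This is a small-scale analogue of the type and format losses that can lower accuracy when semi-structured data moves between stores.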

Complexity: consists of query, function and variety operations. The results are shown in Fig. 5.

Fig. 5 Comparison of averages based on complexity criteria

The findings showed that MongoDB and Cassandra had the highest performance results, Redis and Cassandra led on data scalability, and Neo4j and Cassandra had moderate data access accuracy. For complexity, MongoDB and Cassandra scored highest, while Redis and Neo4j scored relatively poorly. This demonstrates the differences between the NoSQL databases in terms of performance, scalability, data access accuracy, and complexity, as shown in Fig. 6.

Fig. 6 Comparison of averages based on indicator criteria

The moderate values for complexity and accuracy are consistent with several earlier studies and are attributed to the semi-structured format of the input data. The results of this study are also presented in Table 2, where all NoSQL databases show high average values. For the complexity criterion, however, Table 2 shows moderate values for all NoSQL databases.

Table 2 Implementation of NoSQL comparisons based on items

5 Conclusion

This study measured and compared NoSQL database products for a smart city data lake that includes several data models. The evaluation was conducted as a quantitative analysis through experiments. The results make it possible to identify the appropriate NoSQL database product based on performance, scalability, accuracy, and complexity. The most important technical characteristics of each NoSQL database were studied to determine the level each database reaches within the given features and capabilities. Although the NoSQL database products have the same performance, scalability and complexity scores, they differ in accuracy: MongoDB and Redis have high scores, whereas Neo4j and Cassandra have modest values. NoSQL databases compromise consistency to provide high performance and scalability, and the experiments indicate that NoSQL is suitable for analyzing and accessing data across agencies. This is in line with the success of web-scale information systems, where availability and speed are of high importance and NoSQL databases compromise accuracy to some degree in order to meet these needs.

Future work will involve optimizing algorithms and supporting complex transaction features for NoSQL databases with security and data integrity. In addition, we intend to cover more categories of NoSQL databases in future testing and implementation.