1 Introduction

NoSQL, short for “Not Only SQL”, refers to a diverse and increasingly recognizable collection of non-relational data management systems, in which SQL is not the primary language for data manipulation and data is not stored primarily in tables (Akhtar et al., 2021; Al Ali, 2021; Alhamad et al., 2021, 2022; Ali et al., 2021). NoSQL database management systems are preferable when working with massive data whose structure does not necessitate a relational model. These systems are distributed, non-relational databases designed for large-scale data storage and for massively parallel data processing across a great number of commodity servers. They also employ non-SQL languages and mechanisms to interact with data (although several recent APIs translate SQL queries into the system’s native query language or tool). NoSQL database systems emerged at major Internet companies, such as Facebook, Amazon, and Google, which encountered difficulties with massive amounts of data that the usual RDBMS systems could not handle. They can carry out multiple kinds of processing, such as exploratory and predictive analytics, ETL-style data transformation, and non-mission-critical OLTP (for example, managing long-duration or inter-organizational transactions). Unlike conventional DBMSs and data warehouses, these systems, inspired by Web 2.0 applications, are designed to scale out to the maximum number of users performing both reads and updates (Ali et al., 2022; Alnazer et al., 2017; Alnuaimi et al., 2021; Alsharari, 2021; Alshurideh et al., 2022).

A NewSQL system is a relational database system designed to offer the ACID (Atomicity, Consistency, Isolation, Durability) properties, traditional SQL-based OLAP in the Big Data domain, and real-time OLTP (online transaction processing). By utilizing NoSQL-style features such as column-based storage and distributed designs, these systems address the limitations encountered by conventional RDBMSs. Other novel features they introduce include in-memory processing, symmetric multiprocessing (SMP), and massively parallel processing (MPP).

As analytics practitioners have observed, data that is highly comprehensive, ambiguous, and rapidly changing is hard to handle using conventional methods. These days research institutions, businesses, and governments all generate extraordinary amounts of highly complex data. Hunting the required information out of such enormous data is crucial for organizations, and extracting meaningful insight from bulk data swiftly is a great challenge. That is why analytics has become inextricably important for understanding the significance of Big Data, improving business performance, and boosting market share. In the past few years, the means of dealing with the variety, velocity, and volume of Big Data have improved to a great extent. The immense rise in data size demands rapid analytics for each new query from the application user. This situation has led technologists to introduce new DBMS systems that can overcome this processing bottleneck on the database side, since the architecture of the RDBMS is limited in handling such huge data and carrying out analytics. NoSQL architecture is specifically designed to deal with such speed bumps: thanks to its flexible and adaptable architecture, massive volumes of information can be processed rapidly.

In this paper, we discuss the features of conventional RDBMSs as well as their limits in managing huge data. Furthermore, we present NoSQL databases, along with their types and the distinctive characteristics that let them deal with Big Data, and the application areas where NoSQL databases can be incorporated. Drawing on the industry's experience with NoSQL, we also attempt to elucidate the problems that can be encountered when using these systems for Big Data. Finally, we compare the two kinds of systems with respect to how they contend with everything from normal data to Big Data.

2 Literature Review

Of the several data models, the one that has surpassed all others since the early 1980s is the relational model, with successful implementations such as Oracle Database, MySQL, and Microsoft SQL Server, collectively known as Relational Database Management Systems (RDBMS). All the systems just mentioned are designed on the relational model. The main reason for building RDBMSs was to provide data processing to businesses, and from that time until now the RDBMS has proven to be the best tool for storing information, whether personal data, financial statements, transaction records, and so on (Aziz & Aftab, 2021; Cruz, 2021; Eli, 2021; Farouk, 2021; Ghazal et al., 2021a, 2021b, 2021c, 2021d).

2.1 Big Data

As time went by, the demands on data storage kept growing. Data evolved from structured to unstructured in form, and from megabytes, gigabytes, and terabytes to petabytes in size; this ever-changing data caused people to consider a different solution for managing such large volumes. As data became big, comprehensive, multifaceted, structured or amorphous, and diverse, it required noteworthy consideration and concentration. Vast amounts of data are produced at a very swift rate from a variety of distinct areas, scientific tools, and the Internet, especially the well-known social media, to mention a few. This type of data was termed Big Data. Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, process, and examine (Al-Khayyal et al., 2021; Al Batayneh et al., 2021; Alshurideh, 2022; Alshurideh et al., 2022). This characterization is deliberately subjective, because it is evident that as technology progresses over time, the volume of datasets that qualify as Big Data will also enlarge (Al Guergov & Radwan, 2021; Hamadneh et al., 2021; Hanaysha et al., 2021a, 2021b; Joghee et al., 2020; Naqvi et al., 2021; Shebli et al., 2021).

The nature of Big Data can be conveniently explained by the four V's: Volume, Variety, Velocity, and Variability.

2.1.1 Volume

The magnitude of the produced and stored data determines its importance, value, and potential insight; in other words, size determines whether the data is worthy of being termed Big Data at all.

2.1.2 Variety

Variety is the kind, category, and structure of the data. It helps the experts who scrutinize the data to employ the resulting insight effectively. Big Data is a collection of unstructured data, including videos, audio, images, and text; it also uses data fusion to fill in the missing pieces.

2.1.3 Velocity

Velocity can be described as the pace of data production and processing in the system needed to comply with business requirements, along with the tests and obstacles that lie along the path of escalation, improvement, and, finally, growth. Big Data is usually available in up-to-date form.

2.1.4 Variability

Irregularity and unpredictability in a data set can obstruct the processes used to hold and handle it (Fig. 1).

Fig. 1
The four V's of Big Data. A radial diagram of the four V's: volume, velocity, veracity, and variety. (Source: “Big Data,” IBM, http://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data)

While considering big data, one problem is its growth, but a bigger issue is the dire need to manage and store not just structured data but also unstructured data such as pictures, videos, and files. A case in point: the relational model can handle neither the data traffic that social media sites like Facebook and Twitter produce, nor the type of data they need to store. For conventional data processing systems, this significant velocity of growth in data volume poses a solemn challenge (Alshurideh et al., 2020; Alzoubi, 2021a, 2021b; Alzoubi et al., 2022a; Alzoubi et al., 2020b).

Recently, however, the utilization of relational databases has led to trouble in many cases, owing both to discrepancies and glitches in data design and to restraints on parallel scalability across multiple servers under massive data sizes. The two main trends that brought these issues to the attention of the international software community are:

  1. The massive increase in the quantity of data produced by sensors, systems, and users, further accelerated by the concentration of a huge share of this volume on big distributed systems such as Google, Amazon, and other cloud services.

  2. The escalating interconnection and intricacy of data, accelerated by Web 2.0, the Internet, social networks, and open, uniform access to data sources spanning a very large number of different systems (Fig. 2).

    Fig. 2
    BigData: transactions, interactions, and observations. The diagram shows data growing from megabytes (ERP) through gigabytes (CRM) and terabytes (Web) to petabytes (Big Data); transactions + interactions + observations equals Big Data. (Source: SlideShare, https://www.slideshare.net/cloudstack/vbacd-july-2012-apache-hadoopnow-and-beyond. Accessed 12 February 2018.)

Big Data = Transactions + Interactions + Observations.

For this very reason, many emerging companies took up different kinds of non-relational databases, also known as NoSQL databases, as application demands arose; Yahoo, for example, used PNUTS, a massively parallel and geographically distributed database system, to run its web-based applications.

Since their release, NoSQL (“Not Only SQL”) systems have been extensively accepted in several realms. The main idea behind NoSQL systems is to support applications not properly served by relational systems, specifically those involved in managing and processing Big Data. NoSQL systems can be classified as graph databases, document stores, and key-value stores. It is important to mention that there is no single query language, like the standard query language used in RDBMSs, and no typical API for communicating with the various NoSQL systems. Normally, customers are required to use custom-built APIs at the programming level to communicate (Alzoubi & Ahmed, 2019; Alzoubi & Aziz, 2021; Alzoubi & Yanamandra, 2020; Alzoubi et al., 2020, 2021). This reduces portability and necessitates system-specific code.

2.2 Characteristics of RDBMS

The data in relational databases is organized in the form of tables, which are made up of columns and rows. To remove ambiguities during queries, these tables cannot have duplicate rows, and every table is assigned a primary key on a column that uniquely identifies each row, known as a record. For example, Fig. 3 shows that Product_ID in the Product table is the primary key. In the Product_Book table, which is a child table, the Author_ID column is used as a foreign key referencing the Author table, which is a parent table. Keys such as foreign keys describe supplementary table relationships and, along with join operations, may be required when retrieving data.

Fig. 3
Relational database schema [17]. The library database: Product is specialized into Book and Film; Book relates to Author and Publisher, and Film to Director and Distributor.

The library’s database uses multiple table inheritance to store common attributes in a shared table known as the Product table (see Fig. 3), while each set of type-specific attributes is kept in a product table for that type. This method is far more efficient than concrete table inheritance, where a new table is fashioned for every product category and the queries used are custom-made for specific products. But, as explained, retrieving all the significant attributes of a given product under multiple table inheritance requires several join operations, as the sketch below illustrates.
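As a rough illustration, the following sketch builds this kind of schema in SQLite; the table and column names are illustrative, loosely following Fig. 3, and are not taken from an actual library system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Common attributes live in the shared parent Product table.
    CREATE TABLE Product (
        Product_ID INTEGER PRIMARY KEY,
        Title      TEXT NOT NULL,
        Price      REAL
    );
    CREATE TABLE Author (
        Author_ID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    -- Type-specific attributes live in a child table keyed by the
    -- parent row; Author_ID is a foreign key to the Author table.
    CREATE TABLE Book (
        Product_ID INTEGER PRIMARY KEY REFERENCES Product(Product_ID),
        Author_ID  INTEGER REFERENCES Author(Author_ID),
        ISBN       TEXT
    );
    INSERT INTO Product VALUES (1, 'Dune', 9.99);
    INSERT INTO Author  VALUES (7, 'Frank Herbert');
    INSERT INTO Book    VALUES (1, 7, '978-0441013593');
""")

# Fetching all attributes of one product takes two joins --
# the cost of multiple table inheritance noted above.
print(conn.execute("""
    SELECT p.Title, p.Price, a.Name, b.ISBN
    FROM Product p
    JOIN Book   b ON b.Product_ID = p.Product_ID
    JOIN Author a ON a.Author_ID  = b.Author_ID
""").fetchall())
```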

An RDBMS makes certain that the database has the ACID properties as a primary prerequisite. For databases, the ACID properties act as the vital concept; the abbreviation stands for Atomicity, Consistency, Isolation, and Durability.

ACID arrangements keep a business's concurrent dealings, such as two purchases of the same sweater, from overlaying one another, so that the merchant is kept safe from flawed registers and account balances.

2.2.1 Atomicity

Atomicity is the first ACID property, and it is best explained by the phrase “all or nothing”. Consider an example: when a database receives an update, either all of the update becomes available, or none of it becomes accessible to anybody beyond the application or user executing it. The action performed on the database is called a transaction, and it is either committed or canceled. In other words, a database cannot accept only part of an update; you get the whole of it or none.
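A minimal sketch of this all-or-nothing behaviour using SQLite; the account table and the simulated failure are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100.0), (2, 50.0)")
conn.commit()

try:
    # A transfer is two updates; both must apply or neither.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    raise RuntimeError("crash between the two updates")  # simulated failure
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()
except Exception:
    conn.rollback()  # the partial update is discarded

# Balances are unchanged: the half-finished transfer never became visible.
print(conn.execute("SELECT id, balance FROM account").fetchall())
# [(1, 100.0), (2, 50.0)]
```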

2.2.2 Consistency

The ACID property of consistency ensures that if values in an instance change, all other dependent values in that instance change consistently. Consistency constraints are predicates on the data: they serve as the precondition before a transaction executes, the post-condition that must hold after execution, and the transformation condition ensured on every transaction (Kashif et al., 2021; Khan, 2021; Lee & Ahmed, 2021; Lee et al., 2022a, 2022b).
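A minimal sketch of a consistency constraint in SQLite; the non-negative balance rule is an assumed invariant, not one from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)  -- consistency invariant
    )
""")
conn.execute("INSERT INTO account VALUES (1, 20.0)")
conn.commit()

try:
    # This update would leave the database inconsistent,
    # so the DBMS rejects it and the invariant keeps holding.
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()

print(conn.execute("SELECT balance FROM account").fetchall())  # [(20.0,)]
```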

2.2.3 Isolation

The third property, the isolation section of ACID, is needed when many transactions run in parallel. Transactions executing in parallel are known as concurrent transactions, for example, multiple users accessing shared objects; in the figure this scenario is shown at the top as actions happening over time. Essentially, isolation is the set of safety precautions employed by the DBMS to thwart clashes between concurrent transactions.

Let us understand this with the help of an example. If two parties are updating the same catalog article, the changes made by the first party should not depend on or be affected by the changes made by the other party. To work in isolation means that each party acts as if it were the solitary user: every change must be kept isolated from other users of the same catalog (Fig. 4).

Fig. 4
Isolation as a relational database property. Concurrent transactions 1, 2, and 3 occur at the same time; transactions 1, 3, and 2 represent an alternate serialized execution of the same concurrent transactions. (Source: “Database ACID Properties,” https://www.servicearchitecture.com/articles/database/acid_properties.html)

Serializability is another important concept that should be understood when debating isolation of transactions. The execution of transactions is serializable when the effect on the database is unchanged whether the transactions are executed in an interleaved manner or in some serial order. Concurrent transactions, i.e. Transactions 1 through 3, are executed at the same time, as can be seen in the figure. An important point to keep in mind is that, in the equivalent serialized execution, the transaction that started first is not necessarily the one that completes first (Mehmood, 2021; Mehmood et al., 2019; Miller, 2021; Mondol, 2021; Obaid, 2021). A minimal sketch of isolation between two database connections follows.
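The sketch below shows isolation with SQLite, assuming a file-backed database so that two connections share state; the catalog table and prices are illustrative:

```python
import os
import sqlite3
import tempfile

# Two connections to the same file-backed database stand in
# for two concurrent users of a shared catalog.
path = os.path.join(tempfile.mkdtemp(), "catalog.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE catalog (item TEXT, price REAL)")
writer.execute("INSERT INTO catalog VALUES ('sweater', 25.0)")
writer.commit()

# The writer changes the price but has not committed yet.
writer.execute("UPDATE catalog SET price = 30.0 WHERE item = 'sweater'")

# The reader is isolated from the in-flight transaction and
# still sees the last committed value.
print(reader.execute("SELECT price FROM catalog").fetchall())  # [(25.0,)]

writer.commit()
print(reader.execute("SELECT price FROM catalog").fetchall())  # [(30.0,)]
```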

2.2.4 Durability

Durability is the ACID property that attends to the need of keeping a record of committed transactions. These updates must not be lost in any case, as this is critical: durability is the system's capability to recover completed transactions after a system or storage-media failure. The durability features are as follows:

  • The recovery of recently committed transactions in case of database failure

  • The recovery of recently committed transactions in case of application failure

  • The recovery of recently committed transactions in case of CPU failure

  • The recovery of recently committed transactions in case of storage failure.

The restriction of RDBMSs lies in dealing with amorphous, diverse, heterogeneous, massive amounts of data. For RDBMS vendors this is a huge challenge because of the RDBMS architecture, and this test of managing big data has compelled them to devise new technology that can handle such amounts of data and information.

SQL-like centralized databases have been pushed toward their limits by the computational processing and storage requirements of applications such as Big Data analytics, social networking, and business intelligence, with larger-than-petabyte datasets. These limitations paved the way for the growth of horizontally scalable, distributed non-relational data stores, called NoSQL databases, such as Google's Bigtable (with its open-source implementation, HBase) and Facebook's well-known Cassandra. The effectiveness, competence, and cost-effectiveness of these approaches are gained by embodying distributed-architecture key-value stores, for example Voldemort and Cassandra. Supporting data warehousing, Web 2.0, grid, and cloud applications was very hard with RDBMSs, which was a major drawback of those systems. Pokorny (2013) focuses mainly on NoSQL databases from the perspective of the cloud environment, chiefly the concurrency model and horizontal scalability. RDBMSs and NoSQL databases differ from each other; notably, NoSQL databases do not assure the ACID properties (Radwan & Farouk, 2021; Shamout et al., 2022).

2.3 Characteristics of NoSQL Databases

Conventional database systems are designed on the basic idea of executing transactions in a manner that keeps the data's veracity and reliability, which keeps the data consistent while managing it. The features of transactions are known as ACID (Atomicity, Consistency, Isolation, and Durability), as we have already discussed. However, developing a distributed system compliant with ACID has proven to be troublesome. This is captured by the CAP theorem: in distributed systems, clashes arise among the diverse facets of high availability that are not completely resolvable.

2.3.1 Strong Consistency

On updates to the data set, the version of the data seen by all clients is exactly the same, achieved e.g. through the two-phase commit protocol (XA transactions) and ACID.
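A minimal sketch of the two-phase commit idea; the Participant class and its always-yes vote are illustrative assumptions, not a real XA implementation:

```python
class Participant:
    """One resource manager holding a tentative update."""

    def __init__(self, name):
        self.name = name

    def prepare(self):
        # Phase 1: make the tentative update durable, then vote.
        return True  # this toy participant always votes "yes"

    def commit(self):
        print(f"{self.name}: committed")

    def rollback(self):
        print(f"{self.name}: rolled back")


def two_phase_commit(participants):
    # Phase 1 (voting): every participant must vote yes.
    if all(p.prepare() for p in participants):
        # Phase 2 (completion): commit on every node.
        for p in participants:
            p.commit()
        return True
    # Any "no" vote aborts the transaction everywhere.
    for p in participants:
        p.rollback()
    return False


two_phase_commit([Participant("node-A"), Participant("node-B")])
```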

2.3.2 High Availability

If a few of the machines in a cluster are down, all clients can still always find a copy of the requested data; down machines do not create a problem in this matter.

2.3.3 Partition-Tolerance

The goal is that the entire system always maintains its characteristics and features even while being deployed on various servers, transparently to the client. According to the CAP theorem, only two of these three dissimilar aspects of scaling out can be attained entirely at the same time (see Fig. 5).

Fig. 5
Characteristics of NoSQL databases. A radial diagram of the three characteristics: consistency, availability, and partition tolerance.

To attain improved availability and partition tolerance, many of the NoSQL databases mentioned above have lessened their requirements for consistency. This step led to the development of systems globally known as BASE (Basically Available, Soft-state, Eventually consistent). Han, J. classified NoSQL databases according to the CAP theorem, comparing different NoSQL databases against multiple criteria (Al Ali et al., 2021; Alzoubi et al., 2021; Batayneh et al., 2021).

The main usages of NoSQL databases can be categorized as (1) huge-scale data calculation and processing (parallel processing in distributed systems); (2) embedded IR (general machine-to-machine data search and retrieval); (3) investigative analytics on unstructured and structured data (at expert level); and (4) huge-volume data storage (unstructured, semi-structured, small-packet structured) (Afifi et al., 2020).

They also prove valuable for machine-to-machine communication, for information and data retrieval, recovery, and exchange, and for dispatching large numbers of executions; to that extent, the ACID restrictions can be softened or applied on the application side rather than the DBMS side. In conclusion, when dealing with semi-structured or hybrid data these systems serve probing analytics very well; nonetheless, to get to the bottom of the intelligence, the researcher should be a skillful mathematician working in accordance with an expert programmer (Ghazal, 2021; Ghazal et al., 2021a, 2021b).

2.4 Classification of NoSQL Databases

NoSQL databases were classified by Leavitt (2010) into three types: key-value stores (e.g. SimpleDB), column-oriented databases (e.g. Cassandra, HBase, BigTable), and document-based stores (e.g. CouchDB, MongoDB). In this segment, according to their suitability for different kinds of tasks, we categorize NoSQL databases into four basic categories:

  (1) Key-Value stores.

  (2) Document databases (or stores).

  (3) Wide-Column stores.

  (4) Graph databases.

2.4.1 Key-Value Stores

Classically, in these DBMSs the data objects are stored as alphanumeric identifiers (keys) and associated values in plain, standalone tables (also known as “hash tables”). The values may be as simple as text strings or more complex, like lists and sets. Data searches can usually be performed only on keys, not values, and are restricted to exact matches. See Table 1; a minimal sketch follows it.

Table 1 Key-Value store NoSQL Database
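A minimal dict-backed sketch of the key-value model; the class and key names are illustrative:

```python
class KeyValueStore:
    """Plain, standalone table: alphanumeric keys mapped to opaque values."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        # Lookup works only by exact key match -- the store cannot
        # answer queries over the values themselves.
        return self._table.get(key)


store = KeyValueStore()
store.put("user:42", {"name": "Alice", "languages": ["en", "fr"]})
print(store.get("user:42"))   # exact-match lookup succeeds
print(store.get("user:43"))   # None: no partial or value-based search
```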

2.4.2 Document Databases

As the name indicates, document databases, an idea derived from Lotus Notes, are mainly designed and intended to store and manage documents of different kinds. Documents are encoded in customary data-exchange formats such as JSON (JavaScript Object Notation), XML, or BSON (Binary JSON). In contrast to the uncomplicated key-value stores illustrated above, the value column of a document database holds structured and unstructured data, in particular attribute name/value pairs. Hundreds of these attributes can dwell in a single column, and the type and number of attributes recorded can differ from row to row. In document databases, both the values and the keys are fully searchable, in contrast with simple key-value stores (Ghazal et al., 2013, 2021c; Kalra et al., 2020) (Fig. 6); a sketch follows the figure.

Fig. 6
Document-type database vs relational database. A relational table with columns c1 to c4 contrasted with a document data model. (Source: SlideShare, https://www.slideshare.net/cloudstack/vbacd-july-2012-apache-hadoopnow-and-beyond. Accessed 12 February 2018.)
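A minimal sketch of the document model in plain Python; the collection and field names are illustrative, though real document stores such as MongoDB expose a comparable find-by-attribute interface:

```python
books = []  # one "collection" of JSON-like documents


def insert(doc):
    books.append(doc)


def find(**criteria):
    # Unlike a key-value store, the values are searchable:
    # any attribute name/value pair can be queried.
    return [d for d in books
            if all(d.get(k) == v for k, v in criteria.items())]


# Documents in the same collection may carry different attributes.
insert({"title": "Dune", "author": "Herbert", "year": 1965})
insert({"title": "Hyperion", "author": "Simmons", "format": "audiobook"})

print(find(author="Herbert"))    # query by value
print(find(format="audiobook"))  # attribute absent from the other document
```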

2.4.3 Wide-Column (or Column-Family) Stores (BigTable-Implementations)

Wide-column (or column-family) stores (hereafter WC/CF) are much like document databases. To house multiple attributes per key, they utilize a column-based, distributed data structure. While several WC/CF stores have key-value DNA (for example, the Dynamo-inspired Cassandra), the majority are designed after Google’s Bigtable, the petabyte-scale internal distributed storage system that Google developed for its search engine and additional products such as Google Finance and Google Earth. In general, such stacks reproduce not only Google’s Bigtable storage structure but also Google’s distributed file system (GFS) and its parallel processing framework, MapReduce. The same scenario holds for Hadoop, which comprises the Hadoop Distributed File System (HDFS, based on GFS) + HBase (a Bigtable-style storage system) + MapReduce (Khan et al., 2021; Lee et al., 2021) (Fig. 7); a sketch follows the figure.

Fig. 7
NoSQL wide-column database. A wide-column layout with two super column families: customers and orders. (Source: “Graph Databases: NOSQL and Neo4j,” InfoQ, http://www.infoq.com/articles/graphnosql-neo4j. Accessed 26 March 2018.)
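A minimal sketch of the wide-column layout as nested maps, row key → column family → column → value; the family and column names are illustrative, echoing the customers/orders families in Fig. 7:

```python
# Row key -> column family -> column -> value.
wide_column_store = {
    "cust-001": {
        "customers": {"name": "Alice", "city": "Dubai"},
        "orders":    {"order-17": "sweater", "order-18": "scarf"},
    },
    "cust-002": {
        # Rows are sparse: each row stores only the columns it needs.
        "customers": {"name": "Bob"},
        "orders":    {},
    },
}


def read(row_key, family, column):
    """Fetch one cell; missing columns simply return None."""
    return wide_column_store[row_key][family].get(column)


print(read("cust-001", "orders", "order-17"))  # 'sweater'
print(read("cust-002", "customers", "city"))   # None: column absent
```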

2.4.4 Graph Databases

Graph databases replace the tables of relational databases with more organically structured relationship graphs of interconnected key-value pairs. They are like object-oriented databases in that the graph is an object-oriented network of graph nodes (objects, conceptually), relationships between nodes known as edges, and properties (the objects' characteristics stated as key-value pairs). Of the four NoSQL forms discussed here, graph databases are the ones most concerned with relations. Among NoSQL DBMSs, they are considered the most human-friendly because they focus on a visual depiction of data and information (Fig. 8); a sketch follows the figure.

Fig. 8
NoSQL graph database store. A graph storing details of people, including name, last name, age, occupation, rank, language, and version. (Source: “Graph Databases: NOSQL and Neo4j,” InfoQ, http://www.infoq.com/articles/graphnosql-neo4j. Accessed 26 March 2018.)
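A minimal adjacency-list sketch of the graph model; the node properties and edge labels are illustrative:

```python
# Nodes carry properties as key-value pairs.
nodes = {
    "alice": {"name": "Alice", "occupation": "analyst"},
    "bob":   {"name": "Bob", "language": "Python"},
    "carol": {"name": "Carol", "rank": "senior"},
}

# Edges are first-class: (source, label, target).
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
]


def neighbours(node, label):
    """Follow outgoing edges with the given label -- the basic
    traversal primitive that joins would emulate in an RDBMS."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(neighbours("alice", "KNOWS"))                            # ['bob']
print([nodes[n]["name"] for n in neighbours("bob", "KNOWS")])  # ['Carol']
```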

Many big organizations that deal with big data have now adopted NoSQL. Table 2 lists a few of these big businesses.

Table 2 NoSQL type used by Companies

Because of their high data-storage demands, big businesses have converted to NoSQL, and NoSQL experts are now viewed in a favorable light as well.

2.5 Comparison Between RDBMS and NoSQL

Having described the characteristics of both RDBMS and NoSQL, this study analyzes the comparison between them in detail, showing the major differences and capabilities of both systems according to customer needs (Table 3).

Table 3 Comparison between RDBMS and NoSQL (Author Created)

3 Material and Methods

In the 1980s the first cohort of commercial systems came into sight, from Teradata Corporation, and at the same time the necessity surfaced for well-defined benchmarks to determine the ability of a DBMS to deal with very big quantities of data. Motivated by vendors' desire to weigh commercial systems against one another, at the start of the 1990s the Transaction Processing Performance Council designed a series of end-to-end data warehouse benchmarks. TPC-H and TPC-R followed at the beginning of the 2000s (the details are all accessible from the TPC website). Modeling updates on an enterprise data warehouse, these benchmarks are limited to a data size of a terabyte, highlighting single- and multi-user performance of complex SQL query processing abilities. Even before this, academia had started developing micro-benchmarks such as the EXRT and XMark benchmarks for XML-related DBMS technologies and the OO7, Wisconsin, and BUCKY benchmarks for object-oriented DBMSs (Matloob et al., 2021; Naqvi et al., 2021).

With the passage of time, the volume of data kept growing, from megabytes to petabytes in size and from simple data models (a few tables with a small number of relationships) to complex ones (big tables with many complex relationships). This change in data needs led TPC to respond: at the dawn of the 2000s it developed its next-generation decision-support benchmark, TPC-DS. Its foundation is the SQL language, but it incorporates several big data elements, such as exceedingly large system and data sizes. Even though the existing limit is 100 terabytes, the schema and data generator can be expanded to petabytes. It also contains quite complex analytical queries using sophisticated SQL structures and a synchronized update model.

3.1 Adaptation of NoSQL

The term NoSQL was invented in 1998. Lots of people assume NoSQL is a deprecating term fashioned to jab at SQL, but in actuality the term stands for Not Only SQL, putting forth the idea that the two technologies, SQL and NoSQL, can exist together, each in its own specific place. For the past few years, NoSQL technology has been heard of and seen in the news, most likely because many of the Web 2.0 leaders have taken up NoSQL technology: Facebook, Twitter, Digg, Amazon, LinkedIn, and Google all use NoSQL in one way or another.

The main factors behind the adoption of NoSQL include data flexibility, the absence of a rigid schema, and scalability.

3.2 Questionnaire Development

As discussed earlier, technology is changing extensively, and data is becoming more crucial to every organization. Accessing and manipulating data is much more important than merely saving it to storage, and both storing data and accessing it are time-consuming. Because of their architecture, RDBMSs must consider the data types, relations, and other hidden processes involved in executing queries, in storing the data to disk, and likewise when accessing the data from storage. This adds time that is critical to how quickly applications respond. Massive data growth also requires additional storage on the fly, which is difficult to manage in an RDBMS environment (Rehman et al., 2021; Suleman et al., 2021).

On the other hand, accessing and storing heterogeneous types of data in a NoSQL environment is very fast. The flexibility of multi-type data such as text, images, videos, and documents is managed very efficiently, and the schema-free, no-predefined-architecture design gives NoSQL a strong advantage over RDBMS. Dealing with enormous data growth is very easy in a NoSQL environment, since its flexible, scalable architecture grows by adding more servers, called shards, to the running environment, as the sketch below illustrates.
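A minimal sketch of hash-based shard routing; the shard names and the modulo scheme are illustrative assumptions (production systems typically use consistent hashing so that adding a shard relocates fewer keys):

```python
import hashlib

shards = ["shard-0", "shard-1", "shard-2"]  # grows as servers are added


def shard_for(key: str) -> str:
    """Route a key to a shard by hashing it."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]


for key in ("user:42", "video:9001", "img:7"):
    print(key, "->", shard_for(key))

# Scaling out is just appending a server; the router uses it at once.
shards.append("shard-3")
```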

A qualitative study has been done on previous research on the same topic. This helped enormously in forming a basic idea of the database vendors, organizations' requirements, and the improvements made to meet day-to-day challenges. It also shed light on the technology enhancements in this specific field over the last two decades.

3.3 Data Collection

A questionnaire survey has been conducted as part of the quantitative method. To capture the current situation in dealing with Big Data and analytics, a questionnaire was developed and distributed in the IT field. The targeted people were IT company CEOs and CTOs, database administrators dealing with day-to-day management challenges, and network professionals designing networks to deal with Big Data. The responses were then analyzed and helped in reaching the results and findings of this study. Because direct access to all the respondents was restricted, the study used various methods to collect data, including professional groups, emails, and printed copies given to professionals in contact. The results are analyzed and discussed in Sect. 4.

4 Results and Findings

As the world has already adopted NoSQL databases, a survey was conducted to analyze the extent of, and the reasons for, software development companies' move toward them. The survey was designed to analyze and judge the necessity of NoSQL with respect to business needs, data management problems, flexibility, and low-cost DBMS adoption.

The survey was conducted among database administrators, IT company CEOs, technology decision makers, software developers, and IT experts, considering people doing business in RDBMS and Big Data.

The main points of the survey are given below.

  • About half of the more than 250 respondents indicated that they had worked on NoSQL projects in the past couple of years. Among companies that have secured large software projects, more than 50% of the projects deal with Big Data for their clients. They described their reasons for adopting NoSQL as follows:

  • 49% referred to inflexible schemas as the major reason for their migration from the relational database system to NoSQL technology. Other prime reasons for switching to NoSQL were the lack of scalability and high latency/low performance when dealing with Big Data.

  • Overall, 40% were of the viewpoint that NoSQL is essential and significant to their daily operations, and that it is continuing to become more important.

  • The management of Big Data in NoSQL is much easier than in an RDBMS, whose pre-defined architecture imposes limitations, whereas the capture of multi-structured data is welcomed in NoSQL. The type of data ranks high among the considerations when deciding which NoSQL database to use (Figs. 9, 10).

    Fig. 9
    Problems driving the move toward NoSQL (created by authors). Horizontal bar graph of reported problems: lack of flexibility, 49; data incapability, 25; high latency, 17; costs, 8; other.

    Fig. 10
    Factors in deciding on NoSQL (created by authors). Vertical bar graph of factors: security, 10; cost, 5; resilience, 1; distributed architecture, 3; simplicity to edit and maintain, 14; big data, 38; unstructured data, 29.

As the survey results above explain, according to software professionals the major factors in deciding between NoSQL and RDBMS are Big Data, unstructured data, and the management of both. This means that when experts have to deal with massive amounts of unstructured data, they are more likely to adopt NoSQL (Fig. 11).

Fig. 11
NoSQL performance with BigData (created by authors). Pie chart of the performance advantage for Big Data, in percent: strongly agree, 40; agree, 34; neutral, 19; disagree, 5; strongly disagree, 2.

  • Another question, regarding the performance of NoSQL with Big Data, showed that the experts agreed it offers better performance than an RDBMS.

The architecture of the DBMS and of the data has the main effect on performance. Performance is the major requirement of all Big Data-based companies; at the same time, fast data storage and access is the major challenge for DBMS vendors, who keep researching it to earn customers' confidence. In the survey, 40% of respondents strongly agreed that NoSQL performs well when dealing with Big Data, with a further 34% agreeing.

Organizations with considerably high data-storage needs are considering NoSQL seriously, and the demand for NoSQL database experts has also risen to a higher level in these developing organizations.

Overall, the results of the survey can be summarized as follows: more than 65% of the respondents agree that when it comes to dealing with Big Data, they have used NoSQL or it is their first choice. One reason that emerged from the survey answers deserves particular emphasis: the pre-defined schema of the RDBMS. It is a very strong characteristic that works very well in small to large databases dealing with structured, well-organized data. This kind of architecture is very popular and successful in OLTP environments such as banks, online retail shops, and university library systems; ERP systems also have pre-defined database architectures and do well with RDBMSs. However, when it comes to very large and huge databases, with data volumes of a petabyte and beyond and unstructured data (including text, files, images, and videos) all arriving massively, an RDBMS cannot handle them with high performance. The massive increase in data also requires scalability at the hardware level, which the DBMS should accept on the fly; the major characteristic of a NoSQL DBMS is that it has a dynamic schema and accepts any kind of data. As previous studies have shown, social media is growing very widely in terms of data and deals with all kinds of data, and all the social media systems use NoSQL in one way or another. The other top factors highlighted were scalability, performance when dealing with Big Data, and simplicity of maintaining the system.

5 Conclusion

The storage and processing requirements of applications such as Big Data analytics, business intelligence, and social networking, growing rapidly over petabyte datasets, have forced RDBMSs to their limits. This has directed the development of horizontally scalable, distributed non-relational databases, named NoSQL. The study outlines the primary usages of NoSQL databases: large-scale data processing (parallel processing over distributed systems); embedded IR (machine-to-machine data look-up and recovery); analytics on semi-structured data (at a professional level); and huge-capacity data storage (structured, semi-structured, unstructured). NoSQL is a huge and growing field; for the purposes of this study, we covered the characteristics (benefits and features of NoSQL DBMSs), the classification (the four categories with their features), and the comparison and assessment (with a table based on a few characteristics: strategy, integrity, attributes, distribution) of different kinds of NoSQL databases. The study has also shown the differences between RDBMS and NoSQL, along with the present state and the reasons for acceptance of NoSQL databases. It provides an independent understanding of the weaknesses and strengths of NoSQL databases in supporting applications that deal with large volumes of data. The study concludes that applications dealing with Big Data perform well in a NoSQL environment, although the requirements vary from solution to solution. NoSQL will keep emerging as the solution for Big Data analytics in the future.

5.1 Future Work

As technology changes rapidly, business relies increasingly on analytics. Analytics is based on data, and data is growing massively; fast processing of this data is a major need of every organization. This is forcing technology leaders to put more effort into making DBMSs more efficient at dealing with Big Data. NoSQL has fulfilled the requirement to some extent. However, challenges remain that need more effort from DBMS vendors to win customers' confidence, chief among them data security and better compatibility with the most widely available development systems. We hope this study helps users understand the different DBMS architectures, Big Data and its requirements, and how to deal with them by understanding their nature, and that it guides users in making decisions when choosing a DBMS according to their needs and the DBMS's capabilities.