Keywords

1 Introduction

The database technologies have been evolving at a very fast pace. Database systems are moving from SQL to NoSQL systems to handle datasets of diverse categories. Most NoSQL systems have been developed with the general goal of offering simple operations on flexible data structures. Indeed, they are focused to provide massive throughput and high scalability [1]. NoSQL databases support large data volumes, simple and easy copy, eventual consistency and simple API. NoSQL databases are becoming the core phenomena for big data applications. NoSQL databases are broadly classified into three categories: document stores databases, key-value stores databases and column-oriented databases [2]. Today most used document-oriented databases are MongoDB and CouchDB. This paper explores CouchDB documents, databases and views in detail and compares with traditional RDBMS basic architecture. Document-oriented databases provide the alternative of a ‘row’ of traditional RDBMS in the form of a more flexible ‘document’. They permit access to data values of database via its content. Documents are hierarchical structures that can be in BSON, JSON, or other formats like XML and therefore consist of more composite elements such as collections, maps or scalar values [3]. There is no need to define schema beforehand means there is a flexibility of defining the size/type of key/value fields of documents on the runtime. This makes it easier to perform operations like insertion and deletion of fields whenever required. Each document is collection of set of fields and is associated with a unique identifier, for indexing and querying purpose.

2 CouchDB Experimental Setup

Developed and maintained by the Apache Software Foundation, CouchDB is an admired and easy to use choice among NoSQL Document-Oriented databases. CouchDB is developed in Erlang language. CouchDB’s RESTful APIs make it special because they allow database access through HTTP requests [4]. The software required for the CouchDB setup is:

  • CouchDB Version 1.6.1

  • Web-Based Interface Futon

  • Command Line Utility Curl.

CouchDB makes use of APIs by using the command-line utility curl. A very interesting fact is that users are provided insights of the database and great control on HTTP requests by curl. Suppose CouchDB is running and the following request is sent to database via curl

  • curl http://127.0.0.1:5984 [ 5 ]

The above curl command will seek an http GET from CouchDB and will return the detailed JSON object as follows:

  • {″couchdb″:″Welcome″,″uuid″:″6c200d933f15fda6543e960e874796a8″,″version″:″1.6.1″,″vendor″:{″version″:″1.6.1″,″name″:″The Apache Software Foundation″}}

CouchDB is giving a welcome message with the running version number. Another way to interact with CouchDB is using Futon. It provides default web interface for administration of CouchDB. A full access to features of CouchDB is provided more conveniently by Futon. It provides us the facility to create and remove databases; create, delete, view and modify documents; replicate databases; and compose and run MapReduce views. To load Futon the following link is used in the browser application.

  • http://127.0.0.1:5984/_utils/ [ 5 ].

The remaining sections explain the other distinctive features of CouchDB experimentally under above set up.

3 CouchDB Features

3.1 Data Model

CouchDB is composed of databases. Databases are a further collection of variable- schema documents. As opposed to this, RDBMSs are a collection of fixed schema relations which are composed of rows. CouchDB stores JSON (Java Script Object Notation) documents in a binary format. The documents stored fields in the form of key-value pairs. Every document is identified by a unique field “_id”. This “_id” may be automatically generated by CouchDB or manually provided by the user. There is no maximum limit for key-value pairs and size of documents. The file extension of database files is .couch. The data model is shown in Fig. 1 [4].

Fig. 1
figure 1

CouchDB data model

3.2 RESTful API

One of the very outstanding features of CouchDB is its RESTful API. REST architecture explains how to provide services for interaction among machines. REST facilitates the use of HTTP requests for carrying out basic Create, Read, Update and Delete operations (CRUD) on documents [4]. Following HTTP methods are used for performing basic CRUD operations-

  • PUT—Create databases/documents/attachments

  • POST—Update databases/documents/attachments

  • GET—Read databases/documents/attachments

  • DELETE—Delete databases/documents/attachments

RESTful architecture gives unique identifiers to databases/documents/attachments etc. in the form of URIs. Suppose a database named “university” is being created on a local CouchDB setup. Using the command-line utility curl following statement will do the needful:

  • curl -X PUT http://127.0.0.1:5984/university

The success of the above statement in curl will be replied back by CouchDB in the form of a JSON document, e.g. the creation of database named “university” is notified by:

  • {″ok″: true}

All other basic operations on documents (CRUD) can be performed with similar curl statements.

3.3 Documents

Documents are the fundamental data structure of CouchDB. It is just like a physical document—such as an invoice, an address card, or an invitation card [6]. Documents use the dynamic schema as opposed to relations of RDBMS which use fixed schema. The document contains unique _id and _rev field showing id and revision number of document. Documents are stored by CouchDB using the JSON format in the form of key-value pair. The example below shows one such document of PhD student of the “university” database. “_id” field uniquely identifies the document in the “university” database and “_rev” field shows the version number of the document. Every time a document is updated a new version of the document is created and older versions also remain on the disk. The example below shows the 7th version of the document shown.

{        ″_id″: ″ae288eecf814e1e177e94263870031e7″,        ″_rev″: ″7-de540398f44262de6a80b0d67d72df9d″,        ″name″: ″Reet″,        ″age″: 25,        ″doj″: ″15/06/2015″,        ″branch″: ″Electronics″,        ″topic″: ″Wireless Communication″,        ″supervisor″: ″Deepak″,        ″course″: ″Phd″,        ″Marks″: {                 ″Digital″: 88,                 ″MicroProcessor″: 79,                 ″Analog″: 62        } }

The example database “university” contains some other documents containing details of UG, PG and PhD students having different schemas as well as a design document “phd_course” implementing permanent views. A design document is identified by word “_design”. Figure 2 shows the snapshot of documents of “university” database from Futon.

Fig. 2
figure 2

Preview of documents of “university” database

3.4 Revisions

CouchDB supports well-built versioning system. It means that when some changes are carried out by user in a document, the next version is created by the CouchDB and it does not overwrite the existing document. CouchDB keeps all versions so that user can later on track earlier amendments. This feature makes it highly available database because this feature is inherent to CouchDB as opposed to RDBMS which does not offer this feature so conveniently [4].

3.5 Scaling and Replication

Replicating databases in CouchDB allows easy replication of databases. The databases can be kept in synchronization very easily by the following HTTP request for database replication:

  • POST/replicateHTTP/1.1{″source″:″old_database″,″target″:″http://www.new/database.com/db ″}

Replication can be performed from the Futon even more easily. Scaling out means splitting the databases across multiple servers of a cluster. CouchDB does not support scaling inherently; instead, CouchDB Lounge9 application serves the purpose of scaling [4].

3.6 Attachments

Like e-mails support attachments, CouchDB also supports binary attachments to documents. Any kind of file can be attached to a document [4]. The example below shows the command to attach an image file to a document:

  • http://www.localhost:5984/cars/6e4485ed6c37255e54dg12357f18c8af/car_image.jpg

3.7 Querying

RDBMS normally use fixed schema means static data and dynamic queries [7]. On the other hand, CouchDB does the opposite. It uses the dynamic schema in the form of JSON documents. Views are used to query data in CouchDB. Views created are of two kinds: temporary and permanent Views. The outcome of MapReduce operations is demonstrated using Views [5]. Map operation carries out parallelism in CouchDB which is its inherent feature as opposed to RDBMS which carries out operations on data in serial order [7]. The user writes statements for Map operations. Each document is traversed by Map function parallel such that the documents with matching criteria only are emitted using emit () function in the form of the key-value pair [4]. Section 3 explains this concept in more detail.

3.8 Indexing

Indexing is required for faster retrieval and better performance of a database. CouchDB inherently provides auto index updating feature. Map functions traverse through all the documents when running for the first time on a data set [7]. The results are then used by CouchDB to make a B-Tree. This is the most suitable and performance oriented data structure for operations like retrieval, insertion, and deletion. On each modification, deletion of a document; CouchDB updates index B-tree automatically [4].

4 CouchDB Views

Views are an old concept which is being used in traditional RDBMSs for long times. There a view behaves like a window based on the original SQL table. It retrieves rows and columns on the basis of criteria and shows the results out of a real table. In CouchDB view are used to query the database and demonstrate the results of MapReduce functions. These functions offer enormous flexibility, being capable of fine tuning to alterations in the structure of document and indexes being automatically and parallel computed. Each document is traversed and passed as an argument to the Map functions. Documents with matching criteria are emitted as key/value pairs. Because of this flexibility, Views in CouchDB are created parallel and incrementally. Some functions carried out by Views are:

  • Filter the documents in the database

  • Retrieve data from the documents and show it in a particular order

  • Build indexes to find documents by any structure that resides in documents

  • Carry out all sorts of calculations on the data in the documents.

CouchDB uses two kinds of views for querying the database i.e. temporary view or adhoc view and permanent view or static view.

4.1 Temporary Views

Temporary views are also called adhoc views because their MapReduce functions are changed from Futon in the code section. In “view” drop down box select “Temporary View” to see this view. Emit () function in map function code is used to emit key-value pair. Emit function always takes two arguments i.e. key and value. A default temporary view emits null in key value and document structure in value field. Value field of default view show details of all items of document including _id and _rev field. Figure 3 shows the values of default temporary view of our example database, i.e. “university” database. The value field shows the attributes of all documents. It is clear that various documents have different schema, i.e. different key-value attribute pair. For example, Fig. 3 shows one document having field (name, doj, course, branch, age, marks) and another one having fields (name, doj, course, age, branch, topic, supervisor). These two documents have different schema and have been easily incorporated in same database without need of any kind of joins. This is how CouchDB is different from RDBMS as schema of documents may change on need.

Fig. 3
figure 3

Default temporary view of “university” database

We can change the code of map function to get results according to user query. Suppose from our example database, we want the names of students having age less than 22, and then code of map function would look like the one shown below.

     function(doc) {              if(doc.age&&doc.age   <    22 && doc.name)                        emit(doc.name , doc.age);  }

After we run this code, the result would look like as shown in Fig. 4.

Fig. 4
figure 4

Temporary view of name-age pair of “university” database

Permanent Views

Temporary views are not appropriate for use in production, as they are actually slow for any database with more than a few dozen documents. They can be used to experiment with view functions, but we must switch to a permanent view if it has to be used in an application. Permanent views are also called static views. Permanent views are created with the help of special type of documents called design document. Design documents specify these views in the map function. A single design document can have multiple views. We have created a json file named “design2.json” which contains id of the design document “phd_course” and the code of map functions of two view namely “supervisor_detail” and “marks_detail”. The file is shown below in Fig. 5.

Fig. 5
figure 5

Code of design document “phd_course” in “design2.json”

The below curl command shows how we created our design document “phd_course” with the help of our file “design2.json”. curl-X PUT http://127.0.0.1:5984/university/_design/phd_course-d@desktop/design2.json

Curl replies with the following JSON document means our design document “phd_course” has been successfully created.

{″ok″:true, ″id″:″_design/phd_course″, ″rev″:″ 1-ddde17668d269f5cfeb2aa3259e26b1 ″}

Futon shows our design document containing two views as shown in Fig. 6.

Fig. 6
figure 6

Design document “phd_course”

Now to see the result of our view, we select view name “marks_detail” from views drop down box. This view shows the marks of every student in each subject individually and rows sorted by the name of the students as shown in Fig. 7.

Fig. 7
figure 7

Result of permanentview “marks_detail”

CouchDB has given excellent powers to the user in the form of MapReduce functions to design views according to user queries. CouchDB, definitely has some unique features like this incremental MapReduce and indexing which makes it a better fit than traditional RDBMSs for dealing with large datasets.

5 Conclusion and Future Scope

CouchDB features such as dynamic schema, versioning system, and MapReduce view queries have been explained and compared in-line with traditional RDBMS in this paper. Command line utility Curl and web based utility Futon of CouchDB have been used to implement MapReduce operations of permanent views. Focus on present work is majorly on describing the architecture of CouchDB, key features and implementation of views using futon/curl and inline architectural comparison with RDBMS. The future scope includes using the parallelism of MapReduce on multiple servers and on large and diverse datasets with comparison statistics, graphs, and reports with RDBMS.