
1 Introduction

The blockchain is a distributed data storage ledger with certain key features that rely heavily on cryptography. This structure, which is replicated over all nodes in a network (depending on the type of blockchain), is fundamentally a cryptographically linked chain of blocks, similar to a linked list data structure. Each block contains, along with its data, a hash of the previous block in the chain, all the way back to the genesis block (the first block), whose previous-hash field is 0.

This property underpins a key aspect of blockchain, immutability, in the following way. Assume the data in block \(n\) is tampered with. The hash of block \(n\) then changes, and because that hash is stored in block \(n+1\), block \(n+1\) must be updated and its own hash recomputed, and so on for every following block up to the latest one. Because creating a valid block in the network is deliberately made difficult (depending on the blockchain protocol), rewriting the chain in this way is impractical, which gives nodes an incentive to behave honestly and not tamper with the data.
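The linking argument above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a dictionary with hypothetical `data` and `prev_hash` fields as the block layout; real blockchains use richer block headers and protocol-specific rules.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0"  # assumption: the genesis block stores 0 as its previous hash


def block_hash(block):
    """Hash a block's data together with the hash of its predecessor."""
    payload = json.dumps(
        {"data": block["data"], "prev_hash": block["prev_hash"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items):
    """Create one block per item, each storing the hash of the block before it."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for data in items:
        block = {"data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose stored prev_hash no longer
    matches its predecessor, or None if every link is intact."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain(["tx-a", "tx-b", "tx-c", "tx-d"])
print(first_broken_link(chain))  # no tampering yet, so every link is intact

chain[1]["data"] = "tx-b (tampered)"  # tamper with block n = 1
# Block n + 1 = 2 still stores the old hash of block 1, so the break is found there.
print(first_broken_link(chain))
```

To hide the change, an attacker would have to rewrite the `prev_hash` of block 2, which changes the hash of block 2, and so on to the end of the chain.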

Blockchain also ensures privacy to an extent as the user identities are just cryptographic keys and not their information or credentials. Hence, the user’s personal details are not compromised in case of a breach. The blockchain is decentralized and so no single node manages the network and the blockchain itself. This fact eliminates any single point of failure issues that centralized systems such as most of the current Internet technologies face.

It is to be noted that a blockchain is not complete without the network architecture that buttresses it, just as in any distributed system.

These properties of decentralization, immutability, and privacy make it an attractive architecture to be used in various use cases.

As a consequence of the distributed nature of the blockchain architecture, several issues that pertain to standard distributed systems, such as consensus and consistency, also apply here. Because there is no single perfect solution to these issues, there is always scope for improvement in the technologies surrounding blockchain.

This paper is a systematic mapping study of the work done in the area of blockchain. Several attempts [4, 66] have been made to prepare such studies earlier, but they are now outdated owing to the rapid progress in the research of blockchain systems. Furthermore, they have included only the papers that are openly accessible. Mapping the published literature, we highlight the areas in blockchain that are actively being researched, and we also highlight the current and potential use cases of blockchain.

Further, Sect. 2 describes the research methodology that we have used in conducting our mapping study. We define the research questions that we attempt to answer and the motivation behind the same. In Sect. 3, we elaborate on basic publication-related information such as the year of publication and types of publications obtained after our search process. Section 4 answers the defined research questions. Finally, we present concluding remarks in Sect. 5.

2 Research Methodology

We have followed the standard procedure for a systematic mapping study, with minor changes as applicable, as defined in [46]. We have documented our entire search procedure along with the results online [51].

Fig. 1 Research methodology [46]

2.1 Research Questions

The first step is to identify the research questions that are to be answered with this systematic mapping study. The questions have been identified and elaborated as follows:

RQ1: How have publication amount, frequency, and research topics changed over time?

This question seeks to answer how the trends in blockchain research have changed over time, from its inception with Bitcoin. By answering it, we seek to identify the areas of blockchain research that are growing and the areas that are gradually receiving less attention (Fig. 1).

RQ2: What are the use cases of blockchain technology?

Blockchain is seeing applications in a wide variety of use cases beyond cryptocurrencies, for which it was designed initially, especially with the shift toward the decentralized Web. The answer to this question will outline the domains in which blockchain is being used, providing solutions to the problems in those areas.

RQ3: What are the areas of current research in blockchain?

While RQ2 focuses on the domains where blockchain is used, RQ3 aims to elaborate on enhancements and optimizations that are being made to blockchain architectures themselves. Blockchain architectures in the current scenario are not perfect and have many drawbacks in terms of scalability and transaction processing speed, among others, and this question addresses the work done toward improvement in these aspects.

RQ4: How is research on blockchain distributed geographically?

We aim to provide an estimate of the number of papers published in different countries. The answer to this research question enables one to study where in the world blockchain research is being carried out.

RQ5: What is the future research direction for blockchain?

This question aims to draw conclusions from the above questions and predicts where the research in blockchain is heading toward and the areas researchers are most likely to pursue. We also intend to point out some areas of research that were lacking attention at the onset of blockchain but are now significant areas of research.

2.2 Selection of Paper Sources

Our aim was to select popular sources that would contain the largest number of publications. IEEE Xplore, ACM Digital Library, SpringerOpen, and ScienceDirect were considered because they have extensive collections of papers in the blockchain domain. Springer, although as popular as the above databases, was not considered because searching for the keyword ‘blockchain’ resulted in only book chapters and irrelevant papers.

2.3 Conducting the Search

The second stage of the systematic mapping study is to form the search strings that are used for the search of the papers and to conduct the search. The search strings that we used to obtain results from different sources can be found at [51].

In our database search process, only those publications having the keyword ‘blockchain’ in their title or keywords section were selected. This excludes papers that do not have the word in either field but refer to blockchain in the body of the paper.
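The selection rule can be sketched as a simple filter. The record layout (`title` and `keywords` fields) is a hypothetical one chosen for illustration; the actual search strings used for each database are listed in [51].

```python
def matches_search(record, term="blockchain"):
    """Keep a record only if the term appears in its title or in its keywords.

    The abstract and body are deliberately ignored, mirroring the rule above.
    """
    title = record.get("title", "").lower()
    keywords = [k.lower() for k in record.get("keywords", [])]
    return term in title or any(term in k for k in keywords)


records = [
    {"title": "A Blockchain-Based Access Control Scheme", "keywords": ["IoT"]},
    {"title": "Distributed Ledgers for Energy Trading", "keywords": ["Blockchain", "smart grid"]},
    {"title": "Consensus in Distributed Ledgers", "keywords": ["Byzantine fault tolerance"]},
]

selected = [r for r in records if matches_search(r)]
print(len(selected))  # the third record is dropped even if its body discusses blockchain
```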

2.4 Screening of Relevant Papers

We then applied filters on the paper databases to extract only journal and conference papers. We selected only those sources that are peer-reviewed. Following the database filtering, we screened the papers first based on their titles and then abstracts and further excluded/included papers based on a set of inclusion and exclusion criteria:

Exclusion Criteria:

  • Review/summary/secondary papers—these papers do not pertain to the scope of a systematic mapping study and hence were removed

  • Book chapters/keynotes/case studies/work-in-progress/news articles papers—for the same reason as above

  • Duplicate papers

  • Papers where blockchain is not the main area of focus

  • Papers that focus on an economic/financial point of view

  • Papers that address issues pertaining to a specific country and do not focus on generic issues

  • Papers not written in the English language

  • Commentaries/news.

Inclusion Criteria:

  • Papers that introduce novelty in blockchain

  • Papers that explained the use of blockchain in other domains

  • Papers that enhance blockchain.

Due to the large number of papers published on blockchain, we had to apply rigorous filtering criteria to reduce the number of papers for efficient classification. It is possible that we have missed some relevant papers.

Table 1 shows the number of research papers that we considered initially and the number of papers left after applying each of the corresponding filtering criteria.

Table 1 Keyword search results

2.5 Data Extraction and Mapping

After filtering relevant and vital papers, we have identified categories that each of the papers belongs to, by reading their abstracts. The list of these classifications along with the papers that belong to these classifications is listed in [51]. There are a total of 123 categories that we have identified. We have classified the chosen 604 primary papers under these 123 categories. Several papers were found to belong to multiple categories. Compared to [66], which has classified a total of 41 primary papers into 14 classifications, our mapping study is done on a much larger scale owing to the rapid increase in research on blockchain technologies.

3 Publication Statistics

The papers obtained after applying all filtering criteria were analyzed to draw the following inferences.

3.1 Search and Selection Results

The search string, formed from the relevant keywords and the exclusion criteria, was used to search the various publication sites. The results were obtained with details such as title, abstract, keywords, authors, and citations. We began with title screening. We excluded several papers marked as Demos and Tutorials. We also found several single-page papers without any significant contributions or citations and removed them. We then went through the abstract of each paper that cleared the title screening phase and identified review papers, secondary papers such as literature surveys, and papers that do not focus on blockchain. We removed such papers from consideration. The first phase of the classification of papers was carried out along with the abstract screening phase. After the title screening, abstract screening, and duplicate removal phases, we had a total of 604 primary papers to consider for this systematic mapping study.

3.2 Publication Year

Blockchain research started gaining popularity in 2015 (according to publication searches). Between 2015 and 2018, the number of papers in the domain of blockchain rose significantly. One cause of this rise is the advent of newer blockchain technologies that are more capable than Bitcoin [42] in aspects such as transaction speed, storage, and the ability to execute code.

Fig. 2 Year of publication

Fig. 3 Papers classified according to publication type

With improvements in these aspects, the scope of blockchain has broadened, and today blockchain is used in several areas beyond finance, such as health care, supply chain, edge computing, and education. One notable example is the Ethereum blockchain [63]. Ethereum was proposed in 2014 and brought with it the ability to execute user-defined Turing-complete code, called smart contracts. With this, Ethereum found its way into several Internet of Things (IoT)-based applications. Smart contracts were then utilized in several other domains such as health care, energy markets, and vehicular networks.

Figure 2 gives a count of the publications with respect to the year of publication after applying all the filtering criteria.

3.3 Publication Type

These numbers were obtained after the application of all the filtering criteria. Figure 3 shows the count of papers obtained from the different publication types. Papers presented in workshops and symposiums have been included under conference.

4 Discussion

Based on the results of our search, we answer the research questions posed above.

4.1 RQ1: How Have Publication Amount, Frequency, and Research Topics Changed over Time?

Figure 2 shows the rapid growth in number of publications related to blockchain technology over time. We analyzed the relative percentages of research areas in each year, i.e., how much of the total research in a particular year is conducted in different research fields. In the earlier mapping study on blockchain technologies [66], most of the research was focused on the enhancement of Bitcoin. Around 80.5% of all the papers studied in [66] have focused on Bitcoin. However, the research scenario is largely different today, and the applications of blockchain have diversified.
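The per-year relative percentages can be computed along the following lines. This is a sketch under an assumption: because a paper may belong to several categories, shares are taken over category assignments within a year rather than over distinct papers. The input records are hypothetical, not data from this study.

```python
from collections import Counter, defaultdict


def yearly_area_shares(papers):
    """papers: iterable of (year, [research areas]) pairs.

    Returns {year: {area: percentage of that year's category assignments}}.
    """
    counts = defaultdict(Counter)
    for year, areas in papers:
        for area in areas:
            counts[year][area] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {area: 100.0 * n / total for area, n in counter.items()}
    return shares


# Hypothetical records for illustration only.
papers = [
    (2016, ["Bitcoin"]),
    (2016, ["IoT", "Storage"]),
    (2016, ["IoT"]),
    (2017, ["Energy trading"]),
]
print(yearly_area_shares(papers)[2016])
```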

Fig. 4 Distribution of research areas in blockchain

Figure 4 shows a pie chart of relative percentages of publications classified under each of the research areas. The complete data and list of all the categories or research areas are published in [51]. Compared to [66], this represents a wide range of research possibilities. We also analyzed how blockchain research areas have evolved over time.

Fig. 5 Distribution of research areas in blockchain, 2016

Fig. 6 Distribution of research areas in blockchain, 2017

Fig. 7 Distribution of research areas in blockchain, 2018

Figures 5, 6, and 7 represent the evolution of research areas over time. Many inferences about the evolution of research in blockchain can be drawn from the data presented in these charts. Bitcoin was the major area of research at the onset of blockchain research, accounting for 80.5% of all publications [66]. In 2016, Bitcoin accounted for 10.6% of all blockchain research, owing to a boom in the potential applications of blockchain. This percentage fell to 2.6% in 2017 and is less than 1% in 2018. This not only shows a shift of attention away from Bitcoin-oriented research but also points to the increased effort put into generalizing the applications of blockchain. Another interesting result is the evolution of the application of blockchain in the IoT. Reference [9] summarized different methods of adopting blockchain in the IoT domain, and the number of publications has gone up ever since. The application of blockchain in IoT accounted for 8.5% of the total blockchain research in 2016. In 2018, its share increased to 11.3%, even as the total number of papers published grew. This shows that the combination of blockchain and IoT is an area that researchers are actively looking at. The number of research areas has also risen significantly since 2016, as shown by the number of segments in the pie charts (Fig. 4). Many new areas, such as energy trading on smart grids, occupy a significant portion of research today.

4.2 RQ2: What Are the Use Cases of Blockchain Technology?

Among the various areas where blockchain is used today, IoT stands out (11.6% of the papers), followed by storage solutions (7.1%). Other prominent domains include health care (6.6%), energy trading (5.9%), and smart grids (5.7%). Blockchain aids applications in these domains by enhancing fundamental features, which include authenticity, security, data integrity, data immutability, data privacy, data provenance, and data ownership, among many others.

We have chosen some papers in certain important domains, and we provide a gist of the research conducted in each of those domains. All the papers mentioned in this section are referred to by their IDs in [51].

A significant amount of research is being conducted on the combination of IoT and blockchain, paving the way to many possibilities. In our study, we found 82 papers that deal directly with this combination. [P160] [44] shows how blockchain can be used for access management in IoT scenarios. While centralized methods exist, their scalability is limited. Blockchain, with its distributed access control system for IoT, appears to provide a better alternative. [P219] [22] uses blockchain to build trust among consumers so that they can trade their smart devices’ data for incentives. The authors describe how an IoT device can be sold to a user by proving the device’s integrity without a third party, and they provide a secure method of sharing the user’s data. However, blockchain remains computationally expensive, and [P561] [12] tackles this problem by introducing an optimized blockchain. The main idea is to create an overlay network of high-resource devices to handle the blockchain’s operations while still providing end-to-end security and privacy to the low-resource IoT devices.

Smart grids are vital in today’s world to increase the distribution of locally produced energy, most of which is renewable. However, a centralized architecture often poses issues of reliability and privacy or anonymity. In our study, we found a total of 43 papers dealing directly with energy trading through blockchain transactions. A blockchain-based approach to energy trading has been proposed by [P505] [3]. [P196] [41] offers a solution based on blockchain and smart contracts that handles the distribution of energy from multiple sources and allows payments to be handled securely.

Several blockchain-based solutions have been proposed to tackle problems related to data storage on the cloud. [P461] [33] introduces ProvChain which is an architecture to embed provenance data into blockchain transactions. Most of the storage solutions use blockchain architecture as middleware between users of the data and the data itself to enhance several features of the system including privacy, security, data provenance, auditing capabilities, anonymity, and so on. Reference [69] was one of the early papers to give details on the way a blockchain layer can provide enhanced privacy in storage systems. In our study, we have collected 51 papers that utilize blockchain as a storage solution.

Edge computing is used to offload the computation required for mining from mobile devices onto edge devices, enabling mobile systems to participate in the blockchain network [P376] [35]. There are also pricing schemes designed for edge service providers (ESPs) [P429] [64]. Edge nodes also make use of distributed control systems, which in turn have function blocks as their main component. Smart contracts are used to implement these function blocks [P141] [57]. Blockchain is also used for trusted data sharing between edge nodes. There are also proposals to reduce the work done in mining by replacing proof of work with proof of collaboration, catering to the limited computational and storage resources of edge devices.

Blockchains can also replace a traditional certificate authority (CA), as proposed by [P22] [65]. Additional X.509 certificate extensions are proposed that enable smart contracts to handle the tasks of a CA, such as issuing, storing, validating, and revoking certificates. A particular application of blockchain to the verification of certificates for SSL/TLS-secured communication is proposed in [P212] [8]. A distributed public key infrastructure (PKI) is proposed in [P211] [49], which supports a distributed certificate library where the miners in the blockchain environment act as CAs, ensuring the correctness of certificates. Smart contracts have also been used to implement a dynamic trust protocol in PKIs [P592] [2]. [P338] [59] proposes a novel approach of creating a cloud-based PKI using blockchain, where certificate issuing is done on the cloud and the blockchain is used to record the issued certificates.

We came across several unique use cases of blockchain during our studies. For example, [P286] [50] deals with mixed reality applications. [P276] [62] uses blockchain in a transaction processing system. [P201] [18] uses blockchain in a video surveillance system. [P266] [43], [P360] [26], and [P93] [55] use blockchain to collect and analyze data related to pollution. [P477] [37] and [P486] [25] use blockchain as a reviews framework. [P155] [52] uses blockchain to record work history of employees and aids in corporate management. [P138] [48], [P388] [23], [P467] [67], [P77] [34], and [P95] [24] deal with electric vehicles and charging stations. [P413] [20], [P512] [14], and [P521] [53] use blockchain as a framework to enhance Information Technology Operations (Ops). A list of all the categories and the corresponding papers for each category can be found at [51].

4.3 RQ3: What Are the Areas of Current Research in Blockchain Technology?

While blockchain is used in many domains as seen above, there has been intense research going on to enhance blockchain technology itself. In our study, we identified several papers that deal exclusively with generic blockchain. These papers do not narrow down on a specific use case or a domain but focus on features such as blockchain’s storage scalability, transaction scalability, consensus protocols, formal analysis of blockchain, and so on. Figure 8 shows relative percentages of papers according to their primary focus.

Fig. 8 Distribution of research on generic blockchain

We found a total of 26 papers dealing with consensus algorithms. In a decentralized system, algorithms are needed to reach consensus among the participating nodes. Bitcoin, one of the first useful implementations of blockchain, uses the proof of work consensus algorithm. However, this algorithm is computationally expensive and faces several security threats. Efforts have been made to devise new consensus algorithms such as proof of luck, proof of stake, and proof of trust, and to improve existing consensus algorithms. [P453] [40] describes the proof of luck method of consensus in trusted execution environments. The idea is to use a random number generator to pick a consensus leader, offering equitably distributed mining with lower latency and energy consumption. [P52] [54] attempts to address the 51% majority attack on the Bitcoin network by proposing a modified proof of work consensus algorithm.
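To make the cost of proof of work concrete, the following toy sketch searches for a nonce whose hash meets a difficulty target. It is a simplification under stated assumptions: a single SHA-256 over a hypothetical `data:nonce` string and a target expressed as a number of leading zero hex digits. Bitcoin's actual header format and target encoding differ.

```python
import hashlib


def pow_digest(block_data, nonce):
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}:{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty):
    """Try nonces in order until the digest starts with `difficulty` zero hex digits.

    Each extra digit multiplies the expected number of attempts by 16,
    which is where the computational expense comes from.
    """
    prefix = "0" * difficulty
    nonce = 0
    while not pow_digest(block_data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(block_data, nonce, difficulty):
    """Checking a solution takes a single hash, however long mining took."""
    return pow_digest(block_data, nonce).startswith("0" * difficulty)


nonce = mine("block 42", difficulty=3)
print(nonce, verify("block 42", nonce, 3))
```

The asymmetry between `mine` and `verify` is the point of the scheme: producing a block is expensive, while any node can check it cheaply.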

[P252] [61], [P429] [64], and [P610] [68] model blockchain from a game-theoretic perspective to prove the robustness and security features of some blockchain-based solutions. [P252] [61] models its blockchain-based content caching system as a Chinese restaurant game and analyzes the Nash equilibrium of the game. [P429] [64] models the edge computing service provider as a Stackelberg game. Few studies provide rigorous mathematical proofs of the security and safety properties that blockchain-based solutions claim to offer. [P335] [15] and [P336] [1] focus on formal verification of these properties. [P416] [10] and [P516] [45] discuss alternative programming languages that can be used on blockchain. Currently, Ethereum uses Solidity as the primary programming language for writing smart contracts [11]. [P416] [10] proposes a language named Obsidian, which the authors claim to be safer than Solidity. [P516] [45] proposes Simplicity, a typed, combinator-based, functional language without loops and recursion, to be used in blockchain-based applications.

Ethereum is one of the most widely used platforms not just by decentralized application developers but also researchers to test their ideas. [P313] [5] proposes a query language specific to the Ethereum blockchain based on SQL. This was proposed to extract transaction and block details in the Ethereum blockchain and filter them based on transaction details within the block. Ethereum-specific research is also done in [P394] [21] where a tool is proposed to analyze Ethereum smart contracts for out-of-gas vulnerabilities wherein a smart contract’s balance is locked permanently if it terminates abruptly when it runs out of gas, and this abrupt abortion is not handled properly.

In addition to this, several papers are dealing with issues of storage scalability, computational requirements, and faster and scalable consensus algorithms.

4.4 RQ4: How Is Research on Blockchain Distributed Geographically?

Figure 9 compares the countries in which blockchain-based research papers have originated. The research is carried out predominantly by universities and industries in China (23.9%) and the USA (13.6%), while every other country contributes less than 5%.

Fig. 9 Research papers classified according to the country

China seems to be the current hot spot for blockchain research. Blockchain research in China is encouraged by several dominant institutions including the Communist Party, Central Bank, Supreme People’s Court, and the Bank of China [16]. Several strategic reasons for significant blockchain research in China have been described in [16].

4.5 RQ5: What Are the Possible Directions for Future Blockchain Research?

The previous research question focused on the current research areas in blockchain. We identify some possible future research areas related to blockchain by identifying the most popular fields that researchers are working on by conducting a search for literature in very recent years (2017 and 2018). This section reviews these research areas by providing a comprehensive overview of the identified fields.

Scalability of blockchains has been a burgeoning area of research interest. In the initial stages of blockchain, which was popularized by Bitcoin [42], consensus algorithms like proof of work focused on scalability with respect to the number of nodes. However, as the number of transactions on the Bitcoin network increased, considerable work began on increasing its throughput. Some early work includes Bitcoin-NG [17], which uses proof of work to elect a leader and allows the leader to keep adding transactions, in microblocks, during the period between mining events. The GHOST rule [56] proposed a new conflict resolution method in proof of work mining that makes it safer to increase the block mining frequency, thereby increasing scalability in terms of the number of transactions. Another method, introduced by [30], replaces the linear chain of blocks with a directed acyclic graph (DAG) so that all mined blocks can be included in the log if they are not conflicting.
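The difference between the longest-chain rule and a GHOST-style rule can be sketched on a small block tree. This is an illustrative simplification, not the exact procedure of [56]: the tree is given as a hypothetical `children` mapping, every block has equal weight, and ties are broken by list order.

```python
def subtree_size(children, block):
    """Number of blocks in the subtree rooted at `block`, including itself."""
    return 1 + sum(subtree_size(children, c) for c in children.get(block, []))


def chain_depth(children, block):
    """Length of the longest chain starting at `block`."""
    return 1 + max((chain_depth(children, c) for c in children.get(block, [])), default=0)


def ghost_tip(children, genesis):
    """At every fork, step into the child whose subtree contains the most blocks."""
    block = genesis
    while children.get(block):
        block = max(children[block], key=lambda c: subtree_size(children, c))
    return block


def longest_chain_tip(children, genesis):
    """At every fork, step into the child that heads the longest chain."""
    block = genesis
    while children.get(block):
        block = max(children[block], key=lambda c: chain_depth(children, c))
    return block


# Hypothetical tree: branch A is a long thin chain, branch B is shorter but
# has more total blocks because of forks mined in parallel.
children = {
    "G": ["A", "B"],
    "A": ["A1"], "A1": ["A2"], "A2": ["A3"],
    "B": ["B1", "B2", "B3"], "B1": ["B1a"],
}
print(longest_chain_tip(children, "G"))  # follows branch A
print(ghost_tip(children, "G"))          # follows branch B
```

Under the subtree-weight rule, blocks that lose a fork still count toward the weight of their ancestors, which is why faster block production wastes less honest mining power.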

The Bitcoin Lightning Network [47] proposes the creation of micropayment channels between two concerned parties to increase scalability by deferring the broadcast of transactions to the rest of the blockchain network. Sharding of blockchain has also been proposed as a solution to the scalability problem. Sharding involves different nodes handling different subsets of the blockchain. ELASTICO [36] proposed a sharding algorithm that works in the presence of Byzantine failures. Sharding increases the throughput of the network linearly with respect to the computational power of the network. However, sharding reduces the security provided by the system, because it reduces the number of attackers required to introduce compromised data into the blockchain network. OmniLedger [28] proposes solutions to maintain security in a sharded blockchain. Recent solutions include using inspector nodes [7] to reshuffle validator nodes only when required, which reduces reshuffling overhead. PolyShard [31] claims to introduce scalability in terms of security, storage efficiency, and throughput by using a ‘polynomially coded sharding’ scheme.
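The basic idea of sharding, in which each group of nodes handles only a subset of the transactions, can be sketched with a hash-based assignment. This is a generic illustration and not the algorithm of ELASTICO, OmniLedger, or PolyShard; the transaction identifiers and the function names are hypothetical.

```python
import hashlib


def shard_of(tx_id, num_shards):
    """Map a transaction identifier to a shard deterministically via its hash."""
    digest = hashlib.sha256(tx_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(tx_ids, num_shards):
    """Split transactions into per-shard lists that can be processed in parallel."""
    shards = [[] for _ in range(num_shards)]
    for tx_id in tx_ids:
        shards[shard_of(tx_id, num_shards)].append(tx_id)
    return shards


tx_ids = [f"tx-{i}" for i in range(1000)]
shards = partition(tx_ids, num_shards=4)
print([len(s) for s in shards])  # roughly even split across the four shards
```

Each shard validates only its own list, which is the source of the throughput gain. The security cost noted above follows from the same split: an attacker needs to control a large share of one shard's nodes rather than of the whole network.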

Another area of potential future research is the design of consensus algorithms. A change in consensus algorithm can also result in a more scalable blockchain. Reference [58] details how expensive consensus mechanisms are not needed in permissioned systems, which allows the use of consensus mechanisms that are known to be scalable in terms of performance. Vukolić [60] compares proof of work and BFT protocols with respect to scalability. Many blockchain solutions use Byzantine fault-tolerant (BFT) [29] protocols to construct scalable blockchains. However, such blockchains are not scalable with respect to the number of nodes because of the large number of messages exchanged between nodes. Some work has been done to reduce communication overheads in BFT. Many BFT protocols modeled after practical Byzantine fault tolerance [6] have been used as consensus protocols in blockchain. The Stellar Consensus Protocol [38] introduces Federated Byzantine Agreement (FBA), removing the need for nodes to presuppose a unanimously accepted membership list. Algorand [19] proposes a novel Byzantine Agreement (BA) protocol to reach consensus among users. The core of Algorand is a protocol called BA* that scales to many users, offers reduced latency, and enforces a strict safety rule by ensuring that there are no forks in the blockchain. Ouroboros [27] is a proof of stake-based consensus algorithm for which the authors have proved that honest behavior is a Nash equilibrium, which implies that attacks are quickly neutralized.
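The scaling limitation of BFT protocols can be made concrete with a back-of-the-envelope sketch. It relies on two assumptions: the well-known bound for PBFT-style protocols that n nodes tolerate f Byzantine faults when n ≥ 3f + 1, and a simplified cost model in which a voting round has every node send one message to every other node. Real protocols add further phases and optimizations.

```python
def max_faulty(n):
    """Largest f satisfying n >= 3f + 1, the fault bound of PBFT-style protocols."""
    return (n - 1) // 3


def all_to_all_messages(n):
    """Messages in one round where each of n nodes sends to the other n - 1 nodes."""
    return n * (n - 1)


for n in (4, 16, 64, 256):
    print(n, max_faulty(n), all_to_all_messages(n))
```

The number of tolerated faults grows linearly with n, while the message count per round grows quadratically, which is why such protocols suit small or permissioned networks better than large open ones.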

The combination of blockchain with IoT, as suggested by our search results, is considered one of the most promising fields to work in. Reference [9] has summarized the usability of blockchain in IoT. Research on blockchain with IoT parallels research on decentralized smart energy grids. Some solutions enable peer-to-peer energy trading among devices in an industrial IoT setup. Reference [32] exploits a consortium blockchain to provide a secure energy trading mechanism. Energy markets in which locally produced renewable energy can be traded without an intermediary have been proposed and tested [39]. Blockchains are also being used for communication between smart home devices [13]. An appropriate combination of blockchain, multi-signatures, and anonymous encrypted message propagation schemes can be used to build a decentralized smart energy grid system that provides increased security and privacy in comparison with centralized systems [3].

5 Conclusions

This paper aims to provide an overall idea of the domains where blockchain has been used to resolve existing issues or provide new innovative solutions. Through our screening process, we selected a set of 604 primary papers to conduct our mapping study. We have provided a broad classification of the areas under which work done can be classified. We have included statistical data regarding the number of papers published in each category, type of publication (Journal, Conference), and country of origin of the research. We included the year-wise distribution of various domains of blockchain research to better understand the change in research over the years. To further understand the research, we selected a few popular papers under the significant classifications and provided a gist about the type of work being carried out in the domain. We attempted to answer all the research questions that have been formulated. We have documented the entire process and published our results online for verification [51].

The frequency at which papers are being published is very high. The same search conducted a few weeks after the publication of this paper may lead to different results than what we have obtained. This is an inevitable drawback. However, we believe that this study will give researchers, both experienced and new, an idea about the work done so far.