1 Introduction

Disasters have occurred countless times and have caused severe damage to nature and civilization (Starr and Van Wassenhove 2014). In 2015, 160 of the 344 disasters that struck worldwide occurred in the Asia-Pacific region (UNESCAP Report 2016). The region thus accounted for 47% of the disasters and for over 16,000 deaths (64% of the total worldwide deaths caused by disasters), more than double the destruction caused in 2014 (UNESCAP Report 2016). Therefore, it is imperative that we understand how to mitigate, prepare for and respond to disasters more effectively in the future.

Disasters can be broadly classified as natural or man-made. According to Van Wassenhove (2006), logistics operations account for 80% of the total money allocated for disaster relief operations. Day et al. (2012) have argued that nearly 40% of this allocation is wasted because there is too little time to perform data analysis and because efforts are duplicated. Akhtar et al. (2012), Li et al. (2016) and Papadopoulos et al. (2017) have made similar claims, noting that a lack of collaboration and coordination during disaster relief operations adversely affects the relationships between all the actors. Altay and Labonte (2014) and Aloysius et al. (2016) have strongly advocated that information processing is critical for decision making and coordination during humanitarian relief operations.

Data science and business analytics have made inroads into our day-to-day life, and advancements in this field have made it possible to uncover what has been hidden in large sets of data (Agarwal and Dhar 2014). According to Huang and Chaovalitwongse (2015), “big data” is an umbrella term for any collection of large and complex data sets that are difficult to store, process, analyze, and comprehend using traditional database or data processing tools. Pu and Kitsuregawa (2013) have shown that the National Science Foundation of the USA and the Japan Science and Technology Agency have laid emphasis on big data and disaster management. These two scientific institutions have outlined that integrating multiple data sources to obtain accurate information is a matter of concern in disaster management. There is no dearth of data, as is evident from the massive amount of remote sensing data made freely available under the NASA Open Government Initiative. Under this initiative, data is stored in archives, and one such archive, the Earth Science Data and Information System (ESDIS), holds about 7.5 PB of data (\(1\,\hbox {petabyte}= 1024\,\hbox {terabytes}\)) (Ramapriyan et al. 2013). It is expected that the amount of data generated by 2020 will be more than 40 ZB (\(1\,\hbox {zettabyte} = 10^{6} \, \hbox {petabytes}\)) (Song et al. 2016).
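
To put these figures in perspective, the conversions stated above can be applied directly. This is only an illustrative calculation based on the quoted numbers; because the petabyte is given in binary terms and the zettabyte in decimal terms, the results should be read as approximate.

\[ 7.5\,\hbox {PB} = 7.5 \times 1024\,\hbox {TB} = 7680\,\hbox {TB}, \qquad 40\,\hbox {ZB} = 40 \times 10^{6}\,\hbox {PB} = 4 \times 10^{7}\,\hbox {PB} \]

On these figures, the volume projected for 2020 is roughly \(4 \times 10^{7} / 7.5 \approx 5.3 \times 10^{6}\) times the size of the ESDIS archive.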

Most Asia-Pacific countries do not possess the capability to ascertain how much damage each place has suffered and where disaster relief operations should be channelled so that disaster management plans are executed effectively (UNESCAP Report 2016). However, the South Asia Satellite (launched in May 2017 by India) will benefit 8 countries in the South-East Asia region and will hopefully enhance the capability of these countries to manage disasters better. In Europe, on the other hand, the Emergency Response Coordination Center working under the European Commission’s Humanitarian Aid and Civil Protection Department acts as an emergency operation center equipped with information systems to streamline humanitarian operations during disasters (Ma and Zhang 2017). Similar operation centers have been established in various countries so that relief operations are managed efficiently and effectively. Large amounts of data are created in these operation centers, and the use of big data analytics has tremendous potential to improve humanitarian supply chain management (HSCM) practices (Prasad et al. 2016).

Since interest in the use of big data within the context of humanitarian supply chain management has been increasing (adapted from Liu and Yi 2017), we decided to undertake a literature review in this field. The research objectives of this study are: (a) to showcase the state of the art in the field of big data and its role in supporting humanitarian supply chain decisions, (b) to understand the context from the perspective of organizational theories and (c) to suggest further research directions in the domain of big data and humanitarian supply chain management.

This paper is organized as follows. Section 2 outlines the research methodology employed in undertaking the systematic literature review and showcases the state of the art for the selected topic of study. Section 3 is the discussion section and addresses the second and third research objectives.

2 Research methodology

In this section, we discuss the process and methodology followed in this study. The systematic literature review is based on the guidelines laid out by Tranfield et al. (2003) and Dubey et al. (2017a). The research methodology is broadly divided into three parts: (i) Planning the Review, (ii) Conducting the Review and (iii) Reporting the Review. We address these three aspects in the following sub-sections. Section 2.1 describes the details considered while planning the review and the manner in which the search was performed. In Sects. 2.2 and 2.3, we report the various aspects of big data and humanitarian supply chain.

2.1 Identification of literature

The source of literature for this study is the Scopus database (https://www.scopus.com), which is now the largest database of academic journals and conference proceedings. Scopus lists papers in four categories, namely (i) Life Sciences, (ii) Health Sciences, (iii) Physical Sciences and (iv) Social Sciences, allowing for cross-disciplinary research, and this was another reason for selecting Scopus for this study. Disasters by their very nature lend themselves well to multi-disciplinary research. As our topic spans all four of these categories, Scopus was a natural choice. There are other digital databases such as Web of Science, DBLP and WorldCat, but Scopus contains more academic journals than these databases.

Our study deals with the amalgamation of two independent concepts, one from the domain of information systems (big data) and the other from the domain of operations management (humanitarian supply chain). The keywords used to search the academic literature for these two concepts can be seen in Table 1.

Table 1 Keywords used in this study.

The search for each of these concepts was performed independently on Scopus, combining its keywords with the ‘or’ operator, and the search results for the two concepts were then merged using the ‘and’ operator. The search syntax can be seen in Table 2. To replicate this search on Scopus.com, this syntax can be copied and pasted into the advanced search section of that digital database. The number of documents returned will vary because the database is actively updated. The data for this study covers the period from the earliest indexed journal paper to the date on which the search was performed on the Scopus database (March 03, 2017). Thus, this data reflects the information that was available on March 03, 2017.

Table 2 Search syntax on Scopus.
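
The logic described above (‘or’ within each concept, ‘and’ between the two concepts) can be sketched as follows. This is a minimal illustration and not the syntax of Table 2: the keyword lists are hypothetical placeholders (the actual keywords are those of Table 1), and the use of the Scopus TITLE-ABS-KEY field code is an assumption, since the fields searched are not stated here.

```python
def or_group(keywords):
    """Join the keywords of one concept with OR, quoting each phrase."""
    return " OR ".join(f'"{k}"' for k in keywords)


def build_query(concept_a, concept_b, field="TITLE-ABS-KEY"):
    """Combine two OR-groups with AND, one field-restricted clause per concept."""
    return f"{field}({or_group(concept_a)}) AND {field}({or_group(concept_b)})"


# Hypothetical placeholder keywords; substitute the lists from Table 1.
big_data_terms = ["big data", "data analytics"]
hsc_terms = ["humanitarian supply chain", "disaster relief"]

query = build_query(big_data_terms, hsc_terms)
print(query)
```

Keeping each concept in its own clause mirrors the two independent searches: either clause can be run alone to obtain the per-concept hit counts before the two are intersected.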

Our search process, described in Fig. 1, resulted in 28 journal articles that we reviewed in this paper. In the first stage, we conducted a search for big data related keywords (see Table 1). This resulted in 49,778 hits. The second stage consisted of a separate search for HSC keywords, resulting in 21,937 hits. In the third stage, the intersection of the results from stage one and stage two was taken, which yielded 119 papers. In stage four, we considered only journal articles (38 papers) for the literature review and discarded the remaining document types: 70 conference proceedings, 7 conference reviews, 3 book chapters and 1 short survey paper. Of the 70 conference proceedings, 57 were from the field of Computer Science alone, so this outcome is heavily skewed. Except for the domain of Computer Science, researchers from other academic domains prefer to publish in journals rather than conferences (Hermenegildo 2012; Derntl 2014). For this reason we did not consider conference proceedings in this study. Also, only journal articles published in English were considered. In stage five, we manually went through each of the 38 papers to ascertain which could finally be considered for the literature review. We retained papers that addressed the domain of humanitarian supply chain management while highlighting the emergence of big data. Only 28 papers were found to be a good fit for this study. Fig. 2 shows that these 28 papers span the various academic domains. Since all of the academic domains have a balanced representation, our decision to consider only journal articles is justified. In Appendix A, the journals in which these 28 papers have been published are shown.
Appendix A shows no pattern and no particular journal in which the number of publications is high. Although there is a dedicated journal within the HSCM field, namely the Journal of Humanitarian Logistics and Supply Chain Management, only one paper that fits the search criteria was published in this journal, in 2016. However, Fig. 3 shows that the number of papers at the intersection of big data and HSCM is increasing. Appendix D lists the 28 journal papers that we have considered for this study.
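
The counts reported for the selection stages can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative and do not come from the study.

```python
# Counts reported in the text (Scopus search of March 03, 2017).
stage3_intersection = 119

by_document_type = {
    "journal article": 38,
    "conference proceeding": 70,
    "conference review": 7,
    "book chapter": 3,
    "short survey": 1,
}

# The document types should account for every paper in the intersection.
assert sum(by_document_type.values()) == stage3_intersection

stage4_journal_articles = by_document_type["journal article"]
stage5_retained = 28

# Journal articles dropped after manual screening in stage five.
excluded_after_reading = stage4_journal_articles - stage5_retained

# Share of conference proceedings that came from Computer Science.
cs_share_of_conference = 57 / by_document_type["conference proceeding"]

print(excluded_after_reading)
print(f"{cs_share_of_conference:.0%}")
```

The check confirms that the five document types sum to the 119 papers of stage three, that 10 journal articles were excluded during manual screening, and that about 81% of the conference proceedings were from Computer Science.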

Fig. 1 Stages in data selection (Source: Scopus Database, Mar 03, 2017). Source: Author’s compilation

Fig. 2 Spectrum of journal papers after stage 5. Source: Author’s compilation

Fig. 3 shows that this field has gained importance in academic discourse from 2015 onwards.

Fig. 3 No. of journal papers per year. Source: Author’s compilation

2.2 Classification of literature

The papers in this study could be classified on the basis of various organizational theories (Sarkis et al. 2011) or by using various building blocks (Gunasekaran and Spalanzani 2012), but we have used the classification scheme employed by Dubey et al. (2017a). This scheme encompasses the strengths and merits of the seminal works of Sarkis et al. (2011) and Gunasekaran and Spalanzani (2012), and has been inspired by the seminal works of Whetten (1989) and Sutton and Staw (1995). In Fig. 3, the classification scheme for the selected journal papers can be seen, and in Appendix B we have categorized the papers based on this scheme.

The literature has been broadly categorized into two classes, namely Theory Building and Application Based Research. In the Theory Building category, the focus has been on identifying papers that contribute to existing organizational theories by supporting, extending or even criticizing them. This category is further sub-divided into two approaches. Under the Rationalist Approach, papers have been selected based on their contribution to the theories and on how they advance the current state of research in this field through critical reviews. The other approach, Alternative Methods, revolves around conceptual frameworks that have been tested by empirical analysis or by case studies. The second class, Application Based Research, covers papers in which industry-focused research has been undertaken.

2.3 Understanding the concepts used in this study

Based on the literature review of 28 papers, various enablers as well as concerns of big data in HSCM have been identified that will lead to a better understanding of this field. These enablers and concerns have been listed keeping in view the fact that this study deals with how big data can aid humanitarian supply chain management. The papers that mention these enablers and concerns are shown in Appendix C.

The enablers for big data in humanitarian supply chain management are:

(a) Volume: This has been the most cited of the enablers. Disasters create chaotic environments which breed uncertainty for the survivors. In such a situation people tend to reach out to their near and dear ones, and this anxiety leads to the creation of a very large data set (Huang and Chaovalitwongse 2015; Zhan et al. 2016). The papers considered for this study have mentioned this attribute in the following ways (Refer to Appendix C): (1) Scale of data, (2) Stored data and (3) Continuous creation of data.

(b) Variety: Data is generated by many devices and in many formats. Broadly, the data can be classified as unstructured data, semi-structured data and structured data (Agarwal and Dhar 2014; Banerjee et al. 2016). During any humanitarian operation, there is a surge in the creation of data in all possible formats within these three broad categories. The papers have highlighted the following aspects (Refer to Appendix C): (1) Data variety and (2) Interoperability between the different data formats.

(c) Velocity: The rate at which data is generated during humanitarian operations can be very high. The aspects of velocity that have been considered in the papers we reviewed include (Refer to Appendix C): (1) Speed of data generation, (2) Data processing time and (3) Data transmission time.

(d) Veracity: Veracity refers to the accuracy and reliability of data (White 2012). The selected journal papers have highlighted this aspect in the following ways (Refer to Appendix C): (1) Quality of information, (2) Accuracy of data, (3) Reliability and (4) Accessibility.

(e) Organizational mindfulness: This is the capability of organizations that are highly reliable in their operations to avert potential accidents. Weick et al. (1999) list the five dimensions of organizational mindfulness as: (1) Preoccupation with failure, (2) Reluctance to simplify interpretations, (3) Sensitivity to operations, (4) Commitment to resilience and (5) Deference to expertise. These five dimensions are reflected in a few of the selected papers considered in this study, as presented in Appendix C.

The concerns identified for big data in humanitarian supply chain management are:

(a) Humanitarian logistics: Warehousing and delivery of essential commodities during disaster relief operations constitute humanitarian logistics (Van Wassenhove 2006; Boone et al. 2016; Chu et al. 2016; Hazen et al. 2016). The concerns identified during the review of the selected papers are (Refer to Appendix C): (1) Identification of logistics service providers, (2) Collaboration between agencies during humanitarian operations and (3) Response time of logistics agencies/organizations.

(b) Remote sensing: The capability to perform remote sensing monitoring before and after any disaster adds tremendous decision-making capability during HSCM. The concerns identified from the selected papers are (Refer to Appendix C): (1) Real time rendering of disaster location, (2) High-speed buffering mechanisms and (3) Large-scale 3D model visualization.

(c) Information security: Transmission of large sets of data on a continuous basis is a regular activity during disaster relief operations. The key aspects identified from the papers are (Refer to Appendix C): (1) Privacy and confidentiality of data, (2) Encryption, (3) Accountability and (4) Maintenance.

(d) Social media (SM): Although SM is a fairly new phenomenon, it plays a key role when disasters strike. The amount of data generated by SM during humanitarian operations can help relief organizations to streamline their efforts (Amaye et al. 2016). For example, Facebook lets its users declare themselves safe if they are near the disaster zone, and it also shares this information with various disaster relief organizations (TechCrunch 2017). This not only informs their near and dear ones but also serves as a tool for the aid agencies to estimate how many people are displaced. The concerns for SM identified from the selected papers are (Refer to Appendix C): (1) Diffusion of information by SM, (2) Validation of information and (3) Support for coordination and collaboration among agencies.

3 Discussion

The results obtained from Scopus are discussed in this section, and we present our understanding in the following three sub-sections.

3.1 Theoretical contributions and future research directions

This section focusses on addressing the second and third research objectives. We have identified the research gaps and future research directions based on the various organizational theories. We consulted the work of Sarkis et al. (2011) and Dubey et al. (2017b) in identifying the organizational theories that could be considered for this study. The organizational theories we see as potential sources of research questions, their short synopses, and our proposed research directions are presented in Table 3.

Table 3 Key organizational theories and future research directions.

3.2 Managerial implications

The field of data science has gathered momentum, and the use of big data analytics can provide answers to some of the strategic as well as tactical questions which were ignored until now (Ji-fan Ren et al. 2016). The analysis of unstructured data is the biggest innovation in HSCM. The amount of unstructured data generated during a disaster is immense, and all of it is generated within a short span of time. One source of unstructured data is social media. Sentiment analysis of this unstructured data can assist in the execution of humanitarian relief operations. Furthermore, cognitive systems like IBM Watson have shown that machine-machine interaction to analyze large sets of data is going to be a reality in the near future (Chen et al. 2016). Until then, however, man-machine interaction is widely employed to better understand unstructured data. Thus, trained manpower in the domain of big data in HSCM will help in mitigating the limitations of this field.

3.3 Limitations of this study

First, in our search we concentrated only on journal papers and excluded books and conference proceedings. It is possible that these sources contain information that would have been important for this study. Second, other digital databases like Web of Science, DBLP and WorldCat have not been considered for collecting data for this study. This omission is based on the premise that Scopus is the largest and most inclusive digital database. Nonetheless, some papers might have been missed if their source is not covered by Scopus. Third, the data from Scopus was collected in March 2017. We have made a conscious effort to show results only up to 2016 in Fig. 3. This is because the papers for 2017 were still in the process of being indexed, and Fig. 3 would not have given a true representation had we included 2017 in that figure.