Introduction

A recent Pew Research Internet Project report predicts that “The Internet of Things (IoT) will thrive by 2025,” and suggests that “the opportunities and challenges resulting from amplified connectivity will influence nearly everything, nearly everyone, nearly everywhere” (Anderson and Rainie 2014). In addition to providing more convenient and comfortable services, IoT has made a significant impact on the traditional supply chain and has influenced social structure.

Researchers have studied IoT intensively from many perspectives. Because the research topics are so diverse, a researcher needs to understand the focal points and tendencies of current work. Traditionally, research trends in a particular area have been identified by dividing the object of analysis into subcategories and counting the papers that fall into each category. However, this approach cannot explain the distribution of the subject concepts that exist in the domain or the relations between subjects. Co-word analysis, by contrast, can identify knowledge domains quantitatively and reveal the connections between them. This paper therefore examines IoT-related research areas and trends through co-word analysis. It also examines the intellectual structure of IoT by grouping keywords with clustering techniques and by visualizing their correlations with multidimensional scaling. We hope that this paper will help researchers quickly grasp the directions and subjects of the IoT field.

Related studies

Internet of Things concept

The term Internet of Things was coined by Ashton (2009) as the title of a presentation at Procter & Gamble in 1999 in the context of supply chain management. One definition has been formulated in the Strategic Research Agenda of the Cluster of European Research Projects on the Internet of Things: “Internet of Things (IoT) is an integrated part of Future Internet and could be defined as a dynamic global network infrastructure with self configuring capabilities based on standard and interoperable communication protocols where physical and virtual ‘things’ have identities, physical attributes, and virtual personalities and use intelligent interfaces, and are seamlessly integrated into the information network” (Vermesan et al. 2009). Gubbi et al. (2013) make the definition more user-centric and do not restrict it to any standard communication protocol, which allows long-lasting applications to be developed and deployed using the state-of-the-art protocols available at any given point in time. They define the Internet of Things for smart environments as the interconnection of sensing and actuating devices providing the ability to share information across platforms through a unified framework, developing a common operating picture for enabling innovative applications. This is achieved by seamless ubiquitous sensing, data analytics and information representation, with cloud computing as the unifying framework. Atzori et al. (2010) point out that the Internet of Things can be realized in three paradigms: internet-oriented (middleware), things-oriented (sensors) and semantic-oriented (knowledge). Although this type of delineation is required because of the interdisciplinary nature of the subject, the usefulness of IoT can be unleashed only in an application domain where the three paradigms intersect.

To sum up, the term Internet of Things is not well defined, and it has been used and misused as a buzzword in scientific research as well as in marketing and sales strategies. Its definition remains rather fuzzy and subject to philosophical debate (Uckelmann et al. 2011).

Internet of Things-related research area

On the whole, we divide current research into an IoT application domain and an IoT technology domain. The IoT application domain includes aerospace and aviation, automotive, telecommunications, intelligent buildings, medical technology, healthcare, independent living, pharmaceutical, retail, logistics, supply chain management, manufacturing, product lifecycle management, oil and gas, safety, security and privacy, environment monitoring, people and goods transportation, food traceability, agriculture and breeding, media, entertainment and ticketing, insurance, and recycling. The IoT enabling technologies are identification technology, Internet of Things architecture technology, communication technology, network technology, network discovery, software and algorithms, hardware, data and signal processing technology, discovery and search engine technologies, relationship network management technologies, power and energy storage technologies, security and privacy technologies, and standardization (Vermesan et al. 2009).

Domingo (2012) proposes that, from a technical perspective, the IoT architecture is divided into three layers, whose functions are summarized as follows: (1) Perception layer: its main function is to identify objects and gather information. It is formed mainly by sensors and actuators, monitoring stations (such as cell phones, tablet PCs, smart phones and PDAs), nano-nodes, and RFID tags and readers/writers. (2) Network layer: it consists of a converged network made up of wired/wireless privately owned networks, the Internet, network administration systems, etc. Its main function is to transmit the information obtained from the perception layer. (3) Application layer: it is a set of intelligent solutions that apply IoT technology to satisfy the needs of users.

Borgia (2014) reviews the current research and defines the fundamental characteristics of IoT, describing the technologies involved in its realization as well as the envisaged applications. Borgia also discusses the major challenges that must be faced to support the IoT vision, which span different research areas: architecture, communication, addressing, discovery, data processing, data management, security and privacy, etc. Miorandi et al. (2012) use a survey to provide a holistic perspective on the Internet-of-Things concept and its development, including a critical revision of application fields, enabling technologies and research challenges. In doing so, they give an overall picture of IoT.

Although the IoT concept was introduced more than 10 years ago and many research papers have focused on this field, we find that few studies classify the domain, and the intellectual structure of the Internet of Things (IoT) is still largely unknown.

Co-word analysis

Co-word analysis is based on counting and analyzing the co-occurrences of words in different parts of the articles of a specific domain (Callon et al. 1991). In general, the method extracts words from the articles of a subject field, calculates the co-occurrence frequency of each word pair, derives correlations between words using various indices, and maps the resulting subdomains. The underlying assumption is that if two keywords appear in the same paper, the two subjects they denote are related to each other. By measuring the strength of the correlation between words, the research patterns and trends of a field can be examined. The structure of a subject field can thus be analyzed without a data classification system (Cho 2014). Most previous studies use co-word analysis to depict the structure of various scientific domains, including computer science (Hu and Zhang 2015; Wang et al. 2015), information science and library science (Ding et al. 2001; Ravikumar et al. 2015), business economics (Vaughan et al. 2012), mathematical and computational biology (Liu and Ding 2014), and engineering (Wu and Leu 2014). To our knowledge, however, the method has not yet been applied to the IoT knowledge domain. We therefore use co-word analysis to map the hot topics of IoT and to uncover its intellectual structure.
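
The counting step at the heart of co-word analysis can be illustrated with a minimal Python sketch. The function name cooccurrence_counts and the three keyword lists are hypothetical and serve only as an illustration; they are not data from this study, and the study itself used Bibexcel rather than this code.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(papers):
    """Count how often each unordered keyword pair appears in the same paper."""
    counts = Counter()
    for keywords in papers:
        # Deduplicate and sort so that each pair is counted once per paper
        # and is always stored in the same order.
        unique = sorted(set(keywords))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per paper (illustration only).
papers = [
    ["Internet of things", "RFID", "Security"],
    ["Internet of things", "RFID"],
    ["Internet of things", "Wireless sensor networks"],
]
pair_counts = cooccurrence_counts(papers)
```

In this toy input, the pair (‘Internet of things’, ‘RFID’) appears in two papers, so it receives the highest count and would be read as the most closely related pair.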

Research methods

In co-word analysis, words are generally extracted from the literature of a subject field, the co-occurrence frequency of each word pair is calculated, and the correlation between words is then obtained using various indices. Next, the subdomains can be identified by mapping the correlations with multidimensional scaling (MDS). Although groups of words also emerge when MDS is performed directly without clustering, a more easily understandable domain map is obtained if the words are first clustered and the clusters are shown on the map. This paper therefore applied a clustering technique; in addition, factor analysis was used to help classify the clusters. Finally, MDS was performed to show the knowledge structure. The research method is explained in more detail below.

Data retrieval strategy and keywords collection

The Science Citation Index Expanded (SCI-Expanded) and Social Sciences Citation Index (SSCI) of the Institute for Scientific Information (ISI) Web of Science have been used to retrieve the data for the study. The data were collected in October 2014. Co-word analysis generally extracts analysis object words from titles, abstracts, keywords, etc. The first step of co-word analysis involves extracting keywords from records in indexing databases.

First, we extracted keywords from the papers retrieved from the WOS database with the query “Internet of Thing*” OR “IoT”. The article language was limited to English, and the document type was limited to scholarly journal articles. The search covered the period 2000–2014 and returned 758 papers. Second, because the 758 papers contained 2081 keywords, we eliminated invalid keywords and merged synonyms. Third, frequency analysis was performed on the 1976 keywords that remained after preprocessing. The 28 high-frequency keywords were then selected by calculating the g-index, as in the study by Zhang et al. (2013). Fourth, co-word analysis was performed on the 28 keywords using the Bibexcel software to obtain the co-occurrence frequencies.
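
To make the g-index cut-off concrete, the following Python sketch applies the standard definition of the g-index to a list of keyword frequencies: g is the largest rank such that the g most frequent keywords together occur at least g squared times. How Zhang et al. (2013) adapt the index to keywords is not detailed here, so treating it this way is an assumption, and the function name g_index and the sample frequencies are hypothetical.

```python
def g_index(frequencies):
    """Largest g such that the top g frequencies sum to at least g * g."""
    ranked = sorted(frequencies, reverse=True)
    g = 0
    cumulative = 0
    for rank, freq in enumerate(ranked, start=1):
        cumulative += freq
        if cumulative >= rank * rank:
            g = rank
    return g


# Hypothetical keyword frequencies (illustration only).
sample = [10, 5, 3, 1, 1]
cutoff = g_index(sample)           # number of keywords to keep
high_frequency = sorted(sample, reverse=True)[:cutoff]
```

Under this reading, the keywords ranked 1 to g form the high-frequency set that is carried into the co-word analysis.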

Co-occurrence matrix and similarity index

Co-occurrence matrices, such as co-citation, co-word, and co-link matrices, have been used widely in the information sciences (Leydesdorff and Vaughan 2006). This paper constructed a co-occurrence matrix of the 28 high-frequency keywords, in which each cell records how often two keywords co-occur in the same paper. A higher co-occurrence frequency for two keywords means a closer relationship between them. A similarity index was used to measure the similarity between words because it normalizes the range of co-occurrence frequencies and thereby standardizes the difference between words with high and low frequencies of appearance (Cho 2014). Following earlier studies (Ding et al. 2001), we used Pearson’s correlation coefficient to calculate the similarity. The data were processed with IBM SPSS Statistics Version 20.
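
As a rough illustration of the similarity step, the sketch below computes Pearson’s correlation coefficient between the rows of a small co-occurrence matrix using only the Python standard library. The 4 × 4 matrix is hypothetical, the diagonal is set to zero, and a constant row is treated as uncorrelated; these are assumptions of the sketch, and the study itself obtained the coefficients with SPSS.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def similarity_matrix(cooc):
    """Correlate every row of the co-occurrence matrix with every other row."""
    return [[pearson(row_i, row_j) for row_j in cooc] for row_i in cooc]


# Hypothetical symmetric co-occurrence matrix for four keywords.
cooc = [
    [0, 4, 1, 0],
    [4, 0, 2, 0],
    [1, 2, 0, 3],
    [0, 0, 3, 0],
]
sim = similarity_matrix(cooc)
```

Each entry of sim lies between -1 and 1 regardless of how frequent the two keywords are, which is the normalizing effect described above.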

Factor analysis

Factor analysis is the name given to a group of statistical techniques that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). The approach involves condensing the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information (Comendador et al. 2014). Zhu (2012) used factor analysis to extract the factors underlying virtual community research in China. In this paper, we performed factor analysis to provide a basis for the subsequent cluster analysis and MDS.

Cluster analysis

Cluster analysis is an exploratory data analysis tool for solving classification problems. Its object is to sort cases (people, things, events, etc.) into groups, or clusters, so that the degree of association is strong between members of the same cluster and weak between members of different clusters. A cluster is a group of relatively homogeneous cases or observations. Each cluster thus describes, in terms of the data collected, the class to which its members belong, and this description may be abstracted from the particular to the general class or type. Cluster analysis uses any of several techniques (e.g., nearest neighbors, K-means) to classify people, objects, or variables into more homogeneous groups (Bihani and Patil 2014). The clustering technique most frequently used in co-word analysis is hierarchical clustering with Ward’s method, which builds clusters while minimizing the increase in squared error that results when two clusters are merged. To obtain the similarity or dissimilarity between clusters, the similarity was measured again between the word lists included in each cluster. As in the analysis by Cho (2014), this paper calculated the sum of the co-occurrence frequencies of the index terms included in each cluster together with Pearson’s correlation coefficient.
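
The merging rule of Ward’s method can be sketched in a few lines of Python. When two clusters A and B with sizes n_A and n_B and centroids c_A and c_B are merged, the within-cluster squared error grows by n_A n_B / (n_A + n_B) times the squared distance between c_A and c_B, and the pair with the smallest increase is merged first. The code below is a naive illustration of that rule on hypothetical two-dimensional points; it is not the SPSS procedure used in this study, and the names ward_cost and ward_clusters are our own.

```python
def ward_cost(a, b):
    """Increase in within-cluster squared error if clusters a and b are merged."""
    na, nb = len(a), len(b)
    centroid_a = [sum(col) / na for col in zip(*a)]
    centroid_b = [sum(col) / nb for col in zip(*b)]
    dist_sq = sum((p - q) ** 2 for p, q in zip(centroid_a, centroid_b))
    return na * nb / (na + nb) * dist_sq


def ward_clusters(points, k):
    """Merge clusters agglomeratively until k clusters of point indices remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[m] for m in clusters[i]],
                                 [points[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        # Merge the cheapest pair and drop the absorbed cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical keyword profiles reduced to two dimensions (illustration only).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
groups = ward_clusters(points, 3)
```

Recording the order and cost of the merges, rather than stopping at a fixed k, would give the information needed to draw a dendrogram.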

Mapping

There are a variety of techniques for visualizing information, such as factor analysis, multidimensional scaling, eigenvector decomposition, pathfinder network scaling and self-organizing maps (Börner et al. 2003). The most common method is multidimensional scaling (Boyack et al. 2005). On the map produced by multidimensional scaling, entities located near each other have higher relative similarity, whereas entities located far from each other have lower relative similarity (Cho 2014). To determine the location of each keyword on the map, this paper calculated the Euclidean distance from Pearson’s correlation coefficients and visualized it in two-dimensional space by applying the PROXSCAL algorithm.
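
The distance calculation that precedes the mapping can be sketched as follows. The code computes pairwise Euclidean distances between keyword profiles, here hypothetical rows of correlation coefficients. It covers only the dissimilarity input to MDS; the PROXSCAL layout itself is not reproduced, and the function name euclidean_distances is our own.

```python
from math import dist  # available from Python 3.8


def euclidean_distances(profiles):
    """Pairwise Euclidean distances between equal-length keyword profiles."""
    return [[dist(p, q) for q in profiles] for p in profiles]


# Hypothetical correlation profiles for three keywords (illustration only).
profiles = [
    [1.0, 0.5],
    [0.5, 1.0],
    [-0.2, 0.1],
]
distances = euclidean_distances(profiles)
```

Keywords with similar correlation profiles obtain small distances and are therefore placed close together on the two-dimensional map.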

Results

Overall output of papers

The yearly number of papers compiled in this study is shown in Fig. 1. The total number of papers on IoT from 2000 to 2014 is 758. As Fig. 1 shows, 674 papers were published in the IoT field from 2010 to 2014, roughly 89 % of all the papers published during the 15 years. The papers published from 2000 to 2009 account for only 11 % of the total. The period from 2000 to 2009 is thus the starting point of research on IoT: the number of papers is relatively small, and the research content is limited to introducing IoT. From 2010 to 2014, by contrast, the number of papers increases significantly every year and grows faster than in the former period. Judging from the numbers shown in Fig. 1, more and more papers will be published in this field. IoT has aroused the research interest of scholars.

Fig. 1

Publication analysis

Frequency analysis

The results of the frequency analysis on the 1976 filtered keywords are shown in Table 1. ‘Internet of things’ (379), ‘Wireless sensor networks’ (112), ‘RFID’ (54), ‘Security’ (28), and ‘Cloud computing’ (22) are the top five high-frequency keywords. The terms ‘6LoWPAN’ (14), ‘CoAPs’ (11), ‘Future internet’ (11), ‘IPv6’ (10), ‘Machine to machine’ (10), and ‘Privacy’ (10), among others, are also high-frequency keywords. These keywords give an overall picture of the core areas of IoT research.

Table 1 High frequency keywords

First, ‘Wireless sensor networks’ and ‘RFID’, which reflect the basis of IoT, are high-frequency keywords in addition to ‘Internet of things’. Radio frequency identification (RFID) and wireless sensor networks (WSN) are two important components of pervasive computing in IoT, since both technologies can be used for coupling the physical and the virtual world (Zhang and Wang 2006). Second, ‘Security’, ‘Privacy’ and ‘Trust’ are also high-frequency keywords. The embedded nature of the technology and a lack of awareness of its potential social and personal consequences, as balanced against the more clearly articulated benefits, make security, privacy and trust issues that deserve dedicated attention. Third, ‘Cloud computing’, ‘Ubiquitous computing’, ‘Cloud manufacturing’ and ‘Pervasive computing’ reflect that IoT is of interest to manufacturing. In industry, the “things” may typically be the product itself, the equipment, the means of transportation, etc. These developments also accelerate the integration of smart objects into the Internet. In addition, pervasive computing has migrated from desktops to mobile phones, and computing is increasingly embedded in a variety of objects (Kuehnle 2014). Fourth, ‘6LoWPAN’, ‘CoAPs’ and ‘IPv6’ are protocols. Internet Protocol Version 6 (IPv6) over Low-Power Wireless Personal Area Networks (6LoWPAN) refers to the use of contemporary Internet protocols on diverse types of hardware. References to 6LoWPAN relate to the tagging or design of different types of constrained hardware in order to facilitate their participation in the Internet of Things (IoT) or a diverse IP-connected network. IPv6 is the latest edition of the Internet Protocol and is developed by the Internet Engineering Task Force (IETF), which now devotes its attention to 6LoWPAN. CoAPs is a web protocol for the IoT. These keywords show that communication protocols are vital to the construction of the IoT and are hot topics of study.

Correlation matrix

The co-occurrence matrix calculated with the Bibexcel software is shown in Table 2, and the Pearson’s correlation coefficients calculated to measure similarity are shown in Table 3. Because of space limitations, only the co-occurrence matrix and the similarity matrix of the top ten keywords are shown. A high correlation coefficient indicates that two keywords have similar co-occurrence patterns. In other words, such keywords can be interpreted as closely related research concepts within the IoT field.

Table 2 Part of matrix of co-occurring words
Table 3 Part of the similarity matrix using correlation coefficients

Factor analysis

The factor analysis was conducted in SPSS 20.0 using the principal components analysis method. The program generated 7 factors, which explain almost 60 % of the variation and describe the relationships among the 28 keywords. The results are summarized in Table 4.

Table 4 Result of the principal components analysis

Clustering

Hierarchical cluster analysis was performed on the correlation results obtained above. Clustering with Ward’s method after standardizing with the Z score produced the dendrogram shown in Fig. 2.

Fig. 2 Result of hierarchical group analysis

The dendrogram can be divided into three big groups. BG1, the first group, is the largest and includes 15 keywords, from ‘Internet of Things’ to ‘Web services’. BG2 includes 8 keywords, from ‘Internet’ to ‘Future internet’, and BG3 includes 5 keywords, from ‘Wireless sensor networks’ to ‘Machine to machine’. On the basis of the dendrogram and the preceding principal components analysis, these three groups can be further divided into 7 clusters, which are shown in Table 5. The representative keyword of each cluster is the one with the highest frequency among the keywords included in that cluster.

Table 5 Seven clusters and representative keywords

Cluster1 is ‘IoT and Security’, Cluster2 is ‘Middleware’, Cluster3 is ‘RFID’, Cluster4 is ‘Internet’, Cluster5 is ‘Cloud computing’, Cluster6 is ‘Wireless sensor networks’, and Cluster7 is ‘6LoWPAN’. The group share of a cluster is the sum of the occurrence frequencies of its keywords as a percentage of the total frequency of all 28 keywords. Cluster1 (IoT and Security) is the largest at 56.62 %, followed by Cluster6 (Wireless sensor networks) at 15.13 % and Cluster3 (RFID) at 10.59 %. The remaining shares are Cluster5 (8.32 %), Cluster7 (4.29 %), Cluster2 (2.90 %) and Cluster4 (2.14 %).
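
The group share can be illustrated with a short Python sketch. The cluster names and keyword frequencies below are hypothetical and are not the values behind Table 5; the function name group_shares is our own.

```python
def group_shares(cluster_frequencies):
    """Share (in percent) of each cluster in the total keyword frequency."""
    total = sum(sum(freqs) for freqs in cluster_frequencies.values())
    return {
        name: 100.0 * sum(freqs) / total
        for name, freqs in cluster_frequencies.items()
    }


# Hypothetical keyword frequencies per cluster (illustration only).
example = {
    "Cluster A": [60, 20],
    "Cluster B": [15],
    "Cluster C": [5],
}
shares = group_shares(example)
```

By construction the shares of all clusters sum to 100 %, apart from rounding in the reported figures.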

Mapping

Fig. 3 shows the result of standardizing the Pearson’s correlation coefficient matrix with the Z score, calculating the Euclidean distance and visualizing it in two-dimensional space by applying the PROXSCAL algorithm. When the 28 keywords on the map are divided into the three big groups, BG1, which includes ‘IoT’ and ‘Social networks’, is located at the center and bottom left. BG2, which includes ‘Internet’ and ‘Cyber physical system’, is located at the bottom right. BG3, which includes ‘6LoWPAN’ and ‘IPv6’, is located at the top.

Fig. 3 MDS map based on keyword

Conclusions

Since there are few quantitative reviews of the IoT field, this study moves beyond simple thematic classification and carries out co-word analysis, with a special focus on the relations between subjects and on the intellectual structure, which is described by grouping the keywords into seven clusters. This process provides a clear account of the intellectual structure of the IoT research field and leads to two main conclusions.

Internet of things research has a starting point from 2000

From the above analysis, we can see that Internet of things-related research started in 2000, when the confluence of efficient wireless protocols, improved sensors, cheaper processors, and a bevy of startups and established companies developing the necessary management and application software finally made the concept of the Internet of Things (IoT) mainstream. Until 2010, however, the field did not receive much attention, judging by the number of papers. Progress was slow at first, although the number of articles increased steadily from 5 in 2000 to 38 in 2010. After 2010, IoT research developed sharply: by 2011 there were approximately 7 billion human beings on the face of the earth and 12.5 billion devices connected to the Internet, including nearly every PC in the world and well over a billion smart phones. This growth promoted IoT applications and, at the same time, raised more and more theoretical problems to be solved. From these numbers, we estimate that more research papers will be published in the near future.

Internet of things research has three big groups and can be divided into seven clusters

The IoT research field consists of three big groups. When the research domains of the field are divided in more detail, they fall into seven clusters: ‘IoT and Security’, ‘Middleware’, ‘RFID’, ‘Internet’, ‘Cloud computing’, ‘Wireless sensor networks’ and ‘6LoWPAN’.

The first big group includes three clusters. The main findings about the three sub-clusters are as follows: (1) The first cluster is IoT and security. IoT is concerned with ‘smart objects’. When things or objects get smarter, the Internet of Things becomes social, and social networks will change as IoT develops. However, whether IoT can develop smoothly and go further depends not on the technology but on the customers. This cluster shows the three aspects that customers are most concerned about: security, privacy and trust. (2) The second cluster is middleware. It also includes the keywords machine-to-machine communications and performance. Machine-to-machine (M2M) communications refers to autonomous communication between devices or machines. M2M technology involves five important technological parts: intelligent machines, M2M hardware, the communication network, middleware and applications. The middleware plays a bridging role between communication networks and the IT system, and it consists of two parts: the M2M gateway and the data-collection/integration components. The middleware is therefore very important when evaluating the performance of M2M communications. (3) The third cluster is RFID. It includes the keywords ubiquitous computing, web of things and web services. The term Internet of Things is most often used in the context of radio frequency identification (RFID) and of how physical objects are tied to the Internet and can communicate with each other. The Web of Things (WoT), considered a subset of the Internet of Things, focuses on software standards and frameworks such as REST, HTTP and URLs to create applications and services that combine and interact with a variety of network devices. The Web of Things can thus be thought of as everyday objects being able to access Web services. The key point is that this does not require reinventing the means of communication, because existing standards are used.

The second big group includes two clusters. The main findings about the two sub-clusters are as follows: (1) The first cluster is Internet and cyber physical system. Cyber physical systems (CPS) represent the next evolutionary step from existing embedded systems. Together with the Internet and the data and services available online, embedded systems join to form cyber physical systems. The Internet therefore has a close relationship with cyber physical systems. (2) The second cluster is cloud computing. This cluster also includes the keywords energy efficiency, cloud manufacturing, environmental Internet of Things, CoAPs and future Internet.

The third big group includes two clusters. The main findings about the two sub-clusters are as follows: (1) The first cluster is wireless sensor networks and quality of service. Wireless sensor networks (WSNs) are required to provide different levels of quality of service (QoS) depending on the type of application. Providing QoS support in wireless sensor networks is an emerging area of research (Bhuyan et al. 2010). However, QoS provisioning in WSNs faces some significant challenges, for example, extreme resource constraints, redundant data, heterogeneity of the sensor nodes, dynamic network topology and size, a less reliable medium, mixed data arrival patterns, and multiple sinks or base stations. (2) The second cluster is 6LoWPAN, IPv6 and machine to machine. IPv6 is an Internet Layer protocol for packet-switched internetworking that provides end-to-end datagram transmission across multiple IP networks. 6LoWPAN is an acronym for IPv6 over Low power Wireless Personal Area Networks. Owing to the number of devices, M2M will need a very large address space that can only be provided by IPv6, so applying IPv6 to M2M is a likely future trend. This cluster shows that more and more scholars are paying attention to this trend.

Overall, research on IoT has involved experts working in industry, research and academia, who provide their vision of the IoT research challenges, the enabling technologies and the key applications that are expected to arise from the current vision of the Internet of Things.