Introduction

E-learning, short for “electronic learning”, emerged as a name for the advances made in education through the use of Information and Communication Technologies (ICT), in particular the internet. Multiple definitions have emerged over the past few decades. It has been considered a “new” form of learning (Nicholson, 2007) that uses the strengths of the internet to provide synchronous or asynchronous interaction, teaching materials, and personalized programs to various communities. Even before the internet existed, interest focused on the support that computer equipment and networks could offer to teachers and students, enriching education with technological findings (Fuller, 1962). Interest then turned towards the educational dimension, on the one hand concerning learning (Stockley, 2006) and on the other hand the understanding of this new modality as an evolution of the distance modality (Tick, 2006). Over time, it has been given a highly revolutionary character, granting it the capacity to transform education through the production, participation and consumption of content in various formats. This same character has made it the focus of multiple views. UNESCO (2013) defines it as a fundamental element in twenty-first century education that contributes to the construction of, and participation in, the knowledge society. It is also considered an object of transversal study for the development of the networked society (Suárez, 2010), and a provider of tools and channels to access new experiences, scenarios and information mediated by ICT, innovating in teaching-learning processes, research, extension (Freire, 2008) and other institutional practices, as described by Conole and Oliver (2006), who also define it as a field of research with complex themes, multiple tensions and rapid movement. Finally, Chiang, Kuo, and Yang (2010) determined that e-learning research is expanding significantly.

From the bibliometric point of view, one can learn a great deal by analyzing research manifested in scientific publications (journals and conference proceedings), as Taylor (2001) demonstrated at the beginning of the millennium. Several studies have been carried out in educational sciences (Diem & Wolter, 2013; Lee, Wu, & Tsai, 2009) and also in technology (Hsiao, Tang, & Liu, 2015). However, only a few, limited studies have examined the scientific production in e-learning. One of the first was the work of Shih, Feng, and Tsai (2008), based on five journals, which highlighted the trends in “Instructional Approaches”, “Learning Environments” and “Meta Cognition.” A similar approach was taken by Maurer and Khan (2010), who analyzed five journals and two conference proceedings and identified 14 trends and 150 concept clusters. At the same time, a broader study of the Ed-Media journal archive from 2003 to 2008 found problems with the ambiguity of terms, institutions and authors, emphasizing the need to create an appropriate and comprehensive thematic category (Khan, Ebner, & Maurer, 2009). Chiang et al. (2010) found that the applications of e-learning were presented mainly in Education and Computer Science and, to a lesser extent, in Medical Education, Information Sciences and Documentation, among other interdisciplinary areas. They used an empirical consultation of the terms “e-learning”, “distance learning” and “electronic learning”, focusing their analysis on seven journals.

Shifting the focus from journals and conference proceedings to databases and indexes, Liu, Wu, and Chen (2013) used WoS and ScienceDirect to identify Learning Technologies trends in special education. González, Saroil, and Sánchez (2015) analyzed the scientific production in Latin America with the SciELO database, bringing out the growing thematic linkage of e-learning with the areas of education (like Chiang et al. (2010)) and health professions, and also highlighting its multidisciplinary nature. Tai, Lee, and Lee (2013), using the Social Science Citation Index and Science Citation Index, analyzed the citation of journals in the period 2003–2012, finding a diversity of tendencies in multiple areas of knowledge. This source was also used by Hung (2012) to classify the publications in e-learning into 2 domains (system and content design, education and training) with 4 groups and 15 clusters. His analysis showed a shift in approach from the technical towards the educational.

These studies show the increase in the scientific production of e-learning over time, along with its trends, as technology and educational practices bring innovations. However, they contribute to the exploration of the thematic rather than to its appropriate categorical definition (Khan et al., 2009). There is therefore a large gap in the analysis of the scientific domain in this discipline, for several possible reasons: because it is a very recent field (compared to other disciplines), because it lacks a global panorama of contrast against which that category can be recognized, because changes in intellectual environments occur so rapidly and cover new conceptions of industry, education, and politics, or because most of today’s scientific advances no longer align with disciplinary boundaries (Rafols, Porter, & Leydesdorff, 2010).

So how can we know whether e-learning is a discipline in itself? What are the main terms that describe it? Which publications contain the works described by these terms? This study seeks to answer these questions from a bibliometric approach with the support of visualization techniques.

The combination of bibliometrics with visualization techniques to analyze and/or define emerging disciplines has already been used in other cases. In Nanoscience and Nanotechnology (Munoz-Ecija, Vargas-Quesada, Chinchilla-Rodríguez, Gómez-Nuñez, & Moya-Anegón, 2013), journals and conference proceedings were analyzed, consolidating the identity of the scientific discipline as one with a high degree of transversality and without defined limits. In Environmental Management Accounting (Schaltegger, Gibassier, & Zvezdov, 2013), the results show an incipient development of the thematic. In altmetrics (González, Pacheco, & Arencibia, 2016), the analysis used the terms that, in the opinion of the authors, best define the thematic, identifying the research trends that characterize it.

Other studies have used bibliometrics to identify scientific publications with the highest impact in a certain discipline, for example in the field of information systems (Chan, Guness, & Kim, 2015), information science (Waltman, van Eck, & Noyons, 2010), environmental social responsibility (Valenzuela, Linares, & Suárez, 2015), renewable energy, sustainability and environment (Fernández, Bote, & Moya-Anegón, 2013), business and technical communication (Lowry, Humpherys, Malwitz, & Nix, 2007), neurosurgery (Madhugiri, Ambekar, Strom, & Nanda, 2013), among others.

As for bibliometric analysis, co-citation of classes or categories can be used to construct maps of large scientific domains (Moya-Anegón et al., 2004). This may serve, as mentioned above, to establish a global panorama of contrast on which different scientific fields can be recognized and their internal dynamics and cognitive structure understood (Cobo, López, Herrera, & Herrera, 2011), either as a field of research already consolidated or as an emerging discipline. In the words of Guzmán and Trujillo (2013), “we can say that the analysis of information with maps of science, supported by metric studies of information, allows us to graphically represent the relationships between documents that the disciplines or specific scientific fields publish. These show the sub-areas of research in which the discipline has been focused over the years in order to identify, analyze and visualize the intellectual structure, as well as the temporal evolution in which the disciplines are being developed.”

Rafols et al. (2010) developed a method for visually locating bodies of research within science. By overlaying maps of science, we can investigate the growth in the scientific development of disciplines and organizations that do not fit into the traditional disciplinary categories. This is achieved thanks to the existence or construction of a stable corpus over which another, smaller corpus can be superimposed, producing intuitive comparisons. In addition, these overlays offer greater interpretability and have the potential to be used in scientific analysis and for comparative purposes (Boyack, 2008). Following the method, these maps are matrices of similarity measures, calculated from the correlation between information items present in the structure of scientific communication; in other words, they show the disciplinary structure of the sciences in terms of publications. The stable or base map is constructed with bibliographic data from a database that has a defined categorization of the sciences. The analysis performed on the overlay will be conditioned by the size of the data selected for it.

An example of the use of the mapping overlay technique was developed and published by the SCImago research group in its work on the graphical interface of SCImago Journal and Country Rank (Hassan, Guerrero, & Moya-Anegón, 2014), in which, through a freely accessible web platform, the presence of SCOPUS publications in different scientific domains can be analyzed, as well as the global distribution of the scientific output performance of different regions and countries. This tool also shows the thematic categories with which the scientific publications have previously been related, both the traditional knowledge areas and the research frontiers.

Given the need for a database that could represent the global scientific publication system, we used SCOPUS as a data source because of its disciplinary coverage, as did Leydesdorff, Moya-Anegón, and Guerrero (2010) to carry out their comparative study between Journal Citation Reports and SCOPUS, and later to measure the interdisciplinarity of SCOPUS (Leydesdorff, Moya-Anegón, & Guerrero, 2015).

As for the visualization technique, there are multiple methods and tools for visualizing bibliometric networks, such as distance-based, graph-based or time-based approaches (Small, 2006; van Eck & Waltman, 2014). Mapping and clustering are also used to answer questions about the main fields of research in a scientific domain, the relationship between research fields and the evolution of the domain over time. As a tool, Leydesdorff et al. (2015), in their journal mapping work, showed how VOSViewer ensures the comprehensive visualization of node labels on the map and how a stress minimization technique such as multidimensional scaling (MDS) facilitates its visualization.

Materials and methods

This study is based on the existence of scientific communications that contribute to the development of the thematic, under the criteria of academic peer review and deposit in databases of international publications. In order to identify such communications and analyze them, the following methodology is developed.

Step 1. Definition of descriptors

Establish a list of all the terms that describe the thematic, starting from the core term “e-learning.” This is achieved through the bibliometric analysis of articles that include the core term in their title, abstract and keywords. It begins with the definition of the type of information; in this case, the primary literature is considered the main and most important reference of knowledge in the world scientific field (Fernández et al., 2013), for its scientific contributions and for receiving most of the citations. The source of information consulted was SCOPUS, as the database with the broadest indexing of journals and conference proceedings (Leydesdorff et al., 2010). The search results are refined by source type (journal and conference proceedings), by primary literature (article, conference paper and review) and by language (English). Then the time period for the analysis is selected, preferably one without erratic behavior in the annual production rate. Finally, a representative sample of the documents is selected on which to perform the bibliometric analysis based on the co-occurrence of keywords, with the aim of determining the primary descriptors that are mainly present in articles, their relationships and their relevance by means of the technique of Visualization of Similarities (VoS) (Waltman et al., 2010). This technique, as shown by Cobo et al. (2011), provides a very accurate look at how a document corpus is described and linked.

Based on the set of primary descriptors, new descriptors are included as a result of linguistic similarities and/or acronyms or abbreviations used in natural language. For example, when choosing the keywords of an article, an author may use the descriptor e-learning or elearning (Chiang et al., 2010), or the acronym ICT in place of the descriptor Information and Communication Technologies. These new descriptors, which carry the same meaning as the one provided by the author, are called Secondary Descriptors.
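As a rough illustration of Step 1, the following Python sketch counts keyword occurrences and pairwise co-occurrences after folding secondary descriptors into their primary form. It is not the VoS technique itself, which is carried out with VOSViewer; the mapping table, the function names and the three sample keyword lists are hypothetical and serve only to show the counting logic.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping from secondary descriptors (spelling variants,
# acronyms) to the primary descriptor they stand for.
SECONDARY_TO_PRIMARY = {
    "elearning": "e-learning",
    "electronic learning": "e-learning",
    "ict": "information and communication technologies",
}


def normalize(keyword):
    """Lower-case a keyword and fold a secondary descriptor into its primary."""
    key = keyword.strip().lower()
    return SECONDARY_TO_PRIMARY.get(key, key)


def cooccurrence(documents):
    """Count how often each keyword, and each keyword pair, appears.

    `documents` is a list of keyword lists, one list per article.
    Each keyword is counted at most once per article.
    """
    occurrences = Counter()
    pairs = Counter()
    for keywords in documents:
        terms = sorted({normalize(k) for k in keywords})
        occurrences.update(terms)
        pairs.update(combinations(terms, 2))
    return occurrences, pairs


# Made-up sample data, for illustration only.
docs = [
    ["E-learning", "ICT", "Higher education"],
    ["elearning", "Higher education"],
    ["Electronic learning", "Information and Communication Technologies"],
]
occ, pairs = cooccurrence(docs)
print(occ.most_common(3))
```

In this sketch the pair counts play the role of link strengths between descriptors; a tool such as VOSViewer would then normalize and lay them out on a map.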

Step 2. Correspondence of publications and descriptors

Build a matrix of article counts for each descriptor (primary and secondary) and each publication indexed in the database. Using the same selection criteria described in the previous step, the database is queried for each of the descriptors that have appeared, thus determining the number of articles for each descriptor. Finally, the counts of the primary and secondary descriptors of each term are added, assuming that the sum reflects unique works related by descriptors.
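The correspondence matrix of Step 2 could be assembled along the following lines. This is a minimal Python sketch under stated assumptions: the query results, the publication names and the descriptor groups are invented, and the sum over a primary descriptor and its secondary forms assumes, as the text does, that the counted works are unique.

```python
from collections import defaultdict

# Hypothetical query results: (publication, descriptor) -> number of articles.
query_counts = {
    ("Journal A", "e-learning"): 40,
    ("Journal A", "elearning"): 5,
    ("Journal A", "blended learning"): 12,
    ("Journal B", "e-learning"): 3,
}

# Hypothetical grouping of every descriptor under its primary term.
DESCRIPTOR_GROUPS = {
    "e-learning": "e-learning",
    "elearning": "e-learning",
    "blended learning": "blended learning",
    "blearning": "blended learning",
}


def build_matrix(counts, groups):
    """Return {publication: {term: articles}}, summing primary and secondary forms."""
    matrix = defaultdict(lambda: defaultdict(int))
    for (publication, descriptor), n_articles in counts.items():
        matrix[publication][groups[descriptor]] += n_articles
    return {pub: dict(row) for pub, row in matrix.items()}


matrix = build_matrix(query_counts, DESCRIPTOR_GROUPS)
print(matrix["Journal A"])
```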

Step 3. Percentage of participation in the thematic

Determine the percentage of articles in the journal or conference proceedings that are related to the thematic during the time period established in the initial criteria. First, the total number of articles (TNA) of the journal during the period is identified. Then the number of related articles (NRA) is determined for each journal by taking the maximum number of articles by descriptor, since an article can be related to more than one descriptor. Finally, the percentage of participation (PP) is calculated from these values:

$$ PP=\frac{NRA}{TNA}\times 100 $$
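The PP formula can be expressed as a short Python function. In this sketch, NRA is taken as the maximum article count over the descriptors of a publication, as described in Step 3; the function name and the example numbers are hypothetical.

```python
def participation_percentage(articles_by_descriptor, total_articles):
    """Compute PP = NRA / TNA * 100 for one publication.

    NRA is the maximum article count over all descriptors, because the same
    article may be related to more than one descriptor.
    """
    if total_articles <= 0:
        raise ValueError("total_articles (TNA) must be positive")
    nra = max(articles_by_descriptor.values(), default=0)
    return 100.0 * nra / total_articles


# Made-up example: 45 articles for the most frequent descriptor, 150 in total.
pp = participation_percentage({"e-learning": 45, "blended learning": 12}, 150)
print(pp)
```

Taking the maximum rather than the sum gives a conservative estimate of NRA: it never counts an article twice, at the cost of possibly missing articles that are related only to a less frequent descriptor.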

Step 4. Cut-off point for inclusion of publications in the category

The cut-off point is the PP value from which publications are included in the thematic categorization. Previous studies on this classification task have been based on the distribution of publications between “pure”, “hybrid” and “non-related” publications (Chan et al., 2015) or on the determination of the core set of publications (Madhugiri et al., 2013). In this study, however, the cut-off point is established by identifying the maximum permissible error in the thematic relation of the publication. The higher the cut-off point, the greater the precision in the selection of publications, although this precision comes with a reduced volume; conversely, a low cut-off point increases both the selection error and the volume. Once the cut-off point is established, all publications that exceed this threshold are considered in the categorization of the emerging discipline.
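The trade-off between cut-off point and volume can be illustrated with a small Python sketch. The PP values and publication names are invented, and treating a publication whose PP equals the cut-off as included is an assumption of this example. The selection error itself requires judging whether each publication is truly related to the thematic, so it is not modeled here.

```python
def select_publications(pp_by_publication, cutoff):
    """Return the publications whose PP reaches the cut-off.

    Using >= (rather than >) at the boundary is an assumption of this sketch.
    """
    return sorted(name for name, pp in pp_by_publication.items() if pp >= cutoff)


def volume_by_cutoff(pp_by_publication, cutoffs):
    """Count how many publications survive each candidate cut-off."""
    return {c: len(select_publications(pp_by_publication, c)) for c in cutoffs}


# Made-up PP values, for illustration only.
pp_values = {
    "Journal A": 30.0,
    "Journal B": 4.0,
    "Proceedings C": 62.5,
    "Journal D": 18.0,
}
selected = select_publications(pp_values, cutoff=25)
volumes = volume_by_cutoff(pp_values, [5, 20, 25, 50])
print(selected, volumes)
```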

Step 5. Analysis of the set of publications

The degree of cohesion between publications is sought. The selected journals are analyzed from a bibliometric point of view to determine whether they represent a scientific community that communicates its knowledge through these channels, so that it can be recognized as an emergent and distinctive scientific discipline that can be delimited as a cross-thematic category (Leydesdorff et al., 2015). In this study we use the mapping overlay technique (Rafols et al., 2010), which facilitates the exploration of the knowledge bases of an emerging discipline and its evolutionary dynamics, both in terms of its internal cognitive coherence and of the diversity of its sources of knowledge with reference to disciplinary classifications. For this, it is necessary to have a base map on which to overlay a local map and thus make comparisons.

The base map will be a global map of science (Leydesdorff et al., 2015) that includes all the journals and conference proceedings indexed in SCOPUS. The degree of relationship between publications is established by the normalized value produced by the combination of citations, co-citations and coupling (Hassan et al., 2014). In addition, this analysis is enriched with the clustering performed by VOSViewer (van Eck & Waltman, 2010).

The local map that will be overlaid on the global map of science is the set of journals and conference proceedings selected in the previous step. This overlap will allow locating the discipline in the general topology of scientific knowledge and whether or not there is a cluster effect, which should be considered as evidence of the existence of a specific disciplinary field from the point of view of the scientific communication guidelines followed by researchers.

In summary, the methodology is based on the principle that a greater presence of field-specific descriptors in the items of an article is directly proportional to the number of interactions by citation, co-citation, and coupling of a publication with others that would form part of the discipline cluster.

Results and discussion

The SCOPUS database was used for its coverage and peer review of indexed publications to extract a representative set of primary literature that includes the term e-learning in its title, abstract and keywords during the period 2012–2014. In this timeframe, the production in e-learning stabilized with a growth rate close to 0 (2012: 3177, 2013: 3053, 2014: 3065), favoring the analysis and allowing a window of 2 years to consolidate its impact. The database was consulted with the query chart of Table 1, obtaining the metadata of the first 2000 publications, equivalent to 21% of the total, ordered by the Scopus relevance algorithm, which guarantees the accuracy of the search term in the fields of the document. A total of 4521 keywords were recovered.
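As a quick check of these figures, and assuming that the total for the query equals the sum of the three annual counts, the year-on-year growth rates and the sample share work out as follows:

$$ \frac{3053-3177}{3177}\times 100\approx -3.9\%,\qquad \frac{3065-3053}{3053}\times 100\approx 0.4\% $$

$$ \frac{2000}{3177+3053+3065}\times 100=\frac{2000}{9295}\times 100\approx 21.5\% $$

Both growth rates are small in absolute value, consistent with a stabilized production, and the sample share agrees with the approximately 21% reported.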

Table 1 Query chart on SCOPUS

These keywords were later analyzed with the VOSViewer tool to establish their co-occurrence in the articles. Using the technique of Visualization of Similarities (VoS), 51 primary descriptors were established; they are listed in Fig. 1, which also includes the occurrences of the core term.

Fig. 1 “E-learning” primary descriptors

The secondary descriptors that complete the listing are: elearning, electronic learning, Learning management system, blearning, blended learning, mlearning, mobile learning, Information and communications technologies, eassessment, electronic assessment, VLE, Massive Open Online Courses, PLE. These 64 terms constitute the base descriptors of the consultation of the articles (Appendix 1), since they have been used to describe the scientific production around e-learning during the established period.

The correspondence matrix between descriptors and publications was built on a total of 12,923 journals and conference proceedings indexed in SCOPUS. The percentage of participation (PP) is shown in Fig. 2.

Fig. 2 Percentage of participation (PP) of the term in journals and conference proceedings

There are 3680 publications of the total base that do not have any article related to any of the 64 descriptors. Of the remaining 9243, 7801 have a participation rate of less than 5%. The cut-off point setting was performed based on the analysis of the error in selection of publications, in this case, by percentage bands as shown in Table 2.

Table 2 Percentage bands for establishing the cut-off point

As can be seen, a value less than 20 in the percentage of participation has an error greater than 7%, and the average participation percentage is less than 50%. We consider that an average percentage of participation above 50% should be maintained for a publication to be considered in the category, which is why the cut-off point is set at 25% (coinciding with the classification of pure and hybrid publications made by Chan et al. (2015)), excluding the 11 journals and conference proceedings that are not related to the theme. With this, there are 82 journals and 137 conference proceedings (Appendix 2) to be analyzed in comparison to the global map of science.

The global science map was constructed with VOSViewer from SCOPUS-indexed publications and the combined indicator (citations, co-citations, coupling) used by SCImago (Hassan et al., 2014), which guarantees normalized values for each of the publications to be visualized.

As can be seen in Fig. 3, the map is composed of seven clusters which, read clockwise and in broad terms, can be labeled as: social sciences (red), psychology (light cyan), medicine (green), health professions (purple), life sciences (yellow), physical sciences and engineering (dark cyan) and computer science (blue).

Fig. 3 Global map of science based on SCOPUS and SCImago Journal & Country Rank using VOSViewer with its network map setting. (Source: self-made)

Figure 4 presents the overlap of the local map corresponding to the 219 selected publications (the color indicates the area of knowledge in which the publication is superimposed and its size corresponds to the percentage of participation) on the global map of science, showing the distribution of publications.

Fig. 4 Distribution of publications related to the thematic, using the mapping technique with VOSViewer in its density map configuration. (Source: self-made)

Figure 5 clearly shows the cluster effect, which reflects a high interrelation in the combined indicator (citations, co-citations, coupling). This cohesion is sufficient evidence, in terms of scientific communication, that there is a shared use of publications among researchers of this thematic, and determines that e-learning is a distinctive scientific discipline: there is a network of relationships and interactions established between researchers who share structures of thought, patterns of cooperation, language and forms of communication, as Hjørland and Albrechtsen (1995) put it in establishing that science must be evaluated through knowledge of the social practices of scientists.

Fig. 5 Approximation of the distribution of publications related to the thematic, by map overlay in VOSViewer in its configuration of network map without links. The size of the selected publications has been modified for visual purposes. (Source: self-made)

There is also a core of publications within the cluster, located inside the social sciences area of knowledge. An analysis of the 26 publications in this nucleus shows that these relate mainly to the thematic categories of education (77%), library and information science (19%) and philosophy (4%).

On the other hand, the cluster can be contrasted and validated by categorizing the selected publications into an existing indexing system, for example, SCImago Journal & Country Rank. Figure 6 is the result of an analysis of common categories among the selected publications, plotted using the NodeXL tool with the force-directed visualization algorithm of Fruchterman and Reingold (1991). As can be seen, the strongest relationships are between computer science and social sciences, and then between them and engineering and management sciences.

Fig. 6 Relationship of thematic categories of the selected publications according to their classification in SCImago Journal & Country Rank. The colors of the categories of the same source were used (Source: self-made)

Although the ratio is inverse, a strong relationship between the social sciences and computer science is coherent and validates the behavior analysis of the core previously made.

In addition to the above, these findings diverge in part from the findings of Chiang et al. (2010) that e-learning is mainly present in social sciences and computer science.

This study shows that there is a set of publications, derived from 64 descriptors related to the e-learning thematic, whose bibliometric analysis places them primarily in social sciences and secondly in the areas of computer science and health professions. These results converge with the conclusions of Hung (2012), who identified a change in the trend of e-learning research, going from the technical to the educational dimension.

Discussion

The bibliometric analysis of the keywords provides an objective approximation in the construction of the set of descriptors, taking the denomination provided by the authors of the scientific production. However, there is an absence of consensus in the scientific community about the descriptors of e-learning since they respond to different approaches given from pedagogy, technology, and organizations. For this reason, some descriptors present in the academic and commercial discourse may not have been included in this study, as well as other types of publications such as books, editorials, notes or surveys.

On the other hand, the categorization of scientific publications is an arduous and permanent task that is complex to manage from the publication systems, since a publication can vary over time in its central thematic and move to other research fronts without this affecting its initial categorization. This may be one reason why both the co-category analysis performed with the SCImago Journal & Country Rank and the previous categorization studies coincide only in the main areas, social sciences and computer science.

Likewise, this study can be updated under the same methodology as the corpus of articles and publications increases, generating new overlays and finally updating the thematic coverage.

Possible applications

This categorization of e-learning is, first, a guide for researchers who wish to know and contribute to the development and strengthening of the discipline, knowing the journals and conference proceedings that comprise it, its impact and other bibliometric indicators. Secondly, it is an input for the development of new studies on the thematic, such as georeferencing studies (Guerrero-Bote, Olmeda-Gómez, & Moya-Anegón, 2016), research on countries, institutions and authors, and all kinds of bibliometric analysis. Finally, with this thematic categorization, the e-learning category can be included in databases (SCOPUS, WoS) and bibliometric analysis platforms (SCImago Journal & Country Rank) to facilitate access to and analysis of publications related to the thematic.

The methodology and tools presented in this study can be used, in principle, for the analysis of any other scientific field or possible emergent discipline. However, it is necessary to consider the prerequisites that ensure the analysis can be verified against the global science landscape: access to databases that represent global scientific knowledge, or an approximation to it, and standard values for each of the publications to be analyzed.

Conclusions

Using a combination of bibliometric indicators and analysis techniques, this study has categorized e-learning as an emerging discipline in the world system of scientific publications, consisting of 64 descriptors and 219 journals and conference proceedings indexed by SCOPUS between 2012 and 2014.

In line with other studies, the visualization analysis achieved by the map overlay technique clearly exposes a cluster effect in the global production of scientific knowledge in e-learning, that is to say, the existence of a concentration of scientific production with a high degree of cohesion between the indicators of citations, co-citations and coupling. This concentration constitutes a scientific communication channel of e-learning within the social sciences, with broad and strong bibliometric links to computer science and health professions, and is sufficient evidence to consider e-learning an emerging discipline in the world publishing system represented by SCOPUS.

This discipline must be analyzed from its internal structure (both cluster and core) to identify common principles, to define its nature, its detailed thematic correspondence and its main contributions and contributors.

The bibliometric indicators used in this study are an approximation of the impact of publications in the scientific community and, as such, help to solve the problem of the lack of consensus on the definition and description of the thematic, providing a set of descriptors that can be extended over time by including the annual scientific production in databases.