1 Introduction

Open data is the idea that data can be freely available, used, reused and republished without restrictions from copyright, patents or other mechanisms of control (Auer et al. 2007). Deloitte (2012) identifies government, businesses and citizens as the three key constituencies (stakeholders) in a successful open data ecosystem. Governments produce a large volume of data and have launched open data initiatives and portals to release those data in reusable formats; as a result, data repositories, catalogues and portals have emerged worldwide. Such freely available data can be used to build useful applications that influence the value of the data, provide access to government services and support transparency (Lnenicka 2015). IBM’s Smarter Cities Challenge Program, held in 2013, developed a roadmap for the Helsinki open data strategy and pointed the way to support the evolution of the open data ecosystem. Municipalities, gamers, developers, citizens, students, government officials, scientists, businessmen, artists, analysts, journalists and educators were all listed as part of the open data ecosystem. Prior to this initiative, different engagement and business models already existed for open data strategies (source: http://citizenibm.com/2012/01/visualization-and-open-data-in-helsinki.html). Similarly, the first “Open Data” day in Nepal was held on February 23, 2013. To mark this event, various actors from different sectors took part in promoting open data in Nepal. These actors were active in cyberspace and contributed to the open data ecosystem in Nepal.

Internet use and infrastructure are growing in Nepal, as indicated by the Management Information System report of the Nepal Telecommunications Authority published in February 2016 (source: http://www.nta.gov.np/en/mis-reports-en). There were 12.20 million internet subscribers by mid-December 2015, representing 46.04 percent of the total population. The internet penetration rate thus doubled in only 5 years, from around 23 % in 2010.
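The reported figures can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted above; the variable names are illustrative, and the implied population is a derived value, not a figure taken from the NTA report.

```python
# Figures quoted in the text (NTA MIS report, February 2016).
subscribers_millions = 12.20   # internet subscribers, mid-December 2015
penetration_2015 = 46.04       # percent of total population
penetration_2010 = 23.0        # percent ("around 23 %" in the text)

# Total population implied by the two 2015 figures (derived, not reported).
implied_population_millions = subscribers_millions / (penetration_2015 / 100)

# Growth factor of the penetration rate over the five-year period.
growth_factor = penetration_2015 / penetration_2010

print(f"Implied population: {implied_population_millions:.1f} million")
print(f"Penetration growth 2010-2015: {growth_factor:.2f}x")
```

Under these figures the implied population is roughly 26.5 million, and the growth factor is almost exactly 2, which is consistent with the statement that the rate doubled.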

Few studies, however, have focused on network analysis, and none on the network analysis of Nepal’s websites and their web relationships with other actors, despite the rising internet penetration rate and high website self-ownership rate. Little systematic research has examined the key actors of open data in Nepal in terms of their web visibility and their relationships in the domestic and international contexts. This study provides insights into their online communication, networking patterns and collaboration in cyberspace. We trawled and analyzed 49 seed websites of government organizations (GOs), nongovernmental organizations (NGOs), international organizations (IOs), educational institutions (EIs), student groups, and some software companies that have contributed to realizing the open data ecosystem. This exploratory study uses quantitative methods to examine the structural aspects of title mentions, link behaviors and the representation of link destinations by country.

Webometric techniques were employed to represent the structural characteristics of inter-mention, co-mention, and URL-based link analysis in order to evaluate communication influences on the World Wide Web. According to Xu et al. (2015, 2016a, b), webometrics combined with social network analysis provides important evidence concerning social relationships among people, organizations, and/or nation-states. Further, Xu et al. (2016b) and Park and Leydesdorff (2013) define webometric network analysis as “an approach for visualizing and measuring positions and structures in a social network, which as reasoned above holds implications for the speed, scope and effect of cultural diffusions. In such networks, ties that connect individual members can indicate members’ friendship, exchange of ideas, and mutual involvement in activities.” In this regard, this study provides a webometric network analysis of the seed websites classified as open data actors in Nepal, thereby filling a gap in the literature. This study further contributes empirically and methodologically to current academic endeavors in network research on open data ecosystems in developing countries. In order to examine current open data trends in Nepal, we set the following research questions:

  • RQ1 What are the patterns and network structures of the Nepalese actors in the World Wide Web?

  • RQ2 To what extent do these actors’ portals broadcast their presence to the outside world?

2 Literature review

2.1 The open data context in Nepal

The open data idea entered Nepal in 2012. The World Bank and the Global Facility for Disaster Reduction and Recovery (GFDRR), in partnership with the government of Nepal, launched a project to build seismic resilience in the education and health infrastructure of the Kathmandu Valley. The first stage of the project was to create a disaster risk model to determine the most vulnerable buildings. At that time, there was no complete database of structures that included geographic coordinates as well as information on construction types and materials. The World Bank launched a project called Open Cities Kathmandu in November 2012 to collect data on schools and buildings. This project was influenced by the role of the disaster-related Open Street Map (OSM; an open-license map created and updated by volunteers with local knowledge) during the Haiti and Indonesia disasters. The project deployed mappers and community mobilizers to work on the OSM with the public. The team made the public aware of OSM use through field surveys, developed software to provide training, and gave presentations on the OSM. The Nepal government facilitated this process by obtaining a spreadsheet containing an incomplete list of 1701 schools in the Kathmandu Valley from the Department of Education. Similarly, lists of hospitals, health posts, and polyclinics were obtained from the Ministry of Health and Population (Soden et al. 2014).

Every ministry in Nepal conducted on-site, face-to-face interview surveys to obtain data in different fields, either every year or within specified time periods set by departmental policy. The data were then stored with the Central Bureau of Statistics of the National Planning Commission Secretariat of Nepal. Documents were stored in PDF format, and the public was allowed to freely obtain data from relevant ministries or the Central Bureau of Statistics (source: http://www.developmentgateway.org/assets/post-resources/understanding_government_data_use_in_nepal_final.pdf). Although the 1990 constitution acknowledged people’s access to information as a fundamental right, the Right to Information Act (RTI) only became effective in July 2007, when the Parliament of Nepal passed the Act to give the public the right to seek and receive information on any matter of public importance held by public agencies (Dahal 2010, 2011). Software Freedom Day (SFD) is a worldwide event celebrating the free and open spirit (source: http://np.okfn.org). In Nepal, SFD has been celebrated since 2005 by the Free and Open Source Software (FOSS) Nepal Community. The FOSS Nepal SFD celebrations in 2007, 2008, and 2009 were recognized as the best in the world. Various actors from GOs, NGOs, computer science and engineering student clubs, various IOs, and EIs worked hard to realize an open data community in Nepal. Similarly, a few companies emerged to build commercial applications using the data that were opened. Open data can be fostered and nurtured through ICT and internet use, because the internet is the medium through which data are collected, accumulated, interpreted, and displayed. Therefore, the state of open data can be examined by studying its presence on the World Wide Web (WWW).

2.2 Webometrics-related studies

Since Almind and Ingwersen (1997), many studies have employed webometric network analysis, for example, in different sectors of social media practices pertaining to different actors and organizations (Choi and Park 2014, 2015). As Meza and Park (2015) emphasize, webometrics is a combination of an analytical technique that extends traditional informetric methods to web-based transactions with quantitative methods for humanities and social science investigation. The unit of analysis includes webpages, parts of webpages, and entire sets of webpages. Meza and Park adopted the webometric method to trace information exchange patterns on Twitter. In a recent study, Meier (2016, p. 73, lines 1–4) argues that webometrics based on “hyperlink network analysis of international NGOs has proved a fruitful and promising method for the detection of network structures of global civil society. The combination of social and spatial network analysis shows a low level of interconnectedness between the NGOs and at the same time a strong spatial concentration of all embedded network actors.” The study by Hsu and Park (2012) focused mainly on comparing webpages between Korea and China and showed that Korean webpages were more interactive than Chinese ones because of differences in organizational culture. Kaya et al. (2010) used the webometric ranking methodology for world universities, research centers, and hospitals, and evaluated Turkey’s performance in the world based on webometric ranking data for these institutions. East Asia Research (EAR) of South Korea provided a network analysis for the 2004–2013 period, addressing paper production related to EAR, characteristics of authors and their affiliations, and the number of institutions dealing with EAR papers.
In addition, social network analysis was conducted to examine nodes, average geo distances, degree centrality, and betweenness centrality to determine patterns of collaboration between institutions (Park et al. 2016) and citation networks of papers published under EAR and indexed by the Korean Citation Index (KCI) (Park and Park 2015). Barnett et al. (2016) conducted a network analysis of the flow of students to different countries and the factors influencing this flow, and claimed that bilateral hyperlinks, communication variables, trade, physical distance, common language, and shared borders between countries influenced the flow. A specific study on open data was conducted in South Korea: the Open Public Data Directive, a long-term plan for the 2013–2017 period to make public data open, was examined through a semantic network analysis of the documents of 34 organizations (Jung and Park 2015). Webometric analysis of organizations had been limited to counting links to their websites. Thelwall and Sud (2011) introduced methods based on URL citations and title mentions of organizations, evaluated the differences between these two methods using 131 U.K. universities and 49 U.S. library and information science departments as case studies, and compared the usefulness of two search engines: Bing and Yahoo. They concluded that Bing’s Hit Count Estimates (HCEs) for title searches were not useful. Bing URL citation HCEs were consistent, whereas Yahoo’s HCEs for all three types of searches were consistent. In the case of URL counts, the exact counts for all three months were consistent between the two search engines. Similarly, four types of accuracy factors were examined. This study was extended by Thelwall et al. (2012) through the same case studies by representing URL citations and title mentions in binary and weighted, direct link and co-inlink network diagrams.
They retrieved HCEs, counts of matching URLs, and filtered counts of matching URL types from network connections. Metrics based on URLs or titles were considered the best option to replace hyperlinks in both binary and weighted networks, and the best results for co-title mention and co-URL citation networks were achieved using filtered counts of matching URLs. Webometric analysis using Bing Search API 2.0 was studied by Thelwall and Sud (2012). Raw data for webometric analysis are often collected from the API of a search engine. The most popular search engine, Google, is limited in terms of data affordability and software programming, and Yahoo’s free search API has closed. Therefore, Bing, accessed through Webometric Analyst 2.0, was the only freely available option for our research.

2.3 Open data topics in Nepal and developing countries

Davies et al. (2013) propose a conceptual framework on the emerging impact of open data in developing countries. They discuss the conceptual framework of open data, different domains of governance, emerging outcomes, and the various factors involved in developing policies and strategies to realize open data that can be employed for poverty reduction and sustainable development. Schwegmann (2013) described open data initiatives in developing countries and the actors facilitating open data. Civil society organizations and external partners of developing country governments have encouraged the use of open data to increase transparency, accountability, and citizen participation through the Open Government Partnership, which suggests that open data are being rapidly initiated in developed as well as developing countries. Nevertheless, some barriers to its spread remain, such as a lack of robust statistical systems, technical difficulties, low income levels, low literacy rates, limited awareness of its use, poor data quality, and limited data use. Both the opportunities and the threats of big data analytics for international development are studied by Hilbert (2016). He identifies similar barriers, such as infrastructure, economic resources and education in developing countries, which result in a “digital divide: a divide in the use of data-based knowledge to inform intelligent decision-making”. His work summarizes the policy options for maximizing the opportunities and minimizing the risks of big data.

The existing literature on open data in Nepal mainly focuses on transparency issues. The open data plans and budgets in Nepal for a future model of action-oriented research consist of a six-point program to integrate open data and the RTI through networking, capacity building, and a focus on common issues through dialogues and processes. Sapkota (2014, 2015) further extended this program and claimed that civil society was familiar with advocating transparency and accountability, that government support was emerging, and that a government-friendly legislative framework was in place. Furthermore, the community’s great interest in these issues with respect to open data could play a significant role in accountability, such that government and development work could become more transparent. Sapkota also pointed out barriers to the adoption of open data in Nepal in terms of socioeconomic indicators, arising from large social inequality and limited financial resources for data infrastructure and sharing. Other limiting factors were the low internet penetration rate, the lack of an open data policy, political and bureaucratic resistance to innovation, a high level of corruption, a culture of secrecy, limited demand for open data, and a lack of collaboration between open data and the RTI legislative framework. Despite these limiting factors, skilled, dedicated, and innovative technical and thematic experts in Nepal have been working hard to improve the country’s supply of open data and develop an ecosystem to use that data. The Open Data Barometer shows that citizen and civil rights readiness was the highest among all sectors in 2015 (source: http://opendatabarometer.org).

3 Methods

3.1 Data collection

The popular search engine Google and the Twitter search were used to select the seed sites. The Google search queries were “open data Nepal” and “open data day Nepal”. The search results included most of the NGOs, IOs and GOs that participated in the Open Data Day. These organizations’ homepages were individually visited and read in order to determine their significance in the open data field in Nepal. Similarly, a Twitter search was conducted using the hashtags #OpenDataDay Nepal and #OpenData Nepal through a Twitter account. The Twitter search highlighted many projects being carried out in the open data field in Nepal. We listed these actors and manually visited their websites to determine their importance in relation to open data projects in Nepal. We further classified them according to the concepts of Gonzalez-Zapata and Heeks (2015) and made them the seed sites of our research. Gonzalez-Zapata and Heeks (2015) identified different stakeholders in the context of open government data, and IBM’s Smarter Cities Challenge Program listed the various actors in the open data ecosystem. For our research, the stakeholders and actors in the open data ecosystem are defined as the individuals and organizations that contribute knowledge to open data projects, provide training or technical, financial or legal support, and influence the uses of open data, as well as those who can support and are impacted by open data projects. The literature review reveals the various actors involved in realizing open data in Nepal and their involvement in the Open Data Day celebration.

Table 1 lists the organizations, their dates of establishment, the sectors to which they belong, a description of their roles in the Open Data Networks (ODNs) in Nepal, their website addresses (URLs), and the country in which each URL is registered, as given by its country-code top-level domain (ccTLD).

Table 1 Description of the seed sites selection

3.2 The main concept

The aforementioned websites were analyzed using Webometric Analyst 2.0 (http://lexiurl.wlv.ac.uk) and the Bing API, which can run advanced Boolean searches and count the external websites linked to the sites under study in terms of URL citations. The lists of external websites matching the base query, i.e., the aforementioned websites, were thereby obtained. The analytical techniques are summarized in Table 2.

Table 2 Analytical techniques and concepts of Webometrics

4 Results and discussions

Figures 1 and 2 and Table 3 address research question 1; Fig. 3 and Table 4 address research question 2. Figure 1 presents a network diagram showing the strength of interlinkages between websites. The red dots and the arrows show the linkages between the URLs, whereas the green dots mark URLs without any interconnection. Each pair of website domain names or URLs was converted into a query in Bing syntax for those sites matching the URL. Bing estimated the number of URLs of pages matching the submitted queries. According to the results, www.openstreetmap.org and www.github.com occupy a central position. All international NGO sites are strongly interlinked, reflecting their active work and strong web presence. www.openstreetmap.org and www.github.com are the most central websites among the IOs. IOs such as www.ckan.org, www.okfn.org and www.openhandbook.org link to www.openspending.org. Government and educational websites have no interlinkages. www.ckan.org is lightly interlinked with www.opennepal.net and the government site. www.kathmandulivinglabs.org is interlinked with an NGO site and with the IO humanitarian OpenStreetMap data site www.hotosm.org, which may be due to the 2015 earthquake.
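The conversion of site pairs into search queries can be sketched as follows. This is an illustrative sketch only: the query template (a quoted URL combined with a `site:` restriction) is an assumption modeled on URL citation searches, not the exact syntax generated by Webometric Analyst 2.0, and the function names are hypothetical.

```python
from itertools import permutations

def strip_www(domain: str) -> str:
    """Drop a leading 'www.' so the query matches any subdomain."""
    return domain[4:] if domain.startswith("www.") else domain

def inter_mention_query(source: str, target: str) -> str:
    """Query for pages on `source` that mention the URL of `target`.

    The template is an assumed, URL-citation-style syntax.
    """
    return f'"{strip_www(target)}" site:{strip_www(source)}'

def build_queries(seeds):
    """One query per ordered (source, target) pair of distinct seed sites."""
    return {(s, t): inter_mention_query(s, t) for s, t in permutations(seeds, 2)}

seeds = ["www.openstreetmap.org", "www.github.com", "www.ckan.org"]
queries = build_queries(seeds)
for pair, query in queries.items():
    print(pair, "->", query)
```

Because the pairs are ordered, a set of 49 seed sites yields 49 × 48 = 2352 queries, one for each possible directed inter-mention.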

Fig. 1

A representation of inter-mentions, N = 49 (no of seed sites)

Fig. 2

A representation of co-mentions, N = 49

Table 3 Top 14 websites with the highest indegree and outdegree centralities

Fig. 3

Link impact analysis

Table 4 Seed site calculation

The interlinkage was further investigated to analyze the online networking patterns in different networking scenarios. While the network density for the directed (asymmetric) network is 0.021, the value for the undirected network rises to 0.030. Density refers to the number of relations (in terms of inter-mentions) divided by the maximum number of possible relations (Yoon and Park 2016). In a valued (weighted) network matrix, density is the average value of all cells. Next, ‘degree’ and ‘betweenness’ network centrality values are calculated. Degree centrality refers to the number of immediate ties that a node (i.e., website) has, rather than indirect ties to all others in the network (Yoon and Park 2016). Two degree centrality values, indegree and outdegree, are calculated according to the direction of the tie between two nodes. Betweenness centrality, on the other hand, measures how important a node is as a broker, that is, how often it lies on the paths connecting pairs of other nodes (Yoon and Park 2016). In this case, the number of shortest paths passing through the node is considered. Network metrics were calculated using options in Webometric Analyst 2.0.
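The density and degree definitions above can be illustrated on a small toy network. The sketch below is not the Webometric Analyst 2.0 implementation; the four-node edge list and all function names are hypothetical, and degree is computed as the sum of tie weights (an assumption suggested by the magnitude of the values in Table 3).

```python
def density(edges, n, directed=True):
    """Observed ties divided by the maximum possible number of ties."""
    if directed:
        ties = {(a, b) for a, b in edges if a != b}
        possible = n * (n - 1)
    else:
        # A tie in either direction counts once in the undirected network.
        ties = {frozenset((a, b)) for a, b in edges if a != b}
        possible = n * (n - 1) // 2
    return len(ties) / possible

def degree_centrality(weighted_edges):
    """Weighted indegree and outdegree per node."""
    indeg, outdeg = {}, {}
    for src, dst, weight in weighted_edges:
        outdeg[src] = outdeg.get(src, 0) + weight
        indeg[dst] = indeg.get(dst, 0) + weight
    return indeg, outdeg

# Hypothetical inter-mention counts: (source, target, number of mentions).
weighted_edges = [("A", "B", 3), ("B", "A", 1), ("A", "C", 2), ("C", "D", 5)]
pairs = [(s, t) for s, t, _ in weighted_edges]

directed_density = density(pairs, n=4, directed=True)
undirected_density = density(pairs, n=4, directed=False)
indeg, outdeg = degree_centrality(weighted_edges)

print(directed_density, undirected_density)
print(indeg, outdeg)
```

In this toy network the undirected density (3 of 6 possible ties) exceeds the directed density (4 of 12), the same direction of difference as reported above. For the 49 seed sites, the denominators would be 49 × 48 = 2352 directed pairs and 1176 undirected pairs.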

The top 14 websites with the highest indegree and outdegree centrality are presented in Table 3. The IO github had the highest value on both measures (indegree centrality = 1656 and outdegree centrality = 1841). The NGOs, GOs and EIs score comparatively low. Indegree centrality indicates the popularity of a website or of the information it contains, whereas outdegree centrality represents the website’s ability to provide or diffuse information.

In terms of betweenness centrality, the IOs www.github.com and www.okfan.org have high values (233 and 181, respectively). The local NGO www.opennepal.net has the highest betweenness centrality among all NGOs (167). The GO www.mof.gov has a betweenness centrality of 5. The data and technology company Young Innovations has the highest betweenness centrality (5) among all companies. Note that the betweenness centrality of a website shows the amount of control that this website exerts over the interactions of other websites in the network.
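To make the notion of brokerage concrete, betweenness centrality can be computed by counting shortest paths. The following is a minimal sketch using Brandes' algorithm on an unweighted, directed toy network; it is offered as an illustration under these assumptions and is not necessarily the procedure used by Webometric Analyst 2.0. The node names are hypothetical.

```python
from collections import deque

def betweenness(nodes, edges):
    """Brandes' algorithm for unweighted, directed betweenness centrality.

    `edges` is assumed to contain each directed tie at most once.
    """
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical network: H brokers all paths from A and B to C and D.
nodes = ["A", "B", "H", "C", "D"]
edges = [("A", "H"), ("B", "H"), ("H", "C"), ("H", "D")]
scores = betweenness(nodes, edges)
print(scores)
```

Here the broker H lies on all four shortest paths between the other sites (A to C, A to D, B to C, B to D), so it receives a score of 4, while every other node scores 0.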

Figure 2 shows the co-mention links of the sites. The green circles represent sites that are not co-mentioned with any other site, and the red circles represent co-mentioned sites. The width of each line is proportional to the number of co-linking pages. Educational sites, NGO sites with similar missions, and government sites are co-mentioned. www.kathmandulivinglabs.org and www.opennepal.net are co-mentioned 7 times; www.kathmandulivinglabs.org and www.quakemap.org, 36 times; www.kathmandulivinglabs.org and kathmandu.gov.np, 40 times; www.kathmandulivinglabs.org and www.openstreetmap.com, 91 times; and www.openstreetmap.org and www.ku.edu.np, 114 times. In addition, www.github.com and www.openstreetmap.org are well co-mentioned. Further, www.okfan.org, with a regional office in Nepal, and www.okfn.org are co-mentioned 485 times; www.okfn.org and www.ckan.org, 187 times; and www.okfn.org and www.ku.edu, 131 times. In terms of the co-mention analysis, NGOs, IOs, and EIs are well co-mentioned with each other by external sites. Organizations with the same purpose tend to be co-mentioned, presumably because external sites regard them as related. This suggests that different sectors were working together to realize open data in Nepal. The emerging data and technology companies were not co-mentioned, indicating their low web presence.
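The co-mention counts reported above amount to counting, for each pair of seed sites, the external pages that mention both. A minimal sketch of that counting step follows; the page URLs and mention lists are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_mention_counts(page_mentions):
    """Count, for each unordered pair of sites, the pages mentioning both.

    `page_mentions` maps an external page URL to the seed sites it mentions.
    """
    counts = Counter()
    for mentioned in page_mentions.values():
        # Sort and de-duplicate so each pair is counted once per page.
        for pair in combinations(sorted(set(mentioned)), 2):
            counts[pair] += 1
    return counts

# Hypothetical external pages and the seed sites each one mentions.
pages = {
    "http://example.org/post-1": ["okfn.org", "ckan.org"],
    "http://example.org/post-2": ["okfn.org", "ckan.org", "github.com"],
    "http://example.net/news": ["github.com"],
}
counts = co_mention_counts(pages)
for pair, n in counts.most_common():
    print(pair, n)
```

The resulting counts correspond to the line widths in a co-mention diagram: a pair that appears together on more external pages is drawn with a thicker line.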

Link analysis was conducted using Webometric Analyst 2.0. The Bing API returned the domains of external sites linked to the seed sites. who.is was used to determine the country to which each domain belonged. In addition, www.getip.com and www.ip-api.com were used when TLDs were not reported by who.is. Individual searches were performed for the .com, .org, .net, .info, .asia, and .edu TLDs. Link impact analysis shows web popularity on the destination side. We further split the results to examine them in domestic and international contexts. This may illustrate the collaboration or communication patterns between our seed sites and websites in different countries. The circles in Fig. 3 represent seed sites: red represents the U.S.; green, Nepal; purple, the U.K.; and orange, Australia. The blue squares represent different countries, such as dk = Denmark, kg = Kyrgyzstan, au = Australia, cm = Cameroon, mx = Mexico, np = Nepal, in = India, ru = Russia, th = Thailand, dz = Algeria, rs = Serbia, tr = Turkey, ca = Canada, no = Norway, pa = Panama, fr = France, es = Spain, jp = Japan, ht = Haiti, ga = Gabon, id = Indonesia, nl = Netherlands, and it = Italy. The results are shown for data values ≥ 10.
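The country assignment described above can be sketched as a two-step rule: use the ccTLD when there is one, and fall back to a registry lookup for generic TLDs. In the sketch below, `registry_lookup` is a hypothetical dictionary standing in for the manual who.is checks, and the example URLs, function names and counts are invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Generic TLDs for which the country must be resolved separately.
GENERIC_TLDS = {"com", "org", "net", "info", "asia", "edu"}

def host_of(url: str) -> str:
    """Extract the lowercase hostname, with or without a URL scheme."""
    return (urlparse(url if "//" in url else "//" + url).hostname or "").lower()

def country_of(url: str, registry_lookup: dict) -> str:
    """Return the ccTLD, or the looked-up country for generic TLDs."""
    host = host_of(url)
    tld = host.rsplit(".", 1)[-1]
    if tld in GENERIC_TLDS:
        return registry_lookup.get(host, "unknown")
    return tld

def countries_above(urls, registry_lookup, threshold=10):
    """Countries whose link count reaches the display threshold."""
    counts = Counter(country_of(u, registry_lookup) for u in urls)
    return {c: n for c, n in counts.items() if n >= threshold}

# Hypothetical linking pages for one seed site.
urls = (["http://a.example.np/x"] * 12
        + ["www.example.dk"] * 3
        + ["https://example.com/page"] * 10)
registry_lookup = {"example.com": "us"}  # stand-in for a who.is result
shown = countries_above(urls, registry_lookup)
print(shown)
```

With the threshold of 10 used for Fig. 3, the three links from the .dk domain fall below the cut-off and are excluded from `shown`.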

Therefore, the seed sites (circles) and countries (squares) aligned to the left are those whose data values are less than 10. Most of the seed sites point to U.S. domains, followed by the ccTLDs of their origin, except for the .au seed sites. For the .au domain, one seed site is null, and the other sites point mainly to the U.S., the U.K., and Canada. Nepal seed sites point to countries such as India, Japan, China, Panama, Russia, Mexico, Spain, and France. U.K. seed sites also point to African countries such as Cameroon and Gabon, as well as to Serbia. The U.S. seed sites mostly point to the .us domain. These results illustrate the relationships among countries on the web and can be used as a tool to measure the popularity of the seed sites in those countries. From them, the web relationships at the country-domain level can be inferred.

5 Conclusions

This study focuses on the webometric dimension of open data portals and portals of other sectors in Nepal that are fostering open data in the country, as well as on the international community that is helping to develop this open data ecosystem. We explore the structure of these portals in terms of the source and the destination. The source perspective considers the URL-based hyperlinks or title mentions among the seed sites, and the destination perspective considers the third-party sites that cite pairs of seed sites. Our research represents the structure of inter-organizational and countrywide relationships. According to the results, open data portals are cooperating with international portals, although not as strongly as international portals cooperate among themselves; that is, the cooperation between the NGOs, GOs, educational institutions and data and technology companies is weak. In the co-mention network, national websites are well connected and have strong cooperation with IOs. In some cases, organizations with a similar working nature are well co-mentioned. In addition, diverse organizations such as NGOs, IOs, GOs, and educational institutions are also co-mentioned. Most of the NGOs, GOs, and emerging software companies have weak inter-mention relationships. The link impact analysis shows that seed sites of the U.S., the U.K., and Nepal point mainly to their country of origin, but that they are also present in different countries.

The findings of this study have several implications for web presence and web relationships on the World Wide Web. Websites with more links are more popular. However, our research revealed many weak relations among the seed sites. These organizations can use the results to address such deficiencies, which makes this kind of analysis a useful tool for webmasters. The web is a powerful tool for developing relationships among domestic and international communities. Our research shows the organizations’ present relationships and the countries of interest on the destination side. Based on the results presented herein, organizations can plan future relationships and identify countries and organizations of interest. This study will help managers and policy makers gain a structural understanding of online networking, collaboration and communication patterns, both domestically and internationally. The study suffers from a number of limitations due to the lack of complex numerical analysis. First, we manually interpreted the purpose of the organizations rather than analyzing them in the online networking domain, manually selected the organizations from the search engine, and listed the most important ones as seed sites because of Bing’s limited queries. Second, some of our seed sites are not purely of one type (i.e., open data) but are multifunctional and have made some contribution to the field of open data. Third, although our study focused on open data in Nepal, we also incorporated the IOs, which may initially create confusion, while pointing out their importance to open data in Nepal. Fourth, the data collection relied on third-party software and the commercial search engine www.Bing.com, which raises issues of reliability and validity. However, the search results from Bing obtained using Webometric Analyst 2.0 are known to be quite similar to those from Google in terms of HCEs and relational patterns (Thelwall 2012, 2014). The effectiveness, validity, detail, and extent of Bing’s search results compared to Google’s, despite the latter’s dominance, are beyond the scope of the current research. Further studies should address these limitations. The importance of online relationships, collaboration and cooperation patterns must be further empirically investigated. This research could also be expanded to the usage and application of open data. A final, minor limitation is the issue of causality (Jung 2016). The webometric network analysis in the current research cannot specifically determine what factor influences what outcome, and it is difficult to conduct causality analysis using the social network method. Instead, the current research has emphasized the importance of relational patterns of interorganizational collaboration in cyberspace.