Introduction

In recent years, spatial information has become a crucial element of the decision-making environment. It is estimated that more than 80% of the information used in decision-making has a spatial component (Klinkenberg 2003). The adoption of advanced technologies has made spatial data creation easy, and hence many organizations create and maintain their own spatial data. Spatial applications with diverse data requirements can rarely obtain all of their data from one place; they require the collection and integration of spatial data from different sources. The fragmentation of spatial data custodians causes diversity in the policies, standards and tools used to manage and share multi-source spatial data (Sjoukema et al. 2017, 2020; Kotsev et al. 2020). Moreover, this way of creating, collecting and integrating multi-source spatial data leads to many technical and non-technical issues in decision-making. The major technical issues identified in spatial data integration concern data specifications, data models, technical standards, metadata standardization, data attribution and tools. The non-technical issues involve diversity in pricing, incoherent sharing policies, legal disputes and social issues.

To overcome these technical and non-technical barriers, a holistic framework is required to manage and integrate multi-source spatial data. Spatial Data Infrastructure (SDI) is considered such a framework for sharing and integrating multi-source spatial data on a common platform (Rajabifard 2001; Cooper et al. 2012; Maphale and Smit 2021). According to Coleman and McLaughlin (1998), SDI encompasses the policies, technologies, standards and human resources necessary for the effective collection, management, access, delivery and utilization of spatial data in a global community. SDI increases awareness of the use of spatial data and cooperation between decision-makers and stakeholders. SDI can be created at different levels of a hierarchy, from local (i.e., city) to state, national, regional and global. The city level SDI is the base level of the hierarchy and can be integrated upwards with SDIs at higher levels. During the past several years, high adoption and utilization of SDI has been reported by different countries and spatial agencies (Niaraki et al. 2010; Masó et al. 2012; Ali and Imran 2019; Ronzhin et al. 2019; Oskouei et al. 2019). An SDI adopted at country level is referred to as a National Spatial Data Infrastructure (NSDI). In the early 1990s, the Federal Geographic Data Committee (FGDC) initiated the NSDI program for the United States (Clinton 1994). During the same period, the Open Geospatial Consortium (OGC) also took up spatial data sharing initiatives and presented interoperable service standards for spatial data sharing and metadata discovery (Yang et al. 2007; Xia et al. 2015; Agrawal et al. 2022). The most commonly used OGC services are Web Map Service (WMS) for maps, Web Feature Service (WFS) and Web Coverage Service (WCS) for data, Web Processing Service (WPS) for data processing and Catalogue Service for the Web (CSW) for metadata sharing.

With the advancement of technologies in the field of Geographic Information System (GIS) and of the World Wide Web Consortium (W3C), many countries have taken the initiative to develop a web GIS based framework as a ubiquitous part of their NSDI (Zhang and Tsou 2009). This concept of a web based NSDI ensures the standardization, interoperability and sharing of spatial datasets. It provides an interactive web based interface for spatial data discovery and access through a resource search engine, which is an essential component of a spatial web based infrastructure (Zhang and Tsou 2009). For this purpose, a three-tier client–server architecture consisting of a data layer, an application layer and a client layer is commonly adopted for developing web GIS based frameworks. In fact, most web based SDI applications are built via OGC compliant services using Open Source Software (OSS) (Willmes et al. 2014; Gupta et al. 2022).

In the recent past, several organizations have developed their own SDI at different levels, from local to country level. Geodata Versatile Information Transfer environment (GeoVITe) is an academic SDI for Zurich, Switzerland, which provides data visualization and download for academia (Matthys and Gkonos 2017). The Global Earth Observation System of Systems (GEOSS) of the U.S. facilitates the discovery and access of Earth Observation (EO) data from a metadata catalogue (Liu et al. 2011). Wang and Liu (2009) presented a TeraGrid GIScience Gateway implementation based on Web 2.0 technology, which adopts Service Oriented Architecture (SOA) for parallel and distributed data processing. The Participatory Geographic Information Systems for Transportation (PGIST) was developed for public participation in transportation and for decision-making in Washington State, U.S. (Nyerges et al. 2006). Many other web based SDI examples can be seen in different spatial research domains such as agriculture (Chen et al. 2015), disaster assessment (Mazzetti et al. 2016), land (Xing et al. 2015, 2019), urban planning (Zhang et al. 2019a) and energy (Trumpy and Manzella 2017). These SDI initiatives all over the world show that there is a need for a robust web GIS based SDI framework using OSS and OGC compliant services for the efficacious implementation of SDI from local to country level in both developed and developing countries.

A large and diverse amount of multi-source data can be made available over SDI. However, SDI faces the challenges of concurrent data access, computational intensity and high workload due to big data and the intensive resource demand of geoscience applications (Xia et al. 2015; Gkonos et al. 2019; Jiang et al. 2019). To improve SDI performance in terms of processing, indexing, querying and handling concurrent requests, numerous technologies have been adopted (Knoblock et al. 2017), including High Performance Computing (HPC) systems such as Graphics Processing Units (GPU) (Zhang et al. 2014) and the Hadoop Distributed File System (HDFS) (Shvachko et al. 2010; Tripathi et al. 2018). Several workload balancing methods for spatial analysis have been proposed to deal with the scheduling of service requests, but they require high computational resources (Huang and Li 2009; Thulasidasan et al. 2010; Huang et al. 2011). The integration and configuration of these resources is not only costly, but their customization is also very difficult (Huang et al. 2018; Mokarram and Khosravi 2021). Since upgrading and/or customizing computing resources with these technologies has its own advantages and limitations, outsourcing to computing resources hosted by other enterprises is required so that organizations can concentrate on data, user and computational aspects rather than on system configuration.

Cloud computing is one such technology. It provides convenient, on-demand network access to shared computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell and Grance 2011). Several research articles on web based spatial data applications in the cloud are available (Lee and Liang 2011; Tang and Feng 2017; Huang et al. 2018; Tripathi et al. 2020; Mete and Yomralioglu 2021). Zhang et al. (2019b) provide a cloud enabled WPS framework to solve Earth science problems, especially for the Community Multi-scale Air Quality (CMAQ) model. The framework used the GeoBrain cloud with an Infrastructure as a Service (IaaS) architecture from Apache CloudStack to create virtual machines (VMs) for the CMAQ-WS. Huang et al. (2018) proposed a Hybrid Cloud Computing (HCC) platform that uses Eucalyptus as the private cloud platform and Amazon EC2 as the public cloud platform to test dust storm forecasting. Wang et al. (2016) proposed an HPC framework that adopts the OpenStack private cloud to manage and process spatial big data. Kang and Lee (2016) performed image processing using an auto-scale enabled private cloud.

The above studies demonstrate that cloud computing is an effective solution for spatial applications. Virtualization, the abstraction of computing resources such as storage, network, database and memory from the end-user, is the key technique that fosters cloud computing (Xing and Zhan 2012). The cloud has many characteristics that meet end-users' requirements, such as on-demand service, resource pooling, rapid elasticity, pay-per-use and broad network access (Vouk 2008; Zhang et al. 2010; Mell and Grance 2011; Yang et al. 2011). Rapid elasticity is one of the core characteristics of the cloud; it enables the user to scale the computing resources up or down based on the application requirement (Herbst et al. 2013). Thus, cloud computing provides easy access to computing resources, customization of those resources, a distributed computing environment for large scale spatial datasets and a fault-tolerant computing platform. However, little research is available on the adoption of cloud computing in SDI applications. Further, the private cloud, which is free and for which many cloud enabling tools are available (Sotomayor et al. 2009), has seen only limited adoption for the deployment and testing of spatial applications (Nurmi et al. 2009; Kang and Lee 2016; Eken and Sayar 2019). The authors, to the best of their knowledge, could not find any research on web GIS based SDI development at city level using open standards for multi-source geospatial data and related applications in an open source based private cloud computing environment.

This research is a step towards filling the aforesaid research gap by creating a city level SDI framework and deploying it on OpenStack, an open source private cloud solution. The main aim of the present research is to develop and implement a cloud based multi-tier framework for city level SDI for Prayagraj city (acronym: GeoCloud4SDI). Accordingly, a novel web GIS based framework for an open and efficient GeoCloud4SDI is developed and implemented on the private, open source cloud. The framework adopts a multi-tier architecture and cloud enabled SDI services and workflow, using load balancing and an elastic computing architecture for spatial data discovery, access and processing.

The paper starts with a general introduction to the research background, which helps in identifying the research gaps and framing the research objectives. The Methods section gives details of the study area, the data used and the geospatial database developed for the SDI, and then discusses the architecture of GeoCloud4SDI. The Results and Discussion section covers the implementation, performance and functionalities of GeoCloud4SDI. The Conclusion section presents the salient research outcomes.

Methods

Study area

Prayagraj city (formerly Allahabad city) is located in the north-central region of Uttar Pradesh, India, and is a participant in the Smart City Mission of the Ministry of Urban Development (MoUD), Government of India. The city aspires to become a smart city in the true sense and to achieve sustainable infrastructure and economic growth through inclusive development. The geographical extent of Prayagraj city is 25°23′08″ N to 25°32′11″ N latitude and 81°43′29″ E to 81°54′11″ E longitude, covering an area of 76.29 km². The total population of the city is 1,112,544 and the city is divided into 80 wards as per Prayagraj Municipal Corporation. The map of Prayagraj city with ward boundaries is shown in Fig. 2.

Data used

The data used include: (a) Topographic maps: Open Series Map (OSM) numbers G44P11, G44P14 and G44P15 at 1:50,000 scale with World Geodetic System-1984 (WGS-84) datum and Universal Transverse Mercator (UTM) projection; (b) Ward map of Prayagraj city; (c) Census data; (d) Sentinel-2A satellite imagery of April 04, 2019 at 10 m spatial resolution; (e) Land Use and Land Cover (LULC) map of Prayagraj city created using Sentinel-2A data; (f) Infrastructural facility data generated through a field survey with a Trimble Juno 3B handheld GPS at an accuracy level of 3–5 m. The data from (a) to (d) are public. The data (e) and (f) are private and were generated specifically for this research, but are also made available to the public and decision makers for better planning of Prayagraj district.

Geospatial database development for SDI

The applicability of any SDI depends on the accuracy and comprehensiveness of the geospatial database created for it. Accordingly, a clean, accurate and comprehensive geospatial database using a common spatial framework is created for GeoCloud4SDI following the methodology shown in Fig. 1. The spatial and non-spatial databases are created first and are then integrated through a common code/identifier. The ward boundary map of Prayagraj city, including the cantonment area, prepared in this way is shown in Fig. 2. The datum used is WGS-84 and the projection is UTM zone 44N. This map is used as the base map for the creation of the other thematic maps of the study area.
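The integration of spatial and non-spatial records through a common identifier can be illustrated with a minimal sketch. The ward codes, field names, geometry strings and values below are hypothetical placeholders chosen for illustration; they are not records from the GeoCloud4SDI database, and the actual integration may well have been done inside the GIS or database software.

```python
# Minimal sketch: attach non-spatial attributes to spatial records through a
# shared identifier. All ward codes, fields and values are hypothetical.

def join_by_code(spatial_rows, attribute_rows, key="ward_code"):
    """Return spatial rows enriched with attribute rows that share the key."""
    lookup = {row[key]: row for row in attribute_rows}
    joined = []
    for row in spatial_rows:
        merged = dict(row)
        # Spatial rows without a matching attribute record are kept unchanged.
        merged.update(lookup.get(row[key], {}))
        joined.append(merged)
    return joined


wards = [
    {"ward_code": "W01", "geometry": "POLYGON(...)"},  # placeholder geometry
    {"ward_code": "W02", "geometry": "POLYGON(...)"},
]
census = [{"ward_code": "W01", "population": 12000}]  # hypothetical value

result = join_by_code(wards, census)
```

Keeping unmatched spatial rows, rather than dropping them, makes gaps in the non-spatial data visible during database cleaning.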

Fig. 1

Methodology adopted for geospatial database development for SDI

Fig. 2

Prayagraj City Map with ward boundaries

The LULC map of Prayagraj city is generated using Sentinel-2A imagery of 10 m spatial resolution acquired on April 04, 2019. For this purpose, Sentinel-2A band 2 (458–523 nm, blue), band 3 (543–578 nm, green), band 4 (650–680 nm, red) and band 8 (785–899 nm, NIR) are layer stacked using Erdas Imagine software. After layer stacking, the imagery of Prayagraj city is extracted using the Clip operation. Image classification is then performed to classify the image into LULC classes, namely built up, vegetation, open land, sand, transportation and river, using the Maximum Likelihood Classifier (MLC), the most commonly used conventional classifier for supervised classification. The overall classification accuracy achieved is 89.06% with a kappa coefficient of 0.8708 (87.08%), which gives high confidence in the classification. The classified LULC map of Prayagraj city is shown in Fig. 3.
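For readers unfamiliar with the two accuracy measures, the following sketch shows how overall accuracy and the kappa coefficient are commonly derived from a confusion matrix. The 2 × 2 matrix is a hypothetical toy example, not the confusion matrix of this study (which has six LULC classes and is not reproduced here).

```python
# Minimal sketch: overall accuracy and Cohen's kappa from a confusion matrix.
# Rows are reference classes, columns are classified classes.
# The matrix below is a hypothetical toy example, not the study's data.

def overall_accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    p_observed = overall_accuracy(matrix)
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


confusion = [
    [45, 5],
    [10, 40],
]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
```

Because kappa discounts chance agreement, it is never higher than the overall accuracy for the same matrix.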

Fig. 3

LULC map of Prayagraj city (04–04-2019)

Thematic maps of infrastructural facilities and places of tourist interest are then created and are shown in Fig. 4 (a) to (h). These thematic maps are (a) Hospital map [Fig. 4(a)], (b) Police station map [Fig. 4(b)], (c) Fire station map [Fig. 4(c)], (d) Religious place [Fig. 4(d)], (e) Public buildings [Fig. 4(e)], (f) Tourist place [Fig. 4(f)], (g) Road network [Fig. 4(g)] and (h) Railway network [Fig. 4(h)]. Each of these thematic maps is stored as a separate thematic layer in the geospatial database created for SDI development and implementation.

Fig. 4

Thematic maps of Prayagraj city showing (a) Hospital map, (b) Police station map, (c) Fire station map, (d) Religious place, (e) Public buildings, (f) Tourist place, (g) Road network and (h) Railway network

Architecture of cloud enabled SDI (GeoCloud4SDI)

The framework of the high performance cloud enabled SDI, conceptualised and developed to address spatial data access and processing, is shown in Fig. 5. The spiral system life cycle model is used for the development of GeoCloud4SDI. It starts with requirement specification, followed by analysis, design, implementation and testing. In the first phase, the geospatial database for SDI is created. In the second phase, it is uploaded to the GIS server. Then, OGC compliant web services for GeoCloud4SDI are added. In the final phase, GeoCloud4SDI is deployed on the OpenStack cloud. The deployed GeoCloud4SDI can then be accessed from anywhere in the world by any user with an internet connection.

Fig. 5

Four-tier architecture developed for cloud enabled SDI

GeoCloud4SDI is based on a multi-tier architecture that includes four layers, namely, (i) physical layer, (ii) cloud services layer, (iii) geospatial services layer and (iv) client layer. The physical layer and the cloud services layer together can also be termed the Cloud Virtual Layer. The cloud virtual layer enables access to virtualized computing resources from the cloud. These virtualized resources are utilized to enhance the performance of scientific applications, including the development and implementation of city level SDI.

Physical layer

This layer implements the cloud infrastructure that forms the data center required by the SDI under development. OpenStack is used to create the IaaS environment that manages the network, storage and VMs. OpenStack provides VMs to users through virtualization. Several hypervisors are available to support server virtualization in the OpenStack cloud environment, such as the Xen hypervisor (Barham et al. 2003), Kernel-based VM (KVM) (Kivity et al. 2007) and VMware (Ali et al. 2012). In this architecture, the VMs are managed by the KVM hypervisor. The cloud services are managed through a web based dashboard in this study. The core components of OpenStack used to manage the cloud resources are described in Table 1.

Table 1 Description of relevant OpenStack cloud components used

After installing and configuring OpenStack, a new VM, volume and network are set up. First, a new VM instance is launched from an Ubuntu 16.04 server image in QCOW2 format. This instance is booted in the cloud and serves as the master node, which is used to set up the computation infrastructure for the SDI. Next, an OpenStack volume is created to store the multi-source spatial data; it holds both the remote sensing and the facility data of the geospatial database. The volume is set up separately from the master node for two reasons: the data can be restored from the volume when the master node fails, and the volume can be extended from its present size of 1 GB to an acceptable size, based on the system specifications, as the data grow. The present setup procedure therefore makes the system fault-tolerant and also resolves the problem of storage.

Next, the network and firewall are configured so that instances can communicate with each other and with the external network. This setup gives the developer full control over the system. The settings are applied to the instances by defining access and security rules. Further, the SSH (Secure Shell) protocol, a remote access method, is used to gain access to the master instance externally. Once the SSH connection is established, the developer logs in to the master node to set up the environment that hosts the SDI application.

First, the data directory, which includes all the input data (i.e., multi-source spatial data) and output data (i.e., processed data) for the proposed SDI, is mounted on the created volume. Next, the software environment for the web based geospatial application is built by installing different OSS such as GeoServer (Huang and Xu 2011), GeoNetwork (Ticheler and Hielkema 2007), Apache Tomcat (Vukotic and Goodwill 2011) and PostgreSQL/PostGIS. Apache Tomcat is used as the web server and acts as a gateway for the user to interact with the application environment. GeoServer acts as the GIS server where all the spatial data layers are published. The PostgreSQL/PostGIS database stores all the multi-source spatial layers created.

Cloud services layer

This layer controls access to the computing resources of the physical layer and acts as middleware between the physical layer and the geospatial services layer to customize the computing environment for different spatial applications. The user performs resource queries to search and access cloud resources such as VMs, VM images and networking services for VMs, which are provisioned dynamically through the OpenStack IaaS service. For this purpose, a snapshot of the master instance is created. This snapshot serves as an OpenStack image from which the other SDI instances are launched as computing nodes, without setting up the computing environment again. These computing nodes share the volume data directly as local storage through a mount operation.

After this, the computing resources are operated for load balancing, auto-scaling and cluster monitoring. Workload balancing distributes the load across multiple computing resources. Auto-scaling is achieved through elastic computing, a fundamental feature of the cloud environment, which scales the VMs based on a predefined threshold. The current status of the computing resources is tracked through the cluster monitoring functionality. Such a system provides a high performance computational environment with system and data recovery, and also reduces the duplication of effort needed to maintain the system for other users or organizations.

Geospatial services layer

This layer provides two different functionalities, namely, (a) the management, retrieval and access of spatial data and metadata, and (b) the retrieval of cloud services from the cloud services layer. GeoServer and GeoNetwork are used for geospatial data and metadata management respectively. Both tools provide OGC compliant services for this purpose. The WMS and WFS services enable interoperable data management and CSW enables catalogue services for metadata management. The WPS service handles the data processing requests of the end-user.
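As an illustration of how a client addresses such a data service, the sketch below assembles a WFS 2.0.0 GetFeature request in key-value form. The host, port and layer name (`sdi:hospitals`) are hypothetical placeholders, and the JSON output format is assumed to be enabled on the server, as it commonly is in GeoServer; none of these values are taken from the GeoCloud4SDI deployment.

```python
# Minimal sketch: build a WFS 2.0.0 GetFeature URL with standard KVP
# parameters. Endpoint and layer name are hypothetical placeholders.
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, type_name, max_features=10):
    """Return a GetFeature URL that asks for a limited number of features."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,       # layer to query (workspace:layer)
        "count": max_features,        # WFS 2.0 limit on returned features
        "outputFormat": "application/json",
    }
    return f"{base_url}?{urlencode(params)}"


url = wfs_getfeature_url("http://localhost:8080/geoserver/wfs", "sdi:hospitals")
```

The same pattern applies to the other OGC services: only the `service`, `request` and service-specific parameters change.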

Client layer

This is the topmost layer of the cloud enabled SDI framework and acts as an interface between users and services. In this layer, the SDI application through which clients request the different services is developed. For this development, different languages, tools and packages such as HyperText Markup Language (HTML), Java Server Pages (JSP), Cascading Style Sheets (CSS), JavaScript and OpenLayers are used. Clients can make different spatial requests such as access, discovery, processing and visualization from this layer, from either desktop or mobile devices. These requests are then forwarded to the geospatial services layer, which returns the corresponding result to the client using cloud services.

Results and discussion

Implementation of GeoCloud4SDI

To build the cloud environment, a high-end computer system with an Intel(R) Core(TM) i7-2600 CPU, 16 GB of memory and 500 GB of storage is used. The system configuration and specification of OpenStack are presented in Table 2.

Table 2 System specifications for building OpenStack cloud environment

After installing and configuring OpenStack, a prototype model is developed in the cloud. This requires accessing and managing the VMs, installing software, creating and maintaining the spatial database and developing the GeoCloud4SDI geoportal. GeoCloud4SDI is deployed according to the proposed methodology. The basic workflow of the cloud enabled SDI framework includes two domains, the User domain and the Cloud domain (Fig. 6).

Fig. 6

Cloud enabled SDI Services and Workflow

The User domain acts as Software as a Service (SaaS) to deliver different types of infrastructure facility information to the end-user by utilising different OGC services. Users can use these facilities for purposes such as visualization, querying, access and download. The Cloud domain hosts the smart facility services in the cloud. Requests for processing services are fulfilled from this domain through the IaaS. The resource manager is responsible for identifying the resource requirements and allocating resources for fast processing. The Cloud domain has two major components: the Cloud Load Balancer, which handles large numbers of user requests, and the Elasticity Manager, which manages the automatic provisioning of resources by scaling up or down. The load balancing and auto-scaling used for the SDI instances are hosted in the OpenStack cloud region. These components communicate with each other to provide a distributed environment and to monitor the user workload.

User domain services

The user domain provides the services of visualization, query and download. Advanced mapping tools and visualization techniques hosted on the cloud help users to customize their applications. Several studies have integrated mapping tools and services for spatial visualization. For example, Barik et al. (2019) developed a prototype model to compute and visualize spatial data using the QGIS cloud plugin interface. Huang et al. (2018) developed an HCC platform to process dust storm data and visualize it through GeoViz. Li et al. (2017) developed a web based visual analytic system integrated with the cloud to analyse and visualize spatial model results. Fujioka et al. (2012) used GeoServer as a mapping service interface to visualize marine life census results maintained in the cloud. In this research, the visualization tool helps users to access the necessary facility datasets of Prayagraj city.

In GeoCloud4SDI, the user domain functions respond directly to user requests. The Apache Tomcat server receives the request and sends it to GeoServer. GeoServer retrieves the data from the PostgreSQL/PostGIS database and returns the result to the user. The visualization service is provided through WMS, which dynamically generates spatially referenced map images that are displayed with the help of OpenLayers and JSP code. A query service is also added to GeoCloud4SDI, with which the user can apply a condition to the data and view the filtered result. Spatial queries are performed through Common Query Language (CQL). The result of a query is returned to the client as a WMS response, which is rendered on the client screen. The download functionality lets users download the data for use in other applications or share it with other users.
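A filtered map request of this kind can be sketched as follows. The example assumes a GeoServer endpoint, where `CQL_FILTER` is available as a vendor-specific WMS parameter; the host, the layer name `sdi:wards` and the attribute `ward_no` are hypothetical, and the bounding box is only a rounded approximation of the study area extent given in the Study area section.

```python
# Minimal sketch: WMS 1.1.1 GetMap request with a GeoServer CQL_FILTER.
# Endpoint, layer and attribute names are hypothetical placeholders.
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, cql_filter=None, size=(768, 768)):
    """Return a GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",                 # empty value selects the default style
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": size[0],
        "height": size[1],
        "format": "image/png",
    }
    if cql_filter:
        # GeoServer vendor parameter: only matching features are rendered.
        params["CQL_FILTER"] = cql_filter
    return f"{base_url}?{urlencode(params)}"


prayagraj_bbox = (81.72, 25.38, 81.91, 25.54)  # approximate extent
map_url = wms_getmap_url(
    "http://localhost:8080/geoserver/wms",
    "sdi:wards",
    prayagraj_bbox,
    cql_filter="ward_no = 12",
)
```

In a browser client such as OpenLayers, the same parameters would typically be passed to a WMS layer source rather than assembled by hand.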

Cloud domain services

Cloud load balancer

In the cloud environment, several VM instances are created to respond to multiple incoming requests. Distributing these requests across multiple instances increases the performance of the system. The cloud load balancer is the essential cloud computing functionality that performs this distribution. The cloud load balancer used in OpenStack in the present study is shown in Fig. 7. The benefits of a load balancer are the handling of a large number of requests, reduced response time, improved data access time and a fault-tolerant environment.

Fig. 7

OpenStack cloud load balancer architecture

In this architecture, different instances are called from the cloud service provider end. The resource manager is responsible for creating the VMs. For this purpose, it first checks the availability of resources in terms of storage capacity, operating system images, previously created VMs and private IPs. GeoCloud4SDI instances are then created according to the available resources. A public IP is assigned to the instances so that they can communicate with the external network. The network access policy provides communication protocols such as TCP, ICMP, HTTP, HTTPS and SSH to the instances. All these instances are exposed as a single access point for GeoCloud4SDI and are managed by the resource manager. Because the instances are called through the same IP address, the same public IP is assigned to all of them; this public IP is the address of the system on which all the instances are running. The primary task of the resource monitor is to periodically check the current status of the instances, such as CPU and memory utilization and mean response time. When a user requests the OGC services, the resource monitor forwards the request to the scheduler, which places it in the queue. In the ready state, the requests are distributed to the GeoCloud4SDI instances based on their current status, i.e. to those with the minimum workload.
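The dispatching rule described above, sending each queued request to the instance with the minimum workload, can be sketched in a few lines. This is a simplified illustration and not the OpenStack load balancer itself: workload is reduced to a single count of active requests, whereas the resource monitor in the paper also considers CPU, memory utilization and mean response time. The instance names are hypothetical.

```python
# Minimal sketch of least-loaded dispatching. Workload is approximated by the
# number of active requests; instance names are hypothetical.
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    active_requests: int = 0


def dispatch(instances, n_requests):
    """Assign each queued request to the currently least-loaded instance."""
    assignments = []
    for _ in range(n_requests):
        # min() returns the first instance among those with equal load.
        target = min(instances, key=lambda inst: inst.active_requests)
        target.active_requests += 1
        assignments.append(target.name)
    return assignments


pool = [Instance("sdi-1", 2), Instance("sdi-2", 0), Instance("sdi-3", 1)]
order = dispatch(pool, 4)
```

After the four requests are assigned, the load is spread more evenly than at the start, which is the effect the load balancer aims for.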

Cloud elastic service

The cloud elastic service provides auto-scaling of cloud resources (Fig. 8). It helps to scale the system as per the user requirements. The elastic service is mainly utilized to perform two important tasks. The first task is the creation of instances based on the available computing resources and application requirements; this resource assignment is done through automatic provisioning. The second task is to automatically increase or decrease the number of instances to perform load balancing. The elasticity monitor continuously records the status of the running instances based on the load on the server. If an alarm is triggered by the Aodh and Ceilometer components of OpenStack, the elasticity monitor communicates with the load balancer and the resource manager to control the overhead on the server by provisioning or releasing computing resources. The elasticity scheduler performs this resource provisioning using predefined auto-scaling rules.

Fig. 8

OpenStack elastic service architecture

Performance of GeoCloud4SDI

The OpenStack cloud instance is configured with the instance type set to medium, 1 CPU core and 2 GB of memory. Apache JMeter is used to measure the experimental results in terms of average response time. To check the performance of the developed GeoCloud4SDI, the following two evaluation tests are conducted on a spatial overlay operation:

  1) Testing of resource pooling and load balancing
  2) Testing of dynamic workload balancing

Testing of resource pooling and load balancing

The efficacy of the developed cloud enabled SDI framework, GeoCloud4SDI, is tested for improvement in the overall performance of the system, in terms of computation and concurrency, through resource pooling and load balancing. This evaluation is performed by launching different sets of GeoCloud4SDI instances (1, 2 and 3 instances) from the cloud resource pool. Each set of instances is managed by the load balancer, which provides a single access point for the multiple instances in the network. The results of load balancing for GeoCloud4SDI are illustrated in Fig. 9.

Fig. 9

Load balancing in GeoCloud4SDI

The results indicate that the average response time is reduced by applying resource pooling and load balancing. Moreover, performance improves as the number of instances increases: the average response times for one, two and three instances are 3.5 s, 2.34 s and 1.73 s respectively in response to 1000 concurrent requests. It is also seen that the average response time does not fall in proportion to the number of computing resources. For example, the average response time decreases by 33% when the instances are increased from one to two, but by only 26% when they are increased from two to three.
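The percentages quoted above follow directly from the reported response times, as the short calculation below shows. The function and variable names are illustrative; the three response times are the values reported for 1000 concurrent requests.

```python
# Worked arithmetic for the reported average response times (seconds)
# at 1000 concurrent requests, keyed by the number of instances.

def relative_reduction(before, after):
    """Percentage reduction in response time when going from before to after."""
    return (before - after) / before * 100.0


times = {1: 3.5, 2: 2.34, 3: 1.73}

one_to_two = relative_reduction(times[1], times[2])    # about 33 %
two_to_three = relative_reduction(times[2], times[3])  # about 26 %
speedup_three = times[1] / times[3]                    # roughly 2x, not 3x
```

With three instances the response time is roughly halved rather than cut to a third, which quantifies the diminishing return from each added instance.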

Testing of dynamic workload balancing

This performance evaluation shows how the implementation of GeoCloud4SDI benefits from dynamic workload balancing. The test examines the improvement in system performance obtained through the elasticity characteristic of the cloud. It is done by defining auto-scaling rules in terms of resource pooling and CPU utilization parameters. Two scenarios are presented for dynamic workload balancing: (i) increasing workload and (ii) spiking workload.

To set up an auto-scaling experiment in the OpenStack cloud, the following rules are applied:

  1. Min = 1 (i.e. one instance runs at all times)

  2. Max = 3 (i.e. a maximum of 3 instances can be created)

  3. Min = 1 (i.e. a minimum of 1 instance should be running)

  4. SD_Threshold = 20% (i.e. if the CPU load is below 20% in a 30 s interval, the Scale_down() function executes and decreases the number of instances by one)

  5. SU_Threshold = 50% (i.e. if the CPU load is above 50% in a 30 s interval, the Scale_up() function executes and increases the number of instances by one)
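The rules above can be read as a small decision function that is evaluated once per 30 s interval. The Python sketch below is only an illustration of that logic: it does not call any OpenStack API, and the class, function and sample CPU values are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRules:
    """Auto-scaling parameters taken from the rules listed above."""
    min_instances: int = 1
    max_instances: int = 3
    scale_down_threshold: float = 20.0  # SD_Threshold, percent CPU load
    scale_up_threshold: float = 50.0    # SU_Threshold, percent CPU load
    interval_s: int = 30                # length of one evaluation interval


def next_instance_count(current: int, cpu_load_percent: float,
                        rules: ScalingRules = ScalingRules()) -> int:
    """Return the instance count after evaluating one interval.

    Mirrors Scale_up() / Scale_down(): change the count by one at most,
    and never leave the [min_instances, max_instances] range.
    """
    if cpu_load_percent > rules.scale_up_threshold and current < rules.max_instances:
        return current + 1
    if cpu_load_percent < rules.scale_down_threshold and current > rules.min_instances:
        return current - 1
    return current


# Hypothetical CPU load per interval, in percent (not measured data).
sample_loads = [30.0, 65.0, 70.0, 40.0, 15.0, 10.0]
count = 1
history = []
for load in sample_loads:
    count = next_instance_count(count, load)
    history.append(count)
print(history)
```

Loads between the two thresholds leave the instance count unchanged, which avoids launching and releasing instances on small fluctuations.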

The performance results for the increasing workload scenario are presented in Fig. 10, in which the load balancer invokes computing resources according to the auto-scaling rules defined above. At the start, a single instance is sufficient to serve 10–350 concurrent users, and the performance of the scaling pattern is very similar to the one-instance result. When the number of concurrent users exceeds 350, a second instance is launched because the CPU workload rises above 50%. The two instances then work together and the CPU workload falls below 50%. When the number of concurrent users exceeds 750, a third instance is launched to balance the load, and the performance pattern matches the three-instance result of the load balancer. The results show that the auto-scaling enabled GeoCloud4SDI is preferable to the load balancer enabled GeoCloud4SDI because it saves the cost of cloud resources, whereas under static load balancing three instances run together even for lighter workloads.

Fig. 10 Auto Scaling results for increasing workload in GeoCloud4SDI

The performance evaluation for the spiking workload scenario is presented in Fig. 11. In this test, the spiking workload performance is measured over a 12 min timeline. At the beginning, 30 concurrent users access the system. When the number of users abruptly increases from 30 to 1000, the average response time grows from 21 ms to 3.42 s and a scaling-up event is triggered. One more instance is therefore launched, which takes approximately 90 s to boot. Once this instance is running, the average response time drops from 3.59 s to 2.24 s. The load of the system is still above 50%, even with two instances working together, so another instance is launched as per auto-scaling rule number 5. The average response time then drops further, from 2.08 s to 1.64 s. Afterward, when the requests decrease to 30, a scaling-down event is triggered and the additional running instances are released one at a time, each in about 60 s.

Fig. 11 Auto Scaling results for spiking workload in GeoCloud4SDI

Functionalities of GeoCloud4SDI

GeoCloud4SDI is designed and developed to provide geospatial data to users. The developed interface offers several spatial functionalities that end-users can apply directly to the data. GeoCloud4SDI provides Visualization, Querying, Processing, Metadata Catalogue and Application Specific Module functionalities to planners and decision makers. The functionalities of GeoCloud4SDI, the cloud-based city-level SDI developed and implemented for Prayagraj city, are described under the following heads.

User authentication

Users are authenticated so that only authorized persons can access the data. Before accessing the data and services, a user must first register with GeoCloud4SDI and sign in. For registration, the user fills in details such as user name, email, user id, password, contact, state and pincode. Custom validation is attached to each field of the registration form so that the necessary checks can be performed.

A PostgreSQL database is used for user registration and authentication. The user is tracked at two levels through two database tables, viz., user_registration and user_schedule. The user_registration table holds the registration details of the users, while the user_schedule table holds the login access time, date and session log. When user authentication is triggered, a user session is established with an instance of the database server and the user is granted file access permission, which is referred to as rollback security. After a successful login, the user can access the different facility data and perform different querying operations.
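The two-level tracking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeoCloud4SDI implementation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and the column names, the salted PBKDF2 password hashing and the sample user are all hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3
from datetime import datetime, timezone

# Assumed column layout; the source names only the two tables.
SCHEMA = """
CREATE TABLE user_registration (
    user_id       TEXT PRIMARY KEY,
    user_name     TEXT NOT NULL,
    email         TEXT NOT NULL UNIQUE,
    contact       TEXT,
    state         TEXT,
    pincode       TEXT,
    salt          BLOB NOT NULL,
    password_hash BLOB NOT NULL
);
CREATE TABLE user_schedule (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     TEXT NOT NULL REFERENCES user_registration(user_id),
    login_at    TEXT NOT NULL,
    session_log TEXT
);
"""


def _hash(password: str, salt: bytes) -> bytes:
    """Derive a salted hash so that plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(conn, user_id, user_name, email, password,
             contact="", state="", pincode=""):
    """Level 1: store the registration details of a user."""
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO user_registration VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (user_id, user_name, email, contact, state, pincode,
         salt, _hash(password, salt)),
    )
    conn.commit()


def login(conn, user_id, password) -> bool:
    """Level 2: verify credentials and log the access time on success."""
    row = conn.execute(
        "SELECT salt, password_hash FROM user_registration WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None or not hmac.compare_digest(_hash(password, row[0]), row[1]):
        return False
    conn.execute(
        "INSERT INTO user_schedule (user_id, login_at, session_log) VALUES (?, ?, ?)",
        (user_id, datetime.now(timezone.utc).isoformat(), "login"),
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
register(conn, "u001", "demo_user", "demo@example.org", "s3cret")
print(login(conn, "u001", "s3cret"))  # correct password, one session row logged
print(login(conn, "u001", "wrong"))   # rejected, nothing logged
```

Only successful logins add a row to user_schedule, so that table doubles as the access log.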

Visualization

Users can view the thematic maps of various facilities, along with their information, in GeoCloud4SDI. Thematic layers are selected by clicking the visibility checkboxes. The data is requested through a WMS GetMap request in which the service URL, server type, layer name and tiled option are set. When the user clicks on a layer, a GetMap request is sent to GeoServer and the corresponding data is displayed on the screen. Users can set the visibility and opacity of the displayed data in the map window according to their needs. They can also zoom in and out to set the scale of the data. One such visualization instance of the GeoCloud4SDI interface is shown in Fig. 12.

Fig. 12 Visualization on GeoCloud4SDI
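To make the GetMap request concrete, the following Python sketch assembles a WMS 1.1.1 GetMap URL from a service URL, a layer name and the tiled option. The endpoint, layer name and bounding box are placeholders assumed for illustration, and TILED is a GeoServer vendor parameter rather than part of the WMS standard.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(service_url, layer, bbox, width=768, height=512,
                     srs="EPSG:4326", tiled=True):
    """Assemble a WMS 1.1.1 GetMap URL for a single layer.

    bbox is (minx, miny, maxx, maxy) in the units of the given SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                     # empty value selects the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",            # lets layers be overlaid on a base map
        "TILED": "true" if tiled else "false",  # GeoServer vendor parameter
    }
    return f"{service_url}?{urlencode(params)}"


# Hypothetical endpoint, layer and extent, used only for illustration.
url = build_getmap_url(
    "http://localhost:8080/geoserver/wms",
    "city:hospitals",
    (81.7, 25.3, 81.95, 25.55),
)
print(url)
```

A web mapping client would normally generate such requests itself whenever a layer checkbox is toggled or the map is zoomed.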

Spatial querying

A spatial query selects specific features from a layer and is used for data retrieval. Users can access the required area or location of a feature through its attributes. The querying interface is reached by clicking the ‘Geospatial query’ link. The web-based spatial querying interface is built on the Common Query Language (CQL), a query language created by OGC for the catalogue web services specification. CQL uses a plain text syntax for filtering rather than an XML-based encoding, and it filters on the attributes of the data. In WMS, the CQL filter is added to the GetMap request, while in WFS it is added to the GetFeature request. In this research, two spatial query operations are implemented: the attribute value based query and the condition based spatial query.

Attribute based query

This is a simple query operation that retrieves data by attribute value. For example, a user may want to access hospitals by their service type. This is implemented by applying a CQL filter to the hospital data. The result of filtering hospitals by type for diagnostic centers, and its rendering on the map, is shown in Fig. 13.

Fig. 13 Attribute based query

Here, the user first selects the type of hospital from the drop-down menu on the left side of the panel, and the corresponding result is displayed in the right panel of the interface. Two layers are displayed on the map: the existing hospital layer, drawn with lower visibility, and the hospitals requested by the user. The user can also clear the current selection with the ‘Clear’ button and then perform another query.
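As an illustration of the attribute based query, the Python sketch below builds a CQL equality filter and attaches it to GetMap and GetFeature query strings through CQL_FILTER, which is a GeoServer vendor parameter. The attribute name `type`, the layer name and the helper functions are assumptions made for the example, and the remaining mandatory GetMap parameters are omitted for brevity.

```python
from urllib.parse import parse_qs, urlencode


def cql_equals(attribute: str, value: str) -> str:
    """Build `attribute = 'value'`, doubling single quotes inside the literal."""
    if not attribute.isidentifier():
        raise ValueError(f"unexpected attribute name: {attribute!r}")
    escaped = value.replace("'", "''")
    return f"{attribute} = '{escaped}'"


def with_cql_filter(params: dict, cql: str) -> str:
    """Return a URL query string with the filter attached as CQL_FILTER."""
    return urlencode({**params, "CQL_FILTER": cql})


HOSPITAL_LAYER = "city:hospitals"  # hypothetical layer name

# Value chosen from the drop-down menu in the querying interface.
cql = cql_equals("type", "Diagnostic Center")

# WMS: the filter restricts which features are rendered into the map image.
getmap_query = with_cql_filter(
    {"SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
     "LAYERS": HOSPITAL_LAYER},
    cql,
)

# WFS: the same filter restricts which features are returned as data.
getfeature_query = with_cql_filter(
    {"SERVICE": "WFS", "VERSION": "1.1.0", "REQUEST": "GetFeature",
     "TYPENAME": HOSPITAL_LAYER},
    cql,
)

print(cql)
print(getmap_query)
print(getfeature_query)
```

Escaping the quote character keeps values such as names containing an apostrophe from breaking the filter expression.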

Condition based spatial query

This query extends the attribute value based query by adding a condition operator for feature selection. For example, a user may want to access only those wards that have a population of more than 15,000. In this case, a condition based CQL filter is used. The result of filtering wards by population, and its rendering on the map, is shown in Fig. 14.

Fig. 14 Condition based spatial query

Here, a textbox takes the population value and a drop-down menu selects the operator. Pressing the ‘Submit’ button executes the request and displays the corresponding result, which in this example contains only those wards with a population of more than 15,000.
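The condition based filter can be sketched in the same way. The Python example below combines a textbox value and a drop-down operator into a CQL comparison; the attribute name `population`, the set of operators offered and the function name are assumptions, since the source does not list them.

```python
import math

# Comparison operators assumed to be offered in the drop-down menu.
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def cql_condition(attribute: str, operator: str, raw_value: str) -> str:
    """Build a numeric CQL comparison such as `population > 15000`.

    raw_value is the text typed by the user; thousands separators are
    tolerated, anything that is not a finite number raises ValueError.
    """
    if not attribute.isidentifier():
        raise ValueError(f"unexpected attribute name: {attribute!r}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    value = float(raw_value.replace(",", "").strip())
    if not math.isfinite(value):
        raise ValueError(f"not a finite number: {raw_value!r}")
    number = int(value) if value.is_integer() else value
    return f"{attribute} {operator} {number}"


ward_filter = cql_condition("population", ">", "15,000")
print(ward_filter)
```

Restricting the operator to a fixed set and parsing the value as a number means that only well-formed comparisons are passed on to the map server.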

Conclusions

SDI is an effective solution for the discovery, access and utilization of geospatial data: it makes the data available to the public, reduces duplication, improves quality and lowers cost. In the present research, GeoCloud4SDI, an interoperable and efficient city-level SDI, is developed and implemented on an open source private cloud platform. It combines a client-server web GIS architecture with cloud-enabled SDI services, built on a load balancing and elastic computing architecture, for spatial data discovery, access and processing. The high-performance cloud-enabled SDI framework is conceptualised and developed as a multi-tier architecture with four layers, namely, the physical layer, cloud services layer, geospatial services layer and client layer. GeoCloud4SDI provides an interactive web GIS based interface for geospatial data discovery and access through a resource search engine.

The performance of GeoCloud4SDI is evaluated through two tests, namely, resource pooling and load balancing, and dynamic workload balancing. The results of resource pooling and load balancing show that distributing the workload across multiple instances reduces the average response time. Dynamic workload balancing is evaluated through two scenarios, viz., increasing workload and spiking workload. Both evaluations show that GeoCloud4SDI uses the computing resources of the developed city-level SDI effectively when the number of concurrent users abruptly increases or decreases.

GeoCloud4SDI provides an effective solution to the spatial challenges of data management, computing intensity and end-user access. Data management is achieved by distributing the data across multiple cloud instances through the cloud load balancer. Computing intensity is handled by the elastic load balancer, which provisions or releases computing resources as per the application requirements. End-user access, a common problem in spatial applications, is handled by the auto-scaling and load balancing mechanism. The lowest response time is observed when the number of GeoCloud4SDI instances is increased.

This research presents the use of GeoCloud4SDI, developed on an OpenStack private cloud, in public administration. The developed framework not only supports the customization of the private cloud system but also the deployment of the application within it. The solution is fault tolerant because the data is isolated from the system, i.e., no data is lost if a fault occurs in the system due to software or network failure. Security is a major drawback of the public cloud because the data is stored with an external cloud provider, whereas this framework is more secure because it is deployed on a private cloud using local computational facilities. The performance evaluation results show that the GeoCloud4SDI framework provides several benefits, such as big data management, easy and fast access to computing resources, cost reduction and global access.

The GeoCloud4SDI framework developed for Prayagraj city can also be adopted for other cities in India and the world because it is open, interoperable and cloud enabled, owing to the use of OSS, OGC interoperable standards and the OpenStack cloud, all of which are used worldwide. Further, GeoCloud4SDI will allow planners and administrators, who are traditionally non-GIS professionals with basic computer skills, to use geospatial information and geo-processing tools through standard web browsers for achieving sustainable development in Prayagraj city.