1 Introduction

With the popularity of mobile devices and wireless networks, many things, such as shared bicycles, can be regarded as moving objects. The problem of processing continuous range queries over moving objects has attracted extensive attention because many location-based services can be formulated as this problem. In our work, a continuous range query is one in which the locations of both the query and the objects change continuously. Although this semantics makes such queries harder to process, it matches reality, as the following cases illustrate. To capture suspects, the police monitor the vehicles passing into or out of a specified region and will probably adjust the search region frequently. This is indeed continuous query processing, because the results must be updated continually as vehicles move and the query scope is also constantly changing.

In another example, a typical service in cab-hailing applications is that the data center continuously seeks nearby taxis for a user who is walking; this service is also a continuous range query in which the locations of queries and objects both vary dynamically. Since the continuous range query has broad applications, we concentrate on devising an efficient distributed framework to address this problem.

In this work, a continuous range query (CRQ for short) over moving objects returns the moving objects inside a user-defined region in real time and continuously monitors the change of the query result as the query region changes over a certain time period. In the big data setting, processing CRQs over moving objects faces unprecedented challenges. First, the volumes of moving objects and queries are tremendous and far exceed the computing and storage capacities of a single server. Second, with the ubiquitous mobile internet, most range queries are issued online and must be answered in real time. Last but not least, the results of CRQs must be monitored constantly as both objects and queries move, which can incur a very large computing cost.

Given these challenges, most existing works are not suitable for processing continuous range queries. Many works [7, 10] do not consider the situation in which the locations of queries and objects change simultaneously. Another critical issue is that most existing proposals [2, 4] investigate centralized algorithms to improve search efficiency, and these cannot scale to handle many concurrent range queries over a tremendous set of moving objects. Due to these limitations, this work explores a distributed framework that searches and monitors the results of given continuous range queries in real time on a cluster of servers. Specifically, we construct a distributed grid index structure that can be seamlessly deployed in a master-slaves model to support the maintenance of moving objects and the parallel processing of range queries. We further design a distributed incremental search method that updates the result of each CRQ by computing only the incremental result.

Our main contributions can be summarized as follows:

  • This work processes CRQs in the scenario where queries and objects move simultaneously, which is seldom addressed in existing works.

  • We propose DGI, a distributed grid index, for supporting CRQs over moving objects in a distributed setting.

  • We design DIS, a comprehensive distributed incremental search approach that takes full advantage of a cluster of servers to continuously monitor the results of CRQs in real time.

  • DGI and DIS are implemented on top of S4, and we conduct extensive experiments to evaluate their performance, which confirm their superiority over existing approaches.

2 Related Work

Continuous range query processing is important due to its broad application base, and it has been extensively studied. Some works [7, 9] investigate processing range queries on road networks. Stojanovic et al. [7] propose a framework for continuous range query processing for objects moving on network paths and introduce an additional pre-refinement step that generates main-memory data structures to support the reevaluation of continuous range queries. The work [9] studies the problem of processing range queries on road networks and proposes Voronoi Range Search based on the Voronoi diagram. Because determining the Voronoi diagram for each object is very expensive, this approach can hardly support range queries over frequently moving objects. Additionally, some works introduce the concept of a safe region to reduce the reevaluation cost. The proposal [4] points out that the cost of monitoring a moving query and keeping its location updated is very high, and it investigates an efficient technique based on the concept of a safe region: as long as the query remains inside its specified safe region, expensive re-computation is not required.

Cheema et al. [2] also utilize the concept of a safe zone to monitor moving circular range queries and propose powerful pruning rules to improve query efficiency. The above works all focus on searching the exact results of queries, whereas the work [5] pays more attention to approximate range search and proposes approximate static range search (ARS). The work [8] coins the term “Region Queries” to indicate a broad category of spatial range queries and focuses on presenting a complete picture of region queries. These proposals present some excellent search algorithms for monitoring (continuous) range queries, but their methods are centralized and their scalability is restricted. Moreover, most of them do not consider the situation in which every object is constantly moving while continuous range queries are processed.

It is imperative to utilize a distributed computing model to deal with concurrent range queries over moving objects, and some works [1, 3, 6, 10, 11] have explored this problem. However, the distributed framework in these approaches consists of a central server and a large number of moving objects, and they all require the moving devices to have considerable computational capabilities, which restricts their applicability. In contrast, our approach does not assume any computational capability at the mobile objects other than reporting their positions (e.g., a shared bicycle can carry a simple GPS tracking device), and thus has wider applicability.

3 Distributed Grid Index

Given a region of interest covering a large number of moving objects, a grid index partitions this region into square cells of the same size. Each cell \( c_i \) is regarded as an index unit that records the locations of the moving objects residing in the cell as well as the queries involving it. The grid index structure has been extensively utilized for processing spatio-temporal queries, and we also utilize it to support the processing of CRQs over moving objects. Unlike existing approaches that use the grid index on a single server, we construct a Distributed Grid Index (DGI) by deploying the grid index on a master-slaves model, which consists of one master and multiple slaves. Next, we describe the structure of DGI as well as its deployment on the master-slaves model. The symbols used in the rest of the text are summarized in Table 1.

Table 1. Summary of symbols

DGI consists of a Global Schema Index (GSI) and a large number of cell indexes. In particular, DGI deploys the grid index on a master-slaves cluster with a general streaming data processing model. In this model, a PE denotes a logical processing element, and data is encapsulated as events that can be transferred from one PE to another. Every event is formulated as a triple (type, key, value). When a PE consumes a received event, it encapsulates the intermediate results as a new event that can be routed to another PE. Each PE receives the events it needs by specifying their types and keys, which forms the event routing rule. When the model assigns an event to a PE based on the routing rule and that PE does not exist, a new PE is automatically created to process the event.
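The event model described above can be illustrated with a minimal single-process sketch. It is not the S4 API: the class names `Event`, `PE`, `ForwardPE`, `CollectPE`, and `Router` are hypothetical and only mimic the (type, key, value) triple, the routing rule on type and key, and the on-demand creation of a PE.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """The (type, key, value) triple described in the text."""
    type: str
    key: str
    value: Any


class PE:
    """Base processing element: consumes one event, may emit new events."""

    def __init__(self, key: str) -> None:
        self.key = key

    def consume(self, event: Event) -> List[Event]:
        raise NotImplementedError


class ForwardPE(PE):
    """Hypothetical PE that wraps its input into a new 'result' event."""

    def consume(self, event: Event) -> List[Event]:
        return [Event("result", self.key, event.value)]


class CollectPE(PE):
    """Hypothetical terminal PE that stores every value it receives."""

    def __init__(self, key: str) -> None:
        super().__init__(key)
        self.seen: List[Any] = []

    def consume(self, event: Event) -> List[Event]:
        self.seen.append(event.value)
        return []


class Router:
    """Delivers each event to the PE for (type, key), creating it on demand."""

    def __init__(self) -> None:
        self.factories: Dict[str, Callable[[str], PE]] = {}
        self.pes: Dict[Tuple[str, str], PE] = {}

    def register(self, event_type: str, factory: Callable[[str], PE]) -> None:
        self.factories[event_type] = factory

    def dispatch(self, event: Event) -> None:
        slot = (event.type, event.key)
        if slot not in self.pes:  # no PE yet for this key: create one
            self.pes[slot] = self.factories[event.type](event.key)
        for out in self.pes[slot].consume(event):
            self.dispatch(out)  # route intermediate results onward
```

Registering `ForwardPE` for a hypothetical "location" type and `CollectPE` for "result" and then dispatching two "location" events with the same key creates exactly one PE per (type, key) pair, and the second event reuses the existing PEs.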

In the DGI framework, a unique EntrancePE on the master maintains the GSI, and every CellPE takes charge of one or more cell indexes, which are distributed across the slaves. The GSI needs to provide the identifier and boundaries of every cell index, and it also has to know the relationships between cells and slaves. At first glance, the GSI would have to store so much information that it could become a bottleneck. In fact, the GSI we design only records the bottom-left cell as the reference cell and a naming rule that quickly yields the identifier of each cell. Since every cell is a square of the same size, the identifier and boundaries of every cell can be deduced instantly. As for the relationships between cells and slaves, we introduce a hash function \( f(x) \) that maps cells to slaves on the basis of the cell identifiers, that is, \( s_i = f(c_i) \), where \( s_i \) and \( c_i \) are the identifiers of a slave and a cell.
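A compact GSI of this kind might look like the following sketch. The text does not specify the naming rule or the hash function, so both are assumptions here: cells are named by their column and row offset from the reference cell (`c_<col>_<row>`), and \( f \) is taken to be CRC32 of that name modulo the number of slaves.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Tuple

Cell = Tuple[int, int]  # (column, row) offset from the reference cell
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class GSI:
    origin_x: float   # bottom-left corner of the reference cell
    origin_y: float
    cell_size: float  # edge length shared by all square cells
    num_slaves: int

    def cell_of(self, x: float, y: float) -> Cell:
        """Locate the cell covering a point without storing any per-cell data."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((y - self.origin_y) / self.cell_size)
        return (col, row)

    def cell_id(self, cell: Cell) -> str:
        """Assumed naming rule: identifier derived from column and row."""
        return f"c_{cell[0]}_{cell[1]}"

    def bounds(self, cell: Cell) -> Rect:
        """Boundaries follow from the reference corner and the cell size."""
        xmin = self.origin_x + cell[0] * self.cell_size
        ymin = self.origin_y + cell[1] * self.cell_size
        return (xmin, ymin, xmin + self.cell_size, ymin + self.cell_size)

    def slave_of(self, cell: Cell) -> int:
        """Assumed f(c_i): a stable hash of the cell identifier."""
        return zlib.crc32(self.cell_id(cell).encode("utf-8")) % self.num_slaves
```

CRC32 is used instead of Python's built-in `hash` because string hashes are randomized per process, whereas the master and all slaves must agree on the mapping.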

Each cell index is in charge of recording the objects it covers and the queries whose search scopes intersect the cell. Any cell \( c_i \) has three major components: \( OL_i \), \( FL_i \), and \( PL_i \). The list \( OL_i \) records the locations of the moving objects covered by \( c_i \). \( FL_i \) stores the identifiers of the queries whose search regions fully cover \( c_i \), while the queries whose search scopes partially overlap \( c_i \) are maintained in \( PL_i \).
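The three components of a cell index can be sketched as a small data structure. This is an illustration under simplifying assumptions: search scopes are axis-aligned rectangles, and `PL` keeps the scope of each partially overlapping query next to its identifier so that objects can later be tested against it. The names `CellIndex`, `covers`, and `overlaps` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def covers(outer: Rect, inner: Rect) -> bool:
    """True if outer fully contains inner."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def overlaps(a: Rect, b: Rect) -> bool:
    """True if a and b share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class CellIndex:
    bounds: Rect
    OL: Dict[str, Point] = field(default_factory=dict)  # object id -> location
    FL: Set[str] = field(default_factory=set)           # queries fully covering the cell
    PL: Dict[str, Rect] = field(default_factory=dict)   # partially overlapping queries

    def register(self, qid: str, scope: Rect) -> str:
        """Place a query in FL or PL depending on how its scope meets the cell."""
        if covers(scope, self.bounds):
            self.FL.add(qid)
            return "FL"
        if overlaps(scope, self.bounds):
            self.PL[qid] = scope
            return "PL"
        return "none"  # the cell is not a candidate cell for this query
```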

4 Distributed Incremental Search Approach

In this section, we propose DIS, a comprehensive approach that addresses the challenges of incrementally searching the results of a large number of CRQs in real time.

4.1 Search Initial Results of CRQs

The DRQS framework (as shown in Fig. 1) serves as the fundamental computing architecture on which we design search algorithms to process CRQs. We now discuss SIR, an algorithm that searches the initial results of CRQs.

Fig. 1. The framework of DRQS

When EntrancePE receives a CRQ \( q_i \) with search scope \( sq_i \), it first determines the cells that intersect \( sq_i \) based on the GSI; these cells are called candidate cells, and we use ℱ to denote the set of candidate cells. Since the boundaries of cells are static, the cells in ℱ could be determined by a brute-force comparison of the boundaries of every cell with \( sq_i \), but this would waste a great deal of computation. Instead, we first find the cell \( c_h \) that covers the center point of \( sq_i \) and then check whether each cell adjacent to \( c_h \) intersects \( sq_i \). This process iteratively enlarges the search region until a ring of cells surrounding \( c_h \) does not overlap \( sq_i \). The strategy quickly finds the candidate cells of \( q_i \) even if the shape of \( sq_i \) is irregular.
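The ring-by-ring expansion around \( c_h \) can be sketched as follows. The sketch assumes a rectangular scope with positive area, a grid whose reference corner is the origin, and no outer grid boundary; for an irregular scope only the `overlaps` test would change. The function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Cell = Tuple[int, int]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def cell_bounds(cell: Cell, size: float) -> Rect:
    return (cell[0] * size, cell[1] * size,
            (cell[0] + 1) * size, (cell[1] + 1) * size)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def ring(center: Cell, r: int) -> List[Cell]:
    """Cells whose Chebyshev distance from center is exactly r."""
    if r == 0:
        return [center]
    return [(center[0] + dx, center[1] + dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if max(abs(dx), abs(dy)) == r]


def candidate_cells(scope: Rect, size: float) -> Set[Cell]:
    """Grow rings around the cell holding the scope center until one ring misses."""
    cx = (scope[0] + scope[2]) / 2
    cy = (scope[1] + scope[3]) / 2
    home = (math.floor(cx / size), math.floor(cy / size))  # the cell c_h
    found: Set[Cell] = set()
    r = 0
    while True:
        hits = [c for c in ring(home, r) if overlaps(cell_bounds(c, size), scope)]
        if not hits:  # a whole ring without overlap: stop enlarging
            return found
        found.update(hits)
        r += 1
```

The number of cells inspected grows with the size of the scope rather than with the size of the grid, which is the saving over the brute-force comparison.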

After determining the set ℱ for \( q_i \), EntrancePE sends a queryEvent carrying \( q_i \) to the CellPEs corresponding to the cells in ℱ. To simplify the presentation, we suppose that each CellPE is in charge of one cell index. When a CellPE receives the queryEvent, it instantly finds the objects covered by \( sq_i \), and these objects form a partial result of \( q_i \). Every CellPE encapsulates its partial result as a resultPEvent and sends it to the QueryPE. In the DRQS framework, the final result of every query is calculated and maintained by a unique QueryPE, and all resultPEvents carrying partial results of \( q_i \) are routed to the same QueryPE because they use the identifier of \( q_i \) as their key. The QueryPE therefore obtains the final result of \( q_i \) by merging all partial results.
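The merge step of SIR can be sketched in a few lines. This is a sequential stand-in for what the text describes as parallel CellPEs: each candidate cell computes its partial result, and a single `QueryPE` object unions them. The rectangular scope and all names are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] < rect[2] and rect[1] <= p[1] < rect[3]


def cell_partial_result(objects: Dict[str, Point], scope: Rect) -> Set[str]:
    """What one CellPE would put into a resultPEvent."""
    return {oid for oid, pos in objects.items() if contains(scope, pos)}


class QueryPE:
    """Holds the merged result of one query, keyed by the query identifier."""

    def __init__(self, qid: str) -> None:
        self.qid = qid
        self.result: Set[str] = set()

    def on_result_p_event(self, partial: Set[str]) -> None:
        self.result |= partial  # merging partial results is a set union


def search_initial_result(qid: str, scope: Rect,
                          candidate_cells: Dict[str, Dict[str, Point]]) -> QueryPE:
    """candidate_cells maps a cell id to the objects stored in that cell."""
    query_pe = QueryPE(qid)
    for objects in candidate_cells.values():
        query_pe.on_result_p_event(cell_partial_result(objects, scope))
    return query_pe
```

Because the merge is a union, the partial results may arrive in any order, which is what allows the CellPEs to work independently.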

4.2 Incrementally Computing Results of CRQs

After obtaining the initial results of CRQs, we still need to monitor the result of every query. We first design IOS, an algorithm that incrementally searches the results of CRQs as objects move; it relies on a publish/subscribe mechanism to constantly update the result of each query in real time. In the IOS algorithm, every CellPE is regarded as a data publisher and every QueryPE as a query subscriber. We use \( cp_i \) to denote the CellPE matching the cell \( c_i \). The CellPE \( cp_i \) maintains two lists of registered queries, \( FL_i \) and \( PL_i \). For a given query \( q_i \), its candidate cells form the set ℱ. If \( c_i \in \) ℱ, \( q_i \) is registered in \( c_i \). Specifically, \( q_i \) is inserted into \( FL_i \) if \( sq_i \simeq c_i \) (the scope fully covers the cell) or into \( PL_i \) if \( sq_i \sim c_i \) (the scope partially overlaps the cell). The query \( q_i \) has to be registered in every candidate cell in this way. Once an object in the cell \( c_i \) moves, the corresponding CellPE only needs to check whether the registered queries are influenced, rather than recompute the results of all existing queries.

In fact, we further observe that not all registered queries in \( c_i \) are influenced by the movement of every object. Since the result of \( q_i \) only concerns which objects are covered by \( sq_i \) rather than their exact positions, we can deduce that if a CRQ covers the cell \( c_i \), it is not influenced by the movements of objects within \( c_i \).

Theorem 1.

For a given cell \( c_i \) and any \( o_i \in OL_i \), let the location of \( o_i \) be \( (x_{i}^{\prime}, y_{i}^{\prime}) \) at time point \( t_i \) and \( (x_i, y_i) \) at time point \( t_{i+1} \). If \( (x_{i}^{\prime}, y_{i}^{\prime}) \) and \( (x_i, y_i) \) are both covered by \( c_i \), then for every \( q_j \in FL_i \), the result of \( q_j \) is not influenced by the changed location of \( o_i \), provided that \( t_{q_j}^{s} \) is smaller than \( t_i \).

With the help of Theorem 1, if the location of \( o_i \) in a cell \( c_i \) changes, we update the results of the registered queries by handling the following two cases.

  • The first case is that \( (x_{i}^{\prime}, y_{i}^{\prime}) \prec c_i \) and \( (x_i, y_i) \prec c_j \) (\( i \ne j \)), that is, the object leaves \( c_i \) and enters \( c_j \). In this case, the results of all registered queries of \( c_i \) and \( c_j \) may be influenced. For each registered query \( q_i \) of \( c_i \), if \( (x_{i}^{\prime}, y_{i}^{\prime}) \prec sq_i \), the CellPE corresponding to \( c_i \) notifies the QueryPE maintaining \( q_i \) to remove \( o_i \) from the result of \( q_i \) by sending a resultREvent. Meanwhile, for every registered query \( q_j \) of \( c_j \), if \( (x_i, y_i) \prec sq_j \), the CellPE matching \( c_j \) sends a resultAEvent to the QueryPE maintaining \( q_j \), which notifies this QueryPE to insert \( o_i \) at \( (x_i, y_i) \) into the result of \( q_j \).

  • The other case is that \( (x_{i}^{\prime}, y_{i}^{\prime}) \prec c_i \) and \( (x_i, y_i) \prec c_i \), that is, the object moves within \( c_i \). According to Theorem 1, no \( q_i \in FL_i \) can be affected by the object \( o_i \). Hence, only the results of the queries in \( PL_i \) need to be updated. In this case, the CellPE matching \( c_i \) directly identifies the registered queries affected by \( o_i \) and then notifies the corresponding QueryPEs to update the query results.
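The two cases above can be sketched as a single handler that returns the notifications a CellPE would publish. The sketch assumes rectangular scopes, represents each notification as a tuple (event type, query id, object id), and expects the caller to have already determined the source and destination cells; the names `Cell` and `on_move` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Notice = Tuple[str, str, str]             # (event type, query id, object id)


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] < rect[2] and rect[1] <= p[1] < rect[3]


@dataclass
class Cell:
    bounds: Rect
    OL: Dict[str, Point] = field(default_factory=dict)
    FL: Set[str] = field(default_factory=set)
    PL: Dict[str, Rect] = field(default_factory=dict)


def on_move(oid: str, new_pos: Point, src: Cell, dst: Cell) -> List[Notice]:
    """Update OL and list the resultREvent / resultAEvent notices to publish."""
    old_pos = src.OL.pop(oid)
    dst.OL[oid] = new_pos
    out: List[Notice] = []
    if src is dst:
        # Second case: movement inside one cell. Queries in FL are untouched
        # (Theorem 1), so only the partially overlapping queries are checked.
        for qid, scope in src.PL.items():
            was_in, is_in = contains(scope, old_pos), contains(scope, new_pos)
            if was_in and not is_in:
                out.append(("resultREvent", qid, oid))
            elif is_in and not was_in:
                out.append(("resultAEvent", qid, oid))
        return out
    # First case: the object leaves src and enters dst.
    for qid in sorted(src.FL):
        out.append(("resultREvent", qid, oid))
    for qid, scope in src.PL.items():
        if contains(scope, old_pos):
            out.append(("resultREvent", qid, oid))
    for qid in sorted(dst.FL):
        out.append(("resultAEvent", qid, oid))
    for qid, scope in dst.PL.items():
        if contains(scope, new_pos):
            out.append(("resultAEvent", qid, oid))
    return out
```

When a query is registered in both cells and the object stays inside its scope, the handler emits a removal followed by an insertion for the same query, so the net result is unchanged as long as the QueryPE applies them in order.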

As described above, every CellPE acts as a publisher and each QueryPE serves as a subscriber. Once a CRQ \( q \) is registered in the CellPEs corresponding to its candidate cells, that is, the cells in the set ℱ of \( q \), these CellPEs continuously monitor the result of \( q \) in parallel. In this process, different CellPEs send the moving objects involved with \( q \) to a unique QueryPE, the subscriber. This QueryPE processes the received object locations as soon as possible so that the result of \( q \) stays exact at all times, and it maintains the result of \( q \) throughout the lifecycle of \( q \). In this way, we organize a large number of CellPEs and QueryPEs on different servers into a publish/subscribe mechanism that supports incrementally searching the results of CRQs in a distributed environment.

For a query \( q_i \), if its search scope is \( sq_i \) at time point \( t_j \) and becomes \( sq_{i}^{\prime} \) at time point \( t_{j+1} \), then \( q_i \) must be resubmitted to EntrancePE. On receiving \( q_i \), EntrancePE computes ℱ, the set of candidate cells for \( q_i \), based on \( sq_{i}^{\prime} \), and then sends \( q_i \) to each cell \( c_k \) (\( c_k \in \) ℱ). After receiving \( q_i \), the cell \( c_k \) updates its two lists of registered queries, and the matching CellPE finds the objects covered by \( sq_{i}^{\prime} \) in \( c_k \). Here the CellPE employs an incremental search strategy that computes only the incremental result of \( q_i \) at time point \( t_{j+1} \) based on its existing result. The strategy consists of the following steps.

  (1) If \( q_i \) is a newly registered query for \( c_k \), the following cases need to be considered.

  • If \( sq_{i}^{\prime} \simeq c_k \), then \( q_i \) is inserted into \( FL_k \) and all objects of \( c_k \) form a part of the result of \( q_i \).

  • If \( sq_{i}^{\prime} \sim c_k \), then \( q_i \) is inserted into \( PL_k \) and \( c_k \) has to find the objects covered by \( sq_{i}^{\prime} \).

  (2) If \( q_i \) is an existing registered query of \( c_k \), it is handled as follows.

  • If (\( sq_i \simeq c_k \)) && (\( sq_{i}^{\prime} \simeq c_k \)), then \( q_i \) is already in \( FL_k \) and need not be inserted again. In this case, \( q_i \) still covers all objects in \( c_k \) although it has moved, so \( c_k \) does not need to update the result of \( q_i \).

  • If (\( sq_i \sim c_k \)) && (\( sq_{i}^{\prime} \simeq c_k \)), we need to remove \( q_i \) from \( PL_k \) and insert it into \( FL_k \). In this case, the incremental result of \( q_i \) is the set of objects covered by the region \( (sq_{i}^{\prime} - sq_i) \), which can be rapidly determined.

  • If (\( sq_i \simeq c_k \)) && (\( sq_{i}^{\prime} \sim c_k \)), we have to remove \( q_i \) from \( FL_k \) and insert it into \( PL_k \). We then only need to remove the objects residing in the region \( (sq_i - sq_{i}^{\prime}) \) from the result of \( q_i \).

  • If (\( sq_i \sim c_k \)) && (\( sq_{i}^{\prime} \sim c_k \)), \( q_i \) need not be added into \( PL_k \) again, but \( c_k \) needs to update the information of \( q_i \). We then search all objects covered by the region \( (sq_{i}^{\prime} - (sq_{i}^{\prime} \cap sq_i)) \), and these objects belong to the new result of \( q_i \).
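The case analysis above can be condensed into one per-cell routine that returns the objects to add to and to remove from the result of a query whose scope has moved. The sketch makes several assumptions: scopes are rectangles, the full-cover list is called `FL`, region differences are obtained by comparing per-object membership instead of geometric subtraction, and in the partial-to-partial case the objects that fell out of the scope are also reported, a step the text does not spell out. All function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] < rect[2] and rect[1] <= p[1] < rect[3]


def relation(scope: Rect, cell: Rect) -> str:
    """'full' if the scope covers the cell, 'partial' if it overlaps, else 'none'."""
    if (scope[0] <= cell[0] and scope[1] <= cell[1]
            and scope[2] >= cell[2] and scope[3] >= cell[3]):
        return "full"
    if (scope[0] < cell[2] and cell[0] < scope[2]
            and scope[1] < cell[3] and cell[1] < scope[3]):
        return "partial"
    return "none"


@dataclass
class Cell:
    bounds: Rect
    OL: Dict[str, Point] = field(default_factory=dict)
    FL: Set[str] = field(default_factory=set)
    PL: Dict[str, Rect] = field(default_factory=dict)


def covered(cell: Cell, scope: Optional[Rect], rel: str) -> Set[str]:
    if rel == "full":
        return set(cell.OL)
    if rel == "partial" and scope is not None:
        return {oid for oid, pos in cell.OL.items() if contains(scope, pos)}
    return set()


def update_query(cell: Cell, qid: str, old_scope: Optional[Rect],
                 new_scope: Rect) -> Tuple[Set[str], Set[str]]:
    """Re-register qid in the cell and return (objects to add, objects to remove)."""
    old_rel = "none" if old_scope is None else relation(old_scope, cell.bounds)
    new_rel = relation(new_scope, cell.bounds)
    cell.FL.discard(qid)
    cell.PL.pop(qid, None)
    if new_rel == "full":
        cell.FL.add(qid)
    elif new_rel == "partial":
        cell.PL[qid] = new_scope
    if old_rel == "full" and new_rel == "full":
        return set(), set()  # the cell contributes the same objects as before
    before = covered(cell, old_scope, old_rel)
    after = covered(cell, new_scope, new_rel)
    return after - before, before - after
```

Passing `old_scope=None` corresponds to step (1), a newly registered query, in which case every covered object is reported as an addition.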

5 Experiments

We conduct experiments to evaluate the proposed DGI index and DIS approach. To better evaluate the performance of DIS, we introduce two other distributed algorithms as baselines. The first method, NS, is a naive search algorithm that does not use any index. For any object, NS uses a hash function to determine which server should store it. Processing a CRQ thus involves scanning all objects maintained on all servers at every time point. The second method, S-DIS, is a simplified DIS method that handles a CRQ as a new query at every time point, that is, it does not utilize the incremental search strategy when processing CRQs.

We use German road network data to simulate two different datasets for our experiments. In both datasets, all objects appear on the roads only. In the first dataset (UD), the objects follow a uniform distribution. In the second dataset (GD), 70% of the objects follow a Gaussian distribution, and the other objects are uniformly distributed. In these two datasets, the whole area is normalized to a unit square, and this square is partitioned into small cells with an edge length of 0.01. Moreover, all objects move along the road network, with the velocity uniformly distributed in [0, 0.002] unless otherwise specified. We use \( V_p \) and \( V_q \) to denote the velocities of an object and a query. The experiments are conducted on a cluster of 8 Dell R210 servers with Gigabit Ethernet interconnect.

We first evaluate the performance of DGI. Figure 2 shows the time of building DGI as the number of objects varies. We observe that the time cost is proportional to the number of objects and is almost unaffected by the distribution of objects. When objects are moving, their velocity may affect the time needed to maintain DGI. In Fig. 3, we evaluate the maintenance time of DGI by processing a set of objects over a specified period of time and find that the maintenance time is only slightly affected by the velocity of objects. This is because, to process a movement, DGI always needs to remove the obsolete location and insert the new position, regardless of the distance the object moves in one step.

Fig. 2. Building time of DGI

Fig. 3. Influence of \( V_p \) on maintenance time of DGI

Next, we evaluate the performance of the DIS approach. In Fig. 4, we first test the time of the three approaches when processing the same set of queries. In this group of experiments, we let 10000 queries move continuously and measure the time of processing these queries at five consecutive time points. As can be observed, DIS performs better than the other two approaches, especially at the last four time points. The time cost of DIS decreases because only the incremental results of queries need to be calculated after the first time point, which greatly reduces the computing time, while the other two methods always process each query as a new one.

Fig. 4. Comparison of three approaches

Figure 5 shows the influence of the number of queries on the processing time of each method. The response time of each approach increases markedly as more queries are processed, but the time consumed by the NS approach grows more sharply. Because this set of experiments does not involve continuous queries, the performance of DIS and S-DIS is almost identical.

Fig. 5. Processing time of three approaches

6 Conclusion

With the dramatic increase in mobile devices and the advances in wireless networks, the efficient processing of CRQs has attracted increasing interest. This work proposes a distributed incremental search approach that fully considers the situation in which queries and objects are both moving and only needs to reevaluate the incremental result of every CRQ, which cuts down the computing cost as well as the communication cost between CellPEs and QueryPEs. Finally, extensive experiments are conducted to verify the performance of our proposal.