1 Introduction

Recently, one of the most attractive problems in distributed database development has been the data allocation problem (DAP). The aim of DAP is to determine the placement of fragments across sites so as to reduce the total transaction cost when a query issued at one site accesses data held at another. Performance-oriented optimization algorithms with special constraints have been applied to DAP on standard test problems (Tosun 2014a; Tosun et al. 2013a). Allocating data to sites is very difficult: the locations of the fragments can change, and in that situation data organization becomes even more important, since items such as parallel query execution, network load, and server load balancing need to be managed. Even without these complications, DAP is NP-hard. DAP can be solved by two types of algorithms: static and dynamic. Static algorithms allocate data based on static transaction execution patterns in the target environment, whereas dynamic algorithms allow these patterns to change (Tosun et al. 2013a; Gu et al. 2006; Mashwani and Salhi 2012). DAP has been solved by several algorithms, such as the genetic algorithm (Tosun et al. 2013a; Mashwani and Salhi 2012), ant colony optimization (Tosun 2014b; Adl and Rankoohi 2009), particle swarm optimization (Mahi et al. 2015, 2018), and other metaheuristic methods. In this part of the paper, we review some studies on solving DAP. Peng et al. (2022) propose an allocation scheme for data storage in a collaborative edge-cloud environment, with a focus on enhanced data privacy; specifically, they first divide the datasets by fields to eliminate, as far as possible, the correlation between leaked data.

Anita Brigit Mathew proposed a heuristic algorithm that partitions a database graph among nodes by placing related information on the same or adjacent nodes (Mathew 2018). This heuristic combines best-fit decreasing with ant colony optimization and addresses data allocation in distributed NoSQL graph database architectures (Mathew 2018). An effective data allocation method, which considers static and dynamic characteristics of data centers to make datacenter resizing more effective, was proposed by Chen et al. (2018); they used a heuristic algorithm that analyzes the current traffic in the data center network by first transforming the data allocation problem into a chunk distribution tree (CDT). An improved heuristic method based on fragmentation and allocation was proposed by Amer et al. (2012, 2017); the mentioned techniques are combined into a single, efficient method that provides an effective solution for distributed database systems (DDBS). Nashat et al. (2018) proposed a complete taxonomy of the available fragmentation and allocation schemes for distributed databases. Data fragmentation in DDBS was surveyed by Asma et al. (2017). An improved data allocation scheme using a task-level data migration algorithm (TODMA) was presented by Du et al. (2017) and Mayne and Satav (2017). Simulated annealing (SA) was applied by Sen et al. (2016) to solve DAP. Wang et al. modeled Radio Frequency Identification (RFID) tag-oriented DAP as a nonlinear knapsack problem. Tosun et al. (2013a) designed and implemented a heuristic for DAP formulated as a Quadratic Assignment Problem (QAP); they proposed a fast and scalable hybrid genetic multi-start tabu search algorithm that outperformed other well-known heuristics in terms of execution time and solution quality (Tosun 2014a; Tosun et al. 2013a). Tosun et al. also presented SA, GA, and fast ACO variants to solve DAP (Tosun 2014a). An innovative hybrid method, Differential Evolution with Variable Neighborhood Search (DEVNS), has also been offered for solving DAP in distributed database systems (Lotfi 2019).

An ACO-DAP model based on ACO and local search was presented by Adl and Rankoohi (2009); this approach targets overcoming RAPs, genetic algorithms were considered for comparison, and the simulation results demonstrated good performance. Ulus and Uysal also presented a new dynamic DDBS approach called the threshold algorithm (Ulus and Uysal 2003), in which data reallocation is triggered by changes in the data access pattern. In our work, we solve DAP by utilizing and adapting a greedy algorithm (Greedy-DAP), and we investigate its execution time and fragment allocation quality experimentally. The obtained results were compared with the genetic algorithm (Tosun 2014a), tabu search (Tosun 2014a), ant colony optimization (Tosun 2014a), and simulated annealing (Tosun et al. 2013b) on 20 problems of various dimensions, with execution time and total cost as the important factors. The proposed algorithm produced suitable and comparable results in time; indeed, the execution time of Greedy-DAP can be regarded as the best among the compared algorithms. The simulation results reveal that Greedy-DAP performs well in terms of both execution time and cost in comparison with the other algorithms (Mahi et al. 2018).

Cao (2022) presents three goals for data allocation: minimizing the number of active servers, minimizing the average number of partitions per data item, and balancing the servers' workload. Li et al. (2022) demonstrate that the conventional "graph data allocation = graph partitioning" assumption does not hold, and that the memory access patterns of graph algorithms should also be taken into account when partitioning graph data for communication minimization. Thalij (2022) designed a novel high-performance data allocation approach using the Chicken Swarm Optimization (CSO) algorithm, in which the CSO algorithm optimally chooses the sites for each of the data fragments without creating much overhead or data route diversion. This scenario is formulated as an optimization problem called the data allocation problem (DAP). In this paper, instead of using randomness-based algorithms, we directly find the best solution to the problem at each step and then proceed to the next step, where the best remaining solution is found in turn. Here, according to the matrix of the costs of allocating resources to sites, the best answer means the minimum value in the entire cost matrix; it is realized by placing the resource on the site with the lowest transaction cost. After each placement, the column of the cost matrix corresponding to the selected fragment is filled with a very large number so that the same minimum cannot be found in that column in the next steps. A drawback of this scheme is that, in later steps, the best answer may again lie in that column, and this value is lost; nevertheless, according to the obtained results, our method attains the minimum cost compared with the previous algorithms.

Section 1 serves as the introduction; the rest of the paper is structured as follows: Sect. 2 presents the materials and methods, covering the background of the greedy algorithm and introducing the proposed method (Greedy-DAP). Comparisons and experimental results are presented in Sect. 3. Finally, Sect. 4 concludes the paper.

2 Materials and methods

A greedy algorithm is a simple and intuitive algorithm used in optimization problems: at each step it makes the locally optimal choice, in the hope that these choices lead to an overall optimal solution of the entire problem. Two examples of problems that can be solved successfully using greedy algorithms are Huffman encoding and Dijkstra's algorithm, which are used to compress data and to find the shortest path through a graph, respectively. A greedy algorithm takes all of the data of a certain problem and applies a rule for choosing which element to add to the solution at each step (Astrachan et al. 2002). Kadam and Kim (2022) show that their allocation problem is NP-complete and propose a greedy algorithm to solve it. Table 1 describes the notations.

Table 1 Description of notations (Adl and Rankoohi 2009)
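For illustration, the sketch below applies this greedy pattern to Huffman encoding, one of the textbook examples mentioned above; it is a self-contained Python example, not part of the paper's method:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Greedy Huffman coding: repeatedly merge the two least frequent subtrees."""
    heap = [(freq, tie, {sym: ""})
            for tie, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # the two locally cheapest choices
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1        # unique tie-breaker keeps heap tuples comparable
    return heap[0][2]

print(huffman_codes("distributed database"))
```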

2.1 Data allocation problem

The purpose of DAP is to find the best sites for the fragments so as to alleviate the total cost of a transaction when a query is taken from one site to another (Adl and Rankoohi 2009; Mahi et al. 2018; Mamaghani et al. 2010). Figure 1 shows the dependencies among sites, fragments, and transactions (Adl and Rankoohi 2009). For example, to serve a query from S1 to S2 that obtains fragment j, transaction k is necessary. Transactions are linked to sites through the site-transaction frequency (FREQ) matrix, which contains the execution frequency of each transaction at each site. Transactions are linked to fragments through the transaction-fragment (TRFR) matrix, which gives the amount of data each transaction requires from each fragment. The Q matrix gives the volume of data exchanged between two fragments within one transaction. The size of each fragment is chosen randomly from the interval \(\left[\frac{c}{10}, \frac{20*c}{10}\right]\), where c is a value in the interval [10, 1000] (Adl and Rankoohi 2009).

Fig. 1
figure 1

The dependences among sites, transactions and fragments (Adl and Rankoohi 2009)
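For concreteness, the sketch below generates such a random instance in Python; only the fragment-size rule comes from the text above, while all other value ranges are illustrative assumptions:

```python
import random

def make_dap_instance(n_sites, m_frags, n_trans, seed=0):
    """Generate a random DAP instance with the matrices described above."""
    rng = random.Random(seed)
    c = rng.uniform(10, 1000)
    # fragment sizes drawn uniformly from [c/10, 20c/10]
    frag_size = [rng.uniform(c / 10, 2 * c) for _ in range(m_frags)]
    # FREQ: execution frequency of each transaction at each site (assumed range)
    freq = [[rng.randint(0, 10) for _ in range(n_sites)] for _ in range(n_trans)]
    # TRFR: data volume each transaction needs from each fragment (assumed range)
    trfr = [[rng.randint(0, 50) for _ in range(m_frags)] for _ in range(n_trans)]
    # UC: unit communication cost between pairs of sites (zero diagonal, assumed)
    uc = [[0 if i == j else rng.randint(1, 10) for j in range(n_sites)]
          for i in range(n_sites)]
    # Q: data exchanged between fragment pairs within each transaction (assumed)
    q = [[[0 if j1 == j2 else rng.randint(0, 5) for j2 in range(m_frags)]
          for j1 in range(m_frags)] for _ in range(n_trans)]
    return frag_size, freq, trfr, uc, q
```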

The number of fragments placed on each site is governed by Eq. (1) (Adl and Rankoohi 2009), where \(p_i\) is the number of fragments assigned to site \(i\) (so the \(p_i\) sum to \(m\)) and \(rf_i\) is the number of fragments remaining after the first \(i\) sites.

$$\sum_{i=1}^{n} p_{i} = m, \qquad {rf}_{i} = m - \sum_{q=1}^{i} p_{q}$$
(1)

Also, the capacity of each site is calculated by Eq. (2) (Adl and Rankoohi 2009).

$${siteCap}_{i} = p_{i} * \max_{1 \le j \le m}({fragSize}_{j})$$
(2)

During the placement of fragments on sites, the total size of the fragments assigned to a site, \(\sum_{j} {fragSize}_{j} * x_{ij}\), must not exceed the site capacity, as expressed by Eq. (3) (Adl and Rankoohi 2009).

$$\sum_{j=1}^{m} {fragSize}_{j} * x_{ij} \le {siteCap}_{i}, \quad i = 1, 2, \dots, n$$
(3)
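A minimal Python sketch of Eqs. (1)-(3) under the interpretation above (the helper names are ours, not the paper's):

```python
def site_capacities(p, frag_size):
    """Eq. (2): siteCap_i = p_i * max_j fragSize_j, where p = (p_1, ..., p_n)
    partitions the m fragments among the n sites (sum(p) == m, per Eq. (1))."""
    assert sum(p) == len(frag_size)
    biggest = max(frag_size)
    return [p_i * biggest for p_i in p]

def fits_capacities(x, frag_size, site_cap):
    """Eq. (3): for every site i, the total size of the fragments placed there
    (x[i][j] == 1 iff fragment j is on site i) must not exceed siteCap_i."""
    m = len(frag_size)
    return all(sum(frag_size[j] * x[i][j] for j in range(m)) <= cap_i
               for i, cap_i in enumerate(site_cap))
```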

COST1 is calculated from the allocation of the fragments to sites according to Eq. (4) (Adl and Rankoohi 2009). Fragments are allocated to sites according to fragment size and site capacity, and the allocation vector is created.

$${partialcost1}_{ij} = \sum_{q=1}^{n} {uc}_{iq} * {stfr}_{qj}$$
(4)

COST1 of the allocation vector \(\psi\) is calculated by Eq. (5) (Adl and Rankoohi 2009).

$$COST1(\psi) = \sum_{j=1}^{m} {partialcost1}_{\psi_{j} j}$$
(5)
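The sketch below traces Eqs. (4)-(5) in Python. Note that the construction of the STFR matrix from FREQ and TRFR is our assumption, since the text does not define it explicitly:

```python
def stfr_matrix(freq, trfr):
    """STFR: data volume site q requests from fragment j (an assumed
    construction, aggregating FREQ and TRFR over all transactions)."""
    l, n, m = len(freq), len(freq[0]), len(trfr[0])
    return [[sum(freq[k][q] * trfr[k][j] for k in range(l)) for j in range(m)]
            for q in range(n)]

def cost1(psi, uc, stfr):
    """Eqs. (4)-(5): site-fragment dependency cost of allocation psi,
    where psi[j] is the index of the site holding fragment j."""
    n, m = len(uc), len(stfr[0])
    # Eq. (4): partialcost1[i][j] = sum_q uc[i][q] * stfr[q][j]
    partial = [[sum(uc[i][q] * stfr[q][j] for q in range(n)) for j in range(m)]
               for i in range(n)]
    # Eq. (5): pick, for each fragment, the row of its assigned site
    return sum(partial[psi[j]][j] for j in range(m))
```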

COST2 is the parameter that accounts for the data transferred between fragments j1 and j2 by the transactions. It is calculated through the Q matrix: each element \(q_{kj_1j_2}\) is multiplied by the total execution frequency of transaction k over all sites, taken from the FREQ matrix (Eq. (6)) (Adl and Rankoohi 2009).

$${qfr}_{kj_1j_2} = q_{kj_1j_2} * \sum_{r=1}^{n} {freq}_{kr}$$
(6)

The FRDEP matrix is calculated by accumulating this traffic between fragments over all transactions (Eq. (7)) (Adl and Rankoohi 2009).

$${frdep}_{j_1j_2} = \sum_{k=1}^{l} {qfr}_{kj_1j_2}$$
(7)

COST2 is obtained from the FRDEP matrix and the allocation vector \(\psi\): each fragment dependency is multiplied by the unit cost between the sites hosting the two fragments, as shown by Eq. (8) (Adl and Rankoohi 2009).

$$COST2(\psi) = \sum_{j_1=1}^{m} \sum_{j_2=1}^{m} {frdep}_{j_1j_2} * {uc}_{\psi_{j_1}\psi_{j_2}}$$
(8)

Finally, the total COST of the produced allocation vector \(\psi\) is the sum of COST1 and COST2, according to Eq. (9) (Adl and Rankoohi 2009). The best allocation vector is the one that minimizes this cost when fragments are placed on sites to serve the queries; this aim is pursued by our new method, as discussed in the next section.

$$COST(\psi)=COST1(\psi)+COST2(\psi)$$
(9)
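Reusing cost1 from the previous sketch, the following Python sketch traces Eqs. (6)-(9):

```python
def cost2(psi, uc, q, freq):
    """Eqs. (6)-(8): inter-fragment dependency cost of allocation psi.
    q[k][j1][j2] is the data exchanged between fragments j1 and j2 by
    transaction k; freq[k][r] is transaction k's frequency at site r."""
    l, m = len(q), len(q[0])
    # Eq. (6): weight fragment-to-fragment traffic by the transaction's
    # total execution frequency over all sites
    qfr = [[[q[k][j1][j2] * sum(freq[k]) for j2 in range(m)]
            for j1 in range(m)] for k in range(l)]
    # Eq. (7): FRDEP accumulates this traffic over all transactions
    frdep = [[sum(qfr[k][j1][j2] for k in range(l)) for j2 in range(m)]
             for j1 in range(m)]
    # Eq. (8): charge each dependency at the unit cost between host sites
    return sum(frdep[j1][j2] * uc[psi[j1]][psi[j2]]
               for j1 in range(m) for j2 in range(m))

def total_cost(psi, uc, stfr, q, freq):
    """Eq. (9): COST(psi) = COST1(psi) + COST2(psi), reusing cost1 above."""
    return cost1(psi, uc, stfr) + cost2(psi, uc, q, freq)
```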

2.2 The proposed method (Greedy-DAP)

The aim of this study is to solve DAP by utilizing and adapting a greedy algorithm; the presented method, called Greedy-DAP, is an innovative approach to DAP. The greedy algorithm has few control parameters, and properties such as fast convergence, low time consumption, and robustness over the solution space are rarely found together in optimization methods with comparable characteristics (Deng et al. 2012; Bai 2010). To achieve this goal, the cost must be kept low; to reduce the transaction cost, the fragments must be placed on appropriate sites. The vector p is used to compute the cost of fragment placement in DAP. In Greedy-DAP, the size of the vector p is \(1*m\) and its structure is as follows:

 

Fragment number:                1   2   …   j   …   m
Vector p (site of fragment):    2   3   …   i   …   n

\(Vector p\) is an array indicating on which site each fragment is located; in the example above, fragment 1 is located on site 2, fragment 2 on site 3, and so on. Thus, \(vector p\) shows which fragment will be placed on which site. Our aim is to place each fragment on the best site so as to alleviate the total transaction cost. The greedy algorithm, which considers the total cost, is used to evaluate the process of fragment allocation to the sites; fitness is calculated by the cost function described in Sect. 2.1. The cost values of the vectors are determined by Eq. (10), and the Max number is equal to the value of the maximum element of the Cost matrix.

$$Cost\left[i, j\right] = {partialCost}\left[i, k\right] * {frdep}\left[k, j\right] * {Uc}\left[k, j\right], \quad Max\,number = \max_{i,j} Cost\left[i, j\right]$$
(10)

Site capacity is one of the parameters of Greedy-DAP, so a counter is kept for each site to track its remaining capacity. The Cost matrix has \(n\) rows and \(m\) columns, where each row represents a site and each column a fragment. First, the minimum value \(Cost\left[i, j\right]\) of the Cost matrix is found. If the size of fragment \(j\) is greater than the capacity of site \(i\), the second smallest element of the Cost matrix is considered, and we continue to the \(k\)-th minimum until the size of fragment \(j\) is no greater than the capacity of site \(i\); the site capacity is then updated by subtracting the fragment size. Second, once this condition holds for \(Cost\left[i, j\right]\), site \(i\) is recorded at index \(j\) of the vector p. After that, column \(j\) of the Cost matrix is filled with the Max number. The Cost matrix is given in Table 2.

Table 2 Cost matrix representation

Then the next minimum of the matrix is found, and this process continues until all elements of the matrix have been filled with the maximum value, i.e., the values of each selected column j are replaced by the Max number. The updated Cost matrix is given in Table 3.

Table 3 Updated cost matrix
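The loop below is a minimal Python sketch of this procedure, assuming the Cost matrix has already been built according to Eq. (10) and using infinity for the "very large number":

```python
BIG = float("inf")   # stands in for the "very large number" of the paper

def greedy_dap(cost, frag_size, site_cap):
    """Greedy-DAP allocation loop (a sketch of Figs. 2-3): repeatedly pick the
    cheapest remaining (site, fragment) entry, place the fragment if it fits,
    and blank out its column so it cannot be selected again."""
    n, m = len(cost), len(cost[0])
    cost = [row[:] for row in cost]        # work on a copy
    cap = site_cap[:]                      # per-site capacity counters
    psi = [-1] * m                         # psi[j] = site assigned to fragment j
    while any(site == -1 for site in psi):
        candidates = [(cost[i][j], i, j)
                      for i in range(n) for j in range(m) if cost[i][j] < BIG]
        if not candidates:                 # assumes feasible instances
            raise ValueError("no feasible placement under the site capacities")
        _, i, j = min(candidates)
        if frag_size[j] <= cap[i]:
            psi[j] = i
            cap[i] -= frag_size[j]         # update the site's remaining capacity
            for r in range(n):             # fill column j with the Max number
                cost[r][j] = BIG
        else:
            cost[i][j] = BIG               # fragment does not fit; try next minimum
    return psi
```

Blanking a chosen fragment's column reproduces the device described in the introduction, including its known drawback: a later true minimum falling in a blocked column is lost.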

The pseudo-code of the Greedy-DAP and its scripting chart are shown in Figs. 2 and 3, respectively.

Fig. 2
figure 2

Pseudo-code of the Greedy-DAP

Fig. 3
figure 3

The scripting chart of the Greedy-DAP

3 Experimental results

The original data set was produced based on the cost formulation in Sect. 2.2 in order to compare DAP algorithms. The proposed algorithm was evaluated using three cases of comparison with various numbers of fragments and sites. In the first case, the number of fragments was fixed but the number of sites varied. In the second case, the number of sites was fixed but the number of fragments varied. The third case used equal numbers of fragments and sites. It is worth noting that the proposed algorithm is deterministic; therefore, repeated runs of the algorithm give identical results. However, because a random dataset is used, the cost and execution time of the other algorithms vary between runs. To obtain the minimum results, PSO-DAP was run for 20 executions, whereas Greedy-DAP, which yields the same result in every execution, was run only once. PSO-DAP was run with 500 iterations. The computer used for comparing PSO-DAP and Greedy-DAP had the following configuration: 1.6 GHz CPU, 4 GB memory, Windows 7, and the C# programming language.
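A minimal sketch of this measurement protocol in Python (the actual experiments were implemented in C#; the solver interface and names here are hypothetical):

```python
import time

def benchmark(solver, instances, repeats=1):
    """Deterministic solvers such as Greedy-DAP use repeats=1; stochastic
    ones such as PSO-DAP are repeated (20 runs in the paper) and the
    minimum cost is reported."""
    results = []
    for inst in instances:
        best_cost, elapsed = float("inf"), 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            cost = solver(inst)            # solver returns the total COST
            elapsed += time.perf_counter() - start
            best_cost = min(best_cost, cost)
        results.append((best_cost, elapsed))
    return results
```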

3.1 State 1

The first state increases the number of sites from 3 to 48 while fixing the number of fragments at 48. The values for the algorithms in the paper (Tosun et al. 2013a) are shown only in the figure. The cost and execution time results for PSO-DAP and Greedy-DAP are shown in Tables 6 and 7, respectively. To compare the results of the proposed algorithm with the figure in the paper (Tosun et al. 2013a), we mapped the proposed algorithm's results onto those charts. Since the other algorithms used the same approach, forty-six different site numbers were taken from the range 3 to 48 and the number of fragments was fixed at 48. The input parameters in Table 4 shown in bold refer to PSO-DAP; the others are common to both PSO-DAP and Greedy-DAP.

Table 4 Input parameters for PSO-DAP (Mahi et al. 2018)

Samples of the DAP cost and execution time values are shown in Table 5. The results of the Greedy-DAP algorithm were compared with other methods in the literature, namely the Ant γ algorithm (Adl and Rankoohi 2009), Ant β algorithm (Adl and Rankoohi 2009), Ant α algorithm (Adl and Rankoohi 2009), the evolutionary algorithm (Adl and Rankoohi 2009), and the PSO-DAP minimum.

Table 5 Generated cost and execution time for increasing numbers of sites (n) with the number of fragments fixed at 48

Tables 6 and 7 contain the cost and execution time values, respectively; the best results are shown in bold.

Table 6 Cost comparison of the methods for increasing numbers of sites (n) with the number of fragments fixed at 48 (cost values are \(\times 10^{9}\))
Table 7 Execution time (s) comparison of the methods for increasing numbers of sites (n) with the number of fragments fixed at 48

A comparison of the obtained results demonstrated that Greedy-DAP had lower cost and consumed less time than PSO-DAP. The DAP samples were created randomly, and both Greedy-DAP and PSO-DAP used them to solve the problems in this paper. The datasets were created randomly according to the formulation described in Sect. 2.2 so as to be comparable with the previous algorithms' datasets. Among the 46 obtained results, 39 were the best in terms of cost and close to the compared algorithms' results in Tables 6 and 8, showing that the proposed algorithm is statistically different from the other methods. PSO-DAP (Mahi et al. 2018), in turn, consumed less time than the remaining algorithms, as shown in Table 7.

Table 8 Statistical comparison of the methods using the sign test for increasing numbers of sites (n) with the number of fragments fixed at 48

Here, to show the accuracy of our method, we used the sign test (Mann 2013; Lurie et al. 2011). The sign test was performed for the three states. The H0 hypothesis states that there is no significant difference between the two algorithms, and the H1 hypothesis states that there is a significant difference. All calculations were performed at a significance level of five percent. Table 8 contains the sign test results. The H1 hypotheses were accepted because the computed Z values fell outside the acceptance range in all tests. The comparison of the proposed algorithm with the alternatives is shown in the last two rows of Table 8.
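The Z statistic can be computed with the usual normal approximation of the sign test, as sketched below (the paper does not spell out its exact formula, so this construction is an assumption):

```python
from math import sqrt

def sign_test_z(costs_a, costs_b):
    """Normal-approximation sign test on paired results: counts the wins of
    method A over method B; |Z| > 1.96 rejects H0 (no difference) at the
    5% significance level."""
    wins = sum(a < b for a, b in zip(costs_a, costs_b))
    n = sum(a != b for a, b in zip(costs_a, costs_b))   # ties are discarded
    return (wins - n / 2) / (sqrt(n) / 2)
```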

In Figs. 4, 5, 6 and 7, we plot the Greedy-DAP cost and time results against other approaches, namely the Ant γ algorithm (Adl and Rankoohi 2009), Ant β algorithm (Adl and Rankoohi 2009), Ant α algorithm (Adl and Rankoohi 2009), the evolutionary algorithm (Adl and Rankoohi 2009), and the PSO-DAP minimum (Mahi et al. 2018).

Fig. 4
figure 4

Results achieved by the algorithms in the state 1 cost comparison (Adl and Rankoohi 2009)

Fig. 5
figure 5

Computation time of the algorithms in the state 1 time comparison (Adl and Rankoohi 2009)

Fig. 6
figure 6

Results achieved by the algorithms in the state 2 cost comparison (Adl and Rankoohi 2009)

Fig. 7
figure 7

Computation time of the algorithms in the state 2 time comparison (Adl and Rankoohi 2009)

3.2 State 2

The second state increases the number of fragments from 20 to 50 in steps of 1 while fixing the number of sites at 20 (Table 9). The results of the algorithm in the paper (Adl and Rankoohi 2009) are not shown in the table. The cost and execution time results for PSO-DAP and Greedy-DAP are shown in Tables 10 and 11. To compare with the results of the paper (Adl and Rankoohi 2009), we mapped our results onto their charts, as shown in Figs. 6 and 7. These figures show that, compared with the other algorithms, the proposed one has the lowest cost and time. The results of Greedy-DAP were compared with other methods, namely the Ant γ algorithm (Adl and Rankoohi 2009), Ant β algorithm (Adl and Rankoohi 2009), Ant α algorithm (Adl and Rankoohi 2009), the evolutionary algorithm (Adl and Rankoohi 2009), and PSO-DAP (Mahi et al. 2015).

Table 9 Generated cost and execution time for increasing numbers of fragments (m) with the number of sites fixed at 20
Table 10 Cost comparison of the methods for increasing numbers of fragments (m) with the number of sites fixed at 20
Table 11 Execution time (s) comparison of the methods for increasing numbers of fragments (m) with the number of sites fixed at 20

The cost values and execution times for the DAP samples of increasing size are shown in Table 9.

Table 10 shows the cost results of the three algorithms together with the standard deviation of the proposed algorithm. Table 11 shows the execution times of the different methods.

The statistical comparison of the proposed method with the other methods is given in the last six rows of Table 12 for increasing numbers of fragments (m) with a fixed number of sites, showing acceptable results.

Table 12 Statistical comparison of the methods using the sign test for increasing numbers of fragments (m) with the number of sites fixed at 20

3.3 State 3

In the third state, the numbers of fragments and sites are equal, so the proposed algorithm was run on instances of equal dimensions. The cost values and execution times generated for increasing DAP instance sizes are shown in Table 13.

Table 13 Generated cost and execution time for increasing DAP instance sizes

Tables 14 and 15 present the comparison of the cost and execution time values, respectively; the best results are shown in bold. Table 14 shows that the cost results are much better than those of the previous algorithms. Table 15 shows the computation time of allocating resources to the sites: the more sites and resources there are, the more time the allocation takes.

Table 14 Cost comparison of the methods for increasing DAP instance sizes (cost values are \(\times 10^{6}\))
Table 15 Execution time (s) comparison of the methods for increasing DAP instance sizes

The statistical comparison of the proposed method with the other methods is given in the last six rows of Table 16 for increasing DAP instance sizes, showing that this algorithm achieves better results. To present the results of the proposed method more fully, we compared three data states. In the first state, the number of sites varied and the number of fragments was fixed. In the second state, the number of fragments varied and the number of sites was constant. In the third state, the numbers of sites and fragments were equal, in order to better demonstrate the correct performance of the presented method. In all three states, the allocation cost and especially the time consumption of the presented method are much better than those of the previous algorithms.

Table 16 Statistical comparison of the methods using sign test

3.4 Discussion

The aims of the data allocation problem (DAP) are to achieve the minimum execution time and to ensure the lowest transaction cost of queries. Solving this NP-hard problem with exact numerical methods is computationally expensive. Despite the success of heuristic algorithms such as GA and PSO in solving DAP, tuning the initial control parameters, the relatively slow convergence, and the difficulty of adapting them to the problem are their most important disadvantages. This paper presents a simple, well-formed greedy algorithm to optimize the total transmission cost of each site-fragment dependency and each inter-fragment dependency. To evaluate the effect of the proposed method, more than 20 standard DAP problems were used. Experimental results showed that the proposed approach had better quality in terms of execution time and total cost.

4 Conclusion

In this work, we have solved DAP in non-replicated distributed database systems. The key objective in DAP is to decrease the query execution time and the transaction cost. In this paper, an approach based on the greedy algorithm, Greedy-DAP, has been proposed to optimize the query execution time and the transaction cost in DAP. To evaluate the effectiveness of the proposed algorithm, three states were considered: the first with a constant number of fragments and a varying number of sites; the second with a constant number of sites and a varying number of fragments; and the third with equal numbers of sites and fragments. The results obtained in these three states demonstrated that the presented method performs well. Greedy-DAP was run on various DAP samples in several states and obtained effective results in comparison with other algorithms; it was also evaluated in terms of query execution time and transaction cost. The results demonstrated that Greedy-DAP displayed better performance in terms of query solution quality and running time. Because the solution space grows exponentially with the dimensionality of the problem, the performance of the approach decreases for large instances. However, analysis of the results demonstrated that the presented approach generates results comparable with state-of-the-art algorithms, especially in terms of execution time, due to its lower number of computations.

The DAP problem, previously solved with the PSO algorithm, could also be solved with a parallel algorithm, which may yield much better results. In future work, the grey wolf optimizer (GWO), the firefly algorithm, the grasshopper optimization algorithm (GOA), the cuckoo optimization algorithm, and the frog leaping optimization algorithm could also be applied.