
1 Introduction

Owing to the inherently decentralized nature of Genetic Algorithms (GA), a multitude of variants of Parallel GA (PGA) have been introduced in the literature [1, 2]. However, their application has remained limited to moderately sized optimization problems, and the research has focused mostly on speeding up otherwise time-consuming and inherently complex applications, e.g. assignment and scheduling [11,12,13] or prediction [8] tasks. To deal with large-scale optimization problems, multi-core systems and standalone cluster architectures have been proposed by Zheng et al. [6]. Distributed storage file systems and distributed processing frameworks such as Apache Hadoop have been used to achieve scalability in PGA [3,4,5,6]. Hadoop MapReduce [7] is a reliable, scalable, and fault-tolerant framework for large-scale computing. Hadoop requires writing data to HDFS after each iteration to achieve its resilience. For CPU-bound iterative processing, as in Genetic Algorithms, this I/O overhead is undesirable and substantially dominates the processing time. PGA has been explored for numerous interesting applications such as software fault prediction [8], test suite generation [9], sensor placement [10], assignment and scheduling [11,12,13], dynamic optimization [23], and adapting the offspring population size and number of islands [24].

Researchers have made significant efforts to exploit the intrinsically parallel nature of genetic algorithms using the island model [16] and other PGA models [18]. Considerable effort has also gone into implementing and testing these models on the Hadoop framework [8, 17, 18, 25], using the MapReduce strategy to improve scalability. PGAs have been implemented using distributed frameworks and their effectiveness evaluated in terms of execution time, computation effort, and solution quality in comparison with the Sequential Genetic Algorithm (SeqGA). However, the above-mentioned efforts have been tested on simple problems solved with limited population sizes and small numbers of generations, overlooking the scalability that these frameworks could offer for large-scale optimization problems. Apache Spark [14] is an open-source distributed cluster computing framework that has gained popularity in recent years. It has been shown to be faster than Hadoop for large-scale optimization problems, and it works especially well for iterative processing [14]. Contrary to Hadoop, Spark keeps data in memory and uses lineage graphs to achieve resilience and fault tolerance. This makes computing faster and eliminates the I/O overhead of reading from and writing to the Hadoop distributed file system (HDFS) incurred by MapReduce. Spark provides APIs for generic processing in addition to specialized libraries for SQL-like operations [29], stream processing using the concept of mini-batches [26], iterative machine learning algorithms [30], and graph processing [15]. Spark's in-memory data processing has been reported to be up to 100 times faster than Hadoop MapReduce, and up to 10 times faster for on-disk operations [25].

In this paper, we propose a Scalable GA (S-GA) for large-scale optimization problems using Apache Spark. S-GA aims to reduce the communication overhead of Apache Spark through optimal resource utilization for large-scale optimization tasks. This is contrary to the traditional island model [16] of PGA, where communication among the subpopulation islands is directly proportional to the population and solution size. In S-GA, the communication is independent of the population size and is limited by the Migration Rate and Migration Interval, which significantly reduces the amount of data transferred between parallel computations and makes the approach scalable and applicable to large-scale problems. We have compared S-GA with SeqGA for continuous numerical optimization problems. The experiments have been performed on five different large-scale benchmark problems, and S-GA was found to be more efficient.

The paper is structured as follows: In Sect. 2, related work is discussed. SeqGA and the proposed S-GA are explained in Sects. 3 and 4, respectively. Experiments and evaluations are discussed in Sect. 5. Finally, conclusions and future work are presented in Sect. 6.

2 Related Work

Generally, there are three main models to parallelize GA: global single-population master-slave (global model), single-population fine-grained (grid model), and multiple-population coarse-grained (island model) [1]. Most PGAs divide a population into multiple sub-populations, each of which independently searches for an optimal solution using stochastic search operators such as crossover and mutation. The global model works like SeqGA with one population: the master is responsible for handling the population by applying the GA operators, while the slaves handle the fitness evaluation of individuals. In the grid model, GA operators are applied within each sub-population and each individual is assigned to only one sub-population. This helps in improving diversity; however, the model suffers from the problem of getting stuck in local optima and has high communication overhead due to frequent communication between the sub-populations. The island model [16] uses a large population divided among different sub-populations called islands. GA operates on these islands independently, with the ability to exchange/migrate some of the individuals. This helps to increase chromosome diversity and to avoid getting stuck in local optima. SeqGA uses a single large population pool and applies stochastic operators on it; details about SeqGA are given in Sect. 3.2. Whitley et al. [16] expected that the island model would outperform SeqGA because of the diversity of chromosomes and the migration of individuals among islands. However, their results revealed that the island model may perform better only if migration among sub-populations is handled carefully.

A comparison of Hadoop MapReduce based implementations of the three main PGA models, the global single-population master-slave (global model), single-population fine-grained (grid model), and multiple-population coarse-grained (island model), is discussed by Ferrucci et al. [8]. They observed that the overhead of the Hadoop distributed file system (HDFS) makes the global and grid models less efficient than the island model for parallelizing GA, because of HDFS access, communication, and execution time. The island model performs fewer HDFS operations, resulting in better resource utilization and shorter execution time. However, they reported experimental results of the global, grid, and island models on a population size of only 200, with 300 generations, on smaller problems with a limited number of dimensions (only up to 18). They concluded that distributed frameworks provide efficient support for data distribution, parallel processing, and memory management, but incur significant overhead from communication delays.

Verma et al. [17] used the Hadoop MapReduce framework to make GA scalable. Their experiments were performed on the OneMax problem, and they addressed scalability and convergence as decreasing time per iteration by increasing the number of resources while keeping the problem size fixed. Keco and Subasi [18] discussed PGA using Hadoop MapReduce. Their focus was to improve final solution quality and cloud resource utilization. They obtained improved performance and fast convergence, but there was no improvement in solution quality due to the lack of communication among the subpopulations. Osuna et al. [19] proposed a diversity-based parent selection mechanism for speeding up multi-objective optimization with evolutionary algorithms; by focusing on individuals with high diversity located in poorly explored areas of the search space, this mechanism helped to find the Pareto front faster than the classical approaches. Gao et al. [20] contributed to maximizing population diversity in GA by favoring solutions whose fitness value is better than a given threshold. They worked on the OneMax and LeadingOnes [27] problems. The results revealed that the algorithm efficiently maximized the diversity of a population. However, they presented a theoretical framework and did not address the contribution of diversity in large-scale optimization problems.

A PGA using the Apache Spark framework [9] was proposed for pairwise test suite generation. Parallel operations were used for fitness evaluation and genetic operations. The authors did not address large-scale data problems and only focused on test suite generation; results were compared with SeqGA on synthetic and real-world datasets [9].

Both GA and PGAs are widely used in several applications. Junior et al. [21] applied a parallel biased random-key GA with multiple populations to the irregular strip packing problem, in which items of variable length and fixed width need to be placed in a container. For an efficient layout scheme, they used a collision-free region as a partition method along with a meta-heuristic and a placement algorithm. Gronwald et al. [22] determined the location and amount of an airborne pollutant source using a Backward Parallel Genetic Algorithm (BPGA). A concentration profile was compiled by considering readings from different points in an area. BPGA used multiple guesses per generation, and the best one, determined by a fitness function, was used in the reproduction of the next generation.

Previously proposed parallel implementations of GA differ mainly in how they structure the population and subpopulations, referred to as the topology. The topology of a PGA determines the sub-population model and the sharing of solutions (i.e. sending and receiving solutions) among these subpopulations. These models, when executed using distributed frameworks like Apache Spark, suffer from substantial communication and network overhead. On the one hand there is substantial parallelism intrinsic to Genetic Algorithms; on the other hand, the required communication hinders the ideal speed-up that could be achieved by parallel/distributed techniques. There exists a trade-off between sub-population communication and solution quality (due to population stagnation, getting stuck in local optima, and the lack of diversity).

In conclusion, there is a strong need to develop fundamentally new approaches for parallel and distributed GA that take into account the I/O, network, and communication overhead present in existing distributed large-scale computing frameworks. To exploit these frameworks, an implementation must utilize resources optimally in order to approach the ideal speedup.

3 Background

3.1 Apache Spark

Apache Spark [28] was introduced by the RAD Lab at the University of California, Berkeley in 2009 to overcome the limitations of Hadoop MapReduce. It has been designed for faster in-memory computation for interactive queries and iterative algorithms, while achieving efficient fault recovery and compatibility with the Hadoop stack. At its core, Spark is a "computational engine" responsible for scheduling, distributing, and monitoring applications consisting of many computational tasks across many worker machines, or a computing cluster. Apache Spark provides data distribution using resilient distributed datasets (RDDs), which are Spark's main programming abstraction. RDDs represent a collection of items distributed across many compute nodes that can be manipulated in parallel. An RDD supports two types of operations: (i) transformations and (ii) actions. Transformations are lazy operations that create a new RDD from an existing one. Lazy evaluation means that transformations are not executed immediately; instead, an execution graph is built and only evaluated when an action is called. Actions materialize the lazy evaluations and perform operations (e.g. aggregation) that transfer data from the worker nodes to the master node. To work efficiently with RDDs, it is important to be aware of their internal workings, to prefer narrow transformations and dependencies, and to reduce the number of actions, so as to achieve better speed-up from parallel computing.
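To make the transformation/action distinction concrete, the following minimal Scala sketch (not from the paper; the application name and local master setting are illustrative assumptions) builds an RDD, applies lazy transformations, and only triggers computation when an action is called:

```scala
import org.apache.spark.sql.SparkSession

object RddBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-basics")
      .master("local[*]")                 // assumption: local mode, for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    // Distribute a collection across the cluster as an RDD with 4 partitions.
    val numbers = sc.parallelize(1 to 1000, numSlices = 4)

    // Transformations are lazy: nothing is computed here, only the lineage graph grows.
    val squares = numbers.map(x => x.toLong * x)
    val evens   = squares.filter(_ % 2 == 0)

    // An action materializes the lineage and ships the result back to the driver.
    val total = evens.reduce(_ + _)
    println(s"sum of even squares = $total")

    spark.stop()
  }
}
```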

3.2 Sequential Genetic Algorithm (SeqGA)

SeqGA [11,12,13, 16, 18, 20], also known as the canonical GA, is a stochastic search method used to find the optimal solution for a given optimization problem using the Darwinian principle of evolution, "survival of the fittest". It creates a single pool of candidate solutions (panmixia) and applies stochastic operators (i.e. selection, crossover, mutation, and survival selection) to create a new, evolved population. This evolution process continues until the population has converged to an optimal solution or the desired time/effort budget has elapsed. For large-scale or complex problems, SeqGA may require considerable computational effort, such as more memory and long execution times (for large population sizes and many generations).

Algorithm 1 explains the working of SeqGA. Select Parents (Line 3) specifies the individual selection mechanism for reproduction or recombination. Crossover (Line 4) helps to explore the search space by generating new solutions through recombination, while Mutation (Line 5) exploits solutions for improvement by randomly perturbing the selected solution. The Survival Selection scheme decides how many individuals are selected from parents and offspring for the next generation.

Algorithm 1. Sequential Genetic Algorithm (SeqGA)
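Since the original Algorithm 1 listing is only available as a figure, the following self-contained Scala sketch illustrates the loop described above; the concrete operator implementations (tournament selection, Gaussian perturbation, truncation survival) are illustrative assumptions, not the exact operators of Algorithm 1:

```scala
import scala.util.Random

object SeqGASketch {
  type Individual = Array[Double]
  val rng = new Random(42)

  // Example fitness: the sphere function (minimization, lower is better).
  def fitness(x: Individual): Double = x.map(v => v * v).sum

  def randomIndividual(dim: Int): Individual =
    Array.fill(dim)(rng.nextDouble() * 10.0 - 5.0)

  // Parent selection (tournament of size 2; illustrative only).
  def selectParent(pop: Vector[Individual]): Individual = {
    val a = pop(rng.nextInt(pop.size))
    val b = pop(rng.nextInt(pop.size))
    if (fitness(a) < fitness(b)) a else b
  }

  // Uniform crossover: each gene is inherited from either parent with equal probability.
  def crossover(p1: Individual, p2: Individual): Individual =
    p1.zip(p2).map { case (g1, g2) => if (rng.nextBoolean()) g1 else g2 }

  // Mutation: random Gaussian perturbation of a few genes (illustrative only).
  def mutate(ind: Individual, pm: Double): Individual =
    ind.map(g => if (rng.nextDouble() < pm) g + rng.nextGaussian() * 0.1 else g)

  def main(args: Array[String]): Unit = {
    val (dim, popSize, generations) = (50, 200, 500)
    var pop = Vector.fill(popSize)(randomIndividual(dim))
    for (_ <- 1 to generations) {
      val offspring = Vector.fill(popSize) {
        mutate(crossover(selectParent(pop), selectParent(pop)), pm = 0.05)
      }
      // Survival selection: keep the popSize fittest of parents plus offspring.
      pop = (pop ++ offspring).sortBy(fitness).take(popSize)
    }
    println(s"best fitness = ${fitness(pop.minBy(fitness))}")
  }
}
```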

4 Scalable Distributed Genetic Algorithm Using Apache Spark (S-GA)

S-GA creates an initial random population of solutions and distributes it across different partitions as an RDD. The GA operators and fitness evaluations are performed within each partition, independently of the other partitions. We have used the roulette wheel selection operator, uniform crossover, interchange mutation, and weak-parent survival selection for the creation of new offspring for the next generation.
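The following Scala sketch shows one plausible way to implement the named operators for a minimization problem; these implementations are assumptions made for clarity, not the paper's exact code (uniform crossover is as in the SeqGA sketch above):

```scala
import scala.util.Random

object OperatorSketches {
  type Individual = Array[Double]
  val rng = new Random()

  // Roulette-wheel selection for minimization: selection probability is proportional
  // to 1 / (1 + error), so lower-error individuals are picked more often.
  def rouletteWheel(pop: Vector[Individual], fitness: Individual => Double): Individual = {
    val weights = pop.map(ind => 1.0 / (1.0 + fitness(ind)))
    val target  = rng.nextDouble() * weights.sum
    var acc = 0.0
    pop.zip(weights)
      .find { case (_, w) => acc += w; acc >= target }
      .map(_._1)
      .getOrElse(pop.last)
  }

  // Interchange mutation: swap the values at two randomly chosen positions.
  def interchangeMutation(ind: Individual): Individual = {
    val copy = ind.clone()
    val i = rng.nextInt(copy.length)
    val j = rng.nextInt(copy.length)
    val tmp = copy(i); copy(i) = copy(j); copy(j) = tmp
    copy
  }

  // Weak-parent survival: the offspring replaces the parent only if it is at least as fit.
  def weakParentSurvival(parent: Individual, child: Individual,
                         fitness: Individual => Double): Individual =
    if (fitness(child) <= fitness(parent)) child else parent
}
```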

In S-GA, each partition (corresponding to an island in the island model) replaces its weakest solutions with the fittest solutions broadcast by other partitions. The Migration Size (Ms) specifies the number of solutions to be broadcast to the other partitions during each migration step. S-GA significantly reduces the communication overhead by minimizing the actions on the RDD.

The pseudo code of S-GA is elaborated in Algorithm 2. The population is randomly initialized at line (1) and then distributed among m partitions at line (2). Solutions are evolved using stochastic operators at lines (6–12). It is worth mentioning that we use operations that calculate and sort the fitness within each partition (mapPartitionsWithIndex), thereby reducing the communication overhead and achieving efficient performance. At line (14), S-GA broadcasts the evolved best solution(s) to the other partitions, and the weak solutions in each partition are replaced with the newly broadcast solutions at line (6). The Migration Interval (Mi) defines the number of generations after which S-GA broadcasts the fittest individual(s) of each partition to the other partitions. This helps achieve diversity in each subpopulation while searching for better solutions. The size of the broadcast and the Migration Interval contribute to the network communication delay and directly affect performance and convergence; we have experimented with several values in our evaluations. Finally, the above steps are iterated until the stopping criterion is met. Figure 1 explains the idea of Migration, Migration Size, and Migration Interval with an example.

Fig. 1. Evolution process of S-GA

Assume Mi = 2, Ms = 1, and the sphere function as the fitness function, i.e., \( f\left( x \right) = \sum\nolimits_{i = 1}^{n} {x_{i}^{2} } \). An initial RDD is created from a population of random solutions. These initial solutions are then evolved using the crossover and mutation operators. After every 2nd generation (as Mi = 2), the best solution (as Ms = 1) from each partition is migrated to the other partitions. At the start of the very next generation, each partition picks up all the migrated solutions and replaces its weakest solutions with them.
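A small Scala sketch of this migration step; the helper names are assumptions, and the fitness is the sphere function from the example:

```scala
object MigrationExample {
  // Sphere fitness used in the Fig. 1 example (minimization).
  def sphere(x: Array[Double]): Double = x.map(v => v * v).sum

  // Migration is triggered after every mi-th generation (mi corresponds to Mi = 2 here).
  def shouldMigrate(generation: Int, mi: Int): Boolean =
    generation > 0 && generation % mi == 0

  // At the start of the next generation each partition drops its weakest solutions
  // and replaces them with the migrants broadcast by the other partitions.
  def applyMigrants(local: Vector[Array[Double]],
                    migrants: Vector[Array[Double]]): Vector[Array[Double]] = {
    val survivors = local.sortBy(sphere).dropRight(migrants.size)
    survivors ++ migrants
  }
}
```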

Algorithm 2. Scalable Distributed Genetic Algorithm using Apache Spark (S-GA)
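As the Algorithm 2 listing is only available as a figure, the following Scala/Spark sketch approximates the driver loop it describes. Apart from the Spark API calls, everything here (helper names, parameter values, the placeholder evolve step) is an assumption for illustration rather than the paper's implementation:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Serializable so that methods referenced inside Spark closures can be shipped to executors.
object SGASketch extends Serializable {
  type Individual = Array[Double]

  def fitness(x: Individual): Double = x.map(v => v * v).sum   // sphere, as in Fig. 1

  // Placeholder: run `gens` generations of selection/crossover/mutation on one partition.
  def evolve(pop: Vector[Individual], gens: Int): Vector[Individual] = pop

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("S-GA-sketch").getOrCreate()
    val sc = spark.sparkContext
    // mi = Migration Interval (Mi), ms = Migration Size (Ms); values are illustrative.
    val (dim, n, m, mi, ms, maxGen) = (1000, 1000, 24, 50000, 2, 1000000)
    val rng = new Random(7)

    // Lines (1)-(2): random initial population, distributed as an RDD over m partitions.
    var population = sc.parallelize(
      Vector.fill(n)(Array.fill(dim)(rng.nextDouble() * 10.0 - 5.0)), m)

    var generation = 0
    var migrants = Vector.empty[Individual]
    while (generation < maxGen) {
      val migrantsBc = sc.broadcast(migrants)

      // Lines (6)-(12): evolve every partition independently; fitness is computed and
      // sorted locally, so no shuffle occurs between partitions.
      population = population.mapPartitionsWithIndex { (_, it) =>
        val local = it.toVector
        // Replace the weakest local solutions with the migrants from the last interval.
        val seeded = local.sortBy(fitness).dropRight(migrantsBc.value.size) ++ migrantsBc.value
        evolve(seeded, mi).iterator
      }.cache()   // keep the evolved population in memory for the next iteration

      generation += mi

      // Line (14): after every mi generations, take the ms best solutions per partition
      // and broadcast them to all partitions in the next iteration.
      migrants = population.mapPartitions(it =>
        it.toVector.sortBy(fitness).take(ms).iterator).collect().toVector
    }
    spark.stop()
  }
}
```

The only action per migration interval is the final collect of at most m * ms individuals, which is what keeps the communication independent of the population size.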

5 Experiments

5.1 Experimental Setup

The experiments were performed on a three-node cluster (DELL PowerEdge R815, 2x AMD Opteron 6376 with 64 cores, 256 GB RAM, 3 TB SATA RAID-5) running Spark 2.1.0 and Scala 2.11.8. Both S-GA and SeqGA used the following configuration parameters: crossover scheme: uniform, mutation: interchange, replacement scheme: weak parent, selection scheme: roulette wheel, crossover probability: 0.5, mutation probability: 0.05, P = D, and function: Griewank. In addition, S-GA used m = 24 and Ms = 2 as configuration parameters.
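For readability, the configuration above could be captured in a small settings class; the names and defaults below simply mirror the listed parameters and are a hypothetical illustration, not the paper's code:

```scala
// Hypothetical configuration holder mirroring the experimental setup described above.
case class SGAConfig(
  crossoverScheme: String      = "uniform",
  mutationScheme: String       = "interchange",
  replacementScheme: String    = "weak-parent",
  selectionScheme: String      = "roulette-wheel",
  crossoverProbability: Double = 0.5,
  mutationProbability: Double  = 0.05,
  benchmark: String            = "griewank",
  dimensions: Int              = 3000,
  populationSize: Int          = 3000,   // P = D in the experiments
  partitions: Int              = 24,     // m
  migrationSize: Int           = 2,      // Ms
  migrationInterval: Int       = 50000   // Mi
)
```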

5.2 Evaluation Metrics

Speed Up:

It is the ratio of the sequential execution time to the parallel execution time and reflects how much faster the parallel algorithm is than the sequential one. Table 1 reports the speedup for all cases in which both SeqGA and S-GA converge to the VTR (Value To Reach). The VTR defines the threshold for convergence; we have used \( \frac{1}{\text{Number of Dimensions}} \) as the VTR in our experiments.
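Written out explicitly, this is \( Speedup = \frac{T_{SeqGA}}{T_{S\text{-}GA}} \), where \( T_{SeqGA} \) and \( T_{S\text{-}GA} \) denote the execution times of SeqGA and S-GA, respectively, for reaching the VTR.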

Table 1. Experimental results of S-GA and SeqGA.

Table 1 reports results with different values of Migration Interval and Migration Size. It can be seen that a high speedup was achieved for large values of Migration Interval and Migration Size.

Execution Time:

The execution time of SeqGA and S-GA was measured using the system clock. This time was recorded for a maximum of 1 billion generations. Table 2 shows the average execution time over 5 runs for each configuration of S-GA. We can observe that the execution time reduces significantly when we increase Mi from 50000 to 100000; however, the fitness error tends to increase. This difference in time reduces with an increase in the number of partitions. The migration overhead is defined as the total number of individuals migrated by all partitions after each Mi generations. An increase in m and Mi results in increased network overhead (m * Mi) and hence execution time, but on the other hand this also helps S-GA to converge in a smaller number of generations. Table 2 lists the execution times for the Sphere, Ackley, Griewank, Rastrigin, Zakharov, and Sum-of-Different-Powers functions for optimization up to 3000 dimensions (D). For simplicity, the population size (N) has been assumed to be equal to the number of dimensions. G represents the number of generations consumed using the specified configurations. The VTR, as mentioned earlier, is the reciprocal of D; hence the VTR is smaller for 3000 dimensions than for 2000 and 1000 dimensions. Bold values in Table 2 represent fitness errors that have decreased below the specified threshold, i.e. the VTR.

Table 2. Experimental Results of S-GA.

It can be seen from Table 2 that for higher values of Mi (i.e. 100000), each function consumes less time in most cases. Broadcasts are also important, as they help each sub-population Pj to increase its diversity and to escape local optima. Larger Mi values reduce the frequency of broadcasts and hence the network overhead. However, with a higher Mi, a larger number of iterations may not improve the optimum significantly, due to the reduced diversity in the particular sub-population. Table 2 reflects this fact, as the error is lower for Mi = 50000 than for Mi = 100000 in most cases.

6 Conclusion

In this paper, we have presented initial results for S-GA, which uses Apache Spark for large-scale optimization problems, and compared them with SeqGA. We have tested S-GA on the Sphere, Ackley, Griewank, Rastrigin, Zakharov, and Sum-of-Different-Powers functions, which are typical benchmarks for continuous optimization problems. We have used a population size of up to 3000, up to 3000 dimensions, a partition size of up to 30, a Migration Size of up to 3, and a Migration Interval of up to 100000. In a few cases, S-GA outperformed SeqGA in terms of execution time for higher population, partitions, Migration Size, and Migration Interval. In the future, we plan to extend S-GA and evaluate different migration and distribution strategies for larger-scale and more complex optimization problems.