1 Introduction

With the rapid development of parallel systems, relying only on the speed of the processing elements is not satisfactory. The types and the ways of interconnecting these processing elements play a great role in their performance too. This stimulates researchers for proposing hybrid interconnection networks that use both electronic and optical interconnection topologies, which utilize optical and electronic links. These interconnection networks are known as optoelectronic architectures or swapped interconnection networks [1].

One of the well-known optoelectronic architecture that gained attention recently is optical transpose interconnection system (OTIS) [1], where the processors are connected to form groups of basic network, such as ring, mesh, hypercube, etc. The processors inside each group are connected using electronic links, while the groups themselves are connected to each other using optical links. This emerges subsequent types of OTIS networks such as OTIS- Hypercube, OTIS-Mesh [2, 3], OTIS-Mesh-of-Trees [4], and OTIS-Hyper Hexa-Cell (OHHC) [5]. These optoelectronic architectures stimulate the researchers to develop parallel algorithms for basic operations that could be mapped efficiently on them. Such as sorting, routing, data accumulation, prefix sum, consecutive sum and matrix multiplications, all of them were implemented on OTIS-Mesh [6,7,8]. Other operations such as load balancing were implemented on OTIS-Hypercube [9]. However, very little attention was paid in exploiting the optical attractive feature of OTIS optoelectronic architectures in solving NP-hard optimization problems, such as traveling salesman problem (TSP). Thus, TSP was chosen in this paper in order to evaluate the performance of OTIS optoelectronic architectures and investigate the usefulness of them.

TSP is an optimization problem that stands out among the most challenging problems in operational research and computer science [10]. The problem can be described as a group of cities, and a salesperson in a certain city has to visit all the remaining cities once and only once and return to the starting city with a minimum cost tour [10,11,12,13]. This problem can be solved using exact algorithms that evaluate all possible solutions to provide the optimal one for a small number of cities, but they become inefficient when the number of cities is large. Accordingly, numerous heuristic algorithms were proposed to solve this problem to get a suboptimal solution rather than the optimal one. One of those were the constructive methods that build a tour then stop when one solution is obtained, such as nearest neighbor heuristic algorithm [14, 15]. One variation of this algorithm is the repetitive nearest neighbor algorithm, where the minimum route is obtained by applying the nearest neighbor algorithm on each city as a starting city, then finding the minimum route among all the computed ones. However, it is considered inefficient in terms of time, since this involves iterative processes that exhaust the available resources, particularly when it is applied to a huge input size using a sequential machine. Thus, TSP should be solved by parallel algorithms that can perform the computations of large instances of cities in less time. Consequently, in order to solve TSP using the attractive features of optoelectronic architectures and to carry out a performance evaluation between OTIS-Hypercube and OTIS-Mesh, this paper presents a parallel repetitive nearest neighbor (PRNN) algorithm for solving TSP on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures. The algorithm is evaluated analytically under the following performance metrics: number of communication steps, parallel run time, speedup, efficiency, cost and communication cost on OTIS-Hypercube and OTIS-Mesh. Also, its performance was evaluated by simulation runs on both optoelectronic architectures. To the best of our knowledge, there is no work that has been investigated the performance of the OTIS optoelectronic architecture in solving TSP.

The organization of the paper is as follows: Sect. 2 introduces a background on OTIS optoelectronic architectures, namely OTIS-Hypercube and OTIS-Mesh. Section 3 illustrates the PRNN algorithm in solving TSP over OTIS-Hypercube and OTIS-Mesh. Section 4 presents an analytical evaluation of this algorithm on both optoelectronic architectures. Section 5 shows the simulation setup and results. Finally, Sect. 6 summarizes the overall work and presents some suggested future work.

2 OTIS optoelectronic architecture

OTIS optoelectronic architecture provides efficient connections with high scalability and lower complexity among the connected processors, in addition to utilizing the optical technology in sending and receiving data between the connected groups [1]. They are all (group)-to-all (group) network with P processors. This network consists of groups, where these groups are connected as a basic network: such as ring, mesh, hypercube, etc., or these groups can be hybrid networks such as Mesh-of-Trees, Hyper Hexa-Cell, etc. The communication links within each group are electronic and between groups are optical. It is called transpose (swapped) interconnection network because of its strategy in connecting the groups [1]. The connection between processors in different groups is done by transposing processor’s and group’s labels. For example, the third processor in the second group is connected by an optical link with the second processor in the third group and so on [1, 16]. Two OTIS optoelectronic architectures with basic factor networks were chosen, OTIS-Hypercube and OTIS-Mesh, to solve TSP using PRNN algorithm and to evaluate its performance in providing near-optimal solutions with less computational time. The following subsections illustrate the structure of OTIS-Hypercube and OTIS-Mesh.

2.1 OTIS-Hypercube

OTIS-Hypercube consists of \(n^{2}\) or P processors that are partitioned into n groups, and each group contains n or \(P_{G}\) processors interconnected as a hypercube of dimension d, which is equal to log n. For example, in 16-processor two-dimensional OTIS-Hypercube, there are four groups, and each group has four processors, as depicted in Fig. 1. The solid arrows represent electronic links among processors, and the dashed arrows represent optical links among groups. A label of any processor inside OTIS-Hypercube must represent the address of the group and the address of the processor within this group. For example, processor 2 in group 1 must have the label (01, 10). Formally, processor (ij) represents processor j in group i and this processor is connected to processor i in group j via optical link [2].

Fig. 1
figure 1

Two-dimensional OTIS-Hypercube [2]

2.2 OTIS-Mesh

OTIS-Mesh is another example of OTIS basic networks, which consists of \(n^{2}\) or P processors that are partitioned into n groups, and each group contains n or \(P_{G}\) processors interconnected as a two-dimensional \(\sqrt{n} \times \sqrt{n}\) mesh. The communication links within each group are electronic and between groups are optical. For example, in 81-processor OTIS-Mesh, there are 81 processors grouped into 9 groups, and each group has 9 processors, as shown in Fig. 2. Solid lines in Fig. 2, represent electronic links and dashed lines represent optical links, where processor (i, j) is connected to processor (ji). For simplicity, only the optical links for group zero were shown in Fig. 2. It is important to mention that for 16 processors in OTIS-Mesh is similar to 16 processors in OTIS-Hypercube [2], as depicted in Fig. 1.

Fig. 2
figure 2

OTIS-Mesh of nine groups and each group is \(3\times 3\) two-dimensional mesh

3 Solving TSP using PRNN algorithm

Tour construction algorithms can be identified by building a solution through a sequence of steps based on immediate advantageous. These steps keep evolving until a valid solution is obtained. All tour construction heuristics stop when a solution is found without any improvement to get better solutions. nearest neighbor (NN) is the simplest and well-known among TSP tour construction heuristics [14]. Obviously, its popularity resulted from its simplicity in implementation, in addition to its ability to generate a good solution in a polynomial time. The sequential version of NN algorithm starts at a random city and traverses to the nearest city in a greedy way, as shown in Table 1.

Table 1 Sequential nearest neighbor algorithm [14]

The time complexity of this algorithm is O(\(N^{2}\)), since for each city among N cities the algorithm will search other N cities to find which city is the closest. However, NN algorithm can generate an approximate solution above 25% of the Held Karp lower bound [15]. Generally, the quality of the solutions depends basically on the starting city. Therefore, another variation of nearest neighbor algorithm includes repetitive execution of the algorithm on each city as a starting city to get a better solution, but it requires intensive computations and time complexity of O(\(N^{3}\)). On the other hand, the architecture of OTIS interconnections enables us to adopt the repetitive nearest neighbor (RNN), where there are standalone processors connected together using electronic and optical links as a data medium for high-speed communication, and each processor has enough memory to perform different tasks in parallel. This adaptation will meet the purpose to obtain high-quality solutions in less time, in a simple way without compromising the solution quality and with less communication time.

In this section, a well-known TSP heuristic algorithm is selected to be parallelized over interconnections of interest. The parallel repetitive nearest neighbor (PRNN) algorithm along with its illustration followed by its analytical evaluation is presented first.

3.1 PRNN algorithm on OTIS-Hypercube

The PRNN algorithm in this paper is designed based on the attractive features of the selected OTIS optoelectronic architectures, such as the iterative structure among groups, and the optical links existence between these groups. The algorithm is composed of four phases, namely: load balancing phase, data distribution phase, local repetitive nearest neighbor phase and data combining phase. The main coordinator (MC), which is processor 00 in group \(G_{0}\), is responsible for the load balancing phase. This phase is required, since partitioning the cities among processors yields to uneven number of cities assigned to processors. This is due to the fact that, TSPLIB data set contains a number of cities that is not a power of two [17]. Consequently, we have introduced a load balancing algorithm, as illustrated in Fig. 3, and the PRNN algorithm on OTIS-Hypercube is illustrated in Fig. 4.

Fig. 3
figure 3

Load balancing algorithm

Fig. 4
figure 4

PRNN algorithm on OTIS-Hypercube

3.1.1 Phase 1: load balancing phase

The load balancing phase is achieved by the MC processor via applying the load balancing algorithm (Fig. 3), which guarantees only one extra city for the balanced processors. This clearly can be demonstrated through lines 1–13 in Fig. 3. Lines (1–2) calculate the number of cities that must be handled by each processor, and the number of extra cities that must be balanced between them. Lines (3–5) assign the calculated number of cities to each processor in the optoelectronic architecture. Obviously, as mentioned before, partitioning the total number of cities on the total number of processors will yield a remainder that serves as the number of extra cities. To alleviate this problem, lines (6–9) calculate the entire number of groups that must be balanced and the number of processors in the last group that must be balanced too. Lines (10–13) will assign the extra cities to the processors in the balanced groups, then to the processors inside the last group. The time complexity of this algorithm is \(\Theta (P)\), since the extra cities will not exceed the total number of processors P. At the end of this stage, MC processor generates the allocated cities array (ACA) which indicates the set of cities for each processor to apply the sequential repetitive nearest neighbor on; for example, processor P\(_{\mathrm {00\, }}\)in group 0 will apply the sequential repetitive nearest neighbor on cities 0 through 8, as depicted in Fig. 5. Recalling the rest of the phases of the PRNN algorithm: data distribution phase, local repetitive nearest neighbor phase and data combining phase, where they are illustrated in Fig. 4.

Fig. 5
figure 5

ACA created by MC processor

The PRNN algorithm on OTIS-Hypercube in Fig. 4 is illustrated in more details as in Sects. 3.1.23.1.4.

3.1.2 Phase 2: data distribution phase

The main group (MG) is group 0 in OTIS-Hypercube. It contains the MC processor, which is processor <0, 0>. Similarly, each group contains group coordinator (GC) processor, which is the processor that connects the group \(G_{i}\) with the main group via optical link. This processor has label \(j\), 0>. For example, GC processor with label <2, 0> is connected to processor 2 at group 0 (main group) via optical link. Now, assume that the distance matrix (DM) as an array of size \(n^{2}\), where n is the number of cities, is stored on MC. Then, the distribution phase is composed of three steps as follows:

I.:

Electronic main group distribution (lines 2–5 in Fig. 4): MC processor starts the process by balancing the total number of cities among the total number of processors based on the load balancing algorithm, as shown in Fig. 3. According to this algorithm, each processor will obtain a set of cities, where the sequential nearest neighbor algorithm is applied N/P times based on considering different city as a starting city each time. Since DM and ACA are located at MC processor, then it is responsible to send a copy of the DM and ACA to all other processors using one to all broadcast. Initially, MC processor has DM and ACA, and at the termination of this broadcasting there will be \(P_{G}\) copies of DM and ACA, each copy belongs to each processor in \(G_{{0}}\), where \(P_{G}\) is the number of processors in each group. The communication starts along the highest dimension which can be specified by the most significant bit (MSB) of the binary representation of the processor’s label and continues for each lower dimension, as shown in Fig. 6, through steps 1, 2, 3 and 4. This will take \(\log P_{G}\) electronic moves. Then, all the processors that receive DM and ACA in MG start the optical distribution of data as shown in the next step.

II.:

Optical distribution of data (lines 6–7 in Fig. 4): All processors in \(G_{{0}}\) (in parallel) start sending DM and ACA through the optical links to their corresponding processors in other groups. This will require 1 OTIS move. For example, in Fig. 6 (step 5), processor 9 in group \(G_{{0}}\) will send DM to processor 0 in group \(G_{{9}}\) via optical link.

III.:

Inter-group distribution of data (lines 8–9 in Fig. 4): each GC processor in each group in OTIS-Hypercube except \(G_{{0}}\) will repeat in parallel the task of MC processor that was done in electronic main group distribution, in which the GC processor will resend DM and ACA to each processor in its group using the hamming distance, as shown in Fig. 6, through steps 1, 2, 3 and 4. Again, this will take \(log~P_{G}\) electronic moves.

Finally, at the end of this phase, P copies of DM and ACA are distributed to all processors in all groups.

Fig. 6
figure 6

Distribution of distance matrix (DM) on OTIS-Hypercube

3.1.3 Phase 3: local repetitive nearest neighbor phase

  1. I.

    After finishing phase 2, all processors in OTIS-Hypercube will have the distance matrix and the allocated cities array. The allocated cities array determines for each processor the starting city in order to apply the sequential nearest neighbor on; for example, processor 2 in group 0 will apply the sequential repetitive nearest neighbor on cities 18 through 25, as depicted in Fig. 5. Such that P02 will apply the sequential nearest neighbor at the first time on city 18 and stores the route in its route matrix (RM), then applies the sequential nearest neighbor on city 19 and stores the route in its RM and so on. Therefore, in parallel, all processors in OTIS-Hypercube will apply the sequential nearest neighbor algorithm on each city in the set of cities that belongs to each processor. The algorithm will be applied many times based on considering different city as a starting city every time, resulting in different routes for each starting city stored in RM such that each processor will have its own RM. Thus, phase 3 is shown in Fig. 4, lines 10–11.

3.1.4 Phase 4: data combining phase

Data combining phase is done by overturning the order of steps in the distribution phase as follows:

I.:

Inter-group data combining (lines 12–14 in Fig. 4). The aim of this step is to combine all the RMs from all the processors in all groups to their associated GC processors via electronic links. Utilizing the hamming distance, a gather schema will be applied to combine RMs. This step will be performed in parallel to route the RM of each processor to all GCs processors in OTIS-Hypercube. During each communication step, each processor will receive the RM of its directly connected processor and will combine it with its own RM to be forward in the next communication step. The process continues in a similar way until each GC processor in each group has gathered the collected RMs in one array called group route matrix (GRM) which will be sent through the optical link. Note that the size of the communicated message will be enlarged based on the size of combined matrices.

II.:

Optical data combining (lines 15–16 in Fig. 4). All GCs processors in whole OTIS-Hypercube, except GC processor in \(G_{0}\), will send their GRMs via optical links to their corresponding processors in the main group \(G_{0}\).

III.:

Main group data combining (lines 17–18 in Fig. 4). Each processor will combine the collected GRM with its own RM and sends it to its neighbor that differs in dth bit position along the lower dimensions in \(log~P_{G}\) steps. At the end of this step, MC processor collects the GRMs and combines it with its own, in order to find the minimum route among all the collected routes.

3.2 PRNN algorithm on OTIS-Mesh

As illustrated previously PRNN algorithm is composed of four phases. The implementation of these phases over OTIS-Mesh is demonstrated in Fig. 7. Both load balancing and local repetitive nearest neighbor phases are identical in OTIS-Mesh and OTIS-Hypercube, whereas distribution and combining phases differ from one interconnection to another based on the topological structure of these optoelectronic architectures. Thus, only the distribution and combining phases will be illustrated in more details in this section.

Fig. 7
figure 7

PRNN algorithm on OTIS-Mesh

3.2.1 Phase 2: data distribution phase

The distribution phase is composed of three steps as follows:

I.:

Electronic main group distribution (lines 2–6 in Fig. 7): MC processor must send a copy of the DM and ACA to all other processors using one to all broadcast. The communication will be accomplished in two phases. The first phase is row-wise phase, where MC processor will perform one to all broadcast to other \({\sqrt{P_{G}}} -1\) processors in the same row. The second phase is column-wise phase, where each processor received DM will start one to all broadcast to all processors in its corresponding column through electronic links. At the end of this phase, \(P_{G}\) copies of DM and ACA will be sent to \(P_{G}\) processors in \(G_{{0}}\), and this requires \(2{\sqrt{P_{G}}}\)-1 communication steps. Then, all the processors that received DM and ACA in MG start the optical distribution of data as shown in the next step.

II.:

Optical distribution of data (lines 7–8 in Fig. 7): All processors in \(G_{{0}}\) (in parallel) start sending DM and ACA through the optical links to their corresponding processors in other groups. This will require one OTIS move.

III.:

Inter-group distribution of data (lines 9–10 in Fig. 7): Each GC processor in each group in OTIS-Mesh except \(G_{{0}}\) will repeat in parallel the task of MC processor that was done in electronic main group distribution, in which the GC processor will resend DM and ACA to each processor in its group through row-wise and column-wise phases, as have been done in electronic main group distribution.

3.2.2 Phase 4: data combining phase

Data combining phase is done by overturning the order of steps in the distribution phase as follows:

I.:

Inter-group data combining (lines 13–16 in Fig. 7). A gather schema will be applied to combine RMs. This step will be performed in parallel to route the RM of each processor to the GCs processors in the OTIS-Mesh. In column-wise phase, each processor in row number \({\sqrt{P_{G}}}-1\) will perform a gather operation to send its own RM to its directly connected processor in the same column. In the next communication step, each processor in each column combines its own RM with the received one and forwards the concatenated RMs. This process continues, until all processors in the same row with GC processor received the RMs. In row-wise phase, each processor in the same row with the GC processor will combine the received RMs and forward them to the next processor in the same row. Now, GC processor will combine the collected RMs in one array called group route matrix (GRM) that will be sent through the optical links. Note that the size of the communicated message will be enlarged based on the size of the combined matrices.

II.:

Optical data combining (lines 17–18 in Fig. 7). All GCs processors in OTIS-Mesh, except GC processor in \(G_{{0}}\), will send their GRM via optical links to their corresponding processors in the main group \(G_{{0}}\).

III.:

Main group data combining (lines 19–20 in Fig. 7). Each processor will combine the collected GRM with its own RM and sends it to its neighbor column-wise and then row-wise as illustrated in inter-group data combining phase. At the end of this step, MC processor collects the whole GRM and combines it with its own, in order to find the minimum route among all the collected routes.

4 Analytical evaluation

This section provides the analytical evaluation of the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures. The performance metrics used to evaluate the algorithm are parallel time complexity, speedup, efficiency, cost and communication cost.

4.1 Analytical evaluation of PRNN algorithm on OTIS-Hypercube

In this section, the PRNN algorithm on OTIS-Hypercube is evaluated analytically in terms of parallel time complexity, speedup, efficiency, cost and communication cost.

4.1.1 Parallel time complexity

The parallel run time equals to the total communication time, plus the total computation time. The communication time can be measured as a number of communication steps that is required in both the distribution and combining phases. The time required in applying the sequential nearest neighbor algorithm on a set of cities represents the computation time. Thus, the time complexity of PRNN algorithm is captured in Theorem 1.

Theorem 1

The average-case time complexity of PRNN algorithm on OTIS-Hypercube is shown in Eq. (1), where T is the time complexity, N is the number of cities, P is the number of processors and d is the dimension of the hypercube.

$$\begin{aligned} T (N, p) = \Theta \left( P + \frac{N^{{3}}}{P}+N^{2}\times d\right) \end{aligned}$$
(1)

Proof

The analytical evaluation of the parallel run time complexity for all phases of PRNN algorithm on OTIS-Hypercube is demonstrated by tracing the algorithm in Fig. 4, as shown in Table 2.

Table 2 Run time complexity

The overall parallel run time complexity of phases 1–4 is shown in Eq. (2).

$$\begin{aligned} T(N, p) = \Theta \left( P \right) +\, {\Theta }\left( N^{2}\times \, d \right) +\, {\Theta }\left( \frac{N}{P}\times N^{2} \right) +\Theta \left( N^{2}-\frac{N^{2}}{P}\right) \end{aligned}$$
(2)

Equation (2) can be reduced to Eq. (3).

$$\begin{aligned} \textit{T(N, p)} \approx \Theta \left( P+ \frac{N^{{3}}}{P}+N^{2}\times d\right) \end{aligned}$$
(3)

4.1.2 Speedup

The speedup (S) is represented as the ratio between the execution time required to solve a given problem sequentially on a single processor over the execution time required to solve the same problem on parallel machine, that is, \(S = T_{S} / T_{P}\) [18], where \(T_{S}\) is the time required by the sequential algorithm and \(T_{P}\) is the time required by the parallel algorithm. The sequential version of repetitive nearest neighbor requires \(N^{3}\) time complexity, and PRNN algorithm requires time complexity which is illustrated in Eq. (3). Therefore, the speedup of the PRNN algorithm on OTIS-Hypercube is shown as in Eq. (4).

$$\begin{aligned} S= \frac{N^{3}\times P}{P^{2}+N^{3}+N^{2}\times d\times P} \end{aligned}$$
(4)

4.1.3 Efficiency

Efficiency can serve as a performance metric to measure how much the processors being utilized [18] in the optoelectronic architecture. It is the ratio between speedup and the number of processors. Therefore, the efficiency of the PRNN algorithm on OTIS-Hypercube is shown in Eq. (5).

$$\begin{aligned} E=\frac{N^{3}}{P^{2}+N^{3}+N^{2}\times d\times P} \end{aligned}$$
(5)

4.1.4 Cost

The cost of solving any problem on a parallel machine can be defined as the total time required by all the processors in the parallel machine to solve the problem. It can be calculated by multiplying the number of processors with the parallel time, where \(cost=P \times T_{P}\) [18]. \(T_{P}\) is presented in Eq. (3), and P is the number of processors in the optoelectronic architecture. Therefore, the cost of the PRNN algorithm on OTIS-Hypercube is shown in Eq. (6).

$$\begin{aligned} Cost=P^{2}+N^{3}+N^{2}\times d\times P \end{aligned}$$
(6)

4.1.5 Communication cost

Communication cost presents the total number of parallel network links which are utilized during solving any problem. In our case, it is the total number of electronic links and optical links used by PRNN algorithm on OTIS-Hypercube. The communication cost of PRRN algorithm depends on the proposed communication pattern among both the distribution and combining phases. As depicted in Table 3, electronic main group distribution in OTIS-Hypercube requires \(P_{G}-1\) electronic links, due to the one to all broadcast communication pattern. For instance, in the case of 64 processors in OTIS-Hypercube, the one to all broadcast operation is performed among three steps: in the first step, the number of utilized electronic links is 1. In the second step, the number of utilized links is 2, since two processors will send the received distance matrix in parallel. In the third step, the number of utilized links is 4, since 4 processors will send the received distance matrix in parallel, so the total number of the utilized electronic links in this phase is 7. The optical distribution of data requires \(P_{G}-1\) optical links, which is equal to the number of groups in OTIS-Hypercube minus one. The reason of subtracting one from the number of groups is related to the topological structure of the OTIS networks, where all processors in group 0 except processor 0 are connected via optical links to their transposed processors in other groups. Now, in inter-group distribution of data, the number of the utilized electronic links is equal to \({P_{G}^{2}-2P}_{G}+1\), which is obtained by multiplying the number of groups minus one and the number of electronic links that is utilized during the distribution inside one group. Note that, the communication cost for the combining phase is like the communication cost for the distribution phase but in reverse order. Therefore, the total communication cost of applying PRNN algorithm on OTIS-Hypercube is illustrated in Eq. (7). Consequently, the communication cost of OTIS-Hypercube depends directly on the number of processors in each group.

$$\begin{aligned} {Comm}_{Cost}={2(P}_{G}^{2}-1) \end{aligned}$$
(7)
Table 3 Communication cost of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh

4.2 Analytical evaluation of PRNN algorithm on OTIS-Mesh

In this section, the PRNN algorithm on OTIS-Mesh is evaluated analytically in terms of parallel time complexity, speedup and efficiency.

4.2.1 Parallel time complexity

The parallel time complexity of PRNN algorithm on OTIS-Mesh is captured in Theorem 2.

Theorem 2

The average-case time complexity of PRNN algorithm on OTIS-Mesh is shown in Eq. (8), where T is the time complexity, N is the number of cities, P is the number of processors and \(P_{G}\) is the number of processors in each group.

$$\begin{aligned} T(N, p) = \Theta \left( P+ \frac{N^{{3}}}{P}+N^{2}\times \sqrt{P_{G}}\right) \end{aligned}$$
(8)

Proof

The analytical evaluation of the parallel run time complexity for all phases of PRNN algorithm on OTIS-Mesh is demonstrated by tracing the algorithm in Fig. 7, as shown in Table 4. \(\square \)

Table 4 Run time complexity

The overall parallel run time complexity of phases 1–4 is shown in Eq. (9).

$$\begin{aligned} T(N, p) = \Theta \left( P \right) + {\Theta }(N^{2}\times \sqrt{P_{G}} {)+\, \Theta }\left( \frac{N}{P}{\times }N^{{2}} \right) + {\Theta } (N^{{2}}\times \sqrt{P_{G}} ) \end{aligned}$$
(9)

Equation (9) can be reduced to Eq. (10).

$$\begin{aligned} T(N, p) \approx { \Theta (}P{+ }\frac{N^{{3}}}{P}+N^{2}\times \sqrt{P_{G}}) \end{aligned}$$
(10)

4.2.2 Speedup

The speedup (S) is equal to \(T_{S} \,/\, T_{P}\), where \(T_{S}\) is the time required by the sequential algorithm to perform any task on sequential machine and \(T_{P}\) is the time required by the parallel algorithm to perform the same task on a parallel machine. The sequential version of repetitive nearest neighbor requires \(N^{3}\) time complexity, and PRNN algorithm requires time complexity which is illustrated in Eq. (10). Thus, the speedup of the PRNN algorithm on OTIS-Mesh is shown in Eq. (11).

$$\begin{aligned} S = \frac{N^{3}{\times }P}{P^{2}+N^{3}{+}N^{2}\times P\times \sqrt{P_{G}} } \end{aligned}$$
(11)

4.2.3 Efficiency

The efficiency (E) performance metric can be used to measure how much the processors being utilized [18] in the optoelectronic architecture. It is the ratio between speedup and the number of processors. Therefore, the efficiency of the PRNN algorithm on OTIS-Mesh is shown in Eq. (12).

$$\begin{aligned} E = \frac{N^{3}}{P^{2}+N^{3}{+}N^{2}\times P\times \sqrt{P_{G}} } \end{aligned}$$
(12)

4.2.4 Cost

The cost of solving any problem on any parallel machine can be calculated by multiplying the number of processors in this parallel machine with the parallel time needed to solve the problem, where \(cost=P \times T_{P}\) [18]. \(T_{P}\) is presented in Eq. (10), and P is the number of processors in the optoelectronic architecture. Therefore, the cost of the PRNN algorithm on OTIS-Mesh is shown in Eq. (13).

$$\begin{aligned} {\text {Cost}}=P^{2}+N^{3}{+}N^{2}\times \sqrt{P_{G}} \times P \end{aligned}$$
(13)

4.2.5 Communication cost

As depicted in Table 3, electronic main group distribution in OTIS-Mesh requires \(\left( \sqrt{P_{G}} -1 \right) +(P_{G}-\sqrt{P_{G}} )\) electronic links, because of the one to all broadcast communication pattern. For example, in the case of 256 processors in OTIS-Mesh, the one to all broadcast is performed among two phases, the row-wise phase and the column-wise phase. In row-wise phase, three steps are required to distribute the distance matrix to all processors within the same row of MC processor, and this requires \(\sqrt{P_{G}} -1\) electronic links. While in the column-wise phase, every processor received the distance matrix will send it to all processors within the same column, in parallel, that is, 12 electronic links will be utilized during this phase. This can be calculated by multiplying the number of processors in one row inside one group, by the number of processors in one row inside one group minus one. The optical distribution of data requires \(P_{G}-1\) optical links, which is equal to the number of groups in OTIS-Mesh minus one.

Now, in inter-group distribution of data, the number of the utilized electronic links is equal to \({P_{G}^{2}-2P}_{G}+1\), that is obtained by multiplying the number of groups minus one and the number of electronic links that is utilized during the distribution inside one group. Therefore, the total communication cost of applying PRNN algorithm on OTIS-Mesh is illustrated in Eq. (14).

$$\begin{aligned} {\text {Comm}}_\mathrm{Cost}=2(P_{G}^{2}-1) \end{aligned}$$
(14)

Table 5 summarizes the parallel run time complexity, speedup, efficiency, cost and communication cost of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh, where \(T_{Distribution}\) is the time needed for distribution phase as a number of communication steps, \(T_\mathrm{Combinng}\) is the time needed for combining phase and \(T_\mathrm{Parallel}\) is the total parallel run time. As shown in this table, after simplifying the equation of the parallel run time of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh, we notice that the difference between the equations of parallel run time, speedup, and efficiency of OTIS-Hypercube and OTIS-Mesh is the term d in case of OTIS-Hypercube and the term \(\sqrt{P_{G}}\) in case of OTIS-Mesh, where d is the diameter of the Hypercube and \(2\sqrt{P_{G}}\) is the diameter of the Mesh. Note that, d \({\ll }<<\)2\(\sqrt{P_{G}}\) ; therefore, OTIS-Hypercube results are superior relative to OTIS-Mesh.

Table 5 Parallel run time complexity, speedup, efficiency, cost and communication cost of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh

5 Simulation results

A simulation was developed in order to evaluate the performance of PRNN algorithm on both OTIS-Hypercube and OTIS-Mesh optoelectronic architectures. In this section, we present the simulation results obtained from the implementation of the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh optoelectronic architectures.

5.1 Simulation setup

The simulation environment has been set up using Java jdk8 under the Eclipse environment. All simulation runs were conducted on Intel (R) Core (TM) i7, 3.2 GHz Processor with 16 GB RAM, and 4 MB cache memory and windows 8.1 64-bit as an operating system. To conduct the simulation, we used a startup time equals to \(55\,\upmu s\) [19], speed of electronic links equals to 250 Mb/s and speed of optical links equal to 2.5 Gb/s [20]. The simulation measures computation time, communication time, speedup and efficiency. Several runs were conducted, where each run was repeated ten times, and the average was considered.

The implementation of our simulation has the following classes:

  • The distance matrix class is responsible for reading TSPLIB instances from the .tsp files and converts them to a distance matrix that stored the distances from each city to another.

  • The repetitive nearest neighbor class is responsible for performing the sequential nearest neighbor    algorithm on each city in the distance matrix.

  • The load balancing class is responsible for balancing N cities among P processors, where the balanced cities will be stored in the allocated cities array to be distributed to all processors.

  • The OTIS-Hypercube class contains objects of the adopted network size ranges of OTIS-Hypercube, where the simulation starts by specifying the optoelectronic architecture followed by choosing the network size to be one of the classes A, B, C or D.

  • The node class represents each processor and its data, such as its corresponding set of cities, the time required by each processor to find the nearest neighbor, the route matrix to store the generated routes and the cost matrix to store the cost of each route.

  • The OTIS-Mesh class contains objects of the adopted network size ranges of OTIS-Mesh.

The simulation starts by determining the desired optoelectronic architecture and the dimension chosen by the user. Thus, the number of groups, the number of processors inside each group and the type of the communication links optical or electronic links will be determined.

Table 6 shows the range of sizes for each optoelectronic architecture, which varies from 16 to 1296 processors. For simplicity, we named each range with class as illustrated in Table 6. These ranges were specified to obtain the desired size for each optoelectronic architecture such that a proper comparison can be accomplished between them. Each row in Table 6 represents a size range, where the values were chosen in a way that minimize the gap of sizes in each range between optoelectronic architectures.

Table 6 Optoelectronic architectures size ranges

To conduct the simulation runs, the traveling salesman problem library (TSPLIB) [17] was used as test set, which enriched the TSP with a great number of sample benchmarks of different TSP types and different formats. Moreover, it contains the current best known solutions for each data instance. Table 7 shows the chosen TSP data instances from both symmetric and VLSI data sets for these simulation runs. The size of these data instances is varying from 1304 cities to 19,289 cities to test the PRNN algorithm on small, medium and large number of cities.

Table 7 TSPLIB data instances [17]

5.2 Comparative evaluation

In this section, we compare and present detailed discussion of the simulation results of the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh with different granularities ranges from 16 to 1296 processors. The simulation runs were tested using TSPLIB data instances, which are shown in Table 7. Figures from 8–16 demonstrate the performance evaluation metric results obtained from the simulation runs.

Computation time is the time required by each standalone processor to carry out the task. In our algorithm, it is the time in seconds to find the nearest neighbor route for each city among N/P cities, as it should be obvious in Fig. 8. The computation time decreases as the number of processors increases since the term N/P will decrease as the number of processors increases. The impact of the number of processors turns out to be more observable in this figure. Thus, the main contribution is in the computation time.

PRNN algorithm on OTIS-Hypercube and OTIS-Mesh has the same computation time in class A, since both optoelectronic architectures have the same number of processors.

Fig. 8
figure 8

Computation time of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 instance

Thus, PRNN algorithm on OTIS-Mesh recorded better computational time in class B than PRNN algorithm on OTIS-Hypercube, since in this class OTIS-Hypercube has less number of processors, which is equal to 64, whereas OTIS-Mesh has 81 processors. In class C, PRNN algorithm on OTIS-Hypercube and OTIS-Mesh recorded the same computational time, since each has 256 processors involved in the computation. Correspondingly, PRNN algorithm on OTIS-Mesh provides slightly better computational time in class D than PRNN algorithm on OTIS-Hypercube, where PRNN algorithm on OTIS-Mesh required 9.8 s, while PRNN algorithm on OTIS-Hypercube required 11.1 s. Consequently, PRNN algorithm on OTIS-Hypercube gave the worst computational time compared with PRNN algorithm on OTIS-Mesh, since it has less or an equal number of processors in each class.

Table 8 Distribution, combining and total communication time for PRNN algorithm on OTIS-Hypercube and OTIS-Mesh with different size ranges for frh19289 TSP instance

Table 8 demonstrates the distribution time, combining time and total communication time for the PRNN algorithm on the two optoelectronic architectures with the given size ranges for frh19289 TSP instance. As distribution phase relies on distributing the distance matrix with fixed size equal to \(N^{2}\) among all processors in all groups, the diameter of the optoelectronic architectures played an exclusive role in the number of communication steps for this distribution. Therefore, the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh in class A have the same distribution time since they have the same number of communication steps, which is equal to 5, as shown in Table 9. Note that, PRNN algorithm on OTIS-Hypercube was superior in class B, since 7 communication steps were sufficient for the whole distribution. Conversely, PRNN algorithm on OTIS-Mesh provides higher distribution time with 9 communication steps for the distribution. Although OTIS-Mesh has higher number of processors in class B, but it recorded the worst distribution time, and this proved the argument that the number of processors do not contribute in the distribution phase. The PRNN algorithm on OTIS-Hypercube outperforms PRNN algorithm on OTIS-Mesh in class C. In class D, the PRNN algorithm on OTIS-Mesh recorded the worst distribution time with 21 communication step, while the PRNN algorithm on OTIS-Hypercube recorded the best distribution time with 11 communication steps. It is important to emphasize that only the diameter influenced the distribution results here, since the message size was fixed with this one to all broadcast communication. The optoelectronic architecture with low diameter requires a smaller number of communication steps than the one with high diameter. Therefore, the PRNN algorithm on OTIS-Hypercube was predominant in this phase. Interesting observations were seen in this phase, PRNN algorithm on OTIS-Hypercube with 256 processors (Class C) can simulate the communication steps of PRNN algorithm on OTIS-Mesh in class B, because 256 processors of OTIS-Hypercube have the same communication steps as OTIS-Mesh with 81 processors. Accordingly, we can gain higher number of processors and smaller number of communication steps with OTIS-Hypercube in comparison with OTIS-Mesh.

Table 9 Electronic and optical (OTIS) moves for OTIS-Hypercube and OTIS-Mesh

The previous discussion was about the impact of each optoelectronic architecture topological structure on the distribution time. Now, we focus on the discussion about the combining time and its impact factors. As a starting point, it is worth to mention that the communicated message size varies during our combining phase, where each processor in each group starts the gathering phase with different messages of \(N\times N/P\) size, then along the process the message size will be enlarged, since each processor in each communication step will concatenate the received data with its own particular message and resend it in the next communication step. Subsequently, each GC processor will send message with \(N\times N/P\times P_{G}\) size to its transpose processor in the main group, where \(P_{G}\) is the number of processors inside one group. Each processor inside the main group will start the main group combining phase with different message of size \(N\times N/P + N\times N/P\times P_{G}\), and this process proceeds until the MC processor received the entire data with \(N\times N/P\times \) (\(P-1\)) size. Clearly, there are three factors influencing the combining phase, which are the number of processors, communicated message size \(N\times N/P\) and the number of communication steps, which depends on the diameter of the optoelectronic architecture.

The combining time in Table 8 exposes the distinctions and the similarities between OTIS-Hypercube and OTIS-Mesh. In this table, the combining time of PRNN algorithm on OTIS-Hypercube decreases as the number of processors increases in each dimension. As a rule of thumb, when the number of processors increases the communicated message size decreases, the combining time will decrease too. In addition, the amount of augmentation on the number of communication steps in these optoelectronic architectures is a constant factor of two. So, as outlined previously, the combining time depends on the number of processors. Thus, this increment diminishes this effect and will not have the significant impact when the number of processors increases in each dimension. Therefore, the dominant factor in OTIS-Hypercube will relate to the increment of the number of processors. The combining time of PRNN algorithm on OTIS-Mesh increases with the increment on the number of processors; this is due to the large difference between the number of communication steps from one dimension to another in this optoelectronic architecture. So, with the augmentation in the number of processors, this difference will influence the behavior of the combining time. In general, the diameter becomes worse as OTIS-Mesh size increases and this leaves a substantial impact on the number of communication steps in this optoelectronic architecture.

Figure 9 clarifies the total communication time for PRNN algorithm on OTIS-Hypercube and OTIS-Mesh, where PRNN algorithm on OTIS-Hypercube recorded the best communication time and PRNN algorithm on OTIS-Mesh recorded the worst communication time. An interesting observation of PRNN algorithm on OTIS-Hypercube with 1024 processors in class D, where it can achieve near the communication time of PRNN algorithm on OTIS-Mesh with 81 processors in class B. Thus, we can gain less communication time and higher number of processors on OTIS-Hypercube in comparison with OTIS-Mesh.

Fig. 9
figure 9

Communication time of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 TSP instance

Fig. 10
figure 10

Parallel time of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 TSP instance

Parallel time is the time for the whole parallel process it elapses from starting of the distribution process till finishing the final computation by the MC processor. Figure 10 reflects the results regarding the parallel time, which was calculated based on the summation of distribution time, computation time, communication time and the time required by MC processor to find optimal route. It is obvious from this figure, PRNN algorithm on OTIS-Hypercube recorded the smallest parallel time and PRNN algorithm on OTIS-Mesh recorded the largest parallel time. A careful look at this figure shows in class A both optoelectronic architectures have similar parallel time since both have the same structure and the same number of processors (16 processors), as depicted in Fig. 1. Likewise, they recorded the same parallel time in class B, since the loss in communication time compensated the increment in the computation time as illustrated in the previous discussion. In classes C and D, both manifested the differences between the optoelectronic architectures, where PRNN algorithm on OTIS-Hypercube acquired less parallel time than PRNN algorithm on OTIS-Mesh.

Speedup results were illustrated in Fig. 11. Generally, this figure shows the growing of speedup as the number of processors increases, this growth can almost approach the number of processors. In particular, in classes A and B, speedup approaches to 14.3 and 47.4, respectively, for PRNN algorithm on OTIS-Hypercube. This is due to the large ratio between the required time for the computation operation over the required time for the communication operation. However, this ratio gets smaller in classes C and D, owing to the increment on the number of processors while the data size remains small relative to these processors, so that the computation time becomes small relative to the communication time. This result will definitely be better with larger data instances.

Fig. 11
figure 11

Speedup of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 TSP instance

In the same way, a careful look at PRNN algorithm on OTIS-Mesh reveals a slight degradation, particularly in class D, owing to the fact that class D for OTIS-Mesh contains a large number of processors, which is equal to 1296 processor. Thus, the computation time for each processor in class D becomes smaller and simultaneously the communication time becomes large proportional to the communication time in class C; this explains why this degradation occurred.

The previous discussion was about the whole figure which showed in general the influence of the number of processors on the speedup. Now, let’s focus our discussion on each class separately. Beginning with class A, you can observe that the speedup in class A for PRNN algorithm on OTIS-Hypercube and OTIS-Mesh is the same, since they have same topological structure, such as number of processors, number of groups, the way these processors are connected in the groups and the diameter. Therefore, they can achieve the same speedup. Intuitively, classes C and D clarified the difference between the optoelectronic architectures in speedup, as we can see, PRNN algorithm on OTIS-Hypercube outperforms PRNN algorithm on OTIS-Mesh in speedup, due to its factor network (hypercube) which fulfilled less communication steps than mesh, because of its low diameter.

Figure 12 depicts the efficiency of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh. An intuitive result is the decreasing of efficiency as the number of processors increases, since efficiency is defined as the ratio between speedup and the number of processors. So, when the number of processors increases, the ratio will decrease. A careful examination of class A in this figure denotes the achieved excellent efficiency, which approaches to 0.9 for PRNN algorithm on OTIS-Hypercube and OTIS-Mesh. This equivalence of efficiency between these optoelectronic architectures considers the mentioned reasons of the similarity between the topological structures of these two optoelectronic architectures. In general, the PRNN algorithm on OTIS-Hypercube outperforms the PRNN algorithm on OTIS-Mesh in regard to the efficiency for classes B, C and D.

Fig. 12
figure 12

Efficiency of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 TSP instance

Fig. 13
figure 13

Speedup of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for class D of various TSP instances

Figure 13 demonstrates the speedup of eighteen different TSP instances tested on OTIS-Hypercube and OTIS-Mesh in class D. They are rl1304, dca1389, u1817, djb2036, xqc2175, pcb3038, xqe3891, bgb4355, rl5934, xsc6880, bnd7168, ida8197, dga9698, xmc10150, rl11849, xvb13584, d15112 and frh19289 TSP instances [17]. The name of the TSP instance indicates the number of cities, for example the data instance xsc6880 has 6880 cities, as shown in Table 7. Figure 13 shows increasing of speedup when increasing of TSP instance size for both optoelectronic architectures. It is important to mention that we chose this size range despite the low speedup it provided compared to other size ranges as justified before, because it can show clearly the differences between OTIS-Hypercube and OTIS-Mesh in regard to speedup. As shown in the figure, PRNN algorithm on OTIS-Hypercube gained the best speedup among all data instances.

Fig. 14
figure 14

Speedup of PRNN algorithm on OTIS-Hypercube for reduce and gather operations

Figure 14 shows the results of applying a reduce operation rather than a gather operation on OTIS-Hypercube. As it is obvious from the figure, an improvement on the speedup was obtained using a reduce operation, since each processor only needs to forward the minimum cost route instead of gathering several routes received from other processors. This improvement is about 9% in case of 1024 processors. However, there are two reasons for adopting a gather operation rather than a reduce operation. The first reason is that if the main coordinator processor wants to improve the suboptimal solutions that were provided by the nearest neighbor algorithm, then it needs to have the generated routes, and since it is not necessary that the minimum route generated by the nearest neighbor algorithm will give the best improvement; that is, if it is improved by any TSP improvement algorithm, then the MC processor needs to have all the generated routes by the nearest neighbor algorithm through a gather operation. The second reason is that, if a reduce operation replaces a gather operation, then the behavior of the combining phase will be similar to the distribution phase in which it will be difficult to explain the different factors that affect the performance of combining phase. Thus, the optoelectronic architecture’s performance will be assessed based only on the diameter of the basic network. For example, the performance of OTIS-Hypercube will be assessed based only on the diameter of the Hypercube. So, a performance evaluation between the selected optoelectronic architectures will not be performed as done in the discussion of the results.

Fig. 15
figure 15

Cost of PRNN algorithm on OTIS-Hypercube and OTIS-Mesh for frh19289 TSP instance

Figure 15 illustrates the cost of the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh, where the cost represents the total time required by all the processors in OTIS-Hypercube to apply PRNN algorithm. As it can be seen from the figure, the cost increases as the number of processors increases in the optoelectronic architecture. OTIS-Hypercube recorded less cost among all classes of size ranges, particularly in class D with less than half the cost of OTIS-Mesh. This can be explained based on the definition of the cost, which can be represented as the parallel time multiplied by the number of processors. And as illustrated previously in Fig. 10, the parallel run time for OTIS-Hypercube was less than the parallel run time for OTIS-Mesh. Therefore, multiplying less parallel time in the case of OTIS-Hypercube with less number of processors equals to 1024 processors compared to 1296 processors in the case of OTIS-Mesh, will yield this result.

Table 10 Communication cost for PRNN algorithm on OTIS-Hypercube and OTIS-Mesh
Table 11 Tour quality solutions; \(\Delta \) stands for percentage deviation from the optimal solution

Table 10 demonstrates the behavior of the communication cost as the number of processors increases. It is obvious from the table that, as the network size increases the communication cost will increase too, and this is trivial, since the increment in the number of processors will cause increment in the cost of the optoelectronic architecture, represented by both the electronic links and optical links. This table shows clearly the superior performance regarding the communication cost of OTIS-Hypercube compared to OTIS-Mesh; this can be noticed in classes B (64–81 processors) and D (1024–1296 processors) of the selected size ranges. This can relate to the fact that OTIS-Hypercube requires less communication steps and hence less communication time. Therefore, OTIS-Hypercube utilizes less number of electronic and optical links during the communication phases compared to OTIS-Mesh which recorded higher communication time.

Solution quality of a heuristic algorithm is a major concern since it determines how close the produced solution to the optimal one. Consequently, since the optimal solution of the chosen TSP instances is known, then we are able to measure the solutions quality based on them using the percentage deviation [21] as illustrated in Eq. (15), where \(S_{q}, F_{S}, O_{S}\) denote Solution quality, Found Solution and Optimal Solution, respectively.

$$\begin{aligned} S_{q}=\frac{F_{S}- O_{S}}{O_{S}} \times 100\% \end{aligned}$$
(15)

The PRNN algorithm recorded better solutions than the sequential nearest neighbor algorithm. The starting city in the nearest neighbor algorithm can play an important role in the solution quality. So, we decided to apply the algorithm N times, where N is the number of cities and each time with different starting city, then choose the route with the minimum distance. This helped to obtain better results than applying the sequential nearest neighbor algorithm on a random starting city. Thus, in Table 11, you can notice that the algorithm gave better solutions, for example, the percentage deviation of the best solutions gave an average of 21.2% approximate solution within the optimal solution. The percentage deviation of the average solutions gave an average of 25% approximate solution within the optimal solution. Applying any random initial city such as the case of the sequential nearest neighbor gave in the worst-case an average of 29.1% approximate solutions within the optimal solution. With respect to the solution quality, the solution quality of the majority of heuristic algorithms becomes poor as increasing of the data size [22]. This can be clearly seen in the first three rows of this table compared to the rest of it, where the best percentage deviation gave an average of 15.4% approximate solution within the optimal solution. Moreover, Fig. 16 shows this fact, by illustrating how the solution quality decreases as data size increases. Note that in this figure, the best solution quality is represented as how far it is from the optimal solution. On the other hand, it is important to mention that the solution quality was not influenced neither by the type of optoelectronic architecture nor by the number of processors, but depends upon the data size; therefore, the obtained solutions were similar on both optoelectronic architectures for all size ranges.

Fig. 16
figure 16

Solution quality degradation as data size increases

6 Conclusions and future work

In summary, this paper introduced and evaluated PRNN algorithm for solving symmetric TSP on two selected OTIS optoelectronic architectures namely: OTIS-Hypercube and OTIS-Mesh. We discussed in detail four phases of the algorithm, these phases are load balancing phase, data distribution phase, sequential nearest neighbor algorithm phase and data combining phase. Each phase was evaluated analytically and was carried out by simulation on each optoelectronic architecture. The conducted runs examined the algorithms over different data instances from TSPLIB with various sizes. For comparison purposes, we suggested four classes of size ranges such that a performance evaluation of these optoelectronic architectures can be established. A comparative evaluation is presented as a detailed discussion of the simulation results of the PRNN algorithm on OTIS-Hypercube and OTIS-Mesh with different granularities ranges from 16 to 1296 processors. We evaluated the performance of the PRNN algorithm over OTIS-Hypercube and OTIS-Mesh in terms of the number of communication steps, parallel run time, speedup, efficiency, cost and communication cost. In general, the number of processors in the optoelectronic architecture has a great effect on the computation time of the PRNN algorithm. For example, PRNN algorithm on OTIS-Mesh recorded better computation time than on OTIS-Hypercube, since OTIS-Mesh has equal or higher number of processors in each class of the network size ranges. Regarding the communication time, particularly the distribution phase, the diameter of the optoelectronic architecture played the exclusive role, where in the case of OTIS-Mesh, the diameter becomes worse as OTIS-Mesh size increases and this leave a substantial impact on the number of communication steps in this optoelectronic architecture, compared to OTIS-Hypercube. On the other hand, the combining phase was affected by different factors, such as the diameter of the factor network that exists in each group, the number of communication steps, the communicated message size and the number of processors in each optoelectronic architecture.

The analytical evaluation results show that the parallel run time of PRNN algorithm over OTIS-Hypercube equals to \({\Theta }\left( P{+ }\frac{N^{{3}}}{P}+N^{2}\times d \right) \), and over OTIS-Mesh equals to \({\Theta (}P{+ }\frac{N^{{3}}}{P}+N^{2}\times \sqrt{P_{G}} )\), which shows that the parallel run time of PRNN algorithm over OTIS-Hypercube is better than over OTIS-Mesh.

The simulation results achieved high speedup among the two optoelectronic architectures of interest. It was clear from the simulation results that PRNN algorithm on OTIS-Hypercube gained the best results in terms of communication time, parallel time, speedup, efficiency, cost and communication cost, in comparison with PRNN on OTIS-Mesh. For instance, the speedup is recorded 90.9 by OTIS-Hypercube in class D, while by OTIS-Mesh the speedup is recorded 45.7 in the same class. The results were justified based on the factors that influenced by both the computation and communication time.

These optoelectronic architectures share attractive features; since partitioning OTIS optoelectronic architecture into N groups of N processors can support large-scale systems with less cost and less complexity design. These interesting features enabled us to adopt the algorithm in a way that meet the purpose to obtain a high-quality solution in less time. Moreover, they helped to record near-linear speedup approaches to 14.3 and high-efficiency approaches to 0.9 in case of having 16 processors. Therefore, this will stimulate researchers to apply other problems in computer science field as a future work. Also, a comparative study can be applied between each OTIS optoelectronic architecture with its factor network; for example, OTIS-Hypercube and Hypercube to study the amount of gaining performance between them.