1 Introduction

The goal of optimization is to get the best or near the best acceptable solution, given the constraints and needs of the problem. For a problem, there may be several different solutions that are defined to be compared and then an optimal solution is selected. Research has shown that meta-heuristic algorithms and multi-agent systems can be used as a solution for improving the optimization process. Meta-heuristic algorithms are algorithms that are inspired by nature, physics, and humans and are used to solve many optimization problems. Optimization algorithms are usually used in combination with other algorithms to reach the optimal solution or to exit from the local optimal solution. Multi-agent systems are among the new approaches that have been proposed to achieve the above-mentioned capabilities. Agent-based systems in the field of distributed artificial intelligence have been studied for several years and have been used in various fields of computer science. Agents are able to perceive and react against the environment, having an extensive range of meanings depending on the problem to be solved. Generally, an agent can be characterized by the following four properties: (1) living and acting in an environment; (2) having the ability to sense its local surroundings; (3) acting based on a specific purpose; and (4) having some reactive behaviors (Li and Jing 2016). In a summarization problem, the goal is to create the best or near the best optimum summary of one document or more documents, which consist of three features obtained by extracting the most important contents. These features are as follows (Oufaida et al. 2014):

  • Coverage All the information of important topics in the documents should be covered by the summary.

  • Non-redundancy It indicates the novelty in the summary.

  • Relevancy All the information of contents that are relevant to a user should be contained in the summary.

In general, summarization algorithms can be categorized into two classes, namely, extractive and abstractive summarization (Ferreira et al. 2013). Extractive summarization has to do with concatenating multiple relevant sentences which are chosen without any modification, i.e. exactly as they are used in the original document. Moreover, creating abstractive summaries is a difficult task since it requires paraphrasing sections of the source document. Therefore, it needs natural language generation tools. Furthermore, abstractive algorithms may reuse clauses or phrases from original documents (Nenkova et al. 2011). In this paper, we proposed an algorithm named MAMHOA algorithm, which is a combination of a biogeography meta-heuristic algorithm and multi-agent systems aimed at improving extractive text summarization. In MAMHOA, a set of agents i.e. habitats, is considered which are competing or cooperating with their neighbors to find the best conditions and best habitats. The agents collaborate in a multi-agent system to achieve a specific, shared goal, and each agent performs its own task. This collaboration can be done through communication and exchange of information with neighbors of each agent. The performance of the proposed algorithm is evaluated for single-document extractive text summarization on DUC 2002Footnote 1 datasets. As for evaluation, the proposed algorithm is compared with some of the Baseline and state-of-the-art methods of extractive text summarization.

The rest of the paper is organized as follows: In Sect. 2, the literature review is discussed. In Sect. 3, the original Biogeography-based optimization is explained. In Sect. 4, the proposed algorithm, i.e. MAMHOA is introduced and assessed on the single-document extractive text summarization. In Sect. 5, the results of experiments are presented and discussed, and finally in Sect. 6 the paper is concluded.

2 Literature review

Our work generates a new meta-heuristic optimization algorithm for single-document extractive text summarization based on multi-agent concepts. In the literature, we will briefly review some of the methods in field of text summarization. As a first algorithm, Luhn (1958) is used for text summarization based on statistical features such as the frequency of words and the relative positioning of each sentence in the document. The authors in (Mihalcea and Tarau 2004) introduced a graph-based technique called TextRank. This algorithm represents the document in the form of a graph of sentences, and the edges between two sentences are connected based on the similarity between them. LexRank (Radev and Erkan 2004) applied the notion of eigenvector centrality to the graph representation of sentences, with the aim of estimating the importance of each sentence in the document. Moreover, Steinberger and Jezek (2004) put forth a topic-based algorithm named Latent Semantic Analysis (LSA), which seeks to spot salient sentences in the document through applying Singular Value Decomposition (SVD) over document matrix D of size m× n, consisting of m sentences and n number of terms. Nenkova and Vanderwende (2005) developed a greedy search approximation algorithm called SumBasic that makes use of a frequency-based sentence selection component with a component to re-weight the word probabilities in order to minimize redundancy. Amancio and Tohalino (2018) proposed an adapted model, which includes sentences and edges, which are created given the number of shared words between sentences. The proposed algorithm distinguishes between edges linking sentences in different documents (interlayer) and those connecting sentences from the same document (intra-layer). Rautray and Balabantaray (2018) put forth a new Cuckoo search–based multi-document summarizer (MDSCSA) to deal with the problem of multi-document summarization. They compare the new MDSCSA with two other nature-based summarization techniques, including Particle swarm optimization based summarization (PSOS) and Cat swarm optimization based summarization (CSOS). Haghighi and Vanderwende (2009) developed an algorithm for text summarization, which generates the output summary of the document because of the criteria of Kullback-Lieber (KL) divergence. As a phrase-based summarization technique, Integer Linear Programming (Woodsend and Lapata 2010) encodes the grammaticality constraints across phrase dependencies using integer linear programming. Wan (2010) proposed URANK, which is a unified rank methodology based on graph model. It is able to handle single-document and multi-document summarization simultaneously. Asgari and Masoumi (2013) provided an algorithm to improve the performance of text summarization using bacterial foraging optimization algorithm. Egraph (Parveen et al. 2015) is considered as an entity graph-based system, which makes use of a bipartite graph of sentence and entity nodes for document representation. They developed a system called “Egraph + Coh” which includes a coherence measure calculated by using the averaged-out degree of an unweighted projection graph. Tgraph (Parveen et al. 2015) is an unsupervised graph-based text summarization algorithm that applies Latent Dirichlet Allocation (LDA) for topic modeling. “Tgraph + Coh” takes account of coherence calculated using weighted projection graph for generating a summary. Cheng and Lapata (2016) developed NN–SE, which is a neural summarization model for sentence extraction. SummaRuNNer (Nallapati et al. 2016) is a recent extractive text summarization algorithm, which is based on RNNs. Mirshojaee et al. (2017) used an original biogeography-based optimization algorithm for extractive text summarization. The proposed algorithm is inspired by the natural migration of species between habitats.

CoRank (Fang et al. 2017) is a graph-based unsupervised extractive text summarization approach, which involves combining word-sentence relationship with the graph-based ranking model. Now we discuss some approaches, which use BBO meta-heuristic algorithm in field of optimization. Zhang et al. (2018) presented a novel hybrid algorithm based on BBO and Grey wolf optimizer (GWO), named as HBBOG. In the proposed algorithm, both algorithms (BBO and GWO) are improved. Yang and Li (2017) introduced an improved biogeography-based optimization algorithm that uses the nonlinear migration operator. They have used this proposed algorithm in mobile robot’s path planning. Chen and Tianfield (2016) developed a covariance matrix-based migration (CMM) to reduce BBO’s dependence on the coordinate system, resulting in the enhancement of BBO’s rotational invariance. Khishe et al. (2017) proposed a novel exponential-logarithmic migration operator for the original BBO, named as ELBBO. Paraskevopoulos et al. (2017) proposed a modified BBO named as real-coded biogeography-based optimization (RCBBO) that combined fuzzy decision making for the cognitive radio engine design used in the internet of things (IoT). To improve the population diversity, the authors employed Gaussian, Cauchy and Levy mutations as the mutation operator. Feng et al. (2017) proposed a modified BBO named as PRBBO that was improved for solving global optimization problems. In the modified algorithm, triple combination includes, migration operator combined with random ring topology, a modified mutation operator, and a self-adaptive Pow-ell’s algorithm. In the proposed algorithm, local ring topology is used instead of the global topology to increase population diversity. Lohokare et al. (2013) presented a mimetic BBO named as ABBOMDE, in which the performance of BBO is accelerated with the help of a modified mutation and clear duplicate operators.

3 Biogeography-based optimization algorithm

In this section, we introduce the original BBO algorithm, which is used in the proposed algorithm. BBO is a new intelligence-based meta-heuristic algorithm inspired by nature based on the concept of animal migration to find the best habitat (Simon 2008; Harish 2015). In BBO, each habitat represents a candidate solution of the problem. Species migrate from one habitat to another habitat depending on a suitability index called habitat suitability index (HSI). HSI actually represents the individual’s fitness. Parameters such as rainfall, temperature, region, humidity, etc., affect the excellent characteristics of biological habitats. In BBO algorithm, from Simon’s point of view, these characteristics are called suitability index variables (SIVs). Simply n-dimensional habitat, which is a candidate solution of the problem, is formed by n SIVs whose fitness is denoted by HSI. In BBO (Fig. 1 ), each habitat has an immigration rate (λ) which means that it has a desire to accept poor habitats, and emigration rate (μ) which means that there is a strong tendency to migrate to rich habitat. In BBO, the solution is improved with a focus on the immigration and emigration of solution features within habitats. High HSI habitats share their good features with low HSI habitats, with low HSI habitats accepting the novel features of high HSI habitats. In accordance with the principle of the BBO, it is highly probable that good solutions share their SIVs with other solutions and poor solutions accept SIVs from other solutions. There is a decrease in emigration rates from high HSI habitats to low HSI ones so that the highest HSI habitat has the maximum emigration rate while there is an increase in immigration rate from high HSI habitats to low HSI ones, with the highest HSI habitat having the lowest immigration rate.

Fig. 1
figure 1

The curve of BBO Linear Migration (Simon 2008)

Immigration rate λ and emigration rate μ are calculated according to the formulas (1) and (2).

$$ \lambda_{i} = I\left( {1 - \frac{{k_{i} }}{n}} \right) $$
(1)
$$ \mu_{i} = E\left( {\frac{{k_{i} }}{n}} \right) $$
(2)

The parameters of \( \lambda_{i} \) and \( \mu_{i} \) are the immigration rate and emigration rate for ith habitat, respectively. The parameters of I and E are the maximum immigration rate and maximum emigration rate, respectively, and n denotes the population size. The \( k_{i} \) stands for the fitness rank of ith habitat after fitness of ith habitat is sorted, so that worst solution has \( k_{i} \) of 1 and the best solution has \( k_{i} \) of n. Migration and mutation are two crucial operators in BBO. The migration operator is responsible for generating new solution at each iteration and is similar to the crossover operator in the evolutionary algorithm. The mutation operator is done randomly on habitats and is responsible for maintaining the diversity of habitats and preventing the trapping of the algorithm in the local optimal point. According to formulas (3) and (4), the parameter of \( P_{s} \) is the probability which denotes s species in the habitat ranging from t to \( (t + \Delta t) \) where \( \lambda_{s} \) is immigration rate and \( \mu_{s} \) is emigration rate when there are s species in the habitat.

$$ P_{s} (t + \Delta t) = P_{s} (t)(1 - \lambda_{s} \Delta - \mu_{s} \Delta t) + P_{s - 1} \lambda_{s} \Delta t + P_{s + 1} \mu_{s + 1} \Delta t $$
(3)
$$ p_{s} = \left\{ {\begin{array}{*{20}c} {(\lambda_{s} + \mu_{s} )p_{s} + \mu_{s + 1} p_{s + 1} \cdots s = 0} \\ {(\lambda_{s} + \mu_{s} )p_{s} + \lambda_{s - 1} p_{s - 1} + \mu_{s + 1} p_{s + 1} \cdots 1 \le s \le S_{{\max} } - 1} \\ {(\lambda_{s} + \mu_{s} )p_{s} + \mu_{s - 1} p_{s - 1} \cdots S = S_{{\max} } } \\ \end{array} } \right. $$
(4)

At time \( t + \Delta t \), one of the following conditions must hold for s species in the habitat.

  1. 1.

    If there were s species in the habitat at time t, then there is no immigration and no emigration of species within time t and \( t + \Delta t \).

  2. 2.

    If there were (s – 1) species in the habitat at time t, then there is one species immigrated within time t and \( t + \Delta t .\)

  3. 3.

    If there were (s + 1) species in the habitat at time t, there is one species emigrated within time t and \( t + \Delta t \).

To ignore the probability of more than one immigration or emigration, we set \( \Delta t \) very small so that \( \Delta t \to 0 .\)

Solutions with very high HIS and very low HSI are equally improbable. Medium HSI solutions are relatively probable. The mutation rate is expressed by formula (5) where mmax denotes a user-defined parameter, \( p_{i} \) denotes priori existence probability, and pmax = max {pi, i = 1, 2… ps}.

$$ m_{{_{i} }} = m_{{\max} } \left( {1 - \frac{{p_{i} }}{{p_{{\max} } }}} \right) $$
(5)

In the case of the original mutation operator, a random value, created in the whole solution space, is used to probabilistically substitute SIV in each solution. Mutation represents the sudden changes in the habitat. This operator is tasked with maintaining the diversity in the population during BBO process. Mutation randomly makes minor changes to habitat SIVs based on the habitat’s a priori probability. The algorithm in Fig. 2, indicates the pseudo-code of BBO algorithm with mutation and migration operators, where \( H_{i} \) and \( H_{j} \) represent the ith habitat and the jth habitat, respectively. N denotes the highest number of species and D denotes the dimension of a solution.

Fig. 2
figure 2

Pseudo code of BBO algorithm

Finally, following the migration and mutation, the specific habitat is compared against the criterion habitat. If the HSI value of this specific habitat exceeds that of the criterion habitat, the latter is substituted by the former. Otherwise, the criterion habitat should be preserved as a better option. By continuously iterating evolution, the algorithm preserves the better conditions, improving the poor individual and guiding the search process towards the global optimal individual approximation.

4 Proposed MAMHOA algorithm

In this section, we present our proposed algorithm, i.e. a multi-agent meta-heuristic optimization algorithm (MAMHOA) with an approach for single-document extractive text summarization. Given that the proposed algorithm uses BBO algorithm, firstly, we need to consider the following three initial important definitions:

  1. 1.

    In MAMHOA, each agent represents a habitat in BBO, which is a candidate solution.

  2. 2.

    In MAMHOA, all habitats live in a lattice-like environment, where each habitat is placed at one point of the lattice. These habitats seek to increase habitat suitability index (HSI) as well as efficiency through competing or cooperating with their neighbors.

  3. 3.

    The lattice dimension is \( H_{\text{size}} \times H_{\text{size}} \) so that habitats can be placed on this lattice as shown in the Fig. 3.

    Fig. 3
    figure 3

    The habitats lattice

According to formulas (6) and (7), habitat location is defined as row and column (r, c), and represented as \( H_{r,c} \), \( r,c = 1,2,3 \ldots H_{\text{size}} \) in lattice, so the neighbors of a habitat are defined as follows:

$$ {\text{Neighbors}}_{r,c} = \left\{ {H_{r',c} ,H_{r,c'} ,H_{r,c''} ,H_{r'',c} } \right\} $$
(6)
$$ \begin{aligned} r^{{\prime }} = \left\{ {\begin{array}{*{20}c} {r - 1 \ldots r \ne 1} \\ {H_{\text{size}} \ldots r = 1} \\ \end{array} } \right.\quad r^{{\prime \prime }} = \left\{ {\begin{array}{*{20}c} {r + 1 \ldots r \ne H_{\text{size}} } \\ {1 \ldots r = H_{\text{size}} } \\ \end{array} } \right. \hfill \\ c{\prime } = \left\{ {\begin{array}{*{20}c} {c - 1 \ldots c \ne 1} \\ {H_{\text{size}} \ldots c = 1} \\ \end{array} } \right.\quad c^{{\prime \prime }} = \left\{ {\begin{array}{*{20}c} {c + 1 \ldots c \ne H_{\text{size}} } \\ {1 \ldots c = H_{\text{size}} } \\ \end{array} } \right. \hfill \\ \end{aligned} $$
(7)

Figure 4 shows each habitat along with its four neighbors, which forms a small local environment that can only sense the environment. As already mentioned, habitats compete or cooperate with their neighbors to achieve the final goal, i.e. the best or near the best result. Given that a habitat only senses its local environment, competition and cooperation can only occur between the habitats and their neighbors. This helps to spread the information to the entire lattice. Given that competition and collaboration are done locally, the information transmission is slow across the habitat lattice.

Fig. 4
figure 4

The habitat neighbors

4.1 Habitats selection strategies used by MAMHOA

The following procedure is performed to achieve better solutions more quickly and accurately. Initially, the habitats compete with each other and with their neighbors based on their competencies. Moreover, habitats use the self-learning operator to improve their ability (Figs. 3, 4).

The competition operator is performed on the habitat located at (r, c), \( H_{r,c} = (h_{1} ,h_{2} , \ldots ,h_{n} ). \) we consider that the maximum cost function is \( \max_{r,c} = (m_{1} ,m_{2} , \ldots ,m_{k} ),k = (1,2, \ldots ,n) \) and \( H_{r,c} \) is the habitat neighbor with the maximum habitat suitability index (HSI). Therefore, given the neighbor relations shown in Fig. 4, the best habitat conditions are as follows: \( \max_{r,c} \in {\text{Neighbors}}_{r,c} \) \( \forall \alpha \in {\text{Neighbors}}_{r,c} \)\( f(\alpha ) \le f({ {\max} }_{r,c} ) \). If the habitat suitability index is represented as \( {\text{HSI}}(H_{r,c} ) \ge {\text{HSI}}(\max_{r,c} ) \), then \( H_{r,c} \) is the winner because its cost function value is greater than the maximum cost function of its neighbors, so it can still live in the habitat lattice and its position does not change in the search space. Otherwise, \( H_{r,c} \) is the loser and the position of \( H_{r,c} \) in the habitat lattice should be occupied by the new habitat \( ({\text{New}}_{r,c} ) \). The proposed algorithm presents two strategies to select the new habitat.

  1. 1.

    In the first strategy, when \( H_{r,c} \) is the loser, it has still useful information to obtain good solutions. Hence, this strategy as a kind of heuristic crossover involves using the information of loser habitat as well. The new habitat, represented by \( {\text{New}}_{r,c} = (e_{1} ,e_{2} , \ldots ,e_{n} ) \), is calculated using formula (8) where rand (0, 1) is a random number in [0, 1] and \( X_{{\max} } ,X_{{\min} } \) are the maximum and minimum HSIs of habitats, respectively.

$$ e_{k} = \left\{ {\begin{array}{*{20}c} {X_{{\max} } \ldots \ldots \ldots e_{k} \ge X_{{\max} } } \\ {X_{{\min} } \ldots \ldots \ldots e_{k} \le X_{{\min} } } \\ {m_{k} + {\text{rand}}(0,1).(m_{k} - h_{k} ),k = 1,2, \ldots ,n. \ldots \ldots \ldots {\text{Otherwise}}} \\ \end{array} } \right. $$
(8)
  1. 2.

    In second strategy, \( \max_{r,c} \) is first mapped to [0, 1], using formula (9) and new habitat neighbor, i.e.\( {\text{New}}_{r,c} \) is calculated using formula (10).

$$ m_{k} = \frac{{(m_{k} - X_{{\min} k} )}}{{(X_{{\max} k} - X_{\begin{subarray}{l} {\min} k \\ \end{subarray} } )}}k = 1,2, \ldots ,n $$
(9)
$$ {\text{New}}_{r,c} = (m_{1} ,m_{2} , \ldots ,m_{{i_{1 - 1} }} ,m_{{i_{2} }} ,m_{{i_{2 - 1} }} , \ldots m_{{i_{1 + 1} }} ,m_{{i_{1} }} ,m_{{i_{2 + 1} }} ,m_{{i_{2 + 2} }} , \ldots ,m_{n} ) $$
(10)

If \( 1 < r_{1} < n \), \( 1 < r_{2} < n \) and \( r_{1} < r_{2} \). Finally, \( {\text{New}}_{r,c} \) is obtained through mapping \( New_{r,c} \) back to \( [X_{{\min} } ,X_{{\max} } ] \) by formula (11).

$$ e_{k} = X_{{\min} k} + e^{'}_{k} .(X_{{\max} k} - X_{{\min} k} ),k = 1,2, \ldots ,n $$
(11)

As described above, the first strategy increases exploitation and the second strategy increases the exploration. The \( p_{c} \) probability parameter determines which of two strategies should be used. That is, if \( {\text{rand}}(0,1) < p_{c} \), then the first strategy is used, otherwise the second strategy is used. The algorithm in Fig. 5 presents the competition between self-learning operators in MAMHOA algorithm to create best solution. Table 1 shows the initial values of the used parameters in MAMHOA algorithm (Fig. 5). The initial parameters are based on original BBO parameters (Simon 2008).

Fig. 5
figure 5

Self-learning operators in MAMHOA algorithm to create best solution

Table 1 Initial parameters used for the MAMHOA algorithm

As shown by the algorithm in Fig. 5 and Table 1, the variables and operators used in the proposed algorithm are as follows: Crossover probability (Pc), Mutation probability (Pm), Habitat Number (Habitats), Lower limit of decision variables (LB), Upper limit of decision variables (UB), Self-learning Operator (SR). According to algorithm in Fig. 5 and Figs. 3 and 4, initially a \( H_{\text{size}} \times H_{\text{size}} \) lattice is created. Then, each c and r location is assigned a habitat. Given the SR probability, the random values are generated and assigned to each habitat in terms of optimum conditions. Then, the habitats are compared in terms of their conditions and the habitat with the highest HSI is selected as the best habitat. Consequently, the migration occurs from a habitat to a new habitat with better conditions.

4.2 Documents summarization processing

In this section, the process of documents summarization using MAMHOA algorithm is shown in Fig. 6. According to Fig. 6\( D = \{ d_{1} ,d_{2} , \ldots ,d_{n} \} \) are input source documents. Each \( d_{i} \) consists of a collection of sentences \( S = \{ s_{1} ,s_{2} , \ldots ,s_{n} \} \) and each \( s_{i} \) consists of a collection of words \( W = \{ w_{1} ,w_{2} , \ldots ,w_{n} \} \). The summarization steps are shown as follows:

Fig. 6
figure 6

Documents summarization processing using MAMHOA

Step 1: Input documents

Reading documents \( D = \{ d_{1} ,d_{2} , \ldots ,d_{n} \} \) including n sentences \( S = \{ s_{1} ,s_{2} , \ldots ,s_{n} \} \) and n words \( W = \{ w_{1} ,w_{2} , \ldots ,w_{n} \} \).

Step 2: Preprocessing

Firstly, we pre-process each document (D) before passing it on to the next step. In pre-processing step the following tasks are performed:

  1. 1.

    Omitting additional characters such as {}, [], …

  2. 2.

    Omitting unessential words such as at, in, on, and, of, …

  3. 3.

    Finding the roots of the words and the verbs used in the main text.

  4. 4.

    Separating words and omitting repeated words.

  5. 5.

    Identifying the end of the sentences and separating the sentences.

  6. 6.

    Calculating the number of words (W) and sentences (S) used in the main text.

Step 3: Weighting sentences and create similarity matrix

After pre-processing (step 2), the sentences extracted are weighted using TFIDF method in formulas (12) and (13) (Ledeneva et al. 2008). According to formulas (12) and (13),\( {\text{freq}}_{ij} \) represents the number of repetitions of i word in j sentence, \( {\text{freq}}_{lj} \) represents l word in j sentence, \( \max_{i} {\text{freq}}_{lj} \) represents the maximum number of i word repetitions in j sentence, N and \( n_{i} \) represent the number of sentences in an input text and the number of sentences, respectively. Then, a weight of each word in sentence \( w_{ij} \) is calculated, using formula (14).

$$ tf_{ij} = \frac{{{\text{freq}}_{ij} }}{{\max_{i} {\text{freq}}_{lj} }} $$
(12)
$$ idf_{ij} = \log \frac{N}{{n_{i} }} $$
(13)
$$ w_{ij} = tf_{ij} *idf_{ij} $$
(14)

Following the sentences weighting, a similarity matrix including extracted sentences and words are created and weighted, using formula (15). In a similarity matrix, sentences are compared based on their keywords and important words. In formula (15), \( {\text{sim}}(s_{i} ,q) \) is the sentence similarity matrix, \( w_{iq} \) and \( w_{ij} \) are keywords and title weight, and the weight of each word, respectively.

$$ {\text{sim}}(s_{i} ,q) = \frac{{\sum\nolimits_{i = 1}^{t} {w_{ij} *w_{iq} } }}{{\sqrt {\sum\nolimits_{i}^{t} {w^{2}_{ij} *\sum\nolimits_{i = 1}^{t} {w^{2}_{iq} } } } }} $$
(15)

Step 4: Processing in MAMHOA

In this step, as explained in Sect. 4.1, a habitat lattice is created and its parameters are set (Fig. 5). Here, the steps below are followed:

  1. 1.

    Creating a habitat lattice \( H_{\text{size}} \times H_{\text{size}} \) and setting parameters.

  2. 2.

    Assigning similarity matrix including extracted sentences from step 3 to habitats.

  3. 3.

    Evaluating habitat’s conditions in order to find the best habitat (Best summarize).

  4. 4.

    Evaluating new habitat repeatedly if conditions are not met.

5 Experiments and discussions

In this section, to evaluate the performance of the proposed algorithm, several experiments have been conducted whose results are reported below. In Experiment 1, we have compared our proposed algorithm with four non-multi-agent algorithms for 200-word and 400-word documents summarization. In the experiment 2, we have compared the proposed algorithm with several Baseline and state-of-the-art methods on 100-word documents published in the literature on DUC 2002. For all experiments, each reported value is obtained by averaging over 200 runs. In all of the tables of our experiments, the best value of each row or each column is shown in bold.

5.1 DUC 2002 datasets

DUC 2002 is a standard datasets of documents for evaluation in the field of summarization, created by NIST (National Institute of Standards and Technology). In all of the experiments, we chose 9 DUC 2002 topics (D061, D065, D067, D070, D073, D075, D079, D085, and D0105). Totally, 73 documents were chosen for tests. Table 2 presents the characteristics of the topics in DUC2002 dataset, which are used in Experiment 1 and Experiment 2.

Table 2 Characteristics of the topics taken from DUC2002 dataset

5.2 Evaluation measures

To analyze the performance of the proposed algorithm we used Recall-Oriented Understudy for Gisting Evaluation (ROUGE) (Lin 2004). This software includes different packages, which determine the similarity between a summary generated by computer and a summary generated by human. It has been shown that ROUGE is very effective for measuring document summarization. The ROUGE metrics are defined according to different Ns and different strategies, such as ROUGE-N (N: 1, 2), ROUGE-L, ROUGE-S and ROUGE-SU. ROUGE-1 is used to compare the overlap between the system summary and the manual summaries created by human. ROUGE-2 is the measure between the candidate summary and final summary. ROUGE-L computes the ratio of the length of the summaries’ longest common subsequence (LCS) to the length of the reference summary. ROUGE-S uses the overlap ratio of a candidate summary to a set of reference summaries. ROUGE-SU is an extension of ROUGE-S with the addition of unigram as the counting unit. Each of these ROUGE evaluation methods generates three criteria (Recall, Precision, and F-score). In this paper, we report the average of scores generated by ROUGE-1, ROUGE-L and ROUGE-SU values. The criteria of Precision, Recall and F-score are calculated using formulas (16) and (17) and (18) (Abuobieda et al. 2013), respectively. The criterion of Precision in formula (16) is an intersection of extracted summarized sentences and an ideal summary of sentences divided by all the extracted sentences. The criterion of Recall in formula (17) is an intersection of relevant sentences and retrieved sentences divided by all the relevant documents, and F-score in formula (18) is a statistical criterion which is a combination of precision and recall criteria and shows the score for final selected sentences in a summary.

$$ Precision = \frac{{{Re}lvantSentences \cap {Re}trievedSentences}}{{{Re}trievedSentences}} $$
(16)
$$ {Re}call = \frac{{{Re}lvantSentences \cap {Re}trievedSentences}}{{{Re}levantSentences}} $$
(17)
$$ F - score = \frac{2*precision*recall}{precision + recall} $$
(18)

5.3 Experiment 1: Comparison of proposed algorithm with the non-multi-agent algorithms

In this experiment, we compare the performance of the proposed algorithm with four non-multi-agent optimization algorithms i.e. BBO (Mirshojaee et al. 2017), CSOA (Mirshojaei and Masoomi 2015), BFOA (Asgari and Masoumi 2013) and GA (Fattah and Ren 2009). Tables 3, 4, and 5 show the analytical results of Precision (P), Recall (R), and F-score (F) criteria for 200-word text summarization using ROUGE-1, ROUGE-L and ROUGE-SU. In experiment 1, we evaluate the results of F-score criteria of our algorithm with other algorithms.

Table 3 The criteria average ranges of (Precision, Recall and F-score) based on ROUGE-1 scores. (200-word documents)
Table 4 The criteria average ranges of (Precision, Recall and F-score) based on ROUGE-L scores. (200-word documents)
Table 5 The criteria average ranges of Precision, Recall and F-score based on ROUGE-SU scores. (200-word documents)

Figure 7 and graph in Fig. 8 compare the average of F-score criteria of the proposed algorithm with those obtained by other optimization Algorithms. As the results show, the summarization F-score criterion for 200-word document has optimized.

Fig. 7
figure 7

Comparison between F-score criteria of proposed algorithm and those of other algorithms by ROUGE-1. (200-word documents)

Fig. 8
figure 8

Comparison between the averages of F-score criteria of the proposed algorithm and those obtained by other optimization algorithms. (200-word documents)

In Tables 4 and 5, the comparison of the Precision, Recall and F-score criteria are shown by ROUGE-L and ROUGE-SU.

Below, Tables 6, 7 and 8 show the analytical results of Precision (P), Recall (R), and F-score (F) criteria for 400-word documents using ROUGE-1, ROUGE- L and ROUGE-SU.

Table 6 The criteria average ranges of (Precision, Recall and F-score) based on ROUGE-1 scores. (400-word documents)
Table 7 The criteria average ranges of (Precision, Recall and F-score) based on ROUGE-L scores. (400-word documents)
Table 8 The criteria average ranges of (Precision, Recall and F-score) based on ROUGE-SU scores. (400-word documents)

Figure 9 and graph in Fig. 10 show that the proposed algorithm has an optimum performance compared to CSOA and BFOA. According to the graph in Fig. 10, the performance of the proposed algorithm is near that of GA and BBO.

Fig. 9
figure 9

Comparison between the proposed algorithm and other algorithms by ROUGE-1 in terms of F-score. (400-word documents)

Fig. 10
figure 10

Comparison between the averages of F-score criteria obtained by the proposed algorithm and those obtained by other optimization algorithms. (400-word documents)

Table 9 and graphs in Figs. 11 and 12 show the results of comparison between the average of F-scores obtained for ROUGE-1, ROUGE-L and ROUGE-SU on 200-word and 400-word documents.

Table 9 Compression between the averages of obtained F-scores from the mentioned algorithms and those from the proposed algorithm
Fig. 11
figure 11

Comparison between the averages of F-score criteria obtained from the proposed algorithm and those obtained by other optimization algorithms. (400-word documents)

Fig. 12
figure 12

Comparison between the averages of F-score criteria obtained from the proposed algorithm and those obtained by other optimization algorithms. (200-word documents)

5.3.1 Discussion 1

At experiment 1, we discuss four non-multi-agent optimization algorithms including BBO, GA, CSOA and BFOA, which are, cited approaches in the literature. These algorithms are selected for comparative analysis on DUC 2002 documents. The analytical values of F-score criteria are generated using ROUGE Software. The Tables 3, 4, 5, 6, 7, and 8 shows the results of F-score criteria obtained for ROUGE-1, ROUGE-SU, and ROUGE-L. The Table 9 and graphs in the Figs. 11 and 12 show the averages of F-score values are broken down into ROUGE-1, ROUGE-L and ROUGE-SU for 200-word and 400-word documents, respectively.

From the results obtained from ROUGE-1, ROUGE-SU, and ROUGE-L the following points can be made.

5.3.1.1 200-word documents summarization
  1. 1.

    Based on the results of ROUGE-1 (Table 3), the f-score criteria extracted using the proposed algorithm for documents d061, d065, d070, d075, d079, d085 and d105 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-1, there is more overlap between the extracted sentences of the summary and the sentences of the original text. CSOA performs better on documents d067 and d073.

  2. 2.

    Based on the results of ROUGE-L (Table 4), the f-score criteria extracted using the proposed algorithm for documents d065, d070, d075, d085 and d105 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-L, the ratio of the length of the summaries’ longest common subsequence (LCS) to the length of the reference summary has optimized. Algorithm BBO performs better on documents d067, d073 and d061 and GA performs better on document d079.

  3. 3.

    Based on the results of ROUGE-SU (Table 5), the f-score criteria extracted using the proposed algorithm for documents d070, d065, d075, d079 and d105 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-SU, word redundancy and repeated words have decreased in the summary. Algorithm BBO performs better on documents d073 and d061. GA performs better on document d067 and BFOA performs better on document d085.

5.3.1.2 400-word documents summarization
  1. 1.

    Based on the results of ROUGE-1 (Table 6), the F-score criteria extracted using the proposed algorithm for documents d070, d065, d079, and d075 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-1, there is more overlap between the extracted sentences of the summary and the sentences of the original text. BFOA performs better on document d073. CSOA performs better on document d067. Algorithm BBO performs better on documents d061, d085 and d105.

  2. 2.

    Based on the results of ROUGE-L (Table 7), the f-score criteria extracted using the proposed algorithm for documents d065, d070 and d079 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-L, the ratio of the length of the summaries’ longest common subsequence (LCS) to the length of the reference summary has optimized. Algorithm BBO performs better on documents d061, d105 and d085. GA performs better on document d075. CSOA performs better on document d067 and BFOA performs better on document d073.

  3. 3.

    Based on the results of ROUGE-SU (Table 8), the f-score criteria extracted using the proposed algorithm for documents d070, d065, d075, d079 and d067 are more optimal than those obtained by other algorithms. Given the criterion of ROUGE-SU, word redundancy and repeated words have decreased in the summary. Algorithm BBO performs better on documents d061, d085 and d105. BFOA performs better on document d073.

In sum, based on the above discussion as well as Table 9 and graphs in Figs. 11 and 12, the proposed algorithm is generally more efficient in the summarization of 200-word documents compared to that of 400-word documents due to the following: more overlap between the summary and the original text, smaller ratio of the length of the summaries’ longest common subsequence to the length of the reference summary, and a decrease in the redundancy of words.

5.4 Experiment 2: comparison of proposed algorithm with the baseline and state-of- the- art benchmarks

In experiment 2, the efficiency of the proposed algorithm is compared to that of the algorithms discussed in literature review including baseline and state-of-the-art algorithms. In this experiment, the proposed algorithm was applied to 100-word documents. The results of the proposed algorithm were compared to those of other mentioned algorithms in terms of Recall criterion.

5.4.1 Baseline and state-of-the-art extractive summarization algorithms

In this section, we compare our algorithm with the several widely used baseline approaches (Table 10), and seven state-of-the-art algorithms published in literature on the dataset DUC 2002 (Table 11). We compared our work with baseline and state-of-the-art algorithms. The methods are selected based on state-of-the-art accuracy reported by the, which are the most cited approaches in the literature (i.e. between 2010 and 2017). A brief description of the baseline approaches and state-of-the art are shown in Tables 10 and 11.

Table 10 Baseline extractive summarization algorithms
Table 11 Seven state-of-the-art extractive summarization algorithms

5.4.2 Comparative analysis on DUC 2002 dataset

Table 12 show the average of Recall obtained for the summarization of 100-word datasets of DUC 2002. It can be observed that on these dataset, MAMHOA reports the highest averages for ROUGE-1 (51.0), ROUGE-SU (24.3) and ROUGE-L (47.2) on 100-word documents.

Table 12 ROUGE recall scores of MAMHOA approach for different topics of DUC 2002

5.4.3 Comparative analysis on DUC 2002 benchmark

The averages of Recall criteria obtained from the MAMHOA are compared to Recall criteria obtained for Baseline and state-of-the-art approaches.

Table 13 and graph in Fig. 13 show the comparative results of efficiency for the proposed algorithm and baseline and state-of-the-art algorithms in terms of Recall, using ROUGE-1, ROUGE-L, and ROUGE-SU.

Table 13 Comparison between the averages of Recall criteria obtained from the MAMHOA with those of Baseline and state-of-the-art approaches
Fig. 13
figure 13

Comparison between the averages of Recall criterion obtained from the proposed algorithm with Baseline algorithms. (100-word documents)

5.4.4 Discussion 2

The results in Table 13 show ROUGE-1 score of 51.0, ROUGE-SU score of 24.3 and ROUGE-L score of 47.2 on this dataset. As a result, our proposed algorithm (MAMHOA) has a better performance than many of the existing state-of-the-art text summarizers on DUC 2002 dataset with respect to ROUGE-1, ROGUE-SU and ROUGE-L scores such as ILP, graph-based approaches, Tgraph, Egraph, URANK, and even those, which are based on supervised learning such as SummaRuNNer and NN–SE. Our approach cannot surpass the ROUGE-1 score of a recently proposed CoRank approach, though our ROUGE-SU and ROUGE-L scores, which are better evaluation measures than ROUGE-1, are far better than the CoRank method.

The graph in Fig. 13 Show the comparison between the averages of Recall criterion obtained from the proposed algorithm with Baseline algorithms for 100-word documents. As shown in Fig. 13 and Table 13, the performances of proposed algorithm is better than Baseline and stat-of-the-art mentioned algorithms.

6 Conclusion

In this work, we have discussed an optimization algorithm named MAMHOA with two strategies for extractive text summarization to find the best or near the best optimum summary. The proposed algorithm is a hybrid of a biogeography meta-heuristic algorithm and multi-agent systems. The proposed algorithm finds the best or near the best summary by creating a habitat lattice and comparing them to find the best conditions. The proposed algorithm was used for an extractive document text summarization issue. Our algorithm was compared with several non-multi-agent optimization algorithms, baseline and state-of-the-art algorithms using DUC2002 datasets. The 100-word and 200-word and 400-word documents were selected for summarization.

In the first experiment, we compared performance of our algorithm with that of some non-multi-agent algorithm on 200-word and 400-word documents and F-score results were obtained. In the second experiment we compared our algorithm with Baseline and state-of-the-art algorithms with 100-word documents and the Recall results were obtained. The experimental results were analyzed, using ROUGE software. Generally, the results show that the proposed algorithm has a better performance than that of other mentioned algorithms so that it can be used for summarizing.