Introduction

In the big data era, the daily volume of text data generated from a variety of sources has increased exponentially. This huge amount of text contains invaluable information and knowledge that needs to be effectively summarized. Manually summarizing such data is a very challenging and difficult task for humans. A summary has many benefits, such as a quick view of the key points, easier search and selection, and, most importantly, time saving [1]. An automatic text summarization system produces a summary, a compressed form of the input source document containing a few important sentences, to help users quickly grasp the key points without reading the whole document. Researchers have been working in the field of automatic text summarization since 1958 [2], continuously trying to improve the performance of summarization techniques. There has certainly been great improvement in this field, but to summarize text like a human, many issues and challenges remain, such as readability, redundancy, coverage, and cohesion [3,4,5]. Text summarization has been described as "an open representation of representation, in diverse forms, from multiple dimensions, and through interactions in multiple spaces". Most of the existing approaches are based on statistical, semantic, or linguistic analysis, not on understanding and representation. Text summarization research should therefore consider the various dimensions and interactions involving humans, machines, and other representations, including text, pictures, and video [95].

Extractive and abstractive are the two major categories of text summarization systems. Extractive summarization techniques are simple and robust: statistical features and keywords are extracted from the sentences, feature scores are computed, and the high-scoring sentences containing these features or keywords are selected to generate the summary. Extractive summarization became the standard in the field due to its simplicity and feasibility [6], but it is weak in terms of sentence connectivity, readability, and cohesion [7]. Abstractive text summarization methods, in contrast, generate the summary from scratch: new sentences are generated by analysing the semantic information. To obtain this semantic information, abstractive summarization needs extensive knowledge of natural language processing techniques, and these techniques are more complex than extractive summarization. In recent years, with the widespread use of neural networks and deep learning models, research attention has shifted towards abstractive and hybrid methods of text summarization. Automatic text summarization can also be classified based on usage, document content, number of source documents, genre, output summary type, target audience, etc. The classification of text summarization is discussed in detail in "Categorizing Automatic Summaries".

This paper highlights the current state of the art and research trends in single-document automatic text summarization. A concise overview of extractive and abstractive text summarization techniques is presented, with their advantages, disadvantages, and a comparative analysis based on datasets and evaluation metrics. For extractive summarization, most techniques from statistical to machine learning are covered. For abstractive summarization, structure-based, semantic, and more recent deep learning techniques are discussed in detail, with their pros, cons, and a comparison based on different aspects. Sentence fusion, compression, and paraphrasing techniques for abstractive summarization are also surveyed. Important datasets, evaluation metrics, natural language tools, and challenges are discussed in detail as well. Deep learning-based techniques are the current state of the art in text summarization, employing different neural network architectures, pretrained models, sequence-to-sequence encoder-decoder architectures, BERT, and transformers [96]. Deep learning techniques for both extractive and abstractive summarization are also discussed in this article. Overall, the aim is to provide a concise report on the different research works and their challenges in the automatic text summarization field, which can help academics, researchers, and professionals in their research.

The article is organized as follows: Sect. "Categorizing Automatic Summaries" covers the classification of automatic text summarization, and Sect. "Text Summarization Approaches" covers different extractive and abstractive summarization techniques and their comparative analysis. Sect. "Text Summarization Evaluation" presents evaluation metrics, and Sect. "Resources" discusses the various available datasets. Sect. "Discussion on Issues/Challenges in Automatic Text Summarization" covers research gaps and future directions, and finally, the conclusion is drawn in Sect. "Conclusion".

Categorizing Automatic Summaries

Different methods of automatic text summarization have been proposed since 1958 [2]. Each summarization method is intended to solve a problem in a different context. As shown in Fig. 1, an automatic text summarization system can be classified using many different factors and criteria, such as usage, number of documents, genre, context, form, and output summary type [8, 9].

Fig. 1 Classification of automatic text summarization systems

Usage

Based on usage, automatic text summaries are classified as informative or indicative. Informative summaries convey the crucial information of the source document; abstracts of scientific articles and synopses are typical examples. Indicative summaries, in contrast, only give a description of the document [10].

Number of Documents

Automatic text summarization systems use either one input document or more than one. Single-document summarizers process one input document, whereas multidocument summarizers take many documents on the same topic [11]. Multidocument summarization uses various heterogeneous documents on a specific topic, but redundancy is its big challenge, because much of the information is common across the documents.

Output Summary Type

Text summarization systems can also be classified based on the type of output summary generated, which can be either extractive or abstractive. In an extractive summary, sentences and phrases are selected from the source document: keywords and statistical features are extracted from the text, each sentence is given a score based on these features, and the final summary is created by ranking the sentences [2]. In abstractive text summarization methods, sentences are not taken directly from the input document; instead, new phrases and sentences are generated by building an internal semantic representation of the text with the help of natural language generation techniques. These summaries are more coherent and grammatically correct compared to extractive summaries [13].

Input Document Form

In summarization, the form of the input document significantly influences the understandability of the summary. Input documents can be classified by form along dimensions such as scale, structure, and medium. Summarization methods vary with the scale of the input text, which can be short text such as a paragraph, tweets, and microblogs, or a full article. The structure of the document refers to how it is organized into different sections; in legal documents and judgements, for example, thematic structures are used to generate the summary [14]. Apart from textual documents, summarization techniques can also be applied to other media such as images, audio, and video [15].

Target Audience

Based on the target audience, summarization can be classified as generic, query-driven, or update summarization. Generic summarization gives the overall gist of the document and does not depend on the type of input document or output summary. A query-focused summary is based on a user query: the system picks out only the information related to the given query and presents a concise summary to the user. The query can be a keyword, a phrase, or a sentence; most search engines use this type of summarization [16]. An update summary is a special type of multidocument summary where the end user is already familiar with some facts about the input documents; update summarization generates the new summary by omitting the already known facts.

Genre

According to the genre of the document, text summarization can be classified as:

News: a summary of news articles;

Specialized: a summary of documents from a specialized domain (science, technology, law, etc.);

Literary: a summary of narrative documents, literary texts, etc.;

Encyclopaedic: a summary of encyclopaedic documents, such as Wikipedia;

Social networks: a summary of blogs and very short documents (such as tweets).

Text Summarization Approaches

In this section, different automatic text summarization methods are discussed, with the focus on extractive, abstractive, and neural network-based techniques. Extractive summarization produces a summary by extracting important statistical features and keywords from the input document. Abstractive summarization produces a summary either by rephrasing sentences using compression and fusion techniques or by generating new phrases and sentences through structure-based, semantic, or deep learning techniques. Figure 2 shows the different techniques used under the extractive and abstractive text summarization methods.

Fig. 2 Text summarization methods

Extractive Summarization

Extractive text summarization methods extract sentences from the source document on the basis of different keywords and features. These features can be extracted with the help of different techniques, such as statistical, linguistic, graph-based, and machine learning techniques. All of these are discussed below in detail.

Statistical-Based Approaches

The earliest techniques of automatic text summarization are based on statistical approaches. These methods use surface-level features at the word and sentence level to decide which parts of a text are important and relevant.

Word- and Sentence-Level Features

The first sentence extraction algorithm, based on term frequency, was developed in 1958 to generate summaries. The underlying idea is that when someone writes about a given topic, certain words repeat often in the text; thus, a term's relevance is considered proportional to its in-document frequency [2]. The term frequencies are then used to score and select sentences for the summary. Another good indicator of relevance is the position of a sentence, which has good potential to capture the importance of the sentence within the document [8]. It was demonstrated that the combination of the presence of cue words, title words, and the position of a sentence produces the extracts most similar to abstracts written by a human [17].
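A minimal sketch of this term-frequency idea is shown below; the function name and the tiny stop word list are illustrative, not taken from [2]:

```python
# A minimal sketch of Luhn-style term-frequency sentence scoring.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "that"}

def tf_summarize(document: str, num_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    words = [w for w in re.findall(r"[a-z']+", document.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    # Score each sentence by the summed frequency of its content words,
    # normalized by sentence length to avoid favoring long sentences.
    def score(sent: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Restore original document order for readability.
    return " ".join(s for s in sentences if s in ranked)
```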

Latent Semantics

In natural language processing, latent semantic analysis (LSA) is a technique used to analyse relationships between a set of documents and the terms they contain [18]. A term-document matrix is created with m rows for the document terms and n columns for the sentences. LSA learns latent features and topics by performing singular value decomposition (SVD) on this matrix, and it is also generally used as a noise and dimensionality reduction technique.
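The following small sketch illustrates LSA-based sentence ranking with scikit-learn's TruncatedSVD; the scoring heuristic (summed squared topic weights) is an assumption for demonstration, not a prescription from [18]:

```python
# A small sketch of LSA-based sentence ranking, assuming scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_rank(sentences: list[str], k: int = 2) -> list[int]:
    """Rank sentence indices by their weight in the top-k latent topics."""
    # Rows of X are sentences, columns are terms (the transpose of the
    # m-terms-by-n-sentences matrix described above; SVD applies either way).
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    svd = TruncatedSVD(n_components=min(k, X.shape[1] - 1))
    topic_weights = svd.fit_transform(X)        # n_sentences x k
    scores = (topic_weights ** 2).sum(axis=1)   # salience across latent topics
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```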

Graph-Based Approaches

Graph-based approaches are extensively used in text summarization, since document structures are efficiently represented by graphs. Documents are represented as graphs with words, sentences, or paragraphs as nodes, relationships between them as edges, and edge weights based on similarities between sentences. TextRank and LexRank are the two most popular graph-based methods; both are modified versions of the PageRank algorithm for scoring sentences. LexRank [19] uses a weighted cosine similarity measure to construct an undirected graph; based on the similarity measure, sentences are clustered into groups, and the ranking of sentences is done based on their LexRank score. TextRank [20] is similar to LexRank, except that a directed graph is constructed from the input document. In multidocument summarization, a sentence-to-document relationship is also added to the graph-based ranking, since one document may be more important than the others [21].
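A hedged sketch of this graph-based ranking idea follows, building a cosine-similarity graph and running PageRank with networkx; the weighting and selection details are simplified relative to [19, 20]:

```python
# A simplified LexRank-style ranking: cosine-similarity graph + PageRank.
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def graph_rank(sentences: list[str], top_n: int = 3) -> list[str]:
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)         # edge weights between sentences
    np.fill_diagonal(sim, 0.0)             # remove trivial self-similarity
    graph = nx.from_numpy_array(sim)       # undirected weighted graph
    scores = nx.pagerank(graph, weight="weight")
    best = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]  # keep document order
```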

Linguistics-Based Approaches

Linguistics-based approaches to text summarization use natural language processing techniques such as rhetorical relations, semantics, cohesion, and coreference information to generate summaries, either by extraction or by abstraction.

Rhetorical Structure

Rhetorical Structure Theory (RST) is concerned with text structure and organization and describes the relationships that exist between text elements. The hierarchical structure of a text is represented by a binary tree, and this structure helps to identify the important parts of the text to be included in the summary [22]. In [23], a nested tree is constructed from the source document to exploit word-level and rhetorical dependencies. The nested tree is composed of a sentence tree and a document tree: a dependency parser is used for the sentence tree, where words are nodes connected by head-modifier relations, whereas in the document tree, RST is used to connect nodes with head-modifier relationships between sentences.

Coreference & Cohesion

In text summarization, coreference and cohesion are two very important concepts. In linguistics, coreference resolution is the "task of finding all expressions that refer to the same entity in an input text", and text cohesion concerns the relations between expressions that determine text connectivity. In [24], summarization based on lexical chains was introduced; these chains can be composed using different relations. Cohesive relations between terms, such as synonymy, antonymy, and hypernymy, are determined using the WordNet database. Based on the type and number of relations, chains are assigned scores, and highly scored chains are selected for the final summary.
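As an illustration, a candidate chain could be scored by counting WordNet relations between its members, roughly in the spirit of [24]; the helper names below are hypothetical, and only synonymy and direct hypernymy are checked:

```python
# An illustrative sketch of scoring candidate lexical chains with WordNet,
# assuming NLTK and its wordnet corpus are installed.
from nltk.corpus import wordnet as wn

def related(word_a: str, word_b: str) -> bool:
    """True if two nouns share a synset or a direct hypernym link."""
    for sa in wn.synsets(word_a, pos=wn.NOUN):
        for sb in wn.synsets(word_b, pos=wn.NOUN):
            if sa == sb or sb in sa.hypernyms() or sa in sb.hypernyms():
                return True
    return False

def chain_score(chain: list[str]) -> int:
    # Count pairwise cohesive relations; stronger chains score higher.
    return sum(related(a, b) for i, a in enumerate(chain) for b in chain[i + 1:])
```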

Semantic Role Labelling

In text summarization, the semantic representation of text plays a very important role. Such representations can be derived with the help of the WordNet database and domain ontologies. In [25], semantic role labelling (SRL) is used to identify semantic roles, which provide useful information about the events in the text. An event or action in a sentence is represented by a predicate, typically the verb, while the remaining parts of the sentence serve as its arguments.

Machine Learning-Based Approaches

Machine learning methods learn from training data. Summarization using machine learning can be done in a supervised or an unsupervised way. In supervised methods, a summary of each input document is also provided so that the machine can learn how to summarize; in unsupervised methods, no reference summary is given, and the machine generates the summary by extracting features and analysing the document.

Different features of the text document are extracted, such as sentence length, word frequencies in the sentences, cue words, and position in the article, and combined statistically. In machine learning methods, the summarization task is treated as a classification problem. In [26], a naive Bayes classifier is used to classify sentences based on the extracted features. Using Bayes' rule, the probabilities of the different features are learned statistically from the training data; a probability score is calculated for each sentence, and based on this score, the sentence is included in the final summary.
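A minimal supervised sketch in the spirit of [26] follows; the feature set and the toy training data are invented for illustration:

```python
# Classify sentences as summary-worthy or not from surface features.
from sklearn.naive_bayes import GaussianNB

# Each row: [sentence_position, sentence_length, cue_word_count, avg_term_freq]
X_train = [[0, 18, 2, 0.40], [5, 25, 0, 0.10], [1, 20, 1, 0.35], [7, 9, 0, 0.05]]
y_train = [1, 0, 1, 0]   # 1 = include in summary, 0 = exclude

clf = GaussianNB().fit(X_train, y_train)
# P(include | features) is then used as the sentence score.
scores = clf.predict_proba([[0, 15, 1, 0.30]])[:, 1]
print(scores)
```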

Different classifiers use their own scoring functions. Decision trees, Bayes classifiers, support vector machines [27], hidden Markov models [28], and conditional random fields [29] are the most common classifiers used in text summarization tasks. The main difference is that hidden Markov models and conditional random fields explicitly model the dependency between sentences, whereas the other classifiers treat sentences as independent of each other.

Deep Learning-Based Extractive Text Summarization Approaches

With the bloom in deep learning models, neural network-based text summarization methods have attracted huge attention. The performance of these methods is much better compared to traditional methods when large training data are available. Sentence representation and sentence selection are the two main concerns in extractive summarization using deep learning: for sentence representation, different word embedding techniques together with convolutional neural networks [79], recurrent neural networks [80], and gated recurrent units [81] are used, and for sentence selection, different optimization techniques are applied. An unsupervised deep learning approach based on restricted Boltzmann machines (RBMs) has also been used for extractive summarization [97]: a set of statistical and semantic features is extracted from the input documents and enhanced using an RBM. This method performs better than the TextRank, LexRank, and LSA methods.
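A rough sketch of the RBM feature-enhancement idea of [97] is given below, using scikit-learn's BernoulliRBM; the TF-IDF features and the final scoring rule are simplifying assumptions, not the exact pipeline of the cited work:

```python
# RBM-enhanced sentence features for extractive scoring (a sketch).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import BernoulliRBM

def rbm_scores(sentences: list[str], n_hidden: int = 8) -> np.ndarray:
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences).toarray()
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=50, random_state=0)
    hidden = rbm.fit_transform(X)   # enhanced latent features per sentence
    return hidden.sum(axis=1)       # simple salience score from hidden units
```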

Table 1 presents the pros and cons of the various extractive methods discussed above, and Table 2 compares the different extractive text summarization methods based on the technique, dataset, and evaluation metric used.

Table 1 Advantages & disadvantages of extractive text summarization methods
Table 2 Comparison between different extractive summarization methods based on method, dataset, & evaluation metric used

The different extractive text summarization methods were compared on the basis of their pros and cons, the datasets used, evaluation metrics, and performance. According to our observation, graph-based methods combined with statistical methods perform better in sentence scoring and selection than the other methods. The input document length also plays a very important role in performance, and the overall summary quality in terms of coherence is better for linguistic methods than for the others. Figure 3 shows the ROUGE scores on the DUC dataset; from this analysis, the recent deep learning methods are the best among all methods on both short- and long-document datasets.

Fig. 3 Comparison of the extractive text summarization methods using ROUGE-1 & ROUGE-2 on the DUC dataset

Abstractive Summarization

In abstractive summarization, novel sentences are generated, unlike extracting text from the input document. Abstractive techniques require knowledge of natural language processing: the summary is generated by understanding and analysing the main concepts and key points of the input document using NLP techniques. These techniques produce a more concise, readable, and grammatically correct summary. Abstractive summarization also achieves a less redundant summary than extractive methods by reducing sentence size, as it uses various sentence compression, generalization, and fusion techniques to merge sentences. Abstractive summarization consists of the following steps:

i. Pre-processing

ii. Creating an intermediate representation

iii. Final summary generation.

For text pre-processing, techniques such as noise removal, tokenization, sentence segmentation, named entity recognition (NER), stop word removal, and word frequency counting are generally used in automatic text summarization tasks. After pre-processing, various feature extraction techniques are used to generate a syntactic or semantic representation of the text. Abstractive text summarization techniques are broadly classified into structure-based, semantic, and neural network methods, with sentence compression, generalization, and sentence fusion as hybrid methods.
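A hedged sketch of such a pre-processing pipeline with NLTK might look as follows (the 'punkt' and 'stopwords' resources are assumed to be downloaded):

```python
# Typical pre-processing steps: segmentation, tokenization, stop word
# removal, and word-frequency counting.
from collections import Counter
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# One-time setup: nltk.download("punkt"); nltk.download("stopwords")

def preprocess(text: str):
    sentences = sent_tokenize(text)                  # sentence segmentation
    tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    stops = set(stopwords.words("english"))
    content = [w for w in tokens if w not in stops]  # stop word removal
    return sentences, Counter(content)               # word-frequency counts
```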

Structure-Based Approaches

In structure-based techniques, different forms of structure are used to represent the text, such as templates, trees, graphs, ontologies, and rules. These methods are generally used together with other techniques such as semantic, extractive, and deep learning methods.

In template-based methods, text features are extracted using keywords, and the extracted text snippets are populated into predefined templates to form a final coherent summary. In [30], a template-based technique called "GISTEXTER" is used to create multidocument summaries: topic-related information is identified and converted into database entries, and based on the user request, sentences from the database are added to the summary. Another template-based technique is used in [31] to generate abstractive summaries, where noun phrases and hypernyms are used for template creation; for the final summary, templates are clustered by extracting the root verbs and fused using a word graph.

In tree-based approaches, extractive summarization methods are initially used to extract the important text and sentences. A shallow parser is used to identify similar sentences in the extracted text, and these similar sentences are then converted into a tree structure. For the final abstractive summary, predicate-argument structures or sentence fusion techniques are used. In [32], a dependency tree structure is used for abstractive summarization: phrases with common information are identified and combined using multisequence alignment. This approach targets multiple documents; the central theme of the documents is identified through theme selection, and a clustering algorithm is then used for sentence ordering.

Most multidocument abstractive summarization methods directly reuse existing phrase structures extracted from the input documents to generate the summary, and can therefore suffer from a lack of coherence and consistency when merging phrases. A novel approach to abstractive multidocument summarization through partial dependency tree extraction, recombination, and linearization addresses this [33]: the method generates its own topically coherent sequential structures from scratch for effective communication.

Graph data structures are widely used by many researchers in both extractive and abstractive text summarization to represent text. Abstractive methods use directed graphs with words as nodes and sentence structure as edges. In the Opinosis summarization system [34], graphs are used to generate concise summaries of highly redundant opinions. The system does not require any domain knowledge and is highly flexible: text is first represented as a textual graph, and various subpaths in the graph are then explored and scored to generate candidate abstractive summaries. Summaries generated by the Opinosis system show reasonable agreement with human summaries.

Mehdad et al. [35] extended the word graph method with the following novel contributions: (i) lexical knowledge is exploited to merge similar nodes by finding their relations in WordNet; (ii) new sentences are generated through generalization and aggregation of the original ones; and (iii) a new ranking strategy is adopted to select the best path in the graph, taking the information content and the fluency of the sentence into consideration.

An ontology is defined as a "formal and explicit specification of a shared conceptualization". Ontologies are defined for a specific domain and are usually created by domain experts [36]. Many documents on the internet are domain related, because they discuss the same concept or topic; each domain has its own knowledge structure, which can be better represented by an ontology. In these methods, sentences are reduced using reformulation and compression. Tran et al. [37] used ontologies to interpret keyword queries, translating them into description logic (DL) conjunctive queries that are evaluated with respect to the underlying knowledge base.

Hennig et al. [38] described how sentences can be mapped to the nodes of a flexible, wide-coverage ontology; this mapping provides a semantic representation of the information content of sentences that improves summarization quality. For sentence classification, a hierarchical support vector machine (SVM) classifier is used, trained on various sentence features. In [39], the authors presented a multidocument abstractive summarizer system that integrates an entity recognition and disambiguation step based on the YAGO ontology into the summarization process, to capture the actual meaning and context of the document sentences.

Lee et al. [40] introduced, for Chinese news summarization, a fuzzy ontology with fuzzy concepts to model uncertain information. The approach consists of several phases: in the first, pre-processing, phase, important keywords, terms, and sentences are extracted from the news corpus; in the second phase, meaningful terms are classified on the basis of news events.

In the next phase, fuzzy inference generates the membership degrees for each concept of the fuzzy ontology, and the final summary is then generated by a news agent based on the fuzzy ontology. Although the approach can handle uncertain data with the help of the fuzzy ontology, it is time-consuming and limited to Chinese news.

Tanaka et al. [41] analysed broadcast news syntactically with the help of the lead and body chunks of the sentences, using syntactic parsing. The method identifies common phrases in the lead and body, and then applies insertion and substitution to generate new sentences: substitution is applied if the body phrase is information-rich and has a corresponding lead phrase, and insertion is applied if the body phrase has no counterpart. The insertion step essentially ensures coherence and non-redundancy, while substitution enriches the information by substituting the body phrase into the lead chunk.

In rule-based methods [42], rules and categories are defined to find the important concepts in the input document. Based on the input document, domain questions are formed and answers are extracted by finding the relevant terms and concepts; the answers are then fed into patterns to generate the final abstractive summary.

Semantic-Based Approaches

In these methods, semantic features are extracted: the text is represented by predicate-argument structures, information items, or semantic graphs, and this representation is then given to a natural language generation system to produce the final abstractive summary.

In information item-based (INIT) methods, Genest et al. [43] used "the smallest unit of coherent information in the text", called an information item, to build the abstractive summary. All text entities, their attributes, and the predicates between them are used as information items, and coreference analysis, semantic role labelling, and predicate logic analysis are used to find them. Subject-verb-object triples are then used to create the summary sentences.

Semantic graph-based methods: These approaches are very popular and widely used in abstractive text summarization. The input text is represented through semantic relations, syntactic or ontological, between sentences: synonymy, hyponymy, hypernymy, etc. serve as ontological relations, while subject-object-verb relationships in a dependency or syntactic tree serve as syntactic relations. The input document is represented by a graph whose nodes are verbs and nouns and whose edges are their semantic relationships.

Moawad et al. [44] used an ontology-based semantic graph called the "Rich Semantic Graph" (RSG) for abstractive summarization. The approach consists of three phases: creating a rich semantic graph for the source document, reducing the generated graph to a more abstract graph, and finally generating the abstractive summary from the reduced graph. Leskovec et al. [45] represented a semantic graph as subject-predicate-object triples along with a set of other linguistic features such as coreference resolution and cross-sentence pronoun resolution; a support vector machine classifier is used to extract the sentences containing the selected triples. Abstract Meaning Representation (AMR) graphs, which are labelled, rooted, directed acyclic graphs, are also used for semantic representation in abstractive summarization [46].

Semantic text representation methods aim to analyse the input text using the semantics of words rather than the syntax or structure of the text. Foland et al. [47] proposed a framework for multidocument abstractive summarization based on a semantic representation of the input documents. The semantics of words are analysed assuming the text is anaphora-resolved and sense-disambiguated; the most significant predicate-argument structures are used for content selection, and the summary is generated using a language generation tool.

To extract the predicate-argument structure of each sentence, semantic role labelling is used [48], with the SENNA semantic role labeller API assigning the sentence position numbers. A similarity matrix is constructed from the semantic graph to obtain semantic similarity scores, and a modified graph-based ranking algorithm then combines semantic similarity, predicate structure, and document set relationships. Finally, Maximal Marginal Relevance (MMR) is used to reduce redundancy in the summary.
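MMR itself is compact enough to sketch directly; the implementation below is generic, with sim() standing for any pairwise similarity function and lam for the relevance/redundancy trade-off:

```python
# A compact sketch of Maximal Marginal Relevance (MMR) selection.
def mmr_select(candidates, query, sim, lam: float = 0.7, k: int = 3):
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr(c):
            # Relevance to the query minus similarity to what is already chosen.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * sim(c, query) - (1 - lam) * redundancy
        best = max(pool, key=mmr)
        selected.append(best)
        pool.remove(best)
    return selected
```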

Deep Learning-Based Abstractive Methods

Recently, neural network-based deep learning models have been applied successfully to various complex NLP and computer vision tasks, and neural text summarizers have attracted considerable attention for automatic summarization. Deep learning is a branch of machine learning involving the learning and training of data-driven models. For abstractive summarization, the performance of deep learning methods is better than that of traditional methods. Most neural network-based text summarizers follow these steps for summary generation:

i. Input document words are transformed into continuous vectors called word embeddings, which capture semantic similarity, relations, and context with the other words in the document. Neural word embeddings such as Word2Vec (word to vector) and GloVe (global vectors for word representation) are widely used in NLP applications [49, 50].

ii. Different neural network models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and long short-term memory networks (LSTMs), are used as encoders for extracting document features.

iii. These features are then fed to a decoder model (RNN or LSTM) for selection in extractive summarization or for generation in abstractive summarization (a minimal sketch of this encoder-decoder pattern follows the list).
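The following bare-bones PyTorch sketch illustrates steps i-iii; vocabulary handling, attention, and training are omitted, and the layer sizes are arbitrary assumptions:

```python
# A minimal encoder-decoder skeleton for abstractive summarization.
import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # step i
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)  # step ii
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)  # step iii
        self.out = nn.Linear(hidden, vocab_size)   # distribution over words

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.embed(src_ids))  # document features
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)                      # logits per time step

model = Seq2SeqSummarizer(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 40)), torch.randint(0, 10_000, (2, 12)))
print(logits.shape)   # (batch, target_len, vocab_size)
```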

In abstractive text summarization, a semantic representation of the whole document is captured, and the final summary is generated from this representation. In neural abstractive summarization, an encoder represents the whole document and a decoder converts this representation into word sequences. In this paper, we survey these methods based on the encoder-decoder architecture.

To capture the meaning representation, three different kinds of encoders were proposed in [52]. The first is a bag-of-words encoder that computes the sum of the sentence word embeddings; this encoder does not preserve word order. The second is a CNN model: in each convolution layer, a sequence of feature vectors is extracted, and max-pooling layers reduce the feature vectors by a factor of two. The third encoder is attention-based: at each time step, a document representation is produced based on the previous context words generated by the decoder, and a feed-forward neural network language model is used as the decoder for estimating the output probability distribution. Nallapati et al. [53] used gated recurrent units with hierarchical attention, along with additional linguistic features such as named entity tags, part-of-speech tags, term frequency, and inverse document frequency of each word.

In the pointer-generator network model [54], a single-layer bidirectional LSTM is used as the encoder; attention weights and the encoder's hidden states form the document representation, and the decoder is a single-layer unidirectional LSTM. A coverage mechanism is proposed to penalize repeated attention on already attended words. Cohan et al. [55] proposed a long-document abstractive summarization model and introduced two new large-scale datasets of long, structured scientific papers obtained from "arXiv" and "PubMed". Their model includes a hierarchical encoder for capturing the discourse structure of the document and a discourse-aware decoder for generating the summary; the decoder attends to different discourse sections, allowing the model to represent important information from the source more accurately. In [98], another approach for long-document summarization is discussed, in which salient sentences are selected and trained with a classifier, and the pretrained "BART (Bidirectional and Auto-Regressive Transformers)" model is then used for abstractive summarization.

Table 5 compares the various deep learning techniques used in abstractive summarization based on the framework, dataset, training method, and evaluation metric used.

Hybrid Methods: Sentence Compression, Fusion & Generalization

Apart from structure-based, semantic, and deep learning approaches, abstractive summaries can also be generated by hybrid methods involving sentence compression, fusion, and paraphrasing. Hybrid methods combine extraction and abstraction: sentences are initially selected with extractive methods, and abstractive techniques, namely sentence compression, fusion, and generalization, are then used to generate the summary. These techniques play a very important role in NLP for generating abstractive summaries. In sentence compression, sentence length is reduced while the original meaning is retained; in summarization and question answering, compression addresses the problem of removing words or phrases that are not important or necessary in the generated output. Discourse- or tree-based structures are mostly used in sentence compression techniques.

Galley et al. [56] treated sentence compression as an optimization problem, using integer linear programming (ILP) to infer optimal compressions in the presence of linguistically motivated local and global constraints. Delete-based and generate-based approaches are also used in sentence compression [57]: in delete-based approaches, the final sentence is created by deleting unimportant words and connecting the remaining text, whereas generate-based models use operations such as insertion, substitution, and replacement. Probabilistic approaches have also been applied [58], using decision trees and a generative noisy-channel model to compress sentences.
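For illustration, a naive delete-based compression heuristic can be sketched with spaCy by dropping selected modifier subtrees; the set of deletable relations below is an assumption, and learned models decide such deletions far more carefully:

```python
# An illustrative delete-based compression heuristic, assuming spaCy and its
# small English model (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
DELETABLE = {"advmod", "amod", "appos"}   # modifier relations to drop

def compress(sentence: str) -> str:
    doc = nlp(sentence)
    # Drop each deletable modifier together with its whole subtree.
    removed = {t.i for tok in doc if tok.dep_ in DELETABLE for t in tok.subtree}
    return "".join(t.text_with_ws for t in doc if t.i not in removed).strip()

print(compress("The extremely long report, a dense document, arrived late."))
```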

Knight et al. [59] proposed a model based on a discriminative large-margin learning framework with compressed bigram features; for sentence compression, deep syntactic analysis and representation are performed with the help of a dependency parser and a phrase-structure parser. In [60], a three-step hybrid approach is proposed to generate an abstractive summary. First, sentence clusters are generated from sentence-level relationships using the Markov clustering principle. Second, sentences are ranked within each cluster, and the top-weighted sentence of each cluster is fused with others in that cluster using linguistic rules to generate a new sentence. Third, the top-ranked sentences from each cluster are compressed using a support vector machine classification technique to produce the abstract summary.

In [61], a multidocument summarization approach called "TRIMMER" is used, consisting of a three-stage process. First, a syntactic trimmer produces multiple trimmed versions of each sentence in each document of a topic set. Each trimmed candidate is then given a relevance score, either to a query or to the topic set as a whole, using eight different features such as position, sentence relevance, and document relevance. Finally, sentences are chosen according to a linear combination of these features.

Generally, in sentence fusion and generalization techniques, extractive summarization methods are used to extract the important sentences, and merging and fusion techniques are then applied to those sentences to generate the abstractive summary. A fusion tree is created from the common information extracted by aligning the dependency structures of the sentences; alignments can be at the word, phrase, or substring level and help reveal how the words are related to each other. A natural language generation system then creates the final summary from the best paths of the fusion tree [62]. A generalization technique helps compress the text by replacing several concepts with one.

In [63], a concept generalization and fusion approach is suggested for abstractive sentence generation: generalizable sentences are first obtained with an extraction and deletion method, and the sentence size is further reduced using NLTK corpora and machine learning. In [64], the authors introduce a novel text-to-text generation technique that synthesizes common information across documents for sentence fusion; bottom-up local multisequence alignment is used to identify the sentences and phrases conveying similar information. In [32], the authors present a novel unsupervised sentence fusion method using German biographies as the corpus: for a group of related sentences, a dependency graph is built by aligning their dependency trees, integer linear programming compresses the dependency graph into a dependency tree, and GermaNet and Wikipedia are used to check the semantic compatibility of co-arguments.

In [65], an accurate and fast summarization model is proposed that first selects the salient sentences and then rewrites them abstractively, by compression and paraphrasing, to generate a concise overall summary. A sentence-level reinforcement learning technique is used to exploit the word-then-sentence hierarchical structure while maintaining language fluency.

Table 3 presents the advantages and disadvantages of the various abstractive methods discussed above. Initially, most abstractive text summarization methods were based on templates, rules, and ontologies; these methods are time-consuming, require domain expertise, and do not scale to different domains. Recent deep learning methods require a lot of labelled data and high-end machines to process it, but they handle semantic structure, context, and diverse domains with ease.

Table 3 Advantages and disadvantages of abstractive text summarization approaches

Tables 4 and 5 compare the various abstractive text summarization methods based on the technique, dataset, and evaluation metric used. According to our observation, the DUC dataset is used for most techniques, while recent deep learning methods also use the CNN/Daily Mail and Gigaword datasets. For abstractive methods, the ROUGE-1 scores of deep learning methods are better than those of the other methods; Fig. 4 shows the ROUGE scores on the DUC dataset.

Table 4 Comparison between different abstractive summarization methods
Table 5 Comparison of abstractive summarization techniques using deep learning
Fig. 4 Comparison of the abstractive text summarization methods using ROUGE-1 on the DUC dataset

Text Summarization Evaluation

Text summarization evaluation is one of the most difficult tasks, because there is no single ideal summary for an input document, and judging which summary is good is very difficult: different humans can generate different summaries for the same document, and the evaluation must also be accurate and fast. The use of various evaluation metrics and the lack of a standard one make summary evaluation very challenging [67]. Evaluation methods for text summarization can be classified as intrinsic or extrinsic [68]. Intrinsic evaluation assesses the summary in and of itself, based on its overall informativeness and coherence, whereas extrinsic evaluation measures the impact of the summary on the performance of other tasks, such as question answering, information retrieval, and categorization.

Intrinsic evaluation is done by comparing the automatically generated summary with a human-written reference summary. Intrinsic methods assess either text quality or content [69]. Text quality concerns linguistic aspects of the generated summary, such as clarity, grammaticality, and coherence. Content evaluation can in turn be co-selection-based or content-based: co-selection uses metrics like precision, recall, and F-score, while content-based measures include cosine similarity, "Recall-Oriented Understudy for Gisting Evaluation" (ROUGE), and the PYRAMID score. From the literature survey, we found that precision, recall, F-score, and ROUGE are the measures generally used in summarization tasks.

For a text summarizer, given the input document, the reference summary, and the generated summary, we can determine the true-positive, false-negative, and false-positive relations, from which precision, recall, and F-score are defined. "Precision is the quantity of right information recovered by a system compared to what it has recovered", i.e., the ratio given in Eq. (1):

$$P=\frac{\text{True Positives}}{\text{True Positives}+\text{False Positives}}.$$
(1)

"Recall is the quantity of right information recovered by a system compared to what it should have recovered", as given in Eq. (2):

$$R=\frac{\text{True Positives}}{\text{True Positives}+\text{False Negatives}}.$$
(2)

The most widely used F-score is the F1-score, a trade-off between precision and recall, given in Eq. (3):

$$F=\frac{2\times P\times R}{P+R}.$$
(3)
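Equations (1)-(3) translate directly into code when summaries are treated as sets of selected sentences:

```python
# Precision, recall, and F1 over sentence sets: sentences picked by the
# system vs. sentences in the reference extract.
def prf(system: set, reference: set):
    tp = len(system & reference)                       # true positives
    precision = tp / len(system) if system else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```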

These evaluation measures are not useful when the system selects sentences other than the reference sentences, so the ROUGE score is generally used in abstractive summarization.

"Recall-Oriented Understudy for Gisting Evaluation" (ROUGE), proposed in [70], is inspired by the "BiLingual Evaluation Understudy" (BLEU) score [71] used for machine translation evaluation. The number of N-grams common to the system summary and the reference summary is computed, and a recall value is calculated from it. Since one input document can have many reference summaries in text summarization tasks, these metrics allow the use of multiple references. Several variations of the ROUGE score have been proposed [72]: "ROUGE-N", "ROUGE-L", "ROUGE-W", and "ROUGE-S". Equation (4) shows the ROUGE-N calculation:

$$\text{ROUGE-}N= \frac{\sum_{S \in \text{SummRef}}\sum_{\text{gram}_N \in S} \text{Count}_{\text{match}}(\text{gram}_N)}{\sum_{S \in \text{SummRef}} \sum_{\text{gram}_N \in S} \text{Count}(\text{gram}_N)},$$
(4)

where $N$ is the N-gram size, $\text{Count}(\text{gram}_N)$ is the number of N-grams in the reference summaries, and $\text{Count}_{\text{match}}(\text{gram}_N)$ is the number of N-grams co-occurring in both the candidate and the reference summaries. Using such similarity scores for text summarization has some limitations, such as human variation, granularity of analysis, and semantic equivalence; to address these, the semi-automatic Pyramid score is used [73].
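Equation (4) can be implemented in a few lines; the tokenization below (lower-cased whitespace splitting) is a simplification, and for real evaluations an established implementation such as the rouge-score package should be preferred:

```python
# A minimal ROUGE-N (recall) computation following Eq. (4).
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, references: list[str], n: int = 2) -> float:
    cand = ngrams(candidate.lower().split(), n)
    match = total = 0
    for ref in references:                    # multiple references allowed
        ref_counts = ngrams(ref.lower().split(), n)
        # Clipped counts: an N-gram matches at most as often as it appears
        # in the candidate.
        match += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return match / total if total else 0.0
```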

Resources

One of the major challenges in text summarization is the lack of resources, especially for abstractive text summarization. Nowadays, many powerful tools and natural language processing libraries are available for pre-processing, parsing, etc. Besides tools and libraries, annotated corpora can also be seen as a big challenge for text summarization. In this section, some of the standard available datasets, which can be helpful to researchers, are discussed.

Document Understanding Conference (DUC) Datasets

The DUC datasets are provided by the "National Institute of Standards and Technology" (NIST) and are the most commonly used datasets in automatic text summarization research. The DUC conference released them as part of its summarization shared task; the DUC 2001 to 2007 datasets are available on the DUC website. Each dataset contains three kinds of summaries: manual summaries, baseline summaries, and the summaries generated for the challenge. In 2008, DUC became a summarization track within the Text Analysis Conference (TAC).

Gigaword Dataset

The Gigaword dataset contains nearly 10 million documents (over four billion words) from the original English Gigaword Fifth Edition [74]. It consists of articles and their headlines and is mostly used for abstractive summarization with neural networks.

CNN/DailyMail Dataset

This is an English-language dataset containing just over 300,000 unique news articles written by journalists at CNN and the Daily Mail. The original version was created for machine reading comprehension and abstractive question answering; the current version supports both extractive and abstractive text summarization tasks.

Opinosis Dataset

The Opinosis dataset contains sentences extracted from user reviews on given topics. There are 51 topics in total, each with approximately 100 sentences on average. The reviews were collected from sources such as Amazon (electronics), Tripadvisor (hotels), and Edmunds (cars). The dataset also contains gold-standard summaries.

ArXiv Dataset

The arXiv dataset is a free, open, machine-readable repository and pipeline of 1.7 million articles from arXiv, with relevant features such as article titles, authors, categories, abstracts, full-text PDFs, and more [75]. Apart from text summarization, this dataset can be used in various other applications such as trend analysis, paper recommender engines, category prediction, co-citation networks, knowledge graph construction, and semantic search interfaces.

Discussion on Issues/Challenges in Automatic Text Summarization

There has been a lot of research in the field of text summarization, and new methods and approaches have been developed to improve on existing techniques. However, many challenges still hinder progress in automatic text summarization. This section discusses the important issues and challenges in text summarization research that need to be addressed by the research community; a few are listed below.

Dataset Availability

Large, high-quality datasets play a very important role in automatic text summarization. From the survey, we have seen that most of the available datasets consist of short news articles, such as DUC and TAC; the Gigaword and CNN/Daily Mail datasets are the ones most used for abstractive summarization with deep learning. Recently, two new long-document datasets of scientific articles, arXiv and PubMed, were introduced in [55]. Since deep learning methods have proved very effective for abstractive summarization, more work is required to create good datasets for task- and query-specific abstractive summarization; such datasets would be very helpful for personalized and sentiment-based summaries.

Language Support

In automatic text summarization, most of the work is on the English language, because it is the most widely used language worldwide and many natural language processing tools and resources, such as stemmers, parsing tools, and stop word lists, are available for it. Other languages, in contrast, face various constraints as far as computational tools and resources are concerned, such as missing NLP tools, spelling variations, and different punctuation conventions [76]. Therefore, to improve the quality of current summarization systems for other languages, NLP tools and resources such as part-of-speech taggers, named entity recognizers, and parsers need to be developed and improved.

Scalability

In text summarization, most of the work is performed on simple and compound sentences; complex compound sentences also need to be considered, and more scalable approaches are required for them. Summarization using deep learning requires a lot of data and computing power, and slow encoding mechanisms make it hard to obtain good results on long documents.

Algorithm for Sentence Generalization & Fusion

In abstractive summarization, sentence fusion, paraphrasing, and generalization are among the most challenging tasks. These techniques require extensive knowledge of natural language processing, and not much work has been done in this area. Most of the available methods are rule-based and require exponential space complexity; thus, more focus and work are required to find better algorithms for sentence generalization and fusion.

Issue of Unknown/Rare Words in Deep Learning Approaches

The problem of rare and unknown words is an important issue that can potentially affect the performance of many NLP systems, including traditional count-based and deep learning models [77]. Rare words occur less frequently in the training set and are therefore difficult to represent well, resulting in poor performance. This is a major challenge in text summarization as well, due to which these methods often fail to preserve the meaning of the final summary.

Summary Evaluation

Summary evaluation is a very challenging task, whether done manually or automatically, because it is hard to define an ideal or perfect summary. In text summarization assessment, the ROUGE score is mostly used for evaluation, but for abstractive summaries, it may not be a good metric, since it only matches N-grams and measures the coverage of the summary, not its coherence or non-redundancy. Therefore, evaluation measures that can find the semantic overlap between sentences are needed.

Conclusion

Automatic text summarization is becoming increasingly important due to the availability of vast amounts of data. This paper surveys the recent research and progress in automatic text summarization methods and compares them on different parameters. Extractive and abstractive text summarization methods are studied in detail and presented. For extractive summarization, unsupervised graph-based methods perform better than the others on short documents, whereas recent deep learning methods are better on large datasets and long documents; however, the resulting summaries lack various linguistic qualities such as coherence, sentence cohesion, and coreference. Abstractive summaries are grammatically more correct and coherent, but these methods are more difficult and challenging, as they require natural language generation. Evaluation of summaries is also challenging: apart from automatic measures like the ROUGE score, it requires human evaluation, which is tedious and time-consuming. Finally, the various challenges and gaps in existing methods are discussed, which can help researchers identify areas where further research and improvement are needed.