Keywords

1 Introduction

The main objective behind the idea of summarization [1] of a text is to reduce the total size of the given text by removing the unnecessary parts and keeping the necessary important information and the overall meaning same. These days, sharper and more compact text summarization is highly needed and appreciated as it reduces the time required to read the lengthy articles. Text Summarization is basically the concept of analyzing the data and extracting the important and necessary information from it and to form a concise paragraph while preserving the initial meaning of it. It also plays an important role in today’s age of huge data available online. There are several techniques to determine and replace untrustworthy data.

Text Summarization is mainly of two types: Extractive summarization [2] creates the summary by concatenating the key passages from the text. The text’s meaning, in this approach, is ignored and only the subset of key sentences is highlighted. Abstractive summarization [3] analyzes and provides a sound and concise understanding of the text. Using the techniques of NLP [4], this approach tries to figure out the text’s meaning and identify the key sentences in it. In contrast to the extractive approach, it tries to generate new words to form the summary. Abstractive summarization is preferred and used over extractive summarization because it generates a concise and effective summary that a human can construct from it’s original text thus providing an improved comprehension of the summary [5]. As the model was developed for PIB India [6] to summarize their press releases, it was important in generating summaries that the readers can have an improved understanding of and relate to. Hence the approach of abstractive summarization has been followed here.

The summary generated by the existing summarization techniques has certain insufficiency such as repetition of same sentences or phrases, inclusion of a high number of named entities, and lower degree of abstraction. Hence, in this work, a novel architecture, BASiP, is proposed for text summarization. The existing models are contrasted with the proposed architecture. and the results found are promising. At the end, certain case studies are supplied on different articles of PIB as a reference. In the application, a text-to-speech converter is also added to read out the summarized news for visually impaired individuals.

The remaining paper is organized as follows: In Sect. 2, a literature review on existing summarization techniques are discussed. In Sect. 3, an overview of the proposed architecture, BASiP, is given. In Sect. 4, the various experimental results are discussed. Finally, the conclusion is given in Sect. 5.

2 Literature Review

Text summarization has been extensively used and researched upon in recent years. Abstractive summarization techniques have vastly emerged in this field due to their ability to learn complex representations of input data. In this literature review, we explore the recent advances in the field of text summarization models.

In [7], one common approach of the sequence-to-sequence (Seq2Seq) model including attention mechanism has been evaluated. The Seq2Seq model produces a summary by mapping the inputted sequence of words to the output summary using an encoder-decoder architecture. This mechanism helps the model focus on the most relevant areas of the inputted sequence while generating the summary. However same sentence or phases often get repeated in the produced summary, and the model struggles with handling rare or unknown words, which can lead to incomplete summarization.

Factual consistency of the abstractive text summarization method has also been identified in [8]. This mechanism enhances this models’ factual consistency by identifying important entities in the inputted text and ensures that those entities are accurately represented in generated summary. It works by first identifying entities in the input text using an off-the-shelf Named-Entity Recognition (NER) system. It then trains a separate model to predict the importance of each entity for the overall meaning of the text. Finally, during summarization, the mechanism ensures that important entities are precisely shown in the generated summary by using the predicted importance scores to guide the selection and generation of summary content. However, there is still room for improvement, and more ways can be researched upon and explored in future researches to further enhance the accuracy of entity importance prediction and develop more advanced mechanisms for incorporating entity-level information into the summarization process. Another promising approach is the use of pre-trained language models such as BERT for the summarization of text that has been provided by [9, 10]. These models have produced excellent results on a range of natural language processing tasks and have been used for summarization nowadays.

One advantage of pre-trained models is that they can leverage large amounts of unsupervised data to learn rich representations of language. Reinforcement learning has also been explored for text summarization in [11, 12]. These models use a reward function to enhance the generated summary and are showcased to produce summaries with improved fluency and informativeness. However, reinforcement learning requires careful tuning of the reward function and can be computationally expensive. Graph-based models have also been suggested for in [13, 14]. These models represent the input document as a graph and use graph algorithms to extract important nodes and edges for the summary. Graph-based models can capture both local and global dependencies between sentences and are showcased to produce high-quality summaries.

A novel attention mechanism for abstractive text summarization that takes into account discourse information to better capture the structure and coherence of long documents has been put forward in [15]. The discourse-aware attention module uses a graph neural network to model the discourse relationships between sentences and learn discourse-aware representations for each sentence. These representations are then used in the sentence-level attention module to compute the appropriateness of each sentence to the summary. Although the research that will be conducted in future could explore ways to enhance the model’s discourse modeling capabilities by incorporating more sophisticated graph neural networks or other methods for capturing more complex discourse relationships between sentences, [16] incorporates multiple sources of guidance to boost the accuracy and informativeness of abstractive text summarization. The proposed model, known as GSum, consists of many modules that cooperate to deliver summaries that are influenced by the input data. The first module analyzes the input text to find keyphrases and entities, while the second module creates an extractive summary using the text’s most essential passages. The third module generates an abstractive summary using an attention-based neural network that is guided by the extracted summary and keyphrases/entities. The GSum model’s main novelty is the inclusion of multiple sources of guidance, which enables the model to generate summaries which are more precise and useful than those produced by models that just depend on one source of information. However, the incorporation of other sources of guidance, such as discourse or sentiment information, can be done to further improve the quality and coherence of the generated summaries.

Because of the huge amount of data, the data can also be stored in the cloud, and data privacy should also be maintained [17,18,19].

Table 1 presents a gist of the literature review.

Table 1 Summary of the existing work

3 Methodology

In this work, a novel abstractive text summarization architecture, ‘BASiP’, is proposed, which is a combination of BART, SimCLS Framework, and Paraphrasing model. Figure 1 shows the proposed architecture, BASiP. The summarization model ‘f’ which uses an evaluation metric M, that depends on the source dataset D, aims to provide a summary ‘S’ of the candidate S = f(D) [20]. This receives the greatest ROUGE score m, where m is defined as

$$\begin{aligned} {}M(S, S\,\hat{\,}){}\end{aligned}$$
(1)
Fig. 1
A flowchart of a data processing pipeline. It starts with a raw dataset and ends with a summarized text, passing through steps like preprocessing, model application, and evaluation.

Proposed Architecture BASiP(BART+SimCLS+Paraphraser)

In our suggested approach, the entire generation process is divided into a number of stages, each of which consists of a base model BART, for producing candidate summaries and a robust framework SimCLS for optimizing the summary.

  • Step 1: Preprocessing: Incorrect interpretation of the overall statistics of the data may result from duplicate or missing values. The overall learning of the model is frequently disrupted by outliers and inconsistent data points, which leads to inaccurate predictions. That is why the raw dataset D is being preprocessed which further includes the steps of tokenization and fine-tuning. In machine learning models, tokenization and fine-tuning are used for text data representation, improvement of model performance, to reduce training time, and to improve accuracy. NLP-based Text2Text Generation tokenizer BartTokenizer from “Yale-LILY/brio-cnndm-uncased” is used for tokenization purposes. Tokenizing a text consists of the following steps:

    • Sentence segmentation: Divide the text into sentences.

    • Word tokenization: Divide each and every sentence into individual words or tokens.

    • Removing stop words: Stop words are defined as frequently used words such as ‘and’, ‘the’, and ‘of’ which adds semantic meaning to the text. The stop words are cleaned from the text to reduce it’s size and speed up the training process.

    • Removing punctuation and special characters: Remove punctuation marks and special characters from the text to reduce the size of the data and eliminate any potential distractions for the algorithm.

    Once the text data has been tokenized, it can be further fine-tuned by preparing the data by tokenizing the text and encoding it into a numerical format that the model can process. It is important to note that the fine-tuning process can be time-consuming and computationally expensive, as the model must be trained on a large amount of data to achieve good performance. However, the results can be very powerful, as the pre-trained language model has already learned gained knowledge about the framework of the given text, which can be leveraged to perform various types of NLP tasks with high accuracy.

  • Step 2: BART: BART is the summarization model developed by Facebook AI Research. BART uses a sequence-to-sequence model for summarization, which means it takes a sequence of text as input and produces a sequence of output text. BART achieves summarization of text by using the encoder-decoder architecture [21]. The encoder takes the text and converts it into a series of hidden states, which capture the input text’s meaning. After the BartTokenizer encodes the input text into a numerical format, the parent model BART processes the data to give a summarized encoded output. The decoder generates the human-readable output summary from the hidden states. To summarize text, BART is typically fine-tuned on a dataset of paired input and summary texts. During fine-tuning, this model is trained to produce a summary that captures the most significant information in the given input text while keeping the summary concise. BART uses a combination of attention mechanisms and beam search to generate accurate and fluent summaries.

  • Step 3: SimCLS Framework: The summary S1 is then fed as input into SimCLS to further optimize the summary generated in step 2. SimCLS is an abstractive summarization model. It uses a two-stage approach that comprises a generator and a scorer [22]. In the first stage, the generation model g(\(\cdot \)) is trained to maximize the likelihood of reference summary S\(\hat{\,}\) given dataset, D. g(.) is a Seq2Seq model. After that, using an instance approach like Beam Search on the pre-trained g(\(\cdot \)), many candidate summaries S–1, ..., S–n are produced, where n = the number of sampled candidates. In the second stage, the scorer assigns a score to each candidate given the source document. The main motive is to improve the produced candidate summary Si to increase the ROUGE score in comparison to the original text D. It is addressed using contrastive learning and construct an evaluation function h(\(\cdot \)) that seeks to distinguish the generated candidates by giving them different ROUGE scores r1, ..., rn on the basis of the similarity found between the source text and the candidate Si. That is

    $$\begin{aligned}{}ri = h(Si , D).{}\end{aligned}$$

    This value is the cosine similarity generated among the first tokens when encoded. The candidate which has the highest rating is the final summary of the output S.

  • Step 4: Paraphrasing: To provide a more precise and sound output, as a part of the package, we have introduced a paraphrasing part that has the main objective of improving any lousy summary into a more grammatically accurate one. The paraphraser subdivides and collects the article into each sentence. It then reframes each sentence individually and before giving the final output, it joins all of them together. Sometimes, it does this to make the summary more fluent and to strengthen the sentence structure by substituting the terms with their synonyms. This beautifies the generated summary S making it more appropriate and industry-ready for usage.

4 Experimental Results

The data sets which are used for testing the efficiency of the suggested model BASi(BART+SimCLS) are CNNDM and XSUM. CNNDM stands for Cable News Network and Daily Mail dataset. This is an English language dataset that contains over 300k news articles which are written by the journalists of the CNN and the Daily Mail. This dataset supports data for extractive as well as abstractive summarization [23, 24]. XSUM stands for Extreme Summarization dataset which is used for evaluating abstractive single document summarization systems. This mainly aims to generate short single sentence precise summary for an article [25]. ROUGE, also known as the Recall Oriented Understudy for Gisting Evaluation, is a set of measurement metrics which is used for calculating and evaluating the summarization generated automatically by natural language processing. It mainly has 3 main parameters of measurement, Recall, Precision, and F1 Score which provides an analysis of the automatic summarized data when compared to the original summary [26, 27]:

  • $$\begin{aligned} {} {\text {Recall}}= \frac{\text {No of Word matches}}{\text {No of Words in References}}{}\end{aligned}$$
    (2)
  • $$\begin{aligned} {} {\text {Precision}}= \frac{\text {No of Word matches}}{\text {No of Words in Summary}}{}\end{aligned}$$
    (3)
  • $$\begin{aligned} {} {\text {Recall}}= 2\bigg (\frac{\text {Precision * Recall }}{\text {Precision + Recall}}\bigg ){}.\end{aligned}$$
    (4)

Table 2 depicts the ROUGE value calculated on the factors of R1, R2, and RL.

Table 2 Comparison of BASi with existing work

The results from the following table provide a comparative study about the different existing models and the proposed model, BASi. In this work, BART has been implemented as the initial model for the CNNDM dataset, while for the XSUM dataset, the initial model is PEGASUS. The XSUM dataset generates a single sentence dataset, and thus to achieve finer results, the backbone of the model here is the pre-trained PEGASUS model [28, 29].

Fig. 2
A group column chart plots values versus rouge 1, 2, and L. It plots values for B A R T, B A S I, and B A S I P in three groups. For all the groups, the value of B A S I is the highest.

Comparison of ROUGE values of BART, BASi, and ‘BASiP’

It can be seen from Fig. 2, BASi outperforms the other models in a two-stage summarization framework for the datasets XSUM and CNNDM. Models like GSUM require additional assistance for input along with a separate encoder to encode the information, while the proposed model, BASi, uses the techniques used in BART. Thus, BASi outperforms different models and provides an effective summarization technique.

Table 3 shows some example references from PIB and the summary generated by our proposed model, BASiP. The findings of BASiP and BART show how our strategy aids the abstractive model in removing superfluous characters and symbols from the original input. BASiP learns to resolve unnecessary extra characters which BART cannot. As can be seen from Table 3, the first summary generated by BART contains ‘/’ at the start and end of several words, the second summary is made up of the meaningless term ‘Dsy’, and the third summary of the preceding table lacks a numerical value following ‘Rs’. Instead, our proposed model ‘BASiP’ learned to disregard or change these error patterns and never generated them over the whole test set. This was probably because it noticed that candidates with this pattern were rarely generated with high ROUGE scores and appropriately de-weighted the probability.

5 Text-to-Speech Conversion

The summarized text by the proposed architecture ‘BASiP’ was converted to speech for visually impaired users who would use the proposed application. The content of the website can be read aloud by a synthetic voice when the text is selected, enabling persons with visual impairments to access information without the use of traditional visual cues. This significantly enhances the overall user experience and broadens the site’s accessibility.

Table 3 Case study on PIB dataset

In the literature, sophisticated methodologies have been used for visual speech recognition [33]. In this work, an inbuilt library of html-5 called responsive voice has been used [34].

6 Conclusion

In this work, a novel architecture, BASiP, has been proposed for abstractive text summarization. The experimental results of BASi (BART + SimCLS) on comparison with the existing work shows the effectiveness of the proposed architecture. BASi outperforms the other models in a two-stage summarization framework for the datasets XSUM and CNNDM. Moreover, a few case studies has also been provided which shows that BASiP constructively generates summary from the PIB references. The future work involves generation of summary for text given in tabular format and representation of the same in a concise and meaningful way.