
1 Introduction

Recent advancements in machine learning and natural language processing research have paved the way for the development of sophisticated LLMs. Their widespread availability and the ease with which they can generate coherent content are contributing to the production of massive volumes of automatically generated online content. LLMs have demonstrated remarkable performance in producing human-like language, showcasing their potential across a wide range of applications, such as domain-specific tasks in legal [20] and financial services [23]. Foundation models such as OpenAI's GPT-3 [1] and BigScience's BLOOM [19] are publicly available and can generate highly sophisticated content from basic text prompts. This often makes it challenging to discern between human-written and LLM-generated text.

While LLMs demonstrate the ability to understand the context and generate coherent human-like responses, they do not have a true understanding of what they are producing [12]. This could potentially lead to adverse consequences when used in downstream applications. Generating plausible but false content (hallucination [10]) may inadvertently help propagate misinformation, fake news, and spam [9].

There is a considerable body of research available on detecting text generated by artificial intelligence (AI) systems [9, 21]. However, the identification of a specific LLM responsible for generating such text is a relatively new area of research. We argue that attributing the generated text to a specific LLM is a vital research area, as the knowledge of the source LLM would enable one to be vigilant regarding potential known biases and limitations associated with that model and use the content appropriately in downstream applications with suitable oversight [21].

In this study, we focus on identifying the source of AI-generated text (referred to as model attribution hereafter) in two different languages, English and Spanish. More specifically, given a piece of text, the goal is to determine which specific LLM generated the text. To address this problem, we propose an ensemble classifier, where the probabilities generated by various state-of-the-art LLMs are used as input feature vectors to traditional machine learning classification models that produce the final predictions. Our experiments show that multiple instances of the proposed framework outperform several baselines when evaluated with well-established metrics.

2 Related Work

The majority of research in this area is focused on differentiating between text authored by humans and text generated by AI [3, 17].

The use of neural networks leveraging complex linguistic features and their derivatives is most prevalent in detecting AI-generated text. DetectGPT [15] generates minor perturbations of a passage using a generic pre-trained Text-to-Text Transfer Transformer (T5) model, and then compares the log probability of the original sample with each perturbed sample to determine if it is AI-generated. Deng et al. [4] build upon the DetectGPT model by incorporating a Bayesian surrogate model to select text samples more efficiently, which achieves similar performance to DetectGPT using half the number of samples. Mitrovic et al. [16] developed a fine-tuned Transformer-based approach to distinguish between human-written and ChatGPT-generated text, with the addition of SHapley Additive exPlanations (SHAP) values for model explainability. This approach provides insight into the reasoning behind the model's predictions. Statistical methods have also been applied for the detection of AI-generated text, such as the Giant Language model Test Room (GLTR) approach [6].

The increasing sophistication of generative AI models, coupled with adversarial attacks, makes the detection of AI-generated text especially challenging. Two forms of attacks that create additional complications are paraphrasing attacks and adversarial human spoofing [17]. Automatically generated text may also show factual, grammatical, or coherence artifacts [14], along with statistical abnormalities that affect the distributions of automatically generated and human-written texts [8]. The importance of detecting AI-generated text and the corresponding challenges will foster further research on this topic.

In addition to distinguishing between human and AI-generated text, identifying the specific LLM that generated the artificial text is becoming increasingly important. Uchendu et al. [21] explored the Robustly optimized BERT approach (RoBERTa) model to classify AI-generated text into eight different classes. Li et al. [11] developed a model for AI-generated multi-class text classification for the Russian language, using Decoding-enhanced BERT with disentangled attention (DeBERTa) as a pre-trained language model for category classification. These prior works focused on model attribution for only a single language, such as English or Russian. In contrast to the aforementioned research, and to the best of our knowledge, our approach to model attribution is the first to be applied across multiple languages, demonstrating the robustness of our approach across attributable LLMs, languages, and domains.

3 AuTexTification Dataset

The dataset used in this study comes from the Iberian Languages Evaluation Forum (IberLEF) AuTexTification shared task [18]. The data consists of texts from five domains, where three domains (legal, wiki, and tweets) are used for training, and two different domains (reviews and news) are used for testing. It contains machine-generated text from six text generation models, labeled bloom-1b7 (A), bloom-3b (B), bloom-7b1 (C), babbage (D), curie (E), and text-davinci-003 (F), for two different languages, English and Spanish. The LLMs used to generate the text have increasing numbers of parameters, ranging from 2B to 175B. The motivation is to emulate realistic AI text detection settings, which require approaches versatile enough to handle a diverse set of text generation models and writing styles. The number of samples in each class for both languages is shown in Table 1. To showcase the complexity of the problem, we also present samples for each category from both the English and Spanish datasets in Tables 2 and 3.

Table 1. Label distribution across the languages for model attribution task. Train and test splits for each language are also shown.
Table 2. Samples of English AI-generated text, with corresponding source models (labeled A-F).
Table 3. Samples of Spanish AI-generated text, with corresponding source models (labeled A-F).
Table 4. Models explored for English and Spanish datasets

4 Proposed Ensemble Approach

In this section, we detail our approach to generative language model attribution. We first provide a description of the LLMs and machine learning models that we explored for model attribution. Next, we discuss the proposed ensemble neural architecture, where we fine-tune the LLMs and then pass their predictions to various traditional machine learning models to perform the ensemble operation.

4.1 Models

LLMs: We explored various state-of-the-art LLMs [22], such as Bidirectional Encoder Representations from Transformers (BERT), DeBERTa, RoBERTa, and the cross-lingual language model RoBERTa (XLM-RoBERTa), along with their variants. Since the datasets differ for each language and the same set of models does not fit both, we fine-tuned different models for each language. We investigated more than 15 distinct models per language and selected the ones presented in this paper based on their performance on the validation data. This selection was made to ensure model diversity, which aids generalisation and improves comprehension of context and semantics. Table 4 lists the models that we selected for the two languages under consideration. We briefly describe each of the LLMs below.

  • microsoft/deberta-base [7] is a transformer model that improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder.

  • xlm-roberta-large-finetuned-conll03-english is an XLM-RoBERTa-based model [2], a large multilingual language model trained on 2.5TB of filtered CommonCrawl data. The conll03-english variant is fine-tuned from the XLM-RoBERTa model on the English CoNLL-2003 dataset.

  • roberta-large and PlanTL-GOB-ES/roberta-large-bne are RoBERTa-based models [13] pre-trained in a self-supervised fashion using a Masked Language Modeling (MLM) objective; roberta-large is pre-trained on a large corpus of English data. The roberta-large-bne model has been pre-trained on the largest Spanish corpus to date, a total of 570GB of text compiled from web crawls.

  • dbmdz/bert-base-multilingual-cased-finetuned-conll03-spanish, hiiamsid/sentence_similarity_spanish_es, allenai/scibert_scivocab_cased, bert-large-uncased-whole-word-masking-finetuned-squad, and allenai/longformer-base-4096 are BERT-based models [5]. The bert-base-multilingual model is pre-trained with an MLM objective on Wikipedia data covering the 104 languages with the largest Wikipedias, and is further fine-tuned on the Spanish CoNLL-2002 dataset. The sentence similarity Spanish model is a sentence-transformer model whose base model is BETO, which is trained on a large Spanish corpus. The scibert model is trained on papers from Semantic Scholar. The BERT-large SQuAD model differs slightly from the other BERT models in that it is trained with a whole-word masking technique and further fine-tuned on the Stanford Question Answering Dataset (SQuAD). The Long-Document Transformer (Longformer) model is a BERT-like model initialized from the RoBERTa checkpoint and pre-trained for MLM on long documents; it supports sequences of length up to 4,096 tokens.

Machine Learning (ML) Models: We explored various traditional machine learning and ensembling models such as Bagging, Voting, OneVsRest, Error-Correcting Output Codes (ECOC), and LinearSVC [24].
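
As a rough illustration, the snippet below instantiates scikit-learn counterparts of these classifiers; the hyper-parameters shown are illustrative defaults rather than the tuned values used in our experiments.

```python
# Illustrative sketch of the traditional ML / ensembling classifiers explored.
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, VotingClassifier, RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier, OutputCodeClassifier

candidate_classifiers = {
    "linear_svc": LinearSVC(C=1.0),
    "bagging": BaggingClassifier(LogisticRegression(max_iter=1000), n_estimators=10),
    "voting": VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(n_estimators=200))],
        voting="soft",  # soft voting averages the estimators' class probabilities
    ),
    "one_vs_rest": OneVsRestClassifier(LinearSVC(C=1.0)),
    "ecoc": OutputCodeClassifier(LinearSVC(C=1.0), code_size=2.0),  # ECOC
}
```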

4.2 Proposed Ensemble Neural Architecture

As shown in Fig. 1, an input text is passed through variants of the pre-trained LLMs, such as DeBERTa (D), XLM-RoBERTa (X), RoBERTa (R), and BERT (B). During the model training phase, these models are fine-tuned on the training data. For inference and testing, each of these models independently generates classification probabilities (P), namely \(P^{D}\), \(P^{X}\), \(P^{R}\), \(P^{B}\), etc. In order to maximize the contribution of each model, these probabilities are either concatenated (\(P^{C}\)) or averaged (\(P^{A}\)), and the resulting output is passed as a feature vector to train various traditional ML models that produce the final predictions.
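
The ensemble step can be summarised by the minimal sketch below. The synthetic probability arrays and variable names are purely illustrative; in practice the probabilities come from the fine-tuned LLMs, and Linear SVC is only one of the final-stage ML models explored.

```python
import numpy as np
from sklearn.svm import LinearSVC

def build_features(prob_list, mode="concat"):
    """prob_list: list of (n_samples, n_classes) probability arrays, one per LLM."""
    if mode == "concat":     # P^C: concatenate along the class dimension
        return np.concatenate(prob_list, axis=1)
    if mode == "average":    # P^A: element-wise mean over the LLMs
        return np.mean(np.stack(prob_list, axis=0), axis=0)
    raise ValueError(mode)

# Placeholder probabilities standing in for the outputs of four fine-tuned LLMs
# (e.g. DeBERTa, XLM-RoBERTa, RoBERTa, BERT); 6 classes correspond to labels A-F.
rng = np.random.default_rng(0)
train_probs = [rng.dirichlet(np.ones(6), size=100) for _ in range(4)]
test_probs = [rng.dirichlet(np.ones(6), size=20) for _ in range(4)]
y_train = rng.integers(0, 6, size=100)            # placeholder gold labels

X_train = build_features(train_probs, mode="concat")   # P^C feature vectors
X_test = build_features(test_probs, mode="concat")

clf = LinearSVC()                 # one of several ML models tried as the final stage
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```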

Fig. 1. Proposed ensemble neural architecture

5 Experiments

In this section, we discuss the evaluation of the proposed methods. We report model performance using well-established metrics such as accuracy (Acc), macro F1 score (\(F_{macro}\)), precision (Prec) and recall (Rec).
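
For reference, these metrics can be computed with scikit-learn as in the toy snippet below; the labels shown are illustrative only.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["A", "B", "C", "A", "F", "E"]   # toy gold labels (source models A-F)
y_pred = ["A", "B", "C", "C", "F", "D"]   # toy predictions

acc = accuracy_score(y_true, y_pred)
prec, rec, f_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"Acc={acc:.3f} Prec={prec:.3f} Rec={rec:.3f} F_macro={f_macro:.3f}")
```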

5.1 Baselines

We establish Linear Support Vector Classification (SVC), Logistic Regression (LR), and Random Forests (RF) as baselines, where each baseline model takes two distinct feature sets – word n-grams and character n-grams. We also explored other baselines like the Symanto Brain Few-shot and Zero-shot without label verbalization approaches, but due to their relatively low performance compared to the approaches presented in Table 5, we do not report those results.
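
A minimal sketch of these n-gram baselines as scikit-learn pipelines is shown below; the TF-IDF weighting, n-gram ranges, and classifier hyper-parameters are illustrative assumptions, not the exact settings used in our experiments.

```python
from sklearn.base import clone
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

feature_sets = {
    "word_ngrams": TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    "char_ngrams": TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
}
classifiers = {
    "linear_svc": LinearSVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300),
}

# One pipeline per (feature set, classifier) pair; each is fit on the training
# texts and evaluated on the held-out test split.
baselines = {
    (feat, name): make_pipeline(clone(vec), clone(clf))
    for feat, vec in feature_sets.items()
    for name, clf in classifiers.items()
}
```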

5.2 Implementation Details

During model training, we set aside 20% of the training data for validation. For the held-out testing phase, however, the validation set is merged back into the training set. The following hyper-parameters are used for model fine-tuning: a batch size of 128, a learning rate of \(3 \times 10^{-5}\), a maximum sequence length of 128, and 20 training epochs. We also used a sliding window to prevent the truncation of longer sequences, allowing the model to handle longer sentences.
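
The sketch below illustrates this fine-tuning setup with the Hugging Face Transformers Trainer using the hyper-parameters listed above. The checkpoint is one of the models in Table 4, the sliding-window stride is an assumed value, and train_dataset and validation_dataset are placeholders for the tokenized AuTexTification splits.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "microsoft/deberta-base"   # one of the LLMs listed in Table 4
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=6)

def tokenize(batch):
    # Sliding window: return_overflowing_tokens with a stride splits long texts
    # into overlapping 128-token chunks instead of truncating them away.
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     stride=32, return_overflowing_tokens=True)

training_args = TrainingArguments(
    output_dir="model-attribution",
    per_device_train_batch_size=128,
    learning_rate=3e-5,
    num_train_epochs=20,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,        # placeholder: tokenized training split
    eval_dataset=validation_dataset,    # placeholder: 20% held-out validation split
    tokenizer=tokenizer,                # enables dynamic padding per batch
)
trainer.train()
```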

Table 5. Baseline results of model attribution for both English and Spanish.
Table 6. Results of model attribution on the English dataset
Table 7. Results of model attribution on the Spanish dataset

5.3 Results

Table 5 shows results produced using three traditional ML methods (Linear SVC, LR, and RF) across two different feature sets (word n-grams and character n-grams) for both languages. LR with character n-grams outperforms other approaches on the macro F1 performance metric for both languages.

Tables 6 and 7 provide results on the English and Spanish datasets, respectively, with different variants of the proposed architecture. The first block in each table shows the results for the individual LLMs. The second and third blocks show the ensemble results with \(P^{C}\) and \(P^{A}\), respectively, as the input feature vector to several machine learning models.

Fig. 2. Class-wise F-scores for the best-performing baseline (LR with character n-grams) and the proposed ensemble method (Linear SVC) on the English dataset

The results on the English test data are shown in Table 6. Out of all the combinations, Linear SVC with the concatenated feature vector (\(P^{C}\)) as input outperforms the other approaches on a majority of the evaluation metrics, with an \(F_{macro}\) score of 0.63. Table 7 shows the results on the Spanish test dataset, where the Linear SVC classifier with the concatenated feature vector (\(P^{C}\)) as input outperforms the other approaches with an \(F_{macro}\) score of 0.656.

Overall, we observed that the ensemble models performed well when compared to the individual LLMs. Ensembling the models provides additional cues from each individual model, which helps enhance the performance. Furthermore, several variants of the proposed framework outperform each of the baselines across the evaluated metrics.

Table 8. Samples from the English test dataset where the prediction from the ensemble model (Linear SVC) is accurate while the predictions from the individual LLMs are not.

Figure 2 shows the class-wise performance comparison of our best ensemble method (Linear SVC) with that of the best baseline (LR with character n-grams) on the English and Spanish datasets. For every class in both datasets, the F1 score of the proposed method exceeds that of the baseline. Even though the LLMs we explore have comparatively modest parameter counts, our proposed ensemble approach performs very well on text generated by the largest model, text-davinci-003, with 175B parameters.

Tables 8 and 9 show a few samples from the test data for English and Spanish, respectively. For these samples, we demonstrate that while no individual LLM predicts the ground truth label correctly, the ensemble Linear SVC classifier predicts the correct label. We also show the ground truth label associated with each sample.

Table 9. Samples from the Spanish test dataset where the prediction from the ensemble model (Linear SVC) is accurate while the predictions from the individual LLMs are not.

6 Conclusion

In this paper, we explored generative language model attribution for the English and Spanish languages. We proposed an ensemble neural architecture where the probabilities of individual LLMs are concatenated and passed as input to machine learning models. Each of the variants of the proposed ensemble approach outperformed several traditional machine learning baselines and the individual LLMs for both languages. Our model achieves \(F_{macro}\) scores of 0.63 and 0.656 on the English and Spanish data, respectively, outperforming the baseline approaches. Our analysis showed that the proposed approach is also effective at classifying samples generated by LLMs with a large number of parameters. Our approach also performs well on out-of-domain data, since the domains in the test dataset differ from those in the training dataset. Directions for future work include developing a multi-task approach for generative language model attribution as well as exploring other multilingual datasets.