Introduction

Horizontal gene transfer (HGT) provides beneficial genetic plasticity within microbial communities. Via HGT, microbes exchange genetic material with unrelated organisms; this allows efficient adaptation to sudden changes in the environment, independent of vertical gene transfer from parents to offspring. HGT confers selective advantages in prokaryotic ecosystems, e.g. resistance to antibiotics, virulence, photosynthesis or nitrogen fixation (Syvanen 1994). In comparison, HGT in eukaryotes is much less prevalent and occurs as intracellular gene transfer from organelles to the nucleus or with bacteria as donors (White et al. 1982; Bergthorsson et al. 2004; Richardson and Palmer 2006).

In plants, foreign DNA can be acquired by HGT from viruses, bacteria or even from parasitic plants (Richardson and Palmer 2006). A well-known example of HGT from bacteria to plants occurs when phytopathogenic Rhizobium (previously Agrobacterium) infects its host. Rhizobium is a genus of Gram-negative soil bacteria, and the pathogenesis of its phytopathogenic members depends on the transfer of a plasmid-borne DNA fragment (T-DNA), which is integrated into the host genome by HGT. Some of the most prominent species within the genus are R. radiobacter, R. rhizogenes, R. rubi and R. vitis, which cause distinct abnormal tissue growth: crown gall tumours, hairy root disease, cane gall disease and neoplastic tumours, respectively (Riker 1930; Hildebrand 1940; Ophel and Kerr 1990). Upon transfer of T-DNA to the plant genome, opines are produced to supply the pathogen with a nitrogen and carbon source. Opines are derived from plant amino acids and their synthesis is controlled by genes located on the T-DNA of bacterial origin (Hong et al. 1997). Infected plant cells are not expected to transfer T-DNA to subsequent generations of plants because they are in a determined state. They will therefore not develop into zygotes or gametes, which constitute the elements of plant embryogenesis, and hence will not regenerate into whole plants (Goldberg et al. 1994; Mordhorst et al. 1997).

Reports of T-DNA fragments of bacterial origin in plant genomes have increased during recent decades. Initially, cellular T-DNA (cT-DNA) was detected in uninfected Nicotiana glauca (White et al. 1982). A later, expanded analysis of multiple Nicotiana species revealed cT-DNA in N. tabacum, N. tomentosiformis, N. tomentosa and N. otophora (Fürner et al. 1986). Similarly, Russian species of Linaria, namely L. vulgaris, L. genistifolia and L. creticola, contain cT-DNA-like sequences from R. rhizogenes (Matveeva et al. 2012, 2018; Pavlova et al. 2014). Interestingly, with the emergence of publicly available plant genomic and transcriptomic sequence data, numerous additional natural transformants have been detected. A transcriptome study on cultivated sweet potato (Ipomoea batatas) identified several insertions of R. rhizogenes cT-DNA homologues in both I. batatas and the wild relative I. trifida (Kyndt et al. 2015). In addition, bioinformatic analysis of sequence data from more than 600 dicot species indicated that at least 49 domesticated plant species can be added to the list of plants containing variations of cT-DNA (Matveeva and Otten 2019).

Scrophulariaceae is a family with a worldwide geographical distribution. Albach et al. (2005) revised and restructured the taxonomy of the family using nuclear and plastid DNA sequence analysis, suggesting Scrophulariaceae as a part of Plantaginaceae. Linaria belongs to Scrophulariaceae sensu stricto, but despite ongoing studies on the evolution of Plantaginaceae, its phylogeny has not yet been resolved with certainty (Fernandez-Mazuecos et al. 2013; Vigalondo et al. 2015). According to Albach et al. (2005), the current version of Plantaginaceae sensu lato contains approximately 92 genera and 2000 species, with Veronica as the largest genus.

Previous studies on ancient bacterial rolC sequences primarily focused on the Scrophulariaceae sensu stricto clade, and the presence and distribution of R. rhizogenes cT-DNA fragments were revealed in Russian accessions of Linaria (Matveeva et al. 2012, 2018; Pavlova et al. 2014). It is therefore of great interest to investigate a broader phylogenetic and geographical range. This study was undertaken to determine the distribution of ancient bacterial rol genes within uncultivated plants of Scrophulariaceae in Europe. Plant materials were seeds from Scrophulariaceae collected in Denmark and seeds from botanical gardens in Germany and England. Here, we report the discovery of ancient bacterial rolC sequences in three additional genera within the Figwort family (Scrophulariaceae), Antirrhinum, Digitalis and Veronica, in addition to those disclosed previously in Linaria.

Materials and methods

Plant materials and growth conditions

Seeds of the Antirrhinum, Digitalis, Linaria and Veronica genera were obtained from Botanischer Garten der Universität Hamburg, Germany; The Royal Botanic Gardens, Kew, England; The University Gardens, Frederiksberg Campus, Faculty of Science, University of Copenhagen (UCPH-F) and the seed collection available at Taastrup Campus, Faculty of Science (UCPH-T). Furthermore, uncultivated plants were collected in North Zealand, Denmark, at the following GPS positions: 56°04′14.4″ N 12°33′03.1″ E, 56°02′57.2″ N 12°26′27.4″ E and 55°40′06.3″ N 12°18′12.9″ E. Plant species and origin of seeds are presented in Table 1. Additionally, samples of Plantago species of Danish and Greenlandic origin and of Wulfenia carinthiaca were assessed (data not shown).

Table 1 Suppliers of seeds of wild species of Antirrhinum, Digitalis, Linaria and Veronica genera

The majority of seeds were germinated in pools of 5–15 seeds in 5 cm × 5 cm pots in a greenhouse. Fertiliser was added with every daily watering (Brun Komplet Garta A/S, Denmark). Plants were grown at a minimum temperature of 20 °C and a 16-h photoperiod of natural light, supplemented with artificial light (190–220 µmol m−2 s−1). Plants were grown through a full life cycle to ensure correct morphological identification (Fig. 1).

Fig. 1 Flowers of selected Scrophulariaceae species. a Antirrhinum majus. b Linaria purpurea. c Linaria vulgaris. d Veronica agrestis

Seeds unable to germinate in soil were surface-sterilised and germinated in vitro. Following sterilisation (1 min in 70% (v/v) EtOH, 3 min in 2–5% (v/v) NaOCl and three rinses in sterile water), seeds were placed in plant boxes containing 3% (w/v) sucrose, 0.7% (w/v) Plant agar, 4.4% (w/v) Murashige and Skoog including vitamins and 0.05% (w/v) 2-(N-morpholino)ethanesulfonic acid (MES). In vitro boxes were placed in a growth chamber with a light intensity of 120–150 µmol m−2 s−1 at 25 °C and an 8-h photoperiod. The seeds germinated in vitro were from Kew, UK: L. repens, V. agrestis, V. arvensis and V. chamaedrys. Plantlets were transferred to soil 3–4 weeks after germination and grown to maturity.

DNA extraction

Leaf material for DNA extractions was obtained from the youngest fully developed leaf of approximately 3-month-old plants. Harvested plant material was frozen in liquid N2 and stored at −80 °C. All DNA extractions were performed twice from independent plants using the Shorty method developed for the identification of T-DNA insertion mutants in Arabidopsis (Visscher et al. 2010) with modifications described in Hegelund et al. (2018). DNA concentrations were determined with a Nanodrop™ 1000 spectrophotometer (Thermo Fisher Scientific Inc., USA).

Primer design and detection of rolC sequences

In this study, primers specific for rolC were designed to anneal to regions conserved among Linaria and R. rhizogenes rolC sequences. Specifically, open reading frames of rolC, retrieved from NCBI GenBank (Benson et al. 2013) and originating from L. genistifolia (KC309424), L. vulgaris (EU735069) and R. rhizogenes (EF433766), were aligned using Clustal Ω (Madeira et al. 2019). Conserved regions were selected as primer annealing sites. Primers applicable as positive controls in PCRs and as reference genes in RT-PCR were designed similarly. Sequences of the Rubisco large subunit (RBC) were obtained via NCBI GenBank from Antirrhinum majus (L11688), Brassica napus (JF807908), L. repens (MG222642), L. vulgaris (KM360853), Veronica dabneyi (HM850449) and V. officinalis (AY034024). RBC-specific primers suitable for use in the Antirrhinum, Linaria and Veronica genera were designed using Primer3Plus software (Untergasser et al. 2012). In Digitalis, primers designed to amplify Digitalis purpurea ACT2 (HQ853642) functioned as positive control and reference gene.
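The selection of conserved regions as candidate primer annealing sites can be illustrated with a minimal Python sketch. It is not the procedure used in this study, which relied on visual inspection of the Clustal Ω alignment; the function name, the window length and the toy sequences are hypothetical and stand in for an alignment exported from Clustal Ω.

```python
def conserved_windows(aligned, window=20):
    """Return (start, end) column ranges (end exclusive) in which all
    aligned sequences are identical and gap-free for at least `window`
    consecutive columns."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    runs = []
    start = None
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i                      # a conserved run begins here
        elif not conserved and start is not None:
            if i - start >= window:        # keep runs long enough for a primer
                runs.append((start, i))
            start = None
    if start is not None and length - start >= window:
        runs.append((start, length))       # run extends to the alignment end
    return runs


# Hypothetical toy alignment (NOT real rolC sequences); column 6 varies.
TOY_ALIGNMENT = [
    "ATGGCTGAAGAC",
    "ATGGCTCAAGAC",
    "ATGGCT-AAGAC",
]

if __name__ == "__main__":
    print(conserved_windows(TOY_ALIGNMENT, window=5))
```

In practice the window would be set to a typical primer length, and candidate windows would still need to be checked for melting temperature and secondary structure, for instance in Primer3Plus.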

For the initial detection of rolC, PCR reactions included 100–250 ng of genomic DNA, 2% (v/v) DMSO and Ex Taq polymerase as recommended by the supplier (Takara Bio Inc., Japan). Cloning and subsequent sequencing reactions were based on PCR products amplified with the proof-reading polymerase LA Taq using 2% (v/v) DMSO and following the supplier's instructions (Takara Bio Inc.). PCR reactions followed the program 94 °C for 4 min, 35 cycles of [30 s at 94 °C, 30 s at 60 °C, 30 s at 72 °C] and a final step of 72 °C for 7 min in a MyCycler (Bio-Rad) or a Master Cycler Gradient (Eppendorf, Germany). Primer sequences and PCR program exceptions due to primer-specific annealing temperatures (Ta) are listed in Table 2.
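As a worked example, the standard program described above can be written down as data and its nominal duration computed. This is a minimal Python sketch; the variable and function names are hypothetical, and ramping between temperatures is ignored, so the actual run time on a thermal cycler will be longer.

```python
# Standard PCR program from the text, as (temperature in °C, seconds).
INITIAL_DENATURATION = (94, 4 * 60)
CYCLE_STEPS = [(94, 30), (60, 30), (72, 30)]  # denature, anneal, extend
N_CYCLES = 35
FINAL_EXTENSION = (72, 7 * 60)


def nominal_runtime_seconds(initial, cycle_steps, n_cycles, final):
    """Sum of all hold times; temperature ramping is not included."""
    seconds_per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial[1] + n_cycles * seconds_per_cycle + final[1]


TOTAL_SECONDS = nominal_runtime_seconds(
    INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION
)

if __name__ == "__main__":
    # 240 s + 35 * 90 s + 420 s = 3810 s, i.e. 63.5 min
    print(TOTAL_SECONDS, TOTAL_SECONDS / 60)
```

Primer pairs with a different annealing temperature (Table 2) would only change the second entry of the cycling steps.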

Table 2 PCR and RT-PCR primers specific for ACT, RBC, rolC and virD2. RBC served as positive control/reference gene in the Antirrhinum, Linaria and Veronica genera. ACT was used as positive control/reference gene in Digitalis. Annealing temperatures (Ta) in PCR and RT-PCR and product sizes are presented

In reactions amplifying genes or transcripts of bacterial origin, positive controls used purified R. rhizogenes plasmid pRiA4 as template, except where rolCexpr primers were used; in those cases, gDNA of L. vulgaris was used as positive control. In all reactions, H2O was used as a negative control. To verify the absence of bacterial contamination in the extracted DNA, virD2 detection was used as an additional negative control (data not shown).

Cloning and sequencing of rolC

PCR products amplified by LA Taq were cloned into the pCR4-TOPO TA vector supplied with the TOPO TA cloning kit® (Invitrogen, USA) according to the manufacturer’s instructions. Plasmids were purified using the GenElute Plasmid DNA Miniprep Kit (Sigma-Aldrich) as recommended by the manufacturer. Inserts were verified by EcoRI digestion (New England BioLabs Inc., USA) and subsequent gel electrophoresis. Purified plasmids with inserts of the expected size (221 bp) were sequenced by Eurofins Genomics, Germany.

Expression analyses of rolC

A group of rolC-positive species available from Kew was selected for expression analyses: A. majus, D. purpurea, L. purpurea, L. vulgaris, V. agrestis and V. beccabunga. RNA was extracted from 80 to 100 mg tissue of the youngest fully expanded leaves of plants in the vegetative or the generative growth stage using the RNeasy Plant Mini kit as recommended by the supplier (Qiagen). RNA yield and purity were estimated using a Nanodrop™ 1000 spectrophotometer. RNA integrity was verified on 1% bleach agarose gels as described by Aranda et al. (2012). Prior to expression analyses, 1 µg of RNA was treated with DNase I Amplification Grade (Sigma-Aldrich, USA) and cDNA synthesis was done with the iScript cDNA synthesis kit (Bio-Rad, USA) as recommended. cDNA was diluted 5-fold in RNase/DNase-free Tris-EDTA pH 7.4 (Sigma-Aldrich) before use. Expression analyses were done by RT-PCR with primers specific for rolC designed in this study or the rolC expression primers (rolCexpr) designed by Matveeva et al. (2018). Positive controls were RBC for cDNA of Linaria, Antirrhinum and Veronica and ACT for cDNA of Digitalis. Primer details are presented in Table 2. RT-PCR reactions were conducted using Ex Taq polymerase as previously described.

Bioinformatics

Sequence identification, alignments and analyses were done using CLC Sequence viewer (Qiagen), BLAST and Clustal Ω (Coordinators 2013; Madeira et al. 2019). Phylogenetic analyses were conducted using the Maximum Likelihood method and the Tamura-Nei model in MEGA-X (Tamura and Nei 1993; Kumar et al. 2018). Bootstrap values were inferred from 1000 replicates. The following rolC homologs were used as references: L. vulgaris (LvrolC [Genbank: EU735069]), L. creticola (LcrolC [Genbank: MF997051]), L. genistifolia (LgrolC [Genbank: KC309424]), N. tabacum (NtrolC [Genbank: X91881]), N. glauca (NgrolC [Genbank: X03432]) and R. rhizogenes (RrrolC [Genbank: EF433766]). Genbank accession numbers for sequences described here are presented in the Data Availability statement.
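The resampling behind the bootstrap values can be illustrated with a minimal Python sketch. It only shows how pseudo-replicate alignments are drawn by sampling alignment columns with replacement; tree inference for each replicate was done in MEGA-X and is not reproduced here. The function names, the seed and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Draw one pseudo-replicate: sample columns with replacement,
    keeping the same number of columns as the original alignment."""
    n_cols = len(aligned[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in aligned]


def bootstrap_replicates(aligned, n_replicates=1000, seed=0):
    """Return a list of pseudo-replicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aligned, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment (NOT real rolC sequences).
TOY_ALIGNMENT = [
    "ATGGCTGAAGAC",
    "ATGGCTCAAGAC",
    "ATGGCTGAAGAT",
]

if __name__ == "__main__":
    replicates = bootstrap_replicates(TOY_ALIGNMENT, n_replicates=1000)
    print(len(replicates), replicates[0])
```

A tree would then be inferred from each pseudo-replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade is recovered.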

Results

rolC in European Scrophulariaceae species

This research was initiated to determine the distribution of ancient bacterial DNA within uncultivated plants of the Scrophulariaceae family in Europe. PCR primers were designed to anneal to conserved regions of available rolC sequences of R. rhizogenes and Linaria. Using PCR, rolC homologues were identified in plant species of the Antirrhinum, Digitalis and Veronica genera and in an additional species of the Linaria genus, Linaria purpurea (Table 3). To further explore the presence of rolC homologues, samples of Plantago spp. and Wulfenia carinthiaca were assessed but did not contain rolC sequences (data not shown). To complement the studies of rolC sequences in Russian Linaria species (Matveeva et al. 2012, 2018; Pavlova et al. 2014), plants of this study were collected from European sources (Table 1). Of the 15 new species included here, 6 were positive for the presence of rolC sequences of bacterial origin. There was no correlation between the presence of rolC sequences and the origin of the plant material (Table 3).

Table 3 Uncultivated plant species of the Antirrhinum, Digitalis, Linaria and Veronica genera tested for the presence of rolC sequences of bacterial origin. rolC haplotype a-e: common rolC homologues; u: unique rolC sequences; nd: not detected. New plant species found to contain ancient bacterial rolC sequences are marked in bold. Accessions denoted with a * were included in expression analyses. All accessions were analysed twice from independent DNA extractions

Within each genus, the presence of rolC sequences varied (Table 3). A. majus contained rolC homologues, whereas A. braun-blanquetii did not. In Digitalis, D. purpurea contained several rolC sequences. In the Linaria genus, L. purpurea and L. vulgaris tested positive for rolC, whereas no rolC sequences were detected in L. arenaria, L. minor, L. repens, L. supina and L. triornithophora. Finally, within the Veronica genus, V. agrestis, V. beccabunga and V. chamaedrys carried several rolC haplotypes, whereas rolC sequences were not detected in V. arvensis, V. persica and V. serpyllifolia.

Sequence variations of ancient rolC in Scrophulariaceae

Comparison of the rolC sequences isolated from different Scrophulariaceae genera revealed five different haplotypes. The haplotypes cannot be associated with any specific plant species but are distributed across the genera (Fig. 2). Most prominent was rolCa, which was present in A. majus, D. purpurea, L. purpurea, L. vulgaris, V. agrestis, V. beccabunga and V. chamaedrys. When aligned to the 221 bp rolCa fragment identified here, rolCb, rolCc, rolCd and rolCe differed from rolCa by 2, 3, 13 and 14 nucleotides, respectively. rolCd and rolCe differed from each other by a single nucleotide. Additionally, seven sequences were unique to individual species but nevertheless closely related to rolCa-rolCe (Table 3, sequence data not shown). Species with species-specific rolC haplotypes are documented in Table 3; common to these haplotypes was that they differed from rolCa-rolCe by single nucleotide changes only.
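The counts of differing nucleotides reported above correspond to the number of mismatched positions between two aligned fragments of equal length. A minimal Python sketch of such a count is shown here; the function names and the toy sequences are hypothetical and are not the rolC haplotypes themselves, and gaps are simply counted as mismatches.

```python
def nucleotide_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ.
    Comparison is case-insensitive; a gap ('-') against a base counts
    as a difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def differences_to_reference(reference, variants):
    """Map each variant name to its number of differences from the reference."""
    return {
        name: nucleotide_differences(reference, seq)
        for name, seq in variants.items()
    }


# Hypothetical toy fragments (NOT the 221 bp rolC haplotypes).
TOY_REFERENCE = "ATGGCTGAAGAC"
TOY_VARIANTS = {
    "variant_1": "ATGGCTGAAGAT",  # one substitution at the last position
    "variant_2": "ACGGCTGATGAC",  # two substitutions
}

if __name__ == "__main__":
    print(differences_to_reference(TOY_REFERENCE, TOY_VARIANTS))
```

Applied to the aligned 221 bp fragments, with rolCa as the reference, the same count would be expected to reproduce the values of 2, 3, 13 and 14 given above.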

Fig. 2 Nucleotide alignment of partial genomic fragments of rolC sequences found in Scrophulariaceae. a LvrolCa was found in L. vulgaris, A. majus, D. purpurea, L. purpurea, V. agrestis, V. beccabunga and V. chamaedrys. b LvrolCb was found in L. vulgaris, D. purpurea, V. agrestis, V. beccabunga and V. chamaedrys. c LvrolCc was found in L. vulgaris and V. chamaedrys. d LvrolCd was found in L. vulgaris and D. purpurea. e DprolCe was found in D. purpurea and V. chamaedrys. As references, corresponding fragments of rolC homologs from L. vulgaris (LvrolC [Genbank: EU735069]) and R. rhizogenes (RrrolC [Genbank: EF433766]) are included. The alignment was produced in Clustal Ω (Madeira et al. 2019)

Phylogenetic analyses including GenBank reference sequences of rolC from R. rhizogenes, Nicotiana and Linaria, and the rolCa-rolCe sequences identified here showed a Scrophulariaceae specific sequence cluster. Also, the Scrophulariaceae rolC sequence cluster showed a closer phylogenetic relationship to the R. rhizogenes rolC gene than to the Nicotiana rolC reference sequences (Fig. 3).

Fig. 3 Phylogenetic analyses of partial genomic fragments of rolC sequences in Scrophulariaceae. LvrolCa was found in L. vulgaris, A. majus, D. purpurea, L. purpurea, V. agrestis, V. beccabunga and V. chamaedrys. LvrolCb was found in L. vulgaris, D. purpurea, V. agrestis, V. beccabunga and V. chamaedrys. LvrolCc was found in L. vulgaris and V. chamaedrys. LvrolCd was found in L. vulgaris and D. purpurea. DprolCe was found in D. purpurea and V. chamaedrys. As references, corresponding fragments of rolC homologs from L. vulgaris (LvrolC [Genbank: EU735069]), L. creticola (LcrolC [Genbank: MF997051]), L. genistifolia (LgrolC [Genbank: KC309424]), N. tabacum (NtrolC [Genbank: X91881]), N. glauca (NgrolC [Genbank: X03432]) and R. rhizogenes (RrrolC [Genbank: EF433766]) are included. The phylogenetic analysis was produced in MEGA-X (Kumar et al. 2018)

Expression analyses of ancient rolC sequences in Scrophulariaceae

The rolC-positive species L. vulgaris, L. purpurea, A. majus, V. beccabunga, V. agrestis and D. purpurea were subjected to expression analyses to identify rolC transcripts (Table 3). RNA was extracted from leaves in vegetative and reproductive growth stages, and RNA quality and integrity were verified experimentally (Supplementary table 1 and Supplementary Fig. 1). Following cDNA synthesis, RT-PCR was conducted using the appropriate reference genes as controls. However, no transcripts of ancient rolC could be detected in any of the species investigated (Supplementary Fig. 2).

Discussion

Horizontal gene transfer mediated by phytopathogenic species of Rhizobium has resulted in naturally transformed plant species belonging to the Nicotiana, Ipomoea and Linaria genera (White et al. 1982; Fürner et al. 1986; Matveeva et al. 2012; Kyndt et al. 2015). How T-DNA mechanistically moves from a stable genomic integration in root cells into the genome of reproductive cells is not clear. Additionally, the extent of natural transformation at the plant species level remains elusive, as the genomes of most plant species are unknown. In this study, European species of Scrophulariaceae were screened for the presence of rolC of bacterial origin to expand the understanding of the occurrence of horizontal gene transfer with respect to geographical regions and species.

rolC in European Scrophulariaceae

This study supports results obtained by Matveeva et al. (2012), who were the first to identify cT-DNA of bacterial origin in L. vulgaris of Russian origin. In the current study, not only L. vulgaris of European origin but also L. purpurea was found to contain a rolC haplotype in its genome. Furthermore, three new genera within the same family as L. vulgaris can now be added to the list of naturally transformed plants, namely Antirrhinum, Digitalis and Veronica. The distribution of Scrophulariaceae species harbouring rolC sequences within each genus is, however, not uniform.

In the study that identified L. vulgaris as a naturally transformed plant, seven additional Linaria species were tested and did not contain cT-DNA (Matveeva et al. 2012). Here, L. purpurea was found to contain rolC, suggesting a close phylogenetic relationship between L. purpurea and L. vulgaris (Vargas et al. 2004). Data from 2013 indicate that L. purpurea and L. repens should be in the same clade as L. vulgaris (Fernandez-Mazuecos et al. 2013). However, no rolC sequences were detected in L. repens.

The ability to detect cT-DNA in plants is not related to the phylogenetic relationship of the plant species. Collectively, we confirmed that one species of Digitalis, one out of two Antirrhinum species and three out of seven Veronica species contain fragments of rolC sequences. This is a more uneven occurrence of rolC sequences than expected if close phylogenetic ties were the determining factor. The occurrence of multiple independent transformation events within different genera of the same family would require an uncharacterised ability of Scrophulariaceae members to regenerate themselves from the initially transformed tissue, e.g. regeneration of intact plants from diseased roots (hairy roots) (Chen and Otten 2017). Sequence analyses of rolC haplotypes showed high interspecies conservation within five unique rolC sequences (rolCa-rolCe); thus, we speculate that the original transformation event or events could have occurred early in the evolution of Scrophulariaceae, prior to the differentiation of the individual genera (Figs. 2, 3). This would, however, require that in several of the investigated plant species, parts of the cT-DNA have been lost again during speciation, or at least that the parts targeted by our primers have been lost. To further investigate the hypothesis of early transformation events within Scrophulariaceae before genus differentiation, we assessed Wulfenia carinthiaca for the presence of rolC sequence fragments. W. carinthiaca is a Miocene (approx. 23–5 Ma) relic plant with a disjunct distribution in Europe and could be a potential ancestor within Scrophulariaceae (Surina et al. 2014). However, our study did not identify rolC in W. carinthiaca (data not shown), and further studies are needed in that respect.

Antirrhinum and Linaria are phylogenetically closely related (Albach et al. 2005), but remnants of cT-DNA have not previously been detected in the Antirrhinum genus. As for Antirrhinum, rolC sequences have not previously been reported in the Digitalis genus, but we have isolated all but one of the five rolC haplotypes, plus unique sequences, from different sources of D. purpurea. V. chamaedrys was included in the list of species screened for cT-DNA by Matveeva et al. (2012), but rolC was not detected. This could be due to differences in the primers used. Alternatively, the Russian population of V. chamaedrys may not contain rolC homologues, whereas the European V. chamaedrys does. In summary, data presented here identify rolC sequences in four genera within the Scrophulariaceae family sensu stricto. However, as we only preliminarily tested other genera within the family of Plantaginaceae sensu lato, further studies are needed to clarify how widespread the natural transformation event or events have become within Scrophulariaceae and Plantaginaceae.

Expression of rolC

Previously, rolC transcripts were seen at low levels in shoots and calli from in vitro grown plantlets of L. vulgaris (Matveeva et al. 2018). In the current study, the expression of rolC was investigated in leaves of L. vulgaris and five species revealed in this study to contain rolC sequences, but no transcript was detected (Table 3, Supplementary Fig. 2). This indicates that Scrophulariaceae rolC homologues are not expressed in leaves of plants grown in vivo but our results cannot exclude the presence of transcripts in other tissues such as root, meristem, or flower tissue.

Perspectives

This study demonstrates that uncultivated plants of European Antirrhinum, Digitalis and Veronica contain rolC haplotypes derived from ancient bacterial genes; however, other putative sequences of the inserted cT-DNA have not been characterised. To decide whether these cT-DNAs in Scrophulariaceae share a common origin, the positions of the integration sites need to be determined in each plant species. Also, more sequence information on the Scrophulariaceae cT-DNA is needed to fully address why some species closely related to natural transformants in the same genus do not appear to contain rolC sequences in this study. Furthermore, it is important to determine which R. rhizogenes strain(s) facilitated the transformation events.

Although this study raises new questions, it is a critical step towards understanding the dynamics among plant species holding cT-DNA of ancient bacterial origin.