Virus diseases have impeded cucurbit cultivation in the Republic of South Africa (RSA) [1, 2]. Viruses, especially members of the genus Potyvirus (family Potyviridae), can cause severe damage to cucurbits, with losses of up to 100 % when infection occurs early in the season [3]. Moroccan watermelon mosaic virus (MWMV), a potyvirus infecting cucurbits, was first reported in Morocco in 1974 as a strain of Watermelon mosaic virus (WMV) on the basis of its host range [4]. It was later recognized as a distinct potyvirus on the basis of comparisons of its coat protein (CP) sequence with those of other potyviruses [5]. At the molecular level, MWMV belongs to the Papaya ringspot virus (PRSV) cluster [4]. MWMV was reported in cucurbits cultivated in RSA in 1987 [2]. Symptoms associated with MWMV on cucurbits include mosaic, stunting, dark-green blisters, vein banding, and leaf and fruit malformation [2, 4]. Several surveys have identified MWMV as the most prevalent potyvirus infecting cucurbits in RSA [1, 2, 6]. MWMV has also been reported in Zimbabwe [4], Tanzania [7], the Democratic Republic of the Congo (DRC) [8], Cameroon [4], Nigeria [9], Niger [4], Sudan [10], the Canary Islands [4], France [11], Greece [12], Italy [13], Portugal [4], and Spain [4]. Most molecular studies of MWMV have focused on the genomic region spanning the C-terminal part of the NIb coding sequence to the N-terminal part of the CP coding sequence [1, 8–13]. Only one complete genome sequence of MWMV, that of a Tunisian isolate (accession number EF579955) [4], is available in NCBI GenBank. In this communication, the full genome sequences of two MWMV isolates infecting cucurbits cultivated in the province of KwaZulu-Natal (KZN), RSA, were determined and compared with that of the Tunisian isolate.

Baby marrow (Cucurbita pepo L.) and pattypan (C. pepo L.) were identified as highly susceptible to MWMV during surveys conducted in the cucurbit-growing areas of KZN in the 2011, 2012, and 2013 growing seasons [1]. Consequently, two MWMV-infected leaf samples, one from a pattypan and one from a baby marrow plant, were randomly selected for next-generation sequencing (NGS) to recover the full genome of MWMV. Total RNA was extracted using the plant RNA NucleoSpin kit (Macherey–Nagel, Germany) and treated with Ribo-Zero Plant prior to NGS library preparation and 125 × 125 bp paired-end sequencing on an Illumina HiSeq. Library preparation and NGS were performed at the Agricultural Research Council’s Biotechnology Platform (ARC-BTP) in Pretoria (RSA). FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess read quality before and after adapter removal and trimming, which were performed with Trimmomatic (version 0.33) [14]. De novo assembly was then performed using SeqMan NGen (version 12.3.1 build 48; DNASTAR) with default settings. De novo assembly was chosen over mapping reads to a reference genome as a precaution in case the genomes of the RSA MWMV isolates had undergone recombination or diverged considerably from that of the Tunisian isolate. Host genome data were excluded during de novo assembly. The resulting contigs were compared against the GenBank database by BLAST using the Netsearch function of SeqMan Pro (version 12.3.1 build 48 421; DNASTAR).
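Read trimming was done with Trimmomatic, whose parameters are not reported here. As a rough illustration of what quality trimming does to each read, the sketch below decodes Phred+33 quality strings and cuts a read at the start of the first window whose mean quality drops below a threshold. It is a simplified stand-in written for this note, not Trimmomatic's exact SLIDINGWINDOW algorithm; the window size of 4 and threshold of Q15 are assumed example values, not the settings used in this study.

```python
def phred33_scores(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_avg_q: float = 15.0) -> tuple[str, str]:
    """Scan windows from the 5' end and cut the read at the start of the
    first window whose mean quality is below min_avg_q.

    Simplified illustration only (assumed parameters); Trimmomatic's own
    SLIDINGWINDOW step is not reproduced exactly here.
    """
    scores = phred33_scores(qual)
    # Reads shorter than the window produce no windows and are returned as-is.
    for start in range(0, max(len(scores) - window + 1, 0)):
        block = scores[start:start + window]
        if sum(block) / window < min_avg_q:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # Hypothetical read: six Q40 bases ('I') followed by four Q2 bases ('#').
    trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual)
```

For the hypothetical read above, the first window averaging below Q15 starts at position 5, so the read is cut to its first five bases.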

The longest MWMV contig identified in the pattypan sample was 9730 bp. The pattypan sample yielded 3,453,027 MWMV-mapped reads (median coverage: 42,054.34) with an average length of 114 bp. For the baby marrow sample, the longest MWMV contig was 9735 bp, made up of 3,425,094 reads (median coverage: 37,626.61) with an average length of 113 bp. When the genome sequences of the RSA isolates were aligned with that of the Tunisian isolate, the only ambiguity detected was a gap spanning positions 6 to 11 of the Tunisian isolate genome. This gap was confirmed by direct sequencing of the RT-PCR amplicon obtained with primers MWMVF1 (5′-AAA CAC TCA ACA CAA CAC AAC ATC-3′) and MWMVR530 (5′-CCC TGT CTT GCT TCA GCT AAA TTC-3′). Although 5′ RACE was not performed to determine the exact 5′-end sequences of the isolates under study, Sanger sequencing of the 5′-end sequences obtained using a double-stranded cDNA synthesis kit (Thermo Scientific, USA) yielded the same sequences as those generated by de novo assembly. We therefore concluded that de novo assembly gave an accurate sequence of the 5′ ends of the RSA isolates. The 9.73-kb contigs from the baby marrow and pattypan samples were accordingly taken as the consensus genome sequences of the RSA MWMV isolates. Both genome sequences (accession numbers KU315175 and KU315176) consist of 9719 nucleotides, excluding the poly(A) tail, organized into a 150-nucleotide 5′ noncoding region, 9375 nucleotides encoding the polyprotein, and a 194-nucleotide 3′ noncoding region (Table 1). The two RSA isolates have the same predicted polyprotein cleavage sites as the Tunisian isolate.
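The numbers reported above can be cross-checked with simple arithmetic. The sketch below confirms that the three genome regions add up to the stated 9719 nucleotides and that the polyprotein-coding stretch is a whole number of codons, and it estimates mean read depth with the Lander-Waterman relation C = N × L / G. Using the contig lengths as G and the rounded average read lengths as L are assumptions made for this illustration; the estimate is a mean, whereas the paper reports median coverage, so only order-of-magnitude agreement is expected.

```python
# Region lengths (nt) reported for both RSA MWMV genomes (KU315175, KU315176).
FIVE_PRIME_NCR = 150
POLYPROTEIN_CODING = 9375
THREE_PRIME_NCR = 194
GENOME_LENGTH = 9719  # excluding the poly(A) tail


def check_partition() -> tuple[int, int, int]:
    """Return (sum of regions, number of codons, leftover nucleotides)."""
    total = FIVE_PRIME_NCR + POLYPROTEIN_CODING + THREE_PRIME_NCR
    codons, remainder = divmod(POLYPROTEIN_CODING, 3)
    return total, codons, remainder


def mean_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Expected mean depth C = N * L / G (Lander-Waterman)."""
    return n_reads * mean_read_len / genome_len


if __name__ == "__main__":
    total, codons, remainder = check_partition()
    assert total == GENOME_LENGTH and remainder == 0
    print(f"regions sum to {total} nt; coding region = {codons} codons")
    # Read counts, average read lengths and contig lengths as reported above.
    print(f"pattypan mean depth ~ {mean_coverage(3_453_027, 114, 9730):.0f}x")
    print(f"baby marrow mean depth ~ {mean_coverage(3_425_094, 113, 9735):.0f}x")
```

Both estimates come out at roughly 40,000-fold depth, in line with the reported medians of 42,054.34 and 37,626.61.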

Table 1 Genome organization of the RSA MWMV isolates and comparison of protein molecular weights (MW)

Comparative genomic analysis of the RSA and Tunisian MWMV isolates included nucleotide and amino acid sequence comparisons, estimation of genetic distances, and phylogenetic analysis. Sequence identities were calculated with the online tool SIAS (http://imed.med.ucm.es/Tools/sias.html). Genetic distance and phylogenetic analyses were performed in MEGA6 [15]. Multiple sequence alignments were generated with MUSCLE as implemented in MEGA6. Genetic distances were computed using the maximum composite likelihood model [16] with a gamma distribution. The phylogeny of the MWMV isolates was inferred by the maximum likelihood method based on the general time-reversible model with a discrete gamma distribution and evolutionarily invariable sites, with 500 bootstrap replicates. Putative recombination events in the genomes of the RSA isolates were screened using the automated methods in RDP (version 4.56) [17].
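To make the identity and distance measures concrete, the sketch below computes percent identity and the uncorrected p-distance for two aligned sequences, skipping columns with a gap in either sequence, and then applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), and its gamma-corrected form, d = 3/4 α[(1 - 4p/3)^(-1/α) - 1]. This is a deliberately simpler substitute: MEGA6's maximum composite likelihood model is not reproduced here, and SIAS may use a different denominator for identity. The gap-handling rule and the α values in the usage example are assumptions for illustration.

```python
import math


def pairwise_stats(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Percent identity and p-distance over aligned columns, skipping any
    column with a gap ('-') in either sequence (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    p = mismatches / compared
    return 100.0 * (1.0 - p), p


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(arg)


def jc69_gamma_distance(p: float, alpha: float) -> float:
    """Gamma-corrected JC69 distance: d = 3/4 * alpha * ((1 - 4p/3)^(-1/alpha) - 1)."""
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("p-distance too large for the JC69 correction")
    return 0.75 * alpha * (arg ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    # Hypothetical 10-nt alignment with a single mismatch (p = 0.1).
    identity, p = pairwise_stats("ACGTACGTAC", "ACGTACGTAA")
    print(identity, p, jc69_distance(p), jc69_gamma_distance(p, alpha=0.5))
```

Smaller α values model stronger rate variation among sites and yield larger corrected distances; as α grows, the gamma-corrected distance converges to the plain Jukes-Cantor value.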

The genome sequences of the RSA MWMV isolates were almost identical to each other (Table S1). The RSA isolates shared 100 % nucleotide sequence identity in the 5′ and 3′ noncoding regions (NCRs), 6K1, and PIPO. The lowest nucleotide sequence identity between them, 98.93 %, was recorded for P1. Compared with the Tunisian isolate, the RSA MWMV isolates shared 80 to 95 % nucleotide sequence identity (Table S1), with the highest identity in PIPO, followed by the 3′ NCR. Of all genomic regions, the 5′ NCR showed the lowest nucleotide sequence identity between the RSA and Tunisian isolates. At the amino acid level (Table S1), the RSA MWMV isolates were 100 % identical in 6K1, 6K2, NIa, and PIPO, and the lowest identities, around 98 %, were observed for VPg and P1. The RSA and Tunisian MWMV isolates shared 82 to 99 % amino acid sequence identity.

The RSA MWMV isolates were expected to be more closely related to each other than to the Tunisian isolate, and the genetic distances (Table S2) confirmed this expectation. The genetic distances between the RSA isolates were zero for PIPO and 6K1 and greatest for P1. Between the RSA and Tunisian isolates, the genetic distances increased in the following order: PIPO, P3, CI, NIa, NIb, 6K1, VPg, HC-Pro, 6K2, P1, and CP. The most notable result was the high divergence between the CPs of the RSA and Tunisian MWMV isolates, which is consistent with the previous studies of Yakoubi et al. [4] and Ibaba et al. [1]. No recombination events were detected in any of the MWMV genomes. The phylogenetic tree (Fig. 1) showed that all MWMV isolates formed a monophyletic group sharing a common ancestor with Algerian watermelon mosaic virus isolate H4 (EU410442.1). This study provides two complete genome sequences of RSA MWMV isolates infecting cucurbits. Future genome studies of MWMV isolates from other countries where the virus occurs may shed light on its evolution.

Fig. 1

Maximum likelihood phylogram of Moroccan watermelon mosaic virus (MWMV) isolates in the Papaya ringspot virus (PRSV) cluster. Bootstrap values are shown next to the branches. Branch lengths are measured in the number of substitutions per site. ZTMV, Zucchini tigré mosaic virus [18]; AWMV, Algerian watermelon mosaic virus [19]