Introduction

Neospora caninum is a protozoan parasite that is considered one of the main bovine abortifacient pathogens worldwide (Dubey et al. 2007). Notably, South American countries and Mexico account for more than 386 million cattle, representing one of the main agricultural activities in this region (Moore 2005). Economic losses associated with bovine neosporosis may exceed 1 billion dollars worldwide, and for South America and Mexico, the global disease cost was estimated at 403 million dollars (Reichel et al. 2013). Different studies have shown that N. caninum is widespread with high seroprevalence rates in dairy cattle (Moore 2005). In Argentina, a study performed in La Pampa region reported 9 and 20.5% seroprevalences in beef and dairy cattle, respectively (Fort et al. 2015). In Uruguay, an overall seroprevalence of 13.9% in beef cattle was estimated (Bañales et al. 2006). Seroprevalences of 46.7 and 10.6–21.6% were reported for dairy herds in Peru and Brazil, respectively (Granados et al. 2014; Boas et al. 2015). Similarly, in Mexico, seroprevalence rates varied from 11.6 to 42% for beef and dairy cattle, respectively (García-Vázquez et al. 2005, 2009). However, these studies are not comparable due to the different experimental designs, serological tests, and cutoff values employed. Cutoff values are particularly relevant since diagnostic performance may significantly influence the success of control programs. Unfortunately, there is no vaccine currently available for N. caninum. Therefore, the control of neosporosis relies on management measures coupled with diagnosis (McAllister 2016; Reichel et al. 2015). At this stage, serological monitoring is the most useful tool for decision-making during disease control (Dubey et al. 2007).

Even though there are many serological assays available, there is no appropriate reference test to define a true-positive or true-negative animal (Ortega-Mora et al. 2006). Agglutination tests (NAT) have the advantage of not requiring specific conjugates and, therefore, are suitable for wildlife species (Almería 2013; Donahoe et al. 2015). However, false-positive results are a major drawback (Ortega-Mora et al. 2006; Moraveji et al. 2012). Immunoblot (IB) is highly sensitive and specific, but laborious and time-consuming, and is therefore used as a confirmatory test for doubtful results (Campero et al. 2015). Hence, the most commonly used techniques for the detection of anti-N. caninum antibodies in cattle in Mexico and South America are the indirect fluorescent antibody test (IFAT) and ELISA, the latter being suitable for large-scale investigations and more objective in result interpretation compared to IFAT. There are many commercial ELISAs available with wide distribution in Europe and North America (Álvarez-García et al. 2013). Unfortunately, the high cost of acquiring them and the lengthy importation process are significant obstacles for many Ibero-American countries. In this scenario, many local laboratories use in-house serological tests for anti-N. caninum antibody detection, most frequently IFAT (Campero et al. 2015).

Unfortunately, comparisons and interpretations of data are less reliable and more difficult due to the lack of inter-laboratory standardization trials among South American countries as well as other countries in the Americas, such as Mexico. This type of study has been performed in Europe and North America (von Blumröder et al. 2004; Wapenaar et al. 2007; Álvarez-García et al. 2013), where a comparison of the diagnostic performances of the most routinely used in-house and commercial tests led to a readjustment of the techniques. Moreover, constant reassessment and adaptation of diagnostic tests to different epidemiological situations are highly recommended (World Organization for Animal Health (OIE) 2013).

To address this issue, the aim of the present study was to compare the serological tests commonly used for the detection of anti-N. caninum specific antibodies (i.e., seven IFATs and three ELISAs) in Ibero-American countries with the ultimate goal of standardizing the serological tests to obtain comparable results.

Materials and methods

Experimental design and serum panel conformation

A serum panel of 523 samples was analyzed by the following eight participating laboratories: Argentina: National Institute of Agricultural Technology (INTA) Agricultural Experimental Station (EEA) Anguil, INTA EEA Balcarce, INTA EEA Rafaela and Immunoparasitology Laboratory, School of Veterinary Sciences, National University of La Plata; Brazil: Veterinary Medicine Department, Rural Federal University of Pernambuco; Mexico: El Llano Technical Institute; Peru: Veterinary Medicine and Zootechny Faculty, Cayetano Heredia Peruvian University; and Spain: SALUVET, Faculty of Veterinary Sciences, Complutense University of Madrid. Each laboratory provided serum samples that were submitted to the Immunoparasitology Laboratory in Argentina, where the panel was blind coded, and aliquots were shipped on dry ice to each participating laboratory. Most sampled animals were older than 6 months to avoid the presence of colostral antibodies, and precolostral sera from newborn calves were also included.

Sera from Group 1 came from Spain, whereas sera from Groups 2 and 3 came from Mexico, Brazil, Argentina, and Spain. The serum panel (n = 523) comprised the following three categories:

Group 1 comprised sera from animals infected with Besnoitia besnoiti (n = 29). The cross-reactivity with the apicomplexan parasite B. besnoiti was studied to determine the analytical specificity (Sp). Sera came from herds with clinically affected animals that showed the clinical signs of chronic besnoitiosis, such as pathognomonic tissue cysts in scleral conjunctiva, hyperkeratosis, and alopecia. Besnoitia besnoiti infection was confirmed by immunoblot (García-Lunar et al. 2013).

Group 2 included well-characterized sera (n = 143) from naturally and experimentally infected cattle. From naturally infected cattle, 80 serum samples from dairy cattle (38 positive and 42 negative sera) were analyzed; of these, 56 samples came from mother-calf pairs (n = 28 pairs); 6 samples came from precolostral calves; and 18 sera came from cows. Sera were classified as positive or negative based on a combination of clinical data and a well-defined serostatus as follows: (1) mothers and their corresponding calves were both either seropositive (n = 11 pairs) or seronegative (n = 17 pairs); (2) 4 seropositive and 2 seronegative precolostral calves were born from seropositive and seronegative cows, respectively; (3) 12 seropositive and 6 seronegative cows. In addition, positive sera came from herds with a previous history of Neospora-associated abortions, and three seropositive cows had previously aborted due to N. caninum infection. The serostatus was assessed by two complementary tests (ELISA or IFAT by the submitting laboratory and by a complementary immunoblot) (Campero et al. 2015) to discriminate between positive and negative results, and all samples showed repeatable serological results in at least two samplings taken in consecutive months.

In addition, 63 samples were collected from experimentally infected heifers. Twenty-three heifers were intravenously infected (iv) with 10⁷ live N. caninum tachyzoites of the NC-7 (n = 6) and NC-8 (n = 6) isolates (Regidor-Cerrillo et al. 2014) and with 10⁸ live tachyzoites of the NC-1 isolate (n = 11). Eleven heifers received phosphate-buffered saline (PBS) iv and remained as negative controls. Sequential serum samples were collected twice a week until 13 days post infection (dpi), then once a week until the end of the experiment (35 dpi). These samples were assayed by CIVTEST ELISA (de Yaniz et al. 2007; Hecker et al. 2013). Infected animals seroconverted from 14 dpi, and all samples from 21 dpi were positive and included in the present study. In summary, 45 positive and 18 negative sera were analyzed. Group 2 was considered as reference sera according to “Pre-test information” (see “Statistical analysis of data” section).

Group 3 comprised field sera (n = 351). Sera came from (1) dairy (n = 122) and beef (n = 72) cattle from herds with a previous history of reproductive failure (mainly abortions) and N. caninum seropositive animals, and (2) dairy cattle from herds without reproductive problems (n = 157).

Serological assays

Serum samples were analyzed by seven in-house IFATs, two in-house ELISAs, and one commercial ELISA. The tests were performed following the laboratory protocols for the in-house tests and the manufacturer’s instructions for the commercial test.

IFATs

A similar procedure was carried out by all participants. The most relevant differences relied on the secondary antibodies and the fluorescence microscope employed (see Table 1). Sera were diluted by two-fold serial dilutions starting at a 1:50 dilution in PBS to the endpoint titer. Suspensions of intact formalin fixed N. caninum (NC-1 isolate) tachyzoites (10⁷/mL) and tachyzoites purified by a Percoll gradient (IFAT 3) were air-dried on glass slides (10 μL/well) and fixed with either ice-cold acetone or methanol. Sera diluted in PBS were added and incubated for 30 min (37 °C). Then, the slides were gently rinsed with carbonate buffer at pH 9 and washed for 10 min. A fluorescein isothiocyanate (FITC)-labeled affinity-purified rabbit anti-bovine IgG antibody conjugate was incubated with the samples at the appropriate dilution in PBS. After a 30-min incubation (37 °C), the washing step was repeated. Slides were observed with a fluorescence microscope. Unbroken fluorescence of the tachyzoite membrane was considered a positive reaction. A cutoff value of 1:100 was applied for all IFATs.
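The dilution scheme and cutoff described above can be illustrated with a minimal Python sketch. It assumes that titers are recorded as reciprocals (100 for 1:100), that a serum with no fluorescence at 1:50 is recorded as None, and that a serum is positive when its endpoint titer reaches the cutoff; the function names and the number of dilution steps are illustrative and do not come from the participating laboratories' protocols.

```python
def dilution_series(start=50, steps=8):
    """Reciprocal titers of a two-fold serial dilution starting at 1:start.

    The number of steps is an illustrative assumption; the text only states
    that sera were diluted to the endpoint titer.
    """
    return [start * 2 ** i for i in range(steps)]


def ifat_positive(endpoint_titer, cutoff=100):
    """Return True when the endpoint titer reaches the cutoff (assumed >=).

    endpoint_titer is the reciprocal of the last dilution showing unbroken
    membrane fluorescence, or None when no dilution was reactive.
    """
    return endpoint_titer is not None and endpoint_titer >= cutoff


if __name__ == "__main__":
    print(dilution_series(steps=5))   # first five reciprocal titers
    print(ifat_positive(200))         # serum reactive up to 1:200
```

Because the readout can only take the discrete values of the dilution series, any cutoff adjustment moves in two-fold steps.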

Table 1 Tests used in the inter-laboratory trial

ELISAs

The ELISA procedures were carried out as previously described by others (see Table 1). The three ELISAs employed a sonicated lysate of NC-1 tachyzoites as antigen to coat the wells. The major differences relied on the secondary antibodies and cutoff values employed. The test results for ELISA 1 were expressed as percentage of positivity (PP), calculated as follows: PP = (OD405 sample × 100)/(OD405 positive control). For ELISA 2 and ELISA 3, the optical density (OD) was converted into a relative index percent (RIPC) by the following formula: RIPC = [(OD405 sample − OD405 negative control)/(OD405 positive control − OD405 negative control)] × 100. The cutoffs employed for ELISA 1, ELISA 2, and ELISA 3 were PP ≥ 25, RIPC > 10 (RIPC values between 6 and 10 were considered doubtful), and RIPC > 8.2 (RIPC values between 6 and 12 were considered doubtful), respectively.
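As a worked illustration of the two indices, the following Python sketch computes PP and RIPC from raw optical densities and applies the ELISA 1 and ELISA 2 cutoffs. RIPC is read here as the ratio of control-corrected ODs multiplied by 100. The function names are illustrative, and treating the doubtful range of ELISA 2 as inclusive of both bounds is an assumption, since the text does not specify how boundary values were handled.

```python
def percent_positivity(od_sample, od_positive):
    """PP = (OD405 sample x 100) / (OD405 positive control)."""
    return od_sample * 100.0 / od_positive


def relative_index_percent(od_sample, od_positive, od_negative):
    """RIPC = (sample - negative) / (positive - negative) x 100."""
    return (od_sample - od_negative) / (od_positive - od_negative) * 100.0


def classify_elisa1(pp, cutoff=25.0):
    """ELISA 1: positive when PP >= 25."""
    return "positive" if pp >= cutoff else "negative"


def classify_elisa2(ripc, cutoff=10.0, doubtful_low=6.0):
    """ELISA 2: positive when RIPC > 10; 6 to 10 doubtful (bounds assumed inclusive)."""
    if ripc > cutoff:
        return "positive"
    if ripc >= doubtful_low:
        return "doubtful"
    return "negative"


if __name__ == "__main__":
    # Hypothetical OD readings, not data from the study.
    pp = percent_positivity(0.5, 1.0)
    ripc = relative_index_percent(0.5, 1.25, 0.25)
    print(pp, classify_elisa1(pp))
    print(ripc, classify_elisa2(ripc))
```

Subtracting the negative control in RIPC removes plate background, which PP does not, so the two indices are not on the same scale and their cutoffs cannot be compared directly.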

Statistical analysis of data

Diagnostic performance of serological tests for the detection of antibodies to N. caninum was calculated according to the following criteria based on previous works (von Blumröder et al. 2004; Álvarez-García et al. 2013). The first criterion was based on the pre-test information (“Pre-test information”). This information was only available for samples from Group 2 and the criteria to consider a sample as positive or negative have been thoroughly described in “Experimental design and serum panel conformation” section. The second criterion was based on the results of the majority of the tests here evaluated (“Majority”). Samples from Groups 2 and 3 were analyzed using the last criterion. For Group 3, “Majority” values were defined by combining all ten tests, only the seven IFATs or only the three ELISAs in separate analyses (see Tables 4, 5, and 6).
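The Majority criterion can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, each test result is assumed to have been reduced to positive or negative beforehand, and a tie (possible when an even number of tests, such as all ten, is combined) is reported as such because the text does not state how ties were resolved.

```python
def majority_status(results):
    """Reference status of one serum from the results of several tests.

    results: iterable of booleans, True meaning the test scored the serum
    as positive. Returns "positive", "negative", or "tie".
    """
    results = list(results)
    positives = sum(1 for r in results if r)
    negatives = len(results) - positives
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "tie"  # handling of ties is not described in the text


if __name__ == "__main__":
    # Hypothetical serum scored by seven IFATs and three ELISAs.
    ifats = [True, True, False, True, False, True, True]
    elisas = [True, False, True]
    print(majority_status(ifats + elisas))  # all ten tests
    print(majority_status(ifats))           # IFATs only
    print(majority_status(elisas))          # ELISAs only
```

Running the same serum through the three groupings shows why the separate IFAT-only and ELISA-only analyses can assign a different reference status than the all-test analysis.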

Two-graph receiver operating characteristic (TG-ROC) analyses were carried out relative to the Pre-test information criterion (SigmaPlot 12.0 software, Systat Software, Inc., San José, CA, USA). According to an arbitrary guideline for the ROC analysis, the area under the curve (AUC) was evaluated as follows: non-informative (AUC = 0.5), less accurate (0.5 < AUC ≤ 0.7), moderately accurate (0.7 < AUC ≤ 0.9), highly accurate (0.9 < AUC < 1), and perfect tests (AUC = 1) (Swets 1988). According to TG-ROC analyses, improved cutoff values (recalculated cutoffs) were applied when plausible.
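A minimal Python sketch of the two ideas in this paragraph follows: the AUC interpretation scale, and the table of Se and Sp per candidate cutoff that underlies a TG-ROC plot. It does not reproduce the SigmaPlot analysis; the function names are illustrative, a sample is assumed to be positive when its value is at or above the cutoff, and AUC values below 0.5 are rejected because the scale quoted above does not cover them.

```python
def auc_category(auc):
    """Map an AUC value to the interpretation scale quoted in the text."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("the quoted scale only covers 0.5 <= AUC <= 1")
    if auc == 0.5:
        return "non-informative"
    if auc <= 0.7:
        return "less accurate"
    if auc <= 0.9:
        return "moderately accurate"
    if auc < 1.0:
        return "highly accurate"
    return "perfect"


def se_sp_by_cutoff(values, is_positive, cutoffs):
    """Return (cutoff, Se, Sp) for each candidate cutoff.

    values: test readouts (e.g. PP or RIPC); is_positive: reference status
    of each serum; a serum is called positive when value >= cutoff (assumed).
    Both reference classes must be non-empty.
    """
    rows = []
    for cutoff in cutoffs:
        tp = sum(1 for v, p in zip(values, is_positive) if p and v >= cutoff)
        fn = sum(1 for v, p in zip(values, is_positive) if p and v < cutoff)
        tn = sum(1 for v, p in zip(values, is_positive) if not p and v < cutoff)
        fp = sum(1 for v, p in zip(values, is_positive) if not p and v >= cutoff)
        rows.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return rows


if __name__ == "__main__":
    # Hypothetical readouts, not data from the study.
    values = [5, 8, 12, 30, 40, 60]
    truth = [False, False, False, True, True, True]
    for cutoff, se, sp in se_sp_by_cutoff(values, truth, [10, 25]):
        print(cutoff, se, sp)
    print(auc_category(0.95))
```

In this toy example, raising the cutoff from 10 to 25 increases Sp without lowering Se, which is the situation in which a recalculated cutoff is plausible.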

Sensitivity (Se), Sp, and test agreement (expressed as Kappa values; k) with 95% confidence intervals (95% CI) were calculated using the Epidata 3.1 software (Organización Panamericana de la Salud y Xunta de Galicia, Consellería de Sanidade). Kappa (k) values were considered as follows: poor agreement (k = 0), slight agreement (k = 0.001–0.20), fair agreement (k = 0.21–0.40), moderate agreement (k = 0.41–0.60), substantial agreement (k = 0.61–0.80), and almost perfect agreement (k ≥ 0.81) (Thrusfield 2007). Agreement values were calculated individually for each serological test and between pairs of tests with original cutoff (45 pairs) and with recalculated cutoff (30 pairs).
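For readers who wish to check such figures by hand, the sketch below computes Se, Sp, and Cohen's kappa from 2 × 2 counts and maps kappa onto the agreement scale above. It is not the Epidata procedure and omits the confidence intervals; the function names are illustrative, and assigning negative kappa values to the poor category and values falling between the listed bands to the lower band are assumptions.

```python
from itertools import combinations


def se_sp(tp, fp, fn, tn):
    """Sensitivity and specificity from counts against a reference status."""
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(both_pos, only_first, only_second, both_neg):
    """Cohen's kappa for two tests applied to the same sera."""
    n = both_pos + only_first + only_second + both_neg
    observed = (both_pos + both_neg) / n
    expected = (
        (both_pos + only_first) * (both_pos + only_second)
        + (only_second + both_neg) * (only_first + both_neg)
    ) / n ** 2
    return (observed - expected) / (1 - expected)


def kappa_category(k):
    """Agreement category; band edges between listed ranges are assumed."""
    if k <= 0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    k = cohen_kappa(40, 10, 5, 45)
    print(round(k, 2), kappa_category(k))
    # Ten tests give 45 unordered pairs, as stated for the original cutoffs.
    print(len(list(combinations(range(10), 2))))
```

The pair count from `combinations` matches the 45 pairwise comparisons reported for the ten tests with their original cutoffs.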

Results

Analytical specificity

The following tests showed cross-reactions with the B. besnoiti seropositive sera (Group 1): IFAT 1 (original cutoff = 13/29; recalculated cutoff: 9/29), IFATs 3 and 4 (1/29), IFAT 6 (3/29), IFAT 7 (2/29), ELISA 1 (original cutoff = 1/29; recalculated cutoff = 1/29) and ELISA 3 (original cutoff = 1/29). IFAT 2, IFAT 5, and ELISA 2 showed no cross-reaction with either the original or recalculated cutoff values.

Sensitivity (Se), specificity (Sp), and agreement (k values) of the tests and TG-ROC analyses according to the Pre-test information and Majority gold standard criteria

Group 2: Well-characterized sera

The Se, Sp, and k values were calculated for each test based on the original cutoff values recommended by each laboratory relative to the Pre-test information criterion for Group 2. In addition, the cutoff values for those tests in which Sp could be increased without a significant reduction in Se were recalculated (Table 2). The results showed variability among the tests with Se or Sp values lower than 90% in five of the ten evaluated tests. Moreover, two IFATs and one ELISA showed Se values lower than 80%. Initially, the best test in terms of Se, Sp, and k values was IFAT 1. Improved cutoff values were suggested for IFAT 1 and ELISAs 1, 2, and 3 (see Table 2) according to TG-ROC analyses. The ROC curves were calculated for each test, and the resulting AUCs were almost perfect for IFAT 1 and ELISA 1 and ELISA 2; highly accurate for ELISA 3, IFAT 3, IFAT 4, IFAT 5, and IFAT 6; and moderately accurate for IFAT 2 and IFAT 7 (Table 2). The AUCs of IFAT 2 and IFAT 7 showed significant differences compared to the AUCs of the other tests (p < 0.05). With the recalculated cutoffs, the performance and k values of the ELISAs improved. Thus, the best tests in terms of Se, Sp, k, and AUC values were IFAT 1, IFAT 4, and ELISA 1.

Table 2 Se, Sp, k, and AUC values relative to the Pre-test information criterion based on the original cutoff values suggested by either laboratories or manufacturer for the commercial test and the recalculated cutoff values after the TG-ROC analysis in Group 2

Finally, the Se, Sp, and k values were calculated for each test based on the cutoff values recommended by each laboratory relative to the Majority criterion (Table 3). In general, most tests experienced a moderate improvement with few exceptions as follows: IFAT 5 Sp and ELISA 3 Sp decreased compared to Table 2. The highest performance corresponded to IFAT 1, IFAT 6, and ELISA 1. When the recalculated cutoff values were employed, the results improved to those obtained by the Pre-test information criterion. IFAT 1, IFAT 3, IFAT 6, ELISA 1, and ELISA 2 showed the highest Se, Sp, and k values.

Table 3 Se, Sp, and k values relative to the Majority criterion based on the original cutoff values suggested by either laboratories or manufacturer for the commercial test and the recalculated cutoff values after the TG-ROC analysis in Group 2

Group 3: Field sera

The Se, Sp, and k values were calculated for each test based either on the original cutoff values recommended by the laboratory or the recalculated cutoff values relative to the Majority criterion (Tables 4, 5, and 6). When all the tests were compared by considering the original cutoff values, the performance of the following five tests notably diminished: IFAT 1 Sp, IFAT 2 Se, IFAT 4 Sp, IFAT 5 Se, and IFAT 6 Se compared to Table 3. In contrast, the performances of the remaining five tests were barely affected (IFAT 3, IFAT 7, ELISA 1, ELISA 2, and ELISA 3). The highest k values corresponded to ELISA 1 and ELISA 2 (Table 4). IFAT 6 and ELISA 2 Se improved notably when either the IFATs (n = 7) (Table 5) or the ELISAs (n = 3) (Table 6) were compared separately.

Table 4 Se, Sp, and k values relative to the Majority criterion including all tests based on the original cutoff values suggested by either laboratories or manufacturer for the commercial test and the recalculated cutoffs after the TG-ROC analysis in Group 3
Table 5 Se, Sp, and k values relative to the Majority criterion considering IFAT tests based on the original cutoff values suggested by laboratories and the recalculated cutoffs after the TG-ROC analysis in Group 3
Table 6 Se, Sp, and k values relative to the Majority criterion considering ELISAs based on the original cutoff values suggested by either laboratories or manufacturer for the commercial test and the recalculated cutoffs after the TG-ROC analysis in Group 3

Test agreement (k values)

The agreement between the tests was calculated prior to and after the TG-ROC analyses for Group 2 (Supplemental Table 1) and Group 3 (Supplemental Table 2).

For Group 2, considering the original cutoff values, 7 of the 45 pairs of tests showed moderate agreement, 33 showed substantial agreement, and 5 showed almost perfect agreement. For the 30 pairs of tests with recalculated cutoff values (after the TG-ROC analyses), 1 pair showed moderate agreement, 22 showed substantial agreement, and 7 showed almost perfect agreement.

For Group 3, considering the original cutoff values, 15 of the 45 pairs of tests showed moderate agreement, 29 showed substantial agreement, and 1 showed almost perfect agreement. For the 30 pairs of tests with recalculated cutoff values, 7 pairs showed moderate agreement and 23 showed substantial agreement.

As expected, the agreement for the 30 pairs of tests with recalculated cutoff values increased for Group 2 (28/30) and Group 3 (26/30).

Discussion

We carried out an inter-laboratory trial among eight Ibero-American laboratories from Argentina, Brazil, Peru, Mexico, and Spain. The purpose of this study was to compare a wide panel of serological tests routinely employed for N. caninum-specific antibody detection, following OIE principles and methods of diagnostic test validation (Jacobson 1998). Few comparative studies of serological tests have been carried out in Europe and North America for anti-N. caninum antibody detection, most of them consisting of one participating laboratory evaluating several serological tests (Wu et al. 2002; Álvarez-García et al. 2003; Frössling et al. 2003; Waldner et al. 2004; Björkman et al. 2006; Hall et al. 2006; Álvarez-García et al. 2013; Roelandt et al. 2015). However, only one inter-laboratory trial has been performed with the purpose of standardizing the serological tests used for antibody detection (von Blumröder et al. 2004). This type of study showed the usefulness of a continuous validation process to provide an accurate diagnosis and standardize different seroprevalence studies to obtain comparable results in Europe (Bartels et al. 2006).

In the present study, most of the evaluated assays (9/10) were in-house tests unlike the previous comparative studies of von Blumröder et al. (2004) and Wapenaar et al. (2007) where half of the evaluated tests were commercially available. Additionally, to our knowledge, this study evaluated the greatest number of IFATs.

The standardization approach herein started with selecting a panel of well-characterized sera composed of experimentally infected and naturally exposed animals. Furthermore, a serum panel that was geographically representative and reflected the spectrum of disease was analyzed to avoid bias resulting from host responses and overestimating the Sp, which is crucial in chronic infections (Nielsen et al. 2011). This last issue is relevant since there is no perfect reference serological assay. In a previous study, IFAT was ruled out as a true reference test (Frössling et al. 2003). Therefore, instead of using a single test as a reference test, we relied on Pre-test information and Majority criteria for reducing bias, as reported previously by von Blumröder et al. (2004) and García-Lunar et al. (2013). In the present work, the congruent results obtained from sera of Groups 2 and 3 suggest that a well-characterized population reflects field population conditions.

We found an unexpectedly high variability among the tests. As stated by Álvarez-García et al. (2013), it is widely known that discrepancies among serological tests exist. However, different validation studies managed to overcome this limitation. As expected, analysis of Group 2 sera had a stronger diagnostic performance compared to that of Group 3. For most tests evaluated, diagnostic characteristics worsened when analyzed using the Majority criterion. Agreement among tests increased after the application of the recalculated cutoff values for Group 2, whereas the k values hardly varied for Group 3. In the present study, initially, only IFAT 1 showed good diagnostic performance. All ELISAs improved after the application of recalculated cutoff values, and ELISAs 1 and 2 performed similarly to IFAT 1. The performance of these tests is comparable to that of commercial ELISAs with excellent Se and Sp values (> 95%) and was supported by high AUC values (Álvarez-García et al. 2013). In addition, IFAT 3 also performed well when the Majority test criterion was applied. In contrast, IFAT 6 and ELISA 3 showed inconsistent results when using different analyses in both groups and should be improved prior to use for routine diagnosis since they frequently demonstrated low Se or Sp values (< 90%). Furthermore, three IFATs (IFATs 2, 5, and 7) had unacceptable Se values. In the present study, Sp was not a major drawback since only ELISA 3 showed a low Sp, which increased significantly after the TG-ROC analysis. As a result, the prevalence of N. caninum infection might be notably underestimated using methods with low sensitivity.

The discrepancies among IFATs found in this work could be related to technical procedures rather than to the antigen used. The low Se values evidenced by IFATs 2, 5, 6, and 7 are most likely linked to methodological issues in the laboratories rather than the existence of false-negative reactors as stated below. Moreover, results could be influenced by inter-operator variability since subjective interpretation is a major disadvantage of IFAT. In addition, it is difficult to adjust IFAT cutoff values since the results are expressed as a discrete variable obtained through two-fold serial dilutions. Thus, only one IFAT cutoff value could be recalculated (IFAT 1) to improve Sp without a detrimental effect on Se. In previous studies, validated IFATs gave variable results compared to ELISAs (Frössling et al. 2003; von Blumröder et al. 2004; Wapenaar et al. 2007). In the present study, all three ELISAs showed acceptable diagnostic performance, although slightly lower Se and Sp were recorded in comparison with those of European and American studies (von Blumröder et al. 2004; Wapenaar et al. 2007).

However, two main limitations arise from the approach followed in the present study. First, false-negative results, which have often been attributed to either antibody fluctuations below the cutoff value during pregnancy or persistently infected seronegative animals (Aguado-Martínez et al. 2008; Guido et al. 2016), cannot be ruled out. Persistently infected cattle may remain undetected when using tachyzoite-based serological tests and could be detected by tests that employ bradyzoite stage-specific proteins (Guido et al. 2016). Thus, in order to avoid false-negative results, we employed a very restrictive criterion to select the negative population. Second, if a test is more specific and does not agree with most other tests, it does not mean that it is not a good test. In order to minimize this major drawback, we only compared tests based on whole tachyzoite antigens that are expected to behave similarly, as in previous studies (von Blumröder et al. 2004; Álvarez-García et al. 2013). In fact, the main findings did not vary regardless of the criterion employed herein. However, when comparing tests based on different parasite-stage antigens, the Majority criterion might lead to confusing results, and a combination of sequential serological analyses and sensitive and specific complementary serological tests based on tachyzoite and bradyzoite antigens should be used to define the reference cattle populations.

We also investigated if cross-reaction with the closely related apicomplexan parasite B. besnoiti existed, since bovine besnoitiosis is a reemergent disease that is spreading in Europe (Álvarez-García 2016). Moreover, a cross-reaction between anti-N. caninum antibodies and the B. besnoiti antigen has been recorded (Shkap et al. 2002; García-Lunar et al. 2015). However, we do not know whether specific anti-B. besnoiti antibodies may cross-react with N. caninum antigens. Although the disease is not present in American cattle, countries in the Americas should test cattle for besnoitiosis to avoid its entrance. Notably, the results showed that IFAT 1 (regardless of the cutoff value employed), IFAT 6, and readjusted ELISA 1 should not be used in areas where bovine besnoitiosis is present since they showed a rate of 10–30% of false positives. For IFAT 1, the number of false positives may increase up to almost 50% with the original cutoff value. Therefore, the analytical Sp of serological tests for the detection of anti-N. caninum antibodies should be evaluated in areas where B. besnoiti is present.

Veterinary laboratory diagnosticians from Ibero-American countries should take into consideration the discordant results obtained herein among labs. Thus, there is a need to adopt a common test or at least tests with performances similar to those of the optimized IFAT 1 and ELISAs 1 and 2. For IFAT, operator training and microscope and reagent quality should be carefully reviewed as they greatly affect the results. In the future, the implementation of a commercial test may help to harmonize the diagnosis among labs to guarantee control program success. This recommendation is supported by the study performed by Álvarez-García et al. (2013) where commercially available ELISAs were compared. Moreover, these ELISAs are routinely employed in voluntary control programs for bovine neosporosis developed in Spain that contributed to reducing seroprevalence after a few years of monitoring the epidemiological situation (Guido et al. 2016). This study might set the basis for creating inter-laboratory control and monitoring networks for serological diagnosis of N. caninum infection to overcome the discrepancies and lack of consistent results. Additionally, the present study reinforces the need for regional validation of serological assays. A pending issue for animal health authorities is the accreditation of laboratories that use validated assays based on multicenter studies such as the one presented here.