Introduction

Food production must increase in the coming years to cope with demand from a growing population (FAO 2013), and agricultural production must also be sustainable in its economic, environmental, and social dimensions. Sustainability of agricultural production is based not only on increased production of goods and services, but also on its robustness, its roots in local communities, its autonomy, and its global responsibility (Zahm et al. 2015), within a holistic approach (Van Passel et al. 2007; Zahm et al. 2008).

Sustainable agricultural systems involve production under good management of the environment while looking after the social context and wellbeing of farming families and their communities (Van Passel et al. 2007). However, assessing the sustainability of agricultural systems is complex, since it deals with dynamic and holistic issues that develop and evolve in a specific site, and it relies on the perspective of whoever undertakes the assessment (Webster 1997).

Different methods have been developed for the assessment of differences and variations of the sustainability of farming systems (Hayati et al. 2010). These methods can be based on simple indicators, i.e., from information taken from records or simple questions, or on complex ones that require a higher degree of knowledge or specialized equipment (Bockstaller et al. 2015).

The Intergovernmental Panel on Climate Change (IPCC) (2014) states that greenhouse gas emissions from dairy production account for 4% of the world total, and 22% of all agricultural emissions. Thus, the assessment of sustainability in dairy systems is relevant since these have been pointed out as having a large environmental footprint (Flysjö 2012). All agricultural systems must engage in sustainable production, especially small-scale dairy systems, because they comprise the majority of dairy farms worldwide, even in developed countries such as those of the European Union, where the mean herd size in 2016 was 18 cows (IFCN 2017). Therefore, small-scale dairy systems represent a large potential to reduce their environmental impacts and develop sustainable production (FAO 2014a).

In Mexico, 78% of specialized dairy farms are small-scale, defined by a small area of agricultural land and a herd size between 3 and 35 cows plus replacements (Prospero-Bernal et al. 2017). Milk sales are the main source of income in 90% of these farms (Martínez-García et al. 2012). Small-scale farms are considered a viable option for territorial development, as they are a source of full-time employment, enabling rural populations to remain in their communities (FAO 2010). Farmers are linked to informal milk markets, and they have developed a strong relationship with milk collectors and artisan cheese producers, giving strength to the agri-food chain (Espinoza-Ortega et al. 2007). However, the disparity between prices paid for milk and costs of inputs puts extra strain on the economic scale of their sustainability, which is the weakest part in these systems (Fadul-Pacheco et al. 2013).

On the other hand, the literature states the need to compare methods that provide information to integrate criteria for the assessment and understanding of sustainability in a given context (Bockstaller et al. 2009). However, few reports compare methods on-farm. De Olde et al. (2016) compared IDEA (Indicateurs de Durabilité des Exploitations Agricoles) (Vilain et al. 2008), RISE V 3.0 (Response-Inducing Sustainability Evaluation) (Grenz et al. 2016), and SAFA V-3 (Sustainability Assessment of Food and Agriculture systems) (FAO 2014b) on dairy and pig farms in Denmark, and stated that RISE was the method best adapted to the Danish context, noting its relevance, ease of use, understandability, and software advantages. In addition, de Olde et al. (2016) found that, out of 48 methods for the assessment of sustainability, only IDEA, RISE, and SAFA met the criteria for on-farm assessments, while Binder et al. (2010) selected IDEA and RISE out of 35 methodologies.

The three methods (IDEA, RISE, SAFA) allow the on-farm assessment of sustainability, through scientifically rigorous indicators, integrated by the ecological, economic and social dimensions of sustainability (De Olde et al. 2016). They also enable the self-evaluation of each farm and the comparison among farms and do not require an optimal or reference farm for comparison (Häni et al. 2003; Zahm et al. 2008; FAO 2013). Table 1 shows the characteristics of each of these methods.

Table 1 Comparison of sustainability assessment tools

The IDEA method is the most accessible and easiest to understand. RISE and SAFA require more complex technical data. The objectives of each method address different dimensions and topics, through indicators that comprise the holistic sense of sustainability (De Olde et al. 2016), with the end goal of guiding farms towards sustainable development (Zahm et al. 2015).

The IDEA method enables the assessment of the sustainability of individual farms with a score that may be compared against other farms within the same production system, or even among different systems (Zahm et al. 2008). There are reports of its successful application to dairy systems in developing countries such as Algeria (Ghozlane et al. 2006), Tunisia (M’Hamdi et al. 2009), and Uruguay (Tommasino et al. 2012), and also to sheep and goat systems in Algeria and Lebanon (Ghozlane et al. 2008; Srour et al. 2009).

In Mexico, Fadul-Pacheco et al. (2013) and Prospero-Bernal et al. (2017) assessed the sustainability of the small-scale dairy systems, applying for the first time the IDEA method; and Salas-Reyes et al. (2015) also applied the same method to assess the sustainability of dual-purpose small-scale cattle farms in a subtropical area. These studies showed that the IDEA method enabled the identification of areas for improvement in the economic scale which limits the sustainability of these systems, basically in the need to reduce feeding costs to enhance the profitability and economic viability of farms.

However, there were questions on the suitability of the IDEA method, developed in France, when applied in the Mexican context. Therefore, the need arose to evaluate different methods for the assessment of sustainability within and between farms and systems that may be better adapted to the context of small-scale dairy systems with low availability of data and that are easy to apply considering the limited time and financial resources for the assessment.

From these, the IDEA, RISE, and SAFA methods were applied in the work herein reported, selected for the quality of indicators, their scientific framework and that they can be applied at farm level, and in multiple systems. The IDEA method was included as a reference for comparison, given the previous experience of the research team with this method (Fadul-Pacheco et al. 2013; Prospero-Bernal et al. 2017).

The assessment of the sustainability of small-scale dairy systems with the three methodological tools was aimed at discerning their strengths and weaknesses in the Mexican context; as well as providing a better understanding of the sustainability dynamics in these farms to identify areas of improvement and support for decision making. Therefore, the objective was to assess the sustainability of small-scale dairy systems during the rainy season. Three methods were compared (IDEA, RISE, and SAFA) to evaluate their ability to deal with such systems in the Mexican context.

Materials and methods

Study area

The work took place in the central highlands of Mexico (Fig. 1), between 20° 06’ and 20° 17’ N and between 99° 40’ and 100° 00’ W, at a mean altitude of 2440 m, with a sub-humid temperate climate with rains in summer, and a dry season with frosts in winter (INEGI 2009). Mean temperature was 16.4°C and mean rainfall 776.7 mm (SMN-CONAGUA 2019).

Fig. 1 Geographical location of the study area

Almost 90% of dairy farms in the study area were small-scale dairy producers (INEGI 2007), who relied on milk sales for their livelihoods (Martínez-García et al. 2012). Farms were characterized by herds between 3 and 35 cows plus replacements, two milkings per day, and small land areas, that relied on family labour (Fadul-Pacheco et al. 2013).

Feeding is based on cut-and-carry of temperate cultivated pastures (ryegrasses with white clover), forages such as oats or bought-in alfalfa hay, complemented with cereal straws (maize, oats, barley, and wheat) and commercial concentrates (Martínez-García et al. 2015a). Some farms graze native grasslands, and some have incorporated grazing of their cultivated pastures and maize silage (Prospero-Bernal et al. 2017).

Selection of farms and data collection

Ten farms participated in the study. They have participated in the project to which this work belongs (Prospero-Bernal et al. 2017), and were initially selected by non-probabilistic snowball sampling (Goodman 2011; Sedgwick 2013). Farmers agreed to participate in the study voluntarily and were informed at all times of the objectives and scope of the work under a participatory rural research approach (Conroy 2005).

At the start of the assessment, the indicators of each methodology were reviewed to identify their specificity and applicability (FAO 2014b; Prospero-Bernal et al. 2017; Berbeć et al. 2018; Soldi et al. 2019). Data were collected with a structured questionnaire for each method (IDEA, RISE, and SAFA) (Vilain et al. 2008; FAO 2013; De Olde et al. 2016).

Questionnaires included the indicators for the environmental, social and economic dimensions (Appendixes A1, A2, and A3), adapted to the study area (Zahm et al. 2015) to ensure an approach compatible with the Mexican context (Prospero-Bernal et al. 2017). Appendix B shows the indicators that were not included for each method.

Information was collected during monthly visits to each farm when questionnaires for the three methods were applied; always by the same member of the team to reduce potential bias.

Since farmers do not keep records, semi-structured questionnaires were also applied during each visit to collect information on the quantity of feeds, milk sales, and productive, reproductive and economic information from the previous month. Milk and feed samples were collected during those visits and analyzed in the laboratory for milk composition (milk fat and protein) and chemical composition of feeds (dry and organic matter, crude protein, neutral and acid detergent fiber, and in vitro organic matter digestibility) following Fadul-Pacheco et al. (2013) and Prospero-Bernal et al. (2017).

Data were collected during the rainy season, from June to November 2018, which gave 60 questionnaires and sets of farm data for each method (10 farms × 6 monthly visits). Previous research with the IDEA method was in the rainy season (Fadul-Pacheco et al. 2013), and the IDEA method was taken as reference given the experience of the research team. Also, budget constraints limited the duration of the study.

Indicators were adapted to current Mexican standards for milk composition and environmental issues, and those not applicable to the Mexican context were not considered as was done in previous work (Fadul-Pacheco et al. 2013; Salas-Reyes et al. 2015). The economic analyses followed Prospero-Bernal et al. (2017) through partial budget analyses, just considering the dairy operation as the basis of livelihoods.

Interpretation of sustainability level by IDEA, RISE, and SAFA

Results for each dimension (environmental, social, and economic) were obtained from weighted scores, where each dimension may get a score from 0 to 100. SAFA scores (0–5) were transformed to a 0–100 scale to compare methods.
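The transformation of SAFA scores can be sketched as follows. The text does not state the formula used, so a simple linear rescaling is assumed here; the function name `rescale_to_100` and the example scores are hypothetical and only illustrate the idea.

```python
def rescale_to_100(score, low=0.0, high=5.0):
    """Linearly map a score from [low, high] onto a 0-100 scale.

    Assumption: a linear mapping; the source does not give the formula.
    """
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) * 100.0 / (high - low)


# Hypothetical SAFA theme scores on the 0-5 scale (not study data).
safa_theme_scores = [0, 2.5, 4, 5]
rescaled = [rescale_to_100(s) for s in safa_theme_scores]
print(rescaled)
```

Under this assumption a SAFA score of 2.5 maps to 50 and a score of 4 maps to 80, so the rescaled values can be read against the same 0 to 100 range as IDEA and RISE.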

The overall sustainability score in IDEA is that of the dimension with the lowest score (limiting scale) (Zahm et al. 2019). The overall score in RISE and SAFA was the average of their three and four dimensions, respectively (FAO 2014b; Grenz et al. 2016).
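The two aggregation rules can be contrasted with a minimal sketch. All dimension scores below are hypothetical, the function names are illustrative, and the fourth SAFA dimension is labelled "governance" as an assumption, since the text does not name it.

```python
from statistics import mean


def overall_limiting(dimension_scores):
    """IDEA-style rule: the overall score is the lowest dimension score.

    Returns the name of the limiting dimension and its score.
    """
    limiting = min(dimension_scores, key=dimension_scores.get)
    return limiting, dimension_scores[limiting]


def overall_mean(dimension_scores):
    """RISE/SAFA-style rule: the overall score is the mean of the dimensions."""
    return mean(dimension_scores.values())


# Hypothetical 0-100 dimension scores for one farm (not study data).
idea = {"environmental": 70.0, "social": 60.0, "economic": 45.0}
rise = {"environmental": 50.0, "social": 60.0, "economic": 40.0}
safa = {"governance": 60.0, "environmental": 55.0,
        "economic": 75.0, "social": 50.0}

print(overall_limiting(idea))  # limiting dimension and its score
print(overall_mean(rise))
print(overall_mean(safa))
```

The design difference matters for interpretation: under the limiting rule a high score in one dimension cannot compensate for a low score in another, whereas averaging allows such compensation.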

Scores for each indicator (on a 0 to 100 scale) are classified as (Grenz et al. 2016): 0–33: problematic, 34–66: critical, and 67–100: positive (Appendix A). Mean results for each method are presented in radar graphs (Figs. 2, 3, and 4).
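The classification can be written as a small function. The published bands are given for whole numbers, so the handling of fractional scores is an assumption here: any score below 34 is treated as problematic and any score below 67 as critical. The function name `classify` is illustrative.

```python
def classify(score):
    """Map a 0-100 score to the three-level color code.

    Assumption: fractional scores fall into the band below the next
    integer threshold (for example, 33.4 is problematic, 66.9 is critical).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside the 0-100 scale")
    if score < 34:
        return "problematic"  # red
    if score < 67:
        return "critical"     # amber
    return "positive"         # green


# The mean overall score reported in the Conclusions (55.3) falls in the
# critical (amber) band under this rule.
print(classify(55.3))
```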

Fig. 2 Average score by themes of the IDEA method

Fig. 3 Average score by themes of the RISE method

Fig. 4 Average score by themes of the SAFA method

Statistical analyses

Descriptive statistics were applied to the 10 participating farms to show that the sample is representative of farms encountered in the study area. Data from each method were organized following the guidelines from IDEA (Vilain et al. 2008), RISE (Grenz et al. 2016) and SAFA (FAO 2014b) (Table 2).

Table 2 Farm characteristics in the assessment of the sustainability of small-scale dairy systems by three methods (n=10)

Indicators were analyzed for each dimension. The level of sustainability was described as suggested by Binder et al. (2010) and de Olde et al. (2017) since the number of indicators and topics are different for each method, so that results are presented as independent indicators without considering interactions between them.

The Shapiro-Wilk test, recommended for samples under 50 observations, showed that data were not normally distributed (Field 2013); therefore, the comparison within dimensions and the level of sustainability for each method were analyzed with the Kruskal-Wallis test and the Mann-Whitney U test to detect differences (Field 2013).
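The non-parametric comparison can be illustrated with a minimal, standard-library sketch of the Kruskal-Wallis H statistic (with tie correction) and the Mann-Whitney U statistic. This is not the software used in the study, the scores are hypothetical, and the p-value shown relies on the usual chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2). The Shapiro-Wilk step is omitted from the sketch.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (df = 2)."""
    return math.exp(-h / 2)


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:len(x)]) - len(x) * (len(x) + 1) / 2
    return min(u1, len(x) * len(y) - u1)


# Hypothetical dimension scores for three methods (not study data).
idea = [62, 58, 65, 60, 59]
rise = [41, 45, 39, 44, 47]
safa = [55, 52, 57, 50, 54]

h = kruskal_wallis(idea, rise, safa)
print("H =", round(h, 3), "approx. p =", round(p_value_three_groups(h), 4))
print("U (IDEA vs RISE) =", mann_whitney_u(idea, rise))
```

In practice a statistics package would normally be used; for example, `scipy.stats` provides `shapiro`, `kruskal`, and `mannwhitneyu`, which also return p-values for small samples.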

Results

Characteristics of participating farms

Table 2 shows the characteristics of the participating farms, with the largest variation in farmland size (ha) and the number of cows, and the lowest variation in milk fat and protein content among farms. Farms rely on family labor, with temporary hiring of labor during harvesting of forages and crops like maize and oats.

Indicators classification by color code (green, amber, and red)

Table 3 shows the classification of indicators by score, identified as positive (green), critical (amber), and problematic (red) for each method. Appendixes A1, A2, and A3 show all indicators reported in terms of their maximum possible score, the mean score and maximum and minimum scores for farms for each method.

Table 3 Indicators by color code

In the environmental scale, indicators for fertilisation have problematic scores in the three methods, with positive scores for the majority of indicators relating to animal production diversity and water management. Indicators for energy and materials use were problematic in IDEA and RISE, although positive in SAFA. However, since all the other indicators in this scale had high scores, the environmental scale was classified as positive (see Appendices A1, A2, and A3).

Also, on the social scale, the majority of indicators in IDEA showed a positive score, while in RISE and SAFA most indicators are at a critical score. Indicators relating to health and safety at work have similar scores in the three methods, and it is in the social scale where fewer indicators are problematic. Most indicators are qualitative and with similar content in the three methods, which weigh scores similarly.

In the economic scale, most indicators in IDEA and SAFA showed positive scores, in contrast with RISE where most indicators were classified as problematic.

Assessment of sustainability by the three methods

Figure 2 shows results for IDEA. Two of the four themes (components) with the lowest scores belonged to the economic scale: economic efficiency and viability. The other two components with a low score were the organization of space and quality of products of the land. The highest scores were for the environmental scale.

Figure 3 shows the results of the RISE method. Scores for economic viability were the lowest, similar to IDEA. The highest scores were those related to animal welfare (environmental scale) and farm management.

It is the RISE method that resulted in the lowest score for the environmental scale. This is due to more exhaustive and detailed indicators on soil management, as well as indicators on environmental protection and energy use. In contrast, IDEA takes into consideration more general indicators on crops, land areas, and the territory.

SAFA results (Fig. 4) showed that the indicator for local economy had the highest score, with overall high positive scores for the economic scale. One aspect valued by SAFA is local trade, which in the farms studied refers to the sale of milk to small local artisan cheesemakers; this indicator obtained a score of 100, which influences the overall high score for the economic scale. However, SAFA indicators for profitability and liquidity, which reflect the economy of each farm, did not obtain high scores.

IDEA and RISE results showed that farms are at a critical (amber) level of sustainability. IDEA results indicated that the economic scale limits the sustainability of these systems. RISE and SAFA have the environmental and social scales with lower scores than for the economic scale.

Themes on product information and quality, responsibility, and land use showed the lowest scores in SAFA, which differ from IDEA and RISE in those scales.

Comparison of dimensions and sustainability level

Table 4 shows the scores for the environmental, social, and economic dimensions and the overall level of sustainability for each of the studied methods. There were highly significant differences among methods (P<0.001) for the environmental and economic dimensions, with RISE showing the lowest score for the environmental dimension.

Table 4 Sustainability scores of small-scale dairy systems by dimension and method

There were no statistical differences among methods for the social dimension (P>0.05), and there were highly significant differences (P<0.001) among methods for the economic scale. SAFA had the highest mean score for the economic dimension, with similar scores between IDEA and RISE.

In terms of overall sustainability scores, there were highly significant differences (P<0.001). The SAFA score was the highest, while IDEA and RISE showed a similar sustainability score. In spite of differences, the three methods showed an overall medium (critical) sustainability score.

Discussion

Farm characteristics

Participating farms were similar to those reported by Romo-Bacco et al. (2014) and Prospero-Bernal et al. (2017) in small-scale dairy systems in two different areas of the Mexican highlands. Both works reported the reliance on family labour (by two family members), and about 10% of hired labour. Farms have between 6 and 7 ha of farmland, with 9 to 15 milking cows that yield between 14 and 16 litres of milk per day.

Assessment of the environmental, social, and economic components of sustainability

The three methods applied enabled the assessment of the sustainability of participating farms and were sensitive to detect problematic, critical, and positive points (Grenz et al. 2016).

In the environmental scale, the three methods identified problematic indicators in crop management, due to high fertilizer use and soil degradation. Farmers are aware of the high amounts of fertilizers applied, but few have reduced their use.

Given the low schooling level of small-scale farmers, they are generally unwilling to introduce changes in their practices (Martínez-García et al. 2015b), and changes usually happen only once they are convinced by the influence of their social referents, from whom they take advice (Martínez-García et al. 2018).

As positive indicators, the 10 farms use manure as organic fertilizer for their pastures, and having mixed grass-clover pastures is also a positive indicator. Other positive indicators were diversity, animal welfare, and water use.

The IDEA method showed the highest scores for the environmental scale, attributed to the indicators the method evaluates, centered in diversity, management, and the territory.

RISE and SAFA, on the other hand, evaluate very specific indicators on issues of air, water, and soil, with sub-topics and indicators for a detailed assessment. These require specific information that farmers do not have and that is not easy to obtain, such as the balance of greenhouse gases in RISE (which was not measured) and a whole theme on the atmosphere in SAFA.

De Olde et al. (2016) and Berbeć et al. (2018) mentioned that RISE and SAFA have the largest number of specialized indicators. Under these methods, positive indicators were those related to animal and plant diversity. Jouzi et al. (2017) pointed out that one of the advantages of small farms is the rational use of local resources.

The three methods utilized have strengths but also weak points. In the IDEA method, water is a weak issue, since IDEA only has an indicator for water management related in the studied farms to the availability of irrigation for pastures. RISE and SAFA, with similar scores, have water as a specific topic with indicators on measures for the saving and control of water, water quality and availability, and amounts of water used in the farm and for irrigation.

The IDEA method is general and does not consider important issues for the assessment of sustainability; while RISE and SAFA include more indicators that yield more reliable results. However, the inclusion of more themes to the assessment implies more specialized indicators that require more information and data that are not available in small-scale farms, as well as resources and time for the assessments.

In terms of the social component of sustainability, social indicators in the three methodologies are similar. IDEA, RISE, and SAFA established as positive indicators animal welfare, labor security, economic incomes above the community means, low generation of residues, and freedom to make decisions.

Indicators for the social dimension of sustainability are complex given the constant evolution of society, which makes it difficult to develop simple and precise indicators, and the fact that assessments take place at a specific moment in time (Vilain et al. 2008).

Hayati et al. (2010) stated that these indicators lead farms towards sustainable development. However, indicators such as the intensity of work in IDEA are problematic due to the heavy workload, as has been identified in previous works (M’Hamdi et al. 2009; Fadul-Pacheco et al. 2013; Prospero-Bernal et al. 2017). Nonetheless, Moretti et al. (2016) mentioned that family labor strengthens farms, making them more resilient to changes.

RISE identified a low quality of social relations, in contrast to IDEA and SAFA that identified strength in the relations among farmers. Even though social indicators have been developed since the inception of the sustainability concept (WCED 1987), methodologies have been negligent by diminishing their importance. Therefore, there is a need for the development of indicators to measure the creation of social capital (Vallance et al. 2011). In this work, social indicators (Table 3) and their objectives are similar in the three methods (Binder et al. 2010).

In the economic dimension of sustainability, positive indicators were the generation of economic incomes, adequate financial autonomy, low dependency on external subsidies, and the production of food for the community, key elements for farm resilience (Jongeneel and Slangen 2013).

Problematic indicators were low specialization of production, a lack of available information, and little generation of information on the management of the farm. This affects decision making and results in a lack of knowledge of the actual processes, reducing economic efficiency as detected by IDEA and RISE.

SAFA results for the economic dimension agree with de Olde et al. (2016), who indicated that this method tends to overvalue economic indicators, yielding results that do not coincide with the reality of farms that are not economically efficient. In contrast, IDEA and RISE are based on indicators such as cash flow, incomes, and investments, which are easy to measure.

RISE allows for the lack of data in farms, while SAFA allows some specific themes that may be irrelevant in a given context to be omitted, avoiding the need for indicators that require unavailable data and using practice-based indicators in their place (FAO 2013).

The economic scale is relevant in farm resilience, on which the continuity of farms relies (Hayati et al. 2010). Economic viability was an indicator with low scores in the three methods, which can be attributed to the expenditure in cattle feeding (purchase of external inputs), purchase of fertilizers, and dependence on fossil energy (gasoline and diesel). Therefore, the economic scale limits the sustainability of small-scale dairy systems (Prospero-Bernal et al. 2017).

Overall assessment of the sustainability by three methods

The three methods (IDEA, RISE, and SAFA) showed variation in the content of indicators, reference values, and methods for scoring and aggregation. This variation is due to the differences in judgment values, the context, and priorities of those involved in the development of each method (De Olde et al. 2017).

Variability in the methods gave rise to differences in the assessment of the sustainability of the studied farms, although results presented are transparent both in the use of the methods and in the results generated (De Olde et al. 2017), so that adaptation and integration of the various indicators are feasible due to their inter-relationships given their similarities as the three are multi-criteria methodologies (Binder et al. 2010).

Score values differ because each method values the indicators differently, assigning different scores based on its specific norms or assessment protocols for the scoring of indicators (Marchand et al. 2014). There are times when many possible variables could make up an indicator, and it may be difficult to decide which is best. At other times, variables are not easy to measure, or there are no data, and they must be replaced by other, less reliable variables (Sarandón 2002).

These aspects must be taken into consideration for a good assessment of sustainability in order to have an objective and reliable result for the farms that enable decision making in relation to weak points that need improvement.

The limitation of the three methods was the lack of information that could not be collected as farms have little data available, and there were not sufficient financial resources to undertake all laboratory analyses needed for a complete assessment.

RISE and SAFA offer possibilities to overcome the lack of information. RISE gives the option of qualitative measurements of indicators if specific data are missing, such as for economic or life quality indicators. SAFA allows for indicators of practice to be changed for indicators of yield which are easier to obtain. IDEA has indicators closer to on-farm situations that make it easier to adapt to specific contexts.

An aspect to take into consideration is that when adopting an existing method, like IDEA, RISE, or SAFA, the number of themes, indicators, and assessment procedures are defined, and most of the method to apply is fixed.

IDEA and RISE were specifically developed for the assessment of farm sustainability, while SAFA has a broader application that encompasses agriculture, forestry and fisheries, as well as the assessment of companies at a world scale (FAO 2013).

SAFA also proved to be the least applicable method for use in small-scale farming. Firstly, some indicators require economic data from more than five previous years, which are not available in small-scale farms. Secondly, SAFA was not developed for small-scale farms, and thirdly, there is a large number of specialized indicators that are not easy to measure for lack of instruments, or of financial and time resources. The interest in including SAFA in the study is that it was put forward as a probable better method given its development by a global organisation such as FAO.

The proportion of sub-themes from one method that is dealt with by another method is called sub-theme coverage. SAFA has an intermediate to high coverage, at 89% for IDEA and 92% for RISE. RISE has a coverage of 67% for SAFA and 81% for IDEA, and IDEA covers 59% for SAFA and 76% for RISE (De Olde et al. 2017).
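Sub-theme coverage can be illustrated as a set overlap. The sketch below assumes one particular reading of the definition, namely that the coverage of method A for method B is the share of B's sub-themes that A also addresses; the sub-theme labels and method names are hypothetical and are not taken from IDEA, RISE, or SAFA.

```python
def coverage(covering, covered):
    """Percentage of the sub-themes in `covered` also found in `covering`.

    Assumption: this direction of the ratio is one reading of the
    definition in the text, not a confirmed formula.
    """
    if not covered:
        raise ValueError("the covered method has no sub-themes")
    return 100.0 * len(covering & covered) / len(covered)


# Hypothetical sub-theme sets for two methods (labels are illustrative).
method_a = {"soil", "water", "energy", "biodiversity"}
method_b = {"soil", "water", "animal welfare"}

print(round(coverage(method_a, method_b), 1))  # share of B covered by A
print(round(coverage(method_b, method_a), 1))  # share of A covered by B
```

The measure is not symmetric, which is consistent with the reported pairs of values differing by direction (for example, 89% and 59% between SAFA and IDEA).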

SAFA is the method with the largest number of indicators also employed by IDEA and RISE. Soldi et al. (2019) mentioned that SAFA requires specialized work in the collection of information and is aimed at regional assessments, which are less sensitive at farm level. On the contrary, IDEA and RISE were developed to assess the sustainability of farms (De Olde et al. 2016).

IDEA has well-defined indicators that are easy to collect and can be used on farms with limited information. In contrast, RISE, like SAFA, has very specialized indicators at the environmental scale and requires more technical and intellectual infrastructure for the assessment than IDEA. RISE and SAFA may be considered for sustainability assessments with ample financial and time resources.

There will always be variability in the assessment tools as well as in the results since each method is based on the context, availability of scientific data, and knowledge of values and norms of those who develop the methods (De Olde et al. 2016).

The IDEA method was better adapted to the sustainability assessment of small-scale dairy systems in Mexico, as most of its indicators may be collected on-farm and are easy to measure, compared to RISE and SAFA. Therefore, it is suggested to continue using the IDEA method in future assessments of sustainability in small-scale farming systems.

Conclusions

The IDEA, RISE, and SAFA methods share in essence the concept of sustainable development from the holistic integration of the environmental, social, and economic dimensions of sustainability, and are sensitive enough to identify problematic indicators and to support decisions that may guide farms towards enhanced sustainability.

IDEA and RISE were identified as the stronger methods for on-farm assessments and did not show differences in the social or economic scales, nor in the overall sustainability score.

IDEA was the least demanding method for environmental indicators, in contrast to RISE and SAFA, which concentrate efforts in this dimension. In SAFA, the economic scale is ambiguous since indicators are aimed at communities or larger regions. When applied at the farm level, SAFA does not detect small variations, particularly on the economic scale.

These three methods enable an understanding of sustainable development by generating an interaction between research institutions and farmers. Even though there is not a strong culture of sustainability in the study area, the work undertaken helped to raise awareness among farmers, their families, and communities.

The mean overall sustainability score over the three methods for the ten assessed farms was 55.3±5.7 out of 100. There were no large differences between the three methods, even though indicators vary in the way they are measured; they share more than 70% of objectives. This level of sustainability places farms at a critical level (amber) following the color code, although towards the higher end, opening opportunities to enhance their sustainability.